[
  {
    "path": "COCO_caption_prompts_30k.txt",
    "content": "A man about to return a serve with his tennis racket\nA horse drawn carriage in a historic city.\nTwo gray fire hydrants sitting next to each other at a park.\nA goat with horns is standing in a grassy field.\nSeveral buses parked under a carport in a parking lot.\nA bowl of soup with bread and a cup of coffee.\nThree polar bears walk across a snowy field.\nA glass mosaic vase filled with colorful flowers.\nA bunch of apples in large trays on top of wooden crates.\nA girl does a skateboard trick in the air.\nA dog rests in the grass next to a fire hydrant.\nA giraffe standing in the shade of a near by tree.\na plate of chilies with carrots and peas\nA motorcycle parked in front of a red brick wall.\nA tour bus stopped near a mountain while people gather nearby\nA young girl and her dad play with kites in a park.\na man who appears to be herding sheep is closing two big fence doors\nA man with glasses holding a glass of wine.\nGroup of skiers and different colored outfits on Ace Eastlynn.\nA fortune note on a tea bag next to a bagel.\na man walks along the beach with a surfboard\nThe view of a clean toilet surrounded by marble tile.\na couple of people that are skateboarding down the road\nA couple of cars are riding down the street from a window view.\nA pet crate and a lot of tools and wires.\na man holding a kite and a dog in a field.\nFour women with snowboards and gear are posing for a photograph near some snowy mounds.\nThe boy and his dog are posing for the camera.\nThe streets and the double decker bus are lit up in the night.\nA small vase has a good luck plant in it.\na man riding on the back of a bike on dirt ground.\nA group of people standing on the street.\nThe girl is riding her skateboard while using her cell phone.\nA person doing a trick on a skateboard in the road\nA lone zebra is walking in tall green grass.\nA fuzzy image of some people on skate boards.\nA man throws a frisbee to another man with two children.\na number of people 
in a body of water with a small boat\nA complete train set, with tracks, buildings, and three piece train.\nA couple of buses parked in front of a building.\nA girl wearing a wet suit surfing in the ocean.\nA picture of something and it appears like sustenance.\nA man on a skateboard on a concrete lip.\nA person walking next to a horse at a horse show.\nA man with a beard looks pensive and wears a tie.\nA blue vase filled with colorful flowers sitting on the ground.\nA street name on a sign built into a curb.\nTwo zebras are behind a fence on green grass.\nYoung person on the street skateboarding wearing a helmet.\nA toilet with a red seat in a small bathroom with red tiles.\nWoman lying in an unmowed field with a frisbee.\nCars and a bus driving down a busy road.\nA blurry image of some car lights on a dark night.\nA smiling young man stands beneath an Obama street sign.\nthis is a train riding under a bridge\nA group of giraffes eating bark off trees.\nA horse is looking in the living room window of a farmhouse.\nA man holding a frisbee on a beach next to another man.\na mixture of black and white sheep in a dried out field\nA chocolate caked covered in strawberries sitting next to a knife.\nTwo people standing next to each other on a snow covered slope.\nA young boy is flying a kite in the park.\nBaseball pitcher in the process of pitching in a baseball game.\nTwo men are in the water on a boat.\nA fighting plane turning sideways in a cloudy sky.\nSeveral red roosters together in a small area.\nA few small boats sail down a waterway.\nA dog and a cat are looking at the snowy front yard through a glass door\nA small child sitting on a shelf with teddy bears.\nOne giraffe standing behind a dead tree branch.\nA bright green kite with a scary monster face flying high\nA man holding a tennis racket and staring at the camera with pride.\nThis is a thing that is straightforward and plain.\nA large open field with small bushes and trees, and a giraffe standing in the 
middle of the field.\nTwo women who are sitting at a table together.\nA small dog eating out of a bowl on the floor.\na cow that is standing up eating a pan\nTwo people smile as they ride on an elephant.\nsome city workers work on a car crash\nA giraffe walking past a tree on a dirt landscape.\nA tow truck vehicle on a street in a city area.\nProfessional baseball player winding up to pitch the ball.\nA dog is wearing a baseball hat over it's eyes.\nA bearded man standing in front of bookcases\nA horse drawn carriage traveling away fro ma very large cathedral.\na man is standing inside of a food truck\nThere is a stop sign with two road signs on top of it.\nA girl looking a a beautiful view of the Rockies.\nA very big high ceiling room with a yellow fan.\nA man in an office chair looking at a laptop next to a glass of wine.\na brown desk a keyboard a computer and a monitor and speakers\nA woman talking on her cell phone while walking.\nAn abandoned train with lots of graffiti painted on it.\na person bending down cutting another persons hair\nA plane at an airport with a truck driving past\nthere are many men sitting at a small table\nA girl looking inside a living cartoon refrigerator.\nTwo sheep are in a dirt outdoor enclosure.\nPeople look on as a ball heads towards a batter.\nSeveral glazed donuts are lined up on a tray.\nA guy standing on a snowboard in the snow.\nThere is now image here to provide a caption for.\nA man practicing baseball on a field.\nA bedroom that has a large computer desk in it.\nA close up of a pizza with spinach on it.\na living room with a couch a tv and a table\nA man checks his cell phone as he walks to his car in the parking lot.\nSeveral people on skis in the snow outside of a lodge.\nA small pink beanie hat next to a cell phone.\nA mother handing her son a piece of cake on a  paper plate.\na man in a tuxedo sits at a table and uses a laptop\nA young man holding a Nintendo Wii game controller.\nA wrap of some sort on a plate with 
potatoes\nA motor boat next to a beach and others in the background\nA fruit bowl sitting on a table with bananas and apples.\nThree people running around in grass playing Frisbee.\nA  person sits down to their meal of a sandwich on a croissant with a side of french fries.\nA man is holding up an old cellphone\na couple of animals standing in a field\na cup on a table next to a tv\nA large brick building with a tall tower containing a clock near the top\nA figurine of a little boy riding a snow board in yellow pants.\nA memorial set on a fence by the ocean with flowers and teddy bears.\nA group of white sheep eating from blue bowls\nA couple of kids laying in a bed with an umbrella.\nA small table set with pastries and tea\nA rainbow lorikeet parrot eats sun flower seeds.\nA white plate topped with a sandwich and a salad.\nA woman getting ready to hit a tennis ball.\nA tennis player jumps into the air and swings his racket.\nA mustached man is standing in front of a larger mustache.\nStreet signs showing the intersection of Eight Mile and Shadyside\nA white desk has a computer, keyboard, globe and green phone.\nA man with bandaged hands lying in bed.\nA baseball player is trying to hit the ball.\na woman gets ready to pet a big horse\nA herd of black cows grazing in a field.\nA very pretty girl looking at her cell phone.\nA man rides a bicycle across a wet intersection.\na man with a bat walks as other look on\nTwo zebras are running through some high grass.\nA baseball stadium full of fans while two teams play ball.\nthere is two pictures of a female tennis player\nThere are flowers that are in a vase filled with rocks.\nFruit and vegetables are hanging in a metal basket.\nA boat that is sitting in the water.\nA living room with a couch, TV, and fireplace.\nsome people walking up a snowy hill with skis\nA woman is eating spam off of a plate with a camera next to her.\nA person that is playing in a tennis game.\nA red truck in street next to wall and buildings.\nA 
light with multiple bulbs is on a tall post.\nGrey toned elephant head closeup with grass and hill background.\nA man in striped shirt sitting on a fire hydrant.\nA person on a skateboard and bike at a skate park.\na close up of a red tie and a white and blue shirt\nA kid about to ride his skateboard down a pool.\nLIVING ROOM WITH COUCH, TABLE, END TABLE, LAMP, PHONE, AND MIRROR\nA happy boy is waiting outside with his suitcase.\nTwo men stand together using their cell phones.\nA group of men cutting into a celebratory cake\nA man eating a donut wrapped in tissue paper.\nA bedroom with a bed with blue cover and blue curtains, and a pair of shoes on the wooden floor.\nA penguin is standing and pecking at a teddy bear left in the snow.\nTwo brown bears sitting on top of a black and white checkered bed.\nTwo lambs with black heads look out from a gate.\nThis person is holding a cell phone while standing on the sidewalk.\nThere are a group of people snow boarding on the hill\nA jet on a runway near other jets.\nA baseball player is getting ready to go to bat.\nMan on a black motorcycle wearing a helmet.\nHorse standing in dead grass area near fenced field.\nsomeone is skiing through the trees by themself.\nThe zebras are grazing in the field together.\nThe cut sandwich has meat, lettuce and tomato.\nA small dog sitting on a stuffed animal teddy bear\nA person holding a slice of pizza in their hands.\nA man wearing a hat eating a hotdog at a sporting event.\nA painting of a man sitting next to a woman near the ocean.\nPeople are standing in front of a small store\na train on tracks at the front of the train depot\nElephants moving along on a very open field of some sort.\nA cat sits curiously perched in an empty cup\nLarge collection of scissors attached with price tags.\nAn older man and two kids sitting on a bench.\nTwo motorcycle riders are riding a motorcycle bike.\nPeople watching motor cross bike riders racing on a field\nTwo small children stand together 
scrubbing an elephant.\na close up of a shirtless man wearing a neck tie\nThe view of a distant mountain taken from an airplane window.\nA train on tracks in a city with high rises.\nPEOPLE BOARDING A BUS PARKED ON A STREET.\nA woman holds a decorative umbrella and walks with a man.\nA couple of cows standing on top of a grass covered field.\nA crossed eyed man holding a remote in his mouth.\nA single young giraffe stands and looks forward.\ntwo giraffes standing next to several huge rocks\nA person is flying a kite on the beach.\nTwo giraffes are standing next to a tall fence.\nthere is a person holding up a nokia lumia phone\nA boy sitting at a table while he puts something in his mouth.\nA man sitting in front of a laptop computer in an office.\nA woman is on the side of a mountain in ski gear.\nA view of a bed from across the room, it has a TV tray on it.\nView of traffic signal against a dark sky that looks like rain.\nA boy being affectionate with a baby on a bed.\nA man with a top hat on and a carrot in his mouth.\nThere is a stove and a sink in a narrow kitchen.\nA hand picking up a bunch of bananas from a display.\nA cow and a person on a horse in the dirt.\nMany doughnuts on a display in a store.\numbrellas, trees and a hut line a sandy beach\nA man eating a hot dog on a tray\nSome stacked with much sublime sustenance ready to eat.\nSeven people smile as they pose with tennis rackets.\nA school bus is parked by a street sign.\nA train coming out of an enclosure under a snowy mountain.\nA woman holding a surfboard walking into the ocean toward a dog.\nThree people on snowboards riding down a snowy slope.\nA black dog on a leash holding a frisbee in his mouth.\nA woman water boarding in a lake near land.\nA red stop light on a street at night.\nA kitchen with wood cabinets and white countertops.\nA bathroom decorated in pink ceramic tile and wallpaper..\nA platter on a table that has pizza on it.\nA snowboarding is doing tricks on a ramp.\nA desk with a 
computer, a keyboard, a mouse, a bobble head, speakers and a lava lamp on it.\nA man in a crowd balancing a skateboard on top of his head\nA bathroom with a paper dispenser, toilet roll and garbage bin.\nA seagull majestically flying through the air over the ocean.\nA bed topped with a colorful blanket and lots of papers.\nA person looking into a convex mirror on the front of a school bus.\nwild animals graze in fields in front of a lake and snow-covered mountains\nThe side dish of the meal consists of a macaroni salad.\nA plated meal on a table with flowers.\nTHERE ARE TWO ZEBRAS THAT ARE STANDING BY EACH OTHER\na cross walk sign in a busy city as light up the walk symbol\nA small cooked pizza on a dining table.\na close up of a bunch of bananas and a container of garlic\na couple of people sit on a horse pulled cart\nA wooden park bench sitting in front of a window.\nTwo white vases on a shelf next to a window.\nWearing shorts, a man holds up a snowboard while standing in the snow.\nTwo people playing tennis in a neighborhood park.\nAn outdoor area that has a glass top table with a plate on it and a blue vase with flowers in it.\nAn empty bathroom with toilet and pictures on the wall.\nTwo yellow bowls of food containing broccoli and potatoes.\nGroup of people paddling boats on water in front of a city.\nA child holds an object while someone else cuts it\nA man approaching a water ski jump holding on to a wire.\na toilet with a shower near by with tiled walls\nA skateboarder performs a trick while being photographed.\nA white fire hydrant near the address 700 Jones Street.\nA bed with red sheets on it and messy blanket and a lap top.\nA yellow school bus traveling a dark road.\nA boy wearing a helmet and using a skateboard on the sidewalk.\nA woman holding a knife over an unconscious woman.\nA close-up of a man brushing his teeth.\na man sits next to a child as he uses a computer\na kitchen view of cabinets a stove microwave adn refridgerator\nA dog and a 
railroad official and one person in train yard.\nA motorcycle is parked inside of a building.\nTwo elephants that are walking in the dirt.\nA baseball player throws a pitch while others watch from the dugout.\nA group of people walking around an area together.\npeople on a small boat in a body of water\na train station and a train removing much smoke\nThere is a elephant in the grass, there are also trees in the background.\nA small dog sitting on top of a couch cushion.\nA security officer using a segway as a footrest\nThe mostly eaten pizza slice is next to olive pieces.\nA baseball player starting his run for first base.\nA man standing on top of a tennis court holding a racquet.\nA post clock is positioned on the sidewalk with flags in the background.\nCars drive down a multi-lane road and pass businesses.\nInside of a living room with a sofa and several tables.\nTwo small suitcase is sitting in front of a white sheet.\na living room with couches and a table\nElephant with a brown eye hyper focused in the camera.\nA man in a white shirt has his hands on another man's shirt collar.\nA picture of five african american's sitting on a bench and chair.\nA counter holds tomatoes, bananas, pineapples and other fruits.\nA girl and a woman watching a candle being lit up on top of a cake.\nPair of giraffes walking on grassy area in enclosure.\nA red fire hydrant next to some stones.\na yellow and red apple and some bananas\nA green and yellow rain on tracks with building in background.\nA large elephant is standing in a fenced in enclosure.\nA boy on a boogie board in the snow.\nPhotos of sports memorabilia including shirts, caps, and baseball bats.\nA young child mutton busting at a rodeo event\nThe bald man leaning against the tree holds his face in place with one gloved hand.\nA young man with a neck tie untied around his neck\nSheep are running across a green field of grass.\na simple, normal toilet with the lid closed\nA cat look through a window at a dog.\nBusy 
city street with red signs on the traffic lights.\nseveral cows one lying and one standing in a dirt field\nA plastic container sitting on top of a table.\nAn old fashioned television and a newer electronic gadget sitting on top.\nA white pickup truck sitting in front of two wooden scaffolding.\nA child biting into a piece of pizza\nThe people are over by the cows in the water\nA couple of cows standing on a lush green field.\nA view along the transept of an older style church.\nThe bed in the room has been made with a large purple blanket.\nA table that has people sitting around it with food in front of them.\na group of small dogs are staring out of the window\nTwo people sitting on a park bench near trees.\nA table has a bowl, candle, and Christmas decorations on it.\nA Skyteam Delta airline passenger jet taking off from an airport.\na black white yellow blue green and red kite and a person\nA STREET SIGN POINTING IN WITH A TREE AND BUILDING IN THE BACKGROUND\nTwo different slices of pizza with tissue paper under it on a paper plate.\nPeople riding a small boat under a very large bridge.\nA group of people injured and covered in blood.\nThree fire hydrants that are green stands near a parking sign at night.\nA striped giraffe is grazing in the grass.\nthree chickens some water a fence and trees\nFour men play tennis together on a sunny day.\nA counter filled with coffee, cookies, and bagels.\na baseball player with a bat on a field\nA green truck with a canvas tarp over the bed.\nA man stands in a large kitchen holding a coffee mug.\nAn old motorcycle standing in a grassy field.\nA table with a white plate of food that includes salad and sausage.\nA cat on top of the counter sitting next to vegetables.\nA small green leafy plant in the ground.\nTwo people eating hot dogs on a busy sidewalk.\nA bird perched on the limb of a tree\na person riding a snow board on a snowy surface\nA boy doing a trick on a skateboard off a rail.\nTwo woman are standing behind a large 
teddy bear.\nA couple of people standing in the water under a kite.\nA close up of the front of an old locomotive.\nThe man in the hat is carrying an umbrella.\nA red brick building next to a green door.\nA dog looking at a book called \"The Marriage of True Minds\" by Stephen Evans.\nThis is an image of an Air Canada plane flying.\nA bird perched on a plant in the middle of a forest.\nA living room filled with brown furniture on top of hard wood flooring.\nA black computer keyboard in a dim room\nThe traffic lights are clearly visible for us to see.\nA couple of birds sitting on top of a rocky beach cliff.\nA couple of people throwing a Frisbee in a field.\nA sandwich with a dipping sauce served on a plate\nRoad sign for the corner of Jackson and Montgomery\nA large group of giraffes roaming around in an enclosure.\nStainless steel industrial stove sitting in a white and black kitchen.\nA little girl is posing for a picture and holding an umbrella.\nA couple of people with surfboards on a beach.\na woman is standing in front of a giraffe\nA man in a blue shirt looking at his cell phone.\nA man on a skate board jumps high in the air for a trick.\nA woman sitting on the grass behind a pile of stuffed animals.\nThe barbecue sandwich is on a plate near a glass of wine.\nWrapped utensils are a part of a sterile and healthy meal.\nA metallic refrigerator freezer sitting in a kitchen.\nThe child is putting the tooth brush in his mouth.\nA white comforter with a toy, book and child shirt on the top of it.\nA picture of a room in a house.\nThe wildflower is sitting in the glass of water,\nA girl showing parrots to a group of children\nTwo zebras fighting outside in an opening near some trees.\nTwo surfers walk down the beach holding their boards.\nA group of people in a courtyard next to a pavilion.\nA renovated kitchen with wooden cabinets and white refridgerator\nSome soldiers are standing in line for food\na man on a surfboard surfing a wave\nTwo giraffes are standing 
together in the wild.\nThe boat and the truck are parked by the dock.\nOranges and bananas sitting in a stack together.\nAn intersection with traffic lights and a street sign.\nThree horses standing close but in an open field.\nA person is riding a motorcycle in the mud.\nA statue of a giraffe is in a Children 's Hospital.\nA man is standing in a room with something yellow.\nA mother elephant and her tiny calf walk through the trees.\nA breakfast plate with eggs and meats, served with a gourmet coffee.\na black and white photo of a person holding a skate board\nA boy swinging a baseball bat at a ball.\nA meter maid car is by a fire hydrant.\nA person flying a kite over another person on a roof.\nA  car parked behind a wooden bench.\nA young skateboarder rides down the street alone.\nA row of bikes parked along a sidewalk beside some cars\nThe bathroom in the home was just cleaned.\nA circus act with five elephants and some women put on a show.\nThere is a muffin with white frosting and walnut bits on it.\nA curious giraffe leaning over into a car at a zoo field.\nTwo signs one with the speed limit and one telling what freeway is which way.\na lady that is on a tennis court with a racket\nA man sitting on his couch using his laptop\nA statue of a cowboy on a horse in the middle of building.\nThere are street signs that show a direction of travel\nA couple of small beds and mirror in a room.\nA couple of large jetliner sitting on top of an airport tarmac.\nThe room is decorated in terra cotta tile.\nA white plate topped with a pizza next to a bowl of salad.\nMany suit cases are stacked on top one another\nA gray and white cat sitting in front of a mirror.\nA dog riding on the back of a motorcycle down a street.\nA coffe and plate of bread sit next to a pillar.\na red and white tail of a large plane\nSeveral pilots walking as a group across a street.\nA man riding a skateboard on the side of a ramp.\na woman walking by a display with teddy beas and bottles\nA street 
sign on a light pole near on a city street.\nA male standing behind eight pieces of luggage.\na vintage black and white picture of a train\nTwo colorful umbrellas open against a blue sky.\nA small bookshelf is filled with books and decorative items.\nSeveral people walking in the snow, some carrying skis.\na bottle of whiskey and a bottle in a brown bag on top of a fridge\nWomen waiting for luggage at an airport luggage carousel.\nTwo sheep standing side by side at a petting zoo.\nA pizza pie sitting on a board on a table\nA couple of airplanes that are on a runway.\na bathroom with white walls and brown tile\nA woman walks down the water with a surfboard.\nthis is an image of a train with black smoke.\nA lady is on the entrance of a train holding her luggage.\nA man is catching a frisbee while playing a game.\nA skateboarder with his skateboard is sitting on the side of a ramp.\nOn a bright day, a young elephant in partial shade near a tree.\nYoung people stand near a bus with a large amount of luggage.\nA bear costume cutting some cake with a Park ranger.\na large clock reading 5 54 on the side of a building\nAn unfurnished room contains a sleeping bag on the carpet.\nA little girl that is standing next to a horse.\nTwo white bowls with vegetables, meats and herbs and chopsticks nearby.\nA kitchen with a stove, refrigerator and cabinets.\na small child is sitting on a bench outside\nBrown, white, and black rams eating on a hill.\nA skateboarder performing in front of a crowd riding a rail.\nThis is an image of scooters and bicycles.\nA cell phone peeks out of a crocheted cell phone holder.\nA kitchen that has white cabinets and black counters.\nMan on cellphone behind curtain while art displayed in front.\nTwo men in bucket hats taking frisbees out of a frisbee golf bucket.\nA snowboarder grabs his board while high up in the air.\nA woman's eyes are hidden by the cast of a shadow.\nA black and white cat sitting in a bathroom sink.\nA stop sign in the grass beside 
an old farm silo.\nAn Asian family that is eating pizza together.\nA yellow train traveling through the green countryside.\nA man and woman are standing beside each other playing a video game.\nTwo cats lounging on the back of a couch.\nThis picture shows sand, water, and some type of silver and red pole equipment.\nA cut dog in a basket with orange ears.\nAn adult and child are skiing in the snow.\nTwo children playing with the knob on money meters.\nFlowers sitting in a glass vase on a desk.\nA laptop computer is seen sitting next to a television.\nA small bathroom has a vanity, mirror, toilet and bathtub.\na black and white photo of people eating\na group of zebra standing in the sand in a fenced area\nA man standing by a kitchen counter doing something\nA bird is sitting on a silver truck\nA mother and son sitting in a bed with two cats\nA large piece of meat surrounded by vegetables.\nWhite oatmeal sitting next to toast, coffee, and orange juice.\nA bathroom with a shower combination tub and sink.\nA white and brown cat sitting on the shelf in a cabinet.\nA man is driving a small train with children.\nA man and a woman cutting a sheet cake with a knife.\na man cooking some hot dogs on a grill\nsome kind of chicken, rice, and vegetable dish on a pizza tray being served to a man.\nOne man stands on top of the train while another man stands on the platform.\nA piece of cake is served on a plate.\na close up of a child on a skate board\nFruit juice is spilled all over a counter next to a knife and two pastries.\nA junk pile that looks to be piled with old bathroom sinks.\nTwo black and yellow circular clocks affixed to an office building.\nThere is a single bed in an old room with a window.\nAn a kitchen is being cleaned and decluttered.\nColorado Rockies' pitcher about to release ball from mound.\nA bird perched on top of a tree branch under a light blue sky.\nThere's a desk with a laptop, phone charging, and other various electronics.\nThe two men are standing 
outside by the tail of the airplane.\nA couple of brown horses pulling people on a wagon.\nA small plane flying over an ocean with waves\nThe large SUV drives along a busy street.\nA row of motorcycles parked on a city street.\na kitchen decorated with a couple american flags\nA room with a chair, a piano, and a laptop.\nA cat sitting on the side of a car door window.\nTwo pelicans on the sidewalk in the foreground with several more in the water in the background\nA black and white photo of a man and woman sitting on a bench.\nA picture of a thick crust pizza and a bottle of wine, setting on a table.\nA MAN WEARING A SUIT AND A TIE STARING\nMale surfer riding a large wave with sun low in the sky\nA snowboarder hitting a trick on a trail, jumping over a person.\nThe man is feeding the elephant with milk\nA bathroom with white fixtures and blue accessories\ntwo brown horses in a field gazing around\nSeveral people that are playing video games together.\nA plate with beans, broccoli, small sausages, fork and a small container.\nA black and grey double decker bus next to a building.\nA smoking women in a scarf makes a phone call.\nA person is traveling down the road on a motorcycle.\nA group of men in colorful jackets skiing down a hill\nA man and two others skiing across a snow covered field.\nA man is standing over a black motorcycle.\nA KITHCEH WITH A MICROWAVE SINK AND REFG\nA jar of peanuts and a cell phone sit on a laptop computer on a cluttered table top.\nA woman holding her child so she can see her birthday cake.\nthere are two giraffes that seem to be embracing each other\nA table with wine glasses and people on the counter\nA wooden bench sitting on top of a green grass covered ground.\nA long couch with many pillows, a table and some seat cushions around it.\nSix people are paddle boarding in the ocean.\nTwo young men play a game of soccer on a field.\nA black and white image of tennis players.\nThis is an aerial view of a tennis player hitting the 
ball.\nAn item is capture here in the photo.\nA cow grazing in a field next to a fence.\nAn individual is taken in this very picture.\nPeople are outside flying kites in the sun.\na person riding a skate board on a street\nThere is a flower display in the corner of a room\nA man pulling a sled behind him while using ski poles.\nA person playing a game of tennis and other people watching.\nA decorative congratulations cake for a graduating student.\nA group of young people standing next to each other on top of a field.\na semi truck loaded to the top with sheep\nA white truck crosses an intersection behind a traffic light.\nTwo street signs sitting on top of a metal pole.\nThe man in the red shirt is going to hit the ball with his racket.\nA shop called Pendulum with a clock out front.\na traffic light on the side walk of a city street\nTwo signs above a blue pole under a blue sky.\nA skateboarder is getting ready to skate down a ramp.\nA man wearing a black vest and black glasses.\nA bicyclist stopped beside a fence feeding or petting sheep.\nAn orange and white bus crossing under a blue footbridge.\nA flock of birds landing in a field of grass\nA man reaching into a bucket near an elephant while another elephant stands near a pond in the back.\nA cat standing on the keyboard of a laptop.\nAn empty and open silver metallic refrigerator in a kitchen\nCouple of people about to share kiss in front of wooden building\nA person para-sailing in the water with mountains in the background.\na yellow taxi riding down a street that has a building with clock\na snall toilet and a sink in a bathroom\nA zebra standing on a grassy pasture in the daytime.\nA group of people playing a game with remote controllers.\nA fighter jet with two streams of smoke coming out the back.\nA bike parked on the side of a city street.\nA glass plate topped with sliced apples and caramel.\nA player swings at the ball during a baseball game.\nA dog and man sit on rocks by water.\nA bunch of food on 
a tortia in some foil\nA woman tennis player is in a cropped photo.\nA plate of chicken, rice, and some vegetables.\nA man is staring at the viewer while a man plays a guitar and a woman sticks her thumb up sitting on a busy sidewalk.\nA large jetliner sitting on top of a runway.\nA person presenting a birthday cake to another.\nA woman with a child on skis go down the snow.\nA fridge in the middle of some cabinets\nKitchen knives and scissors are stored in a wooden holder.\nA man speaks to some children on a farm.\nTwo zebras are walking along a path outdoors.\nA woman lying in a bed looking at a laptop.\nA lot of motorcycle people that are on the road.\nA dog sits on a rug with its eyes closed.\nA pick up truck parked near a strange house\nA deformed orange sitting on top of an orange tree.\npeeled banana sits on a table uneaten and ripe\nA clock hangs from the wall of a beat up room\nA woman sitting on a chair blow drying her hair.\nA white plane with two people standing in front of it\nA young girl with a cape holds onto a kite.\nThe puppy is eating food from the tiny bowl.\nA hummingbird is floating next to the feeder.\nA man riding a surfboard on a wave in the ocean.\nA werid skirt like outfit on a person.\nA security officer is setting up traffic guiding signs\nA tropical beach with a banana tree in the forefront.\nMilitary officer in dress uniform with many medals.\nA boy doing a trick on a skateboard on a ramp.\nA young boy is eating a meal in his pajamas.\nThis is a vintage photo with four men in it.\nA dog is laying on the bed like a person.\nPlayers react to the ball being hit at a baseball game.\nA toilet with its lid raised in a stall.\nYoung boy and his plastic skateboard at home\nSet of toy animals sitting in front of a red wooden wagon.\nA silly brown dog wears sunglasses as it sits in a car\nBroccoli is on a cutting board and is being cut in to smaller pieces.\nan image of a tennis racket and tennis ball\nA snowboarder sitting down with his 
snowboard on his feet.\nA tennis player pauses during a game in a public tennis court.\nA group of people on skis and snowboards outside.\na couple of trains parked on some tracks under a closed roof\nRed double decker buses on a city roundabout.\nthere is a game that is ging on at thte gym and people aer looking\nThe man is wearing a tee shirt and a tie-dyed tie.\nA group of motor cycles parked on the street\nA set of two pictures showing a group of young people standing under a gazebo and next to surfboards.\nA man is shown feeding an baby elephant.\nThere is no image here to provide a caption for.\nThree zebras are shown in a black and white photo.\nA Chinese public train waiting at the station.\nA man walking along the shore with a surf board.\nTwo giraffes looking at a photographer inside of a barn.\nA dog looks up at a flying disk.\nA group of men standing on a city street.\nA dog stares intently off to the left in front of a glaring TV.\nA small pizza sitting in a frying pan of food.\nA mirror that is on a tiled wall.\nA black and white photo of people waiting at a boat ramp\nModern jet airplanes lined up on the runway ready for take off\nTwo white cows sitting in a farm area.\nA stop sigm at an intersection with some graffitti on it.\nA seagull at the beach with food in its mouth\nA passenger bus that is driving down the street.\nAn umbrella strapped to the cross bar of a bicycle\nView off the wingtip of a passenger airliner on a taxiway.\nA cute little girl smiles for the camera\nA city with traffic lights, cars and buses.\nA man skate boarding in a pool with another man looking on.\na young woman cuts up some food on a trey\nTwo men skateboarding with a light and a camera.\nA group of drinking glasses sit along a bar, with two people nearby.\nA person on a motorcycle on a track near another person.\nA food item is shown on a napkin.\ntwo giraffes are standing in the open field.\nA man riding the back of a brown horse.\nA bunch of statues that are in the 
grass.\nA man riding a snowboard down a snow covered slope.\nA man swinging a baseball bat at a ball.\nA cat sticking its head out of a cement wall looking up.\na few cowboys stand watching some animals outside\nThe woman is riding the horse on the course.\na woman sitting on a wooden park bench smiling at the camera\nA cat sitting on a wooden chair in a room.\nThe animals are grazing on the wheat grain\nTwo men playing frisbee in a park\nTwo people standing in a market by a fruit stand.\nSeveral cars are seen going down a city street.\nA grown elephant and a young elephant roam freely together in an open field.\nWoman in sunglasses hugging a red fire hydrant.\na teen standing on a skateboard while riding part of the wall\na close up of a cat laying in a luggage bag\nA big family pose for a picture with a surfboard\ntwo motorcycles line up as they lean against some seats\nTwo puffins sitting in some grass on a mountain.\nMany young men pose for a picture.\na person showing cellphones on sale in a shop\nA YOUNG GIRL ON A SKATEBOARD IN A PARKING LOT\nA man is posing excitedly on a surfboard.\nA very nice boat on the water with a dog on it.\nA young man holding a Wii mote plays a video game\nA stack of suitcases stacked in a front lawn.\nA gang of bikers riding down a street.\nLittle girls play soccer on a field on a sunny day.\nA living room and dining area with hard wood floors.\nA bunch of ripe oranges are stacked neatly on top of each other.\nSomeone is making a sandwich consisting of carrots and alfalfa sprouts.\nA brown cardboard box filled with bananas, apples, oranges and kiwis.\nThe young kids are playing a game of soccer.\nThe kitchen bar is near a dining room table.\nA food container with five sections filled with various items.\nA woman sits on a brick wall, holding her umbrella, looking out at the city.\nWoman talking on cellphone in a dining room.\nA small industrial machine car on train tracks.\nA desk witgmh a telephone, laptop, cell phone and a book 
on it.\na man in glasses is playing with a white controller\nA BIG  BOX OF TOMATO AND BASIL PIZZA.\nA large group of people on a field playing soccer.\nA yellow and black train is on a train track.\nA black and white cat curled up on a brown checked sofa.\nA man is talking a picture of a bus.\nTwo horses standing near each other in  a field\nA lone swan swims in a river near a bridge.\nWoman playing tennis with bleachers in the background.\nA vintage photograph of a war plane flying\nTwo food items are displayed on separate plates.\nA brown and gold fire hydrant in front of a brick building.\nTHIS IS A PICTURE OF A FEW ZEBRAS GRAZING IN A LARGE ENCLOSER\nan image of umbrellas lined up with tables\nA group of sheep walk along a dirt path.\nA desk with a pc, monitor, laptop, mouse, and stuffed animal\nA gray and white cat laying on it's back with it's head looking up in a open drawer.\nA man walking a bike near a train station.\nPhoto of a living room with a Christmas tree in the corner.\nA toiler and some buckets in a small room.\nLarge group of ships tied together at a peer.\nAn empty road with a red stoplight that spans voer the road.\nA black dog playing in the ocean while barking.\nA dark gray bird flying towards a palm tree\nA motor home parked along side an outdoor flea market.\nA closeup of a wine glass and a wine bottle\nA boy jumping up over a bench on a skateboard.\nA huge, captive fish gapes his mouth open at a woman taking a photograph in an aquarium.\na person reading a book and cooking food on a stove\nOnlookers watch as a skateboarder performs a jump.\nSome wooden benches are in the middle of the forest.\nA man standing on top of a sidewalk holding a skateboard.\nA laptop sitting on a living space table with a spacious desert view.\nfour colorful vases of different types are sitting on a shelf.\nA close up of a stop light positioned against a high rise building.\nA room features two identical beds with stools at the end.\nA computer screen showing 
photos on it while a smokestack is visible out the window\nNight shot of skateboarders in wide open area with lights above.\nA couple of giraffe standing next to a zebra near a rock wall.\nA table holding a white plate with bananas and a brown glass.\nA bus that is travelling on a road in a town that has many houses and buildings.\nA pair of giraffes grazing on hay by a fence.\nA couple of plates of sausages, broccoli and purple food.\nA family of giraffe on a wild field next to zebras.\nA large cow standing in a grassy field near other cows.\nA tennis player is jumping and reaching to hit the ball.\nA couple of people on a wall playing with a Frisbee.\na laptop on the ground near a turn table\nAn old clock with a flower design in a small room.\na kitchen with a refrigerator near a sink\nMen loading luggage from a train onto a cart.\nA fan closely watches the professional baseball batter\nLittle boy and girl sitting on the porch eating their meal.\nA man standing behind a display case filled with jewelry.\nA black and white photo of a man fixing anthers tie.\na person preparing an authentic pizza on a wooden spoon\nA street sign next to a traffic sign next to tall buildings..\nHerd of wild cattle walking along the beach\nA city street with people out and two large buses\na small dog with some glasses over its eyes\na dormitory consisting of many beds lined up along the wall\nA kitchen has a stove,microwave, and wooden cabinets.\nAn electric commuter train on the tracks under a cloudy sky\nThe person is wearing black clothes, shoes, and hose.\nThere is a child that is walking in the gradd\nHundreds of bicycle enthusiasts embark on a race on a city street.\nA man in the air skateboarding at the park\nA close up of a Harley Davidson parked on the road.\nA man holding video game controllers and playing.\nGray and white bird with red crest using bird feeder.\nA white car passing a person in a black jacket.\na large train is going down the tracks outside\nTwo ponies 
together standing on a mountainous terrain.\nA small bathroom with a shower, sink, and wooden medicine cabinet.\nSmall train coming out of a tunnel on an overpass.\nA group of people who are standing in the dirt.\nbread with banana milk and nutella on a table\nA clean and tidy kitchen with a stove, dishwasher, microwave, widow and a door.\nA cat sleeping on top of an open laptop computer.\nTwo motorcycle riders talking on the side of the road.\nA bunch of people is watching something and a man in a brown and blue stripped shirt has his fingers in his ears.\nA ukelele is passed over a table with cake and lots of food.\nA giraffe is standing near its fenced area observing.\nA pasture with sheep in front of a large home\nA large group of birds sitting on metal pipes in the water.\nA group of people sit down at a table to share a meal.\nA man pinning a number to a child's shirt.\nA dog sleeping on bed against the wall.\na room with wood flooring filled with furniture.\nTwo zebras stand close to each other in a field.\na person on a bicycle a bus a truck and a child\na vandalized stop sign in the dark with a sky background\na red and yellow trains engine pulling its cars and some tracks\nA woman sitting on a bed with a laptop.\nTwo people with surfboards are standing in a sandy parking lot.\nA PERSON JUMPING FROM A SLOPE ON A SNOWBOARD\nLong bamboo poles with umbrella tops in front of the sky\nA stove is away from the wall in a kitchen area.\nA trolley driving down a street lined with tall buildings.\nA giraffe looking concerned on a grass field.\nTwo metal lamps are placed beside a window.\nA close-up photo of a pool table with a man playing.\nA group of different parking meters displayed together.\nA boy is doing a trick on a skateboard.\nA black and white photo of a city street with old cars and people on it.\na number of people standing around a large group of luggage bags\npiece of cake with a plate and fork\nLarge giraffe roams in the lush green vegetation.\na 
blue and pink kit with streamers flying in a clear blue sky\nA man pushes a brightly smiling little girl on a swing.\nTwo stuffed bears that are next to each other.\nA car that is parked in some snow.\nA zebra standing in some brush without leaves.\nA bunch of people walking across the air field to get to their plane.\nA man hitting a tennis ball with a racquet.\na close up of stuffed animal with metal pieces on his chest\nA person is on skis in a very snowy place.\nairport coming in to dock at the airport\na person on a city street operating a cell phone\nTwo man standing near each other in a park.\nA man is eating a hot dog and talking to a young girl.\nA person bending over to adjust a child's skis.\nA woman in a boat eating a sandwich.\nA person standing on a mountain top with some skis.\nA group of cattle walking across a lush green field.\ntwo giraffes standing under a tree to get some shade\nAn owner plays tug-of-war with their Golden Retriever\nA dog looking up and running to catch a frisbee.\nan open toilet on the side walk of a street\nA man with a very bright orange hat sitting in a car.\nA traffic light next to a busy street in front of a brick highrise building.\na person standing near a bush near an elephant\nA woman that is standing on a sidewalk.\nThree park benches are in a garden type setting.\na bathroom with a toilet and a bath tub\nA flower vase in the center of the kitchen table.\nA bull walks up to a pile of wood and a teddy bear.\na couple of coaches in a cluttered living room\nTHREE BOYS RIDING THEIR BICYCLES ON A STREET.\nthere is a male baseball player about to throw the ball\nA couple of people standing around holding snowboards.\nA photograph of a highly decorated cake on a table.\nA passenger bus that is driving down the street.\nA large building with a railroad crossing near it\nA bike is covered and parked on a street.\nA small bird in a tree with red fruit on the  tree\nA cat looking out a window at a bird.\nHigh school girls soccer 
game action shot of green versus red team.\nA person with a lighter lighting several sticks.\nReflection of a school bus in its own side view mirror.\ntwo people in costume pose for a photo\nbrown bathroom with white toilet and white sink\nA pretty young lady eating a hot dog on a bun.\nAn elderly man blowing out birthday candles on his cake.\nA plate filled with breakfast foods sits on top of a wooden table.\nThis clean bathroom has a tile floor and a brown toilet lid\nA woman with a painted face is on a phone.\nA green bird bath decorated with various jewels\nA fancy clock graces the corner of this old building.\nA cat sitting in front of a television watching a hockey game.\nA chair and a blue umbrella are attached to a wheel.\nPeople sitting around an oval table in a restaurant posing for a photo.\nA street scene with a truck and trailer in the foreground.\nA locomotive on train tracks in a wooded countryside.\nA train riding pass a platform and buildings.\na stuffed sandwich with meat, cheese and pickles\nA man riding on top of a board on a wave.\nA man and woman making a cut into a wedding cake\nSome cows stand beneath the shade of some trees.\nPink lunch box with compartments for all types of food\ntwo white birds flying over the sea water\nThe lobby has a few people in it but for the most part it isn't very busy.\nA double oven with one side completely full and the other empty\nMan are standing near a couch holding Wii controllers.\nA pizza on a board with a pizza cutter\nA BABY EATING A MEAL WITH HIS TOY DRUMMER BEAR\nA tan dog laying next to a park bench.\nA skateboarder performing a jump off the edge of a stone wall.\na cat sits on the floor looking at the camera\nTHERE IS A CITY BUS ON THE STREET\na couple of people that are sitting on a bench\nA person sitting in bed with a dog on his lap.\nA seaplane is docked near a residential area.\nAn umbrella and rain boots in a corner\nA laptop and a desktop computer sitting on a table.\nA zebra and a giraffe 
walking in opposite directions.\nTwo children staring out a window while on public transportation.\nGiraffe and zebra grazing in a field next to plants.\nA stop sign is shown on the side of a corner.\nA modernist kitchen, with a white and aluminum color theme.\na large room that has a big kitchen table in it\nA baseball player hitting  a baseball with a bat.\nA baseball player running to catch a baseball during a game.\nA large white bus on a city street.\nA woman putting post it notes on a wall in a room.\nThe baseball player is practicing his swing for his favorite game.\nAssorted pastries and tongs have been arranged above stacks of plates.\nA cellphone and a remote control sitting on top of a book.\nA woman is skiing down a high mountain.\nA man standing on a tennis court holding a racquet.\nA herd of elephants with birds at sunset.\nA white plate topped with broccoli and meat covered in sauce.\nA woman is smiling and holding a monkey.\nA black and white photo shows workers working on a road.\nA small girl holding skies in the snow\nA person is running with a kite in the air.\nPeople watching two elephants from behind a cement platform.\nA man flips a skateboard while doing a trick.\nThe toothbrushes have a holder on the bathroom sink.\na train station with a train sitting parked in it\nA guy on a skateboard at the top of a concrete bowl.\nMany stuffed animals hanging and sticking to a tree.\nSeveral \"One Way\" Signs are placed near an \"All Way\" Stop sign.\nA man riding a surfboard on a wave in the ocean.\nTwo young boys eating carrots while sitting on a bed.\na long train is going down the rail road outside\nFive benches in the park in an area surrounded by trees\nsome people are pushing a truck in a lot\nA group of small children having a birthday party.\nA dish contains carrots, onions and other vegetables.\nThe two baseball players are walking on the sidewalk.\nA room with decorations on a shelf and a painting on the wall.\na close up of a laptop on a 
desk\nA laptop computer is sitting on a table top.\na couple of people on skate boards do a trick\nA closeup photo of a bulldog wearing an Army style hat.\nA sprinkled doughnut with pink icing sitting on a plate.\nA kitchen area with refrigerator in the background and a sink and stovetop oven on the side surrounded with wooden cabinets.\ntwo zebras are in their pen at the zoo\nA baseball player holding a bat on top of a field.\nA couple of skateboards, two sitting on the sidewalk, the other on the board.\nA red stop sign sitting above a traffic light.\nA desk with two computers, phone, and other accoutrements.\nA teenager in wild clothing playing a video game\nBlue, pink, purple, and yellow flowers are in a red vase.\nA hot dog with a large amount of cheese.\nA city street filled with traffic at night\nThe box of a dozen donuts has two different flavors.\nA living room setting with furniture and lamp\nThe three men are dressed in costumes.\nA red double-decker bus driving down the road.\ntwo giraffes and one is eating some food\nAn iPod and a laptop computer on a desk\nan orange and white cat and its orange play toy\nThe closeup of a clock on the face of a tower.\nLarge poster on wall behind white commode in dark tiled bathroom.\nA dish with shrimp and cucumbers and lettuce.\nA man with a skateboard that is up in the air.\nChildren are looking at a zebra in an enclosure.\nPeople are sitting on a motorcycle with a woman standing behind them\nOriental woman preparing to put a toothbrush into her mouth.\nA table with a laptop, bag of coffee and cellphone on it.\nA girl excited about a cake at her table.\nDog laying down on a grey and yellow striped couch.\nThe top of a desk with a keyboard, computer and phone.\nA coal fired train with passengers behind a split rail fence.\nA collection of fine furniture is displayed in a room.\nA transit train badly in need of a paint job\nA plate with a roasted carrot and broccoli on it.\nA woman holding a tennis racquet prepares to 
play tennis.\nA person gets ready to swing their racket.\na girafee looking around by some people\nA girl stands on a bed and appears to be crying.\nA very large building with a tower near some water.\na close up of person sitting with a laptop\na cream colored dog lying on a brown carpet.\nWhite toilet with a shower with a tree on it beside it.\nA catcher has his mitt out as a baseball batter swings his bat and hits the ball.\nA man who is sitting at a table with a plate in front of him.\na lady covering herself from rain with an umbrella\nA man wearing a gas mask and a suit and yellow striped tie.\nA glass bowl filled with noodle salad on to of a table.\nTwo elephants are standing on the grass near some trees.\nThree boys sitting in chairs with game controllers.\nA city intersection with a sign redirecting traffic.\nNice looking front room with brown furniture to decorate it with.\nA train with a red caboose sitting on tracks.\nMilitary looking truck parked in an old warehouse\nThe living room actually features several different colors.\nSmall pizza sits on a plate on a restaurant table.\nA man holding a bat next to a catcher and umpire.\nThe teddy bear was left on the empty bench.\nvery many pizzas in a plate in a kitchen\nA black microwave on a cabinet in a hotel room.\nMan flying a tailed kite high into the sky\nA man that has a gold tie on.\nThe outdoor furniture with a table umbrella is made out of wood.\nA family sitting at a booth in a restaurant looking up.\nTwo women standing on tree stumps with a boy and a teddy bear\nA group of men let their horses drink from a fountain.\nThe girl is eating her pizza with a fork.\na grey dog seated on a chair of a vehicle\nA brown cat lying at the back of a car\nA group of kids standing beside an opened fire hydrant.\nA herd of giraffe running across a field.\nA person rides a horse in front of a large group of people.\nA woman sitting with her legs crossed on a bench in a green field.\nA man riding a skateboard on top 
of a cement park.\nBRIGHT RED FIRE HYDRANT WITH A SIGN NEXT TO IT\nThe cat is laying partially in the light with its eyes closed.\nA couple of men playing soccer against each other on a field.\nTwo people work on a shed while standing on a tractor.\nthere are two grizzly bears walking down the gravel road\nTwo men running and playing baseball with plate and grass\nA couple women with remotes in a room.\nA red trolley passing by a group of people under umbrellas.\nA young girl standing on top of a tennis court holding a racquet.\nA large open concept living room leads into the dining room.\nA person with a red umbrella walking towards a bike chained to a lamppost.\nA bunch of broccoli spread out on a table\nA closeup view of food on a plate\na dried out tree with fruit hanging on it\nA small and a large teddy bear sitting in plants\nView from the stands of sparsely attended tennis match\nA man that is sitting down next to a cops motorcycle.\na brown table with a toaster a plate and a black microwave\nA snowy road with snow covered trees on which a skier is traveling.\ntwo guys are outside moving a refrigerator\nAn Asian woman in front of a body of water with two umbrellas\nA man standing on top of a blue tennis court.\nSigns at a city intersection indicate no turning is allowed.\nTwo elephants walking in water next to grassy area.\nA man on a skateboard doing a trick.\nWoman in a white uniform holding a pencil to wall.\nA small boat with several flags on it moving across the water.\na big building with a clock built inside of it\nA market has many fruits and vegetables on display for sale\nthere is a large truck and a yellow truck behind it\na store front has many stuffed animals on display\na tennis getting ready to hit the tennis ball\nA group of sheep walking down a path with a few stopping to eat grass along the side.\nA small cat laying on a couch in a room.\nA fire hydrant behind a gate on the sidewalk.\nGiraffes eating leaves from bushes near logs on sand.\nA 
man sitting in a chair looking at someone's food.\nA zebra with its mouth open and lip in the air.\nA man holding skis and poles and walking up a snow covered mountain slope.\nA man is cross country skiing on a bright day.\na woman in white is cutting into a cake\nA person surfing a white water rapid.\nA glass vase with yellow flowers in it\nA rear view mirror on the side of a car reflecting a mountain range.\nTwo sheep are in a barn standing next to each other.\nA double Decker bus is traveling down a street.\nAn old steam locomotive waits at a country station.\nA snow covered parking lot meter in front of a building\nLondon double decker bus in motion on street\nThe horse is grazing in the fenced coral.\na zebra is laying down in some dirt\nA man in his skiing gear is on his board looking on.\na guy on a surfboard with a kite attached to it.\nA sheep with its new born lamb in a field.\nA toy ship made out of Legos is attached to the side of the refrigerator.\nA person up in the air on a skateboard.\nElderly man sitting on a bench facing the beach.\nA bathroom has pictures hanging on the wall.\nA small hot pink bathroom with a few touches of royal blue on the toilet is shown.\nA father and son in a kitchen preparing a meal.\nThe parking meters are posted beside a cement wall.\nthese men are playing a sport in a field\nA large city bus making a turn at a crosswalk with a clock tower behind it.\nTwo woman play Wii video game with wireless controllers.\nA square of cheesecake on a marble cutting board with a two-pronged fork.\nA plant in a blue cup on a windowsill.\nA bright red bench sitting in front of a decorated store front.\nA man has some food hanging out of his mouth.\nA group of people standing around each other near a street.\nA child at a table sitting in front of a birthday cake.\ntwo giraffes standing next to some trees\nMany people are standing next to a very large plane with its bottom doors open\nA man in black shirt and white shorts playing 
tennis.\nTwo different transit trains can be seen in this photograph.\nA group of people around a table with a blue tablecloth\nA bedroom with two beds and a table with a lamp.\nA bear jumping into a pool of water.\na man with a napkin at his neck eating a dangling food\nAn elephant walking by a group of ATVs.\nTwo people that are sitting on a table.\nTwo photos of a young man in a suit and tie\nA young boy with a helmet rolling down the street on his skateboard\nCommercial airplane flying in the air on a cloudy day.\nThe neat bedroom has a large window in it.\nTwo cows on a grass covered hillside on a sunny day.\nbathroom with white toilet and white sink berside each oher\nA bowl filled with yellow bananas and green apples.\nAn aircraft soars over a beach near a city.\na person sitting on a bench while the rest look somehwere else\nA lot of people  that are on a sidewalk.\nThe sun is blocked by a statue holding a round object.\nA man with an umbrella hat stands next to another man.\nFour horses walking across grass on a lake with mountains in the background.\nOutside view of the MGM Grand in Las Vegas with people sitting and walking.\na brown and white cat is looking in a mirror with glowing yellow eyes\nA fork sticks out of a parking meter.\nA man in a bathroom on an airplane.\nBunches of bananas in yellow and green hang from a ceiling.\nA white and red double decker bus on street next to car.\nACCIDENT SCENE WITH FIRE TRUCK, AMBULANCE, VEHICLES, AND PEOPLE\nView of jet airliner taking off over tree top.\nA train sits on the tracks while people stand near by.\nA professional baseball player about to hit the ball\nA full view of an outdoor space with many things to see.\na red motorcycle with a windshield parked on the sidewalk\nA dog with a tiara on and his head rested on an armrest.\nTwo guys standing on the right hand side of a motorcycle.\nA bath tub in the center of a large bathroom.\nA stop sign and two fire hydrants set up in the woods.\nA condiment filled 
hotdog is in a red basket next to an iced beverage.\nTwo slices of pizza on a table with one beverage.\na store sits in front of a fire hydrant\nA set of three cow statues siting above a crowded walkway.\nA large plate of doughnuts on a table.\na bunch of sheep in a field eating\nA child on a blanket with an apple.\nA woman leads a race horse down a cobblestone path.\nA man drinking a beverage with his sandwich.\nA person is squatting by a banana tree.\na small group of fish about to be cleaned\nTwo men standing in a parking lot dressed in business atire\nAn arrangement of fruits and vegetables are laid out on a counter.\nA kitchen with a large wooden table and clutter on it's counter.\nA slightly dirty room that has green items on the floor\nA cat is laying on the lap of a man playing video games.\nPeople sitting around a long table using laptops\nA bear is next to a body of water outside.\nGirl on phone looking up a statue of Ronald McDonald.\nWhite living room furniture looks very modern and clean.\nA hose sitting next to a fire hydrant on the street.\nA large clock stands on a post on a city street.\nTwo giraffes standing in a rocky area by a river.\nA group of women are gathered by a long table of food.\nA stop sign that has been covered with graffiti.\nA dog with its mouth open about to ear pizza crust .\nA shelf with donuts being sold six for five dollars.\nA muscular man surfing on a vast blue ocean\nHerd of elephants crossing a water hole next to another herd of elephants.\nA collaboration  of people in different pictures doing things\nAirplanes on the tarmac in the rain at an island airport\nA person jumping a skateboard at a skateboard competition.\nthere is a man pointing up standing on a building\nA boy and dog sitting on a recliner the boy looking at a laptop.\nMen on horses herding a group of cows down a road.\nA skier looks up to the camera above her.\nA row of urinals with air freshener boxes on a wall.\na person standing in a door way and a horse 
in the foreground\nA baseball batter readies himself waiting for the pitch.\nkids out on the field playing soccer together\nA man flying through the air while riding a surfboard on a wave.\nMan in red shirt standing in front of a man holding a frisbee.\nTwo mountain peaks rise above a large meadow.\nA bike in view in a living room with a Christmas Tree in the background.\na group of horses standing next to a tree in an open field\nA player sliding onto bass while an opposing player tries to grab the ball at a baseball game.\na big bear that is staring at a camera\nTo people sit on benches in the rain, holding umbrellas.\na person in a kitchen with a pizza\nA mans toilet attached to a black pole.\nSomeone holding a stuffed teddy bear in their arm.\nA crowded city street is full of big umbrellas.\nA naked baby is on a bench in a backyard.\npeople standing outside of a building with a fire truck\nA meat filled pizza sitting on a pan on a table.\nA rainbow colored kite caught in the branches of a tree.\nA young boy riding a skateboard up a ramp.\nShe is eating a sandwich and having a drink.\nA kitchen with marble counter tops and black appliances\na stop sign graffiti written on the front\na train yard with several stopped trains waiting to go\nStreet signs, including a stop sign, where someone wrote \"Don't stop believin!\"\nTwo pillows sitting on the ground next to furniture.\nA person is holding something donut shaped in their hands.\nA woman with glasses and a scarf skateboards along Hollywood's Walk of Fame.\nA glider gliding in the sky over the ocean.\na group of people sit around a table\nThe pastry has a substance in the middle of it\nA group of baseball player congratulating each other.\nA man strikes a tennis ball during game.\na number of people on horses playing polo\nsome baseball players on some grass and some trees\nTwo people climbing a mountain on their skiis\nA view of a oven with the food flipped over in it.\nA woman sitting at a table while using a 
laptop.\nA counter topped with lots of pizza and sandwiches.\nSeveral zebras walking in the shade near some trees.\nA kitchen has light wood and shiny floors.\nPlate of food including chicken, pasta  and vegetables.\nGirl competing in a horse competition at the county fair.\nA white train sitting in a train station next to a Bologna sign.\nA zebra standing on top of a grass field.\na motor bike sits parked on a cracked street\nA number of food items and two beverage atop a wooden table.\nVases of flowers sit among plates of pastries.\nAn airplane is on a snowy runway at an airport.\nA dark colored dog sitting on blanket and looking up.\nA bunch of different fruits sitting in baskets and on a table.\nVegetables and fruits are on a brown cutting board.\nA low angle view of a church clock.\nA man and woman sitting by a pile of bananas\na person sitting in a boat with a dalmation\nA group of people near a table full of bananas.\nA large flock of sheep are in a grassy meadow.\nA person sitting on a machine with wheels in the middle above a pedal.\nThis lamp is standing near a wall that is painted red.\nA toddler laying in a bed with pink sheets.\nA person on skateboard in a parking ramp looking area.\nAn old, dilapidated toilet with a broken seat\nA smiling clothed man sitting on a toilet.\nA huge white and blue airplane sits on the runway.\nTwo men shaking hands and one being presented with a key.\nStuffed animals on a shelf with some books.\nAll of the donuts each have a different flavor of icing.\nA close of a bobble head doll with a computer in the background.\nA bathroom with toilet glass sink, mirror and extra toilet paper.\nA very cute orange cat laying with some shoes.\nA clock tower sitting behind an illuminated star display on a tree.\nA paper plate topped with a slice of cake next to a spoon.\na shop with a bunch of signs sitting out front\nA room with wooden floors and wooden walls\nA school bus that is made by Chevrolet has a few bumper stickers.\nA 
suitcase full of random assorted food items\nA living room with couches, a table, and a fireplace.\nA plane takes off from a runway while a large building stands in the background.\nA trolley with people on tracks in a rural area.\nThe batter prepares to hit the ball, while the fans watch from the side.\nYoung snowboarder spending time on slope in ski area.\na sink with a microwave oven on top of it\nA delta airlines jet sitting next to a  truck on a runway.\nTwo people standing in the reflection of a mirror.\nTwo children on a soccer field kicking a soccer ball during a game.\nGroup of motorcycle riders looking over traffic on the street\nA man holds an enormous sandwich in front of his face.\nGroup of motorcyclist riding motorcycles down a highway.\nA man skiing down  a snow covered ski slope with two ski poles.\nSmall boy in green shirt touching a yellow fire hydrant.\nA small herd of cows near a water bank.\nDecorated coffee cup with spoon next to miniature bicycle.\nA man standing next to a woman in a kitchen preparing food.\ntwo street signs on a pole on a sidewalk next to a street.\nA man walks out of a colorful train onto a platform.\nA pudgy man holds a huge hot dog and chips.\nDonuts and a cell phone laying on a table.\nthere is a man with tattoos talking on the phone\nA man with a racket goes to hit a tennis ball.\nA high speed train pulls into a platform while people watch.\nA bed in a room with two windows.\ntwo plates side by side, one with a roll and jam.\nA young zebra stands away from the zebra in the light.\nThere is a white stove with pans on top of it and next to it, a refrigerator.\nA lady sitting at a kitchen table alone.\nA tennis player serving a tennis ball on the court.\nA typical living room with couch, glass coffee table, television and water dispenser.\nA cop leading a gang of bikers down a street.\nA couple of giraffe standing next to each other in a  forest.\nLaptop computer sitting on a table with a sticky note on it.\nAn animal 
grazing under a wide, gnarled tree.\nA woman laying on a surf board is riding a wave.\na group of people next to a train with a sky background\nA cat staring at a another cat hidden in a travel bag\nSeveral people gathered around a table that has a cake on it.\nA coffee cup sitting on a pad of paper next to a keyboard.\nA white plate is filled with a variety of doughnuts.\nAn aisle in a store that is selling holiday items.\nThe player is hitting the ball with strength.\nThe man is holding a glass pan full of liquid mix.\nThere is a row of cows with baby cows next to them.\nThe black cat is turning away from the large computer screen.\nThe window to the store has graphics on it.\nA plate of food and a cup of coffee.\na girl with a microphone talking about a cow\nCarrots and cucumber on wooden cutting board near knives.\n2 buses and numerous cars move down the street.\na male in a brown shirt sitting on a bench with a laptop\nA painting of several flowers in a vase sitting on a shiny surface.\nA large yellow dump truck driving on top of a sandy beach.\nA person is riding a horse inside an obstacle course.\nA man is standing near a computer giving a presentation.\nA black cat with green eyes rests on colorful blankets.\nPERSON GOING FOR THE RETURN ON A TENNIS COURT\na view of a keyboard, remotes sitting on a desk\nA man riding a skateboard down the side of a ramp.\nthe sandwich is on the plate and has been cut in two\nA man is standing at the base of a ski hill.\nA refrigerator and a stove in a kitchen.\nThe bottom of a large airplane flying overhead.\na man wearing a yellow snow jacket and black snow pants snow boarding.\nA piece of chocolate cake is in a plastic container.\nA small airplane flying over a body of water.\nA child holding the hand of an adult while moving on skis.\nPeople are walking on cobblestone with umbrellas and shadows.\nA child is under the covers reading a book.\na plate with some food in it on a table.\nA man is riding on skis down a snowy 
mountain.\nA batter, catcher and umpire in a baseball game.\nA line of black and white cows are lined up and grazing.\na woman is working at a pastry shop\nA cat is sitting on top of an entertainment system\nA woman sitting on a bench with a mean look on her face\nA large group of people on a grass field.\nA person sitting in front of a laptop computer.\na desk with a bag and a bunch of other things sitting on the floor\nA dog runs alongside a skateboard with one paw on.\nA skateboarder balances on his skateboard, then balances on the board at the edge of a low wall.\nA man riding skis down a snow covered ski slope.\na close up of a plate of food with broccoli\nA man wearing a wet suit riding a wave on a surfboard.\na police on a big white horse in front of a retail store\nA small cat standing by a mirror on the ground.\nA bedroom with a white bed on a frame next to a window.\nA white table topped with lots of plates and food.\nA suitcase surrounded by some items on a floor\nA man holding a Wii game controller while standing in a living room.\nA fried dish is pictured on a plate.\nA tall clock tower sitting on at the end of a street.\nA herd of wild elephants walking along a dry grass filled hillside.\nA group of people sitting at a table with plates and soda.\nfour woman standing next to each other with bike helmets on and holding bananas\nA woman walks in the road shimmering with rain past the city lights.\nA man making a cut into a celebratory cake\nThere are cars parked along the side of the snowy street.\nA red crafted bird is pasted to a parking sign.\nclose up of a cow standing on the other side of a barbed wire fence\nA small boat rests on wooden planks by the water.\nMan on skateboard on top of wall in factory.\nA plate of food featuring burger patties, potatoes and carrots.\nA girl lying in bed and playing a handheld game.\nTwo clocks on post next to building in street.\nLooking down at cookies baking in a home oven\nColorful flags hanging lined up in a 
row.\nA black and white image of an older air plane.\nA stuffed animal dog birching out in front of people at the beach\nA box is full of old items as a tribute to Forrest Gump.\nA vase that has flowers in it on the table.\nA dog on a bed looking at something.\nA man on a surfboard that just caught a wave\nA bird stands next to many black benches.\nA woman with makeup bruises is in a suitcase.\na man lifting the lid of a square shaped toilet\nA man is pulled by an unseen boat while water skiing.\nSmall cup of baked brownies being scooped out into small snack sized dishes.\na group of people sitting at a table to eat at the beach\na couple of people that are in a kitchen\nThe train can be seen through a chain link fence.\nBirds stand on a side walk under the large trees.\nA view of a great room consisting of a living room and dinning room.\na street light with street signs in front of trees\nA photo of some cows standing in a field.\nmany luggage bags near each other on the ground\nA line of people in suits holding roses.\nA BATHROOM WITH A TOILET AND A SINK\nThat collage of nude women probably means this bathroom belongs to a guy.\nA man standing in a room holding a drink and a game controller.\nA white toilet and sink in a room.\nA child showing a banana to the camera.\nA close up of a cats profile is shown.\nA man is standing next to another man who is laying on the floor\nA man is looking at a bus stop sign\nA baseball player holding a bat standing near home plate.\nA man holding a laptop sitting beside a woman with a small child.\na toilet with a remote control mounted on the side\nA couple of women riding on the back of a horse drawn carriage.\nA man at a campground eating a sandwich.\nA man sitting at a table with food and beverages in front of him.\nA bride, groom, and minister at a wedding ceremony.\nA TV sitting on top of a stand in a living room.\nFour people riding on horses along the beach shore line\nA train engine carrying carts across a bridge over 
water.\nA black train sits on the tracks as people stop to admire it.\nA kitchen with two stoves, an island, and appliances.\na woman walking down a crosswalk next to woman riding a skateboard\nA double length metro bus drives down a city street.\nA tray covered in tin foil on top of a counter.\nA clock sitting on top of a sidewalk.\nA park bench by a body of water.\nTwo cats laying next to a cup of coffee.\nA hairy man is holding a frisbee on the beach.\nTwo people hold their colorful pastries next to each other.\nMany skiers are going up a snow hill\nA picture of a baseball game being played in a stadium.\nA collage of pastries, and a boxed of donuts.\nA triple layer cake sitting on top of a table.\nA red and yellow traffic sign sitting on the side of a road.\nA bowl with red and green apples and an orange.\nA living room with furniture, television thrown rug and a window.\na small child standing in a living room eating something\na dog moving towards the horses at the mountains\nman with skull decorated surfboard eyeing the ocean\nA young man standing on top of a snow board in the snow.\nA cyclist passes a bus while it picks up passengers.\na big living room with stained glass windows leading to a piano\nMen's doubles tennis players shaking hands on the court\nA white cat is sitting on a white sofa.\nA woman wearing a white shit and apron standing by a man in front of a traffic light.\nA white plane sitting on top of a runway near a building.\nAn old black and white photo of a man holding skis.\nCows in pasture within a fence on a field.\na broken toilet bowl base overturned in a shrubbery next to dirt and rocks.\nBathroom with a toilet, glass sink and a mirror.\nA couple of animals grazing on a dry grass field.\nElectronic and personal items from a back pack laid out neatly\nFemale flying a kite in an open field.\na woman posing on a bench in front of stony ruins\nfour giraffe stand at a tree all with their noses stuck into some kind of nest\nA group of 
elephants walking through the street with a pepsi stand in the background.\nA man holding white surf board on the beach\nA walk in shower next to a tub in a bathroom.\nA kids baseball game with a runner sliding into home\nA black, blond and white cat crouches on the side of a table with a cake on it .\nA man on a bike balancing quite a bit on his head.\nA group of female soccer players at the pitch playing\nHorse jumping over an obstacle on a course with a rider.\nMan cooking marshmallows over an electric stove with fancy tongs.\nA plane on a runway drives off to the air\nA group of people playing Wii in a family room.\nFlowers in a vase full of water next to a window.\nMany different piece of luggage that are open on the floor.\ntwo people with two dogs on a surf board and one dog swimming\na plate filled with assorted veggies and cheese\nA train engine is sitting at a train station.\nA couple of people on skis in the snow.\nA bathroom with a toilet, towel rack and a tub in it.\nA man in wet suit surfing wave on a surfboard.\nA cat is curled up on a bed beside a remote control.\nA couch with a cat and toy teddy bears on it.\na number of horses standing near one another\nA child in snow gear and skis on a ski slope.\nAn old truck with no passenger door with tires and body painted in different colors.\nA close up of a plant center surrounded by leaves\nA couple of kids are skate boarding down a street.\nA yellow train is stopped against a barrier on the tracks.\nThe soccer player is kicking the ball while a crowd watches.\nA silver pan filled with food on top of a stove top.\nA man riding a dirt bike on top of a sandy beach.\nA cat is stretched out on a couch under a window.\nClose up images of bikes parked next to the highway.\na table with some glasses of beer and some pizzas on it\nTwo horses giving each other a loving nose kiss.\nA monkey with a banana sitting in the dirt.\nA table that has a plate of food and a glass of wine on it.\nA bowl filled with ice 
cream, sprinkles, cherries and other toppings.\nhalf a dozen giraffe in a wooded area\nTwo skiiers jump down a snowy slope towards a ski lodge.\nA full, black and white coffee cup held in front of a computer keyboard.\nThis kitchen has white cabinets and counters and silver appliances\na man that is walking down a sidewalk\nA beautiful young woman riding a pink skateboard.\nA commuter train makes a left-hand track change to change direction.\nA beautiful Asian girl with a white rose in her black hair. She is holding an open blue umbrella over her head.\nA man riding a motorcycle on the street.\na group of people holding wooden utensils a smiling at he camera\nA bear made out of gummy bears in a candy store.\nThis aerial shot shows several people using a cross walk while holding umbrellas.\nA group of people sitting on a couch in front of a cluttered table.\nA white bed topped with pillows sitting next to a wooden night stand.\nA boy is playing tennis with other people in the background\nMany people on the beach with large colorful kites flying in the air.\nA women is holding an ID and holding a pair of scissors to it.\nThe elephant is walking outside by himself along the wall.\nA woman with nice legs laying next to a purple umbrella.\nA group of teddy bears all dressed in Pilgrim and Halloween outfits.\nA man in light clothing stands near a boy with sunglasses and jeans and they are both by a white glider.\nDisplay of ornamental vases and figurines with oranges stacked on stands.\nAthlete in motion during attended competition on gray and blue court.\nA large jetliner flying over a body of water.\nA kitchen with stove, refrigerator, and cabinets in it.\nTwo people riding bicycles alongside the river on a sunny day\nAn older gentleman flies a kite on the beach.\nA variety of items are spread out on the bed.\na bathroom with a sink right next to the shower\nTwo girls in pink robes standing in front of a television.\nTrays of party food lined up on a table.\nA copious 
amount of food are served up in the kitchen wares.\nA white horse is out eating in a field\nThree donuts piled together on a small plate.\nAn airplane on the runway either just landed or ready to take off.\nThe cat was laying in the sun on top of the zippered bag.\nA train with two cars is on a railroad track that splits into several directions.\nA desk nook area has a desk, a chair and a book shelf.\na man is arranging a set of appetizers on a tray\nA train is parked at a depot on the tracks.\nA man hitting a tennis ball with a racquet on a court.\na person riding a skate board at a skate park\nA silver truck driving past a giant arch from a mcdonalds.\nA man in green shirt riding on an elephant.\nA man is wearing a robe and a tie.\nA man is standing in front of a grill with an umbrella.\nA young boy aims his video game controller as a man watches.\nA group of cows standing around in an open field.\nOrange and white cat laying down and chewing on some cups.\nWoman on tennis court grasping racket with both hands.\nA piece of pizza being held in a persons hands.\na red fire hydrant at the corner of a street\nA green and red semi trailer truck front without a load.\nA couple of men standing on a lush green park playing a game of frisbee.\nthe duck is looking over the side walk\nA group of men standing next to each other.\nA zebra in an open ground near a bench.\nA toddler brushing his teeth and gums at the sink.\nA group of zebras that are standing in the grass.\nMan mid swing playing Tennis on tennis court\nBoxed hotdog, fries and a drink are set out for daytime reading.\nWooden mantle holding two vases of flowers and a picture.\nTilted pic of a mountain road with a street sign.\nA couple of people kneeling over a pile of snow.\nA bird with outstretched blue wings is sitting on some bird feeder.\na kitchen with a table a stove and an oven\nA red stop sign sitting on the side of a road.\nA bed sitting in bedroom under a picture.\na living room with big couches and a 
ceiling fan\nA person with glasses on the phone in a restaurant.\nAn elephant is standing in a grassy field in front of trees.\nFour boxes of donuts of various descriptions on a table\nThe baseball pitcher has wound up his arm to pitch that ball.\nA little girl is holding a Minnie Mouse umbrella above her head.\nA man is looking at hanging fruit arrangements.\nThere is plenty of clutter by the computer on the desk.\nA cat in a bathroom stands on the rim of the toilet.\na red fire hydrant with two nozzles on it\nA giraffe is standing in a field with a group of zebras.\nSomeone who is holding a hot dog in front of a box of teddy bears.\na man in a uniform is cutting a cake\nA man on cellphone and woman walking by building.\na teen girl sitting at a table with some pizza in front of her\nThe ingredients represented in the meal might include pineapple.\nA small group of men playing with a frisbee.\nSeveral people are swimming in a blue lake\nA train emits thick steam as it moves on the rails through a flowing plains.\nAn orange tabby cat stands in a doorway with a bookshelf in the background.\nA group of skateboarders standing around while another skates.\nA family of elephants stand close to each other.\nsome yellow signs attached to a building wall\nA variety of boats are shown in the water.\nA woman speaks into her microphone while looking at the cow.\nA green umbrella over some chairs and tables\nThe pitcher is winding up to make the pitch.\nA very cute girl holding up some scissors.\nA male and two females jumping to catch a Frisbee.\nThere is a train moving along a railroad track.\nA couple of people that are sitting on a bench.\nThe man is watching hockey on his computer.\nMan on a boat carrying large quantities of cabbages.\na parking meter that has been drawn on\nA group of people standing around a white cake.\nA white and black potted plant with a mirror behind it.\nsome kids are standing outside with an umbrella\nA train car traveling on a bridge over 
water.\nA boy with two marks on his back stands on a skateboard.\na surfer runs into the waves on a beaching with his surfboard\na tattooed man with a skateboard thinking about doing a trick\nTwo cats who are laying down on a bed.\nA plate of food, that appears to a small omelet and other pieces of meat.\nA dog that is sitting in a window.\nA cake shaped like an elephant squishing a horse.\nA WOMAN SITTING ON A BENCH EATING PIZZA WITH A LITTLE BOY\na pepperoni pizza sitting on an oven done cooking\nA pitcher standing on a mound on top of a baseball field.\nA man riding a snowboard down a snow covered slope.\nA man kneeling down next to a little girl.\nA dog with a frisbee in its mouth is jumping over a man lying on the ground.\nA giraffe looking over the corral fence in his zoo habitat.\nA horse and foal are standing in the meadow.\nThis is a picture of a black furry cat on a laptop.\nThe man is riding down the ramp on his skateboard.\nA couple of people walking in a parking lot by several motorcycles.\nA man holding an umbrella light on a beach.\nA man in a red and white baseball uniform holds out a bat toward a baseball on a baseball field.\nA man standing in a kitchen in front of a stove top white oven.\nA small bathroom with a toilet and flushing system.\nA living room with a fireplace and contemporary furnishings.\nA pizza that is sitting on a table.\nA cat in a chair peeking above the table's edge at a drink.\nLeaves sitting on a street next to a parking meter.\nYoung man gliding along rail on his skateboard.\nA mother and child carry kites through a park.\nA blender with something in it to blend\nA batter is getting ready to take a swing.\nA man flying through the air while swinging from a pole.\nA small green boat at a dock in the water.\nA desert topped with whipped cream is sitting on a plate.\nTwo women who are riding in a horse drawn carriage.\nA vase sitting on top of a roof top.\nPizza sitting on top of a table next to a couple of wine glasses.\na 
kitchen with a refrigerator a sink and a stove\nBlue and purple vase sitting not he side of a white wall.\nA boy eating a doughnut in a diner.\nA line of traffic beside a metro bullet style train indoors.\nA woman is on snow skis on top of a mountain.\nA large black bear walking through a forest.\na bench dedicated to someone with a weird edge\nShe is eating a slice and watching the small countertop TV.\nA river with rocks in the middle and a train trestle in the background.\nA boy in a yellow shirt is riding the edge of a half-pipe on his skateboard.\nA plane flying over the beach with a mountain in the background.\nthe man has returned a server of a tennis ball\nA large Japan Airlines jet landing on a runway.\nThe back legs of a cat dangling over a keyboard.\nthere is a large wooden platform bed in this room\nResidential bathroom with wooden cabinet and mirror next to shower.\nA dog plays with a frizbee in a pile of snow\nA man is standing up, taking a shot of the water, while a pigeon looks on.\nA man wearing a bandana, holding a skateboard.\nA woman is riding the waves on a surfboard.\nA giraffe is in a field of grass eating leaves off a tree.\nA black and a white horse are grazing in a green pasture.\nKites being flown from the water in the ocean\nA man is riding a bike while using a cell phone.\na group of people are at a market\nA room with a table, chairs and a doll in it.\nThe woman is eating breakfast in the kitchen.\nThis shows an innovative Apple device and a keyboard.\na small bus sits parked as a kid runs across the street\na building with some windows next to a street\nA man surfing a nice wave on a bright day with a ship in the background.\nA wet bear stands in the river looking for fish to eat\nA white table topped with a flower surrounded by chairs.\na woman skiing down a ski slope in the slope\nA smiling man stirring something in a kitchen.\na male with a beard a book and a child in bed\nA bird sitting next to a dried cob of corn.\nMan walking up 
the side of a mountain with his skis on.\nan oven outfitted with several Christmas lights\nA girl on a bicycle is stopped before crossing traffic.\nMan sitting on chair in kitchen with baked pizza on table.\nA little girl buying a small teddy bear.\nfour men in an office working on their computers\nA skateboarder is riding the green ramp.\nA skateboarder performs a trick in a skate park.\nSeveral people are talking next to a yellow plane in a hangar.\nA person points a remote control at the television.\nA brown and white dog laying on top of a green field.\nA red and black motorcycle with people in the background.\nA boy that is on top of a skateboard.\nThree giraffes tower above trees and brush as they feed.\nThere are small trees with oranges growing on them.\nOlder black and white photo of a woman playing baseball and swinging a bat.\nA white frisbee laying on top of a dirt field.\nGroup of giraffes standing by a pile of wood in an exhibit.\nTwo brown bears walking on an unpaved forest road.\nA bear walking on a fallen tree in the woods.\nTwo horses that are pulling a piece of farm equipment.\nPeople on a tarmac board a Qantas airplane.\nUp close to a giraffe in its natural habitat.\nA red truck is parked on the lawn of this house\nA man's feet resting on a skateboard\nA child standing between two luggage carts behind a car.\nPeople sitting around a table as someone puts stuff in a blender.\nA city intersection displays a clock on a long tall stand.\nFour giraffes encircle the palm tree within the fence.\na stop sign with some graffit on it\nA lit birthday cake has some penguin candles.\nA building displaying a clock showing the time to be 6 oclock.\nSome very pretty whit bowls with some food in them.\nA man is eating a peanut butter and jelly sandwich.\nA black and white cat sitting on top of cabinets.\nAn elephant stands in weeds with trees in the background.\nA man and woman sitting on a motorcycle.\nA shirtless man reading a book and eating.\nPark bench near 
tree during fall in open area.\nTwo people standing on the beach with a kite.\nA man making a surprised face is getting a hair cut.\nA young boy learns the meaning of the word strike.\nThere is a batch of doughnuts being made\na man sitting at a table with a plate full of food\na white horse is standing near a train\nThe bus  is parked at the bus stop.\na man taking a picture of a truck parked next to a building\nA group of young people standing next to each other on a beach.\nA train traveling down tracks next to a power grid.\nThree giraffes in a field with an Egyptian theme in the background.\nSomeone on a snowboard holding the bottom of it in mid air.\nA gourmet style pizza with a variety of vegetables.\nCows graze in a field in front of a lake.\nA room with a sunny window contains a bed and a desk.\nA toddler holding an electric toothbrush to his mouth.\na close up of a clock on a pole with a wind tool\nA white refrigerator and cabinets in a grey kitchen.\nA partially eaten plate of eggs, bacon and toast.\nGarbage and police trucks on a city street\nA small gray goat standing on large rocks.\nAn orange and yellow flower sitting in a see through humming bird.\nA yellow and orange fire hydrant in front of a building.\nA group of snowboarders snowboarding down a mountain.\nMan stands up on his bike and looks up next to a parked car.\nA herd of animals standing on top of rocks.\nA bus is traveling down a street near a building.\nA few birds are on the roof of a house.\nA woman sitting on a rail next to skis\nA kitchen with a microwave, cabinets, stove and dishes on the counter.\nA child skier standing at the bottom of a slope.\nA teddy bear sitting on a tricycle on a sidewalk next to a flower bed.\nA man standing next to a woman in ski equipment.\nAn acrobatic dog catching a frisbee mid air.\nA blender is full of food being prepared to puree.\nGirls walking in a park talking and taking pictures.\nA group of people sitting around a living room together.\na couple 
of people that have tennis rackets in hand\nA group of people sitting outside at a restaurant table.\nA woman standing over a stove cooking food.\nA little league pitcher standing in a field holding a catchers mitt.\nA colorful doll-house bedroom with one girl doll occupant.\nPeople walk across a footbridge that stretches over a river.\nA person holding a dog who is looking at it's self in a mirror.\nOverripe bananas on plates with breakfast food packages.\nTwo double decker buses passing each other on the street\nThere are several people riding mopeds and motorcycles traveling down the street.\nA man holding a flying disc in a park.\nA large man is holding a black suitcase.\nAn elephant throws dirt on his back with his trunk\nTwo giraffe, two zebra, a monkey, and two flamingo are searching for food.\na young boy wearing ski equipment in the snow.\nA surfer is on the water and is waiting for a wave.\nA herd of elephants walking through a lush green field.\nA woman with a tennis racket tosses a tennis ball.\nPolice car parked behind a car illegally parked at fire hydrant.\nA herd of elephants splashing and playing in a  waterfall.\nA group of people sitting at a table with beverages in front of a window with ocean view.\nA woman with a pen is writing while a man in a tie is watching.\nA bathroom scene with the sink and shower.\nA cheesecake on a plate with a croissant behind it.\na woman reaching up her arm as she looks at tennis ball\nRound mirrors above clean sinks in a public bathroom.\nA man playing tennis, ready with racket in hand.\nA photo taken in a mirror showing the side of a truck.\na man in a surfer suit walks down a street\nA gray airplane with metal petals on the wings takes off from an airport.\nA European tour bus with luggage on top on a brick city street.\nA close up of a glass bowl full of small oranges.\nA very beautiful woman wearing a black hat, black shirt and tie.\nAn ipod plugged into a dock inside of a kitchen.\nA man sitting on a bench 
waiting to get a ride from a bus.\nan elephant behind a fence at the zoo\nA cat sleeping on a bed with its head on a teddy bear.\nA person viewing a picture on their cellphone.\nA meal with two plates full of broccoli and other items.\nA refrigerator adorned with several magnets and clippings.\nA few people are getting to know one another in affection.\nA street light turned green on a dark street.\nSmall bird sitting on a skateboard posed in front of dark blue background cloth.\nA man leading a horse around the town.\nA sign that has a camel on it.\nColor fruit is on a stand including pears and apples.\nA giraffe standing alone next to some trees.\nA man is holding a military medal in a bar.\nA silver and green train stopped at a train station near kids.\nA couple of knitting books sit on a couch.\nSingle zebra standing in a field of semi dried grass.\nsome blue and orange surfboards on the sand water and rocks\nSmall child in baseball uniform standing next to players.\nA boy swinging a bat at a ball on a field.\nThe man is on a horse pointing his finger.\nA small elephant lawn decoration near a plant.\nA child dressed in random clothing standing barefoot in the kitchen.\nA woman in short shorts standing next to a young man.\nsome people some buildings and some are flying kites\nThe back end of three zebras walking in a group.\nA man in a red shirt motions toward his cell phone.\na corner of a building with the name of the street on it.\nA man is checking his cell phone while snowboarding.\nA sandwich in a basket accompanied by a beer and a lollipop.\nThe ships are all docked on the beach by the water.\nSkiers on a snowy slope are high above a small town.\nA dirty nasty urinal in a very dark rest room.\nA family gathered at the table eating breafast\na black and silver trains engine and a car and grass\nA steer walking down a busy market street.\nA pizza with several toppings sliced and ready to eat.\nA child dressed in a skeleton Halloween costume.\nTeenage boy 
about to catch the flying Frisbee.\nSoldiers with guns in the back of trucks in a parade.\nA boat that has been beached on the shore.\nA couple of hot dogs sitting next to a basket of fries.\nFour meals have been placed on a table with beverages.\nA photo of a couple singing karaoke.\nPerson in black surfing a wave near the beach.\nthere are two woman that are walking in the street under a umbrella\nA black and white photograph of a skater performing a trick.\nA herd of sheep grazing on a lush green field.\nA group of people riding bikes down a street.\ntwo high school soccer teams play against each other.\nA silver fire hydrant stands in the grass next to shrubbery.\nTwo people with umbrellas stand at the fence looking over the water.\nA woman stands in line at an airport.\nA big vase with flowers near a cup and window.\nA person in a park playing with a frisbee.\nA television sits above a fireplace in a living room.\nA single tall flower in a green glass vase sitting on a windowsill.\na close up of a bench surrounded by plant life\nA black stereo speaker near a computer monitor and mouse.\nA woman standing with a donut and a candy apple in her hands.\nA red train sits on the rail road tracks.\na young woman sitting at  a table resting her elbow on the table\nAn elephant in dirt area next to a booth.\nTwo giraffes eating leaves off the trees in the woods\nA variety of fruits and vegetables sit on a table.\nYoung male baseball player in full uniform and glove alone posing.\na really sad picture of some men with guns sitting next to a dead zebra.\nA small plane with the cockpit open and landing gear down\nA woman is standing next to a display of giraffes.\nA living room filled with lots of furniture and a TV.\nThe woman in red sunglasses is walking in snow with ski poles.\na close up of a zebra in a field of wheat\nA group of people riding horses through a small village.\nA sandwich on toast with potato chips on the side.\nA brown bear is grazing in the grass.\nA 
multi-colored umbrella that is blocking out the sun\nThere is a laptop on a crowded desk.\nA tennis player is on one foot hitting a tennis ball.\nA zebra grazing on dry grass in a field.\nVariety of meat and produce displayed for meal preparation.\nA red stop sign on the side of a building.\nA black and red train engine with train cars behind it.\nThe lady on the bicycle is waiting for the light to change.\na person with one foot in a snowboard\na bike shop with various bikes in it\nA woman is holding an umbrella over her head\nA couple of guys playing video games inside\nThis is a meal made for two people.\na small little plate that has some fruit on it\nA large pair of scissors on display next to plaques.\nDelicious looking meal of vegetables, cheese and meat on bread.\nA public bus near a curb on a wet day.\nA red traffic light hanging on a street pole.\na shirtless male surfer is carrying a white board\nA man in yellow vest on motorcycle next to a building.\nA large tree sitting on top of green grass.\na person is sitting on a couch while on a laptop\nA white bathroom with corner shower and tiled floor.\nA sausage sits on a takeout plate with spicy carrots.\nA bunch of books that are on a bed.\nA large red truck visible through the rear view mirror of a car.\nA young girls soccer team posing for a picture.\nTwo lines of bicycles parked on a brick surface.\nA snowboarder mid-air above a ramp outside in the snow.\nA polar bear goes bobbing for fish at the zoo\nA cat lies in a crib next to a small child.\nMan pushing a cart loaded with luggage in an airport check in line.\nA red and yellow high speed passenger train rolling along the track.\nA large, ancient looking clock tower rises above a neighboring structure.\nA laptop on a table with a white cloth at an art auction in a hotel ballroom.\nA white bird with wings spread under a cloudy sky\nmany people sitting on the ground with a big container in front of them\nStainless refrigerator and microwave on the 
counter of a kitchen.\nA guy with headphones does a trick with a skate board.\nA picture to people and horses in the water.\nA baseball player holding a bat over home plate\nA Christmas tree sitting inside of a living room.\nA woman sitting on a bed talking on the phone.\nA subway train with the doors wide open next to a bench and pole.\nThe clock face on the exterior of a building.\nSome animals that are hanging out in the dirt.\nA lady holding a camera up near a big black dog.\na small giraffe that is next to some rocks\nA dog playing with a toy in the snow\nA half full glass of red wine on a table.\nA group photo of men and boys from the Goodmayes Boys School dated April 1929.\nA white and brown dog laying on carpet under a desk.\nI can see one tennis player but I cannot see the other.\nA couple holding wine glasses and holding up a tag reading USQ.\nTwo soccer teams playing a soccer match in a stadium.\na group of people playing frisbee in a field\na cat laying down stretched out near a laptop\nAn orange kitten is hiding under a blue blanket.\nEight dishes on a platter, each with a different food item\nA room with two side by side beds, one of the nightstand lamps are on.\nAn oven with fire in it and ashes around it.\na small child standing above a skateboard on a tiled patio\na black cat sleeping on some bags of carrots\nA cat is looking at a cluttered computer desk.\nAdult elephants crossing roadway with young in native land.\nA young guy is surfing in the ocean.\nCars and a truck lined up across a train car\nA man poses in front of some green wood panels.\nA donut frying in oil along a conveyor belt.\nSome hands are coming from the closet and reaching for a sleeping woman.\nA pizza slice is being removed from a pie.\nThe umbrella is ready to be installed at the restaurant.\nMan with ponytail digging out condiment for sandwich in hand\nA plate full of food sitting on the table next to a fork, orange, cup and salt and pepper shakers.\nThe student is trying to 
relax on the floor.\nthere is a man sleeping on a mattress outside\nA packet of ramen, remote control, cigarettes and a lighter on table.\nA deser plate with cake ice cream and fruit on it.\nA table topped with a toothbrush and other items next to a wall.\nCars and buses seen through the reflection of a window\nGreen cake with a pair of pink pigs next to it.\nA young woman is eating a piece of pizza.\nA clock tower at seven forty three in the afternoon.\nYoung boys walking on wet pavement with umbrellas.\nA clear tube containing a flower sits on the floor.\nTwo men sitting on a couch one who is holding a remote.\nThree colored beached chairs, yellow, red and blue by the ocean\nAntique warplane surrounded by safety cones near person.\nA couple of boats floating along a river.\na man is holding a baby  and playing with a laptop\na street sign outside near a flag pole\na rusty flatbed truck sitting by a building\nA black fire hydrant that has two exits.\nAn appliance is standing next to cabinets in a kitchen.\nThe red clock is displayed for the people can see\na close up of a slice of cake on a plate\nA hot dog covered in cheese on top of a plate.\nA cat laying on top of a pair of shoes.\nA laptop on top of a box on a table\nA picture of some food and some coffee.\nA room with chairs and a couch next to a fireplace\nA very dimly lit dining area with some pretty flowers.\nA tennis player wearing all white reaches hi racket up to a ball.\nKites laying on the beach on a sunny day\na small candle lit beside a placemat and some glasses\nA woman standing alone holding an open umbrella over her head.\nThere is a place of food on a white table.\nA cat and some people on a grass field.\na train stopped at a train station with people near by\ntwo cows in a body of water near a field\nA cat sitting on a dresser with a person in the mirror behind it\nA pallet holds a display of fresh vegetables.\nTwo geese and their babies stand together outside.\nA military plane is flying 
upward in the sky.\nPeople standing next to sheep and feeding them.\nRows of green bananas on a tree with big green leaves.\nA young woman riding a horse holding a flag\nTwo boys sitting on a bed playing a video game.\nA meal that looks like falafel and hummous.\nThree sheep are grazing freely in the open field\nA vehicle near a stop sign with a poster.\nA living room minimal furniture and a large window.\nA view of a bus stop from across the street.\na person dressed in ski gear in the snow coming down a mountain side\nDual digital parking meters are in place and waiting for a visitor.\nA female soccer player sits in the bleachers holding her ball.\na black and white picture of a white man singing a song\nA man and woman smile while standing beside each other.\nThere are several holiday teddy bears in a shop window.\nWall with tools hanging on hooks and two litter boxes under alcove\nA girl sitting at a table full of bananas.\nA man brushes his teeth with a toothbrush.\nA tablet PC decorated with a picture of a girl and three baby pandas.\nPIZZA, SPOON, BOWL, COFFEE POT ON TOP OF STOVE\nA dog in a grass field with a Frisbee.\nA person on the beach flying a kite.\nAn elephant is being taken down a road in the back of a truck\nA black cat staring out the window behind a computer\nA man with a black tie smiling and holding a white box.\nA puzzle picture of a baseball player batting a ball\nA cardboard cutout of two boys kicking a soccer ball\nA pizza sitting on top of a white plate on a table.\nAn old fashioned refrigerator in a kitchen next to an old fashioned stove.\nA mom and a baby who is holding a teddy bear\nA white horse looking through the window of a tall brick building.\nA big orange truck driving down a street.\nA wooden bench written 'CITY OF LONDON' at the park\nA man smiling while slicing into a cake.\nA bed with white sheets and a night stand.\nA stop sign at the intersection of fifth avenue and fifth street.\nThe brown dog is riding a wave on a blue 
surfboard.\nA boy and a group of sheep walking away in dirt field with trucks in background.\nA picture of a very nice clean living room.\nA person with glasses holds a Frisbee standing in the grass.\nLocomotive pulling cars on tracks in outdoor area.\na cute happy bright yellow and red bird sitting on a tree branch\nA team of horses hitched and ready to pull a wagon.\nA man riding a skateboard while a child sits on the front of it\na newly shaved sheep walks away from it shaven fur\nthere are any kites that are being flown in the sky\nTwo elephants are in front of a muddy waterway trampling in the wet dirt.\nZebras are grazing on grass by a car.\na row of three ambulances with white and yello paint\nthis kitchen is all white and all white appliances\nA person sitting in a bed with a laptop before them\nSomeone is wind sailing out at the beach\nA man is sitting on a park bench speaking on his cellphone.\nA laptop in front of a computer on a desk and a blue chair with a colorful blanket on top of it.\nTwo men standing and holding video game controllers.\nA 18 wheeler truck on a highway carrying a large over-sized covered load.\nA WOODEN HAND MADE KEYBOARD WITH A MOUSE\nA view of an airplane traveling across the bright sky.\nA dog laying in the grass next to the sidewalk.\nA baseball player is swinging high in front of the readied umpire and catcher.\na couple of people on the beach playing with their kites\nA homemade pizza with toppings served on a plate\nTwo giraffes in a grassy field with trees in the background.\nA flock of sheep are crossing the street next to the cars.\nA couple is sleeping in a bed with red sheets.\nA young girl smiles for a picture at the beach.\nA parking sign and a fire hydrant.\nA red stop sign sitting under a green street sign.\na close up of a bird on a beach near water\na girl dressed in red shirt and black pants playing tennis\nA girl with glasses curled up under a colorful, crocheted blanket\nSome women are talking next to some 
sheep.\nA laptop computer and mouse on top of a desk.\nThe refrigerator and the kitchen is being cleaned.\nDifferent kinds of food rest on a plate.\nA hot dog lays on a white paper next to a can of juice.\nThe living room has two couches and an easy chair.\nA woman on the beach has a pink hat and umbrella.\nHorses and carriages are lined up along a walkway awaiting customers.\nA piece of wood with bananas and forks on it\nMan in white shirt and scarf throwing a frisbee.\na large field full of sheep out in the outdoors\nThere is a man about to fall off his skateboard\nA group of people standing together for a gathering.\nA long train yard full of different equipment\nA man with a bunch of plates in front of him by a red house with an open door.\nA tile bathroom with a large mirror on the back wall.\na person sliding in to home plate when the guy didn't catch the ball\nSeveral potty pieces with a white background and blue design painted on and one is adorned with feathers.\nA big city bus parked right beside a building.\nYoung black cat lying on desk with head on keyboard.\na close up of a baseball player with a glove\na man that has a wii remote in his hand\nAn applegate hot dog is placed in a bun.\nA row of matching planters are arranged on an outside colorful wall.\nGroup of people walking through a city at night.\nan image of a tray of food on a table\na case in a bakery full of doughnuts of different flavors\nA vegetable succotash has cashews, broccoli and sauce.\na image of a dessert on a plate with toppings\nA man with lots of tattoos sitting in front of a bowl of food.\nAdult wearing white shirt and tie holding baby in outdoor scene.\na pair of gray scissors hanging on a nail and another black item\nA horse is standing in the green mountainside grass.\nA person is flying a kite on a beach.\na toilet in a wooden themed bathr oom is open\nA clock on a tall brick and white tower.\nA series of images of skateboarders skating and jumping.\na couch in the living 
room near some stairs\nTrio of large birds sitting next to each other on wooden perch.\nA painting of a woman sitting in a chair with a laptop computer.\nA couple of small birds on a wooden pole.\nThis is a collection of different kinds of hot dogs and french fries.\nPerson wearing all white leaned up against a wall with a yellow sign.\nA woman at a table in a restaurant\nA black and white image of a young woman sitting on a grassy knoll using her lap top.\nA person that is playing a tennis game.\nA baseball game is being played before a crowd.\nA mini refrigerator stocked with bottles and cans of alcohol and soft drinks.\nA young boy holds a kite in a grassy park.\nThe young man races toward the yellow frisbee.\nAn orange cat licking a blue pair of shoes.\nA close-up of a desert with cookies, ice cream and a cherry.\nTwo semi cabs are parked neatly beside one another in a park area.\na dog begging for food off a table\nA living room with low lights, a couch and a tv.\nA television stand has a television and vases on it.\nA young boy holding a tennis racquet near a house.\nA clock, bird with missile, american flag at an area that looks like a flea market.\nThree women at a party posing for a photo.\nA display of a man with striped tie and a bird on his shoulder utilizing two Instamatic photos\nThree inset pictures including bottled water, small pizza, and cup of coffee.\nA man riding a motorcycle down a street next to a train car.\nOlder man rides on a carriage pulled by two horses\nSeveral men are unloading trunks from a Model T.\nA tray with a hot dog, fries, ketchup and mustard on it.\nA piece of cake is seen on a clean, white plate.\nSmall boy holding a kite over his head waiting.\nA firetruck without emergency lights on cruising through an intersection.\nA TV sitting on top of a brown couch next to a pool.\nA black and white photo of a person surfing. 
The picture is from underneath the water.\nA train with a red engine in the countryside.\nA man riding a skateboard down a street in front of a red car.\nA trio of men throwing a Frisbee in a field.\nA motorcycle stood up in a forest with melting snow.\na green fire hydrant siting by a yellow pole\nA pile of garbage sitting on the curb in front of a wall.\na few pieces of pizza on a pan\nSkiers pause for a photo before hitting the slopes.\nA woman plays tennis in a tennis court\nA zebra standing by a log and container eating grass.\ntwo stuffed teddy bears sitting in a chair\nA cat stands alert on a park bench.\nTwo pedestrian walk signals are lit up at night.\nA cow licking its side in an enclosure\nTwo female tennis players walking in opposite directions on the tennis court.\nA man stands in front of a Jamaican food truck in a city.\nAn airplane flying a in clear sky above a light.\na man is skating around a cement skate park\nA orange and yellow freight train traveling down the tracks.\nA stew pot holding carrots, celery, and squash.\nA blender has some sort of liquid inside.\nA lady dressed with a pink hat and unique clothing snow boarding.\nA green airplane flying over a lush green field.\nTwo children at a skateboard park under a blue sky.\nA lamp that is on in the corner of a living room.\nA man holding a tennis racquet on top of a tennis court.\nA man dressed like a zombie with other zombies around him.\nAn older man and a younger boy play a video game.\nA clock is shown on top of a building.\nA small baby is biting into a banana.\nA very tall clock tower towering above a city at night.\nA man in glasses wearing a suit and vest.\na group of women gathered together side by side in front of a table with pastries on it\nA man standing on a  tennis court holding a racquet and a ball.\nThree people riding ponies and horses in a residential area.\na woman and horse walking behind a giant pickup truck\nAn image of half a bathroom and half stairs.\nA calico cat is 
laying on a laptop computer.\na man and woman are outside taking a picture together\nAn elephant and it's trainers interact with each other.\nAn arial vierw of a building with a clock tower.\nA woman and two men are having a conversation.\nA far off picture of birds flying above a field.\nA man and two women with Wii video game controllers.\nA couch that is in a living room with pillows on it.\nThere are three adult giraffees that are walking in the park.\nmany brown and black sheep bushes grass rocks and trees\nSeveral women sit at a table tasting wine\nA cat lays down around some stuffed animals.\na person on a motor bike drives down a street\nYoung man wearing a suit and tie standing inside a building.\nthere is a tall sign that is on the side of a building\nA busy city intersection with people and cars\nA close up of a clock reading 1028 and 54 seconds.\nA train traveling past two cars on a road in a rural area.\na person on a surf board rides in the water\nTwo elephants bathing in a man made environment.\nVarious tools are sitting on the table together\nA small bike laying beside a fire hydrant.\nWhite sheep are grazing in a green pasture.\nMale surfer on a red and blue surf board.\nA mother elephant and her baby are standing alongside a dry water pool.\nThe view of green mountains and a valley from a cockpit.\nA man and a young boy riding on a donkey while people move behind them.\nA  ORANGE WITH A WINE BOTTLE ON THE COUNTER\nso many elephants moving near some waters in the forest\nAn image of a baseball player getting ready to take a swing at the ball.\nA long river runs alongside the train tracks.\nA street sign points in the direction of the road.\nSeveral signs can be read at a pillar in the fence.\nA large bear is sitting near a rock in an enclosure.\nthree people walking a dog in the snow\na guy standing by a fench with his skateboard\nA mom and two smaller sheep in a large green field.\nA women reading a red book in her bed.\nA man riding a motorcycle 
with another person during a sunny day.\nA table in a restaurant covered in plates and mugs.\nThe man is playing tennis at a very high level.\nA fire hydrant sits in a small grassy island near the sidewalk.\nAn orange kitten laying in a chair with a stuffed bear.\nA train is making its way around a snow dusted corner track.\nBlueberry stuffed beanie teddy bear sitting on a table.\nA black and white zebra stands next to a tree.\nAssortment of laptop computers displayed on table with backpacks full of electronic cords.\nTwo pictures of a stoplight, one is green and one is red.\nA large orange bus stopped next to another bus.\na couple of different types of signs on the outside\nA group of beautiful woman walking down a street in bathing suits.\na large air plane flying in the sky\nSeveral toy SUV's alongside a toy bus on a highway.\nTwo parents are helping a baby put on a hat.\nThree horses are pulling a wagon full of hay.\nA kitchen knife on a cutting board with vegetables and spices beside it.\nA boy and a girl posing for a picture.\nA variety of Domino's pizzas and a business man selecting a piece.\nTwo dogs playing tug of war over a frisbee\nA gentleman laying on the couch while talking on the phone.\nThis bathroom is all white and has a framed mirror on the wall\nAttractive landscape with picture frames and large white vase.\nThere is a man cutting something up over what looks like a homemade pizza.\na bunch of bananas hanging on a wall\nA female sitting at a table cutting a cake.\nA woman sitting at a table with a plate of food.\nLarge black towel sitting any Penwith hey with people looking at it.\nA van is driving through an alley way.\nA stop sign is standing on the side of the road in front of houses.\nA white fire hydrant is in front of an old couch sitting on a sidewalk in front of a house.\nA few mack trucks in a parking lot.\nA person wearing all black does a one handed hand stand as he holds a skateboard on his feet.\na close up of a cat laying on a 
desk\nA flooded street with the water up to the traffic lights.\nA baseball player waits at the plate for the pitch.\nA large green bus transporting passengers through a city\nA kitchen with lots of counter space and a black oven stove top.\nA table with a stack of orange cups by orange scissors.\nA dog is standing on a tile floor.\nA large red bus parked in a stationary position.\nA boy skate boarding down some steps .\nAn angled photograph of people flying kites at the beach on a sunny day.\nTwo dogs are laying next to a bike.\nA long silver train traveling through a wooded area.\nA light red fire hydrant on the corner of a street.\nA person crouching next to a pair of motorcycles\nBusy stadium with many people outside near vendor trucks.\nTwo urinals in a restroom with multicolored tile.\nA person watching a sheepdog chase a white disc across a green field with mist covered mountains in the background.\nSeveral people holding umbrellas are lined up near a fence.\nA stoplight and street signs beside old buildings\nA tennis player getting ready to swing her racket.\na close up of a drink on a table near a laptop\nSeveral pieces of ancient pottery and stoneware on display in an exhibit.\nA woman standing in a room holding a Wii game controller.\na group of people with surf board standing on some snow\nthe ice cream vendor is talking on his cell phone\nA pile of TVs sitting next to a brick building.\nan image of street signs on a residential\nA guy on a skateboard in front of a water fountain.\nA giant cake decorated with round discs on a table\na woman on skis is standing in snow with her dog\nA woman casually reaches up to hit a tennis ball.\nA fat kid enthusiastically enjoying a pizza from a big pan.\nAn old rusty fire hydrant sitting in the grass near a picnic table.\nA new kitchen that has just been built.\nA sleek, modern toilet has a backlight and granite counter for storage.\nA teddy bear posed sitting holding a book\nA train running on train tracks through 
the wilderness.\nA white toilet sits in a bathroom, with the lid open.\na bathroom with dark tiling in iit and a pink bathtub\nA machine is on a folding table in a small kitchen.\nA person riding a skateboard down a street.\nA man kiteboarding over a large body of water.\nA man working on a laptop computer at a desk.\nTwo people pulling a luggage cart down a sidewalk.\nA white plate topped with meat and vegetables.\nSkiers come down a snowy hill in a row\ntwo glasses of juiced carrots and apples on a white cutting board\nA man is seen in a mirror in a bathroom.\nA person is holdiing a kite in a field.\nA sepia colored room shows vintage furniture with a tendency to the frilly, including a bed with a curtained balcony and a chair, both in matching floral pattern,  and a dress form.\na little tourist train pulling three cars of passengers\nA mirror is shown with a man driving in it.\nA moped parked in front of a yellow wall and traffic sign.\nA flock of sheep sitting in the middle of a field.\nA skier kicking up a spray of snow.\nYellow and blue fire hydrant in front of a movie theater.\nA teenager does not make any expression as he rides a skate board.\nA man holding two small green birds in his right hand.\nThere is a baby elephant with its parent\na truck parked on a beach near water\nThe neatly made bed is beside an open window.\na big grizzly bear looks toward the camersa\nA renovated propeller airplane flying in a blue sky\nA giraffe that is eating a piece of food near another giraffe.\nA group photo has smiling people and one dog.\nBrick houses with brown stairs stand near a wide sidewalk by a line of trees.\nA large building with windows and cars parked below\nAn asian woman holds a sub sandwich near her mouth.\nA pizza cut up into many pieces on a white plate.\nA red truck with a trailer attached, is parked near a red house.\nA plate of food with eggs, meat, salad and a fruit cup on it.\nA small table with cups and saucers and a clock on it.\nA boy flies a 
kite on a beach near colorful tents.\nA very cute cat laying on a desk.\nSeveral Billabong surfboards make up a nice display.\nA cat looking at the television with flowers on the screen.\nA group of people in  a park playing frisbee.\nAn airplane hanging from the ceiling of a building.\nA table topped with lots of different types of fruit.\nA small child's bed sitting next to a window.\nTennis player with white outfit holding a racket.\nA person wind sailing next to a person para sailing.\nA man is prepared to get on a wake board\nA horse pulled carriage on a open street.\na couple of bowls of food sitting on a table\nA refrigerator is shut with black duct tape.\na man standing on a surfboard in the water\nA large clock rests on the side of a brick building.\nA picture of a vase with colorful flowers in it.\nA giraffe walks the grasslands by himself at sunset.\nA dog heading into the water near a horse.\nA woman rared back at a tennis ball with a racquet.\na person at a table with a plate of food\nA small child on a bed looking at a lap top computer.\nA group of people are looking at something or someonr\nA bride waits for something while holding her bouquet.\nA white cloth with scissors, a needle, thread, and measuring tape resting on top.\nA tennis player gets ready to hit the ball as a crown watches from the bleachers.\nA white truck has a vision sign on it.\nA clock on a steeple of a tall building.\na shop with some wine bottles sitting on a counter\nThe kitten is nesting inside the empty bowl.\nA gondola like boat crossing over a bridge\nA large clock tower with a roman numeral clock on it's side.\nA woman on a court with a tennis racket.\nThere are three dishes and a vase with two roses on the table.\nA boat sailing on a massive lake surrounded by mountains.\nA woman in a long dress talks on her cell phone.\nA white plate topped with a hamburger next to fries.\nA man riding a surfboard in the ocean on a wave.\nA LADY FEEDING HER CAT WITH A SPOON.\na man 
standing at the beach with a surfboard and a paddle\na white building and some people flying a kite\nA guy standing in the grass is ready to throw something.\ngrandma watching two kids playing a video game\nA bowl of soup and a sandwich plate on a table.\nA photo taken from a boat with a long bridge in the background.\nAn average hotel room with twin occupancy capabilities.\nA woman is eating a pita on the street.\nA baseball player holding a baseball bat in a game.\nA serving dish has meat and greens in it.\ntwo boys with painted faces laying in a bed\nTwo men sitting at a table with plates in front of them.\nWooly horse and sheep dog face each other down\nA white pitcher filled with orange and purple flowers.\nA lady bent over with her tennis racket while another girl looks down court.\nA grey and red train next to a train station.\nA person standing in a bathroom next to a white toilet.\nA counter with various baking ingredients that include bananas, butter and oats.\nA small plane is dwarfed by the larger ones in the background.\nA couple of dogs standing outside of a wrecked car.\nMotocross rider displaying aerial tricks on nice day.\nA male child swinging his bat at a ball, another child behind him as the catcher.\na man holds parts of a broken television\nA lone bench sits atop a hill looking over the river.\nDonuts are going through the mechanical glaze machine.\nA white boat sitting next to a  dock near a white building.\nPeople sitting on the side of a street next to suitcases.\nA group of baseball player standing on top of a baseball field.\nA man wearing a brown suit and brown tie.\nA train on some tracks with power lines above it.\nA boy in a blue and white shirt playing tennis on a brown tennis court.\nA toy town with a train on the tracks passing a signal.\nA couple of sheep standing on top of a grass hillside.\nA man on a stage with ski poles in his hands.\nChef stirring large pot on top of stove.\nA young boy is playing tennis at the tennis 
courts.\nA boy riding a skateboard and doing a trick.\nTwo girls compete in a game involving a frisbee.\nThe dog is sitting in a chair beside a bright window.\nA close up of someone's feet on a skateboard.\nA white dog is on top of a bed looking into a box.\nBrightly colored oranges, pear and apple in a colander.\nSuitcase containing many compact clothes for just one person\na person on a tennis court holding a rackett\nThree men on a field playing a sports game.\nA bunch of people walking on wet sidewalk by buildings.\nA woman holding a tray of food in a kitchen.\nThe sink counter of the small bathroom is made of wood.\nThe girl is standing with her laptop in her hand\nThree people on snowboards on the slope of a mountain.\nA man standing next to his guitar case talking on his cell phone.\na small boy in a black shirt a brown and black dog and a bed\nThere is an airplane flying by a mountain.\nA toilet stall with green marble walls and a painting.\na toilet a tub  some pipes and a window\nA black and white photo of a train pulling into the station\nA lady is running with a tennis racket on a tennis court.\nA meal of hot dogs and stuffed vegetables\na person cutting a cake on a table\nA woman is sitting down with the light turned down low to take a picture of herself with her cell phone.\nA couple of white horses walking along side a rocky hillside.\nThe motorcyclist is traveling down the busy street.\nPark bench on snowy elevated viewing area above city.\nA man in a green tennis outfit hits a tennis ball with his raquet.\na man that is standing up on a stage\nA room with a picture on a wall and a vase near the window with flowers in it.\nA train moving along a track, approaching a light signal.\na small toy truck with a cat peering through a window\nBed with yellow blanket against a wall with hard wood floor.\nA group of people are painting a bench in the park.\nThe pitcher just threw the ball to the batter at the baseball game.\nA classic clock sits on a wooden 
table.\nA giraffe on the dirt looks tall among the trees.\na living room with couches covered by sheets\nA man in a suit and tie is playing a key board.\na number of oranges in a tree on branches near leaves\nA hat is sitting on the top of a bed.\nA man eating a hot dog on top of a bun.\nSeveral birds are standing in a large nest.\nTwo woolly sheep in front of a wooden fence and barn\nA couple of trick planes flying by each other.\nGuy in shorts and a cap ride along top of wall with his skateboard\nA person skiing downhill in the white snow.\nTwo men looking at a plane on a runway.\ntwo woman sitting on the ground one is on a cell phone\nTwo Asian men standing in a office with business suits on.\nA bunch of used appliances sitting on the street\na clock attached to a green pole on a building\na sheep standing in the grass next to a fene\nA row of passenger buses traveling down a lone road.\nA black bear lying down near many trees.\na couple of houses that are next to each other\nSeveral broccoli plants planted next to a wall.\nTrain cars sit on the tracks next to a platform.\nA group of lambs are running in the opposite direction of a dog who lays barking.\nFluffy white cat laying on a lightly colored bed.\nA living room filled with furniture and a flat screen TV.\nAn employ looking kitchen has a black refrigerator.\nThe bathroom is mostly a red color. 
It looks very old.\nA plate of food with a salad and very large chicken sandwich.\nan image of a cat next to the feet of a person\nA green train sitting along side a train station platform.\nA man lays in a hospital bed while holding a teddy bear.\nA man stands behind the counter of a restaurant.\nA skate boarder jumps off a curb into the street.\nA lit up display of teddy bears of different colors and sizes.\nA cap that is sitting on a blanket next to a remote control.\na couple of cats are laying on a bench\na group of people standing around a metal briefcase\nSeveral types of wild animals grazing in an open field.\nA man sitting at a table with a pizza in front of him.\nA large long train on a steel track.\nBaseball player preparing to strike ball from the pitcher during game.\nThe woman is in a ski racing down the path.\nA vase filled with flowers sitting on top of a counter.\nAn old parking meter sets with time expired in front of a parked vehicle.\nA large white polar bear sitting on top of a rocky ground.\nA kitchen that has white cabinets and a white oven.\na close up of a child near an opened refrigerator\na bedroom with a large window cover with shiny curtains\nSome traffic lights suspended over a road by some parked cars and houses.\nA row of parked motorcycles on the side of a street.\nA goat with red painted horns on its head\nA refrigerator packed with lots of food and drinks.\nA surfer falling in a wave with other surfers nearby.\nA huge cargo ship sits empty in a bay reflecting blue skies.\nA man that is sitting at a table.\nA slice of pizza with cheese and golden crust.\nThe pizza is topped with broccoli and onions.\na bunch of plates of food no a table\nA tower that has a clock on it.\nA man surfboards on a wave in muddy water.\nA woman is skiing near a bunch of trees.\nA caution light and traffic cones set up to block a street.\na black cat is sitting on a green bench\nA narrow city city features colorful buildings and a large green bus with cars 
behind it.\nA man standing next to a beautiful woman.\na kitten laying on a bed next to some phones\nA big piece of bread is placed on a white plate.\nA small white dog begging at a door to come inside.\nA doughnut on a plate and a banana.\nA plate that has a cooked pizza on it.\na man carving the turkey for thanksgiving dinner\nThree sheep graze in front of a barn.\nThere is an old street sign leading against a building.\na bear in teh middle of a grassy field\na green and white street sign and a traffic light\nTraffic light signaling green at the train tracks\nA cell phone held open in a bathroom.\nA teddy bear sitting in a very unusual spot high up\nA man touches a hammer to the center of a clock.\nA group of three people sitting on top of a green couch.\nA couple of military men cutting up a  giant sheet cake.\nA person eating food from a large white dish on a desk.\nA man standing in front of a microphone wearing a suit and tie.\nA cow laying on top of a grass covered field.\nThe side of an airplane that is parked, and an Air China sign on the side of the plane.\nShadows dominate the landscape in this dark, dreary scene.\nA snowboarder is boarding next to a chairlift.\nA woman in a tiara cutting a birthday cake at a party\na desk with a keyboard mouse monitor and a tv\nA giraffe is crouching in the grass next to a tree.\nA cat sits on the table next to a bowl.\ntwo adults holding a baby while wearing ski wear and standing on a snow bank.\nGiraffe looking through a set of bars in a cage.\na white building with a white clock and some trees\nthere is a red stop sign on this street pole\nThe tall zebra is following slightly behind the shorter one.\nA skier is headed down the steep slope.\nA surfer riding a small ocean wave on his surfboard.\nA young woman is holding a cell phone open next to her face.\na brown horse with a white stripe on its head\nA cat is sitting by a single shoe.\nA single giraffe that is walking in a field.\nA group of people posing for a 
picture.\nA person is standing on the beach holding a kite.\nA group of people in grassy field with kites in the sky.\nTwo women playing tennis on a tennis court.\nThere is a man interacting with a black dog.\nA parking meter on the curb of a city street\nA ceramic object with blue flowers on it.\nThe dog went all the way into the water to fetch the hat.\nCows and a sheep eating food from a red box.\nthere is a piece of cake and a fruit on a green late\nThere is a stop sign covered in snow.\na couple of people holding a martini in their hand\nAn airplane on a runway with another plane flying overhead and a truck nearby\nA very tall white clock tower towering over a lake.\nA corner of a rest room with a shower with glass walls.\nThree sheep laying in hay in a gated area.\nA group of people sit at a table with food.\nA desktop computer monitor sitting on top of a desk next to a mouse.\nA lot of carrots on a wood board for sale.\nA man standing on top of his head while riding a skateboard.\nA man dressed in a military style uniform shaking another mans hand.\nA empty living room that has a table in the center.\nA kitchen that has pots on the stove.\nA large two story boat floating in a lake surrounded by mountains.\nA person on a motorcycle is doing a wheelie\ntwo white black and brown dogs are lying on a red couch\na vast, grassy field with animals in the distance\na gray cat is sitting on a wooden bench\nAn iPod with ear buds and a mouse near a book and keyboard.\nA black leather case containing several pairs of scissors.\nSeveral boats that are moored at a dock\na man standing on the corner and people walking down the sidewalk\nA woman holds a colorful kite in a city park\na close up of a person playing nintendo wii\nA commander cuts a cake at a military function.\nA boy looking at the camera while sitting at a wooden table.\nA large ship making it's way through the water.\nA woman in a bustier holding a stuffed animal.\nan image of two men standing in front of a 
Christmas tree\nA vase with flowers sitting next to a glass tomato.\nThe lamps are on next to the pull out couch.\nBlack and white photograph of two people on a moped.\nA kitchen with white cabinets and a stove on the counter top.\nPresident Barrack Obama standing in front of a crowd while giving a speech.\nA boy smiling as a large spider walks on his arm.\nA giraffe standing outside of a building next to a tree.\nThe back of a moving truck that has a man standing on a lift with a royalty style chair next to him.\nA duck is swimming in the pond to the next destination.\nA bathroom with some of the wall removed during a renovation.\na couple of little kids in baseball clothes stand next to each other\nA group of people flying kites over a beach.\na person siting on a bench with a dog near by\na black cat sitting on top of a black suitcase on a bed\nTwo computer monitors, two keyboards and two CPU's on a desk.\nA cat looking inquisitively over the top of a car seat.\nplanes and cars sitting on an airplane tarmac\nTrucks and cars going down a commercial retail street in a city\nA young woman is preparing to hit a tennis ball.\nA man that is at the beach jumping in the air.\nThere are many books and magazines in the small room.\nA framed wedding picture on a crowded wooden table.\nA child holds a string over the water.\nThe train has spots of rust that are obscuring the graffiti.\nA series of photographs depicting bathroom before and after minor changes.\nDark cabinets around a white two doors refrigerator.\nA man holds a card and wine glass with a woman who also holds a wine glass.\nA casserole containing broccoli and  topped with cheese.\nWoman standing behind open refrigerator door in modern kitchen.\nA turquoise and orange station wagon with two surf boards on its top.\nA tennis player in black shorts and white shirt looks up and holds back a red racket.\na train is passing over water on a bridge\nA woman wearing a maroon sweater standing in front of crates.\nA 
flock of black-faced sheep near a watering trough on a rural hillside.\nTwo giraffes standing outside while people watch them\nTable centerpiece of a tall wine glass shaped vase with flowers\nA man is holding a large piece of pizza.\nThe young child is close enough to pet the cow.\nOne horse has taken the lead in the race.\nA skier struggles in deep snow with their lost ski.\nA collection of different smart phones on a table.\nNote with listed items on white refrigerator in kitchen area.\nA man with a backpack and coat walks by a bus.\nAn magazine photo of a restroom toilet and sink.\nA man choosing a piece of pizza from two boxes\nA scenic view overlooking the water at night or early morning\nSeagull flying through marina with many boats around .\nA blue bird standing on the ground among large green leaves\nA male skateboarder does tricks on a half-pipe course.\nPeople ride on the back of an elephant while being guided along with other elephants.\nA peeled banana on the front of a car.\nThree people ski in a row in the snow.\nA woman riding a wave in a wet suit on a surfboard.\nA beautiful young woman brushing her teeth in a bathroom.\nA scene containing a couch with flowers and a mirror.\nAn automobile with a timer attached on a city street.\nAn elephant placing its trunk on some plants and some people watching.\nSeveral elephants dressed for the circus are in line next to people.\nTwo sheep sitting behind a fenced in area\nA man flying through the air while riding a skateboard.\nA close shot of a pizza plate with a rubber on it.\nA bear observing something on the ground of a field.\nA dog in a river chasing a red ball that is thrown into the water.\na person handing a child a plate of food\nA bride and groom are cutting their wedding cake.\nTwo dump trucks driving down a two lane road with a white pick up approaching from the opposite direction.\nLady using oxygen in bed with a little dog.\nAn elegant kitchen has an attached stone fireplace\nTwo zebras in a 
field are eating grass.\nthere are many computer monitors and things on this desk\nA man leaps in the air while on his ski board.\nA woman dressed in a button up white shirt, suit and necktie.\nThe lid is up on the toilet bowl.\nA dog is resting on the window sill of the building.\nPeople milling about outside in a busy city\nA young boy wearing goggles and a billed hat holding a stick.\nA group of people walking under a leafy green tree.\nA motorcyclist with a female rider in the back and a dog in a sidecar.\na woman poses in front of a giant pizza\nA woman holding her hand over a giant pizza\nSun setting on a dark street and buildings.\nGiraffe holding it's head mid way with a wooden gate behind it.\nThere is an elephant that is lying in the grass\nA giraffe presses his head against another giraffe.\nA plate with a pastry on it, topped with whipped cream.\nThe airplane is waiting at the airport for passengers.\nA group of people on bicycles riding down a road.\nA man and a woman stand in a field with cows and horses.\nA bathroom in the process of demolition.\nBlack and white photo of a small air craft.\nA person standing next to some old junk appliances.\na tennis player in a black shirt is wiping his face\nA couple of plastic containers filled with lots of food.\nA man is holding a banana in his hand.\na cow in a field looking into a camera\nA view of a cell phone and a watch on a table.\nA bowl with sliced avocado, eggs and tomatoes.\nA empty water bottle sitting on the corner of a wooden bench.\nA bathroom that has a broken wall in the shower.\nA full course meal with meat and mixed vegetables.\nA young man in a kitchen shapes dough into balls.\nA couple of elephants standing next to each other on a dirt field.\nA couple of dogs walking through a large body of water.\nTwo flat computer keyboards laying on a table\nelephants in the wild surrounding a large tree\nA large park with people flying kites in the sky.\nA man with glasses on and a suit and tie.\nA 
traffic light on a street corner with shops behind it.\nA boy soaring into the air, doing tricks on his skateboard.\nSmiling people are holding a large white snowboard.\nA black cat and a \"K\" sitting on a green bench.\nAn attractive young woman holding an umbrella under a tree.\nGrass roofed umbrellas on a bay with cliffs\nA person holding a banana in front of a basket containing fruit.\ntwo dogs playing in the snow as a one person wearing black uses a snow board to go down a hill.\nTwo cows are standing on the end of a boat\nA picture of several street signs on a post.\nA man walking with his surfboard on the beach.\na person on a beach with a kite flying in the sky\nThe cat stands on the edge of a bed looking at television.\nThe large scissors are sitting alone on the counter.\nA pair of scissors and some stick like things in a bag on a wooden table.\ntwo horses in a field of grass near bushes\nInfant in a high chair eating a chocolate frosted chocolate cupcake.\nA skateboarder jumping with two others behind him.\nA painting that shows a vase with flowers and a table.\nA person riding a horse, jumping it over an obstacle.\nA family of zebras standing together at a zoo.\nThere is a train attached between two buildings as a walkway.\nA black headed sheep sitting in a field looking onward.\nTwo people embrace while walking down the street under a pink umbrella\na room that has all kinds of christmas deco in it\nA crowd of people standing around an old fashioned train engine.\nA couple of very small cute kids in the rest room.\nA BLACK AND WHITE PICTURE OF TWO WOMEN BASEBALL PLAYERS\nA baseball pitcher delivering a pitch to a batter.\nA Harry Potter novel is set next to a plate of eggs and toast.\na person milking a cow next to a wall\na woman watching a dog jump up for a frisbee\nA zebra grazing on top of a grass covered field.\nA large metal pan filled with peeled food items.\nA woman at a baseball game talking on her phone.\nWhite truck with painted words parked 
at night.\na white box with 12 sugar glazed donuts\nA large black bear about to take a swim in a pool\nA train has been painted with Christmas decorations and lights.\nThe large crowd watches a skateboarder descend a rail on a stair case.\nA group of people having a picnic on the beach.\nA little dog that has a frisbee in their mouth.\nA small cup cake and a knife on a plate.\nA frosted doughnut with sprinkles on a table.\nA man in a yellow jacket is snowboarding\nThe flowers are in a vase on the table.\nA close up of a vary unique looking vase in front of the tree.\nA little girl standing and holding a remote in her hand.\nA baseball player holding his arm up with a ball in his hand.\nA pitcher in the middle of delivering a pitch.\nsome kind of room with some weird things in it\na bear partially submerged in a body of water\nYoung girl making funny face in residential home.\nA table with a plate of food, utensils and some other items\nA plate of cooked broccoli on a long white platter, next to a dipping sauce.\nPeople inspecting a large, shiny semi trailer truck at a park\na couple of men compete for a frisbe\nArtistic black and white photo of man on a motorcycle.\na cake made to look like two trains\nA group of oranges stacked in a wooden bucket.\nChilled beverage in glass bottle next to orange halves.\nA woman in playing with a green frisbee at the beach.\nA stuffed bear sitting in a chair with napkins and cup.\nA display of wild animals inside a building.\na big train passes under a big bridge\nA man sitting on a bench next to a man.\nTwo kitchen stools sitting in front of an island in a kitchen\nA bowl of fruit and a plate on a table.\nA phone and a computer on a kitchen counter.\nSomeone having dinner in a dimly lit restaurant with wine.\nA pole, light and traffic signal have all been painted green.\nCat sitting inside kitchen cabinet, near the dishes.\nA red and white bus driving on the street\nA living room scene with chairs, lamp and a clock.\nA girl 
carrying a kite walks along a beach.\nTwo children in a fire truck amusement park ride.\nA group of plates with grilled meat, bread, and appetizers.\nA man in a blue blindfold reaches a doughnut tied to a string with his mouth.\nA seagull wading in the surf at the waters edge.\nPeople enjoy a day at a mountain lake.\nPolice tow truck parked on a city street in front of stores\nLiving room with TV playing and view of a hand in the picture.\nPeople stand beneath umbrellas on a flooded road.\na dog jumping in the air with a frisbee in its mouth\nA tree with a white low trees sign hanging off of it's side.\nA man in a dirt field next to a group of sheep.\nA baby girl wearing a red shirt holding a tooth brush in her mouth.\nAn upstairs bathroom is pictured in this image.\nWoman laying down on a mattress at a store.\nA skateboarder performing a trick on the edge of a ramp.\nMan in a kilt and woman and white dress cutting into a cake.\nA faded stop sign near a street side.\nA snow covered wood bench in a park\nMajor League Baseball player taking a very fast pitch from the pitcher\nA young toddler playing in a suitcase on a bed.\na group of people playing frisby in an open field\nWoman kissing little girl's cheek under umbrella indoors.\nA man is surprised by a very large doughnut.\nPeople bringing in a loaded boat of vegetables to the market.\nTwo skiiers ski down a mountain in front of a village while it is snowing.\na close up of a pizza on a wooden spoon\nA Bus Stop sign peeking out from a vined wall\nTwo green animal food bowls sitting on a tile floor in a room being refinished\nTwo people standing near the water holding surfboards.\nThe man is standing in the snow with his snow board.\na close up of a broccoli plant with leaves\nA cow is standing near a fence in a field.\nA sign that is standing in a parking lot.\nA boy in white shirt playing with a Nintendo Wii controller.\na red umbrella is inside out in a city\nA woman sitting on a bench while reading a 
magazine.\nA dog riding on the back of a horse.\nA catcher and an umpire near home plate.\nthere is a blue and black bus stopped at a bus stop\nA young girl riding a skateboard behind a man on a bike.\nsome baseball players are playing baseball on a field\nA black-and-white photo of a person sleeping in a bed.\nTwo people dressed a refrigerators walk down a crowded street.\na sink a picture a mirror and white tiles\nYoung boy inspects a picture on a table with construction paper materials.\nA man is standing by a movie poster talking on his phone..\nA sesame sandwich sits on a white plate with a cup of coffee.\nA kitchen with a white stove top oven and a refrigerator.\nA man sitting in a chair with a canned beverage in hand.\nA motorcycle rider bends down on a track.\nSeveral sandwiches sliced and neatly arranged on a white plate.\nA street light pole with many street signs and warning signs.\nThe girl in a tan skirt is sitting on a bed.\nThere are electronics and other music equipment around a desk.\na person in a field with a kite flying in the air\nWoman talking on cellphone in front of personal computer.\nA red stop sign mounted to a wooden pole\nA red and yellow fire hydrant in an open field.\nA couple of giraffe standing on a grass covered hillside.\nA guy on a tennis court holding a raquet.\nan image of a woman at a ski slope\nRear-view of a horse as it grazes near concrete.\na cat is sitting on a white keyboard\nThe plate is loaded down with a lot of food.\nA pair of leather chairs beside a table and matching couch.\na hot dog sitting on top of a mound of french fries\nA young boy appears hesitant to eat some broccoli.\nA woman with glasses making a call with a cell phone\nTwo bears laying against wood with a sign.\nA plate that has broccoli and other food on it.\nA blue and silver cell phone with an accessory.\nThey've gotten off the bus to stretch their legs for a few minutes.\nHorses stand saddled in their paddock near the beach.\nA man swinging a 
baseball bat on a field.\nChicken with sauce and broccoli is served in a serving dish.\nA couple of boys playing frisbee against each other.\nLarge white birds with black beaks sit atop benches.\nTwo women who are standing under an umbrella.\nA plate of food, bread and salad, sits on a chair.\na big boat is going down a small river\nA caramel apple is sitting next to a jar of french fries.\nyes we have no bananas we have no bananas today\na man gives children food to feed an elephant\nA table covered in different kinds of baked goods\nA bunch of dead, stuffed wild animals on display.\nTwo men loading up the back of a truck\nThere is a mug with a fork in it and an unidentifiable liquid.\ntwo red double decked buses side by side\ntwo people riding motorcycles on a street at night\nA woman is looking at ribbon for children participating in an art activity.\nA box of pepperoni pizza already has two pieces missing.\nA room filled with people sitting at tables eating food.\nA cutting board with pizza and a glass of wine\nA woman with a purple umbrella stands on a brick street.\nA man dressed in all white is posing on a motorcycle.\nA mixture of food and drinks sitting on a table outside.\nA man sitting on a horse in the sun\nA man is sitting on a picnic table next to s ski slope.\nA woman on a bed kissing a mans face.\nA young boy puts together a kite on the floor.\nA store has displays of pans and other things.\nSeveral people in a large building that is filled with luggage tagged with yellow tags.\na perishing square tent set up with a bicycle\nA carnival occurred on a beautifully sunny day.\nA herd at of zebras are grazing in the field\nA cow poking its head between skinny tree trunks.\na skier at high speed coming down the mountain\nA bunch of vegetables and fruit arranged on a table.\nA large slice of angel food cake sitting on top of a plate.\na baseball player swinging a baseball bat at a game\nTwo guys in suits are having a conversation at the couch.\nA large 
truck driving down a city road.\na whole bunch with luggage standing outside\nMultiple people standing in the water on a beach.\nTwo airplanes lined up on pavement near a building.\nA man lounging in a computer room with a laptop on his lap.\nA man carrying a kite while walking on the beach.\nA young elephant walking in tall grass behind a larger elephant\nA snowboarder leans into the snow with their board.\nA lady with green hair and red boots sitting in the grass near a horse.\na man looking at the chocolate on his fingers\nA group of children running after a soccer ball.\nWoman approaching the door of a train at a station.\nA group of people sitting on a bench in front of a restaurant.\nA lot of animals that are in the grass.\nsome people are walking down the street with each other\nA nutty cake is sitting in the grass.\nA car sitting in the middle of the grass in the rain.\nA lot of boats parked in a large body of water.\nA man in a police uniform sitting on a horse by a traffic light.\nA flipped image of 2 toned room with a small chandelier.\nLight green and white painted fire hydrant with people walking in background.\nA man and a woman sitting on adjacent couches focus on their laptops.\nA golden bath area with a chandelier and blue and white bathtub.\nThe clock is below the dome of the tower.\nA close up of a full cooked pizza pie.\ntwo elephants are drinking some water on a sunny day\nA small glass of liquid sits on a table.\nA sandwich and salad on a plate sitting on a black table.\nA woman is getting ready to hit a tennis ball.\nTwo motorcycles are parked next to each other.\nA professional baseball player about to pitch the ball\nA modern kitchen with a large window by the sink.\nthere are two men riding a motorcycle and holding a umbrella\nTwo racks on top of a white counter topped with cup cakes.\nA person with a skateboard on a ramp.\nThe little girl is standing between the low shrub and the fire plug\nA man and a woman beside bicycles with orange 
train cars behind them.\nA snowboarder riding in the air above the snow.\nA man and woman in the middle of a conversation.\nA woman skier in costume at the beginning of a race.\nA store window with stuffed teddy bears in it\nA man bent down fixing a toilet .\nGroup of motorcycle riders being led by a police car.\nRed passenger train passing over top of a bridge.\nA small coyote is seen in the back of some tall grass.\nA large red chair in front of a building.\nA man is on snow skis on top of a mountain.\nA baseball player is ready to swing at a pitch.\nA woman serves out a sauce for her dinner party.\nA young man is preparing to throw a Frisbee.\nA brown and black dog holding onto a couple of crushed water bottles.\na nice black back splash a plant  some body oils and a black Kleenex box\nA beautiful young woman laying on top of a bed next to a dog.\nLarge pizza sitting on a table next to beer glasses.\nA fire hydrant with a blurry view in the back of it.\nAn infant is sitting in front of a computer.\nA cook placing two pies in the oven.\nA group of people standing around each other in front of a building.\nA skateboarder performs a trick on a small ledge.\nA boy on his skateboard at the top of a skateboard ramp.\na women that is playing tennis on a court\nVery small remote control that fits in the palm of your hand.\nA huge elephant is walking down the road.\nA woman standing next to a man while wearing a short dress.\nA MAN WITH KIDS ARE ON THE FLOOR\nA kitchen with an island that has place settings\nA moose is getting some shade outside an old building.\nJockey on black horse being walked around infield.\nA person cutting into a plate of food on a table.\nA long train is going down one of many tracks.\nSeveral stacks of  different types of books on a bed.\nA giraffe laying on lush green grass next to trees.\nTwo glazed and one chocolate doughnut placed on a napkin.\na police officer rides a motorcycle on a walkway\nA child stands next to a window near a 
bear.\nBABY IN BLUE JEAN OVERALLS HOLDING A CELL PHONE\nA view of a Sony remote, next to a laptop.\nA person on a skateboard is riding up a ramp\ntwo people riding skis on a snowy slope\nA baseball player up to bat swinging at a baseball.\nA little girl laying in bed holding a book next to a black cat.\nA woman cuts a cake while two dogs watch closely.\nA fritter and a donut on a white bag next to a donut box.\nSome chefs working together in a big kitchen.\nA picture of a person brushing her teeth.\nThe pizza is topped with very unusual ingredients.\nBicycle parked at meter outside large building with column.\nThree of six people standing and sitting at a restaurant table are on cell phones.\nA brown teddy bear sitting next to a wall with a painting.\nA small room cluttered with piles of books, a portable TV, and stereo equipment.\nA yellow rectangle sign stating that pedestrian priority crossing is ahead.\nA restaurant clock displays the time of ten twenty.\nTHERE IS A METER POST ON THE STREET\nA group of men that are in the back of a truck.\nA man in a orange and yellow outfit juggling tennis rackets.\nA sign that is on the side of a building.\nA horse wearing a pink hat pulling a carriage.\nMany people are scattered together near an Orange stand.\nTwo men playing a video game inside of a room\nA woman riding a surf board through the waves.\nA pretty young woman walking a bike with a small dog in a basket.\nA man standing in front of a brown horse.\nA very big group of happy looking people posing together.\nDifferent kites flying around in a field with a bunch of people.\nA wall with a  black and gold clock and walkway above.\nA plate full of a lot of good food ready to eat.\nA slice of cake with icing on three sides, a knife and fork beside it, on a wooden table surface with a knot area visible in the wood.\nman jumping up super high in a grey jacket\nA person on a sidewalk holding a kite for the camera.\nA very big pretty green vase with some flowers.\nA polar 
bear standing near a tree on grass\nA kitchen island has a farmhouse sink on it.\nA sheep looks at the camera, by the side of the road.\nTwo pieces of bread coated with a dark brown spread.\nThe train car is used as an office by railroad personnel.\nCity scene of cars at sunset going past stoplights.\nA red and white street sign that reads \"no parking any time.\"\nThe towel bar is above the toilet in the bathroom.\nBaseball batter hitting ball standing near catcher with mitt.\nTwo skiis and poles stand upright in the snow.\nThis kitchen table has fruits and vegetables on it\nA cluttered room with a televisions that is surrounded by shelves that have various games and supplies all over them.\nA view of a narrow kitchen with the only light coming from a glass door.\nA woman standing on the sidewalk, looking at her phone.\nA tall clock sitting next to a barren tree.\nA vase that has flowers inside of it on a glass table.\nA man in a blue shirt, blue hat and gray shorts playing tennis.\nA person is near a row of luggage carts as one man pushes a cart.\nA boy blowing out candles on a birthday cake.\nA small locomotive on small train tracks with people inside.\nA girl is standing against a wall in a room.\nA breakfast plate of scrambled eggs and fruit.\nBaseball player swinging a bat during a game\na large plane is parked at the runway\nA man in a wet suit on a surfboard in the water.\nClock tower sitting in a pier with clear blue water.\nLarge group of motorcycles on brick street with trees\nWoman standing on a surfboard in calm water.\nMan riding a horse in a foreign country.\nWoman on a bus looking out the window at another orange bus.\nA rusty parking meter that is empty\nA woman with blonde hair sitting at a bench in front of a building.\na man is riding a snowboard in the snow\nA giraffe hiding behind a grove of very tall trees.\nAn elephant with no tusks walking in the woods kicking up dirt.\nA person in a baseball uniform about to catch a flying baseball.\nA 
pizza that is sitting on a table.\nA man eating a hot dog and holding up a dollar bill.\nA plane up in the sky viewed from below labeled \"Cityjet.com\"\nA cat on the toilet peeks its head into the bowl.\nA group of people gathered around a table outdoors having a meeting.\nThere are several people in this funny boat.\nA man slicing pieces of bread with a knife.\nA stack of pancakes covered in blueberries and whip cream.\nA man is standing close to a tv playing video game bowling.\nTwo elderly men preparing a motorbike for a journey.\nThe motorcyclists turn the corner of the road next to the home.\nA cat is playing with the bottom part of the umbrella\nA traffic light and cars on a street.\nA bride and groom teddy bear each in a coffee cup on a saucer.\nA laptop computer sits on a girl's lap.\nA street sign that reads, \"right turn only.\"\nA baseball player is preparing to swing while several people watch.\nA man sitting behind a group of different wines ready to taste them.\nA black cat rubbing up against a woman laying on a surfboard.\na person walking on a train station platform\nA long train traveling past a forest near a road.\na herd of elephants walking down a path in some tall grass\nA silver railroad train traveling down the tracks\nA man with glasses showing off two cell phones.\nA dog looking cautiously at its reflection in a mirror like object.\nA man using his laptop sitting on the balcony with a water view\nPlate of food with a variety of vegetables.\nA blue bullet train stopped at a train station.\nA red fire hydrant between two potted plants.\nWhat can only be described as an interesting and presumably authentic dish.\nThe man has a tennis racket in his hand.\nA large jet liner sitting on top of a runway.\nA batter, catcher and baseman during a baseball game.\nThe train drives between the forest trees.\nA green city bus traveling by a parked truck.\nA woman is using the ingredients to make sushi.\nTwo men in a living room holding the Nintendo Wii 
remote.\nA clock with roman numerals hanging on the wall next to flower patterned drapes.\nA tray with carrots, snap beans, mash potatoes and an egg.\nA dressmakers dummy with hat, coat and tie.\nA plate of food has carrots and broccoli.\nA mini stagecoach being pulled by one horse and driven by one driver.\nA young girl holding a tennis racquet on a  tennis court.\nA dog is in mid air with a frisbee in its mouth.\nA skier in the air coming off a jump with a mountain in the background.\nA cutting board topped with fruits and vegetables.\nA pizza in it box siting on a table with a side dish.\nA little bird standing a the twig of a tree.\nA polar bear laying down on rocks by some water.\nA woman sitting by a man at a restaurant eating food\nA group of young men standing on a sandy beach.\nPeople sitting and walking in the patio and grass area of a building with tented sitting tables and lawn chairs.\nA man serving a tennis ball on top of a tennis court.\nA white toilet sitting on a sidewalk outside.\nMultiple skateboarders in the same outfit ride in a demonstration.\nA fit young woman enjoying a game of tennis.\nA decorative cake with several layers and an animal on top.\nA brown dog laying in an open suitcase on floor.\nA man bending over on a tennis court.\nA slice of cake baside a fancy beverage on a wooden tray.\na car with luggage bags on the roof and in a trailer\nA couple of men standing next to each other holding snow boards.\nThe bathroom is equipped with many electronic devices.\na person standing with a tooth brush and tooth paste\nTwo giraffes are standing together in a field.\na small cat gets petted in front of a laptop\nA cardinal sitting on a small branch of a cherry tree.\nThere is a man wearing a dress shirt and tie.\nA man about to sit at a restaurant table with a woman.\nMany people around a chocolate birthday cake with candles.\nA man riding down the side of a skateboard ramp.\nA man sits on a broken toilet as people walk by.\nA person flying a 
kite on a sunny day.\nA couple of elephants standing next to each other.\nA rainbow that is above a street corner.\nSwans gather in the middle of a parking lot.\nThe skier is repairing his ski on the slope.\nFour dishes of food are organized on a counter.\na kitchen with a double sink, stove and counter top\nA small child is enjoying a donut at the table.\nThese two people are using the phones at a parade\nA paper plate holding a piece of cake.\nA beautiful young lady standing next to a  man on a tennis court.\nA clean white bathroom with a simple mirror above the vanity.\nA man standing next to friends eating food.\nA bathroom toilet with a phone hanging on the wall.\na guy that is on a surfboard flying in the sky\nSigns and wooden poles stand in front of houses and lawns.\nA mouse pad sitting on top of a desk under a mouse.\nA couple of trains move side by side down the tracks.\nA man in a yellow shirt stands in a dirt circle.\nTwo horses standing on the grass near a body of water.\nTwo giraffes are in the foreground and there is a zebra in the background.\nA young man riding a skate board up a ramp.\nA small horse with his eyes closed standing on snow covered ground.\nA bottle of liquor called Granite next to a half-filled glass.\nA flock of sheep standing around in the middle of a pen.\nA mans face coming out of a chili dog with a fez.\nSeveral boats in a river with people in each boat.\nPeople sitting in a subway station that is in black and white.\nTwo kids are holding sprinkled doughnuts at the table.\nThe man is sitting down resting before his tennis match.\nA bowl of beans sitting on next to a sandwich.\nSix people in a boat rowing on a body of water.\na plane flying by below a slighly cloudy sky\nOld style bed has a cross on the headboard\nA dog jumping up in the air to catch a frisbee in it's mouth.\nA group of young ladies kicking around a soccer ball.\nTwo back-lit computer monitors and a keyboard and mouse.\nA beautiful woman sitting at a table next 
to two pizzas.\nThere is a person jumping high on a snowboard.\nA big orange cat sitting on a wooden bench.\nA child sitting at a table with a hot dog.\nhungry dog inching it's way toward the donut.\nA brown ottoman sits near a black counter in a vacant room.\nA pretty young lady carrying two large donuts in a restaurant.\nA woman is getting ready to dive in to some donuts while two guys watch.\nA couple cut their wedding cake while the bride makes a face for the camera.\nA large animal laying on top of a lush green field.\nRows of books on bookshelves in a library setting\nA room with two woman, a dog laying on the floor and table and chairs in it.\nA healthy meal with various fruits and vegetables.\nA man that is on the side of the wall with a skateboard.\nAn adult and young horse interacting in a field of grass.\ntwo trains on a track near a platform\nA dog is in the air catching a frisbee with a crowd watching.\na group of people stand by watching a group of elephants\nMan with racquet about to hit ball tennis ball.\nA man riding on the back of a white horse.\nkids are enjoying a nice game of soccer\nA man getting ready to take a picture in a field.\nSeveral people in a canoe with oars on the river.\nA tower that has a clock on the side of it.\nA large clock is next to a pillar.\nCat staring at something while sitting on porch.\na woman plays a video game in a living room\na man snow boarding on a ledge with a snowy field behind him\nA woman walking in the rain with an umbrella\nVarious toys and items on carpet that includes wallets and a camera.\nPerson holding a string with a white kite on the other end.\nYoung skier posing for photo in alpine ski area.\nA baseball player holds the ball in his glove using his other hand.\nA red motor bike is being repaired in the driveway.\nA little boy on a skateboard on the road.\nThe long plate has cookies, fruit, and chocolate on it.\nA group of people in a living room playing video games\nA man in white jacket rowing a 
yellow surfboard on water.\na person that is riding around on a horse\nA woman walking in a muddy field carrying an umbrella.\nThere is a birthday cake covered in this guys face.\nSeveral people who are waling on a dirt road.\na nicely decorated living room with a big mirror above the fireplace\nA person holding a little girl next to a sheep.\nA town full of street signs connected to a building .\nFish, small potatoes and broccoli are arranged on the plate.\na large black sheep who has been shaved\nThe large detailed cathedral has a clock on it.\nA ship in the ocean with a seagull and another bird standing on things on the boat.\nThree helicopters are flying through the clouded sky.\ncows with ear tags standing in a field\na laptop projecting an image on to a flat screen television\na person cooking a pizza in an outdoor grill\nParasails in the wind in front of a bridge on a gloomy day.\nTwo men on bicycles riding on the street\nA group of skateboarders watch a skater perform a trick.\nA rear view mirror view shows a truck coming up behind.\nThis is a traffic light signaling green in a downtown area.\nA man jumping over a blue park bench.\nA small infant holds a soft toy bat.\nA frisbee that is laying down in the sand.\nA man with wide eyes eating a muffin.\na giraffe grazing in a high line of shrubbery.\nA batter, catcher, and umpire anticipating the pitch.\nA big boat on the water near the shore.\nA woman standing on a tennis court holding a racket.\nthere are two urinals in a public bathroom\nA silhouette of a horse is seen against the back drop of the sea.\nOne lonely person on the platform waiting for train to open for boarding\nA tennis player prepares to hit the tennis ball\nA black bear walking through a zoo exhibit.\nA cat sprawled out over the top of a laptop computer keyboard.\nPeople walking along snow and trees with skis on.\nA woman walking a gray horse around a field.\nA dark vase is holding pink flowers in front of a window.\nA parking meter reads 
\"90\" minutes on the window.\na man being feed a cake seated on a yellow chair\nTwo briefcases are stacked up on a desk chair.\nThree horse and buggies are parked out in front of a building that has three steeples.\na woman writing something down on paper while the laptop sits on the table\nA green parking meter on a city street.\nA group of people on bicycles next to a passing train.\nA woman is getting in her car on a busy street.\nAn empty classroom with four unoccupied desks and writing on a chalkboard.\nPeople are standing by a small passenger train.\nA elephant being ridden by a little boy.\nan image of a man riding on a skateboard\ntwo computer monitors are sitting on the computer desk\nA train on one of multiple parallel tracks passes under a bridge.\nA pink and a blue toothbrush are on a white background.\nA person on some skis in the snow.\nA person on a skateboard going up a small ramp.\nGuy jumps high in the air on his skateboard off the hill ramp\nMan shopping at a grocery store at the produce section.\nThree children posing with their tennis rackets at a tennis court.\na keyboard a persons hand a mouse and a monitor\nOne building has a clock and the other one doesn't.\na close up of a train on a train track\nThere is an image of a bear jumping on another bear.\nSeveral trucks and cars are driving on a muddy road.\nits raining so all the people are carrying umbrellas\na lady and a man getting ready to fly a kite\nMotorized scooter parked in front of a gated roadway.\nA computer screen and keyboard on a desk.\nA stuffed teddy bear sitting next to hay holds a stuffed dog.\nBoats are docked by houses on the shore side.\nA young boy riding a skateboard at a skate park.\nA man holding a phone that has a picture on it of a man holding a phone.\ntwo people in a body of water near a pier\nA couple of young boys riding on the back of wooden bikes.\nA man is standing in the snow with skis and ski poles.\nWavy wooden seat bench with a sidewalk, grass, and 
stones\nA dog lying down on a couch next to a nightstand with a wedding picture on top.\nit is extremely foggy and theres a truck on the road\nA recliner chair sitting next to a table with a lamp.\na person in a kitchen preparing food on a plate\nTwo men in black wet suits ride surfboards on small waves.\nA tennis player running to hit the ball.\nA man on stilts is holding a pink, polka dot unbrella over a woman in colorful clothes in a park setting and there is a crowd milling about.\nA white bathroom has an aqua colored container.\nA very small child on a surf board near a big fake wave.\nA large airplane flying through a cloudy sky.\nA woman starts to remove something from the oven.\nA man watching a single engine plane make an approach to land.\nA row of retro kitchen designs in various colors.\nA kid with a baseball bat on a field.\na person standing outside of a building with an umbrella\nA stove pulled away from the wall in a kitchen.\nTwo pink roses sitting inside of a blue vase on a table.\nA military jet is parked on the runway of the airport\nA group of women hanging around a long table\na close up of a large and a small zebra\nA surfboard is recycled into a unique planter.\nSeveral colorful trains are parked at a station.\nA man holding a surf board standing on rocks.\nA black cat resting on a bed wearing a tiny winter hat.\nA black and white dog is catching a Frisbee in the air.\nA man eating food from a napkin in his hand.\nA giraffe with an object in its mouth.\nA baseball player up at bat in a game in a stadium.\nPeople walking in the rain on a city street some with umbrellas.\nA group of sheep sitting on the ground around a bench.\nA crowd of people standing on snow covered ground.\nA group of people crossing a street while holding umbrellas.\nA glass cup holding three toothbrushes next to a wall.\nTwo pictures of the same woman playing tennis.\nThe room has a large china cabinet, and two couches.\nTwo guys reach high to catch the Frisbee.\nPeople 
walking their dogs on a park trail.\nA large wooden clock by a window in a room.\nA brown dog standing next to a toilet in a bathroom.\nPink and white flowers in a blue vase.\nA group of three horses standing on a lush green field.\nA group of people eating at a table raise their glasses\nTwo people walking and talking on a huge air strip.\nA man riding a skateboard on top of a ramp.\nthis is a mulicolored stripe sun umbrella near a palm tree\nA group of skiers trekking through the snow\nA clock that is on the side of a tower.\nA boy is holding a video game controller in his hands in a living room.\nA stool is inside of a walk in shower.\nSome very pretty giraffes standing by a big fence.\nA sink is shown in front of a frame covered wall.\nTwo jets landed at an airport facility with many service trucks.\nan old school bus sits in a field in a retro photo\nA woman standing over a cake with a knife.\nTwo young boys play a game of frisbee.\nA young woman plays with a frisbee indoors.\nA baseball player about to receive a pitch in a stadium full of people.\nA man near a curb with a bag and a box.\nA picked off cake somewhat resembles the original design.\nA crucifix is on the wall next to a clock.\nA close up view of a small broccoli plant.\nA police officer is riding his motorcycle on duty\nA cat sniffing a small teddy bear laying on the floor.\nTwo birds are standing on park benches outdoors.\nSomeone chopping up foods and placing them in bowls and plates.\nA tow truck hauls a jeep along a busy street.\nWoman walking beside a man riding a horse in a yellow shirt.\nAdult in suit and tie with markings across back of hands.\nA small older bus parked alongside a roadway and another behind it.\nThe horse is tied to a tree in this snowy yard.\nA living room consisting of windows, rugs, chairs, and a coffee table.\nA man holding a kite next to another man.\nA bird standing on the sand near a body of water.\nThe bathroom in the house is clean and ready to use.\na toilet with 
metal walls and a sign\nA woman takes a picture of the newspaper with her phone.\nA large red bus on a city street.\na lemon sitting on top of some fish with veggies and rice on the side\nA woman stands behind her luggage next to a building.\nA group of men holding up a white and yellow frosted cake.\nA few birds wading in some shallow water.\nHerd of cows, walking in a drinking from, a river.\na man in sunglasses and a pilots uniform looking down\nA skateboarder grinding down a red and black ramp.\nTwo men sitting at a restaurant table holding up a tray of pastries.\nA food entree is shown on a platter.\nThis kitchen has wooden ceilings and two tables\nFew horses out in the distance eating grass\nA picture of a black bag with a motorcycle in the background, on a dirt path.\na train with green and purple on it on a track\na small vase sitting on the table with flowers inside\nA stop sign outside of a building with the word Liberty on it.\nTwo older gentlemen sitting on a public bench in a park.\na close up of a pizza on a pan\nA woman with a dog throws a frisbee to a hill.\nThe inside of a bicycle store with numerous related items on display\ntwo sinks a toilet mirrors and a counter\nA baby sitting on a couch next to a brown teddy bear wearing a t shirt.\nA dog standing next to a sheep behind a fence.\nA man is in a field flying a kite.\nA airplane with striped wings is in the air.\nA man in snow gear in skis at the side of a snow slope.\nSmall desk with electronic equipment in office type room.\nA plane is flying over a gas station preparing for landing.\nA man spins around with his forehead on a baseball bat.\nA group of men tours a building that has had fire damage.\nA white sink and toilet in a small bathroom.\nA man sitting on a sofa using a laptop with his dog curled up next to his shoulder.\nA man and a woman dress up in costumes.\nA horse is standing behind behind a fence\nA red, white and blue train on the tracks beneath a mostly cloudy sky.\nA person 
doing tricks on a skate board at a skate park\nA motorcycle sits parked in the corner parking space.\nA professional baseball player in a white uniform on a baseball field with a bat in his hand.\nWe are looking at a photo of buses in a demolition derby.\nA man rides a snowboard down a snowy hill.\nA baseball player takes a high grip on the bat as a catcher scrambles for the ball.\nTwo guys are excited to be catching a Frisbee in this tournament.\nA glass coffee table sits in a living room.\nThe man wearing a hat holds a kite near many other kite fliers.\nA green plate topped with pasta, broccoli and a salad.\nTwo shorn sheep graze on tall green grass in a sunny pasture.\nA dog on top of a building yawning and an airplane above him.\nVarious contents laid out on  a wooden table\na number of people near many bunches of bananas\nA dog is chasing a frisbee in a park.\nA man standing on a tennis court holding a racquet.\nA toddler with a frisbee in his hand.\nfour children in a living room with one of the children holding a game controller before a television.\nThe men are playing a game of baseball in the yard.\nA clean living room containing a couch, three tables and other decorations.\nA female tennis player is holding the racquet, ready to swing.\nTwo plastic containers sitting on a table filled with food.\nA man standing beside a robot with a camera around his kneck\na kitchen with a sink on a counter top\nA woman holding an umbrella while a man walks behind her.\nMen are eating hotdogs during an eating competition.\nA bed sits in a white room with a window view of a nearby house.\nA bus that is parked inside of a building.\nA gigantic bird statue outside of a building\na street people cars trees and buildings and police\nthere is a man with a uniform at the supermarket\nTwo bananas that grew as one  speckled banana\nA older woman is fixing a younger man's tie.\nKitty fast asleep on its back on the bed.\nA BED NEATLY MADE WITH A TABLE NEXT TO IT\na giraffe sticks 
out his blue tongue at zoo visitors\nTwo giraffe sitting on a dirty lot next to a forest.\nA cat snooping in a bag on a bathroom counter.\nLooking up at the belly of a jet airpline\nA young many getting ready to serve a tennis ball on a clay court.\nAn old man is flying his kite in the middle of no where.\nDog curled up in the bed under covers\nA living room area with couches facing a television and windows on the side wall and on the wall behind the television.\nSeveral people are skiing along on a snowy field.\nSome bottles and glasses of wine surrounding entrees.\nA woman takes a bite of her sandwich.\nFood and a cutting board sitting on a table.\nHere is a stop sign with graffiti written on it.\na young man is doing a skateboard jump.\nA cake on a table with other desserts and pastries.\nA baseball player swings at the pitch being thrown.\nA zebra standing in  a dry field of grass.\nThis is a nighttime image of a church\nA baby in a crib with a nighlight\nThe brown and white cat is sitting by the computer.\nThree pies on separate plates on a table.\nOutdoor subway train pulling into an empty station.\nSugar covered doughnuts are heaped in a pile.\nA man standing on the grass preparing to throw a frisbee.\nA man seated with a mouse, a keyboard and cell phone in front\nA calf is laying in a pen as people gather outside to look.\nTraffic is stopped on the road because of a red light.\nA truck parked on the beach next to the lifeguard station.\nA man is on his cell phone outside under an umbrella.\nA colander sitting on a countertop in a kitchen next to a microwave.\nA man in black shirt doing a trick on a skateboard.\nA flight of red brick stairs which lead to an antique bench and a view of historical brick buildings.\na number of small boats in a body of water\nA table topped with a giant penny and a tray full of vegetables.\nA dirty kitchen stove with a timer located on at it's center.\nAn electric pole with a single light and two street signs next to a 5 eleven 
sign.\nA yellow train parked at the end of the tracks\nA guard with a dog walking around a bus in a parking lot.\nA cutting board with a beet cut in half\nA tall giraffe rests amidst green grass and trees.\nA man is skiing in a snowy forest.\nA tennis player dives for an incoming ball on the court.\nSmiling man enthusiastically hugging a plush teddy bear.\nA full view of some tall buildings in the downtown.\nA bird standing on debris in the water\nA young boy is skateboarding in the middle of a parking lot.\nA man playing tennis with a racket in one hand.\nA skateboarder riding on a park bench, on a cloudy day.\nA black cat holding a Nintendo Wii controller.\nMan smiling while displaying food item in kitchen area.\nTwo people stand by a motorcycle and a van.\nA black and white cat walks near someone's legs.\nA group of people gather near a motorcycle.\nA man talking to another man sitting at a table in front of a laptop.\nA kitchen with refrigerator, sink, and a curtain over a doorway.\nTwo teams are playing a frisbee sports game.\nCarrots, cauliflower and broccoli sitting in a clear container.\nA man eating food off of a plate\nA tennis player hits the ball up from the racket.\nA large jetliner sitting on top of an airport tarmac.\nPeople sitting on a bench above the water.\nA kitchen that has wooden cabinets and a kettle on the stove.\nAn orange and white kitten laying on a chair.\na cook standing in a restaurant kitchen while making a meal\nA woman smiles while posing wearing skis and holding ski poles.\nA woman in grey sweater lighting a candle at table with hotdogs.\nA skateboarder performing a trick on a ramp.\nMan and a rowboat next to a misty mountain lake\nSeveral cut up carrots boiling in a pot of water.\nA bus all lite up inside and out with diffrent colors .\nA television and a cabinet on both side.\nParents and children dressed up as Santa Claus sit on a park bench.\nA simple yellow vase holds two red and white tulips.\nsome people and a long thin boat 
water and houses\nBaseball player coming in for the base, while catcher readies himself to get the ball first\nA person on a skateboard and a person walking a bike.\nA glass sitting on top of a wooden table. next to a keyboard.\nCommuter buses parked in a lot outside an apartment complex.\na table full of vegetables and fruits stacked on top of each other\nA plate with a sandwich with people in the background.\na police officer on a red motorcycle in the street\nThe child is pushing a cart full of luggage bags.\nA young toddler is standing next to the high toilet.\nA female nurse is standing next to a man in a bed.\na blue and white  road sign written in japanese\nA small elephant standing next to a wooden tree.\nA lone person snowboards down an empty slope.\nAn open laptop computer sitting on top of a bed.\na street light on a street next to a tree lined median.\na close up of a bench near a ledge with a statue\nAn antique car being towed on a flatbed trailer\nTwo guys are having fun while playing the Wii.\nThe young person in a cap is grabbing his skateboard during a jump.\nA dog attempting to use its mouth to pick up a pair of scissors from the floor.\nthis is a pizza and a fork on a table\nA group of women with umbrellas and a couple without umbrellas outside in the rain.\na woman tosses a frisbee in a public park\nA train on an overpass over a parking lot.\nthree kids at the field playing frisbee together\nA tennis player is in motion with his racquet raised.\nA man who is holding a skateboard in his hands.\na close up of a sink with many dishes in a tray\nA man on a phone with a sandwich in his hand\nA red fire hydrant sitting in the middle of a forest covered in snow.\nTwo little girls sitting in the grass with toys.\nA kitchen scene with focus on the microwave and oven.\nA dog sitting on its dog bed in the middle of a living room.\nAn adult zebra and a young zebra stand together in a zoo enclosure.\nA man is on a paddleboard in rough waters\na man holding a 
teddy bear standing next to a basketball.\nA man flies a kite at the beach.\nA sandy beach topped with lots of people and tens.\nOne cat lying on the floor, and another with its front paws up on a stool\nA man holding a plastic gun in his hand.\nA train is passing along a hillside during the day.\nA baby in pajamas outside, sitting on a skate board and waving to someone.\nA street sign hanging from the side of a pole.\nThis is a photo of a hotel bathroom, all nice and neat.\nA pen of five sheep surrounded by other pens.\nA person in a large yellow and purple train.\na large air plane on a run way\nA herd of sheep shares the road with cars and motorcycles.\nA paper mache \"bandit\" piece of artwork stuck to a pole under a \"neighborhood watch area\" sign.\nPeople standing in line by several food trucks parked on the street\nA man holding a huge slice of cheese pizza with a crying kid on his lap.\nMan on purple tennis court swinging at a ball.\nA knight riding a horse and greeting a crowd as he clutches his shield.\nA broom is hanging on the wall of a house.\nPicnic tables under a pavilion in a park.\nAn assortment of thrift store objects, including two vases and some miniature carousel horses.\nA man riding through the air on top of skateboard.\nA young child is palying with some books.\nA giraffe standing up with its legs crossed.\nLOTS OF PEOPLE ON THE STREET IN A DIFFERENT  DECADE\nA tennis player reacts during a match in a tennis court.\nA man is standing by a trolley station as one approaches.\nA brown couch sitting in front of a flat screen TV.\na plaza filled with a lot of birds and some cows\nThe produce section of a grocery store\na couple of boats sit parked by a dock\nA white and red moped is parked on the sidewalk.\nA young woman with a smile on her face rides her bicycle down the street past some parked cars.\nFour Blue Angel jets are flying in formation.\nTraffic sign and barricades on roadway near large city building.\nA zebra stands alone in some 
tall grass.\nA blue and yellow train in the train station.\nSeveral people sit on top of an elephant as another person watches.\nA white toilet in a small tile walled bathroom\ntwo people setting up a table of food\na girl is looking at her cellphone and a blond boy\na big tower that has a clock on top\nSomeone who is enjoying some rest on the edge of the pier.\nA group of people flying kites on a field.\nA man that is standing on a court with a racquet.\nA sheet cake with a tractor frosted on it.\nA very clean, modern and minimalistic style bedroom.\nA woman using a Wii controller and playing a game.\nTwo giraffes in a zoo enclosure with a zebra.\nA man wearing a glove pitching a baseball.\nA woman with her face painted white and decorated with a dragon.\nA traffic signal underneath some tall buildings.\nA rear view mirror on the side of a car door.\nA bedroom scene with focus on the window with the bed in the reflection.\nFour men playing doubles tennis on a court.\nA Chanel sign sitting behind a display window at a store.\nA man riding a motorcycle with another man hang off it's side.\nA dirt road through the woods with a rolling suitcase.\nA  black motorcycle is outside of a house\nThe children watch the man and woman cut the cake.\nmany people in a boat in the water and trees\nGlassed in bathroom with the sinks on the outside.\nA truck waits in traffic next to a wooded area in the city.\nA man standing next to a cat in a kitchen in front of a laptop computer.\nA bathroom door is open showing a shower with the shower curtain mostly open.\na brown table and chairs and a vase with flowers\nSeveral students sit at a conference table with their laptops.\nA photo of a jetty and a body of water.\nA surfer wipes out as the waves break.\nToddler getting ready to hit a t-ball with his bat.\nYoung lady falling to the ground catching a frisbee.\nTHERE ARE TWO ANIMALS THAT ARE PLAYING TOGETHER\nLarge brown bear sitting next to rocks in open area.\ntwo boys with a 
skateboard and  a bicylce\nA young boy holding a Nintendo Wii game controller next to a lego controller.\nAn umpire in the field, talking to the batter.\nA group of people riding an elephant on a dirt road.\nA group of trucks and cars are coming out of a tunnel.\nA man walks past a weathered structure and parking meters.\nA small hotel room with a king bed.\nA male standing up with a Wii remote control strapped on his hand.\nA guy is performing a trick on his skateboard.\nA person standing on the river bank by bushes\nA picture of an open air zone that looks incredible.\nA man pulling a piece of something from a machine\nA large stuffed bear is sitting on the ground with a cup next to it.\nA man in a beige suit with graying hair.\na couple of vehicles on a busy city street\nA man stands with his produce in baskets.\nAdult preparing to catch flying disc in open area near trees and water.\nPeople stand in line for an ice cream truck.\nA man in a the middle of a bunch of cows.\nA female tennis player hitting the ball with racket\nPeople on a busy sidewalk, some on bicycles.\nA fat ass sitting on a toilet with lady magazines.\nA small white tow truck parked to another small white tow truck.\nA man without a shirt squatting on top of a skateboard.\nA man on a motorcycle driving beside a van.\nThere are signs on a cobble stone sidewalk\nBlack and white photograph of a man on subway with bicycle.\nA stop sign in an older residential neighborhood is marked with graffiti.\na couple of large pizzas that are sitting on the table\nThe man is spraying down the toilet to clean it.\na bunch of people watch a person do a trick on a skate board\na man on a beach tries to fly a geometrically made kite\nan image of a clock on a tower high in the air\nA large black dog standing near an open suitcase.\nA man standing behind a white frisbee on a green field.\nA black-and-white photo with a colored red double-decker bus.\nLine of people behind plastic fence with umbrellas\nThe street 
signs are clearly visible for us to see.\nA pair of red scissors sitting on top of a piece of paper.\nA baby feeding cake to a man with a fork.\nSEVERAL PEOPLE STANDING AROUND APPEARING TO BE LOOKING AT SOMETHING\nskateboarder in red helmet jumping on skateboard ramp\nA person working at an airport, outside of an airplane.\nA birthday cake for a child is sitting on the table with candles.\nA dirty, overturned motorbike lays in the mud.\nAn airplane in the middle of a field with some jeeps parked near it\nThe front of a store that has a large teddy bear in the window.\nOld suitcases piled to the ceiling on a luggage cart make art in an airport\nA giraffe standing on a lush green field.\nThe white and black dog is in front of an open refrigerator\nA small boy next to a table upon which sits a birthday cake shaped like a racecar.\nA train runs down a track past run down buildings.\nA man that is standing on a platform with a frisbee.\nA black plate with a hot dog and fries\nTwo pieces of pastry sitting on a mat next to a spoon and fork.\nA bunch of green bananas is on a tree.\nA woman holding a video game controller and grinning.\nA pan of some kind of food cooking in an oven\nA man in a suit and colorful tie in a park.\nA close up of a girls boots as she sits on the counter.\na black and white photo of a cow near a tree\nAssorted vegetable along with cheese and nuts for food preperation.\nTwo tennis players are doubled in their pursuit of the two tennis balls.\nA man on his skis on a snowy slope.\nA comparison photo showing bathroom in regular view and in stretched version.\nSmall yellow container attached to a digital camera.\nTwo slices of square pizza on a plate with a fork.\na man in a tie holding a cup\nA street scene of an intersection with a street light.\nThree wild cows in a field on a nice day.\nA small sink inside of a very clean and white bathroom.\na small machine is sitting by a cliff\nA wall with words painted on the glass and people behind it.\nA 
laptop sitting in someones lap and a dog lying on the floor.\nA little girl posing for a photo next to an elephant.\nA man is at an outdoor table under an umbrella.\nA bathroom window reveals a snowy day outside.\nAnimals grazing in grass in front of an industrial landscape.\nI THINK THIS IS A RECEPTION HALL AND ITS FULLY DECORATED\nA red frisbee has been thrown by a man.\nA blender filled with flower and eggs on top of a counter.\nColorful collection of fruits and vegetables with some type of \"baby\" decorations\nA white boat on water with brick wall next to it.\nA baseball player up to bat and missing a hit.\nA park bench sitting in a snow covered field\nA woman cutting thru a pastry on a white cutting surface.\nA vase of flowers one being a large sunflower in front of a brick wall.\nA woman biting into a sandwich with a happy look on her face.\nA man sitting on a couch playing with a game system.\nTwo people walking down the sidewalk with a \"wrong way\" street sign directly above them\nA couple of horses standing on top of a grass covered field.\nAn elephant is drinking with its trunk at the watering hole.\na close up of slices of bread on a table near a spoon\nA woman with eyeglasses in a kitchen with bowls, spoon and glasses\nA person is eating at the kitchen table.\nTwo women are facing each other and one is blow drying her hair.\na drawing of a big fancy court house\ntwo elephants walking in  an stone enclosure.\nA cake with a train on a track and a ground made of cookies.\nA baby is laying on a bed with the cat next to and a man is looking over in the mirror.\nA supply truck in a snowy area driving towards a tunnel.\nA bunch of carrots are on a plate next to broccoli.\nA fire hydrant on a sidewalk next to grass\nA stirfry containing broccoli, carrots and other vegetables.\nA wooden bench surrounded by potted plants in front of a house.\na close up of a teddy bear on a balcony\nThe storefront of a bakery that has been painted green.\nThree stuffed animals 
hanging on the structure of a train track.\nChildren get off a school bus as a crossing guard stands among them.\nTwo horses underneath a canopy of green trees\nGroup of mixed vegetables sitting on a counter top in a kitchen.\na close up of a plate with a sandwich\nA carnival atmosphere is effected by these colorful stuffed creatures.\ndelivery truck dropping off delivery at train depot\nA young child laying down in bed with one arm raised up that is wrapped up.\nA woman in black jacket with umbrella on a sidewalk.\nDifferent types of fruit are shown on the counter.\nTwo individuals sit on motorcycles on a busy street in the rain.\ntwo people riding motorcycles, one sliver one white\ntwo girls and a boy standing in a kitchen\na couple of birds stand on top of a rock\nA man in a white shirt with a red tie is standing in front of a door way.\na couple of small beds are in a room together\nA white bedroom decorated with low level furniture\nPeople looking at zebra and cows behind fences at a zoo.\nA man standing on one leg on a baseball mound.\nThis tennis enthusiast, not using correct form, is practicing on the court in the city.\nA large eating and living area inside a house\nA lady walking with an umbrella during the day on a rainy day.\nA person on a skateboard on the ground of a park.\nA person sitting on a bicycle at night outside of a shop.\nTwo different types of dogs sitting together on a bed\nPeople are riding in a boat on a lake.\nTwo horses rubbing necks together in a field\nA zebra is chasing another zebra inside an enclosure.\nTwo zebra standing next to each other in a field.\nA white toilet sitting in a bathroom next to a wall.\nA woman standing on skis and holding poles in the snow.\nA GROUP OF SHEEP WITH MOST OF THEM IN INDIVIDUAL CAGES\nAn empty alley with a street sign at the end of it.\nA sandwich that has several toppings on it.\nThe man is kneeling low to hit the tennis ball.\nA lone giraffe bending over to graze within an enclosure\nA man 
driving a scooter with two women sitting on the back of it.\nA very up close picture of a sign.\nSomeone is holding a toy bear in the mans face.\nA tour boat on a man made lake under a blue sky\nA man hitting a tennis ball with a racquet.\nA young boy is doing a trick on a skateboard.\nA man working in a market surrounded by produce and meats.\nA dog sits in a suitcase with a doll.\nthis is a group of people eating together\nA player is swinging at the ball in a baseball game.\nA blue doorway with a clock mounted above it.\na teddy bear sitting in a hanging basket\nA man talking on the phone while he walks down the street.\nThe bear is wondering about in the woods.\nTwo wooden desks holding keyboards and computer monitors.\nA man cross country snow skiing near a wooden post.\nThere is a young child sitting on the floor in front of the refrigerator.\nThis bedroom features a bed and chest of drawers.\nTwo men stand next to a horse near a bus.\nA long bookshelf behind the head of a bed.\nA large fire truck with a water tank on the side of the road.\nA group of people carrying bags of luggage through a lobby.\na baseball player swinging a baseball bat at a ball\nThere are people riding bikes on the street.\nA LARGE PAINTING ON SIDE OF BUILDING WALL OF TOASTERS\na small bird sitting perched on a chain link fence\nTennis players at match standing over net shaking hands.\nA city street lined with very tall buildings.\nA woman in white shirt holding a kite on beach.\nA table with a sandwich, sandwich makings and glasses of red wine.\nThe unique meal includes both carrots and peppers.\na kitchen with maple cabinets and black appliances\nA wine bottle is being used as a flower vase.\nThe alleyway is lined with many parked motorcycles.\na woman is walking down the street in a sweater\nWoman eating a really large sandwich at a dinner table.\nBEAUTIFUL VIEW OF A BLUE SKY TOWERING OVER WHITE CAPPED MOUNTAINS\na white plate with three donuts and two drinks\na boat docked at a 
wooden dock on a lake\nA young man uses a fork to eat food.\nMotorcyclists perform a pyramid stunt in a darkened auditorium.\nA person sitting down in front of a laptop.\nthere is a young girl that is feeding a giraffe\nCattle grazing and eating grass while looking at the camera.\nA child, wearing a cat costume and umbrella, stands before a brick building.\nThe little league ball player is posing for his picture.\nThree men sitting on top of a green bench.\nA bouquet of flowers is stuffed inside an arrangement of wine glasses on a table.\nBaseball pitcher in the middle of a windup.\nA vase sitting on top of a plastic pedestal.\nA mannequin wearing a jock strap, unbuttoned shirt and tie\nA couple of men sitting on the side of the street.\nAn old flip cell phone inside a cozy.\ntwo kids battle over a soccer ball while on a field\nA red motorcycle on display at a show.\nA stuffed animal sits atop a barbed wire fence.\nA man sitting in a car on a cellphone.\nA tennis player on the court stepping backwards in preparation to swing\nA bedroom with a fluffy comforter and lights above the headboard\nthis is a man skiing down a hill\nAn older male with white hair holding a flip phone.\na female tennis player diving to hit the ball\nA red and white plane sitting on a runway.\nA group of people standing around each other.\nMom has to help him eat his hot dog and bun.\nA kitchen counter with some bananas and eggs.\nA group of people who are sitting on horses.\nA small vase of pink and yellow flowers next to a candle holder.\nA variety of vegetables laid out on a kitchen counter.\nIt is out in the open with various things in viewpoint.\nAn advertisement for Samsung Galaxy Golden cell phone\nA man wearing headphones looking at the camera.\nChildren in the snow with skies and snowboards.\nan exhibit featuring various animals under a wooden roof\nA big bear standing out in the shade and sun light\nA city is lit up at twilight near a river and a clock tower is lit up in the distance 
as a large boat is seen on the river.\na number of giraffes in a field near one another\nTwo soccer player on opposing teams playing soccer.\nAn Asiana Airlines plane taxiing at an airport.\nA man riding a red motorcycle on a street next to a crowd.\nA blue street sign in an Asian language and English.\nPink lunchbox filled with fruit and vegetables and snacks.\nA white refrigerator on the side of a road next to cars.\nVarious vegetables in a roasting pan in an oven\nA giraffe walking through a lush green field.\nA bowl filled with apples, limes and lemons.\nA man riding a gray elephant holding a ball in it's trunk.\nA bus stops on a street corner as pedestrians walk down the street.\na bridge lit up with some  blue lights\nHe is heading for the beach with his surfboard.\nA couple of kids looking out of a window on a subway car.\nTwo dogs are watching a television set intently.\nThis is a picture of a persons garage sale.\nAn elephant under trees in the night time\nA chair and debt with laptop, monitor and a cat.\nA black, brown, and white cat is near a laptop.\nA teddy bear wearing green hat and jacket.\nSeveral baseball bats leaning against a fence with a short hanging from the fence.\nA person in red jacket skiing down a hill between trees.\nA woman dressed in costume is sitting on the motorcycle\nA woman on a tennis court is hitting a ball with a racquet.\nA man in a traditional African outfit gestures while a black cow is in the background.\nA lone skier is seen on the slopes on a cloudy day.\nChildren flank an old pickup truck in a parade.\nThere is a bird statue and clocks outside of an apartment building.\nA baseball player holding a baseball bat over his shoulder.\nA plate of food that includes meat and broccoli.\na person wearing an apron in front of kitchen appliance\nThe hot dog is loaded with many toppings.\nA large jetliner flying through a gray sky under clouds.\nA bunch of ripe bananas sitting on top of a table.\nsome baseball players are playing 
baseball and some trees\nA man talking on a cell phone while sitting down.\n2 zebras outside eating grass in a wide open space\nA man in India herds a number of cows on the street.\nA laptop and desktop computer on a desk with a light on next to them.\nA group of baseball players standing on top of a field.\nA train that is driving through some houses.\nCat and dog in the windowsill of a building.\nA desktop computer is displayed at a wooden table.\nA large group poses for a photo in their ski gear.\nA man sits in the snow while breaking from snowboarding\nA man standing and talking on a phone in a courtyard.\nAutomobiles stopped at an intersection because of a passing train.\nA giraffe standing on a stretch of sand at a zoo.\na woman wearing a crown and a young boy smile at a table with a cake\nAn older man and a boy are on the beach with their surf boards.\nA bench next to a lamp post on a cobble stone street.\nTwo parking meters that are nearly covered with snow\nCars present at an intersection with traffic lights.\nA sign indicates when parking is off limits on West 25 12 street.\nA Pizza with red peppers, zucchini and cheese.\nA bird with a red face is standing on a rock.\nA dog laying on the back of a couch.\na kitchen is decorated with american flags\na tropical bread on a branch surrounded by trees\nA zebra standing in a dirt field next to green plants.\nA colorful plate with a pizza sitting on top of it.\nthe people on the beach are flying kits over head\nA man in white baseball uniform throwing a pitch.\nA boy with a helmet on eating food across from a bicycle.\nA large giraffe standing in a dry brush field.\nWoman in grey and blue throwing a frisbee.\nDozens of people on a grassy field flying kites.\nA man that is standing in the snow.\nA cat is standing on a toilet with its front paws inside.\na small baby is biting into some food\nSeveral pieces of pottery in the process of being painted.\nVarious types of flowers sitting inside of a vase.\nA little 
girl holding a colorful umbrella next to a penguin.\nA red piece of luggage sitting on top of a bed.\nA black and white shot shows evergreens, bare shade trees, and bushes that slightly obstruct the view of a building with a low roof in comparison to its  clock tower, which stands more than twice as tall as the evergreens, against a grey sky.\nA very long street with traffic under some cloudy skies.\nThe men are racing on skis on the snow covered race course.\nsome white and brown signs a tree and a building\nThe corner of college street and 5th street\nMiniature Poors on the side of the road in a rural mountain.\na \"use crosswalk\" sign on a post in front of a rain-covered street\na person holding up a cell phone\nA man sitting on a couch has two cats on his lap.\nMilitary jet on tarmac near wooded area on cloudy day.\nA desk with a monitor, keyboard, and laptop on it.\nA plate with a piece of cake and a spoon on it.\nA table with family photos, sentimental mementos, and a potted plant\nA couple of buses that are on the lot.\nGroup of four ladies sitting at table overlooking parking lot\nFlowers in a window box sit in front a closed window.\nPeople behind a barricade watch a man ride a motorcycle.\nThe bathroom has been cleaned and is ready to use.\na woman on a tennis court holding her tennis racket up to hit the ball\nA couple of men sitting at a table with pizza.\nAn orange cat  grooming themself underneath a piece of furniture.\na yellow hall with a brown floor and a mirror\na cat sitting on an organ looking out the window\nOne adult giraffe and two kid giraffes standing in the woods.\nA man in cowboy hat on horse next to cattle.\nA trolley at a train station at night.\nA soccer player in front of the goal holding a soccer ball.\nA couple of horses standing in a grass field.\nA young person with an umbrella is crossing a busy intersection.\nSeveral closeup shots of giraffes near a fence.\nA airplane that is sitting on a runway.\ntwo female tennis players are 
playing tennis on a court\nA pair of workers unloading the back of a pickup truck.\nA glass table contains a bowl of spheres and two fancy vases.\nA man stands with several ripe and unripe bananas.\nA cellphone next to a laptop computer.\nThere is a man on skis in the snow.\nA humongous jumbo jet is on the airport runway.\nSeveral people are flying kites in a field.\nA black and white dog catches a Frisbee in the grass\nA glass vase filled with different colored flowers.\nA street on a city at night that says \"Obama\".\nA large silver truck with a tractor parked on it's flat bed.\nA large knife sticking out of an apple in front of a blood soaked wall.\nA man who is swinging a tennis racket.\nThe orange and white cat is wearing a bow tie.\nA huge group of people stand outside several buildings, holding umbrellas of various colors\nSmall bird feeding near chair in grassy area.\nA cat laying on the ledge of a window.\nA white and brown sandpiper with a long, black beak lifts up one leg.\nA city street with people, cars and police.\ntwo long haired cats laying on a bed beside each other\nThe man is cutting bell peppers near a large pot on the stove.\nA man sitting at a table with a large plate of breakfast food on it.\na person standing at a tennis court holding a tennis racket\nA traffic light near a building on red.\nSuitcases revolving around on an airport baggage belt.\nA baseball player throwing a pitch into the field\nA tow truck and fire truck are at the scene of the accident\nAn ocean view with people water skiing using parachutes.\nA cat looks happy while sitting in a bowl.\nA child is flying a kite while sitting in a yard.\nThe fire truck red and the green pastures make it look just like Christmas.\nTwo people sit near many luggage bags using laptops.\nA meal of noodles and broccoli being held by chopsticks.\nA woman carries a basket of bananas on her head while some men stand around.\nboy skateboarding next to a graffitti covered wall\nA lad and a lady 
patting their favorite horse.\nPeople are sitting on surfboards in the water.\na couple of people that are walking in some grass\nPeople on a boat on a lake and two people jumping into the water.\nPeople standing on a dock near a elephant on a phonton boat.\nI am unable to see the image above.\na man that is sitting at a table with a laptop\na little boy that is eating a pizza\nA baby zebra is standing in a pen\nA large truck sits on the dock as a boat pulls up.\nThe bathroom has a shower area, toilet and sink.\nA display case in a store filled with lots of efferent foods.\nan extreme close up of many different types of bottles\nA boy standing on the grass as bicyclists ride by.\nFour men riding horses playing a round of cricket.\nA man feeding a baby her bottle with a smile.\nA bunch of candles that are on a cake.\na woman sitting on a wooden bench in the middle of nowhere\nA very tall white clock tower sitting under a blue sky.\nA bathroom with white toilet and walls and blue accent bars.\nA woman is holding an umbrella while walking down a flooded street.\nTwo apples and a bowl and jar of applesauce on a cloth.\na line of very tall buildings next to a clock tower\nA group of people who are standing outside.\nThe apple and banana are on the table.\nFour trucks are parked in front of a paint store.\nA small sandwich made on fresh bread with lettuce and mayonaise\nA cat that is in a white sink.\nA group of people sit on a couch in front of a kitchen.\nA city street with busy traffic including a yellow bus, many cars and a person ridding a bicycle.\na number of baseball players with bats\nA soldier who is standing near a goat to feed it.\nA woman taking a hard swing at the tennis ball.\nA person watches tv in a room with a couch and a laptop\nA bed with two pillows and a backpack leaning against it\nA yellow and orange double decker bus is shown.\nA man surfing in the ocean as the sun sets.\nA living room with white furniture and a small wooden table.\nA group of 
athletes engage in an organized game of ultimate frisbee.\nA cat on the floor next to a room with a sink\nA living room tastefully decorated with flowers on the coffee table\n1 12 loaded hot dogs and veggie side\nA person standing next to a chair with two tennis rackets.\nA man is swinging a bat at a baseball game\nA toy fire truck sitting on top of a wooden table.\nA man is riding a skateboard in an underground parking garage.\nThere is a wood bench in the garden.\nA bright computer screen inside of a room.\nWindsurfer kites are seen from above the beach.\nA kitchen with white counters, sink, and stove.\nThe elephant is attempting to complete the difficult trick.\nA small white plate of food on a table.\nTwo giraffes and one other animal grazing in a field.\na man holding a tennis racket on a tennis court.\na close up of a young person holding a kite\nA sleepy tortoise cat laying in front of the monitor.\nA man at a kitchen counter preparing food.\nAn old diesel truck driving down the path next to freeway\nA beach volleyball game with a kite flying in the background.\nSeveral baby elephants standing on a plain on the side of a river.\nA large group of people sitting in the sun\na number of food trucks parked near one another\nSemi trucks on a parking lot with orange cones.\nA photo taken in a car looking at a dog in the back seat.\nYoung man surfing a fairly good size wave\nBLACK AND WHITE PHOTO OF A MAN AND A WOMAN\nTrunk and small chest in cream colored room.\nA baby girl with beautiful blue eyes standing next to a brown teddy bear.\nMen playing soccer on a field at dusk.\na building with a clock at the top.\nA couple of clocks mounted to the side of a wall.\nA young person riding a skateboard up the side of a ramp.\nA man standing on a tennis court holding a tennis racquet\nA white plate of food on a table.\ntwo donuts left in a box of donuts on a counter\nA woman wearing skis with her black dog in he snow.\nA young boy smiles while holding a hot dog.\na pizza 
in an oven not yet cooked\nBarry Bonds holding onto a baseball with the number 754 written next to him\nA kitchen scene with focus on the pantry and a clock.\nSeveral planes fly through the sky, close together\nA smiling woman holding her cel phone up and open beside her face.\nA small dog getting a bath by it's owner.\nA man is riding his horse on the field near the blue trash cans.\nA woman holding a cat in her arm.\nA fake large cow that is standing in the snow.\nA tennis player is bending over and reaching to hit the ball.\nA long mirror is above the sink in a small bathroom.\nA view of a city and a body of water from a plane.\nA person that is driving on the street.\nA twin engine aircraft is flying in the sky.\nPeople are being served at the outdoor restaurant\nI am unable to see the image above.\nThe lights of a vehicle streak across a modern bridge.\ntwo people are cutting into a cake with forks\nSeveral zebras from behind standing on grass plain with distant trees.\na plate filled with grapes and some sliced apples, kiwi and oranges\nFour young people crowded in a bathroom brushing their teeth happily.\nA boat docked at the shore of a lake.\nSome people with wine glasses are smiling and laughing.\nPeople with umbrellas looking towards the grassy area\nA peacock standing near some metal grill fence\nA young man in a tan suit and shoes\nTwo sheep on the top of a hill covered in grass.\nThere is a tennis player holding a tennis racket\nPeople near a stone building with a clock tower.\nTwo pictures of a woman talking on he phone at a coffee shop\nBaby elephant alone by a tree in the evening.\nPatty on a whole grain bun served over salad.\nA doughnut sits on a napkin, with red frosting and one missing bite.\nA cat sitting by a microwave under a cabinet.\nan image of a man in the middle of playing baseball\na zebra walking on a dirt path near a fence\nWoman riding bike with basket and walking dog.\nScissors with a blue handle are in a plastic package.\nThe 
skateboarder is trying his latest aerial trick.\nSeveral parked bikes sitting in the grass near a tree.\nAn airplane is flying high in a blue sky.\na large plane is sitting on a runway\nA dog staring at a camera while laying on a bed.\na picture of public restrooms taken from the outside\nThe man is flying his kite high in the sky.\nA bowl that has food and a spoon in it.\nA bunch of children wearing winter gear playing ball in the snow\na close up of a zebra near a car window\nThere is a sign that warns people of work ahead\nA man is jumping in the air to catch a frisbee.\nA red bus driving down a street near a building.\na man holding a cell phone so someone else can look at it\nA fire hydrant stands on the sidewalk in between two poles.\nA stoplight controlling traffic in an urban intersection\nA group of men riding on the back of horses.\nA black and white cat sitting between bottles and a furniture leg\na group of people sitting on a bench in front of some blooming flowers\nA girl in a striped shirt and red skirt playing tennis.\ncarrots rice and potatoes in a bowl with a spoon\nA man with a helmet on coming up a ramp.\nPeople riding horses on the sand of a beach.\nA person on a surfboard in the water.\nA man wearing a hat while standing next to a  purple teddy bear.\nA black and white photo of sheep grazing near an old fashioned car.\nA surfer is surfing in the ocean.\na young woman holding a cell phone in her right hand\nA urinal in the men's bathroom and a small sink.\nA kitchen view of a dining table with a bowl with bananas.\nA train pulling through a grassy area with two children near.\nA skateboarder in mid jump while others look on.\nA couple of horses standing next to each other.\nA robot built from a Lego robotics kit\nTwo tall birds are standing in some mulch.\nA gaming system plugged into an electric source.\nA person with cold weather gear on while skiing in the snow.\nA woman on a horse silhouetted by the sun behind.\nThe two people are facing 
away from the screen\nA man holding a tennis racquet and tennis ball.\npalm trees in front of a building and mountains in the background\nA person on a snowboard sitting in the snow.\nA turkey sandwich and an apple are on a plate.\nA view inside of a room with a television.\nA brown kitchen table with chairs and brown high bar chairs.\nA couple of small statue sculptures on display in a garden.\nA small giraffe stands alone in some thick grass.\nA number of little league baseball players gathered in the dugout.\nA bowl of oranges with several on table around bowl.\nA dog running through the grass holding a frisbee in his mouth.\nTwo people stand in the snow with their skis on their backpacks.\nA red train on the track in between two buildings.\nA group of baseball players is crowded at the mound.\nThe stop sign has two street names posted above it.\nThe woman is rocking the newborn baby and smiling.\nBirds in the air in a circle with ocean and mountains and city in background\nA car near a toilet sitting on a sidewalk.\na wine bottle and a small vase sitting next to a tiny pizza\nBoats sit in the lake next to one another.\nA yellow stoplight with a smiley face drawn on the lens.\nA narrow kitchen with a refrigerator at the end of it.\na man in a red and black leather jacket on a motorcycle\nA man in blue jersey throwing a baseball.\nThree women sitting on a surfboard in the water.\nthere is a large building under construction and many parking meters\na big teddy bear behind a glass wall\nA living room area with some couches and a television\na large group of zebras under a shingled roof\nA piece of meat covered in marinara sauce, cheese and herbs.\nA brown and black puppy sits in the sun.\nSOMEONE BRUSHING THEIR DOGS TEETH WITH A TOOTH BRUSH\nSeveral people sit on park benches by the water.\nA man is taking a picture of a toilet from outside the restroom door.\nA very happy surfer guy is loving life as he hangs ten on a beautiful wave.\nA cat looking at the camera 
with a funny expression.\nA sign post with two street signs and a stop sign\na paper plate holding a slice of veggie and sausage pizza\nA train engine carrying carts over a hill side.\nSkier on red and black skis jumping near a mountain.\nan image of a bathroom scene with lots of hair products on counter\nA public transit train going through a station.\nA yellow fireplug in front of a blue street pole and store window.\nan image of a man riding a horse\nA STOP sign that has been written on with paint.\nGirl in a pink scarf eating a pastry at a table.\na surfer in a wet suit is surfing in a sunny day\nTwo people are walking up a snow covered hill.\nA bird standing on the rocks in front of water.\nA counter top in a kitchen with various items on it.\nWhite people looking ridiculous playing wii and drinking beer.\nA man bareback riding his bike down the street\nSmall fishing boats lazily drift in the bay.\nThree baseball players are standing by a base and smiling.\nA table topped with lots of different types of cakes.\nA train is approaching an opposite side boarding area.\nA horse is standing in the grass with its head over a fence.\nA puppy chases its tail, next to a mirror.\nThe two elephants are eating their grass for dinner.\nA man is standing next to a city fence.\na person riding a surf board on a wave\nAn image of a man wearing a baseball glove and leather jacket.\nAn assortment of foods on white and blue plates.\nA little girl sits with a piece of cake\nA red and yellow double decker bus at a bus stop.\nA dog that is laying on the bed.\nA dog and someone laying on a bed in a bedroom\nA man standing in a snowy forest wearing skis.\nA simple vase with a few flowers in it .\na child on a tennis court getting ready to swing a tennis racket\nA box that has different kinds of donuts.\nA baseball player swinging his bat, while the catcher and one spectator look on.\nA man is steaming his clothes in a bathroom.\nA man rides behind a horse during a race.\nA man in 
black gear skiing down the hill\nOld photograph of baseball team posed on a set of steps\nA woman at a crosswalk that has a green light.\nA young man riding a skateboard on top of a rail.\na young lady holding her kitten and kissing its head\nA man in a red shirt and sunglasses is playing frisbee.\nUp close view of two zebras in a zoo.\na monorail making it's way down the track above a bunch of cars\nA chocolate donut filled with cream and custard.\na large crowd of people in a park, a good portion of them are flying kites.\nan open field with some people flying two different kites\nA professional snow boarder flying through the air\nSeafood with pasta and broccoli is on a plate.\nA young person is biting into a hotdog\nA couple of birds sitting on top of a large clock.\nA woman leaning in and smiling at the camera.\nA double decker bus is on a street.\nA cat laying on the bed looking at the camera\nA busy intersection with traffic captured in motion.\na pair of pet bowls on a mat is next to a screen\nA man riding a skateboard on piece of concrete in a park.\nThere are two people posing and one man is holding a banana\nA FIELD AREA WITH GREEN GRASS AND TING BUILDINGS\nThere is a couple standing among some fountains\nA mp3 player sitting on top of a speaker system.\nLarge sunflower displayed in colorful vase on table.\nA group of skiers watch as one members does a trick.\nA man riding down a snow covered slope on skis.\nA single giraffe standing by a tree and some rocks.\nMan with tennis racquet, soccer ball, golf club, and hockey stick.\nAn airport is full of people's luggage and no one is there to claim it.\nA man riding a horse in an arena with a bull.\nA base with  yellow pink and orange daisies in it.\nA cat sitting on a table next to a vase with flowers.\nThere is a train sitting on the tracks.\nA white toilet sitting next to a bathroom sink.\nA bag and it's contents sitting beside it on a floor.\nA colorful display of hundreds of small teddy bears is 
featured.\na bed with a shelf above it with items and luggage\nA living room with a chair, fireplace and mirror.\nA tray of food sits on an outdoor table.\nA double sided parking meter buried in the snow.\nA black Macbook on top of a stand\nA stop sign near a Star Bucks Coffee Shop.\nA girl in dress sitting on a park bench.\nA man standing in water holding a fishing rod.\nA blue and green  hummingbird seems to hang in the air with its wings together and outstretched.\nThe man in the chair is playing a video game.\nThe airplane is on the runway t the airport.\nA plate with some eggs toast and bacon on it.\nTwo horses graze in a large grassy field.\nA bowl with rice and a side of broccoli in it.\nThere are two giraffes standing in the wild next to trees.\nA cow grazing upon a hill on a foggy day\na child lying in a children's bed next to a wicker basket dresser.\nA piece of birthday cake is sitting on a plate.\nThree people ride on an elephant in front of a forest.\nThe fork is attached to the dinner tray.\nA cutting board with chopped carrots and apples.\nA parachute floats in the sky above the ocean\na man on a snow board riding through the snow\nZebras, and elephant and another animal standing near water.\nThe young child with the missing tooth is holding up a new tooth brush.\nA man standing in a tool shed running water from a wooden sink.\nA woman is cross country skiing in a forest.\nA boy in jeans spreads his arms wide as he balances his skateboard on the edge of a pool.\nA girl is sitting at a table in front of a skate board and helmet.\nTwo double deckers buses travelling on a city street.\nA girl with a curious look in front of broccoli and chicken.\na bronze statue is looking at a clock on a building\nA bike in front of a scenic welcome sign.\nA kid wearing a Georgetown Day Shirt has a baseball glove in his hand.\na black and white cat is sitting on a yellow and red chair\nA living room with two red chairs on front of a television set.\nA plate of food 
sitting with a very elegant setup to it.\nThe remains of the breakfast table from above\nA picture of a person laying on a bed.\nblack and white picture of elephants in a fenced water tank\nTwo people playing a board game involving cards and chips.\nThe giraffe is standing by itself by the gate.\nA large jetliner flying through a clear blue sky.\nBEDROOM WITH BED, DRESSER, TV, LAMP AND OPEN WINDOW\nA woman dressed in white holding a colorful umbrella.\ntwo girls sitting in a restaurant eating noodles\nA man surfs a small wave on an overcast day.\nfruit hanging from a tree with trees in the background\nA desk with several computers and electronics on it.\nA couch sitting in front of a rub on a hard wood floor.\nA man that is on a surfboard sitting on a wave.\nThe people are piling on to a large truck bed.\nA plane on the tarmac with airport personnel.\nA man in a suit and a woman in a dress standing side by side.\nThe men are walking with each other wearing ties.\nThe guy is on the computer while there is a girl on the bed.\nA large clock in a mass transit station.\nPeople in a room with one man working on a laptop while another looks on.\nTwo horses standing together in an open field near some mountains.\na couple of players are out in a baseball field\nA couple of small birds and a building.\nA woman smiling while showing off her cell phone.\nA baby on the floor biting into a remote.\nThe teenager is taking a picture of her male friend with her cell phone.\nA prepared plate of dinner has meat and broccoli.\nA man is doing tricks on his skateboard, him and it up in the air\nA green and beige bus sitting on display behind a traffic light.\nthere is a small boy getting help brushing his teeth\nA hand that is that touching a dog.\nA lady making something in a home kitchen of some sort.\nA man sitting in the middle of a fresh produce stand.\nLarge oak desk with laptop, keyboard, and pictures on shelves.\nA small bed in a room with lacy curtains on the window.\na blue 
billboard sign in  a busy city\nA black and white dog on shore of a beach.\na trey with some fruit inside of it\nA cup of coffee and a banana are setting on this desk.\nBirds are sitting on the arms of poolside chairs.\nA remote and a container is sitting on a table.\nA person skiing down a snowy mountain slope.\nA doorway view of a bed window and doorway to another room.\nA man that is standing in front of a television.\na man lighting the candles on a birthday cake\nA group of people flying kites over a lush green field.\nAn airplane is flying through the sky during the day.\nA baseball game is being played on the grass.\nSeveral ducks are out in the middle of a lake.\nThere are some people walking on a beach with surfboards.\nA stove and some books in a kitchen.\nA small pizza with burnt edges and fresh toppings.\nA living room features a gray and yellow couch, and wooden furniture.\na man with a hat on a bicycle beside a tractor\nA pan filled with celery, onions, and carrots.\na person standing near a small motorcycle on a city street\nA remote control that is laid on a piece of furniture's cushion, which is ripped and exposing springs and wood.\nA man wearing gear on his feet walking in the grass.\nsome bananas oranges apples and other fruits and a bowl\nTHIS IS A TRAIN GOING THROUGH THE MIDDLE OF THE WOODS\nLiving room with large television and lit fireplace.\nA herd of zebra standing on top of a lush green field.\na bunch of fruit is laying on a table\nA large white bus on the side of the road.\nVarious street signs next to wall with a building in the background.\nThere is rice, broccoli mac and cheese, and turkey on the plate.\nSome people are holding a union Jack umbrella.\nA street sign saying Major Street with an arrow pointing to the right.\nSauce covered pizza in a box on a wooden table.\nA couple of people standing next to each other.\nA group of people standing and holding wii remotes.\nA man taking a swing at a baseball\nA bedroom with bed, chair, 
table and  bookcase.\nTwo large trucks traveling in the side view mirror of a car.\nPancake breakfast on wooden table with blue and white mat.\nA cow walking by a creek with two swans swimming in it.\nA stuffed bear sitting on top of a window sill.\nA group of sheep sit on top of hay bales.\nA dog wears goggles while sitting in the side car of a motorcycle.\nA boat sitting on the beach next to a van.\nTwo traffic lights facing opposite directions with a street sign atop the same pole.\nA red umbrella and chair are by the ocean.\nA woman hitting a tennis ball with a racquet.\nA bathroom with a pink toilet and pink tile.\nA young man sitting at a simple desk with a laptop computer and bed in the background.\na boy swinging a tennis racquet at a tennis ball on a tennis court\nA white truck is carrying three motorcycles on the road.\nA tennis player is hitting the ball on a tennis court.\nA man that is wearing a tie and is standing while smiling.\nSomeone's hand holding up a glass of wine.\nPerson of a surf board riding a wave in the ocean.\nthere is a male tennis player playing on the court\nSome giraffes are walking around near some bushes.\nA group of people traveling uphill on a snowy mountain.\nFour people sit at a table full of pizza.\nA street with many cars and busses in a city\nPlate of food with mixed vegetables and a side of meat.\nThe motorcyclist is happy to be on the road.\nA young girl smiles as she holds a cell phone.\nThe person in an apron is arranging boxes of fruit.\nA bird sitting on top of a tall metal weather vein.\nThee zebras graze in the middle of the zoo.\nThe tower has a clock displayed in order to tell time.\na jet that is parked on a runway\na man standing on a red boat out on a large body of water.\na cat looks out of a room while on a step\na white cat is sticking his head out of some iron bars\nA pitcher gets ready to throw in a baseball game.\nTwo two birds are sitting on a rock.\nFive giraffe stand around a pole eating hay.\na 
baseball player holding a baseball bat inside a stadium\nA woman holding a sign sitting on top of a truck.\nA truck full of luggage has the hood opened\nMotorcycles are going around a track leaned over.\nA drink cooler with bottles of water, juice, and soda.\nA train on one of two of the train tracks.\nIndian woman selling bananas while others look at stand.\na female tennis player in a white dress is playing tennis\nA baseball player standing on top of a field.\nA vase of colorful flowers sitting on a table.\nA white toilet commode sits on a tile floor.\nAssortment of fruits with pastry and beverage displayed on table.\nA skiier slides down a snowy mountain on his board.\nAn orange van with vehicles behind are sitting on the road.\na little grey teddy bear with a missing eye sitting by a tree stump\nA teddy bear in a top hat and bow tie with the message \"Me To You\"\nA couch sitting next to a white fire hydrant.\nA man with a frisbee in his hand in the woods.\na few people that are playing with a white  frizbe\nA person is choosing produce to bag at an outdoor market\nA stop sign in Arabic, in a desolate location.\nA couple of women sharing a toast at a table.\na black and white photo with a bench grass and trees\nA house renovation showing an unfinished room next to a kitchen.\ntwo buses moving on the street besides residential houses\nAn alley with a person on a bike and a girl walking.\nA pita is topped with onoins, carrots, and bacon.\nA toilet has been fitted with a system to potty train a cat.\nA cute dog has large pink ears and eyes.\nBig green monument with a clock on top.\nSomeone going down a hill on a pair of skis\nA white sink sitting under a bathroom mirror.\nA train going by a platform in a train station.\nA couple of boxes filled with hot dogs and fries.\na group of bike riders going past a yellow bus\nDifferent types of luggage trunks stacked up together\na kitchen with a stove and a glass door\nThe sink and counter is in the grouplab.\nA man in a 
gray hat and sunglasses on a cell phone\nan image of  a man posing with surfboard\nA baseball player standing next to a woman.\nA double decker bus pulling up to a bus stop.\nFour young skateboarders are holding onto the back of a bus.\nA long train traveling down tracks with rusted cars.\nA display rack of a variety of tools in packages.\nA bird in a dark room perched on a stack of books.\na television is turned on in a living room\nA black and white picture has a posing crowd.\nThere is a phone on top of a calculator\ntwo black and white clocks a tan building and a white and blue bus\nA motorcycle stopped on the road during nighttime in the city.\nThe two men stand next to each other looking out on the beach.\nTwo men standing on a sandy beach holding surfboards.\na man uses a bat gets ready to try and hit a ball\nA television that is on with a white man talking and campaign signs\nA gigantic size pizza on a table in front of a woman.\nthis is a trck driving over an overpass\nA woman with her dog are seated on a bench.\nTwo cows in a grass field with a blue sky and clouds in the background.\na bridge over a body of water near a building\nA yellow parking meter on the side of the street.\nThe couple is sitting at the table talking with friends.\nA person riding a horse next to a big black and brown dog.\na number of people in a small boat with a car\nA woman standing between two cows on a field.\nThis refrigerator has a monitor on its door.\nA bus drives through a street with an arch.\nA sink and bath in a small room.\nA child with stuffed animals in the background\nPeople playing a game with a Frisbee outside.\nOn that point are a bunch of individuals celebrating.\na man about to take a swing at a base ball.\nA small plastic container of rice and vegetables with a few crackers.\nA woman cutting the hair of a boy whose sitting in a toy airplane.\nA variety of cookbooks stuffed into and around a microwave\nA baseball player swinging a bat with a catcher and umpire 
behind him.\nA group of skiers trudging up a snow covered hill.\nSmoke billows from two smoke stacks of a steam engine boat.\nTwo Indian men decorate two different birthday cakes\nA person riding a yellow motorcycle on a track.\nTWO KIDS ARE PLAYING INT HE ROOOM\nA bird that is sitting by some water.\nA large dog holding something yellow in its mouth.\nA bunch of birds that are standing in the grass.\nA police office who is sitting on a motorcycle.\nA large metal clock and some bright lights.\na very large pizza that is on a wooden table\nA brown bear and a white bunny sitting next to each other.\na clock with religious icons painted on a wall\nThe surfing board is on the sand on the beach\nA boy and his dog are playing in the snow.\na man uses a knife to chop up some carrots\nA large jetliner sitting on top of a tarmac.\nA man wearing a hat carrying two lamps in a field.\nA group of people stand in a dimly lit area between roads.\na stop sign and street sign on an pole\na horse is standing with his owner next to a tree.\nlandscape of a snow covered field and mountains\nA man in the park with a frisbee.\nGroup of double decker buses on road near crane.\na photo of a man in the credits of a film\nA large blue train going down the rail road tracks\nThis is the front of a mobile library.\nBlack and white of man crossing to old style building with clock tower, possibly in Cuba as cars 1950's vintage.\nA lone zebra grazes on grass in a pasture.\nTwo gentlemen discussing something being viewed on one of their phone screens.\nA man sitting on top of a couch holding a game controller.\nA man is in the water with a beautifully painted surfboard.\nA living room complete with a couch sliding door and a window.\nA view of a downtown area, looks very rural.\nA bison and her babies walking through a field.\nRed Light at a street intersection with people present on the corner\nA man and woman having a drink on a docked boat.\nA woman and a baby are looking at laptops.\nThis is a 
boy on skateboard about to go down a ramp\nA girl in yellow dress eating a piece of cake on table.\na very nice draw showing a vase with flowers\nAn oven is shown with all of the burners in use.\nTwo young girls making pizza on a counter top.\nA bathroom under construction with a white tub next to a toilet.\nA person walking a dog on the beach\na beagle with it's tongue sticking out standing by a water bowl\nA stack of luggage by a curb and parked car.\nA person that is in the water doing a trick.\nA guy in a white t-shirt rides on his skateboard.\nAirplane being loaded sitting on the tarmac at the airport.\nA jumbo sized stuffed teddy bear waits on a wheeled dolly.\nA man in competition gear on a red snowboard going down a hill.\nSeven circus elephants,  on their hind legs, leaning on each other,  with a standing elephant in the middle of the line.\nA female tennis-player with her racket in-hand in front of a crowd of onlookers.\nThree sheep in a field of grass near a steep hill.\nA man is surfing on a wave in the ocean.\nTwo blue suitcases right next to each other\nA brick wall and several warning signs nearby.\nsome slices and pieces of yellow bananas on a towel\nA row of luggage sitting on a wooden floor.\nA giraffe licking a fence post while standing in a coral.\nA bathroom with a toilet, television and bathtub in it.\na bus that is parked on a very large hill\nThe woman is holding a teddy bear in her arms.\nFour cows in a pen on a sunny day\nA baby elephant stands near its mother.\nA person with a hat sitting down with an instrument between their legs.\nLarge assortment of traffic signals in outdoor area.\nA plate of noodles, beans, broccoli and an egg roll.\nThree Zebras standing in front of a gate.\nA man with a surfboard walking across a bridge towards the ocean\nSnow skiers enjoying the slopes in the mountains.\nPartially eaten donut with glazed topping on wax paper\na cat that is  on a couch and lap top\na woman riding a bike down the street\nA plate of 
food containing meat and vegetables.\nA man sampling donuts and ice creams for a birthday party\nThere is a computer on the work desk.\nA large jetliner taking off from a runway.\nAn older man holding up a handkerchief with an image of a woman in a bikini.\nA room with furniture and a fire place.\nAdult woman walking on sidewalk near yellow fire hydrant in city.\nthree people in Japanese clothing, two are carrying umbrellas and all are wearing sandals and they are walking past parked bikes.\nA baseball player attempts a slide as a catcher and umpire look on.\na man holding a brush standing in a room\nTrays of food that include couscous, apples, and raisins.\na cat in a blue hat is laying down\nA large orange cat sleeping on a pair of shoes.\na person riding a skate board ata skate park\na sign showing no birds allowed while a beautiful bird stand there\nA piece of cake in a plastic container next to a large cookie.\nA man in a blue shirt preparing to throw a Frisbee.\nA plate of meat, bread, and vegetables on a table.\nA group of people standing around a chicken coup.\na man at the beach holding a surf board\nA room with wooden floors and white walls\nA young woman sitting on a city bench talking on a cell phone\nA GPS device on top of a counter next to a book.\nA person surfing in shallow waves near the shore.\nA youth baseball team is grouped together for a photo.\nA LOT OF PEOPLE WALKING THROUGH A BUSY SQUARE\nA train with one of its doors open.\nA man sitting down at a table using a computer.\nOlive green vintage military truck, six wheeled.\nSculptures of zebras stand in the brush and grass.\nA clock tower lit up at night  with an array of bells at the top.\nA white plate of food that includes an artichoke and bread.\na close up of a vase with flowers on a table\nThe baby is helping its mom on the internet.\nA laptop and books sitting on white sheets on a bed.\nA man laying on top of a bench on a dirt field.\nA new tv on top of an old tv.\npeople standing in 
line beside a food truck\nA skateboarder rides the rail in an urban area.\nAn Asian building with a satellite dish on the roof.\nA bowl full of something that appears to be nuts, which can be eaten with chopsticks.\nTwo people are pictured standing in front of an apartment building.\nA train traveling down train tracks next to a forest.\nThe bathroom only has a broken door, a broken toilet, and a broken window.\ntwo people playing with a dog on a leash\nA metal bowl containing five oranges in sunlight.\nAn indoor fruit market with citrus and tropical fruit.\nA living room that has a couch and television set on a table.\nA beautiful woman standing next to a man holding a Nintendo Wii controller.\nA scenic view of Big Ben in the evening hours.\nA wooden table holding various bowls and food.\nThe dog lies down next to the parked motorcycle.\nA red and white fire hydrant was given eyes.\na woman in glasses sits in front of a laptop\nSurfer riding a large wave next to a platform with people standing on it.\nA chair is outside of a window that a woman is cleaning by a bathtub.\na living room with a lot of chairs and  a big entertainment center\nStairs that have some fading green paint on them.\nA man with a meal and drink at a round table.\nA pizza with various toppings is sitting on a wooden slate.\nTwo coffee mugs by an orange juice and a juice glass.\nA group picture of young men and women at an event at night.\nThe group of people in business suits is standing beside a large poster.\nA smiling man poses with a healthy cow\nA large commercial airplane taking off for flight\nA woman holding a carrot in one of her hands.\nA woman taking a picture of herself in a mirror.\na close up of two pots of food on a stove\nThe hand is holding a controller for the video game console.\nA hotel with a large blue poll lined with lawn chairs covered in umbrella.\nTwo pieces of fruit, an apple and an orange.\nA tennis player in action on the court.\nA woman is sheering the coat off of 
a sheep.\nThe woman in the mirror is taking a picture of herself and the dog.\nA flock of birds sitting on top of a tree.\nA woman with an umbrella on a bicycle\nTwo bulls are resting on the sand next to a boat.\nA man with a dog plays a game on his Wii.\nA man snowboarding down a hill while wearing a coat, goggles and a hat.\nA man is standing near a car with some luggage while another stands near by.\nSomeone's shoe stuck to a stop sign in the city.\nA large clock sitting on a sidewalk in front of a brick building.\nA man and woman are playing an interactive video game with controllers.\na small yellow Cessna plane flying on a clear day\nA monitor, keyboard, coffee cup, and plastic bottle sit on a table.\nTwo polar bears in the snow surrounded by trees.\na bunch of cattle are standing in a grassy field\nTwo pieces of bread with sauce on them next to a bowl of chicken salad.\nA smiling man eating at a table with people behind him.\nA stove top oven sitting next to a mixer.\na cat lying on the floor in front of a mirror\nFour giraffes standing in the grass in their enclosure.\nA small engine plane making a slight right turn over farmland.\nAn older man is on the soccer field with the ball.\nThe two sheep are enjoying their time in the hay.\nA skiier approaches a huge snowball at a ski resort\nA man doing a belly flop onto a bed\nBoats floating on top of a large lake.\nA surfer is surfing the waves in the ocean.\nA man standing on his surfboard riding a small wave\nthere is a male safer that is seen riding a wave\nA wooden shop stand loaded with drinks and food.\nA large tanker truck driving down a road.\nA girl trying to fly a kite with a face.\nStanding man and woman near a dining table full of food.\nA group of people hiking along a mountain line on the peak.\nWe are looking down on to a small bathroom.\nA parking tole on the side of the street with snow on it\nA passanger bus stopped in front of another passanger bus ready to pick up passangers.\nA very 
intricately designed old tower clock, with people coming through the arched doorway at the bottom of the photograph.\na small teddy bear dressed in clothing\nA bathroom scene is pictured in this image.\nTwo sheep are lying on the ground under a tree.\na number of luggage bags on a cart in a lobby\nA white cat sitting under an open umbrella\nA kitchen with woodwork cabinetry and pendant lights is displayed.\nA blue couch with a bunch of pillows on it\nA bus stop next to a curvy road surrounded by traffic lights.\nA man surfing rocking waves in the water.\nA blue eyed dog panting as he walks by.\nA team playing baseball on a baseball diamond.\nA baby in grey shirt sitting on a toilet net to a tub.\nThe bird is perched on the gate by the mountains.\nA woman wearing a red shirt blow drying her hair.\nAn airplane sits on the tarmac at an airport.\na woman looking at her phone in a crowded area\na room that has a couple of different computers\nA man holding a slice of pizza up to his mouth.\nA crowd of people standing underneath round lit orbs.\na man made pond inside some kind of enclosure\nA man flying through the air while riding skis.\na green motorcycle is parked next to another one\nA sign cheering on the Colts sports team.\nTwo woman on an island pose for the cameras.\nA pizza sits inside of a box on foil.\na bat and a ball on the ground next to a flower\nThere are seven chairs around the round table.\nA person wearing a helmet riding a skateboard while a person stands in the background looking off into the distance.\nTwo skateboarding decks mounted on a grey wall.\nTwo zebras in an enclosure walking a dirt path.\nA large room has a stone fireplace with candles inside.\nA group of giraffe standing around a tree.\nA pair of giraffes looking opposite directions in a forest.\nA basket with a stuffed teddy bear hangs outside.\nPeople sitting at a table with laptops and books.\nA picture of three computer screens with two on.\nA young girl eating something while 
wearing 2 different shoes.\nTwo men are working in a commercial kitchen to cook food.\na man with a bike sitting on a bench in front of some trees\nA man in thought sitting in front of a laptop with a pen in his hand.\na close up of a child eating a banana\nPeople standing at a park with a large yellow kite.\nA foyer of a home that is leading to a dining room.\nA black bear relaxing on a hammock supported by chains.\nOld blue bus with bicycles parked on roadway near green space.\nA man lifting up a lid on a toilet in a bathroom.\nA group of people playing Wii and smiling.\nTwo birds are sitting on the ground together.\nSeveral boats are docked in a harbor.\nA sign advertising a reptile sale in May\none zebra drinking at a pond and another standing\nA rain covered street filled with heavy traffic.\nA large decorated bus has a couple of folks standing by it.\nA close up of a doughnut covered in red, white, and blue sprinkles.\nA purse sitting next to it's contents on top of a table.\nA zebra standing in  a forest next to a large boulder.\nA bowl of fruit on a table next to a letter and a plate with two tomatoes.\nA bear in the arms on a heart pillow.\nThere is a laptop sitting on a computer desk\na lady taking a picture of a long horned mountain goat\nA railroad train heading toward a traffic light on the tracks\na line of people that have horse and wagons\nA skier stands posing on a flat area in front of the lodge.\nA young man holding a cell phone with two hands.\nA big sign telling people to stop eating animals next to a building with cars parked outside\nFire hydrant on a corner with a smile painted on it.\nYoungster on a skateboard, trying simple tip up stunt.\nA woman getting food out of an oven while another woman stands by.\nA small city bus with advertising on the side and back\nA hotel bathroom with focus on the toilet.\nA yellow yawning dog laying on the ground\nA black bear is crouched down in the water.\nA couple of men standing next to each other on a  
lush green field.\nFlower are placed in a vase covered in shells.\na little boy with a suit, tie and glasses\nA wooden shelf holding a microwave and small refrigerator\nA giraffe sitting on a grassy patch of land.\nA man sleeping in a bed by a dog and remote controller.\nA jetliner sitting on top of an airport runway.\nA man is standing talking on the telephone\nTwo skiers are standing at the top of a hill.\nA table that has a pink hardcase carrier on it, along with several smaller containers.\nPeople are walking along a sidewalk with their luggage.\nBaseball player in the motion of swinging his bat at the plate.\nA woman is talking on a cell phone outside.\nsome water and a person is flying a kite\nA keyboard that is sitting next to a mouse.\nThe table has two wine glasses, a bottle of wine, and a vase sitting on it.\nA man that is on a skateboard on a sidewalk.\nA melted looking lay pot sitting on top of a spindle.\nA black and white image of three people on a bench\na big lake with some boats out in it\nA young man using a laptop computer with a large monitor.\nA small bird sitting on top of a tree branch.\nA airplane that is sitting on a tarmac.\nThe living room in the wooded house is empty.\na very clean bathroom with a walk in shower\nAn old, large clock hanging off of a building.\nTwo zebras standing side by side to each other in a zoo pen.\nTwo young boys carrying red and white surfboard.\na big airplane that is parked in teh woods\nAn assortment of muffins are on sale in a Japanese store.\nA close-up of a bear swimming in the water.\nA young man is playing tennis on a court.\na woman sits on an ornate wooden bed with fancy bedding\nMan sitting at table with pizza and beer in restaurant.\nSeveral pizzas are lined up on a table.\nA man driving a motorcycle with a sports car trailer and two dogs sitting in it.\na big building with a clock inside of it sitting in front of a water way\nA man riding skis down a snow covered slope.\nThree men admiring motorcycles 
in a sidewalk exhibit.\nThe inside view of a bus and its passengers.\nA side view of a train passing through a mountain trail.\nLooking down at a partially eaten salad sandwich in paper\nA black and white dog jumping up catching a Frisbee.\nA smiling little girl hugging a teddy bear.\na tennis player swinging a racket on a court\nA man standing in front of a flat screen TV.\nThe yellow mustang car is sitting on the side of the sign.\nA dog sitting on a chair underneath a painting.\nGroup of people outside and one pointing up to the sky.\nTwo teenage girls wearing hats are smiling for the camera.\nA plate and fork with toast and vegetables on a chair.\nA group of giraffes on a path near a few trees.\nA laptop computer, some speakers, a cellphone, empty pill package, and bowl of chili are on a narrow table.\nA man is sitting in the grass holding four cell phones.\nan asian man pitching the ball during a baseball game\nYoung baby in crib laughing with bear\na person in a scarf and suit sitting outdoors on a bench\nA street sign with two signs hanging off of it's sides.\na teenager playing in a skate park with a skateboard\nA police motorcyclist with a flag is riding while a large crowd watches.\nA man doing a trick on a skateboard off a rail.\nA person on a surfboard in the water.\nThe bed has yellow sheets with sheep on them\nTwo urinals next to each other in a bathroom.\nA motorcycle is parked in front of two garages.\nA person in skis stands over snowy ground.\nA little boy playing a Wii game in his living room.\nA man in a black shirt plays on an ocean wave.\nA small kitchen that has small kitchen appliances.\na man and a woman playing tennis on a tv.\nBowls of lettuce, pepper, chapati and other foodstuffs\nA dog is laying on a pillow holding his toy.\nTwo male tennis players, one has on a white hat. 
They look like they were mid conversation.\nMany people are dressed as zombies covered in blood.\nA white bathroom toilet sitting next to a urinal.\nA dog sits on a couch with pillows.\nA young boy is playing frisbee in a park.\na closeup of a person's hands as it plays with a Wii controller in front of booklet with Mario and Luigi characters\na note sitting between a couple baskets of oranges\nA couple of men standing next to a man and his brown horse.\nA bathroom with a hole in the ground for a toilet.\nA man behind a cashier holding a red pen\nWe are looking at a simple clock tower.\nA man eating a doughnut at his computer keyboard.\nA lab with refrigerators and a man sitting nearby in an office.\na CGI photo of a animal sitting on some vegetables\nA hodge podge of colors and patterns decorate a bathroom.\nA child and an adult are paddle boarding in the ocean.\nA parking area with trees next to a stadium.\nA person communicating with two phones at once.\nA laptop computer that is sitting on desk that has a lot of clutter on it.\nA young girl smiles brightly over a chocolate birthday cake.\nA person putting their foot up to a skateboard.\nA person's handing pressing a button on a WiiMoted for the Nintendo Wii gaming system.\nMan walking on beach near ocean carrying surfboard and holding para sail handle.\nA photo of an old white building with a clock tower.\nA large grey airplane flying through the sky in the daytime.\nA collage of photos including a restaurant, waterfall and a rose.\na baseball player holding a bat in the batters box\nA man riding a skateboard on a street next to a park.\nA mother and her baby zebra grazing on dry grass.\nA woman petting a giraffe that's leaning over a rail.\nA black laptop sitting on a desk next to a remote controller.\nA plate containing three sandwiches, fries, and ketchup on the side.\nA building that has a clock on the front of it.\nA parasailer on the water with sky line in the far distance.\nThe group of zebra are eating 
and there are small birds in their pen.\nA long-haired man with a beard is wearing a suit and tie.\nCross country skiers travel through the snow during a race.\nA plate filled with cooked vegetables and meat.\na person jumping a skate board in the air\nA bedroom with a day bed next to a window.\nA giraffe comes close to a visitor in its enclosure.\nA laptop computer sitting next to a computer monitor.\nThe buses and trolley cars run on the same street..\nApples and pears are in a box with grains and a bowl.\nPeople are marching down the freeway with a banner.\na man is sitting with a laptop box on him\nA full view of a flower vase with drinks and cups.\na baby giraffe stands in a area with some birds\nA man wearing a silver tie near a clock.\nA small bathroom with tiles appears clean and organized.\nAn airplane engine is seen passing a mountain in the distance.\nA couple of brown horses walking down a street next to buildings.\nA woman taking a bite of pizza in a restaurant.\nA group of people use paddles while standing on boards.\nA baseball game is being played on a dirt and grass field.\nA dust plane is pulling sharply up into the sky while leaving a trail.\nA man on a horse monument in front of a building.\nA girl with a bun is sitting on a scooter type motorcycle.\nThe stop sign is clearly visible for all of us to see.\nAdvertising image with writing backed by bags of oranges.\nA umpire signaling safe at a baseball game as a man slides into the home plate\nA person and a child ride on a skateboard in this black and white photo.\na man with a tennis racket is running on a court\nA plate contains chips and a sandwich.\nA man in a suit standing next to a control board and computer.\nA bus travels down a busy street  in a crowded city.\na traffic light with two street signs on the top of it\nA picture of an outdoor area that looks great.\nA man giving a thumbs up behind a computer screen\nMen keep watch on a herd of goats.\nA group of elephants gather around in a 
field.\nA street post with street signs and lights on it\nA plate of food with broccoli, radishes, rice and chips.\nThe cat looks through the door that is cracked open.\nA man in snow shoes and his dog on a snowy path in the woods.\na building is shown with a big clock in it\nFive stuffed teddy bears sitting in a row.\nA group of people standing together under an outdoor hut.\nA clean kitchen has dark brown cabinets and white appliances.\nA person is riding a horse on the sandy beach.\na couple of stuffed animals sit next to each other\nA variety of vases are shown on a table top.\nThis is an image of a baseball game with players at home plate.\nA man sitting on a bench next to potted flower.\nTwo people on bicycles riding in street with signage in the foreground.\nA salad made with yellow pepper strips and green sprouts sits on a square white plate.\nThe adult elephant stands near a large toy ball.\nA skier with a backpack pauses to enjoy the view\nA baseball player prepares to swing at the ball.\ntHERE IS A HOT DOG INSEAD OF A BREAD HOT DOG BUN\nTwo men sit atop motorcycles and two men sit in sidecars.\nA surfboard sitting in the sand on the beach.\nA cell phone sits beside a small crocheted change purse.\nA collection of green vegetables sits on a table.\nA toilet and shower-bath combo in a small restroom\nA room with a television, couch, chair, tables and potted plants.\nThree oranges sitting on a dark black surface.\nA black cat laying down on top of a refrigerator.\nA book opened sitting next to some mushroom ornaments and a vase.\nA very narrow busy road of shops with a lot of people.\nSome teddy bears hanging from chains on a sale rack\nZucchini, summer squash and broccoli are mounded in baskets.\nA large flock of birds fly through the sky.\nA baseball player is holding his bat, and blowing a bubble.\nA tennis player readies herself to receive a serve.\na tennis player hitting a serve on the court\na round table of people with drinks and a cake\nA 
snowboarder and several skiers at the top of a run.\nSeveral people around a boat on the beach with an umbrella shade.\nThe giraffes were outside the building in a pen.\nA couple of people sitting on a bench.\nA woman in a dark blue jacket playing Frisbee.\nZodiac on back of large boat in a lake.\nYoung people painting a mural on a traffic divider.\nA woman street skiing with a helmet on putting on her gloves.\nA closeup of several ripe bananas clustered together.\nA woman wearing a bandanna and ugly sun glasses.\na family in a small row boat in a river\nA pile of trash sitting on a boat next to an umbrella.\nA man holding a banana over his face.\na kneeling woman taking a photo of her black dog\nWoman eating a hot dog while walking down a street.\na white toilet two rolls of toilet paper and a phone\nA large airplane is on a runway with clouds in the distance.\nChild at bat in Little League baseball while teammate watches from first base.\nTwo chairs and a small birds below it\nthis bathroom has white sinks and black counters\nA line of people sitting on benches in a courtyard.\nA large concrete skyscraper on a sunny cloudless day..\nA hand holding a PDA with a illuminated keyboard.\nA man sitting in a motorcycle poses with his arms outstretched.\nA man balancing a bike on a bench.\nA trash can is sitting next to a lowered curb.\nDog laying down partially covered by a comforter.\nA yellow double-decker bus next to a traffic light.\nAn empty, clean toilet stall with a stack of toilet paper.\nA group of women sitting on the floor eating food.\na dog looking out a window with it's reflection in the mirror\nA blue and white street sign above fence and water.\nA baseball player balancing the ball on his left hand.\nA close up of a man petting an elephant.\nA murdered monkeys head sitting in a white bowl next to bananas.\nthere is a cup of coffee and a half eaten sandwich on the table\nA kitchen has a plain white fridge in the corner.\na plate of french fries, two 
sliced sandwiches, and a pickle\nA man using an outdoor oven to cook a pizza\nThe sink is on the island of a large kitchen.\nA cat is standing in the corner of the room\nIt is dusk, and the skiers have abandoned their skis and snow boards for social interaction.\nA little girl cutting a piece of paper with blue scissors.\nA group of people who are walking with umbrellas.\nThe flowers in a vase are dying.\nA woman with short brown hair getting ready to bite into a hot dog.\na dog is floating on top of a water.\nA baby and a young boy are inside of a rolling suitcase.\nA black and white street sign that reads \"end bird.\"\nA teddy bear sitting on the edge of a toilet seat in a bathroom.\nA woman is placing a flower into a cake.\nA unfunished bed in the corner of a room.\nOne boy watches as another kid performs a skateboard stunt\nA puppy cuddles with a shoe on a couch.\nAn adult giraffe extending it's tongue over a fence.\na couple of planes flying through the air\nA bathroom with a white toilet and window over the toilet.\na person in a red jacket skiing along a path\nA living room with a large book shelf filled with books.\na bed room with a neatly made bed and two lamps\nThree people standing near a table with several glasses on it.\nA small, white dog laying on a bed with a stuffed toy.\na toothbrush on a table with a bunch of scissors\nA pedestrian walk light is lit up on the corner of West 3th St. 
and Seventh Ave.\nA man placing a tie on a womans neck\nA man flys through the air on a snowboard\nA little kitty on the bed using a laptop.\nEvening view of traffic light intersection with cars with headlights on and a building and trees.\nA close up view of a hand on a keyboard by a monitor.\nHere is a image of an zoo animals.\nA tire sitting on top of a green fire hydrant.\nA subway sandwich on top of plate and napkin.\nA group of people riding on the backs of horses.\nClock tower ascending into overcast sky from buildings below\nA family of four playing a Wii game.\nA lot of sheep eating grass in a ranch.\nMany people are scattered together at the air port.\nA plate with a drink and a variety of deserts.\nA large pizza on a plate on top of a dining table.\nThe adjacent farm land hills attest to the height of the soaring kite.\nA small bird is perched on an empty bird feeder.\nA shot of a desk with two computer monitors with a teddy bear on top of one of the monitors.\nA cheese pizza is on a tray with pieces missing.\na couple of zebras watch a giraffe walk through the grass\nIn the evening a large amount of open umbrellas are together.\nIt is almost like the dog is flying in order to catch that Frisbee.\nA pile of busted up toilets and sinks laying on the ground.\nA person wearing a helmet and riding a motorcycle.\nA cat laying on a couch in a room.\nA young skateboarder performs a trick on the stairway.\ntwo elephants in a field near a tree\nA child sitting on a wood bench typing on a toy-like laptop.\nA man who is looking at his cell phone.\nA large plane with propellers high up in the sky.\nA double deck bus driving down the street.\nA bunch of people heading to a big plane.\nA youth baseball player throws a baseball outside.\nA horse that is enclosed eating grass during the day.\nThe young man on the skateboard is practicing his tricks.\nA hallway with piles of luggage and other things.\nA cat under a table on a wooden floor playing with a jar.\nA very tall 
clock tower towering over a city.\nSeven people are posing for an old time photo in a large kitchen.\nA laptop computer and a desktop computer sitting on a desk.\nA cat sitting in a flower pot with no flowers.\nA bike and a large pile of luggage sitting under a sign.\nTwo giraffes with dried grass and trees in gray light\nA large red stop sign on a street.\nA train that is going by a train stop.\nA woman wearing blue crosses the street on a bike.\nA bathroom with sinks, mirrors and a towel dispenser.\nSeveral people on a beach one is parasailing , one has been wind surfing , and some are gathering up a picnic.\nA woman and her son using an old iMac computer\nA herd of sheep laying down next to each other on grass.\nA couple of men sitting next to each other.\nA large white double decker bus parked at a bus stop.\nA table is adorned with red, yellow and green fruits and vegetables.\nA picture of a sidewalk in front of stores.\nclose up of a toilet that looks like it is smiling\na fire truck at an intersection resting on its side\nA couple of people carrying luggage through the snow.\nA bird perches beneath a multitude of clocks\nsome baseball players playing as people watch on\nThe couple are posing for a picture while he is brushing his teeth.\nTwo men riding an elephant driven by a boy.\nWide angle view of a girl in a living room watching television.\nA bathroom with the door opened to a toilet and separate sink and vanity area.\nTwo people sitting back to back on a train.\nA red double bus is traveling down the road.\nA man and woman on a couch playing the Nintendo Wii.\nTwo zebra standing next to each other on a hill.\nA WOMAN TAKING A PICTURE OF CAT IN THE BATHROOM\nA family open Christmas presents near a Christmas tree.\nA sleeping woman cuddling a cat in bed.\nPedestrians walk on the sidewalk of a busy city\nA man is sitting by a river and brushing his teeth.\nA clean and bright kitchen with hard wood floors.\nA professional baseball player holding a bat on 
the baseball field.\na bird in a tree branch with green leaves\nA plate with stir fried noodles, broccoli, beef and carrots.\na couple of bowls with some fruit in them\nTwo people on horseback are posing while the horses gallop on a beach shore.\nMan standing on shoreline by ocean holding surfboard\nA large white tank sitting on top of a green lawn.\nA white table with two laptops and a bag on it.\nTwo hot dogs on buns next to a glass of water.\nA pizza toped with cheese and met on a wooden table.\nA photograph of a kitchen in the day.\nA group of giraffes standing around their enclosure\nA bedroom with a large bed with a white comforter.\nA boat floating out in the ocean next to a  shore.\nA man on skis walks on the ground.\na bedroom with two big beds covered with green blankets\nA ,man holding a boys legs learning to surf\nA male surfer performs a stunt in the ocean\nA bear reading a Christmas book in four separate shots\nA pizza with pepperoni and sausage sits on a baking pan.\nA group of kids that are sitting around a table.\nA bathroom with a white toilet and white tub.\nA morotcycle sits parked near a curb where two people are walking.\nMotorcycles standing in a row in a museum.\nPicture of an exterior place that looks wonderful.\nA toddler eats cake with his hands in his high chair.\nA thing leafy green tree branch with many oranges.\nThere are many cows on both sides of the road.\nTwo zebras are standing close in a field.\nA CD case is sitting on a bench.\nA cat laying in a bathroom sink, looking at the camera.\nPeople browse and relax in a wine store.\na man sits on a bench while petting his dog\nYellow fire hydrant with a blue top sits on sparsely cut green grass.\nA couple of giraffe standing next to each other near  a fence.\nA person with their feet on a coffee table in a living room.\nA man sitting in a motorized raft in the water.\nin a baron field a heard of zebras move about. 
2 seem to be fighting\nA smile white dog by a bike on the road.\na close up of a person playing nintendo wii\nA multi layer platter filled with different types of cup cakes.\nA woman with pink hair riding a motorcycle.\nA plate with different vegetables and bread on it.\nGroup of four zebras standing in a field of grass.\nA man holding a Frisbee about to throw it.\nThe baseball player at bat is hitting the ball\nThree different road signs are stacked on top each other, as a man on a bike approaches.\nTwo children are on surfboards in the water.\nA woman is seen in the kitchen cooking on a white stove\nA bus sitting on the side of the road.\nA man riding a skateboard down a street next to a  tree.\nA dog that is wearing a dog collar smiling\nA small child holding a remote and a remote controller.\nA person on a snowboard riding it in the snow.\nA purple bird perched on a tree branch.\nDouble decker bus that is blue and green\nA man with a playful look standing by a dessert.\nA skateboarder performing a trick on his skateboard.\nA man on skis on a snow covered slope.\nA man riding on the back of a bike.\nA man holding something with some beakers on a table like a science experiment.\nThree people are looking at their cell phones and drinking wine.\nA small child and a baby are lying down together.\nA baby is laying down with a teddy bear.\nA group of young adults play frisbee in a park.\nCar parked in front of a donut store.\nthere is a baseball game on and a player is preparing to run\nA close up view of Italian mini hoagie sandwich.\nA close up of a man's hand holding a cell phone on his lap.\nA man playing baseball prepares to run after batting.\na herd of cattle on the field grazing\nA bed with a purple bedspread on it in a room with a picture on the wall.\nTwo very large vehicles side by side on a street.\nTwo red two-story buses are parked outside of a building.\na dish of food some small plates and a wooden fork and spoon\nA picture of a restaurant interior is 
taken through a fish-eye lens.\nA young girl standing on a field with a flock of birds.\nA man skiing down a hill covered in snow.\nA man standing on top of a field holding a bat.\nTwo bottles of champagne sit in an empty fridge.\nA heart shaped cake with bear decorations on a pedestal.\nA trash can and a white toilet in a room.\nA couple of giraffe standing on either side of a tree.\nA made-up bed in a drab-colored hotel room.\nTwo dogs are tugging on the same Frisbee.\nA man holds up a hot dog covered in toppings.\na person wearing shirt and tie and looking up.\nA yellow packet sits on a wooden bench.\nThis photo is shot from a side angle, capturing the dog looking out of the car window.\nA dog looking out the car window as seen in the side mirror.\nA group of people eating and drinking in a restaurant\nA clock tower overlooks the city and tells the time\nA hedge row with rock pillars and a blue gate with sheep behind the gate and a mountain in the background.\nA ship is coming in to port and is about to be docked.\nA red towel hanging in a black and white bathroom.\nA skier is posing in skis and with poles.\nA woman riding a wave on top of a surfboard.\nChildren in suits and ties are standing together\nThe bathroom is clean and ready to be used.\nSome green bananas and coconuts are sitting on a picnic table.\nSeveral elephants walking together in a line near water.\nA group of baseball player playing a game of baseball.\nA fully tiled bathroom with a bathtub and bowl type sink, and a wooden framed mirror.\nA crowd of people standing on loading platform between two trains.\nA cart by some water loaded with old traveling trunks.\nA stop sign is shown behind two trees.\nA man on a tennis court about to hit a ball\nA furniture store display, with a chair and set out\nA couple of people on the snow putting skis on.\nA building sitting along side of a street.\na couple of zebras are standing in a field\nWomen and a child in a boat made of tree trunks\nA woman holding 
a small boy while a man feeds him some rainbow colored cake.\nSmiling orange shirted sports fan using cell phone.\nA vase filled with lots of different colored flowers.\nAn oriental temple of some sort somewhere in the world.\nA man is wind sailing on the lake.\nA woman holds a baby on her arm and both are looking forward at an enclosed area with two giraffes in it.\nTwo baseball players from different teams holding their baseball caps against their chests.\nA shirtless man on a beach with a disc in his hand.\nMen and women standing and crouching in front of a door.\nA child is holding a baseball bat at a game.\nA man is holding something up that says PPK\ntwo people riding motorcycles on a city street\nA cat sitting on a towel that's covering a plastic chair.\na black and orange cat in a shoe box and shoes\nA small bird on an orange chair back.\nAn SUV parked in front of bock of businesses.\na person riding a skate board on a street\na stop light a md del line road\nA little boy is eating a donut with white frosting and blue candy.\nA zebra running through the brush tail swinging\nthere is a man and a woman posing in a kitchen\nAn old fashion oven is shown in  dim lighting.\nA long exposure photograph of a tattoo'd man skateboarding.\nA group of friends gather on a hill to enjoy a day of sking\nA single tulip is seen in a small vase.\nSmall bathroom with a shower with red curtains on it.\nA truck traveling down the street near a fire hydrant.\nA hand is near a pizza that sits on a silver platter.\nA woman jumping up from a wooden park bench.\nA double decker bus waits at a bus stop.\na box holds some gloves and old-fashioned photographs, with ties hanging above\nThe police officer is riding the motorcycle threw the streets.\nA silver vase sits on a wood surface with sprigs of silver leaves in it next to a leafy green plant.\nA bowl has a dish that contains broccoli and mushrooms.\nA zebra bends over to pick up a stick off of the ground.\nA bunch of bruised apples 
sitting on the cement\nA big green Ford F250 Pickup truck parked in the city lot\nA dinner of a pork sandwich and french fries, with beer as a beverage.\nStreet sign on pole outside of building with windows.\nA window with so Michael light coming inside\nA boy riding a skateboard down a hill.\nA man with short grey hair talking on a cell phone.\nA crowd is shown walking on the street.\na man that is on a skateboard on a ramp\nThree mountain goats sit and stand on a rocky cliff.\nA woman on a sidewalk against a wall on a cell phone.\nTwo children are playing frisbee on the beach.\nA young man spreads his arms to steady himself in mid air, as he and his skate board soar over the pavement below the concrete stairs.\nThe two cats are laying on top of the computer desk.\nThe huge airliner has four engines on it's wings.\nA double decked bus from behind in front of building.\nA tall giraffe is observed by people at a zoo.\nA snowboarder makes a somersault on a snowy course.\nA child drinking from a bottle in one hand and holding a remote control in another.\nA man sits at a table and takes a drink of his beverage.\nA flower bouquet in a glass vase and some writing on the photo.\nan old picture of a person riding a bicycle\nA man stands next to a very small plane.\nThe yellow fire hydrant is rusted on the sidewalk.\nPeople walking and waiting around a baggage claim area.\na person wearing a suit and tie\nA man who is holding a tennis racket.\nA black dog laying on bed with a striped comforter.\nA microwave oven door with a light bulb on inside it.\nCar's driving on a city street lined with houses.\nThere is a vase with red, yellow, and orange roses.\nA cheesesteak with a bite out of it along with someone else holding one.\nTwo teams compete for the ball during a soccer game.\nA couple of neon signs sitting above a bar.\nA kitchen stove with a microwave in the cabinet above.\nA set of electronics and appliances sitting next to each other.\nA pole with a clock on the top of 
it and a building in the background.\nA very uo close and personal look at a sugar glazed donut.\nA lady tennis player is bent over slightly and off the ground.\nA basket filled with items on top of a table.\nA stack of plates is adorned with pictures of round cats.\nHorses pulling carriages on the sidewalk along the ocean or a large lake\nA women who is swinging a tennis racket as two others watch.\nA white vase filled with different colored flowers.\nA man posing next to a couple of bikes on a street.\nA girl sits between a mans legs on a skateboard.\na man is in the air on a skateboard\nA red sports car parked next to a truck.\nThree men ice skating in a line while one juggles a Frisbee on his head.\nA group of people riding skis on top of a ski slope.\nA bride and groom are sitting outside on a bench.\na couple of guys that are standing up with a wii remote\nA parking meter on the side of the road.\nEmpty wrought iron bench outside the house on a tile base.\nA cat drinking water from a toilet in a bathroom.\nPeople are sitting and eating in a cafeteria.\nA group of three people riding waves on surfboards.\nasian woman with umbrella smiles at the camera\nA train pulling several train cars full of coal.\na desk with a monitor a keyboard and a mouse\nMan serving sliced pizza in brightly lit kitchen.\nPlanes lined up at the airport arrival gates on a snowy tarmac.\na bunch of kites being flown in the sky\nA surfer stands with his board on its back in the water.\nA subway car is coming down the tracks\nA woman walking on a stone wall near two giraffes and a zebra.\nA man standing on top of a snow covered slope with a snowboard.\nThe fire hydrant is by the building on the grass.\na silver car is parked in a lot\nA cat is sitting on the dashboard of a car.\nA plate of salad at a table setting with a glass of wine.\nHorse statue displayed on stand in park setting with trees and flowers.\nA child mixing food in a bowl on a table.\nA pair of skiers sitting down looking 
at the scenery from a top of a hill.\nVery finely made vases with painted designs on them.\nA man is in a restaurant eating sandwiches off paper.\na close up of a bowl of broccoli on a table\nAccident scene with a fire truck tilted on its site.\nA room that has a couch, chair, and table in it.\nA plate of meat, broccoli, rolls and rice with gravy.\nA health hazard sign closing a beach to watersports with a sailboat in the background.\nTwo men laying on the ground near parked motorcycles.\nA small airplane that is flying in the air near the airport\nA dog laying on the ground with a pink frisbee in it's mouth.\nLittle boys playing soccer together on a field.\nA group of people wearing orange are standing next to a VW bus.\ntwo men standing in a room near two microwaves\nFishing boats docked in a harbor with mountains in the background.\nA train can be seen in the foreground and a shipping dock in the background.\nA blue ceramic vase with fresh flowers on a window sill.\nTall buildings surrounding an alley way with birds flying over it.\nSome people in white overalls working with some metal bars.\nA table in the kitchen of a building with screen walls\nThere is no image here to provide a caption for.\nClean plates, cups, and spoons drying on a towel.\nTwo people  seated at a table with other people in the background.\nTwo birds are standing among leaves and sticks.\nA snowboarder holding a board while looking at the mountain.\nA large wave with some people on surfboards\nSomeone is surfing the breakers under a sky filled with fluffy clouds.\nA sign with a gnome crossing symbol on it.\nTwo people playing a video game on a large television.\nA busy city street with a traffic light on it.\nA black and white photo of a cow running in the desert.\nSome people are flying a kite at the beach.\nA man sitting on the floor with a laptop as others walk by.\nAn old bus being driven by a beard man.\na clock is up near a statue of a bird\nA man shaving in a large bathroom 
mirror.\nA man stands at a counter with food items.\nA skier skiing between poles on a ski course.\ntwo girraffe standing in an open field with their necks crossed\nA bus with the windows broken down sitting in the open area.\nA girl on a surfboard that is on the ground\nA freeway is busy in the late evening.\nA garden with yellow flowers on a sunny day.\nA child skier is headed down a small slope on their skies.\nA group of people stand in shallow water near a wind farm.\nA white toilet is shown in an all black bathroom.\nA laptop is powered and sitting next to a mouse and a cell phone.\nA train that is on the rails in a station.\nA man rides an elephant as it crosses a river.\nA boy is eating a dessert on a table.\nA woman, man and child standing near a food truck.\ntwo hands are holding white video game controllers\nA fresh fruit plate with grapes and oranges.\nTwo female tennis players shaking hands over the net.\nA group of people are on a platform above giraffes.\nThe boy is sitting on his blue suitcase.\nA large boat is carrying a smaller boat through the water.\nA small clock on a pole in front of a building.\nTwo beach chairs with towels draped over on a beach.\nLawn area outside a McDonalds, no customers, appearing closed.\nSome men are putting lots of bananas into piles\nA cat is looking up next to a large television.\nA man dressed up in renaissance clothing talking on a cell phone.\nFour soldiers and retired officer jointly cutting ceremonial cake.\nA cat sitting on the edge of an open car window.\nThere is a woman that is riding on a bike\nA traffic light with a red light and an arrow pointing to the right.\nA group of horses stand beside water and grass.\nA group of bikers riding down a busy city street\nA large cat is laying belly up on the bed\nA fire hydrant is sitting on a sidewalk.\nA lone zebra standing next to a sheep in an enclosure.\nA woman and a man with a surf board on the beach.\nA red fire hydrant raised up in the grass.\nTagged animals 
are grazing on grass in a field\nA photographer with his nice camera walking in a dirty road\nPerson in a black wetsuit and gorilla mask carrying a surfboard on a beach.\nA woman sitting on a curb with her feet on top of a skateboard.\nA room with a large clock sitting next to a wall.\nA motorcycle sits on the side of a building.\nA young surfer rides the side of a wave.\nA meal of broccoli and some kind of meat.\nA bunch of cows in a field with a man standing near the fence.\na vandalized stop sign on a city street near a pole\nOld style computer with keyboard and mouse sitting on rug.\nA close shot of two separate trains.\nA heavy set woman wearing a gray sweater holds a brown teddy bear.\nA kitchen with a counter, refrigerator and a dishwasher.\nA hotel looking room has another room through the door.\nA grass yard that has a large sheep laying down on the grass next to a dog.\nA woman and a little girl in blue are making pancakes and another person with her hands are putting on some cheese.\nA large brown dog standing on the side of a small road.\nA bunch of people who are standing around a table.\nA dog is lying on the couch with its head on the arm\nA bird sitting on a branch looking away.\nThere are moving motor vehicles on the road.\nA woman playing tennis on a clay court.\nA blue and gray commuter bus traveling through a shopping district.\nA person sitting at the edge of the surf in a wet suit.\na black and white cat wearing a neck tie\nA giraffe is in the wild standing next to a tree.\nA group of people sitting around a table.\na suit case on the floor with a hat on it\nA little girl sitting on a bed with a teddy bear.\na person in a blue jacket and is rowing a red kayak\nA paper plate holds two slices of pizza.\nThe giraffe is walking beside the chain linked fence.\nTwo elephants are walking through the mud in a clearing.\nA turkey that is cooking in a large roaster oven on the counter.\nA man eating a piece of pizza on top of a plate.\nA cat sits on top 
of a laptop computer.\nA room of people standing around playing video games\nA small black and white dog with its head on its paws\nA small beige dog with short curly hair.\nNo dogs, only teacup poodles OK sign and fire hydrant.\na couple of people are typing on their cellphones\nA woman is holding a phone and sitting in a chair.\nA pan has fruit and vegetables on it.\nA white dog stands on the back of a sofa.\nA bedroom scene with focus on a bed and a teddy bear.\nMen work on the basket of a hot air balloon.\nA hand holding a small orange Japanese umbrella.\na statue of a cat sits next to some scissors\nA great view of a street in the picture.\na man is doing a trick on a skateboard\nA man standing on a tennis court holding a racquet.\nAn armchair with a stuffed bear on it on the sidewalk.\nOne small and large giraffe standing next to each other.\nA cat sitting on top of a grey cloth and next to two staplers.\nA older man enjoying a variety of pastries and breads.\nA couple of fans have painted their faces red in a large crowd.\nA person holding a hot dog on top of a bun.\nWoman sitting near a table eating a cake.\npoint of view shot of man using a small urinal in bathroom\nMan throwing a disc at a bush park.\nA zebra eating grass on a sunny day.\na person riding a surf board on a wave\nA vehicle with Melbourne Tigers painted on the side of it.\nan orange small van and a white surfboard\nA gray remote is sitting next to a black remote on top of burgundy fabric.\nA man sitting on top of the snow holding skis.\nPeople are meeting around a circular table.\nA train stops at a vacant train station.\na ball player holding a bet and some business men\nsome people in a room with tables and two are playing a video game\nTwo women on a park bench looking at a digital camera.\nA group if people that are sitting on a park bench.\nA red fire truck toy on a table.\nA man and woman on their cell phones by an umbrella.\nA dog is watching a man ride a skateboard on his stomach.\nA 
carousel view shows the circular.. lighted center and several rides, including horses, a giraffe and an elephant.\nA man is holding a large pepperoni pizza.\ntwo birds standing together on a rock\nA table with utensils, glass, plate of bread and salad, and stones, on a stone patio with chaise.\nA group of people standing around each other near a tent.\nA group of elephants that are in a field.\nthere is a flower in the glass vase on display\nA tennis player in a blue shirt runs toward a ball.\nTwo men are skating on their skateboards in the middle of the afternoon.\nCows grazing on the side of a mountain covered in green grass and trees.\nA small locomotive engine blowing a cloud of steam.\na couple of people on a tennis court pose for a picture\nA person wearing stiletto heels laying in a bed.\nA bride and groom exchange a fork-full of cake on their wedding day.\nA nerdy woman brushing her teeth with a friend nearby.\nThere are lots of kites in the sky by the beach outside.\nA man throwing a Frisbee on a sandy area.\nTwo oranges on a cutting board with a zester full of rind on a counter with pot in back.\nA TV sitting on top of a wooden dresser.\nA dog with closed eyes sitting on a cushion.\na close up of a cat walking on a brick surface\nthere is a cow that is drinking water from a hose\nA woman helping along man put on a tie.\nA boy laying on his side typing on a laptop computer.\nA pair of dogs lie down beside each other on a bed.\nTHERE IS A SINK AND A SCREEN DOOR IN THE HOSUE\nFour individuals on a basket ball court, one of them holding a tennis racket.\nPeople standing in the sand flying colorful kites.\nA large fancy clock on a building showing the time of 1255pm.\nA blurred picture of a laptop and a box of tissues.\nA couple of one way street signs hanging on a traffic light.\nA bull and two calves block a vehicle from going down a road.\na white plate with meat and a green vegetable on a glass table\nA subway train painted with graffiti pulls up to a 
platform.\nA person surfing a wave on a yellow surf board in the ocean.\nA dog is sitting on a work bench in a shop.\nCrowd of people at public market in urban setting.\nA man uses an oar as another man looks on\na adult sheep stands by a tree as some baby sheep look on\nAn old man with a tooth brush head under his nose, mimicking Hitler\na little girl with her teddy bear sitting in front of a morror\nA group of older people sitting next to each other eating cake.\nA police motorcycle is parked at a festival procession.\nA guy walking on a field holding a Frisbee.\nan image of a person making a video game character\nA small bathroom, with only the toilet and sink visible\na person riding a skate board at a skate park\nA bulletin board filled with blue pamphlets on a city bus.\nA small bird sitting on a thin tree branch.\nTwo people and their dogs skiing along a trail in the woods.\nA living room with red carpet and blue couches.\nA man sitting at a desk with a laptop and a coffee mug.\nA herd of sheep and cattle standing on a lush green hillside.\nA woman in plaid shirt looking at a bird on a ledge.\nthree baseball players standing around  a base\nA large airplane flying high up in the sky.\nSunset seen across an expanse of calm water\nA man is holding an umbrella beside a truck.\nStalks in a ceramic vase against a mustard background\nA man in white hoodie sitting in front of a leather couch.\nThe baby elephant is walking with a small object in it's trunk.\nMale surfer in a wet suit, on a board, about to be overcome by a breaking wave.\nBaby elephant with ears spread standing in front of larger elephant.\nA photo of stuffed large animals taken through glass.\nTwo plates full of breakfast foods are next to cups of coffee.\nThe two slices of toast each have cheese on them.\nTwo young men trying to catch the same frisbee.\nA flotilla of small boats circle around water buoys.\nTwo colorful parrots perched on a tree branch.\na group of people watching a baseball game\nA 
photo of a yellow and green fire hydrant.\nA yellow teddy bear on a little girls bed\nA train station with a red,white yellow and blue train pulling in on the tracks\nthere are many cows that are laying in this barn\nWhite dog sticking its face inside a white toilet bowl.\nTwo guys in a bar eating pizza and drinking beer.\nA white carriage with a white horse carries passengers through a city square.\nA man crouched down with a camera next to a small white horse\nthree laptop computers sit on a table in front of a television playing the opening scene of a Star Wars movie\na teddy bear with a hat placed on his head\nA brown horse with pink and black harness stands before a business with a short white fence.\na bathroom with a very dirty toilet and sink\nA stainless steel sink is next to shelving in a room.\nA small dog is beside a laptop computer.\nA long row of buses driving bumper to bumper near trees.\nRoad sigh on wooden pole shown upside down next to white wall.\nTwo leather clad motorcycle riders on a paved road.\nbaked round bready pieces of food piled on a plate next to bowls of vegetables and other sauces\nA red vase with  dozens of roses sitting on a piano.\nA man performs a trick on a skateboard in a skate park.\nSnow-dusted evergreens and rolling hills mark the distance, while in the foreground a hunched over skier moves through a dip between two snow-packed slopes.\nA dog is running on the beach sand.\nA lone horse shades himself under some trees.\nA pan with pizza and its cutter on it sit on  a stove top.\nA half eaten cake sits on the table with a knife.\nThree people water skiing at the same time while folks in another boat watch\nA woman in a bar is wearing a tube dress.\nA neat and modular kitchen with electronic gadgets and dining table and two chairs.\nA clock tower and other buildings in a city.\nA chocolate cake sitting on top of a table.\nA woman and a man playing an interactive video game.\nSome men hugging each other and a person with plates 
on their head and shoulders.\nSeveral people are sitting around a table having a business meeting.\nA woman is checking her phone outside on a fall day.\nA young boy who is eating a chocolate doughnut.\nA man who is sitting at a bar.\nA man wearing a red neck tie and a blue jacket.\nA display shows hummus and vegetables on white trays.\nA black train sits on the tracks of a station.\nTable with food on it including bananas and rice.\nA skate boarder is doing stunts on a bench.\nA hand is cutting into a large white cake.\nAn old truck with a broken side view mirror.\nA lady with a brown hat and long white socks sitting on a wood bench.\nSun peering through leaves of a grand land scape in distance\nA group of friends posing for a picture at a deli.\nA large passenger jet on an airport runway near the coast.\nA man standing next to a white horse.\nA baby girl is using two brushes to arrange an older child's hair.\nA young girl is sitting on a bench in front of a rock cliff.\nA dog laying on a bed under a pillow.\nAn adorable grinning girl laying in bed between ms piggie and kermet the frog.\nCoffee and powdered sugar doughnut on a woven cloth.\nTwo pieces of meat covered with gravy next to broccoli on a plate.\nA flock of birds sits on top of a large giraffe.\nA statue of man sitting on a bench overlooking the ocean.\na train on a track near a platform with people near by\nTHERE IS AN INSIDE OF A KITCHEN IWTH A STOVE AND A DOOR\nA baseball player is holding a base ball bat at the game\nA young man ridding a surfboard down the rapids of a river.\nA red stop sign under a street light.\nOne man is on skis and another man is behind him as they stand in the snow near a pond as a group of onlookers stand off to the side.\nA toy truck sits on top of a table.\nThe small bathroom has an electronic toilet near the sink.\nA clean handicap restroom with plenty of toilet paper.\nTwo multicolored cows cross the road very slowly.\nA airplane in a field with a freight train going by 
in the distance\nAn equestrian riding on the back of a horse at an event.\nAn picture of an old building with two towers and a clock is taken from below.\na computer that is sitting inside of a room\nThree young people sitting at a table and enjoying some lunch.\na brick building with a blue sign on it in front of a metal pole\nA group of people posing for a photograph at a black tie event.\nA man sitting in a chair while working on her laptop.\nA man is jumping as he tilts his skateboard to the side with his feet.\nA hand holds a ball in a green sign that sits on a post.\na woman stretches high to hit a shot in tennis\na close up of a dog laying on a bed\nThe people are drinking from cups and smiling.\nTwo ovens next to a plastic bucket and trash container.\nA giraffe enjoying the company of another giraffe.\nA warning sign in front of train tracks.\na laptop with some other electronics on top of it\nA bench is in front of a flower bed.\nThree airplanes are lined up for take off.\nA couch is sitting outside on the curb by the pole.\nA cat relaxes in a suitcase next to a pile of clothes.\nCooked broccolini and greens on a white plate.\nA leaning stop sign has a street sign on top.\nthere are many vegetables sitting on this counter\nChildren looking at a zoo giraffe and its baby\na motor bike sits parked on some ply wood\nA donation station on the side of the street\nA group of vehicles in street area next to a building.\nA sumo wrestler is shown wielding a baseball bat and awaiting an incoming pitch.\nA bathroom sink with no mirror behind it\nA shiny new racket looks down upon the worn shoes of a tennis player.\nThree people in the distance riding horses along the beach.\nA young skier headed down to the ski lodge.\nThe beat up car is parked beside the building with a statue of several men.\nA large long train on a steel track.\nA dog is on the grass playing frisbee.\nTwo zebras chasing each other in an enclosure.\nThere is an animal walking along the hill.\nA 
group of long horn bulls in a field.\nA man is laying on the floor of a hotel room next to an open suitcase.\nA view of a dock with a lighthouse in the background.\na person holding a hot dog with mustard and ketchup.\nA man on skis on a mountain trail.\nThere is a hotdog and a side dish on a plate.\nA black and white picture of a jumbo jet parked  on a runway.\nA black and white photograph of something I cannot quite make out.\nA zebra standing on a field next to lush green trees.\nThe man is sitting on the bench by himself.\nA girl is petting a cow through a fence.\nA woman wearing a short skirt kneeling on a tennis court.\nTwo black, white and orange stand on the grass near a cliff.\nthe baseball players are talking on the field.\nPeople are walking on the street by a homeless person.\nA grey clock tower above grassy area and building.\nAn elephant standing next to a body of water.\nA person trying to fly a kite on a beach.\nA dog with a bandana and goggles sits on a red motorcycle.\nA pile of identical teddy bears lays on top of some pillows.\nA couple of red double decker buses sandwiching a small white bus.\nA grove orange trees filled with juicy oranges.\nA man riding a snowboard down a snow covered slope.\nPeople and cattle standing at the waters edge on a bright sunny day.\na little boy retrieving a mans banner flags that have broken\nan overhead view of people at desks working on their computers\nA person in a wet suite running beside the water, holding a surfboard.\nA pair of seagulls resting on the top of a lake.\nA close-up photo of a piece of broccoli upside down.\nA large tall tower with a clock on the top.\nA snow boarder in mid trick with the Hilton in the background\nA bear and a dog sitting together on a hillside.\nA person that is catching a ball in a baseball game.\na guy standing in front a large building holding a tennis racket\nThree people sitting on a bench in front of a lake.\nA bus driving down a city street next to tall buildings.\nA 
couple of little kids sitting in the grass.\nA black plate topped with lasagna and garlic bread.\na bus is driving down a snowy road near the days inn and suites\nA vegetable and fruit stand on display at the market\nSeveral bundles of green and yellow banana's hanging around a table.\nA group of people who are sitting on couches.\nBoy performing a trick in mid air on a snowboard\nThe people are cooking out and have hot dogs on the grill.\nTHIS IS A PHOTO OF A SITTING AREA WHERE SOMEONE HAS PLACED THERE LAPTOP\nA young child sitting in front of a pizza.\nA Siberian Husky dog is being brushed while he lies on the floor.\nA toilet area with bright and colorful wallpaper.\nA group of people playing a video game on Nintendo Wii.\nThere are plenty of apples to choose from in this outdoor market.\nA woman and a small child watch a train as it passes.\na lock on a door under a window\nTwo vases of flowers are sitting on a counter top with bears.\nA class room full of students and they are on their laptops.\nAn old stoplight with a clock and a troll doll next to it.\nBuses are parked near a field with a fence.\nA lady is pulling her luggage through a terminal.\nA group of sheep gathers under a tree in a grassy field.\nA BOY JUMPING OFF A CRATE WITH HIS SKATE BOARD\nA black and white cat sitting on a suitcase.\nAn older man downhill skiing down a slope.\nLarge canyon surrounded by a series of trees.\nA waiter lighting candles on a cake at a restaurant.\nA man with skiing gear on top of a mountain with snow\nA man skateboarding on a skate ramp at night.\na person spraying a horse with a water hose.\nA man and a woman ties a boat to a wall.\nI am unable to see the image above.\nA black and green vintage engine moves along tracks by a station.\na see through sun roof cover is being used\nA person in a room with knives and scissors hanging on the wall.\nA train pulling several carts traveling down the rail road tracks.\nA table with various meets, breads and tomatoes on 
it.\nThe two ladies is outside  talking in the rain\nA woman in a plaid cap cross-country skiing in a group.\nA small child with a kite walking on a beach.\na man sitting  in a lawn chair in the snow\nA window with a bench is under a staircase.\na sign on a pole advertising free bus rides\na bunch of students stand around on the field behind some school buildings, playing Frisbee\nA family who are selling bananas in a portable cart.\na person is holding an old cellphone outside\nGirl in a dress throwing a red frisbee.\nA group of people in the ocean on surf boards.\nA large crowd of people gather in a square with Capitol Hill in the distance.\nA plane with a sign attached to it flies high over an ocean beach.\nThe kitchenette has a stove, microwave, and sink.\nA red, white and blue train filled with passengers.\na close up of a vandalized stop sign\nA woman playing tennis holding a pink umbrella\na man and woman recline in a bed, each with their own laptop\nThere is a purse on the floor with its contents spilled out\nA bunch of surfboards are standing in a room.\nA man and a woman are cutting a cake while others watch.\nkitchen with a wooden kitchen island and checkered floor\nThe two elephants walk next to each other in front of the water.\nA bus filled with little monitors displaying video.\nA laptop sitting on a couch with cell phone on a table.\nA tile floor in a bathroom with a urial.\nA jet airliner is on a runway on a cloudy day.\nA vase of flowers sitting on a checkered table cloth\nTwo vases filled with white and purple flowers.\nPeople walk up the stairs to get on a small airplane.\nThe young child is eating from a spoon.\nAn end table with a vase, remote, phone, candle and wedding picture.\nlarge tourist clock near a body of water.\nA parked motor bike on the side of a the street.\nA bowl of fruit is presented with a pitcher of water.\nA woman that is hold a device in her hand while standing on a court.\nA strawberry shaped cell phone holder hangs from a 
belt loop.\nA silver train traveling into a train station next to a platform.\nTwo women, one with glasses standing next to a sheep pen\nA man is in the park with a hula hoop.\nA car that has the front of it open.\nA street scene with signs and people on bikes.\nA plane is flying in the air nearby a mini van and rental truck.\nA bird perched in the top of a leafless tree.\nA stream with rocks outside by train, with hills and evergreens.\nA picture of a gross looking cheese pizza.\nA door is open on a white subway train\nView from behind of two women under umbrellas\nA man or woman skiing down a snowy hill.\nThere are sheep grazing together in the grass.\nan image of two horses at an outdoor park\nA woman and two men on skis on a snowy hillside surrounded by trees\nTwo women are riding motorcycles down the street.\nTwo street signs on top of a metal pole.\na brown horse with a brown nose laying down\nA hazard yellow Navy plane sits in a hangar.\nA couple of people playing Frisbee out side.\nSeveral remote controls piled up on a flat surface\nThe people are moving across the snowy mountainside.\nrecovery tow truck towing a bus from a parking lot\nA living room setting with two bookcases with books\na small refrigerator on the floor next to a freezer\nZebras are grazing next to a car in a field.\nThe curious kitten is looking down into the bathtub.\na yoilet is int he middle of a clean bathroom\nA young girl puts a TV remote control to her face like a cell phone\nA man engaging in a game of tennis on a court.\nA clock on a red building letting people know the time.\nA bunch of green bananas tops a large bloom.\nA brightly painted bus pulls out of a parking space.\nFront half of a commercial airplane on a runway closeup with dusky sky.\nA vase of flowers sits near a window with a blind.\nA street sign in front of an old building in Ottawa, Canada\nsome elephants are standing around some water\nA computer mouse, mousepad, and computer keyboard on a table.\nThis is a 
picture of a popular sking mountain.\nA woman walking on a tennis court holding a racquet.\nA man that is sitting down holding a telephone.\nA sand area that has various sets of vehicle tire tracks on it and one beach umbrella open and set up in the sand.\nan old man holding an umbrella next to a bare tree\nPeople fishing and enlarge Mountainlake with trees lining the shoreline\nA tractor sits on the back of a large truck in front of a clock tower.\nA boy feeding a giraffe something green with palm trees in the background\nThe clock tower sits in the middle of the pavilion.\nA sign indicates directions of travel in a circle\nAn elephant standing in dirt under a tree.\nThere is a giraffe that is standing at the fence and someone is petting the giraffe\nA young dog lies on a freshly made bed.\nThree green birds perched on a limb with the sky in the background.\nThree good friends having a bit of lunch and drinks together.\na person walking on a city street with signs and poles\na baseball player holding a bat on the field\na group of people on a paddle boat in the water\nSome bananas are placed on a cutting board along with some yogurt and a package of creel.\nA white toaster in the middle of an asphalt road.\nWe see a close up of a vegetable ad pasta salad.\nA couple of men playing a game of frisbee.\na diced up credit card next to some scissors\nA blue and red airplane is flying in blue skies\nTHIS IS A UP CLOSE PHOTO OF A PLATE OF FOOD\nRed traffic light at intersection on paved four lane road.\na cargo truck is loading a train with luggage\nKites flown in large grassy open area with numerous onlookers.\nMotion capture shots of a person riding a snowboard.\nA large long train on a steel track.\nTwo black bears being kept in an enclosure\nA man snowboarding down a snowy and hilly slope\nA female tennis player gets ready to begin play.\nOne tennis racket is place on top of the other one.\nA man on his motorcycle is attempting to mount before taking off.\na man holds 
his racket out while on the tennis court.\nA man in white shirt and shorts playing a game of tennis.\nA baseball player holding a bat while standing next to home plate.\nA cat lying on carpet with its head on a banana.\nA Macbook is placed on top of a book.\nMany brown and white cows are in a dusty field.\nA woman holding a tennis racquet walking near a little child and a man.\na person holing a hot dog with onions on it\nA black and white photo of a glass bakery shelf.\nA man on a boat preparing his fishing pole.\nA football game on TV reflects in a bathroom mirror.\nA baseball player holding a bat next to home plate.\nA couple buses are parked in a parking lot at night.\nA light brown dog laying on a leather sofa.\nTHERE IS A BABY THAT IS IN FRONT OF THE REFIGERATOR DOOR\nA man is filming something on a cellphone\nA little boy holds a toothbrush in the bath.\nA photo of a bathroom sinks and tub taken in a mirror.\nA cat sleeping on the contents of a piece of luggage.\nA laptop, mouse, cell phone and a notepad sitting on a table.\nTwo giraffes grazing the in wilderness with a mountain in the background\nA rusted locomotive on a hot summer day.\nA group of men standing next to each other holding snowboards.\nA man in a baseball uniform walking on a field.\nA train travels along the platform in a train station.\nThe table has most of the items needed to keep in the repair shoulder bag.\nA bird is perched on the branch of a tree.\nA metal stove sits under a granite countertop in a kitchen.\nA metal refrigerator freezer inside of a kitchen.\nA person in winter clothing, a helmet and skis, doing a trick i the air with jos skis crossed.\nthere is someone holding a remote in there hand\nThere is a rowboat out on the water in this sepia tinted photo.\nA bathroom with a shower and toilet decorated in pink and green.\nA  woman in black standing near a bus stop\nA black and white motorcycle parked on the sidewalk outside a store.\nLights shine on two matching, white, 
pedestal sinks.\nA man with a dog in his backpack walks down an aisle on a bus\na clock on a building with a sky background\nA new silver motorbike parked in a garage.\nA black dog laying on a bed under a blanket.\nA truck that is parked on the side of the street.\nA guy with a broom and dog stands on a surfboard.\nA young person laying down on a surfboard riding a wave.\na man and a woman are playing a video game together\nA boy is sitting on a hospital bed.\nZebra alone in a field of dry vegetation.\nA chef preparing food inside of a kitchen near  a window.\nsome people on some grass playing frisbee and some trees\nA young boy standing on his tip toes playing a game on the Wii.\na surfer laying on their board about to catch a wave\nA clock at 7 during a hazy autumn time of year\nTwo laptops sit atop a desk on either side of a phone.\nA man and a woman sit together on a bench.\ndressed up toilets in a toilet competition on fake grass\nA group of sheep are out in the fog wandering.\nA baeball player I l9e standing in a field\nA brand new black stove in a primarily white colored kitchen.\nA herd of cows walking across a river.\nA baseball player slides in to base to try and take it.\nThe plane is flying very low to the ground.\nA parked red and white motorcycle is shown from closeup.\nBlack British fighter jet doing a barrel roll.\na rusty old truck sitting in an overgrown field\nA pair of men looking at a tablet perched on a table.\nA pizza sitting on a table outside\na sotre front with bread in the display window\nA cat lies on a rug and chews on a banana.\nA bed is in a bedroom with two lamps on nightstands.\nWoman as seen through window of red vehicle.\nSpectators watch as a skateboarder performs a trick on a ramp.\nA girl is on her cellphone surrounded by fruit\nTwo people using cleaning brushes on an outdoor monument.\na lone zebra stands between some trees with a zoo sign in the background.\nThere is a row of parking meters on the street.\nA baseball player 
throwing a baseball during a game.\nCargo train is traveling on a track next to a forest.\nThere is a man putting bread on a shelf.\na tennis player swinging a racket at a ball\nA large commercial airline taking off from the airport.\nTables of laptops are visited by various people.\nA person riding a skateboard through the air on a ramp.\nA woman swinging her tennis racket on a tennis court.\nA surfer balances on a surfboard in the ocean.\nA steeple of a large building, with a clock on it.\nAn airport runway with a jet airplane ready for takeoff.\nA pair of people sit at a table with food and drinks.\nA laptop computer on a desk beside a paperback book.\nBowls of soup sitting next to oranges and limes on a table.\nA fire hydrant is shown on a sidewalk with a brick building nearby.\nA large tray filled with tasty looking food.\na boy throwing a Frisbee at night in a park\na close up of a man with a mustache smiling\nA flock of ducks swimming on a lake together.\nthere is a man in the water and a boat next to him\nA very crowded busy street with many signs hanging from tall buildings.\na surfer is out at sea riding a wave\nA man on a skateboard passing by large glass windows.\nA train is on a bridge next to buildings.\nThese are special repair vehicles used on train tracks.\nPerson laying on bed by a window reading a book.\na group of people seated around a dining table outdoors.\nAn old fashion setup with cakes, candy, and tea.\nA skateboarder doing a trick in a parking lot.\nA bench with memorial bears and flowers on it.\nUp close picture of baseball batter wearing gloves and helmet.\nA woman crosses the street while she talks on her cellphone.\nYoung adult male is surfing and riding the waves.\na surfer walking down the beach looking at the ocean\nThe young man is practicing on his skateboard.\nThere is a person holding a Wii remote in their right hand while holding the nun-chuck in their left.\nA man riding a wave on a skateboard in an ocean.\nA cat is laying on 
top of something on the side of the road.\nA man is talking on the phone while working on the computer.\nan older person passing out plates of food to young people\nA person on a snowboard is sitting in the snow.\nA team of baseball players are posing for a group picture.\nA plate of food with broccoli and different kinds of pasta.\nA woman laying on top of a surfboard next to a black cat.\nA  bedroom filled with bunk beds and a latter.\nA few kittens in a bowl in a white void\nA small dog wearing a colorful sweater leaning out a car window.\nan open pizza box sitting on top of a stove\nA donut with red, white, and black sprinkles.\nA woman running to hit a tennis ball with her racket.\na baseball player holding a bat in a batters box\nMilitary plane is being flown by a pilot\nA cow stands next to a calf inside a fence.\nthere is a man on the beach that is flying a kite\nThree boys playing a soccer game on a green soccer field.\nA bowl of corn chowder with broccoli, and a spoon on the side.\nan image of a group of people surfing\nA man surfing with a photoshopped character on board in front of him\nTwo zebras grazing in front of a large bird\nA zebra grazing on long dry grass in a field.\nA man and woman looking at their cell phones.\nA dog has a frisbee in his mouth outside\nA young man swinging around a Wii remote.\na field full of windsocks and cars parked in the background\nA couple of deer standing next to a zebra on a grass field.\nA stylistic pot and vase sit on top of a mantle.\nTwo stacks of towels are laid out on a bed.\nA classic car waiting at a 3-way stop sign.\nA bathroom scene complete with a toilet, sink and bath tub.\nTwo women sitting next to each other on luggage.\nA dog sits waiting while his owner cuts some meat.\nA living room with chairs and a wall of windows looking to a patio.\nRoom with a lamp on a wooden computer desk.\nA woman is reading a book with her head cupped in her hand, as she sits in front of a park.\nTwo towels hanging over a 
shower rack in a well-lit bathroom.\nFive jets flying in the sky and making colored smoke.\nA young child eats a hot dog on a bun.\nA train passing by a field that has been cleared.\nA cat who is sitting in front of a keyboard.\nSnowboarder going down the side of the mountain of snow.\nA bird is sitting perched on a branch.\nThere is a black cat that is sitting on top of a toilet\na little cat standing on the lap of a man sitting in a chair\na kid is riding on a surfboard at the beach\nA small dog sits on the driver's seat of a car.\nTwo toilet stalls in a bathroom with a black and white checkered floor.\nA giraffe standing on top of a grass covered field.\nTwo people siting together on a bench matching .\nSeveral people seated at a table with pizza.\na small white hand held remove control device.\nA person driving a motor bike through the sand on the beach flying a kite.\nTwo frames of a woman with a tennis racquet.\nA woman with blue hair and a giant toothbrush\nFour guys with game remotes playing a video game.\nA photo taken from a plane looking down at the mountains.\nAn airplane moves along a runway at an airport.\ntwo guys sitting next to each other with laptops\nA man jumping to strike a tennis board with his tennis racket\nA piece of cake sitting on top of a plate.\nA kite surfer flying above the ocean in his wetsuit\nMen and women sitting under umbrellas on the beach.\nVehicles at night on a highway near a large hotel.\na double decker bus is parked in front of a store\nA large teddy bear sits next to a red wall inside a toy store.\nAn abandoned red train car in dirt lot.\nTwo women sharing a plate of breakfast are happy.\nA porceline toilet sits outside on a sidewalk.\nThis is a train that is parked near a building.\nA warped photo of an unoccupied bathroom in a home.\nThe manufacturers box for the Nintendo wii on the floor\nFour people carrying luggage turning for a picture\nA dog with a purple Frisbee in its mouth.\nPeople standing on a sidewalk near a 
parked bus at night.\n6 people gather and socialize in a kitchen.\nA bird flies over the water near an island.\nGreen salad with broccoli and peas with fork and bowl\na woman laying on a bed with a sleeping cat and poodle\nA man posing in his office work cloths\nA sturdy, small brown horse looks back as he walks through the hot sands.\nA young blonde boy sheers the wool of a sheep.\na group of pictures of the same table with multiple trays of food\na man is standing on a skateboard around people\nA fuzzy picture of a man on skis\nAn old refrigerator is near shelves of bottles.\nA herd of sheep standing on top of a lush green hillside.\nTwo puppies playing in the green grass of their yard.\nA teddy bear is sitting outside on a chair near flowers.\na large kitchen with fancy counters and white cabinets\nThe pizza has more sauce than cheese and pepperonis.\nA picture of a group of people surrounded by bananas.\nA person in a purple shirt plays frisbee golf.\nA stop sign and street sign stand on the corner of a street.\nThe woman on the horse is racing the course.\nA rusted stop sign attached to a school bus\nA cow that is standing in the grass.\nA zebra taking a drink out of a basin at the zoo.\nA white horse pulling a horse carriage down a street.\nCat laying on the floor wearing a tie around his neck\nTwo white plates topped with french toast and fruit.\nA sub sandwich on a white plate on a table.\nA woman sitting in a car while her dog hangs out the window.\nA sheep in a field overlooking a lake and forest of trees.\nA pizza that is laying on a table.\nA young boy catches a soccer ball in his house\nLaptop and mouse sits on desk in front of computer monitor\nA pole with several street signs outside of a building.\nA boat floating on a river that runs through a city.\nA phone case, with a phone hanging on a belt loop.\nA young man standing next to a skateboard.\nA man sits holding jewelry near a woman.\nA cat sits on the seat of a motorcycle.\nA picture of an 
airplane flying high in the sky.\nA pickup truck with a camper is in a parking lot.\nA boy skateboarding down the a busy street\nA train sits on the tracks by the platform.\na microwave sits on a stands with a vase on it\nA food entree is served on a plate with skewers.\nA man sitting next to a Wii machine with a Wii controller in his hand.\nA building outdoors on a town street near some street signs.\na man wearing a wet suit in turbulent water\nA group of cows walking across a grass covered field.\nThe modified school buses are in a muddy arena.\nAn assortment of different pottery on elevated shelves\nA parking meter in front of building windows.\nA woman stands with her green and black luggage.\nPeople with a without surfboards watching a surfer in the water.\na blue vase holding some flowers next to a wall with a border\nA clock tower in roundabout next to an ocean.\nA half eaten pizza sitting on a table next to stuffed animals.\nA street riddled with garbage and people walking, sitting and standing around it.\nA woman inside of a room with many items plugged into wall outlets.\nA  child riding a bicycle with a lady sitting behind him.\nA kitchen table and bench made from a door\nA black bird standing in the green grass.\na couple of women take a photo of a bath room\nPlane on the tar mat of an airport.\nan image of a woman eating food at the restaurant\nA view from the street of two traffic lights and a building.\nA clean bedroom with a tidy bed and large windows.\nA man is on a field kicking a soccer ball.\nA CITY BUS IS PARKED ON THE SIDE WALK\nMany different fruits that have been organized by types.\nA set of three piles of ripe bananas.\na very big bus moving on the street with no people\nA newly married couple touching a strange mans hand.\nTwo business men with colorful ties looking to the right.\nA breakfast plate including potatoes, biscuits and gravy.\nTwo white toilets in a alley with a tiled wall.\nA old fashioned colonial dining room hutch and an 
anniversary clock on a shelf on the wall.\nstop light placed near the ground beside a white building.\nA man walks under an umbrella for The Bitter End.\nThe streetlight has several different  colored  lights.\nA woman sitting at a couch with two cats looking out a window.\nA boat is parked on the side of the dock.\nA hand sitting on an open laptop computer.\na couple of people that are playing a wii\nThree women are enjoying an outdoor lunch on a sunny day.\nA large black and white statue of a cow.\nA large building in the background with a clock and tower on the top of it and people walking in front of it down a sidewalk and paved area.\nA surfer is bent over riding the wave in to shore.\nan elephant in an enclosure at the zoo is walking\nA man in a tie and vest looks seriously at the camera.\nSeveral different kinds of vegetables on a black countertop.\nA woman stands next to a traffic light.\nEight busses are parked in front of a field.\nA white refrigerator freezer combo sitting in a kitchen.\nTwo older people walking two dogs on the beach with surfers in the water.\nThe people are dancing down the street with umbrellas.\na close up of a person in bed with a book\nA cake sitting on top of a plate with a knife\nA lady with a dog is talking to a lady and man.\nA toothbrush with round and straight bristles on it.\nFour people standing next to a net holding racquets.\nA clock that looks like it has melted sitting on the edge of a shelf.\nA peach cobbler is made in pizza style.\nsomeone sitting on the couch while they use  their laptop\nTwo elephants with grass in front of them in an enclosure.\nA very attractive and neatly kept bed room decorated in red .\nMeat and cooked vegetables served on a white plate.\nMan herding some skinny cows in a street.\na tennis player stretching to hit a serve\nBicycles in the bed of a pickup truck.\nA black cat sitting on top of a bathroom sink.\na black cat is hiding in a box with shoes\nA bouquet of flowers in a blue vase 
contains roses and large leaves.\nA woman is talking on the phone and leaning on he xar\nA kitchen with a black stove top oven.\nAirplane flying over the top of a White Castle.\nA man walking his dog on a quiet country road.\nA man poses with a cane and purple hat in front of a woman carrying an umbrella.\nA man speaking to an audience in an auditorium.\nA long train traveling along train tracks in a train yard.\na hand is holding a silver cellphone against a white background\nA flock of birds standing on top of a grass covered field.\nA picture of a bench outside by the water.\nA dog that is sitting by a computer.\nTHERE IS A BLACK BEAR THAT S WALKING IN TEH DEN\nMan up at bat in a baseball game.\nA couple of people pouring a glass of wine.\nA bird is perched on a large rock near the shore.\nA pot on the range with different types of vegetables.\nsome cupboards with a microwave sitting on top\nA person is holding a doughnut with coconut on it.\nThe confused man is trying to read the sign.\nA brown and white cat sitting on top of a desk.\nA view of a sign on the side of a building.\nA body of water near a city with ice chunks.\nA plate with a chicken breast, ear of corn and broccoli with sprinkled parmesan cheese.\na boy performing a skateboard trick in a skate bowl at night\nA baseball stadium with a crowd watching as a man holds his bat and another man throws a ball.\nA bus is stopped while three people are crossing.\nAn adult and a baby zebra are walking through the grass.\nA man on a tennis court holding a tennis racquet.\na yellow blue red and silver train engine and some tracks\nA small toilet in a wood walled bathroom\nA tennis player, playing in a stadium, in mid air.\nA tan dog's head poking out from a dark colored backpack.\nA bike is propped up against a building.\nA professional tennis player walks at the back of the court.\nTwo horses grazing on green grass in a fenced in area.\nthis is a close up picture of a roosters neck\nA cat that has just come 
through a doggie door.\nA dog in an open doorway with a pile of green bananas in front of the house.\nA skateboarder performing a trick next to a bike rider.\nA boy in a hat is smiling while holding a Wii controller.\nA cat lays between two parked bicycles in a black and white photo.\nThe batter on the Ray's  baseball team is celebrating a run, giving the incoming runner  his outstretched palm.\nThe woman is playing a video game on tv.\nThe woman is flying the kite on the walkway next to the water.\nA bathroom with a wooden vanity and large wall mirror.\nAn elephant with seat on its back standing by a fence.\nA little boy running on the beach with a kite.\na desk with a ton of televisions and monitors on it\nA dog standing by a truck pulling a trailer.\nBoat that just crossed under a bridge on the waterway of a city.\nA beautiful young lady standing next to another beautiful lady and a man.\na person riding a horse on a beach\na young child standing in the kitchen next to an oven\nTwo birthday cakes sitting on table beside each other.\nBright and shiny red motorcycle parked on the street.\nTwo people sitting on a bench by a tree outside a building.\nA man holding a tennis racquet pretending it's a guitar.\nA dog is lying on a bed with a red blanket.\nTeddy bears of all colors are in a big pile.\nThree boys walking along the beach carrying surfboards.\nTwo bathroom sinks under two mirrors next to paper towel dispenser.\nA very cute cat sitting in a corner.\nA very big pretty bird by the water.\nTwo small children ski down a snowy tree lined slope.\nTwo women hold umbrellas outside a store with a young girl.\nWoman in a folding chair with surfboard beside her on the beach.\nA small bird perched on a fir tree\nFour zebra stand near each other looking at the ground.\nA woman prepares a fruit smoothie inside a blender.\na street view of cars parked alongside parking meters on a one way street\nA man who is looking at a giraffe in an enclosure.\nA white bowl that 
includes carrots and broccoli.\nA white pickup truck is parked in a parking lot.\nSeveral planes are admired in an airplane museum.\nA woman walking down a street on a sidewalk.\nTwo cats perch on the roof of a car.\nPeople are walking down the sidewalk in a storm,\nThe sheep are scattered to graze in the field.\na young kid stands in front of a granite table with a train on it\nA pink cell phone sitting beside a tree.\nA woman is milking a cow into a metal pail.\nBoy doing a skateboard stunt with feet and board off the ground.\nPeople line up in the snow for pizza and soda.\nA woman holding a umbrella over her head.\nGroup of people watching something with man recording in room\nTwo large pizzas covered in sauce and cheese.\nA billboard on the side of building features a bull.\nTwo plates filled with hot dogs sitting on a wooden counter next to drinks.\nAn airplane sitting on the tarmac with several service trucks around it.\nA glass table with pink flowers and green plants.\nThere is a funny picture on the screen of the laptop.\nA dining table is set with many different dishes\nA bakery shop displays an assortment of cakes in a vintage case.\nA girl displaying a sad expression while she eats.\nA wooden trunk sitting outside with stickers on it.\nA city view shows architecture and people walking.\na cat resting on top of a luggage bag resting on a bench seat\nA person holding an umbrella leans out the train door.\nThere are two computer screens next to a lap top on a desk\nAn artist's rendering of birds flying past a lighthouse.\nA boat is traveling on rough waters in the ocean.\nA cobble stone path through a park leading to a bench.\nA beautiful red haired lady preparing food in a kitchen.\nThe guest of the wedding are gathered in a house.\nA train that is sitting on the tracks.\nA herd of elephants walking across a stony river.\nA smart phone is very companct and handheld.\nA surfer riding a wave in the ocean, performing a trick.\nsome snow coming down on some 
street signs and trees\nSeveral women sitting in front of a birthday cake and laughing.\nA colorful railroad train arriving at a station.\nA piece of cooked pizza that is on a plate.\nthere are two street name signs on a street pole\na man standing in the park while holding onto a frisbee\nA box that is filled with oranges in the grass.\nA view of a sign that reads steep descent on it.\nA young man on his skateboard next to a rail.\nA plate of cookies, a bowl of carrots and blue frosted muffins\nA blue basket filled with bunches of ripe banana.\nA woman prepares a large pan of food.\na sandwich with a bunch of mushrooms on a plate\na person sitting wearing a suit and tie\nblue and yellow train carts on  the tracks\nMulti-colored stuffed animals standing side by side in a shop.\nA guy with a pet sits in a parking lot\na man is looking into an oven opening\nA herd of elephants walking across a field.\nA woman standing next to a man holding a cake filled with lit candles.\nModern bathroom with two sinks a toilet and a shower\nA person holding a controller aiming it at a tv.\nSleepy dog guarding two remote controls on the couch.\nTwo checkered chairs and a clock in a room\nHe is flying over the steps on his skateboard.\nAn open box of pizza with toppings on a counter\nA truck pulling out of a parking lot onto the street\nSlices of vegetable pizza arranged on a white platter.\nThis is a bride and groom cutting their cake\nThe front view of a bathroom toilet inside a stall.\na man is watching a television on the floor\nA teddy bear is sitting down wearing a bow.\nA person is standing in the intersection of a street.\nTwo men in purple rush to catch a frisbee.\nA group of competitive cross country skiiers in a race.\na white bathroom a sink toilet and tub\nA person that is looking at something down the street.\nThe flat bed truck has a huge roll of tape on the back.\na man swimming on a large wave in the ocean.\nA cartoon version of a bed and bedstand\nA caved in street 
with a bench in the hole\nA yellow train on the track at the train station.\na cat is being fed by it's owner in a bed.\nTwo young boys in shorts at park with hands raised.\nA young boy pulling a pink piece of luggage.\nA man who is holding a surfboard and walking in the water.\nA young surfer riding a very nice wave.\nFour photographs of a man shaving his face.\nA painting of a dog holding a dead duck in it's mouth.\nSome people walk on the sidewalk near a busy intersection.\nRow boat sitting in the middle of a lake by building\nA man riding a skateboard down the side of a ramp.\nthere are many people that are flying kites\nA piece of cake with a fork and one and a half apples on the plate.\nA young man is riding his skateboard on the road.\nthis is a man ridinbg down a hill on skis\nAn elephant crossing the road behind a car that has just passed\nthere is a pink rose in a glass vase\nA couple of men walking with a large elephant.\na train on a train track on a city street\nA tennis player getting ready to serve a tennis ball.\nA man on a surfboard riding a wave.\nSome vegetables in a stew of some sort.\nA man riding a skateboard up the side of a ramp.\nPeople seated on a stone bench on cell phones.\nA photograph of a giraffe in the wild.\nPeople sitting at a bar with a lady turned smiling at the camera.\nTwo men standing in front of a TV playing with a Wii.\nA woman flying a kite and holding onto kite string.\nTwo people jumping up to catch a frisbee\nA man sitting in a chair playing a guitar in front of a microphone.\na black gray and white cat is sitting in a sink\nLarge motorized model plane parked beside air field.\nA young child at the table with a birthday cake and three candles.\nA very big display of many kinds of pastries.\nA baby sitting on a females lap staring into the camera.\nA boat tied up to the pier next to other boats on a clear day.\nA large clock outside of a window building.\nThis is an image of a giraffe with a city in the background.\nA 
skateboarder has his feet off the board before a landing\nCattle walking in open rutted field on sunny day.\nA girl making a \"peace sign\" with her hand and a woman holding a big black suitcase.\nA laptop is next to a desktop compute near a window.\nA basket ball player is posing in front of a basket.\nA large sandwich being cut by a person\nLarge, mild waves are coursing towards two boats.\nA giant clock is on the wall of a brick building between two windows.\nA skier stands next to skis stuck into the snow.\nA plate that has a sub sandwich on it.\nA table has a plant next to the glass doorway in the kitchen.\nAn old fashioned passenger train traveling through the countryside.\na person sitting at a bench with a skate board\nA woman tossing a frisbee on a lush green field.\nJockey riding a race horse on a runway.\nA horse grazing in a field witha blanket over its back.\nTwo men in suits with one man leaning on a railing.\nA man and child sit on the floor with game controllers in their hands\nA blue and green plaid tie with a flag pin on it.\nA tofu and broccoli dish simmering on the stove\nA couple of chow dogs sitting in a car looking onward.\nA crowd is watching a woman play tennis.\nA small boy with a birthday hat on holding a tennis racket.\nThe view from a motorcyclist's point of view, looking down a street.\nA man playing a guitar and other musical instruments\nBatter winds up ready to hit the baseball\nA chocolate caked frosted and topped with blueberries on a metal cake plate.\nA large clock mounted to the side of a pillar.\nA bird sticks its head into the water underneath a layer of plants.\nA baseball player standing in front of an A's poster.\nWoman in maroon shirt holding up a bagel.\nA couple wearing skis at a ski slope\nA white toilet sitting next to a large window.\nA red and black motorcycle parked on the sidewalk\na yellow cat going after some corn on the cob\nA baby cow with his ears tagged with yellow markers.\nA red and gold painted fire 
hydrant on the street\na cat sits on a wooden cluttered table\nThree Asian takes on hot dogs on display.\nA gray haired man is wearing a blue shirt and has a tie draped around his neck.\na cake with a section missing sitting next to a burning candle\nThese two riders are far ahead of the ones behind them.\nA woman talks on a cellphone while holding a pen.\nA plastic container filled with sliced carrots next to a yellow object.\nA girl swings a net a tennis ball.\nA cargo train that is traveling down railroad tracks.\nmany small boats in a large body of water\nAn older zebra and younger one nuzzle in a field\nThe clock on the post has faces on four sides.\nA woman holding her head out the side of a train.\nA person on a motorcycle with a stuffed animal on back.\nA flower is put into strange pots next to a plate.\nTwo zebras stand in the grass together near a fence.\nA man in a surf board shaping studio.\nvery long and nice buses standing at the zebra crossing\na window shoing a man standing alone on a train platform\nA red fire hydrant stands in the dirt of a stone platform.\nA box containing three round doughnuts and a fritter\nvaries vegetables sitting on a black counter top\nAn orange cat laying on top of a black piece of luggage.\nthere is a man playing with a frisbee on the field\nA large market display of citrus fruits including navel oranges and clementines.\nA plate of assorted desserts and dessert sauces and a bowl of ice cream.\nsomeone holding a half eaten hot dog that has mustard and ketchup\nA bowl full of soap with a bowl of vegetables on the side\nThere is a rug on the lid of a toilet and another rug in front of the toilet.\nA black and white modern bathroom showing he sink and mirror\nA hospital bed next to a blue chair\na room wit ha chair a bed multiple windows\nIt's a very elegant looking bathroom with double sinks a large mirror and a tub.\nA baby holding a spoon and looking at a pair of scissors.\nA woman rides a horse through a grassy field.\na 
group of zebra drinking from a trough together\nA skate boarder reaches the top of a steep barrier.\nA plaque on the floor in front of a chair and grandfather clock.\na bus painted in white, blue and yellow\nSome is holding a bottle of wine next to a huge hot dog covered in chili.\nA cardboard box containing a reef of glazed donuts.\nA bathtub with candles lit up around it and a stool next to it.\nA man wearing a blue shirt and an orange and black neck tie.\nA car parked next to a brick sidewalk on a street at night.\nA man is flying a kite on a clear day\nA picture of a lot of kites in the air.\nA black and white image of a young men on his skateboard.\nA girl using her laptop computer on her bed.\nThe young man hurls his frisbee towards the metal structure.\nA clock above two pink colored stone arches\nVegetables being displayed with each other in arrangement.\nA game strategy is hatched by the boy NOT wearing the boat like a hat.\nA little boy wearing a bib eating a doughnut.\nA bathroom with a urinal and tiled walls.\nA photo of a living room with a purple chair thete\nA little girl that is flying a butterfly kite.\na man eating food at an airport terminal\nA child wearing pajamas holding a brown teddy bear.\nA slice of pizza with lots of vegetables on the top of it.\na bowl with some fruit inside of it\na man is on a surfboard with a dog\na number of people standing near one another wearing suits and ties\nThe kitchen counter is cleaned off and ready for us to use.\nTwo young people sit next to a bunch of snowboards.\nAn airport filled with planes sitting on tarmacs.\nA man on a horse during  a race jumps over a hurdle\nThe little girl is sitting in the chair eating candy.\nTwo zebra standing in the trees next to a fence.\nThe girl in purple is using her phone.\nA double decker bus driving down a city street\nFour sets of legs with one standing on a skateboard in the dirt\nA table filled with food on a patio\nA cross country skier traveling down a slight 
slope.\nA man and horse near a painted man wearing shorts.\nA man in a black coat sitting on a bench at night.\na city street with bicyclists, double-decker buses, and many lights\nA book mobile bus from a library sitting by a street side.\nA man sitting in front of a tv with a Wii remote in his hand.\nA group of young children are petting a horse near the gate.\nThe keys 1, 4, 7, and 8 are clearly visible on the remote.\nA laptop sitting on a bed near a window\na rocking chair siting in a house next to a green lamp\nA busy street is crowded with umbrellas on a rainy day.\nA little kid standing on a household appliance\nA fridge with a bunch of papers hanging on it.\nA pole that has different types of signs pointing.\nWoman leans over as she serve the tennis ball back to the other side\nA toy monkey sits on a desk beside a laptop.\nA man in a suit and tie is smiling.\nA young lady with blue hair is holding her phone, posing for the camera.\nA group of horseback riders walk down a trail.\nTwo giraffes eating the leaves off a tree.\nA flat screen TV mounted on a brick wall in a living room.\nA close up of the luggage claim at an airport with many suitcases.\nA baby with a teddy bear looking over his shoulder.\nThe skier is jumping into the air above a half pipe.\nA tow truck driving down a rural road.\nTwo street signs located above a stop sign.\nA group of actors and stage workers on the set of a TV show.\na tennis player about to hit a tennis ball.\nThe man is posing for a picture on his motorcycle.\nA person riding a board on top of a wave.\nA man in a wetsuit surfs a churning wave.\na living room that has a couch and a chair in it\nA hand is seen pulling a piece of food from a toaster.\nA man and a woman stand under an umbrella at a street crossing on a rainy day.\nA group of people riding on the back of an elephant.\nA pulled pork sandwich with a pickle slice.\nA virtual woman in a rainjacket, carrying an umbrella.\nA yellow cat is among the camping 
equiptment.\nA pigeon stands on a window ledge overlooking a street.\nTwo men are checking out several wines in a crowded room.\nSeveral sheep herding towards an outdoor pen on a county side.\nYellow train on the tracks running parallel to the trees.\npeople are taking samples of wines in a room next to an outdoor area where people are sitting\nLarge white passenger bus parked in a parking lot.\nan elephant standing by some trees with it's trunk in the air\nA dad or grandpa looking at a child both are smiling.\nA close shot of a unique looking plate of food.\na couple of bears are sitting near a glass\nA black and white dog examines something on the ground.\na big propeller plan flying through the air\nA group of street signs in a display case in a room.\nA small green vehicle model is on display next to a busy city street.\nThis is an old picture of a train at the station in Boyne City\nA group of people sitting on a yellow couch playing a video game.\nA zebra standing next to a tree on a dirt lot.\nA lady with a young girl standing in front of a few english muffins.\nA happy girl is showing off her Nintendo Wii.\nA man in a crowded room gazes into the distance.\nA series of street signs in French on a city street.\nA person in a cross guard uniform directing traffic.\nA man sitting in a  chair with a laptop computer.\nA \"One Way\" street sign pointing to the right.\nA man standing next to a news stand on a street.\nA red bus parks in front of a building by a large tree.\nA white busted up toilet sitting on it's side.\nA young girl standing over a soccer ball.\nA gray vanity with three spigots in a public restroom.\nA small refrigerator sitting on top of a wooden counter.\nA boy is sitting at a table eating.\nA group of people is playing frisbee in a field\nThere is a cat drinking from a faucet\na close up of a bench near many plant life\nThis is the head of a giraffe standing in a fenced in area.\nA plastic male doll is sitting on a toothbrush on its 
holder.\nMultiple items on a metal bar near an outlet.\nA big building with a large clock at the top of it .\nThe street pole contains traffic and street signs.\nTwo representatives from two different governments shake hands.\nA large black bear traveling across a grass covered field.\nA game of baseball being played in front of a large crowd at a stadium.\nA park with kites flying in the air\nA chair and a couch in a small room.\nThere are vegetables that look like they have seasoning on them\nTwo people riding horses down a sandy beach.\nA bunch of busses are in a lot.\nA plate filled with a chees filled meat sandwich with sauce.\nA bowl of soup, rice and fish by a woman.\nA group of scooters parked next to an old building.\nA man doing a trick on a skateboard off of a rail.\nA boat is running in the water with a low sun in the sky.\nThe jazz band is taking part in a parade.\nA bunch of seagulls eating on the beach.\nA little girl enjoying a sweet confection and awaiting a sugar rush.\nTwo giraffes stand in an open area with water and other animals in background.\na vase full of colorful flowers in a bedroom\nA group of people riding boats on top of a lake.\nA computer desk with a computer on it and a chair in front of it.\nA tennis player trying to hit the ball.\na very large animal submerged in water with two people near it\nAn adult and a child sleeping in a bed.\nA lady with blue pants and grey sweatshirt playing tennis.\na giraffe standing next to a tree with more trees in the back ground\nA closeup shot of several zebras standing together.\na baseball player wearing green and yellow wearing his glove\na giraffe standing on a field near a bush\nWoman getting ready to hit a ball on a grass court.\nA young lady taliking on a cellphone in the hallway at school.\nA woman holding a skateboard posing for a photo.\nA black and white dog on a brown tile floor next to counter.\nA black train traveling past a train station.\na tray that has a plate and a bowl with 
food on it\nA woman sitting on a couch in a living room.\nA stop sign with lights lit up all around it.\nTWO TRAYS FULL OF FOOD SITTING ON THE TABLE AT A RESTAURANT\nA train on train tracks that run parallel to many other train tracks.\nA group of zebras are standing in a field.\nA young child in a white dress holds a teddy bear while standing outside.\nA man with a tie and a work badge\nFive loaded hotdogs surrounding a tray of cheese fries setting on a round table.\nA train goes through an intersection with traffic lights to stop traffic.\nan image of two men that are walking down the street\nA large sheep and a smaller sheep graze from a field.\nA small bulldog sleeping on a bed while wearing a pirate hat.\nA brown dog carrying a black frisbee in its mouth\nA long row of wood and wrought iron benches along a sidewalk.\na young woman with a slice of pizza in her mouth\nA pitcher throws the ball towards the batter at a game.\nA cute little girl sleeping in a wooden framed bed.\nA horse connected to riding equipment walking in the street.\nA car parked on top of the curb next to a meter pole\nA date book is next to a phone, calculator, and a keyboard.\na cat sitting in the refrigerator next to a gallon of skim milk and a bottle of gatorade\nA man is prepping a turkey in front of a bottle of wine.\nA wooden stop sign in a rural area\nSmall children playing with toys and stuffed animals\nPeople sitting and eating in a restaurant.\nA group of young kids playing soccer on a grassy field.\nTables and beach chairs on a sandy beach.\nSome unfinished looking wood is in a white bathroom.\nA baseball player slides into base while another leaps over him.\nA boy is jumping into the air on a skateboard.\nChildren on a tennis court holding a tennis racket and tennis ball.\na person riding a skate board at a skate park\nA young child standing at a table with a plate of food.\na couple of elephants walk in a caged area\nThe gentleman is taking a selfie while riding his 
motorcycle.\nThe dog is looking at the toy bird being held by him.\nA bus driving on a street with people approaching it in the mountains.\nA man is peeing and has his behind exposed.\nA figurine with a plastic witches head is standing in front of a computer keyboard.\na bi plane with a nazi flag on the tail\nA dog is standing in the grass with its tongue out.\nA red car is parked by a parking meter.\nA man sits on a blue and black motorcycle.\na young man holds a snow board\na tennis player swinging a racket at a ball\nA COUPLE WEARING YELLOW DRESS STANDING NEAR TWO HORSES.\nA television is on the beach near the ocean.\nA white bear sniffing on to some rocks\na black and gray cat is sitting on a toilet\nA bike tire and a boy with a skateboard\nA sandwich with chicken and lettuce is on the table.\nVarious sized knives are hung on a wall magnet.\nTwo red traffic lights lit at a street corner\nA rhino and a baby elephant by a river.\nA girl is drawing on a birthday cake.\nA herd of cows is standing in a grassy field.\nA man laying on a bed bent like a pretzel.\nA brick wall with a blue and white sign next to arc.\nThere is a building with surfboards outside of it.\nAn old image of a pickup truck broken down on the side of the road.\na mama goat and her baby walking on a slope\nTwo men standing in a store aisle with one holding a baseball bat.\nA cat and dog are laying on a red rug.\nTrain on tracks riding pass bus and couple cars on the street\nA bunch of animals out on the field.\nA woman standing on a beach throws a frisbee.\nThree workers stand behind a colorful fruit stand.\nThere are several boats docked at the dock.\nA man standing next to a large elephant.\ntraffic lights besides the road with so many vehicles\nAn eagle soars through the sky near trees.\na bathroom with a corner bath tub and duel sink.\nA man wearing ski gear and  skiing downhill in the snow.\nlarge brown elephant making his surrounding look so small\nStanding in the ocean waves, a man flies a 
kite.\na twin engine airplane stored at aviation museum.\nSkiers grouped up in front of a vancouver sign.\nA very large group of people are sitting at tables.\nMan and woman enjoying video game in living room.\nSeveral people cross-country ski on a snowy mountain.\nA man stands in a room with a cardboard box sitting on a chair.\nA man holding a racquet preparing to serve a tennis ball in front of a crowd.\nA plate with cooked meat and vegetables served on it.\nA toilet is standing in a room with a picture frame on top of it.\nA Yorkshire Terrier is looking out the window of a house.\nA person does a trick off a ledge on a skateboard.\nA very big airplane that is making a turn in the sky.\nA man lunging forward towards a frisbee next to three other men.\nan old black and white photo of a large building\nsome big black cows in  a grassy field\nA line at an airport with people and their luggage\na nice neighborhood with some green grass in it\nA giraffe standing on top of a lush green field.\nTwo skateboarders are racing through an obstacle course.\nTwo giraffes and a zebra roam in a preserve area\nLittle Asian girl holding a wii remote control.\nA computer monitor, keyboard, phone and various papers sit on a desk.\nA blue suitcase is leaning against a post on the street while a man walks by.\nA VW long van parked on wood strips on a grassy area.\nA skier pauses near the side of the course.\nOld passenger train making its way down from a rocky hill.\nTwo women are standing playing with a nintendo wii.\nLarge group of clothing sitting on top of each other.\nA mug of hot beverage sitting by a computer.\na lake with a lot of boats on it\nA partially open door with a bathroom behind it.\nMany people are walking around in this square.\nTwo flowers are on the blanket across a bed.\nIngredients for a tasty bite, including peanut butter, oats, banana, preserves and syrup.\nLooking at a barge cross a channel of water under a cloudy sky\na man putting a pan of food into an 
oven\nSeveral people boarding an old fashioned airplane in a field.\nBlack dog jumping up at big screen television.\nA person is holding a banana that they are peeling.\nBox of dollar bills tooth brushes pills and spoons.\nA hotel bedroom with balcony overlooking the ocean.\nhorses stand around on a neighborhood street in front of a car\nA large crowd is watching a baseball game.\nA zebra is grazing on scarce grass in front of a rock wall.\nVarious luggage tagged and stored on numbered shelves\nA herd of cattle standing next to each other on a dirt field.\nA child dips broccoli in dressing before eating.\nTwo black cats looking out of a window.\nA dog in a grassy area with eyes on a flying frisbee.\nA young male plays with a green frisbee.\nA room with some big equipment and a toilet.\nan elephant with a seat on it's back\nA boat that is sitting in the water.\nA bed next to two mirrors on the floor.\na tour bus with a wi-fi notice parked on the side of the road\nA mostly white bathroom has a black toilet seat.\nA man on rollerblades at a crosswalk holding a sign that says slow.\nsome buildings and a clock tower with two white clocks\nthere is a herd of animals running infront of a man\na person that is holding up a frizbee\nPeople hanging out in a kitchen eating and drinking\nTwo animals that are looking at something in the wall.\nA banana and a vanilla bean are next to a shot glass.\nA young girl holds a Frisbee at a park.\nA person flying a kite on a beach\nSteak sits on a plate with broccoli and mashed potatoes, next to a glass of water.\na number of people riding skis on a snowy slope\nA house plant on a sink in a bathroom.\nan image of a female tennis player returning a serve\nPerhaps he's a magician who will pull a rabbit out of that hat.\nThe man is using the toilet with the bathroom door open.\nA person looks at their reflection in a bathroom mirror.\nTwo adult giraffes and a baby giraffe are in a cage.\nA grupo of people in a field with tents flying 
kites.\nAn asian dish topped with sesame seeds.\nTwo dogs and a cat on a boat at edge of water.\nSeveral plates of foods including strawberries and vegetables are next to a sippy cup.\nA very long limo with a bunch of farm animals on top of it.\nA couple of umbrellas in a small room.\nA girl walking and talking on a cell phone.\nA man with skis walks through a snowy area.\nA view of a bathroom with a yellow towel sitting on the shower.\nA red truck moving towards a busy highway.\nTwo people are walking towards some motorcycles to leave a market consisting of umbrellas over tables.\nA close up photo of parking meter on a street.\nA ship of people cruising along the water.\nA couple of giraffes are standing in the wild.\nA tray topped with sandwiches and cut up apples.\nA person on a snow board performing a jump on a mountainside.\nA large passenger jet flying through a cloudy blue sky.\nA man swinging a tennis racket during a tennis match.\nA computer desktop with a keyboard and monitor.\na small white vase is on a table\nA bridge stands over a river before a city sky line.\nA group of people riding skis across a snow covered slope.\nA cyclist rides through a tree-lined path in the park.\nA man drinking from a glass on top of a night stand.\nA giraffe standing at a dirt road eating off a tree branch.\nA pooh bear is sitting upright holding a honey pot.\na living room with a fireplace and a big brown chair\nA man riding a motorcycle across a lush green park.\na couple of beds sit inside of a room\na herd of giraffes on a dry grassy plain\nA group of cats sitting on top of a chair.\nSome strawberries floating in a bowl of pudding with sparklers added.\nA tablet, a laptop and a computer on a desk\nBathroom with granite counter top and single sink.\nA train traveling along a rocky mountain side.\nA man and a child are dancing by the water.\na girl and a dog are sitting on a bed hugging\nA man and a woman standing next to each other holding tennis racquet.\nA cat 
scanning the floor in front of an orange bucket\nA wet polar bear holding a green cone in its mouth.\nThe pink Frisbee is laying on the snow covered ground.\nTwo computers on a desk in a small bedroom\na close up of a table with an ipod headphones and a remote\nA destroyed toilet and sink lying on the ground.\ntwo people on stage performing a song to a crowd\nTwo females sitting on BMW motorcycles under a tent.\nA boy is holding his hands out as he jumps with his skateboard.\nA boat filled with produce and people floats on a river.\nA public official helping to feed some school children a healthy lunch.\nA red parking meter sits on the sidewalk.\nA man holding up his tennis racket .\nThe shirtless man plays frisbee in the water.\nA group of people sitting at a restaurant table with food.\nA wall with vines and old tools strewn on it.\nA very big nice looking truck on a street.\nA slug crawling on the seat of a toilet\nA pole holding a traffic sign at an intersection.\nA radish on a cutting board next to a knife\nA bride and groom walking next to one another.\nI am unable to see an image above.\nA couple of women holding up a cake together.\nA fire hydrant at a intersection at night.\nA baby pressing a key on a laptop.\nthere are many different pies on this table\na bunch of people are sitting in a busy room\npeople in a large sleigh being pulled by horses\na close up of a person throwing a pair of scissors\nA blue vase with sunflowers and other flowers\nThe snowboarder is standing on a conveyor belt with others.\nA couple of  men holding a bunch of baby sheep standing next to each other.\nThree people in work uniforms and visors standing together in front of various types of donuts.\na close up of the front end of a school bus\nA young child walking down a street past two nets blocking a road.\nDog fetching a frisbee in a rough field.\nA group of kids standing in a forest.\nA person wearing a wedding ring has their hand on a teddy bear.\na very decorated work 
cubicle with a laptop\nA catcher and a batter playing baseball in a park.\nA guy jumping through the air with a Frisbee in the air.\nTwo large green and white jumbo jet planes on the tarmac.\nTwisted bars of metal connected to a tall building.\nThe double sink in the bathroom is nice and clean.\na horse and foal grazing on dry grass.\nA train traveling over a bridge spanning a river.\nBunches of fruit growing on native trees shown on cloudy day.\nA man gets ready to hit a tennis ball with a racket.\nA lamp on a table in a livingroom\nAnd upload picture of some food in a bowl.\nLady with a slice of piece in front of a stack of pizza boxes.\nA train is on the train track, which is surrounded by trees with autumn foliage.\nA kitchen area with a refrigerator, table and doorway.\nThe person is taking a high jump on their skis.\nThe skiers are getting ready to go on their run,\nA red two level bus with front damage to it being towed down a street.\nA small bathroom with a yellow toilet, sink area and shower.\nSeagull on rock with ocean and lighthouse in the background.\nMen in a teaching kitchen discussing all the visible prepared food.\na wooden desk with a black and silver computer\nA train is going through the pretty country side.\nA herd of elephants drink from a river as two wander away from the group.\nA group of people standing in the snow with skis\nA picture of a person and a motorcycle on the street.\nA white bowl of food with a spoon.\nthree giraffes standing up near some dry plants.\nA couple of men riding motorcycles behind a herd of sheep.\nA girl in a bikini sits on a towel at the beach and holds a pastry.\nA large computer screen with keyboard on a small desk in a corner.\nThe zebra is standing behind the rocks in the exhibit.\nA zebra eating hay out of a container near a rock.\nA boy prepares to swing his bat during a baseball game.\nA foyer furnished with a sofa, arm chairs, and end tables.\nA motorcyclist is being followed by a familiar face.\nA person 
sitting at a wooden table with pizza, and some other foods on a brown paper bag in front of him\nA MAN IS RIDING AMOTOR BIKE IN THE CITY\nThe giraffe looks like he is in the wild.\na couple of giraffes that are outside a brick building\nA bathroom with hand towel, mirror, and sink.\nA man surfing on his surf board against the waves\nA statue is set on top of some banisters.\na man walking through the water holding a surfboard\nA man in a car who is on a cell phone.\nA bathroom counter with a sink and various cosmetics and toiletries.\nA blender sits on the concrete next to some greenery.\nan animal in a field behind a fence\nA metro bus approaches an intersection where a traffic cop is directing traffic.\na table covered in vegetables of all sizes and colors\nA traffic light with a pedestrian crossing sign on it's sides.\nA torn apart bathroom with a toilet in a bathtub.\nA man does a trick at a skating course.\nA double decker bus and a truck driving next to each other.\nBoats at a dock near a large hotel.\nBlack and white photograph of a tennis team and their coaches\nWooden benches are lined along the edge of the water.\na group of people standing in the snow next to a building\na couple of red lights are on a pole\na man standing on a tennis court holding a racket\nA chef is instructing two women on how to slice vegetables.\nA woman looking up at the kite that she is flying.\nA red and yellow fire hydrant with the lid taken off.\nA man in a green shirt holds an appliance while another man stands by.\nA small brown teddy bear sitting on a white bed leaning on pillows.\nThree people sitting in the snow with snowboard on their feet.\na very tall tree in the field with nice flowers\nA young boy standing next to a giraffe he can pet\nA young boy with a fish hat eats a snack.\nSkiers and snowboarders mill about on a mountain.\nA young girl standing in front of a book shelf holding a red tie\nA male skier navigates a course at the Vancouver Winter Olympics.\na kitchen 
that has a bunch of people in it\nHappy girl in a green shirt holds onto her suitcase.\na bath room with a toilet and a window\nA young man playing a ball game on a cement basketball court.\nMultiple boats sitting dormant on a lake bay.\nA bunch of people riding motorcycles down a road\nAssorted food items displayed in white dish on wooden table.\nA red double decker bus traveling down a city street\nChicken wrap cut in half displayed on wooden board near silverware.\na man in a wet suit rides on a surf board\nTwo horses graze in a pasture in the setting sun.\nAn airplane is flying in the air on a clear day.\nsome people are riding elephants in the jungle\nA man hitting tennis balls on a blue painted tennis court.\nA person with a laptop sitting in front of a window.\nA baseball player runs across home plate after hitting the ball\nA purple swamphen with a red crest on its head walks on the ground.\nA delicious looking hotdog sits in cardboard with tons of toppings.\nIt's strange to see a bow tie with a military uniform.\na black cat sitting on a cement patio\nAn infant in a high chair covered in pink glop\nBicyclists ride down the sidewalk in front of several stores.\nSome zebras that are sitting on the ground next to each other.\nPeople watching a horse race image is fuzzy.\nA bedroom with a bed and other furniture in it\nA blurry photo of meat patties on a big meat patty\na buffet in a restaurant with some big crocks glasses and bins of other foods\na group of zebras standing around a food trough to eat\nBrown gull in water on beach littered with seaweed.\nA group of people wait for the start of the race on their bikes.\na man sitting in a chair watching another man pretend to be an elephant while playing with a child on the floor\nA minimalist bedroom with low furniture and a quote on the wall.\nTwo birds sitting on top of a branch on a tree.\nCity bus next to traffic cones in the far right lane of a busy freeway.\nA cat reaching up to grab a feather on a 
string.\na person leaning on a stop sign with a skate board\nA woman in a blue sweater sitting at a table with food.\nA child making a silly face over a tray of donuts.\nThe bed is located on the edge of the beach.\nA bald man is using a surfboard to ride a waves.\nBroccoli next to some meat on a small plate.\nShelves filled with pots, pans, and cooking utensils.\nTwo large boats sitting on a docking area in the evening.\nA man jumping on a dirt bike while another man watches\nA cow laying down in a grass field.\na blender sits on a counter top unplugged\nA bicycle leaning against an old white building.\nThree men, one caring a skateboard, are wearing matching t-shirts.\nThe clear shelves on a green wall that have vases with designs on each shelf.\nHelmets should always be worn by motorcycle riders and passengers.\ntwo people on a field wearing baseball equipment\nA man puts on his jacket while standing near snow skis and poles.\nA blender sits on a kitchen counter surrounded by baking supplies.\nA small child wrapped in a towel brushing their teeth.\nGreen and white airplane sitting on a runway by the ocean.\nA keyboard and monitor on a corner desk.\nTwo couples getting ready for a tennis match.\nA street sign showing the words Gay Street.\nA man in brown shirt standing in a kitchen.\nPeople cut a cake outside for a celebration.\nA large passenger plane is parked on the runway.\na close up of a red fire hydrant with a chain on it\nA young child running down a rain covered walk way with an umbrella.\nCouple people walking up the snowy hill wearing skis\nA dog is sitting down in front of a mirror\na group of guys standing out on the road\nFour people standing on a balcony with a clock\nWhy would the cow be grazing in front of those homes?\nA woman is taking a picture in the bathroom.\na little boy and his father skii down a big hil\nA man with camera watching a group of giraffes\nHerd of sheep standing on pasture with stone buildings in the background.\na bunch of 
people walk on a beach to the water\nA myriad of wind socks blowing in the wind.\nA trolley rolling down the tracks in a forest.\nA box of doughnuts and pastries with strips of bacon.\nA woman on a cell phone at a station.\nA pair of hand slicing carrots with a large knife.\nA woman is sitting down in her kitchen to feed her young child.\nA horse drawn trolley sitting in the middle of a street.\na smiling woman holding onto a pizza box\nA picture of a computer sitting on the floor.\nTwo giraffes are found wandering around the buildings.\nJockeys on horses riding on a racing track.\nThe newborn baby is sleeping next to a teddy bear.\nA girl riding on the back of a scooter on a cobbled road.\nA girl in pink ski gear that is sitting in the snow.\nA couple of people sitting on a wooden bench.\nA flat screen TV sitting across the way from a laptop.\nTwo horses graze in a field surrounded by barbed wire.\nMass transit train waiting for passengers at the station.\nFour women sit on a park bench with groceries.\nA baseball player in a blue jersey standing ready with a catcher's mitt\nA larger commercial jet is flying in the air.\na large pizza is sitting on a pan\nA cow walking on the beach towards people on lounge chairs\nthis is stuffed teddy bears sitting in the grass\nA man a woman are standing together holding tennis rackets.\nA blue and white double decker bus on side of street.\nPerson in a parka taking pictures with a mobile phone camera.\nA cat next to a grocery bag on the hardwood floor in a kitchen.\nA man riding a paddle board into a massive wave in the ocean.\nMan and woman in a bedroom holding up Wii controllers.\nA group of people on some skis in the snow.\nA woman working in commercial kitchen with stainless steel appliances.\nA man sitting on a horse drawn carriage\nA girl in black jacket drinking milk and eating pizza.\nA  deep marble bathtub under an ornate mirror.\na woman posing on the street for a photo\nA horse pulling a carriage down a street with 
other people.\nPeople are huddled together under umbrellas on the beach.\na person sitting on steps talking on a phone\nA large passenger jet flying through a cloudy sky.\nA cat is lying on its back in a man's lap.\nSeveral people are snowboarding off the top of a snow covered truck.\nThe adult elephant stands idly in his zoo habitat.\nA man is holding a baseball bat while wearing a muddy outfit.\nA woman is sitting with a suitcase on some train tracks.\nA train traveling over a bridge over a freeway.\na couple of chairs are around a table outside\nA man in vest and bow tie standing over a keyboard.\nA cat sleeps on a pile of discarded shoes.\nAn open cell phone in a person's hand.\nThere is a dog walking down a path near grass.\nThree giraffes behind a wire fence next to a tree.\nA tennis player turns her racket sideways as she returns the ball.\nA horse grazes by itself on a grassy plain.\nPeople playing frisbee out on the lawn, on diving for it.\nSome type of bed outside on the beach\nVirgin Ameican Airline planes with passenger boarding bridges attached.\nA woman walking a horse down a trail.\nTwo polar bears playing in the ice and snow.\nA wide shot of a modern kitchen with a glass table in the foreground.\na photo of an old tall cathedral and bell tower.\nA parking meter and a umbrella on a street.\nA vase sitting on a table filled with flowers.\nDamaged bathroom with a toilet, sink, and damaged window.\nA batter up to swing in a baseball game\nBoats docked in the water in a marina.\nA young giraffe running across the road on an African plain.\na picture of a bulding with a open window and clock.\nA plate topped with two pieces of cake and strawberry.\nA man with a helmet is on a surfboard\nTwo men shaking hands while standing on a tennis court.\nVisitors walk beneath huge airplanes on display in a hangar.\nThis train is riding a rail near some water\nA stop sign that has another sign saying all way under it.\nPeople standing on the street holding umbrellas 
near buildings\nA knitted cap sits upon a red hat stand.\nA desktop computer sitting on top of a desk.\nThe sandwich has chicken, melted cheese, and tomato inside.\nA diesel locomotive approaching a rural grade crossing.\nA Nashville bus with a big ad for Coors Light on the side.\nThree people standing at a baggage claim at an airport.\nfive zebras standing in a row in the wild\na woman holding an umbrella on the street\nA pizza cutter being used as a spatula.\nA male flying a kite on a sunny day.\nA group of people standing on top of a lush green field.\na person sending a text on her phone\nA hand is holding a pack of Japanese donuts.\nA couple of red chairs against a wall\nSmiling and smirky people are in a small kitchen.\nA building with a clock tower on it\nA steam locomotive with passenger cars crosses a bridge over a channel\nA female tennis player hitting the ball.\nVarious zebra in dirt field with mountains in the background.\nA black and white image of a vehicle that is decorated like a dog.\na bucket of oranges sitting next to a bike\nA freshly made bed resting on a tiled floor.\nTwo people in the middle of a skiing trail with trees lined on each side of the trail.\na street-side market with colorful plastic furniture.\nA dessert that consists of a piece of cake and some ice cream.\nA toddler in a kitchen trying to use a vacuum cleaner.\nThree sheeps are grazing in a small field.\nA very cute small dog laying on a big couch.\na table and chairs with silverware and plates a pan and bowl of food\nStuffed bears sit in the window of a store.\nPeople and carts loaded with suitcases on a train platform.\nA boat with a bed set by a set of windows.\na messy living room with the television on.\nA living room with leather couch, settee and chair, rustic tables and a cowhide rug.\nA red flower vase placed next to a clock on a window sill.\nA  glass vase full of dried dead roses\nTwo googly eyes and a Santa beard placed on a microwave oven\nA tabby cat sleeping on a 
wooden island in an old looking kitchen.\nThree giraffes in the wild stand by shrubs.\nA young woman wearing a white hat in a commercial kitchen chopping lettuce.\nA person who is standing in front of a laptop.\nA couch and a chair in a room.\nA flat screen television mounted above a fireplace.\nA cart with a load of suitcases pile on it.\nA white sandwich has pink meat in it.\nA bunch of people are on stage and the guy in white is doing something to the one child who is holding his skateboard and next to him is a child in a red helmet.\nA banana plant with a large flower and unripe bananas.\nA yellow and grey train on tracks beneath a traffic signal\nA very nice looking trolley car on a city street.\na person wearing a black coat and a tie with bolt designs on it.\nA giraffe stand alone in a zoo during the day.\nThere is a long wooden bench with a fountain in the middle of the area.\nA couple pieces of food that are on a table.\nA adult elephant and a couple children in the water.\nTaking a moments rest on their cross country ski trip.\nA bunch of cut meat sitting on a cutting board.\nthere is a luggage that is sitting on metal outside\nFour people are posing for the camera with flags behind them.\na living room with book bottles a lamp and television set\nA white plate topped with two different type of food.\nA woman standing at a counter using a blender.\nthis man is jumping high over the grass\nPeople are gathered to watch two women, one who is doing the splits.\nTwo men are standing and talking alongside an old fire company van.\na young girl is getting her temperature taken\nA green tennis ball bouncing on a wood tennis racket.\nA cat sitting on a chair looking straight at the camera.\nA Studio apartment with minimal furniture and a refrigerator.\nA giraffe is walking through a grassy field.\nA room in a home that has a small table with one chair on the side and another piece of furniture in the next area.\nA living room with white walls and stained wood 
furniture.\nzebras stand next to each other in the zoo.\nA train station that has a train pulled into it.\nA stop sign in front of a Google building.\nA woman standing in a grass field with a cell phone.\na close up of a dog on a desk near a monitor\nA street sign that reads Ronald Reagan Allee.\nA white plate topped with a piece of toast and eggs.\nA cut in half bagel sandwich sitting on top of a plate.\nA couple of bowls are on a counter by a man.\nA man leaning on a fire hydrant on a city corner\na girl and a dog looking angry in a photo\nThe corner of a batch room with a white sink and red shower curtain.\na close up of a cell phone on a table near earbuds\nA person holding a red phone nest to a flower filled plant.\nA couple of cars that are in the dirt.\nA pastry is decorated in a lattice style on a piece of burlap with a knife.\nHerd of goats in grassy area with herder.\nA semi truck pulling a trailer filled with logs.\nA couple of people standing with a umbrella.\na man is in a store making donuts with flour\nPeople at the table getting sandwiches to put on their plates\nFour men and two women sitting on two different benches.\na whole bunch of bananas cut up in a large bowl\na male in a black shirt taking a photo in a mirror and a sink\nA large passenger jet flying through a blue sky.\nThe display case has many different scissors on it.\nA red fire hydrant sitting on a slab of cement in a patch of grass.\nA cat sits on the couch next to the remote.\na close up of a dog laying down with a chew toy\nA man holding his cell phone in front of him in his left hand.\nA boxer dog faces the camera while sitting on a computer chair.\nA man riding a snow board down a snow covered slope.\nA man sitting in the back of a van talking on a cellphone.\na black and white clock on a gold and black tower\na baseball player that has a ball in his hand\nA truck parked on the street with a man getting out\na woman hitting a ball at the end of a tennis court\nA bathroom with sink 
and a toilet in it.\nTwo plates of food that include potatoes, broccoli and sausage.\nTwo gentleman are playing on the Wii.\nA living room has a fireplace and bookcases in it.\nA soccer player chasing a ball in the air.\nTwo cows standing close together on a grass field.\nA laptop computer sitting on top of a wooden desk.\nA knife being slid into a wooden block.\na man pouring liquid in a line of glasses on a table with a hat on his head\nSeveral people in a field, some are flying kites.\nSome items sit next to the door.\nA food truck parked along side the street\nA man swinging at a tennis ball with a racquet.\na fire house with a grass field in the back of it\nA brown cow under a tree in a grassy area.\nThe insect is flying around on the porch.\nA woman puts her market shopping in her motor scooter seat\nFour hotdogs in buns sitting on a white platter.\nA group of boys wearing white shirts, black ties and red caps.\na woman hitting a tennis ball with her racket\nA transit bus pulling through a shopping area.\nA cluttered restaurant has a boy at a table with a phone.\nTwo girls sitting at a table near a dishwasher.\nA desk topped with a laptop computer and speakers.\nA passenger train is going down the track and people are in the car.\nA group of sheep grazing in a large open field.\nA teddy bear is next to a banana in the air.\nA bowl of fruit is on the floor in front of some feet.\nA woman riding down the side of a skateboard ramp.\nA beautiful girl playing a game of Frisbee with an orange Frisbee.\nA plate with a slice of cake on top of it next to a fork.\nA man loading luggage onto a machine as it comes off a plane.\nA man and boy are talking behind a rickshaw.\nA woman dressed in colorful clothing preparing a meal.\nA vegetable pizza on a plate on a table.\nA small, black cat sleeps next to a mouse and keyboard.\nA black dog laying on a rug next to a TV.\nA stop sign with street sign at an intersection.\ntwo woman standing in a kitchen by astove\nthis 
bathroom is all white and has a white toilet and a tub\nLarge dog laying on top of a bed and looking up at mirror.\nA room with some windows and a clock and a air sign.\nGroup of men with skateboard celebrating while in grassy park.\nSeveral skis and snowboards laying around in the snow.\nA dining and kitchen area with high wood ceilings.\nTwo older men throwing a ball on a baseball diamond.\nA dark frame surrounds the window and mirror in this bathroom.\nTwo children playing Wii while adults look on.\nsome toy buildings a fire engine and a police car\nA tablet is set in front of a Dell computer screen.\nAirport baggage handlers loading luggage into a cart.\nA picture of an animal catching a frisbee.\nTwo women laugh and show movement in the picture.\nThere are keyboard keys on a wooden table.\nA wooden cutting board topped with a sandwich with a knife.\nA woman in green cardigan with brown dog at a table.\na big bear walks through some grass\nA young woman riding on a brown house through a course.\na black and silver motorcycle is parked and some people\nA busy street in the city on a sunny day\na baby elephant walks through some shallow water\nA boat traveling along a river surrounded by grass fields.\nTwo people are crossing the street as they are heading towards the stop sign.\nA person performs a jump on a hill on their snowboard.\nA shelf that has a wedding photo on it with flowers.\nA gentleman is trying to pull off a skateboarding trick.\nA laptop and a keyboard are on a computer desk.\nA small boy skateboarding in a city mall\nA pile of broccoli with a sprout sticking out of the top.\nsome people playing soccer while a crowd watches them\na couple of birds that are standing on a beach\na bunch of bottles are in the fridge\nA white cup of coffee sitting on top of a wooden table.\nA green cloth holding a white tray full of food.\nAn open laptop computer sitting on top of a bed next to a mouse.\nThe cathedral has two clocks on each of it's walls.\nHe person 
that is doing a skateboard trick.\nTwo pieces of pizza on a plate with a small servor.\nSeveral cars and a motorcycle are parked in an alley.\nBlue bullet train waiting at the train station\nA plate of some sort of a vegetarian pizza dish.\na man with a hat holding a baseball bat\nan elephant is scratching his head on a tree\na train engine and box cars on the track\nTwo people sitting on a bench with their dog.\nA man smiling on skis in the snow.\nA cow running on to a road near a town\nA large wooden structure displaying boxes of fruit.\nA bird is sitting on the top of a log.\nA young boy riding a small skateboard on a pile of dirt.\nAn SUV driving on a rain soaked roadway past a red stop sign.\nA very big building with a clock on it.\nAn unmade bed is covered by a comforter and a bowl.\nA picture of a street light through a rainy lens.\nA man is about to hit a tennis ball during a match.\nSeveral women in a kitchen preparing many identical meals.\nA man and woman are walking and the man is pulling a suitcase.\nA woman riding a horse wearing a white outfit and helmet with yellow stars on it.\nA black and white traffic sign under a cloudy sky.\nA group of people sitting around a long white table.\nA bowl of pasta sits on a table with a candle.\nA line of buses parked along a wall by a building.\na semi truck driving on a road with a sky background\nA very tidy living room with a white couch with pillows on it.\nA stop sign in English as well as some other language.\nDozens of brightly colored kites lined up on a beach.\nFreshly cooked lobsters served at home with vegetables and salad\nSome people stand on the beach and others go in the water.\nThe people are riding motorcycles on a racecourse.\nA person sits on the road near their motorcycle.\nA Kenyan Airways airplane sits on the runway.\nA male tennis player stands with his racket poised.\na large air plane on a run way\nA white cats sleeps on the seat of a chair.\nA beautiful woman holding a hunk of cake.\nA 
field with large clear balls and a large amount of people in the bleachers.\nAn action shot of a moving bus on the street at night\nA man is skateboarding in front of two women.\nA stove top oven with a couple of pots and pans.\na hand is holding a kites string and a flying kite\nColorful graffiti on an old Canadian train car.\nThere is a picture of a traffic sign with north and south arrows in the foreground and a graveyard in the back ground on a projectable slide.\nA bedroom with a balcony in a hotel\nA fire hydrant is attached to a building wall.\nA living room with a white circular table in it's center.\nsmall boats in a large body of water\nA fluffy dog is walking up the beach\nA man on a court swinging a tennis racket.\nTwo custom pizzas with different and interesting toppings.\nZebras and wort hogs living together on the plains.\nA wooden statue of a man near a window of stacked donuts.\nA couple of shaggy haired sheep grazing in a field.\nA group of surfers walking along the beach with surfboards\nA group of people skateboarding in park area next to palm trees.\nA grass level shot of a small heard of zebras in the wild.\nA child is tossing a baseball to another child with a wooden bat.\nA man is spraying an elephant with a water hose.\nA horse-drawn carriage traveling down a city road.\nTwo giraffes standing together next to a wall.\nCouple of attentive enthused women playing Nintendo wii\nA group of women standing around a cake cutting slices.\nA street scene with a couple taxis lined up.\nA child in a vehicle holding some toys.\nTwo fire trucks from Seattle sitting in a lot.\nAn older gentlemen reads in his hotel room\nA man surfs down the waves of a beach\nA herd of horses grazing on bales of hay.\nA large clock constructed of landscaping plants and flowers on a small rise.\nA stop sign that is next to some plants.\nA flying bird seen through a liquid filled feeder.\nA close up side view of a zebras face\nA man on a water board speeding down the 
ocean.\nFour adult elephants and a younger elephant walk through dry soil.\na couple of anmails standing next to a truck\nA giraffe is overlooking a barren plain, behind trees.\nA woman and child walk, holding hands, under the large freeway sign.\nTwo people are eating pizza at a dinner table.\nA cat eating at something dead on the beach.\nA cluttered desk with a laptop and discs sitting on it.\nA cat in a bathroom sits on the lid of the commode.\nA little girl with a broken arm standing in her bathroom.\nA man is skateboarding near the parked cars,\nA stuffed animal dog sitting between to trash cans.\nA refrigerator with magnets on it sitting beside a trash can.\nA milking a cow in the middle of a  pen.\nA black cat lying down on a laptop.\nA red stop sign next to a brick building.\na sheep eating hay next to a log cabin.\na group of zebras on a farm in a field\nTwo people are in shallow water with horses.\nA hot pizza on the table is loaded with pepperoni and cheese and sausage.\nTwo boys sit on chairs and play video games.\nA lot of fishing boats have  a lot of men off loading their catch.\nGroup of people sitting in auditorium with a screen.\nA person is holding two spoons over the sink.\nA young boy who is surfing on a surfboard.\nAn assortment of computer devices resting on a large wooden table.\na fenced in park on a city street\na couple of zebras are grazing on some dead grass\nA baseball player taking a swing at a ball\na boy on a skateboard is skateboarding on the ramp doing a trick\nThere are adult bears that is sitting in a den\nA small cell phone sitting next to a glass of Pepsi.\nA couple of bananas hanging from a metal hook.\na bag that is filled with pens and scissors\nPeople sitting outside along a concrete wall on a sunny day.\nA large tiger cat sits on a chair.\nA man riding on the back of a motorcycle.\nA bike parked out a store front with a lot of boxes.\na group of people under a tent celebrating something\nThe young men are playing a game of 
baseball.\nA yellow door is detached from a refrigerator outside\nA gyro and french fries with a drink displayed on a table.\nLarge hotel room with a king sized bed and large view of the ocean.\nThere is some chicken with cherry tomatoes and edamame.\nA bed with white pillows next to a wall.\nA person with a blue, red, and green plaid umbrella\nA room with a bed, chairs and various boxes.\nA bunch of little kids playing a game of soccer.\nAn airplane parked on a runway in the day time.\nThe children are fascinated with the making of the cake.\nTies of various sizes and colors are hanging on a portable shelf.\nA cat is sitting on the floor while watching television.\nThree different colored apples and a banana next to one another.\nA boy playing tennis on a tennis court swings his racket.\nA skateboarder up in the air over a snowy hill.\nMan in grey uniform during a baseball game.\nA woman standing next to two baby elephants.\nTwo bicycle riders are on a trail through the woods.\nA cat sleeping on top of a brown chair in a yard.\nA mostly empty train station with two trains ready to depart.\nA man and a woman holding remote controllers in front of a television.\nSun is coming through a window in a living room.\nA young boy riding a surfboard on a wave in the ocean.\nA pretty little girl flying a kite on a lush green field.\nA picture of a police man riding on a motorcycle.\nA parking meter with a picture of a bicycle on it.\nThe thin pizza is sitting on the plate.\nTan suitcase behind a match magazine and CD.\nFive sausage, egg, and cheese egg muffins.\nA woman with pink hair walking next to a man with a suitcase.\nA group of skiers pose on a snowy slope.\nAn outdoor table and chair setting on the curb\nThe old time fire engine joining the parade.\nA tiled mosaic empty shower stall with bathroom mirror.\na desk with a laptop and a desktop on it\nA young man on a skateboard maneuvers around traffic cones\na train on the railroad near a forested area\na person on a 
bike rides next to a city street\nSkier on slope in alpine mountain area on sunny day.\na man in a suit carrying a drink and a red and white sign\nA glass of alcohol sitting next to an open laptop computer.\nThis old wooden fishing boat appears to be permanently dry docked.\nTHERE ARE PIZZA THAT IS ON THE TABLE\nA male tennis player bouncing a tennis ball.\nPeople standing around in the street talking near buildings.\nA bus and cars sit on a street.\nA closeup of a empty boat surrounded by dark waters.\na black gray and white cat is sitting on a bookshelf\na black silver white blue red an orange parking meter and a hand flipping it off\npitcher with grey and white shirt throwing a pitch\nA slice of pizza is on a round white plate.\nthree motorcycle riders some dry trees and a few green trees\nThree people, one in a suit, are posing for the camera.\nA toilet and trash can behind a wall in the bathroom\na group of people pose for a picture at a wedding\nA man is sitting on a bench, taking in the city.\nA man sitting on a white chair on top of a tennis court.\nIn the station people are standing and talking.\nA man and some giraffe standing in a field.\nA zebra with a left side pose while standing in a field.\nThe man is dressed in a suit and tie posing for a photo.\nA man wearing eye glasses is staring at the camera in front of a room.\nA white and blue vase with a peach rose in it.\na black and white photo of a tooth brush in a cup\nA group of sheep and some birds in a fenced in area.\na small child is looking at the kite flying.\nA yellow bus driving down a street next to a ball building.\nA person with blue hair takes a photo of themselves.\nAn adult with a child riding skis down a small hill.\nThe group of three friends are sitting on a fallen tree in the woods.\nA black cat laying on a parked car.\nThe man sits cross legged while typing on a laptop.\nA couple of women riding on top of a blue motorcycle.\nA clean looking bathroom has a white shower curtain.\nAn 
open door on a public transportation system.\nA small clean simple bathroom contains a sink tub and toliet\nA clock displays the time on a brick building\nPeople walking on a beach, many carrying surfboards\na person riding a skate board at a skate park\nThere are two people and two motorcycles by a brick building.\nA little girl out on the beach with a fish kite.\na little bedroom with some curtains blocking the window\nA brown horse grazing in field behind a fence.\nHundreds of people cycling in front of several skyscrapers\nthis is a sandwich and french fries on a plate\nA man is swinging a baseball bat on the field.\na close up of a cup with tooth brushes\na bunch of people sitting under a umbrella\nA small dog buried in the covers of a bed.\nTwo man preparing their surfboards to go surfing.\nTwo giraffes standing in front of a wooden wall\nA computer on a desk with two cds lying on top of the keyboard.\nA small cat sitting on the edge of a toilet seat looking into the toilet.\nSomeone skiing down a hill on the ski slope.\nA cartoon of a person surfing a big wave\nUrban street with storefronts and parked trucks, on a rainy day.\nA man takes a bite out of some sort of food.\nA man riding on top of a wave on a surfboard.\nA remote control lying on a wooden table\na gothic clock tower beneath a blue sky\nTwo skateboarders are riding on a slanted walkway.\nA group of kites flying through a blue sky.\nA ripe banana sitting on a table next to an apple.\nSome passenger buses that are driving down the street.\nA little boy is at a dining table in public.\nA double decker bus driving down a street next to a tall building.\nPizza and appetizers with a side of ranch dipping sauce.\nA large statue of an Italian chef wearing an orange tie.\na man standing at the beach in the water holding a kite\nA group of people in a circle, while holding tennis rackets and standing on a hard surface tennis court.\nA dog that is running on the grass with a Fribee in its mouth.\nA clock 
that is sitting on the side of a tower.\nA bed with a colorful blanket sitting under a picture.\nA stop sign on a corner with water and snow covered mountains in the distance.\nSnowboarder bundled up in winter clothing while on slope.\nAn old plane is sitting on a runway.\nAn old clock is seen on a foggy street.\nA man cross country skiing through the woods.\nA parrot sitting on a person's hand while eating fruit.\nA group of eople binding over fastening their ski boots.\nA young boy holding onto a parking meter.\nAn elephant roaming the grassy areas in his natural habitat.\nA few people standing on a court playing tennis.\nA couple of bears are outside, both on logs.\nA spiral glass water feature showpieces a commercial bathroom.\nA man attending to food by a pile of fruits and vegetables.\nA baseball player with a mitt on one hand.\na person jumping a skate board in the air\nA man who is eating a glazed doughnut.\nA computer on a countertop with a tangle of cords behind it.\nA tray of food consisting of vegetables meat and rice.\nA unique style bed with red covers and a mirror behind it.\nA large and a small teddy bear at the teddy bear museum.\nA person water skiing behind a boat full of people.\nA man on skis standing at the base of a mountain.\nA container with a variety of vegetables, desserts, breads and other types of foods, with one spoon on top of the food items.\nA pair of plush animals dressed in halloween costumes.\nA black dog in a yard jumps up toward a yellow Frisbee.\nA horse reflected in the surface of water\na toddler standing while holding onto a toilet and reaching for a towel\nTwo men with suitcases and a lady nearby.\nA black and white dog curiously looking at something on a counter.\nan adult and two children snow skiers snow and trees\nA group of children running after a soccer ball\nA blurry image of yellow flowers with a fence in the background.\nThe long meat and cheese sandwich is wrapped in plastic.\nA man standing in front of his 
tv.\nA train riding a group of people around.\nThis is a plate holding a double decker sandwich.\nA giraffe grazing from a tall tree next to a rock.\nA youth baseball team and their coach poses for a photo on the field\nsome people are sitting under umbrellas at the beach\nA surfer carries his board through the snow, and rides a wave.\nApples and leaves on the ground with a cat in the background.\nA dim living room with modern furniture and potted plants.\na big group of people that are standing under a shelter\nA street sign on a pole on the side of the road.\na woman taking a picture of her microwave\nA living room with everything in it labeled\npeople watching young boys playing a game of some sort\nThe young man is practicing his tricks on his skateboard.\nLambs in a sheltered place are eating and laying around.\nA living room scene with the television and a Christmas tree.\nA living room has two couches and a television.\nTwo men who are looking at a passenger jet.\nA man sleeping in a bed with two cats.\nA woman bending over holding and kissing her cat.\nA person jumping in the air on a skateboard.\nThe stop light reads green, and there are two huge buildings in the back.\nThe back ends and legs of three elephants, including a baby, are seen on the side of a road.\nA baseball player wearing the number thirteen at home plate.\nTwo large white commercial airliners on an airport runway.\nNested measuring cups and spoons on a gray surface.\nA man playing with a Frisbee in a gym.\nA crowd of people carrying umbrellas across a rain soaked street.\nthree fourths of a pizza with meats and vegetables on a pizza pan\nA woman and a man standing with a horse in a boat and a dog laying next to it.\nA dog is sitting on a counter in what looks like a factory setting.\nA large tow truck drives down the street.\nA silver, stainless steel refrigerator in a kitchen\nA living room filled with furniture in front of a fire place.\nA woman with green lace underwear is walking away 
as tennis balls are hanging all around her.\nA photo of a dirty bathroom with a sink and toilet.\nA gray bird perched on top of a tree branch.\nAn elderly man and a teen play video games together.\nA keyboard and monitor on a wood desk\nPeople sit and wait, looking at papers and phones.\nTraffic light in a blank space with lit green light.\nA woman at a table eating with two pizzas.\nA man standing next to a woman holding an umbrella.\na man that is looking into a stove\nA desk with ruler, whole punch and scissors on it\nA bathroom that has a mirror in it.\nA long empty road with an over pass bridge.\nA group of young children sitting around a table eating food.\nA man is skateboarding down a path next to some grass.\nA child at a store display selling green bananas.\nan adult feeding a baby some cake\na suit coat shirt and tie hanging on hooks\nA zebra walking through a green field of grass.\nTwo photos containing food with hot dogs and pastries.\nA picture of some delicious pizza ready to be eaten.\nA small white and brown bird resting on a twig.\nA herd of black cattle grazing on a lush green field.\nA toilet in the bathroom with a wheel in the window.\nA CUP OF COFFEE AND A PASTRY ON A TABLE\nA black and white cat is sitting in front of fall foliage.\nTwo adult males enjoy playing a videogame together.\na living room with a low ceiling and it has a couple of couches\nThe personal sized pizza on the plate has many toppings.\nA little toddler boy sleeping on his couch with a remote in his hand\na cat dressed with a collar and tie decorated with irish symbols\nLittle boy in boat with two halves of a banana in mouth.\na man on a pay phone holding his hand out to someone\nA bunch of food sitting on a plate with a spoon\nA giraffe leans its neck as it walks through the bush.\nA toddler brushing her teeth with an electronic toothbrush.\nLittle girl with a group of children watches a show.\nThe tray has fries, meats and vegetables.\nA long row of train carts sitting in 
a yard of tracks.\na woman in a dress prepares to hit a tennis ball\nthere is a woman sitting on a couch holding a piece of cake\nSeveral sausages cooking on a grill with glowing charcoal.\nBlack and white photo of a large clock located outside.\nthis person is doing his work on two computers\na dog and a person stand on an edge with a mountainn in the back ground\nA table with plates of food and an orange on it\nA hanging street sign that says Rockefeller Plaza.\nTwo people with remotes in a living room.\nA couple of people on a field playing baseball.\nan elephant resting in the water next to the shore area\nBlack and white photograph of a woman surrounded by pigeons on a city street\na couple of animals are standing in a field\nsome boats going down a tree lined canal\nAn overhead view of a lot containing many parked, empty buses.\nA sign indicating Florida Avenue and another one stating the speed limit is 35.\nA guy holding a piece of food up to his mouth.\nan image of a dog with one paw out the window\nA book is open and kept in front of a soft toy.\nA disturbing doll sits next to a clock in a mirrored image.\npeople bicycling down a city street in daylight\nA group of people sitting next to each other on a bench.\nA person walking down a sidewalk carrying a back pack.\nA reflection a person catching a frisbee in a mirror like object.\nThese military guy is celebrating something big with a nice cake.\na very big elephant with some clothes on carrying three people\nTwo bears coming out of the woods to a road.\nBroccoli and a deep fried food lay on a black and white plate.\nThere are two elephants standing next to each other.\nThree cows in a barn eating food off the ground.\nA truck driving through and intersection waiting on a pedestrian to finish crossing the street.\nA car at the light getting ready to go because the light is green.\nA book setting on a green bench in a park.\nA small herd of cows grazing along a path on the side of a hill.\nHelmeted and 
uniformed military men travel together on horseback.\ntwo men in suits standing next to each other\nThree people are standing in front of a truck while another is in the background.\nA person that is going out some candles.\nA man and a dog on a skate board.\nThe serving counter of a restaurant is quiet.\nA man tossing a teddy bear off the side of a bridge with a parachute.\nTwo men sitting at a table with a very large pizza.\nA couple of ships in the water by some buildings.\nA big building on some grassy field during the day.\na boy laying down on a surfboard in the water\nA plate of food that has pita bread, green peppers and tomatoes.\nA man that is sitting on a moped.\nA picture of an oven with food baking inside.\nA man riding down a snow covered ski slope on skis.\nThe travelers stare outside a tram as it approaches a giraffe standing by the roadside.\nA man filling jugs with water from a bathroom sink.\nA white cake with blueberries and oranges on top.\nLeaves and purple flowers come out of a brown vase on a desk.\nA toddler is running through a kitchen while some adults stand close by.\nA cat laying on top of a wooden desk near a monitor.\nA couple of people on a field with a Frisbee.\nA view of the street signs \"W 122 St.\", \"Seminary Row\", and \"Broadway\" in front of an old red brick building.\nA skier lifts their ski poles in the air on a slope, with other skiers nearby.\nA protest sign painted like a stop sign stating \"stop harper\"\nThe face of a dairy cow in a pen.\na person ina field playing with a frisbee with trees nearby\nThere is a person flying a kite at the beach\nA bowl of raw fruit on a table by a painting\nA plane flying over waves and a small island.\nA open suitcase containing shoes with a table on top.\nA man poses with a pinwheel against the blue sky.\nA modern looking living room in an apartment.\nPeople walking down the stone sidewalk in the rain.\nA group of people are around a dining table.\nA lot of people that are in the 
street.\nA zebra standing near a tree in a field\nA man eating chocolate donuts and a woman smiling next to him.\nA group of people posing around a woman holding a cake.\na person getting ready to swing on something\nA pretty young lady carrying a white umbrella.\nA flamboyant man wearing a tight green marching band uniform.\nSomeone has set up a make-shift photography workshop in the field.\nA gray train is on a track on a hill near water.\nTwo giraffes stand back to back and eat leaves\nFew persons are seen on zebra crossing on road and an elephant with a banner is there.\nA small herd of cows stand in a high mountain meadow.\nTwo elephants are in the middle of a circus ring.\nA person holding an unusually thin Chiquita banana.\nThe bathroom is in the process of being worked on.\nA plate full of broccoli with fries and carrots\nA horse is grazing in a grassy field with a view of mountains.\na wet black dog has some sand on its nose\nA man sitting in a wheel chair under an umbrella on a busy street.\nA white fire hydrant sitting outside a building with a mural painted on it.\nA herd of animals standing in a large field.\nA row of white toilets sitting on top of a dirt ground.\nA tennis player is lifting is bending his leg off the ground and reaching his arm up in order to hit the ball.\nA young boy in a wetsuit on a surfboard.\nTwo young women playing a game of soccer.\nA plane sitting on a runway beside water.\nThere  are people skiing next to a dog.\nA grey and white cat sitting in a sink\na person with a large afro and glasses\nA man holding up a kite so it catches the wind.\nA country scene has a rocky trail leading to a body of water.\nA hand holding a bagel covered in almonds.\nA girl taking a bite of a slice of pizza.\nA middle-aged man in a suit with messy black hair.\nbarren clean white kitchen with white appliances and stainless steel sink\na plate full of vegetables with seasonings sprinkled on top\nA young man in blue jacket riding skateboard in 
snow.\na hydrant in a place near some houses\nA female tennis player jumping up to hit the ball.\nAn open laptop computer sitting on top of an office desk.\nA picture of a building with a very nice clock.\nThere is a large plate of tomatoes and a pan of sliced tomatoes\nA car that is sitting near a green street light.\nA young woman using a cel phone, in a college tank top.\nA sink and some counters in a small room.\nA man riding a skateboard on top of a ramp.\nA couple of young people standing in front of a TV.\nA person putting some food on a white plate.\na person at a table with a plate of pizza\nBright blue train carriage awaiting passengers in Peru\nTwo slices of plain pizza are sitting on a plate.\nA grey table with a white plate of food.\ntwo people sitting on a bench near trees\na guy in the photo looks sad and dark\nA group of people seated using cellphones, three ladies with handbags\nThe side view of a man with coffee casts a shadow as he ponders at his laptop\nA large brown dog sitting next to a frisbee.\nA woman in a white and green tennis dress setting up her shot.\nThree women in a kitchen at a table full of food.\nTwo giraffes in their pen at the zoo.\nVery pretty clock with the base surrounding by brick floor\nA small transport truck with a white trailer.\nAn open faced sandwich, chips and sauce are on a plate.\nWhite dish piled with ham slices and broccoli.\nA train with lots of red cars traveling down tracks.\nA man on the beach has a large umbrella.\nA picture of a man in a green baseball uniform batting for his team.\ntwo brown and white birds sitting on a roof\nA herd of flamingo birds in the water near a construction site.\na boy in a baseball uniform standing in a field\nA red and black long van is parked in a parking lot.\nA giraffe next to the road on a safari ride.\nA group of cups that are sitting on a table.\nBiplane flying over blue ocean next to coastline.\nThere is snow on top of the snow board.\nAn open living room with hardwood 
floors and a vase of flowers\nA picture of a bathroom with white tile walls and a window with white blinds.\nA herd of sheep roam in the grass.\nA sidewalk area with a red fire hydrant near a light pole\nA slow moving subway train that is going down the track.\nA woman hitting a tennis ball with her racquet.\nA young child sitting on a leather couch holding a controller.\nA restaurant is filled with many people and newspapers.\nA couple of donuts are on the plate, ready to be eaten.\nA stove top cooking in a pot and frying pan\nA street sign with the name of a street on it, and next to it is a post with various names up and down the post.\nclose up of a bulding in the mirror of a vehicle\nA living room with a fire place and lots of furniture.\na bunch of bright lighted signs and mopeds on a street\na man with a helmet is touching some food\nA silver hippy van and a bus for vegans.\nA Southwest airplane is parked on the runway.\na close up of a person holding two birds\nmany different clock on a shelf near a wall\nThe small bathroom has a glass shower door.\nTrain traveling through countryside near tall brick structure.\nTwo girls outside, one flying a kite and one sitting down.\nA kitchen scene with focus on the sink and counter with vegetables.\nA cat sitting on a windowsill next to a painted pumpkin.\nA family on skis posing for a picture.\nThe big rig truck is parked in the parking lot.\nA cat saying on a sofa with many pillows.\nA group of animals grazing on grass in a field.\nA man riding a snowboard down a snow covered ski slope.\ntwo people on a beach next to a large body of water\nA banana with a face written on it in front of a mirror.\nA table topped with three plates of food.\nAn old ornamental building features many beautiful windows and a clock.\nA light blue airliners is parked on the tarmac.\nA man handing out slices of pizza to protesters\nA bus and a car travelling in the same direction on a sunny day.\nA computer mouse is placed next to a computer 
keyboard.\nSomebody is sleeping in the bed next to the clock on the table.\na baseball player holding a bat on a field\nA young bearded man holding a partially eaten hot dog.\nWoman in white and black outfit on a tennis court.\nA white bus driving down a street next to people.\nA sofa with pillows next to floor and blue rug.\nA smiling woman holds a banana up in the air.\nA yellow tractor digging next to a yellow and red fire hydrant.\nA snow boarder is going down an indoor slope.\na couple of trains sit parked as it overlooks a city\nGroup of people holding up umbrellas in front of cactus.\nFemale tennis player preparing to serve the ball\nA man riding a skateboard doing a trick.\nA double-decker Liverpool Street bus on a city street\nA person cutting a pizza with toppings into slices\nA large bird sitting on top of a metal spire.\nA tennis player raising his racket to hit a ball.\nA tray with food on a table\nA person takes a picture with their cell phone.\nA white toilet sitting in a bathroom next to a TP roller.\nA stop sign that has some foreign words written on it.\nA skateboarder rides his board in a concrete pool.\nSoup presented in a bowl on a plate.\nTwo buses under a large open structure at a station.\nA baby elephant reaching for grey bag at a zoo.\nMultiple pictures of a man brushing his teeth.\nA salad to be eaten with wooden chopsticks and a drink.\nA couple of men in police uniforms sitting on horses.\nTwo children sitting in the grass eating food\nA herd of elephants walking across a river.\nA line of stuffed animals in a child's room.\nPlate of food that are on top of a table.\nA computer is sitting on a computer desk on the far side of the room.\nA giraffe sticking its head through the rails of a wooden fence.\nA white and black bus on street next to a building.\nA woman talks on her cell phone as she skates down the sidewalk.\nTwo zebra and other animals grazing the grass.\na lady that has kids in her lap at the table\nTwo men jumping to catch a 
Frisbee while people watch them playing.\nA man wearing all blue with an oil can walking around a train engine.\nA young man walking down a sidewalk pulling his travel bag as others watch.\nA street view of a protest and a woman with her fist raised.\na cat on a bed with dishes on top of it\nA green and white airplane behind a fence.\nA yellow fire hydrant on the side of the street.\nA tall obelisk sitting next to a tall white building.\nA herd of zebra standing behind a wire fence.\nA cat relaxes in this tan leather chair.\nA group of people standing around a table filled with fruits and vegetables.\nA man with a racket walks on a court.\na person is playing tennis outside on a court\nA man in a black jacket riding a skateboard on the street.\nA group of young men standing next to each other on a field.\nCutting board with various fruits, utensils and spices.\nA little dog is running around an outside shopping stand.\nTwo tangerines and a banana atop a blue plastic bowl.\nSeveral traffic lights are seen near a busy highway.\nA yellow and white train traveling down train tracks.\nA group of people sitting on horses in a row.\nA bench outdoors on a path near a fence.\nAn animal stands in grass on a hillside on a sunny day.\nAn almost empty box with a partially eating doughnut and a knife in it.\nA clock tower next to a large building\nLARGE SANDWICH CUT IN HALF ON A PLATE.\nA lone train is parked on the train tracks at the station.\nPeople skiing down a slope with many moguls.\ntwo blue and white trains buildings and some wires\nA street scene with people and cars on the street.\nYoung professional looking man looking at the camera.\nA large lizard float is rising in the air.\nA young girl with a tennis racket is in a parking lot.\na person wearing a dress and riding skis indoors\nA WOMAN IS NEAR A CAMEL WITH A UMBRELLA\nBox of cereal sitting next to a box of donuts.\nA woman on a skateboard riding on the sidewalk\nA red passenger bus makes its way past Big Ben in 
London.\nA tooth brush in a blue glass sitting on a counter.\na person in a tie and suit sitting before a white plate of food and wine glass.\nA man approaches an intersection in the rain.\nA man sits at a table that has a surfboard propped against it.\nPeople in business suits standing in front of a building.\nSeveral elephants walking around grassy area in the wild.\nA tennis player holding a a racket on a tennis court.\nA single seagull swimming towards a rocky shore.\nA man in beard and glasses with a red and white suit on.\nA man skiing down the side of a snow covered slope.\nA view inside a refrigerator that is completely packed with food.\nA lamp post with traffic signal, street light and street signs.\nBaseball players playing on a professional baseball field.\nA spacious bathroom with lots of lower cabinets and a toilet.\nan image of a guy that is walking by a train\nStop sign is above a red triangle sign next to a barb wire fence.\nA hand holding a water bottle in front of a cat.\nfour plants being grown outside in a planter\nA surf boarder stands as he rides a wave.\nA museum display featuring professional baseball jersey and bat.\nA zebra standing in grass next to trees.\nThe old man is talking on the phone.\nAn unmade single bed in an upstairs bedroom in the early afternoon\nYoung skate boarder doing a nearly vertical stunt\nThe cat lies next to a cat sitting inside of a sport bag.\nA partly eaten pizza and a fork with wine on the table.\nAn older man sitting at a small table about to eat a slice of pizza.\nA man surfing down a rushing rivers wave\nA man walks a dog near a large bus.\nA bike rack full of bikes and people every were.\na small yellow boat set in the water by large rocks\nGiraffes in the wild on a sunny day\nA woman stirring a large metal pot of food.\nA pitcher preparing to throw a base ball.\na plate with half eaten food on it\nA large white bear walking across a river.\nA bike parked next to a parking meter on the side of a street.\nA 
white bowl topped with a sandwich filled with meat and veggies.\ntwo black horses are grazing on green grass in the field.\nPeople milling about a bus terminal getting ready to board.\nThe cat is balancing on top of the door.\nA man holding a white and black umbrella in a large parking lot.\npeople sitting on a bench facing the water.\nTwo men sitting down at a both eating .\nSeveral people standing around and looking at a vintage plane.\nA white plate topped with vegetables.covered in sauce.\nA girl is riding a surfboard in the water.\nA bright red motorcycle parked with other motorcycles beneath streetlights.\nThe street sign is in the middle of the flood waters.\nA stuffed panda bear is sitting on a bench near a Buddha statue.\nSmall bird standing on rope near open ocean.\nA plate topped with pasta, meat and broccoli.\nA woman with a snowboard jumping in the air.\nCars are driving through the intersection underneath traffic signals.\nA person holding a tennis racket on a tennis court.\ntwo baseball players standing close to the base\nA black and brown dog rests on a couch.\nA black dog sitting in the middle of a bathroom.\na motorcycle with a boot on the back wheel\nA woman and girl watching donuts being made through a window.\na scooter with a rifle bag parked in front of a fence\nA nun sharing pizza with two young men.\nA snowboarder is on a board and is jumping in the air.\nA little boy in a inter tube at a water park.\nTwo black cooling rack shoaling pieces of pizza.\nRoom with patterned carpet and wallpaper and dark wood furnishings.\nA door to a bedroom is open with a wooden dresser in view.\nA car parked in a lot with a surf board strapped to the top.\nA man, women, and child sitting at a table.\n4 seagulls stand on rusty rods with people in a boat in the background.\nA train is stopped in its tracks next to a building and cars.\nVarious items on white surface including a cellphone, keys and camera.\nA monkey hanging from ropes eating bananas strung to 
it.\na group of children playing soccer on an open field\nA rose, an entry way to a forest, a water fall and a lounge sign are in a series of photos.\na wedding cake with a picture on it\nA man sitting on bed looking at a television and person in mirror.\nA couple of large birds standing by some eggs.\nA hotel bathroom has a granite vanity with a big mirror.\nA young man is holding up a skateboard.\nAn umbrella is strapped to a blue bike.\nA couple of pairs of skis in the snow.\nA sliced panned pizza on a table ready to be served.\nThree giraffes stand in front of blurry trees.\nA green passenger train stops in a station to pick up passengers.\nThree pedestrians crossing a street at a stop light.\na man with a beard a deer and a pink fire hydrant\nTwo male tennis players meeting at the net for a high five.\nA very tall brick building sitting next to a traffic light.\nA man jumps and reaches for a frisbee\nA group of people sitting around a table.\nMan holding up a plate with a brownie in the shape of a spaceship.\nA small garden area features a few springs of growth and a small busy plant and a few bricks.\nA traffic light with stormy skies in the background.\nA girl thinks she is being funny while eating pizza.\nThe surfer is barely hanging on to his surfboard.\nA stop sign that has spikes sticking out of it.\nA plate of food that includes chicken and broccoli.\nA white toilet in a very small bathroom.\nA crowd of people watching a baseball game where a batter just hit a ball.\nA person holds a pink frosted donut with jimmies.\nsome white black and brown sheep in their pen\nA black truck is driving on an open sandy area\nA snowboarder flying up in the air with the sun behind him.\na table with a blender and a glass on it\nThis young girl is learning to throw a frisbee.\na couple of geese are on the water\nA row of boats on a river with trees in the background.\na red fire hydrant next to a stone brick wall\nA man sits on a bed talking with hand gestures.\nperson 
walking down the sidewalk at night in rain\nA giraffe standing next to trees on the plains.\nThree horses on a green pasture with an old building in the background.\na couple of boats are sitting in the water\nA man flying a double string kite in a large grassy area.\nA man in a ball cap riding on a mule.\nAn old red truck is driving by the water.\nAn assortment of vegetables sit out on the cutting board.\nThe two zebra stand in a black and white photo.\nA girl swings a raqcuet at a tennis ball.\nA young girl sitting with a young boy at a table with food.\nA silver and black train passing under a bridge.\nA person that is going out in the water.\na large red bus is at a stop\nan open kitchen and living room in a daylit house\nA young boy standing on top of a rug in a living room.\nThe head and arm of a person flying a kite.\nA busy market sports colorful umbrellas that shade the vendors.\nA bird is perched atop a computer monitor.\nA chocolate cake is being frosted with chocolate frosting.\nA large dog has a collar with clock on it.\nthe truck is going up the hill in the snow\nAdult with laptop with dog lying next to him.\nA white plate topped with fries and sausages.\nAn old man n his computer in front of the fire.\nA worker performs maintenance on a fire hydrant.\nSome donut are on a round white plate.\na cat standing on some rocks next to some bushes\nthe bus is blue and is stopped. 
Some people are standing waiting for it\nA man on a surfboard is riding a huge wave with his feet out and arms extended.\nA living room filled with furniture and a rug.\nTwo small birds in a large green grassy field.\ntwo older people stand next to a statue of a horse head\nA boy in the stands of a baseball game biting into a hot dog.\nA very pretty shallow stream in the woods.\nA busy intersection in the city is full of people and signs.\nA gray minivan on the curb at W 38th st in a big city.\nMan playing racquetball about to hit a ball.\nA couple of chairs sitting on top of the back of a truck.\nTwo men standing next to each other holding giant sugar donuts.\npepole eating at a restaurannt meat and veggies\nA close up of a parking meter by a parked car.\nA living room with windows all around it .\nA photo of an old clock tower next to some buildings.\nSeveral stuffed animals sitting in wooden boxes outside.\nSeveral long boarders are riding long boards down a quiet street.\nThe animals are roaming in the backyard outside int he grass.\nA kitchen with furniture and decor in it.\nCrowd of people standing around while someone flies a kite\nA man watches another man that has numerous bananas on his head.\na fake mouse is in a box of doughnuts\nA sink with several faucets and a large circular basin.\nTwo park benches with one man sitting in woods.\nA large white bed covered in two white pillows.\nA group of people that are sitting on benches.\nA living room features a wood ceiling, stone fireplace and large glass window.\na desk with a monitor keyboard and mouse\nA young boy about to hit a large ball with a large baseball-like bat\nA woman stands behind a cake and baking decorations.\nThe dining table and chairs are outside the small kitchen.\nA cat is sitting in front of some steps\na male skateboarder in a black shirt doing a trick\nAn elephant moves is gesturing toward a bus.\nMany people are sitting around tables with dinner plates on them.\nTwo men standing in a 
living room holding Wii controllers in their hands.\nA person walking on the shore with a surfboard under their arm.\nA kite made like an airplane flying above several American flags.\nan image of two bags set on a hotel bed\nKeyboard, sunglasses, book, pen, and various items on a table.\nCollection of books scattered all over a bed.\nThis blurry picture has a male in a suit in it.\na woman wearing a cowboy hat face to face with a horse\nA beautiful young lady sitting on a park next next to an old man.\nA yellow and red fire hydrant in a yard.\nA little boy is waving at the runway as a plane is sitting waiting for takeoff.\na lit candle sitting next to a plate filled with food\na picture of a hang glider on a beach\na clock that has two figures sitting on a mantle\nA male and a female holding up their cellphones\nAn empty bench is on the curb side of a grassy area.\nA man in casual wear holding a baseball type bat.\nA close up of a person's hand with a scissors cutting something wet.\nWine glasses sit in a row on a wooden ledge\nA large grassy field with giraffes and a few other animals.\nSmiling child with a tooth brush in hand.\nPitcher at mound throwing ball to baseman near runner and umpire.\nA luggage cart stacked with a very tall pile of luggage.\na road with many traffic lights and cars driving\nTwo people flying a kite in a park\nWildlife standing near water area in natural setting.\nTwo businessmen talk over a cup of coffee.\nA bunch of construction barriers near an old, worn down building.\nA flock of birds flying over a body of water.\nA tennis player hitting the ball with the racket.\nAn overturned skateboard lying on a grassy field.\nTwo boys are playing catch with a frisbee.\nA black and white zebra grazing on grass.\nA cat sitting on top of a chair.\nA clock that is in between two windows on a building.\nA stop sign set on the inner curve of a curving dirt road.\nA cat sitting on the floor by three shoes.\nA man holding a ski board and parasail 
rope.\nA tennis doubles team with one player in the air, her racquet in motion.\nA person standing on a surfboard riding a wave.\nSeagull in the sand near a boat launch\nA public restroom with focus on two urinals.\nan image of a man carrying luggage in a cart\nA CGI man sitting on top of a CGI hospital bed.\nChild sitting down in a chair eating a sandwich.\nThe cat is looking at the television screen.\nA small child stands in a shopping cart with an umbrella.\nA plain piece of bread resting on a wooden plate.\nA train door from the inside of the car with exit signs and grab bars.\nThe man is sitting on the bench typing on his laptop\na train car sits parked as people stand next to it\nFood trucks serve customers in the parking lot at the event.\nA man and woman seated at a table in a restaurant.\nA number of train tracks with a train on it\nA plate full of spinach salad with dressing\na couple of people that are playing with a Frisbee\nA wooden stand with many types of fruit.\nA white plate holding two pieces of cake on a table.\nA giraffe handler training a giraffe at a zoo.\nA small sink area is packed with items.\nA pole with many different stop lights in different directions.\nA tree filled with lots of fruit and leaves.\nA brown horse standing on a lush green field.\nA brown towel that is sitting on a tub next to a toilet.\nA blue bowl containing various fruits such as apples and bananas.\nA bear sits next to another bear on a white blanket\nA dish of vegetables mixed together in a bowl.\nA boy is doing a trick on his skateboard.\nA large bus on a open city street.\nA clock with a colorful drawing on it.\na person riding a surf board in a body of water\nCaucasian and African-American business men standing in line to buy 'Po-Boys from a catering truck.\nAn old picture shows a man up to bat on home plate.\nA plate of bread , eggs , and bacon .\nMany people and a few cows are spending some time in the water and on the shore.\nA man and a woman cut a cake 
together.\nA bear walks in the bushes and plants in the wild.\nA man is riding a horse in front of several buildings.\na basket of apples oranges and avacado on a table\nThe cup that contains a toothbrush, toothpastes are placed next to the mirror.\nA red car and red motorcycle parked at a curb near a woman walking with an umbrella.\nA toddler is sitting in the bathroom sink playing with toothbrushes.\nA woman posing next to a double layer stack of donuts.\nMan intercepts man over a game of frisbee\nMan surfing on an ocean wave in the summer time.\nAn aerial view of a city and waterway with ships in the water and a bridge.\nA bunch of books that are lined next to a clock.\nA man in a blue suit eating a hot dog in a gym.\nPink and white flowers planted in an outside area.\nA skateboarder rides a ramp in a skate park.\nA dog laying on a couch with a Frisbee.\nThe drink in the glass is garnished with toothpicks and rosemary.\na man dangling in the air over the ocean\nA chocolate and ice-cream dessert in a restaurant\na single person walking the beach with a dog\nThe train is traveling down the tracks by the station.\nA room filled with computers and laptops on a desk.\nA group of people sitting around a table with clutter on top of it.\nA bedroom containing a bed without sheets and a dresser.\nA couch is made into a bed in a room with a desk.\nA player at bat in a baseball game.\nA boy is flying a kite on the beach.\na woman in a gray top is cooking outdoors\nA man sitting with his back to a dining table, with a laptop on his lap.\nA young guy standing by a tree while playing outdoor activities.\nA couple of white parrots perched on top of a tree branch.\nA man holding up a tennis racket as he coughs into his arm .\nA young man is tilting a skateboard up with his feet.\nA woman is sitting on the curb with a decorated parking meter.\nA frozen pizza box with the cooked pizza lying next to it.\nA brown vase sitting inside of rocks next to a set of green plants.\nA half 
eaten sandwich is wrapped in white paper.\nA bald man with a mustache wearing a suit.\nThis is a yellow and blue double decker bus.\nA picture of a wooden hedge hog clock with a price tag of twelve dollars.\nFour people with a birthday cake on a table.\nyoung children getting healthy food from a table.\nSome street signs point directions to various places\nA Water Dept sign is placed in front of the fire hydrant.\nA plate has beef on it near a glass of wine.\nA cat looks back over its shoulder while laying on top of a fuzzy white blanket.\nA train engine carrying carts down a track to a station.\nThe electrical components of an oven are being tested with a multimeter.\nYoung man wearing shorts throws a frisbee among trees.\nThe woman is playing with a wii controller\nA person on a skateboard does a trick in a bowl.\nA man that is holding a knife and a pot with broccoli.\nThe Halloween display includes a spiderweb and lots of pumpkins.\nThere is a neatly made bed in a bedroom of a log cabin.\nWhite swans swimming in a harbor with docked boats.\nA novel is on the seat of a green metal bench.\nA man standing and posing for a pic in formal wear.\nA few airplanes on the runway at the airport\nBaseball team holding batting practice on the field\nA man with glasses talking into a microphone.\nThere is a hanging clock in the hallway of the home.\nTHERE IS A GIRAFEE THAT IS WALKING IN THE WOODS\nA surfer takes a ride on a wave near a mountain.\nA freshly baked pizza resting on a table.\nA man sitting at a table with a glass of juice in his hand.\na kitchen with a counter some chairs and a sink\nA woman riding a skateboard in the street behind a man on a bicycle.\na couple of guys that have emt equipment\nA man wearing skis holding two ski pose on top of a snow covered slope.\nA small potted bonsai plant is on the floor getting licked by a cat.\nA parking meter that has a blonde wig on it\nA large truck driving down a busy road with the back full if dirt.\nA clown talking on 
a  phone next to a building.\nA tennis player on the court holding a tennis racket.\nA plate of fries and a hot dog sandwich.\nA double decker bus stopped at a bus stop.\nTWO BUSINESS MEN WITH TIES ON CONVERSING OUTSIDE A BUILDING\nA bed sitting in a room next to a wooden door.\nA view of a person's hand on a computer mouse.\nA park bench in the woods with a bag on it\nA cat is looking out of the window.\nfour giraffes standing in a field 2 are facing forwards\nA pile of oranges sitting inside of a basket.\nOn this table there are mugs of hot chocolate with shapes and half eaten donuts on plates.\na teddy bear nailed to a tree suspended above garbage\nA room with a toilet, a door and shoes in it.\nA public bathroom area with orange tile walls.\nA watch and class with a beverage sitting on a wooden table.\nRoom with many hanging clothes, a bed and dresser.\nA photo of a group of bikes behind a bus.\nPendant lights illuminate a bathroom sink for two.\nA prepared pizza is sitting on an appliance.\nthree people standing in a room and eating food.\nA plate that has a sandwich and a bowl of fruits on it.\nA woman posing with a bat and wearing a batting helmet.\nA man wearing a white lab coat walking a cow down a field.\nA pony grazing on grass in front of a lighthouse.\nMan spreading peanut butter on an English muffin\nA train is traveling down a track in the middle of an arid plain.\nA woman is playing Frisbee with two dogs.\na vintage photo of man standing in the middle of some waves\nPersonal pan pizza on a wooden table top\nA woman is standing in front of a stove\nA man stares at a cake with candles\nA surfer is riding a yellow surf board as he hits the waves.\nSeveral cars driving towards a public market.\na traffic light and a street sign on poles\nA church steeple rising high in the sky.\nA jet airplane flying in the daytime sky.\nA tablet sits on a table with two pizzas.\nPeople walking down a sidewalk on a street.\nA yellow fire hydrant near a grassy field.\na 
man in a black jacket is holding a hot dog with mustard\nTwo friends are eating an extremely large pizza.\nThere is a toilet and a bathtub in a bathroom.\nA guy skateboarding indoors in front of a crowd of people.\nA cat that is looking out of a window.\nthere is a woman holding a baby and a pizza pie on the table\nA desk with laptop, mug, paper and a monitor.\nA woman with a tennis raquet prepared to hit the ball.\nThree shelf deli display case with bottle beverages on top.\nThe kitchen has five beams running across the ceiling.\nA woman grabbing a piece of cake off the top of a plate.\nA sandwich and a salad are on a tray on a wooden table.\na kid stands on a hillside while flying a kite\nA man in a red snow jacket is on skis.\nA woman standing next to a little girl playing a game on  Nintendo Wii.\na woman is playing tennis on a court\nA crowd of people standing outside of a brown brick building.\nA woman is preparing to bite into a sandwich.\nA young boy tying paper kites to a string stretched across a room.\nA dog that is sitting down by a bench.\nAn airplane sitting on the runway in the snow.\nA man about to put a leash on a large cow.\nPeople are lined up along a train station waiting for a train.\nYoung man on a skateboard approaching a street.\nA red and white fire hydrant on a sidewalk at the park.\nA man and woman walking across the lawn carrying an umbrella.\na person riding a motorcycle on a city street\na young kid performs a trick on a skate board\nMen playing soccer on a field at night.\nA couple of women riding skis on top of snow covered ground.\nTwo ladies using the Nintendo Wii in a living room.\nSome vegetables on the ground are in planters.\nThe clock on the side of the building is also a sculpture.\nShe appears to be hanging on the street sign.\nA giraffe on  a large plain with herd animals in the background.\nA man is about to swing a baseball bat.\nA child is in the snow with one ski on and one off.\nThe bed red couch from the Mc Donalds 
commercial sitting in a living room with a fireplace next to it.\nA sign above a white stove and refrigerator next to it.\nSome babies playing in the bath tub one holding a tooth brush.\nA man drives by on a person holds onto a ladder below an airplane\nA lone zebra standing in the middle of a field.\nA woman is playing tennis on a fenced outdoor court.\nWell decorated restroom with sink and chair for sitting.\nSkateboarder in the motion of turning on his skateboard.\nA large grey horse is behind a wooden fence.\na mountain with a bunch of animals next to it\nA dozen people smiling for the camera at a large wooden table in a restaurant.\nA black dog laying on top of a rug on a hardwood floor.\nTwo giraffes in an enclosure are bent over peering at visitors.\nA man riding a skateboard up the side of a ramp.\nA little girl that is sitting in front of a laptop.\nA professional female tennis player engaged in competition on grass.\nTwo computers are sitting on a brown desk.\nBeach umbrellas and chairs next to each other.\nA person who is on a barrel on a snowboard.\nA man standing outside holding a sausage dog in his hand beside the food stand.\nthe mirror is showing a picture of the microwave in the kitchen\na man is making some food in a kitchen\nA snow filled street with a stop sign on the corner.\nThis is a wide perspective of a room in a region.\na close up of slices of pizza on a plate\nA pair of boats stacked up on a beach.\nA man doing a jump on a skateboard\nA bunch of cars driving through down town New York City.\nA group of people are flying kites in a field.\nA large airplane sits on the runway at the airport.\nA group of people sitting down at a dining room table next to dishes.\nA group of young people playing a game of soccer.\nA guy riding the an incoming wave on a surfboard\na little bird sitting on a ledge as it looks at the window\ngreen peppers red peppers a tomato corn and hot peppers\nBiathelete skiing forward with her rifle on her back.\nZebras 
racing each other in their zoo enclosure\nA beautiful young bride standing next to a her husband as they prepare to cut a cake.\nPizza with pepperoni, mushrooms, olives and sausage on a pizza pan.\nBicycles and a motorcycle parked on a city sidewalk.\nA loading truck carrying boxes and a Stop sign\nA man jumping up to catch a frisbee\nA living room scene with a large window.\na close up of a doughnut covered in sprinkles\nA person with a bike and a dog on a leash, boarding a train.\na close up of a person wearing a shirt and bow tie\nA kite flying over a sandy brown beach.\nA group of people enjoying a day at the beach.\nA kid standing in the dirt with some fruit.\nA group of people are together in the snow on skis.\na person in an open area flying a kite in the sky\na train on a train station and people walking near by\nAn old fire hydrant casts a shadow on the sidewalk.\nA man driving a yellow car on the road\nA kid is playing on some toy drums\nA street with many signs on the corner\nA merry go round with lots of colorful giraffe and other animals.\nPeople are playing ultimate frisbee and someone is about to catch it\nA woman that is sitting on a bike.\nTwo women trying to compete for a Frisbee during a game.\nThe child in the black helmet is swinging at a tee ball stand.\na public transit bus on a city street with people near by\nA laptop computer sitting on a cluttered desk.\nA pedestrian sign has been devised in comic fashion.\nA man wearing a pair of glasses and a tie.\nTwo dogs near a carry-on bag on a tile floor.\nA dog wearing a bandana rides a skateboard.\nMini pizzas on shelves waiting to be bake.\nthere is a white toilet that is broken on the street\nA messy bed in a room with large glass windows.\nA black and white view of a clock tower with a ferris wheel in back.\nPeople at an outdoor market under a canopy.\nA computer monitor in a home style office\nA woman reaches out to pet a giraffe who stands in confinement with his companion behind a fence.\nA 
man on a cell phone resting his legs on his luggage\nA black and white photo of a man walking around with an umbrella.\na tall giraffe standing in front of a wood fence\nA clock tower with lighted clock faces, against a twilight sky.\nThe sign on the sidewalk shows a U turn.\nTwo people are aiming controllers at the television set while other sit on the sofa watching.\nA kitchen with steel dishwasher, refrigerator, cabinets and microwave.\nTwo men play Frisbee in the sand while others watch.\nA brown and black dog laying on top of a wooden seat.\nA blurry image of a knife cutting into frosted cake.\nA man in surf gear walking down a crowded street.\nA man in a warehouse riding some moving object.\npeople flying very high and waving their hands\nA man cross country skiing in the country.\na building with some really big and fancy clocks on the side of it\nA close up of a giraffe with its face against a pole.\nA person flies a kite in a field.\nSome cooked vegetables are sitting on a plate.\nA plate of food and some cups of drink on a table.\nSeveral slices of pepperoni pizza sliced into squares.\nA large fed ex plane flying over mountains.\nA banana, tomato and apple laying on a desk\nThe laptop is connected to a full size keyboard to make an effective work station.\nA giraffe is standing in the bushes and tilting its head.\nA man riding a skateboard while flying over a board.\na broken up DVD in front of a keyboard\nTwo people in a room playing a game of Wii.\na white horse sniffing the hand of a person in front of them\nSome chopped vegetables layed out on a pan\nA woman throwing a tennis ball up in the air to serve it.\nA baby elephant following an adult elephant by a fence.\nSmall group of people playing video games in a living room.\nA woman in pink dress playing a game of tennis with people in background.\nA group of airplanes fly through the sky.\nSkateboarder in purple shirt riding on top of his board.\nSeveral young Asian people are snowboarding and 
skiing.\nA kitchen with appliances that include a sink, dishwasher and a refrigerator.\nFour giraffes are standing next to a bare tree.\nThree beds with clothes laying folded on each one.\nA man has his neck covered by clothing.\nThe warning sign is below two street name signs.\nA group of people watching kites being flown in a park.\nA man and a woman eating donuts and having drinks.\nA jockey sitting on the back of a horse\na red and white sign in front of a white house\nA man rides a donkey pulling a trailer of hay\nA variety of sandwiches on a table with photos on it.\nA dog with it's nose on a couch and an open laptop\nA young boy is sitting on the wooden bench.\nClouds loom over the city skyline with a clocktower in the front.\nA group of men holding cell phones down at their waists.\nA guy that has a burrito in his hand and is eating the burrito.\nStuffed animals are sitting on top of bookcases.\nA woman playing a game of tennis on a tennis court.\nA close up photo of a baked food in a pan on a stove.\nA white bowl filled with different colored vegetables.\nA pile of carrots and broccoli next to green onion.\na surfer in a wet suit is surfing on a white board\nA lady wearing a hat talking on a cell phone.\nA female equestrian is riding her horse in a show arena.\nA little dog sitting on a wooden bench.\nA single skier is the only person for miles of flat snow.\nSomeone is displaying a colorful pinstripe wallpaper on a cell phone.\nA very rusty old car near some pretty flowers.\nA group of men sitting next to each other holding cell phones.\nA cat that is laying down on a couch next to a remote.\nA man is jumping and guarding in mid air while another guy is throwing the frisbee.\nA bear laying inside a decaying mass of some sort.\nTwo train cars are beneath some trees on the top of an incline.\nA man is doing a trick on a skateboard.\nA laptop computer is on a table in a nice back yard.\nA jar of food on a wooden table.\nThee people stand in a lot while one 
holds an umbrella.\nTWO BALL PLAYERS ON THE FIELD, ONE RUNNING TO BASE\nMen standing and one pointing to an object on a street.\nA man swinging a baseball bat as another looks on.\nThe baseball team getting ready to walk off the field.\nA small kitten walking on a laptop keyboard.\nA CITY HAS A CLOCK ON ITS BUILDING\nTrain that is very aerodynamic in its appearance\nA person wearing skis, standing in the snow.\nTwo plates of broccoli are sitting next to each other.\nA woman sitting at a table across from an entree of beef.\nA line of food trucks parked on a city street.\nA yellow commuter train pulling into a station.\nA large picture of a man with a mustache and a bird on his shoulder.\nA large group of people at a table using laptops.\nBird sitting atop a wooden railing among the trees.\nA guy holding a cellphone from a display.\nA view of a street with multiple store fronts.\nA woman helping a small child on snow skis.\na plate filled with assored meat, some fruit and veggiesm and a roll\nA person crouched over on open lid toilet\nA man using scissors to cut white paper.\nLook at how high the snowboarder is in the air.\nA large yellow and brown boat floating on a body of water.\nclose up of a large stuffed pasta shell and vegetables on a plate\nA line of bicycles beside a street where a bus is stopping for passengers.\nA ginger cat sits and looks out a window\nA holiday cake with holly designs on it.\nA woman feeding a giraffe under a tent.\nA clock on the side of a church tower.\nan image of a girl walking on the sand on the beach\nA promotional photograph of professional MLB player Travis Buck.\nThere are several hot dogs on this plate along with two sides.\nBaseball players are watching as a hitter hits a baseball.\nSeveral pictures of someone baking using an old school outdoor wood fired stove.\nA duck swims along a large body of water.\nLarge group of motorcycle riders coming down the street with flags.\nA horse is walking down the street alone.\na small 
child is playing in a field\nA blender and a glass on a counter top.\nA red stops sign stands on a grassy island that has grass and is near a street.\nA baseball player up to bat during a baseball game.\na multi-colored boat with tents sitting on the water\nA break room with a sink and a microwave.\nA couple of toilets sitting in a  bathroom.\nA locomotive on tacks with smoke coming out of it's stack.\nA group of baseball players standing on top of a field.\nSmall herd of sheep walking and grazing in fenced farm field.\nSheep are grazing on fresh leafy vegetables that have been given to them.\nAn incoming train is approaching a railroad crossing.\nA SURFING BOARD STAND WITH A PERSON STANDING NEAR BY.\nA boy that is holding a bat in the grass.\nA family gathered around a dinner table getting plates of food.\nAn older man is holding luggage outside a transport center\nA girl is standing next to a horse.\nA giraffe stands next to a lone tree in a grassy area.\nA white building sitting below a brown tile roof.\nWoman in center of dirt intersection holding pink umbrella.\nA pizza is shown displayed on a plate.\nA green road sign with a bike painted on it.\nSomeone holds a bottle of mayonnaise near a hashbrown sandwich.\na young person riding a skate board on a wooden surface\nTwo cows with heads through bars eating hay.\nTwo large elephants walking behind a wire fence on green grass.\nA towel rack in a bathroom topped with two stuffed animals.\nWoman in bathing suit sitting on a beach chair, drinking a soda.\nTwo sheep in a  grassy field with a rabbit nearby\nThe person in the bodysuit is surfing a wave.\nA small plane flying through a blue sky.\na old jar that is sitting on the ground\nMany pedestrians are navigating around a street corner\na man in a suit standing in an office\nAn orange cat is sitting on a bag.\nA landscape photo of a large swimming pool area.\nA cat outside a window looking at a Buddha statue.\nA batter has just hit the ball but has not dropped the 
bat yet to run.\nTrolleys in the mountains travel through the snow.\nA photo of a woman sitting on a train on her cell phone.\nA plane is parked and being examined by several men.\nA group of skateboarders atop a concrete surface.\nA man with sunglasses dressed in a suit and tie\nThere is a baseball game going on, the hitter is about to hit the ball.\nPeople look on as an airborne snowboarder competes.\na glass wall to a shower in a bathroom\nA water skier holds on to a rope being towed by a boat\nAn unmade bed in front of a poster on the wall.\nThree people on horse back at a rural road intersection.\nA woman walking around a living room next to a TV.\nA douhnut and coffee are on a table.\nA person covered with snow on the mountain with skis\nThree women sit on the beach with two of them holding onto some umbrellas.\nA woman in a red bandana slicing a banana.\nA man is paddle surfing alongside his dog.\nA plate full of half eaten food with utensils.\na person in red is snowboarding on a hill\na dog sits in front of a window on a bed\nA classic building in the background frames a stoplight.\nA group of men standing on top of a baseball field.\nMany people sitting under umbrellas on a sunny beach\nTwo zebras standing by a log in a grassy field while people in a car watch.\nA woman on a court swinging a tennis racket.\nA person in a red shirt is riding a skateboard.\nThree different vases are on a shelf.\nA woman in a red dress talking on the phone.\nCarrots fresh from the ground with dirt and gardening gloves\nA fireman is getting water out of a boot.\nTwo men are holding video game controllers preparing to play.\nSome guys are watching two others playing the Wii.\nA young person in plaid doing snowboard tricks\na iced cake that has been cut up with a server resting on the plate next to it\nThe living room looks into a small, well organized bedroom.\nCooked broccoli and beans are a side dish.\nA group of people standing in the sand with a kite.\nA group of people 
sitting around a wooden table in front of a projection screen.\nA person looks on as two other people prepare to fly a kite.\nthis is a person flying a kite in the water\nMan looking at a screen while holding a Wii controller in his hand.\nA man with a tennis ball sticking out of his skull.\nA donut factory with donuts on a conveyor belt\nA building with a clock tower and a light blue roof.\nA store shelf filled with different heart shaped boxes.\nA man is smiling as he eats his passover dinner.\nA city as the sun sets with a gas station next to a traffic light.\nModel car sitting on a table next to a slice of chocolate cake.\nA man and his shadow on a red tennis court while the man swings a tennis racket.\nA birthday party for a baby with it's parents\nThere is a big room with furniture and items inside.\nTwo wine glasses sitting on top of a table.\nTwo zebras face each other and graze an open field.\nan  image of a guy that is on skiis\nA little girl riding a pair of skis on top of a conveyor belt.\nA pastrami sandwich being held by someone\nA fanciful dressed piece of pizza on a plate.\nA small Frisbee is lying in the water.\nA man holding a small white dog while wearing a black hat.\nsome white birds flying over very long grass\nA red double decker bus parked near a curb\nTwo computers are side by side on a desk.\nbarefoot little boy holding a hairbrush in his hand\nA boy throwing out a pitch in a ball game.\nSome sport players are competing in the Frisbee game and having fun.\nA train platform with passengers and two stationary trains.\nTwo horses trot on a field with their handlers.\nA white cow makes a face as he stands near a stone wall.\nAn open top double decker bus driving down a street.\na desk with a laptop and a monitor and keyboard on top\nA giraffe standing in an open field next to some rocks.\nA group of three people sitting on a couch.\nA vase with ref flowers in it on a table.\nPoised to slice into an iced multi-layer cake.\nA bench next to a 
small pond with a white bird standing in the water.\nA black cat underneath a umbrella in a room.\nThere are many birds flying near the boat.\nA room with a wooden desk and matching shelves\nA clean and tidy kitchen counter with nothing on the counter.\nA couple of girls standing in a livin groom holding Wii controllers.\nInside a restroom stall, a rag floats in the toilet water.\nA closeup of a deep dish pizza in a restaurant,\nSeven vases sit displayed on top of pedestals.\nMen are in a life raft which is beside a ship.\nA large boat with people on the back in the water.\nA giraffe bust hanging by a Rain Forest Cafe Sign.\na man that is skateboarding on a ramp\nsome forks people and a white cake\nMan riding on the back of a painted elephant.\nTwo women with clear umbrellas stand near two people in uniforms near a building with a thatched roof.\nA motorcycle parked in front of green doors.\na plate holding a slice of broccoli pizza next to a bottle of beer\nTwo mean getting ready to hug each other while standing in a classroom.\nA young man preparing to throw a frisbee.\nA man on a surfboard surfing in the ocean.\nA close-up picture of some food on paper plates\nA baseball player at home plate with a crowd of onlookers watching\nSome children are playing game in the room.\nTwin beds with pillows, and a lamp and vase\nA table full with a display of cupcakes and donuts.\nA chicken sandwich and french fries are on this plate.\nA vintage tennis team posing together on the court.\nA group of trucks on a mountain side trail just sitting there.\nSomeone skateboarding in the park and doing a trick in the air.\na cat that is laying down on some carrots\nA black and tan dog laying peacefully on a sofa\nA pole with a lot of street light signs on it.\nA table with many fruits and vegetables, including carrots, potatoes, squash and apples to name a few.\nThe brown dog is waiting for his owner to play frisbee.\nA shot from the crowd of a player during a tennis match.\nThe train 
is stopped on the tracks to pick up passengers.\nSome hotdogs and plates are on a table.\nBlack train cars on tracks next to trees.\nA man with a helmet on, on skis at the top of a slope.\nA vase with yellow flowers sits upon a red and blue table cloth.\nAn office desk with several monitors and birthday balloons\nA sleepy dog wearing a cowboy hat in the back seat of a car\nA truck hauls a group of tractors down the road.\na large clock resting on a poll by some trees\nAn Italian meal with marinara sauce served on a long tray.\nA town square with a statue in the middle.\nLarge variety of fruits and vegetables on display at a market.\nThe complete perspective of a washroom with numerous things to see.\nAn object that looks like a dog sitting by a miniature cell phone.\nA MAN IS PACKING UP SKIES ON THE SNOW LAND\nA group of people holding candles on a sidewalk in the snow.\na big yellow school bus shown through the rear view of another school bus\nA woman on the phone standing in the kitchen with her mouth open\nA bathroom with shower, sink and a mirror.\nPlayer and referee at tennis match on red court.\na man holding a bat gets ready to swing it\nA BATHROOM THAT IS IN SERIOUS NEED OF A REMODEL\nFather, mother, and young son playing in the water.\nA man in a tennis match is swinging his tennis racket.\nA cow resting on the side of the road.\nA dark bathroom with a white bathtub and a white toilet.\nThe street sign has numerous street names on it.\nA giraffe walking through a zoo type enclosure.\nA stuffed monkey sitting alone on a bench.\nA guy doing tricks on his kate board\nA red table topped with two plates with slices of pizza.\nA man and a woman with three dogs read the menu outside of the deli.\nA group of people walking down the street in what appears to be a marketplace.\nA red box on a pole with a solar panel on top.\nA white plate holding a sandwich and fried potatoes.\nA train stopped in a station with people walking towards it with luggage.\nLady standing 
in front of two couches with a remote control in her hand.\nA bunch of airplanes are parked on the runway.\nSeveral small white boats on the open water.\nA couple of people on surfboards in the water.\nA view of a restroom urinal covered in filth.\nthe start of a broccoli stalk in the garden\nA toilet in front of a window, and next to the shower are shown\nA person on a field swinging a baseball bat.\nA cat that is cleaning its paws while sitting on a suitcase.\nA bear lays on a pile of food\nA pair of giraffes standing in a pen at a zoo.\nThree men stand in front of a beige building and the man in the middle who wears a hat holds a white Frisbee.\nYoung girl gets ready to blow out candles as family watches\nA young man doing a skateboard trick while others watch.\nA boy is eating a slice of pizza at a table.\nA man holds scissors to his protruding tongue, as if to cut it off.\na sign for Bras Basah Road next to a pedestrian stopwalk\nA man standing in a field holding a small parachute.\nA microwave oven mounted into the side of a wall.\nThe city street is quiet this time of night.\nA young boy standing on the top of a sky slope.\nSeveral kites of different colors laying on the sand on the beach.\nWorkers in uniforms next to a truck and construction equipment\nA bowl of vegetables with a silver spoon.\nPeople sitting at a table and eating soup.\nAn orange and white cat chasing a feather\nAn Australian Shepherd herds cattle in a pen.\nDoorway view of a bathroom with a toilet and window.\nTwo women standing under an umbrella having a conversation.\nA picture of some people playing with a frisbee.\nA cat playing with a shoelace of a tennis shoe.\nA man riding down a snow covered ski slope on skis.\nTwo donkeys are standing together.  
One is facing out and the other one has his head bent.\nA cat is lying in a houseplant on the window sill\nA couple of glass items that are in a room.\nA plate filled with lots of different types of food.\nThe cow is hoping for a way out of the fence.\na small white and red plane parked at an airport\nThree men standing together while on of them handing another one a frisbee.\nA female surfer stands on her board in the water.\nThe extra long passenger bus is entering the intersection.\nA dog is crouched down beside a toilet looking up at the paper.\nA bicycle is parked between a welcome sign and a street light.\nA close up of two teddy bears hanging from two strings on a hook.\nfour sheep grazing in a open snow pack\nTwo men are seen eating something standing on the street\na large building with people outside looking around\na bed room that has a couple of beds in it\nthis is several zebras in the grass running\nA girl laying down on the couch holding something in her hand.\nA railroad train pulled into the station with people boarding\nA family riding on the back of an elephant across a field.\nA stop sign affixed to a cyprus tree in a body of water.\nA bed made up with flowered comforter  in a room with two windows.\nA group of Asian people seated around a restaurant table.\nSeveral people in ski gear standing in the snow and in front of trees.\none brown cow and one black cow standing in mud\nA large open living room with a decorative rug.\nA train moving along a track outside during the day.\nA man is holding an apple in an advertisement.\nA giraffe and a zebra grazing the grass.\nA desktop and a laptop on a desk.\nA number of signs hanging from buildings.\nA group of people sitting around a restaurant table.\nA person on a field with a baseball bat.\nA jar filled with different types of fruit on a table.\nA giraffe is standing in a grassy field.\nA bowl of food is sitting on a table beside a glass of wine.\nThree men holding snowboards on top of a mountain\nA 
lone cow walking in a large field near houses.\na toilet a bathtub a rack bottles and a shower curtain\nTHERE IS A MAN THAT IS PLAYING BASE BALL ON THE FIELD\nA skateboarder with a hat is skating down a ramp.\nA group of kids at a skateboard park doing tricks\nA small boy on a guys lap with a toy guitar.\nA parking meter sits in the foreground before a church and other large buildings.\nLaptops, keyboards, and other computer equipment on display.\nA short boy with a penguin backpack stares at a large bear in the zoo.\nA cat laying on top of a laptop computer.\nA cat is sitting on a wooden surface behind a vase of flowers.\nA large propeller airplane flying through a blue sky.\nA baseball player getting ready to hit  with a catcher and umpire at a game.\nBarrack Obama eating a hot dog with his young blond boy toy.\nthere is a plate that has meat and rice on it\nTennis players stand together for a group photo.\nA smaller giraffe is standing in the green grass.\nA wooden bathroom with a wooden toilet next to a window.\nA man stands with a tennis racket on turf.\nTwo gulls perch on a mossy concrete wall overlooking the sea.\nTwo little boys sitting at a restaurant table with an adult.\nA man holding a cabinet in a kitchen.\nThere is a mountain behind the light house.\nA young woman taking a picture with her phone.\nA train is parked near a platform at the station.\na couple of buses parked behind the other in the street outside some buildings\nA boy, three dogs and a frisbee in a dried up creek bed\nMany people are walking around the dock near numerous ships.\nA living room filled with furniture and a large TV.\nA white stove top oven with two tea pots on top of it.\nA small gathering in the living room with drinks being served.\nA sign saying no drinks allowed is hanging\nA giraffe is standing with his front legs apart.\nA room that has stained glass windows separating another room.\nAn instructor pointing at something on top of a screen.\na building with a clock 
tower near other buildings\nA woman holding a baby near a long horn steer.\nA crowd gathered for a small-town parade looks on as the next float comes down the street.\nA brown dog with it's head hanging out of a window.\nA man sitting on a concrete structure on the beach.\nA young man riding a skateboard down a curvy road.\nA living room with a sofa and built in tables.\nA guy leading a bunch of people in a choir.\nA large long train on a steel track.\nThe two hot dogs are prepared and ready on the plate.\nAn airplane is flying high in a blue sky.\nA man standing on a field talking on a phone under two colorful kites.\nThe man is carrying the bananas down the road.\nA sailboat is floating outside on a lake.\nA wide building with many glass partitions has a front pavement with standing and milling people, some of whom are headed to the open door of a bus also resting on the pavement.\nTwo people are lying in a bed with a computer.\nA big bus and other traffic on a busy city street.\nNo parking signs hanging on a pole.\nThe man in black came up to the brightly colored food truck.\nA group of travelers wait to receive their luggage.\nThe motocross driver races down the dirt hill.\nA very comfortable looking bed with big plush pillows.\nA blue boat skims the ocean with a crew of several people.\nA pair of racing motorcycles coming to a start line.\nA hand lifting a slice of pizza off a pan.\nSeveral people are sitting at a restaurant as staff work.\nA tennis player reacts to hitting a ball.\nTwo giraffes, one is closer and larger then the other, appearing to be curious about the photographer.\nA banana sitting on top of a white plate.\nA man and woman look at a piece of paper\nA skateboarder rides his board at a skate-park.\nStuffed animals displayed on table with assorted items.\nMany kites are lying on the field on a cloudy day.\nA woman holds a little girl's hand while cross-country skiing\na desk with multiple monitors and a laptop\nA beautiful black and white dog 
catching a frisbee in midair\na group of people walk through a rain storm\nA group of children are standing in line.\nAn orange sign with black lettering near a city street.\nA person attempts to para-sail with a parachute.\nA man is riding a wave on a surfboard\nSeveral people on the beach with chairs and umbrellas.\nTwo zebras with one of them laying his head on the back of the other\nA group of people in a park watch a man in a green sweatshirt and hat catch a white frisbee.\nHorses bumbled up next to each other in an enclosure\na male sitting on a toilet with a laptop\nSeveral potted plants in front of a window.\nMan playing tennis in motion with crowd and tennis court\nA man standing on a tennis court holding a racquet.\na bus that is filled with people crammed together\nA picture of a stainless steel stove that is in someone's kitchen.\nA surfer in a bodysuit rides a wave.\nA picture of a toilet taken from above it.\na large monitor and a small laptop are on a desk\nA bench that looks like a round hut.\nCommuter bus on roadway at night in city setting.\nLunch recipe calls for whole eggs  baked inside bread, served with tomatoes on the vine.\nA handsome sink on a long pedestal in a bathroom\na man on a horse rides through the streets while others watch\nA group of black and white cows are on the grass.\na toilet a tub a brown wooden floor and a mirror\nThree cars traveling down a street in front of a large building.\nA white table that has black chairs in a kitchen.\ntwo zebras standing together in a field a by a small tree\nA close up of a fire hydrant with a skyscraper in the background.\nA person standing next to a pole working on a traffic signal.\nThis is a cluttered room with alot of boxes of stuff.\nA toddler pulls himself up next to a toilet\nA white toilet sitting in a bathroom next to a wall.\nA bicycle parked near parking meters both covered in snow.\nA man is jumping near a ramp on a skateboard.\nA surfer looks back as another surfer catches a 
wave.\nA small train is going through a bushy field.\nA beautiful woman taking a picture with her smart phone.\nA group of people flying kites over a sandy beach.\nA man holding a bat on the beach looks down\na woman is cutting a fourth of July cake while two other girls watch\nA table has a handbag, brush, mints, wallet, and cell phone on it.\nA man sitting on a stone wall talking on a cell phone.\nA smiling man holds a bunch of freshly picked bananas\nA person crossing a street next to a crosswalk.\nA small blue car parked outside a house\nThe woman is posing for a picture on the side of the road.\ntwo hands are toasting some wine glasses and a person in a black jacket\nA pizza with spinach on top of the sauce and cheese\nA group of people walking through building with large umbrellas.\na pastry with some powdered sugar on top of it\nA crosswalk signal with a lighted red figure.\nTwo trains on the track at a railway.\nA montage of people shaving and cutting their hair.\nA plate of pizza sitting on a table ready to serve.\nA yellow fire hydrant is on a city curb.\nA close up view of a mirror reflecting cars parked on a street.\nAsian man and woman sitting and looking at cell phones\nMan removing a pizza from a home oven with a peel.\nAn old restaurant in Lucerne that apparently has wonderful wiener schnitzel\nThe little girl is eating lunch and having milk.\nA bedroom with a bed under two framed paintings.\nseveral young students working at a desk with multiple computers\nA man cutting a cake on top of a table.\nA large group of sheep stand near the water all looking down eating\nA white and brown cow eating grass in a field.\nSmiling friends posing over a bag of donuts\nA behind the scenes look at a photoshoot for a bunch of bananas\nA zebra in a fenced in area next to a man.\nA man in a grey apron with a sandwich full of barbecue.\nA large teddy bear with pink camouflage on the street.\nA very tasty looking pizza sitting on a table ready to be eaten.\nA black TV 
sitting on top of a desk next to a couch.\nTwo horses eating grass by a body of water.\nA beautiful blond haired woman talking on a cell phone.\nA tourist looks at sheep grazing in a yard\nThese motorcyclists are waving their American and Marine flags\nBuses and cars stopped at a traffic light.\nClose up of metal post with a walk signal and a Do Not Enter sign with profane graffiti with building behind.\nA crowd watches a softball player with a red helmet.\nA man about to hit a tennis ball with a racket.\nA man doing a trick on a skateboard while people watch.\nThree adults watch a child holding a toy doll.\nA cat that is eating some food on the ground.\nA table with a book camera and shells\nA dark colored cat standing on a wood floor.\nTwo white ferries passing each other on a body of water.\nOrange cat walking across two red suitcases stacked on floor.\nA stop sign on the corner in front of a row of stores.\nA cat that is sitting near a sink.\nA toothbrush is sitting on a sink that has the words mystery toothbrush on it.\nA baseball player takes a swing at a pitch.\na small boat in a large body of water\na man sits on a bench while holding on to a dog\nCity two way street with cars lined up on both sides.\nFour different food dishes including rice and chicken.\nA man is wearing a blue shirt with a black coat and a gold tie.\nA black cat sitting on top of a red couch.\n2 farm cows stand on a baron field\nTwo female skiers are standing in the snow wearing purple attire.\nA sheep grazing in a field above a pond.\na pizza that is in a pan that is on a table\nA man on a skateboard riding over a hill.\nA bathroom with a large green plant growing on the wall.\nThe group of people walking in the city have umbrellas up.\nA pretty yellow city bus on a wide street.\nThe pre-school child is trying to kiss the toddler.\nA large group of people playing frisbee with onlookers.\nA black-and-white shot of a woman in a dress holding a tennis racket.\nWe have a distorted view of a 
bus and a pillar.\nAn upward photo of a man in suit staring in the distance with another man holding a finger up.\nA small teapot is on  a plain wooden table.\nA kitten is laying on a laptop watching a video.\nA girl wearing glasses posing for the camera while holding a tennis racket.\nA woman is painting a green fire hydrant.\nA couple of people standing in a room.\na man on a skate board does a trick in the air\nA woman riding on the back of a brown horse.\na little kid is looking at some doughnuts under a display\nMan in a field walking behind two Clydesdale horses.\nThe zebra and giraffe gaze into the open meadow.\nA person with a pink umbrella and a suitcase next to a taxi cab.\nSigns showing different street signs on the corner of the street.\na display of a giant bear standing in the middle of a shop\nA stop sign in an area with grass, trees and small buildings.\nA man stands near a podium in a gray suit and blue tie.\na woman standing in a kitchen while preparing food.\na person holding a kitten and feeding it milk\nA young person is playing a soccer game.\nTwo small dogs look around in the yard.\nA large truck is parked on a street.\nA man in the water on a surfboard.\nA train covered with snow sits in a train station.\nSome bananas are for sale at a store.\nA cat sitting on top of a television\nSlivers of cut, sun-dried tomatoes lay to the left of a pair of food shears there are uncut tomatoes on the right.\nA giraffe is walking near a fence at a zoo.\nA commuter train stops at a train station with it's doors  open.\nWild animals walking in large open field and path.\nA person is standing in front of a store mannequin in the dark.\nSeveral young boys are playing a baseball game.\nSomebody is having in the peaceful of the picture.\nBus, cars and a motorcyle all stopped in the street\n2 Motorcycles are sitting in an empty office\nA female tennis player swinging to hit a tennis ball on the court.\nA man looking down next to several hanging bunches of 
bananas.\nA woman in black shirt and skirt playing a game of tennis.\nA little girl smiles next to a foil wrapped cake.\na zebra is eating grass in a stable\nan emaciated man wearing tie standing erect showing teeth.\na motorcycle that has some sticks on his back\nA bride and a groom look ridiculous as they stuff cake into each other's mouth.\nA table topped with paint and construction tools.\nLady wearing a hat and sunglasses riding on an animal.\ntwo people at a bar holding drinks\nA three story white building with cars parked on the street in front of it.\nAn apple is carved with facial features and teeth\ntwo young children in a garden eating greens\nWoman and her dog tends to the herd of sheep\na man riding on an elephant near a stream of water.\nsome people are traveling down the street in a city\nA stop sign leans to the right at a small town intersection\nThe man is just getting ready to serve the tennis ball.\na mechanical robot holding a base ball bat\nA giraffe bends over to nibble grass in a rock and lawn area at the zoo.\nluggage is packed and lined up for traveling\npeople dressed in costumes at a ski resort\nThree people are having a cook off in the kitchen.\nThe skateboarder is learning how to complete his trick.\nA kitchen counter with  a lot of empty bottles on it.\nPeople are sitting in chairs with laptops, papers, and cups.\nA multi-hued teddy bear wearing a royal robe and blue ribbon.\nA group of giraffes feeding next to a tree in a caged area.\nA beautiful woman in a bikini surfing with her dog.\nA black handled toothbrush with new bristles on it.\nMany different types of small boats on the water.\na security officer sitting on a fence while talking on a cell phone and holding onto a segway\nSeveral kinds of doughnuts are in a cardboard box.\nA girl is going to the field with her soccer ball.\nA dog lying on a couch while wearing a collar.\nA multi colored train parked on a train track\nBlender on a messy counter in a kitchen filled with 
food.\nA slice of pizza with vegetables sitting on a plate near a drink.\nA man wearing a black helmet swings his baseball bat.\nBroccoli and waffles with a mushroom sauce on a plate with a spoon beside it.\nA pan has a slice of pizza left in it.\nToy cars line the parking lot of a toy setup.\nthere is a baby elephant standing in a field with tall grass\nA humming bird flying over a red bird feeder.\nA boy and a girl with a blue frisbee.\nA man on a bicycle passing by a taxi.\nA baseball player swings and makes contact with the ball.\nThis is a large kite flying high in the sky.\nA man standing on a tennis court holding a tennis racquet.\nTall green pine trees in back of large grassy field.\nA bench that is by some trees and grass.\nA man is standing in a semi-dark room making a call on a cell.\nA white vase of flowers sits on a wood table.\nA few empty boats at a river ride\nTwo men holding surfboards while standing in the ocean.\nDog on skateboard wearing t-shirt during parade event.\nA toilet and sink sit in an empty bathroom.\na sink sitting in front of a bathroom mirror\nA compact bathroom with a shower and a mirror.\nA shaggy dog lying on a green and blue blanket.\nElephant with young rider standing next to adult elephant near parking area.\nA small bathroom has green walls and beige floor tiles.\nA plate of pizza on top of the table\nSomeone is frosting a cake that is on a glass plate.\nA computer monitor and speakers on top of a desk.\nA snow skier is being pulled by a rope overhead.\nA table with pies being made and a person standing near a wall with pots and pans hanging on the wall.\nA large fleet of boats in a large body of water\nA man wearing a white shirt and tie.\na bunch of food is on a white plate\nA teenager has his feet off the ground holding an umbrella.\nTwo children stand near a large teddy bear.\nPeople in a market shopping for fresh produce.\nA brightly colored quilt on a bed in a furnished bedroom.\nA train that is yellow is moving down 
the tracks.\nA living room that has wooden shelves with many movies on them.\nA man riding a motorcycle over two cars.\nA black and white checkered bathroom with toilet\nfour motor cycle cops on a city street\nA white toilet sitting inside of a red bathroom stall.\nA surfer raises his arms for balance on a wave\nThree people on a bench are smiling and waving.\na group of lambs walk across a grassy plain\na person riding a surf board on a body of water\nA young elephant at a watering hole with other elephants in the background.\nA white polar bear standing on a concrete surface.\nA man jumping a motorcycle over a row of parked cars.\nA man that is doing a trick on a skateboard.\nA bathroom with a shower, toilet, and multiple sinks.\nA CAT IS SITTING NEAR A TOILET SEAT\nA group of four men riding horses holding flags.\nThis tech wizard leaves all options open, equipping his computer area with both a laptop and desktop machine.\na black and white cat a hand and a laptop\nFour people playing a game with a frisbee in a grassy area.\nA reflective mirror at the junction of two hallways.\nA very big dining table with some people at it.\nThe surf boarder is coming out of the water.\nA man in an old-fashioned baseball uniform hits a ball with a bat.\nPeople are wearing hats with umbrellas attached to them.\na man is in the air riding a skateboard outside\nA parking meter that is placed on a sidewalk.\nA stuffed teddy bear and memo sitting on a bunch.\ntwo zebras close to one another inside of a fence\nblack and white stripped  poles with stop lights attached\nA dog with his leash attached to a bench\nA bathroom sink, mirror, soap containers and a towel shelf below.\nLarge statue holding a black and white umbrella.\nA little, brown bird on a tree branch\nmany different bikes on a city street\nA wooden and metal park bench sets at the side of a path.\nThere are many traffic lights on this busy street.\nPedestrian traffic and advertising in an Italian airport\nA group of people 
on street in snow next to cars.\nA young man who is drinking a glass of wine.\nA building and cars parked in a lot.\nKitchen with white cabinets and refrigerator and black countertop.\nThe person who decorated this bathroom likes cats.\nA large room has many different planes displayed.\nA man is painting on the side of a wooden compartment.\nPeople waiting to cross at a busy intersection.\nA baseball player standing on the pitcher's mound\nThe street view of an average city street.\nElectrical plugs are coming out of a box on top of a box.\nA herd of zebra standing on top of a dirt and rock field.\na boy with glasses a cheese pizza with onions on a silver platter\nSeveral house barges lined up on a river.\nA policeman on a motorcycle waiting  on the street\nAn orange cat on carpet outside of a door.\nA picture of some kids playing a soccer game.\nA bi-plane in the sky on a sunny day.\nA couple of small white bears on some rocks.\nA donkey joins a group of zebras around a water trough.\nA man has a ponytail on top of his head\nThe man is playing baseball on the baseball field.\nA submarine sandwich sitting on a white dinner plate.\nA girl in grey jacket and tie standing on a street.\nA couple of kids hovering over a pizza sitting on top of a wooden cutting board.\nA woman in a black dress swings a tennis racket\na man takes a bite of a doughnut\nTHERE ARE YELLOW TOWELS IN THE BATHROOM HANGING\nThe woman is standing in the kitchen empty.\nA man on a couch talking on a cordless telephone\na group of people at a park playing with a white frisbee\nApples and oranges are being sold in a market.\nA cat is standing on top of TV near a huge bookcase.\nA man is standing next to a tall surfboard.\nA city filled with traffic next to a tall building.\nA red fire hydrant with the paint chipping off, next to a wire cable fence.\nThe room has red wall, white carpet and matching furniture.\nRain makes the brick streets shiny and dramatic\na close up of a person using a cell 
phone\nA cat that is wearing a festive hat.\na cozy living room with a couch and two chairs, a coffee table and lamp\na baby standing in a suitcase and a mom\na plane flying high in the air below a blue sky\nA giraffe in an enclosure standing by a tree.\nTrain traveling on tracks near populated area near waterway.\na man is snow boarding down a hill at night\nYoung girls sit at a table making paper kites.\na table with a calculator and phone siting on it\nA man is baking something in a portable miniature oven.\nA bathroom with blue tile in the midst of restoration.\nA brown and white cow standing next to a stream.\nThe woman is posing on her bed with clothes.\nA couple cross country skiing with their dog.\nOpen bottles of various wines on a glass table\nA very cute small child brushing its teeth.\nA spotted dog and a black cat hanging out in a bedroom.\nTwo men are playing a video game with a motion controller.\nA close-up shot of a zebra eating grass.\npeople walking in front of a building\na woman filling a bear at a build a bear type place\nA bathroom with a bathtub, sink, mirror, and toilet paper roll.\nA bunch of very cute cows going down a road.\nA purple frisbee is shown flying high above the sand.\nTwo pictures hung on a refrigerator by magnets.\nKites flying on the sandy beach on a sunny day.\nCluttered apartment with a large T.V. 
and a great view.\na man holding a tennis racket and ball\nA green double decker bus called \"Green Rovers\"\nA man doing a skateboard trick on some stairs.\nA vase with kanji holds flowers and is displayed next to a purple mug, a mug with a dog, and a white mug.\nTwo women walking on  a train platform\nan old diesel locomotive coming upon a track switch\nan airplane next to a large body of water\nGroup of table and desktop laptops sitting on a workbench.\nA small bathroom photo focused on the toilet\nA pizza with shrimp and basil on a table\nA black and white photograph of a traffic intersection.\nA man riding skis down the side of a snow slope.\nSomebody left the toilet seat and lid up.\nAn adult stands by a young child on a fake cow.\nA young man eating a sanwich while working on a laptop.\nA small dog sitting next to a wall in a hallway.\nA long black train sitting on top of railroad tracks.\nA computer mouse sitting next to a  laptop computer.\nA pair of zebras runs in tall grass.\nA picture filled with many things all inside.\nA woman looking into a mirror while blow drying her hair.\nA woman is giving her dog a bath.\nA bride and groom are slicing a wedding cake\nA pack of zebra standing in a field next to an ostrich.\nVeterans riding in the back of a military truck.\nA person with glasses on the skateboard as others watch\nA chef at a pizzeria behind the counter\nA brown dog in a grassy field with a purple frisbee.\nA stuffed animal is in a porcelain sink.\nGroup of children and adults playing a video game.\nA toy train track is set up with two trains, houses, and a tractor.\na male in a tan shirt is playing a video game\nA girl is smiling while riding a gondala.\nA broken fence seen through a broken window.\nthree men and a woman pose for a picture on the tennis court\nA man and woman put ketchup on a hot dog bun.\nA man riding on a skateboard on a sunny day.\nBrown bear laying down on a log of wood in the forrest.\ntwo men holding wii controllers in a 
living room\nA woman is standing by a truck smiling at simetjing\nA bird is sitting idly near some flowers.\nA woman walks in front of a horse next to a red trailer.\nThe bowl has broccoli, celery, and lemon slices in it.\na white stove top oven siting next to refrigerator.\nA snowboarder in winter gear riding a snowboard and a steep slope that is snow covered.\nA couple sheep on a steep grassy hill.\nAn outdoor image of a fence at a dog park with a fire hydrant\nA bear has just taken a dip in the water\na person riding a skate board jumping in the air\nA group of skiers trekking up a hillside in a snow storm.\nA meter on the street reads a time of zero.\nA kid is holding a controller on a coach\nA clock and its reflection placed near a sidewalk.\nOutdoor table set with wine and breads in the center.\nWhite cat sitting on sandy area near walkway.\nA man in suit and tie wearing a white beanie.\nAn aircraft that is inside of a building.\nAn ostrich in a zoo a long with three zebras.\nA little boy holding up a packaged electric toothbrush and smiling.\nMany people play sports in a grass field.\nThe banana in the car seat is aging and browning.\nA traffic light sitting on the side of a road.\nA close up image of a little girl getting her hair done.\nTowels stored under a bathroom sink with a glass countertop.\nA couple of men standing next to each other holding glasses.\nA person standing in front of a stove top in a kitchen.\nA person standing next to a building holding an open red umbrella.\nA cat with a peculiar look sitting on a bench.\nA man riding between two oxen as they travel through water.\nTwo black crows sit atop two tree branches\nA man laying alongside of a white toilet near a sink.\nA bedroom with an almost empty bookshelf and desk\nTwo people holding up cell phones with photos of a young man and woman.\nYoung lambs with adult in fenced grassy area.\nAn airplane flying in the sky during the day.\nA long-haired grey tabby cat resting on a sofa.\nThere is 
a long line of cars behind the rearview mirror.\nA lady wearing a white shirt trying to tie a tie.\nSkateboarder riding through the middle of park benches.\nA man with no shirt rides a skateboard over a ledge of a skateboard park.\nA soccer player in the midst of kicking a soccer ball.\nThe cool dog is riding on a motorcycle.\nTwo surfers walk onto the beach from the water.\nA man is sitting on a black couch with a cat.\nA soda can sitting next to a laptop and remote control.\nA man on skis on a snowy trail.\na giraffe looking over fence, at person walking away.\nthere is a young girl and her mother boarding a plane\nAn outside bathroom carved of wood with a toilet and sink.\nA bed covered in clutter and clothing with blankets.\nThe young person sits on bench seeing the tranquil lake\nAn open laptop computer sitting on top of a desk.\nAn unattended office containing several computers and a chair.\nSeveral people sitting around together eating and drinking at a venue.\nA group of people that are on a soccer field.\nA woman swinging a tennis racket on a court.\na person stands while holding on to a pole\nSkier with backpack down hill skiing in the sun\nA yellow and red train traveling down train tracks.\nA walk in shower sitting next to a white sink.\nThis is a cake  and a fork in laying in a plate.\nAn apple is being cut with a sharp knife.\nA motorbike parked on a road with a man.\nA snowboarder goes airborne over a snowy hill.\nPerson on skateboard in mid air with color lights above.\nchildren holding stuffed animals and a parent holding a baby\nA young girl climbing on a painted fire hydrant\nTwo people that are skiing together in the snow.\nPeople standing behind a clock in a clock tower filled with massive golden bells.\nFour persons are skating on the skate board on snow.\nA room of chairs and sofa with red stairs next to it.\nA very large semi truck on a wide road.\nSomeone taking a slefie with a large camera in a large mirror.\nA window looking out at a 
brick building\nA pile of luggage on top of a cart\nA cart filled with lots of luggage driving down a street.\nMan wearing riding gear sitting on parked motorcycle.\nA group of people who are skiing on a snowy hill.\nA man posing for the camera holding a skateboard.\na group of peeled oranges with purple flowers on top of them\na person is holding a baseball bat by a brick wall\nA grey cat sits on an office chair in a home office.\nUmbrellas litter a sandy beach next to a beautiful blue ocean.\nSeveral kites sit on the ground, with a few people in the background.\nA pink bicycle leaning against a fence near a river.\nA dock that is separating the harbor from the ocean.\nA yellow and silver train pulling away from a train station.\nA microwave or other small kitchen appliance is seen from behind.\nA piece of toast and grapefruit half is on a tray.\nA stack of four oranges on a table.\na person that is standing in a kitchen next to a icebox\nTwo elephants are walking through trees side by side.\nPasta with a mixture of different vegetables sitting on a plate.\nzebras and antelope graze on the planes next to shrubs\nGame pitching plungers into a toilet in a field.\nA male is skateboarding in an outdoor skate park near the ocean with many people standing nearby.\nA woman with purple hair taking a picture of herself in a mirror.\na man dressed as jesus holding a cell phone\nTwo brown dogs lying on a burgundy comforter.\nTwo people in a public bathroom painted red.\nA woman in a bra laying on a white surface.\nA meal of beef, broccoli, and mushrooms is eaten with chopsticks.\nSmall sailboats are sitting on the water all over the lake.\nA fire hydrant outside a shop with graffiti.\nA gentleman is walking through the boardwalk with his surfboard.\nA cat looking intently out of a window.\nA sliced chocolate desert covered in powered sugar\nA grey stripped cat on a table in a room with many books.\na tangering sitting on top of some bananas\nThe clock is located near the 
body of water.\nA man bending over scooping food into a pan.\nA park bench surrounded by a green forest of trees.\nA person that is holding a kite in his hand.\na airplane that is flying through the sky over some snow\nA cat looking out from a box designed like a bus.\nSome cars that are driving through an intersection.\nA dog catching a frisbee with a man in the background.\na bunch of sports items sit in the grass\nA city street with business signs on buildings\nTwo people watching a small jet on the tar mat of a airport.\nA poster behind a gate against a fire hydrant\nA couple of green street signs sitting above  a stop sign.\nA baseball player holding a bat on a baseball field.\nTwo men are playing ball with some elephants.\nA pizza in a pan sitting on top of a wooden table.\nA cowboy leads a cow through a paddock.\nA man twirling a yellow frisbee with his finger\nA wide eyed teddy bear with a scarf is sitting on checkered bedding.\nA man gets his picture taken at a ski resort\nOld fashioned kitchen featuring a two compartment sink.\nA group of elephants are walking away from water.\nA little kid with a uniform, glove and hat on during a baseball game.\nA bunch of oranges hanging from an orange tree.\nA woman sitting on a bench while talking on her phone.\na small couch overed with blankes and pinapple designed pillows\nA dog laying on its back on a made bed.\na black and white photo with two males on cellphones\nVarious types of apples and other fruits at a market\nSomeone getting food from plates with a bunch of different foods on them\na male in a red tie and some other people\nA plate with steak, vegetables, and rice being served.\nA bed is shown next to a stand and TV.\nA pair of pizzas sit on trays with ingredients on top\nJet airplane parked on a cement runway under a large white cloud.\nA small herd of cows with halters and bells tied to a cable fence.\nthere is a woman playing with a dog with a toy donut\nA white bed with black pillows and a patterned 
throw.\nA refrigerator door is open and full of condiments, food and drink.\na metallic suit case in front of a couch\nA black cow and a brown cow walk near a motorcycle on a village street.\nA baby sitting in a chair getting a haircut.\nThe police officer is observing the airplane in flight.\nLarge red bed in room with dresser and futon.\na close up of a young baseball player touching his cap\nfrontal view of airplane with cockpit facing on white airplane\na close up of two stuffed animals siting on a table\nTHERE ARE PEOLE SITTING IN A WAITNG ROOM\na man is cooking some food on a grill\nA car with a wheel lock on its wheel next to a parking meter.\nA woman sitting next to a child on a couch.\nA tall giraffe eating leaves from a tree\na man standing at the edge of a tennis court getting ready to serve\nA teddy bear sitting outside in a chair.\nBlack and white photograph of a man sitting at a bench.\nA kitten laying on a man's lap while a woman plays with a Wii controller.\nA young child smiling for a picture, she has a plate of cake in front of her.\nA man holding a tennis racquet on a  tennis court.\na teenager attempting a jump on his skateboard\nA man in blue shirt walking on street with building in the background.\nA table topped with a pizza surrounded by people.\nthese people are waiting for a train at a station\nA little baby that is sleeping on someone.\nA group of people gathered together, one holding up an umbrella.\nPeople are loading onto an old red, yellow, and green train.\nIndividual plates of sausage sushi with ketchup packets\nA dog follows a cyclist along parked cars.\nA lot of red apples are put in a box.\nA couple of giraffe standing under a tall umbrella.\nThe inside of a bathroom leading out to the hall way and a room across.\nA guy skateboarding on a big ramp somewhere.\nA middle aged lady is decorating a cupcake.\nA baseball player has just thrown a ball.\nA neat and clean  kitchen with cooking range,microwave.\nTwo city buses traveling 
down a rain covered road way.\nA fry pan with a mixture of vegetables in it.\nA lot of food that are growing on a tree.\nCity scene with parked buses and people walking on the sidewalk.\nA street shows several street lights and an empty intersection.\nA couch with clothes and items scattered allover\na two story bus on a busy urban street\nSeveral men are playing baseball on a baseball diamond.\nA train rides down the tracks near a hilly area.\nA black and white photo of a dog standing happily on a horse.\nA woman sits on top of a motorbike.\nThe dining room has four chairs at the table, and a hard wood floor.\nA kid is sitting on a skateboard with another kid behind them.\nThe people are having a group meal at the table.\na desk with a laptop, some speakers and a mouse on it\nA zebra standing next to a  group of three trees.\nTwo pedestrians underneath their umbrellas walk across an open plaza in a rainstorm.\nTwo brown horses pulling a black carriage and driver.\nA guy in a big grassy field flying a kite.\nA cat holding a toothbrush in its paw and chewing on it.\na man that is skiing down a snowy hill\nA fluffy quiche or pizza is loaded with vegetables on top.\nA brown teddy bear holding a glass vase in front of a grave.\nA big commercial plane parked by some vehicles.\nTwo urinals in a tiled bathroom with windows.\na man and a woman standing in the living room with her holding a remote\nA stop light tells motorists to go across the intersection\nA person is showing their feet near a book and headphones.\na couple of large planes are on a runway\na couple of chairs sit under a umbrella\nFour dogs are sitting together on the bed.\nA bunch of green bananas hangs from the ceiling of an outdoor structure.\nA young man that is standing by a big pile of luggage.\nA skier in an orange jacket looks out over a snowy valley.\nA skateboarder is balancing on the rim of a bowl.\nA three dimensional rendering of a woman sitting on a giraffe.\nA yellow cat sleeping on the hood 
of a black car parked in the garage.\na fire hydrant on a city side walk\nSeveral young soccer players playing soccer on a field.\na woman in a white top some lights and a cake\nTwo medium sized dogs sitting next to each other.\nA thin pizza is on a plate with a spatula under it.\nA piece of art hanging from a yellow wall in a living room.\nA woman and child sitting on the bed with an open book.\nA group of men playing frisbee on a field\nA dog lies down and waits on sand at a beach.\nThree vases of different sizes and shapes all holding pink flowers\nA dining room features both chairs and a bench.\nSeveral just baked cakes on top of a stove\ntwo plates some food and a fork knife and spoon\nA bed that is unmade next to some plants.\nA man standing next to a smile giraffe.\nTwo teddy bears sit on a rocking chair.\nA microwave oven on a mini fridge in a room.\nA city bus coming up at the corner and someone is waiting for it.\nA couple of zebra standing next to each other on a field.\nBrown cabinets and dual mirrors and sinks in a bathroom.\nAn ostrich watches as a giraffe leans over as it eats some bark from a tree.\nA pizza cook getting ready to cook some pizza in the oven.\nA tennis player makes a strong return during a match\nTeenage girls with skateboards at night in front of a restaurant.\nA jar of water with a flower inside.\nFour luggage bags are stacked close to each other.\nA bathroom with a large mirror above a white sink.\nAn old train is on the track near a small shed.\na man holding his cell phone to his ear\nTHERE IS A DOG THAT IS IN THE POOL WITH PEOPLE\nA bear is swimming in a cold river.\nan image of a man with other men on skiis\na desk with a laptop a monitor and a keyboard\nA bus parked outside with Asian characters on it.\na man getting ready to hit a tennis ball\nA clock tower with a statue in front of it.\nFive surf boards arranged in an arc on a grassy area.\nA double-decker bus with few passengers aboard drives down the road.\nA little boy 
against a wall while holding a tennis ball and tennis racket.\nA skier skiing past a tree at Snowbird ski resort.\nPeople walk on the sidewalk near the buildings.\nA snow boarder laying in the snow after a run\nGoats and geese standing near each other in howling pen.\nA man sitting on the bed watching tv\nA large green train covered in graffiti.\nA dog is seated in the living room watching tv\nA man has his hand around a zebra as they stare at each other.\nSome very pretty zebras grazing in the grass.\nMotion blur photograph of a busy city esplanade at night\nA bathroom with a toilet, sink and a window in it.\nA green bowl of corn and broccoli in a white stew with a spoon and a biscuit  next to it.\nA siamese cat laying on top of a white sink.\na person riding a large skate board on a street\nA flock of sheep standing in a grassy field looking at the camera.\nShot of a small bathroom with a bathtub and a toilet.\nA picture with no head but a suit and tie and flower\nA sink with dishes in it and lined by various bottles.\nThe umbrella's on the street are decorated with messages.\nThe woman is sitting alone on the bench reading a book.\nA man kneeling down on a baseball field pitching a baseball.\nAn adorable little girl holding a brown teddy bear next to a wooden table.\nthere is a pair of slightly rusted scissors in a rusted handle\nA guy sitting on a big bright purple bench with some headphones.\nA group of people standing on top of a snow covered field.\nA bedroom is bright with colorful accents in it.\nA large zebra and small zebra are standing by a tree.\nPedestrians with umbrellas cross a rainy street corner.\nA U-Haul truck with a driver sits in a grassy field.\nA red teddy bear sitting in a chair with potted plants all around.\nA dining room table with some beautiful plants sitting on top of it.\na man in a black jacket standing by a red and black motorcycle\nA woman talking on a cell phone and looking into the distance\nA man on a skateboard going over a 
black box at a skate park.\nA cake sitting on top of a plate with a knife in it.\nA wooden table with a remote control that reads \"control a woman.\"\nLarge public transportation bus stopping to let passengers on and off.\nClose up of white USAF fighter jets in a blue sky\nA bunch of vegetables sprinkled with pepper sitting beside each other\nPeople ridding elephants and one is holding a camera.\nThe side of a truck that has spray paint on it.\nA large shower head in a bathroom shower.\ntree are two woman standing in the rain under a pink umbrella\nA slice of  vegetable casserole on a plate.\nA person with a hat standing by a parking meter.\nThree motorcycles stop at an intersection at an oriental restaurant.\nTwo people with boards riding a ski lift.\nMotocross rider going around a bend on the track.\nA woman hitting a tennis ball on a tennis court.\nA gray and white kitten walking through a square hole.\nTwo people in a room with assorted luggage\nA dog is wearing a paper hat with a star.\nA large bear in a river with some rocks.\nA giraffe is posing close to the camera in its enclosure.\nA large elephant with a couple people on the top.\nA helicopter that is sitting with its back wheels on the ground.\nA male skier dressed in orange and black performing an airborne stunt\nThe contents of a back pack are spread out on the floor.\nBlack and white bags above people on a field.\na young man rides a horse down a paved pedestrian area in a town\nthe man is swinging the bat at the ball\nA sculpture of a man reading a newspaper sitting at a bench.\nA boy and girl riding bicycles with a small dog.\nFive people just got off that gray bus.\nA young male is riding his skateboard in his empty pool.\nA commuter train passing by a field of wild flowers.\nA skier cutting a turn on a slope.\nthe hitter prepares the to hit the pitch\nA suitcase that is packed to the brim with things.\nA park with trees, bushes, walkways and benches in front of a skyline of buildings.\nA stop 
sign on a piece of paper.\nA woman sits at a table in a wooden cabin next to a lamp\nA toy model train station with a train on some tracks.\nAn umbrella on a beach with a towel.\nThe bus has the lights on as it travels down the road.\nA group of people standing on a field under a cloudy blue sky.\nA man that is standing on a board in the water.\nA big crowded beach with some guys playing with a disc.\nThe room has a television and sports jerseys.\nThree buses in a row that are different colored.\nA mantle with several glass vases of flowers.\nA man holding a baby girl while seated in a cafe.\nA comics page from the paper lies on the floor of a bathroom stall.\nA plane preparing to take off on an overcast day.\nA cup full of toothbrushes and tooth paste.\nA bedroom with a bed, radiator and laptop.\nA man works on an old steam engine train.\nA large yellow school bus driving down a road through a park.\nFiltered photograph of a man jumping on a skateboard.\nTwo people next to a bench at a dock above the water.\nA couple of cats relaxing with each other on the bed.\nA mom and her kids ride together on an elephant.\nAn assortment of shaped kites flying in the sky.\nTwo horse drawn carriages traveling towards a big house.\nA bunch of big colorful kites flying high in the sky.\nA parked pick up truck with a flame design on the hood.\nJetliner with \"Saturn\" on the side flying over a body of water\nA woman wearing a net on her head holding a box in a kitchen.\nSome old guys in funny costumes on some fake horses.\nA Kingfisher plant parked at an airport with a food service truck in front.\nA Eastcote welcome sign in a suburban neighborhood\nA man standing next to a yellow and orange fire hydrant.\nA woman cuts a cake at the table with a red cloth.\nA smiling man with a goatee sits in the backseat of a vehicle surrounded by luggage.\nPlayers at center court with camera man during tennis match.\nTwo large white sheep standing on a lush green field.\na stop that has been 
defaced with graffiti\nA black and red train engine next to train station.\nA cup with a straw in front of a laptop.\nThe baseball player is sliding into the base as another player is blocking it.\nA bus sitting parked next to a building with people in it.\na plane flying high in the sky on a cloudy day\nA very shaggy ram and a smaller lamb in the grass\nTwo men that are shaking hands behind a table.\nA very large commercial air plane on the tarmac.\nA man riding a motorcycle driving through a mountain side.\nSome kids are outdoors playing baseball during the day.\na single giraffe stands tall in field of bright green grass\nA group of four giraffes standing next to each other.\nBottles, cans, and foodstuffs within a wall's recess\ns close up of two dogs eating cake off of a table\nA man is talking on a phone while standing in the street.\nA white metal piece of artwork in the city.\nSomeone is riding a white horse with a grey mane.\nA skateboarder heads down a decorated ramp against a panorama that includes an overcast sky, a line of trees and a field of snow dotted with people in winter clothing.\nA man in a large room with baskets and pottery\nThere is a red car being towed on a truck\nA woman holds a string in her hand on a beach.\nA white table with umbrella and two chairs on a deck near a railing.\nTwo people cycling on a road as others walk by\nA giraffe standing next to several tree branches.\nA woman in a seat is on her laptop.\nTwo women in bathing suits next to a cat with planes flying across\nA man in sunglasses holding a sub sandwich\nA close up of a bowl of vegetables containing broccoli and carrots.\nAn open door shows a small bathroom space with a toilet and a shower while a sink sits near the open door.\nA child hugging a stuffed animal while surrounded by stuffed animals.\nA plane with stairs next to it sitting in a large lot.\na close up of a traffic light on a city street\nSome guys in a dark room playing a game on a big  TV.\nA cat laying on 
top of tie dyed pillow.\nA variety of healthy foods arranged on a table top.\nA room with a bed, fan and a dining table and chairs.\na close up of a cat sitting on a pillar\nThere are flowered vases and framed pictures set against a wall with balloons hanging above it.\nThere are several modern lavatories in the rest room.\nTraffic light on a long yellow pole in front of apartment balconies.\nA man in a ski suit sitting in the snow with a snowboard.\nSeveral employees are standing behind the bar of a restaurant.\nA car turns the corner of an intersection in the rain.\nA red toy train stopped on tracks near toy figurines.\nempty train cars sit in a snow-covered deserted train lot\nThree bears stand together near a fence.\nA woman that is standing up with a doughnut.\nBikers and pedestrians populate a street featuring many shops and stands.\nLong billed bird standing in green weeded area of fodder.\nA pair of surfers carry their boards along the shore.\nThe person rides in a yellow motorboat with a dog.\nA boy in a blue shirt catching a frisbee.\nA plate full of meat and broccoli on top of a table\nA single zebra walking by some water in the dirt.\nA lush green field topped with lots of vases.\nA pizza that is sitting on a plate.\nMen playing recreational basketball on a hot day\nTwo young men and a dog standing on a snowy road.\nTwo people are playing Wii games in the living room.\nCouple standing in snow on skis posing for the camera\nThe clock has beautiful gold detials on the face.\nA man holding a kite string as a woman releases the kite.\nA birthday cake has an airplane on it.\na labrador retriever bring a frisbee back for his owner\nA meat sandwich on a bun with a side of Brussels sprouts.\nA zebra and a giraffe foraging together by some trees.\nThe red and white train is relatively short in length.\nA person laying down with a book in one hand and a cell phone in another.\na person in a costume standing talking on a cell phone\nA tall building sitting next to 
a bunch of trees.\nBathroom sinks and a mirror lit by sunlight coming through a small window.\nThere is a giraffe that is looking at something\nA woman standing on top of a green field next to two men.\nA male tennis player on a court with a racket and ball.\nAdult elephant standing near a multi-wired electric fence.\nDifferent markings sitting on a bag on the floor.\nA pair of giraffe are walking in a field in Africa.\nA group of three zebra standing next to each other.\nThree vases that are red with flowers on them are on display.\nThe home office features several important business tools.\nA grey tiger cat staring at himself in the mirror.\nan overview of a marketplace sale with child toys\nSeveral surf boarders at a city wave pool.\na man lays down on a surf board as he paddles through the water\ntwo teddy bears sitting on a chair and wearing costumes\nA small single sink in a home bathroom cluttered with items.\nThis painting shows a perplexed fellow staring at a laptop computer.\na small bathroom with a sink and a toilet the toilet lid is raised.\nA modern living room in a cabin with food.\nA bunch of horses are walking two by two down a road in a city with a few riders.\nA little league batter await a pitch at home plate.\nAn up close shot of a woman wearing a badge on a lanyard opening a banana.\nA couple of people riding skis down a snow covered slope.\na plate of meat and bananas on a table\nVarious different animals that are standing in the grass.\nA dog laying on the floor chewing a toy while a man laying on a couch watches.\nA father and a daughter flying a kite in a park.\nA train on the tracks blowing smoke out of the engine.\nA group of ninjas wearing all black hold up small white fans.\nTwo people in orange jackets smile as they ski up a road.\nA Michael Jackson birthday set is shown in gems\nA woman is standing outside in the snow holding a snowboard.\nTwo zebras cross a dirt road outside a village.\nA banana laying next to a plastic container 
with lid.\nA small living room area with black furniture and curtains.\na kitchen with brown cabinets and a big door\nA guy with a cast does some flips with a skateboard\nA giraffe towers over thorny treetops in the day.\nA little boy that is holding an umbrella.\na couple of people play a game of wii\nA living room scene complete with two couches.\nA family plays with a Frisbee on cobblestones near the water.\na dog in a field with a frisbee in its mouth\na polar bear standing next to a cliff\nA snowboarder gets some big air off a ramp.\nAn airplane sits alone on an empty tarmac.\nA small family of Giraffes are together near a couple of trees.\n2 professional tennis players competing in a game of tennis\nA herd of sheep standing in a muddy pen with a chicken.\nA slice of cake with a single birthday candle sits on a plate.\nA bird is jumping off of a branch.\nA teddy bear sitting on the ground next to a garbage container.\na bench that is outside in the woods\nA lady is sitting in a restaurant eating while holding a jar of peanut butter containing a comb.\nA very fancy wooden mantle clock with ornate design.\nA large white boat floating on top of the ocean.\nA dog sitting at a picnic table peeking out from behind someone's legs.\nA DOG QUIETLY SLEEPING IN HIS BED ENJOYING THE SUN.\ntwo males are playing a video game and chairs\nBlack and white photograph of a bowl of apples.\nMan in a black jacket snowboarding down a hill.\nThree horse grazing on grass near a street sign.\nA person in a ball cap and holding a Frisbee with a dog.\nA bunch of bananas on a banana tree.\na big man running to hit a tennis ball\na light colored bear in a grassy field\nA base ball game in progress behind a fenced in park.\nThis is a picture of a kitchen that is also used as an office\nA man riding a skateboard through orange cones.\nA table covered with arts and craft supplies.\nA man riding a skateboard on the side of a rail.\nA very cute old looking fire hydrant on the curb.\nA stop 
sign is standing in front of a palm tree.\nA man plays a video game as a woman sleeps nearby.\nA group of people standing in the middle of a walkway.\nThe zebra is walking through the short green grass.\nFour cows are grazing on the short green grass.\na person jumping in the air with a skateboard\nThe mounted officers ride near buildings with flags on them.\nA pineapple, orange, and bananas sit on a plate in a kitchen.\nA city street has diners eating on outside tables.\nA chair and a couple of pieces of furniture in a room that had been burned.\nTHERE ARE CHRISTMAS DECROATION ALL OVER THE PLACE\nA mirror sits on the side of the tracks of a subway.\nA Japan Airlines plane waits at the gate while it is towed in.\nA person holding a wine glass with a dark beverage in it, in front of a television that has a cartoon on it.\nA steer is walking through the grass with large horns.\nA young man in a sweat shirt is standing on a wooden walkway.\nA vase with flowers on the table\nTwo men hold a kite together outside surrounded by chairs.\nSeveral men looking at phone in one's hand.\nA piece of pizza sitting on a plate.\nA polar bear keeping cool in the water.\na table that has all kinds of plates of food on it\nTwo glasses vases are next to each other with flowers in them.\nA snowboarder is in midair preparing to land.\nA street sign, with two signs on it.\nA young child that is sitting in front of a birthday cake.\nA bowl filled with oranges on top of a wooden surface.\nA kitchen scene looking toward the living room in the background.\nA very pretty dog laying on a person on a couch.\nA white tub sitting next to a sink and a toilet.\na woman is hitting a tennis ball across the tennis court\nA bathroom with white vanity, toilet and tub and open frosted windows.\nA baseball player takes a swing at a low ball.\nA Jeep towing a boat out of a body of water.\nA couple of men standing on top of a soccer field.\nA person holding an electric tooth brush next to a cat sleeping on a 
bed.\nA vegetable pizza on the edge of a table\nAn older woman preparing cookies and bread at a table.\na photo of a man wearing a tie with a tv monitor in front of him\nAn umbrella is tied to a bike on a rainy day.\nA sign warning drivers to slow down because of the presence of children.\nTired dog rests on top of a teddy bear.\nA bike parked in front of a red brick building.\nThree people walking toward a small airplane on a tarmac.\nAirplane with smoke coming out flying through blue skies.\ntwo women out in the snow with their skiis\nA tray of food in foil and a fork.\nCross country skiers are engaged in a race.\nA basket filled with food and a cup of salsa.\nGroup of cars parked in front of a large building.\nSeveral signs posted on a metal pole near a pharmacy.\nSmall boy in yellow shirt holding onto a white frisbee.\nA person on a surfboard in the water.\nA large bird is flying over a beach.\nA black cat with crazy eyes wearing a bib.\nA man with a suitcase walking in the road.\nThree giraffes standing in a zoo enclosure with trees.\nA group of people on a field playing baseball.\nThe side of a stainless steel vehicle with large wheels.\nAn adult in a wetsuit surfs a small wave.\nA beach with people flying their kites in the sky.\nA zebra walks by an alligator near a watering hole.\nA kitchen area with a stove, sink and dishwasher.\nA man sitting down holding a brown dog wearing a blue tie.\nA suitcase sitting next to the subway rail.\na man taking a nap at the end of a bench\nThe bedroom with the bedspread is dimly lit.\nA woman in white shirt climbing onto an elephant.\nTwo women in skis standing by a sign and trees.\nan image of a child that is playing tennis on the court\nA small air craft is heading in for landing.\nA large black bear standing next to a stone cave.\nA boy is sitting in front of a laptop.\nA woman kneeling down next to a fire hydrant with cans of paint.\nA father helping his child brush his teeth.\nA photograph of a thing in the 
picture.\nA man is standing under an umbrella next to a tent containing clothes for sale.\nTwo small children in green shirts on a baseball field.\nPeople walk in a narrow alley way while clutching umbrellas.\nA horse has a harness on its face.\nA dog that just caught a frisbee.\nA cat is laying on a laptop on a coach\nA red fire hydrant next to the curb with parking meters in the back ground.\nA kid in a car hiding from a zebra that is poking it's head in the window\nA man swinging a baseball bat in front of a man with a glove on.\nThe adult black bear is inside of a pool of water.\nThe two green military vehicle are parked in the field\nA man sitting next to a large pile of luggage.\nthree women stand by an elevator with their luggage\nThe two teens are on the sand dune, racing  to catch the frisbee.\nA road bike rests against a park bench.\nA living room filled with furniture and a wooden book shelf filled with books.\nA man jumps his skateboard over a fire hydrant\nMan serves tennis ball at high speed while other watches.\nA toilet that is on the ground near a trash bin.\nA poster that indicates the letter S stands for sandwich.\nAn opened stick of butter sitting near some scissors\na street pole with a sign on top of it\nA woman with a child in a carrier standing in front of a giraffe exhibit.\nOutside view of white horse in the window\nTwo shots of a woman swinging at a tennis ball.\nA bed above a desk with a computer\nA half-eaten pizza sits in an open takeaway box.\nA colorful dish of several fruits and vegetables\nA sports motorcycle is parked on a gravel road by a river.\nA very large orange cat lying on the roof of a vehicle.\nA small very messy rest room with many books.\nA woman throws a frisbee into the goal in frisbee golf.\nA very cute bright red fire hydrant by some bushes.\nA Mack truck parked in a parking lot.\nFruit, grain and vegetables have been putted in separate bows.\nA giraffe walking through a jungle next to a large tree.\nMan looks at 
another man that is holding a Wii controller in his hands.\nA bowl of vegetables containing carrots sitting on the stove.\ntwo long lines of boys paddle a canoe\nA lone elephant walking through the desert grasses.\nA woman sitting at a table cutting a princess cake.\nA man sitting on a high chair on a tennis court.\nThis person is riding their horse near the water.\nA woman holding the head of a horse wearing a bridle.\nStreet signs on lamp post in large city.\nChefs working in a kitchen at a restaurant.\nA man and woman posing with tennis rackets\nThe man talking on a cell phone has glasses on his head.\nThe red bus is driving down the street.\nThe pizza is on the dish and ready to be eaten.\nSmall boy in dress clothing sitting down on a white bench.\nA teenage girl with black hair and black makeup wearing kandi bracelets on her hand and holding up a sandwich.\nPeople are on the beach with water fun equipment.\nA man with his arm around a woman in front of several skiers.\nA person laying on top of a bed next to a white dog.\nA rock wall extends out from a stone building and tower.\nA pair of scissors sitting on a plastic chair in an office.\na white plate with eggs ketchup and a fork and a cup\na chocolate doughnut on a saucer, coffee in cup.\nThe young woman is selling many types of cupcakes.\nThree adults on the beach fly a very odd kite.\nA pastry is lying on a blanket on grass.\nA white airplane is on a asphalt lot as the sky is covered with clouds.\nA single giraffe looking into the camera on the plain.\nA view of a mountain range from an airplane.\nA family holding ski's posing for a picture on a mountain.\nA man is on the beach playing with a frisbee.\nthere is a woman that is standing in the snow with her skies\nA person loading a bite of cake onto a fork.\nA stop sign by a cross roads on the roads.\nA family is in a living room playing the Wii.\nTwo  large elephants laying down in the dirt.\nAll the items that are going to be packed for a trip.\nthere 
are many lights that are on in all of the buildings\nCrowd of people with backpacks line up on the runway to enter the plane\nA herd of giraffe walking across a field.\na vase with bright flowers sitting next to a man usiing a platform\ntwo hotdogs topped with a dill pickle tomatoes and tofu\na man is holding up a box of doughnuts\nsome people standing around by a table and chairs\na desktop computer monitor with a keyboard and mouse\nThe horse is in the water with a man.\nA close-up of the dirt in a garden with a small umbrella in the ground.\nA toilet seat with a picture of a dolphin on it.\nA horse looking over a fence on a snowy day\na round window overlooking a parking lot filled with cars\ntree is a man holding a small red guitar\nA group of people playing a game of frisbee on a beach.\na number of zebras near one another on a dirt ground\nA plate of food with mushrooms, beans, sausage and two kinds of meat on it.\nA woman about to enjoy a good lunch of a sub.\nA small park with benches and buildings in the back round.\nView of down town in a city and traffic driving on the opposite side of the road.\nA white bus driving down a street past a semi tall building.\nHot dogs are being cooked next to bins of toppings.\nA herd of giraffe standing around a pile of rocks.\nlemons and limes in baskets in the produce section\nA man in a white outfit, holding a tennis racquet.\nTwo zebras are walking in front of some trees.\nseveral multicolored scarves hanging on a display case.\nA beautiful woman holding a brown dog in her arms near a refrigerator.\nA bedroom with a bedspread and a window.\na white and brown cat is laying on top of a keyboard\nA person helping another person fix their skis.\nthere is a man with a pink shirt holding two surf boards\nA lot of cows are walking on a field.\nThree zebras that are standing in the grass.\nA young person riding a skateboard at a skate park.\nCollection of vintage motorcycles sitting on display at a museum.\nA group of cows 
standing on a road with a vehicle looking on.\nTwo elephants walk along the bank of a river.\nA pine tree branch in a vase decorated with a dove and colorful star.\nA person riding their bicycle in the rain.\nA dog sitting in front of a open book.\nscones sitting on a plate at a cafe\nA train travel at high speed with buildings reflected in the windows.\nThe man is ready to throw the frisbee.\nA man and a woman standing their surf boards next to each other at the beach.\nA small Christmas teddy bear is hanging on a tree.\nan image of a bedroom bed with a bookshelf in the background\nFive dessert samples, on clear glass plates, are displayed on a wood spoke wheel.\nA steamer filled with different types of vegetables.\nA fan sitting in the middle of a room next to a sink.\nWoman looking at cell phone while outside in the bright light.\nA man on a surf board riding a big wave.\na person walking with a cow in a parking lot\nA cordless land line phone is all lit up.\nA tractor and a herd of cows in a farming field.\nA woman and two young girls are blowing out a candle.\na person standing in a living room playing nintendo wii\nA burnt pizza covered in cheese and toppings.\nThis woman is playing tennis on a court.\nStreet signs at the intersection of Partridge Way and Pear Tree Lane.\nA room with chairs and a clock and a floor.\nWindow display of a suit and sewing machine.\nthere is a sign that has whoa on it and there as a truck behind it\nA brown and black cat underneath an umbrella.\nA man standing on a tennis court holding a racquet.\nMan on large open area covered with snow.\nA very long large train at a station.\na couple of buildings surrounding a pond with boats\nA man holding a tennis racquet in his right hand.\na sprinkled piece of cake on a pink polka dot plate\nthere is a man on the beach flying a kite\nA van parked on a road side, covered in snow, ice and sleet.\na woman holding a pole skying on the snow\nA suitcase has been re purposed into a charming bench 
seat.\nA large dog sleeps in front of a tv.\nA vintage image of a lady holding a baseball bat.\nA row of motorcycles parked next to each other.\nA man sitting at an office desk utilizing a computer.\nThree people in suits posing outside of a bus\nA view of bathroom with a sink, toilet, tub , and mirror.\nBaseball player wearing protective hat with a bat warming up before his turn.\nA skier in green snow pants recovers from a fall\nPassenger train at stop waiting for consumers to load\nA woman in a swimsuit with a racket in her hands on a tennis court.\na dining room table that is in a room\nA clock tower on the side of a brick building\nThere is a cross country skier wearing full gear\nAn elephant,fanning his ears is standing on the ground.\nA plate of food with meat and other vegetables.\nA woman surfing a wave on her surfboard.\nA lady walking down the street with a red umbrella.\nA young boy is standing on a skateboard.\nPizza, orange juice, and red wine sit on the table.\nA white kitten is sitting on a laptop computer.\nThree men are sitting on the couch, one is on the laptop.\nTwo giraffes eating together from a feeding station.\nA batter standing at home plate has just swung at the ball.\na cabinet with a coffee pot, toaster radio and microwve\nA lamp sitting next to a red vase filled with flowers.\na skateboarder with white tennis shoes is doing a trick\nA chocolate style cake with candles on it by a cutting knife.\nA woman brushes her teeth and looks at the camera.\nA building with a stop sign next to it with a man on a horse.\nA woman and child are about to cut a cake\nA red stop sign sitting under two street signs.\nA man sitting in field next to a herd of cows.\nA bed with an orange headboard, a green pillow, 3 regular pillows and the bedspread turned down.\nThere are two people watching another one play tennis.\nA pan with carrots, apples, meat, and potatoes.\nA group of cars that are parked on a beach.\nA sandwich sitting on top of a white plate.\nA 
person standing on a sandy beach next to the ocean.\nMen are standing together outside of an old train.\nA man is flying kite in the park.\nYoung couple cutting white cake at indoor celebration.\nGiraffes walking around outside in a wildlife park.\na toilet sits inside of a cramped bathroom\nTwo people on hard ground throwing a frisbee.\nA passenger sign on the tracks at a station.\nA young man riding a skateboard through a  puddle of water.\nA group of people enjoying a cake and pizza.\nA herd of zebra standing on top of a lush grass covered field.\nTwo skiers are going cross country in opposite directions, one taking the high road and other the low road.\nA toy chicken standing beside a flower vase.\nCat sitting on top of a chair near door.\na mixture of vegetables including broccoli and squash\na little kid that is standing next to a suitcase\nTwo pizzas being placed on top of a column of plates with an employee checking the pizza on a stone stove.\nA man petting a cat that's sitting on a kitchen counter.\nMan posing in front of a pair of giraffes in background.\nPlated lunch with condiments and utensils on dark table.\nAn old cellphone stand next to a mug and a statue of Jesus.\na tie on a pole outdoors in a field of grass\nThe meal is prepared and ready to be eaten.\nThere is a clock on the side of a building\nA family posing on skis with a young child in the snow.\nThis is a portrait of a bench next to the ocean.\nTwo sheep standing next to each other in the snow.\nThree people in uniform cutting a cake with others watching.\nThree bikers in a busy street riding in front of a bus.\nA woman bundled up in the snow skiing.\nA person in a purple shirt standing on a couch playing wii\nA silver train traveling down train tracks next to two men.\nA chick is siting on the edge of a bathtub.\nThe man has just thrown the frisbee in the air.\na group of zebras grazing on dry grass in a large field.\nA man returning a tennis ball in a tennis game.\njockeys riding horses 
in a fast horse race\nA seaboard soars majestically over the green-blue ocean.\nTwo men holding hands while holding a snowboard\na man standing on top of two horses\na woman wearing a wig holding a tennis racket\nThree donuts are on paper next to a coffee cup.\nA man skiing is doing a rail grind.\nMale surfer in wet suit, just thrown off surfboard at the peak of a wave.\nRed Oral B toothbrush in a blue cup.\nA pitcher, batter, umpire, and other baseball players on the field\nTwo menus sit atop some colorful decorations next to a green box with lights on it in front of a restaurant.\nthree people sitting on a motorcycle in a street\nLittle girl holding up a sheet of uncooked rolls by oven.\ntwo elephants in a encloseur at a zoo\nA colorful chain with a note attached is wrapped around a parking meter's post.\nThe child is jumping on the beach above a body board.\nStreet signs on the corner of Fillmore and Filbert\nBaseball players take various poses as a ball floats above the pitcher's mound\nA sink and toilet in a bathroom being remodeled.\ntwo ripe fruits on the floor ready to be eaten\nA woman is playing tennis on a hard green surface.\nYoung boys playing soccer trying to kick the ball.\nTwo giraffes stare at a crane from behind a fence.\na bunch of cupcakes stacked up on trays\nA person reaches out to pet a pony.\na close up of a cat sitting at a table\nA cat's head sticking out of a leather bag.\nA glove laying on a stuffed animal in the grass\nA woman is walking through the park texting on her cell phone.\nA table has potatoes, carrots, onions and broccoli.\nA group of lambs standing in a grassy field.\nA boxed lunch with a sandwich, veggies, fruit, pickles, and a dessert.\nA man sitting at a table at a diner with a basket of food in front of him.\nA microwave oven with a plate of nachos inside of it.\nCat sitting on top of a person's computer.\nA women in mid swing hitting a tennis ball.\nTwo plates of food in front of two dogs.\nA baseball player hitting a 
baseball with a bat.\nA cluttered desk filled with monitors and various items.\nA clock on an outside information board with snow all around it\nFour pieces of a television remote disassembled or taken apart.\nA couple of police officers in the middle of a street.\nA white paper topped with square slices of pizza.\nA spoon is resting in a bowl of cooked noodles and vegetables.\nThree guys are in the kitchen together preparing some type of meal.\nA device fashioned to look like a yellow car sits atop the desk blotter.\na bowl sitting on a table with flowers inside of it\nA person riding a wave on top of a surfboard.\nA black cat with a conspicuous look on its face in a bag.\nTwo hot dogs in cardboard plate one with pickle and the other with cheese.\nA reproduction steam train waiting at the station\nThe man watches the little boy on the surf board.\na black and white photo of children siting posing for a photo\nA little boy reading his book on top of a toilet.\nA clock sitting next to a brick sign under palm trees.\nA person on snow skis is pulling a rope that is attached to something heavy.\nA man and woman toasting with martinis with olives.\na public transit bus in a field with a sky background\nAt least nine giraffes live in the enclosure.\nFour boys with skateboard relax by an iron fence.\nthere are many people gathered here in the snow\nSeveral people interacting in a spacious living room.\nA blue and aqua colored train and people on the platform.\nA group of brown horses standing on a snow covered ground.\nThe clock tower stands tall and reads almost five-o-clock.\nA toilet, shower, and sink in a bathroom.\nA young man on a skateboard near a half pipe\nA picture of a modern looking kitchen area\nA row of parked jetliners sitting on top of a dirt field.\nA vehicle pulls up next to a building.\nA train on the tracks under a walkway from one building to the next\nA baseball player holding a bat during a game.\nan orange caution sign stating fresh oil in the 
street\nA lady is playing tennis game in a tennis court.\nTwo surfers carrying their surfboards in the sand at the ocean\nA Macbook sitting near a clock and a lamp on a desk.\nA couple of men in skies on a snowy slope\nA little girl holding a baseball bat on a field.\nA dog and a cat laying on some platforms.\nA person holding a cellphone that is opened upright on a table.\nA man in a suit standing in front of bookshelves.\nThe woman is playing tennis on the court.\nA fire place sitting below a brick and plaster mantel.\nA group of men in hats next to planes on a runway.\na lady on a horse and people taking a photo\nA person is watching animals in the wild with a camera.\nA stop light is shown over a road.\nA train is going down the track under a bridge.\nHere is a compact kitchen that uses it's limited space well.\nan image of a group of people outside for an event\nA subway train is parked at the station\nA jet plane flying through clear blue skies.\nA table topped with food and a remote control.\nA clock tower with elaborate details decorating it.\nPeople riding motorcycles along a street with a lady riding on the back of one giving the peace signal.\na plate of food with a banana and a sanwich\nA large long train on a steel track.\nA man hitting a tennis ball with a tennis racket at the tennis courts.\nSeveral people walk up a slope as others are coming down at an intersect.\na man and woman are sitting on the back of an elephant\nA refrigerator sits in a temporary spot in front of a doorway.\nA herd of giraffes and two zebras are grazing in a field near a fence.\na bus stop with a white bus picking up lots of people\nSomething outside the window has captured the dogs attention.\nThe woman sitting in a red chair is smiling while holding a cell phone.\na person riding skis on a body of water tethered to a boat\na person riding a surf board on a wave\nA man holding a device and a coke bottle in a clearing in a wood.\nA large jet sits at the gate at the 
airport.\na close up of food on a plate on a table\nA person hitting a tennis ball with a racquet.\nFour people standing on balcony and a parking meter\nA white toielt with a standing rail in front of it for support\nThree backpacks loaded with a variety of stuff sitting on a tile floor.\na bright yellow 'watch for rocks' sign in front of the blue sky.\nPeople are riding bicycles and walking across an intersection.\nTwo men jumping in the air across sand to catch a frisbee.\nScissors and material being made into small purse\nPeople in a stadium watching some men play baseball.\nA man has his hand up to his ear as he walks past a bridge.\nThree people are cutting into a yellow dinosaur cake.\nsome fireworks in the air above a clock tower\nA red stop sign sitting on top of a yellow gate.\nA man with black suits next to a surfboard\nA person with dark hair throws a frisbee.\na bike that is parked next to a brick wall\nYellow fire hydrant in between two blue posts.\nA row of boats on a beach with a dog near the boats.\nA blue and white train pulling up to the train station.\na woman eating out of a small bowl next to a computer\nA woman with her hand on a blender on a bicycle\nYoung women playing a game of softball in the hot sun.\nAn old photo of a group drinking in a restaurant.\nSome sushi rolls, apples and vegetables are in lunch containers.\nA boy holding a Frisbee on the beach.\nTwo small children hiding their faces behind umbrellas.\na boy wearing shorts and tennis shoes riding a skate board\nA man throws Frisbees in to the dark\ncolorful umbrellas and chairs in the sand on a beach\nBirds sitting on wires are silhouetted against the yellow sky.\nA desk with a computer, office items, and CDs on it.\na custom motor bike is parked on some gravel\nA street sign in grass with building in the background.\nA man in a shop working on some motorcycles.\nAnimals walk around a grassy area together.\nPeople walking on a snowy road in a village\na person sitting on a 
motorcycle on a city street\nA man in a hat and sunglasses eating a banana.\nA pastry of sliced banana on a white plate.\nA plate with a sandwich and fries on a table.\nA very tan man driving a wooden boat on the open water.\nA close up of a woman smiling while looking at her cell phone.\nA sign that says stop under a red light.\nA bay view with a city in the far distance.\nA herd of horses in a rocky field.\nA woman wearing red with a red purse while holding her cell phone.\nA black and white picture of a man wearing a turban walking down a street.\nSeveral vehicles providing ground transportation are shown in the photo streetcar, tourbus, classic car and family cars\nA white fence in front of a house next to a yellow fire hydrant.\nA clock tower is on the side of a building.\nThe sign on the pole says Wall Street.\nTwo skiers sitting on top of a snowy mountain.\nThe bird is an owl flying low above the grass.\nA dog and a little girl riding a tricycle.\nA semitrailer truck as seen in its outer rear view mirror\nA skier taking a leap off a pile of snow.\nsingle guy on a skate board skating on a roof top\nAn underneath view on a beach umbrella with a table to the side, and some people in rows of chairs on the beach.\na laptop placed on a wooden table in a room\nA large crowd is watching a baseball game.\nA man riding a wave on top of a surfboard.\nChildren in a room with many beds\nA fire hydrant sitting on the side of  a road.\nA red and white airplane is on the runway.\nTwo people sitting at a table across from each other.\nA group of people at the beach flying kites\nthe man is holding on to a small boat craft in the water\nA man standing in a sport coat and looking down at his hands as a woman passes in front of him.\nA covered horse grazing on grass while being fenced in.\nPeople are sitting on the ground petting a cat.\nA train on some train tracks near trees\nA train passing by fields and greenery on a track.\nTwo zebras on top of a dirt terrain.\nA group of 
people standing in line to get on a red bus in the city.\na plate that has some food on it\nA surfer crouching in to a choppy wave\nA empty, set table in a modern style kitchen.\na table with some food and beverages on it\nTwo gentleman in suits smiling and posing for a picture.\nA young boy eating out of a can.\nA baseball player standing on a field holding a baseball bat.\na cargo train being led by an orange and black engine\nTwo shaggy white sheep together in a fence.\nA young boy dunking a basketball into a yellow hoop.\nA person with their pants down next to a smart phone.\nA couple of people sitting on top of a bench.\nA stop sign and several other road signs attached to metal posts.\nA produce section of a grocery store with a wide variety of fruits.\nThere is a gray cat sitting on top of a gray luggage\nTaco salad bowls full of taco salad and a salsa container.\nCat sitting on a bookcase intently watching out a window.\nA child's hands holding a fresh orange with a leaf and twig attached\nA could people stand around a food truck to get their dinner\nA cat is standing on a desk in front of a computer.\nTwo men standing on either side of a pink inflatable object.\nsome zebras are standing on a green hill and rocks\nA man holding food and smiling with a full plate of food on a table.\nA man standing along side of a truck trailer.\nA man with a superman custom under neath his clothes posing\nA man riding skis across a snow covered countryside.\na cat laying on the keyboard of a computer\nA large wall clock on a white wall.\nPair of kites flown on grassy area with several onlookers.\nA person on a racing motorcycle making a sharp right turn.\nA man holding a pizza above a table filled with bowls food.\nA man is sitting on the couch and watching TV while holding the channel selector in his hand and a black guitar is sitting in a corner.\na man and woman cut into a wedding cake\nA man playing tennis prepares to hit the ball.\nTwo women are dancing with video game 
remotes.\nbaseball player swinging metal bat at home plate.\nA microwave that is wrapped in plastic and is inside of a larger piece of furniture.\nA couple of people in the water with surfboards.\na strawberry pie with whip cream and strawberries on a green plate\nA couple of men on hot rod motorcycles parked in a lot.\nA small bathroom features a small sink, toilet and mirror.\nClose up of the over-used bristles of a tooth brush\nAn open white  box of assorted decorated doughnuts\nA little boy swinging at a pitch during a baseball game.\nA group of boats that are sitting in the water.\nTwo giraffes are in the enclosure surrounded by a group of people.\na dual screen computer on a desk in a room\nthere are three giraffes embracing in the wild\nA man with a hat in the air with a skateboard.\nA table topped with breakfast food and a cup of orange juice.\nA hand holding a mouse next to a laptop on a table.\nYoung girl with brown hair and a flowery blue hat in kitchen looking downward\na man on a surfboard in the water\nMany pieces of luggage sitting neatly beside one another.\nA ripe banana, a pear, an orange and a strawberry.\nAn art exhibit with two chairs and a blue vase.\nA flower pot that is sitting on top of a chair.\nThree older individuals with luggage, standing near a sidewalk.\nAn old red VW van sitting on the street\nTwo different slices of pizza on a plate.\nA microwave in a puddle with leaves scattered around it.\nTwo people run for the Frisbee in a local park\nA man kissing a woman's forehead while laying in bed together.\nA dog sitting with a woman looking soulful\nA man riding a wave on a surfboard.\nTwo children in blue shirts squatting under an umbrella.\nA monster size truck moving down a quiet city street.\na young man playing tennis on a sunny day\nA white cat sitting on top of a woman sitting on a couch.\nA multi colored train riding on the tracks\nan image of a cat that is playing with a pair of tennis shoes\nProfessional baseball player hold a 
bat and scratching his armpit.\na bunch of cars drive in different directions on two sides of a street in a city\nGiraffes huddled next to a tree in their natural environment.\nA passenger jet taxiing on the tarmac of an airport.\nA cow inside a brick building with people looking at it through the door way.\nSome baseball players sitting in a dugout watching a game\nSkateboarder jumping off his board on a concrete course.\nA chapel filled with benches, a book stand, and other accessories.\nA cute little dog sitting on top of luggage.\nA young boy playing with a toy oven with a fake plastic sink.\nA dog inside a pin wearing a hat.\nLaptop computer sitting on top of a table in a personal office.\nA boy in a grey sweater is holding a blue kite with a whale picture on it.\nA small dog sitting inside a red duffle bag next to a frisbee.\nA woman crossing the street in the rain.\nSmall white toilet sitting in a small corner next to a wall.\nA small family seated at a table in a pizza parlor about to enjoy a meal.\nA large grey elephant walking through the middle of an auditorium.\nA cow in a barn cage looking towards a camera.\nA boat with a long cabin sits in the water close to shore.\nA baseball player swinging at a ball with a catcher and referee behind him.\nA white bathroom sink with a crack and a mirror.\nA baby elephant walking into a pool of water.\nA bathroom with a sink and several towels on the counter\nA batter poses with a bat over his head.\nA grass umbrella and two chairs on a tropical beach.\nThis bathroom has a toilet, tissue roll, bathtub, and two towel racks.\na couple of men are standing on a snowy mountain\nWearing a red shirt, a surfer rides a wave on a white surfboard.\nA baseball game in action with a man at the plate with a bat.\nA toilet that is next to a bathtub.\nMen in army shorts on skate boards near ramp.\nA couple play tennis on the tennis court.\nA brown plush teddy bear holding a heart\nA small boy holding up a tennis racket\nA picture of 
a vegetable that is starting to grow.\nan open book laid on top of a bed\nA person that is on his computer on a table.\nOrange placed in bowl next wet marsh land\nA man swinging a tennis racquet at a clock.\nAn outdoor garden area with verdant plants and a tree.\nA purse has a cellphone located in a side pocket.\nA woman hitting a tennis ball on a court.\nSeveral skiers are standing on a snow covered area.\na couple of men ride on some horses as they race\nbusy city  showing a big blue moving truck with graffiti  on it next to a  white van.\nA group of young boys standing on a lush green field.\nBlack and white photograph of a busy city beach\nA laundry room in a dimly lit place.\nA woman in a yellow apron ties the top of a bag of popcorn in her concession stand.\nA red train engine sitting next to a tree.\nFour zebras stand in a meadow in the black and white photo.\na close up of a person holding a hot dog\nAn individual on a kayak riding through waves of water.\nA plant sits on top of a refrigerator in an empty room.\nA woman sitting next to a child on a large grey teddy bear.\nA elephant that is standing in the dirt.\nA sandwich on a white plate on a table.\nThis is a picture of two bowls in a restaurant.\nan ostrich walking sneakily towards a couple of zebra\na woman rides on a bike down a street\nA big pile of building material is placed on the floor in the wooden structure.\nA man in a business suit in an office building.\nA man that is on a bike next to a woman.\nA group of birds sitting on a horizontal pole.\nThis is someones couch in their living room in their home.\nA baseball player is bunting the ball at a game\nA piece of cake with a dollop of cream filling next to it.\nAn outdoor swimming pool has people in it.\nA picture of some people holding a sign.\nThe bathroom is white with the shower curtain open\nA television playing on a desk in a room with colorful art on the walls\nA man steering cattle in a water puddle.\nan image of a couple that are on 
the couch\nAn escalator with a guy standing a kayak next to him.\nA group of people sit next to each other on a bus.\nA child sitting at a table smiling with its eyes closed .\nA desk that is cluttered and has two laptop screens.\nA skateboarder reaches the top of a ramp.\nSeveral boats filled with goods sitting in the water.\nA table full go delicious meals, the closest being seasoned shrimp over broccoli.\nA living room with a covered couch and coffee table.\nMan on a snowboard going down a hill.\na young boy standing on a surfboard at an amusement park\nSunset scene with surfers coming out of the water\nA dog is staring out over a body of water.\na boy is looking at his cellphone in a bathroom\nAn adult teaching a small girl how to play tennis.\nA man holding tie devices in his hands while he looks at his laptop.\nA body of water filled with lots of boats.\nMany people are waiting with bags and possessions.\nI am unable to see the image above.\nA red bus is at a bus stop.\nA sheep standing on the side of a lush green grass covered hill.\nA bear climbing across limbs and fallen trees.\nMan in black uniform holding a soccer ball in front of a net.\nA bathroom with mirror, lights, sink and bath tub.\nA man standing in a kitchen holding a bottle of ketchup and a hot dog.\nA woman is reaching for the ball on the court.\nA green street sign near a palm tree in a city.\nA giraffe is looked at by many people on a balcony.\nA naked baby lays on a towel in a bathroom and chews on a toothbrush.\nA dog lying on a couch next to a computer.\nSeveral glazed doughnuts in a white box container.\nA fire hydrant that is sitting on the sidewalk.\nMan in black and white uniform swinging at a baseball.\nA living area with a television and various places to sit.\nthere is a police man riding a motorcycle on the street\nA red fire hydrant sitting beside a lake.\nTwo trains traveling along a snowy railroad track.\nA woman sitting at a table in front of a pizza.\nA bird is taking flight 
during the day.\nA fire hydrant is partially under a tree.\nA child on a snow board stands in the snow.\nPeople are riding on bikes on a road after it has rained.\na man riding a wave with a colorful surfboard\nThe zebras are eating grass in the field.\na child practicing his bating in a batting cage\nA dirty bathroom stall with white toilet and papers\nvarious pieces of pottery lining the shelves in a workshop\nA blender on display next to some small glasses.\nHe rides his motorcycle through a narrow alley.\nA black and white photo of a dormitory with several beds in rows.\nPerson in yellow shirt playing tennis on a court\nA man in red jersey standing on a pitchers mound.\nA man riding on top of a brown horse while wearing a hat.\nTwo giraffes are standing amongst a bunch of trees.\nThere is a woman that is riding a bike\nA large metal clock hanging with chains from a roof.\nA man pushing a surfboard with a small boy standing on it\nA very plain and dull bathroom that's in someone's house.\na man walks next to a giant bike piled high with garbage bags\nA man in white shirt riding a skateboard down a hill.\nA blue street sign sitting on the side of a road.\nThe young catcher in black is throwing a baseball.\na food dish containing red peppers, broccoli potato and chicken.\nthERE IS A CLOCK IN THE MIDDLE OF A LARGE TRAIN STATION\nA cat sleeping on top of a blue towel.\na squat down toilet with a door\nthere's a white building with gold trim and a clock\nA woman is raising her hands at her desk.\nA small child sits in front of a decorated cake.\na collage of photos with a child near a cake\nMale and female at a party celebrating in front of balloons.\nA woman and a man flying a kite against a city background.\nA white sailboat floating across the ocean over waves.\nA bus is parked on the corner beside a large stone building.\nA train loaded with cargo crossing a bridge\nA photo of a bed that has been made,\nA flock of birds floating in the ocean next to a cement 
wall.\nA man reaches under his leg to catch a frisbee.\nA bus driving on a brick street\na lady in a canoe with fruits and her personal items\nA display of apples and tomatoes in their own crate.\nadult and baby sheep walk across a field\nA train on the train tracks surrounded by greenery.\nA city street has a fire hydrant, trash bin, and parked vehicles.\na stove with a pot cooking tomatoes and another holding a strainer\nA green and white bus parked in front of a small building.\na person riding skis in the middle of a snowy street\na group of birds sitting on back of a bench\nA stop sign on the side of a street\nMultiple white cars passing next to train at a train station.\na city street with a car and traffic lights\nA group of people that are standing in front of a surfboard.\nA stern man is speaking in the center of a political rally.\nSome zebras are standing in the middle of a grassland.\nA woman with soccer ball playing with two boys next to a fence.\nAn airplane in a very bright blue sky.\nA man getting ready to hit a ball in baseball.\nan image of two zebras side by side\nA man laying stretched out on the back of a boat.\nA country pasture with cows, grass and trees.\nThe woman is laughing as she gets ready to eat the sandwich.\nA baseball player is in the outfield of a baseball field.\nBlack and white photograph of a man with an umbrella.\nTwo boys sitting next to each other holding stuffed animals.\nA plate topped with three donuts next to a cup of coffee.\nTwo men are talking to each other during a presentation\nA woman performing in an arena with her horse.\nA man is playing Frisbee with a group of other people.\nA woman standing with a cell phone in her hand.\nA plate of fruit with bananas oranges and other fruits.\nOld black and white image of a man starting an airplane propeller.\nwhite and green street signs at an intersection next to buildings\nTwo cows in a field are staring at a motorcycle\nA white faced clock with roman numerals surrounded by 
a painting.\nTwo men on a dirt path in a grassy field.\nSeveral different types of apples sit in white bins.\nA person is leading a horse with a saddle down a beach.\na small child holding a tennis racket with two hands\nA man surfing inside a half pipe wave.\nA giraffe walking in a grassy area with a tall bird.\nA boy and a girl play on the Wii gaming system.\na person taking a photo in a bath room mirror\nA kid in black glasses pretends riding a red motorcycle.\nSkateboarders are attempting tricks in a concrete skate park.\nA skateboarder is in the air as he performs a stunt.\nTWO OF THE SAME PICTURE OF A BLACK DOG BY A WOOD CHAIR\nTwo teddy bears sitting next to a plush hello kitty.\na male is wearing a white shirt and black jacket\nSculpture fashioned to look like a cat holding a pole.\nA hand made felt sloth with a button nose.\nA group of motorcycle races flying down a race track.\nA fridge in the kitchen of a house with blue walls\na lady on her bed with a laptop smiling\nAn old fire hydrant in the middle of the woods.\na man on a surfboard riding the top of a wave\na white counter top in a home kitchen\nA flowered plate of meat and vegetables on a flat surface.\nA brick patio with a bench and flower pots.\nMulti-colored miniature stuffed bears that appear to float at the ceiling.\na little girl is dressed in a uniform outside\nA man at the beach leaping in the air to catch a frisbee.\nA man is paddleboarding in the ocean on a cloudy day\nAn unkempt bed, with a pillow, a blanket, and a book on it.\na plate with a sandwich on it with a side of salad and ketchup\na person sitting on a couch with a cat\nA young boy getting ready to fly a kite with his father on the beach\nA slice of macaroni and cheese pizza on a plate.\nA table filled with several different camera's and people sitting around them.\nA kitchen and dining room area with a fireplace.\na couple of stuffed animals sits on a street corner\nA variety of food dishes are shown on display.\nA train 
station with an incoming or departing train.\nA girl sitting on a stone wall and eating.\nTwo women are on an advertisement on the side of a pink bus.\na young woman walking on a sidewalk next to a firehydrant\nA dog laying in a room near a television and dresser.\nBaby elephant standing in the grass beside a truck.\nA small, green bathroom with a sink and a toilet.\nA person is flying off of they're skateboard\na family in the living room playing with a wii video game\nA man sitting in a chair drinking something out of a cup.\nMan man setting up a network inside a business.\nTwo men are sitting at a world economic panel.\nFlowers are in a vase on top of a table under some pictures.\nWoman in white jacket holding a snowboard in the snow.\na couple of people that are playing a wii\nA bunch of ceramic containers that are on a shelf.\nA man posing for the camera on his skis\nA plate of food that includes broccoli and white dough balls.\nA model train countryside scene with a bridge and plants\nA makeshift bathroom is equipped with a foot landing and a tiny hole for eliminating.\nBathroom with destroyed walls, a sink and a mirrored cabinet.\nSmiling woman standing with luggage in front of her car\nA television sitting on top of a television stand.\nA group of sheep are being herded by a dog as people watch.\nCommode with unusual bowl displayed in bathroom stall.\na teddy bear wearing a red dress and shoes sitting in a chair\nA couple of men playing a game with remote controllers.\nA laptop with a phone sits on a desk.\nCorner kitchen with refrigerator and counter space next to table\nA street scene where a vendor is standing and some ladies are doing window shopping.\nThree double decker buses are parked outside of a building.\nTwo fire trucks in front of the station.\nA group of different mopeds sitting in the street.\nSix men standing on stairs in front of building with large columns.\nA white swan standing on a lake next to small waves.\nClocks on the face of a 
building below a steeple.\nA person with a toothbrush in their mouth with a baby.\nOne man leaning on a parking meter talking to another man.\nA man with takeout sitting on the floor watching television.\nWoman sitting on the bus with her dog next to her in other seat\nA female professional tennis player preparing to serve the ball\nA couple sitting together on a bench in a park near water.\nA city intersection with several street signs and instructional signs.\nA stop light and a home built chair on a brick floor\nseveral people i the water para sailing near the beach\nA train is pulling into the station beside waiting passengers.\na group of tennis players chatting with one another\nA red train traveling down a track driven by an engineer.\nTwo signal lights displaying the 'red' stop light.\nA bus that is sitting on the side of the street.\nA person in a room with a television and a fireplace.\nA close up view of an open laptop in a room.\na desk with a monitor and some remote controls\nFamily poses in front of their house with horses next to them.\nA man in a suit helps a smiling boy straighten his tie.\nGreen apples, lemons and oranges are in a sink.\nA view of a kitten sniffing a pair of high hill shoes.\nTwo people sitting on the back of a horse carriage.\nA woman sitting at a table painting brown vases.\nA person biking in a roller skating lane during sunset.\na little girl that has a big doughnut in hand\nA serving of meat covered with gravy and a side salad on plate with utensils.\nTwo cats that are looking at a camera.\nA grey and white cat watches a cup of tea brew.\nAn elephant peers through a wired fence as far as his tusks will let him.\nTwo men working in the back of a pickup truck.\nA woman holding a red umbrella in the rain.\nA commercial airplane with the door open and people walking in.\nBlack and red bird standing in front of a caged in area.\nA man with a piercing in his left ear smiling.\nSome cars are stopping at a stop light.\nA couple of 
women standing next to a couple of soldiers.\nA women sitting in front of several laptops looking at her cell phone.\nA tennis player poses, racket in his right hand, left arm behind him.\nA group of people riding horses in a line along a trail.\nThree horses are in a pen and they are blind folded.\nA red fire hydrant sitting in the middle of a green field.\nA hand holding a pair of scissors next to a chair.\nSome people that are hanging outside my car.\nA plate full of food with potatoes and cheese.\nan airplane is flying past a large city\nA cow standing in the grass with a tag in its ear.\nThe legs of a person resting on a train with a backpack nearby.\nThree people look at paper work in a hospital room.\nA hot dog sitting on top of a bun in a wrapper.\na close up of a plate of food with broccoli\nA blender with a mixture in it sitting on a counter.\nSeveral different kinds of donuts on a tray.\nA bunch of airplanes parked at the airport\nA person snowboards down a large snowy mountain.\nTwo men and two brown horses pulling a cart in barn\nA man plays tennis on a tennis court.\nA lonely zebra galloping through a wildlife enclosure.\nA little girl puts something into her mouth while looking at the camera\nA small very neat kitchen near a bedroom and another room.\nA man paddling a surfboard on a lake.\nA wooden bench on the side of a trail has a backpack left on it.\nA man drinks wine while another man chops vegetables.\nA bunch of street signs sitting on the side of streets covered in snow.\na train traveling on an elevated train track.\nA bridge and clock tower are lit at night.\nSomeone takes a photo as they stand in a bathroom, near the mirror\nA  train that is parked in front of a large cruise ship, with a blue crane next to it.\nA motorcycle stands in an exhibits beneath some roofing.\nMany flat bottomed boats on a swampy river.\nA computer station with monitor, keyboard and personal items.\nA bathroom with a double-sink and some mirrors.\nA baseball game 
in progress with the batter starting to run.\nAn industrial type bathroom with an open shower.\nA yellow dump truck that is near a building.\nA boy with a kite in his hands in a grassy field.\na woman standing outdoors with a cat on her shoulders\nA LOAF OF BREAD IS ON THE TOP COUNTER\nA black and white picture of dunes, two benches and a trash can\na group of men play soccer in a dirt area\nA skier skiing down a slope wearing a dark snow suit.\na male skateboarder in a white shirt doing a trick\nThe woman is talking on her cellphone while walking down the street.\nA young man riding a bike past a car while talking on a cell phone.\nA motorcycle sitting on top of a wooden book shelf.\nA man and a women who are running toward a Frisbee.\nA black dog running across a green field with a frisbee in it's mouth.\na dog that is sitting in front of a frizbee\nThe reflection of two men in the mirrors of a public restroom.\nA cake donut sitting on a plate at a bistro.\nA boy pouring some drink into a cup at a counter.\na group of giraffes sit inside of a caged area\na man is sitting in front of some food at a table\nA bird flying into the side mirror of a red vehicle.\nA skier is posing in front of the sunset.\nA large sheep grazes at a countryside farm.\na photo of a man over a table of food smiling at the camera\nA man kneeling down next to two large dogs.\nA landscape of some mountains with a plane flying above them.\nA pan filled with food sitting on a stove top.\nsome baseball players are playing a batter and catcher\nMen with suitcases at an airport ticket counter.\nMan in the motion of running and throwing a frisbee from his hand.\nMan with a yellow jacket riding a scooter.\nMan standing holding a remote control towards a component.\nA plate with a brownie and vanilla ice cream.\nA family gathers around a table with cake and beverages on a deck at night.\na number of people holding surf boards close to one another\na man is standing in front of a table\nAn old classic 
red truck is parked in front of bank as a man stands near the window and a woman stands in the background.\nTwo elephants walking near a pool of water and a forest.\nA woman carrying a cake with lit candles towards a young boy.\nA man making a phone call has no shirt on.\nsome white sheep are eating grass on a hill\nTwo girls posing for a picture with painted on neckties.\nA clock and two vases sitting on a small table.\nA woman with a stuffed animal on a train platform\na glass of wine a table with dishes of food\nA man grabs the back end of his snowboard as he soars off a jump.\nA crowd of people are standing in line.\nA cake cover is made to look like a wire birdcage.\nBack view of three men on a baseball field.\nA boy is hugging his stuffed animal toy\nA construction working holding a stop sign while standing in the street.\nTwo ladies are riding horses on the beach.\nA baseball field filled with players and an umpire.\nA white bowl filled with lots of ripe bananas.\nA man taking a swing at a tennis ball\nThere is a country styled kitchen with wood flooring and white walls.\na plane at the airport landing and people besides it\nAn office with a desk and chair with the door open.\nThree urinals in a restroom each urinal is at a different height to accommodate adults and children.\na man riding a snowboard into the air.\na person holds a horse that stands on some beach\nA man looking into a refrigerator door for ingredients.\nSomeone holding on to a dog collar while the dog has a frisbee in his mouth\na big window showing the reflection of a building across the street\nThe men are going to ride their bikes in the dirt.\nA white polar bear is laying his head on his paw.\nSandwich sitting on a plate next to a glass of juice.\nA living room with a Christmas tree beside two couches.\nA man in a courtyard reaches out to catch a Frisbee.\nA red and white stuffed animal with a tv remote in a bed.\nThe bathroom has a toilet, sink, and mirror in it.\na baseball player 
that is at home plate with a bat\nAn overhead view of a man sweeping the street by a sidewalk.\nA black and white dog looking out a window.\nA black cat with white paw laying in a hanging cat bed.\nA woman with long blonde hair wearing a men's neck tie.\nAt the birthday party there are plenty of snacks.\nA man jumps to catch a frisbee with two hands\nA dog is asleep on a white blanket.\nA group of giraffe standing next to each other in front of a building.\na black and white clock on a pole a building and a flag\nan old and nasty bathroom with a toilet and shelf\nA person lying on the ground posing with a snowboard.\na man and a woman along with a baby sit an watch a lap top\nA duck and elephant stuffed animal sitting next to each other.\nA cat is sitting on top of a toilet seat.\nA person holding a hot dog with yellow mustard and onions on it, at a sports stadium.\nThe back of an Apple iPhone with the front on the table.\nA woman holding a container while milking a cow.\nA bird sits on the thin branches above colorful leaves.\nA herd of elephants walking along side of a river.\nA man wearing red is skiing down a hill.\nA red train with cars traveling with a mountain in the background.\nSome people in a very big area flying some kites.\ntwo guys play firsbe on a grass field\na red double-decker bus next to a bus stop.\na group of people that are posing for a picture\nA historic clock tower turret still keeps the time.\nA lounge with chairs, shelves, and a fireplace\nA living room couch with a display of large mirror and flowers.\nthere is a large truck that is carrying many things on it\nA window in a room with different shelves nearby.\nA woman buys a bunch of bananas from another woman.\nA salad and a partially eaten sandwich on a plate.\nA man looks at what he is currently holding in his hand.\nA white cat that has yellow eyes looking straight ahead.\nTwo zebras are battling each other on hind legs.\nthree surfers wearing we suits are riding the same wave\nA 
city sign that is underneath a stop light.\nA woman in a striped shirt in the kitchen next to the fridge.\nA man wearing skis standing in a victory pose.\nThere is a truck parked on the side of the road.\na man reading the label on a food package\nA person who is standing up holding a frisbee.\nA couple of kids standing next to each other.\na man that is standing in front of a stop sign\nA person reaching for a wii controller\nA chocolate cake sitting on a plate with ice cream\nA square white plate is holding a vegetable heavy entree.\nThere is elephants both young and old on this African bush land.\na baby and a bear play on a sofa\nA giraffe in the middle of the street blocking traffic.\npeople with a carmel at the beach playing\na plate with some pizza, salad, and some sauce on it\nA baby sleeping next to a brown teddy bear.\nModernistic couches and chairs surrounding a big-screen television.\nThere is a box that has a lot if wired inside of it\nTwo guys are sitting at table.  One is looking at a cell phone and a computer.\nA living room with blue seating and wooden tables and cabinets.\nA refrigerator that has its door closed and then opened.\nThere is a person sitting at the tablr\nA woman laying on top of a bed in red shoes.\nthree zebras standing next to each other looking into the camera\nA man laying on top of a sandy beach laying next to a surfboard.\nA snow boarder jumping off a ramp at night\nA small boy laying on the ground with a large stuffed animal.\nA BOY WITH A BLUE SHIRT AND JEAN PANTS DOING A TRICK WITH HIS SKATEBOARD\nA few cars are parked in a parking lot at night.\nAn apple is being cut into slices on a cutting board.\nA series of little weird cars in fron of an european arch.\nA picture of a very green plant and red flower.\nA large organ van is parked next to a smaller van.\nA bathroom has a diaper changing table in it.\nA woman standing on a blue mat with two broken tv's and a bat in her hands.\nA white bridled horse carrying blankets in 
the desert.\na bunch of bears that are in cases\nA person riding a wave on top of a surfboard.\nA male surfer on a white board in the water.\nA person lifts a slice of gooey pizza\nA large white polar bear walking through the snow.\na close up picture of President Obama\nA bear that is going towards some water.\nA beach with flags in the ground and kites overhead in the sky.\nMan sitting on the floor with a case full of pamphlets.\nA man is poised to hit a tennis ball.\nThe building has a large clock on the front.\nA group of people sitting around a table with glasses of wine.\nA sandwich on a plate and full wine glass are under blurry lights.\nA street lined with buildings and red double deck buses.\nA large white parked airplane and some trucks\nA white sink and toilet in a room.\nThree black bears on rocks on the side of the river.\nAn online game player playing while two other men look on.\nA bedroom with a desk, bed and entertainment center.\nA group of boys playing with kites in a field.\nA zebra eats grass with another zebra beside them and a third zebra nearby.\nThe man is riding his dirt bike on the street.\nA sandwhich in a deli tray, with a soda and a book sitting next to it.\na lady driving a wagon with red spoke wheels being pulled by a horse\na pedestrian traffic light with street name and pedestrian crossing signs\nA painting of a woman holding a Frisbee.\nThe table has meat and donuts sitting on it.\nA man walking on the sidewalk with a cart that is piled with a stack of luggage.\nA woman standing on top of a tennis court with a racquet.\nA couch and a table in a room.\nTwo giraffes lick a branch on a grassy field.\nA clock tower in an open space with decorative plaques under the clock.\nA dog is hiding half under a bed with its nose and rump sticking out.\nPeople posing with a white two door refrigerator\nA long train traveling across a road on train tracks.\nCars are driving past two tour buses on the road\nA calico cat lounges in a blue chair in 
a home.\nA kitchen sink with kitchen utensils in containers.\nA double-decker bus is parked in a large field.\nDessert on a white plate next to a silver fork.\nA woman reaches to hit an approaching tennis ball.\nGuy holding his mug why sitting in front of the computer\nan iron bed with a hand made quilt on it\nA stuffed pink teddy bear laying next to a doll in a dress.\nsome baseball players are playing baseball on a field\nTwo cows with big horns are on a dirt road.\na woman is taking her surf board out to the sea\na person on a skateboard in front of a car on the road\nA ferry docked at a ramp with people exiting.\nThe two snowboarders are relaxing at the bottom of the slope.\nTwo men sitting on the backs of horses in a field.\na city bus drives down a city street\nTwo towels arranged in a heart shape on a bed.\na boy is holding a tennis racket outside\nSeveral cows laying down in a hilly area near a body of water\nA traffic light flashes green against the backdrop of a city.\nA man with an umbrella and other pedestrians walk down a street.\nFour people brushing their teeth in a bathroom.\nPlayer preparing to return volley during major tennis match.\nA bathroom sink with a towel rack, soap bottle and an air freshener sitting next to it.\nA man is riding a skateboard over a ramp while wearing a helmet.\nA group of guys playing basketball on a city street\nMany different types of toppings on multiple pizzas.\na living room decorated in beautiful red, white and black oriental imagery with vases and scrolls\nA giraffe eating something out of a persons hand.\na bunch of giraffes are in a large pin\nA man holding a metal cup on top of a wooden table next to a window.\nA cat standing in a laundry hamper looking down.\nA motorcycle parked near a curb with a man on a bicycle riding by.\nThere are two men preparing their boards for a sport\nA little girl standing on top of a wooden chair.\nA man standing next to a pair of sheet while biting his clothes and holding a meat 
cleaver.\nA zebra walking across a dry grass field.\na close up of two people talking on cell phones\nA brown bear walking through an enclosure.\nA cheese pizza made with mac and cheese and flat bread.\nAn airplane monument placed beside of a road.\nA pile of ripe bananas sitting on top of a table under an umbrella.\nAn elephant and a bunch of cattle at a watering hole.\nA man on his phone in front of his laptop at a cafe\nA banana and some sliced cheese are on a cutting board.\nA tan muscle car sits outside a home on a gravel drive.\nSteamed white rice and a variety of dishes for lunch\nTwo people are riding on top of some elephants.\na man standing on a surfboard inside the water\na couple of men are playing tennis on a court\nTHERE IS A HORSE THAT IS EATING GRASS\nAdult baseball player preparing to throw ball from infield area.\ntwo men in the park playing with a frisbee\nA very tall building that has a clock.\nLarge elephant walking forward down a dirt road.\na person riding on a horse behind a fence\na close up of a young child eating something\nA kitchen with cabinets, wine glasses and a refrigerator in it.\nA small bird sits among a bunch of branches.\nTwo bags are full of fruits on the table.\nA person reaches to catch an incoming Frisbee.\nA cat curls up on a soft and comfortable bed.\na train is preparing to leave a train station\nA boat with equipment on it riding through a waterway.\nWarning signs outside a fence at a transit station\nThe young person is jumping over the back of a blue bench.\nA bunch of very cute fluffy sheep in some hay.\nA clock that is on the side of a building.\nA couple of men moving a large book shelf\nA man at a podium with another holding an umbrella over him.\nA smiling woman sitting on a motorcycle in front of a building.\nan overhead view of many people on motorcycles\nA cat on flora fabric with Obama on tv behind it\nan old car sitting on the side of the road\nThere is a close up picture of bread and eggs\nA red scooter is 
parked on the side of the road.\na bird on a beach with a ship in the back ground\nA toll booth next to a highway at night.\nAn owl sits in the grass with his eyes shut.\na person holding a skateboard riding an escalator\nA herd of sheep with two sherds moving down a road in the mountains.\nA white and black motorcycle sits in a parking spot.\nA flock of ducks swimming across a lake.\na woman clips her babies finger nails off\nA kite that is sitting up against a house\na big brown bear with two young cubs\nA peacock with very large feathers walking down a street.\nA watermelon pound cake with icing with a slice taken out.\nA cat is snuggled up in a black backpack sleeping.\nTwo motorcycles parked outside a building on a busy street.\nA girl throwing her frisbee so her dog can go catch it\na bunch of cars sit parked down a side walk\nA plate of food next to glasses and bottles of wine.\nA woman standing on a beach, holding a kite.\nA huge bathroom with a large window overlooks the ocean below.\nA man in a suit drives his car.\nA plate is piled high with a meat and broccoli entre.\nWine glasses and several items used in photography sit in a studio.\nA couple of people standing on a beach holding surfboards.\nA man para glides on the water near land.\nA bear sticks out its tongue while climbing.\nAn array of vegetables including tomatoes, turnips and others.\na room with drawers full of books and a screen\nPeople riding a sky lift watching others ski down the slopes.\nKids plastic tools and toys on a table.\nA large living room filled with art pieces\na bathroom with a black counter and a big mirror\na living room with a big black couch in the middle of it\na white horse with a white cover and some grass\nA woman wearing fishnet stockings sitting on a bed.\nA bunch of people are sitting together eating pizza and talking.\nA yellow finch perched on a white fence.\nA crossroad displaying the signs for Creek Road and Amethyst Street.\nA crowd of people are watching two 
teams of athletes perform.\nA woman in a living room playing a game system.\nThree people in the water, one of a surfboard\nTwo green freight trucks parked on the side of the road.\nA brown stuffed animal dog with a black collar sitting in front of the mirror.\nBowl of oranges on a wood surface with more oranges on the side.\nA shirtless man riding on a large motorcycle on the beach\nA tennis player is being watched by a crowd.\nThe man has his hand on a rack of small yellow objects.\nMan flying a kite from a roof top in an urban area.\nA man in a black dress jacket is talking on a cell phone.\nA very attractive young lady using her cell phone.\nA dog is sitting under a bench outside\nA little boy that is standing in front of a counter.\nA cameraman taking a photo of a skateboarder in action.\nThis black and white photo was taken by water.\nA man standing on top of a sandy beach near the ocean.\nA black bear sitting on a rock surface.\nA group of people surfing in some water.\nA red fire hydrant is leaking onto a side walk.\nA man dodging a frisbee flying at his face.\nThree giraffe standing next to each other at a zoo.\nA little girl standing in front of tall wooden doors next to a dog.\nA man with a tie and glasses is by a house.\nThere are people camping and flying kites in a field.\nA group of people standing on top of a dirt field.\nA boy smiles at his friend while his kite soars high.\nA baseball player that is standing in the dirt.\nA cat is lying on the hood of a black car.\nA couple of birds are flying over the beach\nTwo black cats are casually laying on a computer desk.\nAn assortment of food and four wine glasses.\na person is pulling apart a eggplant\nSeveral streamers float above people on a beach\na person standing on skis on a snow covered slope.\nA silver colored video monitor sitting on a gray table.\nA stop sign and a no u-turn sign.\nSkiers of all ages skiing down a slope and gathering at the bottom.\nA boy and two girls taste testing different 
vegetables\nA metallic refrigerator freezer sitting in a kitchen.\nA cat that is sitting on top of a speaker.\nOne of the giraffes is peering into the building.\nLarge dog laying down on a blanket next to a table.\nA crowd watches a batter in a baseball game.\nA man standing in front of a pile of food under an umbrella.\nA breakfast of bacon, waffles, and fried banana slices\nA crowd of people sitting in a room on to of a wooden floor.\nA white toilet sitting next to a white sink in a bathroom.\nAn umbrella and camera equipment sitting in the corner.\nGiraffe standing tall in open grassy field with fencing.\nA man on a surfboard on the waves surfing\nThe view of a busy urban area at night.\nA family stands at the top of a mountain while skiing.\nMan in a tiger suit in front of another man on the phone\nA table with a chicken sandwich and a cellphone.\nA view of a modern building with skylight and a fire hydrant.\nA baseball team in the dugout preparing to bat.\nMan in boxers on couch with two laptops\nThree limes are next to a small bushel of bananas.\nFour seagulls are standing in a line on a large logs in the middle of the sea.\nA pitcher partly covers another baseball player during a game sponsored by Comcast.\nThe man is skiing down the snow slop.\nA young boy eating a custard covered donut.\nA sandwich on a white plate on a table.\nThe foreheads of two zebras standing side by side.\nA girl on a bench outside a salon checks her phone.\nA long commuter train passing by a train station.\nA dog running in the snow with a Frisbee.\nFour horses and a man with a hat sitting on one.\nA plate filled with meat and different kinds of vegetables.\na living room area with a two-person couch and various living room furniture\nthere are many different donuts on a yellow plate\nA white plane getting ready to take off on a runway.\nTwo people in a small boat floating by some greenery.\nThree adult and one baby giraffe standing outside.\nA table topped with coffee cups and 
plates of food.\nA bathroom with a toilet, sink and bathtub.\nA man has laid out all of the items he plans to pack.\nA ski resort area with various skiers in the snow and several in line on an automatic transport belt.\nA bird on a table drinks from a tea cup.\nA man standing in a dry field, holding a Frisbee.\nA person on a cell phone on a street.\nA bird floating on top of water in the rain.\nA man in black jacket skiing down a hill with a kite.\nA green suitcase sitting on a wood floor.\nA bathroom area with a toilet, trashcan and tiled floor.\nA skateboarder coming up out of a dry pool.\nthere is a small lap top surrounded by other things\nA dog and man rest on the bottom of an overturned boat sitting on the bank of a body of water.\nA picture of a man wearing a suite and tie in a picture frame.\na bride and groom a purple table and a purple and white cake\nA group of people walk down the pa towards the beach\nTwo sailors are shown walking in a parking lot.\nThere is a dog wrapped up in a blanket.\nPeople ski at a ski lodge during a snowstore.\nCross country skiers in a competition with number 33 in front\nA white bathroom with a sink and mirror next to a shower.\na plate that has a table full of food\nA person in a purple jacket is on a snowboard on a snowy hillside.\nA surfer carries his board as he runs through the water.\na bath room with a toilet and a sink\nA young baseball player is getting ready to hit.\nTwo dogs plays together on the ground in the dirt.\nThe passenger train drives around the curve of the tracks.\nA man and a young woman walking down an alley way.\nA man swinging a baseball bat at a ball during a game.\nA man riding on the back of a parked motorcycle.\nA birthday cake with a number one and three candle.\na child sitting on a car eating a hot dog\nA large colored bird perched on a power line\nA man with a surfboard walking along a beach.\nA red double decker bus on street next to buildings.\nA plate of Mexican food with beans and 
tortillas.\nA gold and blue clock that is on a building.\na person laying on a bed while reading a book\nThe crowd of people are looking to fly their kites.\nA toilet stall that is white all around.\na large green and yellow train on a track\nCars are parked on the street next to an old fire hydrant.\nA close-up of a hawk with a group of people in the background.\nA newborn foal nursing his mother in a corral.\nDump truck alone on road with buildings and bare trees and shrubs behind it.\nTwo men in the Navy cut a cake shaped like an aircraft carrier.\nA woman riding a wave on top of a surfboard.\nthere is a plane flying very high in the sky\nA crowd of people walk along a sidewalk near a busy road.\nA person is standing at the edge of the water on a beach.\nAdults and children gather near a dock on the beach.\nThe ride attendant watches over the wave park.\nA residential street with large houses during sunset.\nA street scene with two men napping on a bench, a woman walking, and two other men looking at their own reflections in a shop window.\nA red stop sign targeted specifically at bicyclists.\nA large plate is adorned with broccoli and a rather small piece of meat.\nThe man on a bicycle is using a cell phone.\nAn Asian lady in a red dress petting a small elephant at a zoo.\nA small bird on a sandy beach near the water.\na close up of a plate of food with broccoli\nTwo people standing at a food truck placing an order.\nA red stop sign next to a street corner.\nA woman holding up a fairly large pizza.\nA plate of vegetable stir fry with sauce.\nA batter and catcher assume their stances as an umpire looks on.\nA dirt bike rider is racing through the dirt track.\nTwo giraffes standing next to one another and interlocking their necks.\na white bathtub in the center floor of a bathroom with a sitting chair and a window with drapes.\nA bus parked at a stop beside a small home\nthere is a man that is throwing a frisbee between his legs\nA man eating a piece of pizza at 
a table.\na person walking holding an open umbrella\nPeople and dogs sitting in a boat floating on water.\nTwo flowers are allowed to grow in a beer bottle.\nGirls reaching for the  basketball in a gym\nOcean fairing ship near land seen passing markers.\na couple of bowls with some food inside of it\nA man playing swinging at the ball during a tennis match in front of spectators.\nA couple of single beds with a phone and remote control by them.\na small child stands in a tennis court, about to serve a tennis ball\na girl in glasses is sitting at a laptop\nA wooden surface with three frosted doughnuts on the top.\nA tiny suit case full of girl's doll clothes\na city bus parked on the side of the road\na person jumping with their skateboard by some stairs\nAn image of a city skyline taken at night.\na blender with mixed fruit sitting in a container\nA shower stall set up with handrails and a seat.\nA slice of chocolate cake is on a small plate.\nTwo photos of a living room- one without a ceiling fan, one with the fan installed.\nYoung girl walking up steps to dog at pier area.\na cat that is standing on a red chair\nA man standing next to a truck parked on the side of a road.\na building with a large clock above an archway\nBirthday cake with a three candle and six other candles.\nA young precocious girl clutching her teddy bear.\nFive baseball bats on a silent auction table.\nTwo brown cows looking at the camera.\nTwo people laying on a green bunk beds\nA silver bin holding different kinds of vegetables.\nAn old, rusting, yellow fire hydrant n weeds.\nA stuffed animal is standing on a table\nA computer that is turned on with piles of paper to the side\nA person riding a board through the air.\nA dozen doughnuts sitting in a box and ready to eat.\nClose-up view of skateboarders lower body performing a trick on a high wall.\nA table with food and a drink on it\nSkiers make their way down the trail through some trees.\na bird sitting on a shore next to a lake.\nA 
motorcycle with a side car parked with other motorcycles.\nGlass enclosed shower with white tile walls,brown floor\nA man riding a board while hooked up to a parachute.\nPeople are standing in front of a castle type building with an eerie gray background.\nA small boy cutting out things from paper at a kitchen table.\nA dog is sleeping on the bed and having fun.\nA set of lights on a light blue motor vehicle.\nA man is standing in the water next to a boat.\nA small bathroom with a commode and sink, and empty corner.\nThe refrigerator, stove and microwave are on the same side of the kitchen.\na man on a skateboard performing a trick at a skate park\nA cat sitting on a motorcycle that is parked in a driveway.\nA dog leaps in the air to catch a Frisbee.\nA man in a grassy field throwing a Frisbee.\nStatute of a horse and rider on top of a block wall.\na black cat laying on a bed with a colorful blanket\nThree trucks with lawn mowers in the bed and people near by are parked side-by-side.\nA man brushing his teeth with a tooth brush.\nthere are two sandwiches that are on two white plates\nA stadium full of people are watching a baseball game.\nThe large bathroom has two beds in it.\nA group of people riding skis down a snow covered slope.\nA man doing a trick on a skateboard in a park.\nA man holds an umbrella and looks over a flowery hill to the sea beyond.\nA man talking on a phone while standing on a corner.\nA girl holding a tennis racket up with both hands\nA woman looks over her shoulder as she pauses while cross-country skiing.\nThe stands are full as a man in a blue and white uniform holds a bat in front of a catcher and umpire.\nA pug dog with a pirates hat licking a bottle.\nA living room area with eclectic furniture and accessories\nTwo birds sitting on top of a rear view mirror on a car\nA man surfing on a green surfboard in front of mountains.\nThe man on the grass is playing with his soccer ball.\na dog on a table on a porch\nA bathroom with yellow walls 
and a picture of  man over the toilet\nA vase filled with flowers sitting on top of a table.\na blender with a bunch of food inside of it\nA pizza cutter slicing up a food item on a cutting board.\nA wooden table topped with lots of veggies and greens.\nA horse-drawn carriage ride stopped at the gates of a European castle with three towers.\nA hummingbird hovers near a bird feeder.\nA man in his ski gear is in the air.\nA small bird perched on a metal bar next to a tree\nThey are selling a bunch of bananas at the fruit stand.\nA horse walks through the grass near sand.\nDinner plate with prepared steak, broccoli and sauteed mushrooms\nA man in a blue jersey swinging a bat on a baseball field.\nThree horse drawn carriages in front of a huge house with a clock on it.\nA view of a small plate of food with a orange.\nBananas and coconuts are sitting on an old fruit stand.\nGuy in a hat flies a kite on the beach while other people are in the ocean\nGroup of kids eating some food on a table\nA woman sitting down next to some bananas.\nMen hitting ball with round discs near brick building.\nA man that is standing up in a grass field and holding a kite that is over his head.\nFlat pizza like object sitting on table with a person taking a slice\nA photo taken within a sleeper car on the train looking at the window.\nA man poses in a double-breasted coat with a fur hat.\nDonuts in an open box on top of a table.\nA bathroom with a sink and toilet next to tile wall.\nA large commercial jet in the air with the landing gear down.\nThere is a woman sitting under her blankets\nA plastic hand reaching towards a plastic toy blender.\na woman is sitting with a red guitar and bananas\nA man is sitting on a chair holding a sign up\nthis is a man on a bike in the woods\nA pizza with basil, cheese and tomatoes displayed on a table.\nYoung girl posing at table with cake lit with candles.\nA cat is sitting on a cushion on a sofa.\nThere is a woman swinging at a tennis ball\nA man on a 
skateboard is going down a ramp.\nA jet airliner flying over a building with sky in background.\nFour zebras at the edge of a lake with a multitude of flamingos in front of them.\nThis bathroom has a toilet and a duvet.\nthere are two men standing and playing a video game\nA city block intersection with cars stopped on a corner.\nA large herd of sheep standing near each other\nA stuffed dog with a wizards hat on it's head.\nHerd of black cows grazing on a hillside.\nthere is a skateboarder doing a trick in the air\na short yellow school bus parked between two cars\nAn man taking a picture of a sink through a mirror.\na man is riding a skateboard in a bowl\nA giraffe with his head out of sight over a covering.\na toilet attached to a wall in a bath room\nHands putting motorcycle models onto a birthday cake.\nThat is using physical motions to play the video game.\na zebra standing alone in a pool of water\nA group of people sitting at tables with paper and laptops.\nA bunch of cows that are standing in the grass.\nLooking out from under a frayed sunshade at a beach and water view.\nA child brushing teeth in a blue sink.\nThree sheep eating grass near a water source.\nA boy is jumping off his skateboard a the top of a skateboard ramp.\nThe giraffes are bending their necks down to eat from the bush.\nA woman with a scarf and sunglasses standing next to an human size stuffed dog that has an outfit on.\nA silver mirror hangs above a sink in a bathroom.\nA shelf containing books, stationery, and a clock.\nTourists riding in a British double-decker bus that is making a stop.\nMan in yellow and black body suit on skateboard.\nChild wearing a red jacket skiing down a slope near the trees.\nA man in yellow shirt doing a trick on skateboard.\nA skateboarder doing tricks in a half pipe at a skate park.\nA lone woman stands posing in a large kitchen.\nA young girl in a chef's outfit cuts raw broccoli in a kitchen\nA man in a coat and tie and biker shorts carrying a 
backpack.\nSeveral graduates call friends and family on cell phones.\nA meal of french fries, salad, and meat is sitting on a table.\nBlack and white photograph of a skateboard with its rider leaping above it\nAn opened door to a bathroom with a counter and a tiled wall.\nA peeled banana sitting on a wooden fence.\npeople standing around in the snow with some snowboards\nA person gets ready to release a kite.\nA tennis player in an orange shirt and black shorts holds black tennis racket on a tennis court surrounded by onlookers.\nTwo people in ski gear standing at the top of a mountain.\nA tree filled with unripe apples in an apple orchard.\nA busy New York city street at night.\nthere is a pair if scissors leaning on a rock and paper\na couple of men are playing video games in a room\nAn old man standing next to a forest of trees.\nA group of men on skateboards on a ramp.\nA crowd is watching horses go down the street.\nA cluttered kitchen with white cabinets and tiled floor.\nA man is who is kiteboarding on the ocean is airborne.\nA red traffic light sitting on the corner of a street.\nA photograph of papers and a computer at a desk.\nA red and white sign reading \"Whoa\" and a red a white sign reading \"Caution children at play\".\nA baseball player holding a ball and a glove.\na black and white photo of a person in a suit and a person in a dress\nA yellow and green train traveling under signals.\nA reflection of a dog sticking its head out a car window\nBlack container sitting on top of a white toilet and a bathroom.\nA woman standing in front of a door with a broken surf board next to her.\nTennis player about to hit a ball in front of an ad.\nA man reaching his arm to catch a frisbee.\nthere is a green bike parked by a red bus\nPeople lined up on a sidewalk near a bus.\nA bathroom featuring toilet paper hung from a chain.\na street sign next to a tree lined street.\nA soldier wearing an Army uniform rides a regulation motorcycle.\nTheir is a little kid using 
a phone\nA man walking a brown horse wearing a red blanket.\nTwo giraffes inside a building near a beam.\nA cat sits on a wooden park bench.\nA orange tabby next to some black birds\nA horse drawn carriage going down a city street.\nPeople are flying kites on a beach near the boardwalk.\nA scooter with a helmet hanging off it's handlebars.\na person on a skate board does a trick\nA man with a catcher's mitt reaches out to catch a baseball.\nA child in a living room is swinging a bat.\nTwo small children are laying in a bed under blankets.\na tennis player hitting a serve on a court\nA plate with meat, onions, gravy, broccoli and cheese.\nRefrigerator and freezer are filled with soft drinks and beer.\nA giraffe standing next to some tall building\nA little boy holds a small dog while he sits on a bench\nA child in a giraffe costume and a child in shorts cooking in a kitchen on chairs.\nA shelf filled with organic mango peach juice, bananas, oranges and eggs.\nA dog celebrating its birthday with a cake.\na couple people on the beach flying a kite.\na bowl of fruit in black and white.\nSome traffic signs in front of a church.\nA small brown monkey sitting down while holding a banana.\nA woman cutting a cake with a knife.\nA young boy holding a toothbrush and toothpaste getting ready to brush his teeth.\nA kitchen with a black automatic dishwasher next to a  doorway.\nthere is a farmer market with lots of fruits\nThe white cat is sitting underneath an umbrella\na young boy standing in a living room holding a wii controller\na train on a track near many trees with a sky background\nA van is pulled up to a boat docking area while a cow stands alongside the signs.\nA person sits with their feet up with a boxed pizza.\nA boy sitting down with a shoe in his hand.\nA silver train traveling down train tacks near other trains.\nA demonic looking life like doll sitting on a bed next to pile of human skulls.\nTwo teddy bears, one a police officer bear sitting in the lap of the 
other, a white bear, both of them sitting on a wooden chair.\nA wide view of the patrons of a large library.\nA picture of a trolley that is on some train tracks.\nTwo male chefs cooking in a kitchen while another staff member uses a mobile phone.\na man riding a motorcycle down a city street with luggage and a sleeping bag attached\nA cat that has curled up in a bowl.\nFOUR SHEEP IN AN ENCLOSURE WITH SNOW AROUND THEM\nBasil, cheese, tomatoes and bread on a plate.\nThe elephant is an extremely large animal.It has a bug tusk.\nA couple of men that are standing near luggage.\nA man laying on a blue couch in a living room under  mirror.\nA glass vase of yellow daffodils sits on a checkered table cloth.\nA kitten is eating cat food from its dish.\nA man is holding a tennis racquet and hitting the ball.\nAn open refrigerator door with very little contents.\nTwo small brown sheep in a fenced in pen\nA man rides a skate ramp on his skateboard.\nA sink in the bathroom next to an open toilet.\nA post with several street signs on it, including the name.\nA teddy bear and another stuffed animal next to bookshelves.\nThe man who uses this bathroom shaved this morning\nA man wearing sunglasses talking on a cell phone.\na man is standing and holding a controller\nA man looks somewhat blurry on bike as others look on.\na person riding a race bike doing a trick\nFour zebras drinking water in a sandy field.\nA long desk area with a desktop computer at one end and a laptop computer and Wii video game system on the other end.\nAn old Gothic style church with a clock in the tower.\nA rhododendron bush is in full bloom beside a park bench.\nA lot of colorful umbrellas lay out on the grass.\nA young man sitting on a couch using a laptop computer.\na desk with many laptops a monitor and a mouse\nA woman standing in front of a cabin in the snow.\na bed with two tables a purse and books stacked in front of\nThere are two red and white street signs that show directions\nA white and green 
bus on road next to a car.\nThe woman runs to hit the tennis ball coming towards her.\nFive delivery bicycles are parked aligned along the wall.\nA herd of horses in a grassy field near a hill top.\nAn elephant statue sitting in front of a clock.\nA food combo has noodles, cabbage, eggs and meat.\nA train sits on tracks near power lines  and a street sign.\nA group of people at a long table eating dinner together.\nA couple of people playing a video game with remote controllers.\nA boy is putting peanut butter on a sandwich\na person in a living room with a emote control\nA blue and white plate with ham and vegetables on it.\nThe kitchen counter has a cutting board with chopped vegetables on it.\nA bus driving through traffic in a city with skyscrapers.\na bathroom with shower, toilet, and sink with shelves\nA bunch of men standing in a building and one of them is on a cell phone.\nA train in the middle of tracks with people.\nA group of snowboarders riding in the white snow\nA skier makes a jump on a very steep hill.\nBaseball memorabilia is displayed in glass stacked casings.\nA teddy bear is sitting on the rail of a wire fence.\nA cable car in front of a tall building.\nA little girl sitting at a table with lots of food.\nA dog sits and stares at the TV.\nA person that is playing in a tennis game.\nA young women in wet suits carrying surfboards.\nA man riding a snowboard down a snow covered slope.\nThe skateboarder is about to perform a trick at the cones.\nThree giraffes in an outdoor setting with one giraffe drooling.\nA cat sitting on a couch , with a shirt covering it.\nIguana eating fruit in fruit stand not intended for him.\nThese young grey hours are playing Frisbee with their owner\nA man holding a container of two hotdogs.\na messy kitchen counter and sink covered with dirty bowls and other cooking ingredients\nA compact kitchen with white appliances and shelving units for storage.\nA pile of luggage at a transportation hub.\nHerd of happy zebras in a 
field of grass\nA lot of flowers that are by a walk way.\nA man and a woman eating lunch at a restaurant.\nA glass vase with flowers resting on a grave.\nA quiet highway with a street sign up ahead.\nThe room in the house needs to be picked up.\nTwo people with their arms wrapped around each other sitting on a bench.\na purple mug is next to a bowl\nThere are some men playing a game of baseball.\nTwo giraffes stand in their enclosure at the zoo.\nA large clock on a pole on a street.\nA red fishing boat floating on the water.\nA group of three men riding in the snow.\nA woman having fun with a baby elephant\nA woman showing her hot dog to the camera.\nThere is a cat walking along the edge of a sink\nA large herd of cattle is in a field.\nTwo men skiing across a snow covered slope.\na woman leaning on a counter poses for a picture\ntwo zebra standing next to each other while one kisses the other in forest field.\nthere is a bench under a very large tree\nA lady puts a frisbee in a frisbee goal.\nA small blue car that has been hit by a  city bus\nA grouping of bananas and other fruits against a wall.\na group of men play a game of frisbee in a park\nA group of people in a park flying kites.\nA semi truck parked at a rest stop.\na close up of uncooked pizza on a surface\nA variety of kitchen utensils hanging from a peg board.\na cat is sitting in front of a television\na sink with soap a towel rack and a towel\nAn open marina with boats on both sides\nA hanging traffic light at an intersection with another traffic light visible in the distance.\na bedroom with a big bed, and a lamp.\nA toilet and sink side by side in a bathroom and a mirror.\nA train parked inside of a train station next to a loading platform.\nA linden tree overlooks a park bench on the banks of a lake.\nA blue counter top with lots of pairs of scissors on them.\nThe cat is playing with the shoes on the floor.\nThe bulldog has a mean look and is protecting his home.\nA couple of street signs hanging 
from the side of a pole.\nA very nice motorcycle in a drive way.\na baseball player is swinging his bat at a ball\nTwo skiers race while a crowd looks on.\nsome people a stool a counter some lights and bottles\nAn intersection with a crosswalk and street lights.\nA red frisbee stuck in a tree at a park.\nA black bear is surrounded by black birds on grass.\nA person with a lighter lighting candles on a cake.\nA man looks a donut hanging from a string.\nA women sits in bed with her white dog and she is looking at her cat.\na cow stands in front of tall stacks of hay on a grassy field\nA bunch of shirtless dudes walk down a road\na bathtub with bed behind it and big window.\nSomeone looking out their window at vehicles on the street.\nA living area with a  coffee table with food on it.\nA kid and an adult are flying a kite.\na small brown and white bird sitting on a branch\ncouple sitting with a dog wearing a cowboy hat\nFour young men sitting on a bench with four skateboards.\nA woman and two men on the beach with surfboards.\nYoung lady with her legs in the air laying on a bed in a room.\nA person sitting in a chair watching a computer screen while playing a guitar.\nKitchen with wooden cabinets and a center island.\nSeveral boats are docked along the side of a river.\nA brown and white dog laying on a floor.\nA Safeway truck that carries merchandise for the stores.\nTwo zebras are standing in the shade of a building\nThe people are posing for a photo out of an airplane.\nA colorful bird sitting on a branch full of leaves.\nA bathroom with a toilet next to a sink.\nA boat that is sitting in the water with a sail.\nA man riding a blue motorcycle on the road.\nA bedroom with a large, unmade be, a ceiling fan and other bedroom items\na giraffe in its pen and two people are feeding it\nA woman is standing on a tennis court and holding a racket.\nThe glass bowl holds a broccoli noodle dish.\nA red fire hydrant surrounded by yellow flowers and grass.\nA person with a ring 
smiling holding a object.\nA man and woman playing tennis on an asphalt court.\nA red fancy bus is parked by a standing man.\nFour umbrellas lying down a beach during the day.\na white bathroom with a urinal and two framed pictures of clowns\nA lady is playing doubles tennis with a man.\nGroup of zebras standing on a dirt field together.\nA salad that contains broccoli and oranges in a blue bowl.\nA small dog sitting on the back of a cow.\nAn animal is covering up the keyboard with it's long tail.\nA man skiing with a dog close to him\nA colorful plate of vegetables, fruit and beans\na grey cat sitting on top of a couple of plants\nA man making a vase on a pottery wheel.\nblue and white working truck sitting on the street\nA black and white image of a baseball game.\nA cat laying on a pink couch with a large brown hat on\nA man standing on skis next to a sign.\nA cat in a room with an assortment of luggage.\nA pretty young woman sitting at a desk working on a desktop computer.\nPerson on the beach flying a black and red kite.\nA woman sitting at a restaurant getting ready to eat her food.\nThe man and the dog walk near tall stacks of plastic chairs.\nCarrots and dressing on a plate with some yogurt.\nA giraffe in a pen looks down towards the ground.\nA plane sitting on a runway in the middle of the day.\nThree mountain goats on a rock with grass around it.\nA man wearing jeans sitting on a parked motorcycle.\nThis is an image of three children with play phones.\nA person riding on the back of a white horse.\na cat sitting between a window and security bar\nA bunch of different types of doughnuts together.\nA couple of air planes flying through a blue sky.\nA train's bathroom with a sink and a toilet.\nA large group of skiers waiting in a formation.\na coffee maker is sitting on a marble counter top\nA young man tossing a Frisbee in a  park.\na fridge sits in a kitchen next to a door\nThe bathroom has a sink, toilet, and a shower.\na bench in a field  looking at 
snowcapped mountains.\nTwo roosters walking next to a fence, near a fire hydrant.\nLittle children on a field playing soccer in a park.\na woman on a train holds up her camera to take a picture of something outside the window\nThree boats in the green and blue water.\nClose up of a street sign in front of a water tower.\nWoman in a jersey standing next to a large elephant.\nA man hosing a dog off while talking on the phone.\nA red stop sign sitting next to a street sign.\nA yellow vespa parked in a lot with other cars.\nA meal from Japan or China on a tray.\nA man standing near the ocean with his surf board\nA chair and a clock attached to the side of a building.\nA person on a blue snowboard going sledding between trees\nSnow boarder riding during the night over a fence.\nA train moving through the station with a man on the bench.\nA little boy is standing on a refrigerator shelf.\nA woman is sitting at her jewelry display and talking on the phone.\nFresh flowers and produce sitting on a counter top.\nA group of people sitting at different tables.\nA brown teddy bear holding three pizza boxes.\nThe Big Ben clock tower towering over the city of London\nA woman on a court with a tennis racket.\nA group of sheep gathered together standing next to a donkey .\nA young person on skis flying high through the air.\nA large bed sitting inside of a bedroom next to a  lamp.\nA keyboard, computer screen and mouse are on a table.\nA teddy bear in a chair dressed in clothes\na cat that is standing in front of a person\nA living room with a Christmas tree couchs and a black dog.\nA couple of cops riding on the back of motorcycles.\nA woman in a skirt is side saddling on a horse.\nlots of snow on the ground and the ocean is ahead.\nSmall boy smiling with his head tilted to the side.\nA heard of animals in a field approaching the water.\nA large ship is on the water near docked small boats.\nA decorated Chinese vase on a side board.\nA HERD OF GIRAFFES STANDING AND LYING UNDER THE 
TREES\nMany birds gather in the middle of buildings\nA dog is approaching a statue of a white bull.\nA baby wears sunglasses and plays with a pink suitcase.\nA group of people with drinks watching a game be played.\nA frisbee barely hits the surface at a lake\nA man and little girl are sitting on a bench in front of an airplane.\nA wine glass set on a counter of a kitchen area with a reflection of the kitchen in the wine glass.\nAn airplane outside of buildings near people sitting in chairs.\nA dimly lit remote control and image on screen\nAn adult bear and three babies cross a road\nA small bathroom stall has a maroon toilet rug.\nA man in a burgundy shirt playing Wii bowling.\nA dog is looking out the window of a car.\nTrainer shows man his elephant in tropical setting.\nA person riding a moto bike in the mud.\nA gas stove in a small simple kitchen.\nA person holding some some of electronic device.\nA cat that is sitting on a dogs back.\nTwo cats cuddle on the chair in the living room\nThe desert cake is frosted in two shades of pin, and topped with fancy frosting flowers.\na laptop besides an alarm clock maroon in color\nA bedroom with two beds sitting under four framed pictures.\nA dog lowers its head to the ground\nA person holding a snow board in the mountains.\nA couple of cows standing next to a building.\nA group of zebras on a grassy plain.\ntwo dogs laying beside each other on a couch\nA man is getting a haircut while another man sits.\nGroup of three players in a baseball game.\nA group of jet perform in the sky.\nA man flying through the air on top of a skateboard.\nThis shack has a small table to the left, a stove in the back, and a counter top on the right.\na man sits against a wall with punk accessories\nA man in a safety vest standing next to water hoses\nA display case in a bakery filled with lots of dessert.\nA tall building with a clock embedded at its top.\nthis paper plat has the word cat and a cat drawn on it\nA woman has an apron and head 
scarf while touching carrots at a produce market.\nA mother and baby zebra standing in their enclosure.\nA person on the water flying a kite.\nA person with their feet on a desk with a plate of pizza and a can of soda.\nA fork holding a pink food item on an upside down plate.\nA bed is in the middle of a well lit room\nA woman in glasses is sitting on a butterfly bench.\nA guy and a boy on a motorcycle with a side car.\nA man is looking down at a small cake.\nA man on a pink and blue bicycle on a crosswalk in a city.\nA bird sits on a car's rear view mirror.\nA close up image of a giraffes face while eating.\nthere is a small baby that is holding a small racket\nA pile of debris in front of a purple and red building.\na few ladies are playing tennis at school\nTwo cats are crouched in the refrigerator, among food.\nA vintage photo taken of a street sign on a dusty road.\nThis piece of paper has three hot dogs on it.\nA group of people at a wine tasting with a variety of wines.\ntwo bears touching noses standing on rocks\nThree people sit under umbrellas at the beach.\na number of sadnwiches and wine on a cloth near a body of water\nTable of sampled chocolate cake and ice cream on a table.\nA person walking over to a black and yellow kite in the park.\nA pan with a crust filled with raw broccoli, carrots and cheese.\nA table with a bunch of kids tools sitting on it and other items.\nHotdogs cooking on a commercial grill with condiments nearby\nBottles of infused oil and a glass vase full of glass flowers\nA large white cruise ship sitting in a harbor.\na stop sign and a pole in a dark knight\nA left hand holding a partially eaten, pink, iced donut\nA red fire hydrant in front of a building.\nA woman in a bikini standing next to a man on the beach.\nThe action during baseball as the pitcher throws\nSeveral baby and parent giraffes sitting around a cut down tree.\nA black and white photo of two female skiers in a mountainous landscape.\nA woman standing in front of a 
table of baked goods.\nA man is cutting an onion on the cutting board\nA man filming a women holding a microphone on a street corner.\nA man with a red hat, tie and white shirt\nA baby laying on a colorful quilt with a bib around his neck and a string in his mouth.\nA living room opens up into a kitchen.\nA large herd of sheep are grazing in a field.\nA young boy posing in a baseball uniform.\nThere is a woman standing n a field around kites\na batter holding a bat waiting for a ball to come\nSnowboarder performing trick on snow with trees in background\nA man winding up with a frisbee on a court.\nA women in military uniform who is giving a cow a shot.\nA lady is touching her lip, holding her purse, on the bench.\nA young boy eating breakfast in bed.\nThe space shuttle ridding \"piggy-back\" on a NASA 747 airplane.\nWoman in a white shirt laying in bed looking at a laptop.\na stop sign that has some signs on top\nA herd of sheep make their way down a rural path.\nA man plowing with oxen on a dirt road.\nDuck leaning forward towards a body of water from a concrete footing.\nA fire hydrant is covered with graffiti and spray paint as it stands in front of colorful building in the background.\nA large white building on the corner of a street .\nA view of a woman sitting on a chair with a guitar blocking her face.\nA stop sign has been amended with \"driving\" bumper sticker.\na church with a tower with clocks on the top of it\nA dessert is sitting on a small dessert plate.\nTable set with black and white dishes with a scissors and dotted line motif.\nPicture of a plate of food and a drink.\nThere is a lot of traffic outside because of the fire truck\nA picture of an airplane that is sitting at a terminal\nA street sign on top of a stop sign outdoors.\nA living room with a glass coffee table, couch and television.\nThe clock has many different measurements on it.\nA kitchen with three tall bar stools next to an island.\na bathroom with two toilets and a bunch of toilet 
paper\na few people that are walking down the street with some umbrellas\nThree birds flying high in the overcast sky\nThe back of a woman's head in church.\nA fancy bathroom with a stand up shower.\nA horse pokes his head over the metal railing.\nAn empty bathroom with white tile and a large mirror\nA guy lying in bed with a bag of munchies and holding a game controller in his hand.\nTwo girl holding tennis rackets on the court.\nA man in a suit a and tie with a umbrella.\nA white fire hydrant sitting on a street corner with a face painted on it.\nA man is sitting on a one wheeled bicycle next to a smoothie.\nA herd of cattle standing on top of a grass field.\nA group of people that are sitting in the grass.\nA group of people sitting around a table with food.\nA newly remodeled kitchen with stainless steel appliances.\na close up of an elephant walking on a dirt ground\nA couple is walking by a store front windo\nAn older man takes a pizza out of the oven.\nA bunch of pans that are hanging on the wall.\na man is riding a board at the beach\nA man wearing a wetsuit in the water on a surfboard.\nTwo elephants are locking trunks with each other.\nA reflection of a kitchen microwave and cabinets.\nA bathroom is adorned with a quilt pattern-inspired floor and walls.\nA person in a hooded jacket is near a transit bus.\nA person is holding an umbrella in a snowstorm.\na person wearing gloves kneeled down in front of a toilet\na wash room with toilet and wash basin are seen.\nA long white paddle boat with people riding on top of it.\nA somewhat dark image of a laptop sitting in the background of a bedroom.\nTwo men at the beach one of which is holding a surfboard and a para sail.\na ferry boat and a jet flying over head\nA computer and a laptop sitting next to one another.\nAn aircraft is releasing a red substance below them.\nA box of pizza that is opened has tomatoes, cheese, and spinach on top.\nA luscious desert tray to satisfy all tastes.\nA train on the tracks is 
parked while people board.\nwoman in long, light red dress with orange umbrella.\nA row of motorcycles posed on a floor next to a flag.\nA white bird with it's wing extended floating in the air.\nA group of people sitting around a living room together.\nA picture of a man and woman on the screen of a lap top computer.\nA worker in front of a kiln holding a vase.\nAn older man drinking white wine from a glass.\nClock tower and official buildings on the other side of the river.\nA white cup holding a tooth brush on top of an orange table.\nA lone polar bear walking across a frozen landscape.\nA coin meter next to a trash can on the sidewalk.\nmany people riding horse drawn carriages with umbrellas\nA group of surfboards and people at a beach festival.\nA bunch of birds in the air flying with kites.\na basket is behind a brown bicycle seat\nA very small plant is inside a cup.\nA busy street is blocked by a crane truck while a construction worker walks by.\nA family sits on the gravel of a beach flying a kite.\nTwo people enjoying a picnic by a river.\nThree zebra standing next to each other on a lush green field.\nA dark room lit only by one lamp and a computer screen\nGuy riding his  gold motorcycle giving a signal.\na young girl standing above a teddy bear taped to a chair\nA Pacific National train is stopped at the station\nA black and white dog laying on top of a pink and black frisbee.\na close up of a plate of food on a table\nYoung boy knocking over his t-ball stand in the backyard\nA sparse room with a bed sitting in the corner.\na yellow fire hydrant standing behind the tall grass\nSomeone is trying to eat a slice of vegetable pizza with a knife and fork.\nA professional motorcycle rider leaning into a curve.\nA group of men on a field playing baseball.\ntwo shots of a man climbing stairs, then jumping down them with a skateboard\na big plane flies through the blue sky\nThree people sit around a table eating a meal.\nA herd of four zebras in an open field.\nA 
desk with a computer, printer and other various items.\nA small blue and white gazebo sitting underneath a lush green tree.\nA pink double decker bus driving down a street.\nA dog is shown in a car rear view mirror.\nA skateboarder is using a ramp to jump into the air.\nA jockey with his horse and dog standing in a field.\nA metal bowl filled with oranges and tomatoes.\nTraditional narrow boats on a river with fruit and people.\nA bathroom with two sinks and a large mirror.\nthere are two men playing Frisbee one is jumping in the air to get it\nA batter prepares to hit a ball in a professional baseball game.\nA large black train on a track with steam coming out.\nThree people holding wine glasses in a bar.\nA man is jumping and doing a skateboard trick.\nThey have a variety of pizzas to choose from.\nA street with people in cars and bikes is shown.\na bus driving down a street with people seated on the roof of the bus.\na vintage photo of some people getting ready to cross a street\na dog laying on a bed with a stuffed animal\nPeople are shopping at a farmers market on the street.\nChildren standing in the grass on a field.\nA cat that is laying on the back of a chair and sleeping.\nA white sink and towels in a room.\nA city bus is slowly making its way down a very crowded street.\nA sport team is posing in a park.\nA man and woman are playing doubles in a tennis match.\nPeople laying in the sun on the beach on a sunny day\nTwo hot dogs covered in toppings on a blue tray.\nA person on a skateboard does an air trick.\nA picture of a bunkbed that is very clean.\nA lady in a bath robe touching something near the ceiling.\nA couple of boats parked on top of a beach.\nA group of sheep eating grass on a very sunny day.\nA man holds a large hot dog and hamburger\nA couple of people standing in a room with remotes.\nA large group of people sitting on the ground.\nA bed that has been made in a small room.\nA woman taking a swing at a tennis ball\nA silver microwave oven 
sits near a wooden cabinet that has a silver handle.\nA seagull is standing on a ledge and one is flying across a river that is flowing.\na woman stands in a bathroom blow drying her hair\nA group of boats on a body of water with clock tower in the background.\nA group of young women standing around in a half circle holding tennis racquet.\nA bird is chirping out of its nest.\nA steer and a baby brown cow staring into the camera.\na bell tower with a clock face on it\nOne woman leaps to hit a tennis ball while her teammate guards the net\nA bus driving down a street next to buildings.\nA television screen that has a video on it.\nA group of people standing and sitting on the sidewalk, watching a parade with horses.\nA really nice hotel room with a gorgeous view.\nA herd of zebra standing below a tall hillside.\nTwo ladies are sitting on their laptops at the table and one of them is on their phone.\nA photograph of a train traveling down some tracks.\nItems are laying on a long table in a narrow kitchen.\nA desert that has some Oreo cookies crumbled on top of it.\nA boat with a man fishing on it on a lake.\nA oneworld passenger plane taking off from an airport.\nA group of people running and being sprayed by a fire hydrant.\nA couple of buses parked in front of a two story home.\nA woman is taking a picture of herself in a mirror.\nA young man with acne holds up his necktie.\nTwo one way signs are on the same pole as a stop light.\nA woman sitting next to an older man holding a Nintendo Wii game controller.\na vase and flowers are sitting on a table\nA vase full of some yellow flowers sets on top of a counter.\nA large Banana tree on an island near the beach.\nA train traveling through a jungle next to a  bridge.\nA row of seats have closed off a stairwell.\nThe view of a large kitchen with a breakfast bar and stools.\nA man flying through the air while riding a skateboard.\nA little blonde girl standing in front of a fridge.\nA cow and calf sitting on the 
ground.\nthree people standing wearing umbrella hats near one another\nA large airplane flying through a gray cloudy sky.\nA picture of a building and some grass.\nA pizza cut into 8 pieces on a pizza pan.\nTwo elephants walking in the dirt near water.\nAdult giraffe with offspring in structured zoo enclosure.\nA card showing the right position to ride a horse.\nA donut with white and brown swirled frosting.\nA man is doing a trick on a skateboard.\na tray holding three plates of food including vegetables and fruit\nTwo cameras on a pole near a stoplight.\nA train is on the tracks in a country area.\nA city street with a fire truck, school bus and taxis.\na kitchen with a double sink a refrigerator and a counter top\nA boy cutting a piece of paper at a table.\na man doing a trick with his skate board\nA brightly colored train and a santa clause.\nAN ADULT BEAR IS STANDING IN THE FIELD\nThe view from the inside of a large clock tower with several people and bells inside.\na modern looking bathroom with solid wood paneling\nA bus driving down the road with several other cars.\nA messy bed with many books on top of it\nSkier on top of a mountain admiring view as sun rises.\nA group of people in the woods holding up clocks.\nTwo women and a pink umbrella riding a bicycle down the street.\nA boy is waiting by a train and train tracks\nA front view of a street stop sign.\nA group of skiers trekking a mountain in snow\nA man sleeping under a book bag on a floor.\nthere are many bike riders racing in a street race\nAn infant sitting on a table with a pink cake and pink decorations\na green plate of food with a fork.\na man is using a banana as a smiling mouth\nA dog looking around while standing in a window.\nA hotdog with mustard put on it by a mustard bottle hanging upside down.\nThere are many different vegetables grouped together here.\nUmpire makes a signal during a baseball game.\nA kitchen with all white cupboards and appliances.\nA person in a yellow shirt is 
standing on a long holding a water ski.\nA parking meter on the side of the road is covered in snow.\nA couple of men wearing uniforms playing a game of baseball.\nA white horse and a black horse standing in a field eating grass\nYoung boy gets ready to kick a ball.\nAn old suitcase with several worn stickers on it.\na man with a beard  is holding some food and some people walking\nA brick oven with pizza baking inside next to fire.\nTeenagers siting on crates are gathered around a small campfire.\nA piece of newspaper holding bananas with drawings on them.\na large bacon, spinach and cheese pizza with a large crust\nMale and female rams climb search for food on the side of a snowy hill.\na tall clock tower with a sky background\nA clock that is hanging on a wall above a window.\nSkiers skiing down a snow covered ski slope.\na white plate with some broccoli and some noodles\na man playing with his kids with a kite\nA bright orange and yellow engine pulls this train.\nA large blue and white airplane on the ground.\nA cupcake with frosting  and a star on top\nTwo young man playing soccer together on the field.\na large herd of horses standing in a field eating the grass\na man on a skate board grinds on a ramp\nA smiling blond haired little girl is hugging a teddy bear.\nA diamond shaped sign is sitting in the middle of the street as cars are riding on the side.\nThree giraffes eating in a heavily shrubbed area.\nA skier in a red jacket walking along a snowy forest.\nThe surfboard is painted in grey and pink splatters.\nA skier in the air over a jump.\nA couch on a trailer hooked to a bicycle.\nA small bed next do a daybed and coffee table.\nA meal sits on a table next to the ocean.\nThree young women hanging out on a bed.\nPeople standing near luggage placed on the floor.\nAn Equestrian jumping their horse over a white jump.\nA tusked elephant is walking among the greenery.\nA giraffes head peaking over bushes and trees.\nA team of ultimate frisbee players jump for 
the frisbee.\nA man with beard and tie on a subway car.\nMale skateboarder displaying leaping ability over steps with handrails.\nfemale surfer walking carrying surfboard on her side\nOne slice of pizza let with toppings on a pan.\nA man and woman posing for a picture.\nMany people are gathered to shop and eat.\nYoung girl having a meal in outdoor setting.\nA man with blue jersey holding a baseball bat.\nA mouse that is sitting next to a keyboard.\nA man an woman are sitting under an umbrella on a park bench.\nA mauve colored toilet bowl on the sidewalk\nA girl sits on top of a bouncy house texting on a phone.\nA man taking a turkey out of the oven\na locker with some books and school supplies in it\nA room with a couch, bookcase, and flat-screen television\na toothbrush is laying on a white sink\nA small child sits on the floor and watches tv.\nTwo young children are playing a video game.\nTwo dogs looking at some fenced in white cows.\nA cat sits looking out of a window.\nAn airplane with people under the wings at a field.\nA baseball player signing a baseball bat for a fan.\nFive old fashioned looking airplanes in formation in the sky.\nA pizza is shown on a plate with a serving knife.\nAn elephant in a fenced off area under a shaded tent.\nA brown purse is sitting on a green bench.\na desk with a laptop and a monitor sitting next to it\nA salad that has a white dressing on it.\nA bayside cafe with piers and boats in the water\nTwo kids that are standing in a living room.\nA man walks around with two sheep on leashes.\na man sitting alone on a black bench\na couple of small figures of a man and a horse\nAh, look at these sumptuous desserts under glass.\nA woman that is sitting in front of a cake.\nA living room with expensive furniture and a large window.\nA woman in a bikini showing a type of food\nA man holding a baseball bat in front of a catcher and umpire.\na small cat and small dog looking in the opposite direction.\nA slice of pizza sits on top of a 
plate.\nThe back of a semi truck on the freeway.\nA group of people that are standing in the snow.\nSome people are sitting and playing Wii in a family room.\nA small dog sitting on the ground at some ones feet.\nA red city bus parked on the street\nA woman standing near a large green pillar with a clock on it\nTwo red street lights that are on a wire.\nTwo plates filled with lots of hot dogs on buns.\nA guy's hat falls off as he plays tennis\nA women on ski's going through the air .\nA group of  people gathered around a laptop computer.\nAn airplane that has just taken off into the sky.\na bathtub with a small shelf above it\nGirl standing with a Wii controller in her hand\nA space shuttle is parked in a museum while visitors look around.\nA mother carries a dish to the sink, and a young man carries a beer bottle toward a counter, as a young girl looks on.\nOne kite flying and two stuck in a tree.\nA series of photographs about dinner at a skyscraper restaurant\nA woman holding a baby and sitting next to a dog.\nA tennis player's feet and shadow on a court made of clay.\nA red velvet cake next to an alcoholic drink.\nA couple of giraffe standing next to each other.\na building with art work and a sidewlak with afire hydrant on it\na cake that is less then half on a plate\nThis living room is large and has a glass sliding door\nTwo zebras and a giraffe in a dirt and rock covered area in front of a muddy pond.\nMen on a horseback at a polo competition.\nThe vegetables are sitting in the white bowl.\nBear behind fence of enclosure as official inspects him.\nA horse grazes on grass in the shadow of a mountain.\nGuy patiently waits on his surfboard for the best wave\nAdults shopping in produce section of grocery market.\nA young man skating boarding on a half-pipe.\nA small metal bowl holding an orange flower on purple sheet.\nA dingy with some pigeons on it in the water\nA sign on the side of a building on a street.\nA happy little girl lies in bed with a stuffed 
bunny.\nThis man is holding a breadstick and a bun.\nA Juicerator sits on a counter and dispenses a yellow juice.\nA crush soda on a white back ground with orange halves.\na close up of a motorcycle license plate\nA close-up of a table with three boxes of pizza.\nA cow and a bull walking down a skinny alley.\nTwo orange and silver trains passing on a street.\nFive chocolate donuts and three unfrosted ones and a Canadian penny sits on a blue pokadot cloth.\nA pole with multiple traffic signs near trees and bushes.\nGroup of men in white shirts and white hats holding tennis balls and tennis rackets.\nA bride and groom on their cell phones.\nA pizza sitting on top of a wooden table.\nA couple of girls sitting in a bed in a bedroom.\na brown and white owl and some green bushes\nAn upscale bathroom sunken tub with chandelier above.\nA man smiles as he holds a baseball bat in an historic photo.\nTwo zebra walking past a grassy forest in the daytime.\nTwo chairs and a glass table sitting in the middle of a well put together room.\nA room that is divided by pillars has two overstuffed chairs, coffee table, piano, a table with flowers in a vase.\nA toilet is in a small room with windows.\nThree elephants are on a dirt road.\nA white bowl filled with soup sitting on top of a counter.\nA woman holding a skateboard on the sidewalk.\nA mouse and a computer sit on top of a wooden desk.\nTwo people are walking in the shore of the ocean.\nA clock is shown on the side of a sidewalk.\nA man stands in a tree with an umbrella, observing birds,\nMan ironically holding up holes of scissors to eyes\na short woman helping a tall man fix his collar\nFive birthday cakes all in different and unique shapes for kids.\nSmall boy holding a bat above his head on a cobblestone street.\nPedestrians walking underneath a traffic light by a city road.\nA brown and white cow standing next to a tub.\nThere is a plate of broccoli  and vegetables\nA woman rides a horse quickly around barrels.\na person in 
a living room watching a television\nA black and white small dog sitting on a  foot stool.\nA man in a kitchen prepping a tray of food.\nA vase filled with flowers next to bottles of wine.\nThe stripes on the zebra almost disappear on its legs.\nA wooden table holding a white laptop and glass of wine.\nA teddy bear on a table and some red jello desserts\nTwo zookeepers feeding two giraffes in a zoo.\na living room with two chairs and a tv\nA large boat floating on top of a large body of water.\nA make shift office space in a bedroom.\nA large cock sitting in the middle of a street.\nThree boys look on at a little league baseball game.\nFour people are on a bench next to a store.\na large herd of sheep walking down a dirt road.\nA fire hydrant next to a sign of a fire hydrant.\nKitchen corner with refrigeratorfreezer and microwave next to an open closet door\nA person in a police uniform sitting on a motorcycle.\nA white terrier dog on a leash with a brown spot on his eye.\nA skateboard is skating down the sidewalk on his skate board.\nA computer desk with a turned on computer in front of a book rack.\nA man taking a bite of a large piece of chocolate cake.\nA group of people watching a black cow eat from a blue pot.\nA small silver and red airplane sitting on the ground.\nA pigeon on a brick street under a park bench.\nA few people are getting ready to ski.\nA large tiled  bathroom with glass sliding doors\nA crowd is watching a man on snow skis.\nSome food is about to be served for a meal.\nseveral boats docked at a marina with clear water\na lady and her dog on a paddle boat he dog as a life jacket on and hey are happy\nTHERE IS A YELLOW FIRE HYDRANT THAT IS ON THE GROUND WITH A BLUE CAP\nA man with a surfboard about to go surfing.\nA cowboy sitting on a horse at a festival.\na bunch of hot dogs that are in a bowl\nTwo guys that are sitting on horses in the dirt.\nTwo birds are perched upon a snowy bank.\nA bathroom with white fixtures and tiled floor\nthere is a 
piece of chocolate cake on a paper plate\na little dog trying to pick up a Frisbee\nA skateboarder is doing a trick in the air.\nA horse figure is on a snowy track\nA person who is holding a hotdog in a napkin.\nA horse and a dog positions for a picture outside.\na mom and her son eating at a restaurant\nA white outhouse toilet sitting inside of a stall.\nA man and a child on a ski containing a seat.\nA black bear crossing a road as a bus draws near.\nA dude in shorts playing baseball with a bat.\nA container that has a bratwurst in it.\nA do not enter sign sitting on the side of a road.\nA group of jets that are flying in the air.\nA Continental airplane is waiting for takeoff at the airport.\nA man prepares to fly a kite in a grassy area.\nA woman in black jacket holding skis next to trees.\nThree middle age men looking at a piece of machinery.\nI bet he will finish this entire meal in no time at all.\nA man power sliding on a long board\nPlayers and a referee playing on a football field.\nA motorcycle rally is attended by numerous riders.\na collection of lemons, limes and oranges in front of books and a mug\na computer desk area with electronic devices on it\na white purple and red double decker bus and some buildings\nA group of people at a park flying kites.\nA close up image of a bike gear and chain.\nTwo indians with pony tails are with some horses.\nChocolate and caramel sauces are on a tray with sliced bananas and strawberries.\nA train driving past mural of working men while billowing smoke.\nA plate with a large pancake cut in half.\nA white pickup trucking is lacking doors, bumpers, grill and one headlight.\nA photo taken outside a restaurant with tables and chairs.\na male in a blue shirt is playing ping pong\nA little red-haired boy standing in front of the refrigerator.\nA computer is shown with a keyboard and a mouse.\nA bathroom sink at a hotel with the usual amenities on the counter\nA bunch of people stand in front of a car and next to the nose 
of an airplane on the tarmac.\nA green and white bus on street next to dirt area.\nA person riding a wave on top of a surfboard.\nA person flying a kite in the snow\nAn elephant standing on top of a wooden stool.\nMany people and a dog under an umbrella on a beach.\nThe vase is holding the budding flowers on the table.\nA kitchen scene with wood floors and wood style cabinets.\nBlue umbrella on picnic table in front of food truck\nThe skier is carefully descending a snowy slope.\nA little girl riding on top of a skateboard in the street.\nA very sexy woman laying on top of a bd wearing fish next stockings.\nThe animals look to be walking a one direction.\nThis is a train on the tracks that is filled with doors for houses.\nA cat is lying on a cushion on a couch.\na man is on the court playing tennis outside\nA tennis player striking the tennis ball for his next shot.\nA man walks down the road with some cows.\nCrates used as tables, full of fresh produce at an outdoor market.\na person in a sweater holding a cake over a paper plate\na large group of children holding their kites\nA bus stopped at an intersection in front of a church.\na red fire hydrant is between a couple of poles\nA woman in a pink hat looks at her phone in a crowd.\nMan posing for a shot wearing a suit and tie and carrying a briefcase.\nthere is a dresser that has a mirror and many things on it\na double decked bus drives down a city street\nThree birds on some rocks near the ocean.\nA photo of several bunches of yellow bananas.\nA grizzly bear sitting outside in the grass.\na man dressed in riot gear wearing a face mask and holding a red and white umbrella\nA refrigerator in a corner of a room.\nA group of birds flying over the water looking for food\nAn alsation dog paddling through some water in front of a building\nA child playing with a baseball bat and a ball.\nSome leafy trees are hiding a black bear.\nA giraffe looking over a fence on a summers day.\nAn attractive young woman speaking on 
a cell phone.\nA silver and black computer mouse stands next to an open laptop.\nThe man is standing by a large herd of cows.\nA person on skis coming up a snowy path.\nTwo adults and a child walk on the beach in front of a cruise ship.\nA refrigerator covered in pictures and stickers in a kitchen.\nFour young men in a sitting area stand looking towards the opposite side of the room.\nPlates of food and two glasses of red wine are on a table.\na man and a small boy standing on a tennis court holding tennis racketts\nA bird displaying its decorative plumage among some leaves.\nA man and woman sitting on a park bench\nA GIRAFFE STANDING NEAR TO TWO DEER IN A SEMI-ARID GRASSLAND.\nA row of chairs and some umbrella's on a beach near the water.\nA man carrying his surf board out of the ocean.\npeople sitting, walking around and some are in groups\nTwo people cooking bowls of ramen in a kitchen.\nA boy in a plaid shirt holding an umbrella.\nA frisbee will be thrown to a girl's dad in time\nKites being flown by a crowd of young children on a cloudy day.\nA row of vespas parked next to a bunch of motorcycles\nA group of young ladies sitting around a table sharing a meal.\nTwo elephants in a concrete enclosure at a zoo.\npeople standing at counters of booths being served\nLooking down the length of a city street while cars pass by.\nA baseball player hitting a ball on the field.\nA person laying on a couch with a cat laying in their arms, covering part of the face.\nA counter filled with vases, candles, and fruit.\nThe entrance for a subway on a city street.\na large crowd of people at an airport terminal\nA small kitten is walking on a computer keyboard.\nA bowl containing meat, lo-mein noodles and broccoli.\nYellow umbrella stands sitting on a beach with one chair.\nTwo purple flowers sitting in a green vase.\nThe bedroom is decorated for a female and includes a breakfast tray.\nA young man kicking around a blue soccer ball.\nA train is traveling down the tracks in the 
open field.\nA woman holds her hand out to feed a giraffe.\nA person who is all bundled up standing in the snow on skis.\nA fire silver and red fire hydrant is in the grass near a curb.\nWoman in shopping aisle with bear on her head\nA woman holding a video game controller is playing games.\nThe tall tower in the middle is framed by two large buildings.\nA man in a yellow jacket washing an elephant.\nA man riding up the side of a pink ramp on a snowboard.\nA tennis player hitting the tennis ball with the racket.\nA small grey and white kitten stands next to a foot.\nAn airplane is flying high in the sky after taking off.\nThe car has two different shades on green on it\nTwo cats are siting right next to each other.\nA man sits on a couch in a sitting room with coffee table with an open laptop on it.\nA TALL VASE OF FLOWERS IS SITTING IN A WINDOW SEAL\na person holding a cell phone to a gerbil\na huge brown bear standing at the edge of a small hill\nOpen toilet, basin, and shower stall in compact bathroom.\nA bunch of flowers in a clear vase of some sort.\nA man looking at a computer game on the counter\nA bot stands in front of a bus, while other men look on.\na group of people in a field playing frisbee\nAn animal leaning against a bare tree relaxing.\nA side table with a lamp and books net to a home library.\nA surfer in a wet suit carrying a surfboard as he walks into the water.\nMan doing a skateboard trick while others casually watch.\nA boy is cutting slips of paper with scissors.\na photo of someones living room complete with, bookshelf full of dvds, two leather chairs, a flat screen tv, fireplace, and a overly large decorative clock.\nA woman sitting a table holding two hotdogs.\na snowboarder with a blue jacket walking up a hill\nWoman and child watching people row in water.\nA dour young man sits on a horse.\nA room with several types of luggage against a wall next to a mirror.\nThis bathroom is decorated with wood and has several mirrors\nAsian men at a 
a white board talking with a Samsung sign behind them.\nA boy is in a courtyard on a skateboard in the air.\nA large yellow dump truck parked and empty.\na close up of a giraffe and people holding a bucket\nAnimals eating grass in a hill by the ocean.\nA banana tree filled with lots of unripe bananas.\nThe airplane is flying high above many clouds.\nThree tall urinals and one short one in a restroom.\nThis picture shows the details of a red colored skateboard.\na close up of a vase near other vases and a plate\nA bus underneath a large crane at a factory\nA small airplane is parked on the runway.\nPeople walking in the middle of a snowy street on a campus.\nTwo adults and one baby elephant walking in the wilderness\nA dog sitting on a chair next to a soccer ball.\nTwo truck cabs facing each other on a road.\nA man standing on a surf board with a paddle.\nA man flying through the air while riding a snowboard.\nA woman is standing over a stove holding a cup.\nA flock of birds flying over a light house near the ocean.\nA water fountain that has a pigeon perches on it.\npeople in a boat moving in the deepest place\nA person walks a dog with little shoes on.\nA group of sheep grazing in the field\nA boat on the ocean with a grouping of birds flying around.\nA man riding the waves on his surf board.\nPlastic containers and a bowl filled with lots of food.\nYoung man in white playing tennis at a tennis club.\nMiniature pizzas and skewered heart shaped pretzel bites.\ntwo boats sitting on the shore close to the water\na cat that is laying down on a couch\nA red double decker bus driving down a busy street.\nA crowd watches two people at a tennis match.\nChurch cathedral with decorative arches, marble floors and high vaulted ceilings.\nA man with his shirt off, is flying a kite.\nA close up shot of a red apple beside an orange.\nA man on a dirt bike riding on a dirt road.\nA man riding a board on top of a skate park.\nA dog is sitting in the back of a pickup truck.\nMan 
painted in gold paint standing next to a horse.\nA street scene looking at a clock on a pole.\nA train traveling down tracks near a station.\nWooden benches in the middle of a forest.\nA building is shown with tables in front of it.\nA yellow commuter train traveling through a train station.\nA long red train o the side of a field.\nTwo pieces of cake are arranged on a table\nTwo kids with joysticks and remotes seated on a couch playing a game\nA man in a purple shirt trying to catch a frisbee.\nsome water boats bushes trees and buildings and a train\nA young girl walking on a road carrying an umbrella.\nThe baseball players are about ready to take the field.\nA person showing a selfie of themself to the camera\nA couple of judges judging some sheep at a county fair.\na cow in a field on a very foggy day\na living room with two laptops and a tv\nA clock between two archways on a castle\nA group of men sitting around a laptop at a table.\nSeveral people on motorcycles sitting parked on the road.\nFour cats inside a caged in area, two yellow, two not.\nA group of people sitting at a table eating.\nIt's hard to tell if these are tennis players from the thirties, forties, or fifties.\nbaby in pajama's sitting on the bed playing with an object\nHorses are eating grass on a large pasture.\nA bowl of vegetables is set next to a blender.\nA long freight train crossing on a bridge over the ocean.\nAn airplane flying with dark and light clouds in the background.\nA hot dog with cheese, mayo and a vegetable on it.\nA woman stands in front of a neatly made hotel bed.\nThe woman sits at the table overlooking the pink and white cake with lit candle.\nAssortment of shells and soaps displayed on commode with dental care products.\nA person flies their kites above people by water.\ntwo sheep next to a wooden structure behind a fence\nA train bed with a blue sheet and various items on it.\nA pitcher winds up for a throw at a neighborhood baseball field.\nA family and a dog playing 
frisbee very near to the edge of one of the cliff of the Grand Canyon.\ncows resting in the shade and relaxing for a moment.\nSkateboarders are doing tricks as a crowd watches.\ntwo brown bears on some rocks in their pen\nA woman walking down a street in a dress with a bag.\nA couple of zebra standing on top of a grass covered field.\nThree men stand on a beach watching a kite fly.\nmany different sinks near one another with mirrors\nA woman covering her face sitting next to a man on a log.\nA snowboarder decked on in great poses for a picture.\na train moving on a snowy area and besides an ocean\nA lady with a dog in the snow waiting to cross the street.\nA police officer standing next to his motorcycle after pulling someone over.\nA smiley face sitting on top of a wood table made out of fruit.\nThe bathroom contains a bathtub and shower, toilet and sink.\nthere is a male skate boarder doing a trick inside of a parking lot\nA girl taking a picture of herself in the mirror.\nThe train is on a railroad track, under a signal light.\nA train maintenance vehicle sits on train tracks.\nThere is a view of a bench and houses down the hill\nA man a woman pose on a tennis court.\nA couple of pictures of a cat sleeping on a hair brush.\nA white toilet sitting next to a white bath tub.\nA man and a woman surrounded by people.\nThere are red benches near the grassy area.\nA police officer rides his motorcycle next to the protesters.\nThe lights and sights of a busy, populated city in Asia.\nA group of men and women rowing a boat in the middle of the sea.\nA young girl in pink snow gear on a snowboard.\nA woman being pulled on her water skis.\nAn infant girl sitting in a shopping cart.\nA women who is holding an odd shaped carrot.\nA blue and yellow plate of food that includes rice and beans.\nA baby zebra hiding among the tall grass.\nA person with a camera taking a picture through a mirror.\nA clock mounted on the face of a building next to an eagle statue.\nA lady and a baby 
at a pizza parlor during the day.\nA man riding a skateboard in a covered skate park.\nA large brown and grey cat sits on top of a desk.\nA train rounding a corner on the tracks.\nSeveral crafting items laid out on a white linen.\nThe red city bus is driving next to a construction truck.\nA man holds the string of a kite, as many kites fly in the sky.\nThe girl is wearing a jacket with fur and has a yellow frisbee.\na woman standing on a tennis court and holding a racket\nBlurry shot of man at the intersection of busy street.\npeople holding a surfboard and walking down the beach\na cat with a big fluffy tail sitting on top of a car tire\nTwo birds are sitting in their respected area.\nA cat standing close to and looking at two geese.\nA small house stands in a small constraining carriage.\nA fire hydrant spraying acroos an empty street\nA person attempts to remove something from a large oven.\na close up of a stop sign with a sky background\nTwo elephants walking in a dirt field next to trees.\nAn old train makes its way down the track in the country.\nBunches of bananas hanging from wooden rafters by string.\nGiraffes walking in their enclosure at the zoo\nThree birds are sitting on the branches of a tree.\nA dog that is playing in the snow.\nA white SUV parked in front of a train.\nA green and white bus traveling down the street\nA street sign where there is currently construction.\na child is blowing out the candles on the cake.\nA styrofoam plate with cats with noodles on it.\nA hawk perches on a tree branch in a forest.\nA gray cat is sitting in an empty red suitcase.\nA man and woman walking past a fire hydrant.\nA stuffed animal laying across the steering wheel.\na red plate a table drinks and a sandwich\nA woman is swinging a tennis racket at a ball.\nan elephant is eating grass and a bike is nearby\nA bunch of surfboards that are on the ground.\nA cat is sitting on the seat of a blue motor-bike.\nSeveral men are standing or walking on a soccer field.\nA 
cake that is well decorated with green stuff on it.\ntwo surfers one in a white shirt and water\nLondon Olympic games statistics statue with many tourists and visitors nearby.\na number of motorcycles parked near one another\nA produce market displaying racks of fresh fruits and vegetables.\ntwo kayakers enjoy the clear open water\na dog is siting behind a large window\nA parking meter in a parking garage that has a lot of cars.\nHorse held by two leads in passageway of large stable.\nA plate topped with pancakes next to a cup of coffee.\nan empty truck parked next to a building\nA man wearing an american eagle tie in a suit.\nA platter of donuts sits on a wooden surface.\nFour men with smiles on their face, in a kitchen.\nthis is a close up picture of a giraffes head\na red orange double decker bus smoking on the road\na large building with words scrolled across it\nThe man holds a pig foot next to his mouth.\nPaved highway with several cars moving past an exit\nA small residential bathroom featuring oddly shaped furnishings.\nA glass vase holding a flower on a wood table.\nThere are several animals in the grassy field.\nPeople and sheep traveling down a long country road.\nA man with a surfboard stands in the water.\nA red bus is next to a curb and trash bag.\nAn old, small residential bathroom with blue curtains\nA group of friends waering skirts and dressing are walking down the street.\nA young girl appears to be enjoying a biscuit of some sort.\nAn Indian man and woman in the water on the edge of a river.\nA dog is chasing along behind a cow in a field.\nSeveral double decker buses driving down the road.\nA picture of a bathroom with a large shower.\nTwo people on a long rowboat in a river or lake.\na shed with giraffes near it behind a fence\nA bench sitting by some very pretty assorted plants.\nA bath tub sitting next to a white toilet.\na man in a suit glares while standing outside\nA metallic refrigerator freezer sitting next to a stove.\nA large long 
train on a steel track.\nA giraffe in the brush standing facing away from the camera.\nA man standing behind another man helping him with his tie.\nPeople sitting on a curb watch a parade and horses walking in the street.\nTwo gentlemen doing a show with umbrellas and colorful suits.\nTwo guys walking and talking in the room.\nA woman and a man pass food between their mouths.\nA table with a coffee and a salad on it.\nThree large kites flying in the sky near the water.\nA cat pawing at a television picture of some penguins.\nTwo women are sitting on a bench outdoors having a conversation.\nThere are two woman in bathing suits and a cat\nA herd of elephants in their natural habitat.\nA street is lined with people and buildings.\nMan checks wheel on mule drawn cart driven by girl.\nBoys standing in front of microphones outside in front of cameras.\nTwo people walking in the ocean away from a boat.\nA pizza with olives is on a plate.\nA street sign that is in front of a cemetery.\nTwo laptops next to each other are open on the desk.\nA sign in front of the airplane warns that tobacco is not allowed in the area.\nA person holding a blue umbrella in the rain.\nA young girl adjusts her pink sunglasses in a park.\nA grouping of luggage with tags and luggage trolleys.\nA baseball game in progress with the player running the bases.\na dog sitting under a desk with a monitor\nMany people ride on surfboards as one man catches a wave.\nA big bird stands between a trail and some trees.\nA bamboo bench with a backpack sitting on top of it.\nCellular phone displayed on display case with other phones.\nThe horses have made this patch of ground quite bare.\nA HERD OF SHEEP GRAZING ALONG SIDE A HILL.\nodd, four street signs on a hill away from the traffic\nA pizza sitting on an outdoor table in the sun\ntwo legs a toilet a stall door and white tile\nA man holding a child on top of a skateboard.\nA motorcycle parked in front of a wall.\nA red stop sign mounted to a black pole.\nA 
teddy bear sitting on top of a red plastic basket.\na large crowd of people at the park with some playing with a large kite\nA woman tennis player in a black army shirt and tennis skirt, swinging a tennis racket.\nPeople have set up tents near picnic tables on a beach.\nmany people and line of parked scooters and motorcycles at night\nA horse stands near a colonial era stone furnace\nTwo young women are eating hot dogs while walking down the sidewalk.\nA large neon sign at a market square\nA little girl is playing with a hair dryer\nToothbrushes sit in holders arranged around a sink.\nA young man in a suit, tie and glasses is smiling\nCows are trying to kiss the girls on the arm.\na bathroom with a sink, toilet , and tiled floor\nA man wearing a tie, jacket and white shirt.\nA girl alone on a beach flying a kite.\nA large stack of trunks and luggage on a sidewalk with people behind it.\nA painting of a vase with a polka dot gray background.\nThe plate of food has meat and cooked vegetables.\nTennis player with the teeth of a predator.\nThe elderly woman uses a video game remote near her companion.\nThe sleeping child is holding onto a teddy bear.\na little car sitting by a wall with a picture on it\nA young blond beautiful woman standing on a tennis court.\nA wall with lots of weird things mounted to it's side.\nA MAN IS HITTING A TENNIS WITH A RACKET\nA man that is on a snowboard that is in the snow.\nThe cow sticks it's large tongue out of his mouth.\nThere are street signs and a traffic light at a downtown street.\nthere is a woman sitting on the ground making food\nA woman in a bikini laying under a red umbrella.\nA giraffe standing next to a tree that it is chewing on.\nGiraffes mill about in their pen at the zoo.\nAn elephant pokes his nose in the brush.\nA large red bus on a city street.\nThere are two zebras walking side by side.\nvintage black and white photo of old motorcycle\nA large white bed sitting under two framed pictures.\nA small orange train 
traveling down tracks near a station.\nA sandwich is on a long bun in paper wrapping.\nA HERD OF SHEEP GATHERED AROUND AN OLD BARN\nThere is a birthday cake with chocolate icing on the table.\nThe three teens are talking to each other on the sidewalk\nBlack and white image of a woman and a man petting a horse.\nThree animals are standing near a body of water.\nA couple of plates of food on a table.\nAll ages can have a good time using the Nintendo Wii.\nA man in a suit poses for a picture with each of his arms around a boy in a suit.\nA little girl on the beach playing with a frisbee.\nA keyboard and mouse are sitting on a desk in front of a laptop and monitor.\nAll of the planes are flying in the same direction.\nA white plate topped with three donuts covered in frosting.\na number of baseball players on a field\nA personal single engine jet, on the runway\nA building seen through a rain and fog covered window.\nA man riding a wind sail over a large body of water.\nLots of colorful flower vases hang on a wall.\na woman on a phone is waiting for a bus\nA cake with tow layer smothered in white frosting.\nCarefully sculpted pieces of wood in a display case.\nA black man opens his fridge and looks inside.\nAn airplane during takeoff ascends into the clouds.\nA red light rail trains passes through a station\nA little girl is sitting with an umbrella\na man in a suit grabs his head while screaming\nFour persons skiing on snow clad mountains and slopes.\na small kid holds on to some balloons\na guy that is skateboarding on some kind of concrete\nA person in skis going tightly around a flag.\nThe desk has a desktop computer and a laptop on it.\na man stands in a kitchen by a table\nA person skiing in an open area of snow.\nA number of pizzaz sitting on a wood table\nThe group of skateboarders is headed towards the park.\nThere are many seagulls standing on the ledge over water\nA picture of a man swinging a tennis racket.\na parked air plane sit at a airfield\nAn old 
woman getting vegetables from a heavy loaded cart.\nA girl looks in the mirror as she brushes her teeth\nA wide variety of produce is for sale including apples, pears, and onions.\nA commercial district street with a sign pointing where to stop for a crosswalk.\nThe teddy bear is posed as if he was working out.\nA white cake with red designs and two cups next to it.\nA horse on its back with a man watching.\na kid on a snow board stands in the snow\nThere are two males on a vintage red train.\nA pizza pie with vegetable toppings and cheese\nA bathroom with the light on and a painting hanging over the toilet.\nTwo suitcases that are sitting on a chair.\nA laptop, Furby toy and books on top of a desk.\nThis is an image of a woman getting her hair styled.\nA refrigerator has a note pinned to it with a magnet.\nA small weiner dog that is cooling off in the pool.\nA young man wearing black does a trick on his skateboard where he is almost parallel to the street.\nA man standing in front of a flag holding a plaque.\nthere are many young men on the field playing soccer\nA market area displaying various fruits that include plumbs and pears.\nSome flowers in a vase on a table\nA lady petting her dog and a man standing on a log\na polar bear on a field near many trees\nA man getting ready to hit a tennis ball with a racket on a tennis court.\nA man is sitting in front of a desk with a coffee mug.\nA man helps his friend fix his tie before a photo shoot\na baseball player swings his bat at a ball\na dog that is on the lap of a women\nThe two cats are looking out the high window.\nA small group of people on the sidewalk with a few holding umbrellas.\nA man standing next to a woman in font of a tray of food.\nA young boy is holding a baseball mitt in a grassy field.\na man is riding a board in the water\nTwo doughnuts sit on a plate with drinks surrounding.\nThe is train cross a bridge over water.\nA child sitting at a table with a plate in front of him.\nA woman holds a 
birthday cake as a man lights the candles while another man looks on.\nA man in black shirt and apron in a kitchen.\nA man standing in the grass flying a kite.\nA person riding a skateboard up the side of a wall.\nA man in a brown shirt is playing a video game.\nA white and blue jet airliner docked at an airport.\nA man tossing a frisbee on a lush green field.\nThe bed has mosquito netting hanging around it.\nA man running after frisbees in a wooded area.\nA black boat with a dog on it going down the river.\nA person flipping a skateboard with his feet in the air\nA multi colored dog jumping up to catch a frisbee.\nA very big grassy field with a bunch of bats together.\nclear vase filled with white and yellow flowers with water\nA black sign with directions stands in front of the blue sky.\nA dog in a mirror with a person in a room.\nYellow passenger buses ride side by side down a crowded street.\nA black and white photo of old cars and a boat, all sitting in front of a lake.\nA desktop computer with a note attached to the screen.\nTwo men have thrown their ties over their shoulders during a meal.\nThe woman wearing a coat stands near sheep behind a fence.\nsome pasta in a bowl sits on the table\nA man holds up an x-ray and looks at the camera.\nA man wearing a backpack pauses to talk on his cell phone.\nTwo tennis players consult with the referee during a tennis match.\nA room with two pictures on the wall and a table with a computer monitor on it.  
A wooden floor and a table with a yellow bowl and a grey and white rug.\na person cutting a small cake on a table\nA traffic speed limit sign sitting in the middle of a road.\na bunch of cows are in a field\nSeveral zebras eating together in a fenced in area.\nA bunch of lumberjacks moving logs in the woods.\nMany pots and pans have been hung over a kitchen bar.\nA cell phone being held by someone is showing two women on the screen.\nA sign indicating the historical site that is the Nathan Hale Homestead.\nA completely shattered television lying on the sidewalk.\nA very delicious sandwich with black eyed peas on the side.\nTwo men are sitting side by side as they are eating and smiling, they both are cutting their food with a knife.\nPublic transportation train with blue front approaching the station\nThree people preparing to launch a small boat in a river.\nFedEx trucks parked on the side of the street while cars wait in traffic.\na man skat boarding down a concrete pathway\nan elephant is standing behind a wooden fence area\nWe are looking at a crowded city street.\na man standing at a table with wine in a supermarket\nA sandwich with a drink and a bag of chips.\nthe woman is holding a cat with a hat on\nA row of fire hydrants sitting on the edge of a road.\nA clock stands alongside a busy street at night.\nThe two skiers are eager for the finish line to come.\nThe bench is chained to the outside door handle.\nthere is a woman sitting at a table eating\nTwo elephants in an animal sanctuary with trees\nThe back of a car that is pulling up to a stoplight.\nA produce stand with a variety of fruits and nuts on display.\nThis public restroom has no toilet but instead a simple porcelain hole in the floor.  
.\nA white and black street sing covered in snow next to trees.\nA woman holding a pizza box and a paper bag.\nMany skiers are traveling along through the snow.\nA red, white and blue airplane is high in the clear blue sky.\nA group of people in a restaurant eating a meal.\nA group of softball bats leaned against each other on a field.\nA table with crafting supplies next to a cell phone.\nA bird sits on a wire over a street sign.\nA bird sits in a tree branch with leaves.\nAn old Boston baseball player sits while holding his bat.\nA man that is inside of an elevator shaft\nInside of a bathroom with a sink and mirror.\nA red kitchen with metallic appliances and paintings on the wall.\nA man holds a bat at the base.\nA man by a book case has a guitar.\nMany bananas and apples are on the kitchen counter.\nA cat is sitting alone in the middle of a large patio area with a historical building in the background.\nA man wearing a yellow and white striped vest and hat\nA person holds a bag while walking on train tracks.\nA large sandwich with meat, cheese, and vegetables.\nA man flying through the air on a skateboard.\nFancy standing clock sitting in a nice setting.\nA bed with pillows where the blanket is slightly pulled back.\nA woman in a bikini with a surfboard in her hand\nTwo cats are staring at a light spot on a floor.\nA snow boarder boarding down a snow covered mountain.\nA young man standing over a pan filled with food.\nA women who is eating some food and looking out a window.\nA man in skis holding a stuffed animal near a group of other skiers.\nPeople with their faces blurred out play Wii on a mounted TV.\nA collage of photos of cats and goats.\nA soldier riding on the back of a black horse.\nThe man is a wet suit is catching a wave.\nStreet sign advising to turn left for Shanks Avenue\nA woman holding a tennis racket swinging at a ball.\nThree businessmen who are crossing a city street together\na brown cow standing next some other cows\nThere is a surfboard 
sitting next to a car.\nThere is a dog that is walking on the beach at sun set\nAnd elephant behind a low log fence and someone leaning on the fence, taking a picture in another direction.\nA little girl using a laptop on a table\nAn employee slices a large piece of pizza, pretzels hang bear by\nThe man on the skateboard and the dog are getting their picture taken.\na large crowd of people is outside a building\nA tennis player stands before a net and waves while a camera man films him.\nA man playing a game with a remote controller.\nA suitcase, sitting on the floor, opened is full of clothes and a curtain is behind it.\nTwo cats standing under a windowsill with each other.\nA person with an umbrella near a building.\nA bottle of beer sits next to a gourmet pizza pie.\nA guy wearing a blue shirt is skiing.\nA man in striped shirt looking into an open refrigerator.\nA red stop sign that is on top of a pole.\nA bathroom sink with a facet and soap dish and three mirrors that reflect three sides of the sink.\nA man that is standing in a kitchen near a bowl.\nThe front of a city bus rolls down the street.\nThe bicyclists have formed a train, and are being towed by the city bus.\nA small giraffe with its head down, standing next to a tree.\nA man in a black wet suit is about to stand on his surf board.\nA man standing in front of a clock.\nThe person has fallen asleep while holding their skateboard.\nA street with people walking on it and items on the sides of the street.\nA bathtub sits against a wall with a sink and toilet in the foreground.\nAn expressway with street signs in Chinese.\nThis is a decorated red velvet cake on a red tablet cloth.\nA kitchen counter that has various objects on it.\nTwo people playing a video game on a projector\nTHERE IS A MAN THAT IS ON A SKATE BOARD IN THE STREET\nA little boy sitting in front of a computer keyboard.\nThree birds are lined in a row in a grassy area.\nA man wearing a helmet rides a skateboard\nA WOMAN CARRYING FOOD ON 
TOP OF HER HEAD\na close up of a bowl of food with broccoli\nA young woman in a gray, long sleeved t-shirt sits on top of a yellow structure looking at her cell phone.\nA man holding a dog sitting outside looking down.\nA fleet of airplanes rest at their gates at the airport.\nSnowboarder displaying aerial tricks in populated urban setting.\nA dog is in a living room sitting on the back of a couch.\nA clock that is above a pedestrian walk way.\nA pair of scissors next to some pieces of paper.\nA ripe banana sitting on top of a wooden table.\nA woman holding a cat in her arms in a car.\nA small white bird standing on top of a dirt field.\na small plate that has some food on it\nSheeps and goats eat food in their pen\nA man standing on top of a skateboard.\nA man up to hit in the middle of a baseball game\nA glider is flying over the beach on a foggy day.\nWomen walking down the street holding an umbrella\nWoman in midst of a Wii activity, holding the remote and smiling.\nA clock sits on an iron part with lights above it.\nTwo gray and white cats laying around a toilet.\nA red and white train sitting on the train tracks.\nA small inverted airplane flying in the sky.\nDouble decker bus in front of store on empty street.\nTwo men in a small living room are playing with the Wii.\nMany people prepping large kites on a beach.\nA guy in a hat skateboards across a ramp.\nA big black bear lays down in a lush green open field\nThe uncooked pizza has raw tomatoes and lettuce on it.\nA man walking on a tennis court with a racket in his hand.\nA horse drawn carriage riding past a city trash truck.\nA man leads a horse cart carrying four people including two ladies with headscarves.\nA car parked in the street next to a parking meter.\nFour men in military uniforms are smiling while holding an item next to a table as other people look on.\nA black and gray goose standing in the sand\nA boy doing a trick in the air on his skateboard\nThe back of the garbage truck has rotten 
bananas on the bottom of it.\nA man and a woman walking past a bus with an umbrella.\nThe meal being eaten at the table is on a blue and white plate with spoon, fork and knife.\nA cutting board topped with two sandwiches next to drinks..\nThe sink of a large modern bathroom is full of water.\nA pizza with two slices missing from it.\na cat that is laying down on a bed\nA hot dog on a bun with an abundance of yellow mustard.\nA man beside a valley stands beneath an umbrella in the rain\nA woman wearing goggles skies down a large hill.\nA person on a snowboard in the snow.\na girl flies a kite near some other people\na female in a white top is playing tennis\nThe Asian market has a large quantity of pears available as well as other produce.\nLooking up at a traffic sign and street light.\nA woman looks at her reflection in a handheld mirror.\nA man removes food from an oven with hot pads.\nStreet sign light on a traffic light pole\nAdult men standing in living room playing video game.\nA white bowl of tangerine slices on a wooden surface.\nA pile of veggies next to meat covered in gravy.\nA green backpack with a computer mouse poking out.\nA person with a pair of scissors about to cut hair.\nDog laying down on the sofa next to a cat.\nA cat that is standing on a bench.\nHerd of cattle laying on a beach that has people on it\na bowl with liquid flavors in containers lemon orange banana and pineapple\nA group of skateboarders riding down a city street\nA white plate topped with a sandwich and chips.\nA man and his son eating donuts at a restaurant.\nA person on  a skate board in mid air by a rail.\nOlder style single engine airplane being displayed at air show.\nThere are several people walking in a street parade.\nA bird perched on brick ledge with a hole in it.\nA beautiful blonde holding a Nintendo Wii controller with another beautiful woman holding another Nintendo Wii controller.\nGrey fighter jet, with pilot, on a runway.\nA woman is pulling on a man's tie.\nA 
city street with traffic caught in motion at night time.\nA man in an inter tube by a boat in a lake.\na man riding the side of a wall with a skateboard\nA skateboarder rides on the side of a large pipe.\nA dog is looking out of the window of a car.\nCattle are crossing the road to a beach front.\nA pair of scissors stabbed onto a wooden counter top.\nTwo photos of a tennis player rushing to hit a ball.\nA flatbed truck carrying the remains of a crashed light airplane.\nA black and white dog carrying a frisbee in a field\nA person is laying tennis with racket in hand\nThree guys at a table eating a giant pizza.\nA bathroom being remodeled with toilet set aside\nA man is taking a picture of his bathroom sink.\nA man and young woman fighting over a frisbee.\nTwo bowls of food next to a pack of lemonade.\nWe are looking down on a market square.\nA large boat is motoring toward the shore.\nTwo men drink wine with their eyes closed.\nA small gray elephant  standing in an exhibit at a zoo.\na bridge with a train driving over some water\nA woman taking a picture of the back of her top.\nA desktop computer on top of a wooden desk.\nThe zebras are grazing in the open field.\nThe young man is talking on his cel phone.\nA man holding up a phone and pointing to it.\na man in red is sitting on a barrel\nA tennis player getting ready to hit the ball.\nSeveral people who are skiing pose for a picture.\nThe dog is all dressed up and ready to ride.\nA van and car driving down a street.\nA clock is on a pole under a set of windows.\nA person jumping up into the air for a Frisbee.\nA young woman sitting on a rock under an umbrella\nA skier performs a somersault on a ski slope.\nCars are parked on the street near a traffic signal.\nA man drinking from a wine glass in a polo shirt\nTourists among taxi and double decker bus traffic\nA computer is on a desk in a blue room.\nThe road sign is visible for all to see.\na man typing on a desk top computer at a desk\nA baseball player is 
swinging his bat at a pitch\nA man making a goofy face while sitting near a cake.\nA man laying on top of a couch.\nA huge bundle of bananas is hanging from a tree.\nA man at the beach flies a red, white and blue kite.\nA great shot of a very lit up city.\nA close up of the edge of a table looking at a keyboard and a mouse.\nA black and red train traveling down tracks.\nA group of people sit on a dirty boat.\nA man is brushing his teeth while a piece of tissue sticks out of his ear.\nGlass and stained wood entertainment center, with decor and a flat screen television.\na couple of giraffes stand next to each other\nA bird perched on top of a wooden power pole.\nA sign is shown pointing two ways with a dog.\na man walking on the beach with a red surfboard\nA zebra standing on a dirt road next to a bunch of deer.\nA plate of food showing broccoli, fish, lemon and rice.\nthe man is leaning over taking a picture of another man\nA sheep is standing in the grass near water.\nA man holding a blue, red and green frisbee in his hands.\nA man carries a bulky, stuffed piece of luggage.\nA man holding a pair of headphones in his left hand.\nA para sailor goes airborne over waves in the water\nA high mountain of snow with a cross country skier.\nA fighter jet flying through a blue sky with smoke behind it.\nAn airplane is mounted on a stand in a park.\nthis is a woman using her cell phone\nA couch is looking quite dark with the blinds down.\nA pig head on a plate surrounded by a bunch of apples\nA skateboarder spreads her arms to balance herself as she circles the rim of a bowl shaped course.\nA man in a kitchen concentrating on cutting an onion on a board with a knife.\nTwo attached train cars on a track.\nCross-roads sign for Jekyll and Hyde roads attached to top of stop sign\nA man sitting at a table eating a sandwich next to a marker board.\na man hitting a baseball during a baseball game\na bathroom with some knobs built into the wall\nA truck sitting in the middle of 
heavy traffic.\nA large bus with several people standing out side waiting to get on.\nTwo women playing paddle ball on a sandy surface.\nA computer desk sits in the corner next to a dresser.\na shadowy looking man jumping over a ramp\nA tropical bird in flight on a sunny day.\nA small ham and pineapple pizza on a plate next to a spicy pepper shaker.\nA pizza with tomatoes, corn and a pizza cutter is laying next to it.\nHerd of Wilde beast and zebra walk through grass by shore line\nA bird standing next to a partially eaten apple.\nA row of surfboards sitting on the beach near the ocean.\nGroup of people watching two skiers come down a slope.\nBlack and white photograph of people with bicycles and skateboards next to a ramp.\nA salad bar filled with lots of different foods.\nLittle kid in a cap stands next to a fire hydrant\nmany red and white stuffed bears holding hearts grouped together\nA boat sailing in the water near a beach and grass.\nA baseball bat hanging to the side of a wall near a sign.\nTwo women sitting on a couch with remotes in their hands.\nA very cute green city bus on a busy street.\na young boy about to take off his helmit after playing baseball.\nA white sink and a shower in a room.\nA man flying a kite stands next to a young boy.\nA close up of multiple vegetables including broccoli.\nA brown horse standing on top of a grass covered hillside.\nA row of wooden shelves with lots of glass pottery on it.\nThe bears look like they are hugging each other.\nA bed has no sheets or pillow cases\nsome white jets are lined up on a runway\nAn outfielder watching what is going on at home plate.\nA bathroom that has a yellow floor mat in it.\nBoy in midair while skateboarding on indoor course\nmore than one yellow public transit bus in the road\nA person is skate boarding on a sidewalk.\nA collection of apples and oranges in wooden crates.\nA picture of someones bed and dresser in a bedroom.\nTulips about to bloom in vase in vacant room\na man rides on top 
of a race horse\nA cat peeking into a room from a curtained window\nA small puppy chews on a dog toy shaped like a pizza slice.\nA table sitting inside of a room next to a window.\na man rides on a horse near a blue car\nAsian vegetable stir fry dish with wreath of broccoli and assorted mushroom varieties.\nA group of people are sitting around a wooden table.\nTwo motorcycles side-by-side parked in a grassy area.\nA bunch of people on holiday at the beach.\nA giraffe walking in grass on a sunny day\nA baseball game where a player is running to 3rd base.\nA tennis player on the tennis court in the middle of the swing.\nA fritata on a plate with chicken and broccoli and tomatoes\nA bald man plays an informal tennis game.\nThe tall vase on the table is holding small flowers.\nA man and woman brushing their teeth and taking a selfie photo with a camera in a bathroom mirror.\nA woman is sitting on a bench looking sad\nA cat next to a windows behind cans and bottles.\nA person typing and working on a hp laptop\nA green bus is in a parking lot.\nMan feeding a costumed woman's head chocolate cake.\nthere is a black cat laying on a desk next to a computer\nA boat of some sort near a harbour.\nA person who is working on a laptop computer.\nA school bus sits in a parking lot with other cars.\nA cat sitting next to a banana on a shelf.\nTwo men who are wearing suits and hats standing next to each other.\nA person falling off a skateboard onto the ground.\nA bathroom with a large square mirror over the sink and a brown shower curtain with circle designs.\nA double decker bus passes a fellow motorist on the street.\nA woman riding on a motorcycle inside of a show room.\nA bagel, cream cheese and lox is served with fresh cucumber and tomato slices.\nA person in a wet suit is parasailing.\nFlowers arranged in vases on a shelf against a wall.\nA young person stands on a beach with a kite board.\nA tractor trailer is parked in a grassy field while people lean against it.\nA family 
of giraffe walking around a stone filled hillside.\nA train makes its way down the tracks in a wooded area.\nA TV has a cartoon-like screen with a keyboard sitting idly.\nA baby with bib on sitting on the floor putting an unidentified object in mouth.\nA man standing in front of an open refrigerator filled with food.\nSeveral senior citizens are at the table, posing for the camera.\nA dog is sleeping on a couch in a living room.\nThe woman in the red shirt jumps up to catch a Frisbee.\nA woman eating a doughnut and pointing at other doughnuts in a bowl.\nA man with glasses is wearing a white shirt and tie.\ntwo horses at the sunset in the field feeding\nFour people carrying surf boards on the beach in wet suits.\nA group of people on land looking at a flying boat\nA man talks on a cell phone while holding a camera.\nA man with a tennis racquet stands on a court.\nGuys in the gym playing soccer with teams\nA bowl of food and a spoon on a table.\nA woman riding on a bike past a busy intersection.\nA man outside in snow gear on a snowboard.\nThe edge of a bed and a closed window.\nMichael Jackson hat and glove to celebrate a birthday.\nA surfer is atop a wave with arms steadying from an upward position.\nA lady in the dark holding a remote up.\nA young man is waiting on a table of people at an asian restaurant.\nA woman talks on the phone while touches a yellow cup that sits on he table.\nYoung baseball player up to bat poised to hit the ball\nA bunch of plates of food such as fish, pork, watermelon, pasta salad and cocktail sauce.\ntwo pieces of toast, bacon and potatoes on a table with a cup of coffee\nSmall dog sitting on covered table with orange toy.\nTwo plates of food  and two glasses of wine placed on a table.\nA man in blue shirt holding two bowls full of ice cream.\na street sign attached to a pole on a street.\nA black hair dryer sits in a tan chair.\nA plate with a sandwich, fries, and a pickle are sitting on the table.\nA person walks along the beach with 
some dogs\nA large red double decker bus driving down a city street.\nTwo photos of a man sitting on a private jet .\nA girl standing and holding a sweatshirt next to a stop sign.\nA brown dog on wooden floor next to a window.\nA closeup view of a clock on a Christmas tree.\ntwo vases on a table with flowers in it\nA baby holding a i phone sleeping in it's mother's arms.\nA traffic light is shown next to a tunnel entrance.\nA large white clock on the side of a wall inside of a building.\nA photograph hangs above the tank of a toilet with a spare roll of toilet paper.\nA large selection of fruit of different types in baskets.\nA partially eaten taco pizza is in the foreground, while another type of pizza is in the background.\nThree giraffes pressing their mouths to each others heads.\nA man is laying in bed with headphones on.\nA dish of pie on a wooden table.\nA young man sitting in a car talking on a cell phone.\nA gray elephant walking around inside of an enclosure at a zoo.\nA small boy chewing on a blue and white toy.\nTwo microwaves sitting side by side on a countertop are marked with signs printed with the symbols for man and woman.\nSomeone using a cell phone while brushing their teeth\nA pepperoni pizza and a bottle of beer\nA dimly lit bathroom just has a toilet and dirty sink.\nA stop sign on a street corner with building, crane, and blue sky with clouds in background.\nA child holding up a baseball in a mitt.\nThe white bathroom is very sleek and modern.\nA man who is eating a pizza and looking out a window.\nA surfer sits on his surfboard while waiting for a wave.\nPeople with red suitcases walk towards a large building.\nA bathroom is shown with a shower and a toilet.\nthere is a stuffed animal that has a small stuffed animal inside it\nA man laying on top of a couch in a living room.\nThere are two people holding glasses of orange juice.\nA cup of coffee sits next to a keyboard and mouse.\nTwo women on a bus, one talking on a cell phone.\nThree boys 
are playing soccer underneath a bridge.\nan elephant picks up riders from a platform\nWoman sitting with bananas in camp with people in background\nSeveral people are on a lake with kayaks, boards, and boats.\nA young man sits on a bed that is made-up with lots of pillows.\nA person standing on top of a tennis court while wearing a white hat.\nTwo old suitcases, a blue one and a brown one, are stacked one on top of the other.\nA man riding an elephant plays basketball while others watch.\nA woman makes a crazy face over a plate of food.\nA man holding a toilet seat on a square toilet in a bathroom.\nOld photo of man with a beer sitting in the ground with others.\nA bathroom sink under a mirror on top of a counter.\nA girl plays with a cat on the ground\nTwo black birds sitting on the branches of a tree.\nA subway train that is crossing over a river by a bridge.\nA picture of a snowboarder jumping right into the air.\nA computer on a desk with a bottle of beer next to it\nA woman is posing next to a stop sign.\nA man with a tennis racket jumping on the grass\na man wearing a back pack walking toward another man\nA white airplane with two large propellers sitting on a runway.\nA clock on a pole in front of a tree.\nA woman standing on top of a tennis court holding a racquet.\nA man crosses a street at a corner with a market on it.\na couple of people riding on some big elephants\na kid poses on a side walk as a baseball player\nA bunch of zebras that are standing in the dirt.\nBathroom with glass shower door and art work hang above the toliet\nA boat is in the dimly lit water by the city.\nThere is a bathroom with a toilet and a bidet.\nA women who is riding a skateboard in the street.\nA nighttime picture of Big Ben in London, England.\nHere is an image of an outdoor place.\nA couple of women are playing tennis on astro turf.\nA kitchen is being installed with stainless steel refrigerator and glue is on the island.\nA man giving a thumbs up while on a cell phone.\nA 
group of young people throw a frisbee back and forth.\nA brown bear in the woods under a tree.\nA man is frowning while standing in an empty room.\nTwo giraffes graze on treetops in the distance.\nA woman cutting a portion of pizza from a tray next to a bowl of fruit\nA plane flying over a river in a rural area.\na man and a woman sitting at a table eating food\nA man riding a bike on a dirt path through a forest.\nAn elephant walking down the side of a dirt road.\nThe hood a street motorcycle, that has the Italian color, the number 7 and ALITala on it\na brown bear is laying on a rock and some trees\nA vase of flowers is sitting on a white table.\nA sausage link is strung out on a board ready to be cut.\nA colorful vase of flowers sitting on a glass table\nShe makes riding the waves look fun and easy.\nA small restroom with a single toilet and wooden toilet seat.\nThe kitten is enjoying the treats on the plate.\nA lovely cat have a cup to his face.\nA group of animals walking in the grass next to a road.\nA few men carrying some surfboards on a beach.\na small plate of cake on a table\nA person that is doing a skateboard trick in the air.\na close up of a pizza on a pan on a table\nA red umbrella with the ruins of a building in the background.\nA large gray cat laying on the floor next to a couch.\nA man standing under a blue cloudy sky.\nPeople sit around a table full of hot dogs and fries.\nA bus traveling on a road with other vehicles beside a large building.\nA sandwich with eggs and cheese on paper\nSkiers riding a ski lift and looking back behind them.\nA woman in skies is standing in the snow\nThe young boy is walking with his glove on.\nA man standing on a surfboard catching a wave.\nRows of Pullman bags for sale at a store.\nAn open door on a train at the end of a platform.\nA cat climbing down beside a t.v. 
screen.\nA computer sitting on top of a wooden desk near a window.\nA large bridge spanning the width of a bridge near a tall building.\nA car perched on a table looking closely at the television screen.\nThree friends look past a bottle of wine to the end of the table.\nAn old building with rote iron railings and landings.\nAn extreme close up of an expensive gaming keyboard.\na young boy sliding down a snowy hill on a snowboard\nA woman walks down an empty street next to a large street clock.\nAssorted food items with paper wrapper ready for consumption.\nA oven made of iron filled with pots and pans.\nThat building looks like the building downtown in Atlanta.\nA picture of a man's face next to another picture of a person's arm holding a glass of wine and a remote control.\nA woman holding two pairs of scissors next to a display.\nMan in black shirt and jeans doing a skateboard trick.\nA woman and black cat together in a bed.\nan image of a man that is riding his bike up high\nShot of bathroom with bath on far side near toilet.\nA dog is laying and resting on a walkway.\nA young persons clean and orderly bedroom and desk.\na man in a blazer uses a cell phone\nsomeone having a  chili cheese hot dog for lunch\nsome glass ware is on a wood shelf\nA bathroom with a white toilet next to a shower.\nA horse is standing inside a pen next to a smaller horse.\nA girl has her pony by the harness.\nA kitchen with an oven, stove, microwave, and refrigerator.\nImage of a bathroom showing the vanity and sink area.\nA large bathroom with a toilet and sink.\nThe person is deciding whether to try the skateboard trick.\nA photo of a horse on the back drop of an ocean\nA work desk cluttered with stamps and work supplies\nA woman walks down the street with an umbrella.\nA man is riding his bike down a subway area under a Clearance sign.\nFlowers in a vase sitting on a window seal.\nBlack and white cats laying down in the green grass.\nPastel umbrellas hang above a garden in the lobby 
of a fancy building.\nA zebra carefully walking around in a zoo pen.\nStud farm with horses and trainers in a vast ground.\na cat and a sheep are standing in a field\nTHERE IS A WHITE WII CONNSOLE AND GAME ON THE TABLE\nA fish eye view of part of a bathroom\nAn owl is eating the flesh of another bird.\nA white parrot standing next to a jungle covered hillside.\nA group of people standing in the middle of the street.\nTwo women are cutting a heart shaped cake together\nA farm stand selling plants and apples by the pound or quart.\nSidewalk under construction with safety cones by the fire hydrant.\nA man sitting on top of a pole next to a fire hydrant.\nA small tree is covered in snow from the storm.\nA man is dressed like a clown magician while pointing at a picture on his cell phone.\nA man is standing outside of the water observing the huge flock of birds.\nBlack and white photograph of people observing sheep in a field\nthere are many computers and lap tops on this desk\nSmiling indoor tennis players and their racquets with a football\nSome young children preparing for a baseball pitch.\nA kitchen area features a silver refrigerator, stove and counters on dark, wood flooring.\nYoung woman dressed in black and white playing soccer.\nA flower in a pot standing on a table.\na baseball player swinging a bat at a ball\nA road sign next to a building and tree\nA man dives in to catch a frisbee\nA giraffe in an enclosed area eats from branches up high.\na very pretty kite is flying high in the sky.\nAn orange cat sitting in the passenger seat of a car.\nAn assortment of remote controls lined up on the table\nA skateboarder who is jumping down a flight of stairs\nA skateboarder performs a difficult skill in a skate park.\nA bedroom has pink walls and a blue bedspread.\nA stack of donuts sitting on a piece of paper.\nA white boat on water with seagulls and umbrella in the foreground.\nA train coming into the train station\nA zebra is outside enjoying the grass before 
him\nThe bear is on the table in front of a glass of beer.\nAn intersection with a pole that has signs on it.\nA crowd of people on the sidewalk and an airplane overhead.\nA bathroom with a tub, toilet, sink and a mirror with red edging.\nA photo taken from the ground of a person standing with their skateboard.\nThere is a stop sign in a field.\nA group of people sitting around dinner tables.\nA senior tennis player prepares to backhand the ball.\nA woman standing next to a tree holding a pink frisbee.\nA baseball player taking a swing at a ball\nA woman standing next to a standing toilet.\na person riding skis jumping in the air\nthis is a woman on skis posing for a picture\nan elephant with its mouth open and some bushes and trees\nA dog sitting in a wooden rocking chair outside.\nClose up view of a large glass of wine.\nA woman talking on a phone while wearing glasses.\nThere is a hanging clock over a set of stairs.\nAn eagle flying past a group of green trees.\nA bento box with chopsticks containing strawberries, carrots, sandwiches, broccoli, lettuce, and some other foods.\nInteriors of a kitchen containing several household items.\nA man standing in a kitchen with a large pan of batter.\nA train traveling across a snow covered hillside.\nTwo traffic sings sit above a Parking sign.\nA man riding a skateboard up the side of a ramp.\nBaseball players at the pitch playing and a crowd watch\nthis is a man picking over bunches of bananas\nA Muslim man is being interviewed on TV\nan image of two urinals inside of a public restroom\nTrains sitting side by side on a train track.\nAn empty intersection in a mountainous area.\nA group of people eating food at a table together.\nA man walking on a brick sidewalk with an umbrella.\nA surfer riding a large wave in the ocean.\nNeckties are tied together around the circumference of the pole.\nA man holding a tennis racket up in the air\nA man taking a selfie while brushing his teeth and looking in the mirror\nA person that 
is trying to get a frisbee.\nCommercial airliner flying  near mast on cloudy day.\nSome pizza with toppings and some pasta on a plate.\nan image of a rotten fruit and burnt hot dog\nA young lady sitting at a table covered in food.\nA horse standing in a snow covered field in front of some buildings.\nA group of people sitting around a couple of benches.\nA baseball player with one leg kicked up preparing to throw a ball\nAn empty kitchen is shown with empty counters.\na white bus driving in a parking lot with a truck beside it\nA toilet in a bathroom that is being built.\nA vase and lids are sitting on a table.\na small passenger plane sitting in a filed of airplanes\nA motorbike with blue and silver bones painted on it\nA baseball game with players in uniform and one player swinging the bat at home plate.\nAn empty bed with gray sheets and a small lamp\nA woman carrying a pink umbrella wearing a blue scarf.\ngroup of people on bicycles waiting at a stop light\na train going down a track by a platform with stairs\nA kitchen table that has a vase on it.\nA cake shaped like a bear has a sparkler and candle on it.\nThe large cat fell asleep in the chair when no one was home.\nThe animals look very skinny and unhealthy as they walk around.\nA couple of zebras graze in their zoo habitat.\nA woman is taking a picture of herself in the mirror with a camera.\nA grey cat is being held by a woman at a cat show.\nA bird flying over a small city with small buildings.\nTwo young children playing in a living room\nA man with graying hair looks down at a stand full of yellow bananas.\nA train on the tracks under the electrical lines.\nThis person ordered this dish at a restaurant\nThe dog is on the couch in the room with the large TV.\nA large kitchen with wood floors and cabinets\nSoldiers on a train saying, \"goodbye,\" to nurses.\nA bathroom with a toilet, sink, towel rack and paper roll.\nA blue car that is parked on the side of the street.\nA slightly knocked over stop sign 
next to a small empty road.\nA line of buses parked in a bush lot with a fence.\nA guy doing skateboard tricks in front of a crowd of people.\nGiraffe and small dog stare at each other at the zoo\nA group of guys playing Frisbee in a park\nThe retro looking living room has blue couches and pictures on the wall.\nA pile of luggage is secured to the top of a small car.\nA plate holds a large salad with broccoli.\nMany horses are on the beach near the ocean.\nsome people are walking around a city with umbrellas\nTwo ladies wearing black texting on their cell phones.\nTwo men in a park playing a game of basketball\nMetal street signs with street names and a stop sign.\nA bird is standing upright in the water and leaves.\nTwo guys playing a game on the WII.\nYoung girl perched on rock about to rcieve thrown frisbee.\na person in a living room playing nintendo wii near a window\nA tennis player is playing tennis on the court.\nA sign on the street that lets you know where you are.\nA cat is using the toilet to go to the bathroom.\na person on a beach holding a surf board\nA white and black boat traveling near the Golden Gate bridge.\nA boat sailing close to shore near a lighthouse.\nLooking up at a dirt bike rider leaping over a jump\nElephants at the zoo holding each other's trunks.\nA man and a woman that are sitting on a couch.\nA very young girl brushing her blonde hair.\nan image of a man drawing pictures on the sidewalk\nThat concrete is going to be hard on his body if he misses this skateboard trick.\nBoys sitting on a bench at a baseball game.\nA man riding on the back of a motorcycle down a road.\nA row of red fire hydrants sitting in the middle of green bushes.\nA table with a television and a picture of electrical gadgets.\nA person wiping out on a surfboard on a wave.\nA bus traveling down the street next to a bunch of cars.\nA plane makes a landing at an airport.\nA man standing on a tennis court holding a tennis raquet.\nA woman standing talking to her 
cellphone next to a man in glasses.\nA sink and tub with towels in a room.\nA soccer player runs up to kick the ball while the crowd watches.\nA pregnant belly with a teddy bear on top of it\nA small table space that is in a tiny motel room.\na glass vase with some flowers inside of it\nBlue and green passenger train passing down the side of the small valley.\nA lone zebra stands under a tree branch.\na large pile of teddy bears in many different designs\nA couple of men standing on top of a field.\nTourists photographing a steam locomotive pulling into a station.\nAn old style truck that is parked on the grass.\na table that has a banana and some ice cream on it\nA horse is being led away by its bridle\nA tall building with a massive clock on it's face.\nTwo people standing in a room playing video games\nA covered dish beside a sandwich and other dishes of food up to the right.\nOne person cutting a cake while the other pulls out slices on a spatula.\nA tie with the picture of a deer on it sitting on a shirt.\nA woman is wearing sunglasses and holding a parasol.\nGuys in the park playing Frisbee golf on a cold day.\na bathroom with cream colored walls and a broken counter on the floor\nA horse or zebra in the middle of some shade trees.\na black and white kitty laying next to a chair leg\nA light that is on a table next to a laptop.\nA man rared back with his racquet on a tennis court.\nsome stuff blended up in a blender for some serious gainz\na person at a desk with a laptop and a note book\nA man flying through the air while riding a skateboard.\na platter and assortment of different desserts and cakes\nA group of people sitting at a table around a pizza.\nTwo skateboarders doing tricks at a skate park\nA pair of skis are placed in the snow.\nCat carefully examining a skateboard on a hardwood floor.\nTwo girls in red chasing a white soccer ball.\nWater spews into the air from a fire hydrant.\nA street corner with trees that are covered in snow.\nA horse is 
running down the dirt path.\nTwo young ladies are sleeping side-by-side in a subway station.\nA picture of a man holding a  remote.\na person lays on the snow with their feet up\nA large crane sitting next to a building under construction.\nA red train leaving a train station with man watching.\nA narrow lane runs between rows of parked buses on a rainy day.\nA glass of wine and a smart phone sits next to a laptop computer.\nA little girl that is sitting on a kitchen counter.\nA hot dog, french fries, and a spread.\nTHERE IS A CAKE ON THE TABLE\nA close up photo of a train set with a little train going by.\nA zebra standing in a grassy field by a woods.\nNear a wooden bench, a baby in blue places her rubber boot upon a skateboard.\nAn old fashioned train is parked as workers gather around it.\na refrigerator with stickers on it sits in a corner in front of a window\nThe young boy is standing and playing the game.\nA family sitting at a outdoor table at a restaurant.\nThere is a truck pulling a camper trailer\nThe display of the Magic Bullet blender, with a price tag of 53.99.\nPale shelves with bananas and other items and a  black marble topped L shaped counter against a brick wall with cooktop, sink,  and various kitchen items, meet, leaving a small section of inlaid wood floor.\nSmall white bathroom with a black-and-white shower curtain.\nA green couch sitting in between two lamps.\nThe skateboarder is performing a trick, mid jump.\nA light colored dog chewing up a child's toy on the carpet.\nBoy with legs out stretched taking a jump with a skate board.\nA vintage photo of hurricane damage to boats.\nA picture of a living room in a house.\na lady with a real colorful umbrella that is standing outside\nTwo women in white tennis outfits hold out their rackets as a crowd watches.\nGirl on a skateboard texting by the beach\nBreakfast for four with omelets, fried eggs, bacon, ham, french toast and pancakes.\nA man dressed all in blue playing tennis.\nA clear vase 
holding white flowers on a table.\nA kitchen area with many copper pots and bowls on display.\nTwo people huddle on a bench under their belongings.\nA cat is sitting on a couch while leaning against the couch's arm.\nAn upwards-looking view of a Stop Sign, an All Way sign, and a One Way sign.\nA person riding on top of an elephant near a tree.\nA man walking with a dog that has a frisbee in his mouth.\nTwo men pushing a full cart down the  road\nThe street is lined up and down with motorcycles.\na close up of a person holding food\nA black horse in the middle of a field with a mountain in the background.\nA bench on sidewalk below tree next to lamppost.\nA computer desk with a monitor, phone, and laptop on top of it.\nA couple of wine glasses next to some bottles.\nThere is a small window in a stone building\nA woman in yellow raises her tennis racket.\nFlying a kite on a wide beach with few people.\nA man standing next to a hipster girl.\nA man is taking a picture in a rear view mirror.\nthere is a man flattening dough on a tray\nA group of people standing and sitting around a table.\nA blue shelf filled with Chiquita bananas in  a store.\nA man with sun glasses and wearing a hat laying on a bed.\nA man in bright green prepares to serve a tennis ball.\nHere is A tender moment among zebras this afternoon\nA Muslim lady holding a child that is being fed a birthday cake.\nA plate of food is arranged with fruit and vegetables.\nHorse and rider walking on sandy beach at ocean.\nSkateboarder and board in mid air at a contoured park.\nCat sleeping near the sun on bed covers.\na group of guys playing with the wii\nA plate of food containing broccoli,cauliflower, celery and other foods\nPuppy and full grown dog outside near some refuse\nA sleeping black and white dog wearing a pirate hat.\nA hot dog with toppings and potato salad\nThere is a large group of skiers standing on a wide field\nTwo girls walk along a path near a waterfront.\nSmall brown dog laying in between a 
person's shoes.\na beach covered with umbrellas and tourists relaxing\nA group of people flying kites under cloudy skies.\nTwo old people in motion while playing a Wii.\nA private airplane is flying in the sky.\nA woman sits in a u-shaped bench with her legs elevated\nA street with people walking about it and a kite above.\nA red, double-decker bus drives through the town as dusk approaches.\nA metal sink filled with many lemons and apples.\nA man in a baseball uniform standing on a baseball field.\nA man smiling for a photograph and holding papers in his hand.\na yellow long tailed kite being put into the air by a couple\nThree lit candles on a chocolate birthday cake.\nA baseball player pitching a ball to a batter.\nA train coming on the track in a train station\nA person is doing a trick on his skateboard.\nA player chases a tennis ball while the umpire watches.\nA person giving a thumbs up to a computer screen.\nClosed toilet and shower in small, bright bathroom.\nA zoo keeper on a scale holding a giraffe with a \"me gusta face\"\nA woman and child on a silver motorcycle.\nBrown and white cat sleeping on desk next to a computer.\na person on a snow board does a trick over a hill\nA couple of guying chasing after a Frisbee.\nA person in a suit and tie looking unhappy.\nA dark colored river with several horses on the other side near the trees and brush.\nA bicycle with a springs mounted under the seat.\nTwo pieces of pizza on a plate pepperoni.\nA decorated room with no one in it has a table in the middle with various items on top.\nTwo trains driving inside of a train station.\nA woman in a costume inspired by the White Rabbit from \"Alice in Wonderland.\"\nA baseball player in mid swing and a catcher ready with his glove.\nA man biting into a slice of pizza.\nAdvertising and traffic clutter a busy city street\na crowd of people by a school bus and a girl holding a big blue bowl\nTwo people are in front of a deck, and about to go skiing.\na kitchen with a sink 
trashcan refrigerator and a heater\na building with a clock sitting near the top of it\na man in a red and gray snow outfit stands on skis holding his ski poles as he stands near other skiers and snowboarders.\nThere are different citrus fruits in the bowl.\nA bed and desk in a small room.\nA horse that is grazing around in the grass.\nA tie rack filled with lots of different colored ties.\nA couple drinking wine on a horse-drawn carriage ride through the countryside.\nhalf empty bowl of cereal with a loaf of bread, a banana, and beverage\nA bathroom with raised shower, sink, widow and mirror.\na very large building that has a clock on top\nA stop sign out front of a construction site\nA flat screen TV mounted to a wall over a lamp.\nA car and a large truck on a city street.\nA young man swinging a baseball bat on top of a field.\na big basket of bananas next to some people\nA chocolate cake sits half eaten on a table.\na public restroom with a white toilet and toilet paper\nA red couch behind a brown ottoman with a cat sitting on top of it.\nA stove with a willet cooking banana and a moka pot.\nHot dogs and buns cooking on a grill.\nA beer advertisement on the side of a passenger bus.\na man wearing a suit and tie standing in a room.\nA group of surfboards on a rack on the beach\nA red fire hydrant sitting next to a green plant.\na surfer wearing a wet suit is surfing on a sunny day\nA plate with with different kinds of food on it.\nA man and a boy playing Wii in a living room\nA man dressed in white is on a horse.\nSOME GOOD WAVES FOR TWO SURFERS IN THE OCEAN\nA smiling man that has long dread locks in his hair.\nThe tennis player is swinging the tennis racket.\nA zebra drinks from a pool of water in a grassy field.\nA kid holds a sandwich and a big candy cookie.\nA boy leans on a counter next an almost empty soda bottle.\na skier with a red jacket is next to some water and snow\nA man is smiling while talking on his cell phone.\nA zebra is grazing in an 
enclosure while an ostrich sits in the background.\nThe single train car is painted black, yellow, and orange.\nA man standing in front of a fridge with a lot of magnets on it.\nThe interior of a public bathroom with multiple sinks.\nA man and  woman standing next to each other with the woman holding an umbrella.\nA parking lot filled with yellow school buses parked side by side.\nA kitchen with a sink, coffee pot, refrigerator and shelves.\nA person handling bread over an open oven.\nA woman with a tennis racket is running\na tray covered with cheese fries, a corn dog and a hot dog\nA train is coming down the track near old warehouses.\nTwo teams playing soccer with one team  kicking the ball down field.\nShadow from a street sign with a message written on it.\nSeveral birds overlook the skyline of a distant city.\nA woman poses with avocado sandwich lunch at an outdoor restaurant\nA living room with chairs, a table, and painted walls.\nTwo women shaking hands at a tennis match.\nA police officer mounted on a horse while two children pet the horse.\nLone giraffe lying in dirt area of enclosure.\nthere are two hot dogs on a fake paper plate\na black red and white double decker bus people and buildings\nA boy grips his skateboard as he jumps the edge of a half pipe.\nA couple of elephants roaming through the tall grass\nA close-up photo of a white and brown cow.\nA young girl standing under a window next to a toilet.\nA tennis player stands by her equipment bag holding two rackets.\nA mountain view with two birds flying overhead\nA boy and woman in an open area in shopping center with three park benches.\nA street identifier installed as part of a curb in the sidewalk.\nA large elephant standing in a grass field.\nA young man is riding a skateboard with other young men watching him.\nA view of the city is very colorful.\nThe picture shows the underside of a jumping snowboarder.\nA black and white train on tracks next to a station.\nThere is a cat giving itself a 
bath while laying on a luggage.\nA woman with a dog talking to two people sitting on a bench.\nSeveral children and some adults celebrating a birthday party.\nA black and white zebra is standing in the green grass.\na orange cat sitting on a half rotted wooden bench\nA poppy seed muffin with orange slices on a plate.\nA man on snow skis traveling on some snow.\nThe three giraffes tower over the smaller animals.\nTwo beds in a tiled room, both with lime green bedspreads.\nA pitching about to throw a baseball at a game.\nA vase that is placed outside of a window.\nPumpkins sit under a spooky lit up Halloween display.\nTwo people roasting hot dogs outside on a stick.\ntwo people on a beach with a kite\nSmall group of kites being flown nice day.\nSeveral people in the heavy snow on skis.\nA green and silver train passing by a building.\nA group of people walking down a street on a rainy day.\nSlices of pepperoni pizza on a baking tray.\nPeople standing with sheared sheep inside a fenced enclosure.\na coupe of people sit on a couch while laughing\nA young man kissing the top of a young woman's head.\nVases and figurines line a long piece of furniture next to chairs, a lamp, and a picture.\nA bird walking in the grass with it's beak open.\nA man holding a large bag of lime green luggage.\nA baby sitting on a bed next to a large brown and little white teddy bear.\nA white horse looking up for a photo at a fence side.\nA stop sign that has been tagged with graffiti.\nTwo people in a group with one holding up a phone.\na couple of people that are walking on a beach\nA man standing on dirt holding a pink frisbee\nA baby is sleeping in a swing in a room.\na close up of a cat in an open luggage bag\nMany people on the city street with umbrellas.\nA man rides two brown cows across water.\nA meal of a sandwich and soup sits on a wooden table.\na close up of a bunch of green apples\na kitchen area with a stove-top oven and sink and cabinet with a dishrack\nA woman is blurry as 
she rides her bike next to shops in a city.\nThe horde of pigeons take advantage of the crumbs left by pedestrians.\nA group of elephants is standing on grass.\na person pointing to what they are putting on their snadwich.\nA man holding a snowboard standing at the bottom of steps.\nA man walking along the platform next to a subway car.\nA train covered in blue paint and graffiti.\nA bus displays an In Service sign, traveling down a road\nA woman holding a cat up tight against her.\nA mother and her child sitting on a couch using laptop computers.\nSmiling woman standing in front of refrigerator with wine bottles on top.\nThe giraffe is posing for the picture near the wooded area.\nThis train car features a variety of colors and carries passengers.\nA zebra that is outside eating some grass.\nA young man in black clothes holding a yellow frisbee\nGroup of children sitting at table eating pizza off plate\nMuseum with ancient artifacts and people looking at them.\nSeveral chocolate donuts with decorations sitting on a pink mat.\nSeveral street signs shown on a city street.\nA kitchen that has various types of appliances.\nA group of people are standing around a caged giraffe.\nPeople play in the water and fly kites at the beach.\nA person wearing glasses is walking away from a stop sign.\nPerson bundled up out for a ski in the soft snow\na green street sign surrounded by some trees\nA city street filled with tall buildings and motorcycles.\nA traffic light is displaying a green smiley face.\nA pile of vegetables sitting on top of a wooden table.\nA marching band stands in a street in front of spectators.\nThe blue bathroom is small, sleek and efficient.\na cup of coffee a laptop and a table\nThree women on a couch talking to each other.\nA locomotive train traveling across a train trestle.\nA street sign of NE 5th st and the back of a stop sign\nA water bottle with ear buds on it in front of a laptop.\nA cat is looking at himself in the mirror.\nA girl standing in a 
room holding a green Frisbee.\nTHERE IS AN AIR PLANE THAT IS FLYING IN THE SKY\nTwo planes are flying by one another and one is putting off pink smoke while another puts off blue.\nA room with three old tubs and peeling walls.\nThe boy is curious about what is beyond the umbrella.\nThe plane is flying over the parked cars.\nA woman and a black and white dog on the beach.\nSeveral cars at an intersection on a city street.\nFemale tennis player in blue outfit returning volley.\nA bowl of a kind of vegetable stew on a table.\nAn individual snowboarding down a snow covered hill.\ntwo elephants in tall grass with trees in the background\nA tennis player prepares to hit a tennis ball, while others watch.\nOranges hanging from an orange tree in an orange grove.\nThe little boy eats a slice of pizza.\nPeople standing around the stove and counter fixing plates of food\nTwo laptops and monitor on a desk in front of another monitor.\nA pulley is seen in a room with lots of stuff on shelves.\nA fake bear that is standing in the snow.\nA pool surrounded with chairs and trees.\nMan riding a snow board down a long slick area.\nSeveral kites are flying on the beach in the blue sky.\na living room with a tv a desk and another tv\na black cat walking into a kitchen\nA group of motorcycles parked in front of a tall building.\nA giant neon Coca Cola sign glows in the stadium during a baseball game.\nA bathroom counter has purple orchids on it.\nAn orange fruit beginning to grow on a tree.\nThe city streets are busy this time of night.\nA skier flying high in the air over a snowy hill.\nPeople standing in a long line at a train station.\nA round intersection on a surburban street with one floor homes.\na couple of people that are standing next to each other\nA large group of people standing around in red, white and blue colors\nA dog is sitting on a chair near a stuffed animal.\na group of people that are walking down a sidewalk\nA man and a baby lying on the couch in a living room\na 
giraffe eating leaves from a tree with its butt to the camera\nA woman holding a racquet and tennis ball on a court.\nA man wearing a suit directs two men riding horses through a city\nA young boy touching a cow through a metal fence\nA black and white photo of a train system going down tracks.\nAn aerial view of a street corner with a STOP sign and a ONE WAY sign above it.\nTwo men playing professional soccer on a field.\nA man riding a wave on top of a surfboard.\nA woman in a dark cave holding two sheep\nThe room has two couches in front of a tv.\nThe ingredients are on the kitchen counter next to the blender.\nsome people at a table with a umbrella silverware and some drinks\nA man that is wearing a suit and a pink tie.\nA gray and white tiger striped cat sitting in front of a brickwall\nThe fried rice has vegetables and meat in it.\na bunch of people on a snow slope in the moutains\nA toilet attached to a red and white brick wall.\nA train traveling along side of a road.\nA white table topped with plates and bowls of food.\na white keyboard sitting next to a white computer mouse on a mouse pad.\ntwo stage coaches traveling down a snow covered trail\nthere is a clock on the side of the old building.\nA group of people are lined up skiing.\nA man and his son playing Frisbee in a park\nFemale soccer player maneuvering ball on grassy field.\nSkiers skiing in the snow with their skis on the ski slope.\nA man in a purple shirt doing a trick on a skateboard.\nA British Airways airplane flying in the air.\nA man is doing an upside-down flip on his motorbike way up high in the clouds.\nA man does a jump on a skateboard.\nA white Ecohopper bus driving down a street.\nA PICTURE OF A WEATHERED YELLOW AND BLACK TRAIN\nA bathroom with toilet, mirror, picture and tub.\nA hand holds an old-style flip phone in the open position.\nA boy on a skateboard at the top of a rise on a skateboard ramp.\nA large wooden clock hangs from the ceiling in a store.\nA person holding two ski 
poles while standing in the snow.\na dog that is rolling down a skateboard\nthere are many people on the road riding motorcycles\na group of baseball players standing in a field\nLarge bird preparing to fly from beach area.\nA group of brown cows grazing in a field\nA LOT OF MEN ARE ON HORSES\nA child is playing in a recreational park.\nA street sweeper machine parked against a tree by a street.\na living room with some book cases beside the fireplace\na lady wearing a red sweater with an empty plate\nA teddy bear with multiple colors with a new tag still on it.\nA man laying down in the snow with skies on\nPassengers near a yellow and blue ski airplane.\nA large display of apples at a market.\nA red couch that has a laptop computer on it.\nA bedroom that is cluttered and needs organization.\nA MAN DRESSED AS A PIRATE AT A PARTY\na young man is performing a skateboard trick\nA baseball field full of baseball players standing on a field.\nA bathroom with a sink on the left under a mirror and a toilet.\nTo bananas sitting on two blue plastic bowls.\nThe yellow commuter train is pulling into the station.\nPhotograph of a public toilet as taken from above\nA LADY IN YELLOW ON THE COURT PLAYING TENNIS\nA woman is shown holding a pizza with zucchini\nAn elegant bathroom has a light up mirror, marble counter tops and dual sinks.\nA dog sitting on top of a made bed.\nA man trying to block another man with a frisbee during a game.\nA yellow cat is sitting on a green blanket.\nThe plate of food has a salad and toast on it.\nA snow skier skiing down the ski slope.\na guy that is jumping on a skateboard\nA white kitchen with a counter in the middle.\nA blurry picture of a bird sitting on a wire.\na motor bike parked on the side of a road across from cars\nA man is looking at his laptop while chatting on the phone.\nA man that is sitting down near a bird.\na bus in a city at night time stopped\nA person jetskiing in the water and creating a huge wave.\nA TV sitting on top of a 
counter inside of a store.\nA parking meter on the side of a street\nA man leading a flock of sheep down a street.\nA bunch of stuffed bears altogether during Christmas.\none sheep is standing in some tall grass\na family is sitting down at a table to have cake\na cappuccino and a overripe banana sit on a table\nA person rides horseback down a beach along the ocean.\nSome men and women in white shirts and bow ties standing in a row.\nSkiers lined up at the starting point for a race\nA baby sleeps sitting up while clutching a teddy bear.\nMultiple computers and soldering equipment on two desks.\nThe kitchen has a stove, and a microwave in it.\nA baseball player takes a swing at a ball.\nA man making pizza in an oven on a wooden board.\nA woman sitting back on a couch holding a little white dog.\nA person standing next to a tall giraffe.\nThe interior of a modern kitchen including an eating area\nA cat sitting on top of a blanket on a bed.\nA young man in a bathroom taking a picture of himself using the bathroom mirror.\nThe graffiti on this Stop sign denotes a positive impact.\nA small pizza sitting on a decorative plate.\na bird that is sitting on a pole outsid\nA cat climbing on top of a suitcase.\nMan herding sheep down a street with a child in front of the herd\nA bathroom with a small window and a odd toilet.\nA person helping a child stand on a skateboard.\nA baby girl on table next to cake and balloons.\nA woman sitting up in bead looking out the window.\nA blue and white fire hydrant on a lawn.\nA person skiing down a snowy mountain side.\nThe purple city bus is noticeable against the brick buildings.\nA bathroom with a standup shower, toilet and sink.\nThis person is laying in bed while reading a book.\nA group of people standing on top of a beach.\nA woman standing in front of a counter full of baked goods.\nA pack of elephants stand in a grassy plain.\nSeveral people are doing something with remote controls.\nSheep are grazing in a field in the 
distance.\nThere is a surfer holding on to a sail in the ocean\nA kitchen with drawers, a stove and a sink.\nA tall giraffe standing next to a tree on a grassy field.\nSKIER COMING DOWN THE SLOPES JUST OUTSIDE THE CABIN\nA guy and a girl are sitting in rocking chairs using laptops.\nA large pizza sliced in half in a box.\na herd of zebras drinking water at a lake\nA cat laying on top of a suitcase laying on the floor.\nA pile of chicken, carrots, and brussel sprouts.\nTwo buses driving over a bridge with boats in the background.\nThe slightly overcooked pizza is inside of a pizza box.\nA box of cookies sits by a wedding cake decorated with berries.\nA laptop and an old computer display text while sitting near a window.\nAn airport with an airplane that has a red tale\na number of different doughnuts on a table\nThe couch is directly in front of a huge television set.\nA home desk has a computer, lamp, and knick-knacks.\nThe black and grey cat is facing the other way\nA bowl of vegetables on a wooden table.\na close up of a jet flying in the air\nThe man in the red cart held the reigns controlling a pair of obedient horses.\nThe plate has broccoli and an egg roll on it.\na bunch of computers that are on a desk\nthere are many surf boards laying on top of each other\nMan in yellow shirt grinding down a railing with his skateboard.\nTwo men surfing in water next to a dock.\nThree horse wearing coats walk around a large field.\nA man looking downward holding a teddy bear.\nThe seat of the wooden bench is covered in snow.\nA group of friends playing a motion controlled video game\nToothbrushes and toothpaste lay on the counter by the sink.\nTHIS IS A PHOTO OF A MAN WORKING ON SOME SORT OF CRAFT PRJECT\nA man driving a carriage pulled by three horses.\nA baseball hitter swings at the pitched ball\nA sailboat is floating on a lake under a cloudy sky.\nOne person flies a kite near a crowded sidewalk.\nFreshly shorn sheep eat grass in a mountain pasture\nA girl looks into 
the distance, while holding a clicker.\nTwo men on a motorcycle pass through a crosswalk.\nA little girl eating a piece of birthday cake at a kitchen table.\nHorse and carriage going down the street in the city.\nA ship docked at an empty harbor at sunset.\nThe cat is on the desk by the two computers.\nMan lying on a bed in a furniture store display.\nFemale tennis player in a purple uniform ready to play.\nA couple of sheep are in the grass by a barn.\nA bench sits in the sun near a path and some water.\nTWO POLAR BEARS IN THE POOL EACH ONE HOLDING SOMETHING ORANGE\nMale and female intent while attending a function.\nA surfboard standing in the sand near trees and the water.\na few small boats in a large body of water\nA plate that has several sandwiches on it.\nsheep standing next to building near a city street\nA bathroom with a sink, mirror, and toilet and other items\nThe man is driving the bus full of people\nA cat dosing off while lying on a chair.\nScreen of an iPhone with German language text held in a person's hand.\na baseball player swinging a bat on the field\nA kitchen with a pull out ironing board and refrigerator.\nA street sign on the side a of cement wall.\nsome parked bicycles and two women on a bench and a book\nA hideous bathroom that is pink in theme.\nA man is playing with a frisbee on the beach.\na tall clock tower near a building with a dark background\nA line of girls holding frisbees or plates outside\nThere are two streets signs attached to the stop sign.\nTwo red and white stop signs on a street.\na decorated vase is sitting on the table top\nthere are many people that are sitting on the benches\ntwo trains on opposite sides of a railway platform median.\nA girl on a surf board riding a wave in on the ocean.\nA man wearing a blue shirt while eating a hot dog.\nA parade float with people on top of it\nA chair and contraption between a grandfather clock and a plaque on a floor.\nThe surfer is on the surfboard riding a wave.\nA large clock 
tower on the side of the water.\nA geese and several goslings in a pond\nA zebra walking in the grass while other animals are standing around behind him.\nA picture of a snowy street with a red fire hydrant.\nTwo cars parked in the grass as a train goes by.\nA bird sitting on a bird feeder next to green trees.\nA bathroom is shown in dim orange lighting.\nthis lady is using controllers and those men are watching\na toy train on train track next to a toy railway platform.\nThere's a computer monitor on a desk with speakers around it\nA photograph of Key Bank with a clock under the sign.\nA bicycle is parked in the narrow alleyway.\nA double decker bus drives down the street.\nA horse grazing in a pasture in a field, with mountains in the background.\nA girl that is standing away from the camera and has a Wii remote in her hand.\nA skateboarder performing tricks under the lights at night.\nA large red bed with a black cat laying on top of it.\na black dog sitting in a white bathroom\nA man standing in front of a train car door.\na person sitting on steps with a cell phone\nA man who is holding up a parachute.\nA man using a One Laptop Per Child computer, while another man uses a standard desktop computer.\nAn open cell phone next to a sprouting sunflower seed.\nA young man jumps up to catch a Frisbee underneath his legs.\na big zebra that has his mouth on top of it\nA man is sitting on the couch eating.\nVarious people eating in a restaurant at a table.\nA man with a tennis racket at a tennis court.\nBright yellow furniture sitting in a living room next to a lamp.\nA bathroom is shown with a door cracked.\nA man without a shirt is brushing his teeth.\nA man playing Frisbee on a beach on a cool morning.\nA woman is next to a scooter and cat.\nA small boat with people on the top.\nThree boats sit on dry land, the nearest one is called Lauren Jade.\nThe motorcycle officer wearing a helmet drives near a crowd of people.\nPeople making a for sale sign on a car.\nA man 
riding a motorcycle down a street and surrounded by houses.\nA toilet bowl with a bucket and trash can by it.\nA brown vase full of colorful flowers in front of a mirror.\nA group of giraffes stand in a large open field.\nneatly made bed with blue sheets in a pink room\ntwo bikes parked near a clock pole on a side walk\nA couple of boys wearing ties giving each other a hug.\na living room with a table chairs and a tv\nA tennis player at the match is returning a volley.\nThe last car of a train sits on train tracks.\nA bathroom with vanity with sink, toilet and tub.\nTwo men toilets, one regular toilet, and a sink in a bathroom\nA chair sitting next to a flat screen TV.\nA group of zebras are next to a patio table.\nA green street sign mounted to a white street light pole.\nA child posing on top of a mountain while they ski.\nA herd of zebra walking across a dry grass field.\nA boy in a tie poses for a picture.\nA small yellow room with a couch, table and lamp and wood flooring.\nElephants gathered in the corner of an enclosure\nA long train coming down the railroad tracks.\nLion statue with a large structural clock in the distance\na lady that is holding a laptop sitting by a street\nA woman is holding a racket on a tennis court.\nThree dogs are following three women toward the entrance of a building.\nCattle grazing in partially snow covered ground in winter.\nAn anniversary cake on a table with a picture and glass of wine.\nA young oerson is raiding a small fridge in their room.\nA pesto and chicken pizza cut into eight slices.\nA green double decker bus sitting on top of a parking lot.\nAn old bench on a porch of someone's house in the valleys.\nSome cars at a red light at an intersection stopped.\nA model kitchen is shown with white appliances.\nA large blue bus on the side of a road.\nA group of motorcyclists fly the Puerto Rican flag.\nA woman and a man look to the left while the woman points\nA photo dark room with the red light on.\nA giraffe standing near 
a tree by a body of water.\nA woman and small boy feeding some sheep\nPlates of food cover a table and includes vegetables and potatoes.\nThe bird with the purple feathers is perched on the branch of the tree.\nYou are proudly witnessing a 360 Ollie in progression\nA large kitchen with a table in the middle\nA cat is standing on top of a TV trying to look out the curtains.\nA pizza sitting on a table, with a spatula in the back.\nA woman pushing a stroller and looking at a cellphone.\nA cat laying in front of a computer next to a mouse.\nA small blue and white plate sitting on a small runway.\nA fire station on a street in a downtown area\nA large white airplane parked on a runway.\nA person is holding a tomato above a tray.\na smiling woman standing next to a baby in a high chair\nthere is a female tennis player serving the ball\nTwo baseball players and an umpire standing at home base.\nShe is going to nail that tennis ball.\nA view of a train station from the parking lot.\nTwo white cattle standing in water next to some ducks.\nA bathroom mirror over a marble sink with the lights turned on.\na lady with a knife laying down in a bed.\na bedroom with some posters a blue and white bed and some pillows\nThe restaurant platter piles  french fries high  with a juicy burger.\nMan prepares to throw a frisbee in an open park.\nA plate of food with a pizza on it.\nA view ofa  bar from behind the actual bar.\nA group of children playing baseball out side.\nThe commuters are busy while they wait for their plane.\nA few apples and a banana sit in a dark bowl.\nA person on an ocean beach flying a kite\nA train is traveling past a grassy area with a foot path.\nA couple of kids petting sheep inside of a corral\nThe sheep and the dog are on a race.\nA group of people sit at a table with cake.\nA man riding a board on top of waves in the ocean.\nA surfer catches a wave on a white and green surfboard with another surfer in the water behind.\nVery long Coney dog on a long buffet 
table in a ball room\nThere is a statue of a man's head next to a cat.\nA two sided pizza is being cut by someone.\nA jar filled with liquid sits on a wood surface.\nGroup of people riding their bicycles on a city street.\nA worn and tattered pink and black bag.\nA young giraffe leaning over a tall bush in a dry field.\nA large table with a laptop and home computer.\nA man dressed like Darth Vader is standing in a white bathroom looking at himself in the mirror.\nSmall dog in street next to a skateboard.\nA small sofa and coffee table in an apartment living room.\nTwo females are walking down the street wearing boots.\nFemale tennis player in the motion of hitting a ball.\nTwo street signs show an attraction and street name\nA bird perched on a log with a house in the background.\nA food cart with trays of food on the shelves\nPerspective-corrected photo of a large masonry building under a clear sky.\nMan cross country skiing with a yellow lab.\nPeople crossing the street and walking on the sidewalk in a city.\nThe black and white dog is lying beside a stuffed bear.\nA guy standing in a living room holding a controller playing a video game.\nA cow laying down in the sand on a beach, with the water in the background.\na bowl of food next to a keyborard\nthree guys sitting down eating sandwiches and smiling\nA group of people on surfboards in the ocean.\nA gang of bikers driving down a city street.\nMan standing in front of a parking meter holding a folder.\nPeople wind surfing on the water near a suspension bridge.\nAn old picture of a twin bed and radiator.\nA television, couches, table and a remote controller.\nA girl is flying a kite on a clear day.\nA room with a tile floor containing furniture. 
a staircase and people.\na parked van with graffiti painted all over it\na young person standing on a chair in a kitchen cooking doughnuts\nA small white cow and a big black cow walking in an empty field.\nA woman in a dress and Mary Janes bends down towards a Frisbee in a fenced in yard.\nThree people skiing together on a path carved into a hill\nA person wearing a red tie pointing to it with both hands.\nA person with six snapshots making a call and taking a beer\nA bathroom is decorated with white tiles and white towels.\nTwo woman standing in front of a mirror near a sink.\nA man in yellow shirt and black shorts playing frisbee.\na couple standing in front of a wishing well.\na close up of two people walking close together\nA very tall chicken standing next to the ocean.\na little boy playing a game on the television\nPassengers are standing in a line in front of the door of bus.\nA herd of cows graze in a field behind a wire fence.\nA couple of girls with tennis rackets in a room.\nA person in pajamas laying on a bed reading book.\nBatter at  baseball game waiting to hit the ball.\nA picture of a cat that is looking out a window.\nA man and woman cutting a white sheet cake.\nA bench right next to some tall grass at the edge of a body of water.\nA man is in the ktichen and the living room is painted blue.\nthe people are watching the animal drink water\nFour cell phone on a wooden table with their screens on.\nA kitchen with a red stove top under a framed picture.\nchopped onions sit on a cutting board next to a glass of wine\nA bicycle parked next to a lake on a cement floor.\nThere is a sheet of stickers that go on a keyboard.\nChildren learning to make their own kites.\nA black bear perched on the top of a fence.\nA woman in a robe is using a mobile device while holding a cigarette in front of a garage door\nTwo twin sized bunk beds in a room\nA young woman in the water wears a life vest holds a water ski.\nThe intersection at Durham Court with forest in the 
background\nPots being displayed at some sort of exhibit.\nA couple of red traffic lights next to a forty sign.\nFive men are around a table with food on it.\nA red train passing by bushes and a road.\nSome people with cowboy hats riding horses on a trail.\nA batter swinging at a pitch at a baseball game with a runner on first base.\nTwo cows stand in a pasture eating grass.\nA stuffed animal is smiling while sitting on a bed.\nA small white cat sitting on a  ledge.\nA group of people in boats on a river.\nA surfer on a surfboard riding a wave.\nCat covers it's face while sleeping by the window\nOnly one slice left of a fruit pie.\nThe clock is on a brown stand with a wall behind it\nA man on skis sitting near the mountains\nA man riding a snowboard down a snow covered slope.\nAn old couple is sitting down on a bench together.\nA skateboarder is mid-air doing a trick on their board.\nClothes hanging on a rope over an unfinished patio.\nA plate with chicken, broccoli and mushrooms with a bit of gravy.\nA group of people pose for a photo at an event.\nA baseball player taking a swing at a ball\nTwo kids in bunk beds reading while laying down.\nA sign warning of snakes in the area stands on a pole.\nA bathroom decor is in shades of browns.\nA biker has his young daughter on the bike\nA couple sit together for lunch on a street bench\nThe four engine airliner sits on the tarmac on a cloudy day.\nA sailboat in the water with the docks in the background.\nA photo of bananas, mangoes, and oranges in a pile.\nAn airplane is shown taking off into the sky.\nA tow truck carrying a bulldozer on a trailer.\nA half eaten doughnut sitting on the side of a road next to a  truck.\nA lone zebra standing next to a tree in front of a fence\nFive wine glasses sitting on paper on a table.\nAn amazing lunch spread with a beautiful salad, peaches, tomatoes, and sandwhiches\na group of people are traveling down a paved road\nA group of people with umbrellas stand in the road.\nChild 
watching kite as kite is flying in the air\nA man riding a paddle board down a river next to a lush green forest.\nA school bus waiting at a traffic light.\nBushel baskets full of vegetables at a market as shoppers walk by.\nA polar bear plays in its habitat next to a yellow traffic cone.\nA set of traffic lights over a busy road with cars.\nTwo guys passing each other on a tennis court holding rackets.\nA person walks on a platform next to a passenger train.\nA pizza with lots of mushrooms is seen here.\nA person is laying in bed reading a book\nTwo birds standing side by side on a branch\nTwo men in baseball uniforms stand on the dirt.\nA wireless keyboard and mouse are on the table.\nThere is a close up view of a giraffe.\nthere is a very high mcdonalds sign on this street\nA group of people are standing around holding video game controllers.\nAn apple, watermelon and bananas are setting on the table.\na ship sitting out on the ocean not moving\nA large metal tray of rice and some vegetables.\ntwo men and a woman stand by a fence and pet a elephant\nThere are three birds by the grass by the water.\nA woman sitting at a desk pretending to converse with a teddy bear.\nTwo men skiing downhill next to each other.\nSeveral snowboards with people on them located in the snow.\nA Stop sign and other street sign on a road\nBedroom with a bed, dresser, and small picture hanging on the wall.\nA person in action on a field with some people watching.\nA loft bed with various stuff being stored underneath it.\nA grey vintage truck on street next to a house.\nA large orange truck parked next to a woman.\nTwo boxes that have a dragon on the lid are filled with food.\nA crowded street and sidewalk on a city street.\nWoman in a living room with large screen TV and cloth-draped furniture.\nA dog resting his head on the side of the boat looking out at the water.\nA large cruise ship is traveling on the ocean.\nA girl lying on a bed looking at the camera\nA man is holding a piece of 
food with chocolate in it\nA skateboarder dressed in pink and black at night.\nA man is snowboarding off of a hill in front of a crowd.\nFour sheep watching a dog peek through their fence.\nA show shining station with a pair of boots on it.\nA man standing at a train station near a pile of luggage\nA group of people sitting down at a table to have a meal.\nA city street with lots of blurry traffic on top of it.\nA group of people standing next to each other.\nA group of stuffed teddy bears sitting on top of a counter.\nA group of birds that are standing in the sand.\nA large Cathedral like church with a clock tower and people at the gate.\na close up of an electric blender on a counter\nDigital painting of a tabby cat and large dog touching noses.\nTwo cows behind a fence on a farm\nTwo brown and white horses in an enclosure.\na cookie being held up by a woman\nA group of people standing around a man with a cop in front of him .\nA bathroom stall with a small trash can and a chair.\nA Kinnaird street sign and Stop sign with the word Art in yellow painted on it and houses in the background.\nA bed in a room that has a window open.\nA group of people waling across a cement covered round.\nA bed in side of a room with a small white mattress.\na black and white photo of a boy and girl walking a horse\nPassengers waiting patiently for their flight at the airport terminal\nA man and a cat sit on a sofa.\na sheep is walking around near a tree\nA bunch of giraffe hanging out together as a pack in the outdoors.\nA bathroom showing toilet, sink, and shower\nTHERE ARE A LOT OF PEOPLE WALKING AROUND WITH KITES\nA woman brushing the teeth of a toddler.\nA salad with lots of different greens covered in sauce.\nA skate board rider flying off a ramp in a skate park\nA young man playing on a skateboard at a play ground\na living room with red walls a chair and a television\nA picture of a fire hydrant next to a plant.\nA white refrigerator with the door open with a small amount of 
food in it.\nA beagle is sitting in a chair with arm propped up the way a human would sit.\na person riding a skate board on a city street\nA bathroom with four urinals and a drain on the floor.\nAn old motorcycle rests near a rundown building.\nA young man is doing a trick on a skate ramp.\nSign with the number \"eighty\" set against bright blue sky.\nA young boy playing whiffle ball in the grass\nThe contents of an open suitcase scattered on a table.\nA large display sign outside of a ski resort.\nCrowd of people in a field flying kites.\nGreen street signs sitting on the side of the road.\nA woman lies in bed reading a book, and petting a cat.\nTwo large elephants walking across a shallow body of water.\nA vintage photo of a city bank branch.\nThe cat is laying down while someone rubs it's head\nA smoking jet going straight up in the sky.\nA baseball player getting ready to hit the field.\nThe table is littered with a number of typical office items.\nA Dominos Pizza with pineapples on the pizza on the table\nA young man holding a piece of food in his hands.\na man with a tennis racket in his hand\nA  white microwave sitting on the ground outside\nA herd of animals grazes in a field while a zebra nurses its foal.\nA group of two people waiting to cross the street under an umbrella.\nA man holding a pizza on top of a pizza pan.\nThe man and woman are talking in the kitchen.\na man in a tie and a suit is indifferent\nA doughnut that has several bites taken out of it.\nA bus parking lot area with several buses parked and one multi level bus driving.\nA man is swinging a tennis racket at a tennis ball.\nA beautiful dinner of authentic pizza with fresh bread, a plate of mozzarella and tomatoes and a lovely red wine.\nA cute little girl sitting on a bench alone.\nTwo adult elephants interacting near a stand of trees.\nLittle kid leans against the gate in front of train\nBlurry silhouettes of people and a horse against an evening sky.\na polar bear pokes his head and 
one paw out of the water\na young person wearing a shirt and tie\nan image of a man eating a slice of pizza\nTwo rows of teddy bears of various colors and sizes.\nA man and woman wearing tiara while sitting at a table.\nThis toilet sits in a stall in a public bathroom\na skate boarder performing  a trick while others look on\nA cat laying on top of a wooden computer desk.\nTwo street signs atop a stop sign under a clear sky.\nA bathroom that has a couple of toilets, but no stall door for them.\nA man wearing glasses skiing during the day in the snow.\nA simple bathroom features standard toilet and tan sink with dark wood cabinet.\nFour zebras, two warthogs and a giraffe in an open field\nA tv sits enclosed in brick outside on the street\nkids watching a smiling woman milk a cow\nBoys play soccer in sand in front of a crowd.\nA kitchen includes a refrigerator, counter, and sink.\nON ONE SIDE OF THE PARK BENCH IS TREE DOGS SNOOZING\nA bird statue sitting on a bench in a library near bookshelves.\nA POLICE OFFICER IS SITTIGN DOWN TALKING\nA suit case filled with a magazine and a pair of shoes.\na man running on a tennis court with a rackett in his hand\nA MAN IS SWIMMING IN THE OCEAN WATER\nThe three men are walking down the road together.\na small red train is parked at the station\nTwo people walking and holding umbrellas over their heads.\nA man riding a skateboard prepares to roll down a ramp.\nTHERE IS A WOMAN SITTIGN AT THE TABLE WITH HER LAP TOP\nA kitchen that has white cabinets and drawers.\nA square in the city occupied by people.\nSunlight bounces off the green wall in the den.\na passenger train sitting by a platform and a fence\nA bathroom sink with all the usual toiletries on it and a hand towel hanging by it.\nA bathroom with a colorful rug, white towels, and a picture on the wall.\nA group of chefs standing in a kitchen preparing food.\nSome people sitting in the grass leaning against some wooden rest.\nA man hols a surfboard as he walks a beach 
alone.\nA person that is wearing headphones and glasses.\nA lot of ties are being hanged on the rack.\nAn older large green and yellow trash truck driving down a busy street.\nA pack of elephants are walking through the terrain.\nA herd of cattle in a field covered with snow\nA propeller plane that is flying in the sky.\nA man standing next to a motorcycle on a street.\nblue car wrecked against bus trying to before them\nA white cat holding  a wooden baseball bat.\nColorful toys in front of a cell phone rested on its side.\nA hot dog on a bun with mustard\nTwo giraffes standing by a tree with a forest in the background.\nA red double decker bus is parked on the street.\nA red, blue and silver motorcycle parked on the street.\nA man in a tie getting up from a meeting desk.\nA view of a toilet from the adjoining room.\nOn a beach, there is a clock in the middle of the sand.\nA surfer is riding a medium sized wave.\nA pizza has red and green peppers embedded in the cheese.\nA zebra at a zoo stands alone looking at the ground.\nA little girl in a red dress with a red flower in her hair standing at a sink.\nA wood deck table has a glass of ice tea and a plate with BLT on a sesame sub roll and green salad on it.\nA plane is flying through a cloudy sky\na herd of zebras walk in a caged area\nA man fixing a street sign on a raised up ladder.\nA person water skiing falls in a lake.\nA large group of doughnuts sitting on the table.\nA skier stands outside in the snow on their skis.\nA tabby cat sitting under the back of an old blue car\nSeveral toilets some without lids are sitting on the ground outside.\nAn adult and young zebra standing in a field of green grass.\nThe little girl whose name is Violet, is fast asleep in her bed\nA white and black cat sniffing a banana on couch.\nA man is holding a banana in front of his face.\na green field that has a man with a kite\nTwo dogs running and playing in the sun.\nA large metallic refrigerator freezer combination in a 
kitchen.\nBrown dog sleeping on a bed in a bedroom.\nA sitting room with three chairs a settee a sofa and a fire place.\nA person on a motor cycle in the street with blurry buildings behind them.\nCloseup of a corner of a metal tray containing three hotdogs.\nThe large grey sofas have throw pillows on them.\nThe picture shows a snow skier skiing down the hill.\nThe large cow is wearing a blue tag around it's neck.\nGroup of people standing outside a farm holding vegetables.\nA man riding a bike past another man without a shirt.\nA man in black jacket riding on a motorcycle.\nA zebra wagging its tail as it eats some grass on the ground.\nMan wearing glasses brushing his teeth in bathroom.\nTwo brown horses tied up at a post.\nA child on a surfboard floating in the ocean.\nThe glow from the lights are super blurry.\nA tall man eating and drinking next to a lady\nA surfer is riding a wave in the ocean.\nA plate with chicken,carrots and mashed potatoes with silverware.\nTwo men cooking food outside with jars of food behind them.\nA child tries to catch a frisbee in a park on green grass.\nA sandwich is cut into triangles and served with a salad on the side.\nA man who is going down a hill on snow skis.\nA time-lapse photo of a guy doing a skateboarding trick, jumping over a curb.\nA baby elephant standing under an adult elephant.\nLady posing with two horses standing on a street.\nThe fire engine is ready for any emergency.\nA group of people riding on the backs of elephants in a river.\nA man on a snowboard in the snow.\nA red stop sign with two green street signs posted above it.\nHe needs to rethink his choice of shoes for riding a motorcycle.\nA horse is standing by a wire fence.\nA bottle of water sits on a table next to fruit.\nA man playing tennis as people sit and watch from the stands.\nA selection of wooden kitchen tools on a counter.\nA stop sign on an empty, foggy street.\nA black cat relaxing in a cat bed on the floor\nSome animals are walking on the 
street and next to the car.\nThis is an image of the inside of a home with lots of pictures on the walls.\nTwo stop lights mounted on the same pole\nA man and woman stand with bikes in front of a field.\nA kitchen has a refrigerator and ice chest.\nThe view of the headlights, handlebars and mirror of a motorcycle\nA man is riding waves with his surfboard.\nA group of young students eat lunch in the classroom.\na very large teddy bear that is sitting on a chair\nAn old man sitting next to a graffiti covered wall while holding a music keyboard.\na pizza covered with assorted peppers on it\nA bus at a bus stop sports a bicycle rack.\nTwo small black bears walking through a grassy area.\nA man rides his motorcycle through the water on the beach.\nA fire extended hose for fire hydrant in rural area\nTwo children playing with a toy in a park.\nTwo photos side by side of fruit in a basket, vegetables and basil.\nA woman rushes with a handbag through an empty train station with a large clock.\nA man walking with a skateboard towards a concrete ramp.\nA large tub is in a beige tiled room that has two windows and one window is white while the other is brown.\nA kitchen with a magnet-covered refrigerator and a pile of junk nearby.\na clock attached to a tree in front of some buildings\nAn empty room with a light is currently on.\nA girl wearing a pink cap riding her bicycle.\nA large white polar bear walking near a building\nA person doing a trick on a snowboard off a hill.\nA man flying a rainbow kite in a clear blue sky\nA lone shorebird standing on the beach as a wave rolls in.\nA man on a cell phone taking a picture of himself.\nFour jets in the sky at an air show.\nA man in a suit stands at the podium and speaks.\nA man walks on a snowy trail in skis.\nA herd of deer and a single zebra in a field.\nA dark brown giraffe leaning over the short fence of an enclosure\na old black and white photo of a construction truck\nA black streamlined train pulling into the 
station.\nFour motorcycles are parked by the side of the road.\nThe man holds the umbrella for the woman as they walk through the wilderness.\nA train drives passed a station as another pulls up to the platform.\na couple of different pizzas on a counter top\nA man on a bike in the reflection of a car mirror\nA man on a tennis court is playing tennis in front of a crowd.\nSurfers bring their boards to the water on a crowded beach.\nA bus driving down the road near a church and traffic light.\nA PICTURE OF WAFFLES BACON EGGS, AND JUICE\na batter, catcher, and umpire on a  field during a game\na close up of a person holding a call phone\nA fridge that is halfway open during the night.\nA clock sitting in the middle of the city, in front of a building.\na couple of zebras stand next to some horses\nA group of people looking at an elephant.\na couple of indian men riding down a road on elephants\nthere are two airplanes  that look old hanging and one looks spaceship like\nA group of people cross country skiing in forest.\nA picture of someones meal being served on a plate.\nA herd of elephants walking along a lush green field.\nMan holding a surfboard by the beach in his hands.\nA young man catches a wave on a surfboard.\nA wooden table with an empty pizza box and napkin.\nA man in a green hoodie preparing to snowboard.\nA woman wearing medieval clothing with a cell phone attached to her belt.\ntwo women and a man holding a big white surfboard\nA man that is sitting down holding a sandwich.\nA group of friends posing for a picture together next to a pizza.\nAn orange truck parked next to a pink truck in a  forest.\nA guy on a white and orange surfboard catching a wave.\nA red plate topped with a cut in half pizza with an egg on it.\nA giraffe inside an enclosure with families watching in the background.\nA woman is dressed as a man and a man is dressed as a woman.\nthere is a man that is taking a picture of another man\nThree zebras running along a path in a 
field.\nPair of zebra standing in open area of grass and trees.\na small child sitting on a women's lap at a dinner table.\nA living room features a white couch and black loveseat.\na monorail going down the track as a bus parks by the side of a road\nA white fire place sitting below a giant clock.\nthree cooked dishes positioned on a wooden platter\na plane flying by a red sky during the sunset\nSmart phone sitting in a red case being hand held by someone.\nA lady in a red shirt shows a man how to use a video game controller.\nA fancy clock face is flanked by two angel statues.\nA cookie is sitting on a plate next to a cup of coffee.\nA woman sitting down holding onto a fork.\nA man skiing alone in a snow-capped bush\nA woman wearing skiis while riding a conveyor belt outside in the snow.\nA GIRAFFE STANDING SURROUNDED BY TREES LOOKING TOWARDS CAMERA.\na lone zebra stands just before a small body of water and looks down\nA couple of sheep standing on top of a lush green field.\nA sign on the side of a building for the business of Tomasino's Cellar Ristorante.\nA garbage truck travels under a stop light.\nA shower with a curtain stands next to a toilet with the lid open.\nA cat is rolled on its side while napping.\nThis is an image of a patrol boat in the ocean.\na man on a surf board rides on a big wave\nAn anime action figure doll on a computer\nA person in a baseball uniform holding a baseball bat.\nA group of men standing next to each other.\nThe sign in front of a French bar which indicates the location of the bar.\nA person is riding waves on a canal.\nA man in a green shirt is wearing a Christmas tie.\nTwo plates with sandwiches on them next to a bowl of vegetables.\nA white horse leaned over eating something in a corral.\nmany difference stuffed animals on a shelf on a wall\nBaseball player standing near home plate in stadium.\nA stop sign over a pedestrian crossing sign.\nTwo horses that are standing in the water.\nTwo eldery people are wnjoying the view 
of a lake in this park\nTwo people in the living area of an RV.\nA very large bathroom has a two toilets and two sinks and a very large glass bath tub sitting next to a glass shower.\nA photo of a man standing with a ram.\nThe young men are playing a baseball game.\nA giraffe with its head cocked walking about a sandy area.\nTwo giraffes are standing near each other in a field.\nsome people are sitting in front of desks\nA man skateboards in a parking lot while his buddies watch.\nA table set with plates and a cat.\nThe fragment of the burned plane rests on the ground.\nA large bird in the air over a heavily forested area.\nA man is surfing on his board in the ocean.\nA family watches television in a small living room.\na man holds a glowing item while in the dark\nThis person is preparing a  meal in the kitchen.\nBroccoli, carrots and a small amount of potatoes on a plate.\nA mix of beef and broccoli stew on a white plate.\nA flat bread pizza topped with green peppers, onions, and tomatoes.\nA large clock is posted above a turquoise rail.\nRoom with a bed and a chandelier and double doors.\nA red fire hydrant sitting in the grass near water.\nA stuffed animal sitting in a Christmas tree.\nAn unmade bed and a turned on lamp.\nAn old man wearing a hat with a snake around it and a cellphone clipped to it.\nTwo children with tennis rackets hold their hands up.\nA black dog standing on its legs and holding Frisbee in its mouth\nA man standing in front of a microphone.\nWoman sitting on floor next to commode with glass bottle on floor.\nA heavyset adult is outdoors and is wearing sunglasses.\nA baseball player preparing to hit the ball thrown by the pitcher.\nItems of fruit and flowers on a wooden surface.\nA bed topped with two red pillows and a head board.\nThe cat is standing on top of the microwave that is on top of the refrigerator.\nA woman puts her head in an oven.\nSheep gather in a grassy field in front of a lighthouse.\nA dog sitting on a couch under a 
blanket.\nSeveral species of animals grazing in grassy area.\nA young person wearing a jacket travels swiftly on a skateboard.\nA dog in a bathroom tears up a roll of toilet paper.\nThe women was playing tennis on the court.\nA guy sitting at a desk with a nice monitor by a window.\nTwo dogs are sitting a neatly made colorful bed.\na plane on the air flying very high\nAn old-fashioned safe and roll top desk in a green room\nA surfer is riding on a wave in the sunshine.\nToothpaste,toothbrush,mouth rinse,tongue cleaner and other mouth cleaning things are kept.\nPeople talking in a kitchen with a mixer on top of refrigerator.\nAnimals eating at the side of the road near mountains.\nA living room with a couch, television, and a colorful rug.\nA herd of elephants walking down a dirt road.\nA stop sign in the desert near an empty road\nA girls' soccer team poses with their coach for a team photo.\nA blue plate topped with bread and a salad.\nA hipster couple is giddy at a wine tasting.\nA cabin in snow with people around it.\nMany people are outside celebrating on a sunny day.\nTwo cats that are sitting in the bathtub.\nA book with a train on the cover near a keyboard.\na person is sitting on a park bench outside\nShelves in a dorm room, with knickknacks such as a photograph, a lamp, and a lucky cat figurine.\nA sculpture made up of several traffic lights.\nThe small kitchen has a black counter and wooden cabinets.\nSome type of wooden shower in a bathroom.\nA man stands by as a girl feeds an elephant\nA laptop sitting on a small black desk.\nthe fully furnished basement looks clean and orderly\nA doll with large eyes and blonde hair holds a teddy bear.\nA kitchen that has a tea pot on the stove.\nA small house with a large tower and a walkway leading up to it's door.\nTWO GIRAFFES GRAZING IN THE TREES DURING THE DAY\nnine blueberry muffins in a muffing tin\nlandscape of water with mountains on the horizon and a cloud filled sky\nA baby laying on its belly in front of a 
laptop.\na goose is standing by a body of water\nthere are three people sitting at a table holding up pizzas\nA family gathered around an outdoor table with drinks and menus.\nA dog catching a frisbee midair as his trainer prepares to toss another.\nA young person on skis lies in the snow\nA pair of scissors on top of a piece of paper on top of a rock.\nJet plane flying high in sky on partly cloudy day.\nA person flying a kite near a basketball hoop\nA pink and white laptop and three computer monitors on a desk.\na white bowl and a blue strainer and some bottles\nA small hotel bathroom has been well stocked\nA city bus parked by the side of the street.\nThe parking meter is empty by the building.\nA man flying through the air on a  skateboard.\nA child in a colorful airplane tie standing against a wall.\nA pitcher on the pitching mound in a \"after pitching\" position.\nA young girl blowing out candles on a cake.\nA person standing on a surfboard in the water.\nA man swinging a tennis racquet on a court.\nA young man wearing a dress shirt and a tie.\nA woman carrying a surfboard on top of a snow covered ground.\nA train pulls up to a platform with a line.\nAssorted flavored donuts being grabbed by multiple hands.\nA man in a safety suit walking along the edge of a dog where a cruise ship is docked.\ntwo benches placed on a snow covered land\nSome people at a table with some nice desserts.\nYoung boy with stuffed toys lying on bed.\nAn airplane is in the shallow blue water.\nGuy and his small dog out in a motor boat amongst bigger boats\nThe moon overlooking the boats in the harbor.\na dinner plate with steak, vegetables, and a baked potato\nThere is an old fashioned blue refrigerator and ice chest in a kitchen.\na desk some books a speaker and a video game system\nA hotel room with a bed, desk and chair.\nSome people with rackets on a tennis court.\nA small animal, maybe a baby sheep, is outside.\nA bunch of fresh produce sitting on a paper towel.\nDecorated living 
area with desk and cabinets with television.\nSkateboarder grinds along planter in an outdoor plaza.\nA group of snowboarders glide on the snow as a large snowy mountain stands in the background.\nA woman walking her bike on a busy sidewalk.\nA red traffic light sits on the street.\nA man is riding an elephant that appears to be playing basketball.\nA large grassy field filled with grazing cows\nA bus with three people getting out of it.\nA bird that is perched on some vines.\nA young male baseball player is about to swing for the ball.\na man in a uniform standing on a pitchers mound\nThe plate is full of pizza with chicken and vegetables on it.\nA group of teddy bears with princess crowns on.\nA young man and women in a very short skirt and heels.\nFour men standing next to a small airplane.\nA plate topped with a donut next to a cup of coffee.\nA refrigerator that still has its sale tags on it.\nA couple of men riding on the back of an elephant.\nAn elephant standing alone in a wooded area\nA red stop sign near two large buildings.\nMulti-colored patterned pillows on top of a white in an empty bedroom.\nA baseball player is getting ready to hit a ball.\nA stop sign below a lamp post at night.\nA male officer and another man looks at laptops\nThere is a full view of an outdoor area and it is nice.\nA man on a tennis court with a racket in his hand.\nA tall multi story building painted with colorful designs.\nA plate with a very big and tasty looking sandwich.\nA moving truck filled with furniture parked on the side of a road.\nA plate with a sandwich on it and several pieces of silverware on the table.\nA kid standing in the batters box, preparing to bat.\na bedroom with a circle purple bed with a view of a tv\nA very odd shaped but pretty style clock.\nThree elephants standing on a stool with woman sitting on their necks.\nMan and woman standing under a red umbrella.\nA stop sign obscured by the brightness of the sun.\nA young zebra sucking its mother in the 
wild\nThe stop light has various blue directional signs,\na tennis player wearing a red shirt  is playing tennis\nA man standing on the side of a court holding a microphone.\nan image of a man going on the ocean waves\nA statute built into the side of a building.\nPink flowers sitting in a flower pot full of water.\nA furnished doll house with stairs to a second floor.\na small bird stares out of a window looking at the outside\nA man in a reflective vest walks toward a parked airplane.\nA cage filled with candles sitting on a table next to a vase and another candle.\nA yellow banana sitting on top of a table.\nA young zebra is between two larger zebras.\nA beagle pads away from the camera across a reflective surface.\nMan walking on a sidewalk that is sloping downhill approaching the corner.\nDesktop computer setup with ergonomic keyboard and headphones.\nA down hill skier racing down the slopes in a blue ski suit.\nA little boy playing, eating and shopping while in a shopping cart.\na man holding a white umbrella in a wooded area.\nA young man stands on a skateboard on a sidewalk.\nA couple of giraffe sitting on top of a lush green field.\nA street sign and some cars next to a building.\na lady taking a picture of a red bus\nSeveral remote controls lines up next to each other.\na woman reading a book with another woman standing right behind her with an umbrella\nA hot dog wrapped in tin foil covered in ketchup  relish.\nClose up back and back of head of a cat in dark with two rectangles of light on ground in front.\nA pile of vintage suit cases in the middle of a building.\nA sandwich and a pickle with a bowl of food on a plate.\nYoung man with crew cut and dark denim shirt taking selfie in bathroom mirror.\nA person with some skis posing in the snow.\nA room with furniture, wood accents, and a fireplace\na blue tank of compressed gas near a house\nA man sitting on a train next to a woman.\na girl with a game controller with a boy standing next o her\nA 
motorcycle is parked in front of two people.\nA person on a snowboard rides down the snow.\nA woman crossing in front of a double decker bus.\nA pot full of vegetables is sitting on a table.\nA living room filled with blue and white checkered couches.\nA woman sitting on the floor with a teddy bear\nTwo men cooking and packaging food in a kitchen.\nA man walks next to a couple of horses loaded with supplies.\nA cup of coffee is sitting next to a laptop\nA few kids playing in the yard with a frisbee\nA large passenger jet with it's landing gear down.\nA dog wearing a collar standing next to the water.\nTHERE IS A MAN THAT IS JUMPING A RAMP WTH HIS SKATE BOARD\nThe view of an elephant's head through a display window.\nA dog leads the way for two crosscountry skiers\nTwo hot dogs in wrappers on a table.\nHere is an Asian standing by a yellow fire hydrant.\nTwo zebra standing next to each other in front of a cart full of dry hay.\nThe woman serves the tennis ball as a child watches.\nA skateboard enthusiast doing a jump on a skateboard on concrete near a small tan brick building with tinted windows.\nThe four skiers chose to wear bright colors, standing out from the snow covered white mountain.\nA city bus thats turning a corner with another at the intersection.\nA couple of elephants walking down a dirt road.\nA variety of fruits and vegetable on a plate.\nThe man is having to work outside in the rain.\nAn elephant walking draped with a colorful blanket.\nSmall silver cellphone sitting on top of a wooden table.\nan ocean a white fence and a black thing on some rocks\nYoung boys and their coach playing baseball in the sun\nA shop window with people outside on the street reflected on the suface.\na toothbrush holder is sitting on top of a bathroom sink\nA blue and white KLM Asia plane being serviced at an airport.\nAn old ad is showing a retro kitchen.\nSpacious kitchen with a center island and stainless steel appliances.\na bath room wit ha sink and a bath tub\nA man 
with glasses and in a suit talking in front of a microphone.\nA zebra brazing on green grass next to a pile of rocks.\nSheep and a woman in a field in front of a cityscape.\nA raw cut of meat still on the bone being seasoned.\nA man wearing glasses standing next to an airplane.\nFOLDED ROBE TIED UP LIKE A PRESENT IN A HOTEL ROOM\na vintage photo of a man washing a lamb\nA horse running by itself through a flat area of land.\nA man wearing a stripe shirt and a yellow neck tie.\nA woman taking a picture in a garden by a polka dot umbrella.\nA giraffe that is standing near rocks while an ostrich stands behind it.\nAn open refrigerator with various fruits and condiments in it.\nAssortment of baked pastry items displayed in case.\nA woman standing at a table filled with red lobsters.\nLong old train barreling through the mountainous countryside.\nA closeup action shot of a person surfing.\nA hand reaching out towards a standing giraffe\na table with some dishes with food on it\na tv near a closet and a book shelf\nA group of teddy bears in glass cases.\nSeveral people that are drinking beer together and talking.\nA large group of cows on a field.\nGreen wooden shelves holding blackened bunches of bananas.\na number of small boats near a body of water\nA bathroom sink next to a white toilet under a mirror\na black and white photo of a person with a cell phone\nCarrots are being cut into pieces with a large knife.\nA woman laying in bed with a powder puff girl pillow.\na small child dressed in adult clothing by a stair case\nA baseball player taking a swing at a ball\nTwo men in suits shake hands outside of an airplane while others look on.\nthree young cows in a fenced pasture with a  short black dog following them\nA windmill placed near several cows in a grassy field.\na shirtless man is skateboarding in a pool\na group of kids playing frisbee chasing it\nA man riding a sled down a snow covered hillside.\nGray cat laying with head on laptop on top of couch.\nAn active 
computer monitor that is sitting on a desk.\nA bathroom with a toilet, sink, tub and shower curtain.\nA person standing outside on the beach looking at a Frisbee.\nA man riding a blue two seat motorcycle wearing a helmet.\nTwo sheep are standing in a field next to a wall.\nA kite surfer rides the waves of the ocean.\nA black and white cat laying on top of a keyboard.\nThe man in a business suit has a bag on his shoulder.\nA group of men with volleyball's in pink uniforms.\nA female jockey riding a horse spectators in the background.\nA baseball field showing the catcher, umpire and a person up batting.\nsome food is laying out on some dishes\nTwo cows that are standing in the grass.\na large air plane flying in a sky\nA red plastic basket with two hot dogs on it.\nA bicycle leaning against a pole outside of a coffee shop.\nThe unmade bed has three pillows on it.\nA man with a hand bag standing in a room.\nA computer, keyboard and framed photo on a wooden desk.\nA beer mug that contains water and flowers.\nA cow standing next to a brick building.\nA variety of food is displayed on a table.\nA baseball player extends his swing to hit a pitch.\nA closeup shot of the insides of a squash.\nA person holds an apple slice with peanut butter on it.\nThe people has there umberellas up for the rain\ntwo ladies in a kitchen preparing some food\nTwo plates of food with vegetables and bread.\nA man preparing to ski off a steep slope.\nA variety of food items are displayed in dishes.\nA yellow street sign warns of a hump in the road.\nA kitchen and dining room table and chairs sitting next to a living room with a chair and couch in it.\nA group of boats in a body water on a clear sky day.\nA bunch of craft supplies and a pair of glasses.\nA group of people that are standing with umbrellas.\nA pizza with toppings and a missing slice.\ntow pieces of a desert on a plate on a table\na couple of people stand on some dry leaves\na couple of people that are laying on a couch\nA silver 
commuter train at a train station next to luggage carts.\nA guy smiling while standing under a run for rights banner.\nThree white flowers in a vase with flower images on it.\na field that has a bunch of cars in it\nA baseball player getting ready to catch a ball with his glove\nA cat sits between a window and a large birdcage.\nPeople cross the street in a busy downtown city area\nThe Time clock is in the center of town.\nTwo paper plates sitting on top of a table covered in pizza.\na living room with several chairs and a small table\nTwo women are sitting on a bench reading a magazine next to a bike rack.\nA group of men sitting around a living room in front of a tv\nAn Air France passenger jet is parked on a tarmac.\na person sitting at a table with a laptop\nA lone giraffe standing next to a river.\nA happy little boy with a banana in front of his face.\na man sitting in a chair with a cup  in his hand\nA street scene with a horse and carriage and buildings in the background.\nA baseball player swinging his bat in front of a crowd.\nSome boats in the water outside of some industrial buildings.\nA woman walks down the street alone late at night.\nA man standing near a van advertising a movie.\nHe does have control of the motorcycle while pulling a wheelie.\nA clock in front of a window on a winter day.\nA man jumping off of a red skateboard.\nA pair of youths pause for a photo on a ski slope.\nA bunch of plates that are laying on a plate.\nThe Big Ben clock tower towering over the city of London.\nBlue-and-white jet airplane sitting at an airport runway.\nA large white sink sitting under a bathroom mirror.\nA cooked pizza that has been placed on a table.\nAn old photo of a man on a motorcycle and cars in the background.\nA teddy bear sits next to a mossy tree behind some green leaves.\nA young woman in a bikini surfs a small wave.\nTwo men smiling in a grainy photo while holding a banana.\na black orange and yellow train on its track and some trees\na couple of 
bears are standing in a field\nA white and green fire hydrant sitting next to a light.\nA baseball player prepares to swing as a pitcher throws the ball.\nTrains parked on rail road tracks next to a tractor.\nAn image with multiple photos combined in it.\nA banana with a frownie face drawn on it is by a computer.\nA toddler sliding down a snowy slope on skis.\nTwo people are by a railing feeding a giraffe.\nA train inside a building going down the train track\na woman is standing outside talking on a phone\na group of people standing around in the park\nA crowd of people standing below the Eiffel tower.\nA trio of little kids in front of a birthday cake\na cat laying in some blankets on top of a bed\na man sits on the ground with a guitar\nWoman holding a small baby in front of her computer.\nSome art work with a man with a hat on and some fruit in a bowl.\nA triple decker sandwich is cut into quarters.\nA piece of pizza sits on a white plate that has gold accents.\nTwo children interact with a television video game, while a third person looks away.\ntwo beds are shown as the light creeps in.\nA group of baseball players standing on top of a green field.\nA couple of boats floating on top of a river.\nA group of people riding skis across snow covered ground.\nA white toilet sitting up against a brick wall.\nA ceremony for military men from US and China\nTwo women sit together as one of them dries her hair.\nA wooden bench leaning against a blue wooden wall.\na plate holding a big pizza in the middle of the table\nthree buses are parked at the buss station\nAn elephant statue standing on top of a lush green park.\na giraffe rinsk soem wate rin a nice pond\nA man in a harness holding a waterboard.\nA couple of men lying on some couches with covers on.\nA man jumping for joy in a field of kites\na fork thrust into what looks like a pan filled with potato chips\nSome cows and horses are outside grazing together.\nA fully stocked bathroom with a vanity mirror.\nA man 
balances on one end of a skateboard.\nGroup of people with wine glasses standing near table.\na couple of kids stand with a toy\nA girl places a white teddy bear in a container\nAn elephant tied up in a city park.\nA circus elephant using it's trunk to hold another elephant's tail.\na man riding a wave with a surfboard\nA father holding his little child upside down.\nA woman and two men posing for a picture.\nA baby elephant walking with two adult elephants.\nA cat sticking its head out of a piece of luggage on the floor.\nA table topped with a bird and plates of food.\nA woman in yellow shirt and skirt with cats in grass.\nmany giraffes standing together as a group eat from a basket\nA woman walking her cattle down the road.\nAn older man is flying a kite with a small child.\na man with a hat standing on a snow board\nA guy is returning a tennis ball that was hit to him.\nA person dressed in black doing skateboard stunts on a skateboard ramp.\nA white and red helicopter above a grassy field.\nSpectators enjoying a tennis game at the US Open.\nPeople shifting the concrete being poured in the forms.\nThere is a display of trophies on the table.\nA cute cat sitting on top of a couch cushion.\nA man is kneeling in front of a large elephant.\nTwo giraffes stand in the grass by trees.\nA bus that is on the side of the road.\nA man stands in the living room and plays Wii.\nA laptop with a green apple taped to its back.\nTwo laptop computers sitting on top of a desk.\nThis three people pose for a goofy photo\na room that has some furniture and a table in it\nThe two cows are fenced in the field.\nA tiny banana with a woman peeling another in the background.\nTwo bulls who are walking on a street.\nThe great wilderness with a white lonely horse grazing.\nA professional baseball player takes a swing in front of fans in a crowded stadium.\ntwo people in a body of water with a wake board\nTHERE IS A VAN THAT IS DRIVING DOWN THE STREET\nA cay laying on top of a blue couch arm 
next to a wall.\nA clock is displaying the time on a tower.\nTwo bees on an apple hanging from a tree.\nA young lady sitting on a couch in front of a laptop computer.\nA glass shower door in a small bathroom.\nOne giraffe standing and another giraffe sitting in the grass.\nThe bathroom of this house is spotless.\nChildren dressed in snow suits standing in a crowded resort.\nA red and blue small train is on the tracks.\na lady happy she got her tooth brush out of the holder\na man on a surf board riding a small wave\nTrio of elephants walking past a large log\nA man sanding next to an orange frisbee.\nA man riding a skateboard on top of a road.\nA plate of food in a dim restaurant, ready to eat.\nA black and white cat sits on a red cloth that is over a television set.\na man is jumping in the air with a disk\nA toddler wearing a ski outfit and a pair of skis in the snow.\nThe tennis player in the pink sport dress is holding a tennis racket and ball.\nTwo monitors with art from Akon albums on them.\nTea, a tea cup, a teddy bear, and a tea brewer sit on a countertop.\nA couple of people in the snow on skis.\nA picture of different types of herbs and vegetables available from the CSA.\nFour people on a sailboat one is on the phone and three are sunbathing.\nA little boy rolls in a wheelchair pulling a suitcase.\nA bike attached to a car bumper with people with luggage in the background.\nA group of people stand outside, exchanging items.\na bird standing on a plate of partially eaten food\nBaseball game with batter and referee on field with crowd\nA zebra stands with its head down in its enclosure.\nA group of people on bicycles in middle of street next to trees.\nan airplane with people standing under the wing\nA table topped with lots of fruit and vegetables.\nA row of table and chairs along side a street.\nA plane taking off in the air, on a clear day.\nAn umbrella on its top laying on the ground in the sun\nA group of boats are enjoying riding on the sea.\nA woman 
is wearing a jacket and a tie.\nA bed in a corner of a room next to two window's.\nThere is a horse standing by some grass.\nA baseball player getting ready to swing at the next pitch.\nA herd of giraffe walk through the tall grass on the plains.\nA fighter jet flying over two parked vehicles.\nA group of elephants moving in the middle of a river.\nThe man is drinking a glass of wine in his kitchen.\nA pair of elephants standing in their natural habitat.\nA dog that is laying down on a table.\nSnowboarders walking through the snow carrying their boards\nMany people in business attire are sitting around tables.\nA boy stands among a row of red mopeds.\nA woman with a nose piercing is holding and looking at her cell phone.\nA group of motorcycles on a street next to grassy area.\nA photo in an airport showing a backpack and a cell phone.\nA man in a tie and backpack is drinking a beer.\nThree zebras are standing near a gate in a wall.\nBlack and white photo of an old car on its side.\nA airplane parked out on a runway by itself.\nA man riding around on a scooter with luggage on his lap.\nA woman brushing a girls hair on a couch.\nA old photo of how things were a long time ago.\nA close shot of a cat staring at the camera.\nA woman is sitting in a garden tub while brushing her teeth for a window view.\na person leaning on a bank holding a remote in his hand\nA boy is standing out by the water\nA person that is in the snow doing a trick.\nTwo boys sitting,younger one is trying to read something.\nA one propeller airplane is in an airplane hanger.\nA man in a car wearing glasses and a shirt and tie.\ntwo people playing with a frisbee on a foot ball field\nA living area with a christmas tree in it\nA zebra standing around in the middle of a field.\nCat lying on top of a shelf with its front leg hanging down.\nA pile of paper towels is on the floor next to a toilet.\nA man that is on a pair of ski's in the snow.\nA man cutting up scallions at an outdoor table\nA bird as 
it flies lonely through the sky\nA dark skinned child getting ready to be pushed on a swing.\nA hotdog is placed on a table next to some french fries.\nA blue double decker bus that says Garage on it.\nAn red fire hydrant beside a grey fence.\nThe little girl is sitting in front of the computer.\nFour chairs sit around a dining table with papers and shoes on it.\nA couple of cows with wreath decorations on their heads.\nA blue train stopped outside of a train station.\nA brown horse grazing on grass in a field.\nTwo men and two women, all wearing flowers, are posing for a picture in formal wear.\nA view of a bathroom sink and porcelain tub.\nA Starbucks teddy bear sitting in a Starbucks.\nA cat laying its head against a teddy bear.\nA big dog is resting halfway out of the window.\nA baseball player throwing a baseball bat from home plate.\nA kitchen with white cabinets, black counter tops and a white breakfast bar.\na little table covered with paperwork, books and a laptop\nA woman holding a yellow umbrella standing near window.\nA cheese pizza sitting on a white tray on a table.\nA group of people with wine glasses stand together.\nA salad with broccoli, cheese and radishes is in a bowl.\na number of people sitting at a table with a cake\nMan in a black plaid shirt eating food while standing up.\nA pink flower sticks out of a narrow white vase.\nA couple of men working on a boat that's docked at a pier.\nA group of children are wearing school uniforms.\nA red and white air plane is parked on the run way.\nGrey dog laying down in black and white sheets.\nA kid in a baseball uniform holding a baseball bat.\nA commercial airplane is flying low to the ground.\na woman holding a wil controller with a steering wheel\nA busy street with many people standing around and lights on.\nA baseball game is in action as the catcher leans for the ball.\nA herd of sheep grazing in an open pasture.\na bowl with an apple and some bananas and some books\nTwo men in suits and ties 
shaking hands.\nA group of people sitting at a table with stacks of books\na bunch of vegetables and fruits sit on a chopping board\nA disembodied hand holds up a cellphone to take a picture of something on stage.\nYoung boy on blue skateboard in parking lot.\nThree cows that are standing in the grass.\nA flip phone open to a test message\nWorking man sharpening scissors with electric circular sharpener.\nA couple equipped with umbrella hats taking a break from walking their dog on a bridge on a rainy day.\nTHERE IS A STUFF ANIMAL WITH ONE PURPLE CLOSE WINKED EYE\nA man standing on a tennis court holding a tennis racquet.\na hand is holding a single banana to eat\nPretty blue flowers sit in a vase in the sunshine.\nA man's handicap restroom located in an establishment.\nVarious buisness signs and an ornate lamp post in the city.\nA cow that is laying down on the street.\nA guy on a snow board does tricks in the snow\nA plate that has various types of donuts on it.\nSurfer riding a large white top wave on the ocean.\nTwo people standing next to a statue that is an invisible man.\nA person walking across a snow covered ski slope.\nA baby sitting at a high chair in front of a table filled with food.\nThree hungry boys pose with a loaf of bread.\nA group of people walking down a street next to buildings.\nThree white castle hamburgers sitting in a white castle food bag.\na couple of people standing on a beach next to surfboards\nA large inflatable soccer ball with spikes floats up from a field.\nA male getting ready to throw a pitch at a baseball game.\nThe yard is full of stuff such as a truck and a tug boat.\nA wall that has a large number of clocks on it.\nA man turns to smile for a photo while talking on the phone\nA close up image of a type of salad.\ntwo kittens sitting on a woman in a chair\nA woman is flying a kite in a city park.\nseveral sheep watching two sheep standing by a drinking tub.\nA young girl holds up a pink umbrella.\nA man in a baseball uniform 
standing with a bat.\nA bench next to a tree in a park.\na bathroom with a corner toilet and a sink\nA boy with a cast is kneeling by a skateboard.\nA cat sits on a desk, on top of papers and in front of a computer.\nWoman playing in a tennis match in a tennis court.\nthe fire hydrant has on the side of the road\nAn \"on-deck\" batter watching the baseball game from the on-deck circle\nA man standing next to a brown piece of luggage on a floor.\na railroad bridge with an old  train crossing it\na group of animals graze on some grass\na number of baseball players on a field\nA selfie of a woman taken looking into a car mirror.\nA heron is standing on the edge of a body of water.\nA parking meter sits by a brick wall.\na man getting ready to grab a frisbee as others watch\nA cat sitting on top of a car outside during the day.\nA little girl and goat standing in the rain while the girl holds an umbrella\nThe boy wearing green is playing tennis on a green court.\na fire hydrant in the middle of a large paved area\nSome people walking on the top of a snow covered hill.\nTwo children struggle over a bat in their playroom.\nThere is a man with glasses that is letting a spider crawl on his arm\nA dirty wok on top of a stove beside a dirty tea kettle.\nTwo people that are laughing and holding a kite.\nA horse is trotting past a man on that walks behind him in the pasture.\nA man wearing a brown hat and a uniform shirt is holding a cockatoo upside down.\nA desk that has a drink in the middle of it.\nA man is skiing down hill using both ski poles and the snow looks powdery.\nA broken tv next to a brick on the street\na girl is standing on her bathroom sink\nA wooden cutting board with a knife, plate and several different vegetables.\nThe huge truck is carrying a construction tractor on it's bed.\nA couple of people sitting on a wooden bench.\nToiled in a dirty bathroom with a concrete sink and tiled walls.\nHouses of parliament on the edge of the River Thames.\nA motorcycle 
is parked on the side of the road.\nAn empty city bus travels down a city street.\nA desk that has a laptop computer on it.\nsome people a bus and cars a street lights and buildings\nA man is skiing down the hill next to a sign\na person is riding a motorcycle by a grassy hill\nPeople at an outdoor table eating pizza while surrounded by a crowd.\nA pelican strolls in the shallow water at the shore.\nA dog is sleeping on the step by a blue door.\nA man sitting on a bench next to a dog.\nAn elephant stands in a grassy area with words written on his body.\nA bicycle parked next to a wooded area, with a large brown bird perched on the bike seat.\nan old photo of three people holding skis on a snow background\nAn old dirty toilet and a sink in a bathroom\nA zebra standing on top of a lush green field.\nA ski instructor teaching a class of children.\nA herd of sheep standing below very tall buildings.\nA baseball player who is sliding into a base.\nA little girl standing on the grassy area of a beach.\na person sitting on a bench near other benches\nA purple, red, and orange  commercial airplane on a runway.\nA chicken or tuna club sandwich made with homemade bread.\nA sign on the side of a snowy road stating avalanche zone.\nA black and white coin meter on the side of a road.\nA motorcycle with a suitcase tied to the back of it\nA cat in a bed hiding under the cover.\nThree men wearing red standing on top of a ski slope.\nA sink and a dining table in a kitchen.\na plate with a bunch of meat and vegetables on it\nA triptych depicting skateboarders who are mid air.\nA couple of horses standing next to each other.\nA VERY TALL GIRAFFE AND A COUPLE OF PEOPLE NEAR IT\nClosed toilet, sink, and mirror in a modern bathroom.\nAn assortment of pens and pencils is spread before a keyboard.\nA white sign that reads no turns hanging from a traffic light.\nThe small bathroom has brown tile on the shower walls and floor.\na person riding a skate board at a skate park\nBus backing up 
and being loaded onto a truck outside\na conveyor belt holding some donuts after being deep fried\nA man is waiting for the wave he wants to ride to the shore\nA biker standing next to a motorcycle. near a garage.\nA group of elephants walking down a street with people on them.\nA man in a wetsuit surfing on a clear day\nFemale tennis player touching the US Open logo banner.\nA young man swinging a racquet at a tennis ball.\nWe see a girl playing a game on her Wii console.\nOne boat on the beach with the water in the back round.\nThe kitchen is full of various gourmet ingredients ingredients.\nTwo young children sit in bed and play on computers.\nOne man looks at the camera while another looks away\ntwo men in a kitchen making stuffed potatoes\nThree sheep next to each other at a farm\nThere are several pumpkins being used as decorations.\nA long nosed train on the tracks near a station.\na tennis player with a racket on a court\nA road side with graffiti sprayed on it to alter its message.\nA slow children street sign cutout is propped up next to a fire hydrant on the side of a road.\nA park bench next to fence and trees by grassy field.\nSheep grazing in a wise open green field with clouds above\nA man is standing on a carriage pulled by four ponies.\nThe tot is making a face to indicate a distatse for certain vegetables.\nA desk with two computers on it.\nPerson in gray hooded jacket attempting t cross busy street.\nA person holding a glass of champagne in their hand.\nA man on a laptop on a coach in his living room.\nA black back pack on the side of a dirt road.\nThere is a bacon, lettuce and tomato sandwich.\nA woman holding a cell phone to her ear.\nA few sheep eating and grazing in someone's yard.\nThat cake as fresh strawberries on the top of it.\npizza a knife and fork a bottle of wine and a glass\nA group of young skiers pose in a line on a snowy slope.\nA man in a wet suit crouches down as he rides a wave on his surfboard.\nA plane flying by a runway on 
a slightly cloudy day.\nA large long train going down a track.\nHumans holds dog back in a swimming pool\nA man riding a bike next to a bus on a street.\na man holding onto a rail in the middle of an empty parking lot\nAn empty wooden bench sits near a neatly trimmed lawn.\nA young boy on skateboard riding on a ramp.\nA man doing a jump on a skateboard\nBottles of Pellegrino are stacked on refrigerated shelves.\nA person riding a white board surrounded by a group of people in the ocean.\na close up of street signs with buildings in the background\nA person standing on the beach flying a kite.\nPair of electronic parking meters in front of a red truck.\nA bathroom done all in tile that is clean.\nA jeep that is sitting in a field with a large fire and smoke in the background.\nA bus that has bags of luggage on the side of it.\nA Volvo bus parked on a road near a hotel.\nPedestrians, a rider on a scooter and several bicyclists cross an intersection at a crosswalk.\nAdult women standing at open refrigerator filled with beverages.\na balding man in glasses holding an umbrella and wearing a jacket with a very high collar around his face\nA giraffe standing next to a fence near people.\nA bunch of people walking down a street with open umbrellas.\nA girl on a boogie boards catches a wave in the ocean\nAn old unlighted sign hangs overhead advertising \"Open Kitchen Restaurant\"\nA man is flying a cat while a cat watches\nA little kid skiing down a hill holding ski poles.\nA person's feet standing and balancing on a skateboard.\nA large tree situated next to a large body of water.\nLarge group of stop signs in the same area.\nthis is a group of parasails in the sky\na cow stares as it stands in a muddy area\nA Japan Airlines passenger jet climbs skyward with its wheels still down after takeoff.\nA woman standing in a kitchen cutting up vegetables.\nA bunch of people waiting on the train platform for the train\nA pile of luggage, helmet, clothes and mirror.\nThe people are 
walking to water to surf the waves.\nA horse drawn cart is driving down the road.\nSkateboarders at a park skating in an empty pool.\nMany cattle are trying to find food on the desert ground.\nA street is void of cars at night.\nA tall building surrounded by a crowd of people\nA train wreck near a river draws a crowd.\nA very large room filled with a bunch of diners.\nA kid that is swinging a baseball bat at a batting cage.\na woman standing around a bunch of clocks\na table with some plates of food and some glasses and cups\nA black and white cat with curious look sitting on a desk.\nThis is three cows eating hay from their stables.\nSoccer paying kicking the ball while others look on.\nSome luggage against the wall of a hallway\na person in skies is standing in the snow\nAn asian woman with black hair and a green headband posing with a tennis racket in front of a man with white hair and a cigarette.\nJet airplane in flight landing gear extended down\nA boy doing a skate-board trick on a ramp.\nThere is a food truck set up under a bridge.\nA young child looks at a group of zebras.\nA traffic signal at an intersection on a city street.\nA man uses his cell phone to take a picture of himself.\nA baseball player hits the ball as the crowd watches\nAn empty park bench in the middle of a tree covered park.\nA kitchen shelf holds an assortment of pots, pans, and utensils.\nYoung man spinning green frisbee on finger along shoreline\nA white toilet bowl with an electronic brown seat.\nA group of Asian chefs stand by bowls of food.\nSome animals are outside in the dirt in the daytime.\nA bathroom with bidet, toilet, tub, and a checkerboard floor.\nTwo people who are standing on a beach.\na bathroom with a strange looking toilet in it\nThe train car is stopped and it is empty.\nThe big ben clock tower standing tall in the foreground\nScissors, a marker, and two other items on a table.\nA guy playing the drums with a very intense look on his face.\nA dog attache by his leash 
to the side car of a motorcycle parked in a parking lot.\nYoung girl on surfboard riding small wave in ocean.\nA row of parking meters in front of a stop sign.\nTwo zebras stand in a field with tall grass.\nA view of the outside world through a train's window.\nA horse and a dog on a grass field.\nA piece of cake is on a white plate.\nA lone sheep surveys a fern and wild flower covered hillside.\nThe clock is sitting atop the antique building.\nTwo old fashioned black and white buses are parked next to each other.\na beach with a lawn chair and umbrella positioned on\nA man leans down playing a game of tennis\nYoung guy playing tennis on a clay court.\nA woman is sitting outside on her phone\nMan jumping about to serve a tennis ball.\nA band wearing costumes standing around talking.\nKids on a bike while a man is drive a horse drawn buggy.\nA police motorcycle is parked next to a police car.\nA large  two sided clock by a building\na couple of sheep are standing in a clearing\ncook prepares dish by putting it into the oven\nA skier carves his way down the snowy hill.\nA herd of adult and baby black sheep in a fenced field.\nA television that is showing a news program on it.\ntwo children playing with a frisbee in a drive way\nA view of a giant bridge during the day.\nA man riding on the back of a brown horse through a lush green field.\nA parking meter with an hour and thirteen minutes left to go.\nA clock tower stands over a city landscape.\nA small frame building with a large sign.\na group of pizza standing around a table eating pizza\nA man going down a slope near a ski lift on his snowboard.\nA man at a baseball game is holding his bat on the ground with is head on top it.\nA gang of bikers riding down a street.\nThat baseball player looks like he may have done something good.\nA couple cutting their wedding cake at their reception.\nA picture of a boat marina full of sail boats.\nA MAN IS ON A MOTOR BIKE SMILING THUMBS UP\na person riding skis on a snowy 
surface\nPeople surfing on a white water river.\na couple of trays that have some food in it\nA large tv seems too small for an enormous surrounding cabinet.\nA pan of food is in the middle of a table.\nA photo of a surfboard with a man in the background\nA young boy throws a pitch at a baseball game.\nA pizza in an iron pan on top of a table.\nBlack and white photograph of women walking towards an umbrella.\nA family sitting down for a meal and conversation.\nPeople on snow skis are by a wooden building.\nA small boy playing tennis while holding a racquet in his hand.\nThe sun is setting near the clock tower that reads 945.\nPolice person riding a blue and yellow check motor cycle.\nA giraffe is sleeping on bare dirt next to a dead log.\nThe airplane is about ready to land at the airport.\nA large kitchen with a large center island.\nMany sheep grazing on grass in a field.\nA crowd of people lined up in front of a food truck.\nPiles of unripe bananas sitting next to each other sitting on a floor.\nThe child is sleeping in the bed with his stuffed toy.\nA couple of men and a woman sitting next to each other at a table.\nFour different plates that have food on a table.\nAn old phone shows a horse and wagon on a wide street and children are in the forefront.\nA black and white cow is standing on the grass.\nA white toilet sitting next to a white bathroom sink.\nA skateboarder riding down a ramp in black clothing\nA woman wearing a mask holding a racquet\nA person on a court with a tennis racket.\nA boy riding on a skateboard in the street.\na room with a bunch of teddy bears in it\nA surfer rides a wave on his board in the ocean.\nA train driving past a lush tree filled forest.\na fork with a plate with carrot cake\na couple of kids are sitting on horses\nAn umbrella is attached to a bicycle frame with leather straps.\nA group of people riding bicycles down a street.\nA white tusked elephant at his compound at the zoo.\nThe clock tower appears very tall at this 
angle.\nA desert dish has powdered sugar on it.\nTwo men that are playing a game of baseball together.\nInside view of terminal building with large sunlit window and a clock.\nA sandwich and a cup of drink on a table.\nA large jetliner sitting on top of an airport runway.\nm mm m m m mmm mm m m  mm mmm m m m\nA stainless steel toilet with the sat up\nA lot of motorcycles that are in a window.\nA pink house with a bunch of bananas outside\nA large black cat sits by the front door.\na station wagon covered in foot prints and stuffed animals\na red and white sign and some parking spaces\nA wine server holds up a wine bottle display for a man to look at.\nA woman with ear protection on swings a bat.\nA wooded table filled with apples, oranges, pomegranate, and cherry tomatoes.\nA man wearing glasses and a green tie\nthis is a man riding a board down a rail\nA woman and man hold a kite with two children nearby.\nA woman is holding up bananas at a market.\nA red stop sign sitting above an orange not dumping sign.\nBack view of a female tennis player wearing orange shorts.\nThe backside of a travel bus on the side of the road.\nA young boy is surfing in the ocean.\nAn empty living room  has a cluttered coffee table.\na train passing on the railroad in a grassy hill\nA close shot of a mini fridge.\nA woman in a black and purple dress poses in front of some tall grass.\nA pile of fruit sitting on top of a wooden table.\nA black and whit photograph of a boy tying a tie.\npans filled with assorted veggies, fruit and rice\nThe old style airplane is flying on a cloudy day.\nA man reading a book in the park\na white bus is on a city street\nA large brown dog laying under an open umbrella.\na lady riding a horse holding the other black horse\nA stop sign in front of a large home\nA group of elephants walking in a green and rocky area with many trees surrounding them.\nA large white dog panting while laying down.\nA man in white shorts stands near a large television screen with a 
remote.\nSmall multicolored airplane sitting on a landing strip.\nAn orange train rides through the rural countryside.\nThe electronic contents of a bag are placed on a bed.\nA brown horse wearing a bit standing next to a wooden fence.\nA waterway with many people on some small boats.\nA man helping a boy on a paddle board in the water.\nA man cutting a cake celebrating his 50th birthday.\nThe desert is on the table ready to be eaten.\na close up of a child and a dog\nAn elderly man sitting on one of three park benches which are positioned side by side.\na bunch of signs together on a line.\nA blue and silver train is pulled up to a platform.\nTwo zebras are pictured but there is an elephant and other animals in the background.\na person holding an uncooked doughnut near other ones\nA person holding a cell phone next to many others.\nthere is a man taking a picture with his cigarette in his mouth\nA town square with several tall clock towers.\nfood on a plate that matches the countertop\nA man riding on top of a surfboard on top of a wave.\nA man playing tennis in the middle of a serve.\ntwo catcher and a pitcher stand on the pitchers mound\nA baseball player in a blue and white uniform holding a baseball bat.\npeople standing around a table filled with some plates of food\nA high clock tower is brown and has roman numerals.\nA woman is standing on a tennis court holding a tennis racket.\nA sidewalk with various signage and many cars in the street.\nA Polar airliner is parked on the tarmac.\nmany types of vegetables in the vegetable section of a market\nThe angle view of tower with a clock.\nA group of people riding skis on snow covered ground.\nThey look like they are beginning a ski race in the snow.\nA yellow firehydrant on the sidewalk near a building\nA giant desert covered in chocolate sauce next to cups of coffee.\nA bear is snuggling with a bear cub.\nA beautiful woman laying in bed reading a book.\nTraffic has stopped to allow four zebras to cross a 
highway.\nA man holding a tennis racket in his hands while on the tennis court\nthree skate boarders and one is doing a jump\na tea pot is steaming on the stove top\nA white plate of food on a table.\nA herd of wild horses grazing on a green grass covered field.\nA person manipulating the skateboard with his feet.\nA person sitting in an old refrigerator on the sidewalk, drinking beer.\nThe two signs give directions to upcoming cars.\nA bathroom with a toilet and bathtub and handheld shower.\nA man who is performing a trick on a skateboard.\nA nightlight is on over a kitchen sink.\nWill the elderly women finish the wii game?\nA plate of vegetables, chicken, and white rice.\nWatercraft in a row, floating on the calm ocean.\nSeveral stuffed animals and teddy bears laying on a bed.\nA white plate filled with slice oranges next to a pile of bananas.\nA zebra standing, its face down, grazing on dried grass.\nAn elephant strides through brown grass and trees.\nA kitchen with hard wood flooring and a stove top oven.\nA man surfing down a flowing river rapid.\nA banana filled with melting chocolate on a grill\nA little girl sitting at a wooden table in front of two bowls of food.\nA woman at a desk with a computer monitor and CD case.\na small girl with sunglasses is hitting a tennis ball\nA black cat sitting in a bathroom sink.\nthis is a yellow train riding the rails\nThe sheep is wearing a bell with a blue cord around its neck.\nA group of five zebras stand in a field.\nElephant playing inside mud, with fences surrounding her\na lady on the beach flying a kite\nA dog wearing sunglasses sitting in the front seat\nA person on snow skis in the snow.\na number of people standing in a kitchen near one another\ntwo apples and one banana lying in the shape of smile on a wooden table.\nA group of people ride a double decker bus and hold black umbrellas.\nA bathroom with a vanity sink, mirror, toilet, and bathtub.\nTwo pizzas and three cups of drinks sit atop a table.\nthere are 
several woman wearing bikinis and waiting for cake\nA desk with several computers and laptops on top.\nA full wine glass means the bottle has less in it for later.\nA man laying on a couch holding a gaming controller.\nsome writting on a wall by a window\nA woman with a bear in a photo with a sign.\nA red buses on a wet paved road by vendors undercover walkway, and one vender on curb with an umbrella over table.\nTwo giraffes are standing next to each other.\nThe farmers are working the land with their animals.\nA person is chasing sheep through a field.\nA dessert with few carrots on a plate near two candles.\nA group of bread slices with cheeses on them in a pan.\nMany people walking in the streets holding umbrellas.\nA bright patio umbrella stands out against the plain white building.\nan image of street signs being crossed in air\nA man is putting a pizza in a oven\nA small dog being carried in a backpack.\nA beautiful marina with many boats docked in it.\nA plate of food and drink on a table.\nA ram standing still in an empty pasture.\nA person is hanging in the air near a building.\na public males restroom with two urinals that are based on the floor\nA cat sitting in an orange chair in a bedroom\nGuys riding motorcycles through the path in the park\nA red stop sign with a no left turn sign.\nA man flying into the air as he catches a frisbee.\nA photograph of the inside of a public men's restroom\nThis boy is practising in a play ground\nA baby zebra getting a drink from mama during the day\nA man enjoys a quick ride down a ski slope.\nAll way stop sign at the intersection of Prairie Street.\ntwo men are riding in a train in hats\nA man holding an orange frisbee on top of a green field.\nA man in tie standing in front of a table.\nTwo white bears on the rocky shore of some water.\nPeople are walking along the beach and people are skiing on the water with parachutes.\nA bearded man riding a skateboard on pavement.\nTwo women on snow skis on a hill\nA large 
patio area with many table and chair sets covered by large umbrellas.\nPeople watch as a couple of people are skateboarding on ramps.\nA group of people riding a boat on top of water.\nA view of some snowy mountains from an airplane.\nTwo women stand at a store in front of a cooler containing various alcoholic beverages\nA little boy that has a spoon with food on it.\nA black stuffed cat with fangs is hanging on a rack with others.\nTwo red trains at a train station, with forest in the background.\nA person is sitting in a chair on a sidewalk while a bus drives by.\ncloseup of a white horse with someone riding it\nsome food is sitting on a green and white wrapper\nA bull and a dog charge across the field.\na ball player running toward home base by a bat\nA slice of pizza that is sitting on a table.\nA tiled floor bathroom with a red and black shower curtain.\nA display case of different types of doughnuts in it.\nA close up image of a bag of Broccoli florets.\nTwo brown sheep huddle near the back of a large plastic cage.\nkids playing Frisbee in a park on a bright day\nA person riding a bicycle on a street near a building.\nmany bananas hanging above some people in a shop\nA man rides a horse while driving longhorns down the street.\nA person flies a large and colorful kite at the beach.\nA black and white picture of a large house is shown.\na guy painted yellow with blue overalls holding a banana\na row of boats are lined up in the water\nThe couple scoots around town on the motorbike.\nA white sink sitting next to a bath tub.\nA chair is made out of stuffed pandas attached to each other in a clump.\nA bunch of people riding in an odd looking vehicle.\nTower clock designed with two western shooters for entertainment display\nA couple of women holding game controllers in their hands.\nA laptop on a plaid black and white blanket.\nA woman sitting on a rock holding an umbrella.\nA man riding a skateboard on top of pavement.\nA person with a tennis racket on a 
court.\nA bear peaking over a log in front of a rock wall.\nVariety of fruits being placed into a blender.\na clock on a walk with a bike parked near by\nCows on display on top of codling with people far down below\nA cat is sitting on a bathroom counter.\nA person leaving a trail of snow as he glides on his skies.\nA horse stands near a fence during winter time.\nA half eaten dessert and half empty cup.\nA young man standing on a beach holding a bat.\nA memorial bench with a can of liquid sitting on top.\nA clock tower with a clock against an overcast sky.\nA group of people getting onto a bus.\nA group of people standing in front of a Inn.\nA group of zebras stand in a field.\nA parking meter on an empty street at night.\nYoung boys with alien and spider facepaint tattoos\nTwo men in a showroom for snow skis.\na bed with a red and white bedspread and pillows\na group of children and adults gathered together on a snow covered bank\nSix well-dressed men drinking beer and eating pizza.\nThis is a sink and mirror of a hotel bathroom.\nA room filled with lots of clocks on it's walls.\nTwo dead birds covered in wires sitting inside a outdoor plant.\nA man on a horse near a dog and two cows.\nA young girl wearing a baseball cap eating a hot dog.\nBike left outside next to the bench in front of the river\nthere are many people in this living room playing a video game\nA woman standing next to a fire hydrant wearing  a backpack.\nA woman passing a bear mask on a market tent.\nA task force of drug dogs monitors an airport corridor\na couple of women sit on the ground next to each other\nthree geese on some grass by a pond\nA boy batting during a little league baseball game.\na woman is working with something over a book\nA man and woman that are standing next to rocks.\nThe woman wears a hat and has flowers in the basket.\na girl on a board riding along a boat in the water\nA guy is posing for the camera with a medal around his neck.\nA view from above two men working in a 
kitchen cutting fish.\nOven light on in a kitchen with wooden countertops.\nA stop sign on a snowy day in the daytime.\nA television sits on a dresser by a window.\nA young man is taking practice swings on the field.\nClose up of a plate of broccoli and stems.\nA stop sign centers an upside down street image.\nA baseball game is in action as a batter swings.\nA woman is holding a sawed off bat while wearing lingerie.\na small bird on a tree brand near fruits and leaves\nhistorical fighter plane on display in an air hanger\nA person eating lunch and using a computer in a cafe.\nA picture of a subway shuttle bus traveling down a city street.\nA very cute small child holding a big umbrella.\nan image of two planes that have just landed\nthere are two stuffed bears sitting on a toy horse\na bathroom toilet with a carpteted seat cover and floor rug.\nThree double-decker buses are parked in a lot.\nA street sign is near a lamppost and trees.\nA laptop that is sitting on a desk.\nA truck hauling a large load to a job site on a winding mountain road.\nA series of steep stairs lay next to a lake\nAn elephant at a water hole spraying water into his mouth.\nA woman with something in her hand in a decorated picture.\nTwo tool boxes sitting next to each other on a  table.\nA white and orange colored cat laying on a bed with its eyes open.\nA person jumping in the air on a snowboard.\nTwo people ride horse beside dogs near a meadow.\nA hot dog sitting on top of a bun covered in toppings.\nA young man has his foot placed on a pole while another looks on.\nThe two men are riding their horses on the road.\nA cat is laying on cozy white sheets.\nTHIS IS A CLOSE UP PICTURE OF A STUFFED BEAR AND MONKEY\nA picture of a person throwing a frisbee.\nA man in tuxedo posing for a photo.\nCloseup of a baseball glove and a black ball hat.\na truck sits next to a big plane\nA man standing in front of a parking meter about to put money in it.\nTwo young children laying in bed next to each other 
drinking from bottles.\nA herd of zebras where one of them is biting another.\nA dog sits with a frisbee at its feet.\nA wooden computer desk with a computer sitting on top of it.\nA red stop sign sitting next to a wooden electrical pole.\nA man is putting a pizza into an outdoor pizza oven.\nA cat that is sticking its head in a green bowl.\nA clock on top of a post shows the time\nThe face of a cat that is sitting in a sink.\nTwo kitties playing with toilet paper next to the toilet.\nA man who fell asleep with phone on face\nA bathroom with a sink and a bathtub\nLocomotive parked under a brick bridge in a secluded spot.\nA bunch of bananas sitting on top of a wooden table.\nA man in a swinging position holding a tennis raquet while on a court.\nA person that is laying on a bed.\nlarge semi truck with steel front end parked in grass\nA young boy smiling on his skis in the snow\nA woman in white dress playing a game of tennis.\nWoman in blue outfit taking a swing during tennis match.\nTwo girls kicking a soccerball on a soccer field.\nA young woman holding a baby with a teddy bear on her lap.\nA man lying in a field flying a kite.\nMan in baseball uniform playing shortstop waiting between pitches.\nA man with a knife in his belt and a beer in his arm enjoys a sandwich.\nAn older woman with white hair and glasses, seated at a dining room table and another person in the kitchen area.\nA table is set with a full dinner.\nA small kitten fits inside of a gray sneaker.\nGroup of horses in race near canvas fence.\nA man pushing a girl on a swing.\nA green plate of food that includes rice, broccoli and meat.\nThree horses grazing in a pasture in front of a house.\nA toilet sitting outside a building in an alley.\na bento box filled with different types of food\nKitchen photo with window over counter and a bowl in the middle.\na large vase with a big colorful boquet sitting on a table\nA bathroom with full vanity and wall mirror.\nA group of motorcycles parked in front of a 
white church.\nGroups of skiers near a ski trail in the snow.\nA white sink that has a necklace, a rubber ducky, toothpaste and some beauty items laying around them.\nA person skiing on a snow covered mountian\nA man is enjoying surfing in the water.\nA clock on an ornate metal pole in front of a shop.\nA train traveling over a river on a bridge.\nThe dog has a frisbee in his mouth in the snow.\nA cat and a dog sit in a colorful bathroom.\nA bathroom area with a tub, shelves and a sink.\nA dining table and a lamp are beside a fireplace.\nA plate full of couscous with mixed vegetables\nA young woman with tattoos using her cell phone..\nTwo beer trucks are parked beside one another and unloading.\nA person is looking at a pair of scissors.\nA colorful illustration of an old train and stormy skies.\na big airplane taxiing on a wet runway\nA man riding a bike next to another person on a bike.\nA couch and rocking chair are in the small living room.\nA couple of people sitting on a beach watching an assortment of para sail chutes.\nThe open faced sandwich contains a meat in casing.\nA man wearing winter gear snowboards while several people snowboard behind him\nA plate contains a meal of meat, potatoes, eggs and fruit.\nA white couch in a living room filled with Christmas decoration.\nA silver suitcase on a wood floor with a pair of black and white shoes next to it.\nan oddly tied tie on a pink shirt\nA young child learning how to ski down a slope.\nmany elephants are walking on a trail and some trees\na cat is sitting on a couch in a room\na big white bed with a dresser and lights new to it.\nA sign is displayed on a traffic light.\nFans sit in camping chairs along a fence to watch a children's baseball game.\nA group of people are eating near a wooden bench surrounded by trees.\nMan installing an OS while giving the \"devil horn\".\nSome people playing a wii video game in front of crowd\nA man cuts a bowl of greens with scissors.\nA group poses in ski gear in front of 
Olympic rings.\nA baby in an adult's arms is gnawing on a toothbrush.\ntwo people sit parked next to each other on motorcycles\nA car sits on the side of a road with letters written on it.\nWoman sits on beach with laptop pondering what to write.\nA person is giving a piece of crust to a dog.\nThe group of cows stand in a river drinking the water.\nA skaterboarder is doing tricks at a skate park.\nA person surfing on a surf board on some waves\nAn assembly line with doughnuts moving through an automated fryer on it.\nA baseball player hitting a ball with a bat.\nA dark street with signs and buildings on the side.\na older male with his mouth open wearing dusct tape.\nA blurry photo of people watching a bunch of horses.\nA white plate with a hot dog in a biscuit next to fried potatoes.\nA fruit cocktail with banana, oranges, and various other fruits\nA blue toilet is sitting in a blue bathroom.\na bunch of colorful items on a black plate\nAn old man with eyeglasses stands next to a giant screen\na toilet a shower a tub a sink cabinets and a mirror\nA man on a snowboard rides on the snow.\ntwo people in a green field playing with a frisbee.\nThe person is skiing at the bottom of a steep slope.\nA mountain view with a plume of smoke in the background\nA bathroom has a sink, toilet and an orange bucket in it.\nA dog jumping to catch a thrown Frisbee.\nA group of giraffes stands around near a watering hole\nA man in a suit and sunglasses drinking from a paper cup.\nA smiling woman eats outdoors with a group of people.\nA very cute little bird on a green leaf.\nA guy is cutting something out of a piece of paper.\nGuy plows the field behind two strong horses\nMany containers of food are on the table.\nA woman with an umbrella walking her dog who also has a smaller umbrella.\nA lighted mirror in what appears to be a bathroom.\nfifteen different varieties of doughnuts in a display case\nA mouse swimming and another climbing out of a river in a wooded area.\nA man standing 
next to a truck near a forest hillside.\nA person sits on a bench with the skyline in the distance.\na giraffe is crossing the road in front of a car\nA group of people standing near a bus\nThe man stands on the beach prepared to enter the water with the green sail.\nA basket full of white biscuits on a table.\nA bed with five pillows under a hanging print.\na man on a giant bicycle rides by a tall pole in front of an empty, large field in front of some mountains\nA train sitting next to a train station near other tracks.\nKeyboard with iPod shuffle in front on desk\nA giant chair with a horse statue on it\nA cement elephant on the other side of a fence.\nBox with picture of a hand holding a Nintendo wii remote.\nThe cat is sitting on top of the black suitcase.\nA group of people is standing in a driveway.\nFour people sitting at a table with a large pizza and cans of soda.\nA picture of a families living room with nice furniture.\nA tennis player who just hit the ball to their opponent.\nTwo newspaper stands with a fence behind them\na brown white and black animal and two people on a motorcycle\nAn older lay sharing a birthday cake with some little girls.\nA woman standing on the top of a snow covered slope wearing skis.\nA simple hotel bathroom with two sinks, free mini bottles of shampoo, and a hair dryer.\nthere is a man riding a bike and waving\nA man and a woman sitting on a couch.\nA bear sits on the rocks by a pool of water in a wildlife exhibit.\nA woman holding a colorful umbrella with writing on it.\nA man on a surf board rides a rough wave.\nTwo giraffes and a zebra with several trees.\na silver and black motorcycle is lying in the dirt\nA tiled shower, molded plastic bathtub, shelf, mirror, wooden vanity, lamp, and sink make up a beige colored bathroom.\nA hazy sun over chairs and an umbrella on the beach.\nA lawn chair sitting on top of a beach covered by an umbrella.\nSeveral monkey figures hang on a bedroom wall.\nA pair of scissors, a tape measure, 
and a spool of thread sitting on a piece of folded fabric.\nA flat screen TV sitting in a living room next too a shelf.\nWhite parrot sitting on a ledge eating a seed pod.\nA street corner with a stop sign and it's wet from rain.\nA woman stands holding a white controller near some chairs.\nA black cat is nestled among indoor plants.\nA tennis player hitting a tennis ball in a professional game.\nA fat cat laying in a bathroom sink.\nA woman stands in a dimly lit kitchen at a gas range.\nA man in grey shirt jumping on a skateboard.\nThere are parking meters alongside of the railroad.\nOld worn red truck parked in a driveway near a cactus.\na lot of people standing in the middle of the road with red stop lights\nA professional tennis player holding a tennis racket at the US Open.\nPeople on a shoreline are flying kites on a clear day.\nA commuter train leaving the clean subway platform\nA giraffe is standing tall next to a tree.\nan old photo of a miniture pony pulling a cart\nA man serving a birthday cake to a woman\nA kitchen has a washing machine in it.\nA cat sits next to a laptop on a desk\na white and green street sign and a traffic light\nA man reading a magazine and sitting on a toilet that is outside on the street.\nA large building with stained glass windows and a clock.\nA cat laying on its back with paws up in the air.\nA man rides his motorcycle through an alley way.\nA flock of doves and a man sitting in a park.\nThis is an image of a cake with a bear surfing.\nA man with a gun standing in formal dress.\nThree people riding horses together down a trail.\nShe is serving the tennis ball pretty high.\nA bridge over water near several buildings in a city.\na man that is skateboarding on a ramp\nA herd of zebras standing in some algae covered water in front of a sandy plain\nthere is a green vase with a plant inside of it\nA man holding a bowl with an open oven\nA man who is snowboarding down a hill.\nA black and white cat sitting in a chair.\nsome veggies 
are in a small cardboard box\nIn the dark two street signs are glowing.\nA surfer sits on a beach next to some surfboards.\nA large bus on a open city street.\nA bathroom that has some open windows in it.\nA man standing on a beach holding a surf board.\nThree geese that are standing by a pond.\na guy on a bicycle and a guy flying a kite\nA photo of a person being taken in this picture.\nPeople in a large body of a water using surfboards.\nMany pots of marijuana plants growing in a greenhouse.\na toilet sitting in a tile covered floor in a single room\nA large white bed in a red room\nSheep perched atop knoll on green countryside with rocks.\na large group of people walking on a city street\nA plate and vase on display in a room.\nA man laying on top of a white bed between two lamps.\nA woman reaching for a frisbee as another defends her.\nA man holding a Nintendo Will controller in a living room.\nA tow truck at a traffic stop with vehicles behind it.\nA horse drawn carriage riding across a snow covered field.\nAn older later sits and drinks from a cup.\nA sink sitting in the middle of a bathroom.\nA table that has a silver tea pot in the middle and several plates around it with desserts on the plates.\nA truck that is driving down the street.\nA boy in a red jersey throwing a baseball.\nA man on a horse is herding animals down a trail.\nAn elephant standing next to a lake on a beach.\nA griaffe walking on a road with two cars approaching.\nA clean passenger bus driving in a city.\nA small black cat laying on top of a couch.\nA cat is trying to squeeze through a door.\nThe transit train stretches down the track under the power lines.\nA man is standing near some graves, water, and a bus.\nA man playing a game in an RV with a remote controller.\nA black and white dog with a frisbee lying on the grass.\nThe tennis player is returning a strong serve.\nTwo young children play in the grass with a kite.\nTwo trains, side by side, waiting at the train station\nOne cow 
attempting to mate with another cow in a pasture.\na living room with a shelf a coffee table and couches\nTrophies and cell phones are on a table.\na person on a beach with a frisbee\nTwo soldiers taking pictures of a group of soldiers arranged for a photo.\nThere are three men with fishing poles at the beach.\nA BUS STANDING AT A TRAFFIC SIGNAL IN A STREET.\nA zebra and an ostrich up close with other animals in the background.\nA lone horse in the middle of a grassy field.\nAn elephant spraying water onto his body with his trunk swinging backward.\nA pretty cat with both front feet in someone's shoe\nClose up on meal food with three items side by side chicken with barbecue sauce, broccoli with shredded white cheese bits on top, and a bean and pasta or grain mix.\nThree images of a brown and white dog sitting beside a doughnut.\nA black and white kitten stands atop a laptop computer\nPeople are flying a kite in an open area.\na person skiing by a start sign above them\nA woman giving a man a haircut in a barber shop.\nA wooden table topped with the contents of a woman's purse.\nA brown fluffy smiling teddy bear with big paws.\nA man and woman wearing skis on a ski slope.\nThe slice of pizza has tater tots, green beans, and cheese on it.\na hamper with compartments having a cup clothes and two bears\nA small table has many foods and drinks on it\nA Photo of a man on skis gliding on flat snow .\nA MAN IS ON THE BIKE WITH A USA FLAG\na couple of sandwiches are on a white plate\nTwo views of bright objects floating through the blue sky.\nTwo men on a tennis court playing a game of tennis.\na giraffe standing in the foreground with an ostrich behind\na white plane is being prepared to board passengers\nsome giraffes are in a green field and some trees\na large in ground swimming pool near tents\nA large number of teddy bears are sitting at tables with fake food.\nFighter jets flying together in close formation leaving vapor trails.\nGroup of giraffes in high brown grass 
looking to feed.\na person with a horse and a car in the background\nA man that is standing on a surfboard in the water.\nA flock of birds are flying in formation.\nA huge heard of sheep are all scattered together.\nA man and a dog on a motorcycle.\nA giraffe leans over while another walks away from it in an outdoor area.\nA goth man sitting on top of the floor near a store.\nA child throwing a ball towards a batter during a ballgame on a field.\nThe person is walking on the sidewalk alone at night\nTwo cats lay together on a messy bed.\nA woman in equestrian clothing on a horse\nSome people by a long row of motorcycles parked together.\nLarge long tailed kite on string above rural town.\na giraffe in the distance in front of a tree.\nA group of chefs prepare food in a restaurant kitchen.\nA skier rides their skis down a snow covered hill.\nA blue clock spire next to buildings and cars.\nSeveral wooden cages with white cloth tops and sides.\nAn old man is sitting on a bench.\nA couple of sheep laying on top of a pile of dry grass.\nA white pick up truck driving down a road behind a line of elephants.\nA dog behind the steering wheel of a car.\nAn L-shaped couch in a living room with a coffee table.\nA urinal seperated from a toilet in a bathroom\nA small elephant stands next to a tree.\nThe white and blue boat is floating on top of the water\nA plate of two slices of pizza and a cup of juice.\nA herd of zebras graze on an open grassland.\na room that has a bunch of chairs in it\nA female tennis player is about to make a serve.\nan image of a plate of meat and vegetables\nA jumbo jet plane running along a runway.\nA room with a couch, chairs, television and a table.\nA group of people outside at a park playing softball.\nA photo of two plans with water and birds surrounding it , one plane in the air one one the ground.\nA herd of giraffes grazing on a tall tree stalk\nA man holds a glass of wine on a patio by a vineyard.\nThere are two zebras standing in the 
desert.\nShimmering lights inside a living room with a dog on soap\nA baseball player at the plate just after swinging at the ball\nthis is a little girl playing in the beach\na bride and groom are cutting their wedding cake\n2 giraffes one of them is doing the splits\nA group of girls on seats in a tour boat.\nA man wearing a helmet on a bicycle in a street that has a guard railing on the side of the walkway.\nThere is a small toy elephant sitting on a wall\nA skateboard on the walkway in an old bus.\nA red headed skateboarder sips on his drink.\nA table that is filled with hot dogs, and a hamburger.\nA duck can be seen in the water with high rise buildings in the background.\nA young woman on a surfboard getting ready to ride the wives.\na horse in a field of tall grass\nA man appears to be giving snowboarding instructions to a woman.\nman brushing his teeth in blue and white tiled room\nThree people in green and black snow suits with ski equipment on a ski slope.\nThe modern church has a clock on it's steeple.\nA cook is holding a wooden cooking utensil in his hand.\nA living room with hard wood floors and a tv over a fireplace.\nA train stopped at the station to pick up passengers.\na woman is riding on the back of a horse\ntwo zebra standing in pen and grazing side by side\nA picture of some food in a bowl together.\nA man sitting at a table with a cake in front of him.\npeople getting on a public bus at night\nTractor passing a statue of a dairy cow wearing a lei\nWoman with blue streaked hair sitting cross legged on bed.\nA stop sign with graffiti written on it.\nA professional horse back rider is getting ready to take a shot while the crowd looks on.\nA hotel room with a made up queen sized bed.\nA boulevard has been pictured by someone driving by\nThe empty bench is sitting in the nighttime street.\nAirline employees by an aircraft parked at the gate\nSome animals are standing together in a pin\nSmall piece of cake sitting on a plate with cherries on 
it.\nA woman sitting outside her house under a fruit tree.\nbaby in a highchair with bib and cake\nA red trolley train riding along the tracks near trees.\nAn array of lights on some sort of machine\nA man sitting inside of a car on the street.\nA man is being pulled on his skateboard by two dogs.\nA horse sticking his head out of a doorway.\nthere are several clocks that look like they are hanging from the ceiling\nA kid is doing a trick on his skateboard.\na man  with a suit stands in front of a brightly lit drapes\nAn orange cat and a black and white cat both laying on a bed.\nA man in skies is going up hill.\nA platter of food that includes eggs, hot dogs, and cheese.\nA living room with a couch, chairs, television and a child's high chair.\nA white bed sitting in a bedroom in front of a TV.\nRoses and other flowers are siting in the vase\nA person sitting on top of a rock over a river near a city.\ntwo public transit buses parked near one another\nA man is standing in a cluttered room.\nA young women sitting at a picnic table eating a meal.\nA crowded city street filled with traffic and bicycles.\nA bird with a large crest standing on a branch\nA giraffe lying on the ground in a zoo pin.\nA shower and a toilet in a bathroom.\nThree grey birds in a tree with blue backdrop.\ntwo cars parked on the sidewalk on the street\nA woman standing next to a modern style parking meter.\nA chili dog, onion rings and chili fries.\nThis rider takes a brown horse across dirt\nA large dark colored spoon sitting on a rack\nThe player who is up to bat next is getting ready for his turn.\nA man on a beach getting ready to throw a frisbee.\nTwo old people sitting on a bench before a wooded lake\nA small dog wearing a sweater and holding a Frisbee in its mouth.\nthere are many airplanes stop at the airport.\nA woman checks her phone while holding her hat sitting on a bench, with a bicycle in front of her and a hedge behind her.\nTHERE IS A PIECE OF CAKE ON A PLATE\nThe person fell 
off of the horse and into the water.\nA large truck can be seen in this picture near a bridge.\nA small old street sign hanging on a building.\nA man watching another man on a skateboard\nA man sitting on a large bench talking on a cell phone.\nParking meters stand in front of parking spaces in an empty lot.\nA man on horse back and a truck watching a herd of sheep cross a road.\nA man in wetsuit on surfing on white surfboard.\nA dog and a cat underneath a desk.\nTower clock made of rock set against a cloudy sky.\nTwo children who are standing next to a white fire hydrant.\nA bathroom with a white counter top and white towels\nTwo snowboarders are standing with one foot strapped into their boards and one foot out, at the top of a mountain.\nView of a partially shaded city street with autumn leaves.\nA laptop computer sits on top of a messy desk.\nA pair of sinks in the middle of a kitchen counter with a wooden countertop\nA group of bicycles that are sitting on the road.\nSeveral people are swimming and surfing in the ocean.\na herd of giraffes standing around a bare field\nTwo young men at sunset juggling a soccer ball on a beach\nA dish with meat and vegetables set on a bed of rice.\nA brown teddy bear sitting on top of a wooden bench.\nThe sheep are grazing in the grassy field.\nA blue train on the tracks at a train station.\nA cat sitting inside a toilet bowl looking alert.\nA woman who is standing near a clock.\na man with a tennis racket plays a game of tennis\nTwo small black bears stand near a tree.\nA group of people holding Nintendo wii game controllers.\nThe people ride the bike near the water.\nA piece of luggage with a rainbow strap and a ticket on it.\nA pizza is shown being cooked in an oven.\nA man holding a frisbee in the field with grass\nA street sign is posted to watch for senior citizens.\nA large intersection that doesnt have much traffic\nPeople in a park trying to fly a large purple kite that looks like a fish.\nmany different cup cakes on a 
grill on a table\nA man taking a swing at a tennis ball\nSnow skiing at night presents unknown dangers without the lights.\nA bathroom with the curtains drawn down and the lights on.\nTwo old-fashioned bicycles parked together on a beach.\nbananas and apples sitting next to each other on a counter\nA white and blue truck driving down a mountainous dirt road.\nBackpacks line a boardwalk to a beach surrounded by trees.\nA bird that is standing on a concrete ledge.\nA group of sheep sitting next to a stone wall.\nA large pizza with tomatoes, basil and cheese.\nAn Asian man is sitting in a cubicle with a near a computer near other people working in cubicles.\nA white seagull standing on a white column by a pier.\nA yellow and metal train traveling down train tracks.\nA pair of giraffes grazing through a wire fence.\nA board full of chopped vegetables near a computer\na giraffe standing in between some brush as a bird flies by it\nHe is intent on another bite of the sandwich.\nA dim runway has an airplane on it.\nsheep cross the road next to a white barn in the rolling hills\nA man sitting in front of a computer with a bottle of beer.\nWhite and blue plate of two glazed donuts by two glasses of orange juice.\nFlowers are in a vase on a shelf.\nA group of young people brushing their teeth.\nA grey cat with yellow eyes looking innocently at the camera.\nThere are giraffes that are standing g yogeter\nThe Asian woman is trying to sell her food on a local beach.\nA zebra standing next to a van door.\nA man riding a skateboard across a crosswalk.\na group of young people playing frisbee in a field\nA table topped with cut in half Twinkies on top of cupcakes.\nThe woman is on the tennis court playing a game.\nA picturesque view of a small town during winter.\nA woman riding a wave on top of the ocean.\nA zebra walking next to another animal across a dirt road.\nTwo horses and a man are on the beach.\nA yellow plant with green leaves in a glass vase.\nHeadphones help her to 
hear her cell phone.\nA picture of someone typing on a laptop.\nA cat curled up asleep in front of a laptop computer.\nA very nice looking room with a big bed.\nA man holding a baseball and a catchers mitt.\nPeople looking in the shop windows with a bicycle parked against the window.\nA person kicking up on their skateboard at the top of a ramp.\nA large bunch of green bananas hands from a tree.\nMan wearing a bandanna trying to catch a frisbee\nA woman preparing to serve a tennis ball.\na couple of people are standing in the shade\na telephone pole with a sign stuck on it\nA man talking on a cell phone while walking down a street.\nA boy in red shirt swing a baseball bat.\nA  man getting ready to hi a ball in a baseball game.\nA pie filled with white creme next to a yellow banana.\nA man outdoors jumping to catch a frisbee.\nA person on a snowboard catching some air over a hill.\nA zebra laying in the dirt looking away from the camera.\nA double decker bus parked next to a brick building.\nA cheesy pizza with red peppers is in a box.\nElephant holding onto the tail of another elephant with its trunk.\nA dog and it's owner sitting in front of a desk.\nA picture of a surfer as he catches a wave.\nA young man sitting under a tree with red leaves.\nA woman sitting on the grass with a computer outside the Brown Library.\nA pizza on a pizza pan with two pieces removed by a serving ladle.\nA blue subway train pulls into the subway station.\na woman whispers into a mans ear with a suit on\nA shirtless man is on top of a man on a couch\nAll the contents of a video game console have been unpacked from a box.\nwhite sailboat docked with other white sailboats\nA man swinging a racket on a tennis court.\nStuffed toy bears on display on shelf in large room.\nA red stop sign sitting above a no parking sign.\nA living room filled with furniture and a purple couch under a window.\nAirplanes sit parked on the runways of an airport.\na person jumping a skate board in the air\nA 
passenger train speeding down a track in the countryside.\nA boy playing baseball is winding up for the pitch.\nA dog and cat sleeping together on a dog bed.\nA living room with different living room furniture.\nA player prepares to run to first after hitting the baseball.\nThree adult giraffe stand at a grove of trees.\nA person wearing red gloves grilling a pizza.\nthe man is riding a skateboard down a ramp\nSeveral sailboats sit in the water in front of some trees.\nLooking up at a building with a large face clock near the top.\nA large jetliner flying over a row of runway lights.\nA clock on a tower as seen from a roof.\nA stuffed animal hanging from a post in a field.\nTwo nicely dressed men standing together next to a flag.\nA refrigerator filled with food and drinks next to condiments.\nPeople are riding the waves on surfboards on their stomachs.\nA few deer laying down in the grass near a bunch of trees\nA dog is sitting on a covered couch with some light.\nA couple of dolls are standing on a table.\nTwo children eat fresh vegetables from a skillet.\na person holding a coffee cup with a watch on his wrist.\na large air plane flying in he sky\na person on a skateboard is doing a jump\nAn elephant with tusks eating food behind a fence.\nA man riding the waves on a jet ski.\nA person on a horse that has a decorated hat on its head and covering it's ears, with another horse next to it that has a mask covering it's eyes.\nLarge blue metallic public transportation bus with Aubaines written across the back.\nLarge pizza with cheese, olives and tomato sauce\na couple of people that are staring into a icebox\nA man standing in front of a restaurant with a skateboard in front of him on the ground.\nA silver Sport Tourer BMW motorcycle on a sidewalk.\nSeen through a wire fence, is a stadium area with watchers and many vacant chairs, a dugout with a railing and many men leaning on it, and a playing area with a lunging, uniformed batter with a catcher and an umpire 
behind him.\na woman with a cell phone and another with a large bag\nA white table topped with two desktop monitors.\nA sneaker and a paw are seen on the grass.\nA train traveling through a grass covered park.\nA red fire hydrant sitting on a brick sidewalk.\nA very colorful mix of grilled vegetables looks delicious.\na large cat laying across a table next to a monitor.\nAn antique motorcycle restored to like new condition.\nBaseball player barely delivering hit to ball during game.\na zebra bending over eating grass at the zoo\nA person is cooking something on the stove.\na black vanity top sink toilet and mirror\nAn old fashioned bench is sitting on the sidewalk.\nA herd of cows grazing on the grass.\nA man prepares to swing at the tennis ball\nA white toilet in a bathroom next to a trash can.\nA boy that is jumping in the air with a skateboard.\nA couple of people by a boat in the water.\nA stop sign on the side of the road.\na pink and yellow sign is hanging above the street lights\nTennis player holding his racket looking ahead of him.\na woman with eye glasses sitting at a table covered with food\nA skateboarder is performing a round about handstand.\nA child is playing with a frisbee in the park.\nBlue plates are stacked on a wood countertop.\nAn old time car is parked at the curb near a stop sign.\nA trailer truck hauling with a crane hauling logs.\nA furry dog playing with a green apple on the carpet.\na boy sits sits on top of a horse in front of a jungle forest\nA couple of brown horses standing next to each other.\nA motorcycle parked next to a stairwell behind a plaque.\nA cat sitting in front of blinds in a window.\nA young man is skating around white cones.\nThere is a woman playing a game of tennis.\nAn old sign with trees in the background filled with fall colored leaves.\nA young boy in motion while holding a remote.\nA fighter jet is flying through the air.\nThe young children are playing a game of baseball.\nThe fat grey cat is wearing a red 
satin tie.\nThe large boat has nets extended on both sides of it.\nA man swings a racket during a game.\nA baby is sitting on a wooden bench.\nA man, woman, and two children laying together on a bed.\nPeople watch a baseball game in a large stadium.\nA very tall building with a massive clock tower.\nA man flying through the air while riding a pair of skis.\nA boy putting his leg back to kick a soccer ball.\nA stuffed bear head and paw on a laptop computer.\nSeveral dark colored cats laying together on a piece of luggage and a duffle bag.\nA woman laughs as a man brushes his teeth in a public location.\na man shaving in a bathroom while looking in the mirror\nA fruit salad with cantaloupe, kiwi, and bananas.\nA colorful train is waiting on the tracks at the station.\nA woman holding a tennis racquet on a field.\nA bowl with a plant, a large vase, and two cups on a table.\na person that has a lighter in their hand\nMan leaned back with his mouth open, sleeping on a bench\na bunch of knobs on a large metallic stove\nA zebra is eating while standing next to some hay.\nA group of people on a field playing with a frisbee.\nA girl laying on a bench reading a book.\nA hand holding a mug of green liquid next to a pile of fruit.\nA pair of photographs of a dessert with a vase of flowers.\nTwo zebras are standing outside near a tree.\nsome brown and black horses a table umbrellas and a person\nA child in striped shirt sitting on the top of a bench.\nA young baby that is brushing their teeth while sitting down.\nGroup of signs on top of each other on a pole.\nA man and a woman outside next to an old truck.\nPassenger train crossing a bridge next to a grassy field.\nA clock tower made of stucco with an arched window.\nA women holding a fork while looking at a cake.\nA restaurant with no one in it has several square and round tables.\na crowd of people standing around and sitting watching surfboard\nA man in grey shirt riding on a skateboard.\nStreet traffic light that is on 
blinking yellow.\nA man laying in a bed with tubes attached to his check and mouth.\nTwo bird flying low across a body of water.\nA baseball player posed to hit a ball.\nThe man is putting his feet up on the desk.\nA cat sits in a glass window by a stuffed toy.\nA fireplace with a mounted flat screen tv above it\nA person's hand on the back of a black cat that is diving into a bathtub filled with debris.\nTwo girls playing a game of tennis on a court.\nThe front edge of a well used skateboard.\nMan juggling three balls at the same time.\na woman is standing in a green field playing tennis\na giraffe bending down to eat grass off the ground\nTwo tennis players on the court and waiting to play.\nTwo children smiling and eating small personal pizzas.\nOld refrigerator open in an abandoned wooden building\nA painted postcard of the clock tower and bandshell at the Daytona Beach, Florida.\nA couple on a bike are riding on the sidewalk alongside a bus.\nBed in room of some home with windows.\na really big elephant that a man is on\nA person in a ski jacket next to a train\na red sign is hanging on a pole outside\nA orange cat sitting in a piece of green luggage.\nThis kid who is Pinoy is  skateboarding over an ollie jump\nThe man poses for a picture while holding a snowboard.\nA beautiful young woman talking on a phone.\nTwo ladies on a road with an umbrella\nA gathering of people playing a video game.\nA small office with a desk and book shelves\nSeveral planes are flying high in the air together.\nA large elephant walking next to a man\nA pair of young men stand in a field playing with a frisbee.\nA kitchen area features white appliances, counters and a white floor.\nCars parked on a dirt road near airplanes.\nA cat is looking at the side of a laptop computer.\nA green wall in bathroom with white and chrome fixtures.\nthere I a motor bike that is pakrd on the street and one with something on it\nA couple sitting on top of a bench under an umbrella.\nA flat screen tv on 
a wooden tv stand.\nTwo young men playing a motion controlled video game\nA cat sitting on the edge of a table.\nA woman sitting at a table eating a donut.\nA table contains a large square cake decorated with a flower.\nA set of blue bleachers sitting in the middle of a dirt field.\nA pair of rusted scissors stuck in a stone sculpture.\nMan standing in front of a television holding up a Wii controller.\nA man playing a game of tennis and people in the crowd watching.\nThree blenders with colorful tops and bases, two of them matching, stand in a row.\na room showing a microwave and a cooker also an oven\na man wearing a hat while riding a surfboard\nA man brushing his teeth in front of a mirror.\nTwo bears in an in closed area with trees and stumps.\nA man holding on to a parasail over the ocean.\nA white toilet sitting next to a white sink in a bathroom.\nA blue sea anemone living on a coral reef.\na long haired white dog is eating some cake on a plate\nA batter practicing his swing in the batters cage.\nA person riding a skateboard down a metal railing.\nA snow skier standing at the top of a snowy slope.\nA chocolate cake with decorations and a knife\nA large white jet airliner flying over trees.\nFlock of sheep eating grass on a mountain.\nA surfboard resting on the sand of a tropical island beach.\nAn overhead view of a group of people sitting at several tables.\na sandwich with some fruit and a drink\nA parking meter on the side of the road.\nA pile of luggage sitting on the floor.\nFruit stands with bananas, pineapples, oranges, and other fruit.\nThe three giraffes are walking together on the grasslands.\nThe hand is reaching out in hopes of catching the flying disc.\na paper plate with some pizza on top of it\nfemale skier, skiing slowly thru cold white snow\nThere is a computer monitor with a graphics program open sitting on a wooden desk.\nA giraffe laying on the ground in the grass.\nAn dinner of pinto beans, broccoli, a roll, skim milk, an apple, and 
something unidentifiable.\nSpectators watch a professional tennis player serve the ball\nA meal on an airplane of cereal, milk, and fruit.\nThis is a child holding a remote to a game console.\nA young skier in a red jacket goes down hill\nThree big horn sheep are in an enclosed pasture.\nA group of people standing around outside with their bicycles.\nA toilet with jelly fish and star fish on it\nthere are two pieces of bread on a yellow plate\nA large bear ornament hangs on the Christmas tree.\nA small bottle of liquor next to a whole orange and an orange half.\nA woman holding a few bread sticks and a glass of wine.\nA view of an airplane wing flying over a mountain range.\nA tennis player holds her racket during a match.\nThe cat is angry while sitting on top of a pillow.\na blue and yellow train engine and some people\na group of people that is surfing on some water\nA man riding skis down a snow covered slope.\nA plate of food that includes a sandwich and shoe string fries.\nThe dinner plate has three smaller bowls next to it.\nThe yellow bird is waiting for its mate.\nTwo empty stone park benches placed up against a stone wall.\nAn airplane sits on a stand for display.\na close up of a person holding an electronic device\nA airport tarmac filled with a jetliner and trucks.\nA piece of cake and for that are on a plate.\nA man skateboarding through an obstacle course with cones.\na white toilet and many rolls of toilet paper\na four-legged animal grazes on the side of a hill in a forest\nTwo young ladies petting a young calf on a farm\nThe woman in the colorful dress is holding a video game remote.\nA wing outside of an airplane window high above clouds.\nA person on a surf board surfing a wave.\nA small kitchenette with personal items displayed attractively.\nA man with glasses sitting at a wooden table with a lamp.\nA desk area with a computer monitor, keyboard and mouse.\nA girl that is sitting down with a cell phone.\nA great full shot of the bathroom with 
wooden floor.\nA cat lies in an empty fruit box amongst other fruit boxes.\nA man is outside grilling some hot dogs.\nThe truck is traveling down the road in really bad weather conditions.\nA field filled with lots of white sheep next to a river.\nThere are a lot of animal heads laying on the bed.\nA women standing on a bike backwards .\na woman sitting on top of a horse standing on a beach\nA man in a blue shirt holding a piece of pizza.\nA bed with two pillows under a window.\nA white plate with slices of meat and veggies.\nA young man is eating a hamburger while a young girl watches and laughs.\nA parking meter reads .90 cents as a silver car is parked behind it.\nA man looks down at a dog sitting on a chair outside.\nA picture frame with 2 pairs of scissors dangling from top and a painting sitting in front of the frame\nA woman that is leaning over a pizza.\nThere is a suitcase which appears to filled with foreign snacks.\nTwo women in a public place playing on a Wii system.\ntwo people on a tennis court playing a game\nA large crowd in a grassy area with the capital building in the background.\nThere is a plate of pasta with a fork in it.\nAn old mattress lying amidst overgrown brush and leaning against a fence.\nA person on a surfboard in the water.\nA refrigerator sits next to a counter in a kitchen.\nAn elephant is stretching its trunk on the ground.\nA kitchen has a vintage gas range and yellow walls.\nA man with a glass of wine in his hand.\nRed fire hydrant with blue top on downtown street.\nA person standing on top of a snow covered slope.\nA group of smiling police offers on brown horses.\nA media center with gaming consoles and a television\nA traffic light that is currently a green light.\nA white plate with meat and broccoli on it\nThe yellow earth mover sits in the field in front of the pole.\nA man holding a surfboard in a hotel room\nLooking up at a traffic light next to some street signs.\nthere are many train engines and cars\nDessert pastry 
with apples served with an autumn theme.\nA red stop sign next to a road in the middle of nowhere.\nA kitchen with a sink, counter, cabinets and a dish rack.\nthere is a woman riding a brown horse on gravel\nA noodle and vegetable dish is displayed on a plate.\nA snowboarder holds a snowboard for a photo.\nMotorcycle decorated with an American flag and reindeer\nA person on a skateboard on a ramp.\na man with a tennis racquet serving a tennis ball\nA young man is in the middle of performing a skateboard trick.\nA young man playing a game of tennis against an opponent.\nA man with a pan in his hand walks by pizzas in a oven and on counter tops waiting to be baked.\nsome kids are watching as giraffes walk around their zoo exhibit\nA young lady holding a bat behind her shoulders\nA couple of cats sitting on top of a couch.\nKid sits on the edge while another jumps over riding a skateboard\nA group of people walking across a crosswalk.\nA group of three men standing with their backs against a fence.\nA couple stands smiling next to a sitting older couple.\nA large white table with chairs surrounding it.\nA snowboarder and her child in the snow.\nA male tennis player holding his racket in the air.\nA red and blue train on a bridge during a cloudy day.\nBrown horse on the sand at the ocean.\npoles full of signs in front of a skyscraper\nPeople reaching into a broccoli garden and picking broccoli.\nAn old fashion looking bus is sitting idly.\nHomemade cheese and red sauce pizza on a plate with flour and dough on the wood table.\nA desk with a bunch of paper on It.\nA cat being offered water in a glass.\nThree loaves of bread are in an oven.\nA teenage couple dressed up and smiling in a aprk\nA kitten is held and fed with a bottle.\nA woman stands on a patch of dirt holding a tennis racket.\nA woman taking a swing at a baseball\na big bear stand next to a river stream\nA table topped with construction contents on top of a wooden table.\nA young man wearing a tie and 
sunglasses is looking away.\nA baby sitting in the grass looking at the kites in the sky.\nA fork and knife on a plate with pizza\nA trio of people stand near two elephants in a covered area.\nA girl playing frisbee in the backyard of a house.\nA doorway leading to a dining room area.\nA child is holding on to a rod while he rides a boat.\nA stainless steel pan with a pizza cooked on it.\nGraduate talking on cellphone with people behind him.\na painting of the president sitting with his hands folded in front of his face\nAutomobiles stopped at a traffic light at night on a busy street.\nA street sign sitting next to parked cars and motorcycles.\nA train in a subway that has a few passengers.\nA lot of people walking in the streets and on the sidewalks.\nA baseball player swings a bat at a thrown ball\nA motorcycle parked next to a green grass covered field.\nMany swans in a lake are overlooked by a cow.\nA herd of cattle that are sitting on the grass.\ntwo giraffes and a man in a brown shirt is feeding one\nA smaller kitchen with a very decorated fridge.\nwhite cabinets a sink stove refrigerator and a window\nA woman walking down a rain soaked street with a red umbrella.\na military vehicle and a smoking tow truck on a rural road\nA black and white picture of a child on a skateboard the street\nA number of grizzly bears sitting on tan rocks.\nA group of people sitting in a field eating together\nA man is cross country skiing through a forrest in winter.\nA buffalo is looking at a bird from a distant.\nA cardinal standing on an empty wine glass.\nA chocolate cake being sliced and served on plates\nsliced tomatoes on a plate and a bottle of wine\nTwo people stand next to each other holding cell phones.\nLarge pink boat on wheels parked on the side of the road.\nA bathroom with a unique double sink and round mirror.\nA polar bear walking along a snowy, rocky ridge.\na man on the phone looks angrily at the man\nDifferent sized and styled teddy bears on display with 
pictures and information.\nThis is a train traveling down the train tracks.\nTwo women riding an old motorcycle with a side car\nWoman jumping on bed caught in mid air.\nA person on a surfboard high up over the water.\nThere are two monitors and one laptop on the corner desk.\nPhoto of kitchen being remodeled with a new stainless steel stove.\nA surfboard is laying flat is the sand beside a palm tree.\nA sign that is on top of a pole.\nA group of three people sitting next to each other  on a bench.\nA table with an assortment of items such as a keyboard, phones, pens, snacks, keys, sunglasses, a water bottle, and more.\nA woman playing fetch outside with a dog.\nA man reflected in the mirror in a washroom.\nA man in black shirt holding a large striped flag.\nTwo children riding a horse in front of their home.\nA pair of buses sit next to each other on the road.\nA group of skiers entering a tunnel through the snow\nAfternoon tea in a living room of a home in a hot climate\nA male ostrich runs through the grass in front of the trees.\nA man posing for a picture, in a kitchen.\na wooden desk with two monitors and a keyboard on it.\nSomeone is holding their tablet connected to a surge protector.\nA woman walking next to a train, pulling a suitcase.\nA couple leaving their wedding ceremony in a shower of rice.\nA stuffed teddy bear sitting on top of a bench.\nA huge dump truck is fenced in in front of a neighborhood.\nTwo little girls sitting on a bench at a softball game\nSomeone is enjoying a small slice of pie.\nA group of people riding horses down a sandy beach.\nan image of a man that is drinking wine\nPeople in uniforms playing baseball on a baseball diamond.\nA bathroom with a double vanity and round mirror.\nLight colored cat lying on woven rug next to checkered shoes.\nA modern sink and shower stall are visible in this photo.\nA passenger bus that is driving down the street.\nthere is a broken tree log on the ground\nA mini keyboard attached by USB to a 
laptop.\nThe raw material of meal preperation including Broccoli is kept on the table.\nTwo apple computers are on a white desk\nMan sitting on a step in a run-down part of town.\nTwo people standing in the grass under a cloudy blue sky.\nTwo toilets sit outside on the pavement next to a yard with many decorations.\nA man high in the air mid trick while snowboarding.\na bathroom that has a white toilet in it\nA fire hydrant is across the street from an Asian restaurant.\nPeople waiting at a bus stop with a bus parked.\nA group of men playing a game of basketball on court.\na fridge filled with assorted foods and condiments\na guy in a half pipe gets ready for his trick\nA small boy is holding two pizza muffins.\nA display case in a bakery with decorated cakes and cream rolls.\nZebra crossing a dirt road by itself in daytime.\nA long yellow train traveling past a train station.\nA lady staring lovingly into her pizza.\nA brown teddy bear sitting on top of a pregnant woman's belly.\nAn old black and white photo of a man with glasses in a suit and tie.\nA man on a skateboard with a woman filming\norange, pear, and apple are all in a row.\nThis is a bus with a Titans themed advertisement for Coors Light on the side.\nLittle bird looking out from the tree it's standing in\nEveryone is waiting in line to purchase tickets.\nA small model train traveling around a small track.\nTwo forks on a plate of cake and cream.\nA doll house living room filed with furniture and a persian rug.\nA man standing next to a hipster woman while holding a beer in his hand.\nTwo tall birds stand together on a grassy spot next to a large rock wall.\nA group of people standing on a beach next to the ocean.\nSomeone sauteing broccoli and onions with wooden spoon.\nA woman posed for a picture while eating.\nA woman looks through things on a desk.\nA train car with purple and grey graffiti covering windows\nA small bathroom has a sink and a storage rack over the toilet.\na group of men trying to 
get an air blaoon  working\nA large dim kitchen with light coming in from a window.\nA skateboarder is gliding along a paved walkway.\nA group of elephants in sandy area next to trees.\nA bird sitting on a fence and looking around.\nA person windsurfing with the sky in the background.\nA stop sign and three street signs attached to a pole.\na black television is on a white table\nA picture of a red prop plane parked in a field.\nA woman sitting in front of the Eiffel tower near pigeons.\nan image of a closed mcdonalds taken in a parking lot\nA man is shown, with headphones around his ears.\nA man is waiting for a bus on the side of a city street.\nA kid up to bat in a baseball game.\na church with a clock built into the side of it\nA clear vase full of purple flowers sitting on a table.\nTwo women play singles tennis outdoor surrounded by trees.\nA stop sign is leaning a little bit.\nA cat sitting inside a piece of luggage on a vehicle.\nA white bathroom with pedestal sink and small cabinet and daylight window\nA dog picks up a Frisbee out of the grass.\nAn open laptop computer on a wooden night stand.\nAn old man holding a bag walks down a street.\nA laptop computer with pictures of giraffes on the homescreen.\nA person riding a skateboard on the sidewalk while holding a pole.\nA person laying face down and balancing himself on four yellow poles and a fire hydrant.\nA laptop computer and a desktop computer sit on a wooden desk.\nA woman is holding her daughter in front of a birthday cake with candles while another lady stands nearby\nA sheep with long horns wearing a purple bit.\nPink glasses are inside a clear plastic bag with bananas.\na bed that has some material items on it\nSheep resting under a blue boat foundered at low tide.\nA bed and mattress store front with open doors\nThree men sitting at a table in a restaurant eating.\nA very clean bathroom that is made out of wood.\nA small airplane flying in the air near land.\nA man and a giraffe are greeting one 
another.\ntwo horses standing in the snow inside a fence\nA bathroom with a separate tub and shower\nA large machine digs up a side walk at a construction site.\nThe yellow train is running along the tracks.\nA chili cheese dog in a travel box on a table.\na women that has a carrot in her hand\nPeople are sitting on a cart pulled by a horse.\nA little girl standing in front of a pile of surfboards.\nA close up of a modern motorcycle on display.\nA person rides a snowboard in a forest setting.\nThis is a cow on a grassy plane with a mountain in the background.\nA large display filled with bananas for sale.\nA woman is sitting outdoors at a table with a sandwich.\nAn individual is taken in this very picture.\nA photo taken over a water way with a clock tower in the background.\nsome men riding horses down a mud track\nA catcher throwing a ball at a baseball game.\nA mixer in the process of mixing foods.\nA busy inner city street with cars, a bus and a biker on it.\nA bathroom stall with two toilets and a plunger.\nA group of three men and one women are holding Wii controllers in a living room.\nThe boy is on his boogie board in the ocean.\nVases with flowers are setup against a pink backdrop.\nA women walking down the street while holding an umbrella.\nA couple of cows laying on top of green grass covered field.\nA dog lays on the bed with a remote.\nA woman hitting a tennis ball with a bat.\nA dog lying on the floor on some clothes and a remote\nA tennis ball sitting on a tennis racket.\nA long row of scooters stretch down the length along a sidewalk.\nA white and yellow plate holding three bananas.\na living room with a couch and two low wooden tables with floor cushions in a log cabin.\nA cat watching birds flying on a Sony TV screen.\nA person standing by a field with a large chair.\nThe man is standing by the table using his phone.\na person at a table with many plates of food\nA wooden ladder stands over a toilet in a tile bathroom.\nA toilet with a sink and a 
towel dispenser in a bathroom.\nA photo of Thomas the Train coming down the tracks.\ntwo elephants in a field behind a fence near many trees\nA very tasty looking dish of food with some broccoli.\nThree giraffes on grassy field next to trees.\nA town with buildings, vehicles, and street lights.\nA side view of a building on a street corner are shown.\nThe decrepit bathroom features a brand new toilet.\nIndividuals are there commending and having a ton of fun of their life.\nthe person is putting something into an oven.\nA man in a kitchen handling food, with another man in the background.\nA clock that is sitting on a wall.\nA giraffes head in front of a metal grid\nA young man in formal dress is standing.\nCustom made pizza sitting on a plate ready to be cooked.\nA pedestal sink and a toilet in a bathroom.\nA man dicing carrots with a large knife on a cutting board.\nGraffiti on a French street showing a man holding a red umbrella.\nA zebra is standing next to a fence.\nThe plains with zebras and gazelle around a watering hole\nA lonely bird sitting on a white bench.\nA man and a small girl standing next to a glider.\nA coffee cup and a plate of food.\na display table filled with assorted carrots and cauliflower\nA person on a skateboard riding on a street.\nA man that is on a skateboard in a concrete bowl.\nA crowd of people holding umbrellas walking down a sidewalk.\nA person's feet in the bed with socks and shoes on.\nTwo toilet paper rolls sitting next to a toilet.\nA young woman carefully touches a giraffe's long tongue.\nSeveral people ride down a dirt road in a horse drawn carriage.\nA foggy street with lots of traffic driving under traffic lights.\nThere is some food sitting in the pan.\nA view of many different scissors on display.\nA woman gives the peace sign at lunch with her friend\nA fighter jet flying through a blue cloudy sky.\nA vase filled with peacock feathers sits in front of the window.\nA man in a long sleeved hoodie holds a cup\nRed jello 
with fruit in container in microwave.\nMotor vehicle traffic on a paved city road.\nA school bus parked in a parking lot next to a building.\nTwo men watching three horses running down a path.\nan image of two zebras in the wild\na person jumping over a curb at a corner in front of a liquor store\nA man with piece of cake and a spoon sticking out of the top of the cake.\nan image of a zebra in a field\nA woman with a cell phone sitting on a couch surrounded by a red white and blue border.\nA boy in a safety vest lays in the snow as another boy on skies stands near by while a man in a red jacket kneels in the snow by the boys.\nA vase filled with white flowers sitting on top of a wooden table.\nA person sitting down a bench in front of the ocean.\nTHERE IS A MAN THAT IS SITTING ON A BENCH READING\nA red truck driving past tall buildings on a paved road.\nA young Asian boy holding a tennis racket.\nA woman standing near some steps at a river's edge.\nCoordinated bedding pulls together a full size bed and a set of bunkbeds.\nA person on a court with a tennis racket.\na fire hydrogen that is sticking out of the ground\nOld and young men sit around a table with laptops.\nA batter, catcher and umpire in a  baseball game.\nA skier with a huge black pompom on his hat.\nA bathroom is shown with a mirror and a sink.\nA motorcycle is parked near a quiet river.\nA man is holding a partially peeled banana in his hand.\nRailway train on tracks traveling on beach next to ocean.\ntwo people sitting at a table with many wine glasses\nAn empty refrigerator has its doors open as it stands next to a kitchen sink.\nA person with purple hair and and tie.\nPlane sits on a bridge above the water\nThis is a man and woman on a ski slope.\nthis is a fire hydrant sitting on the sidewalk\nA black and orange cat sitting on a wooden counter top.\nThe service man is putting meat on the tray.\nThere are giraffes standing together near the trees.\nA girl sits at a dining table set for three with 
food on the plates and cups and a candle on the table.\nA young woman decorating a cake with a frosting bag.\na pair of scissors and other knitting supplies are on a table\nA fire hydrant is spraying water onto a city street.\nA man on a bicycle is looking at a semi truck.\nA dog tied to a pole, with a bike behind it.\nA bedroom is shown with a suitcase on a bed along with several clothes on it.\nA girl with curly hair and a teddy bear on a bed.\nA man in ski gear skis down a slope.\nA train pulls up to a platform at a station.\nA kitchen with a stove and microwave above it.\nTwo stuffed bears sitting next to each other.\nThe surfer wearing a wetsuit is riding the wave.\nA woman making pizza at an outdoor event\nA bowl of cereal and a glass of water are sitting on a table.\nA plate with some food on the top.\na mom and a kid in a green kitchen\nA large artistic clock is posted on the side of a building.\nThe man takes a look at the food in his hand with the door to his fridge sitting open.\nA man waving at a school bus from his driveway.\nthree baby lambs laying on a pile of hay\nA crowd of people crossing across a street.\nThe brown cat has big round brown eyes\nBike riders travel next to a passenger train\nA group of turkeys feeding in a field.\na young man is holding a 2000's style cell phone up in front of his face.\nThe display of products for sale at a motorcycle shop.\nThe man is sitting on the beach with a head and sunglasses on.\nAn older couple in a boat float past ducks on an open river.\nA lone skier on a snow slope with some areas of dirt expolsed.\nA man on a surfboard riding a wave.\nAn older man sitting at a wooden table with a plate and a drink.\nA mural on a city wall with a women walking down the sidewalk.\nTwo women wearing bikinis on surfboards in front of beachfront hotels.\nThe kids are playing tennis on the courts for physical education\nA snowboarder does a trick beside a Hilton hotel\nA large passenger jet sitting at an airport.\nA couple 
of brown horses grazing on a green grass field.\nThe inside bow section of a narrow metal boat floating on blue-green water.\nA street sign has multiple street names on it.\nA woman with short ginger hair has a book open as she lays in bed.\na white and orange cat sitting on wooden table\nA tennis player runs towards the ball during a match.\nA bathroom sink sitting next to a window covered in curtains.\nFresh baked pizza being served at a restaurant.\nA walkway along a river that looks out at a bridge.\nA man hits a tennis ball during a tennis game.\nA sigh advertising a dancing club is present.\nA train pulls up to an empty platform.\nA person who is riding a wave on a surfboard.\nA double sinked bathroom has circular twig wreaths hanging above.\na close up of a bird with a blue head\nThe cheese bread appeals to a variety of people.\none lady is on the computer one is digging through a backpack there's a  man on the phone and another man on a computer\nA baseball player running down a baseball field.\nMan on a skate board holding himself up.\nSome plates and containers hold a variety of food.\nA person in glasses makes a funny face while eating.\na white plate of meat and carrots and a side of brocolli\nA group of children sitting at tables working on laptops.\nA person in a wetsuit on a surfboard on a wave in the ocean.\nA man is looking inside a fridge with only four items in it .\na bunch of umbrellas are in front of a house\nTwo horses in an enclosed area during the day.\nA plate of wild bananas sitting on a patio ledge.\nA couple of yellow school buses driving down a street.\nThis object has a long cord attached to it.\nA batch of sweets as well as oranges.\nA brown teddy bear sitting next to cup cakes and then sitting on a couch.\nA cluttered living room with a laptop computer.\nA person in a tennis outfit holding a racquet.\nA young girl playing tennis on an indoor court.\nA yellow fire hydrant is next to a tree.\nA motorcycle with cheetah print, parked on 
a curb\nA polar bear that is underneath the water.\nA boy is sitting in front of table filled with apples.\na  small dog laying and a cat laying on a sofa\nA sandwich with chopped vegetables sits in a cardboard container.\na man riding a snowboard down the side of snow covered slope.\nA large horse standing next to a smaller horse.\nA boy running to catch a frisbee in flight.\na cat sits on someones lap and looks at a plate of food\nAn orange that has been placed next to a beer.\nA close up of a large, white plane with someone standing beside it.\nA brown cow standing next to a black horse.\nMany colorful kite surfers over an ocean cove\nA airplane that is sitting in the grass.\nA view from an airplane of mountains with a partial snow-cap\nA man and a woman sitting on a motorcycle.\nA bun has carrots and parsley on it as it sits on a green plate.\nA batter swinging at a ball with a catcher and umpire behind the plate.\na man that is jumping a small skateboard\nA man is sitting next to a computer system with two monitors, keyboard, and mouse and a desk that has many figurines and dolls atop of it\nA black train is stopped on the tracks.\nA professional baseball player holding a ball during a game.\nA double decker bus driving down a street.\nTwo men playing a game while a boy watches.\nA man wearing a green shirt on top of a tennis court.\na airplane that is parked on a  runway\nBoys on different teams running for a basketball.\nAn empty street with a red double decker bus in the distance.\nMan and woman standing up while playing wii.\nA pile of pieces of dark green broccoli.\nGuy sitting at the front of the bus typing something on his laptop\nMan with glasses in suit leaning over to blow out candles on a cake.\nTwo men with backpacks and skis standing on top of hill.\nmain street of a slum with cars and people\nA bed with sheets, a chair and wall hangings\nA group of paddle boarders watch the beach.\nAn adult smiles while skiing with small children.\nA street with 
a bunch of street signs and a building near the street\nTwo benches and a garbage can sit on a beach.\nA cathedral with clocks set in four directions in the clock tower.\na person jumping a skate board in the air\nThree big rigs parked in a row in a field.\nThe white and black horses are grazing near mountains.\nAn airplane with the word navy pained on the side is sitting on a runway and people are sitting inside of the plane.\nTwo horses hold their heads near the short grass.\nThe woman is sitting at the table in the restaurant.\nMan and woman exchanging words on stage with horse\nA person standing on a skatebord on some grass\nred suitcase and two black suit cases on pavement\nA bus traveling on a city street near pedestrians and buildings.\na bunch of travel bags sit in front of a television\nA boy sitting on a bench looking at a cellphone\nA table set up with flowers is in a farm type area.\nA group of kids that are sitting in front of a table.\nA jetliner flying low as viewed between two skyscrapers.\nA white three tier wedding cake decorated with roses.\nThree dairy cows in a grassy paddocks fenced by bushes.\nTeddy bear tucked into bed in a bedroom.\nA person holding a rainbow colored umbrella near a crowd.\nA large sink with three silver faucets on it\nA cat is playing with a backpack strap.\nA woman holds a Wii remote in her hand while making a face of concentration.\nMotorcycles and mopeds line the street of an asian shop\nA young girl on skis and holding poles, posing in fake snow.\nA baseball player holding a bat on a baseball field.\nA vase filled with an orange reddish flower.\nA PERSON IS ON A HORSE ON THE BEACH SHORE\na group of men work on a air balloon\nBicyclist riding on a city street at night.\nAn apron is flying in the air next to a tree.\na motorcycle with a bag on the back of it parked in the road\nA young man talking on a cell phone with a stuffed animal on his stomach.\nA baby on a brown horse next to two people.\ntwo people in a kitchen 
preparing food\nA table of doughnuts with light showing on them.\nA women who is wearing snow skis and performing a jump.\nA man with glasses and a tie stares straight ahead.\nthree cats a gray one a black one and a brown and black one on a bed\na backpack and luggage on a car seat\nA man with a backpack holding a bottle of beer.\nThere is a woman sitting in a boat drinking something.\nA zebra and a \"part zebra\" eating grass.\nsome signs on the road showing the street and direction\nA nice setup of stuffed bears having a picnic.\nA man is surfing a small wave in the ocean.\nTHERE ARE DOUGHNUTS THAT ARE ON A PLATE\nA woman and a baby walk on a grass field where kites are in the sky\nA black headed woman skiing in the snow.\nTwo laptop computers sitting side by side on a wooden desk.\nA couple of sheep walking across a lush green field.\nThere are a lot of sweatered teddy bears in this pile.\nA doughnut with sprinkled sugar and icing on it.\nA young smiling boy stands holding a set of Wii motes.\nA group of elephants is walking across a grassy field.\nA large breakfast omelet, english muffin and fruit\nA zebra standing in a stall with its mouth open showing its teeth.\nA small elephant walking around in its enclosure\nA man is talking to children about surfing on the beach.\nA green street sign next to a neon sign on a building.\nA man leans back in a chair with a beverage.\nWoman posing in front of two pints of beer.\ncabinets a sink dishwasher and stove and a window\nA woman riding on the back of a white horse.\nTwo white swans and grey ducks  in a grassy area.\nA plate filled with an assortment of food\nThe toilet is broke and sitting on the grass.\nA blocked off street that is ready for a event to happen.\na red and white firehydrant sitting in some grass with cars and trees in the background\nCows are in a pasture with one glaring attentively for a photo.\nTwo people are on a bike together traveling down the road.\nA couple of people walking across a beach 
with a surfboard.\na chef slices jalapenos on a cutting board.\nThe plate of food with a spoon on it has broccoli in it.\nA contemporary light-rail train seen from the front is stopped in a station.\nA bike and some people on a street.\nBaseball player on the ground at home plate while an umpire makes a call.\na woman holding up a smart phone while smiling.\nWoman surfer in the river catches a wave\na close up of a pair of scissors in a scissors pouch\nThere is a man standing in the kitchen.\nThe man is reading the paper on the bench.\nA man riding on the back of a motorcycle.\nA stove with the clock set at 1159. There is a spice rack on the stove.\nThe cab of an eighteen wheeler in a parking lot surrounded by trees.\nA clock is standing in the middle of the grass in the middle of the afternoon.\na stop sign and no right run sign in a big city.\na person crossing the street in front an orange trolley while holding a garbage bag.\nA woman standing in a living room with a Wii remote.\nA man is sitting on a chair playing a guitar.\nA toilet sitting under a metal bar in a bathroom.\nA large dog laying on top of a bed in a bedroom.\nTwo women and a man sitting at a table.\nan image of a cat walking in the kitchen\nA mini pizza with an egg in the middle.\nThree women play with Frisbees in a shady park.\nA woman and a child are hiding under the covers.\nA topless girls sittinng on a bed holding a bear and leaning on a suitcase.\na man gets ready to catch a frisbe\na female lunging after a tennis ball holding a tennis racket in both hands\nthis is a green fire hydrant and brick street\nPlastic containers filled with food including fruits and vegetables.\na couple of guys are sitting at a table\nA table is set with plates with pancakes and bowls of fruit and a bottle of syrup.\nOutdoor art piece of an elephant covered in paint being displayed for sale.\nA woman and her two children walking in the rain while holding umbrellas.\nA woman standing next to a giant refrigerator 
freezer.\nA clock tower sitting in the middle of a parking lot.\nThere is a tea pot on the gas range.\nA large long train on a steel track.\nA green and a pink bus are next to a store.\nA very nice looking motorcycle parked by some trees.\nA sink of a bathroom with things on the counter\nA person on a surfboard rides a wave.\na person and a child playing with a kite\nThe stop sign is across the street from a bridge.\nA girl is holding a umbrella.Someone shorter than her took the picture. She isn't smiling.\nball bats standing on end leaning against each other\nA train drives under a sky walk for pedestrians.\nA lady is paying the Wii at the store while a man looks on.\nA man is on the snowy hill in his ski gear.\nA very long blue and white bus pulling out of a parking lot.\nFour girls and two boys sitting in the back of a parked white Ford Super Duty pickup truck.\nA red traffic light at a street corner with vehicles near it.\nA woman is riding the horse while the crowd watches.\nA teddy bear sitting in a window holding a cell phone.\nDishes with strawberries and walnuts are set on a table.\na police officer riding a bright yellow motor cycle.\nA man and a woman riding a scooter past a church.\nMilitary float plane flying overhead on cloudy day.\nAssortment of sliced pizzas in yellow cardboard boxes.\nthere is a small pizza that is on a white plate\nA man standing on top of a base on a field.\nMale tennis player standing on a court holding a racket.\nA man in the living room plays a game on a game system.\nA close-up of the face of the horse with a woman on the back.\nAn open face sandwich and a pile of potato chips on a plate.\nThe man is outside skiing in the snow.\nSeveral color fruits and vegetables, all unprepared on a concrete surface\nA small bathroom, with a commode, and a bathtub with bath toys in and around it.\nA large boat floating on top of a lake surrounded by a forest.\nA person carrying an umbrella walking on a path next to water.\nWhat a funny 
picture of one giraffe hanging on to the neck of the other.\nA man standing in front of a laptop computer.\na male surfer in a wet suit some rocks and water\nA cat sitting underneath a bed in a room.\nA stop sign is at the bottom of a four way stop.\nA blue, metallic parking meter with a yellow number six.\nA brightly colored food item is on a white plate on a black table.\nMan skiing down a slope just beneath a lift.\nA girl holding a yoga mat riding skateboard down the street\nA laptop computer sitting on top of a wooden desk.\na couple of treys with some food inside of it\nThe chocolate cake on the plate is topped with strawberries.\nA woman is standing in front of an old train.\nBlack and white photograph of animals and horses in field.\nTwo zebras graze on grass by a dry creek bed.\nA man standing in a kitchen preparing food.\nA rabbit on its hind legs in front of pigs.\na man sitting on a wooden box in front of a mural\nA partially eaten pizza sits on a tray on the table.\nThere is pizza topped with white sauce and broccoli.\nWhite planes lined up in a parking lot.\nA young man sitting on top of a skateboard.\nBlack and white photograph of a man on a skateboard.\nRed Regio buses parked close together in a line.\nA group of people waiting on a train with items balanced on their heads.\nan empty park bench sitting among the trees\nA gravestone with a vase and stuffed animal on it.\nA couple of people on a sidewalk holding umbrellas.\nA man with black sandles standing in a dress store.\nA crowd of people standing next to a vending machine.\nA slice of a  banana on the table\nMonkeys eating through the peels of bananas.\nThree zebras are standing in the grass, while one stares at the viewer and other two stare off to the right.\nThe boat makes a big splash in the water.\nGroup waiting to take their turns on the ski jump.\nA jumbo jet flying in the air during the evening.\nA dog lies on a floral rug near a living room window.\nA baseball player swinging a bat over 
home plate.\nA person with a snow board posing for a picture.\nA small glass table with vases on top sits by an open window.\nseveral zebra are walking together at a zoo\nA woman in a hair net in a bakery holding a box.\nA train covered in graffiti sitting on top of  train tracks.\nA young professional is working at his laptop while his coworker is reading material.\nA bed, chair, drawer and a wall hanging\nA kitchen counter covered in pots and pans and appliances.\nA giraffe eating leaves from a tree near a forest\na small child has a brush in his mouth\nTrees and a street sign  next to a street.\nA dog is poking its head out of a vehicle window.\nPolice officers ride horses on a city street.\nTrees mark the far side of a fence that encloses a large environment space with man made rocks and two giraffes, one close up and very large, the other small, and seemingly far away.\nA plate contains skewed meat with a side of vegetables.\nA red stop light across from a brick building\nTable set for two with pancakes and syrup.\nThe brown and black cat is laying on a computer laptop keyboard.\nA picture of a person walking down the street.\nThree zebras that are standing in the grass.\nA train traveling across a bridge over a river.\nA donut that with sprinkles on half sits atop a Nautilus jump rope.\na transit bus parked near a building near a cart\nA dirty toilet in a small bathroom.with items on top.\nBathroom with radiator, sink, lighting, shower curtain and decor items.\nA young man performing a trick on a skateboard,\nA white plate filled with pasta and broccoli.\nA little boy holding a Nintendo Wii game controller.\nA group of zebras watching in a field.\nA black bird with spiked hair standing on rocks\nA red double  decker bus in front of a white one.\nThe tattooed man is talking on a cell phone.\nLuggage on a tiled ground and people sitting on rows of chairs in the background.\nA very pretty bird perched by a tree.\nFarmers markets have become popular destination 
points in metropolitan areas.\nA man in a grey suit with a blue pixelated tie leaning against a wooden podium.\nTwo women in white dresses playing a game of tennis.\nA blue and white bus that is parked next to another bus.\nThree young boys are cavorting along an old sidewalk.\nA beautiful blonde girl standing next to a blonder.\nA man and two girls sitting on a couch with a dog.\nA brick wall with a sign giving directions and a clock on top of it\nAn African-American man, wearing a shirt and tie, glasses, and a cap, is looking downwards.\nA small grassy hill with three sheep at the top and a fence along the side.\nLots of people walking on a city street with Chinese stores on both sides of the street.\nA white bowl is filled with broccoli garnished with crumbles of cheese.\na man in glasses holds an umbrella with a brief case\nParents at a chain link fence watch a Little League baseball game.\nA dog riding on to of a yellow surf board on a wave.\nA young girl sitting in a chair covers her face.\nA jockey on a horse jumping over a hurdle.\nLarge open living room with black leather furniture.\nA red stop sign sitting in the middle of a road.\nA bird sitting in a shallow pool of water observing something.\nA old photo of a pitcher on the pitchers mound.\nA large quantity of banana's piled in a fruit stand.\nAn old fire hydrant sits on the grass in a park.\nAn old truck sits parked in an empty grassy lot.\nThis desk has a computer paper, water bottles, and a rolodex on it.\nAn airport has several planes on the runway.\nA man rides a surfboard down a wave.\nA worn down stove and oven sitting in a parking lot.\nA stadium full of people watching a batter hit the baseball at a game.\nA man with a wrench turning off a fire hydrant.\nA blue and white train raveling past a rusted out train.\nThe man is covered with a net and sitting on the ground.\nA patio table with two dinner plates of food and two bowls of salad.\nAn oriental style room with tatami floor coverings.\nA 
kitchen with gas stove with four burners and a sink.\nA pile of red and green apples sitting on top of each other.\nA person on a snowboard jumping a snow covered hill.\nA woman smiles while eating a pita sandwich.\nCrowd of people walking in snow in front of buildings\nA woman and man dance while smiling.\nA bird perched on a grave in a graveyard.\nA man with long hair and a beard smiling with his arms outstretched.\nA large cow laying on top of a sandy surface.\nA woman with green hair standing beside a brow and white horse\nA row of planes flying in sky with smoke coming from their tails.\nA boy with a tennis racket bouncing a tennis ball in the air.\nA skateboarder is standing, wearing a helmet and holding their board.\nA man on top of a ski slope on skies posing for a picture.\nA glasses wearing woman with a hotdog sandwich to her mouth.\nA child sleeping with a teddy bear on a bed.\nA table with plates of food that include corn and fruit.\nan Amtrak train with eight cards beside a field\nA Honda motorcycle parked next to a grassy area.\na close up of two pizzas on a plate\nTwo giraffes in a field between multiple trees.\nA little zebra playing around inside an enclosure at the zoo.\nSome black kitchen machines used for cooking food.\nAn older man holding a plant with banana bunches.\na person that is standing in a kitchen\nThe man in the colorful shirt pulls two luggage bags behind him.\nthere are many people awaiting a train at the station\nA sign next to a stone wall stating the road name.\nA giraffe walks on grass looking for something to eat.\na close up of a plant of bananas\nthis kitchen is large and has wooden cabinets and a granite island\nA display of teddy bears are on an outdoor blanket.\nA soccer player kicking soccer ball around opponent.\nA plate of pasta and a bowl of spaghetti\na stop sign sits on a street corner\nMen play a soccer game on a dirt field.\nA pepperoni pizza is bigger than the child sitting in front of it.\na group of small boats 
in a body of shallow water\nA person skis down a snowy hill while others watch.\nA long train going down the train track.\nA sink and some shelves are in this small bathroom.\nA small car is pulling a man on a bicycle.\na person standing in front of a mirror with his reflection in a different pose\nA bus that is standing next to a building.\nthree friends hanging out on a snowy hill\nHalf eaten berry filled dessert on a white plate.\nA living room with several books and paper on the floor.\nPart of a ship sits in the shallow end of the bay next to a city.\nThe letters of a laptop keyboard are sitting on a wooden table.\nA sign with many different stickers placed on it.\nA young boy hitting a ball in a yard with a bat.\nA small airplane flying through a blue sky.\nPeople in a field are looking up at a kite.\nA girl stands holding the string of a flying kite.\nHerd of sheep resting in the shade of tree in open area.\ncat sitting on top of a red and black motorcycle outdoors\nA toddler getting help from an adult to brush its teeth.\nA stack of orange solo cups near scissors.\nAn abstract photograph of a moving train on its way to New York.\nA group of young people sitting around a piece of luggage.\nA man is surfing the internet on his laptop.\nA bunch of people in a metropolitan area with umbrellas, walking on a sidewalk next to buildings.\nA man is riding a horse in a fenced enclosure.\nThis woman and man are holding a gold bag.\nA man riding a motorcycle with a woman on the back.\nThere is a bear walking across the grass.\nA young boy playing video games on tv.\nA number of people are in a building with many colorfulful items over their heads.\nA large passenger airplane sitting on a runway.\nTwo police officers riding motorcycles down a city street.\nA very neatly organized display of many items.\nA person with a red umbrella and a dog on a walking trail.\nThree children are in the bathroom brushing their teeth.\nSigns directing traffic in front of two several 
story buildings.\nA desk along a wall with book cases over head.\nA boy is performing skateboard tricks in parking garage area.\nA cabinet holding several oriental vases and lamps.\nA dog laying on the ground between someone's legs sitting in a chair.\nSmall piece of cake on china, with a fork.\nA group of people standing by a white and green train.\nPaperback book about Mother Theresa on a pillow\nA clock and some books in a room.\nA woman raises her tennis racket on the court\nThe man is holding a large bird outside.\nA blue sign suspended above a street with cars driving under it.\nPeople walk down a rainy avenue carrying umbrellas.\nTwo snow skiers pose on a snowy landscape.\nLake with boat in grassy fields with cows.\nA female tennis player readies for a hit\nBroccoli and chopped carrots sit next to each other.\nA large bed with pillows and a blanket.\na small airplane that is just lifting off into the air\nThree cats surrounding a stuffed bear holding the sign that says help.\nA group of people sitting around a wooden table.\nThe child smiles next to a stack of donuts with pink icing.\nAn old western town miniature in the backyard of a house.\nA train is coming down the tracks next to a field.\nSepia photograph of a stop sign next to row of mailboxes.\nShirtless man in white shorts writing on top of a skateboard.\ntwo ladies riding horses there's a reflection of one of them in a mirror\nA man in a yellow jacket that says police is looking across the street at a crowd of people and has his hand on a wooden structure.\nA little league baseball game showing a batter, catcher and umpire.\nTwo sheep graze in a grassy field at the edge of woods.\ncars parked on a city street with buildings in the background\nI am not sure what kind of food this is.\nAn airplane flying over a beautiful ocean shoreline peppered with sailboats.\nA train pulls into a station constructed of brick, rock, and metal.\nA large passenger airplane stands at the gate, near cargo vehicles.\nA 
hot dog has ketchup, mustard, and mayonnaise on it.\nA fire hydrant is decorated to look like a dog.\nThree girls biting into a piece of fruit.\nTwo blue bowls of food next to a bottle of cinnamon and sugar.\nA man is water skiing in the ocean.\nYellow commuter train at station near industrial area.\nCat overlooking keyboard as seen from above in lit room.\nStreet sign at intersection is written in English and Arabic.\nA pizza sitting on top of a blue plate near a salad.\nA tower clock on a building in the city\nA person wearing gear sitting on the side of a fence.\nA plate of baby carrots, mashed potatoes and tuna set on a table with a cup and utensils.\nA man talks on his cell phone in front of toothpaste advertising.\nA car carrier truck with a car loaded on it driving thru a city.\nA street sign in front of a gated parking lot area.\nTwo men on a sunny beach flying a kite\nA young man about to kick a soccer ball on a green field\na young boy smashing a toilet with a little sledge hammer\nRound vases sit on tiny shelves against a white wall.\na train that is on a train tracks that is a model\nA young man is standing in the surf on a surfboard.\na bench next to a tall thick tree\nThree people in skiing gear posing with trees in the background.\nA wooden clocks sits above a shelf holding several books.\nTwo elephants walk in an open savanna with dried grass.\nA fire place with a clock above it on a mantle.\nA large number of pots that are grouped together.\nFour pictures of people skiing and snowboarding on a snow covered slope.\nA man is playing frisbee by herself on the beach.\nBlack and white photo of person walking with umbrella.\nBuses are lined up in a single line along the curb.\nA young man playing tennis with his hat on backward.\na rainbow umbrella some bicycles a fence and some grass\nA group of people riding skis across a snow covered slope.\na hand a cellphone a laptop and a beer coaster\na little plane flying across a blue sky\na couple of women sits 
around a counter top\nA train passing down a station, in the middle of the day.\nA city bus leaving a bus stop on a residential street.\nA group of people are gathered around a large pizza.\nA young boy posing with a baseball bat for a team photo.\nA cut in half sandwich sitting on top of a table next to a foam container.\nAn air plane landing on a landing strip.\nPaintings are hanging on the walls in a living room.\nA food store is architecturally designed to include a clock.\nA horse is walking in the sand along the water.\nA cow lying down in the grass with a cowbird next to it\nCows walking on grass and a dirt road\nA sales person showing a customer different phones.\na child holds a spoon of rice while a woman offers the rice on chopsticks.\nA boat and lighthouse are in a wavy, stylized painting.\na number of cows in a field near one anotehr\nA woman standing next to a man in a living room.\nA child in diapers standing on a bed.\nA city bus driving over a bridge under an overpass\nProtesters gather with signs on a street corner.\nA man in grey shirt with red tie and red baseball cap.\nA soldier is mounted on a horse as a small dog walks near.\nA small bathroom with shower, toilet and vanity.\nA very tasty looking sandwich and fries waiting to be eaten.\nA man uses both hands to swing his tennis racket\nA picture of someone carrying video equipment in a bag.\nA baby sitting on a bed holding a book and smiling at the camera.\nDecorative event banner at a field full of flying kites.\nA woman on a tennis court holds her racket as she finishes up her swing.\nTwo boys who are playing soccer against each other.\nTwo women and a child flying a red, white, and blue kite.\nA little boy looking at his birthday cake.\nA young bog eating cake with his fingers quite messily.\nTwo busses on a street next to sidewalk and trees.\na man throwing some kind of frisbee toy\nstrangely colored luggage stands out in a line of passengers\nA large open room has an overhead book 
shelf.\nOne of the two children is smiling as they pose next to each other.\nFour bicycles with baskets parked under a tree.\nYoung girl spilling water into canisters in the park.\nA picture of a small bathroom taken outside the bathroom.\nA light brown teddy bear sitting up posing.\nA boat sitting on water next to a red bench.\nThere is a cat sitting on the back of a motorcycle.\nYoung boy carrying white Frisbee with toy stuffed monkey on back.\na cat napping on the laptop while on firefox\nA female tennis player holding a ball and a racquet\nThree slices of cheese pizza and a quesadilla are on a plate.\na number of people standing holding white boards\na man leaning over as he plays a video game with a wii mote\nA desk has different peripherals, computer, and a binder beside a shelf full of books.\nTwo females stand in a modest dorm room\nA miniature wooden toilet in a doll's house bathroom.\nA man that is standing on a surfboard in the water.\na women grabbing onto a statue holding an umbrella\nSheep and lambs grazing in a pasture behind a hedge.\nA woman is holding a baby next to an elephant.\nA tray filled with fresh vegetables on a wood table.\na large group of snow skiers out side of a ski lodge\nA group of skiers has gathered at a red fence\nA living room with couch and fireplace in it.\nA pitcher wearing a red shirt and red cap throwing the baseball.\nThe side of a motor bike and side mirror.\nA young child standing next to a large box.\nBlack and white bathroom with large shower stall.\nfour kids holding wii controllers in a living room\nFour stuffed animals, a leopard and three teddy bears, in a row sitting on a stone ledge with grass and trees behind.\nTwo people riding on the back of a large elephant.\nA toilet that has an open lid with water in it.\nA bearded man holding a wire whip and a Wii controller.\nA flower shop has a wall full of differed colored vases.\nA baby sitting at a chair by a computer desk.\nA man in a grey t shirt holding a purple 
frisbee\nA stop sign painted on a wood pole.\nTwo zebras in a open dead grassland, one is eating\nA glass block wall in a bathroom is shown\nA black and white picture of a traffic signal in a city.\nAn old military plane on a runway with wings folded.\nA young man riding a surfboard with a large wave behind him.\nIN THE BATHROOM THERE IS A TUB TOILET AND SINK\nA small bird perched on the windshield of a car\nThe train car has been vandalized on the outside.\na elephant walks through a vegetation area next to some trees\nSome very big commercial planes all parked in a row.\nYoung women are playing frisbee on the grass.\nBoat in the lake looking for spot to dock.\nAn alpine skier leaning forward while jumping through the air.\nA diner with large pepsi signs on the front of it.\nA truck pulls a construction truck on its back.\nseveral female soccer players engaged in a soccer match\nYoung man holding a skateboard and his helmet.\nA person with a snowboard, sitting in the snow.\nA young boy is holding a frisbee with a picture on it.\nThree adults sitting on a couch looking at their laptops.\nA tall brick clock tower with a clock on each of it's sides.\nA bride and groom are cutting into a cake.\nBox of various doughnuts on a wooden table.\nA pizza with different toppings sitting on a plate at a table\nA man and a woman are flying a kite.\nFour teddy bears outside sitting on chairs on a sidewalk.\nA young girl combs her hair with a yellow comb.\nA woman with an umbrella in front of a crowd\nA cluttered desk, containing a laptop, blue water bottle, and many other items\nThere is a coffee sign below the stoplight.\ntwo brown and white cows in a forest\na street sign on a light pole on a city street\nA parking meter sitting beside an empty street.\na small child in a white shirt  and a bowl of cereal\nA sandwich with french fries and cole slaw.\nA train on tracks with power lines and buildings in the background.\nA red bird sits in a bird feeder in a tree on a sunny 
day.\nA woman posing in front of a batch of apples.\nA cat chewing on a packaged pink toothbrush.\nThe back end of a semi truck driving on a divided highway.\na couple of people that are cutting a piece of cake\nmonitors are hanging over people who are sitting down\nA young baseball player gets ready to field a hit.\nA teddy bear sitting on a ledge of a building\nBlack and white birds walking in the grass near water.\ntwo elephants together standing on a dry plain\nA counter top with a plate with a fork and few scraps of food and a teddy bear lying on side with arm outstretched on plate near fork, with another plate with an apple and two bowls with produce, a canister and some metal objects.\nStreet and stop signs direct traffic in the proper direction.\nA lady is holding her tennis racket for the crowd.\nA BOY IS PLAYING WITH A FRISBEE IN HAND\nWoman pushing a cart of luggage in a transportation terminal.\nA nice looking story on a sidewalk near some other stores\nA train is making a turn past a closed station.\nA cat is sitting on the desk by the mouse.\nA man sits on a crate with bananas nearby.\nA automobile with multiple bicycles on a roof rack.\na small dog is standing on a motorcycle\nA cat wearing a colorful hat over it's head.\nChild sitting in high chair with plate of food, stuffed animal in buster chair and bottle of ketchup. 
Another hand holding a fork and a partially filled plate.\nA cluttered room contains green counters, a brown table and windows.\nA horse attached to a carriage on a street.\na computer,a keyboard and a mouse and a bottle of wine on a table\nA large long table full of many laptops.\nsome people and signs a bicycle two horses pulling people in a cart\nA dog suspended in mid air catching a frisbee.\nA man jumping up to catch a frisbee on  the beach.\nA stop sign sits along a road next to a shore\nA pizza sitting on to of a white plate covered in cheese.\nA sink underneath a mirror inside of a bathroom.\nA bunch of yellow and orange fruit in varied sizes.\nA room filled with lots of toilets and sinks.\nA man riding a surfboard on a wave in the ocean.\nA woman standing on top of two pieces of luggage.\nA parked motorcycle on a dirt road in front of an old building.\na small bird on a fallen branch near other trees\na pole holding a couple of street signs beside a building\na rusty brown train trackwith just one train on it\nPears, cheeses, cornichons, and other delicacies are artfully displayed on a dish.\nYoung man picture of receipt with the phone.\nA cat laying on top of a refrigerator.\nA WOMAN IS GIVING A MAN A HAIR CUT\nA zebra standing on a dry dirt lot.\nA girl sitting on a couch is adding something to her mug while other people stand nearby.\nA train traveling down railroad tracks next to a train station.\nA lot of toys that are on a table.\nA horse and foal galloping through the woods\nA person skiing down a snowy mountain side.\nA cutting board with fruits and vegetables that include broccoli and blueberries.\nA giraffe is walking through a wooded area.\nA large truck turning onto a road in a city.\nA display of coffee and sandwiches on a patio table.\nA pitcher in a baseball game pitching a baseball.\nAn empty beach dotted with straw umbrellas awaits tourists\nThe colorful bird is perched on the branch.\na couple of birds swimming in a lake\nA hot dog that 
has some cheese on top of it.\nA red train parked on a train track.\nYoung girl in sunglasses standing in a lawn, holding a frisbee\nA kitchen scene with yellow walls and a checkered floor pattern.\nTwo children and an adult ride in a horse pulled cart.\nA woman is laying in her bed playing on her laptop.\nA bird flying through a cloudy sky over a body of water.\nA man is holding a surfing board on a beach.\nTwo brown horses inside of a steel fenced corral.\ntwo adults dressed in ski attire and skiing in snow in an open field\nA laptop that has a picture of outside a window.\nsome people are riding horses at the beach\na picture of a sign post for a bikelane at the corner of Hancock ave.\nA man that is standing in the dirt with a baseball bat.\nA mother and daughter smile as they eat their meal.\nThe old plane is now hanging up as a decoration.\nAn elephant with tusks curling it's trunk upwards, standing behind a fence in the sand.\nReplica wooden sailing vessel with passengers in a harbor.\na little girl sits on a swing with a stuffed animal\nA cat stands on a bathroom floor alone.\nGiraffe trying to reach some leafs on a tree.\nTwo slices of bananas next to ice cream on a plate.\nA bunch of biker dudes begin led by one on a orange bike.\nA pizza that is not quite shaped correctly\nthere is a small plane that is very close to the ground\nA large living room with a cat on the rug\nFour glasses of wine sitting on a bar are half filled.\nA man flies a kite at an event from afar.\nHalf of an airplane jet over a snowy mountain range.\nA man holding a computer mouse next to a glass of water.\nYoung child enjoys a deathly meal for dinner\nThe peperoni pizza is served from the restaurant.\nA car driving on the road near a road sign and a bird.\nTwo draft horses pulling plow, color, under cloudy skies with trees and other horses in background.\nA kitchen has wood cabinets and white appliances.\nBASEBALL GAME WITH BATTER UP, READY TO SWING\nThe blender is full of some type 
of beverage.\nA crowd of people standing around a pole with three fire pits attached to it.\nan animal behind a fence next to a tree\nFour cup cakes with sprinkles on a plate.\nA man with a large remote controlled hobby aircraft.\nMajor League Baseball players practice throwing on the field between innings.\nArms and hands holding onto the bars of a bicycle.\nA man talks on the phone at the table.\nA pigeon is standing and eating in the street.\nTHERE IS A PERSON THAT IS WALKING WITH A SUIT CASE\nWoman of African descent in mid tennis backhand.\nA cup cake in one photo, an empty wrapper in the next photo.\nA man in a suit eats a banana in his car.\nThree zebras stand together in a field of grass.\ntwo city buses one  following the other\nSeveral sheep grazing in the grass on a sunny day.\nAn orange RV and white mini-bus is parked in an adjacent lot from a building.\nMan standing in a yellow room holding some kind of remote\nA young man holding a foot long hot dog covered in pickles.\na bunch of sheep are staanding in a field\nDark haired man making a serve at a tennis match.\nA large number of people riding motorcycles down the road.\nThe two people are ready to serve the variety of donuts.\nA green broccoli plant with lots of green leaves.\nA guy with a cap holding a blue surfboard.\nAdults and children sitting on a bench at a park.\nA person on a surfboard riding it in the water.\nA white plate with two slices of cheese and a whole banana unpealed.\nA woman flips a tortilla in the kitchen from a skillet.\nTwo  beige plates with thick sandwich and mustard.\na aircraft flying above a snowy mountain\nA view of a pizza from a table, with a man behind it.\nPublic bus travelling down road past apartment buildings.\nAn Air Canada airplane is waiting at an airport.\nAnimals outside a shelter grazing in a pasture.\nthere are two yellow empty school buses\na big boat that is floating in a body of water\nA skateboarder rides his board through a skate park.\nTwo people 
engaging in water sports in the ocean on a cloudy day.\nA pink room with two urinals near a door that says Catering Staff Only\nQuail walking in tall green grass near a fence.\nTwo young boys read in bed using a lamp light.\nA black cat on a wooden table in front of a laptop.\nA little dog is staring at a herd of sheep grazing in he field.\nTwo aeroplanes with two sets of wings flying in a clear sky.\nA surfer holding a surfboard straight up on a beach in front of ocean waves.\nMan in all black doing a trick on his skateboard.\nA male skateboarding over steps in front of other people.\nA group of people who are serving a cake.\nSeveral older men sitting in front of a library.\na pizza sits inside of a box on a table\nA suitcase is sitting in a hotel room\nSome people standing under an arch which has a fancy clock on it.\nA bed with a blanket underneath a window.\n2 baseball players in the field prepare to catch the ball\nA family of zebras in an open landscape.\na bunch of bananas hanging near a blue wall\nA couple of bears standing next to each other.\na man holding a cell phone sitting in a car\nA stand on the side of the street with political tones.\nTwo guys at a skate part having fun.\nA large plane with airport terminal in the background.\nA tray with a glass lamb next to a pot of flowers.\nA small, dirty bathroom has peeling yellow, walls.\nTwo bowls of food on metal plates next to a fork and spoon.\na close up of a dog with its head in a bag\nA couple sun bathing near their bikes on a bay\nCars and people on a street traveling under a traffic light.\nA green walled building in the middle of a brick wall.\nAn Apple desktop with an animated figure on the desktop\nA street scene of a busing coming down the road and dark clouds in the sky.\nThe woman waves at another surfer also carrying a surfboard.\nA sheep with lots of fur on a fence in the field\nCloseup view from front underneath of a commercial airliner plane in the air with wheels down, against blue 
sky.\nA woman is taking a picture of herself.\na close up of a plate with a doughnut near a cup\nA man about to hit the ball with the tennis racket.\nA marina full of boats nearby a seaside town\nA large brown dog running across a grass covered field.\nA silver tray on a counter serving pizza.\nA desk with a cell phone and two computers.\nA group of people sitting around each other in a room.\nA small yellow car with a driver sitting on the right side of the vehicle.\nA skateboarder, holding a skateboard in front of the camera.\nA bride and groom cut the cake on their wedding day.\nA truck driving down a road along side of train tracks.\nA stuffed teddy bear sitting on a green bed.\nThe famous Suzuran Street in Tokyo during the day\nA street sign for Rodeo drive is seen in close up.\nA kitchen with a sink, mirror and window.\nAn older man surfs in the large waves.\nThe two young children are playing with a plastic chair.\nThree laptops with faces on the screens on a bed.\nA group of boats sitting in a water cove next to some buoys.\nA Dell laptop on a desk is surrounded by cords, books, and papers.\nA blonde girl in green shorts playing tennis.\nA man is sitting on top of an elephant.\nA white and orange train traveling down train tracks.\nTwo trucks with workers in the extended baskets.\nA man wearing a snowboard is standing on his head.\nAmerican airlines commercial jet sitting on a tarmac.\nA yellow fire hydrant sitting on the side of a road.\nA woman holding a plate with a pizza on top of it.\nA person is holding a banana that is dressed in a costume.\nAn old black railroad car parked on the tracks\nA man that is standing on a tennis court with a racquet.\nA cupcake that has a ribbon on it.\nA young man standing in front of a white plane that a young woman is standing in.\nMany skiers are on the snow covered mountain side.\nThe glare of the sun cuts across a wave and a wet-suited surfer coming in on the tide.\nA woman with a bat hitting televisions that say 
Comcast Doesnt Care.\nA traffic light monstrosity shaped like a tree sitting in a parking lot.\nA little boy flying a kite up in the sky on a beach front.\nA herd of zebra standing along side of a river.\nThe tower of the building has a big decorative cross on it.\nBottle of red wine and red wine in a wine goblet.\nA crowd watches a player pitch a ball in a baseball game.\nSeveral people waiting for the train to arrive.\nA man is next to a boy on a surfboard catching a small wave.\nA man stands in a train station as a train passes\nTwo horses in grassy area with fence and house in background.\na clock in the center of some plants and bushes\nSeveral men are all trying to catch a Frisbee.\nA woman and a little girl approaching a train on the tracks.\nA toaster oven that is heating up on a table.\nMany animals sit on the beach next to the ocean.\nA person riding skis across snow covered ground.\nA bus is going down the road at night.\nA person on the street with ear phones neara parking meter.\nA laptop computer sitting on top of a table.\nA horse that is standing in front of a carriage.\nTwo people wearing jeans sit on a bench with their legs crossed.\nA sign with a button for crossing on a street corner\nA herd of sheep with a  man standing next to them.\nA bunch of animals being held during a competition.\nThe man is holding a teddy bear wearing a hat and scarf.\nStatues on the second floor of a building, sitting below a clock.\nA man his holding his cell phone overhead.\nA yellow traffic light hanging over a city street.\nCrowd of attendees among colorful display on banners.\nAn woman across the table puts her hands over her mouth and nose\nA motorcycle full of gear parked on a gravel road.\nA boy is on a tennis court carrying a tray of balls.\na kitchen next to a wood floored living area\nTwo elephants with their trunks raised are at a log rail.\nA tray that has various plates, with various foods.\nSandwich and greens on a plate with a glass of water.\nA 
bathroom vanity with a his and hers sink.\nA woman with a blazer on has her hand up to the side.\nA chair sitting at a fire hydrant near a road.\nA person who is wearing glasses holding food in their hands.\nThe two people are talking about items on the computer.\nA home office with a cat sitting in the middle of the desk.\nA granite counter with a plate of food and a drink.\nA chocolate cake with chocolate frosting and zebra top\nA person that is in the water having some fun.\nThe jet airplane is parked near a field of tall grass.\nThere is a double decker bus that is red and beige\na laptop on a desk with an extra keyboard\nA kitchen in industry with empty everything\nA group of five posing for picture on skis.\nFour older men sitting on a wooden bench.\nA picture strip and a pair of blue handled scissors.\nA kid with a large umbrella on a street.\na group of zebras under a huge shade tree in the middle of a grassy field\nA farm along a river overlooks a wind turbine.\nA man wrings his hands while observing a tray of pizza outside\nA kitchen sink sitting under a kitchen window.\nA boy in blue striped jacket playing with a toy.\nThere is a mirror and trash can and a mirror with two cats nearby.\nA bunch of goats are eating out of a box\nA woman smiling while holding an open umbrella.\nA little boy stands outdoors on a rainy day with a pink umbrella\nHotel room with a pair of beds and a sliding glass door.\nTwo giraffe's in a pin, one walking, one standing still.\nA Seattle Mariner's baseball player is up to bat at a baseball park.\nA woman standing on a bridge holding an umbrella.\na close up of a keyboard on a desk\nTHIS IS A PHOTO OF A BLUE MOTORCYCLE\na close up of a person cutting a piece of cake\nA vintage clock from the 19th century tells the time.\na kitchen with counters a door and cupboards\nThe white head of an animal sticks out from a field of green grass.\nA woman in a red jacket sits astride a white horse.\nThe people sit at the bar next to the 
motorcycles.\nThe children are getting ready to enjoy a piece of cake.\nPeople on a motorbike near a vehicle loaded with food.\na man on a pitchers mound lunging forward delivering a ball\nA giraffe walking near a tour vehicle in the grass\nA person spoons macaroni and cheese into a bowl.\nA very big messy bed filled with many items.\nA plate of food with various items on it.\nA refrigerator plugged into the wall of a kitchen.\na brown and black acoustic guitar and an orange frisbee\nA zebra follows another zebra through a park.\nA large vase contains an assortment of flowers.\nA man in a wet suit is surfing in the ocean.\na person holding onto a partially eaten donut hole\nA cat that is sitting on a motorcycle.\nBlue commercial airplane getting loaded at the gate.\nA man riding a motorcycle on a race track.\nA dog's face is partially showing and being blocked by something.\nA shot of a baseball player about to throw the ball.\nThere is an old yellow train coming down the tracks\nTwo vehicles cross under several street lights at night.\nA cat on a toilet seat in some dirty washroom.\nA group of people stare up at something out of the frame.\nMeat and mashed potatoes smothered in gravy with peas and carrots and bread\nA series of two pictures with a small dog wearing a fruit hat.\na man goes down the street on a skate board\nA large elephant standing next to a pile of dry hay.\nA woman and a horse standing in a corral.\nThe picture is full of many suitcases with tags.\nA woman walks across the street at the intersection.\nImage of a bedroom featuring a modern style bed and other furniture.\nA man wearing skis and holding a handle leans toward a sandy plain.\nA group of men in bathing suits next to an airplane boat in the water.\nA man is standing outside a store at night time.\nPeople stand by a truck near a street filled with vehicles in a city.\nthe bench is completely covered in snow so is the tree\nCloseup of row of yellow hats and baseball mitts.\nThe small 
refrigerator holds several different types of drinks.\nA wine glass with wine next to a wine bottle.\nA man riding a motorcycle down a road with a POW - MIA flag.\na number of people standing flying a kite\nA police officer on a motorcycle patrolling a protest.\nsome bowls of food, one with broccoli, the other with some chow mein noodles\nTwo people, a woman wearing a hat and carrying a paddle, and a man, both hold umbrellas.\nA white plate topped with meat and two types of veggies.\nA dog lying down on the beach.\nTwo girls holding Wii remotes and nunchucks while standing up\na woman is holding a tennis racket on a court\nThe motorcycle is sitting beside of the people.\nHeadless statues show of clothing beneath a colored background.\nPublic transit bus traveling past brick large building.\nAn intersection during a cold and foggy night.\nA couple of people riding on the back of an elephant.\nA man and two dogs stand near a park bench.\nA woman is jumping on a hotels bedding.\nA golden vase filled with flowers on top of a table.\nA few pieces of pizza sit on a skillet.\nA rusty bicycle filled with mangoes and bananas.\nA red fire hydrant sitting on the side of a road.\nA motorcycle is parked on the grass while people look\nA very colorful old style train engine on the tracks.\na toilet sitting underneath a big window\nThe man and the girl are flying a kite at the beach.\nTwo double Decker buses on a two way street.\nThe street signs are clearly visible for all to see.\nA male chef holding up a knife in a cooking area.\na small kitchen with stainless steel appliances and a large window\nA white toilet sitting in a bathroom stall.\nA cat standing on a woman's shoulder in a bathroom.\nA teddy bear cake with a candle and sparklers.\nA passenger rail train leaving the train depot.\nA man swinging a tennis racket at an outdoor court.\nA white-and-black cat sitting on top of a laptop.\nA snowboarder is doing a trick mid air.\nFour beautiful women in red posed around a 
motorcycle.\nA camper brushing his teeth standing on a stairs brushing his teeth.\na close up of a bird flying thru the air with people in the background\nPlate of food with green vegetables on top of bread.\nBusiness is slow at the local bathroom sink shop\na close up of a person bending down feeding a dog\nA large brown wooden fence near a wooded area.\nA skateboarder skating up the ramp at a skate park.\nA stainless steel stove that is in a kitchen.\nTwo sumo wrestlers and referee with people watching.\nA man sits in a chair and pets a furry dog.\nA small clock sitting on a bedside table\nThe dog is in the car with his head out the window.\nA sole person sits in the front pew of a large church.\nTwo guys talking while standing near a parking meter,\nA white bowl filled with a caramel chocolate dessert.\nA flock of birds are flying near a body of water.\nA lady sitting on the bleachers looking at her cellphone.\nA bare kitchen has light wood cabinets and counters that appear to be granite.\nA person riding a skateboard and doing a trick in the air.\nSome food and bread on a plate on a table.\nA piece of cake on a plate with cream filling next to a fork.\nA smiling woman pressing her head against a mans head.\nA large white church with a bus outside\nsome brown and white oxen laying in some dirt and cars\nA large living room with a kitchen in the background\nA table covered in fresh produce and a book called \"edible San Diego.\"\nA group of three women standing around each other near surfboards.\nA person with feet propped on top of a desk.\nA bowl filled with food sitting next to two pieces of bread.\nA toy train set with flowers and house\na desk with a cross on it and candles\nA mixture of random tools sit on a metal tray.\nA girl is hold a new white and black racket.\nA laptop seems to have the infamous \"blue screen of death\" on the desk.\nsuitcases sit on a dressed up stage and bags on a dressed up table\nA modern motel room features oak storage and casual 
accessories.\nA laptop sits precariously on a desk, with a second keyboard in front of it, and windows behind it.\na cooker and an oven well cleaned in a kitchen\nA small white bathroom with a colorful tile accent.\nPeople at a gathering with some hitting a beach ball into the air.\nA surfer prays while standing on his board.\nA chef is in the kitchen wearing a white apron\na doll sitting by a plate with a sandwich and fries on it\na male in a white shirt riding a bicycle and some signs\nBEAUTIFUL SCENE OF THE RIVER AND ALL THE BUILDINGS FROM THE BENCHES\na church with a tower and a clock built into it\nA black motorcycle sits on a paved surface.\nA city street filled with lots of traffic.\nChicken and broccoli are in a skillet on a stove burner.\nA cat sitting in a leather office chair.\na person in uniform riding a horse\nThe young woman is looking at her cel phone.\nA young soccer player is preparing for the kick.\nA group of young children sitting around a long table.\nA white plate and metal fork on a plate of food\nMom cuts the birthday cake for her daugher\nTwo giraffes standing by a pole in a grassy field.\nA giraffe sticking its head over a fence.\nA view of two computers sitting on a desk, with a man on the cell phone behind them.\nThis is a display of teddy bears and snow globes\nThe surfer expertly crouches to finish the ride.\nA small boy with blonde hair sitting in a rocking chair and holding a baseball bat.\nPeople standing in an over cast ski looking out to sea with surf boards.\nA smiling blue-eyed boy toddler chewing on a plastic object.\nA bird sitting on the branch of a tree.\nA cellphone with a strange rainbow screen saver\nA smiling shirtless man laying on a bed.\nYoung men are gathered together while enjoying drinks.\nA toilet, sink, and shower are located inside this bathroom.\nStudents in a classroom watching a lecture on television.\nA man sits on a bench and plays his guitar.\ngiraffes, zebra and bulls in zoo habitat together\na living 
room with an orange couch and green decorations\nA lot of people are sitting on the bench.\nAn elephant that is putting something in its mouth.\nA street with many buildings is lit up at night.\nA woman on a phone with a book with peoples photos\nA large platter full of colorful food product.\nThree zebras standing next to each other with heads together.\nA couple of people standing in a room.\nOlder woman and two young guys stand against the fence posing with tennis rackets\nTwo old ladies with rackets playing tennis at the court\na person skiing while holding onto some wires\nOstrich in enclosed area next to a giraffe.\nGiraffes statue displayed in indoor room at commercial business.\nAn elephant is standing in front of his food at a zoo.\nAn old brick clock tower with a metal roof\nIt is very dark in the room and there are pillows on the floor.\nA woman walking on a sidewalk talking on a cell phone.\nThe building is a piece of art.\nA guy skateboarding on a street at night.\nA blue painting dominates a living room with a brown coffee table.\na person riding a motorcycle on a city street\nA stuffed holiday bear decoration in a garden.\nView of the underside of a jet airplane passing overhead.\nLettuce, a knife and tomato slices sit on a cutting board.\nA lady is sitting in a restaurant while talking on her phone.\nA living area with various furniture and a bicycle.\nGrown men playing an indoor soccer game on turf.\nTwo vans are parked next to each other.\nA table has two plates of desert on it.\nA man and two girls sitting at a restaurant table\nAn unmarked van with trailer in tow is pulled over.\nA small teddy bear with a pink bow sits of a bed\nA soccer goalie unsuccessfully jumping for the ball\nSome flowers are in a clear sealed tube\nA sign on the side of a building.\nA cat sitting on a shelf in a refrigerator.\nA lady is walking along side a blue train.\nA red bench in the middle of a city street.\nA black motorcycle with a gargoyle painted on it.\nThis is 
a modern living room with great natural lighting.\nA toilet sitting in a bathroom next to a scale.\nA white coach travel bus sits parked on the street corner.\nTwo people skiing a snowy trail lined with trees.\ntwo cow grazing in a field with a tree beside them\nAn egg sandwich and other food on a tray.\nteddy bears dressed up in clothing sitting on a loveseat together\nSnowboarder and skiers on a bright sunny day.\nA man is talking to a horse which is inside a fence.\nA woman balances her surfboard atop her head on the beach.\nA man in a \"nun\" costume riding a skateboard in a parking lot.\nThree people standing at the waters edge on a beach with a blue surfboard.\nthere is a apple and two oranges and a stuffed animal on the bed\nMany different fruits and vegetables are laying side by side.\nThis is an image of a cat sleeping on a table next to houseplants.\nA police car next to a pickup truck at an intersection.\nLooking down at a cup of coffee and a piece of cake\nA child with an umbrella walks down a store aisle.\nA small white dog tucked into a persons backpac\na couple of people are playing with a flying disk\nA close up of a woman eating a hot dog on a street.\nTwo people are playing the video game while the others sit at the table\nA sign on the side of the street with religious meanings.\nWooden pole in sub urban area with intersection and trees nearby.\nA fire hydrant is in front of a wall which says Fire Hydrant.\nA tennis player in action on the court.\nA man riding skis down the side of a snow covered slope.\na baby wrapped up in a blanket laying next to a brush\nA couple of plates with sandwiches on them sitting next to an open can of spam.\nA girl is eating a piece of pizza.\nA child is on top of a boogey board in the water.\nAn orange sign that says the right lane is closed ahead.\nA small baby with a kite and other people playing with kites.\nA man gestures over a microwave as he leans on a chair.\npair of women standing on sidewalk at roadway 
pedestrian crossing area.\nThe little league player swings a bat at the baseball.\nPeople chopping cucumbers while a third person watches.\nA large umbrella open wide on a pole.\nA woman standing in a  room holding a Nintendo Wii game controller.\nthere is a train that is about to go through a tunnel\na truck by the water with a boat attached to the end of it\nA plate of sliced bananas, melon, and orange slices.\nA pie with a fork and knife place setting and a bottle of beer to drink.\nA stop sign and fire hydrant on a grassy corner\nA person sitting in a chair in the living room.\nA donut sitting on a plate next to a cup of coffee.\nA slice of pizza on a paper plate.\nA modern residential bathroom with a shower over the tub\nfour wooden benches under the shade of a tree in the park\nA large bird swoops over the waves of the ocean.\nThe plain is taking off from the airport.\na public transit bus on a city street\nA city bus traveling down the street next to a truck.\na person riding a horse close to the water\nElephants are hitched at this post like horses in an old west town\nCrowd of people at outdoor gathering on grassy field.\nA woman rides on the back of a prancing horse.\nA horse peeking out from behind a hedge\nA little girl in blue shorts standing on a tennis court.\nThe tiny bird is flying next to the flower.\nA group of men standing around a batting cage.\nA bookshelf is packed to capacity with books.\nA man with his hand on his skateboard as he is about to come down a ramp.\nA dog laying on a couch in a living room.\nA clock tower next to a building with a painted mural on it.\nA broadcast editing room with numerous video monitors and audio mixing stations.\nA double decker bus going down the street.\nA boy does a skateboarding trick next to a building.\na person in a field flying a kite\nA picture of a sun that is over a street.\na man is wearing headphones and eating food\nA yellow and blue motorcycle parked next to a stage.\nA cat stares at a 
television, which is turned on.\nOne zebra lays in the dirt while another walks away.\nLarge buses and cranes on the wet parking lot of a commercial building.\nTwo zebras in an enclosed area during the day.\nCalico cat sprawled stealthily in the grass in an alert manner.\nA man and woman posing all dressed up.\na close up of a cake on a plate on a table\nLunch plate with grilled sandwich, carrots, cheese, bananas, and lemon.\nAn old fire hydrant sitting outside in the grass.\nTwo women at a long table working on some urns.\nThe motorhome is parked outside the red brick house.\nA person surfing on large waves in the ocean.\na couple of people that are under a umbrella\nTwo people sitting on a ski lift, one posing for the camera while wearing a colorful hat.\nA man and a woman standing next to a table fulled of lettuce.\nA herd of three zebra standing next to each other near two giraffe.\nChopped and sliced ingredients atop a cutting board next next to a bowl partially filled with grated cheese.\na  large building that has large clock on it\nA smiling woman showing off her pizza topped with olives.\nPeople are standing in a field under British flags.\nA red bike in front of a statue and cannons\nA woman in tennis whites playing tennis on a professional court.\nTwo men in black aprons stand in a kitchen tent area.\nA baby sitting at a table with a plate of food.\nA very cute dog with his nose in a big red circle.\nThree elephants, one a baby appearing to be holding it's mother's tail, in wet land, but arid hill in background.\nA street sign gives directions to numerous major streets.\nA hotel room with a bed, chair, desk and an end table.\na person sitting on a bed reading a book\nA man riding skis on a snow covered summit holding ski poles.\nA big sign in front of Lake Kawaguchiko.\na young boy holding a baseball bat with a baseball helmet on\na male tennis player in a red shirt is playing tennis\nA man putting his time card into the time card machine\nTwo boys with 
their faces painted hold stuffed animals.\nBathroom counter with lighting on over mirror and sink.\nA giraffe standing in a field by some zebra's passing through.\nAn orange monoplane is tied down on the tarmac.\nRainy camera showing a car driving down a street.\nAn octopus vase with three roses in it\nA young boy is playing with a red soccer ball.\nPeople prepare to fly a kite with an image of an American President.\nA bus that is sitting in the street.\nA cellphone sitting on table with papers in the middle\nA man on a surfboard is riding the wave\na couple of people that are standing in a field\nA man stands by his bicycle with long horn handles on the sidewalk of the beach.\nA boy holds a cellphone up to the camera.\nA tall elephant standing next to a man next to other elephants.\nA young girl throwing a softball to a team mate.\na close up of a building window with a sky background\nA double-decker bus is going down the street.\nToddler boy sits on the stairs holding a tennis racket.\nA black and grey dog in the passengers side of a truck.\nStainless steel fridge in the kitchen of a home.\nA baseball player that has just hit the ball.\nThe train looks as though it needs to be fixed and washed.\nA kitchen and a living room are situated next to each other.\nA girl sitting at a counter with a piece of pizza.\nTwo giraffes that are together in an enclosure.\nA man in a suit does a dance pose near a young child.\nA boy on a skateboard going down a rail.\nA black and red locomotive sits on the tracks.\nThe woman in the black and white dress has a colorful tattoo.\nA solitary man walks through a crowded parking lot with his striped umbrella.\ntwo giraffes sitting on the grass outside of a stone enclosure.\nA skateboarder making a big jump in a parking lot.\nTwo trays of pizza are on the racks of an oven.\nthere are two small bears embracing each other\nA man in a suit talking on a phone\nA small child at a table eating some food.\nA herd of sheep on the side of a 
road with trees to the side.\nA young man is doing a trick on a skateboard at a skate park.\nThe cow is all alone in the brush.\nBench sitting on sandy area with lighthouse structure in background\nA horse is pulling two people in a carriage on a street.\nA man sitting in a chair in a kitchen drinking a canned drink.\nA steam train parked next ot a 1950's commuter train.\nA picture of someone's dinner. Steak with carrots and greens on the side, on a green plate.\nA train that is sitting on a track.\nA baseball player pitches on a dirt floor.\nThere are people that are flying kites in the air\nA family of elephants standing in a watering hole\nA bowl has fresh fruit and a toy fish.\nA black bird standing among blades of grass.\nThe stop sign is near a fire hydrant on the neighborhood street.\nthere is a large piece of food and a knife on a cutting board\nAn office desk with keyboard, monitor, mouse  and lava lamp on it.\nA small bathroom with a toilet that has buttons on the side.\nA man standing in a carriage hooked to some horses.\nA woman is laying on the bed with her feet in a suitcase.\nA very spacious and well organized kitchen witha wood floor.\nBird cages with birds in them inside a pet store\nA view of the ceiling of a kitchen with several light bulbs.\nthis plane has two large fans on its wings\nA crowd of people shopping for fruit in a farmers market.\nA group of people riding boats in the middle of an ocean.\nA group of zebras and giraffes standing by a bus.\nA red truck parked in a parking space.\nA group of people that are standing in the snow.\nA sign warns of a 350 fine for honking a horn.\nA colorful bird is perched on a branch.\na field that has a bunch of people flying kites\nA child with a backpack underneath an umbrella.\na picture of a large clock tower in a city.\nA spot with a few materials that is agreeable.\nThe motorcycle riders are taking cover from the rain.\nThree slices of tuna lie on a plate with garnishes.\na man is wearing yellow 
and blue in skis in the snow\nA zebra leaning over to eat some hay in a field.\nA giraffe standing in the grass and bushes, next to a bare tree that has one bird perched at the top of it.\nThe clocks are on display in the room.\nA group of people standing on top of a sandy beach flying kites.\nA man wearing glasses using a laptop computer.\nA man is standing on a kitchen counter painting the wall.\nA sign shows various directions through an intersection\nA person in a wheelchair walking a dog looking at a horse\nSOMEONE HAS THERE FOOT ON THE COFFEE TABLE WHILE WATCHING TV\nA surfer is in the ocean riding a large wave.\ntwo brown animals and one is laying down the other is standing\nIt is raining, a male jumping and so happy to take this picture\nA residential house next to some trees and a field\nNicely decorated train has a red smokestack and gold trim\nThis is an image of a pug chewing an empty water bottle.\nA sign for Madam's Organ Restaurant  Bar hangs on the side of a building.\na black and yellow bus driving down the bus with a double decker bus behind it .\nA young child sitting in front of a TV watching the Flintstones.\na toy set of a bear sitting at a desk\nA man standing in front of three toilets in one bathroom.\na few boats that are out in the lake\nPersonal toilet in a portal potty in a very confined room.\ntwo men standing next to each other one on the phone .\nPlane seen on the horizon above the boats\nA person in a dry area with a sail high in the sky\nA couple of trucks and a car driving down a highway.\nAn x-ray machine in a hospital next to a bed.\nA skateboarder is partially kneeling on his skateboard.\nThe huge delivery jetliner has three turbine engines.\nA colorful lady flying a colorful kite on a sunny day.\nA child sitting on a horse holding a flag on a field.\nA plate with fruit and nuts and cookies.\nA woman is holding a young girl up to look at a horse behind a fence\nSmall girl laying down on top of a board on the beach.\nA woman is 
hitting the ball at a tennis match.\nA woman hits a tennis ball with a racket.\nTwo tall television monitors are next to chairs and desk.\nA collection of painted boxes stand in a courtyard.\nA street with a wall with graffiti and plastered paper.\na toilet next to a sink in a bathroom\nA small herd of sheep stand still in the snow.\nA person is on a skateboard performing tricks off a wall.\na beautiful white bathroom with one huge mirror.\nA party with people, some in costume, standing around something not shown.\nLots of crew people in a large building working on an airplane.\nTwo donuts are on a plate on a desk.\nA young man ridding a skateboard down a rural street.\na close up of a cat laying on a dog laying on a bed\nThere are airplanes parked in a lot at the airlines.\nA giraffe is caged inside a building at a zoo.\na person in mid air on top of a snow board\nBlurred view of an intersection and metro area.\nSeveral cows standing in the grass near a few buildings.\nA sandwich, carrots and strawberries in a lunch box.\nLaptop computer with keyboard and mouse displayed on white surface.\nA man in grey baseball uniform swinging a bat.\nThe boy is wearing a suit and a tie.\nA boy is skateboarding on a city street.\nSkiers waiting to ski on a busy mountain slope.\nA table topped with a pizza next to a salad.\nA man with a drink stands by a woman in a white hallway at a house party.\nA beautiful woman holding two skis while standing near a wall.\nA muffin on a plate with a cup of tea.\na blue vase with blue flowers on a sink counter top.\nA person on a motor cycle poses on the road.\nA picture through a porthole of a bike on the boardwalk.\nMotorcycle riders are approaching an intersection by a bridge.\na street a fence people cars and traffic lights\na man departing a bus onto the street and another man standing next to the bus from the sidewalk.\nA man holding a motion controlled video game controller\nAn over ripened banana and a cup of coffee.\nA dark alley with 
an umbrella in it.\na orange sponge cake, with something square around bottom.\nA little boy chewing on a tooth brush that is still in the wrapper.\nA wall mounted grandfather clock mounted to a wall.\nA man and a group of kids on a field.\nan image of a professional baseball game being played\nAn airplane landing strip area and apron area with several planes parked on it.\nA giraffe and five wildebeests roam in the Savannah.\nA herd of zebras is running through the grassy landscape.\na long body of water lines with boats and trees.\nA large bed sitting inside of a bedroom next to a lamp.\nSmall children holding up white controllers on a couch.\nAn old sign hangs on an old building\nA bird sitting on a hand that has a glove on it.\nA plate with three doughnuts on a table.\nThree woman holding vegetables outside on a cloudy day.\nA motorcycle parked outside in a parking lot near the beach.\nAn adult talking to child while cross-country skiing.\nThe man is racing his horse on the race track.\nThe dog is being fed with a banana.\nA cake shaped like a stuffed and roasted chicken.\nA man in a blue shirt with a red beard, laughing.\nMany people are sitting under black and white umbrellas.\nTwo cats laying on the floor playing with toys\nTHERE IS A DESK TOP COMPUTER ON THE TABLE\nA man swinging at the ball in a game of tennis.\na person sitting on a curb operating a cell phone\na keyboard an orange and white cat a desk and a monitor\nThe mirror is near the view of an ocean beach.\nA large brown dog walking next to a wooden table.\nthere is a male baseball player that has swung for the ball\nThere are two street signs on the pole.\nA stop sign is on the side of a school bus.\nA bunch of scooters sitting a room with themselves.\nA girl in boots on a skateboard and a man teaching a boy to ride a scooter.\nA man is swinging a tennis racket at a ball\nfour different pictures of men making homemade pizzas\nA white bathroom with sink, toilet and tub.\nA young boy walking through 
a living room towards  a cat.\nA homemade focaccia is ready for the oven.\nA man with black hat and glasses holds a cup with drink\nA small child is cooking in the kitchen\nAn enclosed shower with a window and bathtub.\nA row of kites in the sky and girls are walking on the road.\nA large commercial airplane parked on the runway\nA desktop and laptop computer sit side by side on a desk.\nThree Starwars action figures playing in a blender.\nThere is a little dog next to the driver.\nA man is turning on a fire hydrant.\na yellow and blue train riding a track by some trees\nAn elephant with its calf standing inside an enclosed area\nA herd of elephants are by the water.\na row of skiers skiing on a course\nA truck carrying a golf cart follows behind a motor home.\nA group of people out enjoying a trail ride on horseback.\nA shiny metal train is traveling down the track in front of a sport's stadium.\na person on a small boat in a river\nA man wearing a toothbrush for a moustache.\nSeveral people sitting at a table working on their laptops.\nA small table set with fruit and drinks in front of a wide window with brown chairs.\nAn old fashion with a red truck with someone walking towards the front.\nA bear stands in front of a large fallen tree.\nA man taking a swing at a tennis ball\nTwo dogs and a cat laying in a big bed.\nA man in the middle of a busy city street displays nearly the same colors as an approaching Volkswagon bus.\nThe building has a large clock displayed on the side.\nTwo people ski down a large snowy hill.\nA close up of two doughnuts on a plate.\nA group of people ski down a hill\nA large giraffe standing in a grass field.\nA young boy wearing a powder-blue baseball uniform poses for a picture of him holding a bat.\nA woman is watching a girl ride a horse.\nA white plate topped with mint angel food cake.\nA plastic cup filled with two tooth brushes and a tube of toothpaste.\nFour skaters in speed suits are racing down a curved street.\nA bathroom with 
a white toilet next to a shower.\nA black and white image of a bird flying over the lake.\nA kitchen is well lit by three hanging pendant lights.\nA back of a truck with doors and two windows.\nA spinach pizza sits on a plate next to a class of wine on a table.\nPeople mill and gather about a vintage military airplane.\nA group of children with frisbees are standing in a field.\nSkier skiing down a hill near a guard rail\nThere is a little boy standing in a base ball uniform\nThe skateboarder has fallen off is the board.\nA silver BMW motorcycle being posed for a picture.\nA man puts his feet on a desk with a laptop, a PC, books, and work papers.\nSomeone flying a kite while on the beach.\nMan holding a tennis racket and ball on the tennis court.\nSmall boat moving along water with orange objects hanging off end\nA dump truck that is driving on a dirt lot.\nTwo giraffes standing next to each other under a group of trees.\nThe bananas on the tree are not ready to be picked.\nTwo horse pulling a wagon with a load of hay with children on top.\nA toilet is sitting in the grass by the trees.\nThis is a picture of three buses parked together.\nA woman standing at a bus stop with an umbrella\nA young lady holding a black umbrella in front of green bushes and trees.\na couple of giraffes walk next to some trees\nA very steep snowy hill filled with skiers and a lift.\nA group of people flying kites in a blue sky.\nTwo children sitting on a couch eating food off of plates.\nA RED NOSE PIT BULL PUPPY SHOWING HIS TONGUE.\nA few people are off their surf boards in the water.\nA shopping center sign right by a road and a big red building.\nA plate with a cupcake on top of it next to an orange.\nA kite with happy pictures on it is flown on the beach.\na person in a field with a plane shaped kite\na person on a surfboard riding a wave\nA waffle iron, and the ingredients for waffles are displayed.\nA woman riding on the back of a motorcycle with a child.\nFloor level view of woman 
with dark stockings and high heeled boots in crowd.\nA cardboard garage sale sign stapled to a post.\nAn elephant performing tricks on a stool in a circus.\nA cat standing near a dead bird with some words on the picture\nA group of men playing a game of soccer.\nThe windmill is sitting in an open field.\nA man in a suit talks on a cell phone.\nSeries of clocks with lights in them on a city street.\nA woman and a man are cooking food in a kitchen.\nA man is standing on the sidewalk talking on a cellphone.\nA chicken sandwich with tomato and lettuce with onion on the side.\na bath room with a mirror and a sink\nSeveral people standing outside in the evening, some carrying umbrellas.\ntwo street meters attached to the same pole on the road\nFive adult sized giraffes grazing in a field.\nA boy on a body board with a surfer standing in the water behind.\nAn old teddy bear stuffed into a iron railing on a balcony.\nA man with a large bear wearing a brown hat.\nTwo pizza rolls on a tray with a sign up\nA man holds up a small banana in his hand.\na hat on a table near a cake\nA motorcycle rider is near a crowd on the sidewalk.\nA green city bus pulling out into the street.\na group of zebras graze on some grass next to an antelope\nA view of a bunch of birds flying around purple flowers.\nAn individual is in the open view in the picture.\nA group of zebras standing close together .\nA photo of a bedroom with two beds.\nA man in riding armor poses in front of a motorcycle.\ntwo small children playing next to a fired hydrant and holding a balloon\nA smiling man in a striped shirt playing a video game.\na line of skate boards sit in front of a wood plank\nThe bathroom has a wall sink, medicine cabinet, toothbrush holder, and bare walls.\nAn instructor is teaching the little girl how to surf.\nA crowd of people walking down a street next to tall buildings.\nA giraffe standing in a dirt filled area.\nThe train is crossing the bridge by the water.\nA suitcase sitting next to a 
bottle of champagne.\nA lady is observing three other people in the background.\na kitchen that is empty with just a sink and some wine bottles.\nA pizza with no meat overflowing from a plate.\nHorses, a pony and sheep all grazing in a green field\nLarge collection of cakes shaped like hearts on a display.\nA toddler stands next to a No Trespassing sign.\nA white plate topped with meat and veggies.\nA Frito Lay delivery van parked outside in a parking lot.\nAn assortment of donuts on a plate.\nA person is standing with their foot on a skateboard.\nA little girl sitting by a bunch of bananas\nA dalmation dog sitting in the drivers seat of a bus\na tall and old brick building with many windows\nA soccer team in purple is watched by a crowd.\na woman and a little girl with an orange shirt standing on a skateboard\nA woman is cooking food at a restaurant.\nA cat is sitting on a pink chair near a computer.\nan image of two people that are each holding kites\nA cook standing in a kitchen in front of two bowls of food.\nTwo horses roaming the fields during the day.\nA young man riding a skateboard down the side of a ramp.\nA man sits down around the bunches of bananas\nTwo giraffes in the trees, one standing up.\nA red fire hydrant with a motor scooter in the background.\nFresh vegetables and smoked sausage on a bread tortilla.\nMan in black business suit on street corner.\na bathroom with red walls a shower a sink mirror and toilet\nTwo people wearing life jackets on a watercraft.\na cat playing halfway under a straw hat\nThere is a huge crowd of people in an area sitting on the grass and watching.\nLarge clock on post displayed near overhead display of commercial enterprise.\nThree cats are relaxing on a tile floor.\nA flock of birds flying over water and sand with a volley ball net on the sand.\nA white sink sitting next to a toilet.\nOn a wide street are people walking, on bikes, or in trucks.\nTwo older men that are preparing a table full of great eats.\nA number of 
people moving about on a snowy ski slope.\nA hitter, catcher, and umpire playing a baseball game.\nA white jet sitting inside of a hangar next to other aircraft.\nA trio of elephants stand in front of a watering hole.\nA man in a t-shirt flying a box kite\nA man in a blue jacket is traveling on snowshoes through snowy woods.\nA white bowl filled with rice and vegetables.\na child and another person a refrigerator and a silver cup\na yellow pink white and green vase and two other vases\nA man taking a selfie with his smart phone.\na man with a bat swings at a baseball\nA woman standing between a motor bike and a striped wall over a river.\nA large clock mounted to the side of a building.\nA small child is lying in bed with a baby.\nA man is on a laptop at a table\nThe pitcher is starting to deliver a pitch on the mound.\nPeople lined up on the sidewalk with pizza boxes laying in the snow.\na bride and groom are cutting their wedding cake\na bunch of bananas are on a table\nShort rain as view from above either from over view mountain or air craft.\nSeven doughnuts on a wooden plate over a doughnut pan.\ntwo people riding horses on a city street\nA man this is putting a bowl inside of a microwave.\nA cat is lying on top of several shoes.\nA few friends are gathering for dinner in a restaurant.\nThis is a nasty bathroom located in an undisclosed area.\na group of people sitting close to each other all using cell phones\nA junk pile of broken porcelain toilets in front of a wall with graffiti on it.\nSeven suitcases, stacked on top of one another, in front of a booth.\nA large donkey standing in the middle of a grassy field.\nA red bus diving past a fountain in a city square.\nBald man in black and red shirt playing baseball.\nAn analog clock set in a class case.\nA LARGE AIRPLANE THAT HAS LANDED AT THE AIRPORT\nA sign attached to a light pole on a street.\nA man talks to a plane full of smiling people.\nSmall child signing a document next to two men.\nA small child 
holding a piece of broccoli up to their face.\nan iced  birthday cake with a number candle on a table with a pink tablecloth.\nThe kitchen with green oven atop white tiled floor.\nA airplane coming in for a landing with a full moon above it.\nthree men sitting in a row eating a sandwich\nA cute child is dressed up standing by a door.\nA table with plates containing an assortment of cold cuts, cheeses, and vegetables.\nA man flying a kite in a parking lot by a lake.\nA boy riding a skate board down a stair rail.\nA group of horses grazing in a green field.\nPens, scissors, markers and other assorted clerical tools.\nA man taking of photo of himself in a mirror with a cell phone.\nThe white devil slavemaster puts a bat in the young black girl's hands and trains her to attack Mexicans on sight.\nA train rolls down the track through rural tree lined scenery.\nA man dressed in red riding a horse through town\na plate of bread and a bowl of fruit\nTwo black and white horse standing next to each other with gears.\nA man holding one frisbee and throwing another.\nA small breed dog looks up while laying on a couch.\nA man standing in front of an elephant.\nsome people on skis go through the snow as people watch\nSeveral empty boats floating on the river on a cloudy day.\nThe view of a clock in the distance of a building.\nA motorcycle parked in an intersection with cops on motorcycles going past it.\nVarious foods are sitting on the large and small plate\nA person pouring batter into a donut maker.\nA row of horses tied up on a rope rail.\nA man kneeling over a laptop computer on a table.\nAn old fashioned styled kitchen has a microwave.\nA woman covers her face, as a kitchen, flowers, and a laptop computer are also visible.\nA gentleman in a suit is standing near a wall.\nA car is driving down the road near some road signs.\nA man in uniform standing next to another man wearing a suit.\nA dark room with a tv playing spongebob squarepants\nA bus is on its way to the 
station.\na person is standing on a skateboard outside\nA double decker red bus is driving down the snowy street with the headlights on.\nA person's hands are opening a laptop beside another person\na close up of a clock on a pole on a city street\nThe yellow fire hydrant is at the side of the road.\nPlate of food that includes chicken, beans and a pickle.\nA couple of people riding horses with Saint Patrick's attire on.\nA man is in the middle of swinging his bat\na boy is looking at a train made of candy\nTwo buses driving by people in a city.\nA street with vehicles, pedestrians and detour equipment.\na black and white photo of a building clock and people and trees\nSimple bed in room with pair of nightstands and lighting.\nA black dog standing in front of a door.\nThis is the outside of a building with chairs and benches present.\nA KITCHEN WITH A STOVE AND LAP TOP\nAn old style cook oven with multiple pull out compartment\nMan skateboarding on rail in front of a building.\na picture of some vegetable meal and a plate of what looks like chicken and a side bowl of rice and curry.\nMan with a courier bag on a mobile phone on a crowded street.\nCalico kitten lying on a backpack on a wood floor.\nA table set with pizza and a bottle of coke.\nA zebra that is standing in the grass.\nWoman with dark hair in a multicolored bathing suit is flying a kite.\nA small bathroom has a sloping roof with a window.\nA set of five train tracks in front of a graffiti covered wall.\nA herd of elephants walking across a grass covered field.\nA girl by the side of the road selling flowers.\nMan and woman in airport lobby saying goodbye.\nA decorative church has several rows of pews.\nDiners at a cafe overlooking a sandy beach.\nA crowd of people standing under a clock ina  train station.\nA big cute black dog in the air with a disc.\nA group of friends siting at a table enjoying pizza.\nFour cows eating grass on a sunny day.\nA woman getting food from a tray with fruit, cereal and 
juices.\nTwo people on horses ride through a field.\nOne large sheep and small sheep next to it in a dirt ground area with a stone wall structure next to them.\nA child with a backpack looking at a polar bear.\nA stuffed bear that is in a backpack.\nA red double decker bus is riding down the road.\nA man in a wet suit walks across a crowded beach on a sunny day.\nA table filled with a big bunch of assorted veggies.\nA barber with a big mustache trims a man's hair.\nA plant in a glass vase sitting on a window sill.\nthere are old cabinets in this kitchen along with a microwave\nFruit flavored donuts lined up in a glass fronted cabinet\nA sandwich ,pickles and cookies  are for lunch\nA person on a snowboard in the snow.\nA boy in yellow shirt playing a game with a Nintendo Wii controller.\nTwo goats on the road surrounded by trees.\nA man holding the reigns while riding a horse.\nA toilet sitting in a grass yard out side.\nSeveral giraffes wander around their enclosure at the zoo.\nA herd of sheep stand in a snowy field with a cloudy sky in the background.\nA view out a bus window of people riding bicycles.\nYoung girl acting silly in the waiting room.\nA table full of food and chair with no one there.\nA plate of ries and a drink is sitting neatly on the table.\nSome goats are looking up at the camera.\nThe skier is competing in the winter Olympics.\nthe ball is coming toward the batter and the catcher is ready\na close up of an animal with something over its head\nA basket filled with donuts covered in powdered sugar.\nA young boy in a red shirt flies a kit high in the sky while a girl in a t-shirt watches.\nA policeman roller boarding in the street with another man.\nA skiier is preparing to ski on a snowy hill.\nA young male laying on top of surf board.\nA woman making some food inside her kitchen.\nLots of luggage is lined up on the sidewalk of a busy city.\nThe kitchen is clean and ready to be used.\nA metal sculpture of two birds and two poppy seedpods.\nA 
male skateboarder skateboards on a wall in an enclosed area\na pizza with a bunch of tomatoes on it.\na woman getting ready to hit a tennis ball with her racket\nThe dog lays down to scratch his itch.\nA giant Amoco sign sitting above a gas station.\nA man in a fuzzy hat is talking on his cell phone.\nan image of a bowl of tomatoes and a flower\nA dog is standing on the sandy area.\nA train track scene with one train on the tracks.\nSome fruits and vegetables and a ghost are in an orange container.\nA young girl is sitting on her bed, talking on the phone, with a laptop on her knees.\nMan in a uniform talking on a phone at a work desk.\nSmall  stuffed toy rests on leg of teddy bear.\nA woman standing next to a red and white truck.\nthe toilet is white and the cabinets are brown in this bathroom\nA person holding a phone to their ear and working on a computer.\nA man holding a tennis racquet on a tennis court.\nA herd of sheep grazing on a hill next to the ocean.\nA pizza with meat, cheese and tomato sauce.\nA train traveling down tracks next to a rural country side.\nA pitchers mit with a ball inside laying on some bleachers.\nA group of young men sitting on steps in front of the ocean.\nAirport security drives past airplane on the runway\nAn umbrella laying on the ground next to benches.\nA white counter top topped with a ripe banana and three coasters.\nan edited picture of the same boy doing several different tricks on a skateboard\nA large clock with a red second hand is attached to a modern building.\nTruck on an urban road hauling a lot of corn.\nA boy and girl play paddle ball in the grass\nAn room that has been broken into smaller work areas by a divider.\nA modern jet liner taking off at the airport\nA laptop computer on a shelf above a stove.\nA young child riding on the back of a brown horse.\nA train engine on the tracks with a side rail beside it.\nLooking up at a stone and brick clock tower\nA woman is taking a picture of herself in a bathroom 
mirror.\nThis decorated cake has a horse with a fence on the top.\nA tennis player is making an effort during a match.\nSome small boys standing near a floor drain on pink tiles.\nA boy riding a skateboard in the street.\nA traffic light is red for people on horses.\nA baseball player slides his body into home base.\nA taxi van in the street with pedestrians, by the corner of a building.\nA man partaking in a water sport in the ocean.\nOne surfer riding with the wave in the ocean, and another surfer on his stomach riding into the wave.\nA young girl is playing Wii boxing by herselg.\nA person wearing combat boots sitting on a kitchen counter.\nTwo women with open umbrellas walking down a street.\nLuggage including a trunk and a guitar stacked up by a wall\nGroup of zebras standing in a fenced in area with shade.\nThe man wearing  the animal puppet makes it cut the boy's birthday cake.\nA bathroom single sink vanity with a large mirror.\nThis is a cityscape of a skyscraper in front of a large mountain.\nGroup of skiers posing for photo on foggy day.\nLarge green truck parked at the outside of stadium with group of people walking past\nThe cutting area of a sewing room containing scraps of fabric\nMany cats lounging on a couch in front of a window.\nThe truck sporting graffiti  is parked on the street.\na blue and yellow bird is sitting on a branch\nA bird sitting on a branch next to some berries.\nA beer can and mug are shown with a rib plate.\nA man stands on his skis on a flat patch of snow near a fence.\nA stop sign with grey paint over top of it.\nA bus being loaded with bags of luggage parked in front of a building.\nA person that is in the grass with a kite.\nPeople going up a snowy hill on skis.\nA train is blowing steam as it stops at a train station.\nA pan on top of a stove with pizza dough and tomato sauce.\nA concrete bench is in front of the water.\nA person on a snow board performing a trick on a ledge.\nA skier has fallen down in the very deep 
snow.\ntwo street signs with one pointing towards the right next to a building.\nA man guides a dog to herd sheep.\nA fire hydrant in front of bushes with a glass face on top of it.\nRed bus coming down a street next to a red cab.\nA women wearing a tennis outfit, swinging at a tennis ball.\nA painted fruit bowl with different fruits in it\nTwo giraffes are eating leaves from tree branches.\nA man riding skis down the side of a snow covered ski slope.\nFruit and vegetables are cut up and placed in small containers.\nA computer desk with a computer and three monitors and a black chair sits in between them.\nA man is swinging a baseball bat at a game\ntwo zebras standing next to a tree\na woman wearing a helmet and holding onto a baseball bat\nfour poster bed and bedroom furniture in a bedroom\ntwo big chairs sitting close to a fireplace in a living room\nA collared dog standing between two potted plants\nA cook dishes a stew from a pan onto a plate.\nA couple of guys that are standing in front of a plane.\nA hipster emo woman sitting on luggage in the middle of a road.\nA person standing in the snow with a snowboard.\nAn all glass building showing the reflection of another building.\nA meal containing soda, salad pizza and rice on a table.\nDisplay of about 100 vintage wall clocks.\nA beautiful young lady hitting a tennis ball with a racquet.\nA bathroom area with three sinks and a towel dispenser.\nA old picture of a building with many people out front\nA microwave, bread and rice are on this counter\nTwo buses wait at a red light along a city street.\nA refrigerator with its door open and contents showing\nThe table is set with 4 boxes of different, delectable  donuts\nThe cat leaves paw prints as he seat on the car.\nA woman twirling a floral print parasol umbrella.\nFlowers in a vase placed on a table.\nA woman in grey shirt standing in room next to a dresser.\nA room filled with dining tables and chairs.\nA man on skis heading down the slope\nView of bushes 
next to traffic lights and moving cars.\nA table full of different types of donuts.\nTwo dogs sitting in the front of a car.\nA man sitting at a desk with a cat on his lap.\nTwo people sitting on a bench in front of a statue.\na living room with a tv a book shelf and plants\nTwo guys on a mechanical lift next to a building .\nA large bird perches on the seat of a bicycle.\nA stuffed animal that is laying on a carpet.\nthe train has lots of cars on top of it\nA Twins baseball player holding his glove walking on the field.\ntrees in fall colors and a stop sign to the right.\nA group of people on a side street with umbrellas and awnings.\nA train door opened with passengers sitting inside.\nTwo elephants outside, one being fed, one standing.\nWindow display a different pastries on a city street.\nA composite image of an office desk, cars and buildings.\nA family of four is posing for the camera near some flying kites.\nKids sitting at a table eating food.\na white plate holding onto a sandwich and a salad\nA man in a vegetable shop holding a green vegetable.\nSeveral grassy tennis courts with five tennis players.\nA woman holds her tennis racket ready to hit the ball.\nTwo children sitting at a table that has two cakes on it.\nA man wearing a tie holds his chin as he reads a document.\nservice man in uniform throwing a ball on a baseball field\nA white trash can on a beach under two palm trees.\nINFRARED PICTURE DEPICTING THE SHAPE OF A HUMAN BEING\nYoung skateboarder on pavement in rural populated setting.\nA little girl wearing glasses taking a selfie.\nA clown face made of yellow squash for the eyebrows, cucumber slices for eyes, a cherry tomato nose and a carrot smile.\nThe airplane is about ready to take off on the runway.\nA man riding on  a horse drawn carriage next to a red brick walkway.\nA red car parked on the street in front of a parking meter.\nA man polishing a horses' horse shoe while another man holds the horse.\nA woman holds a tennis raquet during a 
match.\nPizza on a metal plate sitting on table near phone.\nA girl holds her arms out to a Frisbee while a boy kicks his leg.\nA man wearing a helmet on a blue motorcycle.\nA very dimly lit kitchen with a nice window.\nA motorcycle is parked in front of two cars.\nA man wearing a black suit on talking into a microphone.\nTwo boats floating in the ocean one has a crane on top of it.\nA picture of a lot of people in the snow.\nmeat with onions and sauce on a plate next to potatoes and broccoli\nTwo children plays with a kite in the field\na piece of bread with some vegetables and met on top of it\nA man and woman getting married on the beach.\nA guy that is using his cell phone while in a park.\nA knife and fork sit on a plate with vegetable pizza.\nThe woman in red and black is skiing down the slope.\nA variety of sheep and goats drinking from a pond and eating.\nsome baseball players playing baseball and people watching\ntwo cup like things with a bird and a wolf painted on them\nA young woman sits at a picnic table with her laptop.\nA herd of elephants are walking among the desert.\nA baseball player is batting with a catcher and umpire behind him.\nSee picture of a lot of bicycles in the street.\nAn old suitcase on the sidewalk next to the road.\nA woman in a blue dress with no shoes, seated with her legs crossed on a chair in the middle of a room.\nA young girl with a nice booty standing in a living room.\nA boy in a jacket and tie looks at the camera.\nAn older couple with helmets preparing to go on a motorcycle ride.\nBoy wearing a helmet riding a skateboard down a street.\nA young adult looks at a computer screen while doing homework.\nAn adult and a baby giraffe stand gazing over a grassland.\nStuffed animals sitting on a counter with cups in front of them.\nThree square slices of food and sauce at an oriental restaurant\nA flock of birds looking for food in a field.\nAn airplane is parked at a terminal in an airport while luggage trucks unload the 
aircraft.\nA delightful pink frosted doughnut and a cup of coffee.\nThe purple flower with a yellow center is near a car air condition vent.\nAn orange and gray bus parked next to a sidewalk.\nA group of adults standing by a table with wine glasses on it\npeople with their head covered on a motorbike\nA batter, catcher, and umpire are poised for a baseball.\nA birthday cake is shaped like a teddy bear.\nA young child smiling while sitting in the grass.\na bird in the branches of a tree\nA gang of bikers riding motorcycles down a road.\nA tennis player swinging his racket with both hands to return the ball.\nPlate covered with french fries and opened hot dog sandwich\nSheep are locked up at a farm and feeding\npeople in the ocean standing on water boards and wind surfing\nA bilingual directional sign to the Hyatt on the Bund.\nEdible food items displayed on table with receipts.\nTwo cows grazing in a pasture by a stream.\nA line of baggage in a lobby with several people.\nA group of men playing a game with Nintendo Wii controllers.\n2 towers stand connected, a large clock in between them.\nA man posing for the camera with a red tie on.\nA man riding a horse over a red and white striped pole.\nThe zoo visitor is looking at the giraffes.\nA woman that is standing in the rain with an umbrella.\nA woman seated looking at her lap top\nThree different types of clocks propped against a wall.\nA bookshelf with books and other knick knacks\nA white plate topped with two slices of pizza.\nThree giraffes stand in the grass by a dirt pile.\nA young man is holding a giant sandwich in one hand.\ntwo zebras eating grass in a very big field.\nA small group of Zebras drink water from a pond.\nan image of a snow piled on the ski slope\nA fighter jet with missiles flies through the air.\nA man leaning on a building talking on a cell phone.\nFour individuals on skis headed in the same direction.\nThe baked potato has sour cream and lots of other condiments on it.\nA man flipping a 
skateboard on top of asphalt.\nA man holding a racquet hits a tennis ball.\nA close up of a toy squid riding a small bicycle.\nFour players posing for a picture on a tennis court.\nA bike sitting on a sidewalk in front of a bus.\nA public restroom with a urinal installed in the floor.\na person stretching to hit a tennis ball\nA street sign that says C have you paid?\nThe zebra stands underneath the branches of a tree.\nTwo people riding on the back of an elephant through a lake.\nWoman with life jacket and dog in rowboat near shoreline.\nA living room with wood flooring and furniture.\nTwo Dell mouses that go with a computer.\nA person standing on a beach and flying a kite.\nThe woman is learning how to use her new ski skates.\nA man and woman sitting a a table with pizza in boxes, in a room with a piano.\nA yellow table sitting on top of a hardwood floor with boxes on it.\na herd of cows walks down a city street\na person riding a snow board on a snowy slope\nMan with large orange and black kite in park area.\nA young boy standing on the beach with a colorful kite.\nA picture of a naked women who is using a laptop.\nA table is set colorfully with a pepperoni pizza.\nA black parking meter, that is next to a bunch of cars.\nMan standing on a tennis court holding a racket.\nTwo people in an open field are playing with a frisbee.\nTwo people eating slices of pizza while riding bicycles on a city sidewalk.\nA woman using a smart phone while standing next to a building.\nA jet in the air flying in a dark sky.\nTwo people sitting on the couch with a guitar in front of them.\nsome black and white cows in a green and yellow field\nA man holding two cell phones in his hands.\nPassengers getting ready to board a small aircraft.\nA guy riding his skateboard in a small town street on a chilly day.\na man and a woman walking across the street\nA rusty bench is near the steps outside.\nSomeone is skiing in the cold white snow.\nA red light that is on a pole.\nAn old elephant 
with a long trunk at the zoo\nA man plays in the water at the beach.\nA skier going downhill with snow flying up.\na zebra has its head down in a field\nYellow lounge chairs and an umbrella are reflected in a pool.\nA woman holds a tennis racket in one hand and a tennis ball in the other.\na public transit bus in a city street\na train depot with several trains stationed in it\nAn all way stop sign at the intersection of two streets.\nA view of some alcohol with a glass filled.\nA great shot of a mountain near the ocean.\nA man sitting in an office chair looking at his cell phone.\nA BIG BATH RUBE IS IN A CLEAN SPACE\nMen in suits with umbrellas walking through open area.\nA bird with food on its beak is sitting on a branch that holds a bitten on apple.\nA traffic light and street sign in a large city.\nA bike is inside leaning on a white shelf.\nA graduate wearing a blue cap and gown holding a cell phone and papers.\nAn umpire is catching a baseball that was missed by the batter.\nA large nicely set dining table displaying a cake and other pastries.\nThree people posing for a picture inside of a grocery store.\nThe contents of a pantry in a house.\nTwo women are in a kitchen baking together.\nA  city street that has police walking along with people, and some are carrying umbrellas.\na bird sits on a wheel next to some plants\nA messy kitchen that has the drawers open.\nGiraffe leaning over to nibble buds off a green bush.\na boy following a man holding a surfboard in the water\nA red baseball player sliding into a plate.\nA man and a woman cutting up a big sheet cake.\nTraffic light and street light for Belmont Avenue\nA person doing ski tricks on the slopes at night\nTwo giraffes leaning heads down, one with head in feeding trough\na man riding a bike with a cart attached to the front of it\nA little girl laying down holding a bear and a kitten.\nan image of a cat sitting on top of the desk area\nThe dog is in the kitchen sink and pizza is on the counter.\nA 
close up view of two men in a large assembly hall.\nA person jumping in the air on a skateboard.\nA teddy bear wearing a blue sweatshirt sitting on a bed.\nAn outdoor bench sits empty and covered in water.\nThree cows grazing on a hill overlooking a harbor.\nAn adorable cat laying back on a chair while it sleeps.\nA man sanding on a walkway covered in a long green jacket.\nA bowl of cherries are shown with a bowl of oranges.\nA man near a baby elephant by the water.\nA building with a clock that is on top of it.\nA white toilet and white pedestal sink sit in the bathroom with newly laid tile.\nA person spraying water from a hose, onto an umbrella being held by a child.\nAn egg, cheese and sausage biscuit sandwich on a plate\nA full plate full of delicious food sets on top of the table.\nA woman holding an umbrella while standing on top of a wooden deck.\nA girl holding a tennis racket in front of her face\nA couple of skiers that are at the end of the run.\nThree people sitting on their motorcycles near a building.\nA decorative propeller plane flying in front of a wooded area.\nA small Coast Guard boat meeting a personal boat on the water.\nA person swings a bat with a helmet on.\nA man on a trailer by trees with a dog.\nA man playing a game of frisbee with another man as they gaze into each others eyes with man lust.\nThe plane is ready to board passengers for their flight.\nA view into a living room containing several pieces of furniture.\nA guy riding his skateboard down a paved path.\na group of people that are sitting in some chairs\nOn a snowy area, a man is holding a young child with skis near several people, sleds, and mountains.\nA man in blue jeans, has stepped on a banana peal.\nTwo zebra grazing on an open ground full of grass and trees.\nA airplane that is flying in the sky.\nA desktop computer sitting on a wooden desk.\ntwo man sit at a table in a restaurant\nA group of people are outdoors playing with Frisbees..\nThere is someone at a table cutting 
ie ed of paper\nA cat plays on a laptop while watching a video.\nA teddy bear is seen looking out the window.\nA cat sits behind a person on a green revolving chair.\na person flying a kite on a beach with a person near by\nA view of a bunch of seagulls flying around the beach,\na street sign sitting between two benches sitting by a sidewalk\nA man on a motorcycle drives down the street\nA cat standing on top of a car trunk next to a parked motorcycle.\na person taking a photo in a mirror\nChocolate cupcake with a monsters face frosted on top.\nA white Nintendo Wii game controller sitting on top of  a table.\nA herd of cattle and zebra standing next to each other on a  field.\nThere is a woman riding on top of the elephant.\nAn empty bed in a bedroom in front of a small TV.\nA woman is eating a personal pizza with a friend.\nA girl flying a kite in the sky with her hands.\nThree stuffed teddy bears dressed in period clothing.\nA school bus by a crane and truck with a mountain view in the background.\nA stone oven with many kettle pots, baskets and bowls.\nA microwave has a container of food by it.\nThe dog is wearing a red scarf and is being petted by the woman with red shoes.\nSome people on snow boards high up in the air.\nA large airplane is sitting on the runway.\nPigeons gather atop the rails on the lighthouse.\nAn empty park with mature trees and a backless bench.\nA person is in the air skiing in the snow.\nA dressed up teddy bear is sitting in a corner.\nA young man riding on the back of a black motor scooter.\nA bowl contains a variety of chopped vegetables.\nOrange cat sleeping on a small laptop computer.\nOne horse trails behind another during a race.\nA cat is laying on a lap top on a desk.\nA man and woman sitting on a train using laptops.\nThe girl stands behind the line and waits for the ball.\nThe skateboarder is checking his technique in the mirror.\nThree teaspoons of instant coffee poised over a mug.\nA woman pinning a flower to a man's suit.\nA 
motorcycle parked in the middle of a crosswalk on a busy street.\nA display of vegetables is set up in front of a pickup.\nAn airplane at an airport at a jetway.\nA person standing next to  a box on the ground\nA person about to throw a Frisbee in the park.\na small plane flying by on a cloudy day\nBare feet atop a skateboard on a concrete surface.\nA surfer is riding a wave in the ocean.\nan image of a man that is by a bench on the phone\nA zebra laying down in the grass resting for a while.\nAn adult giraffe places its head on a young giraffe.\nA man on a skateboard passing a bus while posing for the camera.\nA baseball player swing the bat at a baseball.\nGraffiti has become and famous part of the art industry\nA woman in a bus with cars ahead\nA cat hangs out in a bathroom sink by a bottle of Method soap.\nA man in white and green jersey looking at a cellphone.\nZebras standing in the shade of a fenced off enclosure.\nA man jumping a grey horse over three rails.\nOverly ripened bananas are being skinned into a pot.\nTwo cars are parked across the street from a sidewalk bench.\nA beach with many empty blue and white chairs with umbrellas\nThe man is enjoying a snack at the park.\nan image of a man in the water waves  with a paddle\nA birthday party with a cake is being held for a dog.\nA batter hits a baseball with his baseball bat.\nA woman stands beside a baby in a high chair a table is set with a birthday cake and champagne.\nA woman sitting in a chair laughing while another person holds a cellphone up from behind an overturned table.\nA large bed on a wooden frame in a bedroom.\na couple sitting outdoors with some wine glasses\nA person with an umbrella is walking down a city street.\nA woman swinging a racket at a tennis ball.\nA counter is full of platters with different pizzas.\nA CITY BUS IS ON THE STREET COMING THRU\nTow cakes resembling the engine of a train.\nA dog is in the air catching a frisbee.\nA woman having a meal in a restaurant and using a 
cell phone.\nA long white plane resting on a run way.\nA down the counter view of a very messy kitchen area\nA baseball player holding a bat near a ball.\na clock on a white tower in front of a clear sky\nA man wearing a neck tie with a golden clock on it.\nCompetitors on skis are racing around the course.\nAerial view of a group of people flying heart shaped kites\na box with some big doughnuts inside of it\nTwo adult elephants interacting behind some trees and bushes.\nA food processor with a chopped mixture in a plastic bowl.\nHot dogs lay on an orange plate while hot dog buns are on a grill.\nPeople are riding down a street on skateboards.\nSailing boat tied up to a deck chair on the beach\nTwo large trucks parked next two each other next to the building\nA young man playing with video game controllers.\nA Boston Red Sox pitcher stands, holding the ball in his glove at his waist, prepares to pitch to an Oakland A's batter.\nA baseball player swings at a pitch during a game.\nA young man walks the beach with a surfboard under his arm.\nPeople assembling teddy bears on a table\nA sandwich on a plate in front of condiments.\na person in a field with a dog\na green military truck sitting in a warehouse\nA number of rose flower sticks in a bundle\nThere is a old tower with a clock in the center\nA luxurious living room with chandelier, bar, and couches\nA large elephant is shown walking through the terrain.\nthis is a man holding a kite in the air\nA person standing under an umbrella with other people and lights in the background.\na peacock on a wooden table looking for scraps\nA refrigerator that has items on the outside.\nA group of elephants on grassy area next to rock and trees.\nA man stands on a surf board and rides a wave.\na small boat on a small body of water\nMany sheep stand in a large grassy field\nA man and a woman are standing by the street\nA female tennis player raises her racket to hit the ball on a tennis court.\nthere is a woman with glasses 
eating a donut\na baby cow standing by its self in the grass\nA man riding a snowboard down a snow covered slope.\nA food bowl with vegetable and chicken salad.\nA lunch of salad, fries, a sandwich and a drink.\nAn elephant in an enclosure approaching a body of water.\nA bunch of birds sitting in a bread basket.\nWine and desserts are served on a table.\nThe large kitchen has an island in the middle of it.\nA tall giraffe standing next to a tree filled forest.\na picture of a bunch of train cars colored red.\nA man is standing partially inside an open refrigerator.\nThe dog is at the dog wanting to get into the house.\nWhite dog playing in grassy field with red disc.\nA pile of fruit sits ina  clean bowl\na laptop computer sitting on top of a homemade machine with wheels\nA black cat sits on a bench beside a wooden letter K.\nSeveral boats at a pier in a bay ringed by mountains.\nThree giraffe standing near trees in a grass field that appears to be a zoo.\nThe happy couple cutting their wedding cake together\nA woman is holding a tennis racket on a court\nTwo men sitting on steps and selling goods in the fog.\nthe small cat is sitting inside a suitcase\nThe little boy is holding an umbrella over his head.\nA Skyteam airplane taxiing on a snowy runway.\nA small aircraft is beginning to lift up off of the tarmac.\nA herd of deer in a field down a hill from a house.\nTwins are smiling with the same attire on.\nbedroom with pink patterned headboard and matching curtains\nA road is winding in the distance in-between trees.\nA boy in a red hat playing with tee ball set.\nA motorcycle is parked in front of a cafe.\nSkateboarder riding in a concrete with a large cross in the middle.\nA gray teddy bear sits on a doily near a card.\nA group of people on small bikes on a street.\nA girl shows a banana to the camera.\nA table set with wine glasses and plates.\nTagged cows are standing in an open field\nA bird stretches his wings at the beach.\nLittle boy with toothy grin 
talking on a cell phone\nCacti can be seen in a large clay pot.\nBlack and white horses are standing next to each other.\nBoys are playing Frisbee in a yard.\nSomeone's living room contains a bookshelf with lots of books.\nSome young boys are playing with video games.\nA tennis player reaching up to hit a tennis ball.\nA group of people in a room with remotes.\nA man sharing a hot dog with a black and white dog.\na fence that has a bunch of surfboards on it\nA man and woman that are standing on ski's in the snow.\nA bird sits on a branch in a tree.\nThere are some vegetables, herbs, and other seasonings and a knife on a wooden cutting board.\nA stone tower is has a clock on the side.\nA photo taken from an airplane looking down at the mountains.\nA man surfing waves on his surf board\nA young zebra is nursing from it's mother on a grassy plot near some shrubbery and a mountain in the distance.\nA man shows the screen of his phone to the camera\nThe stuffed bear is next to a toy doll.\nSeveral horses grazing in the grass near some hills.\nBaseball player getting ready to catch ball as many fans enthusiastically watch.\nA bathroom that is done in checkered walls and flooring.\nA group of people that are standing under umbrellas.\na close up of a cat laying on a laptop\nAn air plane is flying over the roller coaster.\na young black man lying down on a bench outside resting\na close up ofa clock on top of a shelf\npeople holding a skating pole on the snow\nTwo women make faces as they stand at bathroom sinks.\nA man lying in bed with a cat next to him.\nA guy in a blue shirt is surfing.\na person riding a surf board with a parachute\nA LOT OF PEOPLE ARE ON THE BOARD WALK\nAn antique car is parked on a city street next to two others.\nA group of people loading the back of a pickup truck.\nTwo  men in business suits shake hands.\nA brown horse standing in the middle of a flower filled field.\nThat seems like a very small sink for this kitchen.\nA group of people skiing 
down a snow covered slope.\nA laptop computer sitting on top of a wooden chest.\nA dog and a sheep separated by by a fence.\nSeveral different kinds of vegetables on a counter.\na woman is sitting outside with a blue umbrella\nThe fridge is full of food and goodies\nA man playing Wii while others watch\nAn older man stands behind a younger woman sitting on a park bench.\nA pair of scissors with orange string on a spool leading to the scissors.\nTwo guys on laying on surfboards riding a wave.\nLarge Elephants and small Elephants are walking in a line.\nPlates assembled near each other with silverware on right.\na clock on the wall saying it is 241 in the afternoon\nThe leg of a pair of glasses is stuck inside a clear vase.\na white woman in a white tennis outfit playing tennis\nA person with skis down a mountain in blue pants and black jacket\nA salad with side vegetables and dressing are positioned on a wooden tray.\nA man is on the beach with a brown horse.\nA flat-bread pizza with melted cheese, and a few vegetables sits on a black tray on a wooden table.\nA bathroom has blue walls and a large mirror.\nThere are many chefs here in this kitchen cooking\nA man is wiping down the elephant in the water\na tennis court that has a man on it\nTwo men with tennis rackets with one racket holding balls.\nA cat lies asleep in the middle of a mattress.\nA collection of yellow fire hydrants on the street.\ntwo male baseball players in uniform with long hair\nA pair of scissors, a crochet hook and a sewing needle are ready to craft.\nPeople are walking through a subway terminal.\nA baby girl brushing her teeth with a pink tooth brush.\nThe person is holding a pastry in their hand\nsome big and little bears walking across the street\nA young boy wearing camouflage sitting in a  doorway.\nA warning sign for high water is on the side of the road.\nA bus with two levels and a hostess ad is traveling on a street.\nA grey cat with green eyes and a pensive look on its face.\nA broken 
cell phone laying on carpeted ground.\nA lot of horses grouped together walking down a road .\nA home with rooms under construction of them\nA zebra standing next to a tree in a field.\nFour photographs of a woman in denim shirt next to white plate of food.\nanimals grazing on a straw field bordered by water and mountains.\na baseball player throwing a ball with a glove\nA man wearing a hat riding his skateboard in a skate park.\nVarious signs written in either Chinese or Japanese and also a sign of a man walking across a street.\nA man standing over a griddle in a park.\nA galley kitchen with white cabinets and fridge and a wooden island feature.\nA man standing under a ball on top of a grass covered field.\nA young man with a surfboard is surfing in the water.\nA person carrying a surf board on the beach.\nLots of toasters sit in the floor near an oven.\nBoats on the water with mountains in the background.\nA man riding a surfboard on a wave in the ocean.\nAn old, dirty toilet in a small bathroom that is falling down.\nA fat orange kitty sitting on a black chair\na man standing in front of some tall trees\na close up of a motorcycle with parts missing\nA close up of a bicycle  parked on a train platform.\nA sandwich on a toasted roll sits atop a green leafy salad with tomatoes.\nDog in the air to catch a frisbee while a man lays on the ground.\nA giraffe is walking along a paved walkway.\nAn elephant is standing on a cloudy day.\nA man with a bicycle in a train station walks past it as a train approaches\nA bicycle leaning against a street pole in the snow\nA little boy holding a bat over his shoulder\nTwo beautiful women riding horses in the ocean in bikinis.\nA bathroom sink with a large walk in shower.\nKitchen table ready for party with beverage cups, citrus fruit, and alcohol bottles.\nA child takes berries from a table full of fresh garden produce.\nA bathroom with blue walls and a pink tub, toilet, and sink.\nSoup and a sandwich on a metallic plate.\nA 
group of people hanging around holding umbrellas.\nA desk with a laptop and jars and candles.\nOnlookers watch an elephant stop for a drink of water\nA train car moving down the track at a crossing.\nA blue city bus putting over at a bus stop.\nTwo individuals posing with funny faces, one holding up a wine glass.\nA vase with an elephant head holds a bouquet of flowers.\nan image of two benches in the park\nSeveral purple flowers are shown growing with bamboo in the pot.\nA cat looks down from on top of a dresser.\nA baseball game in progress with the picture about throw the ball.\nAn old airplane flying above a large city.\nCooked broccoli in serving dish sitting on cloth hot pad.\nA person that is laying on a bed with a bag over his head.\nA bathroom with a phone mounted next to a toilet.\nA train travelling above ground near bushes and trees.\nA clock that is on the side of a wall.\nClowns ride an antique firetruck down the road in a parade.\nA woman stands next to a parked city bus.\nMan in business suit skiing in the snow\nSix snowboards are leaning against a red wall.\na baseball player getting ready to swing a baseball bat\nA woman on a surfboard surfing a wave on a beach.\nSmall child in white shirt holding a white controller.\nA red and white plane in on display in a field.\ntwo woman playing tennis on a court in front of a crowd of people\nbanana slices sit on top of toast on a white plate\nA man is sitting in a boat on a river and drinking a bottle of water.\na group of sheep standing around while eating some grass\nA full view of a nice kitchen and counters.\nA man is standing among pink and zebra feathers and a zebra.\nRemains of various deserts are situated on a table.\nA red fire hydrant standing across the street from two silver vehicles.\na man doing a skateboard trick on top of pool\nA boy in white shirt flying a kite on beach.\nA man is eating food with a pair of chopsticks.\nThis is a dirty urinal in a bathroom.\nCarrots are laying on a cutting 
board with a knife.\nA woman that is next to a surfboard with a dog.\na photo on mountains skating wearing very warm clothes\nA large bedroom with big windows and a patio.\nthere is a black and white dog standing in the bath tub\nTwo little girls playing with a kitchen set.\nA young man riding a skateboard down a street.\nA man wearing a white shirt, plaid tie, a grey hat and glasses smiling with his eyes closed.\nA man leaning up against a boat that is almost finished being built.\nA group of professionals at a business meeting.\nThree giraffes eating leaves off cut tree tops.\nA pizza covered in cheese and toppings on a plate.\nA tall clock tower flanked by two trees\nA young girl tasting food from her bowl\nSunlight streams into the living room through two windows.\nA cluster of small boats in shallow water.\nA crowd watches a baseball game being played.\nA dog that is sitting down in a backseat.\nA pair of blue scissors sitting on top of a paper and a container of note cards.\nA group of planes are flying through the air with smoke coming from their tails.\nA man eating a slice of pizza next to food stands.\nA fan is featured in a yellow room.\nA fenced in area off a sidewalk with posted signs.\nA white plate topped with salad and onions.\nA plate that has food on it with a glass next to it.\nA woman holding a blue frisbee over the top of her head.\nA man cutting a cake with a knife.\nView pointing upward of a skyline in a city\nThere are military people serving others hot food\nWhite bowl with tomatoes and greens on counter top.\nA bus with a few bikes on the front\nA green traffic light  and telephone wires\nThis is the sign for the Bart ba building.\nA bunch of birds flying over some waves.\nHandmade vases, all the same size but all different colors.\nA woman reaches down to pick up a video game control.\nDog laying on a green sofa in a living room of an apartment.\nThere is a man swinging a tennis racket.\na man with a camera is filming some baseball 
players\nThe little kid is flying a kite on the beach.\nI am unable to see the image above.\nA man in a suit and tie standing with a cellphone to his ear.\nA black keyboard is hook to a cell phone on a table.\nBlack and white of two adult zebras from shoulders up playing.\nA man hitting the ball during a tennis match\nA man standing in a park looking at trees.\nan old black and white photo of a man near a plane\nAn intersection with traffic lights and lots of traffic.\nSmall floor model refrigerator, so new it still has its manufacturer's sticker.\nA crumby chocolate dessert on a plate with a large knife.\nA dog on standing on a surfboard in the back of a truck.\nA delicious plate of churro with chocolate sauce.\na toilet with a black lid and the tank in the air\nA herd of sheep grazing on a lush green hillside.\nA black train engine on tracks next to buildings.\na person riding a skate board on a street\nTwo zebras are staying away from the sun as long as they can\nA stop sign with graffiti about the Red Sox\na man handing an elephant a stick in an enclosure at a zoo\na close up of a cat on a window sil\nA man sits on a boat cleaning a fish.\nA group of zebras and other animals grazing in a field with a rainbow in the background.\nA plate of green salad and pieces of tomato.\nA couple of cats laying on top of a brown chair.\nLittle girl walking down a road holding an umbrella.\nA wooden desk with a laptop sitting on it.\nA scooter parked in front of the door of a stone building.\nA giraffe is coming up close to people\nthere is a an standing on top of a mountain\nA kitchen with and island and several counters in it.\nA person walks on a bridge with a kite.\nThree skiers pose in the snow in front of barren trees.\nA table topped with two bowls filled with fruits.\nA very large pizza covered in cheese and toppings.\nA woman and man riding on the back of an elephant along a river.\nA motorcycle racer leans into a turn during a race.\na zebra is standing in its pen 
and some green plants and grass\na white black and brown cat on a table\na big bathroom with a sink, toilet and bath tub in it\nThe man in a suit stands next to a woman in a pink dress.\nWooly goat stands near gate with others on the other side.\nA book shelf with a large clock on top of it.\nA man walking his dog in the park.\na person holding an open umbrella in some bushes\nA little girl gets help brushing her teeth.\nA motor cycle procession down a wet street.\nSnow boarder sliding down the hill after falling in the snow\nThere is chicken, couscous and vegetables on the plate.\nA man holding his tennis racquet on a tennis court\nA zebra and smaller brown animal are running in the grass.\nA dog crossing a pavement path near motorcycles.\nThe man in the yellow checkered hat is flying a kite.\nA couple of men riding horses down a street with tall buildings.\nBoy attempts to hit a baseball with his bat.\nA BLACK AND WHITE PICTURE OF A MAN SITTING LOOK\nA person in the water being pulled by a kite.\ntwo very tall and white storage towers in a room\nLooking past a snowboard in the snow to a city beyond\nAn Alaska airplane is reaching up to a greater height.\nA man swinging a bat as he plays in a baseball game.\nA child wearing a red helmet holding a skateboard.\nA man having fun in the rolling ocean waves.\na person that is  on some dirt on a baseball field\nA plate several cookies and a small sign on it.\nA partial view of a formal living room.\na big window that has some birds out front\nA little girl is eating a hot dog and riding in a shopping cart.\nA grocery store filled with lots of fresh produce.\nA guy blowing on a hot piece of pizza.\nTwo children reading while lying in their bed\nA horse and buddy come down the side of a road.\nA photo of someone's meal at a restaurant.\nthere is one orange laying among five bananas\nA woman putting icing on a homemade cake.\nA couple of people at a counter near plates of food.\nA coupe of road signs near a downtown area 
or highway.\nA boy walks along the beach carrying his surfboard.\nPeople and buses are sitting still on a city street.\nA young man holding a basketball on top of a court.\na couple of beds  that are in one room\nA dog sleeping on the floor in the corner, a man looking down at him.\nA large group of men are dressed like Santa.\nA couple of people standing on top of a beach with surfboards.\nGiraffes and babies are in their habitat in the grass.\nA blurry image seen through a rainy window of a person holding a light blue umbrella.\nA toilet and a trash can in a room.\nTwo birds sit on the back of a bench made of logs.\nA very large bear sauntering in a zoo type environment\nA teddy bear that has been buried in the sand.\nA blue and cream tiled bathroom with a stand up shower\nThis table has three kinds of donuts on it.\na horse pulling a carriage down the road\na man wearing a striped tie holding a microphone\nA car driving down the street, some people are watching it.\nA man standing on a  tennis court holding a racquet.\nThe man with the umbrella is looking up.\nA door is opened to the inside of a bathroom.\nA person on a surfboard rides a wave.\nAn orange and white cat is sitting in an easy chair.\nA man in chain mail checking his cell phone.\na person is skiing down a snowy hill\nA refrigerator with a microwave on top of it.\nA green and blue fire hydrant sitting on bricks on the side of the road.\nA man stands and airs up his bike.\na plate with a small dessert and some fruit\nA table topped with vegetables and a pitcher.\nA locomotive train on a set of railroad tracks, with tanker cars attached behind it.\nThe woman in the kitchen is tending to her food.\na table that has some glasses on it\nA white basket filled with ripe and unripe bananas.\nA fireplace mantle has an ornate clock sitting on it in front of a large mirror near a teddy bear.\nA small round clock atop an ornate old building.\nA pile of luggage, boxes, towels and other items on a carpeted 
floor\nSmall airplanes are parked on a grassy field.\nTwo bear cubs are playing together in water\nA bunch of luggage is on a car in a bathroom stall.\na hot dog covered with some chili, mustard, adn ketchup\ntwo pans of dinner rolls baking in a large oven\nSix sheep standing in the grass beside a house.\nA calendar with some apples and oranges and pears in it.\na group of people that are getting out of a boat\nSome very big commercial planes over the water.\nThe man is riding a bicycle next to a train.\nA brown and white horse standing in front of a red wall.\nA field of wooden structures in front of a mountain.\nHorses peek through the windows of a small utilitarian horse barn.\nAn airplane jet flying through the air against a blue sky.\nOn this table there is bowl containing a bottle and glass vase containing rocks and leaves.\nA dug out filled with baseball players next to baseball equipment.\nA man taking a photo of an elephant as the elephant stands inside an enclosure.\nA dome shaped cake that has lit letter shaped candles on it, and people in the background.\nA young child brushing his teeth in the bathroom.\nMan playing Wii video game with group in background on couch\nA man taking a close up picture of a motorcycle.\nA baseball game is being played in a city park.\nA small water landing plane is on a lake near a neighborhood\nA beautiful young woman holding a tennis racquet on a tennis court.\nTwo kid touching food that is on a kitchen counter.\nA woman holding a tennis racquet next to a tree.\nA traffic light with two street lights hanging from it's side.\nA very nice looking dining table by a bright window.\nA picture of a person in the air on a skateboard.\nA man walking next to two horses on a dusty road.\nA whit plate topped with chicken and vegetables.\nA black and white picture of a lady getting off of a escalator holding an umbrella walking into the city.\nSeveral balls of yarn are sitting on an oven top.\nA bunt cake sitting on a red plate 
covered in icing.\nA statue in the middle of a park near trees.\nA Stop sign is slightly covered up by a tree.\nA group of people and an official player soccer.\nA couple of benches next to a street.\nA kitchen with a sink, dishwasher, microwave and refrigerator.\nA blue and silver fire hydrant on a sidewalk.\na herd of big cows on a wide farm\nThe hand is holding an open cel phone.\nA row of motorcycles parked on the side of a busy street.\nA computer monitor, a laptop and some other electronics sit on a tan, wooden desk.\nA little girl is jumping on a hotel room bed.\nA blue and white bus parked in front of a motorcycle.\nPerson stands and poses with skis next to a ski lift.\nTake-out food in a basket on a wooden table.\nA group of people are on the grass playing Frisbee\na big building with a clock built inside the top of it\nSome books that have been piled on top of each other.\na person sitting at a bench near a bush\nPeople sitting at a table with plates of food and beverages in front of them.\nA man sitting in an overstuffed chair in a living room.\nA long passenger train that is going quickly down the track.\nTHERE IS A DESIGN OF AN ELEPHANT ON THE SHELF\nA pair of scissors with white handles sits on a white piece of paper near several sheets of flannel.\nTwo yellow trains are entering a train station.\nsome female hands holding a sandwich in a car\nThis person is about to eat a banana.\nA table set for tea reveals finger sandwiches, tea cups and a cream pitcher all on a red and white table cloth.\nThe snowboarder is performing a jump at the top of the slope.\nA white bath tub sitting next to a white toilet.\nA banana with a sticker on it, with a person holding it.\nA cat sitting on top of a bag of luggage next to a TV that is showing a store about Giant Rats.\nSnowboarder rounding top of sloped edge in ski area.\nA large type pizza with cheese, spinach, and sauce is on a silver plate.\nA group of large red birds that are perched in a tree.\nA vase that has 
flowers inside of it.\nA group of people on a field with a Frisbee.\nMan and woman standing close together smiling into the camera.\nLarge polished black truck sitting in a parking space.\nDensely growing trees and a low fence frame the top part of a shot showing a tight huddle of grazing sheep on a section of sloping terrain with cropped grass and a cat at some distance behind them.\na couple of phones that are next to each other\nA pile of different fruits sitting next to each other in a  bowl.\nA couple of motor bikes parked on a beach.\nA white beat up bus going down the street .\nA bird flies over an island area of a river.\nA yellow fire hydrant is shown on this street.\nSmall child standing in the center of a crowd smiling.\nA person sitting at a table eating a doughnut.\nThe frame of a bench is metal and the seat of the bench is wood.\nA large bear walks in front of a rocky formation.\nA wooden cutting board with several vegetables sits on a counter.\nA woman is playing Wii with sunglasses on.\nA red, white, and blue plane is in the sky.\nA person holding a surfboard while wearing a wet suit near the water.\nA zebra stands between several small trees in tall grass.\nPERSON ON SNOWBOARD UP IN THE AIR OVERLOOKING NEARBY TOWN\nA view of a street corner in the middle of a city.\nA grey black and white cat laying in a chair.\nA cute dog lazily sleeps on top of a pile of clothes.\nLots of donuts being processed through a machine.\na woman and child are looking at an elephant in its pen\nA group of people in the snow with skis.\nA flock of birds standing on top of a wet beach.\nThe little girl is blowing out her birthday candles.\na kitchen with a small window in it\nA bus is stopped on a street surrounded by trees.\nA wooden park bench under a tree with long spiky leaves.\nYoung girl with racket with dog on lap\nA black pan filled with mushrooms and vegetables.\nThe man is holding an extremely large pizza with a lot of stuff on it.\na large window with a city in 
the reflection\nA living room has three televisions set up.\nA couple of men walking along a snow covered hill side.\nA yellow doorway with a clock above it.\nA bedroom with a picture on the wall and a lamp on the side\nMany people are sitting at round tables with dinner plates on them.\nAn elephant swinging its trunk inside of a pen.\nA woman wearing white playing tennis, about to serve.\na pro baseball player is swinging a bat\na man on a bus and a man looking over his shoulder both smiling\nA gang of bikers riding motorcycles down a street.\nFour bowls containing fruits and vegetables arranged decoratively\nA person on some skis in the snow.\na girl wearing a fuzzy vest and a girl wearing a flowered top\na toilet and a urinal in a marble tiled bathroom\nTwo people on skies posing for the camera\na man that is jumping his skateboard on some bricks\nA dish features breaded meat, lemon, and broccoli.\nAn intersection of a regulated entrance showing the stop sign\nA group of four people are riding a ski lift as they ride over the snowy mountain.\nA living area with a television, coffee table, couch and other items.\nTwo plates with small, rustic looking pizzas on them\nFive snowboarders doing tricks on the snowboard course.\na microwave is sitting on a wooden shelf\nA mom duck with a big bunch of ducklings swimming down a river.\nA dog sitting on a couch in front of a table with a laptop remote controls and glass on top.\nwoman taking picture of herself in the mirror\nA lady in a winter coat talking on a cell phone.\nZebra, antelope and other wild animals at a African National Park.\nHorse drawn chuck wagon followed by Jeep and cattle.\nTwo boys who are playing soccer against each other.\na horse is standing near a large lake\nTeddy bears are dressed in clothing and stand in a window sill\nA yellow fire hydrant surrounded by pebbles near a fence.\nA snowboarder about to move down the slope.\nAn elephant sticking it's trunk up another elephants rear end.\nA LOT OF 
PEOPLE ARE ON BOATS IN THE WATER\nTwo tennis players sitting on a chair holding racket.\na blue frisbee sitting on the beach with dog paws next to it\nA semi oval looking bathroom that is in someone's house.\nA group of people fly kites over a sand covered shore.\nHundreds of sheep walking in the water and a ranch.\nA person is cutting up some fruit on a cutting board\nTwo young children playing with each other on a bed.\nA cutting board with slices of peeled apple and a knife next to an apple and apple peels.\nA busy street with busses and cars merging together.\na black and white sign is by the road\nThe man sets up the ball to serve it.\nThree giraffes standing together inside a fenced area by white buildings.\nA man and a couple of women sitting on a colorful seat.\na person walking across an odd looking pavement carrying an umbrella\nA yellow train parked next to a train station near a loading platform.\nA street is blocked of for a festival.\nA man sitting at a desk in front of a laptop computer.\nA woman on a beach on a cell phone.\nA street sign showing the intersection of Beacon Ave and Stevens St.\nA man with his arms crossed is sitting in front of green couch with remote on it.\na lone black and white cow standing on a large field of grass\nA double decker bus is shown driving on street.\nA man wearing sunglasses wearing a green shirt.\nthis bathroom has two pictures of dogs in it\nA piped canopy bed with a wood headboard is dressed in neutral bedding.\nA kitchen with counter tops filled with lots of clutter.\nCommercial jets lined up at an airport terminal.\nThe view from the commercial airplane includes the wing and mountains and water.\nA desk with two computer monitors and a laptop.\nA pair of adults escorting children skiers up a hill.\na fire hydrant on a city street near a pole\nSeveral suitcases sitting next to a chair outside\nA crowd of people sitting around a dinner table.\nA large teddy bear is wearing a dress.\na woman with a nice little 
suit case\nA snowboarder soaring above a slope looking out on a mountain range.\nTwo pieces of pepperoni pizza are on a plate.\nA young girl is eating cake with her fingers.\nA giraffe amongst tall, slender trees in an enclosure\nA computer keyboard is shown on a desk.\nA train that is riding on rail road  tracks.\nA man in a shamrock hat is playing a video game.\nA kitchen scene looking at all the pans of hot dogs and sausage.\nThe bench at the tree offers a respite and a scenic autumnal view of a grand valley\nA row of elephants standing next to each other.\nA bathroom with a black and white pattern on the wall.\nTwo young woman walking by a fire hydrant, one talking on cell.\na person riding skis on a snowy surface\nA young child is jumping high in the air.\nA person on a surfboard, riding a wave and leaning to one side with one hand up in the air.\nA full view of an airplane taking a shower.\nA close shot of a BBQ pulled pork sandwich.\nThe skateboarders seem very relaxed as they wait for their turn to ride.\nMan with broken surfboard standing in waves in ocean.\nBrown leather couch in wood floored living area.\nThe man and woman are holding tennis rackets.\nA man wearing a neck tie and a white shirt.\nGroup of young adults eating pizza and drinking beer at a restaurant.\nA man with a yellow tie and white shirt holding a yellow sweater round his neck.\nA man standing in the street on a cellphone.\nA red bus is leaving and some people in the background.\nTwo students are playing games at a party\nA soccer player kicks the ball in a soccer field\na cloudy sky during a day with some overcast\nA plane with drawings on the side waiting for people to board.\nA  young boy holding an umbrella on a deck\nA sheep is minding its business near a body of water.\nLong-haired male downhill skier flying down the slope, negotiating a turn.\nSeveral people street skatingstreet luging on a road.\nA female snowboarder riding down the mountain slope\nA man with a baseball uniform 
on with a baseball and catcher's mitt.\nTwo cows are standing on a sloped green hill.\nA man in suit and tie has a cane and cigar.\nthere are six jets flying in formation\nA group of elephants in grassy field with mountains in background.\nA bed sitting in a bedroom between two lamps.\nA person sitting in a chair with the ocean in front of them.\nA man surf sailing out on the ocean.\nSimple silver remote being held out in front of a television.\na brown piece of cake is sliced and on a brown table\nA woman is holding a tennis racquet preparing to serve the ball.\nan air force jet flying with a sign attached to the back of it\na group of people ride atop of an elephant\nA dog sitting on a rug watching television.\nA tall giraffe standing in the middle of a green field.\nA pair of men sitting at a table in a diner.\nA woman sitting on a bench with a bunch of suitcases\nA baseball game with the pitcher in his follow-through and the batter preparing to swing.\nA brick building with a clock on the outside.\nA dog lies in the grass next to a Frisbee.\nA kit has markers, a scissors, and other plastic objects.\nA stop sign topped by two green street signs.\nNumerous parking meters along the side of a street.\nA bowl of pasta salad with onions and olives.\nTHERE IS A WHITE PICK UP TRUCK DRIVING DOWN THE HIGHWAY\nA flock of sheep grazing in a big grassy field.\nAn old pickup truck sits outside among other classic cars.\na bunch of cows eating out of a food trough\nThis is an image of two bikes on a beach.\nA large bus on a open city street.\nair force members consulting near airplanes, while a man is near the planes.\nA living room with a lighted floor lamp, sofa, wooden coffee table and end table.\nA fire-hydrant on a street and near a van.\nA truck with trailer for hauling rolls down the road.\nA group of people standing in a field flying a kite.\nBananas and other fruit on a white plate.\nMan with teenagers at outdoor setting enjoying food and drink.\nA very cramped room 
with a couch and a desk.\nseveral old fashion planes stilling in a field.\nA cat sitting on top of a hard wood floor.\nA man seated in front of a pizza.\nA man riding skis down a snow covered slope.\nCarrots, quash, green onions, and parsley all on one piece of paper.\nA baby boy holding a stuffed bear animal in his hands\nPeople sit at a table for a party.\nA man that is sitting in a train.\nSome apples and other fruits at a store\nA man flying through the air while riding skis.\nTwo youngsters in orange tops have catchers gloves and are playing.\nyoung boys in uniform playing baseball in a packed baseball field\nA public restroom with several urinals, a black floor and red and yellow walls\nA group of farm animals standing in the shade under a tree.\nA glorious sunny day at the beach and a man sitting on a bench taking it all in.\nA living room with a fireplace and an artificial tree.\nA man is surfing in the water in a really big wave.\nA large long train on a steel track.\nA large number of people outside near some flowers and a road.\nA woman is looking at a fire hydrant.\nSeveral people with backpacks waiting to get on a bus.\nA bathroom with a large mirror and walk in shower\na small cat watches a cheetah run on television\nthere is a female surfer riding in the water\nA woman is reflected in a mirror as she works on her laptop computer.\nA large passenger jet sitting on top of a runway.\ntwo tennis players on a tennis court with a sky background\nA group of people standing outside while some hold posters.\nA young girl standing on top of a grass covered field.\na yellow and brown fire hydrant on the side of the road\nA plate of food containing a sandwich and a salad.\nA little girl eating a donut in her left hand.\nA small private plane that is coming in for a landing.\nA cat sitting on the awning above a stove\nA person traveling on a crosswalk on a bike.\nA gathering of people around a large table eating.\nA group of people sitting in the snow while 
attached to snowboards.\nA large clock sitting in front of a building beneath a tower.\nA dog sleeps on the lap of his owner.\nsome guy standing on a beach with a surf board\na couple of street signs that are by some bushes\nA man with a shaved head lights a cigarette.\nA man in a pizzeria putting the toppings on a pizza.\nA girl with a cast on her arm stands in a bathroom.\nA person holding a red bowl filled with cake.\na person riding skis on a snowy slope\nTHIS IS A BEAUTIFUL PICTURE OF FRESH VEGETABLES\nA cat taking nap on top of a pair of shoes.\nTwo photos are presented white people talking on their phone.\nThe sheep are grazing on the hill side.\nA fire hydrant is alongside an empty road.\nA child playing on his skate board at a park.\nAN ELEGANT ENTRY WAY WITH ARCHED DOORWAYS AND GLASS AND A CLOCK\nA bowl of mixed fruit on a decorated mat.\nRed double-decker bus parked on a city street.\na few people on horses are riding down the dirt.\nA slanted picture of a woman waiting to cross the street.\na vet is trying to check a dog's teeth\nA man holding the strings of a kite on the ground\nA woman holding a baby while she has something in her mouth.\nA plate that has a glass and food on it.\nA bookshelf full of cookbooks, bottles, and magazines next to a microwave.\nA child is leaning out of his bed to touch a gadget.\nA laughing man is holding a baby with a plate.\nThe snowboarders are taking a break in the snow.\na bathroom vanity and shower door with towels hanging on a towel rack\nA clock clamped inside of a rusty vice.\na living room with couch, fireplace, tv, chair, and window\nThree different vases containing several red tulip blooms.\nA small bird is perched on top of the branch\nA baseball player holds up his bat while a catch squats.\nA young woman walks along the beach near the water.\na child and an adult pose for a photo\nRoses and other flowers arranged nicely in old-timey vases by a shop window.\nA man skateboarding in a skateboard park while 
another waits their turn.\nA man spray painting a fire hydrant on a street corner.\nCloseup of the head of a white cow on road.\nA woman who is holding a tennis racket.\na baseball game with the batter catcher and umpire\na person in glasses is using a laptop\nA cherry pie sitting on top of a piece of tin foil.\nA dog leans out of the window of a car.\nA shelf with pileed hats next to a teddy bear.\nA woman on a cell phone near a man.\nA Fiji Air Pacific plane is flying through the sky.\nPeople are walking around a plaza that has a sign that reads \"Spring in the City\".\nTwo men trying to get to a soccer ball in a soccer game\nA man plays video games in a cluttered living room.\nthere is only one horse standing on a large empty field\nA young lady laughing in a kitchen with a cake in front of her on the counter.\nThe girl is surfing a small wave in the water.\nA laptop computer and mouse sitting on a table.\nA group of women cooking and preparing food in a kitchen.\nA cat next to a box full of lots of trinkets.\nA person on a yellow motorcycle is turning around a street corner\nPeople dresses as zombies boarding a bus at a bus stop.\nA man and woman holding up cellphones near each other.\nThe brown  bench is in the woods\nThe woman is posing for a picture while skiing.\nA male surfer carrying a white board exiting the ocean.\nthe side of a passenger train at a train station\nA man riding a surfboard in a wet suit in the ocean.\nA dog is in a living room lying on the couch.\nA man wearing a cap, walking alongside a bicycle.\nA kitchen filled with black appliances and a table.\nA man playing tug o war with a dog over a white frisbee.\nA plate of food sitting next to a glass of orange juice.\nAn Olympic competitive skier furiously rounds the corner.\nA man adjusts his tie as the subject of a graphic.\nA plate of food including, grilled meat, baked potato, carrots and lima beans.\nThere are some bananas on a dinner table\nA plate filled with broccoli chicken and fried 
rice.\nThe view of a crowd of shoppers and vendors at a market.\nA man that is standing in the dirt with a bat.\nA person on a field flying a kite.\nA pretty young lady kneeling down to pet a cat.\nA little girl in a red shirt and blue dress standing on a road.\nA few skateboarders performing tricks at a skate park.\nA type of bread is on a plate next to a variety of sauces.\nThe living room has an old style fire place in the corner.\nA woman standing in the living room with a coach and t.v.\nA spindled bed sits inside of a wall papered bedroom\nA baseball player pitching a baseball on a field.\nA man on skis with ski poles has just descended the mountain.\nA man riding a wave on top of a surfboard.\nGiraffe relaxes in the shade in the park\nA sign cautioning the likelihood of cattle crossing.\nA group of giraffes is standing next to a fence.\nA harbor in a city is full of boats.\nA street filled with blurry traffic and traffic signals.\nA tennis player is about to hit a ball in front of a crowd.\na small airplane sits empty on a runway in the mountains\nA wooden table topped with four white bowls.\nTrays of pastries and sandwiches beside a bowl of soup.\nsome people some snow and some trees and one person is taking a picture\nA messy bedroom with items covering the floor.\nTwo large bags of luggage in a hallway.\nThe train is going down the railroad tracks.\nA women who is in a field of dirt  flying a kite.\nThree women posing for a picture in a dinning hall.\nA bench sits between two trees in a flooded area.\nTwo people with bicycles standing in front of a field of flowers.\nA giraffe walks near the gate as people look on.\nA group of wine bottles sit next to a glass.\nA vase filled with lots of different colored flowers.\nAn orange has been sliced in half and placed in a red bowl.\nA man with a suitcase walking through a crowd\nA couple of zebra standing and laying on a dirt field.\nThe cars are parked on the side of the street.\nThe steak and broccoli is next 
to a bowl of soup.\nA bedroom with windows with bright lights flowing through.\nA man and a child who are in the snow.\nThree male skiers standing on a ski slope\nA man on skis hovers over a series of small hills covered with snow.\na bird that is sitting on a branch\nThe person is riding on the back of the multi-colored truck.\na person wind surfing on a large body of water\nA small black and white dog sitting on a yellow davenport.\na young horse and its mother graze in a field\nA large group of people are sitting at a long dining table set with plates and wine.\nThis is a long red bus behind another one just like it.\na double decker bus stopping to pick up a passenger\nTwo uniformed men posing while holding pastry items.\nThe side of a train showing the entrance and two doors.\nThree giraffes lounging around in a grassy zoo enclosure.\nSeveral people are sitting around a lit birthday cake that is under construction.\nSmall child playing the Nintendo Wii on carpet\nA herd of cattle grazing on a grass covered hillside.\nPeople watching a big blue kite on a cloudy day.\nthere are several bullet trains on the track\nA woman poses for a picture while eating\nA water-stained cathedralclock-tower enveloped by various green vines.\nA large balloon on a beach with a black and white dog looking at it.\nA small set of silver scissors used with electronics.\nA cat sleeping in a sink next to a faucet.\nThree zebras are huddled together in an enclosure.\nA cake says Happy Birthday with an image of a horse.\nA large vase sitting on top of a wooden table filled with flowers.\nA clock sitting on top of a street sign.\nA baby that is laying down wearing a tie.\nSeveral people are getting ready to enter the water for surfing.\nSea beach with a bench.Four ships are seen in the sea.\nA person in snow gear skiing down a snowy hill.\nA baseball player holds a bat across his chest\nA desk with art work and photos displayed on it\nA woman that has a racquet on a tennis court.\nSome 
dogs stick their heads out the car window.\na large living room filled with a lot of furniture\nA large, white cow walking through the streets of a small town\nA toilet connected to a wire, next to a speaker.\nA yellow bus is driving alongside a small white car.\na baseball player with a bat on the field\nA man is skateboarding on equipment specially made for it.\nA man holding a tennis racket about to hit a ball.\nA man who is playing video games by himself.\nA big pretty rainbow over a long empty road.\ndoughnuts stacked on top of each other in a bowl\nTwo microwaves stacked on top of each other in a kitchen on a counter.\nthere is a very tall giraffe standing under a pole\nsome snow skiers are posing for a picture\nA computer monitor that is in front of a keyboard.\nA gold clock that is on the table.\na woman sits on a bench and talks on her cell phone that is waited down with key rings\nA snowboarder goes airborne with a mountain in the background.\nA truck that is sitting in the street.\na young boy holding onto a harness for a cow\nA table topped with a bowl of soup and a plate with a corned beef sandwich.\nThe traditional white sink features two faucets below the mirror..\nA man being assisted with a tie by a lady.\nPeople are in a field playing with a frisbee.\nA white laptop computer lays on a carpeted floor and a gray and black with white footed cat is on it.\nCattle with horns and red hair standing against a fence.\nA jumbo jet on the runway waiting to take off.\nThere is a table covered with various displays of cupcakes\nA living room has guitars, shelves, and a painting.\nSomeone has drawn a face on the yellow fire hydrant.\na large black giraffe that is out side by some kites\nA metal rusted bed frame in a dilapidated room\nA pile of apples lying underneath a tree on the ground.\nA white bowl filled with meat and green broccoli.\nA two floor bus picking up some passengers at a bus stop\nRacer riding a dirt bike on a race course.\nA pizza on a pan with 
a spatula.\nMany sheep graze in a grassy pasture in a valley.\nA man standing on the railing of a boat near the shore.\nGroup of people in for a group training session\nA man is sitting on a bench next a statue of man with dog licking his face.\nSteam rising from a manhole cover in the middle of a street with a yellow fire hydrant in the background.\nA teddy bear sitting on a bench in the shade\nAn eighteen wheeler with a patriotic paint job sits in a parking lot.\nA very tall clock tower with weird arches hanging off of it's sides.\nA todller, a girl, and a man pull a ribbon in the grass.\nHorses walking through the yard toward a barn.\nThe skateboarders are practicing their tricks on the stairs.\nA man on the beach kicking the sand.\nA kitchen with a standard stove top and wooden cabinets\nA couple of women standing on either side of a man wearing glasses.\nTwo men retrieving their Frisbee from the creek.\nA sign that warns of speed bumps ahead.\nCat relaxing on blanked, appears to be stretching\na mobile phone, tv remote, game controller and chips on a blue table cloth\nTwo giraffes are eating grass in the plains.\nA man that is jumping in the air with a racquet.\nTwo dogs on a bed in an RV.\nA snowboarder soaring through the air on a sunny day.\nTwo female cows looking forward outside in the grass.\nOrange seats on a train with Yellow doors and lime green floors.\nAn intricately decorated bathroom with a peacock light lit.\nthere is a blue and silver train that is stopped on the tracks\nA black and white photo of two birds standing on seaweed.\nA commuter train sitting at a station while passengers stand on the platform.\nA person lying on the ground with a suit case on top of them.\nA fruit that is still hanging from a twig.\nTwo pieces of french toast with syrup on a plate.\nSeveral umbrella's and chairs sitting on a beach.\nA hotel room showing a bed, desk, television bathroom.\nGirl moving while holding a Wii remote in a living room.\nA man in a suit and a 
tie with a cell phone.\nA kitchen refrigerator covered in various colorful stickers\nA wide angle view of this hotel suite\na bunch of people are standing near a bus\nFour people are in a room using four laptops.\nAn elephant standing in the middle of a rocky environment.\nA group of children sitting on a bed together\nA close shot of a plane flying in the air.\nA blender full of smoothies and two glasses on a kitchen counter.\nThe decor in the house is very elegant looking.\nA view of a bathroom showing vanity, toilet and shower.\nA baseball player is hitting the ball on the plate.\nA bed in a purple bedroom with a wooden dresser topped with a mirror.\nSome very fancy looking cocktails with fruit and veggies.\nA round clock on a colorful tower near a harbor.\nA dog is wearing a Santa hat for a portrait.\nA yellow freight train is traveling on a track\nA man and boy sit in chairs and enjoy breakfast.\na close up of a street sign with trees in the background\nA young man with a skate board standing in a graffiti covered area.\nHome library area, bookshelf in background with several laptops, notebook PC and two VDUmonitor and keyboard for desktop in foreground.\nA man sitting by produce while another man points to it\nThere is a plane sitting at the airport.\na covered table with fish on a table\nA woman sitting in a bathtub wearing a bikini.\nA young child in a snow outfit and goggles with skis on in the snow.\nA bear climbs through some plants and onto some rocks.\nA sheep standing on a green grass covered pasture.\nA pile of books sitting on a table underneath a clock.\nFour men carrying a long board that narrows at the ends\nA stainless steel microwave with something in it\nA red fire hydrant on a city street.\nAn older photo of a woman on a tennis court posing with her raquet.\nJapanese food of meat and vegetable are on a plate.\nPeople at a table with food and wine.\nA young girl with a helment stands on a skateboard.\nthere is a red stop sign and a white truck 
behind it\nYoung girl on large grassy field attempting to fly kite.\nA group of people playing with a green disc in a grassy field.\nA cat laying down in a bathroom sink.\nTwo animals are standing on a mound of dirt.\nGroup of seagulls flying around a fishing boat.\nMan and woman walking over a bridge in the rain and high wind.\nA couple of vehicles that are sitting in the street.\nA train driving past a building pouring out black smoke.\nA birthday car with a picture of a black bird on it.\nA young man running along a beach next to the ocean.\ntwo little kids playing soccer battle over the ball\nA bride and groom cutting into their wedding cake together,\nTowels on a towel rack of a bathroom and a towel mat on the floor.\nA close up of two time expired parking meters\nA bedroom with a bed, nightstand, windows, and dresser with a television atop it.\nMAN STANDING IN GRASS WITH LOTS OF MOUNDS AROUND AND A FRISBEE COMING TOWARD HIM\nA runway that has a jet plane and a truck on it.\nZebra walking on road and other animals on grass.\nA light post with a no parking sign posted on it.\nA man in a tie and shorts standing outside of a house window\nA surfer stands on their board as another surfer watches.\nA dog laying next to a large brown teddy bear on a wooden floor.\nA pizza that is cooking in an oven.\nFirefighters gather around a badly burned moving truck.\nTwo adults and one dog standing on a snow covered road.\nA cat that is curled up on a laptop\na airplane that is parked on a runway\nA man wearing pink underwear is sitting on top of a stove door looking surprised.\nA tower with a clock on it's face stands in the sky.\nA table topped with broccoli, apples and other produce.\nA public restroom with focus on three urinals.\nA pile of tiny sandwiches without crusts sits beside a pile of crusts and various sandwich fillers.\nA train approaching a station where people are waiting to board.\nA bus sits next to a tree and sidewalk.\nThere is a pizza that is on the table 
in the room\nA man taking a swing at a tennis ball\nA group of people watching a boy skateboard.\nA man on a cellphone using a water hose.\nThe young woman is making a face at the horse.\nFour giraffes are in a grassy area with several trees.\nOne boat sailing next to one canoe in a body of water.\nGirls competitively playing Frisbee in a green field\nDoofy young man shares his umbrella with an Asian woman.\nhounds running in front of a horse must be a fox hunt\nA man flying a kite on the beach next to other people.\nA bathroom sink under a mirror and lights.\na train on a track above a body of water\nA person looks at the camera while holding a black cat.\na stack of suitcases out on the street\nBoats are moored near a city that borders a large body of water.\nKids playing tennis on a clay tennis court.\nanimals in a field of tall grain near a tree\nTennis player returning volley during match play on grass court.\nGiant dolls sitting in giant beds next to a man wearing an orange safety vest.\nA person covered up in warm clothing sitting on a bench, with two bags next to them.\nA bathroom sink designed as a bowl next to its reflection in a mirror.\nWoman walking on train platform as train filled with passengers prepares to leave.\nA commuter train going through a tourist area.\nA dog with a pink object in its mouth.\nsome zebras standing on a hill while eating some grass\na car with a cargo full of steer and symbols painted on the side.\nvery many teedy bear with their price label\nHamburger on a bun with ketchup and onion.\nA guy riding a surfboard on the water.\nA person holding a toothbrush to their face in a crowded room.\nA person on a skateboard riding next to a road.\nthere is a small bird that is standing on the branch\nA couple walking in the snow while under a purple umbrella.\nA male on a snowboard on a rail in the snow as five time-lapsed stills in single image.\nThe fire hydrant has been made into a fountain.\nA green candle and a vase on a table with 
one chair\nA giraffe is standing near a fallen log.\nA cat sitting underneath a wooden stool next to shoes.\na track moving on the road with two people\nA plate of pasta and bread sit next to a beer bottle on a table.\nA dog is sniffing a chew toy on the floor.\nA living room filled with furniture and an old fashioned TV.\nA lady in a blue life jacket skiing.\nA couple of large gray elephants standing next to each other.\nA bird sitting on snow covered ground next to a statue.\nA yellow hazard sign sitting on the side of a road.\nA grilled hotdog with mustard and relish is sitting on a white plate.\nA cat staring at a camera laying on a floor next to a shoe.\nA small kitten is playing with the tv set\nA large cabinet in a corner next a picture.\nA living room with an chair and large couch sits in front large bookshelves with computers on top.\nA road sign that says reduce speed for motorcycles.\nA man riding a skateboard over a stone block.\nThe man is holding a pink iced doughnut.\nA fire hydrant covered in leaves sitting in front of a tree.\nA black dog sleeping on a yellow and white striped comforter on a bed.\nA view of a kitchen with a  very elegant look to it.\nThree people riding skateboards down a hill next to grass.\nLocal fresh fruits and vegetables displayed for sale in a market\na man and woman stand in front of a cake\nA bathroom sink and shower separated by an open doorway.\nchopsticks holding broccoli and noodles in a white dish\nWomen are selling bagged and fresh bananas under a colorful umbrella on a street corner.\nA giraffe that is standing on all fours on a dirt surface, in a fenced in area.\nthere is a bird that is sitting on a branch\nA man sitting on a horse while rubbing him and kids are rubbing him also.\nA plate topped with rice, broccoli and meat.\nThere is a dog sleeping on a couch in a cluttered room.\nA baby holding a busted up umbrella whle sitting on the ground next to a pile of garbage.\na woman sitting on the back of a pink scooter 
in the road\nTwo sheep stand in a field with mountains in the background.\nA small room with a television screen monitor.\nA cute little girl eating a hotdog almost as big as she is\nA Wii remote and nunchuk that someone's hand is holding on to.\nA woman posing for a picture in a kitchen.\nA large pizza sitting on a counter next to a glass of beer.\nThere is a man smiling with a banana in front of his mouth\nA boy in a chair with a teddy bear dressed in a railroad outfit.\nA large group of people protesting outside in a parking lot on a sunny day.\ntwo people walking in an open field with a sky background\nA desk with two computers on top of it\na man is at a snow slope jumping with a snowboard\nA red and clear small glass filled with candy on a desk next to a green plant.\nA man and woman are preparing pizzas on a table.\nThe girl is running through the grass in a costume.\nA market with a variety of fruits and vegetables.\nA young person ridding the waves on a surf board.\na truck is parked with some rafts by the water\nA white cat sticking his head out through something.\nTwo women with loads of green bananas on dirt ground.\nA wooden swing hangs above plush, green, grass.\na female in a black jacket is riding a brown and white horse\nThe train is travelling down the tracks of the road.\nPeople are using a boat to travel through a flooded town.\nA plate of food with pasta, mashed potatoes and broccoli.\nTwo girls involved in some sort of a game.\na group of zebras standing around by a fence\nA large tall tower with a clock on top.\nA traffic light sitting on the side of a road.\npeople sitting in the grass with some of them chekcing on their cell phones\nA woman is about to hit a tennis ball.\nA man and his child holding onto their skis.\nA man in a shop that sells bottled liquid tapes up a paper bag\nA CARGO AIR PLANE IS PARKED ON THE RUNWAY\nA man looking back while standing in a market below a clock.\nA boy stands on artificial grass holding a Frisbee.\nA 
female professional tennis player dressed in white.\nA large white stove sitting against a wall in a kitchen.\nThe baby girl is sitting in a high chair playing with broccoli.\nThree pizzas with nontraditional toppings, a statue and a bottle of wine.\nThe young boy is sitting on the couch playing a video game.\nA subway sign at night beside Big Ben.\nA woman holds a Weiner in each hand.\nAn old photo of four men in a boat with a bicycle\ntwo people using clear umbrellas that have fringe on them\nA man standing in front of a kitchen counter using a laptop.\nA zebra chews a flower in a fenced in field.\nA painting of a horse drawn carriage traveling through the country.\nThis wall oven has just cooked a homemade pizza.\nCrowded market street filled with pedestrians holding umbrellas in the rain\nA beaver sitting on top of a tree stump.\nTwo skateboarders practicing their flips on a wooden ramp.\nTwo people in the water on surf boars on a wave.\nA glass plate holds crackers, cheese, and vegetables.\nA man pointing at something in front of a bus.\nA woman sits in the grass talking on her cell phone.\nA girl smiles at the camera while making candy.\nA cat standing in front of a television screen with a picture of a fish.\nA toy model of a kitchen that has a refigerator, stove, oven and baby play pin.\nTennis player prepares to play with racket in his hands.\nA man smiles as he sips wine in an outdoor restaurant.\nwoman wearing a black coat and boots sitting on wooden bench\nA man cooking on a grill with a fire.\nThree Zebras and a Giraffe in an enclosed area.\nThe eight lane street is packed with cars in traffic.\nA women serving a bundt cake with candles to a child.\nA person in black shirt walking on sidewalk with an umbrella.\nCommercial plane taking off from a runway with water in the background\nThere are plates of cheese, crackers, and sandwiches on a table.\nA group of men are waiting for their bags to be unloaded.\nA zebra eating grass by the barn gate.\nA person 
holds a cell phone inside a car.\nThree birds are looking around while on the ground.\nA puppy is laying on a blanket with toys.\na family sits down to eat at a lighted dining room table\nThis suitcase is full of CD's and apparently they are for sale.\nLarge flowers are sitting inside of the vase.\na zebra drinks out of stainless steel tub\nA guy and woman dressed up for Halloween.\nA man is playing a game of tennis.\nThere is a bathroom with green walls and a white sink and toilet\nA pair of shoes with a baby kitten inside one of the shoes.\nA person on a snowboard jumping in the air.\nA bunch of hot dogs in a bowl with beer being poured on them\nA giraffe surrounded by a group of zebra in the grass.\nA monitor and keyboard sitting on a desk.\ntwo cows eating the grass on a urban area\nSome trash sits at the side of the road at an intersection.\nA group of people watching a woman jump a horse over obstacles.\na steam engine train driven down a rural area\nA man looking out of an airport window at planes.\nEmpty benches in the park after a storm.\nA group of young children sitting next to each other.\nA traffic light is sitting next to a pole\nA very cute little cat standing on a desk.\nThere is a plate with one slice of bacon a half of orange and bread\nAn empty bus is parked on a street.\nA woman preparing a young boys lunch in front of him.\nThree well dressed people are standing and laughing together.\nA group of people looking at stuffed animals lined up in a street.\nThere is some trash is a kitchen sink.\na cupcake with a blue umbrella in it\nAn outside table and chairs with a pink lamp.\nPeople are standing in a field flying and watching kites.\nAn empty kitchen with wood-paneled cabinets and black appliances.\nA man is flying a large kite in a field.\nA man that is standing up holding a surfboard.\na girl is sitting on a horse outside\nA woman sitting in a chair while holding a purse.\nA giant inflatable shaped like a spiked ball placed on a field.\nA 
group of people carrying ski equipment while walking on snow covered ground.\nTennis players in action on a court with shadows.\na person holding a doughnut up to their mouth\nA marble table with plates of food and utensils.\nmany people are trying to avoid the sun by holding umbrellas\nAn Emurates airplane flying through the sky\nA bird that is standing on a keyboard.\nA white plate topped with two pieces of stead and a salad.\nA group of people watching something with one man looking off into the distance.\nA woman crossing a city street while carrying groceries.\nA sign post with signs that read \"Maciel Ln\" and \"Wonder Stump Rd\".\nA bike with a box on it's back wheel is parked\nA lady in a blue dress is posing for the camera in front of her plate of food.\na couple of men that are sitting at a table\na baseball player swinging a bat on the field\nA family watches two boys singing into microphones.\nGrey and white cat sleeping on a pillow and a sweater\nA man holding a tennis racket raising his arms up in the air while two women clap.\nA puppy rubbing its face on a pair of shoes.\nPeople that are sitting on the grass eating food.\nA woman walking down a street at night holding a red sheep umbrella.\nTwo men in  suits taking a picture together\na close up of many fruits on a table\nA couple of giraffes that are blocking  the path of the safari.\nA pair of giraffes stand under a canopy together.\nThere is a full view of an outdoor area and it is nice.\nA small dog standing next to a table with a white plate on top of it containing two chocolate donuts.\nA group of people are standing or sitting around a table taking pictures and looking at a phone.\nMotorbikes and other vehicles move along a one-way city street.\nThe little girl is petting the horse in the barn.\na blue and white plane is on a runway\nA cat laying on a desk with a book and laptop.\nlooking up to a clock on the side of a building\nA person standing on a white square playing a video game.\nTwo 
women on cellphones laughing with trees in the background.\nA baseball cap with sunglasses sitting on top of a baseball glove.\nWe are looking at a propeller plane flying in a cloudy sky.\nA group of sail boats on a small pond.\nA toilet with many buttons is sitting with the lid up.\nChild standing in front of a stop sign on a suburban street corner.\nA metal bicycle on the top of a wooden bookshelf.\nA lady is standing outside in front of a bus station.\nThe plate is piled with rice next to a whole apple.\nMan stands inside a building talking on his cellphone.\nA rather large heard of elephants, including a baby.\nTwo wine glasses and wine bottles sitting on a wooden table.\nA peep hole view of a a man biting a sandwich.\nThis tennis player is watching the ball after hitting it\nA young woman that is sitting on a couch with her leg resting on some pillows.\na yellow and white bird is closing its eyes\nA local bar has appetizers and tapas to enjoy the game in the background\nMultiple boats are docked on the water by a pier.\nA snowboarder lies face down in the snow.\nA tall stack of suitcases arranged largest on bottom to smallest on top.\nA child reading as his mother and dog look on.\nA man wearing a cowboy hat, riding a horse in a parade.\nA boy laying on a small wooden bench with and umbrella held up over him.\nA variety of luggage is stacked in a compartment.\nA group of zebras standing in tall grass\nA giraffe walking in the grass past trees\nA giraffe bending down to drink water from a pond.\na man helping a young girl walk on snow and ice in snow shoes\nA lamb that is around a group of people.\nA girl is posing by something that was just taken out of the box.\nA jet and a small unknown aircraft are flying in the sky.\nA spoon and a blender on a counter.\nA bowl of bananas being placed in the middle of a table.\nA large black cat sits on a desk near a laptop.\nNoodle bar near cookie on plate near glass of milk.\nA close up of a not so happy white kitty.\nTwo 
slices of pizza sitting on a ceramic plate beside a box of cheesesticks.\nthere is a small glass vase with white flowers in it\nA person skiing on a mountain, in the snow.\nWoman laying down across her personal bathroom with her feet hanging over the tub.\na cat looking out a window as one sits by the laptop and looks at the camera\nTrees in a park are in front of some parked buses.\nA man and woman traveling on the subway with surfboards.\nA girl smiles from the backseat of a car on the phone\nA young girl standing on a surfboard ride.\nTwo young ladies that are dancing in the room together.\nA womab looking at herself while brushing her teeth.\nthere is a man wearing a red and white uniform that is at bat\nA women who is sitting on a horse.\nThis kitchen has a black stove, stainless steel refrigerator and white cupboards.\nA person riding skis down a snow covered slope.\nA man standing next to another man wearing headphones.\nA clock is built high into the side of a tower.\nThere is a man talking on a cell phone.\nAn old man sells a variety of kite string spools.\nA little boy sitting on a suitcase on the floor.\nA black and white picture of an overturned truck in the middle of a street.\nA large billboard that has words in a foreign language.\nKids running in the grass after a soccer ball during a game.\nThe back end view of two zebras standing at a fence.\nA woman is on the tennis court holding a racquet.\na large pizza is in a black pan\nNo image is being shown on the page right now.\nHigh speed train stopped at a station underground.\nA boy performs a trick on a skateboard\nA batter holding a bat at the home plate.\nA herd of sheep standing on top of a lush green field.\nAn Indian man straddles a horse beside a stone building.\nThere's a sideways traffic light next to a building.\nA group of fluffy sheep in a big grassy field.\nA stir-fry wok is filled with cooking vegetables.\nA baseball player standing on home plate with a bat.\nA woman leading a horse 
inside of a building.\nA parked white, green and red double-decker bus.\nA kitchen with brown cabinets and plenty of space.\nBaked pizza with meats and vegetables displayed on table.\nSteam rises from a bowl of colorful food, while a glass of juice sits on the sill in the background.\npeople on a table at the beach eating\na black seat on a white toilet in a restroom\nSkateboarder jumping down concrete steps outside on his board.\nA clock tower is seen in front of a tall building.\nA colorful glass vase sitting on a table.\nA snow covered area with a car with it's brake lights on in the distance.\nA giraffe in front of the doorway of a building looks around the corner, casting a shadow on the building on a sunny day.\nA yellow motorcycle is parked on a road with many bystanders\nMany zebra and one wildebeest on a savanna\nMan placing a white container into an oven.\nA stuffed animal sitting in a pizza box with some slices of olive and cheese pizza.\nA blue and white vase filled with flowers.\nPeople are getting ready to fly kites in a park\nA crooked one way sign pointing into the ground\nA skier is seen riding down a hill.\ntop shot of a boy sitting on the floor eating pizza\nA kitchen with all the appliances such as a fridge, microwave and stove.\nThree riders race around a track on dirt bikes.\nTwo rows of bicycles parked side by side on a sidewalk in front of a building.\nA tennis player is on the court preparing to swing.\nA tennis player holding her racket in the air.\nA bench is on a deck overlooking the water\nthere are two brown bears that are playing together in the water\nA large clock below a flagpole with a flag on it.\na united jet liner loading passengers before take off\nVase with water holds a bunch of flowers in front of window\nA typical living room with all the furnishings.\na sandwich laying on paper and on a table\nA person takes photos of sheep laying down.\na white plate topped with a piece of chocolate covered cake.\nA family of elephants 
is walking along a dirt road.\na person riding a surf board on a wave\nA bathroom that is all off-white with a mat in front of the shower.\nA bathroom has a custom bathtub with no curtain\nA computer desk with an old pc and lots of clutter.\n3 adult elephants stand with a baby elephant behind a fence.\nTwo partial pizzas with cheese, olives, green peppers and tomatoes.\nA couple of zebra standing next to each other.\nA tennis player with both feet off the ground leaping for the ball\nA kid standing at a table eating some food.\nA zebra that is putting its head on top of another zebra.\nAn old man is getting ready to blow out some candles.\nSeveral buses are lining up on the street.\na small dog tied to a bench on a leash\na woman is sitting down talking on a cellphone\nA woman sitting at a table with a glass of wine.\nA girl in a white dress at the beach with two surfboards.\nA light brown bear sitting down near large logs.\nThe taco pizza have a lot of olives on it.\nA table with four bowls of food on the top.\nA man sits at an outdoor restaurant table eating a soup with chopsticks.\nA plated filled with a fish, potatoes and broccoli.\nA fluffy cat laying on top of a white laptop computer.\nPeople riding their bikes down the middle of the road.\nWoman walking with a horse near a standing man.\nA couple of black bears standing on top of a rock area.\nA horse in a stall with three people.\nThree very different giraffes at a big zoo\nA cat curled up in a shelter made of printer boxes.\nRED DOUBLE DECKER BUS WITH CARS IN BACKGROUND\nA steam powered train pulls out of a busy station.\nA cow is standing in a field in front of a building.\nAn old looking two level bus in a parking lot.\nA slice of cheesecake sits beside a fork on a plate.\na bunch of people watching as two people play video games\nA man on a scooter sits beside a stop sign.\nThere is a woman holding and playing with a baby.\nA boy is playing frisbee golf in a park field.\nWhite decorated porcelain vase 
in front of others.\nA woman at a product show holding a cell phone\na large kitchen that has a stove and a dishwasher\nA couple of cows walking submerged in some water.\nYoung, tagged calf looking through a barbwire fence.\nA little girl in a pink snow suit on her skis.\nA train is pulling into the station.\nA baseball game in progress with a full crowd.\nSheep stand outside of a wooden building on a snowy day.\nA makeshift tent is constructed at a camp site.\nA woman posing with a stuffed bear in uniform.\nA bus stop with a slightly damaged bench.\nThe young woman is sitting on the bed fixing her hair.\na close up of a young baseball with a glove\nA woman walking her bicycle with dog walking beside her.\na shower door a sink a mirror and an outlet\nsliced orange and a knife resting on the cutting board\nA large bus that is sitting in the road.\nPerson sitting on elephant walking in a muddy river.\nA mother bear following her cub across a meadow\na person sticking a knife into a cake on the table\nA man standing with his arms folded while smiling.\nA woman holds up an electronic cigarette underneath an umbrella.\nA model of a kitchen with a sink dishwasher stove refrigerator.\ntwo small birds on a bench with a blurry background\nA man on a snowboard caught in each phase of his trick.\nA man, a lady, and a youth together and enjoying a pizza.\nA person holds a skateboard and stands on the sidewalk.\na group of men holding a long surfboard on the beach\nA wooden street sign in a residential neighborhood.\nA skate boarder rises on the crest of a concrete wall\nTwo gentleman in formal suits, one of them is adjusting the collar of the other.\nA shower has a removable shower head and a glass door.\na woman standing at an outdoor display of assorted fruit\na close up of a dog on the ground\nAnimals grazing on a lush green hillside covered in grass.\na close up of a persons hand holding a large knife\nA pizza in a box in a drawer of a motel desk where a TV displays 
\"Inspired By A True Story\"\nA red and green bird on a perch eating.\nAn office area set up with multiple monitors\nSurf boarder finding waves in a river designed for surfing.\nA glass vase that has dried flowers in it.\nA man is standing in a river with a cow.\nA garden with vegetables planted in it\nBanana bunches hanging at an open air market.\nA large blue bus parked at a bus stop in a city.\nA pizza with an assortment of toppings such as lettuce and radicchio.\nA empty bench on a snow covered beach.\nThis man is standing in a kitchen eating food.\nA modern sink is on top of a bathroom counter top.\nLittle girl with black hat sitting on a pony with two girls beside.\nA group of people sitting around a dinner table.\na cat walking on a floor next to a contruction area\nFood is shown in a display case at a deli.\nyoung child is eating a powdered doughnut at the kitchen table\nA counter to an office area with an orchid in a flower pot next to a balloon sculpture of flowers.\nCars and a motorcycle waiting at an intersection.\nAn empty street and stop sign at night.\nA clay rendition of roses in a pot are displayed.\na person standing on grass holding a large box of pizza\nA zebra and foal are standing on the ground.\nThere are two boys preparing food on a table.\nAn old picture of a building and trucks outside the building.\nA gray cat standing on top of a black car.\nA plate of food and drink on a table.\nA young woman holding a green baseball bat on a field.\nA small backyard garden with freshly grown vegetables.\na plate filled with pepperoni and mushroom pizza\nA man is racing a black motorbike around a race track.\nMan in black and white outfit swinging at a tennis ball on a court.\nAn airplane flying in the blue sky with some clouds.\nA woman is approaching a tennis ball with her racket.\nA man flying through the air while riding a skateboard.\na short tree in front of a pink wall\nA group of people at a busy restaurant and a close up of a restaurant dish on 
a white plate.\nSome young children are looking at the black device.\na street corner in a town all bright from lights\nA man standing on a tennis court holding a racquet.\nThe hotel room is clean and ready for guests to use.\nA large bowl of food is sitting on the table.\nA girl is holding her phone and looking at it.\nA large body of water next to a shore filled with clutter.\nSeveral zebras are walking across a dirt covered area.\nA horses head handing over a iron fence.\nA close of on an entree containing meat and vegetables.\nA bathroom has two sinks and a bathtub in the middle of the room.\nA group of people on the green grass about to catch phrase.\nA tennis player gets ready ready to hit the ball.\nSeveral elephants are walking up a dirt hill.\nA wet window blurs the image of an apartment building beyond.\nA group of children playing ball in a field\nyoung man catching frisbee right arm under left knee.\na jetblue plane sits on the tarmac at an airport\nA picture of a street during the night.\nA stuffed animal that is next to some hot dogs.\nA motor bike parked on a city street.\nThe rhinoceros lays down next to the zebra in the safari.\nA person sitting at a picnic, eating some food.\nA man on a skateboard who is performing a trick.\nA polar bear walking along on icy ground\nA pizza with fresh mozzarella and basil on top\nsome one with a glove on holding a sparkling drink in the cold\nA plastic horse standing on top of a chair.\nA woman unpacks a picnic basket with her teddy bear.\nthere is a cat that is laying inside of a sink\nA banana and a yellow apple in a woven basket\nA bicycle parked next to a motorized sitting scooter.\nCat laying on top of someones arm while using the computer.\nA city train stopped at a boarding station.\nTwo purple teddy bears one with pink bows sitting in a shopping cart.\nAn open white toilet next to toilet paper in a bathroom\nA train parked inside of a train station next to a person.\nA golden colored Shar Pei dog and a dog 
of indescribable heritage sitting on dog bed.\nTeenage kids playing a Wii Game, while others watch.\nA teddy bear lying face down on a bed on a pillow.\nA pizza covered with vegetables is on a tray near plates.\nA dog sits on a couch with a book.\nA very large elephant standing near two younger ones.\na baseball player swinging a bat on a field\na bunch of cows lay down on some grass\nA child eating a sandwich with relish on it.\na stadium of people watching a tennis game\nA man is playing Wii in his office.\nA bedroom with a bed and small tables on each side\nTwo sheep standing in a field against the sky.\nA man is performing stunts on a skateboard in a parking lot.\nFans watching a baseball game on the field outside.\npeople standing around with some holding onto to what look like drums\nA snack truck in the street in front of a building.\nA group of men on top of horses playing a game of polo.\nA person cuts grass in a yard using a small pair of scissors.\nA red flat bed truck with a load of lumber on the back.\nPeople are watching a man cut a birthday cake.\nA notebook computer by a window with an image of the same window on the screen.\ntwo people on a tennis court at night\nthere are two pictures of a small black and brown dog\nA group of people skiing around a snow covered slope.\na man riding skateboard down the side of  a hand rail.\nA kitchen with an automatic dishwasher and window.\nA snowboarder is seen from below while jumping.\nThree giraffes resting under a shaded area at a camp.\nA white toilet sitting in the corner of a bathroom.\nHome plate at a professional baseball game, batter not quite ready.\nSome people sitting around at various tables, with a railing dividing them\nA pizza in a pizza box cut into eight slices.\nTwo men stand besides an elephant and gesture toward a crowd.\nAn old blue iMac with a sad Finder face wallpaper\nA woman holding a smart phone at a table.\nA polar bear in water puts his paw on a cage.\nA woman with a cup of coffee 
and a donut smiling.\na big train that is on a rail way\nA bird perched on a power line looking at a house\nA white and brown horse pokes his head out of a stall.\nA train moves along train tracks in a grassy landscape.\nA man holds and gestures toward a sandwich\na skate board being picked up off the ground by a person\nA person has a stuffed bear on their wrist.\nA puppy sitting on top of a sneaker.\na uellow and blue bus is driving down the street\nTwo zebras are on a grassy brown field.\nA young man sitting next to a young woman both of which are holding Nintendo Wii controllers.\nA blue and yellow train is sitting on some railroad tracks.\nA train bears the numbers 4790 painted on the side.\nA group of three men standing next to each other without shirts.\na couple of people are flying kites on the beach\nA plate with chips, salsa and a burger, on a table with a glass of beer.\nA palm tree is on one side with an evergreen tree on the other side and snow capped mountains are in the distance.\nA small herd of zebras walking past the camera man.\nA collection of teddy bears bearing Swiss flags\na man putting some cheese on top of his pizza slice\nChickens are feeding on the ground while horses hover above.\nBlack and white photograph of a man using a cell phone on the street\na modern flush toilet in a bathroom with tile.\nA photograph portrait of a male teen in coat and tie.\na motor bike is parked outside on a road\nA bus driving down a street with a bears face on it's front.\nMAN ON SKIS STANDING STILL POSING FOR A PICTURE\na dog poking  its head out of a car window reflected in another car's rear view mirror\nTwo elephants are chained to the outside shed.\nA toddler has a baseball and a mitt and going to throw the baseball.\nA wooden object placed next to a tree on the side of the road\nA lot of building on each side of the road, with a very curvy road in the middle.\nan image of a boy that is lying under the bed\nA large yellow double decker bus driving past 
a guy riding a bicycle.\na bathroom with a toilet a stool and a toilet bowl cleaner\nA surfer wearing flippers skims along a wave\nRed and yellow train cars hold gravel on a train track.\nA mattress top on a bed in a small space.\nA bare loaf of chocolate cake sitting on a counter.\nAn arched doorway leads to a furnished living area.\na close up of a person talking on a very old cell phone\na man is sitting down as a child pretends to be cutting his hair with fake scissors\nA group of tourists riding a tour boat down a river.\nseveral sheep graze on grass near a tree with a protector around it\nLARGE DIMLY LIT BATH ROOM WITH A DOMED CEILING\nA picture of the president standing at a podium\na close up of a person with a large sandwich\nA replica sculpture of a baseball player holding a bat ready to swing.\nA man stands in a screened in area with a cell phone to his ear.\nA batter prepares to swing at a pitch during a game.\nSmall bathroom with lights on above the sink.\nA person that is standing with objects in their hands at the beach.\nSome giraffes and ostriches in a grass field with trees.\nclose up of a thin crust pizza with tomatoes\nA giraffe's head is framed by the posts of its enclosure.\na purse a pair of shoes and a horse behind a display glass\na man sits at the table an leans over to blow out the two candles on a cake\nA man snowboarding near a frozen pond and a tree.\na cat laying on the couch next to a remote and a pillow\nA plate of food including broccoli, sweet potato, and pork.\nplat bread pizza with BBQ chicken on it\nA zebra standing on dirt area with fence and bushes in background.\nA fighter jet is flying through a clear sky.\nA young boy with his tennis racket in hand is waiting for the ball.\nA large teddybear float is on snow skis.\nA picture of a person sitting down under an umbrella.\nA collection of trunks are piled against a wall.\nA plateful of meat, fruit, vegetables and bagels\nTwo men and a woman are standing in an 
elevator.\nSeveral people going down a snowy street in skis.\nA man is taking a big bite of a folded pizza in a cafe.\nA man is playing catch with two children and a dog.\nA man standing next to a large brown horse.\nA towel hanging on the bar in the bathroom.\nA kitchen has black appliances, wood cabinets, and a large window.\nA woman stands in a room that has two small beds in it.\nA chocolate dessert slice sitting on a clear plate accompanied by a fork.\nHorse drawn carriages lined up in the street.\nA train parked at a train station next to a loading platform.\nA bike is shown hooked up to a rack.\nA man with no shirt on a skate board.\nElephant standing in an exhibit behind a fence with a park keeper.\nTwo Pug dogs setting on a green park bench wearing harnesses.\nTwo zebra standing on a lush green grass covered field.\nA happy sun is painted on the building behind the bench.\nA snowboarder jumps very high above the snow.\nThis is the side of an intersection with a red sign\nAn empty bench looking out over a bay with numerous boats on it.\na young child shows off his smile after brushing his teeth\nA woman riding a horse with lots of purple flowers.\nA train soaked street lined with lots of street lights.\nA close-up of two ducks swimming with fish.\nA computer desk containing a laptop and computer monitor with a printer located on the left side.\nA red train is traveling undergroud on the tracks.\nHe loves the thrill of snowboarding down the slope.\nA man and a dog standing on a dirt path in the woods.\nThis looks like a McDonald's in a Chinese or Japanese community.\nA picture of a plane that is in the air.\nTwo adults and two children sitting on a couch.\nA green light is shown on this busy multi-lane street.\nA group of girls celebrating  as they leave the field\nA couch and a television in a room.\nA couple of boats on the open water.\nA half full glass of red wine with food arranged on a table behind it.\nA red umbrella hangs from an ornate stair 
rail.\nThe orange and white fire hydrant sits on the edge of the street.\nA man sitting at a wooden desk using a laptop computer.\nA group of men sitting by tables working on laptops\nTwo people stand in front of a bunch of elephants\nTwo women sitting next to each other on a boat.\nA cat that is laying down on a bed.\nA young  baseball team sitting on benches together\nA bowl filled with fruit on top of a green table.\nA little girl standing in a forest holding a black umbrella.\nA model of a beach front scene showing the parking lot, beach and the sea.\nSeveral skiers ski by a direction sign and a fence.\ntwo men looking a little boy beside a table\nA bedroom with a lamp, bed, and dresser.\nA truck parked next to another truck near a building.\nTwo buses in a downtown area,, near a boat dock\na man that is surfing on some water\nA woman decorating a fancy cake in her kitchen\na table with a shake and some fruit on it\nSkiers at the base of a mountain, one is fixing bindings.\nA tennis player readies to swing as they await the ball.\nA man with a hat getting food from the refrigerator\nThe man is on his surfboard in the water riding it.\nA girl is pulling back a sling shot on her fingers\nA group of boats sit on the shore line.\na person using a laptop in front of a television\nA group of Zebras grazing in a field.\nThere is a toilet in a bus or plane stall\nA kid is touching an elephant's trunk near a fence.\nA photo looking down at a parking area with garbage and old vehicles.\nA crowd of people standing outside of a bus.\nA man is holding the waist of a woman as they both stand and smile together and look straight ahead.\nA herd of animals walking across  a grass covered field.\nA rectangular vase is displayed, surrounded by flowers.\nA bear sleeping in a tree, with the branches hiding its face.\nA meal with meats, salad and eggs on a plate, a cup with soup, and a dish with something in it.\nA display of a variety of fruits and greenery.\nTwo giraffes standing 
on bare ground in a zoo.\nA small bathroom with a patterned tile wall.\nA group of people in a small boat in the water.\nA group of people is sitting in the living area of a loft apartment.\nA horned cow  standing in a green grass field.\nThe interior of a bathroom with a toilet and soiled floor.\nA child standing in the snow with pine trees surrounding him.\nA box that contains a cooked pizza in it.\nA gray and white cat near a black goat outside of a barn.\nA person playing baseball with foot up in the air.\nA giraffe head sitting next to a branch.\nA nigh time elephant parade or show in a street\nA group of people on a court with a tennis racket.\na person skiing down a snowy slope\nA black and white image of some electronics near a pen and cup of coffee.\nA little dog balancing itself on a surfboard.\nThe plate has two sausages, noodles, and broccoli.\nA mother bird sitting with her baby birds.\nA woman is showing a white teddy bear to a man.\nA man and baby are holding their arms up while at a dinner table.\nTwo giraffe standing next to each other near a stone mountain.\nA man riding on the side of a wind sail.\na woman riding down part of a snowy hill with a snowboard\na small bathroom features a tub, large vanity and mirror.\nServing dishes of fruit and cheese sit on a table\ntwo large air planes on a run way\nA traffic light suspended over a snow covered road.\na cat is sleeping next to a laptop\nMan looking at cell phone while on another at a game.\na computer PC monitor and a keyboard and mouse\nthere is a man standing in the field with two cows\nCute teddy bears with flowers lying around together\nThere are people on an outside platform waiting for the approaching train.\nA kitchen with a stove, refrigerator and dishwasher.\nA cat laying in a wooden chair with a patterned cushion.\na woman on the tennis court playing tennis ball\nA man holding a white and yellow frisbee.\nA person feeding a kitten from a bottle.\nA young man is on his skateboard going 
down the road.\nthree baseball players and one is hitting the ball\nSheep on a grassy hillside overlooking a river.\nThe guy is skateboarding while walking his dog.\nA teddy bear with a red hat sitting on a bed with fluffy pillows.\nA stop sign has street signs crossed on top of it.\nthere are many people that are standing around this building\nA hot dog sitting on top of a plate with a salad.\nRed train giving tour crosses a beautiful bridge\nA couple of birds sitting on a tree, with a blurry background.\nBundt cake with icing sitting next to another decorated cake.\nA batter has just hit the baseball in this small-town baseball game.\nSandwiches\nDisplayed for sale at a shop by keeper\nA man in a suit carrying an umbrella walks across a tight rope while a woman in a gown waits for him on the other side.\nA person takes a picture of people holding different pink umbrellas.\nThe cat is looking inside of the open backpack.\nMen and women are playing a softball game.\nA man skateboards up the side of a wall.\nFour boxes that have pizza on them in a row.\nThe small bathroom has a metal toilet and railing.\nA colorful public restroom focused on the sinks.\nDecorative clock with three owls for the framing hangs on wall next to mirror\nA plane flying low over a snow capped plain\nA train is traveling down a road with buildings.\nA clock tower with the American flag on top.\nA ripe banana is sitting on a table with a cat key chain on top.\nHappy people sledding down a snowy slope together\nA wet rain soaked street surrounded by buildings and trees\nA woman presenting on a computer to a large group\nTwo cats resting comfortably on a double bed.\nA busy street has many cars parked on the side.\nA black cup with a spoon sticking out next to a folded pair of glasses.\nAn airport filled with jets next to a parking lot filled with cars.\nThe manager is having a conference with his pitcher and his catcher.\nHome base of baseball field with an umpire and catcher squatting down, 
and  a hitter bent legged, holding a bat against shoulder.\nA pizza clock mounted to the side of a wall between two windows.\nTwo kids using an electric toothbrush at Christmas time\nthree zebras in the foreground and wildebeests look around\nWoman holding a banana over her face in the guise of a smile.\nA computer monitor sitting on top of a desk.\nA living room filled with furniture and a window.\nA room filled with luggage sitting next to furniture.\nTwo men skiing on snow in the woods\nA toddler on counter top eating a banana near the electric stove\nView down a city walkway and street, with grass, pedestrians, trees, cars on street and parked on side of street, a bench, and some buildings in distance.\nSmall toilet with tiled wall and patterned flooring next to it.\nA stuffed toy is packed in a bag.\na counter with cleaning supplies ice cube trays and racks from a fridge and a drawer missing\nA man is seated at his computer desk and looks at the camera.\nTwo people on a beach throw a frisbee.\nSeating area with many benches outside a building.\nA man in a black cap is purchasing a bottle of Aquafina water at a grocery store.\ntwo buses and a streetcar on a busy street\nTwo large horses stand nose to nose in an open field.\nA modern kitchen with recessed lighting, appliances and an island with a marble countertop.\nA kitchen with a large stove and hanging pots.\nA warthog and a zebra running in a grassland.\na person doing a trick on a skateboard\nA bare bathroom with a sink and toilet.\nA woman trying to eat a donut tied on a string\nKitten laying in a brown loafer stretched out\nA group of planes near a large wall of windows.\nAn apple and orange resting on a table.\nA red pick up truck parked on a field next to another truck.\nSeveral commercial jets lined up at the gates at an airport.\nThe clocks are built onto two sides of the building.\na person that is standing on his head with a skateboard\nthere is a red bus that is parked outside\nThe cows are 
standing on the hay in a meadow.\nA marina filled with lots of small ships.\nA skate boarder takes flight on a high jump.\nA small dog carries a frisbee in its mouth\nA toilet with a full roll of paper and plunger.\na bath room with a toilet a sink and a mirror\nAn umbrella obscures a person sitting outside a store.\nTwo giraffes walking through a fenced in enclosure.\nTwo very large pizzas sitting on top of wooden cutting boards.\nA young girl squishing her body into  a suitcase\nan image of a woman sitting on the bench\nPeople sit around low tables eating pastries, drinking juice and coffee.\nBurgandy colored train coming around the tracks in wooded area\nA herd of sheep grazing in a grassy field\nA young girl brushes her teeth with an electric toothbrush.\nA half eaten bunt cake sits on a white plate.\nA group of people standing on the side of a ramp.\nA man is holding a skateboad and a pepsi.\na blue and white plate with a sandwich on a wooden table\nA fluffy cake is on a metal cooling rack.\nAn asian girl taking a photo cuddling with a teddy bear.\nA fire hydrant painted to look like a soldier.\nA couple of people with ties in a room.\na woman in a black top on a couch with a brown black and white dog\nA decorated garden with a sheep standing in it.\nA gold plated Chopper Motorcycle on display at a convention.\nA fishing troller boat docked next to a lighthouse.\nOn main street is the Wisconsin state fair presented by U.S. 
Cellular.\nA boy with his baseball mitt and ball.\nA street plaza with horse riders and onlookers.\nA baby is sitting on a potty chair.\nA gray cat is laying on top of a suitcase.\nA motorcycle parked in front of a brick building.\na person holding a paper sheep beside a busy subway car\nA bunch of bananas hanging on the tree\nA hotdog with relish in a basket with a receipt.\nA kitchen with wooden cabinets and a gas range.\nTwo birds are standing on a very tiny rock island.\nA Tennis player getting ready to hit the Tennis ball.\nA man is walking his poodle as the poodle stops to rest against a bench.\nA bedroom with the drapes open, and a television on.\nA plate holds french fries and a sandwich.\nA BLACK CAT LAYING ON A WOOD BENCH\na kid in pink is holding a stuffed animal\nA fire hydrant in a garden on a suburban street\nA parking meter on the side of a city street.\nThe man in the tuxedo is also bald headed.\nA man that is standing up with a cellphone.\nA white cat with a brown head sits in the window sill of a brick house.\nA small dog lies on a pillow near a toy banana.\nA white, black and green plane cake that is decorated.\nThis is a babes room that has a crib and a small couch and a dresser\nA bicycler is stopped at an intersection waiting to go.\nA woman pouring some wine into a glass.\na polar bear siting on rocks near a body of water\nA few men ride on top of elephants while they carry large pieces of wood.\nA giraffe showing his head to the camera from an enclosed area.\nA Zebra and a horse are together in the wild.\nPlums and bananas are in a glass bowl.\nTwo people sitting in chairs under an umbrella  in the water\nA black and white picture of people in a park, flying kites.\nA water hydrant on the sidewalk with plants nearby\nan image of the wilderness with a brook\nA dog lays in a bed and looks a little sad.\nAn adult elephant and baby elephant loving on one another.\nAfternoon at a dock with seagulls flying overhead.\nSeveral white chairs lay on a 
grassy field while cows mill about them.\nA yard and cars on the street covered with snow.\nA small red couch in a living room with a coffee table topped with a flat screen tv\nA passenger bus that is parked in a parking lot.\nA woman with her pants pulled down on the toilet.\nThere is a large window over the kitchen sink.\nA man riding a skateboard down a sidewalk.\nTwo giraffes standing in a wide open area.\nA person leaning back holding a tether while water skiing.\na tall tower with a clock on top with a sky background\nA little girl in fashionable rain wear is walking under an umbrella.\nA boy asleep in bed with his Christmas teddy bear\nA table with crusty bread and cheese platter on it\nSeveral horses standing on a hill while grazing.\nA skateboarder doing a kickflip in a skatepark.\nA couple of beds sitting next to each other.\nA giraffe is drinking water from a pond.\nSeveral people standing together with a red stoplight behind them.\nGiraffes are standing in an enclosure peering over a fence.\nA computer sits on a desk with a red chair in a bedroom,\nSome cute small kids sitting and playing a video game.\nA young man holding a doughnut sitting at a table.\nA group of men holding up a bunch of bananas.\nA man in a hat riding down the street on a skateboard\nSome cattle are walking on a dirt trail\nA man in a carnival outfit posing for a picture.\nA calico cat taking sun bath in a window.\nA woman holding a plastic utensil passing out a piece of cake.\nA group of five zebras walking in a grassy area next to a rhino.\nA motorcyclist stops on the road to allow a pedestrian to cross.\nRecyclable material in garbage bags are left outside.\nToilet with blue rug and blue rug cover saying please do not use. 
Sorry!\nA skateboarder plants his board at the end of a bowl.\nLittle boy swinging a plastic bat at a ball in yard.\nA man holding a racquet toward a tennis ball.\na dog going for a frisby with a house and vehicles in the background\ntwo men in womens pajamas playing on the wii game\na grizzly bear is standing in some grass and brush\nAn old city with canals filled with water.\nThe airplane is flying really close to the tower.\na close up of a toilet with a device over it\nA microwave mounted in a shelf with the microwave door open\na man with a tie and headphones sitting at a table\nA white toilet seat in some lavatory somewhere.\nA dog sitting on top of a bed under a window.\nA couple of people in a room with remotes.\nA cloudy day with two airplanes getting ready for take off.\nA few people are doing something at this point that is darting.\nA brown long horn cow standing on top of a field.\nthese are three giraffes on the grass outside\nA male tennis player jumping and swinging a tennis racket.\nAn orange cake with whipped cream frosting sits on a plate beside a book on the table.\nA black and white photo shows a man hanging out of a plane.\nthe double Decker bus is not in service\na couple of skiers on top of a snowy mountain\nA baby in a high chair at a table.\nA pizza oven with a baking pizza inside it.\nThree giraffes walk together across a field with trees behind them.\na close up of a bike at a train station\nThe woman is sitting at the table and eating pizza.\nMan with glasses and a mustache standing in front of a door.\nThis bathroom has a handrail in the shower.\nPEOPLE WAITING IN LINE TO GET FOOD FROM A FOOD TRUCK\nA hand holding up a cell phone that is taking a picture.\nSeveral motorcycles are parked outdoors facing each other.\nTHERE IS A TOILET IN THE CORNER OF THE ROOM\nA man and a woman are standing besides a parking meter on an urban and colorful city street.\nA man and a woman pose for a picture at a party\nA panda bear rolls around looking 
ridiculous.\nA cat has made itself comfortable on the chair.\nA bathroom that has different posters on the wall.\nA sign for a pizza place rests on the ground.\na couple of birds that are on a branch\na yellow and white concrete truck next to a bus\nA man doing tricks on a skateboard on the street.\nA large bus and some people on a road.\na close up of a cow near a wooden bench in a field\nan airplane is taking off from the runway\na large pizza is laying on a table\nA girl holding on to a large, white teddy bear.\nA skiier posing in front of a mountain range\nthis is a park with people flying kites\nA green passenger bus is boarding passengers near some water.\nmany people sitting at desks near one another\na stop sign with the red color looking all cracked\nA living area with couch, cabinet and many windows.\nA ham and chili sandwich is close up.\nA white bowl filled with vegetables on top of a wooden table.\na black and white god with blue frisbey\nCropped up carrots, onions, other vegetables on a on a cutting board\nIs this a Honey Dew donut or a bagel?\nAn airplane beneath a cloudy sky flying over a bridge.\na person cooking meat on a grill\nVery large TWA plane sitting on the runway with passengers milling about\na tall giraffe standing on top of a dirt field.\na close up of two people holding a video camera\nA large elephant walking towards a watering hole.\nA man is sitting down at a table, eating his stew and tortillas\nA tall clear glass with a very pretty flower in it.\nA man riding a wave on a surfboard.\nA trainer picks up his horse's lead rope.\nThese two cats are playing in a room that has a large TV and a laptop computer.\nan orange bathroom with a sink toilet and mirror\nA man is balancing on a skateboard while others ride and stand.\nan area with snow and lots of skiers and orange cones\nA man sitting at a table about to enjoy a healthy meal.\nA man that is sitting in a chair by a skateboard.\nA group of men and emergency responders surrounding a 
table.\nA man wearing blue jeans and a white shirt is on a skateboard in a skate park.\nA bird sitting on top of a log in a lake.\nA passenger plane sits on the tarmac awaiting passengers.\na bunch of bananas and apples for sale\nA small black dog sitting inside of a car.\nA passenger bus that is driving down the street.\nAn empty living room with a charred fire place.\nA person laying under the sheets watching television.\nA big bear sits on the ground and grabs on to a guy's leg\nA view of a airport with people towing luggage.\nA white refrigerator next to a counter with an orange box.\nA modern bathtub, with a water hose next to it.\na cat sitting by a person using a laptop.\nAn adolescent giraffe near the fence in its enclosure.\na close up of a person laying in bed next to a book\nA man riding a surfboard on top of a wave.\nA glass shower door near a sink counter.\nThe food is ready to be eaten on the table.\nA bathroom with a toilet, sink and red tile flooring.\nPeople at a ski lift, with people off to the side one leaning down in the snow.\nA flat bed is on the floor with blue blankets.\nA herd of zebras drinking from a watering hole.\nA young bear and a mother bear foraging for food.\nA man holding a Wii controller in his hand.\nMotorcycle parked on road waiting for train to pass.\nA brown table holding a vase and three flowers.\nA cat walking past a bicycle on a rock path.\nA large group of people are having a pizza party\nA hot dog covered in toppings on top of a container.\nRed and white flowers in a vase on a table\nMan directs two horses on an open field.\nA kid buying ice cream at a truck\nAn outdoor patio with chairs and tables made of wood.\nPeople are typing on their laptops in a room.\nSomeone who is cutting a cooked pizza with a pizza cutter.\nA little girl hits a tennis ball over a net while a man stands on the other side of the net.\nMan wearing a blue shirt and pink tie posing for a picture sitting by a window.\nWoman flying a kite on walkway 
next to water.\nA kitchen with lots of black counter top space.\nA man in an arena rides a bucking horse.\nA city view with buildings, bikers and walkers.\nA beautiful white horse pulling a green carriage.\nA small girl sitting at a table with several foods.\nA mix of broccoli and other items in a pan.\nA living room with a brown sofa, chair and coffee table.\na living room filled with furniture and a dog\nA white plate with a small piece of cake and a cup of coffee.\nA white toilet and hanging towels in a small bathroom.\nA stuffed teddy bear is sitting on the sidewalk next to a street.\nThe man is riding his horse on the land.\nPeople swimming in the ocean on a clear day.\nA clock tower with a blue sky in the background.\nA group of people in a wine cellar.\nAn elephant is standing in a grassy field.\nA train is coming down the tracks near a building.\nA bus driving on a rain covered street\nTwo laptops sit next to a tv on a tv stand.\nA desk with three computer screens and a desk chair.\nA vase with flower on top of a table\nA man in a vest is eating a banana.\nA small horse is standing in the grass next to a larger horse.\nA white vase with some cherry blossoms in it\nA vase full of roses on an office desk\nvery many trains  at the railway station to their directions\na old train that is on a train track\nTwo elephants stand face to face as if conversing.\nA man sitting on a park bench holding paper\nA cake on a plate next to some oranges.\nOne slice of simple cheese pizza on a paper plate.\nAn electronic device that is available for free.\nA large truck next to people on a scooter.\nTable top with two sharpie markers and pair of scissors.\na close up of a dog laying on a couch\nAn old building with clocks at the tower.\nLots of silver and black remotes sit stacked on top of each other.\nA woman is talking to a man and holding a plate with a piece of cake.\nSeveral plates with snacks and sandwiches in a display.\nthree people sitting on a bench holding plates 
of food\nTwo guys are playing with the wii together\na bird is standing in a patch of dirt\nA white bear is laying out on the rocks\nA large herd of animals drinking at the water.\nA bus is parked on the road next to a building.\nA baseball player slides into the base, as the opposing team waits for the ball.\nA small kitchen with microwave and fridge.\nA man standing in front of a shelf filled with supplies.\nThe man grins in a restaurant holding a glass of wine.\nA black and white photo of a person swinging his tennis racket towards the ball.\nA snow field outside of a ski resort.\nA man standing in front of a motorcycle on a driveway.\nA small yellow bird sits atop a hanging water supply.\na bathroom with a toilet a sink and a mirror\nA cup of liquid with a fancy design on top of it.\nThis is a photo of a building with a large clock in the front of it.\na man standing next to a big red truck\nA baseball player looks up and drops his bat.\nA man on a tennis court holding a racquet.\nThe lofted ceiling features two white ceiling fans.\nTwo people holding remotes in their hands standing near a couch.\nA computer monitor, keyboard, and tower with peripherals and plugs sit on a desk.\nThe bears are at the water, along with a seagull and another sea bird.\nA white bus that is sitting in front of a crosswalk.\nA white bowl with shrimp, broccoli and rice.\nStreet signs with trees and rocks in the background.\nA photograph of a tiny bird on top of a tree branch.\nThree couches are in a living room arrangement.\nA man wearing a hat, standing on a snowboard in the snow.\nA toilet near a wooden stool with a container on top.\nA sloppy joe being displayed on a plate.\nAn elephant statue painted black, blue, white, red, green and yellow\nTwo adorable dogs enjoy a nap on a bed together.\nA narrow room with various luggage and two men.\na person holding a carrot with a bike in the back ground\ngolden delicious apples, coffee beans, and blueberries are in the foreground of this 
photograph, in the midground is a banana, and in the background are varieties of cookies.\na baby zebra nursing from an adult zebra\nThe multi-colored cat is standing on the roof of a car.\nA dog sitting on the floor in a room.\nSmall children in red and blue uniforms, kicking a red soccer ball.\nA cat is sleeping on a wooden chair\nA man flying through the air while riding a snowboard.\nA couple of giraffe standing next to each other.\nA man in leather and a dog with a hat and sunglasses on a motorcycle with people walking around them.\nA giraffe is standing erect on a dirt path and grass and trees are in the background.\nMany cars are parked at the curb or are traveling down the street.\na small child in a black top a kite and some grass\nAn empty chair at a desk with a computer\nA antique style bedroom with hardwood floors and accessories.\nA man riding a wave on top of a surfboard.\nsome girls playing a softball game with some people watching them\nP.O.V. of laptop with people walking by on the path\nTwo horses standing on a grass covered hill.\nThe  man is driving the horse fast\nA bus parked at a bus stop letting passengers get on.\nTwo people are attempting to catch a Frisbee.\nLighted urban street at night with cars and buildings.\nPeople milling around a row of two story busses.\nA plate with a sandwich and a salad next to a pickle.\nA group of people flying kites in the national mall in Washington, D.C.\nA plate of food with carrots, green beans, brussel sprouts and sauce.\nAn old white boat sits in the port.\nA stoplight with street signs on it\nA cat eating food off of a wooden floor.\nTour bus parked on an empty street in a tropical city.\nA skier is carrying their skis and poles in the snow.\nA woman with yellow gloves on looking at herself in a mirror and covered in blood.\nA model set has boat in water going under a drawbridge\nA plate with two grilled hotdogs noodles, macaroni and cheese and corn.\nA blue dish filled with steamed carrots and 
broccoli.\nA person on a surfboard on the water.\na well made bed in a hotel room with a window\nA woman poses next to a statue of a giant piece of luggage.\na mushroom and broccoli stir fry on a bed of rice\nA desk topped with snacks and electronics with office supplies.\nA man in white shirt doing a trick on a skateboard.\nA woman in shorts waving to a teddy bear mascot.\nA woman sitting on a bench outside holding two donuts.\nA group of people in a kayak rowing together\nA woman that is standing holding a remote.\nA horse is on a brushy hillside on the gravel.\nA man cross country skiing in the snow under dark clouds.\nA dog standing in the grass as a frisbee flies pass him towards the bushes.\nthe men are in the middle of a tennis match\nTHREE MEN STANDING NEAR A PARKING METER, ONE OF THEM PUTTING IN THE MONEY\nA white dog running across a field with a frisbee in it's mouth.\na school bus having a colorful shirt sale\nSome very cute small boys at a table with food.\na woman staring and some do nuts in a plate\nA man on a cell phone sitting at a booth table with books.\nA toilet that was set outside and a small part of it was broken.\nA woman brushing the teeth of a baby in a bathroom.\nTwo giraffes in the savannah with buses in the background.\nA bus pulling into a bus stop in the city.\na gray fire hydrant with eyes and a girl with a backpack\nA woman in a white tennis dress playing a game of tennis.\nA cat is looking at two pigeons perched on a ledge.\nA group of people walk through the middle of the street.\nA man is using his large laptop in the living room.\nA young man skateboarding on the rim of a crater\nsome kind of cabinet that is in a building\na number of people in a field with many kites flying above\nA TV showing two men in hats and women.\nA sandy area with an elephant made from sand.\nA small red and white airplane sitting in an airfield with a wooded area and a mountain in the background.\nA person holding a hot pizza on a pan.\nA baseball game 
in progress with the batter at the end of the swing.\nA bathroom with a glass shower door, toilet and a rug.\na wooden desk with a computer keyboard on it\nA boat in still water at a harbor at dusk.\nA horse tied to a post next to a tree\nA man with a helmet that is sitting next to bananas.\nA plate with a sandwich and chips.\nA zebra grazing in some very brown grass.\nA plate is full of a vegetable medley with a spoon in it.\nMan in full red winter gear on skis in the middle of snow.\nTwo orange sheeted beds in small room with desks.\nA triangle sign with an English and foreign warning\nthree zeebras all walking together in a row.\nA man looking at the bed in this lamp-lit bedroom\nThere is a man walking through the snow\nA man preparing to hit a tennis ball on a court.\nThere are several skaters at a skate park skating around.\nA feathered bird is sitting on a tree branch.\nA white table topped with tubes of tooth paste and tooth brushes.\nA counter with carrots, onions, peppers and other assorted vegetables on it.\nMen with horse and buggies pose in front of a train.\nA zebra that is looking at the ground.\nA black cat is sitting near a mirror and a picture.\nA kitten on a laptop sitting on a desk.\nTwo mean are playing tennis and both are wearing sunglasses.\nA sheepdog at work herding some sheep in an enclosure.\nA bus that is sitting in the street.\nan image of a truck driving down a dirt road\na man sits on a bench next to a dog\nA plate wit some very tasty looking treats.\nA woman glides over the water while standing on a glider.\na person riding a skate board on a city street\nA kitchen with dim light in the evening.\na small boat on the ground tethered by a rope\nA person in a mirror in a very small rest room.\nA large jetliner flying over a forest in a  blue sky.\nA man and woman beside a red motorcycle.\nA group of people walking and cross country skiing in the snow in the middle of the city.\nA very pretty lady touching a cute bow tie.\nA bathroom with 
a toilet, sink, tub, towel rack and a window.\nTwo empty motor boats floating in the water.\nA women wearing a top hat who is riding a horse.\nA group of people at some sort of function\nA pizza sits on a plate on a table cloth.\nA couple of cows and some people near a bike.\nThere is a baby sleeping on the bed.\nA bleeding cut on a thumb near the nail.\nthis is a dog looking through a arear view mirror\na group of birds eating on some pizza\nA hand holding a donut with a grassy field in background.\nA room with a bed and some furniture.\nPeople are skateboarding by the ocean at sunset.\nA train drives past a station during the day.\nA small girl is smiling next to a large pizza.\nA closeup of this Giraffe shows his interesting head.\nA white plate topped with different types of cake.\nA living room with a large decorated Christmas tree.\nTwo hands that are holding a rose next to a tie.\nThree women sitting at a table with drinks.\na couple of boats sit parked on a beach\nA train stopped at a train station with passengers standing next to it.\nBottles and other items on a counter top.\nSeveral different kinds of julienned vegetables in a bowl.\nTwo men playing tennis with one man preparing to hit the ball.\nA yellow taxi cab that is parked illegaly parked in front of a fire hydrant.\nThere is a little toy hanging on the key chain.\nA teddy bear sits on top of a sandwich board with writing on it in front of a cafe with outdoor seating.\nA tasty looking slice of pizza with some toppings.\nA freeze-frame series of a baseball player making a pitch.\nA living room with a couch that has blankets on it.\na giraffe walking on a green field next to trees.\nA cat that is laying down on a couch.\nA table with a lamp next to apples sitting on top of each other.\nA lot of different size trees in the woods.\nMangos, strawberries, and other fruits being prepared in a kitchen.\nThere is a road filled with busy traffic including odd buses.\nA red and white, beige and pink moped 
parked on the street.\na man in motorcycle gear standing next to a motorcycle parked near a tent\nA pile of band aids and medical supplies.\nA small elephant standing underneath a wooden structure.\nA man asleep sitting up on a metal bench.\na couple of women pose for a picture\nA bird bath with three birds amongst some greenery.\nA man and a woman sitting on a bench with laptops.\nA man standing next to an older man near a plane.\nA zebra is standing in the middle of a field.\nA small boat sailing along in the open ocean\nA man is trying to ski down a small hill.\nA woman holding a cell phone is standing near a person on a bike.\nA city street filled with heavy traffic flow.\na drain on the floor next to a trash can\nA tan dog rests on a public bench in a city at night.\nA couple of giraffe standing in the middle of a forest.\nA dog sits in a car, looking out the window.\nA street sign in the city giving directions to several intercity areas.\nBlue bench in front of a large sandy beach.\nA herd of cattle standing on a lush green hillside.\nA large truck on the side of the road.\nYoung men playing with Frisbee in sports like competition.\nA computer, coffee cup, and books sitting on a table.\nA group of people on a grass field with kites.\nA plate of chicken, green beans and mashed potatoes.\nA giraffe and another animal are standing on the grass.\ntwo males are on some grass playing frisbee\nTwo large blue bird with red heads walk on a grassy area near a body of water.\nTwo brown bears are wrestling in the water.\ntwo cats laying on both ends of a bed\na table with fresh vegetables and some dressing\nA group of people waiting by a large clock.\nA picture of a street sign on the street.\nA keyboard and monitor are sitting on a desk.\nA herd of elephant gathered at the edge of the water\na bed room with a bed and a window\nThree men with skateboards standing above a ramp.\nA rooster and a hen are standing on a bed of hay.\nA damaged, leather suit case sitting on a 
dirty sidewalk.\nThere is a clean bathroom with a blue floor.\nThere is a veritable banquet of fresh fruits on the long buffet table.\nAn area in front of a building has fountains, trees, benches, and people.\na young female standing in front of a large cupcake sculpture\na beautiful horse and a lady is standing by it.\nA happy woman sits on the couch while holding a glass of wine\nA woman with a tennis racket on a tennis court.\nA fire hydrant that has been colored red, white, and blue.\nThe cutting board has apple slices on it.\nA group of zebras walking in a grassy savanna.\nTHERE IS A WOMAN THAT IS PLAYING WITH WII\nA zebra standing next to a parked car.\nThree different dishes of food on a wooden table.\nthere are a few ducks that are sitting in the river\nSandwich on a bun in a white plate with a blue rim.\nan old photo of an elephant near a body of water\nSeveral people watching a snowboarder grind on a rail at night.\nA woman standing outside with her umbrella open.\na black and gray pigeon some windows and a building\nA group rides horseback down the beach.\ntwo giraffe standing side by side next to a group of trees\nA fence separates three people seen inside the dugout.  
One young boy is in the batters box with his bat ready and another one is standing behind him and a man has his hand up to block the sun and is looking off to the distance.\nA public restroom that is kept clean for it's customers.\nSome elephants walk together through the grass.\nA person holding an opened umbrella walks down a wet street.\nThere are four cows in the field together.\nOne man playing the wii by his lonesome.\nBaskets of oranges are lined up on a table at a market.\nSeveral woman are at a table while one of them slices a cake.\nThe kitchen needs to be cleaned before we can use it.\nA wine glass sitting on top of a glass sculpture.\na close up of a stop sign and a street sign\nSomeone placed bananas, strawberries and oranges in a blender to make a smoothie\nWoman walking a small white dog behind her.\nTwo riders on horseback cross a desert landscape.\nA person is riding down a hill in the snow on skis.\nParty of people in canoes going down a river while site seeing.\nA man on a snowboard is performing a trick.\nA corner of a bathroom showing the sink, medicine cabinet and small window.\nPicture of what might be a TV remote control and a distorted picture in the front.\nThis double-decker bus is headed for White City.\nA kitchen with green walls, white trim, and a refrigerator.\nA Captain Jack Sparrow look alike plays tennis with school children.\nA person is carrying some luggage near the train.\nA pita and some fries sit on a large, white plate.\nA giraffe is stretching its neck above trees.\nA small dog sleeps in a basket on a computer desk.\nA kitchen filled with furniture and a painting on a wall.\nA street sign warning cars to keep clear of a driveway.\nA bathroom with gray walls has a fan in it.\nA man and a woman are throwing a Frisbee.\na person eating a hot dog, with a basket of toppings.\nA street pole with multiple street signs pointing in different directions.\nA Vietnamese woman stands on a boat laden with produce.\nA display of ceramic 
items on a street.\nA group of people wait in a red reception area.\nA dog tied to a sign next to a man on a bike.\nA white cat and a wooden bench by a building.\nA man sitting down with a brown teddy bear on his shoulders.\nwhite plate with a variety of vegtables on a scure\ntwo kids are eating pizza at a small red table\nA man is smoking next to outdoor tables.\nA garbage truck is going down a well lit street with buildings all around.\nA beautiful colorful angel portrait in on the back of a vehicle outside with nobody around.\nA line of buses stopped at a crosswalk as someone crosses the street.\nA rusted-out farm truck in a mountain field beside trees.\nA women holding a wooden board that has a desert on it.\nTHERE IS A BATHROOM WITH A SINK AND A MIRROR\na baseball bat and a brown glove on some grass\nA bench next two a table holding several pamphlets.\nTwo cats are laying on a bed in a bit of sunlight.\na man leaned over a toilet inside a bathroom\nsome skies are on a stand outside in the snow\nA menu board at a fast food restaurant.\nA bench sitting on top of a sandy beach next to the ocean.\nA kitchen with wood cabinets, stainless steel oven, stainless steel microwave and a refrigerator and a hole where the cooktop goes.\nA goalie is guarding his end at a soccer game.\nA clock on a boardwalk near a beach.\nThere is no image here to provide a caption for.\nA black dog running on the sand with waves in the background.\nThere are several stuffed animals standing near a brick building.\nA laptop computer sitting on a table with a glass of beer.\nA table topped with lots of different foods and sandwiches.\nA group of elephants sitting in the middle of the forest.\nThe young man is playing a game of Frisbee toss.\nTwo vases and a potted plant sit atop a worn dresser.\nA wire fence divides a background of backyards and houses from a yard with a child kicking a large ball.\nThree friends are getting ready to ski on a warm, sunny day.\ndifferent colored ribbons a 
basket and a pair of scissors\nAn adult and baby giraffe in an enclosure.\nA kite blows in the wind above a large sandy beach.\nA television monitor mounted on the ceiling of a plane or bus.\nA boy stands in the grass with his mitt open.\na man with a cell phone attached to his hat sitting on a bus\nA kettle sits on a kitchen stove beneath a shelf storing a blender, canisters and other items.\nA plate with a couple of scones and a kettle that may be tea.\na bunch of kites fly through the air\nA pizza sitting on top of a box on top of a table.\nPark benches are lined up in a room in the grass.\nLittle girl holding a ball over a red and white fire hydrant\nA blond toddler in a pink shirt brushing her teeth.\nA swan is floating down the river by the boat.\nA woman plays with a Nintendo Wii in her living room.\nContemporary living room setting in urban residential building.\na toilet and a sink that is in the bathroom\nA pair of individual using sails to surf.\nA man standing in the grass with hid dog.\nA man with a pizza playing on the computer\nA man holding a ball on a mound of a baseball field.\nMan in black vest with orange tie looking at the camera.\nA person walking with a kite in the air.\na number of people standing near by parked motorcycles\na couple of people climbing a hill of snow\nThe outside of a house that has a clock in front of it.\nA slice  of cheesecake with a red sauce and berries for topping.\nAn empty bed with a teddy bear laying on it.\nA bear is seen walking in a forest in a blurry photo.\nChildren sitting down to eat lunch at school.\nSeveral skiers wearing colorful attire ski slowly across a snowy mountain\nA few cows graze in a big wide open prarie\nA man standing next to a baseball player laying on the ground.\nA white dish filled with vegetables on a white table.\na bathroom that has a sink in it\nA pizza slice,with tomato on it and cheese\nAn open air market with a lot of fruits in a bowl\nA man in a wet suit riding a wave on a 
surfboard.\nA train is on going down the track while people watch.\nA woman smiles as she holds a parasol.\nA pole pointing where different things are at.\nA dog on a leash is standing in a grassy field.\nA large tall tower with a clock on top.\nSeveral people are sitting around a table with food on it.\nThere is nothing but beer bottles in the fridge.\nA young man with ear phones holding something.\nA black and white photo of a martin Luther King next to a Lincoln statue.\nA dog stretched out on the grass with his tongue hanging out.\nA train sits parked on the tracks in front of a billboard.\nA young blonde girl holding up 2 cell phones.\na baby giraffe has its head under its mother\nThe animal has very large horns on its head.\nA shelf with hygiene products in a bathroom.\nA clock mounted to the side of a brick building.\nA woman dressed in black is playing tennis on the court.\nA man eating a hot dog at a sporting event.\nthere is a woman holding a little girl taking a picture\nThe pay station for a parking lot is in a location that has recently had snow.\nthis bathroom is big but has a small tub a toilet and a sink\na white and red tow truck some trees and a building\nA large group of girls enjoying a pizza party with pizza and soda.\na man that is riding on a bike\nMale tennis player rushing hard to hit a ball.\nA half eaten piece of pie is all that is left.\nA man sitting in front of a table with a box of cupcakes.\nthe boat is sitting on the shore line away from the water\nA couple of airplanes flying through a blue sky.\nthree large dogs sitting outside near a forested area\nThe young soccer player is kicking the ball.\nA car and a motorcycle parked side by side.\nan image of a man on top of a snow mountain\nA batter waiting for a pitch with the umpire and catcher behind him.\nA zebra stands in snow in front of a wall.\nA stuffed monkey on a computer desk with two computers.\nA table full of asian styled dishes and soup\nA two stories bus is parked on the 
side of the street.\nA woman riding a bike near a bus and other people.\na man performs a trick on a skate board\nA group of people on motorcycles at an intersection.\nA woman that is standing next to a dog.\nA dog is in a yard with a Frisbee in its mouth.\nA group on people standing on steps and posing for a picture.\na couple of people that are sitting on the porch\nRockers with crazy hair holding out a racket and smoking a cigarette.\nA woman teaching a little girl how to ski.\nThe car us upside down on the road way.\nA child washes grapes in a stainless steel sink.\nA man dressed in a suit is eating carrots.\nA stuffed elephant standing in a museum window.\nTwo keyboards and a computer mouse on a table.\nA crosswalk signal at an intersection with a car and a bus.\nAn elephant walking across a dirt road in front of cars.\na small bed with blue comforter and sheets\nAn asian market has hanging bananas by the roof.\nThese people are posing in front of the trees.\nThere is a bathroom and a shower in a bathroom.\nA giraffe standing next to some dead brush near a bird.\nA pile of locks of hair next to a pair of scissors.\na green plant  is in a glass vase\nA man that is holding up a camera.\nA man is eatinga beignet covered in powdered sugar.\nTwo pieces of bruit are set bside a keyboard.\na bathroom that has a toilet and some nasty stuff all over\nShower with a removable shower head and a soap dispenser.\nA man in a wetsuit is holding his surfboard on the pier.\nA herd of sheep in a grass field.\nMountains with steam coming from them with horses on the lowland.\nTwo pizzas are on a pile of white plates.\nA toilet in a bathroom with large signs on the wall.\nThe horses are grazing in the grass along with another animal.\nThis is a woman getting on a motorcycle posing for a picture.\nA man prepares to swing while playing a video game.\nA young blonde boy leaning on a toilet.\nTwo glasses of wine, two hot dogs and some tater tots.\na close up of a plate of food with 
salad on a table\nThe silver cover of an Apple laptop computer\na woman is holding a baby in her arms\nTwo people are walking down a snowy path with an umbrella.\nAn airplane is descending in the air to land.\nA very drab looking room with a mattress on the floor.\nA bin is piled high with many apples.\nA young child is standing in the grass with a frisbee\npeople are sitting on a bench together outside\nThere is a burger in between two glazed donuts\na man surfing on an ocean wave headed to a beach\nThe head of a Giraffe with its mouth on a tree branch.\na little girl is holding a video game controller\nA bicycle is chained up and locked to a sign post on the sidewalk.\nA pile of Chinese noodles with broccoli mixed in.\nA black and white photo of people flying kites\nA group of people next to a person on a surfboard.\nCloseup of a hand holding a Wii controller.\nA glass vase with flowers in it next to a pair of computer speakers.\nA store window with the reflection of a parking lot with a stop sign.\nTwo young men playing a game involving a disc on grass.\nA large yellow train on a steel track.\nA person is in the distance while a brown dog is in midair and is running after a frisbee.\nA giraffe looking down while in a zoo pen.\nA group of people are standing on the beach flying a large kite.\nA slice of slightly eaten homemade pizza on a plate.\nStacked kites with long streamers being flown in grassy field.\nA tray filled with plates and dishes full of food.\na plate with bunch of diffent foods mixed together\nThe dog and cat is laying in the bed with the man.\nFamily sitting at the table together enjoying dinner\na little girl sitting on a small kids toilet\nA clock is affixed to the wall of a religious institution.\nA large clock that is on the top half of a building.\nA bathroom has blue guard rails by the toilet.\na man in a blue jacket and helmet on a black horse\nA man sitting in a chair sitting inside of a living room.\na snow skier wearing black shorts 
and a blue jacket\nThree park benches at the edge of the water\na couple of baseball player high five\nA large dirty airplane is sitting in a dirt field.\nA fancy steak sandwich served with fries and dipping sauce.\nA beautiful living room view with a vase sitting on the table.\na bathroom with red walls and a tiled floor\nA cow standing in the alley near a building.\nA girl on skis is grabbing a man's head for support while several people watch.\nA couple of giraffe standing on top of a green forest.\nChildren are in the living room playing a video game\na black computer keyboard on a wooden desk\nA very narrow bathroom with a walk in style shower.\nA light airplane flying in a cloudy sky\nAn older man watches a kite fly from across a body of water.\nA desktop computer sits on an old and scarred wooden desk.\na couple of men that are standing up\nA person that is kneeling in the sand near a bike.\na bunch of people that are skiing around in the snow\nA person is sitting in a chair and a bird is on the ground.\nAn old fire hydrant on the edge of a city street\nThe painting shows a naked woman using her laptop.\nSeveral people walking across the street in the rain with umbrellas.\nA clock tower sitting in the lobby of an airport lobby.\nBar stools at a bar separating a dining area from a kitchen.\nA guy sitting at a dining table with some tasty looking food.\nA kitchen with white cabinets and a cool tile design.\nPictures of a bathroom taken at different angles.\na person standing under a colorful umbrella and wearing a big hat and sunglasses\nA boy on a skateboard in a skate park performing a trick.\nCafe tables with table cloths and orange umbrellas over them.\nA dog standing with his head outside a caged area.\nThe interior of a kitchen with wood floors and large appliances.\nA broken black umbrella laying in the street.\nA black and white cat beside a wood carving.\nA small group of zebras is standing beside a water hole.\nA big, fat bird has some crazy hair\nA 
woman watches a dog watching a man eating a sandwich.\nA person wearing an orange back pack standing in front of park benches.\nA king size bed in a hotel room.\npeople walking around with a bus and car on the street behind them\na group of people with umbrellas walk on a side walk\nA small bedroom with a bed and a desk\na stove on the front lawn near a side walk\nTwo elephants in a herd playing with each other.\nA tour bus stopped in traffic on a busy street\nA large pizza sits on a large white plate.\na living room with two big couches and green chairs\na teddy bear sitting on wooden steps leaning on a pole\nthere is a small pizza and broccoli on a plate\nA plane sitting on the tarmac at an airport\nTwo adorable chubby dogs sleeping next to each other.\nA woman in shorts giving a thumbs down signal\nA Siamese cat staring at a laptop computer screen.\na big plane flying through the blue sky\na clock tower near many buildings wit ha sky background\nA cat lounges on the arm of a sofa near a window.\nThree people with a video game remote in their hands.\nA vase with a white long stem flower in it.\nA parking meter with no time left in front of it.\nA four faced clock a top a stone column in  a parking area.\nA group of people standing outside of a blue ice cream trucks.\nA cluttered desk with books, bag  and electronics\nthis is a baby and a blue chair\nA bathroom sink with travel size soap and shampoo.\nThe skier with the animal cap is standing on the mountain.\nA baby observing a calf eating hay outside.\na little kid holding a toothbrush standing in a doorway for a bedroom\nA red parrot eating a piece of fruit from the palm of a hand\nA sports announcer talking on a cell phone while on a ball field\nA very small bathroom stall with a toilet and several rolls of toilet paper.\nSeveral sheep standing around in the grass.\nPeople standing around talking and doing different things\nthree people dressed similarly playing frisbee on a tiled floor\nA citrus fruit sliced 
in half on a plate.\nTwo elephants are walking through the tall grass.\nthe person is standing next to the animals in the water\nA woman looking at a tablet while standing outside a train car.\nA cat with a irritated look sitting on a bed.\na man sitting on a green bench in a park\nA wooden cutting board next to a window topped with fruit.\nA cat is sitting behind the keyboard of a cluttered computer desk.\nA red double decker bus is seen in London.\nA fire hydrant is painted red, white and blue and sits on a sidewalk in front of a brick wall that shows graffiti.\nA pack of elephants are trampling in the sand.\nA person kneels as they ride a wave.\nA group of elephants walking across a large river.\nA little girl tossing a red Frisbee in a driveway.\nA group of zebras grazing in their enclosure\nA white plate with a hot dog topped with mac and cheese.\nA giant sheep with a lot of fur eats outside\nA black and white picture shows a tree covered hill.\nA couple of elephants washing a baby elephant in a river.\nA young man that is wearing a nice suite coat with a skirt and a purse.\nA big elephant playing in a puddle of water\nFour People riding two elephants across the water.\na person riding a snow board on a snowy surface\nA red train traveling down train tracks through a rural countryside.\nA toilet, sink, mirror, and tub in a bathroom.\nA truck made to look like a train parked on the side of the road.\nmany different vegetables are sitting on a white counter\na woman walking outside with oranges on a stick\nAirliner being moved by tow vehicle near airport terminal.\na river that has a bridge with a train on it\nA cat rubbing its head on a laptop.\nA plate of toast and other breakfast items.\nA light blue sky filled with colorful kites.\nA bear reaching up towards a tree on a rocky hillside.\na bunch of guys in front of a table with cake frosting on their faces.\nA truck waiting in front of the warehouse.\nTwo lamps by a window looking out at a forest.\nA bathroom 
with a toilet and a sink.\nA baby sitting in the grass watching kites fly in the sky.\nSome motorcycles are parked on a brick area\nThe fire hydrant is painted all completely yellow.\nA table with a bowl of food and some mugs.\nThe layer cake is on the flowered plate along with a fork.\nA woman with good posture sits at a wooden desk with an open laptop.\na horse drawn carriage on a city street\nA little boy is smiling at the camera in front of a brown chair.\nThe antique furniture and mirrors are next to the wall.\nA woman running through a city while carrying a Frisbee.\nA boy in camo shorts stands before an overturned skateboard.\na day of the dead offering with fruit\nSome people are hanging out and playing the nintendo Wii.\nA car in flood waters in front of a camping area with camping trailers that is flooded.\nA very cute curly haired dog with a toy.\nA man and woman sitting on a vintage motorcycle.\nA street at night time with many different lights.\na person at the zoo feeding a giraffe\nSomeone holding a sandwich like food object with a few bites taken out of it.\nA cross country skier walking in snow during the day.\nA blue two layer cake sitting on top of a counter.\nTwo teddy bears one dressed as a female and one male\nA black stuffed animal sitting on top of a toilet in a bathroom stall with blue floor tile.\nA woman leaping into the air while holding a tennis racquet.\nA carrot is being sliced as well as an onion\nSeveral dogs on a yellow school bus with a stop sign below the window.\nA woman sitting down with a large cell phone holder on her pants.\nA seated angel figure next to a clock dial.\nTwo beds sitting next to each other in a bedroom.\na nun rides around on a motor cycle around on the street\nA horse standing around in teh middle of a farm.\nA family of four sitting on an outdoor sofa\nFine food served with sauce on a white plate\nA kitchen with a microwave oven next to a stove top oven.\nA small-furry dog on a red seat in a living room.\nA 
wet woman with two horses wading through a river\nSome elephants that are together in an enclosure.\nA couple of people with many bikes on a street.\nA couple of men standing on a tennis court holding racquets.\nTwo halves of a sandwich sit on a white plate on a table.\nA peeled orange sitting on a white table next to the peelings.\nA man in an orange outfit is directing traffic to drive slowly.\nA woman in tennis attire swinging a tennis racket.\nAn elephant standing eating hay in an enclosure.\nA blue and white plate with a chocolate dessert on the plate and powdered sugar on top.\nWoman holding red cased cellular phone in room.\nThe man is showing the mess in the fridge to the ladies.\ntwo tennis players with rackets and balls on a court\nPicture of arctitecture probably a church or university.\na dog on a skateboard in a shirt\nA group of giraffes and zebras in an  enclosure\nThere are boxes which haven't been unpacked but the television is already up on the wall.\nA fire hydrant painted in the American patriotic colors\nA green, grassy field with grazing animals on it.\nA mid sized commercial airline flying in the air\nA man taking a bite out of a doughnut.\nPeople are standing in the grass playing with a frisbee.\na man about to throw a green frisby\nA woman with her head out of the photo is standing barefoot in a simple dress holding a suitcase.\nA cat is on a table with stuffed animals.\nA highway with several cars on a cloudy day.\nA streetsign with one side pointing to Maciel and the other pointing towards Wonderstump.\nA room that has two people sleeping in a bed together and another bed on the other side of the room and a person at a desk and computer.\nA polar bear in a polar bear enclosure at a zoo looking up.\nA man sitting on a couch and a man on a chair.\nA woman is hanging up post it notes in a kitchen.\nThree giraffes under the shade of the trees.\nA woman sitting on a yellow surf board on the beach.\nA boy is running while holding on to a 
kite.\nA meal is being prepared on the stove in a kitchen\nA room of bookshelves with books, suitcase, area rug and tv\na small copper vase with some flowers in it\nThey are holding a frisbee together while hugging each other.\na little girl that is outside with a umbrella\nA toilet in a white bathroom is seen in this image.\nA man dressed in a suit and tie posing for a photo.\nA man actively plays wii in front of a television screen.\nA bear looks around in a rocky enclosure.\nA commuter train passing through a small town\nA couple of computer monitors on top of a desk.\nA four sided clock on a raised pole.\nan image of a slice of pizza on a white plate\nA low to the ground stop sign on the corner of a suburban street\na kid sitting down eating a slice of pizza\na body of water with buildings near by\nthere are many people that are sitting on this bus\na person sitting on a toilet while operating a computer\nYoung woman with long brown hair in very dark grey jump top holding electronic instrument like a remote control.\na woman with glasses is eating a hot dog\nTwo small teddy bears sit by the vase with flowers\nA man pitching a baseball on a baseball field.\nA cow with a tag in its ear looking observantly.\na toddler playing the piano with a stuffed animal\nA meal is being displayed in a tray with separate compartments.\nA toddler is brushing her teeth in a bathroom.\nthis is a close up picture go two broccolis\nA collection of differently colored trucks in a field.\na desk with a computer  a laptop and monitor\na man stands on a beach with a bunch of surf boards\nSeveral bunches of carrots on a cutting board next to a squash on a counter.\nAltered photograph or painting of a necktie creatively knotted\nA man with a bucket hat riding a hose on a beach.\na little boy batting a ball while his family looks on.\nA group of bikers passing through a crosswalk.\nA monkey holding a strawberry and a banana.\nA boat that is floating in the water.\nA MAN IS ON HIS SKATE 
BOARD ON THE STREET\nA couple of men standing near a sailboat.\nThe cat is sitting on the ground near the bench.\nMotor and photon boats moored in the water.\nThe bus is parked next to the curb.\na room filled with white furniture and books on the ground.\nHe is writing to his destination on a skateboard.\nA green bus of some sort moving along a road.\nA elephant walking the edge of its raised enclosure at a zoo.\na cow that is laying down on some hay\nA photo looking out of the side of a plane at another commercial plane.\nA large cut pizza on a wooden surface.\nA street sign that says Pee Wee Reese Street.\nA baby zebra rubbing up against it's mother while she eats grass.\nA bus leads traffic down a city street.\nA picture of a person standing by a bicycle.\nSeveral colorful foods are sitting on a large plate.\nA close shot of a green bathtub and a toilet.\nTwo male tennis players posing at center red clay court.\nAn egg-topped hamburger and arugula salad with broccoli\nA fish tank is inside a underwater themed bed room.\nA bred and silver plane resting on stands outside.\nA young man standing in forest filled with trees.\na home made breakfast that looks super awful\nRed and white bus parked next to a glass building.\nA large polar bear walking near some rocks.\na woman is jumping up in the air by another girl\nMan in blue pants and white short on a stage\nOld model Harley Davidson motorcycle and old cars parked.\nA bird swimming in wavy water, with a island in the background.\nA zebra is trying to stand in the shade.\nA woman cutting cake while another woman is holding a plate.\nAn orderly bathroom with two sinks and a large mirror.\nA bath tub sitting in a kitchen next to a brick floor.\nA man in an apron arranging a stack of oranges.\nA group of people riding an elephant through the jungle.\nA person climbing up a snow covered mountain.\na woman sits on a chair with a laptop on her lap\nTwo guys sitting on a couch conversing while another guy looks at his 
camera.\nA distorted black and white picture with clocks.\nA girl holding a racket and touching her head\nA man does a skateboard trick up a ramp\nTwo people sitting at a table with laptops in a bookstore.\nA foreign candy sitting next to it's open wrapper.\nA stop sign has been tagged to include the hammer time song.\na person walking beside a boat sitting next to a fence\na train going down the track all by itself\na traffic sign two people walking and a van in front of a large building\nSeveral women working with some type of production equipment\nBusiness people having a discussion during a luncheon\nA giraffe caged in while grass falling from his mouth.\nwomen riding on the backs of elephants at the circus\nTwo snow patrol people at the bottom of a snow hill.\nA large bear standing in front of a bunch of leaf filled trees.\nTHERE ARE A BUNCH OF SHEEPS THAT ARE ON THE GRASS\nA dessert is sitting on a plate by a teapot.\nA snowboard sticking out of snow covered ground.\nthree giraffes walking outside near a wood gate\nA cow rests in a pen with a turkey, chicken, and duck.\nA family of people hanging out on a beach.\nSeveral pedestrians crossing an intersection at a bridge.\nA little boy that has birds on his arms.\nA large jetliner flying through a sky filled with clouds.\nTwo cats eating out of one food bowl.\nA blurry man standing next to another man laying on a bench.\nA single file row of dark colored luggage backs.\nA man is laying on the couch with a large cat.\nA woman with a concerned look talking on a cell phone.\nA plate topped with two sliced of pizza.\nThe traffic lights glow green in the night sky.\nThere is a man sitting on a bench listening to music\nA red fire hydrant sitting in the middle of a sidewalk.\na sign in front of an old house in the city\nThe medium sized zebra is looking into the camera.\nA young boy holding a remote control standing in front of a TV.\nA teddy bear is on the hand rail of a train door.\nA woman sitting in a car smiling 
while sitting beside a bunch of suitcases.\nA small group of cows standing in front of the camera.\nA large bunch of broccoli growing with the leaves around it.\nSeveral street signs displaying street names, addresses and driving option.\nA metal sink with a cupboard of knives sitting on it.\na fridge is shown with some pictures on it\nTwo blue and white vases are sitting on a table.\nsmall bathroom with tiles on the floor, sink, toilet and a window\nA group of men sitting on a lush green field.\nfour jet plans are flying across the blue sky\nA group of men waiting for a bus at a bus stop.\ntwo people riding down the middle of the road on a moped bike\nA woman holding down a dog with a swab in her hand.\nThere is someone standing in water holding a board.\nA man with a bandanna on serving himself food.\nA man and woman that are standing near a table.\nA woman that is holding a camera taking a picture in the mirror.\nWashed clothing is hung out on a clothesline in a cattle enclosure.\nA group of people standing near a number of blenders\nThe men are playing a game of baseball.\nA young man is body surfing and paddling in the water.\nA young girl sitting in front of a bunch of bananas and grapes.\nA double decker bus driving past a tall building.\na couple of people are holding tennis racket on a court\nThis man is skiing down a snowy slope\nThere is a surfer riding a wave in the ocean\nA bus parked along the side of a busy street.\na man cutting  up carrots in long strips\nA man sitting down eating a pizza at a restaurant.\nA body of water containing boats, kayaks and people.\nA woman is preparing to make dinner at her kitchen counter with the cabinets open\nAn arrangement of doughnuts grouped in front of a store window.\nA fork sits next to a piece of white cake.\nA ram sitting on top of a hill in the day.\nA man on a surfboard riding a wave.\nA black girl removing her denim jacket top.\nA male elephant stands beside a shady bush.\na young baseball player starts 
running to first base\nA small white dog lays in front of the fireplace.\nThere was a lot of organizational effort put into planning this kitchen.\nA bull is next to a large group of people outside a train.\nA skier is shown kneeling while on a flat patch of snow.\nTwo horses in grassy field below power lines.\nA group of people outdoors next to a large white building.\nA slice of cheesecake sitting on top of a white plate.\nA blurry image of a gauge on a pipe.\na couple of hot dogs that are on agrill\na newly married couple cutting up a colorful wedding cake.\nI sign in a video game warning dog owners to pick up after their dogs.\nthere is a baby sheep that is laying on the ground\nA large bathroom features tiled walls, two mirrors and two sinks.\nA bird perched on top of a branch in a tree.\na small boat on a large body of water\nAn elephant and a handler in an enclosure down below.\nA manual or book about ten-speed bicycles\nA skateboarder doing a trick at an event.\nA stop sign between two traffic cones in the middle of the dessert.\nA brown dog carrying a frisbee in a grassy area.\nA picture of two smart phone display screens.\nA United plane flying close to the runway.\nA large airliner with a kangaroo on the tail wing.\nPassengers waiting for their bags at a luggage carousel.\nA rural train station is loading and unloading passengers\nA group of soldiers  sitting at a table with a woman.\nA group of ties hang off a pole\nA bear is sitting on a rock in the sun.\nA person surfing on a continuous wave ride in a city.\nLady laying across a bed with a dog.\nan image of a small airplane flying in the sky\nA woman taking a swing at a tennis ball\nA family is posing with their luggage at the airport.\nA white bowl filled with rice and broccoli beef.\nA group of passengers with a lot of luggage.\nA boy sits in a living room using a laptop computer.\nThree carrots being cut by a large metal knife.\na cat sitting underneath a vehicle on the cement ground\nA man wearing 
a red striped tie is seen talking\nA man is sleeping with the covers pulled up high.\nA puppy is learning to retrieve a frisbee.\nA double bed with white sheets and floral pillows and blue trimmings.\nA lucky bamboo plant in the window of a small bathroom.\nTwo people in suites posing behind some serving bowls\nA bowl of salad is sitting next to a dessert on a plate.\nA group of men standing next to each other.\na number of motorcycles parked near each other\nA snow covered street with a person walking down it.\na couple of sheep stand in front of a rock\nA black and white dog sitting on a bench.\na man eats a sandwich and drinks a cup of coffee\nA close up of the side of an orange train.\nA man holding a large soup pot in a kitchen.\nA dog standing on blocks outside near deck furniture.\nA man looks down at his loose necktie with disdain.\nThree people cross country skiing in a wooded area.\nTwo zebra eating hay outside in a zoo.\nA airplane that is sitting on a runway.\nBatter, catcher and umpire during as baseball game\nA big elephant standing beside a small elephant in tall grass field with other animals obscured in back.\nAn airplane flying through a cloudy blue sky.\na girl in a white jacket and orange sun visor playing tennis\nFour pieces of toast with olives, cheese, and other toppings.\nA zebra that is close by is grazing on some hay.\nThe large winged bird is looking for some prey.\nThe cook is slicing lengthwise  several  bananas on the cutting board.\nTrain on the tracks at a station with people sitting on a bench.\nA cat with a cone on sitting behind a man while he is sleeping.\nA fairly normal looking bathroom that's in someone's house.\nA man with a knife and chopping board cutting apples\nA blond girl carries a tennis ball on top of her racket.\na baby with a pacifier sleeping in bed\na dog wears a baseball hat on his head\nthere is a very tall giraffe in a zoo\nthree people closing their eyes standing in a line together\nA large number of identical 
wooden boats float close to each other on the water.\nAn emo girl laying on top of a bed on her back.\nA train trolley with a car in front of it.\nA view of kitchen missing everything except the microwave and top cabinets.\nWoman walking down a icy walk way next to a stop sign.\nThe salad is inside of a clear bowl on the table.\nPeople observing a display of a concept motorcycle.\nA background of blurred shapes is fronted by bunches of green bananas  of which one's been ripped off.\na couple of cars are parked outside a church\nA cat lies up against the arm rest of a couch.\nA crowd of people in a metropolitan area at dusk.\nA little girl at the picnic table eating a cake.\nPlate full of cooked carrots, potatoes, and other vegetables.\nA dish which consists of roast beef, broccoli and potatoes.\nLaptop computer next to monitor on wooden desk.\na white plate with some food on it\nsome phones on a wooden table and a laptop\na red and yellow train is going past some red lightstrain signals\nThe giraffes walk next to each other down the wilderness trail.\nThree glass vases with a single yellow flower in each.\nthe truck has been painted red white and blue\nA picture of a scene in a baseball game.\nA giraffe reaching for a tree branch on a sandy zoo lot.\nthere is a owl that is sitting in trees and bushes\nThese people are riding horses through the mountains\nA pizza with veggies and eggs on it.\na cat laying down on top of a cardboard box\nmany fruits arranged in large containers indoors near a weall\nA young boy in glasses paying video games\na slice of orange sitting next to a sliced cake\nA green street sign sitting on top of a metal pole.\nA pair of gray shoes are sitting on a bed.\nA cake that is made to look like a pink castle.\nA tan cat wearing an old bowl as a hat.\nA bathroom with a toilet, sink, and other bathroom items.\nA small cow stands near a market display of soda bottles.\na train on a train track with trees in the background\nA white toilet and a 
dark cherry paneled wall.\nA brown and white animal standing next to a marina.\nTwo young men standing next to two dogs.\na living room that has a coffee table in it\nTwo children stand beneath the tail of an airliner near many others.\nA zebra standing on a lush green field.\nA black train parked next to a  red train in a train station.\na clock that is sitting on top of a table\nA messy desk with a computer, cups, glasses, bottles, books on the desk and the floor.\nA MAN SITTING AT A TABLE WITH NICE DINNER GLASSES\nA man on a surfboard kneels down as a wave breaks.\nRice with ground beef and asparagus in a bowl.\nBoat sitting by the dock at the river\nA commercial stainless steel kitchen with white dishes\na male in a red shirt cooking pizzas in an black oven\nA large elephant standing in a grassy field.\nView of one of the clocks surrounding this tower top\nA person with a red bike jacket is riding a red bike\na bunch of toilet seats in a building that is being renovated.\na small child is holding up a bottle\nA drawers of various supplies in different sections.\nPeople walking through a multi level shopping mall.\nA giraffe standing behind a wire fence on a grass covered field.\nA view of a small room with a bed, and small kitchenette.\nA woman holding a plate with a pizza on it\nA chef playing salad in bowls in a kitchen.\nA variety of donuts in a glass case.\nTwo girls walking with umbrellas on the sidewalk.\nA skier in a panda hat poses for the camera.\na small child with an open umbrella on the ground\nA skier on the snow with gear and ear muffs.\nAntique military biplane at waters edge at beach.\nFighter jet on a airstrip with low hanging clouds.\nA baby lies on blue and green bedding next to a teddy bear.\nOld single engine plane on display in open building\na man is holding something next to a motorcycle\na guy dressed in leather sitting on a motorcycle next to a bus\nA white sink in the corner of a grey tiled bathroom.\nA fenced in area with a giraffe 
reaching it's neck and head over a fence that separates it from people.\nA man flying through the air on top of a skateboard.\nA group of people sitting around a table with food.\nMan walking up mountain using ski poles with backpack on\na cow sitting on top of a hill eating in the rocks\nTwo pizzas sitting on pizza pans on a oven.\na child and adult in ski gear walking in the snow.\nA woman in bed beneath red linens having a conversation with a man.\nan image of two people playing outside with cups\nTwo motorcycles side by side in a building.\nA boy skating on his skate board at a skate ramp.\nA man riding a surfboard on top of a river.\nA woman talking on a cell phone standing next to a  parking sign.\nA woman under a sheet in the bed with her head on a pillow.\nA elephant standing close to a fence in front of trees.\nCattle in a fenced area resting and eating next to a lush green field.\na little green bird sitting in a tree next to a house\nA black and white cat stands on a bathroom sink.\nThe young woman smiles shyly while washing dishes.\nA bus that is driving in the street.\nThree zebras hurry across the road in front of car\nA hand with a glove over it above a toilet.\nBoy holding an umbrella at the edge of a cliff.\na boy looking over a gate at a cow.\nHazy image of a surfer riding a wave on the ocean.\na bathroom in an outhose with a wooden window on the side of it\nA plate that has food on a table.\nA person holding a purple stuffed teddy bear.\na line of buses that are parked in the road\nrobot dogs playing soccer in front of people\nBathroom vanity show featuring the sinks and a stool.\nYellow fired hydrant on the side of a city road.\nAn oversized picture of a train has a conductor standing by it.\nA green and white bathroom with folded towels\nA red double decker bus traveling down a road in the snow.\nAn old man riding a skateboard down a street.\nthere is a woman standing by a trains window\na close up of a man taking a bite out of a chocolate 
glazed sprinkled donut\nThe desk has multiple computers screens and mouses on it.\nMany cows grazing outside on hills in the grass.\nA bed with grey sheet and two red pillows.\nAn grown elephant standing beside its two babies.\na tennis player on a court with a racket\nA kitchen is all white with gray counter tops.\nA man's legs standing on a skateboard on a road\nA man is surfing a wave in the ocean.\nA man with long hair is about to hit a tennis ball.\nAn open kitchen with dark wood cabinets opens to a seating area which is vacant.\nTwo women on a balcony cooking on the grill.\nA look at a sign signaling no skateboarding.\na black kitty laying on a bench licking its paw\nA High flying skier is doing a mid air flip.\nA man sitting on the sidewalk under an umbrella\nA sandwich on a green plate on a kitchen counter\nThree cupcakes with blue icing are on the table and the middle one is split in half at the top.\nA large truck driving down a road next to a car.\nA Christmas display featuring stuffed bears and rabbits.\nPeople looking through the tents at the book festival\na male skateboarder in a black shirt is doing a trick\nA group of  men standing around a giant sheet cake.\nA kitten toy is on a desk with a computer.\nA herd of long horned cows laying on the grass.\na city street with some cars driving down it\nA man swings at a pitch during a baseball game.\ntwo dogs laying down in a pillow on a wooden floor\nWorkers in a restaurant kitchen preparing meals\npassenger train in front of a depot on a late afternoon\nTwo giraffes feeding while standing behind a fence.\na horse in a field of grass\nA white and brown dog is covered in a blanket.\nA young female wearing black is holding a purse and a cell phone.\nA giraffe walking away in a zoo exhibit.\nPen and paper on desktop with computer equipment.\nA women reaches out to catch a softball\nA large clock sits in the middle of a flower bed on a street.\nThe surfer sizes up the waves as he holds his pink surfboard\nA 
man working on a laptop looking at the camera.\ntwo parked motorcycles umbrellas shops and people and a tree\nA huge double couch in a living with a TV against the wall.\nAn area with blankets and food containers laid out with people holding umbrellas sitting on the ground.\nA kitchen in a dollhouse with various dolls in it.\nBaseball batter gets ready for the pitch during the game.\nThe cat is laying down in the window resting.\nA red fire hydrant pouring water onto a sidewalk.\nThere is a vase filled with water that has rocks and a plant in it\nthere are many people walking in the rain with umbrellas\nA horse wearing a saddle standing in the sand.\nA statue of a baseball player extending his arms to catch a ball.\na piece pizza on a white plate with tomato\nTwo red trains are on one track as a yellow train rides down another.\nThe man and woman stand next to each other holding video game remotes.\nA group of people socializing at a dinner table in a restaurant.\nA man sitting on a bench in front of a bunch of pigeons.\na collection of animal kites flying into the air\nA bus headed to Manchester is on a street.\nA view of a total gym exercise piece laying against the wall.\nA headboard attached to a bed mattress in a room.\nAn old truck, painted over blue in the desert\nTwo suitcases that are sitting near each other.\nOpen door going into a bathroom with black and white tile floor.\nA man riding a wave on top of a surfboard.\nA woman catching a red Frisbee while standing on a dirt road.\nA dog is posted by the window with his reflection in a mirror.\nA picture is taped to the bottom of a stop sign.\nA lot of food that is on top of a table.\nFive unknown objects displayed on a beige counter.\nA couple of of surfers talking on the beach, with other surfers in the background.\nA woman putting a pot into the oven.\nStairs lead down towards a fire between benches in a garden.\nA man standing on a very busy sidewalk in a city\nAn elephant standing next to a green plant 
with purple flowers.\nA girl brushing her hair by a bed n a room.\nPeople on skateboard and with bikes on a ramp in a parking lot.\nthere are two elephants that are walking on the road\na big pink house with some chairs out front\nA MAN HIS HOLDING A SURF BOARD WALKING ON THE BEACH\nSlice of dessert items served on plate with fork.\nA living room with big windows looking at the ocean.\nA train traveling down a track during the day.\nSeveral street signs hand on a pole as a brick building stands in the back ground near some trees.\nA tall giraffe standing on a  lush green field.\nman on blue tennis court preparing to make serve\na man talking to a pretty girl under an umbrella\nA pair of zebras standing in pen, in the grass.\nA vase containing water as well as a flower in it.\nA blue water hydrant on a roadside in a city\nAn adult elephant standing next to a baby elephant.\na rest room and a bench inside the dugout\nA brown and white cat laying on a tan sofa.\nA person in a room with a remote.\nA picture of a black cat sitting on a young man.\nA Bathroom with a toilet, sink, and records on the wall.\nA young man eating food on a kitchen counter.\nA large refrigerator and freezer sits in the middle of a kitchen.\ntwo kids playing in a park with their kite\nA chestnut horse stands in the surf on a beach.\nA couple of buses parked across the street from each other.\nTwo people next to a metal bench stare into a river.\nA trash can sitting next to a bench outside with a trash bag next to it .\nA group of teddy bear on a shopping trolley\nOne man with soccer ball touching his head while another stands near.\nA narrow bathroom with a thin door is shown.\nA collection of items for an advertisement are arranged on a table.\nThis is a bathroom that is in someones home.\nA dog in mid air catching a frisbee on a field.\nA woman sitting on top of a brown horse.\nA group of cute stuffed animals in a bed.\nsome people and two laptops on a yellow table\nA man is taking a slice of 
thin pizza\na chair made out of skis with people playing on the grass\nA woman walking a path by snow with her dog.\na couple of horse that are pulling a wagon\nthere is only one boat on the sand at the beach\nMaintenance city man inspecting fire hydrant on street.\nA roller skier pushes off down a street.\nA man leaning on another man both in suits and ties.\nA modern kitchen with a glass of wine on the counter.\nAn old cement wall in a home is decorated with garland.\na ball game being played on the field in front of an audience\nThree people at a intersection are waiting for the light to change.\nA group of passengers on a public transportation bus.\nA cat sits by four matching luggage bags.\nA long haired cat is sitting in an open suitcase.\nA young boy and man eat food at a cafe\nA table topped with oranges and a bowl of salad.\nCows lay down resting in the foreground while a flank of trees highlights the background.\nAn adult walking beside a child in a field.\nA child stands with his bat ready to hit a ball.\nA baseball player is going to hit the ball\npanda bear sitting between two trees in  forest\nThe sandwich dominates the plate and comes with soup.\na person reaching up for an open umbrella\nAn airliner is descending over the water to an airport.\nA boy is holding a dog that is wearing a hat.\nA woman hitting a tennis ball with a tennis racket.\nA white tub sitting next to a window and shower.\na woman sitting in a chair at a dining table in a restaurant\nA bald man in a suit on a television.\nThere is a half eaten piece of pie on the plate.\nThree zebras standing near each other in an enclosure.\nThere is a woman that is sitting down playing wii\nA brown leather piece of luggage sitting on a luggage stand.\na Shetland pony  with  tennis shoes on\nTHERE IS A CAT THAT IS ON THE BACK OF A DOG\na group of people skiing down a snowy slope\nAn open Swiss Army knife rests on a table.\nA variety of kites flying over the beach and ocean.\nPeople at a bus stop 
getting a a bus.\ntwo cooks in a kitchen sampling their food\nA lady looking into the sun standing on a hill wearing skis.\nSeveral cake doughnuts cooking in large fryer full of oil.\na guy sitting on his motor bike under some palm trees\na living room with a bright red couch next to a yellow wall\nTwo cats find room to stretch out and rest themselves end to end, even on a cluttered desk.\nA man in a red snow jacket is standing on skis.\nA painted fire hydrant next to an old tv.\nA close up of a pole with several street name signs.\nA bathroom with a toilet, and sink with the lights on.\nA man is presenting someone with a chocolate cake.\nA close up of a television remote being pointed at a TV\nThe young woman is licking the bread of a sandwich.\nA cat looking at something on the floor\na room showing a fridge well cleaned and a microwave\nTwo Zebras eat grass in a dusty area.\nA silver sports car is parked beside horse droppings left by a group of horses.\nA china cabinet filled with fine blue and pink china.\nA bathroom with a sink, toilet, mirror and toilet roll stand.\nA man is standing on his skis in the snow.\nSeveral boats out off the shore of a lake.\nMan displaying bunches of fruit in arid area.\na male in an orange shirt in a black suitcase\nPeople sit in a hot tub that is surrounded by snow.\nA baseball player has just swung his bat.\nA large white clock tower sitting in the middle of a city.\nA group of four people standing next to each other in the snow.\nA heard of sheep are roaming in the pasture.\nA dog sitting in a chair next to a table.\nA person they sitting down in a chair.\nA room with low ceilings and old furniture.\nA nice big living room with a big fireplace.\nCouple of goose standing at the water's edge while ducks swin in it.\nA couple of women sitting at a table next to drinks.\nA mass transit train moving across a small bridge.\nA piece of cake is sitting on a blue, green and white decorative plate.\nA fighter jet flying through the air 
above the clouds\nA pole is holding up street signs in the city.\nA photo of two people sitting on a couch, one playing the Wii.\nAdult and juvenile cows roaming in a grassy field\nA very cute dog laying down in a child's bed.\nA picture of a person touching a cupcake.\nA person wearing a helmet is holding bunches of bananas.\nA woman sitting on a bench that is facing the ocean.\nA man standing on top of a beach near a surfboard.\nSmall Prop plane drives along the runway in the day.\nA woman hitting a tennis ball on a professional court.\nA broken flip phone sits, in two pieces, on the counter.\nA produce stall at a farmers' market displaying baskets of carrots and cauliflower.\nA bathroom scene with two bathtubs and a toilet.\nA man holding a new sign under a stop sign.\nA small bathroom with a stand-up shower.\na cat is sitting on a wooden bench outside\nTwo guys getting ready to jump of a ramp with their snowboard.\nA zebra stands in the dirt in its enclosure.\nSeveral men in suits and military gear standing near a table.\nA smiling boy wearing a white shirt and red tie.\nA blue single engine airplane in the air above a landing strip.\nA group of stuffed animals sitting on a bed\nA bathroom with a sink and a tub and a minimal, modern style design.\nA goup of people at a wine tasting.\nA woman in black shirt resting on a luggage carrying cart.\nA young boy riding a skate board on the walkway of a park.\nA kitchen with white walls and wooden cabinets.\nA group of open umbrellas piled on top of each other.\nAn assortment of fruit for sale at a market.\nA single engine aircraft parked in a grassy field with other planes.\nGroup of people all showing off their cellphones in a group seating.\nA bird soaring through a foggy sky over a snow covered mountain.\na bird flying just above a body of water.\nsome giraffes standing next to each other in their pen\nSeveral herd animals are on the grass by a mountain.\na close up of a plate of food on a table\nVehicles on the 
side of the road and a herd of sheep.\nA young boy standing on top of a green field holding a baseball bat.\nA wet dog running on a beach with a neon green Frisbee in it's mouth.\nA man in an empty parking lot trying to pull something\nA man is in the picture above a plate of food.\nThe thin woman is standing between a man and an eating dog.\nAn apple being held by a hand with a knife tip presses against it.\nA bathroom with two white toilets and a large bathtub.\nThis group of steer are laying in the grass\nA zebra standing next to a zebra sitting on the ground.\nA man camping with two dogs eating a meal.\nTwo people are flying a kite on the beach.\nA plate with a hot dog and fresh pickles.\nA group of animals standing in a grass field.\nA woman sitting on a bench looking at her cell phone\nSeveral paraskiers engaged beneath a cloudy winter sky.\nYoung girl dressed in blue and pink skiing down a hill.\nA smiling grey teddy bear with a plaid bow lies on a green carpet.\nA tennis player holds his racket in the air after hitting the ball.\nCouch and chairs in living area with television.\nA cup of coffee next to a laptop of some sort.\nA cat cleaning itself on the top of a suitcase\nThere is a bowl of fruit with apples, pears, and oranges in it.\nA person is skiing down a mountain next to a  blue line in the snow.\nA young girl standing in front of a plate of food.\nA cake is the table along with some fruit.\nA group of celebrating fans in a city street.\nSomeone laying on a wood floor with a dog\nA group of giraffes stand together in the field.\nA giant clock on the side of of a neon sign.\nThree Red Sox baseball players stand smiling in a dugout.\nTHERE IS A CAT IN THE MIDDLE OF A BUNCH OF KNIVES\nA boy biting into a piece of broccoli.\nTwo garbage collectors standing behind a garbage truck gathering up bags.\nPeople on a safari look at an elephant in the road.\nA choice of poached eggs and bacon on a bagel or donuts.\na mirrored door showing the reflection of a 
couple\nthere are train tracks that lead in to a train station\nKids out on a sunny day while skate boarding.\nA bathroom with a tiled backsplash over a sink and bathtub.\nA street sign is pointing towards 8th avenue and the other is pointing towards 22 34 street in the middle of the forest.\na man that is outside with a kite in hand\nA red and yellow sign for the life guards and an umbrella on a beach near the ocean.\nTwo men and two women are hanging out at a skate park.\nAn orange container filled with office supplies sitting on the ground.\nA plate with food on it next to a bowl with salad.\nA large airplane flying through a sky above a city.\nBoats tied up in  a harbor with cranes in the background.\nPlaying on a small laptop and a phone at the same time is not recommended\nA man wearing a suit has a boutonniere pinned on his chest.\nthere are two men on a field playing with a frisbee\nFour people are skiing down a snowy hill.\nA very cute small child touching a fire hydrant.\nA table with bins of food that include pizza, fruit and salads.\na cow eating garbage on the side of a road\nA man in a short sleeve shirt with a tennis racket\nA large multiple layer cake with yellow frosting flowers.\nA large jetliner sitting on top of an airport tarmac.\na boat partly submerged in a body of water\nAn animal that is looking at something on the ground.\nA mission style bed is dressed with bright white sheets and a striped folded quilt sitting in between two matching nightstands and lamps.\nTwo plates have a meal prepared on each of them.\nA person on a skateboard does an air trick.\nTwo side-by-side photos of different living room settings.\nA large kitchen generously adorned with shiny metal surfaces.\nA man in front of a horse working on its hoof.\nThe woman stands on the cart behind a man driving it.\nA man holds a laptop that has a message about Barack Obama written on its screen.\nA motorcycle is parked in a lot by a store.\nVarious trains at a train station next 
to people on loading dock.\nA guy rail grinding a skateboard on a ramp.\nA woman sitting on a bus next to a dog.\na giraffe is eating a piece of food\nA booth with salesman trying to track down\nTHERE IS A WOMAN THAT IS PLAYING TENNIS ON THE COURT\nA woman texting on her phone while on her laptop.\nA brown donut on a thin piece of white paper.\nA bunch of sheep grazing in an open field.\nA baseball player is up to bat during a game.\nA stand with various tv and game equipment on it.\nA closeup of a train at the station for people to board.\nA small pizza sitting on a wooden table next to a bread maker.\nA baseball player is preparing to swing his bat.\nTwo kids sitting at a table eating a meal.\na young man is performing a skateboarding trick\nA man holding a tennis racquet on a tennis court.\nAn image of some baseball players in front of some money.\nSome food that is on a glass plate.\nThe owl is looking at the camera in an intense fashion.\nCat sitting on cabinet in front of large screen television.\nA brown clock tower with a gold, black and white clock.\nA table with cut up vegetables and cheese with it's rind cut off.\nA group of people sitting in a chair, working in computers.\nTwo road bicycles are locked to a pole in front of a man talking on his phone.\nA teen boy and teen girl standing on skateboards in front of a stone brick wall.\nThe man is riding up a hill on a motorcycle.\nA person wearing sandels standing in front of a cat.\nA toilet filled with Hershey squirts with a blue lid.\nA man in grey shirt doing a trick on a skateboard.\nTwo hipsters sitting down at a table cutting up a chocolate cake.\nTHREE BASEBALL PLAYERS STANDING ON A BASEBALL FIELD PLAYING A GAME\nA person is driving a speedboat quickly through the water.\na teen holding onto a brown teddy bear\nA man rides a motorcycle that is decorated with three teddy bears.\nA Virgin Mobile train driving in the middle of a city.\nA restroom has a toilet and a decorative sun wall plaque.\nThe man 
is carting his suitcase around the city.\nA toilet with the lid open and a phone on the wall beside it.\nA little boy is in a batting cage with his dad, who is serving as catcher.\nA kitchen area with a large pot, dove and a wooden cabinet.\na person behind a stand selling fruit with a person near by\nThis small kitchen has pots, pans and spices on display\na man getting ready to serve tennis ball\nA plate of vegetables and meet on a table\nA public bathroom that is dimly lit by a window.\nA person jumping on a rail on a skateboard.\ntwo white sheep, a black goat and a white goat in a field\na man holding a surfboard on his back\nA bench is sitting near a wooded area.\nTwo guys wearing nice clothes are standing outside.\nA door opens to a view of a toilet.\nTwo girls in cowboy hats riding horses waving.\nA giraffes face and neck while he eats leaves from  a tree\nsome baseball players are playing baseball on a field\nA pole stands in the dirt with a biker in the back ground\nTwo men and two women enjoying an outdoor meal.\nTwo men walking a dog and watching an airplane about to take off.\nAn elephant putting its trunk in another elephants ear.\nA clock is shown on the top of a tower.\nThree laptop computers sitting next to each other on a kitchen counter.\nA person is looking down with ski boots on and skis next to them.\nA white toilet, sink and shower stand in a bathroom.\nA car driving in an intersection, past a furniture shop.\nA smiling woman perched on a chaise long under an umbrella\nA couple of cars that are parked in the street.\nA man playing a game of tennis on a brown tennis court.\nA plate with chicken and broccoli on it.\nTwo young boys laying on a carpeted floor playing on laptops.\nThe hotdog is next to a bucket of popcorn and a soda.\na plate with a cheese shrimp and scallion pizza\nA man lays on a bed wrapped in a white blanket.\nMany plates of food with their silverware.\nA giraffe leaning down drinking from some water\nTwo jets high in the sky 
with white trails.\nA small bathroom has toilet, medicine cabinet, and small sink.\nAn industrial sized blender filling a jar\nCar and motorcycle traffic in a large city\nA living room filled with living room furniture and decor.\nthis is a dog running near some water\nA classic military motorcycle is parked in front of a crowd.\nA computer desk with two desk chairs at it.\nA group of people with kids sitting in a living room.\nA man on a skateboard standing on asphalt.\nA zebra foraging for grass among dead branches.\nA refrigerator is filled with a lot of food and beverages.\nA small elephant playing with a toy suspended from a wire.\nA plate with sliced pizza and a bottle of beer.\nSeveral people standing next to two people in cell phone costumes.\nA plate of food with a sandwich and a salad.\na toilet some white brick walls and toilet paper\nA boy riding a skateboard on the sidewalk.\nA man goes to strike a tennis ball.\ntwo men looking angry at each other.\nA person has a sandwich on a plate.\nThe old vase is on display on the table.\nA silver car next to a parking meter.\nThree zebras are standing in a filed under the clouds.\nA man walking down the sidewalk, and a blue briefcase in front of a post.\nA cat laying on a handbag on a bed.\nA couple of coaches in a large room\nA baseball player swinging a bat on top of a field.\nThis sewing room space is small but well stocked.\nWhite kitchen cupboards with grown counter top and black stove.\nA church stands in a country field, underneath blue sky.\nA sumptuous table setting in a royal dining car.\nSeveral bottles of wine on a display table.\nA guy with a white shirt and jeans riding a skateboard.\nShe is checking her messages before finding a good spot to enjoy the concert.\nA smiling young woman holds up a bottle.\nA man wearing a patterned shirt and tie and glasses.\nA woman is dressed as Merida from Brave.\nA couple of guys at a picnic of some sort contemplate sweets arranged on a paper plate.\nA group of 
cows mill about on a grassy pasture.\nA kitchen with green cabinets and tile back splash\na bunch of food is sitting out on a table\nA very thin cow standing near a herd of elephants.\nAn industrial kitchen with a strainer on the counter.\nA man sitting on top of a cement ledge.\nYoung adults in tennis clothes are playing Wii.\nA living room filled with furniture and a fire place.\nTwo officers are riding horses near a crowd on the sidewalk.\nSeveral people are skiing in the snow by a tree.\na train approaching a station with people waiting to board\nA skateboarder is crouching and arms fixed as if to run into something.\nA man releasing a baseball at the end of a pitch.\nA boy is blowing his candles on his ninth birthday.\nA vase full of flowers sits on a counter.\nA bathroom with a pink sink and blue tiles.\nA man in red shirt kissing a woman's forehead at a table outdoors.\nA woman pushing a stroller and looking at her cellphone walking down the street with people walking or riding bicycles behind her.\nvery types many ripe  fruits in a basket\nPeople are standing on a sidewalk in London.\nPeople pulling their luggage as they walk\nskiers riding on a ski lift to the top of a mountain\nA chair at a desk in a room.\nA hot dog in a paper boat sitting on a person's jeans clad legs.\nSkate boarder performing a stunt in a vacant area\nA man carrying a white surfboard across a beach.\nA group of students posing for a photo.\nA open laptop is on the table next to a box.\nA man standing on top of a boat on a large body of water.\nThe garden vegetables are blooming outside and are ready to be picked.\nA plate of food sits next to its dessert\nSome smiling guys in a very big crowd of people.\nA street intersection with street lights in a small town.\nA baby sitting in front of a stuffed teddy bear.\nLooking down on a very winding twisting road\na bunch of chairs and umbrellas on a beach.\nA woman sitting in front of a desktop computer.\nA bike parking white tent cover is 
set up.\nFive planes are flying in formation in the sky.\nA black and white picture of accessories in a store.\nA sandwich and condiments sit on a white plate next to a drink.\na person in fancy clothes rides on a horse\nTwo people jumping in the air to fight over a frisbee.\na skateboarder skating on a  stone skate ramp.\nA stove top topped with three pans filled with food.\nThe rider and horse canter onto the field to compete.\ncorner cabinet and sink area of a green kitchen\nA group of elephants that are in the grass.\nThe couple are dressed up and posing for photos.\nA snow skier is off the ground in the snow.\nA man holds a bat awaiting his turn in the batting cages.\nA siamese cat lays on a wooden desk\nThe snowboarder is grabbing the board while jumping up.\nDoor leading into a compartment on a train.\nCows grazing on the grass in a green pasture.\nA bucket full of toothbrushes rests on a rock outside.\nA hand holding a spraying hose to a toilet bowl in a small toilet stall.\nA couple of guy sitting at a table with a couple plates of food in front of them.\ntwo elephants are walking together down the street\na large stack of old and antique multicolored suitcases.\nA man with a backpack sits on a non-functioning toilet outside.\nTwo green metal street signs with Spanish words on it.\nA girl is sitting and eating a biscuit.\nA toilet in a bathroom next to a plaque on a wall.\nA group of guys in a field playing soccer together\nA picture of a orange cat in a bowl.\nTwo young kids play soccer against each other.\nA group of oranges are sitting in the bowl\nA glass sitting on a table next to an oven.\nFood prepared on a bun and set in a basket\nAn old double decker green bus says London Transport.\nA mother and daughter are cooking together in a kitchen.\nA living room with wooden walls and a tv.\nThis little league player is catching a ball during a play\nA pair of parking meters sitting behind a row of parked vehicles.\nA child looking at an elephant that is 
standing in an enclosure.\na person on a skate board comes off a ramp\nA girl with a black eye and pig tails sits in a suitcase.\nA group of snowboarders gliding down a snowy mountainside.\nA man in a jersey swinging a baseball bat at a ball.\na small bathroom with a toilet and a sink\nA plane flies over, painted in right colors.\nthree people sitting in chairs and a teddy bear\nan image of baby sleeping next to a woman\nMany cows in a pasture with trees eating grass.\nAn old woman attempting to play a video game.\nAn airplane sitting on top of an airport tarmac.\nA blue vase with a bird painted on it with flowers in it.\nA young man is standing at the bottom of a staircase.\na number of people walking on a side walk near a building\nA crowd watches a large giraffe through a wire fence.\nSeveral people on skies in the snow\nTwo ducks by the water one is spreading its wings.\nA passenger train that is pulling into a station.\nA man and a child standing on top of a beach.\nA bike sitting in front of a beach in the evening.\na couple of baseball players are out on the field\nan old steam powered locamotive at a station filled with passengers\nA half dozen assorted doughnuts in an open box\nPeople standing outside of a hut with several bunches of bananas and other fruit outside of it.\nA blurry screenshot of a green street sign.\nThe stop sign below the street signs has writing on it.\nthree surfers are walking on the sand at the beach\na man has a refrigerator on his three wheel bicycle\na bunch of people stand next to some suit cases\na sheep and baby sheep standing in a field\nA smiling man is holding a skate board near a street.\nA man on a surfboard riding a wave.\nA young girl who is looking in the refrigerator.\nA baseball bat leaning agains a wall beside a yellow box.\nA larger standing horse is standing protectively over a smaller resting horse in some tall grass.\nA man catching a white frisbee with his hand.\nA cake with a couple of birds and other animals 
on it.\nA tall clock tower and a tree against a blue sky.\na female in a white top is playing tennis\nA yellow fire hydrant gushes water onto the street.\nTwo giraffes are walking next to one another.\nModel train locomotive on track in small village display.\nA boat with a wooden hull is on a beach.\nA girl sticking her hand in a large bowl.\nA vase filled with lots of colorful flowers.\nAn old refrigerator displays its open door and contents.\nA picture wedged in between a bunch of bananas\nModern kitchen with counter and cabinet and hardwood floors.\nA little boy watching two elephants in an enclosure.\nThe motorcycle rider is on the road with all his gear.\nTwo large slices of pepperoni pizza on a table.\na red and white stop sign and a street sign\nTwo women are at a table with laptops.\nA young man holding a tennis racquet on a tennis court.\nA man playing with a soccer ball in a field\nA Blue dish full of green broccoli heads and asparagus.\nA bowl has onions, shredded carrots and other ingredients in it.\nA red stop sign with a car parked behind it.\nA large church clock tower towering over a city.\ntourists riding and petting an elephant at a tourist attraction\nthree people are sitting on a bench watching a train go by\nThe men are playing doubles tennis on the court.\nA man and a woman carrying a surfboard down the road.\na small bird in a field of green grass\na close up of a vase with art behind a display glass\nThe wooden bench has spray paint on the back.\nA pitcher winding up to throw a ball on a baseball field.\nA table topped with a plate with a pizza on it.\nA male competitive speed skier coming around a curve.\ntraffic cones in a bathroom that's under construction\nA male getting an object out of a tree.\nTHERE ARE PEOPLE THAT ARE SITTING AT THE TABLE\nA men's tennis couple watching a ball hit the net.\nGroup of people enjoying food at a market.\nA full view of a building that has a huge roof on top.\na male with short hair is looking out of a 
trains window\nA street sign surrounded by orange and red leaves\nA woman feeding a white dog a small carrot.\nThree individuals flying a geometric kite on the beach.\nA 4 way stop sign on the corner of a city street\nA girl took a selfie of her taking a selfie on her cell phone.\na wooden piece of art consisting of two birds standing at opposite ends of a log with a cone shaped vase in the center with a group of red berries sticking out of the cone.\nIt's easy to imagine a dinosaur as an ancestor of the giraffe.\nA young man holding a tennis racquet on a court.\nWhite police car passing through a stop sign in front of the building.\na vegetable sandwich with cucumber pickle and tomato\nA hand wearing a ring reaching for a pair of scissors.\na man is riding on a ramp on a skateboard\nA woman sits on a bus, presumably waiting for a bench.\nThis bathroom has a wood floor and wood on the wall.\nTwo guys stand by bottom of stairs playing the Wii\nA group of zebras crossing the dirt road .\nA chef standing in a kitchen preparing food.\nA small bird sits on a corn plant.\nSeveral park benches lined up under a row of trees.\nA red Two level bus stopped to pick up passengers\nA person standing on a sidewalk with a black umbrella\nA table with pots of sliced carrots, green vegetables and baked bread.\nA living area with two chairs, stool and a television.\nA man is next to a horse in a window.\nA white woman and an indian man shaking hands\nA child's lunch, of soup, fruit, and veggie, sits on an A B C place-mat.\nA very sophisticated bathroom with a white theme\nAn old man in religious clothes reaches to catch a frisbee.\nSome zebras eating together outside in a grassy area.\na double decker bus rides through london happily\nKiwi fruit, banana, apples and an avocado in a dish\nA table with two bottles of wine on it.\na plane sitting on a runway with a ladder sitting there for it\na jet on three pillars in front of a building\na kitchen with a stove a sink and some 
cupboards\nA man is cutting small hot dogs and adding toothpicks to them.\nthis is a man hitting a ball in a game\nA flock of ducks are swimming in the water.\nA table with a tin of hotdogs and a plate with bun.\na grill that has some pans on it\nA light pole and street sign in front of a store front.\nA city bus that is sitting on the road.\nA man bounces a tennis ball as he prepares to serve during a match.\na lot of people are on a tennis court.\nA collection of knives and a pair of scissors in a wooden block.\nA cup of coffee sitting next to a sandwich  chips.\nA pair of scissors near a stick of butter.\nA surfer is riding a wave in this aerial photograph.\nA man riding a skateboard down the side of a hand rail.\nTrain coming down a rusty train track with scrub grass.\nA single seagull standing on the coast with waves in the background.\nA white plate with a cut in half sandwich on top of it.\nA rectangular toilet bowl in  a tiled bathroom\nSurfboader riding the crest of a ocean wave\na person riding a horse puling a lot of hay\nThe hotel bed is designed for the business traveler.\nA yellow fire hydrant is standing alone in a parking garage.\nA motorcycle and car are parked in a garage.\na white chicken with a black tail and a red head\nAn adult cow walking along side the river bank\nA yak needs long hair to survive in these mountains.\na man sitting on a bench and laying down\nMan posing with a tennis racket for a shot.\nA passenger train that is pulling into the station.\na person riding skis jumping in the air\nA desktop computer sitting on a a desk.\nThe woman's  on the horse giving presentation with flags\nA white beach chair with a red, white and blue striped towel under a yellow umbrella.\nA person on a snowboard anticipates a jump.\na vase having a bunch of flowers inside of it\nA baseball pitcher throwing a baseball on a field.\nTwo people and a dog are sitting under a sheet-tent.\nA group of people standing at a table with bottles of wine.\nThe train 
rides on the track past the station during snowy weather.\nBlack and white photograph of a modern commuter train\na dog with its frisbe in its mouth walking in a water way\nThe woman in white outfit swings the racquet at the tennis match.\nA close up view of a pizza sitting on a table with a soda in the back.\nA child snowboarding down a hill in the snow.\nA cat is lying on a table, watching a television.\nA cat sits with hisher toy on a blanket.\nA man on a skateboard standing on a ramp.\nlittle kids sleeping all over a big bed\nA 24 hour recovery truck traveling down the road.\nA snowboarder sitting on a ramp in the snow\nA kitchen table, refrigerator, garbage can, chandelier and window.\na clock with big numbers at the end of a table\nA person is standing in the snow on skis.\nA canopy bed in a white and brown room\nA towels hanging from a towels rack outside a shower.\nAn unfurled sailboat in the water under a pink sky.\nA blue and yellow meal pole with street lamp lighting\nA toilet is made of wood with accents on the back of it.\nbacon, lettuce and tomato on toast with slaw and a pickle.\nA family of giraffes standing by a puddle.\nFive giraffes are standing in tall grass, in their habitat.\nA girl playing some video games with her family.\nParking meter and flower in vase displayed in window.\nStudents are sitting at tables with books and laptops.\nA plate displaying vegetables, meat and bread on a table.\nan image of a stop sign that is posted in the three way zone\nA laptop computer and mouse is on a sofa.\nA man with an old-fashioned hat is looking at the camera.\nTwo gray elephants fighting each other bumping heads.\na group of horses graze on some grass\nA bowl full of oranges and leaves on the table.\nA man doing a trick on a skateboard over a ledge.\nThree people ride an elephant while a man on the ground directs him.\nsome bananas peaches apricots and and apple\nA woman standing next to a motorcycle and some health aid trucks.\nPlayer at bat and 
umpire holding a ball .\nA large plane sitting inside of a hangar.\nA row of red stop signs sitting next to a  lush green field.\nA view of a mountain range is seen from an airplane.\nA woman talking on a cell phone while walking down a street.\nA woman sitting on a trunk wearing a polka dot dress with a red belt.\nA couple of people at the beach during the day.\nA blue street sign in front of a building with many windows.\nA woman in a blouse wearing a striped tie.\nSomeone riding on an elephant as it stretches it's trunk out\na bunch of motor cycles all parked together\nTwo zebras stand next to each other on a field.\nAn altar with purple cloth, vase and two candles.\nMan poses sideways wearing a plaid shirt and a tie.\nA small plane is parked on the tarmac.\nA van follows behind a bus on a rural road.\na little kid starts to learn how to ski\na little boy bending down taking a bit of a hotdog\nThe laundry is hanging in the tilted room.\nA young boy riding a skateboard down the side of a ramp.\nA couple of cruise ships in port with a large building in the background.\nThis is a picture of a woman playing tennis.\nTwo giraffes stare out of their enclosure at a zoo.\nA small naked boy holding a tennis racquet on a beach.\nA homemade cheese pizza is made and ready for the oven.\nThere's enough wind to fly a large box kite.\nA clock that is siting above a sign.\nThe man reclines in his seat from the table with doughnut in his mouth.\nA white chair with two glass birds on top of it.\nThe blonde lady answered her cell phone because she was waiting for an important call.\nThe street light, the electrical box, and the sidewalk are littered with bird poop.\nA bunch of people lounging at a beach near an ocean.\nTwo men hold hands around a dining table.\nA large train gains speed on the railroad tracks.\nA couple of cows grazing in an open meadow.\nTwo young boys sitting on a bed with three teddy bears and a sign with the number twenty crossed out.\na truck with two 
off-road vehicles in its back compartment\nA cat is sitting on a motor scooter.\nA group of people walking around a train station.\nYoung boy posing in front of a flying kite in the park\nA stop sign has collaborate and listen on it.\nA compact kitchen set-up with shelves for storage and a small stove.\nA herd of elephants walking across a ground near a river.\nThere are two cows walking on the sand.\nApples and oranges pile in well lit color photo.\nFresh fruit and vegetables on a kitchen counter\nA group of young women getting food from a table.\nA white fireplace that has pink candles lit on its mantle.\nTwo boys with an umbrella and chair on the beach.\nA man and boy are sitting on a couch.\nA woman and some children near a zebra behind a fence.\nA jockey is on a brown horse with a crowd watching.\nA skateboarder doing a trick in the air at night.\nA topdown view of floor with sheets, shoes and a desk on it.\nA herd of cattle grazing in a lush green field.\nA woman in a suit and tie standing with her hands in her pockets.\nA person taking a picture of a stoplight on the side of the street.\nThere is an old-fashioned clock tower in front of a building.\nTwo boys are standing in front of a train with backpacks.\nTwo officers are riding horses near the ocean.\nA cow looking at the camera from inside its fenced in pasture.\ntwo geraffes in a feild next to a tree.\nSnowboarder impaled on a tree during dusk with fire.\ntwo people feeding each other cake at a wediing\na dog outside playing with a ball in the grass\nA two level bus with a large advertisement on the side\ntwo little kids sleeping on a pink bed\nA fleet of small air crafts are flying over sea.\nA royal, horse-drawn carriage moves along the road.\ntwo giraffes are laying down in a park like setting.\na woman stands with some luggage by some chairs\nThis room is caught in a design time warp.\nTwo blue and black parking meters sitting on a sidewalk.\nThe man is cutting into a large cake as others sit around 
the table.\nThe girl in shorts is attempting to hit the tennis ball.\nA group of people are riding horses near a train.\nA cluttered countertop with a celebratory pink and white cake and opened containers\nA man on a surfboard riding a wave.\nA rectangular pizza served on a wooden cutting board\nA cat and dog standing buy their human in the kitchen.\nA pregnant woman is in bed reading a large book.\nA person that is holding a dog and a bowl.\nSeveral pieces of wood lined up near a lot with several axes around.\nThree children on a sofa by window eating bread and pasta.\na magnetic knife holder on the wall above a kitchen counter\nThree giraffes stand near each other in a field.\nA sign at the corner of St. Clair Street and South Main with flowers above.\nMany people have come to tour an authentic military aircraft.\nA bus moving past a street sign opposite a building\nStaring into the camera to take a picture.\nBottom view of an airliner flying directly over head.\nThere is a man sitting on a wall talking on a phone\na living room with a couch and a tv\nA small kid on a field with a bat.\nA blue sign posted on an overpass that people walk across\na number of large kites on a beach\nA man standing in front of a car with it's hood open and a dog standing in front of the car.\nMen laughing and playing tennis on Wii Sports\nA big building in front of a tall clock tower.\nA small cats eats out of a food bowl while standing in another bowl.\nThe snowboarder is jumping high above the ramp.\nA plastic container of food with rice and vegetables.\nA small showcase of an assortment of funny and cute items.\nA row of motorcycles filling all the spots in a parking area.\nA giraffe and other animals in a field\na black and white image of a man on the phone\nA large kitchen has a stainless steel counter.\nA woman sitting at a table eating a plate of food.\nA double decker bus stopped at an intersection.\na girl that has a racket in her hand\na man standing on a tennis court 
holding a ball and a tennis racket\nA man prepares to hit a ball during a tennis match.\nA man standing next to a produce stand with tomatoes and other vegetables.\nA man is in a hospital bed has a teddy bear.\nA cat laying on many shoes on a brown rug.\nA kitchen counter with a candle display on it\nA little girl that is standing on a surfboard in the water.\nA small orange vase is on a table with a small branch in it.\nA group of cyclists are riding across an intersection.\nTwo people are on a motorcycle driving down the street.\nA herd of different colored sheep walking near rocks.\nA table with two red vase type items\nA number of people flying kites on a clear day.\nA bath and sink with a woman in a room.\nA parking meter sitting on the side of a road.\nA cat drinking out of a sink faucet.\nA grey tabby cat stretches out on some clothes\nA kitchen with wooden flooring and white wooden cabinets.\nDog sleeping in his bed next to rocking chair\nMan on a surfboard under a large wave\nA roadwork crew constructing a guard rail along a mountain road\ntwo woman stand in the snow and pose for a picture\nTwo men having beers in a dimly lit room.\nLong woman sitting on a raised log looking at the mountains.\nA lone horse standing next to a fence.\nA trainer feeding two giraffes from his hands\nOld picture of a sumo wrestler playing baseball.\nA child holds a game controller for Wii.\na large dog is looking oof to the left\nLiving area with small desk and leather couches.\nA train is pulling into the train station.\nA large chair statue with a large horse statue on top of it.\nA dog running in the sand near the water.\nThe man in stripes holds onto the plate of food as he poses for a picture.\nTwo people standing in the grass playing with a soccer ball.\nA small toilet and trashcan across from a dirty sink in a very small, dirty bathroom.\nA large white sign above a brick wall with a yellow vehicle to the left and a parking sign to the right.\nA black bird is perched on a 
tree limb.\nCowgirl at rodeo riding house with a Texas flag.\nA man with many bags walking on street next to fence.\nA skier is being towed over the snow.\nA small pizza on a plate that is sitting on a checkered table cloth.\nA picture of a large apple and walnut pie on a plate.\nA woman petting a horse in an open field.\na couple of men are eating at a table\nA few pieces of luggage sitting on top of a wooden floor.\nA herd of goats walks by a car and its driver.\nA large group of competitive cross country skiers.\nA woman smiling while sitting on a bed.\nA slender high rise building is fashioned behind a pole clock.\nA man dressed as Elvis sitting on top of a bull statue.\nThe city buses are parked together in the parking lot.\nA woman is in water catching a frisbee near a boat\nA man takes a bite of his food at an event.\nA red airliner is parked on the tarmac at the airport.\nDock area with urban area on cloudy day.\nA bus is seen coming up to a bus stop.\nA long tunnel with a long table with lots of seats and candles next to wine glasses.\nA man and woman are walking in the rain with an umbrella\nA woman tennis player serving a tennis ball.\nA red double decker bus parked near a red telephone booth\nSurfers on surfboards ride in a row on the ocean waves.\nA person stands next to a train parked on tracks\nA group of boys stand around a museum exhibit.\nCandles, flowers, and stuffed bears are set in a corner near a poster.\nA mother elephant and baby standing near the water\nsome people sitting at tables eating pizza and drinks\na room that has a bunch of tables in it\nFour bowls of snacks crackers, broccoli and carrots, nuts and dip\nA woman riding a carriage pulled by a brown pony in a race.\nPeople stand near a desk with laptops on it.\nA small child holds onto a fire hydrant to stand up.\nA man holding an umbrella for another man in the rain\nA sail boat with a large Colgate Clock in the distance.\nSkateboarder with an elongated shadow at an outdoor skate 
park.\nA group of people riding skis on a snow covered summit.\nTwo people standing on a tennis court with ball in the air.\nThe men are playing a game of baseball in the field.\nSpectators watch the players at a baseball game.\na road that is next to some trees\nA small farm animal steps through the short grass of a green field.\nA Dairy Queen sign on a major road advertising it's special.\nHorses pull carriages on a dusty dirt road.\nA young boy in a green outfit holds baseball mitt.\nA city bus going down a city street.\nclose up of fingers holding a slice of pizza\nA man with a tennis racket stands on a court.\nA picture of a street with parking meters.\nTHERE IS A BED WITH A SKY BACKGROUND\nA woman with her laptop on a bed in a dark room.\nA man on a horse going down a track.\nA young couple poses with a cake decorated like a keyboard.\nTwo cats sit on top of a towel on a counter.\nA plate that has an apple and sliced kiwi on it.\nGrizzly bear grazing in grassy field in daylight\nA computer is set up with gaming equipment.\nAn elephant walking through a brushy field\nA brown and black bird standing on a tree branch.\na number of elephants near a body of water\nThree stuffed animals next to a radiator and below a rocking chair.\nThe man is focusing on something in his hand while holding up his bike with his leg.\nA tennis player holding a racket on the tennis court.\nA stop sign and people on the street in front of a double decker bus.\na kitchen wit ha stove some cupbaords and drawers\nBlack and white photograph of houses and a clock tower.\na cat licking its lips while holding onto a toy in the shape of an elephant\nThe young man is jumping on his skateboard.\nA batter swings the bat as the crowd watches attentively.\nThree people with surfboards standing near the waves.\na group of people that are eating some food\nTwo men in suits and ties with woman behind one of the men\ntwo college graduates pack up after a long day\ntwo yaks are out in a grassy 
meadow\nA baseball player swings at a pitched ball.\nThe man is holding a tennis racket in his hand.\nA group of women that are in a kitchen.\nPeople at the picnic while an elderly woman shows a pizza.\nA statue of a man and woman with luggage in a city.\nA pizza sitting on top of a cardboard box on a table.\na large plane is parked on the run way\nThe building has several umbrellas suspended in mid-air for decorative purposes.\nThere is a window with a cake and other baked goods showing.\nAn antique fire truck parked on the side of the road.\nfour traffic lights over a city street\nA woman sitting on a pier near boxes of fruit.\nApples, oranges and bananas all mixed in a bowl.\nA sign with a large hand with five dollars written on it.\ntwo elephants giving people rides down a street\nA man eating something from a paper bag\nA couple of people that are watching a baseball game.\na very tall clock tower sticking out of a building.\nA chrome colored microwave oven in a custom cabinet space.\nA group of people sitting down at a dinner table.\nA lady reaching for a huge wine glass\nA man riding a skateboard down a road.\nThree skiers in bright outfits start down a slope.\nThe stove top and oven is separated in this kitchen\nTwo rams are staring at each other in the woods.\nTwo people dressed up entertain a little girl.\nTwo people are riding a sports motorcycle down the street.\nA view of a bright hallway and a room with a wood burning stove .\nA person holding a toothbrush in his hand.\nsome people standing around one man is wearing a tie\nA picture of someone riding a snowboard doing tricks\nA woman is leaning on a car talking on her cellphone.\nA plate of food with a bite taken out of the hamburger.\nA woman on a motorcycle is next to a man walking a dog along with other people going down a dirt road.\nThe entire baseball team has gathered on the field for a celebration.\nA giraffe that is sticking out its tongue.\nA woman in dress leaning against stack of concrete 
blocks.\nA black and white scene with a lady answering a phone\nthis is a horse and a dog by the water\nA white and gray bird perched on a human's hand.\nPeople riding on the backs of lavishly decorated elephants\nA large black horse standing on a field filled with green and brown grass.\na van driving down a cracked street\nThe dirt bike has seen many hill climbs in its history.\nA cat and dog napping together on the couch.\nA visitors desk with a vase with sunflowers in it.\nA skateboarder in mid air after a jump\na hotdog with toppings in a paper tray\nA group of people sit in a open living room and kitchen area.\nA group of four zebras standing in different positions.\nA simple wooden bench is in the woods.\nA man in a red uniform and shorts throws a ball while wearing a baseball glove.\nMulti colored cat laying on the floor next to door and liquor bottles\nThe person is holding his cell phone while on his laptop.\nA man standing in a kitchen using a blender.\nTwo men stand by a trunk of a car next to which are both a surf board and some folding chairs.\nA variety of shots of a man doing skateboarding tricks\na couch in a living room with three pillows\nA marble bathroom with automatic toilet and bidet.\nA white and grey airplane sits at a gate at an airport.\nHorse grazing in a field in front of a cascade of mountains.\nA white dog has curly matted hair in it's eyes.\nA group of zebra crossing  a river together\na living room with couches and a table\nA woman looking at the camera while holding a cell phone\nThe person on the motorcycle had a big helmet on.\nA young lady reading a paperback book on her bed at night.\nlight brown cocker spaniel dog howling in street\nA cat licking a bowl clean on the counter.\na bathroom with a chipped sink and holes in the walls\nThis looks like a bunch of burned food on top of burned bread.\nA baseball player has just launched a ball.\na living room with a chair near a tv\nPerson high in the sky after jumping snow ramp with 
snowboard\nPeople are standing behind the bakery counter.\nTwo cows in a large green grassy field.\nThese people are sitting on a street bench.\nA row of cars and trucks parallel parked at parking meters.\nGroup of people riding bicycles on a busy city street.\na man sits next to a monument on a bench\na caddy holding one cell phone and another cell phone in a holder\nA soccer team is praying on the field.\ntwo black and white cows on a green hill\na hot pizza topped with cheese and olives to be eaten\na close up of a cat at a table with a plate in front of it\nA women who is picking up a large sandwich.\nA very large cat sitting in front of a television set.\nA store on a street corner called \"James Smith  Sons\".\nThe old metal bed with dirty linens is the only furnishing in the abandoned room\nA plane flying with a dark, cloudy sky in the background.\nA boy is sitting down and eating a donut.\nThe plane is taking off from the runway.\nWhite and grey cat laying down on a white sheet.\nA gray tiger cat sleeping on a bed under a blanket.\nA photo-shopped image a cube drawn around a lego in the kitchen.\nA young man catching a yellow frisbee on a green lush grass covered field.\na square white appliance with a blue thing on top\nthree people riding horses on a beach near a body of water\nThe laptop has an attractive image on the screen, and there are welcoming flowers and munchies\nA man playing tennis on the tennis court while his coaches watch.\na street sign on a pole with buildings in the background\nStreet sign edited to look like a man is holding the white bar\nA businessman wearing a suit and close up picture of suit.\nBoy sitting at table with food and a cellphone.\nA man holding a woman's hand and cutting a wedding cake together.\nA beautiful woman sitting on the back of a moving truck while clutching her dog.\nA bathroom cubicle showing a toilet, sink and waste can.\nA wedding cake with flowers descending down it to the plate.\nA bathroom area with three 
different sized and shaped urials on a mosaic wall.\nA plate of cakes with frosting and topped with berries.\nA large truck next to some trees outside.\nPeople can be seen boarding a ship through the windshield.\nA tower of brick holds clocks and a bell in a courtyard.\nPeople are playing in a field, flying a kite.\nA young giraffe stands near some trees in a wooded area.\nYoung baseball player running in open grassy field.\nA few skiers are enjoying the calm snow-trodden mountain tops.\nMexican food is layed out on two trays.\na small boy and a big chocolate donut cake\nA brown horse standing next to a metal fence.\ntwo men standing in front of small car\na boat a larger ship a buoy and water\nShot of a nice quaint living room with an ascending staircase on the side.\nthe skateboarders are taking turns using the ramp.\nA giraffe extends its tongue to drink water\nA park bench is in the white fluffy snow.\nA person is standing with their face near a toilet.\nA couple of elephants are dawdling in an enclosure.\nA giraffe standing near a tree in a field.\nAn African American man wearing a bow tie is taking a selfie.\nFour men are dressed up with a tie.\nPeople having a drink in a basement bar.\nAn underground Asian subway train, on the tracks and in transit.\nThe strawberries are supposed to make this dessert look less fattening.\nMen playing Frisbee on the lawn at a get-together.\nThe man is carrying bunches of bananas.\nA young child that is looking at a birthday cake.\nA person doing a trick on a skateboard caught in motion.\nTwo giraffes and two zebras are standing in a grassy field.\nvegetables laying in the soil next to a trowel\nAn adult elephant standing over a very small baby elephant.\nA bunch of vegetables that are stacked together.\nAn old photo of railroad tracks passing through a western town.\nTwo giraffe standing next to each other.on a lush green field.\nA man holding a tennis racket getting ready to hit a tennis ball.\nBatter, Catcher and Umpire 
wait at home plate for ball to be thrown.\na formation of fighter jets flying by in the air\nA group of people standing around a nearly empty field.\nA young lady that is smiling and holding up a box with a blue tie in it.\nThere are people watching a game of baseball.\na brown and white dog lying on a bed and brown pillows\nA couple of trains parked in front of a tree.\nA train is stopped at a train station.\nThe skateboarder is jumping down the stairs on his skateboard.\nA sheep standing with his behind to a fence in the snow.\nA round plate that has a white and red pie type dessert on it and a light green pitcher behind it.\nPedestrians on narrow alleyway with archway between buildings.\nA blue and yellow fire hydrant sitting in a field.\nA soldier is cutting a large decorated cake.\na small girl in a field with a blue yellow and red kite\nA yellow fire hydrant in the middle of a plaza.\nFour children eating pizza in a booth at a restaurant.\nA train traveling through a tree covered wilderness.\na lady holding a game controller and a man giving the rock on sign\nAn antique gold clock with a man and an eagle.\nA woman eats a hotdog while holding another.\nA man riding water skis on top of water.\nA bundle of six apples are hanging from a tree.\na bathroom with a white toilet next to a tub.\nSeveral elephants standing in a lake near trees.\na grassy medium between a two way street in a city\nA macro image of an apple keyboard.\nA single person is working in the cluttered kitchen.\nA young boy holds a baseball bat above his head.\nA donkey draws a carriage carrying two people\nTwo men reach up for a Frisbee at a park.\nFire hoses are attached to a fire hydrant.\nThis blue and yellow transit bus provides information about the service it self rather than advertisements.\nA group of people standing around a table together.\nAn apple sliced into four, fork and knife\nA very elaborate cake decorated to look like a bear's forest dinner table.\nA giraffe standing by a 
brick building with a ladder.\nA couple of women preparing food inside of a kitchen.\nA bed containing two small boxes and an electronic item\nA brown and white cow standing on top of a lush green field.\nA motorcycle is parked next to a blue tent\ntwo people are sitting on two different elephants\nA very cute toddler playing with a laptop that is fully open\nA black and white cat standing on a table next to a pizza.\nA plate with fries and a napkin with eating utensils.\nTourist train with several cars driving on street.\nA group of young boys playing a soccer game in progress.\nA train passing on bridge over a busy city street.\nA bicycle parked on the side of the road beside some doors.\nLego clock and wall setup for a interior Lego house.\nA couple of hot cars in a packet on the table\nA small plane is taking off from a grassy field.\nA look at a hotel room with two beds in it.\nA man standing on a tennis court hitting a tennis ball.\nA guy with a motorcycle helmet stands behind a motorcycle.\nA dark kitchen with many cabinets with a small light on above the stove.\nA bus drives down a city street featuring larger brick buildings.\nOpen textbook near a computer keyboard and mouse on a mouse pad.\nA desk with a computer a keyboard and a mouse\nA basket with a sandwich, coleslaw, and onion rings is sitting on the table.\nA bed sitting in a room under four pictures.\nA tray of assorted food including fruits and vegetables.\nA box with six divisions, each with its own variety of donut.\nPerson flying a red kite in a grassy area of a park.\nA photo of an outdoor with many things in the scene.\nA train is moving or resting on railroad tracks.\na fire hydron that is next to a concrete road\nTwo streets cross and the signs prove it\nA bedroom with a large blanket covered bed in front of a flat screen TV.\nA man holding up a hot dog on a stick.\nA toy wagon holds many stuffed teddy bears.\ntwo zebras are standing together in the woods\nA group of people on motorcycles 
driving down a street.\nTwo parking meters sit on the side of the street.\nA vehicle decorate like a pink elephant with passengers on its back.\nBlack and white photograph of a woman in an old kitchen\nAnimal in shadows of woods surrounded by foliage.\nThe domed, shiny surface reflects a man falling off a skateboard.\nCloseup of a brown bear sitting in a grassy area.\nThis is an old town from the 1950 's.\nA man skateboarding on the grass in his yard.\nTwo people stand in a field as one of them flies a kite.\nA baseball player is sitting on the bench at a baseball game.\nA white bus is driving on the road.\nContrails can be seen from a descending jet.\nA grey and blue train passing over a city area.\nA young girl is smoking in a kitchen.\nA  city bus with bikes on the front of it\nFour people playing a game system in someones living room.\na group of baseball players that are on a field\na baseball hat cake made with fondant that says happy birthday\nA bird flying over a beach with a few people in the background.\na male is looking at a sausage pizza\nA IMAGE OF A CAKE TEA POT AND BIRDS\nA group of large trains on a steel track.\nthe people are walking down the street with luggage\nA bedroom with a desk in the corner.\nA picture of a girl that is posing on the ground.\nA large green freight boat is seen at sea\na microwave is open with some food in it\nA man cutting a cake on a table with cards on it.\na man looking up as he rides a surfboard\nan image of a living room setting with tables\nA young person holding a surfboard next to a man.\nA chocolate cake with a pile of strawberries on top.\nA giraffe laying down and another giraffe standing up next to trees.\nA man riding a bike while holding an orange and black umbrella.\na close up of a sandwich on a plate next to rice and beans\na building with clock tower in a town square\na clock is sitting on the outside of the building\na pizza topped with different toppings is brought to a table\nA man rides a horse while 
other people look on.\nA warning sign is at the edge of a body of water next to a fire hydrant.\nThe sign on the side of the road is telling motorcycles to use caution.\nTheir is a skyview of the city from a small aircraft.\nThere is a orange tabby cat sitting on a mat\nA large passenger jet sitting on top of a runway.\nan adult female standing on a beach holding a colorful kite\nA clock is shown in a package on a shelf.\nA woman smiles on a street while holding an umbrella.\nA young boy swings a baseball bat as another boy waits to catch the ball.\nA group of people sit in a room while one plays a video game\na toilet and a bidet in a bathroom\nTall, fresh, colorful flowers in a clear vase\nthree people are skiing down a huge mountain slope\nFour fine zebras cruise through grains very alert.\na tall clock tower near other buildings\nThe large twin engine airliner has a red stripe on the sides.\nA city scene has a tall red double building.\nOne single biker seems to be leading the group down the road.\nA man in light blue jacket riding on a skateboard.\nStreet construction being separated by orange barriers.\nA hotel room with tv, desk, bed and arm chair\na close up of a person eating food at a table\nA laptop that is sitting on a desk.\na flooded city street with a stop sign coming out of it\nA squirrel is eating a piece of food on the ground.\nA park bench is next to a colorful fence.\nA group of people on motorcycles sitting in the road.\nA blue and white bicycle parking in bike track next to building.\nAn older woman sitting at a table cutting up donuts.\nA man wearing a lei is waiting in a parking lot with his luggage.\nMan with young boy carrying surfboards at beach.\na person standing up and holding two remotes\nThere are rambutan, bananas, and papayas in separate crates.\nCement ledge with orange in bowl and red plastic bag below.\nA brown and white sink sitting in a bathroom.\nA small vase of flowers with petals on the table\na man is standing on top of a 
surf board at sea\nA table topped with steak, potatoes and carrots.\nA lavish hotel room with a comfy bed.\nA couple of glasses of wine on a table.\nThe wooden bench is near a busy stream.\nAn  big airplane flying through the sky\nAdults with looking at watercraft on waterway near park.\nA kitchen filled with an empty refrigerator and microwave.\nA driver's view of an intersection on a sunny day.\na herd of zebras walking through the grassland\nA green bus parked in a parking lot next to others buses.\nA dog curled up by a pair of boots on the floor.\na man is skateboarding on the edge of a building\nTwo men play tennis in a fenced courtyard\nSnowboarding elderly man on side of mountain posing for picture.\nA view of a single bathtub in an otherwise empty bathroom\na man holding a child next to a double decker bus.\nA bird is wading in shallow water by a boat.\nA dog and some humans in a garage of some sort.\nA sink in a kitchen with an overhead light on\none dog laying down and another dog standing over it\nA woman on the beach is flying a kite.\nA street in an Asian country is littered with signs and advertisements.\nA group of people standing on top of a grass covered field.\nThere are two people standing on the side of a street.\na trolly train on a city street at night\nA young and a woman sitting down outside with a laptop between them.\nSeven cows are lined up while being milked.\nA woman riding a horse in a pasture with great caution.\nA large jetliner flying through a cloudy sky.\nA young woman drinking from a wine glass\nA woman standing alone holding a large, white umbrella.\nSeveral elephants eat grass and plants by the water.\nA computer geek's setup of his computer, laptop and various games.\na cat in a luggage bag in a closet\nTwo zebras, one facing forward, one looking at its mate.\nA tall church tower under a blue sky filled with white fluffy clouds.\na person standing on a beach wit ha dog\nOld rusted train left out on the train tracks\nA teddy 
bear drinking from a pink cup.\nA pizza with an egg in the middle is on a plate.\nA country road covered in rain next to a river.\nA boy holding a kite while standing on a sidewalk.\nA young woman holding a teddy bear in a room.\na person is standing next to a surfboard\nA clock on the side of a window in a room.\nThree skiers posing for photo in front of sign.\nThe fire hydrant is in a field near a covered, wooden bridge.\nA group of young children sitting next to each other.\nThe catcher in a baseball game picks up the ball with his glove.\nA black dog laying underneath a car in the shade\na brown and yellow bathroom with a toilet tub a mirror and a sink\nThe modern bathroom has a glass shower door and cream and brown color scheme.\nA group of zebras standing beside each other in a grassy area.\nLarge special saddles are used while riding elephants.\nA living room with a brown wicker couch and ottoman.\nThree cats sitting on a leopard print bed.\nA table topped with a tray full of cookies and a vase filled with flowers.\na street with cars parked on the side\nA boy in a red jacket at a bus station.\nA shower and sink in a small room.\nNumerous water fowl either taking off or landing in the water\nA large group of motorcycles stretching into the distance on a highway.\nOld lady takes a rest from her walker on a sea side bench\nA hotdog sandwich with sauerkraut, cheese, and mustard.\nA couple of people with snowboards in the snow.\nThree people standing on a mountain taking a picture as they ski.\nb baby doll holding a very big samsung phone\nA street sign sitting next to a tree.\nFour women with two children cross the street in a crosswalk.\nA professional kitchen with metal counter tops with good lighting.\nA gray truck driving past an ATM machine.\nA couple of horses standing near a road.\nA person at a table outdoors with a laptop.\nA pair of hot dogs with toppings next to a drink.\nSliced apple in a bowl covered in cinnamon.\nzebras are walking in a pack on 
the grass\nA man in a tan shirt and glasses in a car\nA large train resting inside a railway station.\nThe hotel room has two large beds, a desk, a flat screen tv and a lot of space.\nA kitchen area with a dishwasher, stove and microwave.\nthere  is a large blue vase that is empty\nA clock tower on top of a  building next to the ocean.\nSkateboarder doing a high jump down stairs at a competition.\nA person riding a snowboard on the snow on a sunny day.\nAn indoor bathroom with reflective marble counter tops\nA bunch of cows enjoying the grass and sunshine.\nA man flying a kite over a sandy beach.\nThere are benches on the landing where one can sit and enjoy the view of the  wooded surroundings\nA parking lot filled with parked cars in a shopping center..\na close up of a young person wearing a suit and tie\nA large commercial airplane parked on the runway\nA man riding a skateboard through the air above a skate park.\nSeveral skiers ride down a steep, snowy slope.\na toothbrush that is on down on the counter\nA man is standing under an umbrella in the rain.\nsome people standing around a bright lit up party bus\nan image of two little kids playing baseball\nA small red bicycle sits on a hardwood floor.\nA white plate topped with meat and broccoli.\nA woman returning a shot at a tennis match.\nTwo surfers are riding two large ocean waves.\nA large bus parked on a handicapped parking space.\nA person stopped wearing a yellow jacket riding a motorcycle.\nStuffed bear posed \"reading\"  open computer reference book.\nA dog laying in bed all covered up with the blanket .\nOld canopied single bed with luxurious linens and curtains\nA cat sitting on the floor beside a pair of shoes.\nA bench falls into a crack in the asphalt.\nA man holding a luggage cart in front of an airport.\nA pile of carrots and other vegetables on  a tray.\nA woman sitting on the couch with her baby.\nSome very tall giraffes in a big green area eating.\nA couple who is cutting their wedding cake 
together.\nA black cat sitting in a tub licking the faucet.\nA black and white clock is mounted to a building\nComputer screen displaying a page of small print.\nThe woman is sitting on the couch watching TV.\nThere is a little girl standing next to a very large pizza\nA grey and white cat laying in a black wire basket.\nA child playing with toys in a backyard pool.\nTwo skateboarders performing a trick on a ramp.\nThe man is about ready to cut the cake to share.\nA vintage bicycle is parked outside a storefront beside a state of the art apparatus.\nA beautiful young woman holding a tennis racquet on a tennis court.\nA large bathroom shower has flowers by it.\nA refrigerator in a basic kitchen with bottles on counter.\nThe man is in the air after jumping on a snowboard.\nA person stands on skis on a snowy mountain.\nA green bowl filled with oranges on top of a blue striped table.\nTeddy bears modeling on a runway with other teddy bears watching.\nthe man is holding on to a firsbee\nA bicycle is locked up to a post\nA long city bus pulls away from the curb into traffic.\nA crowd of people stand in the water on the beach.\na woman in a skirt holds a tennis racket an ball\nthe elephants are moving people across the river\nthree people skiing together in a line down a hill\nA baseball player is holding a baseball bat.\nA little kid that is touching a fridge.\nA woman carrying a surfboard into the ocean.\nA giraffe standing in the grass near trees.\na yellow fire hydrogen next to some weeds\nA black cow eating some grass in a field.\na little league player getting ready to throw a ball\nA person is snowboarding down a snowy slope.\nA girl plays with a Wii remote in her hand.\nA female tennis player getting ready to serve.\nA large bus driving down a city street.\nA white plate topped with meat, veggies and bread.\nA colorful breakfast omlet with toast on a green plate.\nPaper umbrellas hangs from the trees for art\nA man is turning a pizza with a spatula.\nThree sheep 
are grazing on the city sidewalk.\nA small child in a pink dress sitting at a table having cake.\na couple of men in cowboy hats near a sheep\nA chocolate cupcake with a smiling giraffe face.\nA man in the air on his skateboard doing a trick.\na street sign that reads \"do not enter\" on a quiet street\na baseball player holding a baseball bat on a field\na street sign showing a one way street\nA cake with a knife on the table\nA black kitchen sink with potted plants, toaster oven and knife rack behind it.\nThis black dog is sleeping on a bed with white sheets\nA man with a brown sweater is playing a WII game.\nA man standing on a  park holding a white frisbee.\na group of teens playing at a skateboard park with one doing a jump\nTwo zebras and a giraffe are walking in a park.\nA large crowd of people and flying kites.\nTwo beach chairs next to an umbrella on the beach.\nA cat peeking over a tub that it is inside of.\nA red and silver airplane sitting outside at the airport.\nOverhead view of a table with a log and food on it.\nBlack capped cranes standing in a zoo enclosure\nSmall child on a skateboard watches another skateboarder.\nA tennis player holding a racket looking up at the spectators.\nA man holding an umbrella on top of a bridge.\nTwo plates that have food on a counter.\nAn old green steam engine is on the tracks.\nThe reflection of a red truck in a buildings windows\nTwo well dressed hot dogs are sitting next to fries.\nA person on a court with a tennis racket.\nA sink sitting under a mirror and near some cupboards.\nA person riding a horse and another person petting the horse.\nA small engine plane sitting on a runway.\nperson walking their dog on sidewalk past cars\nA bath and a sink in a small room.\nA skateboarder doing tricks in the air at a park.\nA herd of elephants walking through a grass covered field.\nBlurry photograph of a cat jumping up from a chair\nA person that is about to throw a frisbee.\nsome sheep standing in the snow with one 
looking for food\nThe man squats down while surfing through a wave.\nA tidy hotel room that has two beds and a flat screen\ntwo females dressed in ski attire standing side by side in the snow\nA body of water with boats floating on top of it.\nThe pizza is next to a bowl of salad.\nA couple of boats parked at the wharf during the day.\nGuy in glasses using scissors to cut something in the room.\nA group of vehicles driving down a city street.\nA group of athletes are waiting to compete on a playing field.\nCars backed up several blocks in traffic on a city street.\nA small girl is sitting inside an open suitcase.\nA horse and buggy parked along a sidewalk near a wharf.\nA little boy wearing a baseball hat holding a baseball bat.\na person wearing glasses with a cellphone in their hand.\nA number of mannequins in a clothing store\nA couple is on the beach with a small child.\na white dog that is looking at a frizbee\nA very pretty blue city street sign near some trees.\nA group of men standing around a UK surfboard.\na cluster of blue flowers inside an orange peel\nA person riding skis down a snow covered slope.\nTwo parking meters on roadside and a road sign\nA sandwich with lots of french fries in a foam container next to a cup of dipping sauce.\nA dragon boat race with a bunch of people in the boat\na boy in a baseball uniform poses for the camera with his bat\nA group of soccer players on a field\nZebras are socializing in a pattern of three by three by one.\na kitchen with a table and chairs and a stove\nA person skateboarding in front of two statues of reclining women.\nTwo women play together in a tennis match.\nA toy sits at a desk with a beer.\nThe fire hydrant is across the street from a large building.\ntwo people playing tennis in front of a crowd\nA skier at the bottom of a slope, among coniferous trees.\na giraffe leaning over so it can eat some leaves\nA man sitting in a chair holding a glass in his hand.\nvery well made meal placed in a bowl\nTwo 
sandwiches in plastic wrap sitting on a counter.\nA teddy bear suspended in mid air as it rains down water on it.\nTwo hot dogs on a plate with a cup of coffee.\na man goes to hit a tennis ball with a racket\nWorld War II vintage fighter plane parked in a museum.\nBulldog riding a wakeboard on a body of water.\nA man flying through the air while riding skis.\nCollege dorm room with stack of newspapers, backpack and suitcase near bed.\nA bathtub sits next a window showing a ferris wheel.\nA couple relax and watch a wide screen television on the far end of a messy living area.\npeople standing around the table with 3 laptops on it\nA falcon sitting on a back yard BBQ grill lid\nA large cow standing in a grass field.\na tennis player is doing an overhead serve\nA giraffe stands by a tree in its habitat at a zoo.\nA horse stand behind a fence and in front of an old building on a snowy day.\nLiving room of residence with green couches and large bookshelf area.\nA person on a orange motorbike is on a track.\nTHERE IS A DISPLAY OF DIFFERNT DOUGHNUTS ON THE TABLE\nA large clock on a cloudy day in the city.\nA table topped with small and large metal bowls filled with veggies.\nAn airplane is flying high in the cloudy sky\nBowls of chopped carrots, onion and lettuce on a turquoise mat.\nGreen traffic light shown against a tall sky scraper at night.\nThree sheared sheep on grass facing different directions.\nA dining room and kitchen area with a glass table and gray chairs.\nA train that is sitting on the tracks.\nThe black and white photo shows a toilet and a bathroom sink.\nA round bed with lots of pillows next to a cat bed.\nThe three zebra are walking down the road.\nA plate with two slices of pizza on it with toppings.\nQuick Stop Groceries has many things besides groceries\nTHERE IS A RED STOP SIGN ON THE DOOR\nA mom giraffe escorting her newborn around a fenced in area.\nA guy is eating a huge slice of cheese pizza.\nA bird walking past a white car in a lot\nA man 
feeding a brown spotted giraffe over a fence.\nA dark colored beverage in a tall glass and a small bow of food on a table.\nA hipster standing between two surfboards while wearing sunglasses..\nA toaster oven and dish drying rack sit on the kitchen sink counter.\nBoats docked by a couple of city buildings.\nThe hula girl doll sits on top of the car dash.\nSnow piled up on and around a fire hydrant by a fence.\nA man and woman set a formal dining table.\ntwo trains on train tracks at a train station\nbaseball players in motion playing in a stadium\nA man is kneeling down in front of five surfboards.\nThe little boy is petting the giraffe whose head come over the zoo enclosure.\nA hamburger sitting on top of a tray on tissue paper.\nCars riding on the street across a train on the tracks\nA empty living room that has a table in the center.\na work desk with display with graphs, notebooks, and keyboard\nA little boy and a little girl laying on a cat shaped beany bed.\nAn airplane with four engines is on a runway.\nA brown curly haired dog chasing after a red frisbee\nA woman is swinging at a tennis ball on a court.\nthe baseball pitcher getting ready to pitch the ball\nA school safety sign lies against a piano.\ntwo people in a living area playing with a dog wearing a cowboy hat.\nA boy enjoying some sandwich or donut during the day.\nA residential bathroom with sink, toilet and curtained tub\na living room with a couch a window and a lamp\nA close up photo of a baby giraffe standing in the hay.\nSome hooks that are holding hot pads, a ladle, and a pair of scissors.\nA man is leaning out of a train.\nA group of people enjoying a meal at a table.\nA bathroom with a toilet, a sink, and a bathtub.\npeople walking on the sand of a beach shoreline beneath flying kites.\na woman talks on a phone in front of a slide out glass\nA small framed picture is hanging above the toilet\nTwo military men riding horse in the water along the shoreline of the beach\nA woman walking down 
the street with an umbrella\nSome highly cultural objects on display in this well lit room.\nA man plays an organ in an historic photo.\nA little girl with a snack laughing on a bench.\nA man hitching a ride on an elephant.\nA variety of old motorcycles on display in a shop\nAn open toilet seat next to a urinal.\nA close-up of pink and red flowers in a clear vase.\nA black and white photo of men shoveling rocks.\nA group of people on top of some horses.\nThree giraffes in a field with a fence\nA traffic sign is displayed on a street.\nA man using one hand to hold a skate board while performing a handstand\nTwo female tennis players on a grass court.\nA broken park bench in the middle of a grassy lawn.\nSugar donuts sitting in a white paper bag.\nA man sitting on a park bench next to a person laying on it with a dog.\nA row of auto-flush urinals lines the wall in this public restroom.\nA baseball player holds the bat while the catcher and the umpire stand behind him on a baseball diamond.\nA gray and black cell phone resting in a man's left hand.\nA lonely sheep standing in a field in front of a rock wall.\nA man and a woman riding on a motorcycle are getting ready to hit the road.\nTwo horses gaze out from among the trees.\nA man sitting on a couch holding a small white object.\nA bus going down the area next to the ocean.\nGuy with shades on taking picture of his hot dog\nThe skier is skier down the snow covered hill.\nA large bathroom with a large bathtub in front of curtained windows.\nA bunch of people walking around in a street\na guy riding his skateboard near the edge of a pool\nA mascot entertains fans as baseball players leave the field.\nSeveral motor scooters are jammed into a small market street.\nA microwave sits above a stove built into the cabinets.\nA bus going to Oakland in an empty lot.\nPeople watch a women's softball game from behind a chain link fence.\ntwo girls standing outside a building next to a large toothbrush statue\nA young man sitting 
on a toilet in a white bathroom.\nThere are two snowboarders in the air completing stunts.\nThe image shows a book digitally modified onto a tennis racket.\nA zebra eating hay scattered on the ground while another zebra lays in the shade.\nA man in an orange t-shirt rides a wave on his surfboard.\nA woman in a window either taking a picture or video taping something outside.\nA skier holds a ski pole in each hand.\nA large commercial plane sitting on a tarmac.\na man that is on a tennis court with a racket\nA person and some animals that are by some plants.\nA baby pulling themselves up to look at a laptop.\nTrays of snacks and a bottle of wine.\nFour guys are sitting around a table eating and drinking.\nA slice of cheese pizza on a plate with parmasean cheese on the crust.\nA cupcake covered in lots of white frosting.\na picture that's been sped up to show streaks of headlights and taillights\nA young black man sitting on a skateboard on a basketball court.\nA train is on rails over the ocean by a pier.\nA LARGE TRASH CAN IN THE SHAPE OF A SOUP CAN IS ON A STREET\nTHERE IS A PLATE WITH SWEET DESSERTS ON THE PLATE\na woman in a pink top holding a cellphone and a few other people\nA blue Hospital sign with an arrow pointing towards the Hospital.\nA man riding a motorcycle down a curvy mountainous road.\nA giraffe standing in  a valley of two small hills\nyoung boy with surf board in hand walking out to the water\nan image of a man with a tennis racket in hand\nThe meals are ready in their individual containers.\nthree yellow buses line up on the street\nA surfer rides a medium sized wave on the beach.\nBoy doing skateboard trick in air at a skate park.\nA person riding on the back of a horse drawn carriage on a beach.\nA large smart phone made the the NOKIA company.\nan image on the table with apples and oatmeal\nA cat is looking at itself in the mirror on the floor.\nA Frisbee team on a field being happy.\nA refrigerator with a variation of different magnets and 
photographs on the doors.\nA pond with lilypads and a frisbee floating in it.\nmilitary jets being prepared for a  mission\nA picture of some people posing for a picture.\ntwo people sitting at a table with laptops\nA group of young men standing around playing games on the Nintendo Wii.\nan image of a stop sign and yield sign\nA red trolley train is going down the tracks.\na small boy is playing with a remote control\nThe kitchen is has a stainless steal refrigerator.\na vintage photo of a bike parked next to a store\nA giraffe about to eat leaves from a tree\nA kitchen with a stove and a microwave.\nA young woman in a red skirt is waiting on a train platform with her suitcase.\nSeveral horses that are grazing in a field.\nA bathroom with a marble bathtub and a large sink.\nA parking meter with two cars parked beside it.\nGiraffe stretching its neck out to reach green leaves on a pole.\nThe large red city bus drives on a brick street.\nA man riding a skateboard up the side of a ramp.\na person riding a snow board on a snowy surface\nA street sign that is on the side of the road.\na woman holding onto a container as she eats a donut\nA kitten peeking out from of a pile of white blankets.\nThree zebras standing in grass with bushes and trees.\nTwo young girls in uniforms sitting closely together.\nA small boat is in the water and a red bench in on the dock next to it.\nsome giraffes standing in front of a white building\nTHERE IS A PASTRY THAT IS SITTING ON A PLATE\nA cat drinking coffee from a cup on top of a table.\nA plate covered with a meat and vegetable dish\na man performs a flip trick on a skateboard\nA sandwich and french fries are on a plate.\nA living room with wooden walls and furniture.\nfour giraffes basking in sunlight of enclosed area\na close up of a pot cooking broccoli\nA herd of cows standing in a field grazing\na group of people standing around a table covered with different containers of food\nA boy is cutting a string with scissors.\nA close up 
of an \"all traffic\" sign on the freeway.\nA large jetliner sitting on top of an airport tarmac.\nA man posing with a mouse and keyboard\nan image of kids playing on skateboards in the street\nA woman attempts to fly a butterfly kite.\nTwo boys and their mother playing with a kite.\nA man on a couch playing a video game.\na close up of a white toilet and trash can\nTwo teddy bears with a price tag on ear.\nA pan of food on the stove consisting of sliced carrots\nthis train is leaving the station on rails\nA jet airliner sits in front of the runway.\nTwo men play a game together using the Nintendo Wii gaming system.\nA couple of kids in skinny jeans with skateboards.\nAn open, lit-up, fully stocked refrigerator and freezer.\nPeople are getting off of a large bus onto a commercial airplane.\nA photograph of a white range and oven.\nA beach with people, beach chairs and umbrellas.\nA woman plays Wii while a man holds a martini glass beside her.\nA red firetruck is on a street near a brick building.\nA group of children with two of them brushing their teeth.\nA MUNI bus in San Francisco, parked next to a fountain.\nsome people and a red and black train engine\nA surfer rides a wave in this Michael L. 
Baird photo.\na tennis player swinging a racket at a ball\nA man wearing a medieval style helmet sits atop his motorcycle.\nA person is holding a flag in a gathering\nMany people near a river gathering around in a circle\nA couple of young guys at a skate ramp with their boards.\nA pair of kites fly above a statue.\nA person balances a large scale full of goods.\nA blurry picture of a man with black hair wearing a suit and tie.\nA man is walking out of a pizza restaurant with his pizza\nA town square with many pedestrians walking about.\na bathroom with a blue dustpan and broom on the floor\nA cat sits under an umbrella while indoors.\nTwo skewers of vegetables and broccoli  on a plate\nA large inflatable whale sitting on top of a beach.\na full view of view of a zebra and a head shot of another zebra\nA vase with flowers is displayed next to a handmade object.\nA woman serves a ball with a clock in the background.\nA man and women at a table eating, there is a baby in a stroller behind them.\nA man adjusts his tie while getting ready to go.\nan apple sticking out ot the side of an apple\na kitchen with a refrigerator and a stove\nA young woman is brushing her teeth at the sink.\nA man pitching a baseball from a mound on a field.\na cat is on the coach staring at a remote control\na cat sitting on the toilet looking at one on the floor\nA bunch of pedestrians walk down the street in the rain\nKeepers looking after a family of elephants at the zoo\nA red truck has a black dog in the drivers chair.\nThree people sitting on bench watching a train go by\na white train is coming down some tracks\nFour people on a ski slope preparing to ski.\nA male tennis player gets ready to serve.\nA road sign by a stone wall and dirt path.\nA sea lion on the rocks with an elephant's head photoshopped on it.\nTwo men sitting at a table with a pizza in front of them.\nSun shining through a window into a bathroom.\nA person on a motorcycle riding down a street.\nTwo dogs sitting at a 
dinner table enjoy food in bowls.\nA sign, on a sidewalk, containing directions to nearby locations.\nSeveral different donuts are placed in a tall bowl\na person on a skate board tries to do a trick\nWhite commuter airplane with blue tail on an airport runway.\nA train is parked by the sidewalk in the city.\nA man doing tricks on a skateboard with onlookers watching\nA person surfing on a wave in the water on a surf board.\nA woman in a towel combs her wet hair.\nA bench in a park covered in snow.\nThe vase is decorated with a colorful design.\nLady teach a class and uses her laptop\nA woman is playing a game of tennis.\nA baseball player swinging a baseball bat at a baseball.\nThe fireplug is the dominant element with the architecturally interesting building in the background.\nA policeman on a horse is standing across the street from a building.\nA bathroom has several items next to a sink and on top of a medicine cabinet with a door opened up against a glass-walled shower.\nThere are various utensils on the counter of a large kitchen.\na plate containing a bag and several pastries\nA stainless steel kitchen sink on a black granite countertop.\nA paper plate with a very large sandwich with a lot of condiments on it.\na cat rests its head and paw on a pair of womens shoes\nA cup is on a desk with a dog figurine.\nBaby at her first birthday party feeding dad cake.\nA living area with several chairs and a lot of color\nTwo children's miniature trains with conductor and mother and child.\nA horse is shown behind fences in a field.\nA white refrigerator freezer sitting inside of a kitchen.\nA man in black shirt holding a yellow frisbee by rocks.\nA man in blue shirt touching a cake with a utensil.\nThe woman in a pink shirt looks at the kite in the sky.\nA cake with thick icing partially eaten with a knife.\nUp close view of a plate with two well cooked hot dogs on it.\nClock and sign on church tower made of bricks\nTwo lanes of cars waiting at a traffic light.\nA 
herd of cows grazing on a hill by the road.\nA plate with food on it and an orange with a fork on the plate.\nA small train with writing all over it passes through an intersection.\nA yellow truck next to various cars in a warehouse.\nA miniature bathroom set is shown for a model\nA batter is throwing a bat and getting ready to run.\nA hot dog on a  bun covered in lots of pastrami and a pickle.\nA group of people by a bunch of bananas.\na boy and a woman in a competition with a motor bike\nA baby chews on a toy with a cat pinned under his leg.\nThe man is preparing to pitch the ball.\nA man preparing food on top of a large metal pan.\nA bunch of items that are on a table.\nA skier goes cross country consulting a sign\nA sunset scene with water, elephants and grass.\nTwo small sheep, one standing and one sitting, in a grassy field\ngray and white cat hiding underneath a toilet\na close up of a number of zebras behind a fence\nA baseball batter, catcher, and umpire await a pitch at home plate.\nA young man carries a black backpack and a blue suitcase.\nA cat lies on its back on top of a table with pink roses.\nA living room with a couch, a chair and a piano.\nAn open oven has lots dishes on the racks.\nA statue of a jalapeno on a fire hydrant.\nA bus is passing through a city intersection.\nTwo elephants standing near a small pool of water.\nA brown cloth covered table filled with stuffed animals.\nA railing in front of the beach with surfboards leaning on it.\nTwo zebras looking for food near a tree.\nTwo guys are playing some sort of video games.\nMany televisions are showing the same sunset picture.\nThere is a person looking at the contents of a refrigerator.\nA woman cutting a mans hair in a barbers chair.\nA boy operating a mouse and viewing a laptop.\nAn apple laptop with pens, headphones, books and various small items.\nA skier carves a path as they descend a snowy slope\nChildren playing in a soccer competition on a grass field.\nSmall girl eating pizza off 
a colorful plate on a blue table.\nA toilet in a restroom with a wooden toilet seat.\nAn orange on a counter next to a bottle of alcohol.\nBed and nightstand with blinds closed and doll sitting on pillow.\na kitchen with a brown dining table set and a potted plant on the counter\nLiving room with half circle window and furniture.\na table with two glasses and a plate with a chocolate dessert and a spoon\nA view of a gourmet style banana split.\na couple of zebras are standing in a gassy field\nA desk has picutres, cds, cups, and a dog figurine.\nA man jumping in the air with a skateboard\nA large outdoor clock with two faces and various designs and numbers on the faces.\nA little girl and woman standing near a birthday with lighted candles.\na train on a train track near a small river\nA man with a bright green tie with his arms around to boys.\na person snows boarding on top of a small hill\nA vase with flowers, cup, pitcher and mug sitting on a table\na woman holds out a stuffed bear to a man in a suit\npeople standing and windsurfing on boards in the water with trees in the background\nA group of people flying lots of kites in a large grassy park area.\nA colorful bus stops at a bus stop.\nA man riding a wave on top of a surfboard.\nA giraffe picture on box with some pizza.\nA couple of multi-colored lawn chairs sitting on a beach.\nA group of baseball players playing a game of baseball.\nA man posing with a horse in the shade\ntwo elephants are in a field together eating\nWe are looking at a delicious plate of banana walnut pancakes.\na public transit bus on a city street\nA bartender filling a long row of champagne flutes\nSeveral stacks of disposable cups sit in a kitchen.\nAssortment of toothbrushes in ceramic container in corner of counter in bathroom.\nA machine that dispenses tickets for some mode of transportation.\nA baseball player holding a bat standing next to a base.\nA group of sheep surrounded by three dogs.\nTwo people touch feet while sitting in 
chairs.\nA cat sitting on top of a book shelf filled with books.\nA person holding an open mobile phone and a camera.\n2 people outside on a snowy area snow boarding\nA man sitting on a ledge reading a book.\nA cat is standing on the back of a huge dog.\nWarning sign displayed in wooded lane on sunny day.\nFresh fruits, vegetables, and other foods are spread out on the table.\nA meter with a sign on it stating that the meter remains as a courtesy to cyclists\nA city bus moving down a city street on the sidewalk nearby\nTwo cows standing in a penned pasture near a log.\nA person does a trick on a skateboard in black and white.\nA zebra standing amongst tall, dry grass during the day\nA plate of food that is on a table.\nGloves, cell phones, brushes, ties, and ear buds are placed on the floor.\nA fork perched into shredded meat on the bread on the table.\nBrick fireplace in a white and brown living room.\nA person riding a snowboard on a snow covered slope.\nFour pieces of luggage sits on the floor.\nA man flying through the air while riding a snowboard.\nBlack and white photograph looking past traffic lights at an old building\nA girl standing in a boat resting her arm on an elephant who is passing by.\nA statue of a dinosaur, next to a bunch of flying kites.\nA baseball player tries to avoid a tag out play.\nA man walking down a road holding a black umbrella.\na close up of a small dog near a pair of shoes\nGuy in hoodie peeing in a bathroom toilet\nA parking lot with cars and motorcycles at walmart.\nA table topped different plates and bowls of foods.\nThe goose is curious about whats in the bucket.\nA very tall tower sticking out of the side of a building.\nA person that threw a frisbee in the air.\na man with white and blue on playing tennis\nA dog is looking out a large window.\na woman trying to fly a kite with no wind\nA bathroom with bathroom supplies is pictured in this image.\nThe man in black is moving towards a refrigerator.\nAn arrangement of food is 
displayed on a table.\na couple of cows are standing in a field\nknife cuts into a medium sized pizza on a plate\nA man sitting on the hood of a car talking on a cell phone.\na hotel room with a bed, chair and a window\nthere are many benches that line this park\nA commercial airplane being pulled across the runway by a truck.\nTwo baseball teams of young children playing baseball on a dirt field.\na woman and child checking out a display of food on an outdoor table\nA truck in the street near a person on the side of the road\nToilet design outside of the US with accompanying trash can.\nA person and a dog with frisbees in a park.\nA young boy is flying a kite in a park.\nMultiple trains sit on tracks that run through the city.\nA wooden carved clock tower with posts holding it up.\nsome people and the male is holding a baseball bat\nA small kitchen with dark wood cabinets and white appliances.\nA pick-up truck with a Christmas wreath attached to the grill.\nA refrigerator door left open showing the contents inside.\nA red fire hydrant between two flower boxes\nSeveral square pizzas are sitting on round plates.\nA plate filled with fruit salad and a melted cheese sandwich.\nA baseball player prepares to swing at a pitch.\nA woman sitting on top of a purple motorcycle.\nA coffee cup sits next to an open computer.\nRusted fire hydrant covered with bees in grass near road.\nA workspace with a laptop computer and desktop computer.\nA baseball player is swinging at a pitched ball.\nA plate of colorful vegetables and a cut of meat.\nAn orange fire hydrant sitting below a tall building.\nA chef is cooking food in the kitchen.\nThe zebras are grazing on the grass in the field.\nA person is holding a fork with pancakes on a plate.\nA man and boy in dirt field playing a game with frisbee.\nA man poses for a picture in a suit and tie.\nA cat drinking water from a bathroom sink.\nA train pumps out steam while going down a track on a cloudy day.\nA pitcher, a catcher, and a man 
up to bat.\na photo of a kitchen with a fridge, an oven and a sink\na flooded street with a street pole\nfive giraffes drinking water with a field behind them\nA woman holding a tennis ball and racquet on a court.\na person holding onto a banana with brown spots\na sign attached to a metal pole sitting in the grass\nA couple of trains that are riding in the rails.\nPeople taking photos of a public speaker with their telephones\nA snowboarder performing a stunt on a snowy mountain.\nThree women are standing in a kitchen cooking.\nA jet flying in the sky surrounded by smoke.\nA living room with hard wood floors covered in furniture.\nA close up of a pizza with spinach and parmesan topping.\nA girl in a green shirt and denim skirt cutting a cake.\nThis is a staggering picture show of people having a remarkable time.\nA close-up of a green apple next to other fruits.\nSeveral cows in a field with a train passing in the background.\nA street sweeper driving down a city street\nA young giraffe and an old giraffe outside of a building.\nFive uniformed players are on a baseball field near a crowd.\nA cat in a bow tie laying under a car.\nA desk full of desktop and laptop computers.\nA group of people on a horse carriage ride going down a street.\nA very close up look at a plate with some food on it.\nA plate of food that includes beef, broccoli and sauces.\nA large bird flying next to a tall building.\nA herd of cattle grazing on a lush green field.\nA frisbee in mid air with a someone below jumping.\na giraffe outside near a forested area and a lot of trees\nan image of several giraffes in a zoo\nfour planes flying in the sky in a formation\nA bathroom with a toilet and a counter next to a door.\nCat laying on the floor near some books\nA man in a wetsuit riding on a surfboard\nBoxes filled with donuts sitting on top of a table.\nA small and large giraffe are by a tall fence.\nYoung men playing frisbee in a grassy field.\nAn older black desktop computer running Windows 
operating system.\nA man is catching a white Frisbee on the beach.\nSeveral images of a surfer in various phases of going out for a wave.\nA vase of flowers are placed on a long table.\na person in snow gear walking through some deep snow\nThe sign for Spring St. and 6th Ave. is in front of a brick building.\nA couple of zebras are in a brushy field.\na male in a light blue shirt and a white frisbee\nsome people on a bank flying kites and water\nA boat is coming down the water near the shore.\na couple of pelicans sitting on some rocks\nA family is playing with the Wii together in the living room.\nThree giraffes are standing in the field spread apart.\nA couple of women chasing after a frisbee on a field.\nA plate of cheese bread next to bread sticks and wine.\nA street that has a bunch of cars and trucks.\nA pizza that is setting down on a table.\nSomeone's hand on top of a computer's keyboard.\nA man in the progress of getting ready for a wedding.\nA black horse standing inside a fenced enclosure.\nHands typing on light colored electronic computer keyboard.\nTwo beds with a nightstand in between them.\nA man wearing a tie holding his suit jacket over his shoulder\nTV in a cabinet with other furnishings around it\nMan sitting on the side of a van playing the guitar.\nsome elephants in their pen and in some water\nA wooden table topped with plates of food and fruit.\nA brown bear is walking in the woods by some bushes and trees.\nA kitchen with a refrigerator and some cabinets\na man and woman are sitting at a table with their food\nA man in wetsuit surgin on surfboard next to wave.\nA white bathroom sink sitting under a mirror.\nA grill holds meat and a wide assortment of vegetables.\nA group of women eating at a dinner table and conversating.\nA palm tree in front of a poster.\nIn a park, a man in a dress shirt sits on a rock.\nThe boy is skateboarding  up the ramp during the day.\nA laptop computer sitting on a desk in front of a window.\nCommode scene, 
probably commercial establishment, outside of USA.\nGirl with cake in hand looking at lit candles on it\nA living room with gold walls has a playpen and mounted television.\nA large cut pizza on a table with a laptop.\nA cat sitting on top of a television looking down.\nThe toilet has special buttons that help the handicapped.\nSeveral people film and observe children as they use iPads at school.\nA woman swinging at a tennis ball on the court\na guy jumping with a skateboard on a sunny day\nA mix of beef and broccoli covers rice.\nFour powder covered donuts on a blue plate.\nSheep grazing in an open grassy field.\nA woman sticking her tongue out and doing the \"shocker\" hand sign.\nthree muffins sitting on a chair with a bite out on one of them\nA tennis player with racket serving the ball\na couple of kids that are playing on the ground\nA tray covered in chocolate donuts on top of a table.\nSeveral cops on motorcycles parked next to a large group of people.\nA close up of a clear vase with flowers.\nA group of people throw a frisbee in a circle.\nA wooden bench sitting on top of a dirt field.\nAn alarm clock next to two people sleeping and a pillow.\nA laptop on a pedestal near a hedge.\nA taco salad sits on white paper near a table with a lap top.\nTeddy bears with barcode tags in a pile.\nA cramped bathroom with a yellow bowl on the back of the tank.\na very large collection of remote controls spread out\nA floor with lots of different items and a bag.\nThe motorcycle is parked on the side of a road near snowy mountains.\nA line of hawks wearing hoods on a wooden beam.\na hot dog with onions and cheese next to some french fries\nA low angle shot of Big Ben in the daytime.\na woman looking up at a banana tree.\nA motorcycle police officer leads a parade on a sunny day.\nA counter cluttered with many items, including a tea kettle, a pot, a food scale and more.\nThe city bus is driving through a street intersection.\nTwo woman sitting at a table eating food\nA 
wild trail with elephants and jeeps driving down a path.\nThe boy eats his large breakfast at the table\nA market has an array of fruits displayed in boxes.\na pair of scissors and eggs laying on a table\na close up of a person pulling food out of an oven\nTwo women in front of a television playing a video game.\nA green, red and blue bus parked on a street in a foreign country.\nHe is eating a banana while taking a selfie.\nA couple of benches sitting next to each other.\na busy street that  has a lot of cars in it\na person taking a photo in a mirror\nA few people are laying on a pull out sofa bed.\nVintage motorcycles sit on a tiled floor way in a shop\nA baseball player is getting ready to hit a ball.\nA piece of cake sits atop a piece of foil.\nA woman prepares to hit a tennis ball on a tennis court.\nA snow boarder taking flight while skiing down a slope\nA baseball pitcher in motion with the ball right out of his fingers.\nA big screen TV and a Wii gaming console on a rooftop.\nA large truck and a bus on a road.\nPolice officer on horse moving through city street.\na couple of horses that are tied up\na clock on a tower next to a building\nan old photo of some people in fancy clothes sitting on a boat\nA woman riding a aqua blue wave on a surfboard.\na red city bus coming through an intersection\nA bowl has a salad with carrots, red cabbage, and broccoli in it.\nA man soaked walking out of the water holding a surfboard.\nA glass bottle on a red surface with a red backdrop\nA bowl filled with mixed cooked green vegetables.\nan extremely long hot dog covered with ketchup and mustard sitting on a table\nA man with a dog is preparing to board a train with others.\nThe people in the homemade boat have a bicycle and a big green umbrella.\nA donut sitting in front of a laptop with black and orange sprinkles\na close up of a hot dog next to a drink on a table\nA man standing outside beside a bunch of fruit.\ntwo apples and a banana laid out to look like a happy 
face\nAn asian woman smiling while holding a cell phone.\na young broccoli plant in a garden bed\na train moving on the tracks next to a building on a hill\nVarious sizes and colors of tagged and bundled luggage.\nA bride and groom cut the cake at their reception.\na lady on the mountains in very warm clothing\nA guy leaning on the front of a food truck\nOdd plant in a vase on a tray with cookies.\nA view of home plate and to left field during a baseball game.\nAn adult elephant walks near two smaller elephants.\nThe floor of the bathroom is strewn with toilet paper.\nPerson of a surfboard riding a wave in the ocean.\nTwo bikes are sitting in the sand on the beach.\nA cat perched on a toilet using the bathroom.\nTwo people in cowboy hats riding bicycles in an RV park\nA jockey rides a horse through a course.\na blue and gray bus and a woman and buildings\nA person holding a tennis racket and ball getting ready to serve.\na large bus riding in the street outside a building\ntwo little teddy bears with peoples names in tags\na black bird flying above the water of the ocean\nA person riding a racing bike on a track with spectators.\nA surfboarder falling off his board as a wave hits\nA batter, catcher and umpire in a baseball game.\nA herd of sheep standing on top of a grass covered field.\nA group of kids at a table with a cake.\nA teenager doing a skateboard trick in front of a crowd.\nA small red bird perched on a branch.\nA man swinging a tennis racquet on top of a court.\nA woman sits on a bike holding a small gun as a man lies in front of her.\nmany kites flying in the sky with a street light\nA red food truck has a crowd of people by it.\nAn old model motorcycle parked outside a house.\nA bathroom with a glass shower door, toilet, bidet and sink, with a set of shelves\nA bed and a mirror in a small room.\nsome children are in a yard and one has a dog on a leash\nSeveral horses running down the track near a fence.\nA person is displaying the hot dog they are 
eating.\nTwo bears playfully fight and nip at each other.\nA busy city street has many red double decked buses on it.\nSkate boarder performing aerial trick on sidewalk with car nearby on roadway.\nA passenger jet that is on the runway.\nsomeone is holding in their hands a very old mp3 player\nA group of men on a field playing baseball.\nA cow in a fenced in grass area.\nThis truck has two yellow ribbons and says Freedom isn 't Free.\ntwo people playing on the ocean with a frisbee\nA man on a court with a tennis racket.\nThe zebra is drinking water from the pond near the grass.\nA white plate with two crab cakes and fries.\nA woman's feet who is wearing a pair of red heels.\nThe painting shows a parrot sitting on a branch over a river.\nA small child poses in his baseball uniform\nA fire hydrant in a weedy lot next to a street.\nA woman is riding a moped on the road.\nA stack of folded shirts sits in a darkened room.\nA small group of cows are grazing out in the pasture.\nBlack and white photograph of a train at the station\nPeople are standing in a street car covered in oranges.\nA brown bear licking the ear of another brown bear.\nA man wearing a beret while using a laptop computer.\nA little girl packs her luggage with toys.\na bathroom with a sink and a mirror in it\nWoman placing a dog on a white and yellow surfboard.\na small brown and white bird eating off paper plates on a table\nA couch and a coffee table is in a living room with a wooden floor.\nA teenage male is falling off of a skateboard.\nA couple of buses parked in a parking lot.\na desk has a laptop computer and monitor on it\nA computer is sitting on a messy desk with flowers.\nA large living room has a mini kitchen in the corner.\nA building with a clock built into it.\nA dessert consists of donuts and custard cream.\nA slice of cake and strawberries on white plate.\nthere is a cow along with baby cows behind a gate\nA pizza on a rack and a plate with noodles.\nPair of giraffes foraging in natural 
outdoor setting.\nA glass filled with pens and scissors and pencils.\nA black cat laying inside a bathroom sink.\na white van is on the back of a truck\nA wooden chair sitting on a sidewalk next to a tree.\nA slice of pizza on a white ceramic plate.\nOrchids are arranged in a glass bowl with table accents around.\na man that is standing under a tree\nA painting of waves upon an ocean with tall grass and gold flowers\nLuggage at an airport under a blue net\na very black dog lying on a courch\nA girl with pale skin wearing a hoodie holds up a toothbrush.\nA close of a fire hydrant painted red white and green\nA giraffe and several zebras out on the plains.\nA large tower with a clock stands in front of the cloudy sky.\na number of people in an open field with kites flying above\nSkier in the air on fresh powder snow.\nElephants are bathing in a river with three men.\ntwo large air planes on a run way\na little boy is holding up a cell phone\nA faded yellow and red train passes through the trees.\nA street scene with many cars and a bus.\nThree surfers stand in front of a wall facing the ocean.\nA young man standing on top of a field holding a baseball bat.\nA man sitting at a table with a laptop and looking off to the side.\nA cute little animal made out of oranges that is on a plate.\nThere are tombstones in the cemetery next to an old church.\nTwo stuffed animals sitting beside each other on a chair.\nA large truck is shown in a rear view mirror.\na fire hydrant near a tree in a field\nA dozen of glazed donuts in a white box.\na zebra grazes on some vegetation next to a fence\nA car is driving down a city street.\nBOY TAKING A GIANT LEAP ON A SKATEBOARD IN FRONT OF ONLOOKERS\nA man and woman on beach with three surfboards.\nA man surfing, with a vegetated coast in the background.\nA large circular clock near a body of water.\nA tall stop sign next to the road near a red fire hydrant.\nA young girl is holding the reins to a small horse.\nA man on a skateboard with 
his friend talking to people.\nA bear eating a piece of food in rocky area with hay.\nThere were a flock of sheep walking down the road together.\nA wedding cake and cupcakes on a table with knife.\nA table has some old fashioned computer type equipment on it.\nA desk with a laptop and desktop computer.\nA man with a baseball bat that is standing in the dirt.\nA hotel room with a neatly made bad and lamps on the bed stands.\nA modernly styled hotel room has a bed that appears to float off the floor.\nA couple of bannanas and cards for sale\nA herd of horses standing in a dirt horse coral.\nA Not A Thru Street signed hung up on a tree.\na blue chair is in front of a desk\nA woman preparing to serve a ball thrown high in the air.\nA toilet and a urinal with male and female signs.\na person wearing pants surfing on a white board\nA grassy field with different colored umbrellas on the grass.\nA outdoor cafe with many people chatting and eating.\na group of young people getting ready to go ski\nA large group of people are on a field flying kites.\na dog sitting in a truck with its head out the window\na cat laying on top or s shelf in front of a window\nA bunch of wooden desks sitting inside a classroom\nA man and woman stand holding tennis rackets with a young boy.\na large air plane flying in the sky\nA kitchen area with dining table, refrigerator and sink.\nA wood plate with several yellow rolls on it.\nElephants are drinking water from a small pond.\na close up of a dog laying under a table\nA ginger cat lounges comfortably on a bed.\nA man standing behind a camera on a grass covered field.\nA couple of men kayaking in a flooded park area through a gate.\nA woman scratching flakes of fecal matter off of her buttocks.\nA number of wine glasses and a cup on a tray\nA bicyclist speaks to two police officers on horseback.\nA construction worker standing on dirt near a fire hydrant.\nA young child in a field of grass holding a baseball bat.\nWeightplate with me 
investable sitting on top of the table.\nThree colored toothbrushes standing in a glass holder.\nCuff links on the sleeve of a man wearing a business suit\nTwo giraffes are standing under a tree back to back.\nFresh fruits are stacked and arranged in colorful rows.\nA table with plates, silverware and an electric grill.\nThe ski jumper is concentrating intensely on his target.\nA man is walking while using his cellphone.\nA man riding a surfboard in the water\nA person on a snowboard rides on the hill.\nThis is some fine dining courses on nice plates.\nA person inside of a house using a computer\nHorses and goats are grazing on the open terrain.\nTwo boys in jackets and hats ride horses together.\nA person has fallen off a surfboard near a large wave\nThe young boy is playing in a baseball game,\nMan on skis on a downhill course after a fall.\nA rainbow siting below a lot of clouds near a field.\nA person holding a controller pets his cat.\nA white toilet and broken mirror in a side yard.\nA man riding a wave on a surfboard in the ocean.\nZebra grazing on grass in outdoor enclosed area.\nA sea plane taxis across the water in a large lake.\nA microwave oven sitting on top of  a counter.\nTwo pillows on a bed next to a window.\nStuffed toy bears on display through window setting.\nA white toilet sitting in a restroom with a open lid.\nA dog is laying on the ground with a frisbie.\nMan with a backpack using a urinal with against a tiled wall.\nA trio of images of food including bell peppers, watermelon, milk, and chopped meat\nA close of up oranges with people standing around fruit stands.\nthere is a surfer that can be seen in the water\nThis is a cut up potato on a cutting board with a knife on top of it.\nA baseball player swings his bat after a hit.\nA baseball game in progress with the umpire calling a play.\nA man standing in a room with something in his hand\nAn adult goat standing beside its baby goats in a grassy area.\na teddy bear sitting on a wall next to 
an old stone house.\nSeveral skiers congregate around a slope at a ski resort.\nLittle boy looking out over a calm body of water\nA young child is asleep next to her mother.\na hummingbird eating from a little bird feeder\nQuesadilla for breakfast with a friend at a restaurant\nA bird is posted on a rock by a lake.\nAn empty roadway between two rows of buildings.\nTwo children stand on a porch with toy tennis rackets.\nA cake depicts a laptop, mouse, and latte.\nA blue bus waiting for passengers at a stop.\nA kitchen with multiple counters and various appliances.\nA brown bear pup running across a grassy area.\nThe men are celebrating at a formal dinner with one wearing a paper crown.\nA smoothie is pictured next to several fruits and vegetables.\na group of navy jets slying together in a line\nThe huge airliner is flying next to the clouds.\nA blurry image of a man in a room full of pots on tables.\nA bunch of people with some wearing headscarves are flying kites and pulling a panda bear balloon.\nA kite laying on the ground surrounded by people.\nA baseball player in a white uniform holds a bat over his shoulder as he stands near an umpire and a catcher.\nA desktop computer sitting on top of a wooden desk.\nA young girl looks through the eye holes in a pizza.\nA mirrored bathroom with a good hair dryer.\nSteak and crab cakes served with grilled peaches.\nA professional skateboarder leaps over a bunch of over skateboards while a crowd watches.\nA variety of Asian foods sit on a table.\nA girl standing under a white and black umbrella.\nA man is riding a surfboard in the ocean.\nA surfboard stored on a rack at the beach with people in the background.\nA man leaning on a pole on a sidewalk in front of a store.\nA street sign sits at an intersection near a store.\nA red train traveling past a three story building.\na bathroom filled with a sink, toilet and hardwood floors\nA few trucks at night with their headlights on\nA skateboarder takes a leisurely run down a 
city sidewalk.\nA boy riding a skateboard on a sidewalk in an open courtyard.\nA cat playing with a cup that is on the floor\nA man is covered with four cats in bed.\na clown, teddy bear and troll doll for sale in a store.\nsomeone rolls a pizza cutter over a small pizza\nA wireless computer mouse with a computer in the background.\nA child reading a book next to a dog that's lying on the ground.\nThere is a man sitting on the couch next to a woman but he has three neck ties on.\nThree lamb in a pen, some of which have been sheered.\ntwo young people playing in a house one is posing with a stick.\nMan sitting at a picnic table near the beach with his lunch.\nA cat on a leather chair next to remotes\nA camera and tripod is shown with a laptop.\nHundreds of birds soaring through a cloudy sky.\nA jumbo jet is just taking off form the runway.\nA woman puts something into a stone oven.\nA airplane that is flying over a runway.\nA child flying a kite on a sunny day.\nA counter and refrigerator in a small kitchen.\nSurf boaders preparing to head into the ocean.\nAn airplane is on a runway near a passenger ramp.\nA food entree is served on a plate.\nThere are different types of Italian food in the picture.\nA man leaping to hit a tennis ball with a racket.\nA beautiful woman inspecting a small brown dog.\nA fancy bathroom with clear shower, toilet, and mirror\nTwo boats floating on top of a river next to a  rock mountain\nRed pickup truck carrying a sign it its truck bed.\nA set of windows with a red farm house in the view and green grass on the ground.\nA woman sitting at a table with a plate.\nA soldier dressed in white on top of skis.\nA young elephant by a pool of water in a zoo enclosure.\na split picture of two tennis players swinging at the ball\nA police officer and police horse directing traffic.\na black keyboard and a power strip and cords\nA young woman looks over her shoulder as a sky lift takes her down the mountain.\na person riding  a wave on top of a 
surfboard.\nFour men playing with remote controlled dog toys.\nA man teaching his child how to ride a skate board\nBaseball player at the plate in the process of swinging at a ball.\nA t-shirt has been put onto a stuffed bear\nSmall cat sitting on top of a table looking at a television.\nA cat with a collar sitting on a laptop keyboard.\nTwo square pizzas sitting a grill with cheese.\nA baseball player is starting to run to first base.\nFour people who are all wearing snow skis.\nTwo kids laying down propped up on pillows.\nthis giraffe is going for a walk in the grass\nA man is surfing on a surfboard, catching a big wave.\nThe purple and pink flowers are in a vase.\nTwo baseball players walk near another player from the opposing team.\nA person performs a jump in the air on a snowboard.\na close up of a plate of pancakes on a table\nA giraffe standing next to a wooden pike fence\nA woman is posing in front of a giraffe.\nA woman sitting on a couch near a dog\nA very sleek, clean and dark modern kitchen.\na baseball player is running down a field\nA person on a snowboard in the snow.\nA plate on a table is filled with carrots and beans.\na toilet on the ground outdoors near a bath tub\nThe contents of a purse are on a table.\nA vodafone sitting on a table next to a Mac laptop.\nA mismatched bathroom includes a center shower pan.\nA white toilet sitting next to a window and a sink.\nThree giraffe standing next to a fence under a lot of trees.\nThere is no image here to provide a caption for.\nA table holding a group of fruits and vegetables in bags and crates.\nA tour bus with advertisement on the side of it\nPeople sitting at a table with multiple servings of food.\na woman in a black top on a motorcycle and a male on a bicycle\na red plate that has a piece of chicken with some veggies on it\nbirds standing on the edge of the ledge by the water\na large clock on the wall above a radiator\nmany people riding skis on a snowy slope\nA hat that is on top of a shelf.\nA 
keyboard and mouse on the ground in a room.\nA woman holding a suitcase on a dirt road.\nA couple present a birthday cake with three candles.\n5 very people posing for the camera over some drinks.\na bunch of motorcycles are parked tightly together\nTwo monks with umbrella standing on a pavement\nA man carrying a surf board out of the ocean.\nA nurse administering medicine to a patient in a hospital.\nA white plate that has various types of vegetables, meat and food items on it.\nA set of cutlery and personal items lined up on a table.\nAn orange kitten on the green couch by itself.\nSkier in red jacket stands on top of a large mountain\nplacemats are on top of a counter in this kitchen\nthere is a man sitting in his truck next to a surf board\ntwo trains on a train track at a train station\nA elephant standing next to two men near a stand.\nThe bathroom is clean and ready to be used.\nA railroad train letting off a big black smokecloud\nTwo birds stand beside each other outside a green door.\nA person standing in the snow near a snow board.\nA single engine plane out front of buildings\nclose up of a red vase holding sticks\nA old picture of workers building the railroad.\nA large leafy green salad in a silver bowl.\nA pre-made cold sandwich is in a cooler with drinks.\nA man is on a saddled horse with reins.\nA woman sitting on top of a bench with large breast.\nA couple of street signs that got wrecked from a car accident.\nA horse with a cover over it being carried along by a woman.\nA woman that is sitting outside on a bench in the snow.\nTwo giraffes standing on a grassy plain with mountains in the background.\nGroup of folks playing bowling on Wii sports\nA person goes down the slope covered with snow.\nA PICTURE OF A BASEBALL PLAYER PLAYING BASEBALL\nA table topped with vases filled with flowers.\nThree sheep are grazing on grass by trees.\nA man standing in falling snow at night holding on to a snowboard.\nA man running on beach with a surfboard and 
mountain in the background.\nA woman sitting on a white bench with her dog.\nA wooden table topped with plates of food and drinks.\nA close-up photo of a propeller plane in flight.\nA baseball player is getting ready to swing the bat on home base.\nAn accordian sitting on a toilet in a bathroom.\nA child holding a teddy bear while outside.\nA train that is driving by in the day.\nA baseball player has just hit a ball.\nThe train has stopped on the railroad tracks.\nA pair of shoes that are under a bench.\nmany horses st a horse stable with people walking by\nA very nice looking pizza with assorted toppings.\nA woman in a dress carrying an umbrella\nThe vase is filled with multiple pink flowers.\nA photo of a horse race from inside the stands\na number of boats in a large body of water\nThree urinals are hanging from a marble tile wall\nsome kids in a bedroom with a lot of beds in it\nA surfer dressed as Abraham Lincoln rides a wave into the beach\nA clock sitting next to a large tree near a building.\nYoung girls with backpacks are standing near stairs that look to go to the subway.\na young lady in her room looking out the window\nA couple of people playing a game of tennis.\nA frame has six pictures, two with a horse.\nA living room area with tile flooring and a man sitting int he middle of the room on a chair with a remote control in his hand, while looking at a television.\nYellow box truck parked on busy street in city.\nA man is cutting a sub sandwiches while a lady put a vegetable in the bag.\nA dog chasing after a Frisbee with green grass in the background.\nPeople standing around two cakes and plates on a conference room table.\nA woman laying in a bed next to a cat.\na sign on a short pole nest to some little trees\nA gray and white cat sleeping inside of a luggage bag next to clothes.\na person in the air with a skateboard at a skate park\nA person holds an umbrella in their hands.\nA woman is making homemade pizza at a table.\nA table topped with boxes 
of cupcakes and a sign.\nA woman in a white shirt cutting into a cake in front of a television.\nMan surfing on a surf board on water.\nTHERE ARE A LOT OF VEGETABLES ON TEH STAND\nthree elephants in a green field and some clouds\nMotorcyclists gather at an event with their bikes.\nA book and a pillow with a face lie on a blanket.\nThe group is gathered around the table to eat their meal.\nTwo breakfast meals on a table at IHOP.\nA train on a track pulling into a station.\nA dog enjoys chewing on a carrot in the living room.\nA black and white themed bathroom with two toilets.\nA food tray with french fries and a sandwich.\nA young man prepares to hit a ball with a plastic bat.\nA train going along a track near apartments.\nA large glass table topped with different types of plants.\nTwo men in the park playing with a frisbee.\nBathroom stalls with trash on floor in commercial business.\nA woman holding a tray with a chocolate covered pastry.\nA toilet with buttons or a remote control.\nScreens and small stuffed animals on a computer desk.\nA pantry area next to a large white fridge.\nPair of skiers on snowy slope at sunset.\nA garbage truck is emptying a plastic garbage can.\nA man is bicycling down a street with a passenger standing on the back.\nA group of people are skiing down the mountain.\nA game controller in a persons hand over a couch.\nTwo horses are pulling the covered wagon through the snow.\nA man on skis comes down the slope\na person standing over a squat down toilet\nA group of four people playing croquet on a lawn.\nA small airplane is flying against the blue sky.\nA bouquet of flowers sits in a vase on a desk.\nA person is in the rear view mirror of a motorcycle.\nThe man smiles while walking with skis down a grass slope.\nA walk in shower in a dilapidated bathroom.\nA young girl holding a tennis racket upright\nA commuter train pulling out of a suburban station.\nA person using a pair of scissors to work on a garment.\nA man taking a big bite of a 
hot dog.\na bus showing domestic animals moving along the street\nA couple of cows that are penned up for safe keeping.\nA close-up of a black cow in front of a metal fence.\nCheerleaders are riding atop a trash truck turned float.\nA bunk bed sits next to an open window.\nA very big pretty bird in the water.\na fire hydrant with the word hydrant written above it\nA man holding a remote in front of a garage.\ntwo boats are idly floating on a lake.\nPeople looking at a group of giraffes in zoo.\nA baseball player crosses home plate as his teammate waits.\nFour airplanes are flying high over telephone wires into cloudy skies.\na man bent over a sink while brushing his teeth\nThree wine glasses and a glass bowl are on the top of the refrigerator.\nThe woman is walking with a pick umbrella.\nA woman on a transport bike waiting for customers.\nTwo double decker buses sitting on top of a parking lot.\nThe Ansett-ANA airplane is parked on the lawn.\nA clean bathroom with a toilet and shower.\nA person holding up a smart phone to take a picture.\na girl sitting at a table with several plates of food around her.\nA little girl trying to feed two giraffes through a netted barrier.\nA woman is playing with a Frisbee on the grass.\nA passenger train leaving the train station that is now empty..\nPeople walk near the many parked tour buses.\nA teddy bear sitting on a table in front of a computer.\na number of people sitting and standing near a building\na man is eating fruit from a bowl\nTwo cows roam and graze among trees and shrub along a mud path.\nA table with three each of three different kinds of pizza.\nThere is a taxing rolling through a wet street\nA train covered in snow on top of train tracks.\ntwo elephants in tall bushes and trees in the background\nConstruction is being done on a street near businesses.\nA man looking in a toilet under a sink.\nA person is standing in the middle of fruit.\nA chili cheese dog on a plate with a bag of corn chips next to it.\nA woman 
standing on top of a sandy beach flying a kite.\nThe man and woman are decorating the vases together.\nThe men are enjoying a meal together by the window.\nA street corner with a sign and a person riding by on a bike.\nA television in a living room with a doughnut logo on it.\nA kitchen has old white cabinets, and rice on the counter.\nA yellow and black fire hydrant on sidewalk next to building.\nA window in a kitchen with a red shade is shown.\na nice stove that is inside of a kitchen\nA bullet train on rail tracks in the open country.\nA yellow bus on street next to a building.\nA man flying through the air riding a skateboard.\nA donut on a plate in a microwave oven.\nA display case in a bakery filled with lots of donuts.\nA snow boarder performs a jump on a ski slope.\na giraffe standing next to a tree with one with it's leg in the air\nA person with a baseball bat on a field.\nA white toilet in a black bathroom with a phone on the wall.\nThree children dressed in \"Sunday school\" clothes posting for a picture.\nFans pose with stuffed animals at an ice rink.\nA tennis player is lunging forward after hitting the ball.\nThe toilet is clean and ready to be used.\nA rocking chair sitting near a fire place.\nFour men stand behind a couch playing a video game.\nThere is a mirror with a reflection of a train in it.\nA her of zebras in the watering hole with a giraffe in the background.\nA boy performing a trick with his skate board.\nA refrigerator that has a plant on top of it.\nThis is a broccoli carrot soup with a lot of broth.\nA man wearing a black shirt and a purple tie.\nA man riding a wave on top of a surfboard as he flies through the air.\nOriental umbrellas at a food court in a mall.\nA picture of some food in a plate.\nA person with an umbrella stands in front of a bench.\nA little baby laying on a fluffy blanket.\nA dog with a frisbee in his mouth in the back yard.\nA surfboard is decorated and sitting in the sand.\nA group of CGI people standing on a 
hillside flying kites.\nAn orange and white cat sitting on a wood seat by a bed.\nA white plate topped with a pile of food.\nTwo women prepare various vegetable dishes in a kitchen.\nA women riding a bike with an umbrella.\na piece of cake sits on top of a plate\nA group of girls sharing a pie each with a fork.\nA cat sitting on the back of a motor bike.\nA man placing some flowers inside a vase.\nWoman riding a horse on an asphalt road.\nTwo pug dogs dressed in green bow ties and green top hats to celebrate St. Patrick's day.\nA young child is standing in a room with toys on the floor.\nA white van is covered with graffiti as it's parked near a curb.\nA man is riding his 103 labeled bike on the road.\nA young boy holding two skis poles on top of a snow covered slope.\nTwo people on bicycles and a dog crossing by barrier on a street.\nThree different trains stopping at a train station.\nA plate of vegetables is set next to some sauce.\nA motorcycle cop on a city street tries to look cool.\nA man and a woman enjoying a meal of sandwiches.\nA man exits the huge boat parked by the beach.\nThe dog is laying down on the grass outside.\nMan flying kite in open field near RV park.\nBoy skate boarding on cement ramp at night.\na bathroom with a tub, counter, mirror and small mosaic tile\nThree people skiing in single file in the snow.\nthere is a fried crab inside of a small bowl\nA group of animals standing next to each other.\nSeveral Southwest airlines planes sitting on the runways.\nTwo dogs on a beach surrounded by grass.\na couple of kids sit at a table with some cake\nOrange puffy dog standing in the light on a tile floor.\nA herd of sheep grazing in an open grassy field.\nA stir fry consisting of rice, broccoli, and other vegetables.\na man is taking a selfie in the mirror\nPeople are skiing down a snowy hill.\nA bus is headed under a pass way on a foggy day.\nFive girls in the frame playing soccer, one has the ball.\nA colorful train winds through the valley of a 
mountain.\nA statue of a man not far from a large clock.\nA young boy in a cluttered rec room playing Wii in his pajamas.\nA person riding a skateboard down a handrail\nHorses cows and sheep are led down a dirt parking lot.\na bear that is sitting on a very large rock\nTwo baby horses playing together in a field\nA male tennis player about to return a tennis ball.\nA kitchen that has a stove, refrigerator, and table in it.\nA smiling man with a box of donuts is handing a donut to a girl as two other young children look on.\na person riding a surf board on a body of water\na person is drinking a beer and eating food\nA white toilet with a clear toilet seat.\na bath room with a toilet and a towel rack\nA group of snowboarders in the snowy conditions\nThree fire hydrants in front of a huge building.\nSports team playing baseball on a ball field.\nA couple of women holding up smart phones in their hands.\nA night scene with a lit street sign, \"Fremont St. Experience.\"\nThree cows stand at the top of a grass-covered hill.\nThe front and back cover of a book.\nA busy street with people walking by a train station.\nA man in a top hat and a woman with glasses.\na bath room with a toilet and a sink\nA pair of scissors and crumpled paper sitting on a table.\nA plate with a piece of food next to a pile of cheese broccoli.\nA cow and a calf are standing in a pen.\nA grown and a baby elephant are in a sandy area\na nice fast green motor cycle in the sun\nA ELEPHANT IS IN THE WATER RIGHT NOW\nYoung professional looking man with a tie and cardigan\nA red train with a bike painted on the side.\nA man ordering something from a milk truck.\nA man riding a skateboard through the air over a ramp.\nA red double bus sitting on top of a dry grass field.\nA police officer rides a motorcycle with a side car.\nA tennis player lunges to hit the ball.\na delicious looking sandwich on a plate with a knife\nThe cowboy at the rodeo is trying to rope the calf.\na street sign for Peepee Falls 
street above a stop sign\nA dog is sitting on a piece of wood.\nA women in a sunhat and sunglasses posing beside a bilingual English-Arabic stop sign.\nA girl stands while talking on a cell phone.\nLiving room with a table, couch, and a lounge chair.\nA young boy is in the park holding a kite.\nA green and white van full of signs written in spanish\nA child laying in crib with teddy bear.\nA man sitting in front of his birthday cake smiling.\nSome apples and strawberries are on the plate.\na fire hydrant sitting undeneath trees covered with toilet paper\nA punch of different shots of a man in the air.\nThis is an image of a laptop computer\nAn animal eating from the ground near a beach.\nA man standing up holding wii controllers in his hands\nA group of people holding while glasses posing for a picture.\nA store is on a city street near a traffic light.\nThere is a military plane that is parked on the tarp\na black and white cat looks out the window\nSome type of cheese casserole enclosed in parchment paper in the oven.\nA man leans down and picks up a flying disc.\nA large kitchen with many brown cabinets and brown flooring\nFeet wearing red tennis shoes stands next to a white toilet on a tile floor.\nA bed in a bedroom next to a table with a lamp.\nA laptop sits on a pad on a desk.\nA tennis player bounces a ball before a serve.\nA young boy tosses a tennis ball into the air in preparations to hit it.\na couple of people walking on a highway\nA man cooking vegetables and sausages on a grill\nAn assortment of miscellaneous gadgets spread out on a table.\nA white dog is lying down under a chair in sand.\nAn cat sits on the sill of a dilapidated window.\nA happy woman engaging interaction with her laptop\nA clock tower made of bricks outside when it is not so bright.\nThree giraffe standing next to a brown stone building.\nThe corner of a kitchen showing a dishwasher, sink and household items.\nLarge assortment of decorated vases on shelf on display.\nA surfer rides 
in on her stomach and a gentle wave\na couple of elephants are standing in a field\na group of weird looking vegetables sitting on a table\nThere are people with a man holding a Frisbee on the grass.\nPizza with side salad and glass of wine on display on table.\nA man is talking a picture of a man on a skateboard.\nA black boy playing tennis at a tennis court.\nthere is a man on a skateboard doing a trick\nBrown, black and tan cows grazing on grass in an open field.\nA man sitting with two ties on.\nA large Italian dish on a wood block\nA woman wearing a pair of glasses on top of her head.\na man is holding a tennis racket and a ball\nA horse drawn carriage coming down a city street\nMen talking to monks sitting down at an airport.\nA baseball batter striking a ball at a baseball game.\nA person sail boards in a lake with hills in the background.\nTwo people sit on a city train while checking their personal items.\na long narrow bathroom with a dirty tub and blue and white walls.\nA stop sign in front of the water on a bridge.\nTwo beers sit on a table between bunk beds.\nA pan on a table with lots of pizza.\nBowl of pasta with chicken and broccoli with bread and cheese.\nA dirty show floor in a very small bathroom.\nA yellow school bus negotiates and intersection in a city.\nA large room with much seating available.\nA young boy riding a pedal boat at an amusement park.\nA man talks to a young boy who is wearing skis.\nA little boy plays outside with his ball.\nA table with a lamp on top of it next to a couch.\nYoung man in orange jersey swinging a baseball bat.\nA man is sitting next to a Christmas teddy bear.\nTwo zebra standing next to each other on a dry grass field.\nA person wearing a tie posing for a photo.\nSmoke billows from the back of a yellow and blue fighter jet.\nA dog is standing in the middle of a rug wearing a green tie.\nA couple of people are walking their horses.\nA cat that is laying underneath an umbrella.\nthree baseball players holding up 
bats on a baseball diamond\nA man on a scooter doing a trick in the air\nA vase and glass with decorative paintings on them.\nThree friends pose for a picture while dining.\nA woman plays a video game in a living room.\nA man installs wood cabinets in a kitchen.\nA picture of a old water pic machine.\nA man lugging a red bag of luggage down a sidewalk.\nA man and a woman seated on a motorcycle, leading a line of others, also riding on motorcycles.\nA man stretching out yelling while catching a Frisbee.\nAsian man in glasses holding two colorful mobile phone cases\na number of people standing holding umbrellas near a building\nA couple of lawn chairs sitting under a white umbrella.\nA person is holding up a carrot in a kitchen.\nA close up of a blue vase with flowers on a table.\nA cow eating grass by a house next to the ocean.\na person who's going down a snowy slope.\nA kitchen filled with metallic appliances sitting next to a stove.\nsome sliced up orange peels sitting on a counter and bowl\nA wood room with some tools on shelves\nA little boy seems fascinated by this silver fire hydrant.\nA bathroom with a white toilet next to a sink and tub.\nA bicycle chained to a beached boat on a beach.\nA man skiing on a slope while people watch.\nAn old man in the middle of his kitchen.\nPeople walk on the beach, with a hut in the distance.\nA crowd gathers outside of an outdoor bar.\nA man crossing a busy intersection near train tracks.\nThe man is fixing his two skies so the shoe will fit.\nA crowd of people with umbrellas standing near a train.\nA dog in a field looking up while wearing a hat.\nA black and white cat relaxing inside a laptop.\nA kitchen is shown with an oven and stove.\nTwo men ride a bicycle contraption with a big load of bananas.\nA large motorcycle is on display at a gathering of people\na baseball pitcher ready to throw the ball\nA coffee cup sitting on a counter in front of a TV with the show 24 playing.\nA bathroom with a shower sink and 
windows.\nPeople carrying surfboards walking down a sidewalk during the day.\nA bus sits parked at the curb on an empty street\nthis man is riding a wave on a board\nA baseball player stands in front of advertising signs.\nFruits, vegetables and a carton of eggs sitting on a table.\nThree giraffes standing idly in a dry field\nA man in a suit is holding a glass of Champagne.\nStreetlights in front of a brick building in some downtown\nThe working and kitchen area of a dorm room\nMan in a green field standing behind a red Frisbee in the grass.\nA surfer in a wetsuit catching a breaking wave.\ntwo dogs standing on a checker board printed floor\na close up of a small dog near a car\nTwo cows stand next to each other inside a corral.\nA young child lays in bed with a bunch of different books.\nTwo birds walk in the surf along the beach.\na black and white picture of a blue fire hydrant.\nThe small bathroom has a beige toilet in it.\nThe woman is showing the child how to feed the giraffe.\nTwo brown cows standing in some tall grass.\nA young boy touching a small frog that is sitting on an orange frisbee.\nA herd of cattle walking down a country road.\nThree motorcyclists riding down the road on a curve\nPeople are waiting on the station platform for the train to stop.\nSeveral zebras standing in grass during the day.\nA man riding a motorcycle while talking on a cell phone.\nA tennis player has just hit the ball.\nThe toilet is sitting in the brown colored bathroom.\nA print ad for the Pizzeria La Crescia.\nA desk with a laptop, monitor, keyboard and mouse.\nCute picture of white cat snuggled near older dog.\na close up of a remote control pointed at a tv\nA hot dog on a plate with lettuce.\npeople skiing down a hill with no poles\nHorse-drawn carriage moves along street carrying two passengers\na man that is throwing a frizbee in the woods\nA cat is perched on top of a parked car.\nA couple of street signs sitting next to tall buildings.\nA young woman with an oar 
paddling on a surf board.\nThere is a bowl of fresh fruit on the table.\nA young boy is sitting in front of the oven.\nA wall mounted black oven next to a counter top.\nA man trying to manoeuvre through violent waves as he surfs.\nA woman sitting on a bench holding a kite of a bat.\nA train traveling down train tracks next to a small building.\nA boat sailing on a beautiful lake during the day.\nOlympic skiers are competing in a cross country event.\na train traveling along tracks near a lush green forest.\nA man using a snow board holding a giant fake axe\nA tennis player swings his racket to return a ball.\nA plate that has a sandwich and french fries on it.\nA elephant and a brown elk in a field.\nA neatly organized room with a bed and stuffed bear on it.\nA pair of hands preparing a sausage dog on grill.\na group of guys on the soccer field playing in front of a crowd\nA group of people skiing in a ski race on snow covered ground.\nAn electric commuter train at a well maintained station\nA motorcycle sits on a sidewalk near a city street.\nA dell lap top and an apple laptop side by side on a counter\nA man driving a two horse wagon team.\nA kitchen with a stove top oven next to a kitchen counter.\nA man in a shirt and tie motioning with his hand.\na table that has a bunch of stuff on it\nA large airplane flying in the blue sky.\nTwo pieces of flat round bread laying next to each other.\nA boy sits on a brick wall while holding his skateboard.\nA small room features a microwave and a mirror.\nA man wearing a fedora talking on a cell phone.\nA man holding a plate of fresh pickles up.\nA group of zebras gathered and a wooden shelter to get out of the sun.\nA man cleans his surfboard with a cloth\nThere a man and woman standing on the beach.\nA laptop sitting on a desk near a cellphone, mouse, keyboard and monitor.\nA blue chrome motorcycle with a dark blue seat.\nThe tennis player in the green Nike shirt has a pained facial expression.\nA restaurant called the 
library bar and grill\nThis bathroom has wall paper on most of the wall and wall paper on the bath tub.\na child in a wagon with many green apples\nA couple of people that are playing a game.\na traffic light next to a street sign\nTwo buses, one blue and one red and white, are going to different destinations.\nA man climbing up the side of a black pole in a park.\nA red and green plate holding a pink cake with frosting.\nA person on some skis in the snow.\na image of blue and yellow trains on train tracks\nTwo people walking along the beach while someone flies a kite in the surf.\nA woman and two men inspect cars at a show.\nA GROUP OF ZEBRAS CLOSELY GATHERING TOGETHER IN OPEN AREA.\nA breakfast sandwich made from biscuits contains egg and sausage.\nthis is a group of elephants in the water near rocks\nStop sign with street signs at a parkway intersection.\na pair of animals on the side of a rocky hill\nA man standing on a snow covered slope holding a board.\na grey cat standing at the sink with its eyes wide open\nFour bowls with food in them on a table\nA laptop and a mouse sit on a wooden table.\nA bed that has padding with a blue picture frame hanging above it.\nA young boy about to throw a baseball during a game.\nThe desk is full with computers and other hardware.\nTwo men sitting on a couch holding pool sticks, one between his legs.\nA couple of people walking out of the ocean with surfboards.\nBlack car sitting at a red light intersection.\nVariety of different deli products sitting in a glass case next to each other.\nA police officer on a motorcycle with others following.\nThe dark green double decker bus travels down an empty road.\nThere are lots of seagulls flying near a boat.\nThere is a freshly made pie on top of the stove\nA rainy picture of three red double deck buses on a street.\nVarious black and white street signs with a pigeon on them.\nA man and a woman cooking hot dogs on an outdoor grill.\nA person and a kid on a couch in a room.\nParents 
laying on bed in opposite direction of their daughters\nA zebra standing on top of a grass covered field\nThe table has several plates of pizza on it.\nTwo sandwiches and a bowl of fries sitting on a plate next to a cup.\nA skateboarder jumps over a limbo bar during a competition\nThe view from the airplane shows a mountain range.\nA man throwing a baseball at a baseball game\nThe people are playing a game in the living room.\nChild laying down with arms extended in the air.\nA small bathroom with a lot of white tile.\nTwo women in a kitchen preparing a meal\ntwo cats laying in a messy bed near a wall\nThe woman is posing for a photo near the bikes.\nMen in SWAT gear running with guns drawn.\na pair of colorful vases holding white daisies\nA small bird perched on top of a tree branch.\nAn old building with a clock tower in it.\nCupcakes with candy and marsh mellow toppings sit in a white box.\nA man performs a trick on a skateboard in front of three other men.\nA train on a railroad track adjacent to 5 other railroad tracks.\nMan in blue shirt feeding birds from cup.\nAn elephant with a medium sized bird on his back is eating brush.\nTwo men are standing in front of flags and shaking hands.\na big white bridge is going across a lake\nA person sits at an outdoor bar with a piece of paper.\na stuffed animal dog sits inside of a toilet\nA bird is perched in front of a window with bars on it.\nA black bear that is sitting in a grassy spot in a garden.\nA little girl wearing green shoes riding a skateboard in the street.\nTwo men pictured next to a light aircraft with another one in the background\nA stained-glass window is seen in front of a unique background.\nA man is standing in a field and flying a kite.\nA table filled with fresh vegetables being prepared to eat.\nCatering truck parked tightly between cars on a city street.\nMany people stand in front of a large modern building.\nATTEMPTING TRICKS ON BICYCLES AND SKATEBOARDS AT A SKATE PARK\nColorful bird sitting 
on a branch of a tree.\nMan in blindfold and red garb holding glass of wine.\nA man is running to try and catch a frisbee.\nA sign indicating turns ahead in the night\nA deep pizza with cheese sliced into 6 pieces.\nTwo horses racing with two men on them.\nA man takes his dog for a ride on a scooter.\na sandwich and a cup on a table\nA bit of broccoli, celery and melon on a table.\na couple of men are eating on a boat\nOutside shot of a restroom showing the door partially open.\nA street sign leaned over with the words High Gate Avenue on it.\na view from below of a one way sign\nTwo boys are playing a game of soccer.\nA kitchen with an island which includes a dishwasher, a stove, cabinets, a vent and two windows.\na bath room with multiple mirrors and sinks\nStreet sign saying Tow Zone with a teddy bear hanging from the pole.\nAn orange subway car with purple and yellow graffiti is passing by two men.\nan image of a bear that is walking up a hill\nA man in a cap is sitting at a laptop.\nTwo dogs standing in front of debris in the snow.\nA sandy beach covered in lawn chairs and umbrellas.\nBaby eating food from a  blue plate and spoon\nA boy hitting a baseball with a bat on a field.\na park bench with a blue umbrella among flowers and trees\nA cow is standing outside in the grass on a foggy day.\na man that is walking around with a surfboard\nA yellow and blue train traveling under a bridge.\nA feminine shirtless man holding a bottle of wine in the kitchen with the refrigerator open.\nA couple of people in the snow on skis.\nA cross country skier is stopping along a path.\nStreet signs one on a street corner surrounded by trees\na cat siting on a blue bench in front of some trees\na rusted boat resting on the shore\nA boy skateboarding along the top of a marble garden shelf.\nA bus that is driving down the road.\nThere are books on all three shelves of this book shelf.\nAn airplane and airport crews preparing for takeoff.\nA pizza cut into four slices with blue 
stuff on it.\na women that is on a court with a racket\na male in a white shirt with a black suitcase and people\nA double decker bus driving on a street.\nA guy is performing a trick on a skateboard.\nsome sheep are standing way off in a field\nA red pick up truck with a large blue object in it's back.\nA little boy in a plaza holding a kite.\nTwo men standing on a hill in snow skis.\nClose-up of green bananas still on the stalk\na bunch of trains that are sitting on tracks.\nA large jetliner flying over a mountain next to a statue of Jesus.\nA pasta dish is featured along with a grilled flatbread.\nA man is standing in the grass holding a baseball bat.\nPerson holding a camera in front of an orange display.\nA man standing in the doorway of a bus traveling down the road\nMany different cars parked on a city street.\nA white toilet with the seat up in a room.\nA bed that has some books on top of it.\na number of birds flying over a body of water\nA red bowl of meat and vegetables on a wood table.\nA herd of zebra standing next to each other in water.\nA person in a blue coat snowboarding down a mountain.\nGroup of people all on laptops during a meeting\nA plate of homemade cheesy pizza on a table.\ngentlemen in suits one wearing a bow tie and one a regular tie\nA man riding a skateboard down the side of a graffiti covered ramp.\nBehind a metal bar a giraffe is view-able.\nA young boy jumping into the air while wearing a catchers mitt.\nA couple of people sitting at a table with pizza.\nA skateboard rider on top of a handrail by a path in the city\nA dog is laying down with some stuffed animals.\nA bunch of animals gather together in the snow\nA bowl of soup with chopped broccoli on top.\nTwo women talking and having a drink at a bar.\nA cow reaches through its fence to eat hay.\nA girl is playing frisbee in a courtyard area.\nA subway train stopped to except new passengers\nan image of a 2 zebras looking on\nWe see a very old and beat up coke machine.\nA horse 
tied up to the side of a tree in the snow.\nA person that is working on a computer.\na yellow green white and red double decker bus and a building\nA yellow street sign sitting on the side of a road.\nTwo people ride an elephant on the side of a road.\nA man is holding a cellular phone against the rail.\na person holding clothes near a bed in a bed room\nA man in sunglasses is getting ready to play tennis.\nA photo of an omelet and toast with coffee on the side.\nA delicious looking pizza with a variety of vegetable toppings stands out on a yellow plate.\nA child watches television while a panda bear sits by a purse.\nA white washer machine positioned in a bathroom.\nA dog walks through a kitchen with cabinets.\nThe train does not have any cars attached to it.\nA cute teddy bear sitting on a table next to a bottled beverage.\nA gren and white bus on street next to a building.\nA sliced chocolate cake on a white plate.\nHand with scissors cutting computer printout paper.\nA boy holding a skateboard with a two women and coffee design on it.\nA silver car driving down a rain soaked street with bikes on top of it.\nThe train engines and cars have seen better days.\nBlack-and-white photo of two benches on the street.\nA bowl of soup including vegetables and rice.\na knife with a black handle broccoli and green beans\nA man on skis performs a jump in the snow.\nThe sky is cloudy behind an illuminated street light.\nA bathroom vanity with candle, toothbrush's and holder and photo's of Marilyn Monroe.\nTwo small ducklings on a field of grass.\nA dog sitting outside a large brick building.\nA man unloading sheep from the back of a truck onto a pile of mud.\nA lady eating a doughnut and drinking coffee.\nA red train sitting at an empty station.\nA woman helps a little girl take a bite of a large hot dog while they sit on a bench.\na big tower that has a clock on top\nA filtered photograph of a person riding a motorcycle.\nA dirty bathtub sits in a bathroom with a big window 
on the side.\nA hot dog has a person's head on it.\nA crowd of people on a beach flying kites.\nA white plate with a piece of brad on top of it.\na grey suitcase next to several other objects outdoors on the pavement.\na large giraffe that is walking by some trees\nA couple of baseball players that are on a baseball field.\nA woman swinging a tennis racquet on a court.\nA jug shaped vase holding yellow flowers on a table.\na ceramic set of two cups and a cake which is probably a sugar bowl\nA man sitting on a sofa holding up a laptop with writing on the monitor.\nA communal sink in a white and dingy bathroom.\nA small child putting peanut butter on some bread.\nThe young man is going around the cone on his skateboard.\nthere is a man pointing out to another man in the ocean\nA skateboard turned upside down in a street with shoes hovering over it\nA street sign pole with many street signs on it.\nTwo three dimensional images of a woman with an umbrella.\nA woman standing in front of a table with lots of salad.\na man that is standing up at home plate\na person waking up and hitting the alarm button on a white clock\nYoung woman using video game controller in living area.\na flat screen television sitting on a entertainment center\nA large elephant walking through a wooded area\nA large airplane flying high up in the sky.\nA man standing in front of a TV playing a video game.\nA man in a a shirt and tie smiling at the camera\nTwo green street signs sitting under a tree.\nA bowl of cherries beside apples, bananas, and eggplants\nSlices of pizza sitting on plates next to a glass.\nA parked airplane with the terminal gate to the plane.\nA cat sits on a desk in front of a computer.\nTwo small kids on skis on the slopes\nA person cutting bananas in half on a cutting board.\nA small truck sitting on a road near a gang of bikers.\nA man cutting a piece of plastic with scissors.\na baseball field and some players playing baseball\nA baseball player poses as if he hits a 
baseball.\nThe pizza is ready to be cooked, then eaten.\nA fork sitting on a table next to a car shaped cupcake.\nThe clock has many designs and sculptures carved around it.\nA tractor and a truck travel down a road.\nA blurry image of some bison laying on the grass.\nA number of motorbikes and cars parked in the field\nA white bull dog rolling around on it's back next to a cat.\nA large long train on a steel track.\ngolden clock details on large clock tower clock surrounded by brick\nA woman playing tennis swings a racket overhead.\na woman sits on a couch with a cat laying on her\na tiny ass bed in a tiny ass room with a tiny ass tv\nA cat is sitting on the arm of a chair.\na giraffe in its pen some bushes trees and grass\nA vase filled with lots of flowers sitting on top of a table.\na white plate filled up with a lot of glazed donuts\nA farm animal on dirt outside of a home.\nA person holding a tiny piece of paper.\nThe pizza sits on the board on the stove.\nA sub sandwich on a table at a restaurant\nA girl and boy playing on a fire hydrant.\na close up of two people shaking hands over a motor cycle\nA white and grey freckled horse next to a brown horse in a valley filled with trees and tall grass.\nA woman in a white dress and someone with a striped umbrella seated by a pond.\nsome cute brown and white cows looking towards the camera\na male skateboarder in a white shirt is doing a trick\nAn orange keychain is next to a red camera.\na young man doing a jump with his skateboard in a skate park\nA giraffe bending over near a big pole.\nA woman standing in a twist position with arm extended, and a Frisbee in the air near her, in a grass park with trees, with people playing Frisbee, walking and lounging in the grass on a sunny day.\nA  cat sleeping on a blanket on someone's bed\nA cluttered living room with figurines on a display case and photographs on the wall.\nA girl in a dress standing on a small skateboard.\nThe man on the couch is playing a video game.\nA 
man is standing next to a motorcycle in a village.\nSeveral cars and people at bikes sitting at a red light.\nA study table where two laptops are kept open.\nsome baseball players a pitcher catcher and an umpire\ntwo ladies and kids playing sports in a green yard\nA man in a white shirt and black shorts jumps near a soccer ball.\nA plate of bacon, sausage, and other breakfast foods.\na man on skis fly through the air\nA long train with a yellow front stopped at station.\nRed motor \nscooter parked on the sand with a sunset in background.\nA fire hydrant is gushing water on a sidewalk.\nA man arranges toppings on the uncooked pizza.\na group of friends sitting on a mountain posing for a picture\nTwo teddy bears sit on a bed in a bedroom.\nan image of a group of giraffes at the zoo\nA person sitting down talking on a telephone.\nSkateboarder sitting down in the snow in front of another rider.\nTwo boats docked on top of a gravely beach near the ocean.\nTwo giraffes are walking through the enclosed area.\na dog lying on the ground next to a red bicycle with a laundry basket attached.\nA person leaning on their ski poles with a snow covered background.\nA couple of men reaching up towards a blue kite.\na green train is coming down a set of tracks\na laptop sitting on a special rack on top of a desk\nTwo blue and yellow trains parked next to each other on train tracks.\nA person cycles on a motorcycle down a road.\nA stove sitting next to a bunch of old box springs.\nA man, woman and child petting a goat at a petting zoo.\na bear on a road near a field of green grass\nA man kneeling down on a beach next to the ocean.\nA man in a tie standing in front a a Budlight truck.\nA young cow looking forward while several others drink at a trough in the background\nA white sink sitting under a bathroom mirror.\nPartially eaten cake doughnut with sugar sprinkle topping.\nA grey automobile driving down a city street shaded by several trees.\nA man throwing a frisbee towards a man 
and two children.\nA toy plane flies through a cloudless sky.\nThe baseball player is throwing the ball from the mound.\nA person riding a skate board in the air.\nWoman in dark, heavy dress cooking in a home kitchen.\nA metro train is pulling into the station.\nA large gray elephant walking across a road.\na man with his arms out waiting to hit the tennis ball\nTwo plates filled with plain hotdogs on a table.\na room filled with a stove and surrounded by cabinets\nA woman on a bike with a baby seat holding a dog leash.\nPeople in the ocean are playing frisbee and sitting in small watercrafts.\nFlying bird silhouetted overhead against cloudy sky background.\nKitchen accessories in a clean, organized  kitchen.\nan old stone building with a clock mounted on the side.\nA pole with several different street signs on it.\nTwo students waiting to cross a busy street.\nA collection of plush animals with big ears and eyes.\nliving room angle with fireplace, bookshelf, furniture, and hardwood floors\na black and yellow train sitting next to a fence\nA young boy sitting on a stone bench in an arid landscape\na large wooden park bench next to some rails\nA store building with stuffed bears in the window.\nA man leads a painted elephant carrying tourists down the street.\nA woman sitting on top of a wooden chair at a table.\nA woman sitting in a vehicle using a cell phone.\nA whole pizza sits on a pan on the table.\nthere are two skiers that are going down the hill\nA desktop computer that is sitting on a desk.\na wooden table with so many tools on it\nMan posing in front of bicycle with a banana in his hand.\nWhite horse looking over shoulder in enclosure of wood\nsome zebras are in their pen eating some food\nA couple of signs hang off of a building\nA commercial airplane on the runway with the jets on.\nA doughnut sitting on top of a napkin next to a cut of coffee on top of a doughnut table cloth.\nA toy baby in a toy stroller in a toy kitchen.\nA homemade pizza, salad and 
two glasses of wine on a table.\nA long train riding on train tracks through an empty field\nA horse is looking over a fence with a shield on its face.\nSeveral kids are playing frisbee outside in a yard.\nA giraffe is sticking its tongue out at some people\na woman in a red top some glasses and a pizza\nA man riding on his bike and talking on the phone.\nA TV sitting in front of a picture on a wall.\na teddy bear set on top of a child sleeping\nA man is driving with his dog in the back seat.\na man takes a photo of a clean bathroom\nA plate that has a half eaten piece of cake.\nThere is a group of small birds standing on the chairs.\nA wooden table topped with different kinds of foods.\nthere are many men that are playing soccer on the field\nA brown, black, and white cat that is wearing a black hat\nA person on a field with a baseball bat.\nA man riding a skateboard down some steps.\nThe train is pulling up to the platform.\nA red small engine plane in motion on a field.\nA man windsurfs with several other people in the background.\nA commuter train that is stopped at the station for loading of passengers.\nA side mirror with the image of the Eifel Tower reflecting in it.\nA tennis player lurches forward after hitting the ball toward the other side.\nTwo plates of food next to two laptops.\nA woman in business attire walking on a sidewalk and talking on a cell phone.\nA woman on a bench reading a book\nUSA 20 dollars totaling 120, held down by a cell phone with Coca cola cans nearby.\nGirl smiles for picture in busy Asian plaza.\nA small pizza sitting on a sheet of tin foil.\nThree people posing for a picture in front of a cell phone case.\nA person with their car open stands on the snow.\nThis is a small and clean but cluttered kitchen.\nA homemade pizza with cheese and cucumbers in a pan.\nsome one skiing on a snow filled hill\na person helping someone prepare food on a buffet\na bathroom with a toilet a curtain and wooden floors\na cat starring at the camera 
and a television in the background\nLOTS OF CABLE CARS, ON LOTS OF TRACKS\nred flowers in a jar against a screen\nMan with unhappy face in clothing store shirt section.\nA large white bush stopped at a bus stop.\nA man and woman with blue shirts and bicycles on a sidewalk.\nPurple and gold bed and flowers against a red wall\nThe two teddy bears are posed together to take a photo.\nA person giving a stuffed teddy bear a kiss\nTwo sofas are facing each other in this well decorated living room.\nTwo people carrying backpacks are cross country skiing.\nAn elephant in the shade of a tree.\nA tower white with yellow trim tower features a large clock.\nA large bathroom with a frosted walk in shower\nA man with a racket walks on the pavement.\nA group of children standing next to each other on snow.\nA  MAN IS SKATE BOARDING ON THE SIDE WALK\nTwo boats that have groups of people in them.\na person with a skate board and a back pack\nA baseball player holding a bat next to home plate.\nA man on a motorcycle talking to a woman in an SUV on residential street.\nA group of people riding skis on a snow covered slope.\na sandwich in a plastic food basket on a table\nSeveral zebras walk through the tall green grass.\nfrosted donuts in a display case to feast upon\nA bathroom with a toilet and a shower with a window.\nA triple layer cake with a white hand made out of frosting on top of it.\nA white plate topped with a salad next to a glass of OJ.\nThree carrots sitting on a plate in front of a knife.\nThere are several plates with different pastries on the plate\nA kitchen with wood cabinets, white refrigerator, white stove and a microwave above the refrigerator.\na man in a suit is holding up a beer\nA construction worker smashes away at the roof of a building.\nA hockey player on the court with a bunch of stuffed animals.\nA woman with some of her fingers in her mouth\nTwo giraffes are eating their food from a feeder.\na group of people loading up on a big airplane\nA sandwich 
and can of soda on a table.\nMan riding a colorful surfboard on green ocean waves.\nGolden lab with smile sitting in the bed of a red pickup truck.\nThere is a small kitchen with black cabinets\nThe steak and hot dogs are being cooked on the grill.\na yellow and black train traveling along a train track\nthere is a street sign that has been bent in the middle\nAn Audi car on an oriental city street\nThe jumbo jet flies over a building with it's landing gear down.\nA woman walking down a street holding an umbrella.\nA bike and some small birds on a field.\nA wooden bench by the water and some grass.\nA bright red umbrella with a view of the ocean and mountains behind it.\nA large pile of stuffed animals is outside.\nThese are user manuals for an Apple mouse and keyboard.\nA bathroom features a large mirror and toiletries next to the sink.\nA group of chefs working in a kitchen that has a statue of a chef.\nAn airplane parked on a runway at an airport.\nA sign at a railway crossing giving instructions on how not to get hurt.\nA beautiful blonde girl holding a Nintendo Wii controller next to a man .\nA group of wild animals walking along a gravel road.\nA woman is sitting down and talking on a cell phone.\nA man standing in a wooded area looking at trees.\nDogs walking down a set of rickety porch steps.\nTwo plates of food are sitting on a tray with forks.\nA very nice looking train by a plat form.\nChairs and a table with a laptop on it sitting outside.\nThree zebras stand in tall grass near a wooded area\nA boy is flying a kite in a field.\na desk covered with electronics, paperwork and a lamp\nA lavish bedroom furniture set of carved wood\nA zebra standing on the grass above a bird and rocks.\nMotion blur photograph of lights at night time\nA picture showing a long line of scooters parked on a city street.\nA man displaying a cake pan at a kitchen counter.\nBrown and white cat sitting in front of an open refrigerator.\nA little child standing next to a yellow fire 
hydrant.\nA fire hydrant on a side walk in front of a building.\nA person holds a fork and knife to cut pizza.\nA gondola boat ride on the canals of Venice.\nA bench on a pier near a ferris wheel in a park.\nA woman is riding a show horse at a competition.\na bunch of skiers at a skiing resort on a clear day\nA dog rides in a cart pulled by a man on a bike.\nA living room filled with lots of furniture.\nA passenger bus that is driving down the street.\nTwo vehicles are parked in a giant warehouse.\nFruit on a plate next to a book.\nan orange truck people trees and a street and buildings\nthere are many canoes that are in the water\nA number of street signs on some poles\nA long train sits at the station waiting for it's departure.\na dog is playing with a water bottle\nA suitcase sitting next to a brick wall.\nA smiling woman pours a bucket of water into a toilet.\nThe large herd of cows are all around the large field.\nA room has blue walls and a wooden floor.\nAtribe of people ride some elephants out side\nCraft tools and a project currently in progress\nA boy is riding a skateboard in order to skate off the ramp.\nAntique black and white photograph of a horse drawn tram\nA young girl opens her mouth while eating cake.\nView of a city bus through the side view mirror\nThis red pot is filled with a variety of vegetables.\nA for rent sign hanging outside in front of a building .\nA picture of a Wii remote in its packaging.\nA room filled with flowers in front of windows.\nA group of people in grassy area with kites in the sky.\nTwo young people are posing for the camera with their surf boards.\nA herd of elephant walking across a dirt covered ground.\nOrange train on tracks in the country side.\nA young woman on a tennis ball about to return a hit\nA man is taking a picture of himself in the mirror\nA red double-decker bus with a open top level.\nA batter up to plate in the middle of the swing.\na man in a tie and a woman in a hat ride horses\nA quail looking bird 
is standing in a tree.\nThis man is looking downward while his is skiing.\nA photograph of something in the image.\nA view of an empty kitchen with white and wood lined cabinets.\nThe paper towel holder in the restroom hangs from rope.\nA view of a pizza cut into four slices.\na couple of jets are flying in the sky\na basket with a sandwich and some fries in it.\nTwo men smiling while riding in a bus.\nA dog staring out the window at people standing outside.\nA screen of people playing a baseball game\nA bedroom with a four post bed decorated in black and white.\nA man standing next to a wall sized glass window.\nA small person rides a skateboard modified with large tires.\nA couple of street signs mounted to the side of a building.\nGirl on a couch with her computer on a table\na group of people walking on the street during the day\nA person on a surfboard riding a wave.\nA dog is under a brown computer desk.\nThe woman is playing a game of tennis on the court.\na body of water with three boats sitting next to each other\nA bathroom vanity sink with a large mirror and hairdryer on the wall.\nPeople are sitting in a large room on couches with a fireplace.\nSeveral men in the kitchen with one cutting a piece of meat.\nThere are two beds in a room side by side.\nthere are two blur bullet trains on the tracks\nA lamb with several babies is laying in the grass.\nA bathtub with a colorful wall decoration is seen here.\nSmall flowers are placed in a clear empty bottle.\nHorses in fenced area with grass and hay and adults nearby.\nGroup of people standing around a kitchen area with food on it.\nPlates of various food items sit on tables.\nA baseball player is standing on the playing field.\nA very tall clock tower towering over a green tree.\nThere are plates with food and drinks on the table.\nA man and woman in kitchen preparing food by a stove.\ntwo zebras in the field grazing on grass\nA man stands in an outdoor market selling a variety of fruit.\nA man ries his 
skateboard around bright green cones.\na cat with its head burried in a shoe\nTwo children on boogiebody boards in the ocean.\nA public bus parked in a bus station at night.\nA man skillfully water skiing in wild water.\na couple of guys standing up with some snowboards in hand\nA cat is standing under a red car.\nA vase with a few large sticks in it next to a sink.\nThe back of an elephant with tusks overlooking a road.\nMan standing on a soccer field holding a frisbee with a dog beside him.\nThere is a clock displayed on the side of the building.\nA group of people sitting at a table eating food.\nThere is a green plant inside a bottle\nA man calmly sitting on a bench with an Indian Head Dress on.\nA bi-plane with a wing walker on its wings.\nA large body of water with a train traveling over it.\nA woman brushing her hair standing in a living room.\na man holding  a tennis racket beside him during the game\nsome oranges hanging from some branches of an orange tree\nA vase filled with purple flowers sitting on a table.\nAn Apple laptop rests on a custom wooden stand.\nA large bed in a bedroom next to a fire place.\nA woman flying a kite on a rocky beach near the shore line.\nA man has fallen off of his surfboard.\nA man catches a wave on his surfboard and holds his arms up to balance.\nTwo men in horse drawn carriage on city street.\nA person sitting on  a snowboard going downhill\nA view of a bathroom with a mirror, towels and a tub.\nTwo men in robes while one has a toothbrush in his mouth.\nA biplane in the sky in the middle of a turn.\na clear road across the street from tall building and water\na red baron pizza cooked in a microwave\nAn all white and steel bathroom with 2 windows\nA car stops to pay a parking fee to a woman\nA black Chrome laptop sits on a desk.\nSome very tall pretty giraffes by some other animals.\nA couple green stoplights on an empty street.\nA little girl holding a green cup in front of a bowl of food.\nA young child with a colorful 
umbrella walks down a path in a coastal setting.\nA man is holding a small Dell laptop.\nThe sink and mirror in a business bath room\nA table filled with plates of food sitting next to each other.\nA massive crowd of people standing around the Washington monument.\nA man doing a trick on a skateboard.\na bathroom with a toilet, tub and cabinets\nA cat taking a nap upon a laptop computer on a desk.\nBicyclists on a city street, most not using the bike lane\nBusy city traffic in an older part of town.\nA black and white photo of an old steam locomotive on a train track.\nA blue and white train in front of a building.\na man sits by himself on a bench in a stone-paved square in front of a large bed of flowers\nA large amount of tables and chairs are by a clock.\nYoung man pointing at computer keyboard with sprouts growing out of it.\nBaseball game with batter on base and umpire standing by.\nA sign reading \"Car parking,\" is on a fence in front of a herd of cattle.\nSmall black and white cow in grazing field.\nLarge black motorcycle with a motor vehicle sitting on the side.\nA couple of women are walking down the street\nA plate with  sandwich and french fries on it.\nA person is walking in the sand near the water.\na large air plane on a run way\nA cat looking out a window with green trim.\nPerson in pink snow gear on a snowboard.\nJumbo jet flying over a group of trees.\nToilet bowl sitting on the side of the road filled with papers.\nan image of two men working on their laptops\nMany cows travel down the side of a street\nThe family is playing a game all together.\nA large fluffy cat is sitting on a chair next to a computer mouse.\nA man sitting and looking at something with his hand to his mouth.\nA man kneeling on the ground next to a couch working on his laptop.\nThe tennis player stands ready for the next play.\nTwo men hold up glasses of bear above two pizzas.\nA passenger plane that belongs to American Airlines taking off.\nA man checking on some food that 
is in a white oven\nA wooden table holding two glasses of wine and a plate with pizza.\nLots of bags of luggage sitting on the floor of an airport.\nA group of people playing a game of frisbee.\nTwo people riding on a boat on a large body of water.\nPair of adults and teens having silly fun.\nA young man is using his skateboard on the street.\nA cat sitting in a plastic water bag on a hard wood floor.\nThe elephant wades through very deep, calm waters.\nA female tennis player in action on the court.\nAn animal on top of a table while a bear rides a bike.\nFour stuffed animals against a plain light colored background\nA person walking while brushing their teeth and wearing a red hat.\nA jumbo jet travels down the runway towards the camera.\nA crowded city sidewalk with lots of people.\nA giraffe in an enclosure looking at onlookers.\nA man sits and admires the architecture of a large bridge.\nA sign prohibiting skating on the sidewalk with black and red writing.\nA father taking a mirror photo of him and his daughter brushing their teeth\nA small boy is eating a sprinkled doughnut\nA boy on a skate board crossing the street\nThe skateboard competition is geared to even the youngest boarders.\nA bunch of food is layed out on a white dish\nThis is a shot of someone wearing a pair of skis.\na black and white photo of two people cutting cake\na giraffe standing in a pen next to a tree trunk.\nA handyman moving a refrigerator back into its place.\nA person wearing a Cat in the Hat costume in front of kids.\nA horse drawn carriage waiting outside a hotel.\nA fruit stand with grapes, oranges, apples and plums.\nPlayer returning volley during match play on tennis court.\nSomeone who is applying some chocolate on a cake.\nThere is a street light with two green arrows in different directions\nA plate with bagel, fruit and potatoes sitting on a table.\nA surfer is riding a wave while another swims to catch the next one.\nA giraffe inspects the roots of a fallen tree.\nA 
bathroom that features a vanity cabinet with sink, commode, overhead cabinet and mirror.\na couple of doughnuts are under a display case\nPeople watching a man doing a skateboard trick at a skate park.\nA man cuddles with a woman who holds a banana.\nA young man siting at a picnic table holding a sandwich while a young girl looks on smiling.\nA hot dog sandwich with eyes on top of a plate.\nTwo giraffe's by large rocks with and ostrich.\nRelax in the chairs next to the pool.\nA pizza with ham, cheese, olives and oysters on a plate.\na black and red brick building with a white and black clock and sign\nVehicle traffic on a city street in a snow storm.\nA NARROW HALLWAY WITH A TOILET IN THE BATHROOM\nA stop sign with bike rack next to street corner.\nA giraffe attempting to lick a woman's hand over a fence.\nA fluffy cat is sitting on a glass table.\na woman swinging a tennis racket on a tennis court.\nThe little girl hold the pole while the man sits on the fire hydrant.\na person riding a surfboard in a body of water\nA woman sitting down holding a cell phone up to her ear\nWell hello there cat, are you up to something?\na blue car sits in front of a red bus\nPeople are walking down a busy city street.\nA woman is brushing her teeth in the bathroom\na black orange and white cat and a controller\nA table with a white plate of food that includes broccoli and chicken.\nA plate holds a portion of a broccoli casserole.\nThe small bathroom has a shower, a toilet, and a sink in it.\nLittle kid on a metal pedal tractor in a yard with sheep.\nA group of people on grassy field with kites in the sky.\nSeveral young children with ties in a school room.\nA man in grey shirt sitting at a table with plate of food.\nA kitchen with a sink, cabinets, and other accessories in it\nassorted decorations laying around on a plastic tarp\nTwo persons in formal dress posing for a photograph.\nA person in a car smiling next to a suitcase.\nA young man dressed in a suit and a tie smiling at 
someone.\nMultiple person seating bench on the side of a city street.\nA gastric delight of sausage, broccoli and onions.\nA young boy riding a skateboard on a street in front of a house.\nA train riding on a track near a platform.\nThe two woman are baking in the kitchen.\nA green double deck bus parked next to a railing.\nA boy runs in the grass while holding an umbrella.\nA young baseball player stands at the plate, in motion to hit an oncoming ball.\nTHERE IS A BOAT ON THE GRASS IN FRONT OF THE YARD\nA kitchen with stone counters and a bar.\nThis is a shot of a printed recipe for lemon marmalade.\nA large group of people flying kites in a field.\nA man cooking food over an open flame.\nA variety of people sit at tables and watch screens.\nA tennis match from above with a large crowd of spectators.\nA fire hydrant in the grass with a red top and yellow bottom.\nA toilet in bathtub in a home bathroom\nA large open field filled with a large group of cows.\nA kitchen with an oven and a sink.\nthere are many small red trains on these tracks\nA pair of cats laying on each other on a desk.\na man wearing a hat is on his boat and four birds\nA steep incline of snow, a strip of sky and what looks like a large red strut sticking out of the snow make a setting for a helmeted skier in otherwise regular clothes in a twisted posture the points of his or her skis ascending the slope.\nOld photograph depicting train in industrial area of city.\nA clock tower has a group of people inside it.\nA person pets a cat near a weathered building and plant.\nA living room with a large picture window, a fire place, a table and a couch and loveseat\nThe bowls on the table are filled with food.\na plastic toilet in a small bathroom stall\nSkateboarder performing aerial trick in large indoor area.\ntwo photos of one woman playing against a doubles team\nThree people are standing together in the snow on skis.\nA man in a wetsuit surfs a wave.\nA man is riding his dirt bike while wearing a 
helmet\nFour cows grazing and resting in the shade of trees.\nA long blue classic truck parked in a parking lot.\nA large passenger jet flying through a cloudy sky.\nA tennis player runs and swings his racket at the ball.\nA skier standing on top of a rail on a snowy slope\nA stop sign with a one way directional sign marks an intersection.\nA group of people are gathered for a photo.\nA black cow walking across a stream.\nA young man standing on a skateboard riding a wave.\na bench near a river, situated close to a bridge\nA woman wearing a dress holding a brown teddy bear.\nsome zebras gather around a watering hole in a herd\nA dog lays on a large turtle pillow panting.\ntwo people sitting in a living room touching each other with their feet\nA herd of sheep blocking a parking lot.\nTwo men standing on a tennis court holding tennis racquets.\nA man wearing a shirt and tie next to a barn door.\nA couple of brown and white cows standing on top of a hill.\nA black-and-white photo of a man walking next to a fire hydrant.\nA smiling couple stand next to a bench at the bottom of an escalator.\nSmall dog sitting on bed in bedroom of home.\nThis is an unused bathroom with a sink, toilet, and bathtub.\nA woman stands near a red double-decker bus and uses her cell phone.\nA green, orange and white train in a train station.\nA group of men and women standing together.\nA man riding in a carriage with horses and an umbrella.\nA man holding a baseball bat wearing an old fashioned uniform\nThe view of a large sized bathroom with a tile floor.\na dog sitting in an open suitcase with a stuffed animal\nA man pets a small, baby elephant.\nAn old vase with artwork of an octopus.\nThe cows are ready to eat and drink their next meal.\nA girl with red dyed hair eating a banana\na delivery truck and a couple of bikes\nA laptop displaying a picture of a man.\nTwo giraffes eat leaves from some small trees.\nThis metal plate contains lovely slices of tangerines and pomegranate.\nA dog 
whizzes by to run towards a yellow Frisbee.\nYoung boy with giraffe in enclosed fenced area.\nA counter holds a plate with bananas on it.\na yellow red and silver train on its track and some wires\nA white table topped with game controllers, remote controls, a keyboard and phones.\nA woman eating a pizza and smiling for the camera.\nview of a snow capped mountain from in a plane\nAn old man with a beard and a bowler hat\nA sink with cups and towel next to it\nSeveral pictures of Asian style dishes and in the middle a person is eating.\na giraffe standing close to a big rock\nA far view of a plane in a cloudy sky.\nThis is the caboose of a freight train.\na large group of people on a beach doing various activities.\na man can be seen walking past a store front\nA cat is playing on a small computer.\na man standing on a tennis court in front of a crowd\nThere is a chair sitting in an empty lawn.\nA bathroom that has a mirror and a bathtub.\nA giant clock tower and a clear sky.\nA child with a bat standing at home plate waiting for the pitch from a pitching machine with his teammates in the field.\nA cabinet that has some stickers on it.\nA man and boy are looking at a cellphone.\nThe restaurant kitchen is closed until lunch tomorrow.\nA couple of women reaching out to a tall giraffe.\nA black baseball mitt, ball, and baseball bat.\nA batter positions the bat in the air.\na woman sitting at a table while using her white laptop\nA bedroom, well lit, has a couch, dresser, comforter.\nA brown clock tower rises above some trees.\nA group of people pose with a horse as a crowd looks on.\nA man standing on a sidewalk beside a park.\nA large blue street clock attached to a post.\nA close up shows a large bunch of broccoli.\nAnimals next to shoreline in artist's painting at sunset.\nThe woman on the bench was near the boy on the skateboard.\nA man smiles while eating a small piece of food\nA teddy bear sitting on ice with a knife stabbed in the belly.\nAn on microwave unit is 
heating something in a cup.\nThis desk has a one computer and two laptops on it.\nA metal stove and a counter in a room.\nA CHEF IS HOLINDG CARROTS IN HIS HAND SMILING\nTwo cats sitting in their beds beside a window\na clock hanging fron the ceiling announcing the time as 144\nRain falling on a city street filled with people and cars.\nTwo men playing competitive frisbee against each other.\nTwo adult horses grazing in a dry field.\nA flamingo with its hear scrunched back near its feathers.\nA blond married man in a green T-shirt sits in front of a computer keyboard taking a bite out of a donut.\nSmall brown dog with colorful braids sitting on a couch.\nA cat sitting on top of a wooden table next to a yellow motorcycle.\nA picture of a person that is looking into the water.\nTwo small elephants standing together in the wild.\nA little boy that is standing on a skateboard.\nthere are four small beds all in the same room\nA cat sitting in a sink in the bathroom\nA dog and a cat standing side by side.\nTHRE ARE TWO PEPLE THAT ARE SITTING ON THE BENCH\na man is playing tennis on the court\nPeople are gathered around vintage cars at a car show.\nA seagull strolls on the beach during sunset.\na group of people are crammed in a small area on motor bikes\nThis guy is in the country about to fly a kite.\nFour remote controls attached to the side of a television.\nThe pole of the stop sign is covered in vines.\nThere are some people riding horses on a field\nA woman sitting in the sand holding a kite.\nA giraffe standing near a tree branch in the grass near a grove of trees.\nA laptop with keyboard and mouse separate on a desk\nTheir is two pieces of bread on a white plate\nA living room with a decorated christmas tree in it\nTraditional view of crowded city residences with big bridge at the end of the street.\nA group of people sitting at a table eating food.\nA young boy is herding rams with a stick.\nA grinning man stands outside in the snow with a snowboard.\nThree 
people on a baseball field with catchers mask and baseball bat.\nA computer desk with a laptop computer on it.\nTwo giraffes inside a white fence in a yard.\na double parking meter near a tiled wall\nA person is walking through the snow on skis.\na woman smiles at two babies who are laughing at each other\nA girl happily eating a pizza in a restaurant.\nFresh fruits such as apples, pears, watermelon, and apricots.\nA group of men playing a game of frisbee on a field at night.\ntwo snow skiers coming down a snowy hill\nPeople skateboard down the sidewalk and the street.\nTwo elephants standing next to each other in front of a face.\nThe teddy bear sports some very unusual colors.\nAn upside down sign saying \"Road Work Ahead\"\nA teddy bear is sitting down reading a book.\nA person in a black snowsuit pulls a kid on a sleigh who is holding a birghtly colored umbrella.\nA pair of yellow scissors sticking out of a cow pattern bag.\nA couple of sheep grazing on the grass in a pasture.\nTwo girls walk down the street carrying pink umbrellas\nA man is performing on stage at an event.\nSeveral bunches of bananas growing on a tree\nthere is a cat that has fallen asleep under a car\nA red sleeping area in a bedroom scene\nONIONS, TOMATOES AND OLIVES ATOP A PLATE ON A TABLE\nA young child dressed like a chef cutting broccoli\nA normal bench sitting on a wooden bridge\nA small calf next to a large cow in a field.\nA man riding a wave on a surfboard in the ocean.\na large pizza that is in a box\nSeven stuffed teddy bears lined up against a wall.\nGroup of cars passing by a long row of apartment buildings.\nSeveral small boats are on the water on a foggy day.\nA bathroom that has some blue tile on the wall.\nA pizza has meat toppings on a square plate with other food items on a table.\nThe bathtub and sink of a bathroom with a large mirror.\na girl that is playing some tennis on a court\nA chocolate cake with candy on top of it.\nA fire hydrant stands in front of the entrance 
to an apartment.\nA zebra in front of a barn and pen.\nA bird holds its wings up as it wades in shallow water.\nA child in the window of a paper fire truck.\nA surfer hitting a trick on top of a wave.\nA cow laying on a green field next to it's baby.\nA very cute small boy holding up a cell phone.\nA holiday wreath with stuffed teddy bears and a penguin.\nA young man on a skateboard rides past a cafe.\nThe items are on the conveyor being ready to be put on the wagon.\nA closeup of a waxed surfboard in a surf shop.\nA car at a show with people in background.\nA car's passenger side mirror reflects the image of a long freight train.\nA sandwich in a box with carrot sticks and an apple.\nTwo men holding bottles with ties on their heads\nA man holding an umbrella on a sidewalk.\nA person with a tie is holding a baby.\nA red fire hydrant sitting on the side of a road.\nA man purses his lips while holding up an orange in front of his face.\nA giraffe is standing tall in an enclosure with large plants.\nA young boy sitting at a counter drinking from a straw.\nTwo young people playing a game on the Nintendo Wii.\nA bowl of cherries on a table in front of different fruits.\nA person is on snow skis on a mountain top.\nA delicious looking donut or cinnamon roll covered in icing\nA young boy contemplating skating on the pipe.\nSome people in a large kitchen preparing food.\nA large airliner is taking off from the runway\na pizza on a plate on a table\nA birthday cake is topped with a dog's head made out of frosting.\nThere is no toilet paper in this tiny, claustrophobic bathroom.\nA baseball catcher standing and ready to throw a baseball as an umpire looks at him.\nA BMW motorcycle sitting in a marina with boats.\na horse pulling a little carriage down the road\nA person with a black, grey and green striped tie on.\na person sitting on a city street talking on a cell phone\nA highway scene with a bus and a car behind a cattle truck.\nA plain white bathroom with a sink and 
toilet.\nA couple of people standing in front of a TV.\nthere are many fruits and vegetables on the table\nA messy living room with pictures on the wall\nLarge pottery and bonsai trees are sitting outside.\nDonuts with frosting in foreground, plate underneath them all\nA blue motorcycle strapped onto a vehicle trailer.\nA man with a hat is sitting by a television.\nSkateboarder with leg tattoo riding on a skateboard\nA man placing a baseball on a tee for a child.\nA large bunch of baby bananas still green and in a basket.\nA woman poses for a photo while sitting on a bench by the seaside.\nLady with sunglasses under pink umbrella at outdoor event.\nThe stop light found at a Hocken avenue intersection.\nA man loads bananas high on top of a banana truck.\nA bus with its doors open is waiting at a bus stop.\na tall giraffe peering over some trees and shrubbery\nBlack and white photo of a skateboarder doing a trick.\nA pizza that is made of many various ingredients.\nA red church in between two plain buildings.\nA photo of a woman eating a hot dog.\nSmall black object sitting on the inside of a toilet bowl.\nA very large commuter train is going down the track.\nMen at the beach with one holding a surfboard\nTwo people looking into an empty, lighted wall oven.\nA sandwich with broccoli, onions, cucumbers and other food on it.\na bird sitting on a fence against a lake\ntwo plates of food on a table with chairs\na pair of scissors cutting  sheet of plastic cups\nA group of people that are sitting at a table talking.\nwhite plates that are covered with assorted donuts\nA nice bathroom has a sink on glass.\nThe horse is tied up to the post outside.\nA living room with orange colored walls, and a purple chair.\na view of a flock of sheep grazing in a field.\nA man in a black jacket holds a toothbrush in his mouth as he stands near a woman with her eyes closed.\nAn elaborate metal vase holds a decorative bouquet of flowers.\nA woman cross country skiing with her dog.\na blue 
double decker bus traveling down a street\nBirds perched on iron poles in front of a tall building.\nA man is asleep in bed with a laptop open on his lap.\nTwo horse drawn carriages travel down an old looking street.\na male in a red shirt eating and some people and lights\nA motorized bicycle covered with greens and beans.\nSeveral planes are parked in an airport field.\nHerd of elephants standing in waterway near man in orange shirt.\nA professional baseball player in the middle of a swing.\nThe parking lot by the market is full of cars.\nTwo plates sit on a table as one plate holds a sandwich and the other holds a cup of soup.\nA young man is in action with a frisbee.\nA man sitting on a bench surrounded by trees.\nA desert and fork on a plate with multi colored polka dots.\nA group of men in uniform riding a bunch of horses.\nA blue and black motorcycle parked next to a silver truck.\nA man skiing in the air above a snow filled mountain top\nA cat is sleeping in a window sill.\nThe basket of lemon is near a rubber duck in the large bathroom.\nCarrots, peppers, and zucchini resting on a paper towel.\nA crowd of people are under a tent with a giraffe.\nA table topped with a banana and other items like scissors.\nA snowboarder is at the top of a snowy hill.\nA very big building with a bunch of chickens.\nthere are two very large beds inside of this room\nA man riding over a wave with a surfboard in the ocean.\nA room with beds and suitcases and other items\na clock hanging down off a brick wall in a row of circular hanging light shades\nAn old red fire truck with 3 kids sitting in it\nA man wrestling with a calf at a rodeo.\nthere is a small plane that is flying in the sky\nA man riding a white and brown horse in the dirt.\ntwo white plates with pizza a pitcher of wine some glasses and silverware\nA giraffe running in an open barren desert.\nAn orange and white food truck is parked inside.\nA bird stands near a car in the snow.\nA white vase that is holding a pink 
and white flower.\nA person wearing a large hat standing in front of a building.\nA white and blue train traveling down train tracks.\nTwo large gray elephants standing in a dry grass covered field.\nA man and a woman standing in front of a train.\nA baseball player, catcher and umpire in a baseball field.\nBroccoli, carrot, and dome other items on a dish.\nA man surfing the waves on his surfboard in the ocean.\nA plant coming up from the inside of a square pipe\ntwo people driving motorcycles next to each other\nA small white building with a clock tower on it.\nA person on a skateboard up on a ledge.\nA picture of a blender with some liquid in it.\nA group of young boys riding scooters at a skate park.\nTwo women share a red umbrella walking down the street.\na person holding an open umbrella near a small pool\nA man that is holding on to a racquet.\nA cat sits on top of a toilet in front of a bathtub.\nThe giraffe is standing with its head between the tree.\na giraffe is eating food from the branches of trees\nA tall brown elephant walking through a lush green forest.\nStuffed toy dog lays next to laptop that the woman is staring at.\nA group of people sitting around a wooden table with food.\nThree people are standing and throwing a frisbee.\nA desk with multiple computer monitors and a laptop.\nA man swinging a tennis racket at a tennis ball.\na living room with two white couches, a fireplace and a window with a view\nA wooden desk topped with a computer monitor and keyboard.\nPeople are getting off a bus in the evening.\na line of parking meters with buildings and vehicles in the background\nA full view of a plate full of delicious food.\nA horse standing in a secluded field of a Mountain Valley.\nA plane fitted with pontoons moving around in the water.\nA large dark sheep stands with two young ones.\nTwo people sitting on a ski lift over snow and trees, one wearing skis and one wearing a snowboard on feet.\nA brown bear walking in an open area.\ntwo pizza 
pies sitting on top of wooden pizza racks on top of stovetops.\nA white kitchen filled with appliances next to a window.\na rum cake vender in a yellow truck\na laptop on a small table with a mouse\na bunch of random stuff sitting together on a tablecloth\nA beautiful young lady walking a black and white dog past a hotel.\nA table that has two cakes on it.\nSeries of propeller airplanes lined up at an airport.\na silver gray subway train parked in a subway\nA paddle surfer riding a small wave in the ocean.\nPots, pans and a collander displayed on a kitchen cart.\nA person doing a trick on a skateboard.\nA row of orange trees sitting along side of a dirt road.\nThere are bicycles parked along a stone sidewalk.\nthis tennis player stands waiting for her opponents serve\nSeveral people riding their bikes down a sidewalk.\nA woman that is sitting down with a book and umbrella.\nA woman is sitting with a congratulations sign\nA bunch of bananas are hanging from a rack\nA person is skiing on a lake while holding a rope attached to a parachute.\nThe man had to bend down to kiss the horse.\nA group of cows walking down the middle of a street.\nA small boat floating on a lake at sunset.\nA bowl of oranges and bananas is in the center of the table while a plate of toast and eggs is towards the end.\nBright white bathroom sink and shelf with folded towels on a shelf.\nA white bathroom area with a plant and yellow bottle.\nA cat stretching its paw over a keyboard.\nman riding a blue surfboard in the ocean\na group of bikers driving down the street\nMan standing in office with glass walls eating a donut.\nA MAN IN A RED SHIRT AND JEANS PLAYING A VIDEO GAME.\nA man with a skateboard talking to another man.\nA cat rubbing its head against a person's shoe.\nSomeone holds a donut in front of a box of donuts.\na couple of giraffes are sitting in a pin\nTwo large adult elephants have saddles on their backs.\nA laptop, a mouse, and a pen are on the wooden table.\nThree surf boarders 
talk on a dirty beach covered with seaweed.\nA view of the back of a bus from inside.\nAn elephant is carrying people across a forested area.\nShe is sleeping with her dog on the couch.\nA young boy sitting on a rug holding a cell phone.\nA man standing next to a woman as they prepare food.\nTwo people sit on a bench in an grassy area in the midst of some building.\nA parking meter is on the curb of a hilly street.\nA umbrella stuck into sand at a beach with boats and hills in the background.\nTwo men are riding on motorcycles through the air.\nThe bus is stopped at the street corner.\nTwo pieces of luggage leaned up against a tree.\nA cat stretched out next to a persons leg who is sitting in a chair holding a laptop in their lap.\nThe snow boarder is snow boarding down the mountain.\nA man is seen walking out of a building.\nA green birdhouse sits on a wooden platform in a garden.\nA young elephant holds its trunk up to its mouth.\nCrates of different vegetables stacked next to each other\nA grey cat wearing a hat is getting petted.\nan image of a girl that is playing outside in the field\nParked motorcycles and an old yellow school bus\nA couple of people on skis examining a park description sign.\nA picture of a stop sign with a small green smiley face sticker.\nA vase of flowers, money, and a bottle of wine sitting on a table.\nA clock sits above green bushes under a blue sky.\nPassengers wait on a platform for the arrival of a train.\nA large truck with crane scaffolding on the back.\nan athlete holding a tennis tacquet in a stadium\nTwo uncooked pizzas has different ingredients on each.\nA man wearing a blue tie with the ten commandments on it.\nFive youths stand together holding tennis rackets on a court.\na wall with a bunch of graffiti on it\nA man in gray and black holds up a small cell phone.\nThe man is walking up the ski slope.\na person with an orange beanie taking a picture of a gray train\na plate full of vegetables sits on top of a table\nA woman 
and children surfing in the ocean.\nA few men standing in there military uniforms .\nA woman eating a doughnut sits behind a box of doughnuts.\na close up of a person riding on the back of an elephant\nA planter box of vegetables in a fenced garden.\nA man standing on the sea shore with surf board in his hand.\nPurple teddy bear with book in its lap staged to look like its reading to a small orange stuffed bear beside it.\nTwo giraffe standing next to each other under a cloudy sky.\nA woman is walking down the street with a red and white umbrella.\nThe cat is sitting on a person with a laptop on their knees.\nA dual monitor station also hosts a cup of coffee, water, and a thin keyboard with a mouse.\nA white fishing boat being followed by birds\nMan and woman sitting at table enjoying meal with wait staff seen in background.\nTennis court match with a player on each side of net and people in audience.\nThe two ball players are setting in the dug out.\nSome people sit together for a meal.\nThe very large, spceous bathroom has carpet and a jacuzzi.\nA surfer wears a completely black wetsuit including a head covering.\nOne bird on top of another on a tree branch.\nA man holding a skateboard in front of a group of people.\nThe intersection of a city street at a red light\nA group of people sitting around a table together.\nA young boy who is eating some food.\nA line of young skiers ski down a gentle slope.\nA delicious lookign healthy vege pizza in a box\nthree giraffes behind a fence with a tree near by\nGiraffe and other animals graze in tall grasslands.\nA man standing next to a red motorcycle in a parking space.\nA man rides a cow through a parking lot.\na close up of a person with a plate of food on a table\nA man in blue shirt standing by a brown and black dog.\nA large brown teddy bear laying on top of the ground.\na bowl with some noodles inside of it\nTwo people and a dog that are standing together.\nA man in brown shirt jumping with skateboard over gap.\nThe 
man is on a ladder painting the walls.\nSeveral sheep standing and grazing in a yard.\nA collage of photos shows different foods being prepared.\nThere are a lot of items laying in the bathroom floor.\nAn old style white stove with a kettle on it.\nTwo women are about to cut into a chocolate heart cake together.\nA man and little girl sitting on a bench near a parked airplane.\ntwo cake doughnuts with three strawberries and a cup  of coffee\nFour fighter jets fly through the sky leaving a trail of smoke.\nA man sitting on a motorcycle near several bicycles with a partially visible person standing nearby.\nA young girl being pushed on a skateboard by her brother.\nOne lamb, amongst other lambs, looking directly toward the camera\ntwo sheep sitting on a hill next to a fence\nA yellow fronted train is going down the tracks.\nThis is a bathroom that is painted an ugly mustard color.\nthere is a woman dressed in a costume holding a bear\nA parking meter and a car on a road.\nA train car with graffiti on the side of it.\nA person with a guitar hung on their body while playing a keyboard.\nA paper, laptop, cellphone, mouse and bottle sitting on a table.\nA haul of produce including squash, bananas, and mushrooms.\na bath room with a toilet a sink and a bath tub\nA small herd of sheep grazing in a grassy field.\nA green vase filled with multi colored candy canes.\nA group of children playing in the snow.\nsome soldiers cutting into a decorated sheet cake\nA red hammock set up in a wooded park.\nA snapshot of a family at a store taking a picture together.\nA blue motorcycle parked on the side of a road.\nThree men holding baseball bats dressed in full uniform the first man is holding the bat and the man in the middle has his hands crossed and the third man is holding the bat with both his hands cupped together.\nA man standing on a dock next to a boat.\nA man and woman playing a video game together.\nA giraffe standing next to a horse in the grass.\nA meat filled sandwich 
sitting next to a cup of chili.\nA couple of sheep in the middle of a grassy field.\nA bathroom with blue walls has a window, a sink, a bathub, and a toilet.\nA bike and a dog on the sidewalk outside a red building.\nA cat laying on a pillow on a couch\nPeople in casual sports uniforms running and jumping around.\nA large double decker bus is driving down a street.\nA photograph of a kitchen inside a house.\nA field with horses on a cloudy day.\nA Dilbert doll sits on a table next to drinks and a plate of donuts.\nA guy is doing stunts on his motorbike.\nA room full of American soldiers eating pizza.\nA kitchen, including a table, oven and cabinets.\nA vase full of flowers is sitting on a deck.\nA traffic light and street sign on the road.\nA cake that is shaped to look like a child's toy.\nA group of friends sits in their living room while playing video games.\nBlack and white photograph of man on skateboard carrying a surfboard.\nTwo small babies sitting in feeding chairs with spoons in their mouths.\nA woman that is kneeling under a elephants trunk.\nA church lit up at night in a town.\nMan with glasses talking on cell phone in car\nA dog and a man are herding sheep.\nA baseball game with a batter and a catcher.\nA family riding on the back of an elephant\nThe man stands on a stage as his neck tie blows in the wind.\nHarvested bananas, still green, sit in a pile.\nA cat standing in the fridge with milk and juice.\nAn electric train pulling into a train station.\nwomen sitting on a bed while man is getting dressed\nA pizza is sitting on a pizza stone fully cooked.\nA person waiting to perform a stunt on his skateboard on a quiet street.\nAn orange cat laying on its' side.\na hotdog a hamburger  and some onion rings\nan old rust bucket truck with a cracked mirror\nSeveral teenagers are playing soccer in a field.\nA very blurry picture of an intersection taken from a moving car\nA white and red boat in water with lighthouse in background.\nA plane prepares to land 
on an airport runway.\na meal with meat, rice, and vegetables\nMan poses for picture while sitting on the motorcycle\na person riding a surf board with a sail\nA miniature blue train engine sits on the tracks in a rural setting.\nAn Asian gentleman sitting in a blue chair at an open office area.\npeople skiing down a roped off section learning\na bunch of traffic driving on a city street\nLong empty white bus sitting out in the parking lot\nThe back side of a small charter jet flying through the air.\nA balding man with glasses, standing near a bridge.\nAn open door leading to a small bathroom\na close up of a sandwich on a plate\nA homemade pizza with gourmet toppings cools on a plate.\nA woman standing in front of a large candle lit cake.\nA professional baseball player holding a bat during a game.\nAntique black and white photograph of surfers on a California beach\nthere is a large pizza with toppings on it\nTwo slices of pizza sitting on a white plate with soda near it.\na large building with a fence in front of it .\nA restaurant sign hangs in from of a large oak tree.\nThe pipe smoker enjoys his nightly  smoky ritual.\nA group of snowboarders poses for a picture on top of a mountain.\nA man jumping up with is hands raised while playing Wii\nA duck is in the air flying over water.\nA group of snow skiers waiting  at the top of a mountain.\nSkiers on a snowy slope stop for a rest.\nSome animals that are sitting in the street.\nTwo small beds are sitting side by side\nAn empty side walk with in a city\nA man flying through the air while riding a skateboard.\nA double decker bus driving while it snows.\na little bathroom with a striped tiled floor\nsome people a clock tower and a black and white clock\nA saddled horse tied to a rope on a beach\nThe street sign in posted near people walking across a road.\nPeople watching two school buses crash on a dirt field.\na zebra is walking around in the snow\nA metal wire fence confining sheep inside a grassy 
meadow.\ncolorful head pieces on large elephants for entertainment\nSome very big trains one of them blowing smoke.\nPhoto of a man riding an old styled bicycle near what appears to be the Golden Gate Bridge.\nA woman is walking and holding a kite\nA siamese cat playing on the bed with a tabby.\nA dozen surfboards are lined up on the beach shore.\nblack furry dog sitting in front of yellow fire plug\nA man playing frisbee with a child in the park.\nA bus is making a left turn behind a white car.\nMan looking at camera taking a bite of food\nA road sign advertises luxury while a cow rests on a dirt lawn in front of run down buildings.\nA seagull holds a small fish in its beak\nSoda with a plate of food, such as, pork, macaroni, and corn.\nA desktop and a laptop sitting on a desk.\nA flock of birds in motion of a field of grass.\nA painting of green apples next to a bunch of bananas.\nTwo cats lying stretched out on a bed.\nTwo horses are standing together on the beach.\nA food truck that sells soft frozen lemonade that is parked near other cars and kites are flying overhead.\nA bouquet of different flowers is in a vase.\nA snowboarder catching some air over a bump.\nA woman ordering food in a dark restaurant.\nThese are crab cakes served on lettuce leafs.\nA man is flying a kite at a park.\na group of zebras together in the grass\nA school bus covered in art and a sign.\nA holder with toothbrushes, toothpaste, make-up and earrings.\nTwo men stand holding skateboards in front of them.\nModern looking living room with white flooring and furnishings\nThree red traffic lights suspended above an intersection by a cable.\nUnoccupied park benches near very unusual, leafless trees.\na person riding a skate board on a skate park\nA photographer holding a camera is looking in a mirror.\nA busy street with many people walking down the sidewalk.\nThis is a man and a dog walking towards the water.\nA large display with many watermelons and bananas.\na silver and blue fire 
hydrant lights and grass\ndiced meat and tomatoes are mixed with cheese and pasta in a large bowl.\na person wearing a vest, collared shirt and tie in front of bookshelves\na man standing on the street at the bus station\nA brown and white cow standing in front of an iron fence.\na foot long hotdog and a regular hotdog and a mug of beer\nsome people walk down a city sidewalk by stores\nA black cell phone resting on the table.\nMan standing in a living room holding up a Wii controller.\nA drink in a mason jar sitting beside a vase of pink flowers.\nA clean bathroom with a white toilet and black bath mat.\nA bunch of kids and some grown ups skiing.\nA woman holding a tennis racket while people watch from the stands.\nA lot of oranges are on a plate, with some having spilled onto a table.\nA large building that has a clock on it.\nA train with a red and yellow engine on a railroad track.\nA man and a woman holding Nintendo Wii controllers.\nI love the way the sun is creeping behind those two buidings\nA pair of glasses and a cell phone next to a laptop.\na close up of a street sign with a building\nA man holding a phone up to take a selfie.\na lady petting a giraffe behind a fence\nThe dog is laying on a rug in the the living room.\nA man wearing a tee shirt eating a sandwhich.\nSheep are laying down together in the snow.\na vintage photo of some cows grazing on some grass\nA truck driving down a rural dirt road near a street light.\nA horse carrying a carriage getting a drink of water.\nA child wearing a hat, tie, and white shirt smiling\nA small baby is eating a long banana.\nA man gets ready to swing a tennis racket.\nBananas on a table woman using a cell phone on another.\nA snow boarder is in mid air on the mountain.\na train sitting next to  a  pedestrian sitting on a bench on a  railway platform.\nA crowd of people standing next to a parked truck.\nA woman with a cake and bag on the street\nA woman seated and another standing with a cake and soda on the 
table\na book and a tablet on a black desk\nA little girl in a green dress watching a herd of sheared sheep.\nThree horses are seeking the shade of a large cottonwood tree.\nA man in a red shirt in midair catching a flying disc.\nA sad, young girl sits on her bed, moping.\nA hill that is used for people to ski on.\na kitchen with a stove and a refrigerator\na collection of stuffed animals with some wearing party hats\nA view of a room with a couch, television, and a fireplace.\nA toddler holds a tennis racket that is bigger than they are.\nFarm animals graze in the grass in the sunshine.\nA skillet full of broccoli and vegetables cooking.\nThere is a stuffed bear in an electric chair\nAn area of a city street section off with police tape.\nHe should be careful not to get sauce on his notebook.\nBlue umbrella in black and white photo of crowd of people\nA bathroom in the process of being remolded.\nAn empty bench sitting under a nice big shade tree.\nComputer stand with large monitor in cluttered room.\nA scooter is parked on the street in front of a car.\nA man wearing a black jacket next to a brick wall.\nAn old woman is playing with her two dogs\nA boy wearing a green shirt and helmet is leaning up against a black fence while standing on a skateboard.\nthere is a sandwich and a bowl of food on a white plate\nA laptop and a tablet on a wooden table\nA person is wind sailing in the ocean.\na silver oven and stove and some brown cabinets and bottles\nA small stuffed bear with a red hat.\nA gray and white cat sprawled out on a sandy surface outside.\nan upset adult baseball player throwing a baseball bat on first base\na car and a rear view window on a dirt road.\nA bathroom with two small windows and a bathtub covered in a shower curtain.\nA baseball player in red shorts prepares to swing at the ball.\nSome people and chickens hang out in an undeveloped space.\na person holding a hamster holding a piece of broccoli\nA surreal photo of a chair, a clock tower and a 
table suspended from the side of a building.\nThere is a person in animal suit holding large toothbrushes.\nTwo girls enjoy playing a game on the Nintendo Wii.\nA person in a shirt and tie is holding a can.\nA couple of ladies are playing tennis in this 3D image.\nA bag on the floor with various items around it such as sneakers, clipboard, scissors, insect repellent and paper towels.\na guy taking a picture of  some art work on the wall\nA beverage cooler and counter area in a small store.\nA man wearing a blue shirt maneuvers to volley a tennis ball.\nMandarin oranges tangerine on yellow with blue trim bowl, white counter top.\na few baskets of food that is on top of a table\nA small pizza has a curly topping on it.\nA wooden caddy is full of scissors and pens.\nA flat screen tv on a wooden shelf in front of a green wall.\nTwo cows standing on a dirt road next to wild green brush.\nPeople spending time on a beach during the summer.\nA blue vase holding pink carnations and white daisies.\nThe street sign is for Curran Street and 10th Street.\nA little girl standing next to a boat on a beach.\nYoung men are playing frisbee in a park.\nA clean white stove with a stainless steel pot on it.\nA night scene of a traffic light in front of a parking lot.\nA black bear laying on top of a field near trees.\nA group of tourists are feeding some elephants.\nA kitchen with wood cabinetry and a double sink.\nThe airplane is being serviced so it can make it's next flight.\nYoung men playing on the beach with a cow in the foreground.\nBrown bear standing next to a big log.\na close up of a pizza with broccoli\na small boat on a beach with trees in the background\na bunch of orange cones sitting in the road\nA couple of kids are on their laptops\nSwans are swimming in the pond at the park\nTwo snow skiers pose to have their pictures made on their way uphill.\nA bathroom with tan tiled floors and a glass shower.\nseveral people are waiting to board a train\nA surfer standing on the 
beach in front of his board\nA woman is on skis riding down the snow covered sloped.\nAn animal that is looking at something in the air.\nA man wearing a suit and tie and red hat with a silver buckle.\na couple of kids that are playing some frizbee\nA kitchen is completely decorated in white and black.\nA baseball player holding a bat in both of his hands.\nCrowds of people on a street corner and a bus picks up people.\nA woman riding a bike down the street.\nA man bites in to a piece of food while outside\nA teddy bear sitting in a fake bath tub with a rubber ducky.\nA car crashed into the side of bus on a busy city street.\nMan with piercing riding a skate board through neighborhood\nA man in red jacket snowboarding down a snowy hill.\na lady in a chair touching a vase that is on the floor\nA girl sitting in a chair holding a laptop in her hands.\nCovered and uncovered produce is sitting on tables at a market.\nA black and white image of a shipyard with some boats.\nA computer desk with various items around it.\nA couple of baseball players standing on top of a field.\na woman with flowers in her hair staring at the horse next to her\nA skier skiing on a snowing day with trees in the background.\nA cloud rolling over a ski slope with skiers watching.\nSome women who are cooking a pizza on a grill.\nA little girl cutting up food on a  cutting board.\na brown horse feeding on the grass which is well cleaned\nA woman holding a cell phone while she smiles.\nAn older man wearing a suit and tie.\nA couple of animals on a grass field.\nA wooden table with a hotdog and a pitcher of beer.\nAn otherwise ordinary roof and chimney are offset by an ornate tower resting in the middle of the roof that features ornamental work, a walkway, a weather vane, and a clock.\nA person in blue ski pants on skis going down a slope\nA girl in a hat sitting on a dock near the water\nA woman laying on the floor next to a dog and a cat.\nAn old smiling lady holding out a remote.\nMan behind 
counter in  shop with coke cooler, newspapers, condiments on table.\nA skate boarder flying high in the air over steps.\nA red stop sign on the street in the snow.\nTwo adults and one baby elephant walking in the woods\nA woman is jumping her horse over a piece of wood.\nMulticolored kites flying in the blue sky with a few clouds.\nA boat that is on some wooden cylinders on a beach.\nA window stands beyond a large tub in a room.\nA transit bus riding down a street with trees lined along it.\nThree surfers standing in the sand holding surfboards\nWe see a blurry picture of a person riding a bike through a field with some cows.\nA small airplane flying over a field filled with people.\nA sign on a street post advises smiling.\nthis grizzly bear is standing in some shallow water\nA woman walking up some steps towards a door.\nA fried piece of lobster sitting on top of a table.\nA person and a dog playing with frisbees.\na table that is full of many different  teddy bears\nMany skiers are walking through the snow with skis and poles.\nSomeone is showing a text message to the camera.\nSlices of pizza in a box next to a DVD movie.\nA dumptruck is parked on a street near a hill.\nTwo young cows standing next to each other.\ntwo brown bears lying together and relaxing on a rock\nan elephant extending his trunk out and on to the ground\nA close up of a plate of food containing eggs and toast.\nA zebra and her baby walk through dry grass.\nAn older man in shorts with flip flops and an umbrella standing next to a luggage belt.\nA male and a female walking together in a military airport.\na cat and a dog near one another\nA woman and child are in the kitchen eating food.\nA tennis player getting ready to serve the ball.\nA bathroom scene with focus on a mirror and a bathtub.\nA person in a ball cap sheering a sheep.\na coin-operated parking meter stands beside a brick wall along a parking lot\nMen standing around outside on possibly a movie set\nA dog laying on a red couch in 
a room.\nA woman with glasses contemplates something as she rubs her chin.\nTwo persons on the sea shore holding a ski board.\nPeddlers in boats on the waterway talking to people on the sidewalk.\nan older person on an air plane looking at a display on the back on a seat\nPeople are standing outside near a clock tower.\nA display of vintage items including an antique television, Barbie dolls and a lunch box.\nYoung boy dressed in a large baseball uniform.\na woman holding a mitt during a baseball game\nA large television screen in a large room.\nA photoshop of President Obama and a celebrity\nA cat sitting on a bench in front of a building.\nA man sitting on the floor by a window with an electronic device\na train is moving forward letting out a huge puff of black smoke\na man in glasses gazing at the pizza on the table\nA sign with plants and shade umbrellas sitting on the side of the road.\nAn old blue truck is on a grassy area.\nLittle girl covering her face and sitting in a wooden chair outside of a door.\nA humble kitchen has a stove and microwave.\nA herd of sheep standing outside of a pen.\na large bunch of flowers outdoors in a field\nThe cat is on the counter in the bathroom.\nWhite goose with young floating on water in daytime.\nA cow grazes from a junk pile, as a bird of prey soars overhead by the side of the road in a desolate setting.\nThe single bird has a small head and a large body.\nLarge group of food sitting on top of a table with white dishes.\nA toilet and sink are connected to a steel piece.\nA girl is standing outside flying a kite.\nA plate with meat, broccoli and cheese and a potoato.\nSeveral elephants walking on dirt and grass near body of water.\nTwo young girls holding hands in front of giraffes\nA man rides an elephant across a body of water.\na train that is on a train track\nSeveral countries have their flags displayed with flower memorials at the base of  lighthouse.\nA fat hipster wearing a gray hat, a pink shirt, and a black 
butoniere.\nMotor bikes with multiple packages driving on city street.\nA group of tourists watch a herd of sheep in a field.\na street sign with a sticker on it to make it look like someone on a cross\nCouple walking with an umbrella in the dark.\nA bathroom with a toilet and sink below a window.\nA white bird with a long black peak standing near the ocean.\nA person riding skis on top of a snow covered slope.\nbrown cabinets in a kitchen with black appliances\nA man with a tennis racket and ball is on a tennis court.\nA foot long sandwich on a plate on a table.\nA tennis player holds his racket with two hands\nA living room filled with furniture sitting on a hard wood floor.\na train covered in black dirt sitting in a fancy train station\nChildren pay adept attention at a party as someone speaks.\na man riding a boogie board in the water\nA ram laying down in the hay inside a wood enclosure.\nA large number of suitcases cordoned off by rope.\nA man eating a slice of pizza without holding the slice in his hands.\na man is in a salon getting his hair dryed\na lamppost during the day with two street sign\nThis is a game of professional baseball being played,\nA motorcyclist walking away from his motorcycle that is parked beside the road.\nThe horse is approaching a man wearing a camera.\nA ski slope with one skier on it doing the snowplow.\nA large skylight inside of a building with a high ceiling.\na young man holding onto a bat by a sign\nwoman takes a picture of herself in a mirror.\nA young woman kneeling behind a small stone wall.\nA set of bulls lying on the ground next to a boat.\nTwo sheep stand next to a fence on grass.\nA whole sliced pizza and a can in a box.\na bathroom with a toilet and a sign on the lid\nA woman balances an umbrella on her finger.\nA striped zebra is on short grass by a forest.\nan image of a cat on top of a couch\nElectric train car, on tracks with car carrier in background.\na man with a green bandana holding onto a kite 
string\nThere are different appliances in the middle of a kitchen.\nA man bent over in an open grassy field with something in his hand.\nTHIS IS A PHOTO OF A SMALL HERD OF COWS WALKING DOWN THE ROAD\nTwo laptops are stacked on top of each other on this desk.\nA painting of a blue fish flying through the canvas\nA single engine plane painted yellow flying overhead.\na man in the kitchen cutting something on a cutting board\nAn umbrella standing upright in a room on the floor near a wall.\nA woman is pointing and holding a hair dryer.\nTwo women who are holding papers and wine glasses\na street post with lights while clouds go by\nA large tower stands tall in front of a blue sky.\nA snowboarder posing for the camera on a snow bank.\nA group of rescue workers helping an overturned car\ntwo people standing side by side holding a glass of wine\nTwo giraffes out in the sun either in a zoo or in the wild\nTrain with its lights on a train track at night.\nA man flies a kite by the water side.\na sanctuary sign and a tall clock tower\nFive giraffes in an enclosure on a sunny day.\nFruit baskets and dips on display in a market.\nBeach umbrellas made of straw with the ocean in the background.\nA couple of girls holding tennis racquets and a ball.\npeople in a field lfying many kites flying in the sky\nA elephant stands at a watering hole with its truck in its mouth.\nA shelf full of teddy bears on display.\na train going down the tracks near a large city\nA bathroom with wooden door and a suitcase on metal a metal frame chair.\nA man in a red suit is on a white surfboard on top of a wave.\nA man walking in the sand with a surfboard.\nMultiple fire engines in the street in front of building.\nThe boy is playing video games on the tv.\nTwo long buses parked on the side of a road.\nA giraffe that is standing in a grassy area.\nTwo women smile with skis on as they sit in a snow bank.\nA pair of tiny red scissors getting ready to cut.\nA cut in half sandwich on a plate next to a 
shake.\na white plate on a table  filled with pizza plices\nWoman places a piece of chocolate at the top of this treat\na long train is crossing over a river\nYoung boys on a couch with their stuffed animals and a laptop computer\nA man sitting in a chair with his legs crossed.\nA lady playing tennis on a court professionally.\nPeople are walking with horses on a trail of dirt and stone.\nA skier comes down the snowy slopes quickly.\nA male surfer riding a very small wave to shore.\nA cat is on the floor with some scissors.\nA dog that is sitting on a couch.\nAn older stove sits in the kitchen next to a bottle of cleaner.\nAn athletic middle aged male skier courses downhill.\nA train is traveling though a very beautiful mountain area.\nA stuffed bear is sitting next to some jars\nA dining space with a table and four chairs under a window and art on the wall.\na small 3 storey building with a clock on the top\nA large empty bathroom with a walk in shower tub.\nA small child sitting in a sink brushing his teeth\nDouble photos of two Rice University tennis players\nA meal laid out on a table outside at a restaurant.\nA stop sign on a pole in the grass.\nA boy getting ready to hit a baseball at a game.\nA man riding a skateboard into the air.\nlarge gothic styled church towering over cemetery\nA rusty fire hydrant is between two poles.\nA plate full of food accompanied by a glass of wine.\nAn old train is making its way through the city.\nA ski slope scene with a skier on skis.\nA person and a laptop in a room.\nA black and white photo of a motorcycle.\nThe four images each have different plates of food.\na person riding a skate board on a ledge\nA pole with two wooden street signs in front of a bush.\nFresh produce, including oranges and apples, is on display in bins in the sunshine.\nTwo wine bottles on a table with one wine glass next to the bottles.\nThere are many zebras out on the plain.\nA flock of birds are clinging to a tree.\nA piece of paper and some 
scissors on a table.\nA women holding a tennis racquet getting ready to play a game of tennis.\nA man carries a surfboard through the city.\na small pizza that is on a white plate\nA white sheep standing in a wire pen.\nA tennis player prepares to return the ball.\nA large commercial air plane on the other side of a body of water.\na group of zebras standing on a dirt and grass field\nSome ice cream with a fork on a clear plate.\nA kitchen filled with kitchen furniture and accessories.\nA man carrying a plate with food on it.\nA man standing in a field is throwing a frisbee.\nA plane that is on the ground in the air.\nA bathroom is reflected in a round mirror.\nan image of a skateboarder doing a trick down a ramp\na dark gray horse grazing in the field\nThis is an arrangement of pebbles and fruit with a butterfly sitting on an orange slice.\nA woman in her bra and a dress holding a giant green object.\nA zebra stands near a mound of dirt in a wooded area.\na close up of a slice of pizza in a box\nA smart phone sitting next to a receipt on a table.\nA kite that is stuck in a tree.\nBook case with books and computer with keyboard\nA woman faces a truck that is loaded with luggage.\nThe man jumps high to hit the tennis ball.\nthree brown bears are cooling off in the water\nan image of two horses with noses nestled to each other\nA suitcase and a stroller full of miscellaneous items abandoned on a city sidewalk.\nA building with an ornate clock fastened to it near a flag.\nA group of different animals that's standing in the dirt.\nA bunch of fruit like banana along side each other.\nA young person stands in the kitchen, holding up a box of food, near the island counter.\nsome people riding some bikes right by some boats\nA man on a striped board windsurfing in the ocean.\nMan preparing to serve ball on outdoor tennis court.\nThe giraffe seems calm inside of the fence.\nA couple of people carrying surfboards under a pier.\nA person sitting at a table eating pizza and 
drinking wine.\nTwo men standing in a living room next to each other.\na ca dipping its head into a toilet bowl\nA group of surfers ride a wave on their surfboards\nA young child sitting on a surfboard at a beach.\nA person is riding a snowboard down a snowy hill.\na hotel room with a nice tv and sofa setup\nA girl who is wearing a baseball glove.\nthere is a woman cooking in a very large kitchen\nA young boy wearing a blue shirt standing next to a woman.\nsome people walking on a pier and a skateboarder\nCat sitting near a row of shoes and boots.\nA strand of beads on an open laptop computer.\nTwo military men being honored with an award.\nA bird on a beach with the ocean in the background.\nA sub sandwich is fully loaded and must be eaten from a container.\nThe cow is grazing in the tall grass.\nGreen highway signs pointing in opposite directions next to a building\nA brick outdoor structure of the Delacourte Clock.\nThe perspective of the skateboard picture creates an unusual scene.\nA boy and a girl pose for a prom picture.\nA wooden table with bowl of soup and cup with beverage in it.\nTwo giraffes under the trees on a sunny day\nA surf board rider falling off his board while a ship sails out a sea.\nA plate with a sandwich and french fries with a drink in a glass.\nA person's hand holding a bitten into doughnut.\nSeveral cows laying in the grass on a sunny day.\nA small boat tied to a dock at a pier.\nan old silver and brown double parking meter\nA mountain covered in snow with a person on a snowboard.\nA woman grimaces in frustration with a video game remote.\na truck sits parked next to a bench\nA cat sitting on the home office desk by an open window\na steeple outside of a window with a clock\nPerson on the tennis court bent over with racket in hand\nA large elephant stomps around on the dirt covered ground.\nA woman smiles from behind a bar displaying liquor bottles.\nA kitchen view of a refrigerator, with TV trays next to it.\nA skier standing in the 
snow next to a yellow and blue train.\na man bouncing a tennis ball on a court before he serves\nA man wearing a backpack and holding a suitcase on the road side.\nA blue, yellow and brown house with a clock in front of the fence.\na tiger striped cat hiding under a bed\nso many people at the beach swimming and resting\npart of a sandwich sitting on a table\nThree men sitting on a bench holding black luggage.\na dog sitting in the driver seat of a truck\nA female in pajamas and hooded sweatshirt playing a video game.\nA group of people riding an elephant through a forest.\na plastic cup of almonds some crackers and cheese\nLarge made up bed in modern bedroom, with small desk.\nA woman tying a horse down to a trailer.\nA zebra runs across a field with antelope in the background.\nA man is standing in the middle of a living room.\nA piece of pie sits on a red plate.\nteddy bear like candy on a wooden table\nA man on a skateboard is riding on the ramp.\nA close up picture of a vase in front of 6 other vases.\nan airplane flying about many tall buildings and cars\nA clean living room with multiple sofas and a flatscreen television.\nPeople standing at a bar, eating appetizers and drinking wine.\nA bridge over water that has several trees on one side.\nA little girls peers into display of goods in a bakery.\nSomeone is doing something right now that is fascinating.\nA bald headed man on top of a red motorcycle.\nThe man is working on his cell phone by his desk.\nA woman pouring coffee into cups on a counter.\nA group of men on a field playing baseball.\nA motorcycle sits parked across from a herd of livestock.\na man on a surf board riding on a wave\nA cat is licking up food from a blue plate.\nA red brick building sits on a corner and has a tower and a clock.\nA wrought iron bench sits above the sea shore.\nA garden filled with lots of green plants.\nA plate full of food that has carrots and some meat on it.\nThree beds in a white bedroom with two windows.\na 
photographer wears a umbrella to get camera dry\nA child in a blue coat skiing on a ski slope.\nSome french toast sits on a plate next to coffee.\nA table with a plate of food, pitcher of orange juice, coffee and sugar packets.\nA brown and white dog with long ears holding a yellow frisbee in it's mouth.\na woman is on her cell phone on the sidewalk\nTwo planes that are flying in the sky.\nA table topped with plates and trays of food.\nA woman swinging at a incoming tennis ball\nTwo Clydesdale horses being walked through a park.\nSkiers enjoying a day on the slopes in the sun\nA group of men are playing a game in a living room.\nThe living room has a long grey couch and a rug under the coffee table.\na man sits on a park bench surrounded by pidgeons\nan image of a cat with a tennis racket by a girl\nan old jet fighter with a propellor sitting in a plane graveyard\nA kids ski school with one instructor teaching\nA woman shops at a market with an assortment of fresh fruits.\nA fire hydrant and a little yellow ball person is between three yellow poles.\nThe sausage is sitting on the side of the plate.\nA train decorated with candy canes and other Christmas decorations.\nAn open laptop computer sitting on top of a wooden desk.\nA living room decorated with a modern theme.\nSome players in action on the soccer field.\nA very small bathroom has a toilet in it.\nA male is eating a large piece of food with his mouth wide open.\nA couple of people underneath a building with a clock.\nA large tower that has a clock on the side of it.\nYoung boys are playing softball on a dirt field.\nTwo bears playing in a water hole at a zoo.\nThe vase has some beautiful flowers in it.\nA customized motor cycle with skulls on it\nA custom motorcycle on display at a motorcycle show.\nA black tour buss parked on side of road\nA person jumping up into the air on a skateboard.\nBlue passenger train passing through an open forrest.\nBlack and white photograph of a women's tennis team\nA snow 
covered sign in a city neighborhood.\na cat with some kittens laying on a bed\nTwo water buffalo's standing together by a fence.\nThe words Market Street are written on a white sign.\nA person leaning on an upright skate board in front of a building.\nTwo people seated on a couch, one with glasses and holding remotes.\nA boat is sailing on the water in foggy conditions.\nA surfer waits at the water's edge on a rocky beach.\nA desk has a keyboard, monitor, and laptop on it.\na closed up flower laying on a huge leaf\nThis toilet has a weird plastic piece on it.\nThe dining table is in the middle of the large kitchen.\na guy in a black suit with a bright tie\nThe city is next to a beach and many docked sail boats.\nA man riding a surfboard on top of a wave in the ocean.\na kitchen sink with several white mugs hanging on the wall.\nA city in the night light up with lights\nA family poses together during a day out skiing.\nA black and silver fire hydrant sitting on a sidewalk in front of a brick building.\nA baseball player standing on top of a green field.\nA woman in the kitchen with others preparing a meal.\nA giraffe is eating in an enclosed space.\nA view of a few cocunuts in a basket.\nA tall shell gas station sign proclaims it is the Czech stop.\nA man standing in a room holding something in his hand.\nWoman with umbrella walking in the rain next to man.\nthree small birds on a sandy beach\nA cat laying on top of a couch on a shoe.\na chopping board with some cakes on it\nA lush green hillside covered in cows grazing.\nThree baseball players stand on a baseball field.\nA woman getting ready to light candles on a cake.\nthere is a dog sitting in a room where there is sun\nPeople holding various phones in a group together\nA black tennis player swinging the racket towards the ball.\na girl is turned around on a wood bench\nan image of a military man holding his daughter\nA man who is standing in front of a crowd talking.\nA man and woman holding coffee and talking 
to a woman in the city while walking their dog\nA close up of the push to walk button\nA couple of animals lounging on a hill in the open.\nA horse that is walking around by themself.\nA couple of computer monitors sitting on top of a wooden desk.\nA woman is sitting at an outdoor table using a cellphone.\nA long haired house cat, sitting in a shallow pot, is roaring.\na man on a horse that is in side of a gate\na polar bear sleeping on a rock ledge\nA women riding a scooter on a busy street.\nan image of a dog that is catching a frisbee\nWoman carrying bags eating a hotdog on a crowded street.\nA beach with a lot of kites flying in the air.\nA train going down the track with steam on top and a bicyclist riding beside it.\na man in a a hat i standing with a horse\nA keyboard, mouse and monitor sit on a desk.\nTwo large toilet sectionals in the middle of a grey bathroom.\nthis is a cat in front of a tv\nA mirror, road signage and a skyscraper in the city\nA bathtub and sink under a window with a lace curtain.\nA person is holding a sandwich in one hand\nSome baseball players are playing a game.\na boy on a skateboard is about to skate down the ramp\nPicture of a person that is reading a book.\nA white plate has a brown stripe design in the middle\nA sink in a kitchen under a microwave oven.\nA couple of black bears snuggling each other.\nA traffic light with a building in the background.\na man on a surf board rides a wave\nA building with a sign that says Donuts above the door.\nTrays of a variety of different donuts for sale.\nclose up of a pastry with a bite taken out of it\nA man is standing and talking on a cell phone.\nA bus is traveling down a city street that does not have much traffic.\nThere is a flip phone in a banana shaped case\nA giraffe running around a field at a zoo.\nA parking meter reserved for the disabled outside of a boutique\nA man holds an oversized frisbee at the park.\nA toddler happily takes a bite of a donut.\nA beach that has people 
walking on the sand and in the water.\nA train sitting on top of tracks with steam pouring out of it.\nFirst bus on street currently not in service.\nA man flying through the air while riding a skateboard.\nThe girls are checking-out where to put their surfboard in the water.\nA black cat rubbing up against a laptop.\na close up of a small bird on a green surface\nA car with some surfboards in a field.\nA Delta airlines plane with the food services truck docked at the service door and a worker at the door.\nA sausage sandwich and greens sit on paper.\nA cake is being cut in front of little kids and parents.\nCat standing on papers that are sitting next to a laptop.\nA studio apartment with a bed, a table, and a kitchen area.\nWoman poses on beach with two umbrellas in front of a floating boat\nA red car with various pizzas sticking out of its window.\nA church with a steeple and the sky in the background.\nA table full of food with a glass of water.\nCows walking on a path between rocky outcrops.\na man standing by a desk  with a toothbrush in his mouth\nsome cars and a motorcycle driving on a road\nA group of people watch as a man stands before them  holding a string that is attached to a kite that flies in the cloudy blue sky.\nA gathering of people fly kites in the park\nA  woman riding a motorcycle with a man on the back of it.\nThere is a male surfer riding a wave while the sun goes down\nA black and white cityscape shows lots of people, mainly a tall, smiling man in suit and tie, who is paying attention to a woman standing beside a second smiling man in glasses and headset, who is also holding a microphone and notepad.\nA man making a face while biting a hot dog with cheese on it.\nBright sunlight shining through a colorful window curtain.\na white horse standing next to a stream, rocks and a green field.\nA man with a surfboard walking into the ocean.\nA kitchen sink near a couple of windows.\nA woman wearing a white shirt and black capris getting ready to 
fly a multi colored kite.\nTwo boxes of donut with milk and juice on a dining room table.\nA green bus with a bike on the front of it driving.\nFour bears standing on a fallen tree outside.\nA young boy holding a blue baseball bat on top of a green field.\nAn adorable little girl holding two ski poles.\nA motorcycle parked across from a business next to a highway.\nA red and white wings black bird sitting on wood\nMan and woman at an outdoor restaurant smiling for camera.\na person riding a two thick wheeled bike on sand\na group of people shopping for fresh fruit and vegetables at a market\nA baby elephant following behind a mother elephant\nseveral bottles displayed on counter in well decorated indoor area.\nA yellow train is traveling down the railroad tracks.\nFireplace with brick border displaying many photos and decorative flowers.\nA view of a bathroom, that is very old looking.\nAn airplane ready to let passengers get on.\nA rendering of an old fashioned water closet.\nA old time picture of a woman milking a cow.\nA pinto horse walking in a coral with two people.\nA dog standing on a chair eating out of a dog bowl.\nA cute puppy curiously looks to see whats going on.\nA woman gets a fresh glass of wine from a cask using a glass instrument.\nA bedroom packed full of home goods and luggage.\nMilitary colors being shown at a baseball game.\nA group of alpacas grazing on a dry hillside.\nA woman and her son picking out sweets at a bakery.\nA young skateboarder wearing safety equipment skateboarding down a sidewalk.\nA dinner plate with meat and vegetables on it.\nA kitchen large green hanging plant and a door.\nA bed with four pillows and the covers turned down.\nA large motorcycle is parked next to a brick wall.\ncows in a small wood and straw shack\na vintage photo of a woman sitting on a horse with a man in a suit standing\nA woman smiling while holding a yellow banana.\nThe lady is sitting with food in her hand.\nPedestrians cross the street during a winter 
day.\nassorted foods separated in bowls on a white table\na number of people standing in a kitchen area with a counter top\nFour pieces of pepperoni pizza on a plate.\nA man in a suit standing in front of a window\nA woman is walking her dogs  on the city sidewalks through the newly fallen snow.\nPeople swim in a pool on a beach resort.\nA toddler in a t-shirt holding open a refrigerator door and looking inside\nA street lines if restaurants with signs hanging off of them.\nA baseball sitting in a baseball mitt on a blanket\nA lone kite is flying above the water and under a blue blanketed sky.\nTwo road side workers chatting, one is holding a stop sign.\nA homemade square pizza fresh from the oven.\nA lanky skateboarder poses against a barn-red door.\na dog stands inside of a boat as it stares at a camera\nA man surfs on a surfboard over a wave\nA foot ball fan is showing off his team spirit\nA man in black jacket with dog in snow.\nA tray full of breakfast items served on a plane.\nA young man in a black shirt and purple tie driving an automobile.\nDifferent style toys placed next to eachother  and a batman costume.\nTwo men are talking to each other while holding a skateboard.\nA pitcher throws a ball while the opposing team watches.\nA man in uniform is looking at his phone.\nA woman is pouring a bottle of wine into wine glasses.\na man in a blue shirt and a orange tie\nA bird sitting on top of large pile of brush.\nA man casually throws a frisbee into the air.\nA woman in a Sailor Moon costume rides a motorcycle in a street full of people\na laptop on the floor with a cat on the laptop\nA plethora of stop signs in the same vicinity of each other.\nA pizza with a sign with a cartoon mobster.\nThere are horses walking beside of the cars.\nA cow lies down in a pen and looks at the camera.\nThree boys peel vegetables and cook at a counter.\nA police officer on a police motorcycle rides past a line of men in uniforms.\nA wide photo of two people kite surfing in the 
water.\nSeveral trays of pastries sit on a table.\nHighway road sign announcing exit ahead for vehicle traffic.\nA black and white photo of a woman asleep on a park bench surrounded by foilage.\na toddler sitting at the end of a surfboard on the beach\nThe worker is cleaning the eating area for the customers.\nA baseball player swinging a bat while standing next to home plate.\nA woman sitting beside a table full of fruits.\nA train with smoke coming out going down the tracks\nA airplane sitting on the tarmac at an airport.\nA gross bathroom has graffiti all over it.\nDog trying to pick up an object with its mouth underneath a bench.\na close up of a white keyboard with a black monitor\nTwo military men are cutting a large cake.\nA woman walking down a street holding an umbrella.\nA red stop sign posted next to a tree next to a sidewalk.\na clocktower standing high with lights on\nA dog and cat in a master bedroom looking at the camera.\nTHERE IS A BATH TUB AND A SINK IN IT\nBoy in purple shirt holding a tennis rack on tennis court.\nA full view of a picture cloth with an animal.\nCHEF IN  KITCHEN WEARS FACE MASK WHILE PREPARING FOOD.\nAn assortment of fruits and vegetables sitting on a counter.\nTwo people walk down a walking path.\nThis is an image of several kids playing soccer.\nA man carrying a surf board into the water where there are other people.\nThe pizza in the box is divided into four slices.\nA farm picture with an old cabinet and a horse with its head down.\nA plate of food including chicken, rice, and beets.\nA plate of food with onion and broccoli on it.\nSquare white plate with a sandwich full of meat and dressing.\nA group of people walking down a wet sidewalk.\nMan riding white horse in the street while others watch.\nA man holding a camera standing in a crowd.\na train on a track near a platform\nA man doing tricks on a skateboard outdoors in a city.\nA group of people mill about on a lawn of a building.\nA green highway sign beneath a beautiful 
blue sky.\nA toilet with a wooden seat is in a small bathroom.\nA big road sign listing three different locations\nHotdog sandwiches sitting on ears of corn on a table.\nThe young child is riding swiftly on a skateboard.\nA very close up look at a tasty looking pastry.\nMotorcycle police and their bikes with Battenburg markings\nThere is a seagull flying towards beach umbrellas\nA man leans against a wooden box on wheels that contains a teddy bear and a basket.\nA large white bus is traveling through the city streets.\nA young woman walks in the rain, smiling and holding an umbrella.\nA neon green toilet and sink are by a large trash can.\nSome people gathered together on the snow covered ground.\nA man riding a surfboard on a wave in the ocean.\nThe crowd of people are gathered in front of the building.\nthis is an unmade bed with a flowery blanket\nA long line of skiers is waiting on a snow covered mountain.\nSmiling young girl holding video game controllers while standing\nA woman in a blue riding jacket rides a dark brown horse on a riding course.\nA brown dog laying on floor under a brown mat.\nA couple of people riding waves on top of boards.\nA sandwich on a plate with a side of coleslaw on a tray.\nA small boat is going down a water channel.\nA woman with an umbrella standing by a fountain at the park.\nA clock behind a fenced in area in a city setting.\nA cat sitting on a couch looking intently at something.\nStreet signs showing streets with a one letter name\nA young man scoffing a huge slice of pizza from two paper plates.\nA table cluttered with a bunch of stuff.\nA woman wearing sunglasses and a hat is smiling.\nA train sitting parked on tracks next to a platform.\nA young woman looking at a store display and holding an umbrella.\nA clock that has been placed on a window sill.\nA giraffe in a grassy fenced in enclosure.\nThe  colorful lights are illuminating the darkened street.\nA stuffed zebra posed and being chased by stuffed wild dogs.\nA rear 
view mirror has the reflection of a truck.\nA small pizza sits on a granite counter top next to a napkin.\nTwo firetrucks with their lights on are stopped on this road.\nA woman sits cross legged near a pile of eggs.\nGuy on bench looks over while eating pizza\nChickens on a sandy beach with a motor boat in the background.\na couple of zebras are inside of a caged area\nA bus stop sign that is on a pole.\nA English muffin lays on a plate next to a drink.\nAn ornate antiqued pole holding a clock with trees in the background\nSmall white sheep below another sheep eating in an open field.\nSeveral kites are flown along the shoreline on a cloudy day.\ntwo people holding surf boards on a beach\nA zebra standing on top of a dry grass field.\nA couple of giraffes looking attentively at the camera.\nA non passenger train sitting out on the tracks at a curve\nColorful lights reflect off the items inside this bathroom stall\nA man standing behind a woman holding a bat.\nA man playing with his dog near the water.\nTwo men playing frisbee in a large field\nThe man is outside playing Frisbee with his dog.\nA large elephant standing in a grassy field.\nA man is using his board to surf a wave\nA young man wearing goggles with spiky hair dressed up like Robbin.\nA toilet sitting in a bathroom under a window.\nSink with electric toothbrush and toothpaste sitting on the top.\nPlate of vegetables made from knitted yarn on wooden plate.\nA group of young people playing a game of frisbee.\nTwo young girls sitting a big bench on the beach.\nA small dog with long hair sits on a computer desk.\nSeveral brown cows grazing in a field.\na person standing next to a fire hydration that is spraying water\nA cat underneath a car on the pavement looking Rome underneath\nA very large jetliner sitting on top of a tarmac.\nA group of people sitting down at a table together sharing a meal.\nThis truck has an open deck for the passengers.\nInterior of a public toilet stall in a country that squats to 
defecate\nA counter topped with small different shades red tiles\nYoung boy in front of a large elephants cage.\nThere is a parking meter with one side covered up.\nA large passenger jet flying through a  cloudy blue sky.\nA black steam engine train sitting on top of rail road tracks.\nA very close up view of a very pretty bird.\nA store that has trees on the side of the building.\nA large stuffed white teddy bear sitting on a bed.\nA baseball player wearing a white and red suit with the number 19 gets ready to hit his bat.\na large bed is in a white room\nSeveral pieces of furniture are in an empty parking area.\nA small kitchen with stainless appliances and red cabinet doors.\nThe baseball between the pitcher and the batter during a game\nA vase of flowers on a table near a window\nA fighter plane is taxiing down a runway.\nThe father and daughter are under an umbrella on the beach.\nA man walking on the sidewalk next to a suitcase leaning against a lamp.\nA cat laying in a bowl on top of a pillow.\nA woman is standing in front of a birthday cake.\nCars driving on the street and people walking on the sidewalk in a city.\nA plane is sitting on the ground at the airport\nA pier stands in the ocean while people wade in the water.\nA green fire hydrant sitting in the middle of a sidewalk.\nThere are apples and oranges on top of a table\nA person cut out a bird shape out of a piece of paper.\na big crowd of people that are looking at a zebra\nA large bear walking around a zoo enclosure.\na zebra standing next to a car on a bright day\nA woman flying a kite on the beach under a grey sky.\nA large flock of birds flying in the air.\nA soccer player is about to kick a soccer ball\na fridge stove sink and dishwasher and a dinette set in a kitchen\nA group of people standing on top of a building near a large clock.\nDifferent types of fruit displayed on a table.\nSkateboarders waiting to hear the go ahead word to skate down a ramp.\nAn open laptop computer sitting on top of 
a wooden desk.\ncars on the road that are nothing but blurry lights\nA bedroom scene with a bed and dresser.\nA young woman is pulling a casserole out of the oven.\nA man in a blue shirt serves a tennis ball\nA table holding two trays of cookies and a cake.\nA full view of some cows grazing on a field.\nA plethora of apples sitting inside a bowl.\na home made pizza sits on a trey\nA white toilet missing seat in an old bathroom.\na yellow and black train is on some tracks\nA sepia-tone photo of a man and a boy standing near a stove.\nA piece of cake sitting on a square plate.\nMan standing on side of busy street next to a mall.\na small airplane sitting in the middle of an airstrip in a field\na couple of surfers are walking out of the sea\nThere is a group of people flying a kite together\nTwo saucers have a doughnut and cappuccino on them, respectively.\nA woman seems to be doing yoga on a surfboard in the water\nMan holding dog mouth open to brush teeth in tiled area\n3 microwaves cooking something and catching on fire.\nA man who is riding a wave on a surfboard.\nA clock on the wall inside a mass transit vehicle.\nA computer and keyboard are on a computer desk.\nTwo men stand near another man who is jumping onto a bed.\nA group of people standing around a living room\na heard of sheep on a grass field.\nA group of people ready their skiing equipment in the snow.\nA tennis player throws the ball up to hit it.\nThe traffic light is in front of the building.\na man waterskiing behind a white boat on a lake\nA black and white cat is sitting in a window.\nA sandwich and french fries on a paper plate\na man is flipping through a book on a bed\nDisc on beach, with dog prints in sand\nA dog is standing next to a cat on a suitcase.\nA plate with asparagus, broccoli, carrots, cauliflower and a sandwich.\nA group of elephants standing together in a field of grass.\nA red fire hydrant with a hose sticking out of it.\nA airplane that is sitting on a tarmac.\nA display case 
holding various types of donuts in metal racks.\nsmall boy eating food from a white plate\na large green and white clock tower in the middle of a plaza\nA skateboard zooms down the railing at the skate park.\na man in a tie holding a cigarette and looking down\nA cross country skier on a trail, smiling.\nSome people riding a motorcycle near a bunch of motorcycles.\nman having fun with a video game system\nA man sitting on couch with two little girls.\nA group of people stand by a red lighthouse.\nThe mother smiles as she holds the baby boy.\nA man teaching a girl how to play tennis.\nA broccoli head with onions and potatoes by a wooden wall.\nA woman standing next to a man near a traffic light.\nA person on some skis in the snow.\nan image of a woman sitting in a dark room\nA very long and wide road with some assorted vehicles.\nTwo hotdogs and a side of french fries in yellow containers.\nA stop sign in front of a brick building.\nFour men in a lake attempting to stand up on a board together, with their hands raised in the air, and one man in the water.\npeople are at outdoor seating with umbrellas overhead\nA tennis player prepares to hit a forehand on a red clay court.\nSmall group with a folding table next to a decorative old bus.\nThe bowl is full of broccoli and some kind of meat.\nan image of a place setting with soup and biscuits\nA bathroom with a large white tub and his and her sinks.\nWe see a picture of many many teddy bears.\nA group of men sitting on a snow slope while attached to snowboard.\nA pitch approaches the batter in a baseball game.\nA man in glasses eats a slice of pizza.\nthis photo is blurred it is of a house\nA black cat standing inside of a piece of luggage.\nPack of zebras in a zoo standing together.\nA metal pole with three street signs pointing different directions.\nA man riding on a wave on top of a surfboard.\nTHIS IS A PICTURE OF A TOILET AND SINK IN A BATHROOM\nMan smiling with hat in kitchen with mess around\nA young man in a 
baseball uniform with his arm pulled back.\na red bench and some buildings and lights\nA bird sitting on the branch of a tree near leaves.\nA sink with some cups on the counter top.\nA man on a horse without a saddle stands on a hill.\nLow view of small passenger train moving through the countryside.\nA person with an umbrella and some cars on a street.\nA man is playing Wii tennis in his living room.\nA large jetliner flying through a cloudy blue sky.\nA man standing on a beach near luggage.\nan empty and clean wood floored home kitchen\nA brown and white dog sits on grass next to a Frisbee.\nMany kites fly above a crowded beach.\nA picture of a room with a table that has a vase and candles on it.\na plant that has a yellow bird on it\nYoung man exclaiming over an unripe green plantain.\nTwo teddy bears that are sitting next to each other.\nSome ripe bananas are in a brown wicker basket.\nA young boy is skateboarding swiftly through a crowded park gazebo.\nFour people on skis standing in the snow\nA book on finance sitting on a bed.\nThere is a blue and yellow train stopped at a train stop\nTwo oranges and a banana laid out to look like a sad face\nthere is a military truck that is stopped on the street\nA crowded harbor filled with small sailboats and other watercraft.\na blue bus is parked by a bench\nThree cats on a bar watching television very closely.\nChefs and cooks are preparing meals in a restaurant kichen\nA microwave and a cone on asphalt by bushes.\na red and white plane and a blue and white plane\nA very tasty looking cheese and vegetable dish\nA person with a snowboard next to a man with skis.\nA person that is eating some food in her mouth.\nA very small kid in the road next to a big yellow bus.\nA movie cover with some food on top of a plate.\nA silhouette of a woman with a tennis racket.\nA teddy bear is sitting alone in a window.\nA woman sitting on a brown couch with two children.\nLittleboy been playing with a Nintendo Wii and amused\nTwo 
children playing a miniature version of tennis on a city street.\nA group of sheep walking in a grassy pasture\nA small airplane in the sky and another in the water.\nA pair of giraffes is stretching up to a limb in perfect harmony.\nThe bananas were cut to put chocolate inside them for a treat.\nTwo benches are empty on a sunny day.\nTwo young males playing a video game together in front of a tv.\nA man in green and a red haired woman sharing a laugh.\ntwo guys riding bicycles while carrying their surf boards\nA bunch of different types of tools in a play kitchen.\na street sign on a wooden pole near a fire hydrant\nFresh cut flowers in a glass vase on a tablecloth\nA banana sitting on top of a table next to a  paper.\nA person is parasailing on the water under a cloudy sky\nA fire hydrant is placed in a wooded area\nA pizza sits on a table and it has cheese, olives and broccoli on it.\nSheep grazing in a lush, green field on a lavish farm estate\nOlder Americans ride in a simple parade float adorned with red, white and blue decorations.\nA person wearing a glove holding a chili dog.\nA close-up of the rear end of a propeller plane.\nA black park bench sitting near the water\na person walking on a sie walk talking on a phone\nSeveral cows are on a sloping grassy hill.\na blue and white plane flying over a lake.\na bike with a tarp and boxes of items\nTwo men standing on the street wearing a suit and tie\nA shirtless man with a hat and sunglasses holding a frisbee in one hand and in a stance where he is preparing to throw the frisbee.\nA tub and shower with a curtain in a bathroom.\nSome cars at a traffic light, one with a red sticker on the back\nA group of people play a game of frisbee.\nA bathroom with a separate area from the sink.\ntwo zebras walking next to each other in a desert area\nA man is sitting in a chair watching television with a remote control in his hand.\nThat looks like some sort of huge satellite.\nA person is typing on a lap top and there is a 
person up on screen.\na giraffe in a field with rocks and grace\nThe giraffe is standing alone in the field.\nA can of soda and a cat with kitten next to a monitor.\na bread with some noodles and minced meat\na bird that is sitting on a log in some water\nA man wearing a bow tie walks in the rain with an umbrella.\nLarge man in leather biker outfit with a small brown dog.\nThe wooden boat is floating on the river near the bank.\nA kitchen with a stove, refrigerator and a microwave.\nPeople flying kites in the snow on a sunny day\nA plate with grapes, green vegetables, and noodles on a child's place mat.\na cow walking in a crowded city street\nA closeup of a bull cow with horns on its head\nA person does a snowboard trick on a rail in the mountain\nThere is a grilled sandwich on a white plate with sauce\nA stove is shown with a mixer next to it.\ndilapidated, dirty bathroom with mold and water damage\nA yellow bus and blue bus passing on the street\nA cheese pizza pie is in the serving dish on the counter.\nA soccer player blocks the goal during a nighttime soccer game.\nA person is snowboarding down a hill fast.\nA man holding a ball as he leaps into the air.\nA kitchen that has wooden floors and a bay window.\nThe \"Yoctangee Park\" sign has a Native American on it.\na park bench that is on top of some bricks\nSide by side view of two oval plates, one with fork, with chicken salad sandwiches and rosy new potatoes, by an open and an unopened bottle of lager, a pepper mill, paper towel roll, basket behind.\nA very large elephant in a field standing next to a pond\nThere is a woman drinking from a fire hydrant and several other people nearby.\nyou can see a large belt that is used to make donuts\nA man holding a tennis racquet on a tennis court.\nA man holding a racket playing tennis at the court\nA woman sitting on a bench reading a magazine.\nA young boy and girl playing on a ride.\nA large grizzly bear walking through tall grass.\nA pair of scissors next to a 
writing instrument of some sort.\nA small child using skis to ski down the hill.\nSurfers walk out through the surf toward large waves.\nA person is holding a computer and watching a flat screen t.v.\nA bus is stopped in the middle of the road.\nElephants with passengers walking through a calm river.\nA little boy that is standing on a skateboard in the street.\nMeat, carrots, and a roll sit on a small white plate.\nA bathroom with a wall mounted toilet and TP dispenser.\nA herd of zebras in a tall grassy field\na close up of a plate of fruit with apples\nA zebra standing in water next to grassy area.\nA large herd of elephants at the edge of a body of water.\na kitchen with a refrigerator near a window\nA small elephant laying on the ground in the mud.\nA red train or trolley car is shown at a station.\nA man holding a baseball bat on a baseball field.\ntwo people riding skis across a snow covered forest.\na person is touching a small stuffed bear\nA couple of horses grazing on a lush green field.\nA woman standing by a yellow fire hydrant\nA man plays with two young children in the grass.\nAn elephant stands between two bushes on a dry field.\nA shoeless foot standing on top of a skate board\nA man stands on skis near a snowy mountain.\nA farm house and barns in the background with horses and farm animals in the yard in front of them.\nSeveral views of mean playing with a white disc on grass.\nAn ornate wrought iron frame holds a sign reading GARAGE.\na man playing tennis going for the return\nA child in white shirt laying on bed in wooden crib.\nA man and woman pose for a photograph while sitting on a moped.\nA close-up picture was taken of a giraffe.\nA yellow plate topped with different types of food.\nA women is laying on a board surfing a small wave.\nThe zebras are eating the grass in their habitat.\nA man dressed up for a themed party.\nA black and white photo of an old train\na women that is eating a very long hot dog\nA post with a clock and several 
birds sitting on it.\nAn Asian stir fry on a plate with chicken, brown rice, and broccoli.\nTwo men skateboarding down a road near some cones.\nThe street sweeper has a safety triangle on the back\nA motion blur street scene of people and a bus.\nA man riding on top of an elephant.\na car on a road with people standing on the side walk\nA stuffed bear and a stuffed bunny sitting beside of one another.\nA man taking a swing at  a tennis ball\nA photo of a green and red train on a set of tracks.\na vintage photo of some kids playing on a bed\nThe skier is upside down in the air.\nA kitchen counter with an unassembled food processor.\nA group of people sitting on a train.\nA dog that is standing on top of a fire hydrant.\nThree white plates topped with pizza on an orange table.\na black and white photo with a sign next to a building\nI cant see wht the images are in this one\nA planter is full of green plants along side a fence.\na monument with many kites flying near by\nA living room with hard wood floors and furniture.\nA rusty looking parking meter is on the pavement.\nThree horses are walking through the grass wearing blankets.\nA man in a suit and tie beside a stack of suitcases.\nTwo pieces of pepperoni, sausage and ham pizza on a plate.\nA bed with a stuffed animal looking out a window\nA batter, catcher, and umpire stand on a baseball mound.\nA closeup of a shelf displaying a canned beverage and a muffaletta sandwich.\nMan standing on a skateboard with a person sitting on it.\nA bathroom stall that says did you check your lipstick\nA large wooden clock hanging from the side of a large cement pole.\nA small blue and silver airplane spewing smoke at an air show.\nA bag sits on a white sheeted bed\nA white house with a red top next to the ocean.\nA large green truck with giant tires on it.\nA packed Chinese train is filled with commuters.\nThere are several bananas tat are in te tabe\nA vase of flowers that is on a table.\nTwo dolls with crazy hair and 
interesting clothes.\nTHERE ARE BLACK AND WHITE KEYS ON THE KEY BOARD\nA computer desk is very cluttered with various items.\nA trellis and arbor with a bench under it\nA bunch of zebras grazing near a road where vehicles are driving by.\nA group of parked motorcycles at a parking lot\nA person can be seen trying to cross country ski as though they are on a slope.\na person is skiing outside in the snow\nA market display with the rows of vegetables in baskets.\nA man leaning in to see the laptop he is using\nA man in black jacket playing a game with a Nintendo Wii controller.\nA cat is sitting on a laptop's keyboard\nAn old man in a suit and tie is staring.\nTwo pizzas sitting on top of a counter top.\nA four way traffic light showing the green light lite up.\nA rack with many accessories next to a refrigerator.\nTwo UPS trucks are parked side by side beside a building.\nA skate boarder practicing his tricks on the ramp.\na little bow outside in a yard by a bar, playing frisbee\nGuy in shirt and tie walking away from the chairs\nA man doing a trick on a motorcycle.\na kite flying in the sky above a body of water\nA group of fruit, vegetables and eggs on a kitchen counter.\nA man standing next to his wife as she holds their baby.\nA modern kitchen is displayed with silver decor.\nAll white bathroom with shelving unit over commode.\nA surfer in a wet suit rides a wave.\nA beautiful woman riding skis down a snow covered hill.\nA blue sign in front of a bamboo wall.\nA bathroom with white toliet and sink visible\nfive used toothbrushes in a clear glass on top of a sink\nBlack and white photograph of a man on a motorcycle.\nA bunch of young boys playing soccer and having lots of fun.\nAn adult carries a child and a surfboard through the waves.\nA man holding a yellow frisbee in his right hand.\nThree boats filled with people floating down a river.\na player squatting down to return the ball\nThere are a collage of pictures of different foods\nA dog standing outside next 
to a car.\nA woman is taking the first bite of a banana.\na steel bridge over the water with a train\nA baseball player swinging a bat towards a ball.\nFive snowboarders in yellow jackets perform a simultaneous jump.\nA barber giving a man a haircut with a blue smock on.\nA wooden table with two plates of food and a paper\nan oven and a small table in a home kitchen\nA person sitting at a table with a cup of coffee.\nAn open computer next to books on a table\na big zebra that is on a dirt ground\nVehicles and people on a crowded city street.\nTwo people posing next to a giant statue with a suit case.\nThree planes fly high in the sky in unison.\nA person on a surfboard riding a wave.\nA tray with four different types of food.\nA man holding a child with a toothbrush in its mouth.\nA LONE GIRAFFE IS GRAZING IN A OPEN FIELD\nA variety of vegetables hooked on sticks on a tray next to a remote control\nThe wheel of a bicycle going down the street\nA mailed postcard of people in a boat being rowed\nSausage and cheese on bread on a plate\na person riding skis on a snowy surface\na man standing on a porch holding a bat over his head\nYellow construction trucks parked in line on a dirt road.\nA red train traveling out of a dark tunnel\nA living room filled with furniture and windows.\nA man is lacing his boots while several others are ready to ski.\nMushrooms are used in many variety of dishes\nSeveral oranges hanging on tree branches in a grove.\nAn antique semi with flames painted on it.\nThe woman is standing by the elephants outside.\nthere is a slice of pizza with mac and cheese on it\nA dog sits in the side car of a motorcycle.\nA bathroom with a wooden frame around the mirror\nA person on a skateboard near a building.\nTwo single beds that are made up with a night stand between them.\nA large clock tower on top of a tall white building.\nA woman wearing a towel holding a blow dryer.\nA solar panel powers a public phone booth.\nPicture of a bathroom with three 
paintings over the toilet.\nA photo of a place during bike week.\nA person is handling broccoli on a cutting board.\nAn orange and white cat standing in front of a flat screen TV\nA view of a person's legs sitting on a bench alone.\nA man stands on a white object while playing Wii.\nA large elephant is staring in front of a fence.\nA man sitting on the beach behind his surfboard.\nThe bathroom is clean and ready to use.\nbroccoli cauliflower and carrots in a white bowl\nThe Master of Hounds leading the dogs out for a fox hunt.\nCrammed and congested city street in oriental area with many people and buildings.\nA cat walking through a kitchen by a eating tray.\na person holding a tennis racket on a tennis court.\na shops table filled with apples oranges and other fruits\nA man wearing skis poses for a picture in the snow.\nThere are two beds in the bedroom, along with a desk and a television.\nA tight, rectangular kitchen space, with kiwi colored walls and a grey door, shows cabinet and counter spaces of pale wood, holding built in appliances, that borders a white tiled floor.\na single person standing on the side of a snowy mountain\nA baby crying with a teddy bear in its arm.\nA man and a woman sitting on a motorcycle.\nA view of a tree with pink flowers as soon in a mirror.\nan image of the back end of a childs car seat\nThe giraffe is standing inside of the pen.\nA dimly lit bedroom that has odd colored walls.\nRoadsigns showing stop lights, right and left turns and warning cyclists to dismount.\nTwo polar bears are sleeping atop some rocks.\nTwo bowls of food on top metal plates.\nThere is not much space left for anything else.\nA person skateboarding on an outside basketball court.\nCarry on bag sitting on bench near metal railing.\nA sticker promoting vegetarianism has been placed on a stop sign.\nThis restaurant provides laptop computers in the booths for each of its patrons.\nA yellow bowl filled with soup next to another bowl of soup.\nA couple of VW buses 
parked in front of a small brick house.\na new kitchen cabinet with a sink being installed\nTwo cakes shaped like trains are on gold foil.\nThe room is crowded with many things including chairs, a bicycle, and a table with cups on it.\nA train going through a tunnel under a building\nAn elephant is walking through the mud behind a gate.\nSeveral bruised oranges and lemons mixed together.\nA public toilet with the lid up in a stall\nA framed picture and reed diffuser sit on top of a toilet in a bathroom.\nA man in a white sweater placing a turkey in an oven.\nAll aboard for a ride on the tourist train.\nA bathroom wall with three urinals on the walls and images of women peeking out behind trees on the wall.\nA tennis player is trying to hit the ball.\nThis is a downhill skier sticking his pole into the mountain.\nA girl is having fun playing a video game of tennis.\nWhich one would you choose to drive, the beauty or the beast?\nA single train at a train stop with many train tracks.\nA couple hold their cellphones while taking selfies.\nSeveral zebras walking together in the wild\nA street free sign sitting under street lights on a bridge.\nA memorial with various plaques and American Flags on it.\na toy animal is wearing a feathery hat\nthe white vase has drawings of women on it.\nTwo birds are sitting on some gray cement.\nA man on skateboard riding a skate ramp.\nSome cats laying on a bed and posing for a picture.\na train on a track near people\nA lady in red water clothes skiing on a lake.\nA beautiful woman sitting in a bed holding a tooth brush in her mouth.\nA wire basket of bananas and apples on a table.\nSome people sitting and painting a road divider\nA young man is playing frisbee in the park.\nA wooden bench under a tree in the field\nA girl takes her friends picture while wearing leis.\nThe zebra is in the field standing all alone.\nSeveral cross country skiers prepare to start down a course.\nA woman skiing down a steep hill as snow flies up in the 
background.\nA older TV on a shelf with videos on shelves on either side.\nAn eagle is standing on top of a pile of rocks.\nA woman playing with a dog while another person is skiing.\nThe white toilet is sitting in the corner of the bathroom.\nThree sheep standing together in a grassy field.\nA lady is standing by the white truck.\nA man with a bald head has a cell phone to his ear.\nA chicken and cat walk in a barnyard.\nA living room with a large couch and a coffee table.\na tall building with some clocks on it below a cloudy sky\nA bunch of broccoli that is near carrots\nBunches of bananas are shown for display at the market.\nA sandwich cut in half and a cup of coffee.\nA baseball player holding a catchers mitt on top of a field.\nA wooden double door refrigerator with one side opened up.\nA number of seagulls stand in the shallow water as the tide sweeps over the beach.\na man dressed nicely and sitting next to a female\nPeople holding signs on a one way street.\nA giraffe kissing a man with a shaved.\nTHIS IS A SIDEWALK SHOT OF A PLACE CALLED THE LION\na person sitting at a table eating food from a plate.\nThe slice of pizza has large chunks of tomatoes on it.\nThis is a spacious bathroom with an interesting tile pattern.\nDinnerware with fruit painted on them beside a matching vase.\nTwo girls are smiling and staring in their school uniform.\nA woman in a room with multiple cats laying and walking around.\nA vase of flowers on a white sheet.\nA person taking a piece of dessert from a plate.\nA man playing tennis going high for the ball.\nBoats in a river with trees alongside in a rural setting.\nA motorcycle rider gives a thumbs up to the camera.\nA truck is carrying a load of logs.\nthis is a pair of women sitting on bikes\nA plate with a hot dog, chips and a strawberry.\nWoman in a field playing with spectators watching\nA man and woman that are standing in the sand.\nA small cat is walking behind a bike.\nA man in black jacket riding skis on a snowy 
slope.\nTwo people standing near the ocean with sails in the sky.\na man with a hat skiing on the snow towards a building\nA man that is standing on a court with a racquet.\nA woman is trying to catch a frisbee.\nA stop sign on a residential area has caring under the stop.\na bench near a tree near a light pole\na girl in front of a stop sign\nA surfer in a wetsuit in the curl of a wave\nTwo rectangular boxes with chop sticks have food in them.\na young man on the beach holding a sall\nA girl is at a table with two pizzas.\na little boy standing beside a toilet in the bathroom\na man in a wet suit stands on top of a rocky hill\nTwo ponies are running through a grassy field.\nA man riding a skateboard down a cement ramp.\na close up of a cat on the ground looking in a mirror\nA statue of a man riding a horse on a tower of rocks.\na guy sitting on a balcony using his laptop\nA parking meter is next to white wires.\nA group of men playing a game of tennis on a dirt court.\nThere are many giraffes standing among each other\nPlayer walking away from home plate carrying bat during game.\nA woman strikes her tennis racket against a ball.\nGuests gather around and converse at a wine tasting.\nSeveral people mounted on horses riding down a trail.\na group of surfboards stuck in the sand near the ocean\nA table topped with ripe bananas sitting in piles.\nA young boy standing on a street holding a skateboard.\nA man wearing a black ski suit preparing to go down a snow covered hill.\na very large pizza with a fork and a knife\nMan in wetsuit surfing next to a small wave.\nA Chinese lady on a boat wearing a Chinaman hat\nA pink cat creature sewn to the side of a pink bag.\nA dog who is sitting on a couch.\nMany people standing in a field with a red flag and many kites.\nA large white cat sitting on a table in front of a TV.\na bathroom with a lot of toilet paper next to the toilet\nA few people are in outside in the snow, with their ski gear.\nTwo military officers cut a cake 
with two civilians.\nthis kite is being flown above a city\nA man on a horse in the middle of the street.\nA man holding a cigarette and talking on a cell phone.\nThe polar  bear is white and showing his teeth\nA black cat laying on a green pillow.\nA plate of pastries with fruit and a fork and knife.\nA man surfing waves in the ocean on his surf board.\nA group of black cows with horns standing in the middle of a street.\nA broken surfboard on a beach with trees in the background.\nA group of people sitting around a table eating.\nTwo people are passing a man playing a piano on the street.\nBlack and white photograph of a fence next to a fire hydrant.\nYoung child brushing teeth using blue and white toothbrush.\nVintage red truck parked on a parking lot alone.\nTwo hot dogs with chili, cheese and tomatoes.\na small dog is sleeping on a chiar\nA man on a surfboard rides a wave.\nA baseball player is winding up for the pitch.\nA row of many kites in the shape of cows fly along with other kites.\nA woman holding a plate of food and a glass.\nA red fire hydrant on sidewalk next to a wet street.\na star shaped kite flying high in the sky.\nFour pans of food on a stove in a restaurant.\nThe blue necktie shows a picture of a pocket watch.\nA stop sign on a one way street.\nTraffic signal on the side of a bridge outside.\nA team of two makes their way down the water on a primitive raft.\nTwo red double decker buses passing in opposite directions.\nA skier posing for picture while straddling a tree.\nFour people with a group of elephants on a hill.\ntwo cups of coffee next to a white plate of pastry and icecream\nA backpack with rollers is sitting unattended in the middle of this forested dirt road.\na fire hydrant sits off a city street\nA bowl of soup, a metal spoon, and an orange on a wood surface\nPeople on a slope snowboard and skiing next to trees.\nA toilet sitting in a bathroom that is being remodeled.\nA cardboard box contains some old vegetables and some 
trash.\nLine of fire trucks driving down a city street.\nA plate full of different types of food.\nA man is holding scissors to his own head.\nA big pretty commercial plane on the runway.\na person sitting on the ground wearing a suit and tie\nA little blonde boy wearing a tie and purple shirt\nA vase filled with pink flowers on top of a table.\nTwo trains stopped side by side in a railway station, both with platforms\nA bathroom with shower and plenty of toiletries.\nSlices of pizza on plates and drinking glasses.\nA parking meter reads COPE on one side and four dollar signs on the right.\nTwo trains on parallel tracks near a station\nA men's public washroom with a blue floor.\nFour people sitting around a computer station talking.\nA man next to a woman with his horse by a house.\nA sign on a metal pole on a street.\nA man sitting on a couch using a laptop\nPack of elephants in tall green grass as one has its trunk raised.\nA home made pizza with cheese is on a shelf.\nA keyboard, mouse, and wires on a desk.\nMan speaking on phone with large sideburns\nA bus is going down a rural highway road.\nAn Asian meal with noodles, vegetables and soup.\nA woman pushing a baby carriage by a building.\nA horse drawn carriage stopped near the water.\nA donut is laying on a large noodle looking mat.\nA piece of cake on a white plate next to whipped cream.\nPeople on the sidewalk near a no left turn sign on a post\nthis living room is done in colors of black and red\nA street sign prohibiting bicycles, skates, and skateboards.\nA dog lying on a cement porch in front of a brightly painted building with a motorbike next to it\nA beached sailboat in the sand with a chair next to it.\nFire hydrant in non-traditional paint, whitish yellow paint with black polka dots in front of old style firehouse with USA flag.\nThe reflection of two dogs being walked down the street\nA young man riding on top of a skateboard.\nA vase sitting on top of a wooden table in a living room.\nA woman is 
looking at pastries in the shop's window.\nA pizza with onions peppers and cheese and coke to drink\nAn elephant standing on the ground near a lake.\nAn adult on skis is standing near a group of children with skis on.\nA bathroom with red and white tiling and a toilet and floor drain.\nTwo horse next to each other walking down a road .\nA man reaches to catch a frisbee in a grass field.\na group of kids standing next to each other in a room\nA train is approaching alongside a body of water.\nPeople are walking on a beach alongside giant rock formations and flying a kite.\nA bright kitchen with blond wood table and chairs and side server.\nCross country Skiier trekking through heavily forested land with snow.\nA boy learning to skateboard in a park\nGourmet pizza cooked and sliced and on a plate\nA variety of furniture sits scattered in a storage facility.\nA woman wake boarding in a lake having fun.\nsome different items of food in a glass case\nTwo tennis players play tennis on the court.\nSeveral horses walking along the beach by the ocean.\nSome big baskets filled with tasty looking apples.\nA polar bear standing open mouthed on a glacier\nThere is a blue bike leaning up against the wall.\nA picture of a car waiting at a intersection.\nA zebra grazing and standing on the grass.\nA set of professional knives attached to a mounted magnet.\nA woman is sitting on a leather couch smiling at another woman.\nAn airplane flying over a big harvest moon\nGroup of giraffes standing behind a caged in area.\nVarious equine horses and zebras inside stalls under a tent.\na bent white sign with a black pole\nThe buses are parked on the side of the street\nBaked pizza with red tomatoes and green olives.\nA cat drinking water out of a water bowl.\nThere is a cat standing in a toilet.\nA very pretty horse in front of a big metal structure.\nA man and a baby with toothbrushes in their mouth.\nA hot dog or sausage in a bun with bowl containing condiment and bacon on the side.\nMan 
chopping a chicken on a butcher block with a bottle of wine in front.\nA bench that has some water drops on it.\nA street pole that has a street name sign, a one way street sign and a map sign on it.\nA girl is standing in a field and flying a kite.\nA surfer posing for a photo with a surfboard\nA close up of a zebra's back with its neighbor's mane in the background.\nA cat watching water go down a sink drain\nA man laying on a bench and a woman next to him touching his face.\nA person wearing blue jeans and black tennis shoes riding a skateboard.\nA very close up view of some very tasty looking food.\na person is reaching for a piece of pizza in a box\nA man in a suit and tie posing for a photo in a large building.\nA street sign where St. Stevens St crosses 17 Ave S.\na man in white holding a plate playing\nA man is holding his arms out on his surfboard in the middle of the sea.\nThe chef is putting ingredients on the pizza.\nYoung baseball players on a field with a pitch being thrown\nA modern style kitchen filled with may different items.\nA home office features full bookcase, a laptop and a red leather chair.\nA man happy about a truckload of bananas.\nA varied collection of glass bottle containers on three sleves\nthe girls are standing in a room with the window behind them\nA couple of women shaking hands on top of a tennis court.\nA person wearing a wetsuit with a surfboard under one arm.\nA baby boy sucking on a pacifier while wearing a diaper.\nA man posing for a brochure picture with Akieys translations on it.\nA woman is watching a kid and man playing Wii.\nA group of guys standing behind tables on a stage before a presentation.\nWhite bowl with assorted fruits being eaten by fork.\nWe are looking past a speaker at a monitor.\nA white stuffed teddy bear sitting on a couch.\nA wooden desk has an open lap top on it and a pair of scissors.\nA refrigerator and a stove in a kitchen.\nA woman is smoking a cigarette and on her phone.\nA bowl full of food that 
is sitting on the table.\nA person standing on a sandy beach flying a kite.\nStreet sign showing the name of the street in English and then in \nAsian characters below\ntwo bikes sitting on a walkway next to some trees\nSeveral men standing beside of each other in a line.\nA giraffe leans its head over the fence of an enclosure.\nA personal size pizza with tomatoes, spinach, and garlic.\nA school girl in a uniform in front of a window.\na man is parasailing out at the beach\nA female showing an open door to a refrigerator.\nA bathroom with a sink, tub, shower head and mirror.\nA zebra with his head down eating grass.\nA man stands on a street corner next to a stop sign.\nA retail sign is hanging above a stop sign alert for added effect.\nA mounted police officer riding down a city street past parked cars.\nA passenger train moving along a railway in the country side.\nTw people ride horses with trees in the background.\nA white car at intersection of two roads.\na young man staring into the camera sticking his nose in between the handles of a pair of old metal scissors\nA giraffe standing next to another giraffe on a lush green field.\nA room full  of electronics and musical equipment\nA man riding a wave on top of a white surfboard.\nA clock tower with sculptures and a bell.\nTwo people in a park throwing frisbees at the camera.\nThe two zebras are standing together on the land.\nA made up bed in a well decorated room with art pieces on the wall.\nWomen hiding from the sun on a city bench\nPeople sitting at the umbrella covered tables next to the river\nA train going down the rails passes under a pedestrian walkway.\nA cat standing by some bushes outside in the woods.\nA bathroom with tiled floors and double sinks\nThree stop signs in the middle of the street.\nTwo men are snowboarding and skiing in the snow.\nMan with camera holding kite in park setting.\nA man is sitting behind a laptop computer.\nTwo tennis players are walking by the tennis net.\nThe man with 
the red and black bookbag is walking toward the building.\nAn adult man helping a youth on a skateboard.\nthis is a sign at a gas station\nA piece of thin-crust pizza sits on a plate.\nZebras inside of a fenced in field eating grass\nA young girl scrunches up her face as she holds a video game remote.\nA cactus sits in a pretty green vase.\nA city bus stopped at a crosswalk on a street.\nA girl sitting at a table drinking from a bottle.\nA person with there feet propped up in a chair in front of computer equipment.\nA street side shop next to an intersection.\nA train sitting at a train station platform.\na big plate of food that is on a table\nA man swinging a tennis racquet on a tennis court.\nAn old lady smiling in a pink kitchen.\nBray pickup truck parked in driveway of residential home.\nA white kitchen with stainless steel appliances and granite counter top.\nThis is a black and white family picture taken in the mid 1900s, of Grandpap, and his progeny\nA picture of something and it appears like food.\nA flock of ducks floating on top of water.\nA woman with an umbrella stands with her belongings on the ground.\nA white, yellow and blue airplane on a runway.\nA resort with palm trees, bridge, people and bushes.\na surfer on a surf board in the midst of a wave\na person standing in a living room playing nintendo wii\nSeveral people are riding in a horse race.\nA red city bus driving through city streets.\nBird walking in the water near the shore edge.\nA white and black bedroom with a white bed.\nA plate of food, dishes with food, and a pot of flowers sitting on top of a table.\nTwo pictures of a burger, onion rings and a beer.\nSeveral pans containing a few slices of pizza are displayed on a table.\nA man with his arms crossed in a Santa hat and wearing a tie.\nThe young person is carrying their surfboard into the water.\nA refrigerator in a kitchen next to a dining room.\nan image of a broken fighter plane on the runway\nThe employee is carrying gas canisters 
on a bicycle.\nA skateboarder comes off his board on a ramp.\nSome kites flying over some buildings in the snow.\nthere is a very tall tower with a clock on it\nA new roll of toilet paper is on the back of a toilet.\na crowd of people are looking off of a balcony\na child on top of a buket on the front lawn\nA man riding on the back of a green motorcycle.\nBoats docked in the water with a cloudy sky above.\na close up of a number of different remote controls\nA woman sitting in front of a plate of food.\nThe person is reading music and playing a keyboard.\nCommercial passenger jet at gate on airport tarmac.\nA man is surfing on a wave while another floats with is board.\nTwo people are drinking red wine from wine glasses.\nA child standing completely upright in front of a refrigerator.\nA woman is on a red surfboard in the ocean.\nA male and a female sitting together, the female is texting on her phone.\nA person guiding a child down a hill on skis.\nA white airplane is on a crowded airport runway.\nA person is standing on a snowboard near a bridge.\nA giraffe is eating grass in an open field.\nA number of peach trees on a sunny day.\na man talking to a group of kids as a cow stands in a cage\na stuffed animal sits in front of a book\nSeveral people are playing at a beach with a boat in the distance.\nThe man is sitting on a low ledge.\na dog sitting in the passenger seat of a big truck\na person taking a photo in a mirror\nA couch and furniture in a small room.\nThree men who are standing around a campsite.\nA sign mounted to a pole that reads \" No Stops \".\nA cat curled up on a bed next to a stuffed animals.\nA jet is sitting on the tarmac with blue sky's around\nBroccoli dish in a bowl with a fork inside of it.\nthere is a card board bus with a cat sitting in it\nA man holding a large umbrella with some girls and a woman underneath.\nThree people are posing next to a raw pizza.\nA single person skiis down part of a mountain\nA person holding a half eaten hot 
dog with toppings.\nA small child is in the kitchen with an adult and dog.\nThis white passenger bus is waiting at a stop\nA train on railroad tracks beside a platform.\nTwo dogs sleeping on a semi made bed\nA woman standing next to the ocean flying a colorful kite.\nAn ornate clock is surrounded by artwork and white arch.\nA hitter watching the baseball approach during an at bat\nA young child holding a skate board and pointing into the distance.\na small plane flying through a blue sky\nA dog sitting on a bed in a sweater with a indifferent look.\nThree elephants with seats and umbrellas stopped by a body of water\nA street crossing with a street sign for Mulholland and a no-U-turn sign.\nChicken and assorted vegetables are frying in a pan.\nA photo of people looking off into the distance holding an umbrella.\nA skateboarder launching his skateboard into the air as he rides it.\nThe little girl ordered a piece of cake at the restaurant.\nA dark road with power lines and street lights.\na transportation bus parked in a parking lot\nA boy cutting a pizza on a wooden cutting board.\nA small bird sitting in the sand at the beach.\nA zebra, walking on dry grass, is seen from the rear.\nthere is a piece of cake on a plate on the table\nClock tower overlooking a red shuttle bus outside.\nA young foal nuzzling its mother in the nose.\nTwo boys perform skateboard stunts on the street\nThe cat is sitting on top of the remote.\nVarious home appliances are lined up on the sidewalk.\nA child smiles in front of a container of carrots with a stuffed rabbit.\nA casserole sitting on a counter with apples and a measuring cup behind it.\nA strawberry pound cake with a slice taken out.\na red and white airplane ascending in the sky\nA plate of food including rice, broccoli, protein and a sauce in a bowl.\nTwo girls eating food at a Chinese restaurant\nA formally dressed man with a martini poses with two women in evening gowns.\nA big pair of scissors is on a wooden box.\nA tall vase 
full of orange flowers sitting on a table.\nGroup of people crossing a busy city street in the rain.\nA male college student playing frisbee in the park.\nA big commercial plane flying low by a bridge.\nCars stopped on a road blocked by a herd of sheep.\na woman is holding a toothbrush up to a masked face\nthere are many birds that are standing by the water\nA large cut pizza on a dining table.\nA person putting sliced carrots into a dish.\nA male emo hipster wearing a furry jacket in front of a laptop computer.\nA pile of carrots, radishes, green beans, and broccoli on a cutting board.\ntwo bulls in a field between bushes with a sky background\nA couple of birds are walking around the grass\nA cat relaxing on a plaid couch on a person's clothes.\nA work out ball sits on a chair near a cluttered desk.\nA cat walking into a kitchen with a phone and fridge visible.\nA man near the ocean catching a frisbee on the beach.\nA cow with a tag is staring at a viewer.\nThis is an image of a man with an umbrella.\nA person on a cell phone by a big stone wall.\nA man and a elephant that are standing in the dirt.\nA crumbling bathroom has a sink and a medicine cabinet.\nA female pedestrian stands in the center of a crosswalk as a double-decker bus quickly approaches.\nThe young batter wearing a helmet prepares to swing.\nThe sheep are all standing around together in front of a monument.\nA crow sits on the roof of a blue car.\na kitty standing in an empty food dish eating from aniother\nPanorama of a field with cows next to a dirt road\nA clock repairman working at his table displays his wares on the wall.\nA painting of a house showing the bathroom, kitchen and bedroom.\na street sign on a pole with a sky background\nA photo taken through a window at houses on a hill.\nThree people near a truck in a sunflower field.\na person riding a surf board in a wave tunnel\nA panda is eating a frozen treat with fruit in it.\nA cat sitting on a window sill outside.\nA cake decorated with 
things from a barbershop\nA group of elephants walking down a river with people riding them.\nA man is kissing a woman on the cheek.\nVintage tour guides stand next to an early bus.\nA man holding up two ripe bananas in front of a house.\nA white plate topped with carrots, potatoes and dumplings.\nShoes rest on a carpet next to a drawer with a picture on top.\nA plate with waffles, butter and a fork and knife.\nMan crouched over a suitcase looking at the items inside\nA stop sign on a corner of a road\nA woman on a surfboard in the ocean.\nAn old goat with big horns resting in the shade\nA child in a room with a remote in hand.\nA new home banner sits beside a small curvy road.\nAn orange has a frown drawn on it with a knife in it.\na bathroom with a bath tub a sink and a mirror\nMany kites in a field launched and launching\nA clock that is on the side of a building.\nA woman sitting on the ground next to two dogs.\nFood sits on top of a refrigerator covered in magnets.\na person standing in a living room playing nintendo wii\nA gray cat is wearing a red knitted rabbit hat cozy.\nA clock in a busy city at night.\nTwo patrolman on horseback standing in front of an establishment.\nA large crowd watches a professional tennis match.\nA large black and white cow standing in a desert field.\nAn orange cat laying on top of a wooden bench.\nA close up of a motorcycle parked on the sidewalk next to a door.\nA jumbo jet Fed Ex plane on a runway of an airport.\nA brown teddy bear in a forest with trees and shrubbery.\na couple of birds stand on a grass hill\nA decal of a skateboarding man is applied to the wall.\nA bathroom with a pedestal fan in ti.\nan inflatable blue car on the beach with a man walking beside it\nHungry man enjoys lunch at a local restaurant.\nA small dog sitting on top of a computer  in a bag\nAbstract picture featuring girl on tennis court with racket.\nA couple dressed to be married are pretending to talk on cell phones.\nFriends playing and taking 
pictures with a camera phone.\nA girl is petting a horse out in a field.\nTraffic on a city street with busses, trucks and cars.\nA giant giraffe made of building bricks, outside of a building.\nA train sits on the rails beside the station.\nA toilet has several toilet paper roll dispensers.\nA young boy using a lap top by a table\nGuy in a helmet on a skateboard in red.\nA young man on a skateboard doing tricks on the cement\nA couple of people near a truck on the road.\nMany bunches of bananas sit atop this grocery store display case.\nA man is skateboarding while at the park.\nA professional baseball player running around a base on the field\nA flat screen and a keyboard and mouse on the desk\nA city garbage truck with three men in the front.\nA scene of something that is quite attractive.\nA man in a shop holding a picture of two men.\nA woman tennis player is waving at the fans while she holds her tennis racket.\nA road shot has a radio antennae and a small section of windshield, a brown hillside, the vanishing road beside it, bikers, close to the antennae, and far away, and off in the distance, signs, a car,  and a big blue sky.\nA heard of cows stands in front of a man with a tractor.\nA baseball team talks with coaches on the outside of the field.\nA crowd is watching a baseball game being played.\nTelevision and computers on with no one utilizing them.\na baseball player throws a pitch to a batter\nA child holds the line of a kite flying in the wind.\nA man standing over many doughnuts on display.\nA group of young men standing next to each other on a ski slope.\nA penguin is running through a pasture as sheep graze.\nA small portable set of burners with a tea kettle are on the counter top of a neat, clean efficiency kitchen.\nA reflection of a dog in a vehicle's side view mirror.\nA car driving by a herd of sheep.\nA tennis racket is laying on the floor of a tiled room.\nA fighter is jet flying through the clouds\nA miniature blue bow of fruits next to a 
penny.\nThere are two pieces of cake on a plate and a glass of pumpkin juice.\nA boat that is inside of the water.\nA Thomas the train engine model cake with writing on the platform.\nA FedEx plane moving on the snowy runway.\nThere is a white cake and some small cookies\nview of a bathroom with white toilet and white sink\nA workspace inside an office with snowy trees outside the window.\nThe next hitter in the baseball game saunters to the plate.\nA little boy that is holding a bat.\nA large mirror reflecting a bus driving down a street.\nSilver and green train sitting at a train station.\nA clock that is sitting on top of a metal pole.\nA horse jumping over a wooden jump at a horse show.\na person racing on a motorcycle on a race track.\nA smiling young buxom woman is displaying a sandwich and a glass of beer.\nA woman standing in a room with a remote.\nTwo giraffes that are standing by each other in a field.\na baseball player swings his bat at a ball\nA person on skis riding down a race course on a hill.\nA large clock fixed to a building as vehicles pass by.\nTwo giraffes rubbing their heads and necks together.\nA person standing in the snow with their hand up to their face.\nA small boy skiing down a snowy hill\nA passenger bus that has two levels driving down a street.\nA view of a city street through the windshield of a vehicle.\nA man working on a propeller driven airplane.\nthere is a dog laying on a couch with many blankets on it\nA living room filled with furniture and a TV.\nA young child swinging a baseball bat at a baseball.\nA man and his boys play Wii Fit in their home\nA jockey is on top of his horse number 6.\nA row boat is tied up to a dock.\nA man on his motorcycle with a teddy bear attached.\ntwo large elephants walk on the green grass\nA woman is cooking over an open flame in the cabin\nA woman standing  near a kitchen counter talking to someone\nTennis player getting ready to back hand the ball over the net.\na brown and white dog is riding 
a skateboard\nA young boy is riding his skateboard down a hill.\nA dog holding a yellow frisbee in it's mouth.\nA plate of fruit near some other bottles of liquid\nAn elegant white vase of colorful flowers rests on the windowsill.\nA bathroom with a small sink vanity and a toilet.\nA woman in heels pulling a suitcase behind her\nTwo horses pulling some carts in the street\nA flock of birds sitting on top of a set of power lines.\nA vintage photo shows students sitting at their desks.\nPeople sitting on the beach and sitting on beach chairs.\nA man is at bat in either a baseball or softball game\nan old laptop and a dog rest on a bed\na blonde girl is wearing a clip on tie\nA cat looking up between two plastic bottles\nTwo cat lying on a floor playing with each other\nA man performing a jump on a skateboard.\nWhite show horses and handlers performing during public event.\nA girl eagerly bites into a hot dog bun\nA man in an orange shirt pushing a stroller.\nsomeone jumping up to get a frisbee out of a tree\nA train is passing through a residential area with houses, trees, cars and pedestrians.\na man sitting on a rock while he watches elephants in the water\nA white jet airliner on runway with mountains in background.\nA red tray that has some food and an orange drink on it.\nA tour bus unloading at a rest stop.\nTwo shirtless men playing Frisbee in a grassy area\nA woman leading a brown horse down a sandy beach.\nA man surfing a small wave in the ocean.\nAn ad for Costa Rica shows a beach scene with surfers.\nA toddler is in the bathroom holding his ear with one hand and his other hand is closed together.\nA red and white tow truck tows a white car down the street.\nA passenger train is passing a cargo ship.\nTwo gray elephants standing next to each other.\nPeople are riding horses through a parking lot.\nA close up of a dog wearing a Christmas themed hat.\nMan holding paddle in air on surfboard with patch in corner\nA busy looking street area in an asian 
country.\nA tall giraffe standing next to a tree.\nPeople sitting in a chair lift in a purely white landscape\nA man in black coat standing under umbrella next to a building.\nA young man sitting on the beach with a surf board.\nA passenger train parked at a train depot.\nA little girl sitting at a table with a piece of cake.\nThree people are riding down a street while buses are in the background.\nA person standing on skis with a backpack in front of them.\nTwo young boys are seated with their legs crossed.\nMen are lying on a couch with a computer on the table.\nA kite is being flown by a man in the distance.\nA group of skiers posing for a photo.\nTwo hot dogs smothered in salsa on hot dog buns.\nA hotdog and fries sit on a table.\nMany elephants are walking near a muddy watering hole.\nA boater smiles as he paddles his canoe.\nA rusted up bard sinking into a body of water.\nA bathroom with a sink, toilet and picture on the wall.\nA pizza is on a plate of tin foil.\nDinnerplate with me vegetables and other condiments.\nA young man is sitting in a chair and has mismatched outfit and a name badge.\nA man on a skateboard performing a trick.\nA brown and white dog and person standing on a wooden floor.\npeople on the beach playing with a brown cow\nA desk area with a window view with mugs, tablets, and books.\nA city bridge with a clock on the top of it\nTwo zebras grazing on grass in a field.\na woman with a blue umbrella standing by some stairs\nA person on a surfboard is riding a big wave.\nA large clock hanging off the side of a building.\nThe cat is lying on top of a pair of shoes.\nShrimp, broccoli and carrots are in white dishes.\nCanoes and motor boats sit along the water's edge.\nA persons legs with a dark colored cat rubbing against their legs and shoes.\nA RETRO FOOD CHOPPER IN A CORNER ON THE COUNTER\nA picture of a tennis player about to hit a ball.\nA yellow and green fire hydrant on the side of a street with peeling paint.\nSeveral elephants are 
standing in a  desolate field.\nA picture of some food on a plate.\nTwo skate boarders riding down a paved path.\nA person on a court with a tennis racket.\nA group of people in a park with food.\na person jumping a skateboard into the air\nA view out the window of an airport terminal\nA polar bear is looking over the grass at something off camera.\nA woman serves a tennis ball during a match.\nmany people in a kitchen area preparing a pizza\nTwo identical airplanes are flying side by side with people doing tricks on top of them.\nA black and white dog sitting in the grass next to a frisbee.\nTHREE MEN SITTING ON A HUGE GREEN BENCH IN FRONT OF A BIG YELLOW BUILDING.\na young boy holding a tennis racket\nA clock tower with ornate designs above a bridge.\na big plane sits parked as a bunch of people watch\nA baseball player catching a baseball in a catchers mitt.\nA man sleeping in clothing on a bed.\nthis is a computer and books on a desk\na man is standing surrounded by a lot of luggage\nA man jumping up on a blue tennis court with a black tennis racket in his hand.\nA dog laying under a brown computer desk.\nAntique black truck with a barrel in the bed.\nA bunch of stuffed bears and gift boxes in a suitcase.\nA horse near another horse in a building.\nView of a smartphone sitting on a computer keyboard.\nA blue train traveling down tracks next to a building.\nA man with a kite on a hill\nA couple of bears on a shore near some water.\nA wedding cake design with roses and wine glasses.\nA group of people standing with some motor bikes.\nA baseball player dropping his bat and beginning to run\nA skateboarder does a trick in a crowded skatepark covered in graffiti.\nTwo people in the snow on skis taking pictures.\na line of kites that look like cows next to the road\nAn arrangement of items from a woman's purse including wallet, cell phone, MP3 player, gloves, hairbrush, eyeglasses case and day planner\nA man holds the hand of a child as they look at a row of cows.\na 
bathroom with fancy sink in the corner\nthe toilet in this bathroom is in disrepair\nThe city bus is parked on the side of the road.\nA traffic light with a bike signal on a pole.\nA dog and a sheep are separated by a wire fence\nThe entire pizza is in a box atop the dishwasher\na soldier is receiving an award from a man in a suit\nA group of giraffes stand next to a building and tree.\na white green and black sign and a bicycle without wheels\nA teddy bear with a book is placed in a wooden chair\nA bobble head baseball figurine on a desk.\na meal on a table which includes pizza in a box and a bottle of beer along with a beer mug\na small group of zebra and giraffe in a savanah\nmany trains on tracks near a building\nA man standing next to a red motorcycle.\nA man smiles while holding his cell phone.\na close up of a school bus parked in a lot\nA person holding a rope hovers over the ocean.\nA boy and a girl under an umbrella.\nA man standing on a lush green field holding a kite.\nA zebra standing in dry grass has dark and light stripes\nComputer desk with monitors and large monitor displays.\nPeople are gathered around at an outdoor table.\nA loving couple who has fallen to sleep together on a couch.\nthis is a  pizza cut into slices\na man that is walking with something in his hand\nA fire hydrant with a painting of a face on it.\na woman in red is riding a horse\nA young boy standing ready to hit at the plate in a baseball game.\nA group of people seated at a table in a restaurant\nA man seated on a park bench with his head down\nA very clean, modern living area has a very comfortable couch-bed and a wide screen TV.\nA player is in motion as he reaches back to throw the ball.\nTwo elephants touching each others trunks beside each other.\nA man in racing gear and number under a banner.\nTwo men sitting on a yellow boat in the water.\nseveral elephant type large yard ornaments setting outside.\nan image of a plane that is taking off in the center\nA man and woman 
standing in front of some pizza.\na woman in a bathing suit standing near water taking a picture\nA picture of a giraffe walking around its enclosure.\nshowing lemon, red pepper, zucchini, ginger, and yellow squash\nA toothbrush holder with tooth brushes inside of it.\nA red double deck bus traveling along a city street.\nThere are baby birds in a birds nest\nTwo fancy dressed people ride on horses down the street.\nA large black bear walking across a lush green field.\nA woman holding a teddy bear in a costume while wearing a really tight shirt.\nA person snowboarding down a snow covered hill.\nApples, plums, peaches and pears sitting on a metal counter top.\nA woman is reading a book as she sits on bench with a sign in front of it.\nWoman on a Kitchen counter on the phone with paint.\nmany people of all ages skiing on the snow clad mountains.\nA pizza on a large knotty pine kitchen table\nA group of policemen on motorcycles in a city.\nA crowd of people standing around each other in front of a shack.\nA large white polar bear standing on a icy pool.\nA suitcase sitting in a living room of a home.\nBoys playing with a colorful kite in a park\nA street sign on a pole next to a building.\nA cat that is laying down near a shoe.\nA woman with long red hair packing herself into a suitcase.\nTwo people talking and a young lady that is reading a book on a bench.\nMany sheep are out in the green grass.\na person swinging a baseball bat at a ball\nMan on grassy field getting ready to catch yellow frisbee.\nA piece of asparagus quiche and carrot salad on a white plate\nMulti colored cat laying on and among shoes and boots.\nA shaggy mother pony and her foal in a field.\nA person is getting a slice of pizza from a platter.\nA close-up photo of a young cauliflower plant.\nA comfort bus is driving on the street.\na woman standing by a window while talking on the phone\nSkateboarder jumps high off of a ramp into the air.\nA parent and child playing with a plastic basebat and 
ball on the beach\nA grey bird perched on a tree branch\nA MAN IS ON THE SNOW BOARD IN THE SNOW\nAdults gathered in living room playing video games.\nTwo woman at a table full of wine\nA \"pet crossing\" sign with a peace sign on it is on a pole by a tree near the highway.\nA group of zebra standing in the tall grass.\nA man in an apron standing at a table full of oranges.\nA bed near an open window with a small fan in the windowsill.\nA newly married couple kissing next to a food van.\nthere is a small bowl with a lot of food in it\nA boy running and flying a kite in a field.\nA cat lying on a pink blanket sleeping.\nA large mirror with black framing on the wall of a bathroom above the sink.\nA bunch of people walking and doing things down the street\na plate that has some cut up vegetables on it\nA bunch of doughnuts that are on a tray.\nA man is holding a bunch of banana's\nAn elephant scratching his ear in the sun.\nTwo horses pulling an old fashioned style carriage down an urban street\nA baseball player holding  bat while standing on a field.\ntwo people standing next to an elephant in fenced enclosure.\nA red and white napkin covered with fries, a burger and coleslaw\nSeveral objects displayed on a kitchen table including bread, oranges and plating.\nthere is a blue left turn sign on this street pole\nA man riding on a bicycle down a street while holding a surfboard under one arm.\nA woman with an intense look rared back with her tennis racquet.\nYoung soccer players on field during match play.\nA plate of fruit next to a cup of coffee.\nA baseball player holding a bat standing next to home plate.\nSome young ladies in swimsuits sitting on a dock over water.\nA boy riding on the back of a motorcycle near a truck with pineapples in it.\nA stop sign at an intersection that has stickers and leaves on it.\nA beige and white bathroom with white toilet and honey colored hardwood vanity\nA little girl wearing a hat has one foot on a skateboard.\nTwo desktop 
computers sitting on top of a desk.\nThe desktop computer has three different working screens.\na few people that are standing next some motorcycles\nA bird on a table eating from plates of food\nA desk with a midi controller to make music with.\nA boy and a girl on a boat while another boy is standing on land with one foot on the boat.\nA slice of rich and decadent cake covered with frosting sits on a plate.\nA tugboat sits beside a ferry on placid water with a mountain in the distance.\nA man is holding two sandwiches one  in each hand.\nThe cattle are standing in the dirt path.\nDisplay case full of several kinds of donuts in a shop.\nA momma zebra and her baby running through a field.\nA man in a black jacket taking a picture of a sink area.\nA cat sitting on the edge of a sink in a bathroom.\nTeddy bears seemingly hug one another against a dark background\nA small bathroom with a sink and vanity.\na big animal that is in some grass\nA big sandy beach with some kites flying in the air.\nA man laying in bed with a book over his face.\nA baby plays with a teddybear while sitting on a green blanket outside.\nA small cat has it's front paws inside a toilet.\nA bird sitting on a house eave in a backyard.\nSome people sitting at a table with open luggage and papers\nA man in black jacket flying a kite on a beach.\nA urinal in a public restroom near a wooden table.\nA man in a very fashionable cleanly decorated bedroom.\nA container with a meat sandwich and fork is sitting on the grass.\nThis seems to be a bear laying on the snow.\nA bedroom with a bed, desk and a television.\nFar shot of a clock on the side of a building.\nA rural street at an intersection with cars in the distance parked on the curb.\nA black and white colored cat on top of a wooden bench.\na woman riding a surfboard on  a wave in the ocean.\na bath room with a toilet and a shower\nA gray bathroom is lit up to show to sinks.\nFour planes fly through the air in a black and white photograph.\nA 
baseball player is in motion with his bat.\nA slice of pizza is on a plate on a table.\nthere is a large bowl of food on top of a table\na train on a track near a platform\nA pizza with pasta on top and olives and pepperonis.\nA grey teddy bear with a red bow and a card.\nPeople in a square near a small clock tower.\nA striped cat laying on a wooden bench\na silver oven some pots pans a knife and cabinets\nA young boy skis down a slope with adults standing in the background.\nA young boy eating mushrooms near a pizza.\nA bear looks ahead from a field of vegetation.\nA housewife holds a platter of food in the kitchen.\nTwo giraffes in a zoo enclosure stand by a wall.\na garbage truck in the city late at night\nA group of elephants gathered near some poles.\nA man holds a skateboard in his hand.\nThe official box for the Wii game showing a hand holding a controller.\nA man walking across a field holding a baseball bat.\nA line of people crowd the sidewalk beside a business.\nTwo men watch as yellow aircraft flies over a lake.\nA plate of food that includes meat, broccoli and potatoes.\na vintage photo of a cake walking on a toilet\nA apple that is taped to the back of a laptop.\nAn old lady is smiling happily sitting on a motorcycle.\na brown teddy bear is sitting on a green bed\nA man with sunglasses talking on a cellphone.\nA WOMAN IS EATING A SANDWICH OUT ON THE GRASS\nA kitchen with hard wood floors and wooden cabinets\nGiraffes standing together and other animals in the background.\nA fat gray tiger cat laying on top of  bed up against a pillow.\nA bunch of luggage bags with tags on the floor\nAn empty chair is set in front of two computers at a work desk.\nA Ferris wheel is visible behind the building's clock tower.\nA group of people are putting their sweet treats all towards each other.\na double decked bus parked by a stadium\nA group of people comparing cell phones together\nThe two tables are each covered with food and plates.\nYoung girl posed with a 
bunch of cell phones and a \"New Years\" party hat.\nThe man in the hat walks along using his cell phone.\nA man that is leaning over a tray of doughnuts.\nTwo Zebras are eating grass together in the wild.\nA group of cars driving past a mcdonalds near a bridge.\nA woman performing a shot in a tennis match\nA Canadian airplane with a big red maple leaf is flying high.\nPlane next to a boarding ramp under a cloudy sky.\nSandwich made of two doughnuts sitting on top of a plastic plate.\na girl balancing on a surf board while a man watches behind.\nA cross country skier stretches on an open field of snow.\nA man walking across a field holding a wand near a dog.\nA beach setting with tons of people around the shore.\nA train station stands majestically and functionally while passengers wait for their train.\nA table with a keyboard and some other items.\nA little baby zebra running around in a fenced in area.\nSome motorcycles are being displayed in a window.\na dried up stream stands two zebras and there are other animals in the background with trees.\nThree people posing with sundaes in glass bowls.\nA black dog laying on a tile floor next to wall.\nA folder sitting on top of a wooden bench.\nAn empty double-decker bus rests against the curb, alongside some buildings.\nA row of outdoor food tables look very primitive.\nA plate with two doughnuts, strawberries, and coffee.\nThis is someones bathroom sink in their home.\nA tall clock with a small tree beside it.\nTwo women who love and care for their horses.\nA man skateboarding in an old abandoned pool.\na person on a train station platform\nA long train sitting on a railroad track.\nThe baby boat is drinking milk from it's mother.\nA meal at a restaurant of a salad, a toasted sandwich and a pickle\nA group of men playing instrument next to a wooden wall.\nA set of coffee mugs sitting together on a small wooden table near a bedside.\na big airplane that is parked on some concrete\nA small dog rests in a large dog bed, 
snuggled on a blanket.\nSeveral zebras in an open area during a not so sunny day.\nWomen sitting at the table eating meals at the restaurant\nA man bends over an open toilet and looks in it.\nPeople at a park, taking walks, sitting on the grass and throwing Frisbees.\nA train engine carrying carts down a track past some buildings.\nA woman hovering over food on a wooden table.\nA cat laying on a TV in the middle of the room.\nA man rides a bicycle carrying snow skis.\nA mockup of an African elephant stands in a museum\nthis is a bird sitting in some grass\nA woman chops vegetables in a kitchen.\nA classroom with a rug on the floor that looks like a computer keyboard\nCupcakes with frosting sit on a foil covered tray.\nAn old photo of a man with a pipe and a beer.\nA woman with bleeding nose and blood stained shirt looks into a cellphone.\nA nice shiny suitcase is positioned alongside sneakers for a quick getaway.\nA monk is looking at a mobile phone among ancient architecture.\nMany sheep grazing next to a busy road.\nThere is a person sitting on a motorcycle.\nA small girl eating a plate of food with a fork\nA lush green field with colorful kites flying above it.\nWhite dog sticking his nose out from under red and white striped bed ruffle.\nA giraffe resting it's head on a fence at a zoo.\nA busy street full of cars and buses with buildings in the background.\nA plane flying with a smaller plane above it.\nAn elephant is walking towards a tree in a park.\nA large elephant walking across a field of grass.\nA small black and brown dog standing next to a cow.\na red bus is parking in the field.\nA bus makes its way through the city street.\nBoats in a river on a foggy day.\nThis unusual animal figurine sits in front of a clock.\nA group of men are in discussion around bananas.\nTwo zebras who are in a field together.\nA desktop computer has two keyboards and two mice.\nSeveral people walking around near a white van.\nAn office with file cabinets, a keyboard and 
chairs.\nAn elephant guided by a man in a blue shirt and followed by another elephant.\nA pan sitting on top of a stove top under a wooden spoon.\nTwo women standing on a purple tennis court.\nMunching in the grass is a daily habit.\nA yellow school bus parked in a parking lot full of snow.\na man holding a cell phone towards the camera\nBalls of garbage sitting on top of a toilet.\nA group of men cutting a giant sheet cake.\nA tram is traveling down a green track\nA pizza that is topped with an assortment of items and sliced.\nA man wearing a red baseball cap walks along a grass field with a backpack\nA big bear is standing next to the bars.\nSeveral people fly kites above a paved outdoor area.\nthere are many beer signs on the side of buildings\nThe clown is driving down the grassy area.\nA bus that is parked in a lot next to another bus.\nA cat sitting on haunches next to a wooden door.\na man doing a jump with a skateboard in the road\nA skier leans as she makes a turn down the hill.\npeople sitting at tables next to a building in the background\nA stop sign is posted near a road with a bridge in the background.\nMen on horseback going through a crowd of people.\nA group of people on line at an intersection.\nA couple both holding a knife and cutting their wedding cake together.\nA meatball sub served with french fries on the side.\na desk with a keyboard and a monitor on it\nA beautiful woman playing a game of tennis.\nA person cutting out pictures of clothing items.\nA giraffe out amid the trees and grass.\na double decker bus going down a road beside some stands\nTwo boys playing frisbee on a soccer field.\nThere is a small yellow bird standing on a fence\nA living room filled with furniture beneath a window.\nA couple of me playing tennis on a plane flying in the air.\nThree men are standing on a baseball field with a crowd watching.\nA family are in their skies posing for the camera.\nA small red belt clip cell phone case.\nA dresser with a clock and a 
potted plant on it.\nA person wearing brightly colored clothes is riding a motorcycle.\nA dimly light living room with wooden floors and large windows.\nA heard of Zebras moving with another animal group across a field.\nA girl is showing off a stadium hot dog\nSomeone is using a small grill to melt his sandwich.\nA bright green frog on a bright green plant.\na close up picture of a large variety of fruit\nA plate of food on a wooden serving tray\nA snowboarder with ski poles in midair facing the ground.\nThe person is putting toppings on his food.\na man in an orange and white striped shirt with some scissors and machines\nTwo women share some chocolate cake and coffee.\ntwo people eating food off of a paper plate\nA skateboarder is featured at different positions on a ramp.\nPeople standing next to a bus with a cat face on the front.\nA group of walkers are seen while passengers ride in a train.\nA bright bedroom with a red bedspread and someone laying on the bed.\nA polar bear standing high on a rock.\nA man and a woman smiling while holding an electric keyboard.\nA river that has many boats floating in it.\nThe two bears are wondering about the point of the camera.\nA woman showing a teddy bear to another woman and child.\nA table that has been served soup and fruit.\nA view of a kitchen from the doorway.\nA car is seen in the reflection of the microwave.\nA catcher reaching out to catch a ball while the batter is swinging.\nCars line up to coin meters on at a busy sidewalk\nA guy sitting at a table in front of a birthday cake with candles in the cake\nA highway sign on a rocky slope along side the road.\nthis is boats sitting in water near grass\nA dog chasing a group of birds outside.\nA man and a woman standing beside each other.\nA cluttered and dirty kitchen counter top, with food spread around.\nA tomato and an apple sitting on a table.\nThere is a sink and toilet in the bathroom.\nA fire hydrant with writing on it on a street corner.\nA man in a suit 
holding a red ukulele\nA young child smiles as he holds a tennis racket.\nA herd of cattle walking along a sandy beach.\na brown teddy bear and some wooden block toys\na couple of dogs running around a field\nA guy riding a bike and carrying a surfboard turns to look behind.\nUnattended luggage in a roped section of an airport lobby.\na man holding a sandwich and another on a plate\nA person holding up a cell phone taking a picture.\nWhite flowers in a tall brown contrasting vase.\nA white horse looking out over a fence.\nThe sky is dim as the sun changes positions behind a building.\nA group of men riding on a horse drawn carriage.\nA wood bench under a tree in front of some bushes.\nA bus broke down on the side of the highway and all the passengers had to file out onto the side of the road.\nA cellphone, piece of fruit and cup are on the table.\nA living room area with wood accents on the wall and floor.\nsomeone that is holding a wii remote in their hand\nA sign for the Atlantic City Convention Center.\nA man is standing behind many different fruits.\na close up of a sandwich on a plate\nA tall white clock tower with a black clock on each of it's sides.\nPlate of food, including hot dog, ribs, beans, and corn.\nThe surfer in the wetsuit is coming through a very big wave.\nA group of people fly guiding on the sand.\na person riding on an elephants head walking on a dirt road\nthe woman is giving the solider something to eat\nA bench is sitting in front of the water.\nA toddler with a pacifier wearing a neck tie\nThere is a suitcase with items surrounding it.\nThe baseball player is getting ready to take his turn at bat.\nA green sign  that says rockaway beach on a post.\nthis image is of a boy with a skateboard doing tricks\nA boy does an ollie in a skate park on his skateboard.\nA bowl filled with soup sitting on top of a white place mat.\nA trash can on a corner has a microwave in it.\nA picture of a dog sitting in the backseat of a car.\nA person standing in a 
living room with a fire place.\nMotorcycle police are on large bikes in a crowd of people.\nA picture of a full bathroom with a large tub.\nBalloons and banners decorate the open fair grounds.\nA woman holding a tennis racquet on a tennis court.\na person walking on a city street with an umbrella\nA vespa parked with a cover in a fence\nA fire place sitting in a living room under a mirror.\nA classroom with a purple chair and a chalkboard.\nThe interior of a bathroom made of stone and colored glass.\nA pizza laying on top of a wooden board.\nMan with no shirt holding frisbee in grassy, rocky area\nScissors, a hole punch, and paper laying next to each other.\nThe luggage boxes are downloaded from the aeroplane.\nA stunning skyline sits in the back drop of traffic lights.\nall of the parking meters on this street are covered with plastic bags\nthere are many people laying in the sand at this beach\nPeople are purchasing food from a fruit salesman.\nA woman is standing looking down at luggage.\nA smiling man at a table has a wine glass.\nA man riding a wave on a surfboard near a para sail chute.\nA tennis player on sand in the middle of a play.\nA person that is in the snow having some fun.\nA group of people are sitting by a truck on the ground.\nSeveral cakes are on display in the bakery\na laptop sitting on a table, with a beer and tv in background.\nView of a snowy mountain outside the windshield of an aircraft.\nThis kitchen layout appears choppy and full of \"blocks\".\nA plate with a wide variety of food on it.\na woman in a dress and a tennis racket in hand\nA couple of men on horses and people on bicycles in a courtyard area in the nighttime.\npeople skiing on a snowy ski bank while wearing ski wear.\nA smart device sitting inside of a white bunny bat.\nA clock that is embedded in the ornate top of a building.\nA pair of woman lunge after a tennis ball on a doubles tennis court.\nA young person sitting in his seat working on his laptop.\nA dog looking out a 
window of a car.\nA living room with two blue couches and entertainment equipment.\nA giraffe looks at the back of its enclosure.\nA plate with food and a newspaper on a table.\nperson cutting paper with scissors at a table\nan older person standing playing nintendo wii system\nA young man in striped shorts rides the waves on a surfboard.\nA brightly lit, quaint and clean living room.\nA man is surfing on a wave in the ocean.\nA group of three men riding snowboards on a snow covered slope.\nA person is holding a nintendo wii controller\nA small pizza sliced into four pieces garnished with green leaves.\na group of people under umbrellas at a beach\nA person is flying a kite high in the air.\nA fluffy cat is sitting on the sidewalk.\na toothbrush holder with four toothbrushes in it\nAn office with a two desks and a filing cabinet.\nThe person is flying a kite with two strings.\nBananas are hung up to ripen at an outdoor market.\na woman is standing by a sink in a kitchen\nA woman sitting at a table holding up a pair of scissors.\nA black and white kitten is asleep on a keyboard laptop.\nCouple sitting at a table in a restaurant with pieces of cake.\nCarrots, celery, nuts, onions, and bay leaves are mixed together in a bowl.\nA man standing on top of a snowy mountain\nA kitchen area with a stove, refrigerator and sink.\na white cat covering itself with an umbrella\na bird is standing on a green bench\nA group of holiday bears are arranged in a group.\nA big white bird standing in front of rows of benches.\nA little boy holding a baseball bat getting ready to swing\nA futuristic bike parked in front of a sail boat.\nAn elephant is spraying water out of it's trunk.\nThe yellow train is headed towards the final destination.\nBlack statue on marble base surrounded by security ropes.\nTable and chairs set up at the back of a church.\nthis is a traing riding through a city\nan open suitcase and a closed suitcase on the floor and a cat on the bed\nA man about to hit a 
tennis ball with a racket.\npeople walking on a path around log cabins\na man on the tennis court with his arms stretched out\nThere are two zebras in a rocky plain\nA man with a small backpack cross country skiing\nA tennis player standing on a tennis court looking up.\nA woman holding up a large carrot in a backyard.\nA person walking in the ocean with a surfboard under their arm.\ntwo slices of pizza sitting on a plate next to a fork\na couple of people on skis ride through the snow\nA bunch of bananas hand from a banana tree.\nA couple of men playing a game of frisbee.\na hot pocket sandwich  laying on butcher paper\nThese families are riding on the backs of elephants\nA fresh vegetable shop in a vegetable market.\nA person on a skateboard on a street.\nA man riding a surfboard in the ocean on water.\nA skate boarder falling down in a very big ramp.\nLivestock, people, and vehicles on asphalt near a building.\nThe city bus is parked in the parking lot.\nA person with a kid on top of a horse.\nA person playing tennis on an outdoor court with trees.\nA statue stands in a courtyard near a colorful flower bed.\nA man riding on the back of a horse.\nThe cabin of a small boat has two couches\nTwo decker bus entering leaving Winchester Bus Station.\na big sign saing where to go for parking\nA view shows the bedroom and bathroom close together.\nTwo zebras are facing away from each other.\nA child flying a butterfly kite while another child rides a scooter.\nA couple of giraffe standing on a lush green field.\nA baseball player waiting for his turn in baseball game.\nTwo people ridding horses on a dirt trail with woods behind them.\nA person walking with a small brown pony on a leash.\nA blue tent sitting in the middle of a forest.\nThe child's bedroom has two low beds and storage space for toys and entertainment.\npeople walking with umbrellas in a rainy  london england\nA bedroom with bicycle, computer desk and checkered bedspread.\na man doing a trick on a 
skaeboard\nThe celery and carrots are on a cutting board with a knife.\nA hand holds a piece of fruit with the peel cut off.\nGroup of black chairs sitting underneath a blue umbrella.\ntwo benches sitting on the beach by some trees\nA group of green traffic lights on a street filled with snow.\nA dog running behind three sheep in an open field..\nA piece of pizza on a white plate with multiple toppings.\nSeveral airplanes can be seen at the airport but there is also snow on the ground here.\nThe little girl in pink shirt and beige pants throws the frisbee.\nA car driving down a street near stores with bicycles outside of them.\nThe female tennis player is heading towards her next match.\nthere is a bench on top of bricks by the water\nLady loses her ski on a snowy hill.\nA yellow cat wears a blue plastic sports hat.\nTwo boys carrying hot dogs and other snacks at an outdoor sporting event.\nA beach with an area with umbrellas and an open area without them.\nA close up of an apple mouse and the numberpad of the keyboard.\nA person looking at their cell phone at another person taking a picture.\nA chicken sandwich and sweet potato fries on a plate.\nTwo teenage boys playing a game of frisbee.\nTwo people holding umbrellas looking at a statue of a man.\nA group of people waiting in line to board a train.\nA few snow skiers are going a mountain slope.\nA woman sitting on a bench at a park.\nA man riding a skateboard while a group of people watch.\na close up of two slices of pizza on a plate\nA plate of vegetables arranged with flowers and herbs.\nan air plane at an air port run way\nA giraffe laying on the ground looking forward.\nA living room filled with furniture and a flat screen TV.\nA giraffe standing underneath a beautiful rainbow in a cloudy sky..\nA coffee cup, food, and a passport sitting near each other.\nThe meal is prepared and ready to be eaten.\na soldier is carrying a couple of bags\nDinner is served in a tray on the table.\nA blue sign that is 
pointing to the restrooms.\nA person on a motorcycle making a sharp turn in the dirt.\na black and white photo with a double decker bus in color\nA woman wearing a white t-shirt and visor with pink shorts playing on a tennis court.\nMessy apartment in the middle of packing for travel.\nA close-up of a laptop on a desk with a book.\nThe group of friends are enjoying their drinks.\nA washroom with many photos hanging all over the wall\nA girl is dressed in all red holding a red umbrella.\na bunch of stuff is loaded in the back of a red truck\nA giraffe with dark spots lounges in the grass.\nA beach with several kites flying just slightly off the ground.\nA tall tower with a clock on it at night.\nA man sitting at a table using a laptop computer.\nA market is shown on the side of the road.\nSeven carrots of varying sizes lie on a table\na person riding a bike wit ha dog in a basket\nA person on a skateboard being watched by a crowd.\nA street sign showing the intersection of Main and B.\nA yellow bus that is sitting in the grass.\nA bowl of food contains meat and broccoli.\nSurfer on knees on surfboard while riding wave.\nA herd of sheep grazing on a lush green field.\nA group of men standing around each other playing a game of baseball.\na dog sits under neath a chair with a person in it\nA girl and a man are playing Frisbee on a lawn.\nA man is holding a box of some sorts near a bus and someone wearing a strange outfit.\na little girl playing a game of wii golf\nA beautiful young woman laying on her stomach in front of a laptop.\nA group of skiers with backpacks carrying their skies up a mountainside.\nsome people walking across a road with a sign on it\nA close up shot of a giraffe against a blurry background.\nClock in middle of a sculpture on top of building\na man standing next to a laptop and bottles of beer\nA brick sidewalk of various colored bricks next to a street with cars driving on it.\nA plate with broccoli, potatoes and a meat with sauce arranged on 
it.\nThis is a picture of a women trying to figure out where her keys are.\nA woman gesturing with her hands and sitting at a table with a computer.\nA pile of veggies next to a pile of bananas.\nA skateboarder doing tricks on a ramp in the sun\nA man and girl are standing on a field holding baseball gloves.\nTwo people hold up tennis racquets over a net.\nA pizza is topped with vegetable strips and garnishment.\nA cup with three pairs of scissors sitting on a table.\na close up of a cat paw near a book\nTwo skiers are traversing up a tall mountain.\na man with a hat and a baseball bat swinging at a ball\nsandy deserted umbrella lined beach with houses on top the cliff\nAn historic training sitting on railroad tracks.\nA vase filled with flowers on top of a table.\nSomeone flying a kite in the sand on the beach.\ntwo little bird on a tree touching beaks\nA kitchen painted white with an automatic dishwasher and a large window.\nMany cartoon modeled objects are in the sand.\nA blue, white and red fire hydrant sitting on a sidewalk.\nA woman sitting on a beach taking a picture of a number of kites\nthe table is set with many things to eat.\nThis girl is happily filling her plate with the healthy and creative food choices served at the buffet in this yard party.\nFresh fruit for sale hang by the side of the street.\nTwo giraffes are standing side by side in a field.\nA man standing behind a white frisbee on a lush green field.\nA large number of cattle confined in a small area.\nA person riding a brown horse in full dress.\nA large airplane sits at a gate at an airport.\nA group of children sitting in a red wooden canoe on the seashore.\nA skier flips upside in the air performing a high jump in the mountains.\nA couple of men racing motorcycles next to each other.\nA painting of a luminous glass bottle seems to glow with inner light.\nThe dog is lying on the white sheet.\nA cow sculpture sits on top of the grass.\nA group of motorcycles are sitting in front of a 
building.\nA tennis player looks at a tennis ball as she lifts up a tennis racket.\nA man holding a square shaped pizza pie.\nA line up of motorcycle cops riding motorcycles on a street.\nA person holding a skateboard with a dog tucked in their jacket.\nA boardwalk with a fence and bench lit by streetlights.\nTwo yaks are standing in a grassy field.\na barbecue sandwich with onions in a paper tray\nA jet waits on the runway of a mountain airport\nthe horse is bending its head over and grazen\nA small yellow plane is leaving the hangar.\nA tree-lined city street with car and motorcycle traffic.\nHe is skateboarding down the wall at the skateboard park.\nThe gray elephant family is crossing the ditch.\nA businessman giving a slide show presentation in a meeting room.\nThe two young children are sitting at the table together.\nA girl in white shirt and blue shorts playing tennis.\nAn odd looking mechanism sits on a dirt road while beyond it someone rides a bicycle and in the background small flags are flying.\nA fuzzy black cat is sitting on a laptop computer.\na plate with some eggs chicken and tomatoes on it\na woman is standing at the beach with a surfboard\nTwo computer monitors sitting next to each other.\nA person sitting down eating a sandwich next to a street.\nSerious looking couple with light brown Teddy bear, side sun light.\nTwo trains sitting side by side on the tracks.\nA woman in glasses is taking a bite out of food.\na group of sheep are all outside in the soil together\nAn empty bathroom with a toilet and sink.\nAn egg is served on top of a small pizza.\nThis is a public restroom that is fully tiled.\nA group of people standing around a van in the rain.\nTwo men are in a green train with yellow lettering.\nThe little boy is brushing his teeth with a toothbrush.\nA dog jumping into the air to catch a toy.\na baseball player getting ready to hit the ball with a bat\nA pie sitting on top of a stove top oven.\nA bunch of very cute signs hanging by a 
business.\na close up of a bowl of fruit with oranges\nThe traffic light is visible for all of us to see.\na couple of people on a motorcycle dressed as santa\nA person snowboarding down a slope at an angle.\nTwo women walk near a man skateboarding with a child.\nThe fork sits next to a piece of chocolate cake.\nSome grilled fish is on a white plate with a fork and some carrots.\nCars driving on a road near traffic lights.\nSlice of baked dessert item on platter ready for consumption.\nA man holds his arms in the air while standing in the snow.\nA man standing in a bathroom brushing teeth while wearing monster mask.\nA plate of pizza on a restaurant table.\nA black and white cat sits on a wooden porch\nThe man watches his reflection while brushing his teeth in the mirror.\ntwo bento box meals with meat and vegetables\nA digital clock shows the current time at 653\nA baseball player holding a baseball bat in the game.\nan image of a close up of food with meat and veggies\nA small kitten figurine on top of a cellphone screen.\nTwo light brown cows standing inside of gated corral.\nA beach filled with lots of people next to the ocean.\nA street light that shows, horse crossing on it.\na large truck in a field with trees in the background\nA man riding on the back of a white surfboard with two small dogs.\na bench at a train station with seating on the front and back\nA woman presenting cupcakes with lit candles to a baby.\nA toilet in a stall with the toilet seat up.\nThere is a man dressed in a purple tie and black suit.\na street full of people walking and one riding a small motorcycle\nA large group of people looking at an elephant behind a fence.\nan image of a group of people in the woods playing with frisbee\na elephant balances on a stepping stool\nThere are three people posing with their drinks.\nA city street at twilight showing a bus crossing the intersection and people standing on the corner.\nTabby cat sitting on the hood of a blue car.\nA young man 
jumping a metal railing on top of a skateboard.\nA person with their hand on the mouse of the computer\nA white bathroom with a toilet and a brown and white  tiled floor.\nThe large tray has a large sandwich, two pickle slices, and a bucket of fries.\nA group of people sitting around a living room.\nTwo birds sit near a plate of partially eaten food.\nsome boys having some food at a table together\nA dish with mean inside of it .\nThree people smiling and sitting at an outdoor dining table that has place settings for four plates.\nA baby sitting in the middle of a bunch of teddy bears.\nA tennis player about to hit the ball.\nA train is waiting at the station for passengers.\nA para sailer approaching the beach on  a sunny day\nA man riding a skateboard is making a jump over a bench.\nA city bus stopped in front of a building.\nA piece of cake with many colours on a plate\nAn umpire gets ready to call a player safe or out.\na person with a red umbrella a building and a car\nA blue jacket laying on top of a fire hydrant\nA young baseball player winds up for a pitch.\nImages on the same man song tricks on a skateboard\nA pretty little girl standing on a hardwood floor.\nPeople are laying on a sunny beach near the water.\nA woman looking at a website on her computer.\nA person with skis and gears standing in the snow.\na sky full of kites floating in the wind\nA stuffed blue bear with a tag in a room.\nA man eats a pizza in a small restaurant chain\na man wearing protective gear is on a skateboard\nA woman surfer walking along the beach sand.\nA silver oven door is reflecting the wooden floor.\n2 girls laughing while one holds a telephone\nA long white bath tub near a white toilet bowl\nA white plate of food on a table.\nThere are many doughnuts and pastries arranged on platters\nA silver colored refrigeration unit, in a kitchen.\nA person that is playing in a tennis game.\nA dog chews on a box in a grassy yard.\nthe girl id licking the spoon of batter\nA man throwing 
a Frisbee in a parkland\nA small apple tree sitting next to a  wooden fence.\nA book shelf filled with lots of books.\nA living area with a futon, chair and a window.\nA display case at a store filled with lots of different vegetables.\nA cat is sitting on a car hood on a wintery day.\nA man playing tennis with two people watching the game\na clean bathroom that has a big mirror\nA person that is surfing in the water.\nA pile of submarine sandwiches sitting in a stack.\nA white table with a bottle of soda and a hotdog.\nA plate of chocolate donuts and one has sprinkles on top sitting on a blue platter on a table.\nA man in a baseball uniform about to throw a ball.\nA child eating a slice of pizza at a table.\nA group of motorcycles parked on a dirt parking lot in a mountainous region.\nA large kitchen with a metallic refrigerator freezer and a center island.\nA man who is in the air with a skateboard.\nA young man riding a skateboard on a walkway.\nA young person on skis on a ski slope\nA banana split with white and dark chocolate\nA white dog sits in a basket with wheels on the floor.\nA woman poses with a large teddy bear.\nA corner of a room with a very big sink near a toilet.\nA man and a woman with cell phone in hand behind table of food trays.\nA man unpacking a laptop computer in his living room\nA tennis player gets ready to hit the ball.\nA sign for a restaurant and bar on a building.\nA young child brushes their teeth with a blue toothbrush.\nA small toy is sitting on a plate of pizza.\nA red, yellow and white transit bus traveleing down a street.\nA herd of dairy cows in a field behind a fence.\na cat looking out from an open doorway\nA baby sitting on a kitchen floor in front of an open refrigerator.\nA piece of broccoli partially surrounded by knife blades.\na few drag queens make some cake and eat it\nA cat sitting on top of a desk.\nBlack and white photo of three suitcases stacked on top of each other.\nA man sitting at a table with pizza.\nA surfer 
riding on a wave well if it's crash in the ocean\nA man and dog are interacting on a bed.\nTwo men sitting in the snow with their snowboards on while one man is standing.\nA large truck on a open city road.\nFour remote controls are placed next to a Universal remote still in its package.\nA person selecting some bananas from a bunch.\nA young woman feeding cattle on a dairy farm.\nThree young men eating food while sitting on an indoor bench.\na person on a bicycle wearing a hat in a parking lot\nA teddy bear dressed in a pair of underwear sitting on a chair.\nTwo zebras that are standing together in a field.\nTwo people sit on a couch by a guitar.\nA cat that is standing over a bowl.\nA silver train parked in front of a train station.\na waterway and a train going over it on a train bridge\nA cake that has dogs around and on top of it.\nA beach area that has seagulls on the rocks and sand near the water.\nA woman surfer riding a wave crashing behind her.\na man on a snowboard is on a ramp\nA cartoon image of a man on a pair of skis.\nBatter, catcher and umpire at a baseball game.\nA picture of some oranges stacked on top of each other.\nA man standing on home base with a baseball bat.\nA large teddy bear sits at a yard sale.\nA room with holes in the wall and dirt on the bed looking utterly disgusting.\nA white sink sitting next to a toilet under a window.\nA utility truck parked on a incline covered in graffiti.\nA woman sits at a table with an open laptop in front of a screen.\nA group of people standing around a elephant.\nA man sleeping on a couch holding a ripe banana.\nPeople in a hall, bags and suitcases on the conveyor belt\nThe group is going skiing  on the snow.\na girl choking up on the bat waiting to hit the ball\nA person taking a picture on their cell phone\nA person flying through the air while riding a skateboard.\na cat layling on a red blanket and looking relaxed\nA personal pizza and beer on a table\nSeveral cows are standing near each other in 
the grass.\nIs that a tiny computer next to the phone?\nTwo people stand next to a grill with hot dogs talking.\nOne white sheep standing still on the pasture near a dried up tree.\nThis is the grill of a large truck.\nA man plays with a frisbee in a grassy field.\nA train on the tracks up on a bridge.\nA large airplane mid flight among the clouds.\nThe people are walking down the street with their umbrellas up.\na bus that is parked in a parking lot\nA group of bikers make their way up the city street, as a line of buses park by the sidewalk on the opposite side of the street.\nA small bathroom with a vanity on one side and the shower on the other.\nA toy kitchen with a play sink, stove and oven.\nA girl wearing protective gear while riding a skateboard.\nTwo teddy bears in front of two vases of flowers.\nA man in a den playing with remote controllers.\nA bag of luggage filled with personal items.\nSmall celebration cake on a table with happy birthday decorations.\nwoman in a hat feed a giraffe out of hand\nTwo giraffes, and antelope and some zebra in tall dry grass\nA man on top of a car standing next to a group of mountain goats.\nA cat is in front of an open refrigerator door.\nTHIS IS A PICTURE OF A KITCHEN ISLAND WITH SEATING\nA woman is paddle boarding down the river.\nA young lady playing soccer alone on a soccer field.\nA big bunch of ripe yellow bananas on display.\nA pair of men playing a game with some remote controllers.\nA laptop and some suitcases in a room.\nA group of people standing on the beach watching a low flyinf plane go overhead.\nKitteh at rest on somebody's black and white shoes\na couple of people sitting on a couch plays with a wii remote\nPlate of food including rice, meat, and vegetables.\nA blurry image of an object with signs behind it and motor bikes.\nA baseball player in a white uniform holds a bat up while standing near a. 
catcher and an umpire on home plate.\nA group of white sheep walking through a wide grassy field.\nA kitchen with many cups on the window sill.\nOne tall giraffe on top of the dry terrain.\nA man riding a red scooter down the street.\nA man is standing on base at a baseball game.\nAn elephant is the focal point in this photo.\nA little girl in a store playing with four large white Teddy bears.\nWoman in red shirt on a horse in a river.\na man is talking on a phone outside\nAn older man is examining a table of bananas.\nThe front of a store with its doors wide open\nTwo children sitting on a skateboard riding it on down a slope.\nDogs gather to eat food out of a metal bowl.\nA bunch of hot dogs sitting next to each other on a table.\nA cowboy boot filled with flowers sitting on a bannister.\nA white toilet sitting next to a white bath tub.\nTwo bears in a sunset sitting on a hill.\nA young man holding blue handled scissors to his tongue.\nA train is shown next to a platform.\nWomen smiling looking into a mirror while fixing their hair.\na living room with some antiques and a book case\nA kitchen with light wooden cabinets and an island in the middle.\nA dog is tied to a cart on the side of a motorcycle\nMeat with lentils, rice and vegetables sit on a blue plate on a wooden table.\na fire hydrant stands before a partially visible cave\nA costumed employee is holding an open umbrella.\nA zebra herd standing around in the grass.\nAn ornate building is viewed by a crowd.\nSeveral Air Canada jetliners parked at an airport.\nA man in a suit standing beside his bicycle.\nTwo small beds are now together to form a single bed.\na close up of a person holding a book near a dog\nA bunch of sheet and geese in a field with a bible quote\nA woman surfing in the ocean and riding a wave that is crashing behind her.\nBirds are in the water and sticking their heads in\nAn Asian man riding a motor scooter on a street\nsome boats parked on the side of the river\nA toothbrush and a 
mirror in a bathroom.\nWinded dog sitting and eagerly waiting for a frisbee to be thrown.\nA piece of luggage sits by train tracks with passengers waiting.\nA bunch of cats sitting in a fenced in enclosure\nTwo plates have what looks like a hot dog and seaweed.\nThis fruit basket contains orange and green fruit.\nA shot of feet riding down a street on the skateboard.\nA person that is brushing his teeth in a room.\na little wooden bench sitting in front of some trees\nA group of people riding on a bush.\nTwo Zebra in an empty field with trees and buildings behind the field.\na dirty kitchen ith various appliances in it\nThere are two people standing outside on a balcony of a very large living room\nA cat drinking from a toilet in a bathroom with toothbrushes.\nA curly brown dog is laying beside a novel.\nA police vehicle carries away a car from the scene of an accident.\nA piece of broccoli next to a kitchen knife setting on a painted wooden bench with the paint chipping of it.\nA dog doing tricks commanded by a person.\nTwo planes sitting in a field on a cloudy day.\nThe airplane is in the air flying over the mountains.\nA gray dog has a pink frisbee in it's mouth.\na close up of a woman wearing a shirt and tie\na green and white street sign in a busy intersection in a city\nTwo horses are sniffing a frosted cake as a lady stands in front of them with a plate.\nTwo hotdogs with a hand full of fish snacks\nA red brick tower with a clock in it.\nA computer mouse sitting on top of a laptop keyboard.\nThe two buddies are cross country skiing through the mountainous region\nA shiny kitchen gas stove and oven with a black counter.\na cat almost all the way inside the bowl of a toilet\nA desk with two monitors, a keyboard, a mouse, and a binder.\nA kitchen with a refrigerator, ovens, a sink, and cabinets.\na black cat is laying next to his colorful toy\nA glass vase with a green plant in it sitting in front of a window.\nA young woman sits at a computer in an office.\nA 
girl standing under an umbrella reading a book.\nA bunch of stuffed teddy bears with flag shirts\nA foot next to a snowboard on the ground.\na house very big showing a city clock\nA baseball player jumps over another to catch a ball.\nA person sitting on the floor  playing computer games by holding remote.\nSkis displayed on a sedan mounted ski rack.\nA woman eating a hot dog bun covered in sesame seeds.\nA bowl of rice, meat, peas, and carrots.\nA stop sign, a kosher butcher sign, and a Rite Aid sign\nThe concert audience is composed of many young Indian men, some taking pictures of the performer.\nA giraffe standing with a bird flying in the distance.\na baseball player swings his bat at a ball\nA boy and his younger sister looking at a steam engine\nA surfer is gliding through a small wave.\nA young woman is playing a tennis game.\nA horse galloping through the sand on a farm.\nA child's hands hover over a small uncooked pizza sitting on a tabletop.\nThree people posing for a picture in a parking lot.\nThe person is sitting while holding the string of a colorful kite.\nA large herd of sheep are grazing in the snow.\nA man sits on a bench looking at a book in the subway.\nA group of men stand playing a video game.\nthere is a male skier that is riding down a mountain\nSmall crocheted teddy bear on the side of a quilted blanket.\nA bus is in traffic near a sidewalk and eatery.\nA white and black passenger bus at a paved intersection.\nA group of people hand flowers to a man.\nTwo people with green shirts caring for some animals.\nA variety of fruit - including oranges, apples, pears, and Kiwi fruit - sit in a cardboard box.\nMature man speaking on microphone in front of curtains\nSome young soldiers are looking at their pictures.\nA passenger bus parked in a parking lot.\na couple of elephants are in a field\nA bike standing on a sidewalk next to a road at sunset.\nA tabby cat sleeping with its head on a laptop keyboard.\nA train yard in a city with a train in the 
distance\nThere is a close up photo of an elephants face wearing a garment\nHorses stand around a horse trailer grazing and drinking.\nA red fire hydrant outside a shopping center.\nTwo teenage girls performing chores in a kitchen.\nTwo men overlooking the activities of students on small computers.\nTwo birds are flying over a sandy beach.\nthere are many different dishes on this table\nA woman smiles as she eats a lunch of Chinese food.\nThe cat is at the desk near the computer.\nA birthday cake is shaped like a sheep.\na topless man laying on the bed\nsome sheep standing together while surrounded by some tall grass\nThe are two bananas, the brand of them are dole.\nA small bathroom with a yellow towel on the floor and a rack with magazines and various other items.\nA picture of an open air zone that looks incredible.\nPeople walking on the train platform pulling luggage bags on wheels\nThe room in the old house is ready for the new mother and baby, decorated with vintage finds.\nA group of kids is skiing in the daylight.\na bunch of urinals are lined up on the walls\nA group of men standing next to each other holding a racquet.\nA cat that is laying down next to apples.\nTHERE ARE DIFFERENT SIGNS ON THE STREET\nA woman sitting on the ground in an organized room.\nA person that is eating some food on a table.\nA giraffe with his long neck bent over and his mouth on the ground in an outdoor area.\nlarge plate of french fries in sauce on a white tabletop\na yellow car turning on a somewhat busy road\nA group of people on a street next to a food truck.\nTwo people pose together for a photo of themselves on a ski resort besides the ski lift\nThe pinnacle of the building is illuminated at night.\nThree elephants standing by a man made waterfall.\nsome table and chairs sitting around a building with a clock on the top of it\nA group of boats tied to the rocks near the shore.\nAn acoustic machine, speakers and remote control are sitting on a table.\nThree people pulling 
suitcases behind them on a wet pavement\nTwo sheep are standing on some short grass.\nA simple computer desk with a desktop monitor keyboard and mouse and a laptop computer.\nThe two people are in the kitchen cooking.\nA dirty train is sitting on the train tracks.\nA woman ists on a chair while a child stands under an umbrella with red dots.\nA small baby bird on a piece of metal.\na all white bathroom with blue tape on the walls\nA blender pitcher on the counter near a sink.\nA horse with a white stripe is in the woods.\nThe dog is in a field on the side of a parking lot.\na close up of a propeller on a plane in the air\nAn elephant is walking across a dirt road.\nPeople are standing outside of an old airplane.\nThere is a blue pick up truck broken down on the road\nMan standing up playing a video game on a TV.\nA bed with a comforter that is slightly pulled down and pillows that have a note on one of the pillows.\nA sign that reads 'plaza drive' is being displayed.\nA mother and her child giraffe walking in tall grass.\na white motorcycle is parked in a spot\nA living area with a couch and a television.\nA boat is going down the middle of a channel.\nA three-piece bathroom with wood shelves and a round mirror.\nA black bear is standing by the rock\nThree zebras walking in a dry grass field.\nA young person riding a body board on a wave.\nA tan building facade with a bench out front.\nA little girl standing on top of a tiled floor.\nA black bear lying down in the grass next to trees.\nA train is moving swiftly through the station tracks.\nA red tray of food on a table.\nBanana on table with three colored plastic wafers.\nA woman holding a dog above a bowl on a counter.\nThe Christmas presents are left in the kitchen.\nA glass full of drink is on the table next to a slice of pizza.\nA bus stop and sidewalk near a park.\nWoman standing in grassy area near baseball field.\nRed wine being poured into five crystal glasses.\nA girl dressed in pink sports gear stands on 
a snowboard at the top of a snowy slope.\nWomen playing in field with flying disc during competition.\nA kitchen area with stoves, coffee maker and cutting board.\nA small elephant toy pushed against an orange.\nA pizza with mozzarella, tomato, and basil on a table with silverware.\na bunch of cars that are on a street\na woman in white shirt talking on a cellphone.\nThree people ride their horses down a beach.\nThe large room has a lot of furniture in it.\na group of motor bikes parked in front of a store\nDog sitting in the back basket of a bike outside the shop\nA lady scratching her head in the bathroom.\nA long boat with an ad on it floats down the river\nA front end of a boat sitting over a body of water.\nThree people are getting off the train with their luggage\nA black and white image of a line of umbrellas\nA wall dedicated to white cloth with suitcases out front\nA small airplane sitting on the tarmac at an airport.\na person riding a surf board on a wave\nTwo trains on separate tracks travel through a city\nA towel with his nose right next to the camera looking towards it\nA lot if people are in the conference too\nTwo teenagers with backpacks are on the street corner.\nAn animal is eating some food out of a bucket.\nA bushel of greens are on the table with various fruits.\nA man wearing a suit and a blue tie\nA man holding a broom on a surfboard with a dog.\nBlack and white boat sitting at a pier near a building.\npart of a road with assorted food on tables for sale\na girl petting a pony on the back of it's neck\nA green train engine moves down the tracks with many cars behind it.\nA couple of giraffe standing in front of a cage eating hay.\nA young girl stands on her bunk bed holding a paper.\nA group of three women sitting at a table sharing a cup of tea.\nTwo halves of a sandwich that is on a plate.\nsome people are playing ball in a field\nA bearded man poses with his breakfast meal at a cafe\nA toilet that is sitting in a bathroom under a 
window.\nThere are two children who are holding tennis rackets.\nMultiple images overlaid of several women playing frisbee.\nA slice of lemon pie with frosting on a white plate.\nGroup of women on a soccer field with the ball in the air.\na male in a green shirt a bowl some food and a pan\nSeveral giraffes stand near each other in a large grassy area\nA ski boarder riding up a big hill doing tricks.\nAn abstract graffiti on what looks like an old train\nA boy holds a baseball glove on his left hand.\nA person doing a tail slide on a rail in a skate park.\nThree people in an art gallery using their phones.\nA teen is seen mid-jump while flipping his skateboard in an indoor skate park.\nA fat cat laying on a rug and shoes.\nA hotel room features a balcony over looking the water\nA man riding skis across a snow covered slope.\nThe shadow of a skateboarder in the middle of a stunt.\ntwo small children sit next to each other\nA plate with a hot dog, chips and a strawberry on it.\nA person laying down on a bench outside.\na person in a black shirt a horse water and trees\nbathroom with its door open and is very clean\nA hot dog sitting on top of a white plate.\nSheep and lamb standing in pasture by stone fence.\nAn old picture of two women with two small sheep\nA person near a large screen with others at a long table.\nA man and a woman cutting a cake with a large knife.\nA room with a tea pot and two blue and white vases.\nThe skiers are ready to try the snowy slopes.\nA woman looking up at someone taking a picture\nA person with something in their mouth while holding a cell phone.\nA group of giraffes eating leaves off trees.\nan image of a dog eating on his plates\nA baby eats some cake with a fork while several people hover over him.\nAn orange truck driving down a street full of men in the back.\nA red train sits next to a passenger platform at a station.\nGroup of different types of vegetables sitting on a metal railing.\nMan dressed in black snowboarding down a 
mountain.\nThe little girl pokes her finger into the sheep cage.\na tower with a clock on it in front of a street light\nA red light on a yellow contraption in a n intersection\nGroup of people riding on the top of an elephant.\na white and red boat with a bunch the people on it\nA group of young men riding skateboards in a skate park.\na dog passing in front of a girl on her cell phone\nLighthouse on a point with sailboats near it\nBrown cat sticking its face into a pair of white shoes.\nA harbor with various boats and people walking on the pier.\nA lone giraffe walking in dry vegetation in front of a tree.\nTwo luggage cases near a desk and bed\nA clock with glow in the dark hands, sits in a dark room.\nMan playing game with Nintendo Wii control next to kid carrying a cup.\nKids swimming and surfing in shallow water on a beach.\nA small snowman with a person holding a carrot next to it\ntwo chicken patties filled with cheese in the center\nA work desk with a computer books and keyboard\nA plate that has different types of food on it.\nThe kitchen has wooden cupboards, plenty of counter space, and a sink adjacent to the oven.\nA group of sexy young ladies wearing bikini tops.\ntwo giraffes headed into a building and another one standing by the fence\nA girl serves a tennis ball on the court.\nA person is holding a purple bear with no eyes against a yellow back ground.\nA person is holding onto a cellphone somewhere.\nA white coat on a bench on paver stones.\nA man in a tie sitting on a wooden log.\nA small girl is on the beach near a kite.\nA couple of men adjusting their ties in front of red steps.\na person riding a horse next to a baseball field\nA yellow and grey train on train tracks.\nthere is a man with a beard sitting in the grass\nThe little boy is pulling the suitcase by the handle.\nTWO PIECES OF PIZZA BOTH DIFFERENT IN A BOX\nA large bear standing on top of a stone ground.\nA street scene showing a group of cars stopped at a red light.\nThere are two 
brown eggs in a metal bowl\nTwo men on a boat with a dog on the front\nA woman walking a dog by a table of food.\nTwo people under an umbrella on a wet sidewalk with stars.\nA tall building with a massive golden clock on it's face.\nA large building with a clock and some trees.\nTwo men standing in a kitchen preparing food.\nThere is a red light on a traffic light\ntwo women riding down the snowy hills on sleds\nA very old fashion looking red smaller bus.\nA group of people are around a birthday cake.\nA close up of a cut into piece of food\nA large plain with a couple zebras and many antelope.\nA baseball player wearing green and white standing next to a baseball player wearing red and gray on a baseball field.\nA  young woman sitting near a tree eating food.\nA man is throwing a Frisbee into the air.\nA beer and a slice of pizza on a table\nThe large bathroom is reflected in the mirror.\nan image of a bear that is in the woods\nA naked woman sitting in a large suitcase.\nA photo shopped photo is shown with a tiny fire hydrant.\nPerson holding a toothbrush under a faucet with running water.\nA bunch of sheep together in a very narrow area.\nA airplane that is sitting on a tarmac.\nThe surfer is riding the wave on his surf board.\nan image of two zebras in the middle of the wilderness\nA man in a T-shirt is typing on a laptop.\nAirplane at airport loading gate under hazy skies.\nTwo giraffes are neck to neck in an enclosure.\nA man holding a surfboard is standing by the ocean.\nA zebra grazing on grass in it's natural habitat.\nA lighted fish tank above a toilet in a bathroom\nA young man holds his skateboard while in a courtyard that is next to a large rock building.\nthe traffic signs are easy to read for the street\nBig Been clock tower in London, England on an overcast day.\nA wooden table with a purple laptop and orange pen.\nA dirty brown teddy bear in a trash can.\nMany people flying kites on a cloudy day.\nSomething delicious and sweet is done baking in the 
oven.\nTwo large piece of broccoli laying on a piece of paper.\nLarge sized truck with a medium sized black dog in the passenger seat.\nA kitchen that has a wooden cabinets with a wine holder.\ntwo people standing in the snow by a sign\nA batter is getting ready to swing at a pitch.\nA white clock tower at the top of a tiled building.\nA person is doing a skateboard trick outdoors.\nA remote control sitting on a wooden table.\na group of people stand under neath a tent on a beach\nA baseball player throwing a ball on a baseball field.\nTwo people waiting at an intersection carrying umbrellas\nSome people that are walking on a sidewalk while it is raining.\nAn orange motorcycle is next to a red car.\nA laptop and two controllers on a small table in front of a couch.\nA living room has a large animal cage in it.\nLarge amounts of desserts set on different platters.\nAn assembly line machine has many goods on it as two people stand in the background.\nTwo people are playing video games in a living room.\nA young boy rides his skateboard amongst pedestrians.\nGroup of white sheep walking in a field of grass together.\nA keyboard, mouse, and computer monitor on a desk.\nA living room filled with furniture and a flat screen TV.\ncherry tomatoes  and various food dishes on a table top\nA building with large windows sitting inside of a building.\nA couple of women standing with a boy inside of a kitchen.\nman crosses skis while jumping in the air\nA small sewing kit sitting next to a pair of scissors.\nA yellow and blue bus is going down the street.\nTwo large plates off a variety of food .\nHorses communing with each other on a shady street.\nA group of people order food from a food truck.\nA man cooking a large number of hot dogs on a grill.\nMany motorcycles are parked side by side.\nThree pieces of cheese bread are on a plate.\nA man flying through the air while riding a skateboard.\nA bunch of keyboards with mice on top of them.\nA woman cutting a birthday cake on a 
tray.\nTwo toilets sitting on a sidewalk with a cardboard box.\nA black bear is emerging from the grass to cross a paved street.\nA laptop with a small screen is chained to a desk.\nA man on a piece of equipment resembling a bicycle that has very large wheels.\nA kitchen counter has a coffee pot and microwave.\nA man is hitting the tennis ball with a racket\nA black Sony remote control being held in a hand\nA young man standing next to a racecar on a display lot.\nA man doing a trick on a skateboard off a rail.\nOne giraffe from a group of two reaches through a gate toward a group of people standing outside the gate.\na woman and child watching a herd of elephants in a gated area\nA group of men in suits sitting on couches talking.\nA man at a table with a bowl of food.\na kitchen cupboard with the doors open and plates and bowls on the shelves\nA cruise ship docked for letting passengers off to port\nCars move through an intersection below a green stoplight.\nA big yellow train travelling by a road.\nA black cat is on a laptop computer.\nTwo travel bags on shelf with a metal rail.\nThree Giraffes are standing in a row and they are all different sizes.\nA woman sitting at a table eating a giant hamburger.\nA bathroom with a white bath tub and a sink.\na big sausage in a roll with cheese and cups of sauce and a person\nA white dog on a bed looking in a box.\nA photo taking of the inside of the building looking at three balconies and the clock.\nWisps of smoke on a public street at night.\nA wooden chair that has a black vase with two flower holders at the top, and two sets of flowers in the vase.\nA beach covered in kites next to an umbrella.\nMany people sit at a table eating a meal.\nA giraffe standing in a small piece of shade.\nA dog laying on the side of a car door.\nThe evening sky  on the lake foretells hope \"Red sky at night, sailors delight.\"\na small boat parked next to a bigger boat in the water\nA woman stands with an umbrella next to a building.\nA 
red bike locked up next to a a pay meter.\nPick up truck parked by side of road with white building in distance\na group of people that are flying kites\nPoultry and broccoli on white pizza, with lemon slice.\nA man dressed up in zombie costume is wondering around the street.\nA kitchen counter full of freshly picked vegetables.\nA bird themed clock sitting inside of a green box.\nA giraffe munching on leaves with man standing in front.\na man is sitting at a table on a train\nA falcon sitting in a pond of water.\na big colorful buss parked on the side of a road\nA airplane flying through the sky with a leaf on it's tail.\nAn intersectional street sign stands in front of a vast mansion.\nA broccoli and cheese quiche with a piece missing.\nan image of a boy walking on the beach with surfboard\nTWO PEOPLE ARE TRYING TO GET A BICYCLE IN THE BACK OF A VAN\nTwo red and white cows standing in a pasture.\nA man holding a frisbee on a beach with a clouded sky.\nA young child is swing at a ball with a plastic bat.\nA man looking at a laptop next to a beer can and speaker.\nAn Alaska Airlines passenger jet sitting on top of a runway.\nPeople reaching for sandwiches on a plate sitting on a countertop.\nA man about to run to first base after just swinging a bat\nA group of three boys sitting on top of a couch.\nA woman is jumping in the air with a frisbee\nA yellow and black bird perched on top of a dead sunflower plant.\ndog sitting on dog chair with toy next to its paws\nA close up of a care with an advertisement for a movie.\nPastries shaped like bear heads are displayed for sale in Japan.\nSomeone is touching a white plate that has a sandwich and chips on it.\nPeople at a table with cups and a plate with donuts on it.\nA large bathroom with tile flooring and white fixtures\na trunk of a car filled with a lot of luggage\nlady in the jacket is sitting on the concrete bench smiling.\nA group of cows laying next to some trees.\nsome rice chicken broccoli and carrots on a black 
plate\nSome yellow school buses parked in a row.\nA woman in grey shirt on park bench with cellphone and bicycle.\nA couple of foreign language road signs.\ntwo zebra standing in front of some goats\nMany people hold umbrellas on the street during a rainy day\nA dog catching a frisbee in its teeth in a field\nPlates of food are on a ledge overlooking a soccer game.\nA person sitting on a beach with some animals.\nA desk with two laptops on it and both turned on.\nTwo people watch TV on a couch with their legs propped up.\nfourt plates of vegetables and fruit sitting individually in each\nA little boy standing next to a sheep smiling.\nA city street is busy with cars and a clock tower above.\nA teddy bear sits on a stair railing.\nA black dog in the snow playing with the Frisbee.\nBusy traffic in a city intersection at night.\nA bathroom complete with a toilet, sink and window\nA man in a suit sits alone on a bus.\nPeople walk through a shop with flowers on the table.\nA counter top that has a mug on it.\nA ship is sailing away from the dock.\nTwo bears relaxing in a pond side by side.\nA woman with sun glasses on a cell phone.\nSki patrol with helicopter at accident on steep ski slope.\nThis is an incredible picture show of individuals having a fabulous time.\nan airplane that is parked out in a grassy field\nA man performing a skateboarding trick on a rail.\nA toilet sitting next to a sink, towel, vase and mirror in a bathroom.\nan image of a tour bus that is parked outside a house\nA woman texting on her phone, while sitting in a chair.\nPerson wearing grey clothing on a motorcycle on a city street.\nTwo young child skiers are headed down a small slope.\nA table with two people and two pizzas on the table, one at each place setting.\nan old black and white photo of four people sitting on a bench\nsome little kids sitting in the grass with a green frisbee\nA dog watching another dog on a television at home\nThe apple computer is sitting on the bed.\nA young boy 
playing with a plastic ball and bat.\na close up of a cat laying on a dresser and watching tv\nA girl holding a wii remote looking forward\nA chicken burger and french fries laid on a plate.\nA group of men, standing while playing video games.\na clean bathroom with some flowers and a window\nA herd of sheep are grazing in a field.\nSeveral giraffes are near a fallen tree on the grass.\nSmall groups of people, including a person walking a dog, are scattered about an outdoor area, encompassing some streets, that is filled with classic cars.\nAn elderly woman poses for a picture in the park.\nIt is never too young to teach a child about tooth brushing.\nA man with a tie, dress shirt, sweater and headphones.\nA man on a bench is looking at a boat in the water.\nA skier is performing an advanced trick on a slope\nA bunch of street lights in a town hanging from ropes\nA woman concentrating on her work at a table in a sunny room\nThree men in military suits are sitting on a bench,\nA boy and a girl sitting down to eat a pizza.\nA carrot sitting on top of a wooden cutting board next to a small green knife.\na man sitting in a lawn chair eating food\nAn airplane is lit up as it sits on a runway.\nView of a subway train through a mirror.\na cup that has some flowers in it\nTwo bears are romping in the water with one showing teeth.\nA train going down a track beside many skyscrapers\nA young boy wearing a baseball uniform and holding a baseball bat.\nA horse walking down the road, in the daytime.\na bunch of people on skate boards ride on some cement\nA person is preparing a meal in a large home kicthen.\na bunch of motorcycles sits parked on a street curb\nA man surfing on a wave in the ocean.\nA view of a bathroom that is in the process of being remodeled.\nGroup of people holding orange and blue frisbees.\nThe backyard of a big house with outdoor seating furniture.\nA white cake with decorations of penguins and a Merry Christmas message.\nTwo people are playing tennis in 
an outside court.\nTwo elephants are in a field of grass together.\nA boy sleeps with his head on a pillow and an arm around his cat.\na lonely horse tied up in the desert.\nAn photo of a lake, fire hydrant, and sign.\nA pair of zebras cross a dirt road in the plains.\nA sub sandwhich sitting on a napkin next to a glass of water.\na home made pizza sitting on a table top\nA large metallic refrigerator freezer combo in a kitchen.\nA man with two children posing on snowy ski slopes.\nA couple of men standing between two large elephants.\nTwo men on a boat in a lake near a house.\nA skier and snowboarder going down the snowy hill.\nA small personal pizza sits on a small white plate.\nA man rides his bike on a deserted street.\na computer room with shelfing that displaying various electronic devices\nA cut in half sandwich sitting on top of a white plate.\nA woman talking on a cell phone while wearing a bag.\nA cow that is standing in the grass.\nA street with a street sign and a stop light\na kid skating very high on the walking steps\nA man flying though the air while riding skis.\nA group of people riding sailboats on blue water.\nA man standing next to  a woman under a kite in a tree.\nTwo woman playing with Wii remotes and a man in short shorts sitting in a chair watching.\na table full of different kinds of pizzas\nA mom holding her baby while working on her laptop.\nA man on a surfboard riding an ocean wave.\ntwo vehicles are sharing space wide enough for just one\na white steeple near the roof of a neighboring building.\nPair of elephants walking along the shore of pond in desert.\nA silver bowl filled with salad on top of a table.\nThe food on the plate looks really healthy and hearty.\nA young man playing tennis on a tennis court.\nTwo people are sliding down the mountain slope.\nTwo women a man and a boy all riding horses down a river path.\nMan laying on ground with skateboards under hand and feet being nailed by another man\nthere is a baseball player that 
has hit the ball\nA strange looking shower curtain in an ordinary looking bathroom.\nA man standing next to a parked motorcycle.\nA small white dog is standing on a desk chair\nA black and white dog walking down a  sidewalk.\nThe people are playing the game in the living room.\nThe motorcycle racers speed down the curvy track.\nA woman laying in bed while clutching a blanket.\nA person riding on the back of a brown horse through a dirt field.\nA PUBLIC BATHROOM WITH CLEAN FLOORS AND WINDOW\nChristmas teddy bear next to a coffee cup of a candycanes\na man swinging a tennis racquet at a tennis ball\nA bot watches while a man cuts a blue and yellow cake.\nSigns on the corner of an east London street by apartment buildings\nA bus drives down the street in a town.\nA group of people sit holding glasses and smiling at a table with several bottles.\na blond woman with a spoon and a blender\na man riding a motor bike with a usa flag on the back\nPassengers board the transit bus from the station at the loading zone.\nA baseball player slides toward a base as another waits to catch a ball.\nA group of people sitting on a trail side with a dog looking onward.\nSeveral animals cross the road with a human behind them.\nThe cat is laying on the pink blanket by a window.\nThirteen children and one adult dressed in baseball attire holding sports equipment.\nA dog is lying down on the unmade bed\nan image of man riding his bike down the street\nA couch that has several blankets on it.\nA person slicing something with a dog watching\nAn adorable little gir sitting on a park bench.\nThree red motorcycles with riders in protective gear are on the street.\nTwo cows stare out while being in the meadow.\nA turkey sandwich smothered in cheese on a plate with vegetables.\nAn elephant walking into watering hole while a mother and child watch.\nA bed with a brilliantly colored bedspread and pillows.\nsome fruit and veggies sitting on a counter\nGrape tomatoes, apples, and an onion are on a 
table.\nA cat is wearing a small blue backpack.\nA woman is eating food as she sits in a crowd.\nThe baseball player reaches out to catch a ball.\nA hotdog on a colorful plate with ketchup, some ketchup spilt on the table.\nA view through a bathroom doorway without a doorway, showing turquoise tile and an unfinished wall section.\nA small boat washes up onto the beach.\nA fence is put up in a desert climate.\nVery large bicycle sitting in the middle of a freshly polished flooring.\nTwo plastic baskets filled with food sitting on top of a table.\nLooking down at skiers holding their skis on the ski slope\nA lady dressed warm on a bike in the street.\nChicken sandwich, french fries, herb tomato, pepper salad with sour cream and ketchup condiment.\nThree people standing outside a small airplane on wet pavement.\na man jumping over a black box with a skateboard\nPeople crossing the street in a busy, overcast city.\nA group of people play a game of tennis.\nA filthy bathroom with a grimy tub and toilet and grime covered floor and walls.\nA \"Greenwave\" bus stopped at a bus stop next to brick buildings.\nThree skiers jump to the snowy ground in front of a tree line.\na stuffed elephant with a brown stuffed teddy bear leaning on it\nA horse stands is front of people on a sidewalk.\na girl sitting on a bench looking at her cell phone\nTwo men sitting on the street in front of a building.\nMany horses are walking near the guard rail down the side of a street.\na number of kites flying in the sky above a field of people\nA messy desk with a computer that shows a young child on the screen.\na table top sitting inside of a kitchen\nTwo giraffes neck up closeup from behind at dusk.\nTwo dogs are sleeping together on the bed.\nA tan clock tower with a black and white clock.\nThe bagel sandwich has many ingredients inside of it.\nA desert sitting in a plate that has congratulations written in chocolate\nA spacious  bedroom with access to a balcony.\nA deep red and white airplane 
sitting in front of a hanger.\nRings radiate from a gray bird in the water.\nA machine with multiple clocks on it with wheels.\na model airplane sitting next to a bigger plane\nA man vigorously serves the ball during a tennis match\nA toilet sitting in a unique bathroom with painted and designed walls.\nA person riding skis on top of a snow covered slope.\na kid eating from a blue plate and a spoon\nA woman with a suitcase sitting outside at a park.\nThe aerial view shows a crowd with many umbrellas below.\nA man holding a ball in his hand in a room.\nBlack and white vintage picture of a man in a suit with glasses.\nAn Apple mouse sits on a desk next to a keyboard.\nA tennis player reaching with his tennis racket at the ball.\nThe breakfast setup includes pancakes with a cherry.\nA baseball player is at the plate about to bat.\nA black case on the ground with a small tire and jack.\nA animal with a very scared look on his face and a red thing on his head.\nA white toilet sitting in a bathroom surrounded by tiled walls.\nA cat on a table next to a vase of flowers.\nA young girl sitting on a bench holding a toothbrush\nGiraffes, zebras and ostriches in a large enclosure.\nA sheep stands alert with it's face to the camera while it's offspring, head hidden by the sheep's wool, drinks it's mother's milk.\nA dog jumps in the air to catch a white Frizbee on a grass field.\na person holding a  cell phone  near a corch\nSeveral male horse riders crossing a river to shore.\nA white horse is standing on grass in the country.\nFood stands with red umbrellas on a crowded street.\nvintage black and white photograph of two baseball players\nPeople out in the ocean on surfboards by a large cliff.\nThree mirrors mounted on a tiled wall with lights.\nA person standing on the snowboard on top of the snow.\nA red phone sitting on a table by a folder.\nA man is walking down a main street.\nThe street sign indicated the names of the two streets.\nA woman holding a tennis racket in her 
hand.\nA close shot of a grilled cheese sandwich on a plate.\na tennis player getting ready to swing a racket at a ball\na red and white bus a bicycle and some people\nA white toilet sitting under a bathroom window.\nA white dish plated with corn, carrots, tomatoes, onions, olives, herbs and oil.\na tan teddy bear a white sheep and two other bears\ntwo girls soccer teams are playing soccer and player from each side fight for the ball.\nA box filled with two slices of pizza and sewing equipment.\nA young girl standing on a grate with a racket.\nA cupcake, piece of cake, and tort with raspberries.\nStreet sign with plants growing around it on the side of the street.\nA girl standing next to a bed standing next to a bed.\nAn elephant stands in front of a body of water.\nDisplays of deliscious looking dessert in store window.\nTwo people are walking over tracks with stuffed animals near two other men and a lady standing by a model train.\nCloseup of two laptop computers sitting on a desk.\nA Singapore Airlines commercial aircraft landing on the runway next to the water.\nA young boy jumping in the air on a skateboard\nPeople that are making a pizza from start to finish.\nA bus parked in front of a building and beside a fence.\nA man on a snowboard in the snow.\nA Skiier on trail hillside posing for picture with hands out\nA man sticks his tongue out to have his picture taken.\na man in a suit in front of a white truck\nA couple of tennis players on a large, fenced-in outdoor court.\nSmall slice of pizza sitting on a table next to the bottle of beer.\na table top with some trey of food on it\nA major league baseball player in the batting box.\nfive bagels are sitting on a silver tray\nA small sandwich with lettuce and tomato on it.\nA yellow wooden bench swing hanging from chains.\nTwo plates with dessert crepes and a cup of coffee on a red tablecloth.\nSome carrots and bananas in a small bowl\nSeveral people enjoy a day at the sandy beach.\nA hand holding an apple with 
the tip of a knife piercing the fruit.\nsome people trees two blue umbrellas and chairs\nA beautiful young lady looking into an empty microwave oven with lust.\nA brown horse in a grassy field with trees behind.\nA teddy bear with a red bow holds two red, white and blue pom poms\nA clock mounted to a wall next to tall buildings.\na white horse at the top of a hill\nA toilet with the lid opened placed beside a shelf.\nA Women chef outside holding a pan with food in it.\nA train has graffiti on it while it sits on the track\nA man preparing to swing his bat as another holds a glove.\nA blue motorcycle with rusty tailpipe, parked beside a truck.\na train on a track near a platform\nA bowl of antipasta with sausage and beans in it\nAn airport scene where aeroplanes are landed on the ground.\nSkiers doing stunts over a hill of snow.\na piece of bead with some sliced cheese and bananas on it\na woman laying on a bed but peeking at someone\nA large horse studding next to a baby horse.\nTwo plates of breakfast foods on a restaurant table.\nA piece of wood has a fresh pizza on it.\nSeveral people riding on horses at the beach.\nA black and white picture of a man in a suit wearing a tie.\nA baby in a high-chair being handed his first birthday cake.\nan image of a living room setting with fireplace\nthis is a piece of broccoli on a table\nA large crowd watches as a pitcher throws a ball.\nA collection of vegetables inside a grocery a store.\nA desk contianing a computer monitor, telephone, modem, CD drive, and a cat above a keyboard drawer containing a keyboard and a mouse which is above a tangle of wires and next to a bed.\na table with a white plate and knife with food on it .\nA man riding a motorcycle in the middle of the street.\nA tennis player in blue returns a volley.\nA man with a baseball bat that is standing in the dirt.\nSeveral pieces of luggage and bags near moving trucks.\nTwo slice of cake and a fork rest on a plate\nA woman sitting on chair holding an 
umbrella.\na male wearing white is playing tennis on a court\nA peeson at a table is eating a small pizza\nA little girl is standing in front of a refrigerator.\nA faded red fire hydrant on a sidewalk near a building.\na person riding skis on a snowy slope\nthere is a woman standing outside talking on the phone\nMulti colored scissors with a multi colored ribbon.\nA sign in front of a railroad explaining how to board the train.\na snowboarder flies through the air with an onlooker taking a picture\nA large propeller plane sitting on top of an airport tarmac.\nA man holding his arm out, holding a game remote control.\nA parking lot full of open blue umbrellas.\nA pair of surfers approach the water's edge, where the waves spread thinly over the compacted sand.\nA kid in the grass swinging a baseball bat.\na bunch of boats are sitting in a harbor\nThe little boy is too close to the stove in the kitchen.\nThis is an image of the inside of a modern kitchen.\nA man looking at a large pepperoni pizza.\nA woman posing on a skateboard on a sidewalk.\nTwo plastic model airplanes lie on the ground.\nA bathroom with a white toilet and a white sink\nA family on the beach points into the water.\nA person is riding a surfboard on the water.\nTwo sheep in a vast field during the day.\na lady standing in front of potted plants.\nA guy on a surf board riding a wave.\nCompute on desk in next to green wall area of living space.\nA male taking a picture of himself wearing a cardboard Happy Father's Day tie\nBUNK BEDS WITH LADDER TO TOP BED WITH STRIPED SHEETS\nTwo cats are laying on the keyboard of a computer.\nA couple of sheep are on a grassy field.\nA crowd of people crossing a cross walk.\nA black cat laying by two pairs of slippers on carpet.\nAn oncoming railroad train traveling down the tracks.\nA woman is hugging an orange fire hydrant.\nA young surfer surfboarding in the ocean doing tricks\nA donkey painted with stripes has a snack while hitched to a decorated wagon in 
Mexico.\nA steer standing next to one that is laying down.\nThree baseball players are on the field during a game.\nA baseball player swinging while the catcher waits for the ball.\nA small baby lying in an open suitcase.\nJet-skis sitting on the sand in front of the water.\nTwo parked motorcycles in a lot near a large field.\nA beach with many umbrella's and chairs with people by them.\niphone playing game while donuts are in background\nA giraffe walking through a grassy area near some rocks.\na woman reaching up while jumping to hit a tennis ball\nA boy is holding a teddy bear figure.\nA woman is on a tennis court in mid serve.\na woman holding a tennis racket by the side of a road.\nA blue room with a brown double door and a closet full of clothes with a pink television on a stand.\nA giraffe standing on top of a lush green field. near trees.\nMotorcycles parked with pedestrians nearby at outdoor event.\nPeople in a field flying a kite with large clouds in the sky.\nA blender filled with liquid on a counter.\nTwo men are working on a train at a station.\nA man that is bent over in a boat.\nA person riding a ski lift over orange traffic cones.\nA man on skis going down a hill away from other skiers\nA train moves through a heavily forested area\na male is riding a horse and some cows a street and trees\nA woman is tossing an omelet in a frying pan.\nA calico cat is standing outside a shoe store looking in.\nA gothic building with a magnificent clock tower featuring gothic columns and arches.\nPictured from above are clothing and shoes scattered on a wood floor.\nTwo giraffes on a hill and one is walking towards the other.\nMany people are flying their kites on the beach area.\nA plate of food containing a sandwich with a tooth pick, lettuce, tomato fries and cole slaw.\nA man is standing by some parked motorcycles.\nThe line of people are riding horses through the plains.\nA close-up image of a black dog in a room.\nMany motorbikes are parked on the side of a 
city street.\nAn individual is hiking in the snow with some skiing utensils.\nA sign for handicapped parking with mountains in the background.\nA bedroom with a bed, desk, and tv with paper and pen on the table\nA large open kitchen has wooden cabinets and white appliances.\nA man and woman sitting closely on a bean bag type chair together, and the man is holding a banana in his hand.\nA dog herding the sheep by running towards them.\nA white sheep standing on top of a dirt road.\nA giraffe stands tall among grass and trees.\nAn equestrian lady riding on a brown horse.\na shelf holding onto some assorted paperback books\na plate of pizza on a table\na small boat parked on the ground on display\nA large clock anchored on top of a building\nA passenger jet flies over houses on a coastline.\nCattle walking on dirt path through green mountainous area.\na giraffe  standing beside a building and part of a tree\nA young person is jumping his skateboarder off of a lodge.\nThe cows are looking at the photographer taking the picture.\nA girl with a coat and hat on is pulling luggage.\na red and silver train is coming down a hill and snow\nA field and a fence sitting in front of a group of buses.\nA rider is dressed in red riding gear while sitting on a coordinating red motorcycle.\nA YOUNG LADY ON THE COURT PLAYING TENNIS.\nAn orange and bottle of orange liquid on a table.\na close up of a woman in pigtails a shirt and tie\nA tall white and red light house sitting on a green hill.\nA european city in nice a sunny bright day\nA man holding a cake knife and stretches it out toward a cake as he stands next to a woman in a darkened room.\nMany cattle are on the field while people ride them in the background.\nTwo people skiing on cross-country skis on the snow.\nA woman wearing plastic gloves handing out fruit slices from behind a table.\na counter top with a microwave inside of it\nA surfer on a surfboard flying over the crest of a wave.\nA man holding a baby, eating and 
sitting at a table with two pizza atop.\nA multi colored train comes around the bend on the tracks\nA dirty train sits on the railroad track.\nThe Big Ben clock tower towering over the city of London.\nA group of three men standing next to each other.\nA group of zebras that are standing in the dirt.\na popular sporting event being being witnessed by spectators\nA girl with big glasses is brushing her teeth\nA young man in a bathroom dancing while looking at his reflection in the mirror.\nA motorcycle has a red and white plastic container on the side.\nA sandwich is on a delicately-designed plate with other place settings.\nA person wearing skiis and jumping off a snow hill.\nA large panda bear laying down in a forest.\nA grey cat smelling a cut filled donut on a plate.\nA cat sits on a fence under an umbrella with ghost lights\nA motorcycle rider riding on the street near a grassy hill.\nA small train traveling on the railroad tracks\nA couple is cutting a wedding cake together.\nA group of people sit in a boat with a bike.\nThis meal has four pastries, grapes, strawberries, and sauce.\nA shirtless male tennis player awaiting the ball.\nTwo women wearing hats standing near a fence.\nA man rides a wave on a surfboard.\na girl that is kicking a soccer ball around\nA person is holding a cup with food and a plastic sword in it.\nA tennis player returning a tennis ball hit to him.\na little kid that is sitting on a toilet in a bathroom\na city neighborhood with a stop sign on the corner\nA woman standing with a bag in a mirror.\nAn umbrella strapped to a bicycle a rain shower.\nA locomotive on the tracks near buildings and wires.\nthis bed is very large and is under a window\nColorful red bar stools are lined up in a kitchen.\nsome stools a white refrigerator and wood table and chairs\na one way sign in front of a tree pointing to the right\nA person on the street with a skateboard.\nA female tennis player in action on a court.\nA purse has it's contents laid out on a 
table.\na man and a woman sitting on a river with an umbrella\nVarious fruits and vegetables sitting on a table.\nAn adult riding a bike next to a little boy.\nTwo people and a dog sit on a sidewalk and watch a commercial bus pass them.\nA person is holding a donut with two fingers.\nA cat sitting on top of a black refrigerator..\nA large tall building with a small bird flying over the top.\nA knife point on the surface of an apple.\nThe little boy is standing in front of the new refrigerator.\nA family of zebras and a giraffe in a grass field.\na large group of cattle have been fed fresh hay.\nA train traveling down tracks next to a  mountain.\nA man with a white beard sits wearing a top hat and a suit.\nA vase and two candles sitting on top of a table.\nA red truck with a flame paint job.\nA brass clock stands in a train station.\na river with frozen sections floating in it\nA child holding a dragon kite standing in the grass.\nA bird walking on a beach with something it it's mouth\nA kite on on the ground on a grassy field\nA train is on the tracks that is red and yellow.\nA yellow and green bus is going next to grass.\nA baseball player sliding in to a base while the baseman tries to tag him out\nA train moving along a track during the day.\nA snowboarder sliding down a hill in the snow\na close up of some white puffy balls\nA mother elephant and her baby walking through the brush.\nA group of teens are playing frisbee in a field with a view.\nThe man is snowboarding down the snow covered hill.\nA table with a laptop, a phone and a drinking glass.\nA man standing on the sidewalk with his skate board.\nA man in a black suit carries a checkered umbrella as he walks on a crosswalk.\nA large brown dog holds a Frisbee in his mouth.\nA steak covered with seasonings of mushrooms and broccoli.\nNight time view of pole with too many signs on it resulting in joke street name.\nAirplane flying low over the treeline and field beyond.\nDifferent types of foods and 
vegetables side by side.\nA train prepares to depart from a station.\nBirthday cake decorated with a frosting in the shape of a truck.\nA sandy beach next to the ocean covered in kites.\nthere are many people that are riding a elephant\nPerson doing a trick on a skateboard on side of building as others walk by.\nA person sits next to a laptop on the wooden table.\nA large hill in a green pasture of grazing cattle.\nA MAN DOING A BICYCLE TRICK AMONG OTHER BICYCLISTS\nA portrait of a group of tennis players and coaches.\nA train on a track going under a trellis.\nA close-up of a stop sign in a snowy landscape.\nA bus is leading the pack heading towards the hotel.\nA pastry with fruit, mug and fork sitting on a counter.\nA photo taken from a field looking at a train going by.\nA yellow commuter train sitting at a station.\nThere is a plate of mushroom pizza on a table.\na brown bear walking on the side of the road\na woman is walking and talking on her phone\na bathroom with a toilet, a sink and a mirror\nTwo men running for a frisbe on a field.\nA smiling woman in a formal dress holds an umbrella.\nA brown bear walking around in the river.\nAn espresso machine brewing fresh coffee and a toaster.\nA woman riding a bike with a woman on back.\na little dog running up to two bulls next to some bushes\nA baseball player swinging a baseball bat\nA locomotive engine blowing steam as it comes down a track.\nA mime soaked in the blood of the innocent while standing in a park.\nA chefs knife and a cutting board with uncut mushrooms and half of an onion.\nA woman smiles as she stands in skis on a snowy hill.\nA couple of people that are in the snow.\nA street sign reads \"Jack Kerouac\" on a street corner.\nA row of bikes sitting next to each other as people ride bikes past them.\nA fire hydrant located in a clearing in  the woods.\nA prepared pizza is sitting on a table.\nA woman in a cowboy hat and Texas flag on a horse.\nA skateboarder is mid air doing a trick\nThe gourmet 
pizza includes several very special ingredients.\nA person in a wet suit riding a wave\nA green motorcycle sits parked by a gas station.\na pair of scissors with long shears sitting on a pattern\nA pond of water with three giraffe walking in the dirt.\nA person sitting on the sidewalk holding an umbrella.\nThe giraffe is standing alone in the wilderness.\nA woman holding a banana in front of her mouth.\nA satellite dish is near the produce hanging above a door.\nA fake zebra is shown in the lobby of a hotel.\nGuacamole sits on a white plate with a garnish of shredded carrots.\nSeveral people walk out of a bus onto the street.\nA park bench has four people sitting on it under a large tree.\nA group of people standing on top of a sky slope.\nA red garbage truck and a man behind it.\na close up of a laptop and a mouse on a small table\nA white toilet sitting next to a white sink.\nAn orange cat trying to look underneath a closed door\nCat sleeping in a high chair in the kitchen.\nA male tennis player in action on the court.\nA group of five sheep standing in a row\nA white refrigerator freezer sitting in a park.\nA bird sticking it's beak in the water.\nA motorcycle has a paint design in green.\nA couple of brown bears sitting and standing next to a brick wall.\na person riding a surf board on a rivier\ntwo people in a kitchen area preparing food\na couple of white couches in a room\nAll of those bikes look exactly the same.\nA giraffe standing in front of trees and an open field.\na big kitchen that has a lot of open space\nMany pictures and toys are posted in the office\nA dog is running around some cows in a field.\nA broken pair of sissors with a half of an orange handle.\nA person holding a smart device in their hand.\nA jet fighter sitting on top of a field of green grass.\nThe young giraffe are eating from a branch.\nA woman looking at a group of giraffes.\nTwo planes parked next to a runway on the grass.\na white dining table and two chairs by a window and a 
cat in the corner\nA COUNTER FULL OF DESSERT INGREDIENTS AND BEER.\nTwo brown horses in a pasture eating grass.\nA baseball player is on home plate with his bat.\nThree zebras are standing together in the dirt\na man doing a rail slide on a skateboard\nA young Giraffe enjoying the sun on the grass.\nA street lamp is on a street with a sign and flowers.\na desk that has a computer and a keyboard on it\nThe lady is holding a baby eating dessert.\nA black bear in the background on a grassy slope.\nPeople are standing and sitting near the street.\nKitchen with silver appliances and brown cabinets.\nA side mirror of a vehicle showing a street sign.\nA large bus on a open city street.\nA teddy bear is being tied on a pose with pink ribbon.\nAn orange and white cat is sitting in a car.\nA fenced in area shows two leafy and low-hanging tree branches, casting shadows, and making shade for two horses that are grazing at some patchy grass.\nA plate with some meat, bread and salad on it\na person trying to get something out of a plastic case\nRow of black suitcases on a wooden floor.\nA man holds a candle in one hand and an umbrella in another.\nA red bike is parked between others as people walk past\nSomeone in the air on a snow board\nLots of construction materials at a childrens park\nThree giraffe standing next to a man in front of a blue barrel.\ntwo men and two women receiving some kind of reward\nAn old picture of three student in the library with there teacher.\nBroccoli, green beans and various other foods in a tray.\nAn older gentlemen is wearing a black suit with a white shirt and tie and a red flower in his lapel.\nA street corner view from the bottom of a clock tower.\nThe lunch was in a box and had carrots, berries, grapes and a sandwich.\nA picture of a meal of artisan pizza.\nTwo riders on the backs of horses riding along the beach.\nA tour bus making a right turn as people wait.\nA bedroom with a window, armoire, chair and table with plant.\nGroup of four 
people standing and playing a video game.\nUrban area intersection with traffic signals displayed at sunset.\nA cat sitting on a bench in front of a building.\na person holding a surf board in a body of water\nA snowboarder is at the edge of an outdoor jump.\nA person riding a bike on the road near some stores\nA man stands with a beer in his hand.\nAn assortment of veggies sitting on top of a wooden table.\nA surfer's surfboard is going straight up on a turbulent wave.\nPenguin balloon, an orange, coins and beverage at computer.\nA bride and groom walking from the church with umbrellas.\nPedestrians cross at a crosswalk in a crowded city.\nA row of parked buses sitting in front of a buiding.\na close up of two slices of pizza on a plate\nA finger that is pointing at bread on a plate.\nPlastic bento box lunch example with fresh food\na lady on a phone sitting on a couch\nA large continental jet sitting on a tarmac at an airport.\nA man propped up against a bike looking at a cell phone.\nA collection of sailboats docked in a harbor.\nA man standing under a street sign looking at paper.\na man is sitting in front of a small cake\nA man pouring a drink into a glass while a woman watches across the counter.\nA very cute looking girl on a cell phone.\nWoman standing on the porch holding a tennis racket.\na close up of a dog near a door way\na couple of signs are hanging on a wall\nThree people are handing bunches of bananas to a fourth person.\nA young man roasting a chicken in an oven.\nA girl raring back at a soccer ball on a field.\nA sandwich cut in half sitting on top of a wrapper.\nThere is a little girl playing with a ball.\na baseball player that is standing at home plate\nFour dogs playing with a Frisbee on a lawn.\nA Soutwest Airlines jet airplaine taxiing along a runway.\nA white bed that is in its room.\na giraffe walks through a bunch of bus\nA little girl standing on skis in a snowy area.\nA small front is lying down on the leaf.\nA beautiful young woman 
standing on a tennis court.\na bathroom that has a sink and some lights\nA woman in black jacket sitting in snow with snowboard.\nA person standing looking at a  large statue with clocks built into it.\nA stop sign with people walking down the sidewalk.\nA young woman riding a skateboard at a skate park.\nA bus is stopping to pick up people in the snow.\na group of young people watching a young boy skateboard down a rail over some steps\nA group of cow standing in a patch of dirt in a pasture.\nA woman that is sitting near a coin meter.\na toilet and a bidet sit in a bathroom next to a garbage can\nThe herd of sheep is walking near cars on a street.\nA cellphone sitting on a table with a cup.\na white bus is driving on the dirt\nA man riding a skateboard down a street.\ntwo people standing in the snow mountain with their skis\nA couple hugging each other to pose for the camera\nA skateboard that is sitting on a beach.\na women that has a large pizza on a table\nThe horse looks at the camera while the people talk amongst themselves.\na man doing a trick on a skateboard going down a hill\nA man holding a box of food while wearing glasses.\nTwo guitarist playing while people sing in the background\nA small car is parked in front of a scooter\nA lady flying a kite with a black dog nearby.\na lady o a urban street holding a see through umbrella with two men standing behind her.\na half of a pepperoni pizza on paper\nA young boy that is holding a baseball bat.\ntwo people playing basketball at an apartment complex\nA street scene looking down at cars and motorcycles parked.\na man standing while attempting a trick with a white frisbee\nThe zebra is standing alone grazing in the grassy field.\nA group of women are walking with cups.\nTwo odd-looking birds wander around in a field.\nThe blue bus has arrived and parked on the side.\nA small brown ukulele sits on a small wooden table next to a vase.\nA man against a concrete wall talking on a mobile phone.\nAn airplane 
hooked up to the umbilical walkway at an airport.\nA guy letting a bird eat from the palm of his hand.\nThree people stand around a small aircraft on a wet runway.\nThree zebras graze in a field with grass and trees.\na large hotdog with lots of mustard and a Hawaiian punch soda\na group of boats parked next to a dock in the water\nA young boy swinging a baseball bat during a game.\nA beautiful young woman in a bikini feeding a baby food.\nA airplane that is in the sky near clouds.\nA blue and red tour bus standing by a building with a tile roof.\na sign with soem names on the top of it\na disk with a computer sitting by two windows with a view\nelephants at the zoo standing in front of a waterfall\nA spoon next to a plate with fish, rice, beans and broccoli.\na line of people that has skies on\nA couple skiers on a snowy mountain side\nThe handle bars in the restroom are sturdy.\nA split image of two different women holding a object resembling an arrow\nA man standing next to a little girl on top of a field.\nA group of cats looking out of a window.\nA boy swinging a tennis racquet on a court with other kids.\na red white and black sign of a man working\nA wooden paneled door opens to a spacious bathroom.\nA white-bearded man stands holding a puppy and a stuffed animal.\nA pocket sandwich filed with meat, cheese and a pickle.\nA cat sitting next to a bowl filled with water and roses.\nA white plate topped with a pizza and a knife.\na female in a red dress is on a bed with a laptop\nA white plate topped with meat veggies and rice with sauce.\nA piece of bread sitting on top of a plate.\nA sandy beach covered in lawn chairs with blue umbrella over them.\ntwo people in the air standing on snow boards in the snow\nA man is at a table with three plates of food.\na person standing next to a truck with its hood open in a parking lot\nA group of seagulls are flying over a wooden dock that is sitting in a lake during the early part of the evening.\na person cutting a pizza 
with a knife\nA brown puppy passed out after drinking a bottle of coke.\nA large adorable cat resting on a big soft pillow.\nA sheep is laying on its side while another sits against a fence.\nthere is a pink fridge and a pink stove in the grass\nA train on the tracks next to a wooded area.\nA room filled with furniture and boxes and clutter.\nA man does a handstand on his skateboard.\nA man sitting on a kitchen floor has tools spread out beside him and is holding a drill.\nA couple of zebra standing in the tall grass.\nA stop sign covered in stickers next to tall buildings.\nA man and a woman on a touch looking at a smart phone.\nA boat drives in a large body of water.\nThis plate has meat, broccoli, and a potato.\na number of sheep in a field with dogs\nA man preparing food in a kitchen on top of a stove.\nSeveral children are playing in a fire hydrant.\nA small kitten lies next to a laptop.\nTwo sheep grazing in a field with buildings in the background.\nA very tasty looking dish with some assorted veggies.\nA top of a building that has a clock and is flying a flag.\nA digital clock on a bus can be seen above people's heads.\nthis is a man standing in a field\nA man on his stomach in a white bed.\nTwo cows are sitting on an open field during the daytime.\ntwo men stand in the sand in a baseball diamond, while one hands the other a bat\nSmall boy holding up broken umbrella Ina parking lot.\nThree blue pieces of luggage stacked on top of each other.\nThere is a toilet with the seat up in the bathroom.\nA woman riding a gray horse in the middle of a street.\nA person stands between two tents set up inside of a cabin.\nCow tethered with chain eating hay in outdoor field.\nA young boy standing on a grass covered field under a flying kite.\nA man standing on the beach next to a surfboard.\na close up of a pot of flowers with a box of flowers\nA person engaging in a water sport with skis on.\nA woman with skis is standing on the snow.\na boat is docked in some water 
next to a house and a bridge\nA skier skis down a slope, with blue and red course markers in the background.\nA vintage photograph of a man riding a motorcycle.\nA public bathroom sink and hand drying area.\nA train engine is pulling cars down a stretch of track.\nA oddly colored zebra laying down on the dirt\nA plate with food on it next a a spoon and some more plates.\nTwo children and a woman on a play-mat in a living room.\nA wedding cake with a bride and groom on top.\nA group of elephants marches down the city street in front of a large building.\nA woman smiling and talking on a cell phone.\nA group of zebra standing on top of a dirt field.\na room that has a bunch of beds in it\nSheep are gathered around a lone tree on the hill\nVarious lights on the front of a white vehicle.\na car with a mirror view of a dog walking behind it\nA man standing in the doorway of an umbrella and parasol shop.\nA man with a glove that is in the dirt.\nA train driving down the tracks near trees and a building.\nTwo baskets on a table underneath hanging items.\nA pink plate with white polka dots and a slice of chocolate cake and white frosting.\nA giraffe walking across a dry grass field.\nA train on the tracks at a train station.\nRED, WHITE, BLUE AND YELLOW TRAIN COMING DOWN THE TRACKS\nAmbulance and fallen over motorcycle from viewpoint of injured.\nA bench and trash can are seen in this picture.\nA person skateboarding on a street barefoot with one foot up\nA living room in a well decorated house.\nA horse eating grass next to an old fence and building.\nA couple of large white airplanes and trucks.\nA empty bench in front of a green bush up against a building.\nTwo horses pulling an older styled coach passing a home.\nA close up of stuffed animal bear face.\nA man tying a windsor knot in his tie.\nA long table covered and used as a desk\nThe men are in the bathroom using it together.\nA food entree is served on a plate.\nTwo smiling men are cutting into a cake.\nAn old 
advertisement for Maxwell coffee with a family sitting around a table.\nA dog preparing to catch a frisbee in its mouth.\nA couple of people laying on top of surf boards near the shoreline.\nCommuter bus at roadway intersection in urban area at dusk.\na man is riding down a ramp on a skateboard\nAn man and a young girl on a motorcycle.\na magazine cover showing a man getting ready to kick a soccer ball\nA couple of boxes filled with lots of donuts.\na black cat next to a box of fruit and vegetables looking up at the camera\nA bird is standing on the shore next to the water.\nThe dog is laying on top of the couch.\nA group of people cutting a cake with a sword.\nMan on the back of a surfboard riding on a wave.\nan image of two people on the beach\nThe large bathroom mirror is clean and spotless.\nYoung boy throwing a ball up and catching it\nSurfer and black outfit coming down the front of a wave.\nTwo striped zebras are on knee high grazing grass.\nA little girl putting a blue umbrella over a yellow fire hydrant.\nA person in black is skiing down a snowy hill with trees.\nA man that is being pulled by a boat on a board.\nA group of children in a classroom with windows around.\nTHERE ARE MEN SIGING WITH ALL OF THEM WEARIGN YELLOW TIES\nA picture of a empty street very late at night.\na room with a big chair with some boxes behind it\nThe teddy bear was posed at the table as if he was drinking.\nThe person is flying a kite at the beach on an over cast day.\nA woman displays a homemade pizza dotted with mozzarella and herbs.\nA red traffic sign next to a uphill alley.\nA woman standing next to a building holding a phone.\nFour men jumping into the air to catch a frisbee\nA gray and white bellied bird stands on a branch\nA man walking down a sidewalk next to a busy city street.\na group of signs that are next to some trees\nA living room with a corner chair and a scatter rug.\nTwo men in pajamas are holding Nintendo Wii controllers.\nA large red fridge is sitting on 
the red carpet.\nA small young child is holding an umbrella in the sun.\nDifferent styled sinks next to each other under mirrors\nA woman cutting a cake at a bachelorette party.\nA man with a surfboard in the ocean.\nA couple of kids laying on top of booths.\nThis is a road sign for La Brea Ave\nA baby giraffe standing with other young giraffes in captivity.\nA laptop computer sitting on top of a bed.\nA female skier competing in a skiing competition.\nA large pizza prepared and ready to go in the oven\nA man poses and smiles while holding a doughnut.\nA plate of sausages, bread and butter, and potato salad.\na person holding a surf board on a beach near the water\nA train going down the train track.\nA man sitting at the kitchen counter looking at a picture.\nPolice car is parked in front of a hydrant\nA blanket with various items that include a mouse, computer hard drive and a keyboard.\nNine men pose together near a coach and a dog.\nWoman deliver serve in a professional tennis match\nA cake is frosted with a surfing teddy graham on the side.\nA young man is surfing behind the giant wave.\nSeveral people are standing around a decorated elephant.\nA large cat sits on the sofa arm next to a girl using a computer\nA red car is parked next to a black truck.\nThe young girl smiles holding a donut with sprinkles.\nA man holding out his white eight bit tie.\nA cow makes its way down the street next to city traffic.\nThe meal consists of beef, brocolli, and other vegetables.\nA woman in a red coat can be seen in the background talking on a phone.\nThere are Indian people riding in a cow drawn carriage\nA woman swinging a tennis racquet at a ball.\nA person walking a dog on a sidewalk lined with vehicles.\na motorcycle with two people driving by a car\nSome white cattle roaming down the street of a town.\na black cat with it's head stuck in a boot\nA white and blue bus driving down a road next to trees.\nsome fog traffic lights street lights and buildings\nA bus on the 
side of the road in traffic.\nA giraffe sits in the grass next to horned animals.\nPeople being social outside a large colorful amusement tent.\nA skateboarder jumps very high at a skate park.\nA child watches an animal on a rock platform in a zoo.\nA skateboarder skating off the  top of an outdoor stairway.\nIt is surprizing that these flying kites don't get tangled together.\nthere is a lot of old stoves on the ground\ntwo big red double decker buses on the road\nA giraffe looks like a statue in the dirt.\nA woman and two teenagers are holding on to a stop sign.\nTwo young men in dress clothes and ties standing in front of an outside door.\nThree empty wood benches sitting in a woody area.\nA kitchen area with a stove, microwave and counter space.\nsome buildings and some boats are docked in a harbor\nA guy is going up the ramp with a skateboard.\nTwo men are playing Frisbee in the park.\nTwo people, most likely a couple, are on the bench.\na black bear walks through the woods in the distance\nA bathroom scene with a sink, toilet and shower.\nA bathroom with a toilet, sink, mirror and shower stall.\nA person with a surfboard walks along a beach.\na couple of people are skiing down a snowy hill\nPeople standing at a table putting toppings on their hotdogs.\nThe blue white bus sign next to the trees on the campus.\nThe silver refrigerator is across the kitchen from a black stove.\nA stop sign that is right by a road.\nA man swinging a tennis racket at a ball on a tennis court.\nA street is displayed at night with time lapse photography.\nAn old school bus painted white with curtained windows parked under a freeway\nA small elephant is standing next to the other elephants\nYou male poses against stone wall with leg up.\nThe sun sets over the trees beyond some docks.\nA full view of a market place full of sheep and items.\nA grey and white cat laying behind a laptop.\nThe contents of a refrigerator filled to over flowing\na black bear pokes its head out of a field of 
tall grass\nA single piece of pizza sitting on a paper plate.\nA man in a suit carefully adjusts his tie.\na bunch of boats all lined up on a dock.\nA man sitting on a couch holding a Nintendo Wii controller.\nA sink that is in front of a mirror.\nThree young girls holding ribbons in the snow.\nSome kids are talking together outside of a house\nA black and white cat laying down resting its head on a cushion.\nTwo zebra standing next to each other next to a tree.\nA large clock suspended over a street sign.\nAn adult black horse and a young brown horse interacting.\nA train sits on the tracks at an empty train yard\nTwo men playing a video game as other look on.\nToilet next to a sink with it's counter cluttered with bottles of lotion and stuff.\na sandwich sits next to some fries\nA horse pulling a carriage wearing a straw hat\nA bunch of bananas on a small chair.\nA man riding a skateboard being towed by a woman on a bike.\na suitcase with writing on it sitting next to a guitar\nA black cat laying down on a laptop.\nA young black cat resting on a colorful surface.\nA plate with two hot dogs covered in slaw, and french fries\nAt sunset, a surfboard upside down on the wet sand.\nA man holds a toothbrush in his mouth.\nA vintage baseball team of ten pose for a photo.\nA group of people standing in the dirt near large tents.\nTwo buckets with a bowl sitting between them.\nA recently remodeled kitchen with marble and wooden furnishings.\nA bus is passing through a city intersection.\nA person making a strange face at a very large pizza pie.\nA table with a laptop, phone and other devices sitting on it.\nTwo people working at a market with oranges and apples.\nA group of people are playing soccer on a soccer field.\nA small dog chewing on a teddy bear\nA man holding a Nintendo Wii game controller.\nModern espresso machine on counter in residential kitchen.\nA brown horse is on the grass with two people.\nthere is a surfer that is walking towards the water\nTwo zebras 
standing near a pile of sticks and a wooden fence.\nThree women and one man wear various skis on their feet while wearing swimming clothes.\nA thirtieth birthday cake with candles on it.\nA zombie apocalypse is happening on the street.\nA dog is on a beach with people in the background.\na person bends down to put air into a car tire\nThe people are having a discussion about cell phones on the table.\nAntique black and white photograph of a couple on their wedding day\nA woman using a laptop computer on top of a wooden table.\nMan folding banner while holding stick in unfinished carpet\nA herd of sheep crossing the road under a cloudy sky.\nsome old wooden doors decorated with scissors for handles\nA toilet with a wooden seat is open.\nThere is a large cooking pot and some staples on sitting on the shelf.\nA person is holding up a large colorful umbrella\nA man on his bike is between the busy traffic, including two buses.\nA girl is holding the strings to a kite.\nA snowboarder in the middle of a jump, with a mountain in the background.\nA yellow fire hydrant sitting in a plant with a green top.\nA building with a clock on the front and side of it.\nA truck that is in front of a building.\nTo buses side by side with one being a double Decker bus.\na woman some pizzas drinks and bottles and bowls\nA group of zebras are with a group of giraffes.\nA baseball player mid swing during a game.\nA bowl fo soup sitting on top of a wooden table.\nA person reaches for the cabinet as the cat sits in the sink.\nA large clock is on the colored wall of this building.\nA woman underneath a umbrella on a street.\nSome very pretty giraffes standing in some trees.\nSmall boats sit unused in water by a dock.\nA neck tie that is knitted or crocheted from yarn.\nA cat leans halfway off of a bed.\nA group of people standing outside of a building\nTwo side by side zebras are near the tall grass.\nThe people are sitting down together having a meeting.\nA baseball pitcher pitches a ball 
while standing on a baseball field.\nA dog sits by and watches his owner.\nA bench sitting in front of a brick wall on a patio.\nA thomas the tank train traveling down tracks.\nA mirror hanging on the wall reflecting a toothbrush.\nA passenger train that is pulling into a station.\na room with a brown sofa,computer on a table next to a window and a red book shelf\nA jet that is flying in the sky.\nA white toiler in a very small bathroom.\nA plate with a variety of Indian food on it.\nA white dog sitting on a ledge of a window.\nA man and a woman sit on a bench overlooking the water.\na male is on his stomach riding a wave on a surfboard\nA woman standing on top of a lush green field.\nOpen packed suitcase with too many extra clothes to fit.\nA woman dressed in military uniform speaks to a child.\nSheep are grazing in the fenced in area.\nA man sitting at a table eating pizza slices\nA cat lies on a laptop and paws the keyboard\nA cat is sitting on the floor staring at the TV.\nA tennis racket being held by a person and balancing a tennis ball at the top of the racket.\nA white train colliding with a black car.\nTHERE IS A WOMAN WALKING WITH AN UMBREALLA\nA man pushing a luggage bard through the middle of an airport.\na girl is getting close to a giraffe\nA green bus near a curb in front of a brick building.\nA baseball player on the backswing of hitting a pitch\nA bedroom with wooden floors in an apartment.\nA brown stuffed teddy bear wearing a red bow tie.\nA man throwing up in a toilet, with his head in it..\nA man doing a jump over a wave on a surfboard\nA bowl of apples and tangerines on a table.\nA man sitting on a big white horse.\nA cat enjoying the warmth of a laptop.\nA person making food inside of a factory on a machine.\nA child wearing skis stands on snow and smiles at the camera.\na pan that has a big pizza on it\nA truck in the middle of the street.\nA man showing a women an image on a projector.\nA freshly made pizza sits on a cutting board and pizza 
wheel.\nA small child heading down the mountain on a snowboard\nA mid sized transport plane sitting on a tarmac at an airport.\nA man is standing in the street near a frisbee.\nThe boys are standing beside a group of motorcycles.\nA person holding a dog's leash and looking at books.\na truck on a city street in front of another vehicle\nA showroom in a high end furnitureinterior design store.\nA man in a sports jacket is sitting in front of a microphone.\nAirplane being loaded at a terminal on a cloudy day.\nSpectators watching a professional baseball game's action closely\nA man standing next to a woman with an open umbrella.\nA man in a baseball uniform hitting a ball.\nAn apple on the ground, and an orange on the ground in a picture beside it.\nMan riding a bike on a wet street in an urban setting\nThe skier in the red coat is doing a flip in the air.\nThe furry cat is looking at it's own reflection in the mirror.\nA slice of pizza sitting on top of a white paper plate.\nA woman walking down a street holding an umbrella.\nBridge and groom walking down a path surrounded by a crowd.\nA man smiles as he plays a guitar.\nA batter and catcher during a baseball game.\nTeddy bear in sweater sitting on shelf near plant.\nTwo horses on sand face each other while one urinates.\na clock that is on the outside of a building\nA person riding the waves on surf board.\nOld fashioned furniture arranged around a parlor on an oriental rug.\nA wooden table that has several types of pastries sitting on it.\nA white and black cat standing partially in an open refrigerator.\nA man with glasses is wearing three ties while holding a camera.\nA black cat resting in an flower pot\nA bunch of fruits and vegetables for sale on display\nA bulky laptop computer on a desk near a lamp.\nThe people are waiting for the train to get there.\na brown and black ox and a white and black one and grass\nAn old bathroom with a sink and toilet.\na close up of a clock on a pole near a building\nA blurry 
dog holds a frisbee in it's mouth.\nThe two elephants are very close to each other.\nA couple of people are riding horses on a beach.\nA few items laid out on a towel on a table.\nA man looks at a hot dog he is eating.\nA herd of zebra grazing on a grass covered hillside.\nA beach with people surf boarding in the waves.\npeople walking pulling their bags and the security looking at them\nA young child enjoying a serving of cake and ice cream.\nA living room with a computer desk in one corner, a coffee table and television.\nA teddy bear with no face made from denim.\nBlack and white photo of woman on chair holding strap of leopard or cheetah skinned hand bag on ground.\nthere are many people snowboarding down a hill\nChef at counter with baked goods, baking pans and containers of toppings.\nOranges and lemons sitting together on a white plate.\nA group of people sit on a boat on the water.\nA woman with short hair looks at a cell phone screen.\na yellow sign of a person carrying a surf board\nA close up shot of horse, with it's baby in the back.\nA white plate topped with eggs, sausage and a cut in half tomato.\nA dog running in a field with people around.\nA room filled of shelves topped with lots of items.\nthree baseball players on a dirt baseball field\na wooden table with the tail of a cat and a plate of cookies\nA giraffe standing next to a tall wooden pole.\nA woman stands beside a pony wearing a blanket\nA beautifully appointed bathroom with classic color and amenities\nA green and white bus driving past a building.\na fryer that has a bunch of doughnuts in it\nA man uses his laptop on a kitchen counter.\nA train that is riding on the tracks near the street.\nA child holds at bat at a baseball game while people watch in the background.\nA close up photo of a brown bear.\na brown bear standing in the shade in the wood\nThe people are trying to climb the mountain.\nA man breaking slices of pizza on a pan\nA yellow school bus reflected in a side mirror.\nFour 
red birds perched on a branch in front of the clock tower.\nMan in black blazer pouring wine in glasses.\nA young person holding a frisbee while standing on a field.\nA bearded man in dark clothing sleeping on a sofa.\nA black cow is looking over a grass covered chain link fence.\na female playing tennis on a clay court.\nFour boys dressed up one talking while the other's are listening.\nA group of young children sitting on top of a bean bag chair.\na man surfing on his surf board  doing a trick\na bathroom with a sink, mirror on a tiled floor with a door open\nA man is holding a bunch of green bananas in his yard.\n3 dogs sitting in front of a fruit and veggie stand.\nA train sits in a train yard with an animal.\nThe young child is learning how to ski.\nA MAN IS ON HIS SKATE BOARD IN THE PARK\nSmall bathroom with toilet, bath tub and sink.\nTwo giraffes are standing by a tree and eating.\nA little girl riding a horse next to another girl.\na guys tie all up closes its black with strips\nthere is a cat that is sitting on the kitchen counter\nA teen-aged boy standing near a jail replica.\nA boy holding spoons over a pan filled with food.\nTwo buses next to each other in front of a fence.\nA businessman showing off a unique red tie.\nA cat sleeps in the sunlight beside a computer.\nA woman who is holding her little dog.\nA woman is crouched next to a suitcase on a city sidewalk, she is surrounded by people standing over her.\nA man standing with an umbrella in one hand and a flashlight in the other\nA red European passenger train sitting on the rails.\na red truck parked on a bridge with people in the back\nA small plane sitting on top of an airport tarmac.\nA man folding his towel on the beach while his dog stands in the sand.\nBen clock made as a model with bystanders walking by.\nA young man tossing a frisbee in a  forest.\nA man that is holding a frisbee in his hand.\nA fire hydrant is surrounded by and covered with snow.\nTen people and their dog pose for a 
picture while skiing.\nA blue clock with clear leaves coming out of it.\nThere is a tower with a clock at the top.\nA boy laying on a bed with a black kitten.\na little toy fire engine sitting on the ground outdoors\nTwo men in military uniforms holding a large key in front of a house.\na group of people excited to eat pizza\nA guy on a skate board near some graffiti.\nA cutting board with a long pizza and knife on it.\nA picture of a fire hydrant on the side of the road.\nA child's highchair has a little cat in it.\nA cat is laying inside a briefcase in a room.\nA group of people in white lab coats leading a group of cows.\na white box with different kinds of donuts\nA bunch of stuffed toys inside of a homemade castle\nTwo white toilets, white towels, and a shower.\nthere are two zebras standing next to each other\nA computer mouse sitting on top of a table.\nA large bird sitting on top of a speed limit sign.\nMany people walk down the street with umbrellas in hand.\nYoung boy taking swing with bat outdoors in play field.\nA man is on his roof with a large umbrella.\nA skateboarder doing a stunt on the edge of a ramp.\nA cat sitting on top of a shelf by a computer.\nMultiple vehicles parked curbside next to parking meters.\nA little boy that is standing on a skateboard.\nTwo people are looking at a truck while a dog is being walked.\nThe person ski's downhill on the mountain of snow.\nLarge number of snow skiers at the bottom of a slope.\na herd of zebra standing next to each other.\nTwo young men retrieve plastic flying discs in the park.\nA large sandwich on some paper by a knife.\nA PICTURE OF A BATHROOM WITH A PLAID SHOWER CURTAIN.\nA green pan that is on a stove.\nA man brushing his teeth in front of a mirror.\na large pizza is sitting on a pan\nA zebra stands near a giraffe in the wilderness.\nA man flying through the air while riding a snowboard.\nBaked pizza displayed on serving dish with beverages on small table.\nThere is a family out on the ski 
slopes.\nA sign that reads public market center is shown.\nYoung man looking into the inside of a refrigerator through bottles.\na small girl in a white shirt and another person\na dog is under a man with a laptop\nlady wearing work out clothes and glasses with a cat in her lap\nA clock on a stone tower is against the blue sky.\nLarge striped zebra walking down a patch of grass.\nA man standing on top of a beach under a cloudy sky.\nA couple of elephants standing next to each other.\nA messy baby eats the broccoli off of the table.\nA man plowing the field with two horses on the country side\na group of females standing in a grassy field playing frisby\nan intersection with different poles filled with street lights and a camera\nPanda bear climbing tree with paw over limb.\nA man with glasses playing with a Nintendo Wii.\nA giraffe sitting on a rocky dirt and grass covered ground.\nA man on a phone on a ddr pad\nAn orderly bathroom is seen in this picture.\nA man standing with a dog in a field of grass.\nThe person with the bag is walking down the street.\nThe elephant family is walking down the road.\nA skier in all white standing in the snow.\nThe grinch riding a motorcycle with a small dog with antlers.\nthere is a withe toilet and the tub has a blue curtain\nA elephant fenced in a large land area .\nIdentical street signs pointing in the opposite directions of each other.\nA man and a young girl on a motorcycle.\na man with a white beard and hat on a cellphone\nA person with their feet propped up by a flower vase and couch.\nA living room arrangement looking into a kitchen and dining room.\nTwo surfers-are in the Ocean one stands and look's at his board\nA man flying through the air while riding a snow board.\nA small kitchen with a stove and refrigerator.\nA man swinging a tennis racquet at a tennis ball.\nA giraffe looking alert at the camera in a field.\nA view of a shower and toilet from above.\nTwo men standing in a living room holding Wii remotes and 
nun-chucks.\nEmotional person hugging a stuffed bear while sitting in a plain room.\nA street sign with two streets and two block numbers.\nA herd of sheep crossing a bridge over a river.\nA small bedroom picture taken through a fisheye lens\nA picture of a person fixing a road sign.\nthe woman is sitting at a table in a purple chair\nA very cute elephant covered in mud in some tall grass.\nseveral people play video games with remote controls\nA group of people taking pictures of two pizzas in open boxes on a counter\nA white toilet sitting next to a bathroom sink.\nA hamburger and fries sitting on wax paper.\nA nice hotel has a full living suite\nA A bowl and a sandwich on an orange plate on a table.\na tennis player swinging a racket to hit a ball\nA group of colorful umbrellas sitting next to each other.\nA picture of some trash being wasted in a trash.\nA truck driver adjusted the straps on his load.\nTwo groups of people rowing in boats side by side.\nA young man riding a motorcycle having a good time.\nA girl walking behind an open fire hydrant spraying water.\na woman is petting an elephant and a fence\nThe man is holding up his chat pad in his hand\nA man looks into the mirror as he styles his hair.\nA refrigerator and table and chairs in a garage.\nA boy with a racquet swinging at a tennis ball.\nA peanut butter bagel is sitting on a white plate with several other food items surrounding it.\nA white bed sitting next to two windows.\nA giraffe putting it's head in a leafy green tree.\nA bird sits in a fruit tree with many leaves\nThree teddy bears dressed up for Christmas on display\nA maroon vehicle stops at the stop sign.\nA woman spooning cookie dough onto a cookie sheet.\nThe silhouette of a group of people and a horse.\nA boy in grey shirt sheering a sheep by wall.\nAn army jeep with an American Flag sitting at an airport.\nA young boy skinning carrots into a sink\nA happy stray puppy lies in the street.\nA street  sign on a busy sidewalk corner\nAn 
oreo cookie and chocolate dessert on a plate.\na display shelf with a few bananas on it\nA man in a pink bow tie and a pink shirt is being hugged by a man in a blue shirt.\nTwo street signs indicating no parking or towing.\nA picture of a bunch food sitting on a table.\nSeveral bicycles sit parked nest to each other.\nA vintage airplane museum, with people walking underneath displays of WWII-era planes in a hangar.\nA group of people on skies with contestant numbers.\ntwo boys are playing a video game and people are watching\nThe side of the building has a large clock and several windows.\na group of people standing playing nintendo wii\nA tabby cat is laying in an open packed suitcase.\nA skier putting their feet in the skies.\nA chocolate bunt cake is adorned with cashews.\nA group of giraffes on a jungle path.\nA passenger jet rolling along a runway at an airport.\nSeveral vehicles are stopped at an intersection behind a red light.\nA young man performing a skate board trick outside.\na bright day and skiing in the mountains\nA woman in shorts and heels waiting on a train platform\nTraditional looking around the umbrella girl with old clothing.\nSmall piece of bread and a donut sitting on a white napkin.\nA man sits on a surfboard in shallow ocean water\nA clock on a tower in the middle of a brick building.\nThe women sits in shade working on her laptop.\nDad, son and teddy bear are all smiling and happy.\nan image of a baby eating a spoon\nthere are people sitting at a table using lap tops\na living room with a person playing with a kid\nChocalate covered deserts on a stick on the table.\nA small white-and-brown dog curled up on a flower-print pillow.\nthree people standing at the zoo watching a elephant\nA man riding skis down a snow covered slope.\nA train crossing the road with cars waiting.\nA man is wearing a pink shirt and a tie.\nThe two airplanes are close on a runway.\nA bowl of chicken, lo mein noodles and vegetables.\nA crowd of people mill about on 
the street.\nTwo people skiing on a snowy mountain with a building in the background.\nA man on a surfboard performing a trick.\nA young boy flying a kite near a house.\nA minimalist room features white appliances and beige walls.\nThere are two horses walking in a grassy field\nTwo brown horses pulling a carriage as people sitting on the side of the road watch.\nA man shaving his face with another man hiding behind him.\nA kid laying down with a stuffed dog on him.\na group of three people talking to each other on the sidewalk with a skateboard\nTwo giraffes standing next to each other in their natural habitat.\nA man flying a kite in an open field under cloudy skies.\nA woman is sitting on a canoe going down a river.\nA group of people with surfboards enjoying a small river.\na cat with its hair sticking out as it looks at a dog by the window\na polar bear swimming in the water by a wall\nA kitten that appears to be focused on a computer mouse.\nA group of men standing next to each other.\nThree packages of toilet paper sitting on top of a toilet seat.\nA motor scooter has multiple rear view mirrors.\nA little girl crawling out of a piece of luggage.\nTwo men with racquets on a tennis court.\na little girl sits on a bench by herself\nAn industrial kitchen has a double oven with glass doors next to a shelf of dishes and utensils.\nMany laptops and their assorted wires atop a wooden bench.\nColorful Adirondack chairs at the end of a pier.\nThere are four goats and one giraffe standing in a group.\nPurple  orchid and colored leaves in a green vase.\na bunch of different colored vases on a table\nA giraffe and a baby giraffe standing in an enclosure.\npink double decker bus with two woman pictured on side\nThe view inside a suit case, and a backpack.\nA dog standing on top of a boat in a body of water.\nA group of men doing tricks on skateboard next to ramp.\na close up of a buses rear view mirror\nThere is a bowl of food with bread and a plate of fruit.\nThree 
children sitting at a table with food and drinks.\nStop sign at the intersection of two rather rural roads\nhorses graze and drink from the water at a lake\na black and white dog is herding some white animals\nA large orange striped cat laying next to a computer keyboard.\nA brown cardboard box with glazed doughnuts and wax paper.\na bed sitting inside of a bedroom on a wooden floor.\nA person holding a pair of scissors in one hand.\nA small group of giraffes walk across the savannah.\nA bundled up woman skier falling in the snow.\nA person riding a horse and wearing armor in front of a crowd.\nA table with two drinks and glasses flanked by two chairs.\nA display case with various types of pastries.\nA couple of cats are sitting next to a dirty door.\nThe guy with the white shirt and baseball cap is milking the cow.\nA striped plane flying up into the sky as the sun shines behind it.\nA man's torso wearing a brown patterned tie, pens in pocket and a large checked shirt.\nA slice of strawberry cheesecake on a plate with a fork\nThe buffet features several different types of pizza.\nA man is leisurely crossing the street on a skateboard\nThe cat is wandering around in front of the cardboard boxes.\nA view from a house looking outside at the front of a black car.\nFreshly cooked food and salad on a paper plate with a fork\na airplane that is flying through the air\nThe right hand of someone unpacking a Wii remote and sports games\nA large clock is displayed on the side of a building.\na close up of a plate of food\nYoung child playing baseball in a local park league\nA plate loaded full with well cooked food\nTwo women and a man posing for a photo on the dance floor.\nZebras and wildebeest walking in their natural environment\nA female Tennis player is holding her racket while the crowd and man look on.\nA man is cooking a pan full of various foods.\nA white dog in grassy field with red frisbee.\nA clock and a picture hung above a big window.\nA gooey piece of pizza 
with peppers, cheese and onions.\nA fire place sitting inside of a living room.\na woman walking down the street with a baby carriage\nthree groups of yellow flowers in vases on table\nTwo males are watching something on a camcorder.\nthere is a toilet with dirt on it\nA baby laying on its tummy on a bed is looking at a blue elephant.\nA laptop next to a wall in a room.\nA jet airliner leaves a faint trail of smoke during landing.\na laptop sits in front of a group of people\nJet parked with no one around in the area.\nA white polar bear is laying in the snow.\nA man is riding a motorcycle across the sandy shore line.\nA group of children sit on a bench outside.\nA group of people standing near surfboards in the sand.\nSmall children wearing a cast holding up a Wii controller.\nA man is playing tennis on a dirt court.\nA man eating food while wearing a gray hat.\na man that is cutting a pizza that is on a stove\nA sleeping black cat sitting on a pizza box.\nThree doughnut holes sit on a white plate with a doughnut that has been topped with topping and drizzled with sauce.\nA store with items on display in it's front windows.\na person holding an apple near a tree\nA woman on a cell phone sitting on the ground.\nA man handing another man something inside of a room.\nA person riding a skateboard while wearing blue shoes.\nA bunch of people waiting on a subway train.\nA horse drawn wagon driving down a dirt road.\nA man standing over a table presenting food.\nA city street filled with lots of traffic.\nPeople standing on surfboards on waves in the water.\nA close-up of a metal statue of a bird landing on the nest.\nAn owl among a few leaves, next to a wire fence.\nA women in a blue shirt cuddles up with her cat\nTwo girls looking at a calf in a fence.\nA man in a grassy field about to catch a frisbee.\nThe food is a mixture of pizza, salad, and wine.\na group of people walking on a city street\nAn old fashion looking clock tower near some bright lights.\nA person 
stands under an umbrella on a sunny day.\nThe is a line of elephants in the street.\nAn abstract designed bowl holding a bunch of oranges.\nA couple cargo trucks parked outside of a few shops.\nA man riding his surfboard through the waves.\nThe head of the black and white horse has a red decoration.\nAn elephant standing in water and surrounded by grass.\nA group of people standing around a green tent next to a horse,\nA clock tower in the middle of a road.\na clock on a wooden pole in the middle of a beach\nA little boy brushing his teeth with a tooth brush.\nA double decker bus driving down a road.\nA woman playing games on a laptop computer.\nCloseup of a pastry with white and brown frosted petals.\nA bird perched on a wooden peg ready to take flight.\nA teenager standing on a ramp while holding a skateboard.\nA young child riding on the back of a sheep.\nA wooden doll is next to a teddy bear.\nGray and white dog sitting on top of the bed with a black cat.\nA person in a wet suit in the water engaging in a water sport.\nA guy wearing a black wet suit on a white board, surfing.\nA man in a suit waits in a room with a tv.\nA black and white picture showing small children in a dormitory setting.\nA bed sitting in a room near two lamps and a couple of pictures on the wall.\nThree boys hanging out in a living room with the T.V. 
on in the background\nA table full of assorted snacks and plates.\nA red double decker bus parked on the side of a road.\nA hallway lined with doors and filled with suitcases.\na engine sits parked inside of a ware house\nA yellow and blue fire hydrant in front of a building.\nplated vegetables on white dish displayed on hard surface.\na van that is parked by some people with umbrellas\nA cat lying on an open laptop that is on a bed.\nAn elephant standing next to a tree outside.\nTrio of zebras stands idle on the savanna.\na dark picture of two men on skate boards\ntwo cats, one orange and one gray, sit on  a shelf intended for shoes\nA submarine sandwich cut in half on a white plate next to a cup of coffee.\nA boy with a blue jacket is smiling on a ski-slope.\nA young child with a spoon eating a slice of cake\nAn assorted group of standing and reclining cell phones.\nTwo pizzas on a wooden table with a person seated.\nA person in their car views a ram in the street.\nThere is a horse race going on in a carriage cart\nSigns displaying foot and seating area hanging inside restaurant\nA traffic light with an orange and a red having faces drawn on them.\nan image of a flamingo drinking something orange\nA group of people holding umbrellas standing behind a sign for a umbrella drive.\nA little girl is playing a game on the television.\nA cat is in a bathroom standing on an open toilet.\nA woman stands by her luggage and carries a large bag.\nA woman standing on a surfboard riding a wave.\nA healthy meal of fruits and vegetables on a table.\nTHERE ARE CARS AND A TRUCK THAT IS PARKED IN THE PARKING LOT\nA train on the railroad track in an underground subway.\nTWO CONTAINERS OF FOOD SITTING ON TOP OF CONCRETE STEP\nTwo men are sitting on a couch and their ties have been tied together.\nA clothes line with clothes hanging from it and cattle in the background\nA young girl is taking a nap next to her mother.\nA stuffed animal with colorful decorations on it and clothes 
hanging on a wall.\nA small bird perched on the handle of a bicycle.\nA giraffe standing next to a tree covered in leaves.\nA hand holding a piece of food at a table.\nA guy on a snow board in the dark.\nA display of historic pots and artifacts on display steps.\nA group of women standing under a red and white umbrella.\nA woman holds an electronic device in front of the camera.\nA piece of cake sits on top of a plate.\nA herd of cows make their way across a river.\nA woman sitting at a table with a little girl and a man.\nA man with a toothbrush in his mouth and uncombed hair takes a picture of himself at his computer desk.\nA banana, red pepper, carrot, and green apple\nA man standing on a baseball field while wearing a glove.\nA clock tower on top of a building with a wind indicator.\nA woman sitting on a bench with a dog sitting on the ground by her.\nA bunch of people in a building doing different things\nA group of people stands around and looks at a phone.\nA British Airways airplane taking off into the sky.\nA dog with a white hat at the field\nA woman trying to take a frisbee from her dog.\nA group of people at the beach flying kites\nA shot of a field and road taken from outside of a vehicle window.\nA train passing through a railway station.Railway platform is seen.\nA bathroom has a sink on legs and round lights.\nA cat sits in the foreground looking at the camera while a bright yellow motor cycle is in the background.\nA kitchen with a large white counter top.\nThe cat naps on a shelf near the desk.\nA person eating food from a white plate next to a glass of wine.\nThe toaster adorned with a face sits atop the tiled surface.\nA cat is standing on a board game\nLarge and small elephants standing near a watering hole in the grass.\nA small restroom that is painted the color blue.\nThere is a fruit slushie next to a very sloppy chili dog.\nA airport runway filled with jetliners next to large tanks.\nA kid in a white shirt stands on the grass while another 
boy stands on a pathway near a hovering white Frisbee.\nA high shot of a counter with a microwave and other food.\nA fruit market with shops of banana and apple.people buying banana.\nDelicious looking pasta with a variety of noodles\nTwo young men sit on a couch in a sloppy room with a laptop, a phone, and a flat screen tv.\nA plate of food with broccoli and beef.\nA dog sleeping on a rug next to a stuffed animal.\nBoats floating on a  lake near a dock.\nSea birds gather on a broken pier surrounded by algae.\nSeveral bundles of fruit hanging from a plant.\nA crowded city street with a row of bicycles\nA black bear that is walking on a branch.\nA small white bird walking across a lush green field.\nA woman sits on the curb talking on her phone.\nA high statue with a clock inside on a very nice day.\nA man riding a skateboard down a wooden ramp.\nAn intersection of two streets in front of a home.\nA man with a snowboard that is standing up.\nThe baseball player is running from home plate.\nA train is going down the tracks in the dark.\nA person riding on a skateboard down a ramp.\nA large building with a clock tower on top of it\nThree ladies and a man sitting in a room with drinks on the table.Two of them playing video games.\nA bed with covers turned down and a messenger bag against a pillow.\nAn old building sits in the background behind an illuminated signal light.\nThe man is sitting on the post beside the water.\nA bathroom with a large tub next to a toilet and sink.\nA baseball player pitching a baseball on a field.\nA happy couple taking a selfie while sharing a drink.\nthree people and one is petting an elephant\nBLACK AND WHITE PHOTO OF A WOMAN, TWO CHILDREN,HORSE,COW AND A DOG\nSeveral boys on a field playing with a frisbee.\nA crock pot on top of a microwave on top of a refrigerator.\nA table topped with two pizza and plates next to glasses.\nA man carries a surf board as a dog walks beside him.\nA young man doing a jump off a ramp at a skate park.\nThe 
man in the suit is cutting the cake.\nPeople are flying their kites in the sky.\nSeveral people are standing in a living room while one examines a remote.\nA man riding skis down a snow covered slope.\nThe Central Railway Station tracks in an old photograph.\nA bathroom with a sink and toilet and very small mirror.\nA woman in a black helmet jumping a hurdle while riding a horse.\nTwo zeba standing on a dried grass plain looking off into the distance.\nA man and a woman smiling at the camera inside a large building.\nA wet floor sign is between a toilet and a urinal.\nTwo marble vases one containing white flowers, the other green grass.\nA woman in black jacket sitting at a park bench in woods.\nA banana sitting in a bowl that is on the table.\nA dinking room table in the living room right next to the fire place.\nA girl with long brown hair with streaks of red lays on a bed and looks at an open laptop computer.\nSeveral elephant statues on display in a mall.\na red bus that is in line with other cars\nA clock tower on a roundabout next to a building.\nSmoothie ingredients are in a blender including blueberries, strawberries, and bananas.\nA man on the beach is playing Frisbee.\na back to the future mcclaren and time machine toy\nA bear is standing outdoors in the wilderness.\nA toilet and bathtub are in a bathroom.\nthis is a pink box with food inside of it\nA trolley bus is coming down the street near trees.\nA woman takes a close up photo with her cat.\nA counter with a bunch of bananas and oranges on it.\nThe boy in the green shirt and green hat is holding a baseball mitt.\nAdult riding breaking wave in open ocean on sunny day.\nfour white and blue street signs on a wooden pole\nA giraffe is stepping on a log in a grassy area.\nA bathroom with a metal sink and an odd shaped toilet.\nA piece of chocolate cake is on a plate with a fork.\na couple of men that have wine bottles in hand\na porcelain toilet that must be used by crouching over it rather than sitting 
on it\nThe person is bodyboarding as the waves crash around him.\nA dog that is swimming in some water.\nMultiple men climbing and hiking through the snowy mountains\nA tray that has two forks, a bowl , and food on it.\nTwo birds perched up on a large tree branch.\nA lone skier, dressed all in black, going down a hill.\nPeople standing on the top of a green hill area with kites flying in the blue sky.\nA man wearing a tie next to a woman.\nA woman sits on a bed in a dark room.\nAn older boy and a young boy are playing a video game.\nTwo horses in a rope corral in a courtyard with one being groomed by a woman.\nBrown and white cows lined up against a barbed wire fence.\nA clock tower with a toy doll display below it.\nPeople riding and pushing tricycle carriages down the street.\nA train driving along tracks next to a city street.\nA boat is docked alone on the side of a river.\nSome cattle next to a brick building and a guy on motorcycle.\nWoman sitting at a restaurant holding a wine glass.\nA large bear in a tree biting into a branch.\nA toilet in a stall with a sink attached to the toilet tank and a console attached to the lid.\nSome baby bears are having fun on a sunny day.\nA person in red jacket snowboarding down a snowy hill.\nA farm with dozens of sheep in an enclosure.\nA kitchen that has a hanging rack and a refrigerator.\nvery clean bathroom with white towels and some bathing soaps\nA couple at a cafe each on their respective cell phones\na big giraffe and a small giraffe are in their pen\nA large dog in a room with yellow walls.\nA woman walks through a busy area holding a purple umbrella\na giraffe eating food from a food dispenser\nA tennis player is running on a tennis court.\nA subway car stops at a station, its doors open.\nA snowboarder standing on a snowy mountain looking out.\nSome people standing on a surf board on the beach.\nA desert with some fruit on a plate.\nAn adult and child elephant are eating grass.\na black and white dog standing in 
front of a glass window\nThe multi-colored cat is standing on a luggage bag.\na bunch of stuff in a home living room\nA group of cows grazing near a passing train\na big swimming pool that has some people in it\nA train with closed doors near a platform.\nKids playing baseball while parents watch from benches.\ntwo women with a basket sitting at the bottom of the stairs\nA rack of bow ties hanging from clothes pins.\nCrabs walk across the sand along the ocean.\nTwo old style planes flying side by side in the sky.\na truck has pulled off the road to look at an elephant\nA group of nine jet planes flies in formation.\nA woman carrying a surf board by the ocean.\na pink and white plate with some banana slices on bread and a drink\nA man that is on a surfboard in the air.\nA glass vase sitting on top of a table.\nA close up of raw meat and meat cooking in a deep fryer.\nA kitten is standing in a refrigerator shelf.\nA skateboarder skating next to a concrete street divider.\na close up of an electric blender on a table\na person in a kitchen preparing food\nTwo Teddy bears sit next to each other.\nA group of people set on the ground talking in a park.\nA group of young people gather with surfboards on a tropical beach.\nThree giraffes standing around inside of their enclosure.\nA black and white street sign with a white building behind it.\na young person laying on a couch with s nintendo wii remote\nA sandwich with peppers and ale are setting on a table.\nBusses paused to a stop at a bus stop.\nA red and blue dump truck traveling along a city street.\nA sprinkled doughnut sitting on a white napkin next to the bag it came in.\na large elephant that is standing in grass\nA brown and black cat licking a woman's face.\nTwo hitched horses standing next to each other with pink coverings on their heads.\nA tennis player leans into her stroke on the court\nA pizza slice on plate, beer in mugs and beer bottle on a kitchen table with place mats.\nA donut has a bunch of nuts on 
top of it.\nA train sitting on top of train tracks near  forest.\nA couple of giraffe standing on top of a grass covered field.\nA half unmade bed in a hotel room\na close up of a person cutting a pizza with scissors\nA man in a suit and tie holding a water bottle and people with cameras standing around him.\nA man trying to get his dog to herd goats.\nA man in wetsuit riding a white surfboard on wave.\nA woman stands ready with a tennis racket.\nA person and skateboard in air over a ledge by a sidewalk of city road with cars.\nA teenager rides a skateboard down the stair railing.\nThe two giraffes are standing together in the grasslands.\nA large passenger jet flying over an airport.\nA cat laying on top of a blue dresser near a chair.\na blurry photo of an empty city street\na woman is holding a teddy bear in a room\nA man watches a flatscreen TV set above wrapped gifts.\na couple of computers are sitting on a desk\nA bathroom mirror that is trimmed in gold and reflecting the room.\nThe back side of a vehicle packed with bags.\nSmall horse sitting beside a large brown horse.\na plate of pizza sits on a checkered table cover\nThe Asian kid is gleefully playing with the cellular telephone.\nA woman petting the trunk of a elephant.\na man flying through the air on top of a skateboard.\nThere is a room with various items in the picture.\nThis is a baseball player trying to hit a ball\nThe urinal on the ground has a toilet scrubber next to it.\nFour giraffes are behind a fence in the dirt.\nA pink kite is flying in the sky at a beach.\nAn orange cat laying on a black laptop in living room.\nA man pitching a baseball on top of a field.\nA female runner eating a banana during her run.\nA guy in a bandana leaning over a laptop.\nA elephant standing in a field with lots of grass.\nThree zebras are running in bright green grass.\nTwo guys are shaking hands while one grips the tennis racket.\nA laptop on a wooden chair of some sort.\na street with two stop signs and people 
walking down the street\nA picture of a hotel room having just been cleaned.\nA brown dog touches noses with a sheep.\nA little boy holding a baseball bat on a field.\nA group of people that are in a market.\nA zebra waling in a field of dead grass by some trees.\nYellow and black older snowmobile inside room with blue walls.\nA man standing up in front of doors with a folder in his hands.\nA smiling woman with scissors cutting a sign\nA clean white toilet with the lid down in a bathroom.\nA woman that has purple socks and a book.\nThere is a large gray elephant standing next to a tree.\nA small white living room with sofa and lunge chair\na brown and white cat has its paws on a laptop\nA group of people in a room playing video games\nA picture of a bear that is in the grass.\nSeveral people in a group are flying very colorful kites.\nthere are two statues of zebras at a exhibit\nThe bowl of greens is near a wooden bowl.\nBunches of bananas are placed on flat newspapers.\nA giraffe looking up at a tree behind some large rocks.\nFour laptops sitting on a cluttered desk with a phone and a pair of headphones.\nA man in a black shirt opens an oven door while looking at the inside of the oven.\nA group of airplanes are parked at a runway and a truck is parked next to a plane.\nThe couple is sharing a piece of cake while being photographed.\na man on a pole with an umbrella\nA group of people skiing down a mountain in the snow.\nA man carrying a basket filled with fruit and clippers.\nA green double decker city bus by the curb.\nBlack and white photograph of people with umbrella next to cars in snow.\nOrange tiger stuffed animal sitting on the bed of a pickup truck.\nA plate that has sandwiches and chips on it.\nMan sits on his parked motorcycle with body of water and bridge behind him.\nA cat lays on top of a blanket and sleeps.\nA person on a motor bike on a road.\nA pizza with a few toppings is on a plate.\nA brown basket filled with bananas and apples.\nA blue and 
white fire hydrant on a street.\nFresh produce is arranged in a grocery store display.\na man is sitting at a table with some food\nA wooden crate holding bananas under a roof area.\nsome people in orange are standing together outside\nTwo people are flying a kite on a hill.\na young person and an older person holding a kite\na fire hydrant is spraying water onto the street\nA green bus parked in front of a tall building.\nthere is a male tennis player on the court in a game\nA bunch of stuffed animals stacked on top of each other.\nBeach goers enjoying sunny day on sandy beach at ocean.\nTwo firetrucks are ready to be deployed to a fire.\nA city bus drives down a quiet road.\nA large elephant walking in front of a vehicle.\nTwo giraffe's standing in the shade under a canopy.\nWoman on cell phone in city at night.\nA girl sits on a bench on grass outside a red door.\nA variety of Apple Ipod products on display.\nTwo giraffe standing on top of a muddy puddle of water.\nA panda bear that is holding a stick.\nA view of a baseball field from behind home plate\nA blue and yellow mass transit bus turning a corner.\nA  plate of food on a table with a tall glass\nA man standing and holding a tennis racket on court.\nSmall dog playing with toilet paper in bathroom.\nTwo men and two women make breakfast plates in a kitchen.\nTwo people in skis standing together a snowy hill.\nPeople are riding horses through the grassy plain.\nA building with three steeples and a clock in the center.\nA parking meter on the sidewalk of a busy street.\na square shaped pizza with bacon, an egg and tomatoes on a white paper plate.\nA full view of suitcases with some clothes on it.\nunderside of a plane flying through a cloudy sky\nA large group of people on the street.\nTwo men sit in front of large baskets of fruits and vegetables.\nTHERE IS A WOMAN THAT IS STANDING ON THE STREET\nTwo triangular street signs on grass next to brick pathway.\nthere is a small dog that has fallen asleep with a 
book\nA pizza on a tray with a fork and glasses on the table\nA dog laying in the back of a moving truck.\nA small bird standing on the ground next to body of water.\nA couple of elephants standing on a lush green forest.\nA close up of a duck walking on a path.\nToddler enjoying playing with a colorful kite in a grassy field\na statue standing next to a clock and some bells\nThree plates with a different dessert on each.\nThe view of a bathroom showing a toilet with a small waste bin next to it.\nA blowup seat in the back of a blow up raft\nThe plate has a picture of a kitty on it.\nHorse drawn carriage with a pair of black horses in front.\nThe man is pitching the baseball on the field.\nA planter that is standing on a stand.\nA very large doughnut  sits atop a building as an advertisement.\nA sleeping  dog laying on a stone walkway.\na man skating on the road very fast\nAn empty bus is parked in front of a building.\nA black table has a white vase with flowers.\nMale tennis player on the middle of the court.\na close up of doughnuts on a plate on a table\nA skier performs a trick in the air off a ramp\nA fire hydrant sits on the curb in the snow.\nA small bathroom has an open skylight in the ceiling.\nThe small dog wearing a pink scarf stands in the yard near a bowl.\nA living room complete with a couch, chair and television.\na couple of benches in front of a body of water.\nA zebra standing in a sandy spot surrounded by green ground cover.\nA boat travels in one direction of the ocean while a smaller pleasure craft travels in the opposite direction.\nA laptop set up on a wooden table.\nA chair lift over a long ski run.\nA stop sign on a pole in a city\nCardboard boxes stacked up in a living room\nA fireman is on top of the truck ladder\nA big ocean wave with someone trying to stay on the surfboard.\nA man hitting a tennis ball on the tennis court.\nA man is in a kayak in a pool with a ball.\nPeople sitting at tables working on laptop computers.\na pizza with 
pesto sauce sitting on some oven mitts\nA group of people riding motorcycles is going down a road.\nThe large double decker bus is coming around a corner.\nA brown horse grazing in a grassy area.\nChina Airlines plane in air with landing gear out.\na sandwich has a bite taken out of it\nA suite case that has a large quantity of glasses in it.\nThe fluffy cat is sitting on top of a toilet in the bathroom.\nA spacious bathroom with two sinks and a claw foot tub.\nTwo young people are riding a bike together next to the parked vehicles.\nThe three trains are stopped on the railroad tracks.\nA kitchen with brown cabinets has an island.\nI am unable to see the image above.\nthere is a man sitting outside at a table with a large pizza\nA train engine with train cars behind it, riding on a  set of tracks with smoke blowing from the engine.\nA vase filled with a large yellow and black sunflower and other flowers.\nA group of people standing around a room together.\nwomen standing next to a truck on display\nA large jet sitting on top of an airport tarmac.\nthere are many blue and white umbrellas on this beach\nA boy is sitting at a table eating pizza.\nA cat sitting on top of a television.\nA white cow surrounded by many dark cows inside a coral.\nComputer screen with the keyboard and printer sitting next to it.\nPartially open door leading to a kitchen from a hallway.\nan Olympic event going on with many skiers\nA box of doughnuts being held open by a hand\nA FedEx truck waits at the bottom of a San Francisco hill.\nA city bus driving down the street to georgetown\nsome baseball players are playing baseball on a field\nTwo women riding on the back of somebody else motorcycles\nResidential pantry with food items stocked on shelves.\nA very large building, that appears to be a truck.\nA person feeding a giraffe while wearing a hat.\nA young man riding a boogie board on top of a wave pool\nProfessional dirt biker with woman on backseat of bike.\nTwo zebras grazing on flowers 
in a pasture.\nThe powdered pastry has filling in the middle of it.\nBoy with a football book and his dog outside.\nA table topped with plates and bowls of food.\nHe is hitting the baseball with the bat.\na baseball player is swinging at the pitch\nAn intersection shows  an expanse of empty road and then a car coming out from under a large arch that looks like a giant Chinese letter and stands between two buildings that stand at the forefront of am open walled walkway and retail venues.\nA man in his car using his phone\nA motorcycle police officer is pulled alongside fellow officers in car.\nTwo elephants cross a dirt road between two stands of trees.\nThe man is throwing the baseball during the game.\nA skier in a green jacket going down a slope covered in snow.\nA large passenger airplane flying against a partly cloudy sky.\nA young boy eats a piece of pizza.\nA flock of birds flying through the sky.\nA giant panda sitting on logs lazily yawning.\na man is making pizza in his brick stove oven\nA picture taken from between an individuals knees at the sky.\nA part of hands with scissors trimming a plant.\nA banana next to a sprig of vanilla and a shot glass.\nA group of people in the snow, putting on snowboards.\nA chair sitting in the middle of the room, in a black and white photo.\nTwo riders dressed as knights are on horseback.\nA medium-sized brown-colored worm wiggles as a large yellow slimy slug looks on.\nA half eaten pizza on a table with dishes.\nA beautifully maintained bedroom with rustic charm features natural wood.\nTwo parking meters that are almost covered in snow.\nSome pancakes with icecream and bananas and a coffee\nPeople riding elephants who are wading through a river.\nA small group of penguins approaching a pool of water with one already swimming\nA group of people sitting around a table eating food.\nA stop sign flashes with an exit sign below it.\na bunch of small children holding tennis rackets on a tennis court\nA black dog is laying on a 
white pillow\na guy grinding his skateboard on a wooden post\nA bear that is standing in front of a rock.\nA man standing in front of a TV holding a Wii game controller.\nA man swings his Wii controller back in a living room.\nA large clock next to other smaller clocks set to different time zones.\na room showing a cooker and an oven\nA man with a helmet holding wires attached to something in the sky.\nAn elephant with tusks is standing between two fences.\nA very big bright colored truck and a van on a narrow road.\nThe ball player is preparing to pitch the ball.\nThe unicycle is on the curb in front of a parking meter.\nA book in french laying on a bed.\nA giraffe standing inside an enclosure with two deer.\nA train with multiple cars passing by trees.\nSpectators watching men on horses riding in an ANZAC Day parade in Australia\ntwo police riding horses on a london street\nA door that is opened wiith a chair inside .\nAn empty bathroom with 2 toilets next to each other.\nTwo black bears sit on the ground beside a structure made of wooden logs while another stands on top of it.\nTHERE IS A METET THAT IS ON THE STREET ON THE SIDE WALK\nThe young woman is jumping into the air as birds fly over the ocean behind her.\nThree stop lights and one way signs are in the intersection.\nA cat lays in the window on a sunny day.\nA man on a skate board who is touching the ground.\na bamboo tray holding several bowls of asian food\nA man riding on the back of a brown horse down a street.\nA group of people with toy swords in a crowd.\nTwo women in the snow on skis in front of a large building.\nA pigeon that is sitting on top of a head stone.\nA herd of zebra and horses standing next to each other.\nA man holding a baby in front of a plate with cake.\nA person clothed head to toe in white paints a room.\nA BIG GROUP OF PEOPLE FLYING KITES IN A FIELD\nA bathroom with a tub and shower and a sink.\nA baby bear standing among some tall grass.\nA living room filled with furniture 
and a flat screen TV.\nBright red umbrella open on the sand of a beach.\na motorcycle parked on a side walk near a brick building\nlose up of various trays of croissants and muffins.\nA man is taking a selfie with a mountain range in the background.\na zebra in some brown grass and some green plants\nA car and motorcycle riding on a pavement road.\nA row of pizzas sit on tables underneath lamps.\nSome people in an arena with other people watching from the stands.\nA living room near an open window has furniture and an area rug on the floor.\nA man and woman dressed in wedding attire walking out of a building together.\nA bunch of bananas sit next to a cup of coffee.\nA woman relaxes on her bed and uses her computer\nA man with gray hair is holding a colorful kite.\nA polar bear grazing in a vibrant green grass\nA purple skateboard sitting at the back of a bus isle.\nA black horse and white horse graze for grass\na guy and a girl getting ready to stand up on their surf boards\nA long red table with dishes on it seats many people in a room.\nSmall toy train engine set with a train station.\nView of adult elephant seen through the trees\na couple of cars pass through a city street\nA woman in the process of serving a tennis ball.\nA woman with a shorn sheep on a grate.\nA smiling young woman uses a computer in the kitchen.\na photo of city buildings near beautiful plants\nA group of cows laying in a green pasture or grazing.\nA giraffe looking ahead in front of a stone wall.\nA man wearing a purple die and work shirt\nA radio sitting on a table next to a record player.\nA stove top in a storage type of room with several spices on the stove.\nGreen onions sit on a cutting board along with carrot sticks.\nA man setting at a table in a restaurant cutting his food.\nA picture of a large cathedral with clock in the center.\nTwo plush bears are found as a gift along with a Starbucks cup\nA row of floor height urinals in a public restroom.\nTwo shake boarders playing on the 
street with one individual sitting under a tree.\nA child playing with his hand-held game system.\nA birthday cake that is decorated with a dolphin and sea horse on it\nA white bowl with a few pieces of broccoli.\nA train is coming down the track near a hillside.\nThere is a figurine by the computer keyboard in the office.\nA bedroom with a plain neatly made bed with no headboard\nA man petting a giraffe whose face it over the fence\nAn elephant standing on rocks next to a wood bridge.\nA large bathroom has a tiny window and a tub and toilet and sink and mirror.\nA group of children playing with a ball.\nA cooked pizza made with various separated toppings\nA man is snowboarding and is mid air over the snow.\nThe person in the black and white photo is jumping up with a skateboard.\nA big clock tower topped with a walk and an American flag, stands tall against a blue sky, far ahead of city skyline, and right above a lot of teal-roofed domiciles.\na blue bus parked at a street corner.\nCouple of people out in the ocean on surfboards\nThe dogs are playing together out in the yard.\nAn adorable little girl holding her hand over her mouth.\nA dog catching a Frisbee in a park, with people in the background.\nA woman with short, brown hair is looking into a circular mirror and holding a camera up to her cheek.\nA man and a small child fly a butterfly kite in a park.\nA snowboarder jumping through the air and performing a trick.\nA very beautiful kitchen with very modern updates\npeople bringing their vegetables to the market by boat\nA bowl with steamed broccoli topped with nuts in it.\nA wooden table topped with cooking tools next to a sink.\nA man is jumping up to catch a Frisbee between his legs.\nTwo men playing a game with steering wheel controllers.\na giraffe eating some leaves off a tree\nA fire hydrant that was busted and is shooting water out.\nA woman walking past a table with a plate of food on top of it.\nA wooden table topped with lots of camera equipment.\nA 
stemmed bottle is holding a slender flower in a window sill with a view of rain.\nA blender filled with food on top of a counter.\nA person took a picture of his torso and legs while laying on the top of a bunk bed.\nA boy with a helmet stands next to a clock.\nA man that is holding a banana in his mouth.\nA caste all it up, reflecting off of the water.\nA male maneuvering up a ramp while on his skateboard.\nA red pick up truck with a plow blade drives down a snowy suburban road.\nSeveral toilets are place outside on a lawn.\na bird eating out of a pizza box that is on the ground\nAn airplane flying under the clouds in daytime.\nA hand with a gold ring is posed over a wireless keyboard, beside a wired mouse\nA couple of people dancing in some sand with no shoes on.\nThe cat is observing its own visage in the circular make-up mirror.\nA group of people standing around a baby elephant in a river.\na guy attempting a trick with his skateboard while othes watch\nA man and two boys herd 5 sheep into a truck.\nA sub sandwich in a box next to two hot dogs.\nMen in suits smiling and walking across a green soccer field.\nFour tanned men and a girl at an event.\nTwo born bears walking though a forest surrounded by trees.\nA skiers lies on her back with the skis straight up.\nA building at a railroad crossing billows smoke.\nConstruction loading truck driving in front of a building.\nThe nose of an airplane sits on the landing strip, boarding passengers.\nA bowl filled with pasta, veggies and seasoning.\na small dog is walking next to the fruit stand.\nA cup of Starbucks coffee is sitting on the side of a court.\nA street scene with cars on the road and people on the sidewalk.\nA person taking a photo in a mirror on a mass transit vehicle.\nA woman sitting on a bench in a stone alcove.\nA picture of a boat and some water.\nA small plane is getting ready for a flight\nAirport during a snowstorm with planes awaiting boarding.\nA dog and cat sitting on a couch\nA little girl 
makes a pizza with a smiley face.\nA woman in sunglasses petting the trunk of an elephant.\nA woman holds a mirror and tool up to a woman's mouth.\nThe skier is sitting down in the snow.\nA street pole has an enormous number of signs on it.\nThe carrots in the dish are marinating in beer.\nThere is a woman holding a wine glass and a man wearing a necklace.\nA person holding a Chocolate Lab dog while the dog holds an old teddy bear.\nthere is a male wake boarder holding on to a rope in the water\nthree young people holding wine glasses laughing\na cow walking on a city street near people\nA young man holding a white frisbee next to poles.\nA kitchen mostly empty with lots of cupboard and counter space.\nThe food is seasoned and ready to be cooked.\nTwo men in a kitchen are standing by a refrigerator.\nTwo baseball players are walking on the field.\nA man riding skis down a snow covered slope.\nSome lemons are in a vase and oranges and grapes are in a plate.\nTwo men are holding tennis balls and rackets.\nA circular mirror reflecting a woman's stomach in turquoise shirt.\nA birthday cake with gum drops and a bag of Cheetos cheese bacon snacks on a table.\nA blue and silver railroad train placed on the tracks\nA man welding the back of an oven.\na close up of a box of open pizza\nThe Helen J sitting in the ocean not moving.\nA yellow fire hydrant on a street corner.\nIt looks like a human figure hanging in the tree limbs, partially concealed by foliage.\nA person sitting on a wooden bench outside.\nA guy in a helmet skate boards down the street.\na girl is on a phone standing near a sign\nA group of young men standing on a basketball court.\nA cutting board with green peppers already cut and some awaiting their cutting.\nA man standing on a field holding a catchers mitt.\nA woman standing in front of the Eiffel Tower surrounded by photo shopped animals.\nA kitchen area with a double sink, a stove, a refrigerator and several other kitchen utensils.\nPicture of 
reflection in a mirror of a kitchen\nTwo people holding surfboards on the shoreline of the beach\nA few surfers ride a good wave in the ocean.\nA building lined street with three lanes and light traffic.\nA small bathroom with a toilet next to a cabinet.\nA woman holding a white teddy bear next to a wood cabinet.\nFamily room with furniture, fireplace and wood flooring.\nA tennis player is in air while extending his arm up to return the ball.\nTwo little dogs hiding in the pillows of a couch.\nA bear dressed in a green outfit sitting outside.\nA man teaching a boy how to play baseball\nAn office break room with table, microwave, sink and lockers.\nThe boy throws a baseball to another boy who is ready to hit it.\nA home kitchen stripped down to be painted.\nan assortment of fruit including oranges and bananas\nThe top of a church showing steeples and windows.\nA service truck at an airport terminal with planes reflected in the windows.\nA computer screen and keyboard on a desk.\nA boat heading upriver to a harbor town.\nA group of bicyclists are riding down a path\nBaseball players are in action as a crowd watches.\nA sheepdog prepares to guide a sheep into a corral.\nTwo people that are standing on ski's in the snow.\nA person that is about to catch a frisbee.\nThere is a bowl of food and a sandwich on a plate\nA fire hydrant sitting on the side of a road.\nA crane is stacked high with lots of luggage.\nA man with no fashion sense holding several frisbees.\nTwo cats are laying down together on what seems to be a table cloth.\nRows of handmade grass umbrellas lying on their sides.\na red double decked bus advertising a shop\nThe woman walks through sand with a black horse.\nA heard of cows with yellow tags on their ears in a field of grass.\nA plane is flying low during the evening.\nSnow covered mountains can be seen past the boats on the water.\nPeople hold six corn dogs with various mustard designs\na couple skiers skiing through cones down a slope\nA man 
prepares to serve a tennis ball.\nA cat sitting behind storage containers and a computer.\nTwo emergency vehicles on a driveway next to a garage.\nA sidewalk is next to many different signs.\nSome men with snowboards standing on a hill\nimage does not appear in this particular one\nA giraffe standing on top of a dirt field.\nThree people check on a number of bicycles in a showroom\nGroup of mixed fruits sitting inside a metal basket.\nA blue and yellow train is parked on the tracks.\nA brown and white dog laying next to luggage at an airport.\nA grassy field with three zebra grazing from the ground.\nA baseball player holds a bat while standing next to home plate.\na tennis player swings his tennis racket\nA hotel lobby with a table and flowers in a vase.\nTwo planes are on a runway beside trucks.\na man in a black hat standing next to and  holding the reigns of a horse\nA boy takes a selfie in a bathroom with Harry Potter decorations.\nA very big cute giraffe by a pretty palm tree.\nA group of people walking around a shopping center.\nA smiling man is playing tennis on a brown court.\na small and dirty zebra inside of a corral\nA young boy holding a baseball bat to his face.\nA crowded store with several different displays of goods for sale.\nA black cat staring into the distance in a room\nA young girl on a bench with a kite.\nA bike on a pole in front of a brick building.\nA man is performing tricks on a bicycle.\nA busy city intersection with public transit and pedestrians.\nA dark room with a bed and black chair.\nA man doing a trick on a skateboard on a ramp.\nA man holding the string to a kite in a park.\nA man holding a tennis racket with a ball in the air on the tennis court.\nA person in the snow with two dogs on leashes.\nA brown bear walking with rocks in the background.\nA clock above a glove resting on a leopard print ledge.\nA cat places its mouth on a computer keyboard.\nA couple of people eating a slice of pizza.\nA silver fire hydrant with a blue 
top at a road corner.\nA man on a a fake horse is in the parade.\nThe inside of a vehicle driving down a highway with a tv playing an image.\nA large group of elephants are in the water.\nA colorful plate of avocado, carrot, and cabbage.\nA young man doing tricks on his skate board.\nAn airplane sitting on top of an airport tarmac.\nA man sitting on the raised cement border around a tree and looking at his cellphone.\nThis is a small bathroom with a towel on the floor.\nthere is a man drinking whine from a glass\nA red truck with patriotic bunting drags a parade float.\ntwo black cats are drinking out of a toilet\nA herd of goats standing on a public street.\nSandwiches on buns topped with black olives and tomato.\na birthday cake with candles on top of it\nLooking up at a tall clock tower in a blue sky\nA city bus is leaving the bus station.\nA person walking out of the waves with a surfboard.\nTwo bowls of soup set on a restaurant table.\n"
  },
  {
    "path": "DiT-ToCa/cache_functions/__init__.py",
    "content": "from .cache_cutfresh import cache_cutfresh\nfrom .fresh_ratio_scheduler import fresh_ratio_scheduler\nfrom .score_evaluate import score_evaluate\nfrom .global_force_fresh import global_force_fresh\nfrom .cache_cutfresh import cache_cutfresh\nfrom .update_cache import update_cache\nfrom .force_init import force_init\nfrom .attention import Attention\nfrom .cache_init import cache_init\nfrom .cal_type import cal_type"
  },
  {
    "path": "DiT-ToCa/cache_functions/attention.py",
    "content": "# Besides, re-arrange the attention module\nfrom torch.jit import Final\nfrom timm.layers import use_fused_attn\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport os\n\nclass Attention(nn.Module):\n    fused_attn: Final[bool]\n\n    def __init__(\n            self,\n            dim: int,\n            num_heads: int = 8,\n            qkv_bias: bool = False,\n            qk_norm: bool = False,\n            attn_drop: float = 0.,\n            proj_drop: float = 0.,\n            norm_layer: nn.Module = nn.LayerNorm,\n    ) -> None:\n        super().__init__()\n        assert dim % num_heads == 0, 'dim should be divisible by num_heads'\n        self.num_heads = num_heads\n        self.head_dim = dim // num_heads\n        self.scale = self.head_dim ** -0.5\n        self.fused_attn = use_fused_attn()\n        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)\n        self.q_norm = norm_layer(self.head_dim) if qk_norm else nn.Identity()\n        self.k_norm = norm_layer(self.head_dim) if qk_norm else nn.Identity()\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n    def forward(self, x: torch.Tensor, cache_dic, current, fresh_indices=None) -> torch.Tensor:\n    # 0.4ms extra cost on A800, mainly tensor operations\n        \"\"\"\n        fresh_indices: (B, fresh_ratio*N), the index tensor for the fresh tokens\n        \"\"\"\n\n        B, N, C = x.shape\n        \n        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)\n        q, k, v = qkv.unbind(0)   #q: (B, num_heads, N, head_dim)\n        if cache_dic['cache_type'] == 'kv-norm':\n            cache_dic['cache'][-1][current['layer']]['v_norm'] = torch.norm(v, dim=-1, p=2)\n\n        q, k = self.q_norm(q), self.k_norm(k)\n        #q: (B, num_heads, N-M, head_dim), k: (B, num_heads, N, head_dim), v: (B, num_heads, N, head_dim)\n        if 
(self.fused_attn) and (cache_dic['cache_type'] !='attention'):\n            x = F.scaled_dot_product_attention(\n                q, k, v,\n                dropout_p=self.attn_drop.p if self.training else 0.,\n            )\n            attn_map = None\n        else:\n            q = q * self.scale\n            attn = q @ k.transpose(-2, -1)\n\n            attn_map= attn.softmax(dim=-1) #extra cost for attn\n            attn = self.attn_drop(attn_map)\n            x = attn @ v\n            attn_map = attn_map.mean(dim=1) #head mean\n        \n        x = x.transpose(1, 2).reshape(B, N, C)\n        x = self.proj(x)\n        x = self.proj_drop(x) \n        \n        flops = (\n            B * N * C * 3 * C * 2 # QKV projection\n            + B * self.num_heads * N * self.head_dim  # Scale q\n            + B * self.num_heads * N * N * self.head_dim * 2 # Q @ K\n            + B * self.num_heads * N * N * 5 # Softmax\n            + B * self.num_heads * N * N * self.head_dim * 2 # Attn @ V\n            + B * N * C * C * 2 # Projection\n        )\n        cache_dic['flops']+=flops\n        \n        return x, attn_map # x: (B, N-M, C), attn_map: (B, N-M, N)\n"
  },
  {
    "path": "DiT-ToCa/cache_functions/cache_cutfresh.py",
    "content": "from .fresh_ratio_scheduler import fresh_ratio_scheduler\nfrom .score_evaluate import score_evaluate\nfrom .token_merge import token_merge\nimport torch\ndef cache_cutfresh(cache_dic, tokens, current):\n    '''\n    Cut fresh tokens from the input tokens and update the cache counter.\n    \n    cache_dic: dict, the cache dictionary containing cache(main extra memory cost), indices and some other information.\n    tokens: torch.Tensor, the input tokens to be cut.\n    current: dict, the current step, layer, and module information. Particularly convenient for debugging.\n    '''\n    step = current['step']\n    layer = current['layer']\n    module = current['module']\n    \n    fresh_ratio = fresh_ratio_scheduler(cache_dic, current)\n    fresh_ratio = torch.clamp(torch.tensor(fresh_ratio), 0.0, 1.0)\n    # Generate the index tensor for fresh tokens\n    score = score_evaluate(cache_dic, tokens, current)\n    score = local_selection_with_bonus(score, 0.6, 2) # Uniform Spatial Distribution s4 mentioned in the paper\n    # 0.6, 2\n    indices = score.argsort(dim=-1, descending=True)\n    topk = int(fresh_ratio * score.shape[1])\n    fresh_indices = indices[:, :topk]\n    #stale_indices = indices[:, topk:]\n    # (B, fresh_ratio *N)\n\n    # Updating the Cache Frequency Score s3 mentioned in the paper\n    # stale tokens index + 1, fresh tokens index = 0\n    cache_dic['cache_index'][-1][layer][module] += 1\n    cache_dic['cache_index'][-1][layer][module].scatter_(dim=1, index=fresh_indices, \n                                                                    src = torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))\n    \n    ## not used in the final version\n    #cache_dic['cache_index']['layer_index'][module] += 1\n    #cache_dic['cache_index']['layer_index'][module].scatter_(dim=1, index=fresh_indices, \n    #                                                                src = torch.zeros_like(fresh_indices, 
dtype=torch.int, device=fresh_indices.device))\n    # select the fresh tokens out\n    fresh_indices_expand = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])\n\n    if module in ['mlp', 'attn']:\n        # cut out the fresh tokens\n        fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices_expand)\n\n        return fresh_indices, fresh_tokens\n    \n    else:\n        # no need for this branch hhh.\n        raise ValueError(\"Unrecognized module?\", module)\n    \ndef local_selection_with_bonus(score, bonus_ratio, grid_size=2):\n    '''\n    Uniform Spatial Distribution s4 mentioned in the paper\n    '''\n    batch_size, num_tokens = score.shape\n    image_size = int(num_tokens ** 0.5)\n    block_size = grid_size * grid_size\n    \n    assert num_tokens % block_size == 0, \"The number of tokens must be divisible by the block size.\"\n    \n    # Step 1: Reshape score to group it by blocks\n    score_reshaped = score.view(batch_size, image_size // grid_size, grid_size, image_size // grid_size, grid_size)\n    score_reshaped = score_reshaped.permute(0, 1, 3, 2, 4).contiguous()\n    score_reshaped = score_reshaped.view(batch_size, -1, block_size)  # [batch_size, num_blocks, block_size]\n    \n    # Step 2: Find the max token in each block\n    max_scores, max_indices = score_reshaped.max(dim=-1, keepdim=True)  # [batch_size, num_blocks, 1]\n    \n    # Step 3: Create a mask to identify max score tokens\n    mask = torch.zeros_like(score_reshaped)\n    mask.scatter_(-1, max_indices, 1)  # Set mask to 1 at the max indices\n    \n    # Step 4: Apply the bonus only to the max score tokens\n    score_reshaped = score_reshaped + (mask * max_scores * bonus_ratio)  # Apply bonus only to max tokens\n    \n    # Step 5: Reshape the score back to its original shape\n    score_modified = score_reshaped.view(batch_size, image_size // grid_size, image_size // grid_size, grid_size, grid_size)\n    score_modified = score_modified.permute(0, 1, 3, 
2, 4).contiguous()\n    score_modified = score_modified.view(batch_size, num_tokens)\n    \n    return score_modified"
  },
  {
    "path": "DiT-ToCa/cache_functions/cache_init.py",
    "content": "def cache_init(model_kwargs, num_steps):   \n    '''\n    Initialization for cache.\n    '''\n    cache_dic = {}\n    cache = {}\n    cache_index = {}\n    cache[-1]={}\n    cache_index[-1]={}\n    cache_index['layer_index']={}\n    cache_dic['attn_map'] = {}\n    cache_dic['attn_map'][-1] = {}\n    for j in range(28):\n        cache[-1][j] = {}\n        cache_index[-1][j] = {}\n        cache_dic['attn_map'][-1][j] = {}\n    for i in range(num_steps):\n        cache[i]={}\n        for j in range(28):\n            cache[i][j] = {}\n    cache_dic['cache_type']           = model_kwargs['cache_type']\n    cache_dic['cache_index']          = cache_index\n    cache_dic['cache']                = cache\n    cache_dic['fresh_ratio_schedule'] = model_kwargs['ratio_scheduler']\n    cache_dic['fresh_ratio']          = model_kwargs['fresh_ratio']\n    cache_dic['fresh_threshold']      = model_kwargs['fresh_threshold']\n    cache_dic['force_fresh']          = model_kwargs['force_fresh']\n    cache_dic['soft_fresh_weight']    = model_kwargs['soft_fresh_weight']\n    cache_dic['flops']                = 0.0\n    cache_dic['test_FLOPs']           = model_kwargs['test_FLOPs'] \n    \n    cache_dic['cache'][-1]['noise_steps'] = {}\n    cache_dic['counter'] = 0.0\n    \n    current = {}\n    current['num_steps'] = num_steps\n    return cache_dic, current\n    "
  },
  {
    "path": "DiT-ToCa/cache_functions/cal_type.py",
    "content": "def cal_type(cache_dic, current):\n    '''\n    Determine calculation type for this step\n    '''\n    last_steps = (current['step'] <=2)\n    first_step = (current['step'] == (current['num_steps'] - 1))\n    force_fresh = cache_dic['force_fresh']\n    if not first_step:\n        fresh_interval = cache_dic['cal_threshold']\n    else:\n        fresh_interval = cache_dic['fresh_threshold']\n\n    if (current['step'] % fresh_interval == 0) or first_step:\n        current['type'] = 'full'\n        \n    elif ((current['step'] % fresh_interval) % 2 == 1): #[1,3,5] [2,4,6]\n        current['type'] = 'ToCa'\n    # 'ToCa' 'FORA'\n    else: \n        current['type'] = 'ToCa'\n"
  },
  {
    "path": "DiT-ToCa/cache_functions/force_init.py",
    "content": "import torch\nfrom .force_scheduler import force_scheduler\ndef force_init(cache_dic, current, tokens):\n    '''\n    Initialization for Force Activation step.\n    '''\n    # reset the cache index to 0\n    cache_dic['cache_index'][-1][current['layer']][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)\n    if current['layer'] == 0:\n        cache_dic['cache_index']['layer_index'][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)\n    #if current['layer'] == 27:\n        force_scheduler(cache_dic, current)"
  },
  {
    "path": "DiT-ToCa/cache_functions/force_scheduler.py",
    "content": "import torch\ndef force_scheduler(cache_dic, current):\n    '''\n    Force Activation Cycle Scheduler\n    '''\n    if cache_dic['fresh_ratio'] == 0:\n        # FORA\n        linear_step_weight = 0.0\n    else: \n        # ToCa\n        linear_step_weight = 0.4 #0.4\n    step_factor = torch.tensor(1 + linear_step_weight - 2 * linear_step_weight * current['step'] / current['num_steps'])\n    threshold = torch.round(cache_dic['fresh_threshold'] / step_factor)\n\n    if (current['step'] in range(int(current['num_steps']*0.2),int(current['num_steps']*0.4))) and (cache_dic['fresh_ratio'] != 0):\n        # We find that in these 20% steps, the model is extremely sensitive for cache, i.e. worse temporal redundancy.\n        threshold = 2\n\n    cache_dic['cal_threshold'] = threshold\n"
  },
  {
    "path": "DiT-ToCa/cache_functions/fresh_ratio_scheduler.py",
    "content": "import torch\ndef fresh_ratio_scheduler(cache_dic, current):\n    '''\n    Return the fresh ratio for the current step.\n    '''\n    fresh_ratio = cache_dic['fresh_ratio']\n    fresh_ratio_schedule = cache_dic['fresh_ratio_schedule']\n    step = current['step']\n    num_steps = current['num_steps']\n    threshold = cache_dic['fresh_threshold']\n    weight = 0.9\n    if fresh_ratio_schedule == 'constant':\n        return fresh_ratio\n    elif fresh_ratio_schedule == 'linear':\n        return fresh_ratio * (1 + weight - 2 * weight * step / num_steps)\n    elif fresh_ratio_schedule == 'exp':\n        #return 0.5 * (0.052 ** (step/num_steps))\n        return fresh_ratio * (weight ** (step / num_steps))\n    elif fresh_ratio_schedule == 'linear-mode':\n        mode = (step % threshold)/threshold - 0.5\n        mode_weight = 0.1\n        return fresh_ratio * (1 + weight - 2 * weight * step / num_steps + mode_weight * mode)\n    elif fresh_ratio_schedule == 'layerwise':\n        return fresh_ratio * (1 + weight - 2 * weight * current['layer'] / 27)\n    elif fresh_ratio_schedule == 'linear-layerwise':\n        step_weight = 0.4 \n        step_factor = 1 + step_weight - 2 * step_weight * step / num_steps\n\n        layer_weight = 0.8\n        layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27\n\n        module_weight = 2.5\n        module_time_weight = 0.6\n        module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='attn' else (1 + module_time_weight * module_weight)\n        \n        return fresh_ratio * layer_factor * step_factor * module_factor\n    \n###### Recommended Configurations ######\n\n    elif fresh_ratio_schedule == 'ToCa-ddim50':\n        # Proposed scheduling method in toca.\n\n        # step wise scheduling, we find there is little differece if change the weight of step factor, so this is not a key factor. 
\n        step_weight = 2.0 #0.4 #0.0 # 2.0\n        step_factor = 1 + step_weight - 2 * step_weight * step / num_steps\n\n        # Layer-wise scheduling, important. A positive layer_weight calculates more in the front layers and less in the back layers; the negative value used here does the reverse.\n        layer_weight = -0.2#0.8 #0.0 # -0.2\n        layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27\n\n        # Module-wise scheduling, important. Meaning calculate more in the mlp module, less in the attn module.\n        module_weight = 2.5 # no calculations for attn module (2.5 * 0.4 = 1.0), computation is transferred to the mlp module.\n        module_time_weight = 0.6 # estimated from the time and flops of mlp and attn module, may change in different situations.\n        module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='attn' else (1 + module_time_weight * module_weight)\n        \n        return fresh_ratio * layer_factor * step_factor * module_factor\n    \n    elif fresh_ratio_schedule == 'ToCa-ddpm250':\n        # Proposed scheduling method in ToCa.\n\n        # Step-wise scheduling. We find there is little difference when the weight of the step factor changes, so this is not a key factor. \n        step_weight = 0.4 #0.0 # 2.0\n        step_factor = 1 + step_weight - 2 * step_weight * step / num_steps\n\n        # Layer-wise scheduling, important. Meaning calculate more in the front layers, less in the back layers.\n        layer_weight = 0.8 #0.0 # -0.2\n        layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27\n\n        # Module-wise scheduling, important. 
Meaning calculate more in the mlp module, less in the attn module.\n        module_weight = 2.5 # no calculations for attn module (2.5 * 0.4 = 1.0), computation is transferred to the mlp module.\n        module_time_weight = 0.6 # estimated from the time and flops of mlp and attn module, may change in different situations.\n        module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='attn' else (1 + module_time_weight * module_weight)\n        return fresh_ratio * layer_factor * step_factor * module_factor\n\n    else:\n        raise ValueError(\"unrecognized fresh ratio schedule\", fresh_ratio_schedule)\n"
  },
  {
    "path": "DiT-ToCa/cache_functions/global_force_fresh.py",
    "content": "from .force_scheduler import force_scheduler\ndef global_force_fresh(cache_dic, current):\n    '''\n    Return whether to force fresh tokens globally.\n    '''\n    last_steps = (current['step'] <= 2)\n    first_step = (current['step'] == (current['num_steps'] - 1))\n    force_fresh = cache_dic['force_fresh']\n    if not first_step:\n        fresh_threshold = cache_dic['cal_threshold']\n    else:\n        fresh_threshold = cache_dic['fresh_threshold']\n\n    if force_fresh == 'global':\n    # global force fresh means force activate all tokens in this step.\n        return (first_step or (current['step']% fresh_threshold == 0))\n    \n    elif force_fresh == 'local':\n    # fresh locally cause much worse results, for the misalignment of cache and computed tokens.\n        return first_step\n    elif force_fresh == 'none':\n        return first_step\n    else:\n        raise ValueError(\"unrecognized force fresh strategy\", force_fresh)"
  },
  {
    "path": "DiT-ToCa/cache_functions/score_evaluate.py",
    "content": "import torch\nimport torch.nn as nn\nfrom .scores import attn_score, similarity_score, norm_score, kv_norm_score\ndef score_evaluate(cache_dic, tokens, current) -> torch.Tensor:\n    '''\n    Return the score tensor (B, N) for the given tokens. Mainly include s1, (s2,) s3 mentioned in the paper.\n    '''\n\n    #if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')):\n    ## abandoned branch, if you want to explore the local force fresh strategy, this may help.\n    #    force_fresh_mask = torch.as_tensor((cache_dic['cache_index'][-1][current['layer']][current['module']] >= 2 * cache_dic['fresh_threshold']), dtype = int) # 2 because the threshold is for step, not module\n    #    force_len = force_fresh_mask.sum(dim=1)\n    #    force_indices = force_fresh_mask.argsort(dim = -1, descending = True)[:, :force_len.min()]\n    #\n    #    force_indices = force_indices[:, torch.randperm(force_indices.shape[1])]\n\n    if cache_dic['cache_type'] == 'random':\n        # select tokens randomly, but remember to keep the same for cfg and no cfg.\n        score = torch.rand(int(tokens.shape[0]*0.5), tokens.shape[1], device=tokens.device)\n        score = torch.cat([score, score], dim=0).to(tokens.device)\n\n    elif cache_dic['cache_type'] == 'straight':\n        # abandon the cache, just return 1 hhh, obviously no use.\n        score = torch.ones(tokens.shape[0], tokens.shape[1]).to(tokens.device)\n    \n    elif cache_dic['cache_type'] == 'attention':\n        # Recommended selection method in the paper.\n\n        # cache_dic['attn_map'][step][layer] (B, N, N), the last dimention has get softmaxed\n\n        # calculate the attention score, for DiT, there is no cross-attention, so just self-attention score s1 applied.\n        score = attn_score(cache_dic, current)\n\n        # if you'd like to add some randomness to the score as SiTo does to avoid tokens been over cached. 
 This works, but we have another, more elegant way.\n        #score = score + 0.0 * torch.rand_like(score, device= score.device)\n    elif cache_dic['cache_type'] == 'kv-norm':\n        score = kv_norm_score(cache_dic, current)\n\n    elif cache_dic['cache_type'] == 'similarity':\n        # Why don't we calculate a similarity score? \n        # It is a natural choice, but we find it costs **TOO MUCH TIME**, since in DiT series models the similarity would be calculated for scoring everywhere.\n        score = similarity_score(cache_dic, current, tokens)\n\n    elif cache_dic['cache_type'] == 'norm':\n        # an interesting exploration, but not used in the final version.\n        # The idea behind using the norm as the selection method is that the norm of a token may indicate its importance, but that is not the case.\n        score = norm_score(cache_dic, current, tokens)\n\n    elif cache_dic['cache_type'] == 'compress':\n        # if you want to combine any of the methods mentioned, we have not tried this yet hhh.\n        score1 = torch.rand(int(tokens.shape[0]*0.5), tokens.shape[1])\n        score1 = torch.cat([score1, score1], dim=0).to(tokens.device)\n        score2 = cache_dic['attn_map'][-1][current['layer']].sum(dim=1)#.mean(dim=0) # (B, N)\n        # normalize\n        score2 = score2 / score2.max(dim=1, keepdim=True)[0]\n        score = 0.5 * score1 + 0.5 * score2\n\n    # Abandoned branch; if you want to explore the local force fresh strategy, this may help.\n    #if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')): # current['is_force_fresh'] is False, because when it is True, no cut and fresh are needed\n    #        #print(torch.ones_like(force_indices, dtype=float, device=force_indices.device).dtype)\n    #    score.scatter_(dim=1, index=force_indices, src=torch.ones_like(force_indices, dtype=torch.float32, \n    #                                                                       device=force_indices.device))\n    \n    if (True and 
(cache_dic['force_fresh'] == 'global')):\n        # apply s3 mentioned in the paper, the \"True\" above is for a switch to turn on/off the s3.\n        soft_step_score = cache_dic['cache_index'][-1][current['layer']][current['module']].float() / (cache_dic['fresh_threshold'])\n\n        # layer wise s3, not used in the final version. seems it is not necessary to add if step wise is applied.\n        #soft_layer_score = cache_dic['cache_index']['layer_index'][current['module']].float() / (27)\n        score = score + cache_dic['soft_fresh_weight'] * soft_step_score #+ 0.1 *soft_layer_score\n    \n    #cfg_score, no_cfg_score = torch.split(score, len(score)//2, dim = 0)\n    #score = 0.5* cfg_score + 0.5* no_cfg_score\n    #score = torch.cat([score,score], dim=0)\n\n    return score.to(tokens.device)"
  },
  {
    "path": "DiT-ToCa/cache_functions/scores.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\ndef attn_score(cache_dic, current):\n    '''\n    Attention Score s1 (s2, but dit doesn't contain cross-attention for s2)\n    '''\n    #self_attn_score = 1- cache_dic['attn_map'][-1][current['layer']].diagonal(dim1=1, dim2=2)\n    #self_attn_score = F.normalize(self_attn_score, dim=1, p=2)\n\n    attention_score = F.normalize(cache_dic['attn_map'][-1][current['layer']].sum(dim=1), dim=1, p=2)\n\n    #score = self_attn_score\n    score = attention_score\n    return score\n\ndef similarity_score(cache_dic, current, tokens):\n    cosine_sim = F.cosine_similarity(tokens, cache_dic['cache'][-1][current['layer']][current['module']], dim=-1)\n\n    return F.normalize(1- cosine_sim, dim=-1, p=2)\n\ndef norm_score(cache_dic, current, tokens):\n    norm = tokens.norm(dim=-1, p=2)\n    return F.normalize(norm, dim=-1, p=2)\n\ndef kv_norm_score(cache_dic, current):\n    # (B, num_heads, N)\n    #k_norm = cache_dic['cache'][-1][current['layer']]['k_norm']\n    v_norm = cache_dic['cache'][-1][current['layer']]['v_norm']\n    kv_norm = 1- v_norm \n\n\n    return F.normalize(kv_norm.sum(dim = -2), p=2)"
  },
  {
    "path": "DiT-ToCa/cache_functions/token_merge.py",
    "content": "import torch\ndef token_merge(cache_dic, tokens, current, fresh_indices, stale_indices):\n    '''\n    An abandoned branch in exploring if token merge helps. The answer is no, at least no for training-free strategy.\n    '''\n    if (current['layer'] % 1 == 0):\n        fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))\n        stale_tokens = torch.gather(input = tokens, dim = 1, index = stale_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))\n        method = 'similarity'\n        if method == 'distance':\n            descending = False\n            distance = torch.cdist(stale_tokens, fresh_tokens, p=1)\n            stale_fresh_dist, stale_fresh_indices_allstale = torch.min(distance, dim=2)\n        elif method == 'similarity':\n            descending = True\n            fresh_tokens = torch.nn.functional.normalize(fresh_tokens, p=2, dim=-1)\n            stale_tokens = torch.nn.functional.normalize(stale_tokens, p=2, dim=-1)\n            similarity = stale_tokens @ fresh_tokens.transpose(1, 2)\n            stale_fresh_dist, stale_fresh_indices_allstale = torch.max(similarity, dim=2)\n        \n\n        saved_topk_stale = int((stale_fresh_dist > 0.995).sum(dim=1).min())\n        merged_stale_sequence = torch.sort(stale_fresh_dist, dim=1, descending=descending)[1][:,:saved_topk_stale]\n        stale_fresh_indices = stale_fresh_indices_allstale.gather(1, merged_stale_sequence)\n        merged_stale_sequence = stale_indices.gather(1, merged_stale_sequence)\n        merged_stale_fresh_indices = fresh_indices.gather(1, stale_fresh_indices)\n        cache_dic['merged_stale_fresh_indices'] = merged_stale_fresh_indices \n        cache_dic['merged_stale_sequence'] = merged_stale_sequence\n"
  },
  {
    "path": "DiT-ToCa/cache_functions/update_cache.py",
    "content": "import torch\ndef update_cache(fresh_indices, fresh_tokens, cache_dic, current, fresh_attn_map=None):\n    '''\n    Update the cache with the fresh tokens.\n    '''\n    step = current['step']\n    layer = current['layer']\n    module = current['module']\n    \n    # Update the cached tokens at the positions\n    if module == 'attn': \n        # this branch is not used in the final version, but if you explore the partial fresh strategy of attention, it works.\n        indices = fresh_indices.sort(dim=1, descending=False)[0]\n        \n        cache_dic['attn_map'][-1][layer].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_attn_map.shape[-1]), src=fresh_attn_map)\n    elif module == 'mlp':\n        indices = fresh_indices\n\n    cache_dic['cache'][-1][layer][module].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_tokens.shape[-1]), src=fresh_tokens)\n            \n    \n\n        \n        "
  },
  {
    "path": "DiT-ToCa/diffusion/__init__.py",
    "content": "# Modified from OpenAI's diffusion repos\r\n#     GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\r\n#     ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\r\n#     IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\r\n\r\nfrom . import gaussian_diffusion as gd\r\nfrom .respace import SpacedDiffusion, space_timesteps\r\n\r\n\r\ndef create_diffusion(\r\n    timestep_respacing,\r\n    noise_schedule=\"linear\", \r\n    use_kl=False,\r\n    sigma_small=False,\r\n    predict_xstart=False,\r\n    learn_sigma=True,\r\n    rescale_learned_sigmas=False,\r\n    diffusion_steps=1000\r\n):\r\n    betas = gd.get_named_beta_schedule(noise_schedule, diffusion_steps)\r\n    if use_kl:\r\n        loss_type = gd.LossType.RESCALED_KL\r\n    elif rescale_learned_sigmas:\r\n        loss_type = gd.LossType.RESCALED_MSE\r\n    else:\r\n        loss_type = gd.LossType.MSE\r\n    if timestep_respacing is None or timestep_respacing == \"\":\r\n        timestep_respacing = [diffusion_steps]\r\n    return SpacedDiffusion(\r\n        use_timesteps=space_timesteps(diffusion_steps, timestep_respacing),\r\n        betas=betas,\r\n        model_mean_type=(\r\n            gd.ModelMeanType.EPSILON if not predict_xstart else gd.ModelMeanType.START_X\r\n        ),\r\n        model_var_type=(\r\n            (\r\n                gd.ModelVarType.FIXED_LARGE\r\n                if not sigma_small\r\n                else gd.ModelVarType.FIXED_SMALL\r\n            )\r\n            if not learn_sigma\r\n            else gd.ModelVarType.LEARNED_RANGE\r\n        ),\r\n        loss_type=loss_type\r\n        # rescale_timesteps=rescale_timesteps,\r\n    )\r\n"
  },
  {
    "path": "DiT-ToCa/diffusion/diffusion_utils.py",
    "content": "# Modified from OpenAI's diffusion repos\r\n#     GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\r\n#     ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\r\n#     IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\r\n\r\nimport torch as th\r\nimport numpy as np\r\n\r\n\r\ndef normal_kl(mean1, logvar1, mean2, logvar2):\r\n    \"\"\"\r\n    Compute the KL divergence between two gaussians.\r\n    Shapes are automatically broadcasted, so batches can be compared to\r\n    scalars, among other use cases.\r\n    \"\"\"\r\n    tensor = None\r\n    for obj in (mean1, logvar1, mean2, logvar2):\r\n        if isinstance(obj, th.Tensor):\r\n            tensor = obj\r\n            break\r\n    assert tensor is not None, \"at least one argument must be a Tensor\"\r\n\r\n    # Force variances to be Tensors. Broadcasting helps convert scalars to\r\n    # Tensors, but it does not work for th.exp().\r\n    logvar1, logvar2 = [\r\n        x if isinstance(x, th.Tensor) else th.tensor(x).to(tensor)\r\n        for x in (logvar1, logvar2)\r\n    ]\r\n\r\n    return 0.5 * (\r\n        -1.0\r\n        + logvar2\r\n        - logvar1\r\n        + th.exp(logvar1 - logvar2)\r\n        + ((mean1 - mean2) ** 2) * th.exp(-logvar2)\r\n    )\r\n\r\n\r\ndef approx_standard_normal_cdf(x):\r\n    \"\"\"\r\n    A fast approximation of the cumulative distribution function of the\r\n    standard normal.\r\n    \"\"\"\r\n    return 0.5 * (1.0 + th.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * th.pow(x, 3))))\r\n\r\n\r\ndef continuous_gaussian_log_likelihood(x, *, means, log_scales):\r\n    \"\"\"\r\n    Compute the log-likelihood of a continuous Gaussian distribution.\r\n    :param x: the targets\r\n    :param means: the Gaussian mean Tensor.\r\n    :param log_scales: the Gaussian log stddev Tensor.\r\n    :return: a tensor like x of log probabilities (in nats).\r\n   
 \"\"\"\r\n    centered_x = x - means\r\n    inv_stdv = th.exp(-log_scales)\r\n    normalized_x = centered_x * inv_stdv\r\n    log_probs = th.distributions.Normal(th.zeros_like(x), th.ones_like(x)).log_prob(normalized_x)\r\n    return log_probs\r\n\r\n\r\ndef discretized_gaussian_log_likelihood(x, *, means, log_scales):\r\n    \"\"\"\r\n    Compute the log-likelihood of a Gaussian distribution discretizing to a\r\n    given image.\r\n    :param x: the target images. It is assumed that this was uint8 values,\r\n              rescaled to the range [-1, 1].\r\n    :param means: the Gaussian mean Tensor.\r\n    :param log_scales: the Gaussian log stddev Tensor.\r\n    :return: a tensor like x of log probabilities (in nats).\r\n    \"\"\"\r\n    assert x.shape == means.shape == log_scales.shape\r\n    centered_x = x - means\r\n    inv_stdv = th.exp(-log_scales)\r\n    plus_in = inv_stdv * (centered_x + 1.0 / 255.0)\r\n    cdf_plus = approx_standard_normal_cdf(plus_in)\r\n    min_in = inv_stdv * (centered_x - 1.0 / 255.0)\r\n    cdf_min = approx_standard_normal_cdf(min_in)\r\n    log_cdf_plus = th.log(cdf_plus.clamp(min=1e-12))\r\n    log_one_minus_cdf_min = th.log((1.0 - cdf_min).clamp(min=1e-12))\r\n    cdf_delta = cdf_plus - cdf_min\r\n    log_probs = th.where(\r\n        x < -0.999,\r\n        log_cdf_plus,\r\n        th.where(x > 0.999, log_one_minus_cdf_min, th.log(cdf_delta.clamp(min=1e-12))),\r\n    )\r\n    assert log_probs.shape == x.shape\r\n    return log_probs\r\n"
  },
  {
    "path": "DiT-ToCa/diffusion/gaussian_diffusion.py",
    "content": "# Modified from OpenAI's diffusion repos\r\n#     GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\r\n#     ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\r\n#     IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\r\n\r\n\r\nimport math\r\n\r\nimport numpy as np\r\nimport torch as th\r\nimport enum\r\n\r\nfrom .diffusion_utils import discretized_gaussian_log_likelihood, normal_kl\r\n\r\nfrom cache_functions import cache_init\r\n\r\ndef mean_flat(tensor):\r\n    \"\"\"\r\n    Take the mean over all non-batch dimensions.\r\n    \"\"\"\r\n    return tensor.mean(dim=list(range(1, len(tensor.shape))))\r\n\r\n\r\nclass ModelMeanType(enum.Enum):\r\n    \"\"\"\r\n    Which type of output the model predicts.\r\n    \"\"\"\r\n\r\n    PREVIOUS_X = enum.auto()  # the model predicts x_{t-1}\r\n    START_X = enum.auto()  # the model predicts x_0\r\n    EPSILON = enum.auto()  # the model predicts epsilon\r\n\r\n\r\nclass ModelVarType(enum.Enum):\r\n    \"\"\"\r\n    What is used as the model's output variance.\r\n    The LEARNED_RANGE option has been added to allow the model to predict\r\n    values between FIXED_SMALL and FIXED_LARGE, making its job easier.\r\n    \"\"\"\r\n\r\n    LEARNED = enum.auto()\r\n    FIXED_SMALL = enum.auto()\r\n    FIXED_LARGE = enum.auto()\r\n    LEARNED_RANGE = enum.auto()\r\n\r\n\r\nclass LossType(enum.Enum):\r\n    MSE = enum.auto()  # use raw MSE loss (and KL when learning variances)\r\n    RESCALED_MSE = (\r\n        enum.auto()\r\n    )  # use raw MSE loss (with RESCALED_KL when learning variances)\r\n    KL = enum.auto()  # use the variational lower-bound\r\n    RESCALED_KL = enum.auto()  # like KL, but rescale to estimate the full VLB\r\n\r\n    def is_vb(self):\r\n        return self == LossType.KL or self == LossType.RESCALED_KL\r\n\r\n\r\ndef _warmup_beta(beta_start, beta_end, 
num_diffusion_timesteps, warmup_frac):\r\n    betas = beta_end * np.ones(num_diffusion_timesteps, dtype=np.float64)\r\n    warmup_time = int(num_diffusion_timesteps * warmup_frac)\r\n    betas[:warmup_time] = np.linspace(beta_start, beta_end, warmup_time, dtype=np.float64)\r\n    return betas\r\n\r\n\r\ndef get_beta_schedule(beta_schedule, *, beta_start, beta_end, num_diffusion_timesteps):\r\n    \"\"\"\r\n    This is the deprecated API for creating beta schedules.\r\n    See get_named_beta_schedule() for the new library of schedules.\r\n    \"\"\"\r\n    if beta_schedule == \"quad\":\r\n        betas = (\r\n            np.linspace(\r\n                beta_start ** 0.5,\r\n                beta_end ** 0.5,\r\n                num_diffusion_timesteps,\r\n                dtype=np.float64,\r\n            )\r\n            ** 2\r\n        )\r\n    elif beta_schedule == \"linear\":\r\n        betas = np.linspace(beta_start, beta_end, num_diffusion_timesteps, dtype=np.float64)\r\n    elif beta_schedule == \"warmup10\":\r\n        betas = _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, 0.1)\r\n    elif beta_schedule == \"warmup50\":\r\n        betas = _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, 0.5)\r\n    elif beta_schedule == \"const\":\r\n        betas = beta_end * np.ones(num_diffusion_timesteps, dtype=np.float64)\r\n    elif beta_schedule == \"jsd\":  # 1/T, 1/(T-1), 1/(T-2), ..., 1\r\n        betas = 1.0 / np.linspace(\r\n            num_diffusion_timesteps, 1, num_diffusion_timesteps, dtype=np.float64\r\n        )\r\n    else:\r\n        raise NotImplementedError(beta_schedule)\r\n    assert betas.shape == (num_diffusion_timesteps,)\r\n    return betas\r\n\r\n\r\ndef get_named_beta_schedule(schedule_name, num_diffusion_timesteps):\r\n    \"\"\"\r\n    Get a pre-defined beta schedule for the given name.\r\n    The beta schedule library consists of beta schedules which remain similar\r\n    in the limit of num_diffusion_timesteps.\r\n    Beta 
schedules may be added, but should not be removed or changed once\r\n    they are committed to maintain backwards compatibility.\r\n    \"\"\"\r\n    if schedule_name == \"linear\":\r\n        # Linear schedule from Ho et al, extended to work for any number of\r\n        # diffusion steps.\r\n        scale = 1000 / num_diffusion_timesteps\r\n        return get_beta_schedule(\r\n            \"linear\",\r\n            beta_start=scale * 0.0001,\r\n            beta_end=scale * 0.02,\r\n            num_diffusion_timesteps=num_diffusion_timesteps,\r\n        )\r\n    elif schedule_name == \"squaredcos_cap_v2\":\r\n        return betas_for_alpha_bar(\r\n            num_diffusion_timesteps,\r\n            lambda t: math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2,\r\n        )\r\n    else:\r\n        raise NotImplementedError(f\"unknown beta schedule: {schedule_name}\")\r\n\r\n\r\ndef betas_for_alpha_bar(num_diffusion_timesteps, alpha_bar, max_beta=0.999):\r\n    \"\"\"\r\n    Create a beta schedule that discretizes the given alpha_t_bar function,\r\n    which defines the cumulative product of (1-beta) over time from t = [0,1].\r\n    :param num_diffusion_timesteps: the number of betas to produce.\r\n    :param alpha_bar: a lambda that takes an argument t from 0 to 1 and\r\n                      produces the cumulative product of (1-beta) up to that\r\n                      part of the diffusion process.\r\n    :param max_beta: the maximum beta to use; use values lower than 1 to\r\n                     prevent singularities.\r\n    \"\"\"\r\n    betas = []\r\n    for i in range(num_diffusion_timesteps):\r\n        t1 = i / num_diffusion_timesteps\r\n        t2 = (i + 1) / num_diffusion_timesteps\r\n        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))\r\n    return np.array(betas)\r\n\r\n\r\nclass GaussianDiffusion:\r\n    \"\"\"\r\n    Utilities for training and sampling diffusion models.\r\n    Original ported from this codebase:\r\n    
https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/diffusion_utils_2.py#L42\r\n    :param betas: a 1-D numpy array of betas for each diffusion timestep,\r\n                  starting at T and going to 1.\r\n    \"\"\"\r\n\r\n    def __init__(\r\n        self,\r\n        *,\r\n        betas,\r\n        model_mean_type,\r\n        model_var_type,\r\n        loss_type\r\n    ):\r\n\r\n        self.model_mean_type = model_mean_type\r\n        self.model_var_type = model_var_type\r\n        self.loss_type = loss_type\r\n\r\n\r\n\r\n        # Use float64 for accuracy.\r\n        betas = np.array(betas, dtype=np.float64)\r\n        self.betas = betas\r\n        assert len(betas.shape) == 1, \"betas must be 1-D\"\r\n        assert (betas > 0).all() and (betas <= 1).all()\r\n\r\n        self.num_timesteps = int(betas.shape[0])\r\n\r\n        alphas = 1.0 - betas\r\n        self.alphas_cumprod = np.cumprod(alphas, axis=0)\r\n        self.alphas_cumprod_prev = np.append(1.0, self.alphas_cumprod[:-1])\r\n        self.alphas_cumprod_next = np.append(self.alphas_cumprod[1:], 0.0)\r\n        assert self.alphas_cumprod_prev.shape == (self.num_timesteps,)\r\n\r\n        # calculations for diffusion q(x_t | x_{t-1}) and others\r\n        self.sqrt_alphas_cumprod = np.sqrt(self.alphas_cumprod)\r\n        self.sqrt_one_minus_alphas_cumprod = np.sqrt(1.0 - self.alphas_cumprod)\r\n        self.log_one_minus_alphas_cumprod = np.log(1.0 - self.alphas_cumprod)\r\n        self.sqrt_recip_alphas_cumprod = np.sqrt(1.0 / self.alphas_cumprod)\r\n        self.sqrt_recipm1_alphas_cumprod = np.sqrt(1.0 / self.alphas_cumprod - 1)\r\n\r\n        # calculations for posterior q(x_{t-1} | x_t, x_0)\r\n        self.posterior_variance = (\r\n            betas * (1.0 - self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)\r\n        )\r\n        # below: log calculation clipped because the posterior variance is 0 at the beginning of the diffusion 
chain\r\n        self.posterior_log_variance_clipped = np.log(\r\n            np.append(self.posterior_variance[1], self.posterior_variance[1:])\r\n        ) if len(self.posterior_variance) > 1 else np.array([])\r\n\r\n        self.posterior_mean_coef1 = (\r\n            betas * np.sqrt(self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)\r\n        )\r\n        self.posterior_mean_coef2 = (\r\n            (1.0 - self.alphas_cumprod_prev) * np.sqrt(alphas) / (1.0 - self.alphas_cumprod)\r\n        )\r\n\r\n    def q_mean_variance(self, x_start, t):\r\n        \"\"\"\r\n        Get the distribution q(x_t | x_0).\r\n        :param x_start: the [N x C x ...] tensor of noiseless inputs.\r\n        :param t: the number of diffusion steps (minus 1). Here, 0 means one step.\r\n        :return: A tuple (mean, variance, log_variance), all of x_start's shape.\r\n        \"\"\"\r\n        mean = _extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start\r\n        variance = _extract_into_tensor(1.0 - self.alphas_cumprod, t, x_start.shape)\r\n        log_variance = _extract_into_tensor(self.log_one_minus_alphas_cumprod, t, x_start.shape)\r\n        return mean, variance, log_variance\r\n\r\n    def q_sample(self, x_start, t, noise=None):\r\n        \"\"\"\r\n        Diffuse the data for a given number of diffusion steps.\r\n        In other words, sample from q(x_t | x_0).\r\n        :param x_start: the initial data batch.\r\n        :param t: the number of diffusion steps (minus 1). 
Here, 0 means one step.\r\n        :param noise: if specified, the split-out normal noise.\r\n        :return: A noisy version of x_start.\r\n        \"\"\"\r\n        if noise is None:\r\n            noise = th.randn_like(x_start)\r\n        assert noise.shape == x_start.shape\r\n        return (\r\n            _extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start\r\n            + _extract_into_tensor(self.sqrt_one_minus_alphas_cumprod, t, x_start.shape) * noise\r\n        )\r\n\r\n    def q_posterior_mean_variance(self, x_start, x_t, t):\r\n        \"\"\"\r\n        Compute the mean and variance of the diffusion posterior:\r\n            q(x_{t-1} | x_t, x_0)\r\n        \"\"\"\r\n        assert x_start.shape == x_t.shape\r\n        posterior_mean = (\r\n            _extract_into_tensor(self.posterior_mean_coef1, t, x_t.shape) * x_start\r\n            + _extract_into_tensor(self.posterior_mean_coef2, t, x_t.shape) * x_t\r\n        )\r\n        posterior_variance = _extract_into_tensor(self.posterior_variance, t, x_t.shape)\r\n        posterior_log_variance_clipped = _extract_into_tensor(\r\n            self.posterior_log_variance_clipped, t, x_t.shape\r\n        )\r\n        assert (\r\n            posterior_mean.shape[0]\r\n            == posterior_variance.shape[0]\r\n            == posterior_log_variance_clipped.shape[0]\r\n            == x_start.shape[0]\r\n        )\r\n        return posterior_mean, posterior_variance, posterior_log_variance_clipped\r\n\r\n    def p_mean_variance(self, model, x, t, current=None, cache_dic=None, clip_denoised=True, denoised_fn=None, model_kwargs=None):\r\n        #def p_mean_variance(self, model, x, t, clip_denoised=True, denoised_fn=None, model_kwargs=None): \r\n        \"\"\"\r\n        Apply the model to get p(x_{t-1} | x_t), as well as a prediction of\r\n        the initial x, x_0.\r\n        :param model: the model, which takes a signal and a batch of timesteps\r\n                      as input.\r\n   
     :param x: the [N x C x ...] tensor at time t.\r\n        :param t: a 1-D Tensor of timesteps.\r\n        :param clip_denoised: if True, clip the denoised signal into [-1, 1].\r\n        :param denoised_fn: if not None, a function which applies to the\r\n            x_start prediction before it is used to sample. Applies before\r\n            clip_denoised.\r\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\r\n            pass to the model. This can be used for conditioning.\r\n        :return: a dict with the following keys:\r\n                 - 'mean': the model mean output.\r\n                 - 'variance': the model variance output.\r\n                 - 'log_variance': the log of 'variance'.\r\n                 - 'pred_xstart': the prediction for x_0.\r\n        \"\"\"\r\n        if model_kwargs is None:\r\n            model_kwargs = {}\r\n\r\n        B, C = x.shape[:2]\r\n        assert t.shape == (B,)\r\n\r\n        model_output = model(x, t, current=current, cache_dic=cache_dic, **model_kwargs)\r\n        if isinstance(model_output, tuple):\r\n            model_output, extra = model_output\r\n        else:\r\n            extra = None\r\n\r\n        if self.model_var_type in [ModelVarType.LEARNED, ModelVarType.LEARNED_RANGE]:\r\n            assert model_output.shape == (B, C * 2, *x.shape[2:])\r\n            model_output, model_var_values = th.split(model_output, C, dim=1)\r\n            min_log = _extract_into_tensor(self.posterior_log_variance_clipped, t, x.shape)\r\n            max_log = _extract_into_tensor(np.log(self.betas), t, x.shape)\r\n            # The model_var_values is [-1, 1] for [min_var, max_var].\r\n            frac = (model_var_values + 1) / 2\r\n            model_log_variance = frac * max_log + (1 - frac) * min_log\r\n            model_variance = th.exp(model_log_variance)\r\n        else:\r\n            model_variance, model_log_variance = {\r\n                # for fixedlarge, we set the initial 
(log-)variance like so\r\n                # to get a better decoder log likelihood.\r\n                ModelVarType.FIXED_LARGE: (\r\n                    np.append(self.posterior_variance[1], self.betas[1:]),\r\n                    np.log(np.append(self.posterior_variance[1], self.betas[1:])),\r\n                ),\r\n                ModelVarType.FIXED_SMALL: (\r\n                    self.posterior_variance,\r\n                    self.posterior_log_variance_clipped,\r\n                ),\r\n            }[self.model_var_type]\r\n            model_variance = _extract_into_tensor(model_variance, t, x.shape)\r\n            model_log_variance = _extract_into_tensor(model_log_variance, t, x.shape)\r\n\r\n        def process_xstart(x):\r\n            if denoised_fn is not None:\r\n                x = denoised_fn(x)\r\n            if clip_denoised:\r\n                return x.clamp(-1, 1)\r\n            return x\r\n\r\n        if self.model_mean_type == ModelMeanType.START_X:\r\n            pred_xstart = process_xstart(model_output)\r\n        else:\r\n            pred_xstart = process_xstart(\r\n                self._predict_xstart_from_eps(x_t=x, t=t, eps=model_output)\r\n            )\r\n        model_mean, _, _ = self.q_posterior_mean_variance(x_start=pred_xstart, x_t=x, t=t)\r\n\r\n        assert model_mean.shape == model_log_variance.shape == pred_xstart.shape == x.shape\r\n        return {\r\n            \"mean\": model_mean,\r\n            \"variance\": model_variance,\r\n            \"log_variance\": model_log_variance,\r\n            \"pred_xstart\": pred_xstart,\r\n            \"extra\": extra,\r\n        }\r\n\r\n    def _predict_xstart_from_eps(self, x_t, t, eps):\r\n        assert x_t.shape == eps.shape\r\n        return (\r\n            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t\r\n            - _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape) * eps\r\n        )\r\n\r\n    def _predict_eps_from_xstart(self, 
x_t, t, pred_xstart):\r\n        return (\r\n            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t - pred_xstart\r\n        ) / _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape)\r\n\r\n    def condition_mean(self, cond_fn, p_mean_var, x, t, model_kwargs=None):\r\n        \"\"\"\r\n        Compute the mean for the previous step, given a function cond_fn that\r\n        computes the gradient of a conditional log probability with respect to\r\n        x. In particular, cond_fn computes grad(log(p(y|x))), and we want to\r\n        condition on y.\r\n        This uses the conditioning strategy from Sohl-Dickstein et al. (2015).\r\n        \"\"\"\r\n        gradient = cond_fn(x, t, **model_kwargs)\r\n        new_mean = p_mean_var[\"mean\"].float() + p_mean_var[\"variance\"] * gradient.float()\r\n        return new_mean\r\n\r\n    def condition_score(self, cond_fn, p_mean_var, x, t, model_kwargs=None):\r\n        \"\"\"\r\n        Compute what the p_mean_variance output would have been, should the\r\n        model's score function be conditioned by cond_fn.\r\n        See condition_mean() for details on cond_fn.\r\n        Unlike condition_mean(), this instead uses the conditioning strategy\r\n        from Song et al (2020).\r\n        \"\"\"\r\n        alpha_bar = _extract_into_tensor(self.alphas_cumprod, t, x.shape)\r\n\r\n        eps = self._predict_eps_from_xstart(x, t, p_mean_var[\"pred_xstart\"])\r\n        eps = eps - (1 - alpha_bar).sqrt() * cond_fn(x, t, **model_kwargs)\r\n\r\n        out = p_mean_var.copy()\r\n        out[\"pred_xstart\"] = self._predict_xstart_from_eps(x, t, eps)\r\n        out[\"mean\"], _, _ = self.q_posterior_mean_variance(x_start=out[\"pred_xstart\"], x_t=x, t=t)\r\n        return out\r\n\r\n    def p_sample(\r\n        self,\r\n        model,\r\n        x,\r\n        t,\r\n        clip_denoised=True,\r\n        current=None,\r\n        cache_dic=None,\r\n        denoised_fn=None,\r\n        
cond_fn=None,\r\n        model_kwargs=None,\r\n    ):\r\n        \"\"\"\r\n        Sample x_{t-1} from the model at the given timestep.\r\n        :param model: the model to sample from.\r\n        :param x: the current tensor at x_{t-1}.\r\n        :param t: the value of t, starting at 0 for the first diffusion step.\r\n        :param clip_denoised: if True, clip the x_start prediction to [-1, 1].\r\n        :param denoised_fn: if not None, a function which applies to the\r\n            x_start prediction before it is used to sample.\r\n        :param cond_fn: if not None, this is a gradient function that acts\r\n                        similarly to the model.\r\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\r\n            pass to the model. This can be used for conditioning.\r\n        :return: a dict containing the following keys:\r\n                 - 'sample': a random sample from the model.\r\n                 - 'pred_xstart': a prediction of x_0.\r\n        \"\"\"\r\n        out = self.p_mean_variance(\r\n            model,\r\n            x,\r\n            t,\r\n            current=current,\r\n            cache_dic=cache_dic,\r\n            clip_denoised=clip_denoised,\r\n            denoised_fn=denoised_fn,\r\n            model_kwargs=model_kwargs,\r\n        )\r\n        noise = th.randn_like(x)\r\n        nonzero_mask = (\r\n            (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))\r\n        )  # no noise when t == 0\r\n        if cond_fn is not None:\r\n            out[\"mean\"] = self.condition_mean(cond_fn, out, x, t, model_kwargs=model_kwargs)\r\n        sample = out[\"mean\"] + nonzero_mask * th.exp(0.5 * out[\"log_variance\"]) * noise\r\n        return {\"sample\": sample, \"pred_xstart\": out[\"pred_xstart\"]}\r\n\r\n    def p_sample_loop(\r\n        self,\r\n        model,\r\n        shape,\r\n        noise=None,\r\n        clip_denoised=True,\r\n        denoised_fn=None,\r\n        cond_fn=None,\r\n      
  model_kwargs=None,\r\n        device=None,\r\n        progress=False,\r\n    ):\r\n        \"\"\"\r\n        Generate samples from the model.\r\n        :param model: the model module.\r\n        :param shape: the shape of the samples, (N, C, H, W).\r\n        :param noise: if specified, the noise from the encoder to sample.\r\n                      Should be of the same shape as `shape`.\r\n        :param clip_denoised: if True, clip x_start predictions to [-1, 1].\r\n        :param denoised_fn: if not None, a function which applies to the\r\n            x_start prediction before it is used to sample.\r\n        :param cond_fn: if not None, this is a gradient function that acts\r\n                        similarly to the model.\r\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\r\n            pass to the model. This can be used for conditioning.\r\n        :param device: if specified, the device to create the samples on.\r\n                       If not specified, use a model parameter's device.\r\n        :param progress: if True, show a tqdm progress bar.\r\n        :return: a non-differentiable batch of samples.\r\n        \"\"\"\r\n        final = None\r\n        for sample in self.p_sample_loop_progressive(\r\n            model,\r\n            shape,\r\n            noise=noise,\r\n            clip_denoised=clip_denoised,\r\n            denoised_fn=denoised_fn,\r\n            cond_fn=cond_fn,\r\n            model_kwargs=model_kwargs,\r\n            device=device,\r\n            progress=progress,\r\n        ):\r\n            final = sample\r\n        return final[\"sample\"]\r\n\r\n    def p_sample_loop_progressive(\r\n        self,\r\n        model,\r\n        shape,\r\n        noise=None,\r\n        clip_denoised=True,\r\n        denoised_fn=None,\r\n        cond_fn=None,\r\n        model_kwargs=None,\r\n        device=None,\r\n        progress=False,\r\n    ):\r\n        \"\"\"\r\n        Generate samples from the model and 
yield intermediate samples from\r\n        each timestep of diffusion.\r\n        Arguments are the same as p_sample_loop().\r\n        Returns a generator over dicts, where each dict is the return value of\r\n        p_sample().\r\n        \"\"\"\r\n        if device is None:\r\n            device = next(model.parameters()).device\r\n        assert isinstance(shape, (tuple, list))\r\n        if noise is not None:\r\n            img = noise\r\n        else:\r\n            img = th.randn(*shape, device=device)\r\n        indices = list(range(self.num_timesteps))[::-1]\r\n\r\n        if progress:\r\n            # Lazy import so that we don't depend on tqdm.\r\n            from tqdm.auto import tqdm\r\n\r\n            indices = tqdm(indices)\r\n\r\n        # Initialization for ToCa     \r\n        cache_dic, current = cache_init(model_kwargs=model_kwargs, num_steps=self.num_timesteps)\r\n\r\n        for i in indices:\r\n            t = th.tensor([i] * shape[0], device=device)\r\n            with th.no_grad():\r\n                current['step'] = i\r\n                out = self.p_sample(\r\n                    model,\r\n                    img,\r\n                    t,\r\n                    current=current,\r\n                    cache_dic=cache_dic,\r\n                    clip_denoised=clip_denoised,\r\n                    denoised_fn=denoised_fn,\r\n                    cond_fn=cond_fn,\r\n                    model_kwargs=model_kwargs,\r\n                )\r\n                yield out\r\n                img = out[\"sample\"]\r\n        \r\n        if cache_dic['test_FLOPs'] == True:\r\n            print(cache_dic['flops'] * 1e-12, \"TFLOPs\")\r\n\r\n    def ddim_sample(\r\n        self,\r\n        model,\r\n        x,\r\n        t,\r\n        current = None,\r\n        cache_dic = None,\r\n        clip_denoised=True,\r\n        denoised_fn=None,\r\n        cond_fn=None,\r\n        model_kwargs=None,\r\n        eta=0.0,\r\n    ):\r\n        \"\"\"\r\n        Sample 
x_{t-1} from the model using DDIM.\r\n        Same usage as p_sample().\r\n        \"\"\"\r\n        out = self.p_mean_variance(\r\n            model,\r\n            x,\r\n            t,\r\n            current=current,\r\n            cache_dic=cache_dic,\r\n            clip_denoised=clip_denoised,\r\n            denoised_fn=denoised_fn,\r\n            model_kwargs=model_kwargs,\r\n        )\r\n        if cond_fn is not None:\r\n            out = self.condition_score(cond_fn, out, x, t, model_kwargs=model_kwargs)\r\n\r\n        # Usually our model outputs epsilon, but we re-derive it\r\n        # in case we used x_start or x_prev prediction.\r\n        eps = self._predict_eps_from_xstart(x, t, out[\"pred_xstart\"])\r\n\r\n        alpha_bar = _extract_into_tensor(self.alphas_cumprod, t, x.shape)\r\n        alpha_bar_prev = _extract_into_tensor(self.alphas_cumprod_prev, t, x.shape)\r\n        sigma = (\r\n            eta\r\n            * th.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar))\r\n            * th.sqrt(1 - alpha_bar / alpha_bar_prev)\r\n        )\r\n        # Equation 12.\r\n        noise = th.randn_like(x)\r\n        mean_pred = (\r\n            out[\"pred_xstart\"] * th.sqrt(alpha_bar_prev)\r\n            + th.sqrt(1 - alpha_bar_prev - sigma ** 2) * eps\r\n        )\r\n        nonzero_mask = (\r\n            (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))\r\n        )  # no noise when t == 0\r\n        sample = mean_pred + nonzero_mask * sigma * noise\r\n        return {\"sample\": sample, \"pred_xstart\": out[\"pred_xstart\"]}\r\n\r\n    def ddim_reverse_sample(\r\n        self,\r\n        model,\r\n        x,\r\n        t,\r\n        clip_denoised=True,\r\n        denoised_fn=None,\r\n        cond_fn=None,\r\n        model_kwargs=None,\r\n        eta=0.0,\r\n    ):\r\n        \"\"\"\r\n        Sample x_{t+1} from the model using DDIM reverse ODE.\r\n        \"\"\"\r\n        assert eta == 0.0, \"Reverse ODE only for deterministic path\"\r\n        
out = self.p_mean_variance(\r\n            model,\r\n            x,\r\n            t,\r\n            clip_denoised=clip_denoised,\r\n            denoised_fn=denoised_fn,\r\n            model_kwargs=model_kwargs,\r\n        )\r\n        if cond_fn is not None:\r\n            out = self.condition_score(cond_fn, out, x, t, model_kwargs=model_kwargs)\r\n        # Usually our model outputs epsilon, but we re-derive it\r\n        # in case we used x_start or x_prev prediction.\r\n        eps = (\r\n            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x.shape) * x\r\n            - out[\"pred_xstart\"]\r\n        ) / _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x.shape)\r\n        alpha_bar_next = _extract_into_tensor(self.alphas_cumprod_next, t, x.shape)\r\n\r\n        # Equation 12. reversed\r\n        mean_pred = out[\"pred_xstart\"] * th.sqrt(alpha_bar_next) + th.sqrt(1 - alpha_bar_next) * eps\r\n\r\n        return {\"sample\": mean_pred, \"pred_xstart\": out[\"pred_xstart\"]}\r\n\r\n    def ddim_sample_loop(\r\n        self,\r\n        model,\r\n        shape,\r\n        noise=None,\r\n        clip_denoised=True,\r\n        denoised_fn=None,\r\n        cond_fn=None,\r\n        model_kwargs=None,\r\n        device=None,\r\n        progress=False,\r\n        eta=0.0,\r\n    ):\r\n        \"\"\"\r\n        Generate samples from the model using DDIM.\r\n        Same usage as p_sample_loop().\r\n        \"\"\"\r\n        final = None\r\n        for sample in self.ddim_sample_loop_progressive(\r\n            model,\r\n            shape,\r\n            noise=noise,\r\n            clip_denoised=clip_denoised,\r\n            denoised_fn=denoised_fn,\r\n            cond_fn=cond_fn,\r\n            model_kwargs=model_kwargs,\r\n            device=device,\r\n            progress=progress,\r\n            eta=eta,\r\n        ):\r\n            final = sample\r\n        return final[\"sample\"]\r\n\r\n    def ddim_sample_loop_progressive(\r\n        
self,\r\n        model,\r\n        shape,\r\n        noise=None,\r\n        clip_denoised=True,\r\n        denoised_fn=None,\r\n        cond_fn=None,\r\n        model_kwargs=None,\r\n        device=None,\r\n        progress=False,\r\n        eta=0.0,\r\n    ):\r\n        \"\"\"\r\n        Use DDIM to sample from the model and yield intermediate samples from\r\n        each timestep of DDIM.\r\n        Same usage as p_sample_loop_progressive().\r\n        \"\"\"\r\n        if device is None:\r\n            device = next(model.parameters()).device\r\n        assert isinstance(shape, (tuple, list))\r\n        if noise is not None:\r\n            img = noise\r\n        else:\r\n            img = th.randn(*shape, device=device)\r\n        indices = list(range(self.num_timesteps))[::-1]\r\n\r\n        if progress:\r\n            # Lazy import so that we don't depend on tqdm.\r\n            from tqdm.auto import tqdm\r\n\r\n            indices = tqdm(indices)\r\n\r\n        # Initialization for ToCa     \r\n        cache_dic, current = cache_init(model_kwargs=model_kwargs, num_steps=self.num_timesteps)\r\n\r\n        for i in indices:\r\n            t = th.tensor([i] * shape[0], device=device)\r\n            with th.no_grad():\r\n                current['step'] = i\r\n                out = self.ddim_sample(\r\n                    model,\r\n                    img,\r\n                    t,\r\n                    current=current,\r\n                    cache_dic=cache_dic,\r\n                    clip_denoised=clip_denoised,\r\n                    denoised_fn=denoised_fn,\r\n                    cond_fn=cond_fn,\r\n                    model_kwargs=model_kwargs,\r\n                    eta=eta,\r\n                )\r\n                yield out\r\n                img = out[\"sample\"]\r\n        if cache_dic['test_FLOPs'] == True:\r\n            print(cache_dic['flops'] * 1e-12, \"TFLOPs\")\r\n\r\n    def _vb_terms_bpd(\r\n            self, model, x_start, x_t, t, 
clip_denoised=True, model_kwargs=None\r\n    ):\r\n        \"\"\"\r\n        Get a term for the variational lower-bound.\r\n        The resulting units are bits (rather than nats, as one might expect).\r\n        This allows for comparison to other papers.\r\n        :return: a dict with the following keys:\r\n                 - 'output': a shape [N] tensor of NLLs or KLs.\r\n                 - 'pred_xstart': the x_0 predictions.\r\n        \"\"\"\r\n        true_mean, _, true_log_variance_clipped = self.q_posterior_mean_variance(\r\n            x_start=x_start, x_t=x_t, t=t\r\n        )\r\n        out = self.p_mean_variance(\r\n            model, x_t, t, clip_denoised=clip_denoised, model_kwargs=model_kwargs\r\n        )\r\n        kl = normal_kl(\r\n            true_mean, true_log_variance_clipped, out[\"mean\"], out[\"log_variance\"]\r\n        )\r\n        kl = mean_flat(kl) / np.log(2.0)\r\n\r\n        decoder_nll = -discretized_gaussian_log_likelihood(\r\n            x_start, means=out[\"mean\"], log_scales=0.5 * out[\"log_variance\"]\r\n        )\r\n        assert decoder_nll.shape == x_start.shape\r\n        decoder_nll = mean_flat(decoder_nll) / np.log(2.0)\r\n\r\n        # At the first timestep return the decoder NLL,\r\n        # otherwise return KL(q(x_{t-1}|x_t,x_0) || p(x_{t-1}|x_t))\r\n        output = th.where((t == 0), decoder_nll, kl)\r\n        return {\"output\": output, \"pred_xstart\": out[\"pred_xstart\"]}\r\n\r\n    def training_losses(self, model, x_start, t, model_kwargs=None, noise=None):\r\n        \"\"\"\r\n        Compute training losses for a single timestep.\r\n        :param model: the model to evaluate loss on.\r\n        :param x_start: the [N x C x ...] tensor of inputs.\r\n        :param t: a batch of timestep indices.\r\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\r\n            pass to the model. 
This can be used for conditioning.\r\n        :param noise: if specified, the specific Gaussian noise to try to remove.\r\n        :return: a dict with the key \"loss\" containing a tensor of shape [N].\r\n                 Some mean or variance settings may also have other keys.\r\n        \"\"\"\r\n        if model_kwargs is None:\r\n            model_kwargs = {}\r\n        if noise is None:\r\n            noise = th.randn_like(x_start)\r\n        x_t = self.q_sample(x_start, t, noise=noise)\r\n\r\n        terms = {}\r\n\r\n        if self.loss_type == LossType.KL or self.loss_type == LossType.RESCALED_KL:\r\n            terms[\"loss\"] = self._vb_terms_bpd(\r\n                model=model,\r\n                x_start=x_start,\r\n                x_t=x_t,\r\n                t=t,\r\n                clip_denoised=False,\r\n                model_kwargs=model_kwargs,\r\n            )[\"output\"]\r\n            if self.loss_type == LossType.RESCALED_KL:\r\n                terms[\"loss\"] *= self.num_timesteps\r\n        elif self.loss_type == LossType.MSE or self.loss_type == LossType.RESCALED_MSE:\r\n            model_output = model(x_t, t, **model_kwargs)\r\n\r\n            if self.model_var_type in [\r\n                ModelVarType.LEARNED,\r\n                ModelVarType.LEARNED_RANGE,\r\n            ]:\r\n                B, C = x_t.shape[:2]\r\n                assert model_output.shape == (B, C * 2, *x_t.shape[2:])\r\n                model_output, model_var_values = th.split(model_output, C, dim=1)\r\n                # Learn the variance using the variational bound, but don't let\r\n                # it affect our mean prediction.\r\n                frozen_out = th.cat([model_output.detach(), model_var_values], dim=1)\r\n                terms[\"vb\"] = self._vb_terms_bpd(\r\n                    model=lambda *args, r=frozen_out: r,\r\n                    x_start=x_start,\r\n                    x_t=x_t,\r\n                    t=t,\r\n                    
clip_denoised=False,\r\n                )[\"output\"]\r\n                if self.loss_type == LossType.RESCALED_MSE:\r\n                    # Divide by 1000 for equivalence with initial implementation.\r\n                    # Without a factor of 1/1000, the VB term hurts the MSE term.\r\n                    terms[\"vb\"] *= self.num_timesteps / 1000.0\r\n\r\n            target = {\r\n                ModelMeanType.PREVIOUS_X: self.q_posterior_mean_variance(\r\n                    x_start=x_start, x_t=x_t, t=t\r\n                )[0],\r\n                ModelMeanType.START_X: x_start,\r\n                ModelMeanType.EPSILON: noise,\r\n            }[self.model_mean_type]\r\n            assert model_output.shape == target.shape == x_start.shape\r\n            terms[\"mse\"] = mean_flat((target - model_output) ** 2)\r\n            if \"vb\" in terms:\r\n                terms[\"loss\"] = terms[\"mse\"] + terms[\"vb\"]\r\n            else:\r\n                terms[\"loss\"] = terms[\"mse\"]\r\n        else:\r\n            raise NotImplementedError(self.loss_type)\r\n\r\n        return terms\r\n\r\n    def _prior_bpd(self, x_start):\r\n        \"\"\"\r\n        Get the prior KL term for the variational lower-bound, measured in\r\n        bits-per-dim.\r\n        This term can't be optimized, as it only depends on the encoder.\r\n        :param x_start: the [N x C x ...] 
tensor of inputs.\r\n        :return: a batch of [N] KL values (in bits), one per batch element.\r\n        \"\"\"\r\n        batch_size = x_start.shape[0]\r\n        t = th.tensor([self.num_timesteps - 1] * batch_size, device=x_start.device)\r\n        qt_mean, _, qt_log_variance = self.q_mean_variance(x_start, t)\r\n        kl_prior = normal_kl(\r\n            mean1=qt_mean, logvar1=qt_log_variance, mean2=0.0, logvar2=0.0\r\n        )\r\n        return mean_flat(kl_prior) / np.log(2.0)\r\n\r\n    def calc_bpd_loop(self, model, x_start, clip_denoised=True, model_kwargs=None):\r\n        \"\"\"\r\n        Compute the entire variational lower-bound, measured in bits-per-dim,\r\n        as well as other related quantities.\r\n        :param model: the model to evaluate loss on.\r\n        :param x_start: the [N x C x ...] tensor of inputs.\r\n        :param clip_denoised: if True, clip denoised samples.\r\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\r\n            pass to the model. 
This can be used for conditioning.\r\n        :return: a dict containing the following keys:\r\n                 - total_bpd: the total variational lower-bound, per batch element.\r\n                 - prior_bpd: the prior term in the lower-bound.\r\n                 - vb: an [N x T] tensor of terms in the lower-bound.\r\n                 - xstart_mse: an [N x T] tensor of x_0 MSEs for each timestep.\r\n                 - mse: an [N x T] tensor of epsilon MSEs for each timestep.\r\n        \"\"\"\r\n        device = x_start.device\r\n        batch_size = x_start.shape[0]\r\n\r\n        vb = []\r\n        xstart_mse = []\r\n        mse = []\r\n        for t in list(range(self.num_timesteps))[::-1]:\r\n            t_batch = th.tensor([t] * batch_size, device=device)\r\n            noise = th.randn_like(x_start)\r\n            x_t = self.q_sample(x_start=x_start, t=t_batch, noise=noise)\r\n            # Calculate VLB term at the current timestep\r\n            with th.no_grad():\r\n                out = self._vb_terms_bpd(\r\n                    model,\r\n                    x_start=x_start,\r\n                    x_t=x_t,\r\n                    t=t_batch,\r\n                    clip_denoised=clip_denoised,\r\n                    model_kwargs=model_kwargs,\r\n                )\r\n            vb.append(out[\"output\"])\r\n            xstart_mse.append(mean_flat((out[\"pred_xstart\"] - x_start) ** 2))\r\n            eps = self._predict_eps_from_xstart(x_t, t_batch, out[\"pred_xstart\"])\r\n            mse.append(mean_flat((eps - noise) ** 2))\r\n\r\n        vb = th.stack(vb, dim=1)\r\n        xstart_mse = th.stack(xstart_mse, dim=1)\r\n        mse = th.stack(mse, dim=1)\r\n\r\n        prior_bpd = self._prior_bpd(x_start)\r\n        total_bpd = vb.sum(dim=1) + prior_bpd\r\n        return {\r\n            \"total_bpd\": total_bpd,\r\n            \"prior_bpd\": prior_bpd,\r\n            \"vb\": vb,\r\n            \"xstart_mse\": xstart_mse,\r\n            \"mse\": mse,\r\n 
       }\r\n\r\n\r\ndef _extract_into_tensor(arr, timesteps, broadcast_shape):\r\n    \"\"\"\r\n    Extract values from a 1-D numpy array for a batch of indices.\r\n    :param arr: the 1-D numpy array.\r\n    :param timesteps: a tensor of indices into the array to extract.\r\n    :param broadcast_shape: a larger shape of K dimensions with the batch\r\n                            dimension equal to the length of timesteps.\r\n    :return: a tensor of shape [batch_size, 1, ...] where the shape has K dims.\r\n    \"\"\"\r\n    res = th.from_numpy(arr).to(device=timesteps.device)[timesteps].float()\r\n    while len(res.shape) < len(broadcast_shape):\r\n        res = res[..., None]\r\n    return res + th.zeros(broadcast_shape, device=timesteps.device)\r\n"
  },
  {
    "path": "DiT-ToCa/diffusion/respace.py",
    "content": "# Modified from OpenAI's diffusion repos\r\n#     GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\r\n#     ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\r\n#     IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\r\n\r\nimport numpy as np\r\nimport torch as th\r\n\r\nfrom .gaussian_diffusion import GaussianDiffusion\r\n\r\n\r\ndef space_timesteps(num_timesteps, section_counts):\r\n    \"\"\"\r\n    Create a list of timesteps to use from an original diffusion process,\r\n    given the number of timesteps we want to take from equally-sized portions\r\n    of the original process.\r\n    For example, if there's 300 timesteps and the section counts are [10,15,20]\r\n    then the first 100 timesteps are strided to be 10 timesteps, the second 100\r\n    are strided to be 15 timesteps, and the final 100 are strided to be 20.\r\n    If the stride is a string starting with \"ddim\", then the fixed striding\r\n    from the DDIM paper is used, and only one section is allowed.\r\n    :param num_timesteps: the number of diffusion steps in the original\r\n                          process to divide up.\r\n    :param section_counts: either a list of numbers, or a string containing\r\n                           comma-separated numbers, indicating the step count\r\n                           per section. 
As a special case, use \"ddimN\" where N\r\n                           is a number of steps to use the striding from the\r\n                           DDIM paper.\r\n    :return: a set of diffusion steps from the original process to use.\r\n    \"\"\"\r\n    if isinstance(section_counts, str):\r\n        if section_counts.startswith(\"ddim\"):\r\n            desired_count = int(section_counts[len(\"ddim\") :])\r\n            for i in range(1, num_timesteps):\r\n                if len(range(0, num_timesteps, i)) == desired_count:\r\n                    return set(range(0, num_timesteps, i))\r\n            raise ValueError(\r\n                f\"cannot create exactly {desired_count} steps with an integer stride\"\r\n            )\r\n        section_counts = [int(x) for x in section_counts.split(\",\")]\r\n    size_per = num_timesteps // len(section_counts)\r\n    extra = num_timesteps % len(section_counts)\r\n    start_idx = 0\r\n    all_steps = []\r\n    for i, section_count in enumerate(section_counts):\r\n        size = size_per + (1 if i < extra else 0)\r\n        if size < section_count:\r\n            raise ValueError(\r\n                f\"cannot divide section of {size} steps into {section_count}\"\r\n            )\r\n        if section_count <= 1:\r\n            frac_stride = 1\r\n        else:\r\n            frac_stride = (size - 1) / (section_count - 1)\r\n        cur_idx = 0.0\r\n        taken_steps = []\r\n        for _ in range(section_count):\r\n            taken_steps.append(start_idx + round(cur_idx))\r\n            cur_idx += frac_stride\r\n        all_steps += taken_steps\r\n        start_idx += size\r\n    return set(all_steps)\r\n\r\n\r\nclass SpacedDiffusion(GaussianDiffusion):\r\n    \"\"\"\r\n    A diffusion process which can skip steps in a base diffusion process.\r\n    :param use_timesteps: a collection (sequence or set) of timesteps from the\r\n                          original diffusion process to retain.\r\n    :param kwargs: the kwargs 
to create the base diffusion process.\r\n    \"\"\"\r\n\r\n    def __init__(self, use_timesteps, **kwargs):\r\n        self.use_timesteps = set(use_timesteps)\r\n        self.timestep_map = []\r\n        self.original_num_steps = len(kwargs[\"betas\"])\r\n\r\n        base_diffusion = GaussianDiffusion(**kwargs)  # pylint: disable=missing-kwoa\r\n        last_alpha_cumprod = 1.0\r\n        new_betas = []\r\n        for i, alpha_cumprod in enumerate(base_diffusion.alphas_cumprod):\r\n            if i in self.use_timesteps:\r\n                new_betas.append(1 - alpha_cumprod / last_alpha_cumprod)\r\n                last_alpha_cumprod = alpha_cumprod\r\n                self.timestep_map.append(i)\r\n        kwargs[\"betas\"] = np.array(new_betas)\r\n        super().__init__(**kwargs)\r\n\r\n    def p_mean_variance(\r\n        self, model, *args, **kwargs\r\n    ):  # pylint: disable=signature-differs\r\n        return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)\r\n\r\n    def training_losses(\r\n        self, model, *args, **kwargs\r\n    ):  # pylint: disable=signature-differs\r\n        return super().training_losses(self._wrap_model(model), *args, **kwargs)\r\n\r\n    def condition_mean(self, cond_fn, *args, **kwargs):\r\n        return super().condition_mean(self._wrap_model(cond_fn), *args, **kwargs)\r\n\r\n    def condition_score(self, cond_fn, *args, **kwargs):\r\n        return super().condition_score(self._wrap_model(cond_fn), *args, **kwargs)\r\n\r\n    def _wrap_model(self, model):\r\n        if isinstance(model, _WrappedModel):\r\n            return model\r\n        return _WrappedModel(\r\n            model, self.timestep_map, self.original_num_steps\r\n        )\r\n\r\n    def _scale_timesteps(self, t):\r\n        # Scaling is done by the wrapped model.\r\n        return t\r\n\r\n\r\nclass _WrappedModel:\r\n    def __init__(self, model, timestep_map, original_num_steps):\r\n        self.model = model\r\n        self.timestep_map = 
timestep_map\r\n        # self.rescale_timesteps = rescale_timesteps\r\n        self.original_num_steps = original_num_steps\r\n\r\n    def __call__(self, x, ts, **kwargs):\r\n        map_tensor = th.tensor(self.timestep_map, device=ts.device, dtype=ts.dtype)\r\n        new_ts = map_tensor[ts]\r\n        # if self.rescale_timesteps:\r\n        #     new_ts = new_ts.float() * (1000.0 / self.original_num_steps)\r\n        return self.model(x, new_ts, **kwargs)\r\n"
  },
  {
    "path": "DiT-ToCa/diffusion/timestep_sampler.py",
    "content": "# Modified from OpenAI's diffusion repos\r\n#     GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\r\n#     ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\r\n#     IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\r\n\r\nfrom abc import ABC, abstractmethod\r\n\r\nimport numpy as np\r\nimport torch as th\r\nimport torch.distributed as dist\r\n\r\n\r\ndef create_named_schedule_sampler(name, diffusion):\r\n    \"\"\"\r\n    Create a ScheduleSampler from a library of pre-defined samplers.\r\n    :param name: the name of the sampler.\r\n    :param diffusion: the diffusion object to sample for.\r\n    \"\"\"\r\n    if name == \"uniform\":\r\n        return UniformSampler(diffusion)\r\n    elif name == \"loss-second-moment\":\r\n        return LossSecondMomentResampler(diffusion)\r\n    else:\r\n        raise NotImplementedError(f\"unknown schedule sampler: {name}\")\r\n\r\n\r\nclass ScheduleSampler(ABC):\r\n    \"\"\"\r\n    A distribution over timesteps in the diffusion process, intended to reduce\r\n    variance of the objective.\r\n    By default, samplers perform unbiased importance sampling, in which the\r\n    objective's mean is unchanged.\r\n    However, subclasses may override sample() to change how the resampled\r\n    terms are reweighted, allowing for actual changes in the objective.\r\n    \"\"\"\r\n\r\n    @abstractmethod\r\n    def weights(self):\r\n        \"\"\"\r\n        Get a numpy array of weights, one per diffusion step.\r\n        The weights needn't be normalized, but must be positive.\r\n        \"\"\"\r\n\r\n    def sample(self, batch_size, device):\r\n        \"\"\"\r\n        Importance-sample timesteps for a batch.\r\n        :param batch_size: the number of timesteps.\r\n        :param device: the torch device to save to.\r\n        :return: a tuple (timesteps, weights):\r\n                 - 
timesteps: a tensor of timestep indices.\r\n                 - weights: a tensor of weights to scale the resulting losses.\r\n        \"\"\"\r\n        w = self.weights()\r\n        p = w / np.sum(w)\r\n        indices_np = np.random.choice(len(p), size=(batch_size,), p=p)\r\n        indices = th.from_numpy(indices_np).long().to(device)\r\n        weights_np = 1 / (len(p) * p[indices_np])\r\n        weights = th.from_numpy(weights_np).float().to(device)\r\n        return indices, weights\r\n\r\n\r\nclass UniformSampler(ScheduleSampler):\r\n    def __init__(self, diffusion):\r\n        self.diffusion = diffusion\r\n        self._weights = np.ones([diffusion.num_timesteps])\r\n\r\n    def weights(self):\r\n        return self._weights\r\n\r\n\r\nclass LossAwareSampler(ScheduleSampler):\r\n    def update_with_local_losses(self, local_ts, local_losses):\r\n        \"\"\"\r\n        Update the reweighting using losses from a model.\r\n        Call this method from each rank with a batch of timesteps and the\r\n        corresponding losses for each of those timesteps.\r\n        This method will perform synchronization to make sure all of the ranks\r\n        maintain the exact same reweighting.\r\n        :param local_ts: an integer Tensor of timesteps.\r\n        :param local_losses: a 1D Tensor of losses.\r\n        \"\"\"\r\n        batch_sizes = [\r\n            th.tensor([0], dtype=th.int32, device=local_ts.device)\r\n            for _ in range(dist.get_world_size())\r\n        ]\r\n        dist.all_gather(\r\n            batch_sizes,\r\n            th.tensor([len(local_ts)], dtype=th.int32, device=local_ts.device),\r\n        )\r\n\r\n        # Pad all_gather batches to be the maximum batch size.\r\n        batch_sizes = [x.item() for x in batch_sizes]\r\n        max_bs = max(batch_sizes)\r\n\r\n        timestep_batches = [th.zeros(max_bs).to(local_ts) for bs in batch_sizes]\r\n        loss_batches = [th.zeros(max_bs).to(local_losses) for bs in batch_sizes]\r\n    
    dist.all_gather(timestep_batches, local_ts)\r\n        dist.all_gather(loss_batches, local_losses)\r\n        timesteps = [\r\n            x.item() for y, bs in zip(timestep_batches, batch_sizes) for x in y[:bs]\r\n        ]\r\n        losses = [x.item() for y, bs in zip(loss_batches, batch_sizes) for x in y[:bs]]\r\n        self.update_with_all_losses(timesteps, losses)\r\n\r\n    @abstractmethod\r\n    def update_with_all_losses(self, ts, losses):\r\n        \"\"\"\r\n        Update the reweighting using losses from a model.\r\n        Sub-classes should override this method to update the reweighting\r\n        using losses from the model.\r\n        This method directly updates the reweighting without synchronizing\r\n        between workers. It is called by update_with_local_losses from all\r\n        ranks with identical arguments. Thus, it should have deterministic\r\n        behavior to maintain state across workers.\r\n        :param ts: a list of int timesteps.\r\n        :param losses: a list of float losses, one per timestep.\r\n        \"\"\"\r\n\r\n\r\nclass LossSecondMomentResampler(LossAwareSampler):\r\n    def __init__(self, diffusion, history_per_term=10, uniform_prob=0.001):\r\n        self.diffusion = diffusion\r\n        self.history_per_term = history_per_term\r\n        self.uniform_prob = uniform_prob\r\n        self._loss_history = np.zeros(\r\n            [diffusion.num_timesteps, history_per_term], dtype=np.float64\r\n        )\r\n        self._loss_counts = np.zeros([diffusion.num_timesteps], dtype=np.int64)\r\n\r\n    def weights(self):\r\n        if not self._warmed_up():\r\n            return np.ones([self.diffusion.num_timesteps], dtype=np.float64)\r\n        weights = np.sqrt(np.mean(self._loss_history ** 2, axis=-1))\r\n        weights /= np.sum(weights)\r\n        weights *= 1 - self.uniform_prob\r\n        weights += self.uniform_prob / len(weights)\r\n        return weights\r\n\r\n    def update_with_all_losses(self, ts, 
losses):\r\n        for t, loss in zip(ts, losses):\r\n            if self._loss_counts[t] == self.history_per_term:\r\n                # Shift out the oldest loss term.\r\n                self._loss_history[t, :-1] = self._loss_history[t, 1:]\r\n                self._loss_history[t, -1] = loss\r\n            else:\r\n                self._loss_history[t, self._loss_counts[t]] = loss\r\n                self._loss_counts[t] += 1\r\n\r\n    def _warmed_up(self):\r\n        return (self._loss_counts == self.history_per_term).all()\r\n"
  },
  {
    "path": "DiT-ToCa/download.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\r\n# All rights reserved.\r\n\r\n# This source code is licensed under the license found in the\r\n# LICENSE file in the root directory of this source tree.\r\n\r\n\"\"\"\r\nFunctions for downloading pre-trained DiT models\r\n\"\"\"\r\nfrom torchvision.datasets.utils import download_url\r\nimport torch\r\nimport os\r\n\r\n\r\npretrained_models = {'DiT-XL-2-512x512.pt', 'DiT-XL-2-256x256.pt'}\r\n\r\n\r\ndef find_model(model_name):\r\n    \"\"\"\r\n    Finds a pre-trained DiT model, downloading it if necessary. Alternatively, loads a model from a local path.\r\n    \"\"\"\r\n    if model_name in pretrained_models:  # Find/download our pre-trained DiT checkpoints\r\n        return download_model(model_name)\r\n    else:  # Load a custom DiT checkpoint:\r\n        assert os.path.isfile(model_name), f'Could not find DiT checkpoint at {model_name}'\r\n        checkpoint = torch.load(model_name, map_location=lambda storage, loc: storage, weights_only=True)\r\n        if \"ema\" in checkpoint:  # supports checkpoints from train.py\r\n            checkpoint = checkpoint[\"ema\"]\r\n        return checkpoint\r\n\r\n\r\ndef download_model(model_name):\r\n    \"\"\"\r\n    Downloads a pre-trained DiT model from the web.\r\n    \"\"\"\r\n    assert model_name in pretrained_models\r\n    local_path = f'pretrained_models/{model_name}'\r\n    if not os.path.isfile(local_path):\r\n        os.makedirs('pretrained_models', exist_ok=True)\r\n        web_path = f'https://dl.fbaipublicfiles.com/DiT/models/{model_name}'\r\n        download_url(web_path, 'pretrained_models')\r\n    model = torch.load(local_path, map_location=lambda storage, loc: storage)\r\n    return model\r\n\r\n\r\nif __name__ == \"__main__\":\r\n    # Download all DiT checkpoints\r\n    for model in pretrained_models:\r\n        download_model(model)\r\n    print('Done.')\r\n"
  },
  {
    "path": "DiT-ToCa/environment-dit.yml",
    "content": "name: base\nchannels:\n  - pytorch\n  - nvidia\n  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main\n  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/\n  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/\n  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/\n  - defaults\ndependencies:\n  - _libgcc_mutex=0.1=main\n  - _openmp_mutex=5.1=1_gnu\n  - aiohttp=3.9.5=py312h5eee18b_0\n  - aiosignal=1.2.0=pyhd3eb1b0_0\n  - anaconda-anon-usage=0.4.4=py312hfc0e8ea_100\n  - archspec=0.2.3=pyhd3eb1b0_0\n  - arrow-cpp=16.1.0=hc1eb8f0_0\n  - aws-c-auth=0.6.19=h5eee18b_0\n  - aws-c-cal=0.5.20=hdbd6064_0\n  - aws-c-common=0.8.5=h5eee18b_0\n  - aws-c-compression=0.2.16=h5eee18b_0\n  - aws-c-event-stream=0.2.15=h6a678d5_0\n  - aws-c-http=0.6.25=h5eee18b_0\n  - aws-c-io=0.13.10=h5eee18b_0\n  - aws-c-mqtt=0.7.13=h5eee18b_0\n  - aws-c-s3=0.1.51=hdbd6064_0\n  - aws-c-sdkutils=0.1.6=h5eee18b_0\n  - aws-checksums=0.1.13=h5eee18b_0\n  - aws-crt-cpp=0.18.16=h6a678d5_0\n  - aws-sdk-cpp=1.10.55=h721c034_0\n  - blas=1.0=mkl\n  - boltons=23.0.0=py312h06a4308_0\n  - boost-cpp=1.82.0=hdb19cb5_2\n  - bottleneck=1.3.7=py312ha883a20_0\n  - brotli-python=1.0.9=py312h6a678d5_8\n  - bzip2=1.0.8=h5eee18b_6\n  - c-ares=1.19.1=h5eee18b_0\n  - ca-certificates=2024.7.2=h06a4308_0\n  - certifi=2024.7.4=py312h06a4308_0\n  - cffi=1.16.0=py312h5eee18b_1\n  - charset-normalizer=2.0.4=pyhd3eb1b0_0\n  - conda=24.7.1=py312h06a4308_0\n  - conda-content-trust=0.2.0=py312h06a4308_1\n  - conda-libmamba-solver=24.1.0=pyhd3eb1b0_0\n  - conda-package-handling=2.2.0=py312h06a4308_1\n  - conda-package-streaming=0.9.0=py312h06a4308_0\n  - cryptography=42.0.5=py312hdda0065_1\n  - cuda-cudart=12.1.105=0\n  - cuda-cupti=12.1.105=0\n  - cuda-libraries=12.1.0=0\n  - cuda-nvrtc=12.1.105=0\n  - cuda-nvtx=12.1.105=0\n  - cuda-opencl=12.6.37=0\n  - cuda-runtime=12.1.0=0\n  - cuda-version=12.6=3\n  - datasets=2.19.1=py312h06a4308_0\n  - diffusers=0.18.2=py312he106c6f_0\n  - 
diffusers-base=0.18.2=py312he106c6f_0\n  - diffusers-torch=0.18.2=py312he106c6f_0\n  - dill=0.3.8=py312h06a4308_0\n  - distro=1.9.0=py312h06a4308_0\n  - expat=2.6.2=h6a678d5_0\n  - ffmpeg=4.3=hf484d3e_0\n  - fmt=9.1.0=hdb19cb5_1\n  - freetype=2.12.1=h4a9f257_0\n  - frozendict=2.4.2=py312h06a4308_0\n  - frozenlist=1.4.0=py312h5eee18b_0\n  - gflags=2.2.2=h6a678d5_1\n  - glog=0.5.0=h6a678d5_1\n  - gmp=6.2.1=h295c915_3\n  - gnutls=3.6.15=he1e5248_0\n  - huggingface_accelerate=0.21.0=py312h06a4308_0\n  - huggingface_hub=0.23.1=py312h06a4308_0\n  - icu=73.1=h6a678d5_0\n  - idna=3.7=py312h06a4308_0\n  - importlib-metadata=7.0.1=py312h06a4308_0\n  - intel-openmp=2023.1.0=hdb19cb5_46306\n  - jinja2=3.1.4=py312h06a4308_0\n  - jpeg=9e=h5eee18b_3\n  - jsonpatch=1.33=py312h06a4308_1\n  - jsonpointer=2.1=pyhd3eb1b0_0\n  - krb5=1.20.1=h143b758_1\n  - lame=3.100=h7b6447c_0\n  - lcms2=2.12=h3be6417_0\n  - ld_impl_linux-64=2.38=h1181459_1\n  - lerc=3.0=h295c915_0\n  - libabseil=20240116.2=cxx17_h6a678d5_0\n  - libarchive=3.6.2=h6ac8c49_3\n  - libboost=1.82.0=h109eef0_2\n  - libbrotlicommon=1.0.9=h5eee18b_8\n  - libbrotlidec=1.0.9=h5eee18b_8\n  - libbrotlienc=1.0.9=h5eee18b_8\n  - libcublas=12.1.0.26=0\n  - libcufft=11.0.2.4=0\n  - libcufile=1.11.0.15=0\n  - libcurand=10.3.7.37=0\n  - libcurl=8.7.1=h251f7ec_0\n  - libcusolver=11.4.4.55=0\n  - libcusparse=12.0.2.55=0\n  - libdeflate=1.17=h5eee18b_1\n  - libedit=3.1.20230828=h5eee18b_0\n  - libev=4.33=h7f8727e_1\n  - libevent=2.1.12=hdbd6064_1\n  - libffi=3.4.4=h6a678d5_1\n  - libgcc-ng=11.2.0=h1234567_1\n  - libgomp=11.2.0=h1234567_1\n  - libgrpc=1.62.2=h2d74bed_0\n  - libiconv=1.16=h5eee18b_3\n  - libidn2=2.3.4=h5eee18b_0\n  - libjpeg-turbo=2.0.0=h9bf148f_0\n  - libmamba=1.5.8=hfe524e5_2\n  - libmambapy=1.5.8=py312h2dafd23_2\n  - libnghttp2=1.57.0=h2d74bed_0\n  - libnpp=12.0.2.50=0\n  - libnvjitlink=12.1.105=0\n  - libnvjpeg=12.1.1.14=0\n  - libpng=1.6.39=h5eee18b_0\n  - libprotobuf=4.25.3=he621ea3_0\n  - libsolv=0.7.24=he621ea3_1\n  
- libssh2=1.11.0=h251f7ec_0\n  - libstdcxx-ng=11.2.0=h1234567_1\n  - libtasn1=4.19.0=h5eee18b_0\n  - libthrift=0.15.0=h1795dd8_2\n  - libtiff=4.5.1=h6a678d5_0\n  - libunistring=0.9.10=h27cfd23_0\n  - libuuid=1.41.5=h5eee18b_0\n  - libwebp-base=1.3.2=h5eee18b_0\n  - libxml2=2.10.4=hfdd30dd_2\n  - llvm-openmp=14.0.6=h9e868ea_0\n  - lz4-c=1.9.4=h6a678d5_1\n  - menuinst=2.0.2=py312h06a4308_1\n  - mkl=2023.1.0=h213fc3f_46344\n  - mkl-service=2.4.0=py312h5eee18b_1\n  - mkl_fft=1.3.8=py312h5eee18b_0\n  - mkl_random=1.2.4=py312hdb19cb5_0\n  - mpmath=1.3.0=py312h06a4308_0\n  - multidict=6.0.4=py312h5eee18b_0\n  - multiprocess=0.70.15=py312h06a4308_0\n  - ncurses=6.4=h6a678d5_0\n  - nettle=3.7.3=hbbd107a_1\n  - networkx=3.3=py312h06a4308_0\n  - numexpr=2.8.7=py312hf827012_0\n  - numpy=1.26.4=py312hc5e2394_0\n  - numpy-base=1.26.4=py312h0da6c21_0\n  - openh264=2.1.1=h4ff587b_0\n  - openjpeg=2.5.2=he7f1fd0_0\n  - openssl=3.0.14=h5eee18b_0\n  - orc=2.0.1=h2d29ad5_0\n  - packaging=23.2=py312h06a4308_0\n  - pandas=2.2.2=py312h526ad5a_0\n  - pcre2=10.42=hebb0a14_1\n  - pip=24.0=py312h06a4308_0\n  - platformdirs=3.10.0=py312h06a4308_0\n  - pluggy=1.0.0=py312h06a4308_1\n  - pyarrow=16.1.0=py312h526ad5a_0\n  - pybind11-abi=5=hd3eb1b0_0\n  - pycosat=0.6.6=py312h5eee18b_1\n  - pycparser=2.21=pyhd3eb1b0_0\n  - pysocks=1.7.1=py312h06a4308_0\n  - python=3.12.3=h996f2a0_1\n  - python-dateutil=2.9.0post0=py312h06a4308_2\n  - python-tzdata=2023.3=pyhd3eb1b0_0\n  - python-xxhash=2.0.2=py312h5eee18b_1\n  - pytorch=2.4.0=py3.12_cuda12.1_cudnn9.1.0_0\n  - pytorch-cuda=12.1=ha16c6d3_5\n  - pytorch-mutex=1.0=cuda\n  - pytz=2024.1=py312h06a4308_0\n  - pyyaml=6.0.1=py312h5eee18b_0\n  - re2=2022.04.01=h295c915_0\n  - readline=8.2=h5eee18b_0\n  - regex=2024.7.24=py312h5eee18b_0\n  - reproc=14.2.4=h6a678d5_2\n  - reproc-cpp=14.2.4=h6a678d5_2\n  - requests=2.31.0=py312h06a4308_1\n  - ruamel.yaml=0.17.21=py312h5eee18b_0\n  - s2n=1.3.27=hdbd6064_0\n  - safetensors=0.4.2=py312hb7cc22b_1\n  - 
setuptools=69.5.1=py312h06a4308_0\n  - six=1.16.0=pyhd3eb1b0_1\n  - snappy=1.2.1=h6a678d5_0\n  - sqlite=3.45.3=h5eee18b_0\n  - tbb=2021.8.0=hdb19cb5_0\n  - tk=8.6.14=h39e8969_0\n  - tokenizers=0.19.1=py312ha11519a_0\n  - torchaudio=2.4.0=py312_cu121\n  - torchtriton=3.0.0=py312\n  - tqdm=4.66.2=py312he106c6f_0\n  - transformers=4.41.2=py312h06a4308_0\n  - truststore=0.8.0=py312h06a4308_0\n  - typing_extensions=4.11.0=py312h06a4308_0\n  - tzdata=2024a=h04d1e81_0\n  - urllib3=2.1.0=py312h06a4308_1\n  - utf8proc=2.6.1=h5eee18b_1\n  - wheel=0.43.0=py312h06a4308_0\n  - xxhash=0.8.0=h7f8727e_3\n  - xz=5.4.6=h5eee18b_1\n  - yaml=0.2.5=h7b6447c_0\n  - yaml-cpp=0.8.0=h6a678d5_1\n  - yarl=1.9.3=py312h5eee18b_0\n  - zipp=3.17.0=py312h06a4308_0\n  - zlib=1.2.13=h5eee18b_1\n  - zstandard=0.22.0=py312h2c38b39_0\n  - zstd=1.5.5=hc292b87_2\n  - pip:\n      - absl-py==2.1.0\n      - anyio==4.4.0\n      - argon2-cffi==23.1.0\n      - argon2-cffi-bindings==21.2.0\n      - arrow==1.3.0\n      - asttokens==2.4.1\n      - async-lru==2.0.4\n      - attrs==23.2.0\n      - babel==2.15.0\n      - beautifulsoup4==4.12.3\n      - bleach==6.1.0\n      - brokenaxes==0.6.2\n      - comm==0.2.2\n      - contourpy==1.2.1\n      - cycler==0.12.1\n      - debugpy==1.8.1\n      - decorator==5.1.1\n      - defusedxml==0.7.1\n      - executing==2.0.1\n      - fastjsonschema==2.19.1\n      - filelock==3.14.0\n      - fonttools==4.53.0\n      - fqdn==1.5.1\n      - fsspec==2024.5.0\n      - grpcio==1.64.0\n      - h11==0.14.0\n      - httpcore==1.0.5\n      - httpx==0.27.0\n      - ipykernel==6.29.4\n      - ipython==8.25.0\n      - ipywidgets==8.1.3\n      - isoduration==20.11.0\n      - jedi==0.19.1\n      - json5==0.9.25\n      - jsonschema==4.22.0\n      - jsonschema-specifications==2023.12.1\n      - jupyter-client==8.6.2\n      - jupyter-core==5.7.2\n      - jupyter-events==0.10.0\n      - jupyter-lsp==2.2.5\n      - jupyter-server==2.14.1\n      - jupyter-server-terminals==0.5.3\n      - 
jupyterlab==4.2.1\n      - jupyterlab-language-pack-zh-cn==4.2.post1\n      - jupyterlab-pygments==0.3.0\n      - jupyterlab-server==2.27.2\n      - jupyterlab-widgets==3.0.11\n      - kiwisolver==1.4.5\n      - markdown==3.6\n      - markupsafe==2.1.5\n      - matplotlib==3.9.0\n      - matplotlib-inline==0.1.7\n      - mistune==3.0.2\n      - nbclient==0.10.0\n      - nbconvert==7.16.4\n      - nbformat==5.10.4\n      - nest-asyncio==1.6.0\n      - notebook-shim==0.2.4\n      - nvidia-cublas-cu12==12.1.3.1\n      - nvidia-cuda-cupti-cu12==12.1.105\n      - nvidia-cuda-nvrtc-cu12==12.1.105\n      - nvidia-cuda-runtime-cu12==12.1.105\n      - nvidia-cudnn-cu12==9.1.0.70\n      - nvidia-cufft-cu12==11.0.2.54\n      - nvidia-curand-cu12==10.3.2.106\n      - nvidia-cusolver-cu12==11.4.5.107\n      - nvidia-cusparse-cu12==12.1.0.106\n      - nvidia-nccl-cu12==2.20.5\n      - nvidia-nvjitlink-cu12==12.5.40\n      - nvidia-nvtx-cu12==12.1.105\n      - overrides==7.7.0\n      - pandocfilters==1.5.1\n      - parso==0.8.4\n      - pexpect==4.9.0\n      - pillow==10.3.0\n      - prometheus-client==0.20.0\n      - prompt-toolkit==3.0.45\n      - protobuf==5.27.0\n      - psutil==5.9.8\n      - ptyprocess==0.7.0\n      - pure-eval==0.2.2\n      - pygments==2.18.0\n      - pyparsing==3.1.2\n      - python-json-logger==2.0.7\n      - pytorch-fid==0.3.0\n      - pyzmq==26.0.3\n      - referencing==0.35.1\n      - rfc3339-validator==0.1.4\n      - rfc3986-validator==0.1.1\n      - rpds-py==0.18.1\n      - scipy==1.14.1\n      - send2trash==1.8.3\n      - sniffio==1.3.1\n      - soupsieve==2.5\n      - stack-data==0.6.3\n      - supervisor==4.2.5\n      - sympy==1.12.1\n      - tensorboard==2.16.2\n      - tensorboard-data-server==0.7.2\n      - terminado==0.18.1\n      - timm==1.0.8\n      - tinycss2==1.3.0\n      - torch==2.4.0\n      - torchvision==0.19.0\n      - tornado==6.4\n      - traitlets==5.14.3\n      - triton==3.0.0\n      - types-python-dateutil==2.9.0.20240316\n      
- typing-extensions==4.12.1\n      - uri-template==1.3.0\n      - wcwidth==0.2.13\n      - webcolors==1.13\n      - webencodings==0.5.1\n      - websocket-client==1.8.0\n      - werkzeug==3.0.3\n      - widgetsnbextension==4.0.11\nprefix: /root/miniconda3\n"
  },
  {
    "path": "DiT-ToCa/models.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\r\n# All rights reserved.\r\n\r\n# This source code is licensed under the license found in the\r\n# LICENSE file in the root directory of this source tree.\r\n# --------------------------------------------------------\r\n# References:\r\n# GLIDE: https://github.com/openai/glide-text2im\r\n# MAE: https://github.com/facebookresearch/mae/blob/main/models_mae.py\r\n# --------------------------------------------------------\r\n\r\nimport torch\r\nimport torch.nn as nn\r\nimport numpy as np\r\nimport math\r\n#from timm.models.vision_transformer import PatchEmbed, Attention, Mlp\r\nfrom timm.models.vision_transformer import PatchEmbed, Mlp\r\n#import os.path as osp\r\nfrom cache_functions import global_force_fresh, cache_cutfresh, update_cache, force_init, Attention, cal_type\r\n\r\n\r\ndef modulate(x, shift, scale):\r\n    return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)\r\n\r\n\r\n#################################################################################\r\n#               Embedding Layers for Timesteps and Class Labels                 #\r\n#################################################################################\r\n\r\nclass TimestepEmbedder(nn.Module):\r\n    \"\"\"\r\n    Embeds scalar timesteps into vector representations.\r\n    \"\"\"\r\n    def __init__(self, hidden_size, frequency_embedding_size=256):\r\n        super().__init__()\r\n        self.mlp = nn.Sequential(\r\n            nn.Linear(frequency_embedding_size, hidden_size, bias=True),\r\n            nn.SiLU(),\r\n            nn.Linear(hidden_size, hidden_size, bias=True),\r\n        )\r\n        self.frequency_embedding_size = frequency_embedding_size\r\n\r\n    @staticmethod\r\n    def timestep_embedding(t, dim, max_period=10000):\r\n        \"\"\"\r\n        Create sinusoidal timestep embeddings.\r\n        :param t: a 1-D Tensor of N indices, one per batch element.\r\n                          These may be 
fractional.\r\n        :param dim: the dimension of the output.\r\n        :param max_period: controls the minimum frequency of the embeddings.\r\n        :return: an (N, D) Tensor of positional embeddings.\r\n        \"\"\"\r\n        # https://github.com/openai/glide-text2im/blob/main/glide_text2im/nn.py\r\n        half = dim // 2\r\n        freqs = torch.exp(\r\n            -math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half\r\n        ).to(device=t.device)\r\n        args = t[:, None].float() * freqs[None]\r\n        embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)\r\n        if dim % 2:\r\n            embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)\r\n        return embedding\r\n\r\n    def forward(self, t):\r\n        t_freq = self.timestep_embedding(t, self.frequency_embedding_size)\r\n        t_emb = self.mlp(t_freq)\r\n        return t_emb\r\n\r\n\r\nclass LabelEmbedder(nn.Module):\r\n    \"\"\"\r\n    Embeds class labels into vector representations. 
Also handles label dropout for classifier-free guidance.\r\n    \"\"\"\r\n    def __init__(self, num_classes, hidden_size, dropout_prob):\r\n        super().__init__()\r\n        use_cfg_embedding = dropout_prob > 0\r\n        self.embedding_table = nn.Embedding(num_classes + use_cfg_embedding, hidden_size)\r\n        self.num_classes = num_classes\r\n        self.dropout_prob = dropout_prob\r\n\r\n    def token_drop(self, labels, force_drop_ids=None):\r\n        \"\"\"\r\n        Drops labels to enable classifier-free guidance.\r\n        \"\"\"\r\n        if force_drop_ids is None:\r\n            drop_ids = torch.rand(labels.shape[0], device=labels.device) < self.dropout_prob\r\n        else:\r\n            drop_ids = force_drop_ids == 1\r\n        labels = torch.where(drop_ids, self.num_classes, labels)\r\n        return labels\r\n\r\n    def forward(self, labels, train, force_drop_ids=None):\r\n        use_dropout = self.dropout_prob > 0\r\n        if (train and use_dropout) or (force_drop_ids is not None):\r\n            labels = self.token_drop(labels, force_drop_ids)\r\n        embeddings = self.embedding_table(labels)\r\n        return embeddings\r\n\r\n\r\n\r\n#################################################################################\r\n#                                 Core DiT Model                                #\r\n#################################################################################\r\n\r\nclass DiTBlock(nn.Module):\r\n    \"\"\"\r\n    A DiT block with adaptive layer norm zero (adaLN-Zero) conditioning.\r\n    \"\"\"\r\n    def __init__(self, hidden_size, num_heads, mlp_ratio=4.0, **block_kwargs):\r\n        super().__init__()\r\n        self.norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\r\n        self.attn = Attention(hidden_size, num_heads=num_heads, qkv_bias=True, **block_kwargs)\r\n        self.norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\r\n        mlp_hidden_dim = 
int(hidden_size * mlp_ratio)\r\n        approx_gelu = lambda: nn.GELU(approximate=\"tanh\")\r\n        self.mlp = Mlp(in_features=hidden_size, hidden_features=mlp_hidden_dim, act_layer=approx_gelu, drop=0)\r\n        self.adaLN_modulation = nn.Sequential(\r\n            nn.SiLU(),\r\n            nn.Linear(hidden_size, 6 * hidden_size, bias=True)\r\n        )\r\n\r\n    def forward(self, x, c, current, cache_dic):\r\n        B, N, C = x.shape\r\n\r\n        layer = current['layer']\r\n\r\n        # FLOPs calculation initialization\r\n        flops = 0\r\n        test_FLOPs = cache_dic.get('test_FLOPs', False)  # check if test_FLOPs is enabled\r\n        \r\n        # determine current working status\r\n        cal_type(cache_dic, current)\r\n\r\n        if current['type'] == 'full':  # Force Activation: Compute all tokens and save them in cache\r\n\r\n            # AdaLN Modulation\r\n            shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = self.adaLN_modulation(c).chunk(6, dim=1)\r\n\r\n            # LayerNorm FLOPs (for both norm1 and norm2)\r\n            if test_FLOPs:\r\n                flops += 2 * B * N * C\r\n\r\n            # AdaLN FLOPs (SiLU and Linear)\r\n            if test_FLOPs:\r\n                flops += B * C  # SiLU FLOPs\r\n                flops += B * C * 6 * C  # Linear FLOPs in adaLN_modulation\r\n\r\n            current['module'] = 'attn'\r\n            attn_output, attn_map = self.attn(modulate(self.norm1(x), shift_msa, scale_msa), cache_dic=cache_dic, current=current)\r\n            cache_dic['cache'][-1][layer]['attn'] = attn_output\r\n            cache_dic['attn_map'][-1][layer] = attn_map\r\n            force_init(cache_dic, current, x)\r\n            x = x + gate_msa.unsqueeze(1) * attn_output\r\n\r\n            current['module'] = 'mlp'\r\n            mlp_output = self.mlp(modulate(self.norm2(x), shift_mlp, scale_mlp))\r\n            cache_dic['cache'][-1][layer]['mlp'] = mlp_output\r\n            
force_init(cache_dic, current, x)\r\n            x = x + gate_mlp.unsqueeze(1) * mlp_output\r\n\r\n            # MLP FLOPs\r\n            if test_FLOPs:\r\n                mlp_hidden_dim = int(C * 4)  # Assuming mlp_ratio = 4\r\n                flops += B * N * C * mlp_hidden_dim * 2 # First projection\r\n                flops += B * N * mlp_hidden_dim * C * 2# Second projection\r\n                flops += B * N * mlp_hidden_dim * 6 # GELU activation\r\n\r\n        elif current['type'] == 'ToCa':  # Partial Computation: Compute only fresh tokens and save them in cache, no attention token computation in the final version\r\n            \r\n            # AdaLN Modulation\r\n            shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = self.adaLN_modulation(c).chunk(6, dim=1)\r\n            \r\n            # LayerNorm FLOPs (for both norm1 and norm2)\r\n            if test_FLOPs:\r\n                flops += 2 * B * N * C\r\n\r\n            # AdaLN FLOPs (SiLU and Linear)\r\n            if test_FLOPs:\r\n                flops += B * C  # SiLU FLOPs\r\n                flops += B * C * 6 * C  # Linear FLOPs in adaLN_modulation\r\n\r\n            current['module'] = 'attn'\r\n            x = x + gate_msa.unsqueeze(1) * cache_dic['cache'][-1][layer]['attn']\r\n\r\n            current['module'] = 'mlp'\r\n            fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x, current)\r\n            fresh_tokens = self.mlp(modulate(self.norm2(fresh_tokens), shift_mlp, scale_mlp))\r\n            update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current)\r\n            \r\n            x = x + gate_mlp.unsqueeze(1) * cache_dic['cache'][-1][layer]['mlp']\r\n            \r\n\r\n            # MLP FLOPs for the 'else' branch\r\n            if test_FLOPs:\r\n                B_fresh, N_fresh, C_fresh = fresh_tokens.shape\r\n                mlp_hidden_dim = int(C_fresh * 4)  # Assuming mlp_ratio = 4\r\n                flops += B_fresh * 
N_fresh * C_fresh * mlp_hidden_dim * 2 # First projection\r\n                flops += B_fresh * N_fresh * mlp_hidden_dim * C_fresh * 2 # Second projection\r\n                flops += B_fresh * N_fresh * mlp_hidden_dim * 6 # GELU activation\r\n\r\n        elif current['type'] == 'FORA':\r\n            \r\n            # AdaLN Modulation\r\n            shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = self.adaLN_modulation(c).chunk(6, dim=1)\r\n            \r\n            # AdaLN FLOPs (SiLU and Linear)\r\n            if test_FLOPs:\r\n                flops += B * C  # SiLU FLOPs\r\n                flops += B * C * 6 * C  # Linear FLOPs in adaLN_modulation\r\n\r\n            current['module'] = 'attn'\r\n            x = x + gate_msa.unsqueeze(1) * cache_dic['cache'][-1][layer]['attn']\r\n\r\n            current['module'] = 'mlp'\r\n            x = x + gate_mlp.unsqueeze(1) * cache_dic['cache'][-1][layer]['mlp']\r\n        \r\n        else:\r\n            current['module'] = 'skipped'\r\n            if current['layer'] == 27:\r\n                x = cache_dic['cache'][-1]['noise']\r\n\r\n        cache_dic['flops'] += flops\r\n\r\n        if current['layer'] == 27:\r\n            cache_dic['cache'][-1]['noise'] = x\r\n\r\n        return x\r\n\r\n\r\nclass FinalLayer(nn.Module):\r\n    \"\"\"\r\n    The final layer of DiT.\r\n    \"\"\"\r\n    def __init__(self, hidden_size, patch_size, out_channels):\r\n        super().__init__()\r\n        self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\r\n        self.linear = nn.Linear(hidden_size, patch_size * patch_size * out_channels, bias=True)\r\n        self.adaLN_modulation = nn.Sequential(\r\n            nn.SiLU(),\r\n            nn.Linear(hidden_size, 2 * hidden_size, bias=True)\r\n        )\r\n\r\n    def forward(self, x, c):\r\n        shift, scale = self.adaLN_modulation(c).chunk(2, dim=1)\r\n        x = modulate(self.norm_final(x), shift, scale)\r\n        x = 
self.linear(x)\r\n        return x\r\n\r\n\r\nclass DiT(nn.Module):\r\n    \"\"\"\r\n    Diffusion model with a Transformer backbone.\r\n    \"\"\"\r\n    def __init__(\r\n        self,\r\n        input_size=32,\r\n        patch_size=2,\r\n        in_channels=4,\r\n        hidden_size=1152,\r\n        depth=28,\r\n        num_heads=16,\r\n        mlp_ratio=4.0,\r\n        class_dropout_prob=0.1,\r\n        num_classes=1000,\r\n        learn_sigma=True,\r\n    ):\r\n        super().__init__()\r\n        self.learn_sigma = learn_sigma\r\n        self.in_channels = in_channels\r\n        self.out_channels = in_channels * 2 if learn_sigma else in_channels\r\n        self.patch_size = patch_size\r\n        self.num_heads = num_heads\r\n\r\n        self.x_embedder = PatchEmbed(input_size, patch_size, in_channels, hidden_size, bias=True)\r\n        self.t_embedder = TimestepEmbedder(hidden_size)\r\n        self.y_embedder = LabelEmbedder(num_classes, hidden_size, class_dropout_prob)\r\n        num_patches = self.x_embedder.num_patches\r\n        # Will use fixed sin-cos embedding:\r\n        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, hidden_size), requires_grad=False)\r\n\r\n        self.blocks = nn.ModuleList([\r\n            DiTBlock(hidden_size, num_heads, mlp_ratio=mlp_ratio) for _ in range(depth)\r\n        ])\r\n        self.final_layer = FinalLayer(hidden_size, patch_size, self.out_channels)\r\n        self.initialize_weights()\r\n\r\n    def initialize_weights(self):\r\n        # Initialize transformer layers:\r\n        def _basic_init(module):\r\n            if isinstance(module, nn.Linear):\r\n                torch.nn.init.xavier_uniform_(module.weight)\r\n                if module.bias is not None:\r\n                    nn.init.constant_(module.bias, 0)\r\n        self.apply(_basic_init)\r\n\r\n        # Initialize (and freeze) pos_embed by sin-cos embedding:\r\n        pos_embed = get_2d_sincos_pos_embed(self.pos_embed.shape[-1], 
int(self.x_embedder.num_patches ** 0.5))\r\n        self.pos_embed.data.copy_(torch.from_numpy(pos_embed).float().unsqueeze(0))\r\n\r\n        # Initialize patch_embed like nn.Linear (instead of nn.Conv2d):\r\n        w = self.x_embedder.proj.weight.data\r\n        nn.init.xavier_uniform_(w.view([w.shape[0], -1]))\r\n        nn.init.constant_(self.x_embedder.proj.bias, 0)\r\n\r\n        # Initialize label embedding table:\r\n        nn.init.normal_(self.y_embedder.embedding_table.weight, std=0.02)\r\n\r\n        # Initialize timestep embedding MLP:\r\n        nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)\r\n        nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)\r\n\r\n        # Zero-out adaLN modulation layers in DiT blocks:\r\n        for block in self.blocks:\r\n            nn.init.constant_(block.adaLN_modulation[-1].weight, 0)\r\n            nn.init.constant_(block.adaLN_modulation[-1].bias, 0)\r\n\r\n        # Zero-out output layers:\r\n        nn.init.constant_(self.final_layer.adaLN_modulation[-1].weight, 0)\r\n        nn.init.constant_(self.final_layer.adaLN_modulation[-1].bias, 0)\r\n        nn.init.constant_(self.final_layer.linear.weight, 0)\r\n        nn.init.constant_(self.final_layer.linear.bias, 0)\r\n\r\n    def unpatchify(self, x):\r\n        \"\"\"\r\n        x: (N, T, patch_size**2 * C)\r\n        imgs: (N, H, W, C)\r\n        \"\"\"\r\n        c = self.out_channels\r\n        p = self.x_embedder.patch_size[0]\r\n        h = w = int(x.shape[1] ** 0.5)\r\n        assert h * w == x.shape[1]\r\n\r\n        x = x.reshape(shape=(x.shape[0], h, w, p, p, c))\r\n        x = torch.einsum('nhwpqc->nchpwq', x)\r\n        imgs = x.reshape(shape=(x.shape[0], c, h * p, h * p))\r\n        return imgs\r\n\r\n    def forward(self, x, t, current, cache_dic, y): \r\n        \"\"\"\r\n        Forward pass of DiT.\r\n        x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)\r\n        t: (N,) tensor of 
diffusion timesteps\r\n        y: (N,) tensor of class labels\r\n        \"\"\"\r\n\r\n        x = self.x_embedder(x) + self.pos_embed  # (N, T, D), where T = H * W / patch_size ** 2\r\n        t = self.t_embedder(t)                   # (N, D)\r\n        y = self.y_embedder(y, self.training)    # (N, D)\r\n        c = t + y                                # (N, D)\r\n\r\n        for layeridx, block in enumerate(self.blocks):\r\n            current['layer'] = layeridx\r\n            x = block(x, c, current, cache_dic)                      # (N, T, D)\r\n\r\n        x = self.final_layer(x, c)                # (N, T, patch_size ** 2 * out_channels)\r\n        x = self.unpatchify(x)                   # (N, out_channels, H, W)\r\n        return x\r\n\r\n    \r\n    def forward_with_cfg(self, x, t, current, cache_dic, y, cfg_scale, **kwargs):\r\n    #def forward_with_cfg(self, x, t, y, cfg_scale):\r\n        \"\"\"\r\n        Forward pass of DiT, but also batches the unconditional forward pass for classifier-free guidance.\r\n        \"\"\"\r\n        # https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb\r\n        half = x[: len(x) // 2]\r\n        combined = torch.cat([half, half], dim=0)\r\n        #model_out = self.forward(combined, t, y)\r\n        model_out = self.forward(combined, t, current, cache_dic, y)\r\n        # For exact reproducibility reasons, we apply classifier-free guidance on only\r\n        # three channels by default. 
The standard approach to cfg applies it to all channels.\r\n        # This can be done by uncommenting the following line and commenting-out the line following that.\r\n        # eps, rest = model_out[:, :self.in_channels], model_out[:, self.in_channels:]\r\n        eps, rest = model_out[:, :3], model_out[:, 3:]\r\n        cond_eps, uncond_eps = torch.split(eps, len(eps) // 2, dim=0)\r\n        half_eps = uncond_eps + cfg_scale * (cond_eps - uncond_eps)\r\n        eps = torch.cat([half_eps, half_eps], dim=0)\r\n        return torch.cat([eps, rest], dim=1)\r\n    \r\n\r\n\r\n\r\n#################################################################################\r\n#                   Sine/Cosine Positional Embedding Functions                  #\r\n#################################################################################\r\n# https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py\r\n\r\ndef get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False, extra_tokens=0):\r\n    \"\"\"\r\n    grid_size: int of the grid height and width\r\n    return:\r\n    pos_embed: [grid_size*grid_size, embed_dim] or [1+grid_size*grid_size, embed_dim] (w/ or w/o cls_token)\r\n    \"\"\"\r\n    grid_h = np.arange(grid_size, dtype=np.float32)\r\n    grid_w = np.arange(grid_size, dtype=np.float32)\r\n    grid = np.meshgrid(grid_w, grid_h)  # here w goes first\r\n    grid = np.stack(grid, axis=0)\r\n\r\n    grid = grid.reshape([2, 1, grid_size, grid_size])\r\n    pos_embed = get_2d_sincos_pos_embed_from_grid(embed_dim, grid)\r\n    if cls_token and extra_tokens > 0:\r\n        pos_embed = np.concatenate([np.zeros([extra_tokens, embed_dim]), pos_embed], axis=0)\r\n    return pos_embed\r\n\r\n\r\ndef get_2d_sincos_pos_embed_from_grid(embed_dim, grid):\r\n    assert embed_dim % 2 == 0\r\n\r\n    # use half of dimensions to encode grid_h\r\n    emb_h = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[0])  # (H*W, D/2)\r\n    emb_w = 
get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[1])  # (H*W, D/2)\r\n\r\n    emb = np.concatenate([emb_h, emb_w], axis=1) # (H*W, D)\r\n    return emb\r\n\r\n\r\ndef get_1d_sincos_pos_embed_from_grid(embed_dim, pos):\r\n    \"\"\"\r\n    embed_dim: output dimension for each position\r\n    pos: a list of positions to be encoded: size (M,)\r\n    out: (M, D)\r\n    \"\"\"\r\n    assert embed_dim % 2 == 0\r\n    omega = np.arange(embed_dim // 2, dtype=np.float64)\r\n    omega /= embed_dim / 2.\r\n    omega = 1. / 10000**omega  # (D/2,)\r\n\r\n    pos = pos.reshape(-1)  # (M,)\r\n    out = np.einsum('m,d->md', pos, omega)  # (M, D/2), outer product\r\n\r\n    emb_sin = np.sin(out) # (M, D/2)\r\n    emb_cos = np.cos(out) # (M, D/2)\r\n\r\n    emb = np.concatenate([emb_sin, emb_cos], axis=1)  # (M, D)\r\n    return emb\r\n\r\n\r\n#################################################################################\r\n#                                   DiT Configs                                  #\r\n#################################################################################\r\n\r\ndef DiT_XL_2(**kwargs):\r\n    return DiT(depth=28, hidden_size=1152, patch_size=2, num_heads=16, **kwargs)\r\n\r\ndef DiT_XL_4(**kwargs):\r\n    return DiT(depth=28, hidden_size=1152, patch_size=4, num_heads=16, **kwargs)\r\n\r\ndef DiT_XL_8(**kwargs):\r\n    return DiT(depth=28, hidden_size=1152, patch_size=8, num_heads=16, **kwargs)\r\n\r\ndef DiT_L_2(**kwargs):\r\n    return DiT(depth=24, hidden_size=1024, patch_size=2, num_heads=16, **kwargs)\r\n\r\ndef DiT_L_4(**kwargs):\r\n    return DiT(depth=24, hidden_size=1024, patch_size=4, num_heads=16, **kwargs)\r\n\r\ndef DiT_L_8(**kwargs):\r\n    return DiT(depth=24, hidden_size=1024, patch_size=8, num_heads=16, **kwargs)\r\n\r\ndef DiT_B_2(**kwargs):\r\n    return DiT(depth=12, hidden_size=768, patch_size=2, num_heads=12, **kwargs)\r\n\r\ndef DiT_B_4(**kwargs):\r\n    return DiT(depth=12, hidden_size=768, patch_size=4, num_heads=12, 
**kwargs)\r\n\r\ndef DiT_B_8(**kwargs):\r\n    return DiT(depth=12, hidden_size=768, patch_size=8, num_heads=12, **kwargs)\r\n\r\ndef DiT_S_2(**kwargs):\r\n    return DiT(depth=12, hidden_size=384, patch_size=2, num_heads=6, **kwargs)\r\n\r\ndef DiT_S_4(**kwargs):\r\n    return DiT(depth=12, hidden_size=384, patch_size=4, num_heads=6, **kwargs)\r\n\r\ndef DiT_S_8(**kwargs):\r\n    return DiT(depth=12, hidden_size=384, patch_size=8, num_heads=6, **kwargs)\r\n\r\n\r\nDiT_models = {\r\n    'DiT-XL/2': DiT_XL_2,  'DiT-XL/4': DiT_XL_4,  'DiT-XL/8': DiT_XL_8,\r\n    'DiT-L/2':  DiT_L_2,   'DiT-L/4':  DiT_L_4,   'DiT-L/8':  DiT_L_8,\r\n    'DiT-B/2':  DiT_B_2,   'DiT-B/4':  DiT_B_4,   'DiT-B/8':  DiT_B_8,\r\n    'DiT-S/2':  DiT_S_2,   'DiT-S/4':  DiT_S_4,   'DiT-S/8':  DiT_S_8,\r\n}\r\n"
  },
  {
    "path": "DiT-ToCa/sample.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\r\n# All rights reserved.\r\n\r\n# This source code is licensed under the license found in the\r\n# LICENSE file in the root directory of this source tree.\r\n\r\n\"\"\"\r\nSample new images from a pre-trained DiT.\r\n\"\"\"\r\nimport torch\r\ntorch.backends.cuda.matmul.allow_tf32 = True\r\ntorch.backends.cudnn.allow_tf32 = True\r\nfrom torchvision.utils import save_image\r\nfrom diffusion import create_diffusion\r\nfrom diffusers.models import AutoencoderKL\r\nfrom download import find_model\r\nfrom models import DiT_models\r\nimport argparse\r\n\r\n\r\ndef main(args):\r\n    # Setup PyTorch:\r\n    torch.manual_seed(args.seed)\r\n    torch.set_grad_enabled(False)\r\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\r\n    #device = \"cpu\" \r\n    #print(\"device = \", device, flush=True)\r\n    #print(torch.cuda.device_count(), flush=True)\r\n\r\n    if args.ckpt is None:\r\n        assert args.model == \"DiT-XL/2\", \"Only DiT-XL/2 models are available for auto-download.\"\r\n        assert args.image_size in [256, 512]\r\n        assert args.num_classes == 1000\r\n\r\n    # Load model:\r\n    latent_size = args.image_size // 8\r\n    model = DiT_models[args.model](\r\n        input_size=latent_size,\r\n        num_classes=args.num_classes\r\n    ).to(device)\r\n    # Auto-download a pre-trained model or load a custom DiT checkpoint from train.py:\r\n    ckpt_path = args.ckpt or f\"/root/autodl-tmp/pretrained_models/DiT/DiT-XL-2-{args.image_size}x{args.image_size}.pt\"\r\n    state_dict = find_model(ckpt_path)\r\n    model.load_state_dict(state_dict)\r\n    model.eval()  # important!\r\n    diffusion = create_diffusion(str(args.num_sampling_steps))\r\n    vae = AutoencoderKL.from_pretrained(f\"/root/autodl-tmp/pretrained_models/stabilityai/sd-vae-ft-{args.vae}\").to(device)\r\n    #vae = AutoencoderKL.from_pretrained(f\"/root/autodl-tmp/pretrained_models\").to(device)\r\n\r\n    
# Labels to condition the model with (feel free to change):\r\n    class_labels = [985]\r\n\r\n\r\n    # Create sampling noise:\r\n    n = len(class_labels)\r\n    # Sample 4 images for category label\r\n    z = torch.randn(n, 4, latent_size, latent_size, device=device)\r\n    y = torch.tensor(class_labels, device=device)\r\n\r\n    # Setup classifier-free guidance:\r\n    #print(\"cfg scale = \", args.cfg_scale, flush=True)\r\n    z = torch.cat([z, z], 0)\r\n    y_null = torch.tensor([1000] * n, device=device)\r\n    y = torch.cat([y, y_null], 0)\r\n    model_kwargs = dict(y=y, cfg_scale=args.cfg_scale)\r\n\r\n    model_kwargs['cache_type']        = args.cache_type\r\n    model_kwargs['fresh_ratio']       = args.fresh_ratio\r\n    model_kwargs['force_fresh']       = args.force_fresh\r\n    model_kwargs['fresh_threshold']   = args.fresh_threshold\r\n    model_kwargs['ratio_scheduler']   = args.ratio_scheduler\r\n    model_kwargs['soft_fresh_weight'] = args.soft_fresh_weight\r\n    model_kwargs['test_FLOPs']        = args.test_FLOPs\r\n        \r\n\r\n    start = torch.cuda.Event(enable_timing=True)\r\n    end = torch.cuda.Event(enable_timing=True)\r\n    start.record()\r\n\r\n    if args.ddim_sample:\r\n        samples = diffusion.ddim_sample_loop(\r\n            model.forward_with_cfg, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=True, device=device\r\n        )\r\n    else:\r\n        samples = diffusion.p_sample_loop(\r\n            model.forward_with_cfg, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=True, device=device\r\n        )\r\n    end.record()\r\n    torch.cuda.synchronize()\r\n    print(f\"Total Sampling took {start.elapsed_time(end)*0.001} seconds\")\r\n\r\n    samples, _ = samples.chunk(2, dim=0)  # Remove null class samples\r\n    samples = vae.decode(samples / 0.18215).sample\r\n\r\n    # Save and display images:\r\n    save_image(samples, \"sample.png\", nrow=4, normalize=True, value_range=(-1, 
1))\r\n\r\n\r\nif __name__ == \"__main__\":\r\n    parser = argparse.ArgumentParser()\r\n    parser.add_argument(\"--model\", type=str, choices=list(DiT_models.keys()), default=\"DiT-XL/2\")\r\n    parser.add_argument(\"--vae\", type=str, choices=[\"ema\", \"mse\"], default=\"mse\")\r\n    parser.add_argument(\"--image-size\", type=int, choices=[256, 512], default=256)\r\n    parser.add_argument(\"--num-classes\", type=int, default=1000)\r\n    parser.add_argument(\"--cfg-scale\", type=float, default=1.5)\r\n    parser.add_argument(\"--num-sampling-steps\", type=int, default=250)\r\n    parser.add_argument(\"--seed\", type=int, default=0)\r\n    parser.add_argument(\"--ckpt\", type=str, default=None,\r\n                        help=\"Optional path to a DiT checkpoint (default: auto-download a pre-trained DiT-XL/2 model).\")\r\n    parser.add_argument(\"--ddim-sample\", action=\"store_true\", default=False)\r\n    parser.add_argument(\"--cache-type\", type=str, choices=['random', 'attention','similarity','norm', 'compress','kv-norm'], default='attention') # only attention is supported currently\r\n    parser.add_argument(\"--fresh-ratio\", type=float, default=0.07)\r\n    parser.add_argument(\"--ratio-scheduler\", type=str, default='ToCa', choices=['linear', 'cosine', 'exp', 'constant','linear-mode','layerwise','ToCa-ddpm250', 'ToCa-ddim50']) #  'ToCa' is the proposed scheduler in Final version of the paper\r\n    parser.add_argument(\"--force-fresh\", type=str, choices=['global', 'local'], default='global',\r\n                        help=\"Force fresh strategy. global: fresh all tokens. 
local: fresh tokens achieving fresh step threshold.\") # only global is supported currently, local causes bad results\r\n    parser.add_argument(\"--fresh-threshold\", type=int, default=4) # N in the paper\r\n    parser.add_argument(\"--soft-fresh-weight\", type=float, default=0.25, # lambda_3 in the paper\r\n                        help=\"soft weight for updating the stale tokens by adding extra scores.\")\r\n    parser.add_argument(\"--test-FLOPs\", action=\"store_true\", default=False)\r\n    #parser.add_argument(\"--merge-weight\", type=float, default=0.0) # never used in the paper, just for exploration\r\n\r\n    args = parser.parse_args()\r\n    main(args)\r\n"
  },
  {
    "path": "DiT-ToCa/sample_ddp.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\r\n# All rights reserved.\r\n\r\n# This source code is licensed under the license found in the\r\n# LICENSE file in the root directory of this source tree.\r\n\r\n\"\"\"\r\nSamples a large number of images from a pre-trained DiT model using DDP.\r\nSubsequently saves a .npz file that can be used to compute FID and other\r\nevaluation metrics via the ADM repo: https://github.com/openai/guided-diffusion/tree/main/evaluations\r\n\r\nFor a simple single-GPU/CPU sampling script, see sample.py.\r\n\"\"\"\r\nimport torch\r\nimport torch.distributed as dist\r\nfrom models import DiT_models\r\nfrom download import find_model\r\nfrom diffusion import create_diffusion\r\nfrom diffusers.models import AutoencoderKL\r\nfrom tqdm import tqdm\r\nimport os\r\nfrom PIL import Image\r\nimport numpy as np\r\nimport math\r\nimport argparse\r\n\r\n\r\ndef create_npz_from_sample_folder(sample_dir, num=50_000):\r\n    \"\"\"\r\n    Builds a single .npz file from a folder of .png samples.\r\n    \"\"\"\r\n    samples = []\r\n    for i in tqdm(range(num), desc=\"Building .npz file from samples\"):\r\n        sample_pil = Image.open(f\"{sample_dir}/{i:06d}.png\")\r\n        sample_np = np.asarray(sample_pil).astype(np.uint8)\r\n        samples.append(sample_np)\r\n    samples = np.stack(samples)\r\n    assert samples.shape == (num, samples.shape[1], samples.shape[2], 3)\r\n    npz_path = f\"{sample_dir}.npz\"\r\n    np.savez(npz_path, arr_0=samples)\r\n    print(f\"Saved .npz file to {npz_path} [shape={samples.shape}].\")\r\n    return npz_path\r\n\r\ndef main(args):\r\n    \"\"\"\r\n    Run sampling.\r\n    \"\"\"\r\n\r\n    torch.backends.cuda.matmul.allow_tf32 = args.tf32  # True: fast but may lead to some small numerical differences\r\n    assert torch.cuda.is_available(), \"Sampling with DDP requires at least one GPU. 
sample.py supports CPU-only usage\"\r\n    torch.set_grad_enabled(False)\r\n\r\n    # Setup DDP:\r\n    dist.init_process_group(\"nccl\")\r\n    rank = dist.get_rank()\r\n    device = rank % torch.cuda.device_count()\r\n    seed = args.global_seed * dist.get_world_size() + rank\r\n    torch.manual_seed(seed)\r\n    torch.cuda.set_device(device)\r\n    print(f\"Starting rank={rank}, seed={seed}, world_size={dist.get_world_size()}.\")\r\n\r\n    if args.ckpt is None:\r\n        assert args.model == \"DiT-XL/2\", \"Only DiT-XL/2 models are available for auto-download.\"\r\n        assert args.image_size in [256, 512]\r\n        assert args.num_classes == 1000\r\n\r\n    # Load model:\r\n    latent_size = args.image_size // 8\r\n    model = DiT_models[args.model](\r\n        input_size=latent_size,\r\n        num_classes=args.num_classes\r\n    ).to(device)\r\n    # Auto-download a pre-trained model or load a custom DiT checkpoint from train.py:\r\n    ckpt_path = args.ckpt or f\"/root/autodl-tmp/pretrained_models/DiT/DiT-XL-2-{args.image_size}x{args.image_size}.pt\"\r\n    state_dict = find_model(ckpt_path)\r\n    model.load_state_dict(state_dict)\r\n    model.eval()  # important!\r\n    diffusion = create_diffusion(str(args.num_sampling_steps))\r\n    vae = AutoencoderKL.from_pretrained(f\"/root/autodl-tmp/pretrained_models/stabilityai/sd-vae-ft-{args.vae}\").to(device)\r\n    #vae = AutoencoderKL.from_pretrained(f\"/root/autodl-tmp/pretrained_models\").to(device)\r\n    assert args.cfg_scale >= 1.0, \"In almost all cases, cfg_scale should be >= 1.0\"\r\n    using_cfg = args.cfg_scale > 1.0\r\n\r\n    # Create folder to save samples:\r\n    model_string_name = args.model.replace(\"/\", \"-\")\r\n    ckpt_string_name = os.path.basename(args.ckpt).replace(\".pt\", \"\") if args.ckpt else \"pretrained\"\r\n    folder_name = f\"ToCa-{model_string_name}-{ckpt_string_name}-size-{args.image_size}-vae-{args.vae}-\" \\\r\n                  
f\"cfg-{args.cfg_scale}-seed-{args.global_seed}-step-{args.num_sampling_steps}-num-{args.num_fid_samples}\"\\\r\n                  f\"-{args.cache_type}-{args.fresh_ratio}-{args.ratio_scheduler}-{args.force_fresh}-{args.fresh_threshold}\"\\\r\n                  f\"-softweight-{args.soft_fresh_weight}\"\r\n    sample_folder_dir = f\"{args.sample_dir}/{folder_name}\"\r\n    if rank == 0:\r\n        os.makedirs(sample_folder_dir, exist_ok=True)\r\n        print(f\"Saving .png samples at {sample_folder_dir}\")\r\n    dist.barrier()\r\n\r\n    # Figure out how many samples we need to generate on each GPU and how many iterations we need to run:\r\n    n = args.per_proc_batch_size\r\n    global_batch_size = n * dist.get_world_size()\r\n    # To make things evenly-divisible, we'll sample a bit more than we need and then discard the extra samples:\r\n    total_samples = int(math.ceil(args.num_fid_samples / global_batch_size) * global_batch_size)\r\n    if rank == 0:\r\n        print(f\"Total number of images that will be sampled: {total_samples}\")\r\n    assert total_samples % dist.get_world_size() == 0, \"total_samples must be divisible by world_size\"\r\n    samples_needed_this_gpu = int(total_samples // dist.get_world_size())\r\n    assert samples_needed_this_gpu % n == 0, \"samples_needed_this_gpu must be divisible by the per-GPU batch size\"\r\n    iterations = int(samples_needed_this_gpu // n)\r\n    pbar = range(iterations)\r\n    pbar = tqdm(pbar) if rank == 0 else pbar\r\n    total = 0\r\n\r\n    for _ in pbar:\r\n        # Sample inputs:\r\n        z = torch.randn(n, model.in_channels, latent_size, latent_size, device=device)\r\n        y = torch.randint(0, args.num_classes, (n,), device=device)\r\n\r\n        # Setup classifier-free guidance:\r\n        if using_cfg:\r\n            z = torch.cat([z, z], 0)\r\n            y_null = torch.tensor([1000] * n, device=device)\r\n            y = torch.cat([y, y_null], 0)\r\n            model_kwargs = dict(y=y, 
cfg_scale=args.cfg_scale)\r\n            sample_fn = model.forward_with_cfg\r\n        else:\r\n            model_kwargs = dict(y=y)\r\n            sample_fn = model.forward\r\n\r\n        model_kwargs['cache_type']        = args.cache_type\r\n        model_kwargs['fresh_ratio']       = args.fresh_ratio\r\n        model_kwargs['force_fresh']       = args.force_fresh\r\n        model_kwargs['fresh_threshold']   = args.fresh_threshold\r\n        model_kwargs['ratio_scheduler']   = args.ratio_scheduler\r\n        model_kwargs['soft_fresh_weight'] = args.soft_fresh_weight\r\n        model_kwargs['test_FLOPs']        = args.test_FLOPs\r\n        \r\n\r\n        # Sample images:\r\n        if args.ddim_sample:\r\n            samples = diffusion.ddim_sample_loop(\r\n                sample_fn, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=False, device=device\r\n            )\r\n        else:\r\n            samples = diffusion.p_sample_loop(\r\n                sample_fn, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=False, device=device,\r\n            )\r\n            \r\n        if using_cfg:\r\n            samples, _ = samples.chunk(2, dim=0)  # Remove null class samples\r\n\r\n        samples = vae.decode(samples / 0.18215).sample\r\n        samples = torch.clamp(127.5 * samples + 128.0, 0, 255).permute(0, 2, 3, 1).to(\"cpu\", dtype=torch.uint8).numpy()\r\n\r\n        # Save samples to disk as individual .png files\r\n        for i, sample in enumerate(samples):\r\n            index = i * dist.get_world_size() + rank + total\r\n            Image.fromarray(sample).save(f\"{sample_folder_dir}/{index:06d}.png\")\r\n        total += global_batch_size\r\n\r\n    # Make sure all processes have finished saving their samples before attempting to convert to .npz\r\n    dist.barrier()\r\n    if rank == 0:\r\n        create_npz_from_sample_folder(sample_folder_dir, args.num_fid_samples)\r\n        print(\"Done.\")\r\n    
dist.barrier()\r\n    dist.destroy_process_group()\r\n\r\n\r\nif __name__ == \"__main__\":\r\n    parser = argparse.ArgumentParser()\r\n    parser.add_argument(\"--model\", type=str, choices=list(DiT_models.keys()), default=\"DiT-XL/2\")\r\n    parser.add_argument(\"--vae\",  type=str, choices=[\"ema\", \"mse\"], default=\"ema\")\r\n    parser.add_argument(\"--sample-dir\", type=str, default=\"/root/autodl-tmp/samples\") # Change this to your desired sample directory\r\n    parser.add_argument(\"--per-proc-batch-size\", type=int, default=32)\r\n    parser.add_argument(\"--num-fid-samples\", type=int, default=50_000)\r\n    parser.add_argument(\"--image-size\", type=int, choices=[256, 512], default=256)\r\n    parser.add_argument(\"--num-classes\", type=int, default=1000)\r\n    parser.add_argument(\"--cfg-scale\",  type=float, default=1.5)\r\n    parser.add_argument(\"--num-sampling-steps\", type=int, default=250)\r\n    parser.add_argument(\"--global-seed\", type=int, default=0)\r\n    parser.add_argument(\"--tf32\", action=argparse.BooleanOptionalAction, default=True,\r\n                        help=\"By default, use TF32 matmuls. 
This massively accelerates sampling on Ampere GPUs.\")\r\n    parser.add_argument(\"--ckpt\", type=str, default=None,\r\n                        help=\"Optional path to a DiT checkpoint (default: auto-download a pre-trained DiT-XL/2 model).\")\r\n    parser.add_argument(\"--ddim-sample\", action=\"store_true\", default=False)\r\n    parser.add_argument(\"--fresh-ratio\", type=float, default=0.07)\r\n    parser.add_argument(\"--cache-type\", type=str, choices=['random', 'attention','similarity','norm', 'compress','kv-norm'], default='random') # only attention is supported currently\r\n    parser.add_argument(\"--ratio-scheduler\", type=str, default='ToCa', choices=['linear', 'cosine', 'exp', 'constant','linear-mode','layerwise','ToCa-ddpm250', 'ToCa-ddim50']) #  'ToCa' is the proposed scheduler in the final version of the paper\r\n    parser.add_argument(\"--force-fresh\", type=str, choices=['global', 'local'], default='global', # only global is supported currently, local causes bad results\r\n                        help=\"Force fresh strategy. global: fresh all tokens. local: fresh tokens achieving fresh step threshold.\")\r\n    parser.add_argument(\"--fresh-threshold\", type=int, default=4) # N in the paper\r\n    parser.add_argument(\"--soft-fresh-weight\", type=float, default=0.25, # lambda_3 in the paper\r\n                        help=\"soft weight for updating the stale tokens by adding extra scores.\")\r\n    parser.add_argument(\"--test-FLOPs\", action=\"store_true\", default=False)\r\n    #parser.add_argument(\"--merge-weight\", type=float, default=0.0) # never used in the paper, just for exploration\r\n\r\n    args = parser.parse_args()\r\n    main(args)"
  },
  {
    "path": "DiT-ToCa/train.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\r\n# All rights reserved.\r\n\r\n# This source code is licensed under the license found in the\r\n# LICENSE file in the root directory of this source tree.\r\n\r\n\"\"\"\r\nA minimal training script for DiT using PyTorch DDP.\r\n\"\"\"\r\nimport torch\r\n# the first flag below was False when we tested this script but True makes A100 training a lot faster:\r\ntorch.backends.cuda.matmul.allow_tf32 = True\r\ntorch.backends.cudnn.allow_tf32 = True\r\nimport torch.distributed as dist\r\nfrom torch.nn.parallel import DistributedDataParallel as DDP\r\nfrom torch.utils.data import DataLoader\r\nfrom torch.utils.data.distributed import DistributedSampler\r\nfrom torchvision.datasets import ImageFolder\r\nfrom torchvision import transforms\r\nimport numpy as np\r\nfrom collections import OrderedDict\r\nfrom PIL import Image\r\nfrom copy import deepcopy\r\nfrom glob import glob\r\nfrom time import time\r\nimport argparse\r\nimport logging\r\nimport os\r\n\r\nfrom models import DiT_models\r\nfrom diffusion import create_diffusion\r\nfrom diffusers.models import AutoencoderKL\r\n\r\n\r\n#################################################################################\r\n#                             Training Helper Functions                         #\r\n#################################################################################\r\n\r\n@torch.no_grad()\r\ndef update_ema(ema_model, model, decay=0.9999):\r\n    \"\"\"\r\n    Step the EMA model towards the current model.\r\n    \"\"\"\r\n    ema_params = OrderedDict(ema_model.named_parameters())\r\n    model_params = OrderedDict(model.named_parameters())\r\n\r\n    for name, param in model_params.items():\r\n        # TODO: Consider applying only to params that require_grad to avoid small numerical changes of pos_embed\r\n        ema_params[name].mul_(decay).add_(param.data, alpha=1 - decay)\r\n\r\n\r\ndef requires_grad(model, flag=True):\r\n    \"\"\"\r\n   
 Set requires_grad flag for all parameters in a model.\r\n    \"\"\"\r\n    for p in model.parameters():\r\n        p.requires_grad = flag\r\n\r\n\r\ndef cleanup():\r\n    \"\"\"\r\n    End DDP training.\r\n    \"\"\"\r\n    dist.destroy_process_group()\r\n\r\n\r\ndef create_logger(logging_dir):\r\n    \"\"\"\r\n    Create a logger that writes to a log file and stdout.\r\n    \"\"\"\r\n    if dist.get_rank() == 0:  # real logger\r\n        logging.basicConfig(\r\n            level=logging.INFO,\r\n            format='[\\033[34m%(asctime)s\\033[0m] %(message)s',\r\n            datefmt='%Y-%m-%d %H:%M:%S',\r\n            handlers=[logging.StreamHandler(), logging.FileHandler(f\"{logging_dir}/log.txt\")]\r\n        )\r\n        logger = logging.getLogger(__name__)\r\n    else:  # dummy logger (does nothing)\r\n        logger = logging.getLogger(__name__)\r\n        logger.addHandler(logging.NullHandler())\r\n    return logger\r\n\r\n\r\ndef center_crop_arr(pil_image, image_size):\r\n    \"\"\"\r\n    Center cropping implementation from ADM.\r\n    https://github.com/openai/guided-diffusion/blob/8fb3ad9197f16bbc40620447b2742e13458d2831/guided_diffusion/image_datasets.py#L126\r\n    \"\"\"\r\n    while min(*pil_image.size) >= 2 * image_size:\r\n        pil_image = pil_image.resize(\r\n            tuple(x // 2 for x in pil_image.size), resample=Image.BOX\r\n        )\r\n\r\n    scale = image_size / min(*pil_image.size)\r\n    pil_image = pil_image.resize(\r\n        tuple(round(x * scale) for x in pil_image.size), resample=Image.BICUBIC\r\n    )\r\n\r\n    arr = np.array(pil_image)\r\n    crop_y = (arr.shape[0] - image_size) // 2\r\n    crop_x = (arr.shape[1] - image_size) // 2\r\n    return Image.fromarray(arr[crop_y: crop_y + image_size, crop_x: crop_x + image_size])\r\n\r\n\r\n#################################################################################\r\n#                                  Training Loop                                
#\r\n#################################################################################\r\n\r\ndef main(args):\r\n    \"\"\"\r\n    Trains a new DiT model.\r\n    \"\"\"\r\n    assert torch.cuda.is_available(), \"Training currently requires at least one GPU.\"\r\n\r\n    # Setup DDP:\r\n    dist.init_process_group(\"nccl\")\r\n    assert args.global_batch_size % dist.get_world_size() == 0, f\"Batch size must be divisible by world size.\"\r\n    rank = dist.get_rank()\r\n    device = rank % torch.cuda.device_count()\r\n    seed = args.global_seed * dist.get_world_size() + rank\r\n    torch.manual_seed(seed)\r\n    torch.cuda.set_device(device)\r\n    print(f\"Starting rank={rank}, seed={seed}, world_size={dist.get_world_size()}.\")\r\n\r\n    # Setup an experiment folder:\r\n    if rank == 0:\r\n        os.makedirs(args.results_dir, exist_ok=True)  # Make results folder (holds all experiment subfolders)\r\n        experiment_index = len(glob(f\"{args.results_dir}/*\"))\r\n        model_string_name = args.model.replace(\"/\", \"-\")  # e.g., DiT-XL/2 --> DiT-XL-2 (for naming folders)\r\n        experiment_dir = f\"{args.results_dir}/{experiment_index:03d}-{model_string_name}\"  # Create an experiment folder\r\n        checkpoint_dir = f\"{experiment_dir}/checkpoints\"  # Stores saved model checkpoints\r\n        os.makedirs(checkpoint_dir, exist_ok=True)\r\n        logger = create_logger(experiment_dir)\r\n        logger.info(f\"Experiment directory created at {experiment_dir}\")\r\n    else:\r\n        logger = create_logger(None)\r\n\r\n    # Create model:\r\n    assert args.image_size % 8 == 0, \"Image size must be divisible by 8 (for the VAE encoder).\"\r\n    latent_size = args.image_size // 8\r\n    model = DiT_models[args.model](\r\n        input_size=latent_size,\r\n        num_classes=args.num_classes\r\n    )\r\n    # Note that parameter initialization is done within the DiT constructor\r\n    ema = deepcopy(model).to(device)  # Create an EMA of the model 
for use after training\r\n    requires_grad(ema, False)\r\n    model = DDP(model.to(device), device_ids=[rank])\r\n    diffusion = create_diffusion(timestep_respacing=\"\")  # default: 1000 steps, linear noise schedule\r\n    vae = AutoencoderKL.from_pretrained(f\"stabilityai/sd-vae-ft-{args.vae}\").to(device)\r\n    logger.info(f\"DiT Parameters: {sum(p.numel() for p in model.parameters()):,}\")\r\n\r\n    # Setup optimizer (we used default Adam betas=(0.9, 0.999) and a constant learning rate of 1e-4 in our paper):\r\n    opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0)\r\n\r\n    # Setup data:\r\n    transform = transforms.Compose([\r\n        transforms.Lambda(lambda pil_image: center_crop_arr(pil_image, args.image_size)),\r\n        transforms.RandomHorizontalFlip(),\r\n        transforms.ToTensor(),\r\n        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], inplace=True)\r\n    ])\r\n    dataset = ImageFolder(args.data_path, transform=transform)\r\n    sampler = DistributedSampler(\r\n        dataset,\r\n        num_replicas=dist.get_world_size(),\r\n        rank=rank,\r\n        shuffle=True,\r\n        seed=args.global_seed\r\n    )\r\n    loader = DataLoader(\r\n        dataset,\r\n        batch_size=int(args.global_batch_size // dist.get_world_size()),\r\n        shuffle=False,\r\n        sampler=sampler,\r\n        num_workers=args.num_workers,\r\n        pin_memory=True,\r\n        drop_last=True\r\n    )\r\n    logger.info(f\"Dataset contains {len(dataset):,} images ({args.data_path})\")\r\n\r\n    # Prepare models for training:\r\n    update_ema(ema, model.module, decay=0)  # Ensure EMA is initialized with synced weights\r\n    model.train()  # important! 
This enables embedding dropout for classifier-free guidance\r\n    ema.eval()  # EMA model should always be in eval mode\r\n\r\n    # Variables for monitoring/logging purposes:\r\n    train_steps = 0\r\n    log_steps = 0\r\n    running_loss = 0\r\n    start_time = time()\r\n\r\n    logger.info(f\"Training for {args.epochs} epochs...\")\r\n    for epoch in range(args.epochs):\r\n        sampler.set_epoch(epoch)\r\n        logger.info(f\"Beginning epoch {epoch}...\")\r\n        for x, y in loader:\r\n            x = x.to(device)\r\n            y = y.to(device)\r\n            with torch.no_grad():\r\n                # Map input images to latent space + normalize latents:\r\n                x = vae.encode(x).latent_dist.sample().mul_(0.18215)\r\n            t = torch.randint(0, diffusion.num_timesteps, (x.shape[0],), device=device)\r\n            model_kwargs = dict(y=y)\r\n            loss_dict = diffusion.training_losses(model, x, t, model_kwargs)\r\n            loss = loss_dict[\"loss\"].mean()\r\n            opt.zero_grad()\r\n            loss.backward()\r\n            opt.step()\r\n            update_ema(ema, model.module)\r\n\r\n            # Log loss values:\r\n            running_loss += loss.item()\r\n            log_steps += 1\r\n            train_steps += 1\r\n            if train_steps % args.log_every == 0:\r\n                # Measure training speed:\r\n                torch.cuda.synchronize()\r\n                end_time = time()\r\n                steps_per_sec = log_steps / (end_time - start_time)\r\n                # Reduce loss history over all processes:\r\n                avg_loss = torch.tensor(running_loss / log_steps, device=device)\r\n                dist.all_reduce(avg_loss, op=dist.ReduceOp.SUM)\r\n                avg_loss = avg_loss.item() / dist.get_world_size()\r\n                logger.info(f\"(step={train_steps:07d}) Train Loss: {avg_loss:.4f}, Train Steps/Sec: {steps_per_sec:.2f}\")\r\n                # Reset monitoring variables:\r\n    
            running_loss = 0\r\n                log_steps = 0\r\n                start_time = time()\r\n\r\n            # Save DiT checkpoint:\r\n            if train_steps % args.ckpt_every == 0 and train_steps > 0:\r\n                if rank == 0:\r\n                    checkpoint = {\r\n                        \"model\": model.module.state_dict(),\r\n                        \"ema\": ema.state_dict(),\r\n                        \"opt\": opt.state_dict(),\r\n                        \"args\": args\r\n                    }\r\n                    checkpoint_path = f\"{checkpoint_dir}/{train_steps:07d}.pt\"\r\n                    torch.save(checkpoint, checkpoint_path)\r\n                    logger.info(f\"Saved checkpoint to {checkpoint_path}\")\r\n                dist.barrier()\r\n\r\n    model.eval()  # important! This disables randomized embedding dropout\r\n    # do any sampling/FID calculation/etc. with ema (or model) in eval mode ...\r\n\r\n    logger.info(\"Done!\")\r\n    cleanup()\r\n\r\n\r\nif __name__ == \"__main__\":\r\n    # Default args here will train DiT-XL/2 with the hyperparameters we used in our paper (except training iters).\r\n    parser = argparse.ArgumentParser()\r\n    parser.add_argument(\"--data-path\", type=str, required=True)\r\n    parser.add_argument(\"--results-dir\", type=str, default=\"results\")\r\n    parser.add_argument(\"--model\", type=str, choices=list(DiT_models.keys()), default=\"DiT-XL/2\")\r\n    parser.add_argument(\"--image-size\", type=int, choices=[256, 512], default=256)\r\n    parser.add_argument(\"--num-classes\", type=int, default=1000)\r\n    parser.add_argument(\"--epochs\", type=int, default=1400)\r\n    parser.add_argument(\"--global-batch-size\", type=int, default=256)\r\n    parser.add_argument(\"--global-seed\", type=int, default=0)\r\n    parser.add_argument(\"--vae\", type=str, choices=[\"ema\", \"mse\"], default=\"ema\")  # Choice doesn't affect training\r\n    parser.add_argument(\"--num-workers\", 
type=int, default=4)\r\n    parser.add_argument(\"--log-every\", type=int, default=100)\r\n    parser.add_argument(\"--ckpt-every\", type=int, default=50_000)\r\n    args = parser.parse_args()\r\n    main(args)\r\n"
  },
  {
    "path": "DrawBench200.txt",
    "content": "A red colored car.\nA black colored car.\nA pink colored car.\nA black colored dog.\nA red colored dog.\nA blue colored dog.\nA green colored banana.\nA red colored banana.\nA black colored banana.\nA white colored sandwich.\nA black colored sandwich.\nAn orange colored sandwich.\nA pink colored giraffe.\nA yellow colored giraffe.\nA brown colored giraffe.\nA red car and a white sheep.\nA blue bird and a brown bear.\nA green apple and a black backpack.\nA green cup and a blue cell phone.\nA yellow book and a red vase.\nA white car and a red sheep.\nA brown bird and a blue bear.\nA black apple and a green backpack.\nA blue cup and a green cell phone.\nA red book and a yellow vase.\nA horse riding an astronaut.\nA pizza cooking an oven.\nA bird scaring a scarecrow.\nA blue coloured pizza.\nHovering cow abducting aliens.\nA panda making latte art.\nA shark in the desert.\nAn elephant under the sea.\nRainbow coloured penguin.\nA fish eating a pelican.\nOne car on the street.\nTwo cars on the street.\nThree cars on the street.\nFour cars on the street.\nFive cars on the street.\nOne dog on the street.\nTwo dogs on the street.\nThree dogs on the street.\nFour dogs on the street.\nFive dogs on the street.\nOne cat and one dog sitting on the grass.\nOne cat and two dogs sitting on the grass.\nOne cat and three dogs sitting on the grass.\nTwo cats and one dog sitting on the grass.\nTwo cats and two dogs sitting on the grass.\nTwo cats and three dogs sitting on the grass.\nThree cats and one dog sitting on the grass.\nThree cats and two dogs sitting on the grass.\nThree cats and three dogs sitting on the grass.\nA triangular purple flower pot. A purple flower pot in the shape of a triangle.\nA triangular orange picture frame. An orange picture frame in the shape of a triangle.\nA triangular pink stop sign. A pink stop sign in the shape of a triangle.\nA cube made of denim. A cube with the texture of denim.\nA sphere made of kitchen tile. 
A sphere with the texture of kitchen tile.\nA cube made of brick. A cube with the texture of brick.\nA collection of nail is sitting on a table.\nA single clock is sitting on a table.\nA couple of glasses are sitting on a table.\nAn illustration of a large red elephant sitting on a small blue mouse.\nAn illustration of a small green elephant standing behind a large red mouse.\nA small blue book sitting on a large red book.\n\"A stack of 3 plates. A blue plate is on the top, sitting on a blue plate. The blue plate is in the middle, sitting on a green plate. The green plate is on the bottom.\"\n\"A stack of 3 cubes. A red cube is on the top, sitting on a red cube. The red cube is in the middle, sitting on a green cube. The green cube is on the bottom.\"\n\"A stack of 3 books. A green book is on the top, sitting on a red book. The red book is in the middle, sitting on a blue book. The blue book is on the bottom.\"\n\"An emoji of a baby panda wearing a red hat, green gloves, red shirt, and green pants.\"\n\"An emoji of a baby panda wearing a red hat, blue gloves, green shirt, and blue pants.\"\nA fisheye lens view of a turtle sitting in a forest.\nA side view of an owl sitting in a field.\nA cross-section view of a brain.\n\"A vehicle composed of two wheels held in a frame one behind the other, propelled by pedals and steered with handlebars attached to the front wheel.\"\n\"A large motor vehicle carrying passengers by road, typically one serving the public on a fixed route and for a fare.\"\n\"A small vessel propelled on water by oars, sails, or an engine.\"\nA connection point by which firefighters can tap into a water supply.\n\"A machine next to a parking space in a street, into which the driver puts money so as to be authorized to park the vehicle for a particular length of time.\"\n\"A device consisting of a circular canopy of cloth on a folding metal frame supported by a central rod, used as protection against rain or sometimes sun.\"\n\"A separate seat for one 
person, typically with a back and four legs.\"\nAn appliance or compartment which is artificially kept cool and used to store food and drink.\nA mechanical or electrical device for measuring time.\n\"An instrument used for cutting cloth, paper, and other thin material, consisting of two blades laid one on top of the other and fastened in the middle so as to allow them to be opened and closed by a thumb and finger inserted through rings on the end of their handles.\"\n\"A large plant-eating domesticated mammal with solid hoofs and a flowing mane and tail, used for riding, racing, and to carry and pull loads.\"\nA long curved fruit which grows in clusters and has soft pulpy flesh and yellow skin when ripe.\n\"A small domesticated carnivorous mammal with soft fur, a short snout, and retractable claws. It is widely kept as a pet or for catching mice, and many breeds have been developed.\"\n\"A domesticated carnivorous mammal that typically has a long snout, an acute sense of smell, nonretractable claws, and a barking, howling, or whining voice.\"\n\"An organ of soft nervous tissue contained in the skull of vertebrates, functioning as the coordinating center of sensation and intellectual and nervous activity.\"\n\"An American multinational technology company that focuses on artificial intelligence, search engine, online advertising, cloud computing, computer software, quantum computing, e-commerce, and consumer electronics.\"\n\"A large keyboard musical instrument with a wooden case enclosing a soundboard and metal strings, which are struck by hammers when the keys are depressed. 
The strings' vibration is stopped by dampers when the keys are released and can be regulated for length and volume by two or three pedals.\"\n\"A type of digital currency in which a record of transactions is maintained and new units of currency are generated by the computational solution of mathematical problems, and which operates independently of a central bank.\"\n\"A large thick-skinned semiaquatic African mammal, with massive jaws and large tusks.\"\nA machine resembling a human being and able to replicate certain human movements and functions automatically.\nPaying for a quarter-sized pizza with a pizza-sized quarter.\nAn oil painting of a couple in formal evening wear going home get caught in a heavy downpour with no umbrellas.\n\"A grocery store refrigerator has pint cartons of milk on the top shelf, quart cartons on the middle shelf, and gallon plastic jugs on the bottom shelf.\"\n\"In late afternoon in January in New England, a man stands in the shadow of a maple tree.\"\nAn elephant is behind a tree. You can see the trunk on one side and the back legs on the other.\nA tomato has been put on top of a pumpkin on a kitchen stool. There is a fork sticking into the pumpkin. The scene is viewed from above.\nA pear cut into seven pieces arranged in a ring.\n\"A donkey and an octopus are playing a game. The donkey is holding a rope on one end, the octopus is holding onto the other. The donkey holds the rope in its mouth. A cat is jumping over the rope.\"\n\"Supreme Court Justices play a baseball game with the FBI. The FBI is at bat, the justices are on the field.\"\nAbraham Lincoln touches his toes while George Washington does chin-ups. Lincoln is barefoot. Washington is wearing boots.\nTcennis rpacket.\nBzaseball galove.\nRbefraigerator.\nDininrg tablez.\nPafrking metr.\n\"A smafml vessef epropoeilled on watvewr by ors, sauls, or han engie.\"\n\"A sjmall domesticated carnivorious mammnal with sof fuh,y a sthort sout, and retracwtablbe flaws. 
It iw widexly kept as a pet or for catchitng mic, ad many breeds zhlyde beefn develvoked.\"\n\"An instqrumemnt used for cutting cloth, paper, axdz othr thdin mteroial, consamistng of two blades lad one on tvopb of the other and fhastned in tle mixdqdjle so as to bllow them txo be pened and closed by thumb and fitngesr inserted tgrough rings on kthe end oc thei vatndlzes.\"\n\"A domesticated carnivvorous mzammal that typicbally hfaas a lons sfnout, an acxujte sense off osmell, noneetractaaln crlaws, anid xbarkring,y howlingu, or whining rvoiche.\"\n\"A ldarge keybord msical instroument lwith a woden case enmclosig a qsouvnkboajrd and mfgtal strivgf, which are strucrk b hammrs when the nels are depresdsmed.f lhe strsingsj' vibration ie stopped by damperds when the keys re released and can bce regulavewdd for lengh and vnolume y two or three pedalvs.\"\nA train on top of a surfboard.\nA wine glass on top of a dog.\nA bicycle on top of a boat.\nAn umbrella on top of a spoon.\nA laptop on top of a teddy bear.\nA giraffe underneath a microwave.\nA donut underneath a toilet.\nA hair drier underneath a sheep.\nA tennis racket underneath a traffic light.\nA zebra underneath a broccoli.\nA banana on the left of an apple.\nA couch on the left of a chair.\nA car on the left of a bus.\nA cat on the left of a dog.\nA carrot on the left of a broccoli.\nA pizza on the right of a suitcase.\nA cat on the right of a tennis racket.\nA stop sign on the right of a refrigerator.\nA sheep to the right of a wine glass.\nA zebra to the right of a fire hydrant.\nAcersecomicke.\nJentacular.\nMatutinal.\nPeristeronic.\nArtophagous.\nBacklotter.\nOctothorpe.\nA church with stained glass windows depicting a hamburger and french fries.\n\"Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. 
Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna.\"\n\"A baby fennec sneezing onto a strawberry, detailed, macro, studio light, droplets, backlit ears.\"\nA photo of a confused grizzly bear in calculus class.\nAn ancient Egyptian painting depicting an argument over whose turn it is to take out the trash.\n\"A fluffy baby sloth with a knitted hat trying to figure out a laptop, close up, highly detailed, studio lighting, screen reflecting in its eyes.\"\n\"A tiger in a lab coat with a 1980s Miami vibe, turning a well oiled science content machine, digital art.\"\nA 1960s yearbook photo with animals dressed as humans.\nLego Arnold Schwarzenegger.\nA yellow and black bus cruising through the rainforest.\nA medieval painting of the wifi not working.\n\"An IT-guy trying to fix hardware of a PC tower is being tangled by the PC cables like Laokoon. Marble, copy after Hellenistic original from ca. 200 BC. Found in the Baths of Trajan, 1506.\"\n\"35mm macro shot a kitten licking a baby duck, studio lighting.\"\nMcDonalds Church.\nPhoto of an athlete cat explaining it's latest scandal at a press conference to journalists.\nGreek statue of a man tripping over a cat.\n\"An old photograph of a 1920s airship shaped like a pig, floating over a wheat field.\"\nPhoto of a cat singing in a barbershop quartet.\n\"A painting by Grant Wood of an astronaut couple, american gothic style.\"\nAn oil painting portrait of the regal Burger King posing with a Whopper.\n\"A keyboard made of water, the water is made of light, the light is turned off.\"\nPainting of Mona Lisa but the view is from behind of Mona Lisa.\nHyper-realistic photo of an abandoned industrial site during a storm.\nA screenshot of an iOS app for ordering different types of milk.\n\"A real life photography of super mario, 8k Ultra HD.\"\nColouring page of large cats climbing the eifel tower in a cyberpunk future.\nPhoto of a mega Lego space station inside a kid's bedroom.\nA spider with a 
moustache bidding an equally gentlemanly grasshopper a good day during his walk to work.\nA photocopy of a photograph of a painting of a sculpture of a giraffe.\n\"A bridge connecting Europe and North America on the Atlantic Ocean, bird's eye view.\"\n\"A maglev train going vertically downward in high speed, New York Times photojournalism.\"\nA magnifying glass over a page of a 1950s batman comic.\n\"A car playing soccer, digital art.\"\nDarth Vader playing with raccoon in Mars during sunset.\nA 1960s poster warning against climate change.\nIllustration of a mouse using a mushroom as an umbrella.\nA realistic photo of a Pomeranian dressed up like a 1980s professional wrestler with neon green and neon orange face paint and bright green wrestling tights with bright orange boots.\nA pyramid made of falafel with a partial solar eclipse in the background.\nA storefront with 'Hello World' written on it.\nA storefront with 'Diffusion' written on it.\nA storefront with 'Text to Image' written on it.\nA storefront with 'NeurIPS' written on it.\nA storefront with 'Deep Learning' written on it.\nA storefront with 'Google Brain Toronto' written on it.\nA storefront with 'Google Research Pizza Cafe' written on it.\nA sign that says 'Hello World'.\nA sign that says 'Diffusion'.\nA sign that says 'Text to Image'.\nA sign that says 'NeurIPS'.\nA sign that says 'Deep Learning'.\nA sign that says 'Google Brain Toronto'.\nA sign that says 'Google Research Pizza Cafe'.\nNew York Skyline with 'Hello World' written with fireworks on the sky.\nNew York Skyline with 'Diffusion' written with fireworks on the sky.\nNew York Skyline with 'Text to Image' written with fireworks on the sky.\nNew York Skyline with 'NeurIPS' written with fireworks on the sky.\nNew York Skyline with 'Deep Learning' written with fireworks on the sky.\nNew York Skyline with 'Google Brain Toronto' written with fireworks on the sky.\nNew York Skyline with 'Google Research Pizza Cafe' written with fireworks on the 
sky.\n"
  },
  {
    "path": "LICENSE",
    "content": "                    GNU GENERAL PUBLIC LICENSE\n                       Version 3, 29 June 2007\n\n Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>\n Everyone is permitted to copy and distribute verbatim copies\n of this license document, but changing it is not allowed.\n\n                            Preamble\n\n  The GNU General Public License is a free, copyleft license for\nsoftware and other kinds of works.\n\n  The licenses for most software and other practical works are designed\nto take away your freedom to share and change the works.  By contrast,\nthe GNU General Public License is intended to guarantee your freedom to\nshare and change all versions of a program--to make sure it remains free\nsoftware for all its users.  We, the Free Software Foundation, use the\nGNU General Public License for most of our software; it applies also to\nany other work released this way by its authors.  You can apply it to\nyour programs, too.\n\n  When we speak of free software, we are referring to freedom, not\nprice.  Our General Public Licenses are designed to make sure that you\nhave the freedom to distribute copies of free software (and charge for\nthem if you wish), that you receive source code or can get it if you\nwant it, that you can change the software or use pieces of it in new\nfree programs, and that you know you can do these things.\n\n  To protect your rights, we need to prevent others from denying you\nthese rights or asking you to surrender the rights.  Therefore, you have\ncertain responsibilities if you distribute copies of the software, or if\nyou modify it: responsibilities to respect the freedom of others.\n\n  For example, if you distribute copies of such a program, whether\ngratis or for a fee, you must pass on to the recipients the same\nfreedoms that you received.  You must make sure that they, too, receive\nor can get the source code.  
And you must show them these terms so they\nknow their rights.\n\n  Developers that use the GNU GPL protect your rights with two steps:\n(1) assert copyright on the software, and (2) offer you this License\ngiving you legal permission to copy, distribute and/or modify it.\n\n  For the developers' and authors' protection, the GPL clearly explains\nthat there is no warranty for this free software.  For both users' and\nauthors' sake, the GPL requires that modified versions be marked as\nchanged, so that their problems will not be attributed erroneously to\nauthors of previous versions.\n\n  Some devices are designed to deny users access to install or run\nmodified versions of the software inside them, although the manufacturer\ncan do so.  This is fundamentally incompatible with the aim of\nprotecting users' freedom to change the software.  The systematic\npattern of such abuse occurs in the area of products for individuals to\nuse, which is precisely where it is most unacceptable.  Therefore, we\nhave designed this version of the GPL to prohibit the practice for those\nproducts.  If such problems arise substantially in other domains, we\nstand ready to extend this provision to those domains in future versions\nof the GPL, as needed to protect the freedom of users.\n\n  Finally, every program is threatened constantly by software patents.\nStates should not allow patents to restrict development and use of\nsoftware on general-purpose computers, but in those that do, we wish to\navoid the special danger that patents applied to a free program could\nmake it effectively proprietary.  To prevent this, the GPL assures that\npatents cannot be used to render the program non-free.\n\n  The precise terms and conditions for copying, distribution and\nmodification follow.\n\n                       TERMS AND CONDITIONS\n\n  0. 
Definitions.\n\n  \"This License\" refers to version 3 of the GNU General Public License.\n\n  \"Copyright\" also means copyright-like laws that apply to other kinds of\nworks, such as semiconductor masks.\n\n  \"The Program\" refers to any copyrightable work licensed under this\nLicense.  Each licensee is addressed as \"you\".  \"Licensees\" and\n\"recipients\" may be individuals or organizations.\n\n  To \"modify\" a work means to copy from or adapt all or part of the work\nin a fashion requiring copyright permission, other than the making of an\nexact copy.  The resulting work is called a \"modified version\" of the\nearlier work or a work \"based on\" the earlier work.\n\n  A \"covered work\" means either the unmodified Program or a work based\non the Program.\n\n  To \"propagate\" a work means to do anything with it that, without\npermission, would make you directly or secondarily liable for\ninfringement under applicable copyright law, except executing it on a\ncomputer or modifying a private copy.  Propagation includes copying,\ndistribution (with or without modification), making available to the\npublic, and in some countries other activities as well.\n\n  To \"convey\" a work means any kind of propagation that enables other\nparties to make or receive copies.  Mere interaction with a user through\na computer network, with no transfer of a copy, is not conveying.\n\n  An interactive user interface displays \"Appropriate Legal Notices\"\nto the extent that it includes a convenient and prominently visible\nfeature that (1) displays an appropriate copyright notice, and (2)\ntells the user that there is no warranty for the work (except to the\nextent that warranties are provided), that licensees may convey the\nwork under this License, and how to view a copy of this License.  If\nthe interface presents a list of user commands or options, such as a\nmenu, a prominent item in the list meets this criterion.\n\n  1. 
Source Code.\n\n  The \"source code\" for a work means the preferred form of the work\nfor making modifications to it.  \"Object code\" means any non-source\nform of a work.\n\n  A \"Standard Interface\" means an interface that either is an official\nstandard defined by a recognized standards body, or, in the case of\ninterfaces specified for a particular programming language, one that\nis widely used among developers working in that language.\n\n  The \"System Libraries\" of an executable work include anything, other\nthan the work as a whole, that (a) is included in the normal form of\npackaging a Major Component, but which is not part of that Major\nComponent, and (b) serves only to enable use of the work with that\nMajor Component, or to implement a Standard Interface for which an\nimplementation is available to the public in source code form.  A\n\"Major Component\", in this context, means a major essential component\n(kernel, window system, and so on) of the specific operating system\n(if any) on which the executable work runs, or a compiler used to\nproduce the work, or an object code interpreter used to run it.\n\n  The \"Corresponding Source\" for a work in object code form means all\nthe source code needed to generate, install, and (for an executable\nwork) run the object code and to modify the work, including scripts to\ncontrol those activities.  However, it does not include the work's\nSystem Libraries, or general-purpose tools or generally available free\nprograms which are used unmodified in performing those activities but\nwhich are not part of the work.  
For example, Corresponding Source\nincludes interface definition files associated with source files for\nthe work, and the source code for shared libraries and dynamically\nlinked subprograms that the work is specifically designed to require,\nsuch as by intimate data communication or control flow between those\nsubprograms and other parts of the work.\n\n  The Corresponding Source need not include anything that users\ncan regenerate automatically from other parts of the Corresponding\nSource.\n\n  The Corresponding Source for a work in source code form is that\nsame work.\n\n  2. Basic Permissions.\n\n  All rights granted under this License are granted for the term of\ncopyright on the Program, and are irrevocable provided the stated\nconditions are met.  This License explicitly affirms your unlimited\npermission to run the unmodified Program.  The output from running a\ncovered work is covered by this License only if the output, given its\ncontent, constitutes a covered work.  This License acknowledges your\nrights of fair use or other equivalent, as provided by copyright law.\n\n  You may make, run and propagate covered works that you do not\nconvey, without conditions so long as your license otherwise remains\nin force.  You may convey covered works to others for the sole purpose\nof having them make modifications exclusively for you, or provide you\nwith facilities for running those works, provided that you comply with\nthe terms of this License in conveying all material for which you do\nnot control copyright.  Those thus making or running the covered works\nfor you must do so exclusively on your behalf, under your direction\nand control, on terms that prohibit them from making any copies of\nyour copyrighted material outside their relationship with you.\n\n  Conveying under any other circumstances is permitted solely under\nthe conditions stated below.  Sublicensing is not allowed; section 10\nmakes it unnecessary.\n\n  3. 
Protecting Users' Legal Rights From Anti-Circumvention Law.\n\n  No covered work shall be deemed part of an effective technological\nmeasure under any applicable law fulfilling obligations under article\n11 of the WIPO copyright treaty adopted on 20 December 1996, or\nsimilar laws prohibiting or restricting circumvention of such\nmeasures.\n\n  When you convey a covered work, you waive any legal power to forbid\ncircumvention of technological measures to the extent such circumvention\nis effected by exercising rights under this License with respect to\nthe covered work, and you disclaim any intention to limit operation or\nmodification of the work as a means of enforcing, against the work's\nusers, your or third parties' legal rights to forbid circumvention of\ntechnological measures.\n\n  4. Conveying Verbatim Copies.\n\n  You may convey verbatim copies of the Program's source code as you\nreceive it, in any medium, provided that you conspicuously and\nappropriately publish on each copy an appropriate copyright notice;\nkeep intact all notices stating that this License and any\nnon-permissive terms added in accord with section 7 apply to the code;\nkeep intact all notices of the absence of any warranty; and give all\nrecipients a copy of this License along with the Program.\n\n  You may charge any price or no price for each copy that you convey,\nand you may offer support or warranty protection for a fee.\n\n  5. Conveying Modified Source Versions.\n\n  You may convey a work based on the Program, or the modifications to\nproduce it from the Program, in the form of source code under the\nterms of section 4, provided that you also meet all of these conditions:\n\n    a) The work must carry prominent notices stating that you modified\n    it, and giving a relevant date.\n\n    b) The work must carry prominent notices stating that it is\n    released under this License and any conditions added under section\n    7.  
This requirement modifies the requirement in section 4 to\n    \"keep intact all notices\".\n\n    c) You must license the entire work, as a whole, under this\n    License to anyone who comes into possession of a copy.  This\n    License will therefore apply, along with any applicable section 7\n    additional terms, to the whole of the work, and all its parts,\n    regardless of how they are packaged.  This License gives no\n    permission to license the work in any other way, but it does not\n    invalidate such permission if you have separately received it.\n\n    d) If the work has interactive user interfaces, each must display\n    Appropriate Legal Notices; however, if the Program has interactive\n    interfaces that do not display Appropriate Legal Notices, your\n    work need not make them do so.\n\n  A compilation of a covered work with other separate and independent\nworks, which are not by their nature extensions of the covered work,\nand which are not combined with it such as to form a larger program,\nin or on a volume of a storage or distribution medium, is called an\n\"aggregate\" if the compilation and its resulting copyright are not\nused to limit the access or legal rights of the compilation's users\nbeyond what the individual works permit.  Inclusion of a covered work\nin an aggregate does not cause this License to apply to the other\nparts of the aggregate.\n\n  6. 
Conveying Non-Source Forms.\n\n  You may convey a covered work in object code form under the terms\nof sections 4 and 5, provided that you also convey the\nmachine-readable Corresponding Source under the terms of this License,\nin one of these ways:\n\n    a) Convey the object code in, or embodied in, a physical product\n    (including a physical distribution medium), accompanied by the\n    Corresponding Source fixed on a durable physical medium\n    customarily used for software interchange.\n\n    b) Convey the object code in, or embodied in, a physical product\n    (including a physical distribution medium), accompanied by a\n    written offer, valid for at least three years and valid for as\n    long as you offer spare parts or customer support for that product\n    model, to give anyone who possesses the object code either (1) a\n    copy of the Corresponding Source for all the software in the\n    product that is covered by this License, on a durable physical\n    medium customarily used for software interchange, for a price no\n    more than your reasonable cost of physically performing this\n    conveying of source, or (2) access to copy the\n    Corresponding Source from a network server at no charge.\n\n    c) Convey individual copies of the object code with a copy of the\n    written offer to provide the Corresponding Source.  This\n    alternative is allowed only occasionally and noncommercially, and\n    only if you received the object code with such an offer, in accord\n    with subsection 6b.\n\n    d) Convey the object code by offering access from a designated\n    place (gratis or for a charge), and offer equivalent access to the\n    Corresponding Source in the same way through the same place at no\n    further charge.  You need not require recipients to copy the\n    Corresponding Source along with the object code.  
If the place to\n    copy the object code is a network server, the Corresponding Source\n    may be on a different server (operated by you or a third party)\n    that supports equivalent copying facilities, provided you maintain\n    clear directions next to the object code saying where to find the\n    Corresponding Source.  Regardless of what server hosts the\n    Corresponding Source, you remain obligated to ensure that it is\n    available for as long as needed to satisfy these requirements.\n\n    e) Convey the object code using peer-to-peer transmission, provided\n    you inform other peers where the object code and Corresponding\n    Source of the work are being offered to the general public at no\n    charge under subsection 6d.\n\n  A separable portion of the object code, whose source code is excluded\nfrom the Corresponding Source as a System Library, need not be\nincluded in conveying the object code work.\n\n  A \"User Product\" is either (1) a \"consumer product\", which means any\ntangible personal property which is normally used for personal, family,\nor household purposes, or (2) anything designed or sold for incorporation\ninto a dwelling.  In determining whether a product is a consumer product,\ndoubtful cases shall be resolved in favor of coverage.  For a particular\nproduct received by a particular user, \"normally used\" refers to a\ntypical or common use of that class of product, regardless of the status\nof the particular user or of the way in which the particular user\nactually uses, or expects or is expected to use, the product.  
A product\nis a consumer product regardless of whether the product has substantial\ncommercial, industrial or non-consumer uses, unless such uses represent\nthe only significant mode of use of the product.\n\n  \"Installation Information\" for a User Product means any methods,\nprocedures, authorization keys, or other information required to install\nand execute modified versions of a covered work in that User Product from\na modified version of its Corresponding Source.  The information must\nsuffice to ensure that the continued functioning of the modified object\ncode is in no case prevented or interfered with solely because\nmodification has been made.\n\n  If you convey an object code work under this section in, or with, or\nspecifically for use in, a User Product, and the conveying occurs as\npart of a transaction in which the right of possession and use of the\nUser Product is transferred to the recipient in perpetuity or for a\nfixed term (regardless of how the transaction is characterized), the\nCorresponding Source conveyed under this section must be accompanied\nby the Installation Information.  But this requirement does not apply\nif neither you nor any third party retains the ability to install\nmodified object code on the User Product (for example, the work has\nbeen installed in ROM).\n\n  The requirement to provide Installation Information does not include a\nrequirement to continue to provide support service, warranty, or updates\nfor a work that has been modified or installed by the recipient, or for\nthe User Product in which it has been modified or installed.  
Access to a\nnetwork may be denied when the modification itself materially and\nadversely affects the operation of the network or violates the rules and\nprotocols for communication across the network.\n\n  Corresponding Source conveyed, and Installation Information provided,\nin accord with this section must be in a format that is publicly\ndocumented (and with an implementation available to the public in\nsource code form), and must require no special password or key for\nunpacking, reading or copying.\n\n  7. Additional Terms.\n\n  \"Additional permissions\" are terms that supplement the terms of this\nLicense by making exceptions from one or more of its conditions.\nAdditional permissions that are applicable to the entire Program shall\nbe treated as though they were included in this License, to the extent\nthat they are valid under applicable law.  If additional permissions\napply only to part of the Program, that part may be used separately\nunder those permissions, but the entire Program remains governed by\nthis License without regard to the additional permissions.\n\n  When you convey a copy of a covered work, you may at your option\nremove any additional permissions from that copy, or from any part of\nit.  (Additional permissions may be written to require their own\nremoval in certain cases when you modify the work.)  
You may place\nadditional permissions on material, added by you to a covered work,\nfor which you have or can give appropriate copyright permission.\n\n  Notwithstanding any other provision of this License, for material you\nadd to a covered work, you may (if authorized by the copyright holders of\nthat material) supplement the terms of this License with terms:\n\n    a) Disclaiming warranty or limiting liability differently from the\n    terms of sections 15 and 16 of this License; or\n\n    b) Requiring preservation of specified reasonable legal notices or\n    author attributions in that material or in the Appropriate Legal\n    Notices displayed by works containing it; or\n\n    c) Prohibiting misrepresentation of the origin of that material, or\n    requiring that modified versions of such material be marked in\n    reasonable ways as different from the original version; or\n\n    d) Limiting the use for publicity purposes of names of licensors or\n    authors of the material; or\n\n    e) Declining to grant rights under trademark law for use of some\n    trade names, trademarks, or service marks; or\n\n    f) Requiring indemnification of licensors and authors of that\n    material by anyone who conveys the material (or modified versions of\n    it) with contractual assumptions of liability to the recipient, for\n    any liability that these contractual assumptions directly impose on\n    those licensors and authors.\n\n  All other non-permissive additional terms are considered \"further\nrestrictions\" within the meaning of section 10.  If the Program as you\nreceived it, or any part of it, contains a notice stating that it is\ngoverned by this License along with a term that is a further\nrestriction, you may remove that term.  
If a license document contains\na further restriction but permits relicensing or conveying under this\nLicense, you may add to a covered work material governed by the terms\nof that license document, provided that the further restriction does\nnot survive such relicensing or conveying.\n\n  If you add terms to a covered work in accord with this section, you\nmust place, in the relevant source files, a statement of the\nadditional terms that apply to those files, or a notice indicating\nwhere to find the applicable terms.\n\n  Additional terms, permissive or non-permissive, may be stated in the\nform of a separately written license, or stated as exceptions;\nthe above requirements apply either way.\n\n  8. Termination.\n\n  You may not propagate or modify a covered work except as expressly\nprovided under this License.  Any attempt otherwise to propagate or\nmodify it is void, and will automatically terminate your rights under\nthis License (including any patent licenses granted under the third\nparagraph of section 11).\n\n  However, if you cease all violation of this License, then your\nlicense from a particular copyright holder is reinstated (a)\nprovisionally, unless and until the copyright holder explicitly and\nfinally terminates your license, and (b) permanently, if the copyright\nholder fails to notify you of the violation by some reasonable means\nprior to 60 days after the cessation.\n\n  Moreover, your license from a particular copyright holder is\nreinstated permanently if the copyright holder notifies you of the\nviolation by some reasonable means, this is the first time you have\nreceived notice of violation of this License (for any work) from that\ncopyright holder, and you cure the violation prior to 30 days after\nyour receipt of the notice.\n\n  Termination of your rights under this section does not terminate the\nlicenses of parties who have received copies or rights from you under\nthis License.  
If your rights have been terminated and not permanently\nreinstated, you do not qualify to receive new licenses for the same\nmaterial under section 10.\n\n  9. Acceptance Not Required for Having Copies.\n\n  You are not required to accept this License in order to receive or\nrun a copy of the Program.  Ancillary propagation of a covered work\noccurring solely as a consequence of using peer-to-peer transmission\nto receive a copy likewise does not require acceptance.  However,\nnothing other than this License grants you permission to propagate or\nmodify any covered work.  These actions infringe copyright if you do\nnot accept this License.  Therefore, by modifying or propagating a\ncovered work, you indicate your acceptance of this License to do so.\n\n  10. Automatic Licensing of Downstream Recipients.\n\n  Each time you convey a covered work, the recipient automatically\nreceives a license from the original licensors, to run, modify and\npropagate that work, subject to this License.  You are not responsible\nfor enforcing compliance by third parties with this License.\n\n  An \"entity transaction\" is a transaction transferring control of an\norganization, or substantially all assets of one, or subdividing an\norganization, or merging organizations.  If propagation of a covered\nwork results from an entity transaction, each party to that\ntransaction who receives a copy of the work also receives whatever\nlicenses to the work the party's predecessor in interest had or could\ngive under the previous paragraph, plus a right to possession of the\nCorresponding Source of the work from the predecessor in interest, if\nthe predecessor has it or can get it with reasonable efforts.\n\n  You may not impose any further restrictions on the exercise of the\nrights granted or affirmed under this License.  
For example, you may\nnot impose a license fee, royalty, or other charge for exercise of\nrights granted under this License, and you may not initiate litigation\n(including a cross-claim or counterclaim in a lawsuit) alleging that\nany patent claim is infringed by making, using, selling, offering for\nsale, or importing the Program or any portion of it.\n\n  11. Patents.\n\n  A \"contributor\" is a copyright holder who authorizes use under this\nLicense of the Program or a work on which the Program is based.  The\nwork thus licensed is called the contributor's \"contributor version\".\n\n  A contributor's \"essential patent claims\" are all patent claims\nowned or controlled by the contributor, whether already acquired or\nhereafter acquired, that would be infringed by some manner, permitted\nby this License, of making, using, or selling its contributor version,\nbut do not include claims that would be infringed only as a\nconsequence of further modification of the contributor version.  For\npurposes of this definition, \"control\" includes the right to grant\npatent sublicenses in a manner consistent with the requirements of\nthis License.\n\n  Each contributor grants you a non-exclusive, worldwide, royalty-free\npatent license under the contributor's essential patent claims, to\nmake, use, sell, offer for sale, import and otherwise run, modify and\npropagate the contents of its contributor version.\n\n  In the following three paragraphs, a \"patent license\" is any express\nagreement or commitment, however denominated, not to enforce a patent\n(such as an express permission to practice a patent or covenant not to\nsue for patent infringement).  
To \"grant\" such a patent license to a\nparty means to make such an agreement or commitment not to enforce a\npatent against the party.\n\n  If you convey a covered work, knowingly relying on a patent license,\nand the Corresponding Source of the work is not available for anyone\nto copy, free of charge and under the terms of this License, through a\npublicly available network server or other readily accessible means,\nthen you must either (1) cause the Corresponding Source to be so\navailable, or (2) arrange to deprive yourself of the benefit of the\npatent license for this particular work, or (3) arrange, in a manner\nconsistent with the requirements of this License, to extend the patent\nlicense to downstream recipients.  \"Knowingly relying\" means you have\nactual knowledge that, but for the patent license, your conveying the\ncovered work in a country, or your recipient's use of the covered work\nin a country, would infringe one or more identifiable patents in that\ncountry that you have reason to believe are valid.\n\n  If, pursuant to or in connection with a single transaction or\narrangement, you convey, or propagate by procuring conveyance of, a\ncovered work, and grant a patent license to some of the parties\nreceiving the covered work authorizing them to use, propagate, modify\nor convey a specific copy of the covered work, then the patent license\nyou grant is automatically extended to all recipients of the covered\nwork and works based on it.\n\n  A patent license is \"discriminatory\" if it does not include within\nthe scope of its coverage, prohibits the exercise of, or is\nconditioned on the non-exercise of one or more of the rights that are\nspecifically granted under this License.  
You may not convey a covered\nwork if you are a party to an arrangement with a third party that is\nin the business of distributing software, under which you make payment\nto the third party based on the extent of your activity of conveying\nthe work, and under which the third party grants, to any of the\nparties who would receive the covered work from you, a discriminatory\npatent license (a) in connection with copies of the covered work\nconveyed by you (or copies made from those copies), or (b) primarily\nfor and in connection with specific products or compilations that\ncontain the covered work, unless you entered into that arrangement,\nor that patent license was granted, prior to 28 March 2007.\n\n  Nothing in this License shall be construed as excluding or limiting\nany implied license or other defenses to infringement that may\notherwise be available to you under applicable patent law.\n\n  12. No Surrender of Others' Freedom.\n\n  If conditions are imposed on you (whether by court order, agreement or\notherwise) that contradict the conditions of this License, they do not\nexcuse you from the conditions of this License.  If you cannot convey a\ncovered work so as to satisfy simultaneously your obligations under this\nLicense and any other pertinent obligations, then as a consequence you may\nnot convey it at all.  For example, if you agree to terms that obligate you\nto collect a royalty for further conveying from those to whom you convey\nthe Program, the only way you could satisfy both those terms and this\nLicense would be to refrain entirely from conveying the Program.\n\n  13. Use with the GNU Affero General Public License.\n\n  Notwithstanding any other provision of this License, you have\npermission to link or combine any covered work with a work licensed\nunder version 3 of the GNU Affero General Public License into a single\ncombined work, and to convey the resulting work.  
The terms of this\nLicense will continue to apply to the part which is the covered work,\nbut the special requirements of the GNU Affero General Public License,\nsection 13, concerning interaction through a network will apply to the\ncombination as such.\n\n  14. Revised Versions of this License.\n\n  The Free Software Foundation may publish revised and/or new versions of\nthe GNU General Public License from time to time.  Such new versions will\nbe similar in spirit to the present version, but may differ in detail to\naddress new problems or concerns.\n\n  Each version is given a distinguishing version number.  If the\nProgram specifies that a certain numbered version of the GNU General\nPublic License \"or any later version\" applies to it, you have the\noption of following the terms and conditions either of that numbered\nversion or of any later version published by the Free Software\nFoundation.  If the Program does not specify a version number of the\nGNU General Public License, you may choose any version ever published\nby the Free Software Foundation.\n\n  If the Program specifies that a proxy can decide which future\nversions of the GNU General Public License can be used, that proxy's\npublic statement of acceptance of a version permanently authorizes you\nto choose that version for the Program.\n\n  Later license versions may give you additional or different\npermissions.  However, no additional obligations are imposed on any\nauthor or copyright holder as a result of your choosing to follow a\nlater version.\n\n  15. Disclaimer of Warranty.\n\n  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY\nAPPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT\nHOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM \"AS IS\" WITHOUT WARRANTY\nOF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,\nTHE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR\nPURPOSE.  
THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM\nIS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF\nALL NECESSARY SERVICING, REPAIR OR CORRECTION.\n\n  16. Limitation of Liability.\n\n  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING\nWILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS\nTHE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY\nGENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE\nUSE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF\nDATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD\nPARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),\nEVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF\nSUCH DAMAGES.\n\n  17. Interpretation of Sections 15 and 16.\n\n  If the disclaimer of warranty and limitation of liability provided\nabove cannot be given local legal effect according to their terms,\nreviewing courts shall apply local law that most closely approximates\nan absolute waiver of all civil liability in connection with the\nProgram, unless a warranty or assumption of liability accompanies a\ncopy of the Program in return for a fee.\n\n                     END OF TERMS AND CONDITIONS\n\n            How to Apply These Terms to Your New Programs\n\n  If you develop a new program, and you want it to be of the greatest\npossible use to the public, the best way to achieve this is to make it\nfree software which everyone can redistribute and change under these terms.\n\n  To do so, attach the following notices to the program.  
It is safest\nto attach them to the start of each source file to most effectively\nstate the exclusion of warranty; and each file should have at least\nthe \"copyright\" line and a pointer to where the full notice is found.\n\n    <one line to give the program's name and a brief idea of what it does.>\n    Copyright (C) <year>  <name of author>\n\n    This program is free software: you can redistribute it and/or modify\n    it under the terms of the GNU General Public License as published by\n    the Free Software Foundation, either version 3 of the License, or\n    (at your option) any later version.\n\n    This program is distributed in the hope that it will be useful,\n    but WITHOUT ANY WARRANTY; without even the implied warranty of\n    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n    GNU General Public License for more details.\n\n    You should have received a copy of the GNU General Public License\n    along with this program.  If not, see <https://www.gnu.org/licenses/>.\n\nAlso add information on how to contact you by electronic and paper mail.\n\n  If the program does terminal interaction, make it output a short\nnotice like this when it starts in an interactive mode:\n\n    <program>  Copyright (C) <year>  <name of author>\n    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.\n    This is free software, and you are welcome to redistribute it\n    under certain conditions; type `show c' for details.\n\nThe hypothetical commands `show w' and `show c' should show the appropriate\nparts of the General Public License.  
Of course, your program's commands\nmight be different; for a GUI interface, you would use an \"about box\".\n\n  You should also get your employer (if you work as a programmer) or school,\nif any, to sign a \"copyright disclaimer\" for the program, if necessary.\nFor more information on this, and how to apply and follow the GNU GPL, see\n<https://www.gnu.org/licenses/>.\n\n  The GNU General Public License does not permit incorporating your program\ninto proprietary programs.  If your program is a subroutine library, you\nmay consider it more useful to permit linking proprietary applications with\nthe library.  If this is what you want to do, use the GNU Lesser General\nPublic License instead of this License.  But first, please read\n<https://www.gnu.org/licenses/why-not-lgpl.html>.\n"
  },
  {
    "path": "Open-Sora/Dockerfile",
    "content": "FROM hpcaitech/pytorch-cuda:2.1.0-12.1.0\n\n# metainformation\nLABEL org.opencontainers.image.source = \"https://github.com/hpcaitech/Open-Sora\"\nLABEL org.opencontainers.image.licenses = \"Apache License 2.0\"\nLABEL org.opencontainers.image.base.name = \"docker.io/library/hpcaitech/pytorch-cuda:2.1.0-12.1.0\"\n\n# Set the working directory\nWORKDIR /workspace/Open-Sora\n# Copy the current directory contents into the container at /workspace/Open-Sora\nCOPY . .\n\n# inatall library dependencies\nRUN apt-get update && apt-get install ffmpeg libsm6 libxext6  -y\n\n# install flash attention\nRUN pip install flash-attn --no-build-isolation\n\n# install apex\nRUN pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext\" --config-settings \"--build-option=--cuda_ext\" git+https://github.com/NVIDIA/apex.git\n\n# install xformers\nRUN pip install xformers --index-url https://download.pytorch.org/whl/cu121\n\n# install this project\nRUN pip install -v .\n"
  },
  {
    "path": "Open-Sora/LICENSE",
    "content": "Copyright 2024 HPC-AI Technology Inc. All rights reserved.\n                                 Apache License\n                           Version 2.0, January 2004\n                        http://www.apache.org/licenses/\n\n   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n   1. Definitions.\n\n      \"License\" shall mean the terms and conditions for use, reproduction,\n      and distribution as defined by Sections 1 through 9 of this document.\n\n      \"Licensor\" shall mean the copyright owner or entity authorized by\n      the copyright owner that is granting the License.\n\n      \"Legal Entity\" shall mean the union of the acting entity and all\n      other entities that control, are controlled by, or are under common\n      control with that entity. For the purposes of this definition,\n      \"control\" means (i) the power, direct or indirect, to cause the\n      direction or management of such entity, whether by contract or\n      otherwise, or (ii) ownership of fifty percent (50%) or more of the\n      outstanding shares, or (iii) beneficial ownership of such entity.\n\n      \"You\" (or \"Your\") shall mean an individual or Legal Entity\n      exercising permissions granted by this License.\n\n      \"Source\" form shall mean the preferred form for making modifications,\n      including but not limited to software source code, documentation\n      source, and configuration files.\n\n      \"Object\" form shall mean any form resulting from mechanical\n      transformation or translation of a Source form, including but\n      not limited to compiled object code, generated documentation,\n      and conversions to other media types.\n\n      \"Work\" shall mean the work of authorship, whether in Source or\n      Object form, made available under the License, as indicated by a\n      copyright notice that is included in or attached to the work\n      (an example is provided in the Appendix below).\n\n      \"Derivative Works\" 
shall mean any work, whether in Source or Object\n      form, that is based on (or derived from) the Work and for which the\n      editorial revisions, annotations, elaborations, or other modifications\n      represent, as a whole, an original work of authorship. For the purposes\n      of this License, Derivative Works shall not include works that remain\n      separable from, or merely link (or bind by name) to the interfaces of,\n      the Work and Derivative Works thereof.\n\n      \"Contribution\" shall mean any work of authorship, including\n      the original version of the Work and any modifications or additions\n      to that Work or Derivative Works thereof, that is intentionally\n      submitted to Licensor for inclusion in the Work by the copyright owner\n      or by an individual or Legal Entity authorized to submit on behalf of\n      the copyright owner. For the purposes of this definition, \"submitted\"\n      means any form of electronic, verbal, or written communication sent\n      to the Licensor or its representatives, including but not limited to\n      communication on electronic mailing lists, source code control systems,\n      and issue tracking systems that are managed by, or on behalf of, the\n      Licensor for the purpose of discussing and improving the Work, but\n      excluding communication that is conspicuously marked or otherwise\n      designated in writing by the copyright owner as \"Not a Contribution.\"\n\n      \"Contributor\" shall mean Licensor and any individual or Legal Entity\n      on behalf of whom a Contribution has been received by Licensor and\n      subsequently incorporated within the Work.\n\n   2. Grant of Copyright License. 
Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      copyright license to reproduce, prepare Derivative Works of,\n      publicly display, publicly perform, sublicense, and distribute the\n      Work and such Derivative Works in Source or Object form.\n\n   3. Grant of Patent License. Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      (except as stated in this section) patent license to make, have made,\n      use, offer to sell, sell, import, and otherwise transfer the Work,\n      where such license applies only to those patent claims licensable\n      by such Contributor that are necessarily infringed by their\n      Contribution(s) alone or by combination of their Contribution(s)\n      with the Work to which such Contribution(s) was submitted. If You\n      institute patent litigation against any entity (including a\n      cross-claim or counterclaim in a lawsuit) alleging that the Work\n      or a Contribution incorporated within the Work constitutes direct\n      or contributory patent infringement, then any patent licenses\n      granted to You under this License for that Work shall terminate\n      as of the date such litigation is filed.\n\n   4. Redistribution. 
You may reproduce and distribute copies of the\n      Work or Derivative Works thereof in any medium, with or without\n      modifications, and in Source or Object form, provided that You\n      meet the following conditions:\n\n      (a) You must give any other recipients of the Work or\n          Derivative Works a copy of this License; and\n\n      (b) You must cause any modified files to carry prominent notices\n          stating that You changed the files; and\n\n      (c) You must retain, in the Source form of any Derivative Works\n          that You distribute, all copyright, patent, trademark, and\n          attribution notices from the Source form of the Work,\n          excluding those notices that do not pertain to any part of\n          the Derivative Works; and\n\n      (d) If the Work includes a \"NOTICE\" text file as part of its\n          distribution, then any Derivative Works that You distribute must\n          include a readable copy of the attribution notices contained\n          within such NOTICE file, excluding those notices that do not\n          pertain to any part of the Derivative Works, in at least one\n          of the following places: within a NOTICE text file distributed\n          as part of the Derivative Works; within the Source form or\n          documentation, if provided along with the Derivative Works; or,\n          within a display generated by the Derivative Works, if and\n          wherever such third-party notices normally appear. The contents\n          of the NOTICE file are for informational purposes only and\n          do not modify the License. 
You may add Your own attribution\n          notices within Derivative Works that You distribute, alongside\n          or as an addendum to the NOTICE text from the Work, provided\n          that such additional attribution notices cannot be construed\n          as modifying the License.\n\n      You may add Your own copyright statement to Your modifications and\n      may provide additional or different license terms and conditions\n      for use, reproduction, or distribution of Your modifications, or\n      for any such Derivative Works as a whole, provided Your use,\n      reproduction, and distribution of the Work otherwise complies with\n      the conditions stated in this License.\n\n   5. Submission of Contributions. Unless You explicitly state otherwise,\n      any Contribution intentionally submitted for inclusion in the Work\n      by You to the Licensor shall be under the terms and conditions of\n      this License, without any additional terms or conditions.\n      Notwithstanding the above, nothing herein shall supersede or modify\n      the terms of any separate license agreement you may have executed\n      with Licensor regarding such Contributions.\n\n   6. Trademarks. This License does not grant permission to use the trade\n      names, trademarks, service marks, or product names of the Licensor,\n      except as required for reasonable and customary use in describing the\n      origin of the Work and reproducing the content of the NOTICE file.\n\n   7. Disclaimer of Warranty. Unless required by applicable law or\n      agreed to in writing, Licensor provides the Work (and each\n      Contributor provides its Contributions) on an \"AS IS\" BASIS,\n      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\n      implied, including, without limitation, any warranties or conditions\n      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\n      PARTICULAR PURPOSE. 
You are solely responsible for determining the\n      appropriateness of using or redistributing the Work and assume any\n      risks associated with Your exercise of permissions under this License.\n\n   8. Limitation of Liability. In no event and under no legal theory,\n      whether in tort (including negligence), contract, or otherwise,\n      unless required by applicable law (such as deliberate and grossly\n      negligent acts) or agreed to in writing, shall any Contributor be\n      liable to You for damages, including any direct, indirect, special,\n      incidental, or consequential damages of any character arising as a\n      result of this License or out of the use or inability to use the\n      Work (including but not limited to damages for loss of goodwill,\n      work stoppage, computer failure or malfunction, or any and all\n      other commercial damages or losses), even if such Contributor\n      has been advised of the possibility of such damages.\n\n   9. Accepting Warranty or Additional Liability. While redistributing\n      the Work or Derivative Works thereof, You may choose to offer,\n      and charge a fee for, acceptance of support, warranty, indemnity,\n      or other liability obligations and/or rights consistent with this\n      License. However, in accepting such obligations, You may act only\n      on Your own behalf and on Your sole responsibility, not on behalf\n      of any other Contributor, and only if You agree to indemnify,\n      defend, and hold each Contributor harmless for any liability\n      incurred by, or claims asserted against, such Contributor by reason\n      of your accepting any such warranty or additional liability.\n\n   END OF TERMS AND CONDITIONS\n\n   APPENDIX: How to apply the Apache License to your work.\n\n      To apply the Apache License to your work, attach the following\n      boilerplate notice, with the fields enclosed by brackets \"[]\"\n      replaced with your own identifying information. 
(Don't include\n      the brackets!)  The text should be enclosed in the appropriate\n      comment syntax for the file format. We also recommend that a\n      file or class name and description of purpose be included on the\n      same \"printed page\" as the copyright notice for easier\n      identification within third-party archives.\n\n   Copyright 2024 HPC-AI Technology Inc.\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License.\n\n   =========================================================================\n   This project is inspired by the listed projects and is subject to the following licenses:\n\n   1. Latte (https://github.com/Vchitect/Latte/blob/main/LICENSE)\n\n   Copyright 2024 Latte\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License.\n\n   2. 
PixArt-alpha (https://github.com/PixArt-alpha/PixArt-alpha/blob/master/LICENSE)\n\n   Copyright (C) 2024 PixArt-alpha/PixArt-alpha\n\n   This program is free software: you can redistribute it and/or modify\n   it under the terms of the GNU Affero General Public License as published\n   by the Free Software Foundation, either version 3 of the License, or\n   (at your option) any later version.\n\n   This program is distributed in the hope that it will be useful,\n   but WITHOUT ANY WARRANTY; without even the implied warranty of\n   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n   GNU Affero General Public License for more details.\n\n   You should have received a copy of the GNU Affero General Public License\n   along with this program.  If not, see <https://www.gnu.org/licenses/>.\n\n   3. dpm-solver (https://github.com/LuChengTHU/dpm-solver/blob/main/LICENSE)\n\n   MIT License\n\n   Copyright (c) 2022 Cheng Lu\n\n   Permission is hereby granted, free of charge, to any person obtaining a copy\n   of this software and associated documentation files (the \"Software\"), to deal\n   in the Software without restriction, including without limitation the rights\n   to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n   copies of the Software, and to permit persons to whom the Software is\n   furnished to do so, subject to the following conditions:\n\n   The above copyright notice and this permission notice shall be included in all\n   copies or substantial portions of the Software.\n\n   THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE\n   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n   SOFTWARE.\n\n   4. DiT (https://github.com/facebookresearch/DiT/blob/main/LICENSE.txt)\n\n   Attribution-NonCommercial 4.0 International\n\n   =======================================================================\n\n   Creative Commons Corporation (\"Creative Commons\") is not a law firm and\n   does not provide legal services or legal advice. Distribution of\n   Creative Commons public licenses does not create a lawyer-client or\n   other relationship. Creative Commons makes its licenses and related\n   information available on an \"as-is\" basis. Creative Commons gives no\n   warranties regarding its licenses, any material licensed under their\n   terms and conditions, or any related information. Creative Commons\n   disclaims all liability for damages resulting from their use to the\n   fullest extent possible.\n\n   Using Creative Commons Public Licenses\n\n   Creative Commons public licenses provide a standard set of terms and\n   conditions that creators and other rights holders may use to share\n   original works of authorship and other material subject to copyright\n   and certain other rights specified in the public license below. The\n   following considerations are for informational purposes only, are not\n   exhaustive, and do not form part of our licenses.\n\n      Considerations for licensors: Our public licenses are\n      intended for use by those authorized to give the public\n      permission to use material in ways otherwise restricted by\n      copyright and certain other rights. Our licenses are\n      irrevocable. 
Licensors should read and understand the terms\n      and conditions of the license they choose before applying it.\n      Licensors should also secure all rights necessary before\n      applying our licenses so that the public can reuse the\n      material as expected. Licensors should clearly mark any\n      material not subject to the license. This includes other CC-\n      licensed material, or material used under an exception or\n      limitation to copyright. More considerations for licensors:\n      wiki.creativecommons.org/Considerations_for_licensors\n\n      Considerations for the public: By using one of our public\n      licenses, a licensor grants the public permission to use the\n      licensed material under specified terms and conditions. If\n      the licensor's permission is not necessary for any reason--for\n      example, because of any applicable exception or limitation to\n      copyright--then that use is not regulated by the license. Our\n      licenses grant only permissions under copyright and certain\n      other rights that a licensor has authority to grant. Use of\n      the licensed material may still be restricted for other\n      reasons, including because others have copyright or other\n      rights in the material. A licensor may make special requests,\n      such as asking that all changes be marked or described.\n      Although not required by our licenses, you are encouraged to\n      respect those requests where reasonable. More_considerations\n      for the public:\n      wiki.creativecommons.org/Considerations_for_licensees\n\n   =======================================================================\n\n   Creative Commons Attribution-NonCommercial 4.0 International Public\n   License\n\n   By exercising the Licensed Rights (defined below), You accept and agree\n   to be bound by the terms and conditions of this Creative Commons\n   Attribution-NonCommercial 4.0 International Public License (\"Public\n   License\"). 
To the extent this Public License may be interpreted as a\n   contract, You are granted the Licensed Rights in consideration of Your\n   acceptance of these terms and conditions, and the Licensor grants You\n   such rights in consideration of benefits the Licensor receives from\n   making the Licensed Material available under these terms and\n   conditions.\n\n   Section 1 -- Definitions.\n\n   a. Adapted Material means material subject to Copyright and Similar\n      Rights that is derived from or based upon the Licensed Material\n      and in which the Licensed Material is translated, altered,\n      arranged, transformed, or otherwise modified in a manner requiring\n      permission under the Copyright and Similar Rights held by the\n      Licensor. For purposes of this Public License, where the Licensed\n      Material is a musical work, performance, or sound recording,\n      Adapted Material is always produced where the Licensed Material is\n      synched in timed relation with a moving image.\n\n   b. Adapter's License means the license You apply to Your Copyright\n      and Similar Rights in Your contributions to Adapted Material in\n      accordance with the terms and conditions of this Public License.\n\n   c. Copyright and Similar Rights means copyright and/or similar rights\n      closely related to copyright including, without limitation,\n      performance, broadcast, sound recording, and Sui Generis Database\n      Rights, without regard to how the rights are labeled or\n      categorized. For purposes of this Public License, the rights\n      specified in Section 2(b)(1)-(2) are not Copyright and Similar\n      Rights.\n   d. Effective Technological Measures means those measures that, in the\n      absence of proper authority, may not be circumvented under laws\n      fulfilling obligations under Article 11 of the WIPO Copyright\n      Treaty adopted on December 20, 1996, and/or similar international\n      agreements.\n\n   e. 
Exceptions and Limitations means fair use, fair dealing, and/or\n      any other exception or limitation to Copyright and Similar Rights\n      that applies to Your use of the Licensed Material.\n\n   f. Licensed Material means the artistic or literary work, database,\n      or other material to which the Licensor applied this Public\n      License.\n\n   g. Licensed Rights means the rights granted to You subject to the\n      terms and conditions of this Public License, which are limited to\n      all Copyright and Similar Rights that apply to Your use of the\n      Licensed Material and that the Licensor has authority to license.\n\n   h. Licensor means the individual(s) or entity(ies) granting rights\n      under this Public License.\n\n   i. NonCommercial means not primarily intended for or directed towards\n      commercial advantage or monetary compensation. For purposes of\n      this Public License, the exchange of the Licensed Material for\n      other material subject to Copyright and Similar Rights by digital\n      file-sharing or similar means is NonCommercial provided there is\n      no payment of monetary compensation in connection with the\n      exchange.\n\n   j. Share means to provide material to the public by any means or\n      process that requires permission under the Licensed Rights, such\n      as reproduction, public display, public performance, distribution,\n      dissemination, communication, or importation, and to make material\n      available to the public including in ways that members of the\n      public may access the material from a place and at a time\n      individually chosen by them.\n\n   k. Sui Generis Database Rights means rights other than copyright\n      resulting from Directive 96/9/EC of the European Parliament and of\n      the Council of 11 March 1996 on the legal protection of databases,\n      as amended and/or succeeded, as well as other essentially\n      equivalent rights anywhere in the world.\n\n   l. 
You means the individual or entity exercising the Licensed Rights\n      under this Public License. Your has a corresponding meaning.\n\n   Section 2 -- Scope.\n\n   a. License grant.\n\n         1. Subject to the terms and conditions of this Public License,\n            the Licensor hereby grants You a worldwide, royalty-free,\n            non-sublicensable, non-exclusive, irrevocable license to\n            exercise the Licensed Rights in the Licensed Material to:\n\n               a. reproduce and Share the Licensed Material, in whole or\n                  in part, for NonCommercial purposes only; and\n\n               b. produce, reproduce, and Share Adapted Material for\n                  NonCommercial purposes only.\n\n         2. Exceptions and Limitations. For the avoidance of doubt, where\n            Exceptions and Limitations apply to Your use, this Public\n            License does not apply, and You do not need to comply with\n            its terms and conditions.\n\n         3. Term. The term of this Public License is specified in Section\n            6(a).\n\n         4. Media and formats; technical modifications allowed. The\n            Licensor authorizes You to exercise the Licensed Rights in\n            all media and formats whether now known or hereafter created,\n            and to make technical modifications necessary to do so. The\n            Licensor waives and/or agrees not to assert any right or\n            authority to forbid You from making technical modifications\n            necessary to exercise the Licensed Rights, including\n            technical modifications necessary to circumvent Effective\n            Technological Measures. For purposes of this Public License,\n            simply making modifications authorized by this Section 2(a)\n            (4) never produces Adapted Material.\n\n         5. Downstream recipients.\n\n               a. Offer from the Licensor -- Licensed Material. 
Every\n                  recipient of the Licensed Material automatically\n                  receives an offer from the Licensor to exercise the\n                  Licensed Rights under the terms and conditions of this\n                  Public License.\n\n               b. No downstream restrictions. You may not offer or impose\n                  any additional or different terms or conditions on, or\n                  apply any Effective Technological Measures to, the\n                  Licensed Material if doing so restricts exercise of the\n                  Licensed Rights by any recipient of the Licensed\n                  Material.\n\n         6. No endorsement. Nothing in this Public License constitutes or\n            may be construed as permission to assert or imply that You\n            are, or that Your use of the Licensed Material is, connected\n            with, or sponsored, endorsed, or granted official status by,\n            the Licensor or others designated to receive attribution as\n            provided in Section 3(a)(1)(A)(i).\n\n   b. Other rights.\n\n         1. Moral rights, such as the right of integrity, are not\n            licensed under this Public License, nor are publicity,\n            privacy, and/or other similar personality rights; however, to\n            the extent possible, the Licensor waives and/or agrees not to\n            assert any such rights held by the Licensor to the limited\n            extent necessary to allow You to exercise the Licensed\n            Rights, but not otherwise.\n\n         2. Patent and trademark rights are not licensed under this\n            Public License.\n\n         3. To the extent possible, the Licensor waives any right to\n            collect royalties from You for the exercise of the Licensed\n            Rights, whether directly or through a collecting society\n            under any voluntary or waivable statutory or compulsory\n            licensing scheme. 
In all other cases the Licensor expressly\n            reserves any right to collect such royalties, including when\n            the Licensed Material is used other than for NonCommercial\n            purposes.\n\n   Section 3 -- License Conditions.\n\n   Your exercise of the Licensed Rights is expressly made subject to the\n   following conditions.\n\n   a. Attribution.\n\n         1. If You Share the Licensed Material (including in modified\n            form), You must:\n\n               a. retain the following if it is supplied by the Licensor\n                  with the Licensed Material:\n\n                  i. identification of the creator(s) of the Licensed\n                     Material and any others designated to receive\n                     attribution, in any reasonable manner requested by\n                     the Licensor (including by pseudonym if\n                     designated);\n\n                  ii. a copyright notice;\n\n                  iii. a notice that refers to this Public License;\n\n                  iv. a notice that refers to the disclaimer of\n                     warranties;\n\n                  v. a URI or hyperlink to the Licensed Material to the\n                     extent reasonably practicable;\n\n               b. indicate if You modified the Licensed Material and\n                  retain an indication of any previous modifications; and\n\n               c. indicate the Licensed Material is licensed under this\n                  Public License, and include the text of, or the URI or\n                  hyperlink to, this Public License.\n\n         2. You may satisfy the conditions in Section 3(a)(1) in any\n            reasonable manner based on the medium, means, and context in\n            which You Share the Licensed Material. 
For example, it may be\n            reasonable to satisfy the conditions by providing a URI or\n            hyperlink to a resource that includes the required\n            information.\n\n         3. If requested by the Licensor, You must remove any of the\n            information required by Section 3(a)(1)(A) to the extent\n            reasonably practicable.\n\n         4. If You Share Adapted Material You produce, the Adapter's\n            License You apply must not prevent recipients of the Adapted\n            Material from complying with this Public License.\n\n   Section 4 -- Sui Generis Database Rights.\n\n   Where the Licensed Rights include Sui Generis Database Rights that\n   apply to Your use of the Licensed Material:\n\n   a. for the avoidance of doubt, Section 2(a)(1) grants You the right\n      to extract, reuse, reproduce, and Share all or a substantial\n      portion of the contents of the database for NonCommercial purposes\n      only;\n\n   b. if You include all or a substantial portion of the database\n      contents in a database in which You have Sui Generis Database\n      Rights, then the database in which You have Sui Generis Database\n      Rights (but not its individual contents) is Adapted Material; and\n\n   c. You must comply with the conditions in Section 3(a) if You Share\n      all or a substantial portion of the contents of the database.\n\n   For the avoidance of doubt, this Section 4 supplements and does not\n   replace Your obligations under this Public License where the Licensed\n   Rights include other Copyright and Similar Rights.\n\n   Section 5 -- Disclaimer of Warranties and Limitation of Liability.\n\n   a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE\n      EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS\n      AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF\n      ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,\n      IMPLIED, STATUTORY, OR OTHER. 
THIS INCLUDES, WITHOUT LIMITATION,\n      WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR\n      PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,\n      ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT\n      KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT\n      ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.\n\n   b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE\n      TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,\n      NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,\n      INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,\n      COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR\n      USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN\n      ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR\n      DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR\n      IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.\n\n   c. The disclaimer of warranties and limitation of liability provided\n      above shall be interpreted in a manner that, to the extent\n      possible, most closely approximates an absolute disclaimer and\n      waiver of all liability.\n\n   Section 6 -- Term and Termination.\n\n   a. This Public License applies for the term of the Copyright and\n      Similar Rights licensed here. However, if You fail to comply with\n      this Public License, then Your rights under this Public License\n      terminate automatically.\n\n   b. Where Your right to use the Licensed Material has terminated under\n      Section 6(a), it reinstates:\n\n         1. automatically as of the date the violation is cured, provided\n            it is cured within 30 days of Your discovery of the\n            violation; or\n\n         2. 
upon express reinstatement by the Licensor.\n\n      For the avoidance of doubt, this Section 6(b) does not affect any\n      right the Licensor may have to seek remedies for Your violations\n      of this Public License.\n\n   c. For the avoidance of doubt, the Licensor may also offer the\n      Licensed Material under separate terms or conditions or stop\n      distributing the Licensed Material at any time; however, doing so\n      will not terminate this Public License.\n\n   d. Sections 1, 5, 6, 7, and 8 survive termination of this Public\n      License.\n\n   Section 7 -- Other Terms and Conditions.\n\n   a. The Licensor shall not be bound by any additional or different\n      terms or conditions communicated by You unless expressly agreed.\n\n   b. Any arrangements, understandings, or agreements regarding the\n      Licensed Material not stated herein are separate from and\n      independent of the terms and conditions of this Public License.\n\n   Section 8 -- Interpretation.\n\n   a. For the avoidance of doubt, this Public License does not, and\n      shall not be interpreted to, reduce, limit, restrict, or impose\n      conditions on any use of the Licensed Material that could lawfully\n      be made without permission under this Public License.\n\n   b. To the extent possible, if any provision of this Public License is\n      deemed unenforceable, it shall be automatically reformed to the\n      minimum extent necessary to make it enforceable. If the provision\n      cannot be reformed, it shall be severed from this Public License\n      without affecting the enforceability of the remaining terms and\n      conditions.\n\n   c. No term or condition of this Public License will be waived and no\n      failure to comply consented to unless expressly agreed to by the\n      Licensor.\n\n   d. 
Nothing in this Public License constitutes or may be interpreted\n      as a limitation upon, or waiver of, any privileges and immunities\n      that apply to the Licensor or You, including from the legal\n      processes of any jurisdiction or authority.\n\n   =======================================================================\n\n   Creative Commons is not a party to its public\n   licenses. Notwithstanding, Creative Commons may elect to apply one of\n   its public licenses to material it publishes and in those instances\n   will be considered the “Licensor.” The text of the Creative Commons\n   public licenses is dedicated to the public domain under the CC0 Public\n   Domain Dedication. Except for the limited purpose of indicating that\n   material is shared under a Creative Commons public license or as\n   otherwise permitted by the Creative Commons policies published at\n   creativecommons.org/policies, Creative Commons does not authorize the\n   use of the trademark \"Creative Commons\" or any other trademark or logo\n   of Creative Commons without its prior written consent including,\n   without limitation, in connection with any unauthorized modifications\n   to any of its public licenses or any other arrangements,\n   understandings, or agreements concerning use of licensed material. For\n   the avoidance of doubt, this paragraph does not form part of the\n   public licenses.\n\n   Creative Commons may be contacted at creativecommons.org.\n\n   5. 
OpenDiT (https://github.com/NUS-HPC-AI-Lab/OpenDiT/blob/master/LICENSE)\n\n   Copyright OpenDiT\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License.\n"
  },
  {
    "path": "Open-Sora/README.md",
    "content": "<p align=\"center\">\n    <img src=\"./assets/readme/icon.png\" width=\"250\"/>\n</p>\n<div align=\"center\">\n    <a href=\"https://github.com/hpcaitech/Open-Sora/stargazers\"><img src=\"https://img.shields.io/github/stars/hpcaitech/Open-Sora?style=social\"></a>\n    <a href=\"https://hpcaitech.github.io/Open-Sora/\"><img src=\"https://img.shields.io/badge/Gallery-View-orange?logo=&amp\"></a>\n    <a href=\"https://discord.gg/kZakZzrSUT\"><img src=\"https://img.shields.io/badge/Discord-join-blueviolet?logo=discord&amp\"></a>\n    <a href=\"https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-247ipg9fk-KRRYmUl~u2ll2637WRURVA\"><img src=\"https://img.shields.io/badge/Slack-ColossalAI-blueviolet?logo=slack&amp\"></a>\n    <a href=\"https://twitter.com/yangyou1991/status/1769411544083996787?s=61&t=jT0Dsx2d-MS5vS9rNM5e5g\"><img src=\"https://img.shields.io/badge/Twitter-Discuss-blue?logo=twitter&amp\"></a>\n    <a href=\"https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png\"><img src=\"https://img.shields.io/badge/微信-小助手加群-green?logo=wechat&amp\"></a>\n    <a href=\"https://hpc-ai.com/blog/open-sora-v1.0\"><img src=\"https://img.shields.io/badge/Open_Sora-Blog-blue\"></a>\n    <a href=\"https://huggingface.co/spaces/hpcai-tech/open-sora\"><img src=\"https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Gradio Demo-blue\"></a>\n</div>\n\n## Open-Sora: Democratizing Efficient Video Production for All\n\nWe design and implement **Open-Sora**, an initiative dedicated to **efficiently** producing high-quality video. We hope to make the model,\ntools and all details accessible to all. 
By embracing **open-source** principles,\nOpen-Sora not only democratizes access to advanced video generation techniques, but also offers a\nstreamlined and user-friendly platform that simplifies the complexities of video generation.\nWith Open-Sora, our goal is to foster innovation, creativity, and inclusivity within the field of content creation.\n\n[[Chinese documentation](/docs/zh_CN/README.md)] [[Luchen Cloud](https://cloud.luchentech.com/)|[OpenSora image](https://cloud.luchentech.com/doc/docs/image/open-sora/)|[video tutorial](https://www.bilibili.com/video/BV1ow4m1e7PX/?vd_source=c6b752764cd36ff0e535a768e35d98d2)]\n\n## 📰 News\n\n- **[2024.06.17]** 🔥 We released **Open-Sora 1.2**, which includes **3D-VAE**, **rectified flow**, and **score condition**. The video quality is greatly improved. [[checkpoints]](#open-sora-12-model-weights) [[report]](/docs/report_03.md)   [[blog]](https://hpc-ai.com/blog/open-sora-from-hpc-ai-tech-team-continues-open-source-generate-any-16-second-720p-hd-video-with-one-click-model-weights-ready-to-use)\n- **[2024.04.25]** 🤗 We released the [Gradio demo for Open-Sora](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face Spaces.\n- **[2024.04.25]** We released **Open-Sora 1.1**, which supports **2s~15s, 144p to 720p, any aspect ratio** text-to-image, **text-to-video, image-to-video, video-to-video, infinite time** generation. In addition, a full video processing pipeline is released. [[checkpoints]]() [[report]](/docs/report_02.md)\n- **[2024.03.18]** We released **Open-Sora 1.0**, a fully open-source project for video generation.\n  Open-Sora 1.0 supports a full pipeline of video data preprocessing, training with\n  <a href=\"https://github.com/hpcaitech/ColossalAI\"><img src=\"assets/readme/colossal_ai.png\" width=\"8%\" ></a>\n  acceleration,\n  inference, and more. Our model can produce 2s 512x512 videos with only 3 days of training. 
[[checkpoints]](#open-sora-10-model-weights)\n  [[blog]](https://hpc-ai.com/blog/open-sora-v1.0) [[report]](/docs/report_01.md)\n- **[2024.03.04]** Open-Sora provides training with 46% cost reduction.\n  [[blog]](https://hpc-ai.com/blog/open-sora)\n\n## 🎥 Latest Demo\n\n🔥 You can experience Open-Sora on our [🤗 Gradio application on Hugging Face](https://huggingface.co/spaces/hpcai-tech/open-sora). More samples and corresponding prompts are available in our [Gallery](https://hpcaitech.github.io/Open-Sora/).\n\n| **4s 720×1280**                                                                                                                                      | **4s 720×1280**                                                                                                                                      | **4s 720×1280**                                                                                                                                      |\n| ---------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"assets/demo/v1.2/sample_0013.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/7895aab6-ed23-488c-8486-091480c26327) | [<img src=\"assets/demo/v1.2/sample_1718.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/20f07c7b-182b-4562-bbee-f1df74c86c9a) | [<img src=\"assets/demo/v1.2/sample_0087.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/3d897e0d-dc21-453a-b911-b3bda838acc2) |\n| [<img src=\"assets/demo/v1.2/sample_0052.gif\" 
width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/644bf938-96ce-44aa-b797-b3c0b513d64c) | [<img src=\"assets/demo/v1.2/sample_1719.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/272d88ac-4b4a-484d-a665-8d07431671d0) | [<img src=\"assets/demo/v1.2/sample_0002.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/ebbac621-c34e-4bb4-9543-1c34f8989764) |\n| [<img src=\"assets/demo/v1.2/sample_0011.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/a1e3a1a3-4abd-45f5-8df2-6cced69da4ca) | [<img src=\"assets/demo/v1.2/sample_0004.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/d6ce9c13-28e1-4dff-9644-cc01f5f11926) | [<img src=\"assets/demo/v1.2/sample_0061.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/561978f8-f1b0-4f4d-ae7b-45bec9001b4a) |\n\n<details>\n<summary>OpenSora 1.1 Demo</summary>\n\n| **2s 240×426**                                                                                                                                              | **2s 240×426**                                                                                                                                             |\n| ----------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"assets/demo/sample_16x240x426_9.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c31ebc52-de39-4a4e-9b1e-9211d45e05b2) | [<img src=\"assets/demo/sora_16x240x426_26.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c31ebc52-de39-4a4e-9b1e-9211d45e05b2) |\n| [<img src=\"assets/demo/sora_16x240x426_27.gif\" 
width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/f7ce4aaa-528f-40a8-be7a-72e61eaacbbd)  | [<img src=\"assets/demo/sora_16x240x426_40.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/5d58d71e-1fda-4d90-9ad3-5f2f7b75c6a9) |\n\n| **2s 426×240**                                                                                                                                             | **4s 480×854**                                                                                                                                              |\n| ---------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"assets/demo/sora_16x426x240_24.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/34ecb4a0-4eef-4286-ad4c-8e3a87e5a9fd) | [<img src=\"assets/demo/sample_32x480x854_9.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c1619333-25d7-42ba-a91c-18dbc1870b18) |\n\n| **16s 320×320**                                                                                                                                        | **16s 224×448**                                                                                                                                        | **2s 426×240**                                                                                                                                            |\n| ------------------------------------------------------------------------------------------------------------------------------------------------------ | 
------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"assets/demo/sample_16s_320x320.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/3cab536e-9b43-4b33-8da8-a0f9cf842ff2) | [<img src=\"assets/demo/sample_16s_224x448.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/9fb0b9e0-c6f4-4935-b29e-4cac10b373c4) | [<img src=\"assets/demo/sora_16x426x240_3.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/3e892ad2-9543-4049-b005-643a4c1bf3bf) |\n\n</details>\n\n<details>\n<summary>OpenSora 1.0 Demo</summary>\n\n| **2s 512×512**                                                                                                                                                                 | **2s 512×512**                                                                                                                                                              | **2s 512×512**                                                                                                                                    |\n| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"assets/readme/sample_0.gif\" 
width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/de1963d3-b43b-4e68-a670-bb821ebb6f80)                                 | [<img src=\"assets/readme/sample_1.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/13f8338f-3d42-4b71-8142-d234fbd746cc)                              | [<img src=\"assets/readme/sample_2.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/fa6a65a6-e32a-4d64-9a9e-eabb0ebb8c16)    |\n| A serene night scene in a forested area. [...] The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. | A soaring drone footage captures the majestic beauty of a coastal cliff, [...] The water gently laps at the rock base and the greenery that clings to the top of the cliff. | The majestic beauty of a waterfall cascading down a cliff into a serene lake. [...] The camera angle provides a bird's eye view of the waterfall. |\n| [<img src=\"assets/readme/sample_3.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/64232f84-1b36-4750-a6c0-3e610fa9aa94)                                 | [<img src=\"assets/readme/sample_4.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/983a1965-a374-41a7-a76b-c07941a6c1e9)                              | [<img src=\"assets/readme/sample_5.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/ec10c879-9767-4c31-865f-2e8d6cf11e65)    |\n| A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlights. [...]                                                           | The vibrant beauty of a sunflower field. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. [...]                                            | A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell [...]                   
|\n\nVideos are downsampled to `.gif` for display. Click for original videos. Prompts are trimmed for display;\nsee [here](/assets/texts/t2v_samples.txt) for full prompts.\n\n</details>\n\n## 🔆 New Features/Updates\n\n- 📍 **Open-Sora 1.2** released. Model weights are available [here](#model-weights). See our **[report 1.2](/docs/report_03.md)** for more details.\n- ✅ Support rectified flow scheduling.\n- ✅ Support more conditioning, including fps, aesthetic score, motion strength, and camera motion.\n- ✅ Trained our 3D-VAE for temporal dimension compression.\n- 📍 **Open-Sora 1.1** released. Model weights are available [here](#model-weights). It is trained on **0s~15s, 144p to 720p, various aspect ratios** videos. See our **[report 1.1](/docs/report_02.md)** for more discussions.\n- 🔧 **Data processing pipeline v1.1** is released. An automatic [processing pipeline](#data-processing) from raw videos to (text, video clip) pairs is provided, including scene cutting $\\rightarrow$ filtering (aesthetic, optical flow, OCR, etc.) $\\rightarrow$ captioning $\\rightarrow$ managing. With this tool, you can easily build your video dataset.\n\n<details>\n<summary>View more</summary>\n\n- ✅ Improved ST-DiT architecture, including RoPE positional encoding, QK norm, longer text length, etc.\n- ✅ Support training with any resolution, aspect ratio, and duration (including images).\n- ✅ Support image and video conditioning and video editing, and thus support animating images, connecting videos, etc.\n- 📍 **Open-Sora 1.0** released. Model weights are available [here](#model-weights). With only 400K video clips and 200 H800\n  days (compared with 152M samples in Stable Video Diffusion), we are able to generate 2s 512×512 videos. See our **[report 1.0](docs/report_01.md)** for more discussions.\n- ✅ Three-stage training from an image diffusion model to a video diffusion model. 
We provide the weights for each\n  stage.\n- ✅ Support training acceleration including accelerated transformer, faster T5 and VAE, and sequence parallelism.\n  Open-Sora improves training speed by **55%** when training on 64x512x512 videos. Details are\n  in [acceleration.md](docs/acceleration.md).\n- 🔧 **Data preprocessing pipeline v1.0**,\n  including [downloading](tools/datasets/README.md), [video cutting](tools/scene_cut/README.md),\n  and [captioning](tools/caption/README.md) tools. Our data collection plan can be found\n  at [datasets.md](docs/datasets.md).\n- ✅ We find that the VQ-VAE from [VideoGPT](https://wilson1yan.github.io/videogpt/index.html) has low quality and thus adopt a\n  better VAE from [Stability-AI](https://huggingface.co/stabilityai/sd-vae-ft-mse-original). We also find that patching in\n  the time dimension deteriorates the quality. See our **[report](docs/report_01.md)** for more discussions.\n- ✅ We investigate different architectures including DiT, Latte, and our proposed STDiT. Our **STDiT** achieves a better\n  trade-off between quality and speed. See our **[report](docs/report_01.md)** for more discussions.\n- ✅ Support CLIP and T5 text conditioning.\n- ✅ By viewing images as one-frame videos, our project supports training DiT on both images and videos (e.g., ImageNet &\n  UCF101). See [commands.md](docs/commands.md) for more instructions.\n- ✅ Support inference with official weights\n  from [DiT](https://github.com/facebookresearch/DiT), [Latte](https://github.com/Vchitect/Latte),\n  and [PixArt](https://pixart-alpha.github.io/).\n- ✅ Refactor the codebase. 
See [structure.md](docs/structure.md) to learn the project structure and how to use the\n  config files.\n\n</details>\n\n### TODO list sorted by priority\n\n<details>\n<summary>View more</summary>\n\n- [x] Train Video-VAE and adapt our model to the new VAE.\n- [x] Scale model parameters and dataset size.\n- [x] Incorporate a better scheduler (rectified flow).\n- [x] Evaluation pipeline.\n- [x] Complete the data processing pipeline (including dense optical flow, aesthetics scores, text-image similarity, etc.). See [the dataset](/docs/datasets.md) for more information.\n- [x] Support image and video conditioning.\n- [x] Support variable aspect ratios, resolutions, durations.\n\n</details>\n\n## Contents\n\n- [Installation](#installation)\n- [Model Weights](#model-weights)\n- [Gradio Demo](#gradio-demo)\n- [Inference](#inference)\n- [Data Processing](#data-processing)\n- [Training](#training)\n- [Evaluation](#evaluation)\n- [VAE Training & Evaluation](#vae-training--evaluation)\n- [Contribution](#contribution)\n- [Citation](#citation)\n- [Acknowledgement](#acknowledgement)\n\nOther useful documents and links are listed below.\n\n- Report: each version is trained separately from an image base (not continuously trained), while each newer version incorporates the techniques from the previous version.\n  - [report 1.2](docs/report_03.md): rectified flow, 3D-VAE, score condition, evaluation, etc.\n  - [report 1.1](docs/report_02.md): multi-resolution/length/aspect-ratio, image/video conditioning/editing, data preprocessing, etc.\n  - [report 1.0](docs/report_01.md): architecture, captioning, etc.\n  - [acceleration.md](docs/acceleration.md)\n- Repo structure: [structure.md](docs/structure.md)\n- Config file explanation: [config.md](docs/config.md)\n- Useful commands: [commands.md](docs/commands.md)\n- Data processing pipeline and dataset: [datasets.md](docs/datasets.md)\n- Each data processing tool's README: [dataset conventions and management](/tools/datasets/README.md), 
[scene cutting](/tools/scene_cut/README.md), [scoring](/tools/scoring/README.md), [caption](/tools/caption/README.md)\n- Evaluation: [eval/README.md](/eval/README.md)\n- Gallery: [gallery](https://hpcaitech.github.io/Open-Sora/)\n\n## Installation\n\n### Install from Source\n\nFor CUDA 12.1, you can install the dependencies with the following commands. Otherwise, please refer to the [Installation Documentation](docs/installation.md) for instructions on other CUDA versions and on the additional dependencies for data preprocessing, VAE, and model evaluation.\n\n```bash\n# create a virtual env and activate (conda as an example)\nconda create -n opensora python=3.9\nconda activate opensora\n\n# download the repo\ngit clone https://github.com/hpcaitech/Open-Sora\ncd Open-Sora\n\n# install torch, torchvision and xformers\npip install -r requirements/requirements-cu121.txt\n\n# the default installation is for inference only\npip install -v . # for development mode, `pip install -v -e .`\n```\n\n(Optional, recommended for speed, especially for training) To enable `layernorm_kernel` and `flash_attn`, you need to install `apex` and `flash-attn` with the following commands.\n\n```bash\n# install flash attention\n# set enable_flash_attn=False in config to disable flash attention\npip install packaging ninja\npip install flash-attn --no-build-isolation\n\n# install apex\n# set enable_layernorm_kernel=False in config to disable apex\npip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext\" --config-settings \"--build-option=--cuda_ext\" git+https://github.com/NVIDIA/apex.git\n```\n\n### Use Docker\n\nRun the following command to build a Docker image from the provided Dockerfile.\n\n```bash\ndocker build -t opensora .\n```\n\nRun the following command to start the Docker container in interactive mode.\n\n```bash\ndocker run -ti --gpus all -v .:/workspace/Open-Sora opensora\n```\n\n## Model Weights\n\n### Open-Sora 
1.2 Model Weights\n\n| Model     | Model Size | Data | #iterations | Batch Size | URL                                                           |\n| --------- | ---------- | ---- | ----------- | ---------- | ------------------------------------------------------------- |\n| Diffusion | 1.1B       | 30M  | 70k         | Dynamic    | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v3) |\n| VAE       | 384M       | 3M   | 1M          | 8          | [:link:](https://huggingface.co/hpcai-tech/OpenSora-VAE-v1.2) |\n\nSee our **[report 1.2](docs/report_03.md)** for more information. The weights are downloaded automatically when you run the inference script.\n\n> For users from mainland China, try `export HF_ENDPOINT=https://hf-mirror.com` to successfully download the weights.\n\n### Open-Sora 1.1 Model Weights\n\n<details>\n<summary>View more</summary>\n\n| Resolution         | Model Size | Data                       | #iterations | Batch Size                                        | URL                                                                  |\n| ------------------ | ---------- | -------------------------- | ----------- | ------------------------------------------------- | -------------------------------------------------------------------- |\n| mainly 144p & 240p | 700M       | 10M videos + 2M images     | 100k        | [dynamic](/configs/opensora-v1-1/train/stage2.py) | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v2-stage2) |\n| 144p to 720p       | 700M       | 500K HQ videos + 1M images | 4k          | [dynamic](/configs/opensora-v1-1/train/stage3.py) | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v2-stage3) |\n\nSee our **[report 1.1](docs/report_02.md)** for more information.\n\n:warning: **LIMITATION**: This version contains known issues which we are going to fix in the next version (as we save computation resources for the next release). 
In addition, the video generation may fail for long duration, and high resolution will have noisy results due to this problem.\n\n</details>\n\n### Open-Sora 1.0 Model Weights\n\n<details>\n<summary>View more</summary>\n\n| Resolution | Model Size | Data   | #iterations | Batch Size | GPU days (H800) | URL                                                                                           |\n| ---------- | ---------- | ------ | ----------- | ---------- | --------------- | --------------------------------------------------------------------------------------------- |\n| 16×512×512 | 700M       | 20K HQ | 20k         | 2×64       | 35              | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth) |\n| 16×256×256 | 700M       | 20K HQ | 24k         | 8×64       | 45              | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth) |\n| 16×256×256 | 700M       | 366K   | 80k         | 8×64       | 117             | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth)    |\n\nTraining orders: 16x256x256 $\\rightarrow$ 16x256x256 HQ $\\rightarrow$ 16x512x512 HQ.\n\nOur model's weight is partially initialized from [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha). The number of\nparameters is 724M. More information about training can be found in our **[report](/docs/report_01.md)**. More about\nthe dataset can be found in [datasets.md](/docs/datasets.md). HQ means high quality.\n\n:warning: **LIMITATION**: Our model is trained on a limited budget. The quality and text alignment is relatively poor.\nThe model performs badly, especially on generating human beings and cannot follow detailed instructions. 
We are working\non improving the quality and text alignment.\n\n</details>\n\n## Gradio Demo\n\n🔥 You can experience Open-Sora on our [🤗 Gradio application](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face online.\n\n### Local Deployment\n\nIf you want to deploy Gradio locally, we also provide a [Gradio application](./gradio) in this repository. Use the following command to start an interactive web application for video generation with Open-Sora.\n\n```bash\npip install gradio spaces\npython gradio/app.py\n```\n\nThis will launch a Gradio application on your localhost. If you want to know more about the Gradio application, you can refer to the [Gradio README](./gradio/README.md).\n\nTo enable prompt enhancement and other language input (e.g., 中文输入), you need to set the `OPENAI_API_KEY` in the environment. Check [OpenAI's documentation](https://platform.openai.com/docs/quickstart) to get your API key.\n\n```bash\nexport OPENAI_API_KEY=YOUR_API_KEY\n```\n\n### Getting Started\n\nIn the Gradio application, the basic options are as follows:\n\n![Gradio Demo](assets/readme/gradio_basic.png)\n\nThe easiest way to generate a video is to input a text prompt and click the \"**Generate video**\" button (scroll down if you cannot find it). The generated video will be displayed in the right panel. Checking \"**Enhance prompt with GPT4o**\" will use GPT-4o to refine the prompt, while the \"**Random Prompt**\" button will have GPT-4o generate a random prompt for you. Due to OpenAI's API limit, the prompt refinement result has some randomness.\n\nThen, you can choose the **resolution**, **duration**, and **aspect ratio** of the generated video. The resolution and video length affect the video generation speed. 
On an 80G H100 GPU, the generation speed (with `num_sampling_step=30`) and peak memory usage are:\n\n|      | Image   | 2s       | 4s        | 8s        | 16s       |\n| ---- | ------- | -------- | --------- | --------- | --------- |\n| 360p | 3s, 24G | 18s, 27G | 31s, 27G  | 62s, 28G  | 121s, 33G |\n| 480p | 2s, 24G | 29s, 31G | 55s, 30G  | 108s, 32G | 219s, 36G |\n| 720p | 6s, 27G | 68s, 41G | 130s, 39G | 260s, 45G | 547s, 67G |\n\nNote that besides text to video, you can also use **image to video generation**. You can upload an image and then click the \"**Generate video**\" button to generate a video with the image as the first frame. Alternatively, you can fill in the text prompt and click the \"**Generate image**\" button to generate an image from the text prompt, and then click the \"**Generate video**\" button to generate a video from that image with the same model.\n\n![Gradio Demo](assets/readme/gradio_option.png)\n\nThen you can specify more options, including \"**Motion Strength**\", \"**Aesthetic**\" and \"**Camera Motion**\". If \"Enable\" is not checked or the choice is \"none\", the information is not passed to the model. Otherwise, the model will generate videos with the specified motion strength, aesthetic score, and camera motion.\n\nFor the **aesthetic score**, we recommend using values higher than 6. For **motion strength**, a smaller value will lead to a smoother but less dynamic video, while a larger value will lead to a more dynamic but likely more blurry video. Thus, you can try without it and then adjust it according to the generated video. For the **camera motion**, sometimes the model cannot follow the instruction well, and we are working on improving it.\n\nYou can also adjust the \"**Sampling steps**\", which directly affects the generation speed because it is the number of denoising steps. A number smaller than 30 usually leads to poor generation results, while a number larger than 100 usually brings no significant improvement. 
The \"**Seed**\" is used for reproducibility: set it to a fixed number to generate the same video. The \"**CFG Scale**\" controls how closely the model follows the text prompt: a smaller value will lead to a more random video, while a larger value will lead to a video that follows the text more closely (7 is recommended).\n\nFor more advanced usage, you can refer to the [Gradio README](./gradio/README.md#advanced-usage).\n\n## Inference\n\n### Open-Sora 1.2 Command Line Inference\n\nThe basic command line inference is as follows:\n\n```bash\n# text to video\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p --aspect-ratio 9:16 \\\n  --prompt \"a beautiful waterfall\"\n```\n\nYou can add more options to the command line to customize the generation.\n\n```bash\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p --aspect-ratio 9:16 \\\n  --num-sampling-steps 30 --flow 5 --aes 6.5 \\\n  --prompt \"a beautiful waterfall\"\n```\n\nFor image to video generation and other functionalities, the API is compatible with Open-Sora 1.1. See [here](docs/commands.md) for more instructions.\n\nIf your installation does not contain `apex` and `flash-attn`, you need to disable them in the config file, or via the following command.\n\n```bash\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p \\\n  --layernorm-kernel False --flash-attn False \\\n  --prompt \"a beautiful waterfall\"\n```\n\n### Sequence Parallelism Inference\n\nTo enable sequence parallelism, you need to use `torchrun` to run the inference script. 
The following command will run the inference with 2 GPUs.\n\n```bash\n# text to video\nCUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p --aspect-ratio 9:16 \\\n  --prompt \"a beautiful waterfall\"\n```\n\n:warning: **LIMITATION**: Sequence parallelism is not supported for Gradio deployment. For now, sequence parallelism is only supported when the dimension can be divided by the number of GPUs. Thus, it may fail in some cases. We tested 4 GPUs for 720p and 2 GPUs for 480p.\n\n### GPT-4o Prompt Refinement\n\nWe find that GPT-4o can refine the prompt and improve the quality of the generated video. With this feature, you can also use other languages (e.g., Chinese) for the prompt. To enable this feature, you need to set your OpenAI API key in the environment:\n\n```bash\nexport OPENAI_API_KEY=YOUR_API_KEY\n```\n\nThen you can run inference with `--llm-refine True` to enable the GPT-4o prompt refinement, or leave the prompt empty to get a random prompt generated by GPT-4o.\n\n```bash\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p --llm-refine True\n```\n\n### Open-Sora 1.1 Command Line Inference\n\n<details>\n<summary>View more</summary>\n\nSince Open-Sora 1.1 supports inference with dynamic input size, you can pass the input size as an argument.\n\n```bash\n# text to video\npython scripts/inference.py configs/opensora-v1-1/inference/sample.py --prompt \"A beautiful sunset over the city\" --num-frames 32 --image-size 480 854\n```\n\nIf your installation does not contain `apex` and `flash-attn`, you need to disable them in the config file, or via the following command.\n\n```bash\npython scripts/inference.py configs/opensora-v1-1/inference/sample.py --prompt \"A beautiful sunset over the city\" --num-frames 32 --image-size 480 854 --layernorm-kernel False --flash-attn False\n```\n\nSee 
[here](docs/commands.md#inference-with-open-sora-11) for more instructions including text-to-image, image-to-video, video-to-video, and infinite time generation.\n\n</details>\n\n### Open-Sora 1.0 Command Line Inference\n\n<details>\n<summary>View more</summary>\n\nWe have also provided an offline inference script. Run the following commands to generate samples, the required model weights will be automatically downloaded. To change sampling prompts, modify the txt file passed to `--prompt-path`. See [here](docs/structure.md#inference-config-demos) to customize the configuration.\n\n```bash\n# Sample 16x512x512 (20s/sample, 100 time steps, 24 GB memory)\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path OpenSora-v1-HQ-16x512x512.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n# Sample 16x256x256 (5s/sample, 100 time steps, 22 GB memory)\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path OpenSora-v1-HQ-16x256x256.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n# Sample 64x512x512 (40s/sample, 100 time steps)\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n# Sample 64x512x512 with sequence parallelism (30s/sample, 100 time steps)\n# sequence parallelism is enabled automatically when nproc_per_node is larger than 1\ntorchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth --prompt-path ./assets/texts/t2v_samples.txt\n```\n\nThe speed is tested on H800 GPUs. 
For inference with other models, see [here](docs/commands.md) for more instructions.\nTo lower the memory usage, set a smaller `vae.micro_batch_size` in the config (slightly lower sampling speed).\n\n</details>\n\n## Data Processing\n\nHigh-quality data is crucial for training good generation models.\nTo this end, we establish a complete pipeline for data processing, which could seamlessly convert raw videos to high-quality video-text pairs.\nThe pipeline is shown below. For detailed information, please refer to [data processing](docs/data_processing.md).\nAlso check out the [datasets](docs/datasets.md) we use.\n\n![Data Processing Pipeline](assets/readme/report_data_pipeline.png)\n\n## Training\n\n### Open-Sora 1.2 Training\n\nThe training process is same as Open-Sora 1.1.\n\n```bash\n# one node\ntorchrun --standalone --nproc_per_node 8 scripts/train.py \\\n    configs/opensora-v1-2/train/stage1.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n# multiple nodes\ncolossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py \\\n    configs/opensora-v1-2/train/stage1.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\n### Open-Sora 1.1 Training\n\n<details>\n<summary>View more</summary>\n\nOnce you prepare the data in a `csv` file, run the following commands to launch training on a single node.\n\n```bash\n# one node\ntorchrun --standalone --nproc_per_node 8 scripts/train.py \\\n    configs/opensora-v1-1/train/stage1.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n# multiple nodes\ncolossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py \\\n    configs/opensora-v1-1/train/stage1.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\n</details>\n\n### Open-Sora 1.0 Training\n\n<details>\n<summary>View more</summary>\n\nOnce you prepare the data in a `csv` file, run the following commands to launch training on a single node.\n\n```bash\n# 1 GPU, 16x256x256\ntorchrun --nnodes=1 
--nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH\n# 8 GPUs, 64x512x512\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\nTo launch training on multiple nodes, prepare a hostfile according\nto [ColossalAI](https://colossalai.org/docs/basics/launch_colossalai/#launch-with-colossal-ai-cli), and run the\nfollowing commands.\n\n```bash\ncolossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\nFor training other models and advanced usage, see [here](docs/commands.md) for more instructions.\n\n</details>\n\n## Evaluation\n\nWe support evaluation based on:\n\n- Validation loss\n- [VBench](https://github.com/Vchitect/VBench/tree/master) score\n- VBench-i2v score\n- Batch generation for human evaluation\n\nAll the evaluation code is released in `eval` folder. Check the [README](/eval/README.md) for more details. Our [report](/docs/report_03.md#evaluation) also provides more information about the evaluation during training. 
The following table shows that Open-Sora 1.2 greatly improves on Open-Sora 1.0.\n\n| Model          | Total Score | Quality Score | Semantic Score |\n| -------------- | ----------- | ------------- | -------------- |\n| Open-Sora V1.0 | 75.91%      | 78.81%        | 64.28%         |\n| Open-Sora V1.2 | 79.23%      | 80.71%        | 73.30%         |\n\n## VAE Training & Evaluation\n\nWe train a VAE pipeline that consists of a spatial VAE followed by a temporal VAE.\nFor more details, refer to [VAE Documentation](docs/vae.md).\nBefore you run the following commands, follow our [Installation Documentation](docs/installation.md) to install the required dependencies for VAE and Evaluation.\n\nIf you want to train your own VAE, you need to prepare data in a csv file following the [data processing](#data-processing) pipeline, then run the following commands.\nNote that you need to adjust the number of training epochs (`epochs`) in the config file according to your own csv data size.\n\n```bash\n# stage 1 training, 380k steps, 8 GPUs\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage1.py --data-path YOUR_CSV_PATH\n# stage 2 training, 260k steps, 8 GPUs\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage2.py --data-path YOUR_CSV_PATH\n# stage 3 training, 540k steps, 24 GPUs\ntorchrun --nnodes=3 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage3.py --data-path YOUR_CSV_PATH\n```\n\nTo evaluate the VAE performance, you need to run VAE inference first to generate the videos, then calculate scores on the generated videos:\n\n```bash\n# video generation\ntorchrun --standalone --nnodes=1 --nproc_per_node=1 scripts/inference_vae.py configs/vae/inference/video.py --ckpt-path YOUR_VAE_CKPT_PATH --data-path YOUR_CSV_PATH --save-dir YOUR_VIDEO_DIR\n# the original videos will be saved to `YOUR_VIDEO_DIR_ori`\n# the reconstructed videos through the pipeline will be saved to `YOUR_VIDEO_DIR_rec`\n# the 
reconstructed videos through the spatial VAE only will be saved to `YOUR_VIDEO_DIR_spatial`\n\n# score calculation\npython eval/vae/eval_common_metric.py --batch_size 2 --real_video_dir YOUR_VIDEO_DIR_ori --generated_video_dir YOUR_VIDEO_DIR_rec --device cuda --sample_fps 24 --crop_size 256 --resolution 256 --num_frames 17 --sample_rate 1 --metric ssim psnr lpips flolpips\n```\n\n## Contribution\n\nThanks goes to these wonderful contributors:\n\n<a href=\"https://github.com/hpcaitech/Open-Sora/graphs/contributors\">\n  <img src=\"https://contrib.rocks/image?repo=hpcaitech/Open-Sora\" />\n</a>\n\nIf you wish to contribute to this project, please refer to the [Contribution Guideline](./CONTRIBUTING.md).\n\n## Acknowledgement\n\nHere we only list a few of the projects. For other works and datasets, please refer to our report.\n\n- [ColossalAI](https://github.com/hpcaitech/ColossalAI): A powerful large model parallel acceleration and optimization\n  system.\n- [DiT](https://github.com/facebookresearch/DiT): Scalable Diffusion Models with Transformers.\n- [OpenDiT](https://github.com/NUS-HPC-AI-Lab/OpenDiT): An acceleration for DiT training. 
We adopt valuable acceleration\n  strategies for training progress from OpenDiT.\n- [PixArt](https://github.com/PixArt-alpha/PixArt-alpha): An open-source DiT-based text-to-image model.\n- [Latte](https://github.com/Vchitect/Latte): An attempt to efficiently train DiT for video.\n- [StabilityAI VAE](https://huggingface.co/stabilityai/sd-vae-ft-mse-original): A powerful image VAE model.\n- [CLIP](https://github.com/openai/CLIP): A powerful text-image embedding model.\n- [T5](https://github.com/google-research/text-to-text-transfer-transformer): A powerful text encoder.\n- [LLaVA](https://github.com/haotian-liu/LLaVA): A powerful image captioning model based on [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) and [Yi-34B](https://huggingface.co/01-ai/Yi-34B).\n- [PLLaVA](https://github.com/magic-research/PLLaVA): A powerful video captioning model.\n- [MiraData](https://github.com/mira-space/MiraData): A large-scale video dataset with long durations and structured caption.\n\nWe are grateful for their exceptional work and generous contribution to open source. Special thanks go to the authors of [MiraData](https://github.com/mira-space/MiraData) and [Rectified Flow](https://github.com/gnobitab/RectifiedFlow) for their valuable advice and help. We wish to express gratitude towards AK for sharing this project on social media and Hugging Face for providing free GPU resources for our online Gradio demo.\n\n## Citation\n\n```bibtex\n@software{opensora,\n  author = {Zangwei Zheng and Xiangyu Peng and Tianji Yang and Chenhui Shen and Shenggui Li and Hongxin Liu and Yukun Zhou and Tianyi Li and Yang You},\n  title = {Open-Sora: Democratizing Efficient Video Production for All},\n  month = {March},\n  year = {2024},\n  url = {https://github.com/hpcaitech/Open-Sora}\n}\n```\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=hpcaitech/Open-Sora&type=Date)](https://star-history.com/#hpcaitech/Open-Sora&Date)\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/all_category.txt",
    "content": "a black dog wearing halloween costume\nspider making a web\nbat eating fruits while hanging\na snake crawling on a wooden flooring\na close up video of a dragonfly\nmacro shot of ladybug on green leaf plant\nchameleon eating ant\na bee feeding on nectars\nbird nests on a tree captured with moving camera\na squirrel eating nuts\nclose up video of snail\ntop view of a hermit crab crawling on a wooden surface\ncat licking another cat\nred dragonfly perched on green leaf\nclose up view of a brown caterpillar crawling on green leaf\nants eating dead spider\nan eagle on a tree branch\na frog eating an ant\nwhite rabbit near the fence\na gorilla eating a carrot\nclose up of wolf\na meerkat looking around\na hyena in a zoo\nlemur eating grass leaves\nan owl being trained by a man\na lizard on a bamboo\nbrown chicken hunting for its food\nvideo of parrots perched on bird stand\nunderwater footage of an octopus in a coral reef\na cute pomeranian dog playing with a soccer ball\nwhite fox on rock\nclose up footage of a horse figurine\ngiraffe feeding on a tree in a savannah\ncurious cat sitting and looking around\nhummingbird hawk moth flying near pink flowers\nclose up of a scorpion on a rock\nclose up on fish in net\nkoala eating leaves from a branch\na pod of dolphins swirling in the sea catching forage fish\nlow angle view of a hawk perched on a tree branch\na lion standing on wild grass\ndeer grazing in the field\nelephant herd in a savanna\nclose up on lobster under water\nhedgehog crossing road in forest\na sheep eating yellow flowers from behind a wire fence\ntwin sisters and a turtle\na pig wallowing in mud\nflock of goose eating on the lake water\ncow in a field irritated with flies\na close up shot of a fly\ncheetah lying on the grass\nclose up of a lemur\nclose up shot of a kangaroo itching in the sand\na tortoise covered with algae\nturkey in cage\na great blue heron bird in the lakeside\ncrab with shell in aquarium\na seagull walking on shore\nan 
american crocodile\na tiger walking inside a cage\nalligator in the nature\na raccoon climbing a tree\nwild rabbit in a green meadow\ngroup of ring tailed lemurs\na clouded leopard on a tree branch\nduck grooming its feathers\nan african penguin walking on a beach\na video of a peacock\nclose up shot of a wild bear\nbaby rhino plays with mom\nporcupine climbs tree branches\nclose up of a natterjack toad on a rock\na sleeping orangutan\nmother whale swimming with babies\na bear wearing red jersey\npink jellyfish swimming underwater in a blue sea\nbeautiful clown fish swimming\nanimation of disposable objects shaped as a whale\npaper cut out of a pair of hands a whale and a heart\nvertical video of camel roaming in the field during daytime\na still video of mosquito biting human\na curious sloth hanging from a tree branch\na plastic flamingo bird stumbles from the wind\na wolf in its natural habitat\na monkey sitting in the stone and scratching his head\nbat hanging upside down\na red panda eating leaves\nsnake on ground\na harbour seal swimming near the shore\nshark swimming in the sea\notter on branch while eating\ngoat standing over a rock\na troop of monkey on top of a mountain\na zebra eating grass on the field\na colorful butterfly perching on a bud\na snail crawling on a leaf\nzookeeper showering a baby elephant\na beetle emerging from the sand\na nine banded armadillo searching for food\nan apartment building with balcony\nasian garden and medieval castle\nilluminated tower in berlin\na wooden house overseeing the lake\na crowd of people in a plaza in front of a government building\na church interior\njewish friends posing with hanukkah menorah in a cabin house\na destroyed building after a missile attack in ukraine\nabandoned building in the woods\ndrone video of an abandoned school building in pripyat ukraine\nelegant university building\narchitecture and designs of buildings in central london\na pancake tower with chocolate syrup and strawberries on 
top\nan ancient white building\nfriends hanging out at a coffee house\nhouse front door with christmas decorations\ncity night dark building\na bird house hanging on a tree branch\nsacred sculpture in a temple\nhigh angle shot of a clock tower\nmodern wooden house interior\nthe interior of an abandoned building\nopera house overlooking sea\na concrete structure near the green trees\ndome like building in scotland\nlow angle shot of a building\ntower on hill\na miniature house\neiffel tower from the seine river\nlow angle footage of an apartment building\nisland with pier and antique building\nasian historic architecture\ndrone footage of a beautiful mansion\nmosque in the middle east\nbuilding a tent and hammock in the forest camping site\ntop view of a high rise building\nhouse covered in snow\nskyscraper at night\nhouse in village\na casino with people outside the building\nsilhouette of a building\na woman climbing a tree house\ndrone view of house near lake during golden hour\nan under construction concrete house\na watch tower by the sea\nexterior view of arabic style building\nvideo of a hotel building\nred paper lantern decorations hanging outside a building\nhouse on seashore\naerial footage of the palace of culture and science building in warsaw poland\naerial video of stuttgart tv tower in germany\naerial view of the highway and building in a city\ndrone shot of a skyscraper san francisco california usa\nwaterfall and house\nview of the sky through a building\ndrone footage of a house on top of the mountain\nabandoned house in the nature\nclouds hovering over a mansion\nlight house on the ocean\nbuddhist temple at sunrise\npeople walking by a graveyard near a mosque at sunset\nview of lifeguard tower on the beach\nscenic view of a house in the mountains\nthe landscape in front of a government building\naerial footage of a building and its surrounding landscape in winter\ntime lapse of a cloudy sky behind a transmission tower\nblue ocean near the brown 
castle\nfog over temple\nhouse in countryside top view\nbuilding under construction\nturkish flag waving on old tower\nthe georgian building\nclose up shot of a steel structure\nthe atrium and interior design of a multi floor building\ncity view reflected on a glass building\naerial view of a luxurious house with pool\nan unpaved road leading to the house\ndrone footage of a lookout tower in mountain landscape\nwind turbines on hill behind building\ntime lapse footage of the sun light in front of a small house porch\na building built with lots of stairways\novercast over house on seashore\nthe view of the sydney opera house from the other side of the harbor\ncandle on a jar and a house figurine on a surface\nvideo of a farm and house\na dilapidated building made of bricks\na view of a unique building from a moving vehicle\naerial footage of a tall building in cambodia\npush in shot of a huge house\na beach house built over a seawall protected from the sea waves\nexotic house surrounded by trees\ndrone video of a house surrounded by tropical vegetation\ndrone footage of a building beside a pond\nobservation tower on hill in forest\na tree house in the woods\na video of vessel structure during daytime\nfire in front of illuminated building at night\na footage of a wooden house on a wheat field\ntilt shot of a solar panel below a light tower\nwater tower on the desert\nfreshly baked finger looking cookies\nvideo of fake blood in wine glass\nhalloween food art\na person slicing a vegetable\na serving of pumpkin dish in a plate\nclose up view of green leafy vegetable\na birthday cake in the plate\nvideo of a slice papaya fruit\na muffin with a burning candle and a love sign by a ceramic mug\na jack o lantern designed cookie\nbaked bread with chocolate\na broccoli soup on wooden table\na freshly brewed coffee on a pink mug\ngrabbing sourdough neapolitan style pizza slices\nperson cooking mushrooms in frying pan\nrice grains placed on a reusable cloth bag\nslices of kiwi 
fruit\ngrilling a steak on a pan grill\nclose up of bread popping out of a toaster\nman eating noodle\npreparing a cocktail drink\nclose up pasta with bacon on plate\nmilk and cinnamon rolls\nboy getting a dumpling using chopsticks\na mother preparing food with her kids\nman using his phone while eating\nfresh salmon salad on a plate\ncutting cucumbers into long thin slices as ingredient for sushi roll\na steaming cup of tea by the window\na glass filled with beer\na kid eating popcorn while watching tv\nclose up shot of fried fish on the plate\na man eating a donut\nperson making a vegetarian dish\nspreading cheese on bagel\nclose up view of a man drinking red wine\na couple having breakfast in a restaurant\na student eating her sandwich\ngirl peeling a banana\nred rice in a small bowl\npancake with blueberry on the top\ngreen apple fruit on white wooden table\na man eating a taco by the bar\nmaking of a burrito\nsqueezing lemon into salad\na chef cutting sushi rolls\nvideo of a delicious dessert\ndeep frying a crab on a wok in high fire\nclose up video of a orange juice\nvideo of a cooked chicken breast\nwoman holding a pineapple\na woman eating a bar of chocolate\ndecorating christmas cookie\nsqueezing a slice of fruit\ntuna sashimi on a plate\na strawberry fruit mixed in an alcoholic drink\npreparing hot dogs in a grill\na woman cutting a tomato\nan orange fruit cut in half\na coconut fruit with drinking straw\nwoman holding a dragon fruit\na woman pouring hot beverage on a cup\nwaffles with whipped cream and fruit\nfocus shot of an insect at the bottom of a fruit\npreparing a healthy broccoli dish\nman eating snack at picnic\nclose up video of a grilled shrimp skewer\na woman mixing a smoothie drinks\nclose up video of woman having a bite of jelly\nbusinessman drinking whiskey at the bar counter of a hotel lounge\ncutting an onion with a knife over a wooden chopping board\nfresh lemonade in bottles\ngrilling a meat on a charcoal grill\npeople enjoying asian 
cuisine\nclose up footage of a hot dish on a clay pot\npork ribs dish\nwaffle with strawberry and syrup for breakfast\ntofu dish with rose garnish\nuncooked pork meat\negg yolk being dumped over gourmet dish\ntasty brunch dish close up\nlittle boy pretending to eat the watermelon\nslicing roasted beef\nclose up of a chef adding teriyaki sauce to a dish\nflat lay mexican dish\na person placing an octopus dish on a marble surface\nclose up of tea leaves brewing in a glass kettle\nadding fresh herbs to soup dish\na scoop of roasted coffee beans\nfresh dim sum set up on a bamboo steam tray for cooking\na girl putting ketchup on food at the kitchen\ncooking on electric stove\na woman with a slice of a pie\ngrapes and wine on a wooden board\nman taking picture of his food\nhamburger and fries on restaurant table\nclose up video of japanese food\na cracker sandwich with cheese filling for snack\nbarista preparing matcha tea\nclose up of onion rings being deep fried\npeople carving a pumpkin\npeople sitting on a sofa\na man with a muertos face painting\nman walking in the dark\nmen in front of their computer editing photos\nmen loading christmas tree on tow truck\nwoman washing the dishes\nwoman adding honey to the cinnamon rolls\ntwo women kissing and smiling\nthree women looking at watercolor paintings\na family wearing paper bag masks\na family posing for the camera\na boy covering a rose flower with a dome glass\nboy sitting on grass petting a dog\na girl in her tennis sportswear\na girl coloring the cardboard\nsilhouette of the couple during sunset\ncouple dancing with body paint\na child playing with water\na woman with her child sitting on a couch in the living room\na group of friend place doing hand gestures of agreement\nfriends having a group selfie\nfriends talking while on the basketball court\ngroup of people protesting\na group of campers with a cute dog\na group of photographers taking pictures at the north western gardens in llandudno north wales\na group 
of students laughing and talking\na group of martial artist warming up\na person playing golf\na person walking on a wet wooden bridge\nperson doing a leg exercise\nice hockey athlete on rink\na young athlete training in swimming\nchess player dusting a chessboard\nbaseball player holding his bat\na bearded man putting a vinyl record on a vinyl player\nan orchestra finishes a performance\npeople applauding the performance of the kids\nband performance at the recording studio\nfather and his children playing jenga game\npeople playing a board game\nman playing a video game\na man video recording the movie in theater\nman and a woman eating while watching a movie\nmovie crew talking together\na director explaining the movie scene\nman and woman listening to music on car\nman playing music\ncouple dancing slow dance with sun glare\na ballerina practicing in the dance studio\nfather and son holding hands\nfather and daughter talking together\na mother and her kids engaged in a video call\nmother and daughter reading a book together\na mother teaching her daughter playing a violin\nkid in a halloween costume\na happy kid playing the ukulele\na chef slicing a cucumber\nchef wearing his gloves properly\nbrother and sister using hammock\ngirl applying sunblock to her brother\na girl pushing the chair while her sister is on the chair\ncolleagues talking in office building\nfighter practice kicking\na woman fighter in her cosplay costume\nan engineer holding blueprints while talking with her colleague\na young woman looking at vr controllers with her friend\nworkmates teasing a colleague in the work\na male police officer talking on the radio\nteacher holding a marker while talking\nteacher writing on her notebook\na young student attending her online classes\na student showing his classmates his wand\na male vendor selling fruits\na shirtless male climber\na sound engineer listening to music\nfemale talking to a psychiatrist in a therapy session\nyoung female activist 
posing with flag\na man in a hoodie and woman with a red bandana talking to each other and smiling\na medium close up of women wearing kimonos\na male interviewer listening to a person talking\na social worker having a conversation with the foster parents\na farm worker harvesting onions\nworker packing street food\nworker and client at barber shop\nelderly man lifting kettlebell\nmom assisting son in riding a bicycle\ndad watching her daughter eat\nyoung guy with vr headset\npregnant woman exercising with trainer\na fortune teller talking to a client\nwizard doing a ritual on a woman\na footage of an actor on a movie scene\na man holding a best actor trophy\na singer of a music band\na young singer performing on stage\nyoung dancer practicing at home\nseller showing room to a couple\ncab driver talking to passenger\na policeman talking to the car driver\nkids celebrating halloween at home\nlittle boy helping mother in kitchen\nvideo of a indoor green plant\na girl arranges a christmas garland hanging by the kitchen cabinet\ncandle burning in dark room\ncouple having fun and goofing around the bedroom\ngirls jumping up and down in the bedroom\nwoman and man in pajamas working from home\na muslim family sitting and talking in the living room\nfamily enjoying snack time while sitting in the living room\nwoman holding an animal puppet and a little girl playing together at the living room\nkids playing in the indoor tent\nyoung people celebrating new year at the office\na woman writing on the sticky note in the office\na woman exercising at home over a yoga mat\ngirls preparing easter decorations at home\ndog on floor in room\nturning on a fluorescent light inside a room\ncolleagues talking to each other near the office windows\na woman recording herself while exercising at home\nmusic room\ndifferent kind of tools kept in a utility room\nsofa beds and other furniture\na girl finding her brother reading a book in the bedroom\nan elegant ceramic plant pot and hanging 
plant on indoor\nfurniture inside a bedroom\ninterior design of the bar section\nliving room with party decoration\nfirewood burning in dark room\na young woman playing the ukulele at home\nwoman painting at home\na woman in a locker room\nvideo of a bathroom interior\nthe interior design of a jewish synagogue\na woman in protective suit disinfecting the kitchen\nmodern minimalist home interior\nmodern interior design of a coffee shop\nperson arranging minimalist furniture\naerial shot of interior of the warehouse\na room of a manufacturing facility\ninterior of catholic\ninterior design of a restaurant\na female model in a changing room looking herself in mirror\nmen walking in the office hallway\npeople sitting in a conference room\nthe interior design of a shopping mall\nchandeliers in room\nlucerne railway station interior\na female fencer posing in a foggy room\na toolbox and a paint roller beside a huge package in a room\nbedroom in hotel\na woman lying in the operating room\na chef holding and checking kitchen utensils\na couple singing in the shower room together\na woman cleaning mess in the living room\nan empty meeting room with natural light\nperson dancing in a dark room\nclose up on blood in hospital room\na couple resting on their home floor\na young female staff at courier office\na man entering the gym locker room\na bored man sitting by the tv at home\nwoman dancing in indoor garden\nrubble in the interior of an abandoned house\nindoor farm in a greenhouse\nman doing handstand in indoor garden\nan abandoned indoor swimming pool\nhome decorations on top of a cabinet\ngraffiti art on the interior walls of an abandoned mansion\nindoor wall climbing activity\nsunlight inside a room\nteenage girl roller skating at indoor rink\nhome deco with lighted\nbaby in the shower room\nmen enjoying office christmas party\na bedroom with a brick wall\nactors prepping in the dressing room\nkids playing at an indoor playground\na person sanitizing an office space 
using smoke machine\nmother and daughter choosing clothes at home\na woman sitting by the indoor fire pit\nman standing on the corner of the room while looking around\nperson assembling furniture\na family stacking cardboard boxes in a room\nfamily having fun in the dining room\nperson disinfecting a room\na woman washing strawberries in the kitchen sink\nmodern office waiting room\nclose up view of a person slicing with a kitchen knife\nboiling coffee on a stove in the kitchen\nmodern equipment used in a home studio\ninterior of a recording studio\npeople working in a call center office\nband performing at a home concert\na group of people watching a concert in a room\npeople packing their furniture\nyoung employees in office holding a certificate\na criminal inside a dark room handcuffed in a table\ncouple browsing and looking for furniture in the store\nworkspace at home\nvideo of a indoor green plant\nclose up view of a plant\nclose up shot of a burning plant\nplucking leaves from plant\na plant on gold pot with glass lid\na branch of a tree and a plant\na leafless tree\nclose up shot of fern leaf\nclose up video of strawberry plant\nplant with blooming flowers\nclose up video of flower petals\nwatering yellow plant\nbeautiful flower decoration\ncannabis flower in a jar\na footage of the tree leaves\na red leaf plant\nclose up view of a white christmas tree\nsnow pouring on a tree\nclose up shot of white flowers on the tree\nleaves in the trees daytime\na dead tree lying on a grass field\ntree branches in a flowing river\npurple flowers with leaves\na coconut tree by the house\nclose up on flower in winter\nbamboo leaves backlit by the sun\nclose up video of a wet flower\na man putting a flower in a box\ndropping flower petals on a wooden bowl\na close up shot of gypsophila flower\nvariety of succulent plants on a garden\nvariety of trees and plants in a botanical garden\nforest of deciduous trees\na stack of dried leaves burning in a forest\ntall forest trees 
on a misty morning\nclose up view of dewdrops on a leaf\nclose up view of white petaled flower\nremoving a pineapple leaf\na dragonfly perched on a leaf\nbutterfly pollinating flower\nperson visiting and checking a corn plant\nwoman picking beans from a plant\nwoman plucking mint leaves\nsingle tree in the middle of farmland\na plant on a soil\ndrone footage of a tree on farm field\na tractor harvesting lavender flower\npeople putting christmas ornaments on a christmas tree\njack o lantern hanging on a tree\ntree with halloween decoration\nflower field near the waterfall\ntruck carrying the tree logs\nraindrops falling on leaves\nshot of a palm tree swaying with the wind\nsquirrels on a tree branch\nperson holding a flower\na fallen tree trunk\ntree with golden leaves\ncherry tree\nwind blows through leaves of the tree in autumn\na leaf on a glass\nthe long trunks of tall trees in the forest\ntrees in the forest during sunny day\nclose up video of tree bark\nreflection of tree branches\ntrunks of many trees in the forest\ntree leaves providing shades from the sun\nleaves swaying in the wind\nlow angle shot of baobab tree\nbare trees in forest\na plant surrounded by fallen leaves\na couple preparing food and pruning a plant\na man cutting a tree bark\noranges on a tree branch\nplant connected on the stones\nvideo of a sawmill machine cutting tree log\nwomen drying flower petals\nmacro view of an agave plant\na video of a person tying a plant on a string\ngreen moss in forest nature\ncoconut tree near sea under blue sky\nthe canopy of a coconut tree\na man leaning on a tree at the beach\na full grown plant on a pot\ncandle wax dripping on flower petals\nclose up of leaves in autumn\na woman opening a book with a flower inside\na man holding leaves looking at the camera\na shadow of a swaying plant\na tree and concrete structure under a blue and cloudy sky\ntrimming excess leaves on a potted plant\nthe changing color of the tree leaves during autumn season\na 
gooseberry tree swayed by the wind\nforest trees and a medieval castle at sunset\nwoman cut down tree\nan old oak tree in a park across the street from a hotel\nwild flowers growing in a forest ground\na mossy fountain and green plants in a botanical garden\nmansion with beautiful garden\nants on a dragon fruit flower\nscenery of desert landscape\nlandscape agriculture farm tractor\nburning slash piles in the forest\ngraveyard at sunset\nview of a jack o lantern with pumpkins in a smoky garden\nsun view through a spider web\nview of the sea from an abandoned building\nclose up view of a full moon\nclose up view of lighted candles\nclose up view of swaying white flowers and leaves\nscenery of a relaxing beach\nselective focus video of grass during sunny day\naerial view of brown dry landscape\nfireworks display in the sky at night\na bonfire near river\nmountain view\nwaterfalls in between mountain\na picturesque view of nature\nexotic view of a riverfront city\ntall trees in the forest under the clear sky\nsnow on branches in forest\nstream in the nature\nan airplane flying above the sea of clouds\nscenic video of sunset\nview of houses with bush fence under a blue and cloudy sky\nscenic view from wooden pathway\nscenic view of a tropical beach\ndrone footage of waves crashing on beach shore\na scenic view of the golden hour at norway\ntime lapse video of foggy mountain forest\nbrown mountain during fall season\nvideo of ocean during daytime\nboat sailing in the ocean\ntop view of yachts\nbeautiful scenery of flowing waterfalls and river\nwild ducks paddling on the lake surface\na relaxing scenery of beach view under cloudy sky\nnatural rock formations on beach under cloudy sky\na palm tree against blue sky\nvideo of sailboat on a lake during sunset\naerial view of snow piles\ntime lapse of a sunset sky in the countryside\naerial footage of a statue\ntime lapse video of a farm during sunset\nclouds formation in the sky at sunset\naerial shot of a village\ndrone 
shot of a beautiful sunrise at the mountains\ntime lapse video of foggy morning during sunrise\nsun shining between tree leaves at sunrise\nvideo of lake during dawn\nvehicles traveling on roadway under cloudy sky\nview of golden domed church\na monument under the blue sky\nfirecrackers in the sky\nview of fruit signage in the farm\na dark clouds over shadowing the full moon\nview of the amazon river\na big river swamp in a dense forest\na blooming cherry blossom tree under a blue sky with white clouds\na river waterfall cascading down the plunge basin\nflooded landscape with palm trees\na blurry waterfall background\nwaterfall in the mountains\naerial footage of a city at night\npond by small waterfall in forest\naerial view of farmlands at the bay of lake\nrice terraces in the countryside\na highway built across an agricultural area in the countryside\ngloomy morning in the countryside\ndrone shot of an abandoned coliseum on a snowy mountain top\nboat sailing in the middle of ocean\ndrone shot of the grass field\nnatural landscape of mountain and sea with islets developed into a community\naerial view of zaporizhia in ukraine\naerial footage of a herd\nan aerial footage of a red sky\ngrass and plants growing in the remains of an abandoned house\nview from hill on city\naerial view on orthodox church\naerial view of bay in croatia\na footage of a frozen river\noverlooking view of a city at daylight\nview outside the cemetery\nclear sky with moon over meadow\nclouds over railway\naerial footage of moving vehicles on the road at night\naerial view of town and park\ntop view of skyscrapers\ntop view of the empire state building in manhattan\ntop view of the central park in new york city\nsheep running in a grass field\nclear sky over factory\nsmoke and fire in birds eye view\nview of a pathway with snow melting on its side\nferry under bridge on river near city in malaysia\nmountain slopes covered in green vegetation\npanoramic view of a town surrounded by snow 
covered mountains\naerial view of a palace\ntop view of vehicles driving on the intersection\na graveyard by a church in a mountain landscape\na modern railway station in malaysia use for public transportation\ndrone footage of amsterdam metro station\ntrain arriving at a station\nred vehicle driving on field\nclose up view of flashing emergency vehicle lighting\nvehicle with fertilizer on field\na highway built across an agricultural area in the countryside\ndrone footage of motorcycles driving on country road between agricultural fields\na road in the woods under fog\nfootage of a car driving through a wheat field\nvehicle stops for an ambulance passing through city traffic\nemergency vehicle parked outside the casino\nzombies attacking a woman and a boy inside a car\nwoman seating inside the car while chewing\nvideo of passengers riding a double decker bus during night\ntraffic in london street at night\nelderly couple checking engine of automobile\na green vintage automobile with an open hood parked in a parking area\nclose up of a prototype automobile with exposed engine on the back seat of the car\naerial view of road in forest\ntrain departing from station\naerial view of a train passing by a bridge\nvideo of a train tracks\nvideo footage of a subway\nvideo of blinking traffic lights\ncouple walking out on the subway\ntime lapse of a subway tunnel\nmonitor board inside the subway\nmetro train at night\nzoom in video of a tram passing by city\nyoung man using laptop in the tram\nman reading a book at bus stop\nclose up shot of a moving taxi\nnight travel in london street on a public bus\nred bus in a rainy city\nflow of traffic in the city\nclose up shot of a yellow taxi turning left\ntwo women calling for a taxi\ndrone view of an illuminated bridge across a river\npoliceman in police car talking on radio\nairplane taking off at night\nview through window in airplane\nan airplane in the sky\nhelicopter landing on the street\na pilot getting out of a 
helicopter\na helicopter flying under blue sky\nboat sailing in the middle of the ocean\ngirl playing with a toy boat\nsilhouette of a boat on sea during golden hour\na boat travelling around the lake\nroad on mountain ridge\nship sailing on danube river\nslow motion video of a ship water trail in the sea\ndrone footage of a wreck ship on shore\na white yacht traveling on a river and passing under the bridge\nfemale teenagers drinking champagne in the yacht\nvideo of yacht sailing in the ocean\nred combine harvester on road on field\na woman sitting on a bicycle while using a mobile phone\na woman sitting on a motorcycle looking around\nthree teenagers fixing a bicycle\na woman in a halloween costume posing on a motorcycle\na parked motorcycle on a foggy roadside\ncable car near sea shore\na truck travelling in the road\nfootage of the road without any traffic\na road sign\nlove padlocks on a bridge\ncamera moving at highway construction site\nvehicles driving on highway\na motorbike on highway at timelapse mode\npoint of view of a car driving through a tunnel\ntime lapse of heavy traffic on an avenue\nferry boat on city canal\nblack vintage car in museum\na zigzag road across a forest\npeople crossing the road\nvideo of a kayak boat in a river\na person paddling a wooden boat in a lake\na car charging in the parking area\ncars parked on the road\nfootage of the street with people and vehicle passing by in the rain\ntraffic on busy city street\na woman getting out of the car to walk with their dog\nyacht sailing through the ocean\npeople in queue to military ship\nman wearing motorcycle helmet looking at the camera\nempty seats in the bus\nempty boat on the water\ncargo train traveling on the mountainside\ncruise ship in harbor\ncounting down at traffic lights\npressing the car ignition\nfire truck driving on the road\na footage of a broken bicycle\ndrone footage of an ambulance on the road\nslow motion footage of a racing car\nship sailing on sea against 
sunset\nbig cargo ship passing on the shore\nback view of man and woman walking on unpaved road\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/all_dimension.txt",
    "content": "In a still frame, a stop sign\na toilet, frozen in time\na laptop, frozen in time\nA tranquil tableau of alley\nA tranquil tableau of bar\nA tranquil tableau of barn\nA tranquil tableau of bathroom\nA tranquil tableau of bedroom\nA tranquil tableau of cliff\nIn a still frame, courtyard\nIn a still frame, gas station\nA tranquil tableau of house\nindoor gymnasium, frozen in time\nA tranquil tableau of indoor library\nA tranquil tableau of kitchen\nA tranquil tableau of palace\nIn a still frame, parking lot\nIn a still frame, phone booth\nA tranquil tableau of restaurant\nA tranquil tableau of tower\nA tranquil tableau of a bowl\nA tranquil tableau of an apple\nA tranquil tableau of a bench\nA tranquil tableau of a bed\nA tranquil tableau of a chair\nA tranquil tableau of a cup\nA tranquil tableau of a dining table\nIn a still frame, a pear\nA tranquil tableau of a bunch of grapes\nA tranquil tableau of a bowl on the kitchen counter\nA tranquil tableau of a beautiful, handcrafted ceramic bowl\nA tranquil tableau of an antique bowl\nA tranquil tableau of an exquisite mahogany dining table\nA tranquil tableau of a wooden bench in the park\nA tranquil tableau of a beautiful wrought-iron bench surrounded by blooming flowers\nIn a still frame, a park bench with a view of the lake\nA tranquil tableau of a vintage rocking chair was placed on the porch\nA tranquil tableau of the jail cell was small and dimly lit, with cold, steel bars\nA tranquil tableau of the phone booth was tucked away in a quiet alley\na dilapidated phone booth stood as a relic of a bygone era on the sidewalk, frozen in time\nA tranquil tableau of the old red barn stood weathered and iconic against the backdrop of the countryside\nA tranquil tableau of a picturesque barn was painted a warm shade of red and nestled in a picturesque meadow\nIn a still frame, within the desolate desert, an oasis unfolded, characterized by the stoic presence of palm trees and a motionless, glassy pool of 
water\nIn a still frame, the Parthenon's majestic Doric columns stand in serene solitude atop the Acropolis, framed by the tranquil Athenian landscape\nIn a still frame, the Temple of Hephaestus, with its timeless Doric grace, stands stoically against the backdrop of a quiet Athens\nIn a still frame, the ornate Victorian streetlamp stands solemnly, adorned with intricate ironwork and stained glass panels\nA tranquil tableau of the Stonehenge presented itself as an enigmatic puzzle, each colossal stone meticulously placed against the backdrop of tranquility\nIn a still frame, in the vast desert, an oasis nestled among dunes, featuring tall palm trees and an air of serenity\nstatic view on a desert scene with an oasis, palm trees, and a clear, calm pool of water\nA tranquil tableau of an ornate Victorian streetlamp standing on a cobblestone street corner, illuminating the empty night\nA tranquil tableau of a tranquil lakeside cabin nestled among tall pines, its reflection mirrored perfectly in the calm water\nIn a still frame, a vintage gas lantern, adorned with intricate details, gracing a historic cobblestone square\nIn a still frame, a tranquil Japanese tea ceremony room, with tatami mats, a delicate tea set, and a bonsai tree in the corner\nA tranquil tableau of the Parthenon stands resolute in its classical elegance, a timeless symbol of Athens' cultural legacy\nA tranquil tableau of in the heart of Plaka, the neoclassical architecture of the old city harmonizes with the ancient ruins\nA tranquil tableau of in the desolate beauty of the American Southwest, Chaco Canyon's ancient ruins whispered tales of an enigmatic civilization that once thrived amidst the arid landscapes\nA tranquil tableau of at the edge of the Arabian Desert, the ancient city of Petra beckoned with its enigmatic rock-carved façades\nIn a still frame, amidst the cobblestone streets, an Art Nouveau lamppost stood tall\nA tranquil tableau of in the quaint village square, a traditional 
wrought-iron streetlamp featured delicate filigree patterns and amber-hued glass panels\nA tranquil tableau of the lampposts were adorned with Art Deco motifs, their geometric shapes and frosted glass creating a sense of vintage glamour\nIn a still frame, in the picturesque square, a Gothic-style lamppost adorned with intricate stone carvings added a touch of medieval charm to the setting\nIn a still frame, in the heart of the old city, a row of ornate lantern-style streetlamps bathed the narrow alleyway in a warm, welcoming light\nA tranquil tableau of in the heart of the Utah desert, a massive sandstone arch spanned the horizon\nA tranquil tableau of in the Arizona desert, a massive stone bridge arched across a rugged canyon\nA tranquil tableau of in the corner of the minimalist tea room, a bonsai tree added a touch of nature's beauty to the otherwise simple and elegant space\nIn a still frame, amidst the hushed ambiance of the traditional tea room, a meticulously arranged tea set awaited, with porcelain cups, a bamboo whisk\nIn a still frame, nestled in the Zen garden, a rustic teahouse featured tatami seating and a traditional charcoal brazier\nA tranquil tableau of a country estate's library featured elegant wooden shelves\nA tranquil tableau of beneath the shade of a solitary oak tree, an old wooden park bench sat patiently\nA tranquil tableau of beside a tranquil pond, a weeping willow tree draped its branches gracefully over the water's surface, creating a serene tableau of reflection and calm\nA tranquil tableau of in the Zen garden, a perfectly raked gravel path led to a serene rock garden\nIn a still frame, a tranquil pond was fringed by weeping cherry trees, their blossoms drifting lazily onto the glassy surface\nIn a still frame, within the historic library's reading room, rows of antique leather chairs and mahogany tables offered a serene haven for literary contemplation\nA tranquil tableau of a peaceful orchid garden showcased a variety of delicate 
blooms\nA tranquil tableau of in the serene courtyard, a centuries-old stone well stood as a symbol of a bygone era, its mossy stones bearing witness to the passage of time\na bird and a cat\na cat and a dog\na dog and a horse\na horse and a sheep\na sheep and a cow\na cow and an elephant\nan elephant and a bear\na bear and a zebra\na zebra and a giraffe\na giraffe and a bird\na chair and a couch\na couch and a potted plant\na potted plant and a tv\na tv and a laptop\na laptop and a remote\na remote and a keyboard\na keyboard and a cell phone\na cell phone and a book\na book and a clock\na clock and a backpack\na backpack and an umbrella\nan umbrella and a handbag\na handbag and a tie\na tie and a suitcase\na suitcase and a vase\na vase and scissors\nscissors and a teddy bear\na teddy bear and a frisbee\na frisbee and skis\nskis and a snowboard\na snowboard and a sports ball\na sports ball and a kite\na kite and a baseball bat\na baseball bat and a baseball glove\na baseball glove and a skateboard\na skateboard and a surfboard\na surfboard and a tennis racket\na tennis racket and a bottle\na bottle and a chair\nan airplane and a train\na train and a boat\na boat and an airplane\na bicycle and a car\na car and a motorcycle\na motorcycle and a bus\na bus and a traffic light\na traffic light and a fire hydrant\na fire hydrant and a stop sign\na stop sign and a parking meter\na parking meter and a truck\na truck and a bicycle\na toilet and a hair drier\na hair drier and a toothbrush\na toothbrush and a sink\na sink and a toilet\na wine glass and a chair\na cup and a couch\na fork and a potted plant\na knife and a tv\na spoon and a laptop\na bowl and a remote\na banana and a keyboard\nan apple and a cell phone\na sandwich and a book\nan orange and a clock\nbroccoli and a backpack\na carrot and an umbrella\na hot dog and a handbag\na pizza and a tie\na donut and a suitcase\na cake and a vase\nan oven and scissors\na toaster and a teddy bear\na microwave and a frisbee\na 
refrigerator and skis\na bicycle and an airplane\na car and a train\na motorcycle and a boat\na person and a toilet\na person and a hair drier\na person and a toothbrush\na person and a sink\nA person is riding a bike\nA person is marching\nA person is roller skating\nA person is tasting beer\nA person is clapping\nA person is drawing\nA person is petting animal (not cat)\nA person is eating watermelon\nA person is playing harp\nA person is wrestling\nA person is riding scooter\nA person is sweeping floor\nA person is skateboarding\nA person is dunking basketball\nA person is playing flute\nA person is stretching leg\nA person is tying tie\nA person is skydiving\nA person is shooting goal (soccer)\nA person is playing piano\nA person is finger snapping\nA person is canoeing or kayaking\nA person is laughing\nA person is digging\nA person is clay pottery making\nA person is shooting basketball\nA person is bending back\nA person is shaking hands\nA person is bandaging\nA person is push up\nA person is catching or throwing frisbee\nA person is playing trumpet\nA person is flying kite\nA person is filling eyebrows\nA person is shuffling cards\nA person is folding clothes\nA person is smoking\nA person is tai chi\nA person is squat\nA person is playing controller\nA person is throwing axe\nA person is giving or receiving award\nA person is air drumming\nA person is taking a shower\nA person is planting trees\nA person is sharpening knives\nA person is robot dancing\nA person is rock climbing\nA person is hula hooping\nA person is writing\nA person is bungee jumping\nA person is pushing cart\nA person is cleaning windows\nA person is cutting watermelon\nA person is cheerleading\nA person is washing hands\nA person is ironing\nA person is cutting nails\nA person is hugging\nA person is trimming or shaving beard\nA person is jogging\nA person is making bed\nA person is washing dishes\nA person is grooming dog\nA person is doing laundry\nA person is knitting\nA person is 
reading book\nA person is baby waking up\nA person is massaging legs\nA person is brushing teeth\nA person is crawling baby\nA person is motorcycling\nA person is driving car\nA person is sticking tongue out\nA person is shaking head\nA person is sword fighting\nA person is doing aerobics\nA person is strumming guitar\nA person is riding or walking with horse\nA person is archery\nA person is catching or throwing baseball\nA person is playing chess\nA person is rock scissors paper\nA person is using computer\nA person is arranging flowers\nA person is bending metal\nA person is ice skating\nA person is climbing a rope\nA person is crying\nA person is dancing ballet\nA person is getting a haircut\nA person is running on treadmill\nA person is kissing\nA person is counting money\nA person is barbequing\nA person is peeling apples\nA person is milking cow\nA person is shining shoes\nA person is making snowman\nA person is sailing\na person swimming in ocean\na person giving a presentation to a room full of colleagues\na person washing the dishes\na person eating a burger\na person walking in the snowstorm\na person drinking coffee in a cafe\na person playing guitar\na bicycle leaning against a tree\na bicycle gliding through a snowy field\na bicycle slowing down to stop\na bicycle accelerating to gain speed\na car stuck in traffic during rush hour\na car turning a corner\na car slowing down to stop\na car accelerating to gain speed\na motorcycle cruising along a coastal highway\na motorcycle turning a corner\na motorcycle slowing down to stop\na motorcycle gliding through a snowy field\na motorcycle accelerating to gain speed\nan airplane soaring through a clear blue sky\nan airplane taking off\nan airplane landing smoothly on a runway\nan airplane accelerating to gain speed\na bus turning a corner\na bus stuck in traffic during rush hour\na bus accelerating to gain speed\na train speeding down the tracks\na train crossing over a tall bridge\na train accelerating to 
gain speed\na truck turning a corner\na truck anchored in a tranquil bay\na truck stuck in traffic during rush hour\na truck slowing down to stop\na truck accelerating to gain speed\na boat sailing smoothly on a calm lake\na boat slowing down to stop\na boat accelerating to gain speed\na bird soaring gracefully in the sky\na bird building a nest from twigs and leaves\na bird flying over a snowy forest\na cat grooming itself meticulously with its tongue\na cat playing in park\na cat drinking water\na cat running happily\na dog enjoying a peaceful walk\na dog playing in park\na dog drinking water\na dog running happily\na horse bending down to drink water from a river\na horse galloping across an open field\na horse taking a peaceful walk\na horse running to join a herd of its kind\na sheep bending down to drink water from a river\na sheep taking a peaceful walk\na sheep running to join a herd of its kind\na cow bending down to drink water from a river\na cow chewing cud while resting in a tranquil barn\na cow running to join a herd of its kind\nan elephant spraying itself with water using its trunk to cool down\nan elephant taking a peaceful walk\nan elephant running to join a herd of its kind\na bear catching a salmon in its powerful jaws\na bear sniffing the air for scents of food\na bear climbing a tree\na bear hunting for prey\na zebra bending down to drink water from a river\na zebra running to join a herd of its kind\na zebra taking a peaceful walk\na giraffe bending down to drink water from a river\na giraffe taking a peaceful walk\na giraffe running to join a herd of its kind\na person\na bicycle\na car\na motorcycle\nan airplane\na bus\na train\na truck\na boat\na traffic light\na fire hydrant\na stop sign\na parking meter\na bench\na bird\na cat\na dog\na horse\na sheep\na cow\nan elephant\na bear\na zebra\na giraffe\na backpack\nan umbrella\na handbag\na tie\na suitcase\na frisbee\nskis\na snowboard\na sports ball\na kite\na baseball bat\na baseball 
glove\na skateboard\na surfboard\na tennis racket\na bottle\na wine glass\na cup\na fork\na knife\na spoon\na bowl\na banana\nan apple\na sandwich\nan orange\nbroccoli\na carrot\na hot dog\na pizza\na donut\na cake\na chair\na couch\na potted plant\na bed\na dining table\na toilet\na tv\na laptop\na remote\na keyboard\na cell phone\na microwave\nan oven\na toaster\na sink\na refrigerator\na book\na clock\na vase\nscissors\na teddy bear\na hair drier\na toothbrush\na red bicycle\na green bicycle\na blue bicycle\na yellow bicycle\nan orange bicycle\na purple bicycle\na pink bicycle\na black bicycle\na white bicycle\na red car\na green car\na blue car\na yellow car\nan orange car\na purple car\na pink car\na black car\na white car\na red bird\na green bird\na blue bird\na yellow bird\nan orange bird\na purple bird\na pink bird\na black bird\na white bird\na black cat\na white cat\nan orange cat\na yellow cat\na red umbrella\na green umbrella\na blue umbrella\na yellow umbrella\nan orange umbrella\na purple umbrella\na pink umbrella\na black umbrella\na white umbrella\na red suitcase\na green suitcase\na blue suitcase\na yellow suitcase\nan orange suitcase\na purple suitcase\na pink suitcase\na black suitcase\na white suitcase\na red bowl\na green bowl\na blue bowl\na yellow bowl\nan orange bowl\na purple bowl\na pink bowl\na black bowl\na white bowl\na red chair\na green chair\na blue chair\na yellow chair\nan orange chair\na purple chair\na pink chair\na black chair\na white chair\na red clock\na green clock\na blue clock\na yellow clock\nan orange clock\na purple clock\na pink clock\na black clock\na white clock\na red vase\na green vase\na blue vase\na yellow vase\nan orange vase\na purple vase\na pink vase\na black vase\na white vase\nA beautiful coastal beach in spring, waves lapping on sand, Van Gogh style\nA beautiful coastal beach in spring, waves lapping on sand, oil painting\nA beautiful coastal beach in spring, waves lapping on sand by Hokusai, in the style 
of Ukiyo\nA beautiful coastal beach in spring, waves lapping on sand, black and white\nA beautiful coastal beach in spring, waves lapping on sand, pixel art\nA beautiful coastal beach in spring, waves lapping on sand, in cyberpunk style\nA beautiful coastal beach in spring, waves lapping on sand, animated style\nA beautiful coastal beach in spring, waves lapping on sand, watercolor painting\nA beautiful coastal beach in spring, waves lapping on sand, surrealism style\nThe bund Shanghai, Van Gogh style\nThe bund Shanghai, oil painting\nThe bund Shanghai by Hokusai, in the style of Ukiyo\nThe bund Shanghai, black and white\nThe bund Shanghai, pixel art\nThe bund Shanghai, in cyberpunk style\nThe bund Shanghai, animated style\nThe bund Shanghai, watercolor painting\nThe bund Shanghai, surrealism style\na shark is swimming in the ocean, Van Gogh style\na shark is swimming in the ocean, oil painting\na shark is swimming in the ocean by Hokusai, in the style of Ukiyo\na shark is swimming in the ocean, black and white\na shark is swimming in the ocean, pixel art\na shark is swimming in the ocean, in cyberpunk style\na shark is swimming in the ocean, animated style\na shark is swimming in the ocean, watercolor painting\na shark is swimming in the ocean, surrealism style\nA panda drinking coffee in a cafe in Paris, Van Gogh style\nA panda drinking coffee in a cafe in Paris, oil painting\nA panda drinking coffee in a cafe in Paris by Hokusai, in the style of Ukiyo\nA panda drinking coffee in a cafe in Paris, black and white\nA panda drinking coffee in a cafe in Paris, pixel art\nA panda drinking coffee in a cafe in Paris, in cyberpunk style\nA panda drinking coffee in a cafe in Paris, animated style\nA panda drinking coffee in a cafe in Paris, watercolor painting\nA panda drinking coffee in a cafe in Paris, surrealism style\nA cute happy Corgi playing in park, sunset, Van Gogh style\nA cute happy Corgi playing in park, sunset, oil painting\nA cute happy Corgi playing in 
park, sunset by Hokusai, in the style of Ukiyo\nA cute happy Corgi playing in park, sunset, black and white\nA cute happy Corgi playing in park, sunset, pixel art\nA cute happy Corgi playing in park, sunset, in cyberpunk style\nA cute happy Corgi playing in park, sunset, animated style\nA cute happy Corgi playing in park, sunset, watercolor painting\nA cute happy Corgi playing in park, sunset, surrealism style\nGwen Stacy reading a book, Van Gogh style\nGwen Stacy reading a book, oil painting\nGwen Stacy reading a book by Hokusai, in the style of Ukiyo\nGwen Stacy reading a book, black and white\nGwen Stacy reading a book, pixel art\nGwen Stacy reading a book, in cyberpunk style\nGwen Stacy reading a book, animated style\nGwen Stacy reading a book, watercolor painting\nGwen Stacy reading a book, surrealism style\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, Van Gogh style\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, oil painting\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background by Hokusai, in the style of Ukiyo\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, black and white\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, pixel art\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, in cyberpunk style\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, animated style\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, watercolor painting\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, surrealism style\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, Van Gogh style\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, oil painting\nA couple in formal evening wear going 
home get caught in a heavy downpour with umbrellas by Hokusai, in the style of Ukiyo\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, black and white\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, pixel art\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, in cyberpunk style\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, animated style\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, watercolor painting\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, surrealism style\nAn astronaut flying in space, Van Gogh style\nAn astronaut flying in space, oil painting\nAn astronaut flying in space by Hokusai, in the style of Ukiyo\nAn astronaut flying in space, black and white\nAn astronaut flying in space, pixel art\nAn astronaut flying in space, in cyberpunk style\nAn astronaut flying in space, animated style\nAn astronaut flying in space, watercolor painting\nAn astronaut flying in space, surrealism style\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, Van Gogh style\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, oil painting\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks by Hokusai, in the style of Ukiyo\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, black and white\nSnow rocky mountains peaks canyon. 
snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, pixel art\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, in cyberpunk style\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, animated style\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, watercolor painting\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, surrealism style\nA beautiful coastal beach in spring, waves lapping on sand, in super slow motion\nA beautiful coastal beach in spring, waves lapping on sand, zoom in\nA beautiful coastal beach in spring, waves lapping on sand, zoom out\nA beautiful coastal beach in spring, waves lapping on sand, pan left\nA beautiful coastal beach in spring, waves lapping on sand, pan right\nA beautiful coastal beach in spring, waves lapping on sand, tilt up\nA beautiful coastal beach in spring, waves lapping on sand, tilt down\nA beautiful coastal beach in spring, waves lapping on sand, with an intense shaking effect\nA beautiful coastal beach in spring, waves lapping on sand, featuring a steady and smooth perspective\nA beautiful coastal beach in spring, waves lapping on sand, racking focus\nThe bund Shanghai, in super slow motion\nThe bund Shanghai, zoom in\nThe bund Shanghai, zoom out\nThe bund Shanghai, pan left\nThe bund Shanghai, pan right\nThe bund Shanghai, tilt up\nThe bund Shanghai, tilt down\nThe bund Shanghai, with an intense shaking effect\nThe bund Shanghai, featuring a steady and smooth perspective\nThe 
bund Shanghai, racking focus\na shark is swimming in the ocean, in super slow motion\na shark is swimming in the ocean, zoom in\na shark is swimming in the ocean, zoom out\na shark is swimming in the ocean, pan left\na shark is swimming in the ocean, pan right\na shark is swimming in the ocean, tilt up\na shark is swimming in the ocean, tilt down\na shark is swimming in the ocean, with an intense shaking effect\na shark is swimming in the ocean, featuring a steady and smooth perspective\na shark is swimming in the ocean, racking focus\nA panda drinking coffee in a cafe in Paris, in super slow motion\nA panda drinking coffee in a cafe in Paris, zoom in\nA panda drinking coffee in a cafe in Paris, zoom out\nA panda drinking coffee in a cafe in Paris, pan left\nA panda drinking coffee in a cafe in Paris, pan right\nA panda drinking coffee in a cafe in Paris, tilt up\nA panda drinking coffee in a cafe in Paris, tilt down\nA panda drinking coffee in a cafe in Paris, with an intense shaking effect\nA panda drinking coffee in a cafe in Paris, featuring a steady and smooth perspective\nA panda drinking coffee in a cafe in Paris, racking focus\nA cute happy Corgi playing in park, sunset, in super slow motion\nA cute happy Corgi playing in park, sunset, zoom in\nA cute happy Corgi playing in park, sunset, zoom out\nA cute happy Corgi playing in park, sunset, pan left\nA cute happy Corgi playing in park, sunset, pan right\nA cute happy Corgi playing in park, sunset, tilt up\nA cute happy Corgi playing in park, sunset, tilt down\nA cute happy Corgi playing in park, sunset, with an intense shaking effect\nA cute happy Corgi playing in park, sunset, featuring a steady and smooth perspective\nA cute happy Corgi playing in park, sunset, racking focus\nGwen Stacy reading a book, in super slow motion\nGwen Stacy reading a book, zoom in\nGwen Stacy reading a book, zoom out\nGwen Stacy reading a book, pan left\nGwen Stacy reading a book, pan right\nGwen Stacy reading a book, tilt 
up\nGwen Stacy reading a book, tilt down\nGwen Stacy reading a book, with an intense shaking effect\nGwen Stacy reading a book, featuring a steady and smooth perspective\nGwen Stacy reading a book, racking focus\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, in super slow motion\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, zoom in\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, zoom out\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, pan left\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, pan right\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, tilt up\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, tilt down\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, with an intense shaking effect\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, featuring a steady and smooth perspective\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, racking focus\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, in super slow motion\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, zoom in\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, zoom out\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, pan left\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, pan right\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, tilt up\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, tilt down\nA couple in formal evening wear going home get caught in a 
heavy downpour with umbrellas, with an intense shaking effect\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, featuring a steady and smooth perspective\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, racking focus\nAn astronaut flying in space, in super slow motion\nAn astronaut flying in space, zoom in\nAn astronaut flying in space, zoom out\nAn astronaut flying in space, pan left\nAn astronaut flying in space, pan right\nAn astronaut flying in space, tilt up\nAn astronaut flying in space, tilt down\nAn astronaut flying in space, with an intense shaking effect\nAn astronaut flying in space, featuring a steady and smooth perspective\nAn astronaut flying in space, racking focus\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, in super slow motion\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, zoom in\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, zoom out\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, pan left\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, pan right\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, tilt up\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. 
the canyons twist and bend through the high elevated mountain peaks, tilt down\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, with an intense shaking effect\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, featuring a steady and smooth perspective\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, racking focus\nClose up of grapes on a rotating table.\nTurtle swimming in ocean.\nA storm trooper vacuuming the beach.\nA panda standing on a surfboard in the ocean in sunset.\nAn astronaut feeding ducks on a sunny afternoon, reflection from the water.\nTwo pandas discussing an academic paper.\nSunset time lapse at the beach with moving clouds and colors in the sky.\nA fat rabbit wearing a purple robe walking through a fantasy landscape.\nA koala bear playing piano in the forest.\nAn astronaut flying in space.\nFireworks.\nAn animated painting of fluffy white clouds moving in sky.\nFlying through fantasy landscapes.\nA bigfoot walking in the snowstorm.\nA squirrel eating a burger.\nA cat wearing sunglasses and working as a lifeguard at a pool.\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. 
the canyons twist and bend through the high elevated mountain peaks.\nSplash of turquoise water in extreme slow motion, alpha channel included.\nan ice cream is melting on the table.\na drone flying over a snowy forest.\na shark is swimming in the ocean.\nAerial panoramic video from a drone of a fantasy land.\na teddy bear is swimming in the ocean.\ntime lapse of sunrise on mars.\ngolden fish swimming in the ocean.\nAn artist brush painting on a canvas close up.\nA drone view of celebration with Christmas tree and fireworks, starry sky - background.\nhappy dog wearing a yellow turtleneck, studio, portrait, facing camera, dark background\nOrigami dancers in white paper, 3D render, on white background, studio shot, dancing modern dance.\nCampfire at night in a snowy forest with starry sky in the background.\na fantasy landscape\nA 3D model of a 1800s victorian house.\nthis is how I do makeup in the morning.\nA raccoon that looks like a turtle, digital art.\nRobot dancing in Times Square.\nBusy freeway at night.\nBalloon full of water exploding in extreme slow motion.\nAn astronaut is riding a horse in the space in a photorealistic style.\nMacro slo-mo. Slow motion cropped closeup of roasted coffee beans falling into an empty bowl.\nSewing machine, old sewing machine working.\nMotion colour drop in water, ink swirling in water, colourful ink in water, abstraction fancy dream cloud of ink.\nFew big purple plums rotating on the turntable. water drops appear on the skin during rotation. isolated on the white background. close-up. 
macro.\nVampire makeup face of beautiful girl, red contact lenses.\nAshtray full of butts on table, smoke flowing on black background, close-up\nPacific coast, carmel by the sea ocean and waves.\nA teddy bear is playing drum kit in NYC Times Square.\nA corgi is playing drum kit.\nAn Iron man is playing the electronic guitar, high electronic guitar.\nA raccoon is playing the electronic guitar.\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background by Vincent van Gogh\nA corgi's head depicted as an explosion of a nebula\nA fantasy landscape\nA future where humans have achieved teleportation technology\nA jellyfish floating through the ocean, with bioluminescent tentacles\nA Mars rover moving on Mars\nA panda drinking coffee in a cafe in Paris\nA space shuttle launching into orbit, with flames and smoke billowing out from the engines\nA steam train moving on a mountainside\nA super cool giant robot in Cyberpunk Beijing\nA tropical beach at sunrise, with palm trees and crystal-clear water in the foreground\nCinematic shot of Van Gogh's selfie, Van Gogh style\nGwen Stacy reading a book\nIron Man flying in the sky\nThe bund Shanghai, oil painting\nYoda playing guitar on the stage\nA beautiful coastal beach in spring, waves lapping on sand by Hokusai, in the style of Ukiyo\nA beautiful coastal beach in spring, waves lapping on sand by Vincent van Gogh\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background\nA car moving slowly on an empty street, rainy evening\nA cat eating food out of a bowl\nA cat wearing sunglasses at a pool\nA confused panda in calculus class\nA cute fluffy panda eating Chinese food in a restaurant\nA cute happy Corgi playing in park, sunset\nA cute raccoon playing guitar in a boat on the ocean\nA happy fuzzy panda playing guitar nearby a campfire, snow mountain in the background\nA lightning striking atop of eiffel tower, dark clouds in the sky\nA modern art museum, with colorful paintings\nA 
panda cooking in the kitchen\nA panda playing on a swing set\nA polar bear is playing guitar\nA raccoon dressed in suit playing the trumpet, stage background\nA robot DJ is playing the turntable, in heavy raining futuristic tokyo rooftop cyberpunk night, sci-fi, fantasy\nA shark swimming in clear Caribbean ocean\nA super robot protecting city\nA teddy bear washing the dishes\nAn epic tornado attacking above a glowing city at night, the tornado is made of smoke\nAn oil painting of a couple in formal evening wear going home get caught in a heavy downpour with umbrellas\nClown fish swimming through the coral reef\nHyper-realistic spaceship landing on Mars\nThe bund Shanghai, vibrant color\nVincent van Gogh is painting in the room\nYellow flowers swing in the wind\nalley\namusement park\naquarium\narch\nart gallery\nbathroom\nbakery shop\nballroom\nbar\nbarn\nbasement\nbeach\nbedroom\nbridge\nbotanical garden\ncafeteria\ncampsite\ncampus\ncarrousel\ncastle\ncemetery\nclassroom\ncliff\ncrosswalk\nconstruction site\ncorridor\ncourtyard\ndesert\ndowntown\ndriveway\nfarm\nfood court\nfootball field\nforest road\nfountain\ngas station\nglacier\ngolf course\nindoor gymnasium\nharbor\nhighway\nhospital\nhouse\niceberg\nindustrial area\njail cell\njunkyard\nkitchen\nindoor library\nlighthouse\nlaboratory\nmansion\nmarsh\nmountain\nindoor movie theater\nindoor museum\nmusic studio\nnursery\nocean\noffice\npalace\nparking lot\npharmacy\nphone booth\nraceway\nrestaurant\nriver\nscience museum\nshower\nski slope\nsky\nskyscraper\nbaseball stadium\nstaircase\nstreet\nsupermarket\nindoor swimming pool\ntower\noutdoor track\ntrain railway\ntrain station platform\nunderwater coral reef\nvalley\nvolcano\nwaterfall\nwindmill\na bicycle on the left of a car, front view\na car on the right of a motorcycle, front view\na motorcycle on the left of a bus, front view\na bus on the right of a traffic light, front view\na traffic light on the left of a fire hydrant, front view\na fire hydrant 
on the right of a stop sign, front view\na stop sign on the left of a parking meter, front view\na parking meter on the right of a bench, front view\na bench on the left of a truck, front view\na truck on the right of a bicycle, front view\na bird on the left of a cat, front view\na cat on the right of a dog, front view\na dog on the left of a horse, front view\na horse on the right of a sheep, front view\na sheep on the left of a cow, front view\na cow on the right of an elephant, front view\nan elephant on the left of a bear, front view\na bear on the right of a zebra, front view\na zebra on the left of a giraffe, front view\na giraffe on the right of a bird, front view\na bottle on the left of a wine glass, front view\na wine glass on the right of a cup, front view\na cup on the left of a fork, front view\na fork on the right of a knife, front view\na knife on the left of a spoon, front view\na spoon on the right of a bowl, front view\na bowl on the left of a bottle, front view\na potted plant on the left of a remote, front view\na remote on the right of a clock, front view\na clock on the left of a vase, front view\na vase on the right of scissors, front view\nscissors on the left of a teddy bear, front view\na teddy bear on the right of a potted plant, front view\na frisbee on the left of a sports ball, front view\na sports ball on the right of a baseball bat, front view\na baseball bat on the left of a baseball glove, front view\na baseball glove on the right of a tennis racket, front view\na tennis racket on the left of a frisbee, front view\na toilet on the left of a hair drier, front view\na hair drier on the right of a toothbrush, front view\na toothbrush on the left of a sink, front view\na sink on the right of a toilet, front view\na chair on the left of a couch, front view\na couch on the right of a bed, front view\na bed on the left of a tv, front view\na tv on the right of a dining table, front view\na dining table on the left of a chair, front 
view\nan airplane on the left of a train, front view\na train on the right of a boat, front view\na boat on the left of an airplane, front view\nan oven on the top of a toaster, front view\nan oven on the bottom of a toaster, front view\na toaster on the top of a microwave, front view\na toaster on the bottom of a microwave, front view\na microwave on the top of an oven, front view\na microwave on the bottom of an oven, front view\na banana on the top of an apple, front view\na banana on the bottom of an apple, front view\nan apple on the top of a sandwich, front view\nan apple on the bottom of a sandwich, front view\na sandwich on the top of an orange, front view\na sandwich on the bottom of an orange, front view\nan orange on the top of a carrot, front view\nan orange on the bottom of a carrot, front view\na carrot on the top of a hot dog, front view\na carrot on the bottom of a hot dog, front view\na hot dog on the top of a pizza, front view\na hot dog on the bottom of a pizza, front view\na pizza on the top of a donut, front view\na pizza on the bottom of a donut, front view\na donut on the top of broccoli, front view\na donut on the bottom of broccoli, front view\nbroccoli on the top of a banana, front view\nbroccoli on the bottom of a banana, front view\nskis on the top of a snowboard, front view\nskis on the bottom of a snowboard, front view\na snowboard on the top of a kite, front view\na snowboard on the bottom of a kite, front view\na kite on the top of a skateboard, front view\na kite on the bottom of a skateboard, front view\na skateboard on the top of a surfboard, front view\na skateboard on the bottom of a surfboard, front view\na surfboard on the top of skis, front view\na surfboard on the bottom of skis, front view\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/all_i2v.txt",
    "content": "a close up of a blue and orange liquid{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close up of a blue and orange liquid.jpg\", \"mask_strategy\": \"0\"}\na close up of a blue and orange liquid, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close up of a blue and orange liquid.jpg\", \"mask_strategy\": \"0\"}\na close up of a blue and orange liquid, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close up of a blue and orange liquid.jpg\", \"mask_strategy\": \"0\"}\na close up of a blue and orange liquid, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close up of a blue and orange liquid.jpg\", \"mask_strategy\": \"0\"}\na close up of a blue and orange liquid, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close up of a blue and orange liquid.jpg\", \"mask_strategy\": \"0\"}\na close up of a blue and orange liquid, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close up of a blue and orange liquid.jpg\", \"mask_strategy\": \"0\"}\na close up of a blue and orange liquid, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close up of a blue and orange liquid.jpg\", \"mask_strategy\": \"0\"}\na close up of a blue and orange liquid, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close up of a blue and orange liquid.jpg\", \"mask_strategy\": \"0\"}\nA black and white abstract video featuring mesmerizing bubbles{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A black and white abstract video featuring mesmerizing bubbles.jpg\", \"mask_strategy\": \"0\"}\nA black and white abstract video featuring mesmerizing bubbles, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A black and white abstract video featuring mesmerizing bubbles.jpg\", 
\"mask_strategy\": \"0\"}\nA black and white abstract video featuring mesmerizing bubbles, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A black and white abstract video featuring mesmerizing bubbles.jpg\", \"mask_strategy\": \"0\"}\nA black and white abstract video featuring mesmerizing bubbles, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A black and white abstract video featuring mesmerizing bubbles.jpg\", \"mask_strategy\": \"0\"}\nA black and white abstract video featuring mesmerizing bubbles, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A black and white abstract video featuring mesmerizing bubbles.jpg\", \"mask_strategy\": \"0\"}\nA black and white abstract video featuring mesmerizing bubbles, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A black and white abstract video featuring mesmerizing bubbles.jpg\", \"mask_strategy\": \"0\"}\nA black and white abstract video featuring mesmerizing bubbles, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A black and white abstract video featuring mesmerizing bubbles.jpg\", \"mask_strategy\": \"0\"}\nA black and white abstract video featuring mesmerizing bubbles, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A black and white abstract video featuring mesmerizing bubbles.jpg\", \"mask_strategy\": \"0\"}\na blue and white smoke is swirly in the dark{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a blue and white smoke is swirly in the dark.jpg\", \"mask_strategy\": \"0\"}\na blue and white smoke is swirly in the dark, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a blue and white smoke is swirly in the dark.jpg\", \"mask_strategy\": \"0\"}\na blue and white smoke is swirly in the dark, camera pans right{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a blue and white smoke is swirly in the dark.jpg\", \"mask_strategy\": \"0\"}\na blue and white smoke is swirly in the dark, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a blue and white smoke is swirly in the dark.jpg\", \"mask_strategy\": \"0\"}\na blue and white smoke is swirly in the dark, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a blue and white smoke is swirly in the dark.jpg\", \"mask_strategy\": \"0\"}\na blue and white smoke is swirly in the dark, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a blue and white smoke is swirly in the dark.jpg\", \"mask_strategy\": \"0\"}\na blue and white smoke is swirly in the dark, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a blue and white smoke is swirly in the dark.jpg\", \"mask_strategy\": \"0\"}\na blue and white smoke is swirly in the dark, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a blue and white smoke is swirly in the dark.jpg\", \"mask_strategy\": \"0\"}\na close-up view of a sea fan in the water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of a sea fan in the water.jpg\", \"mask_strategy\": \"0\"}\na close-up view of a sea fan in the water, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of a sea fan in the water.jpg\", \"mask_strategy\": \"0\"}\na close-up view of a sea fan in the water, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of a sea fan in the water.jpg\", \"mask_strategy\": \"0\"}\na close-up view of a sea fan in the water, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of a sea fan in the water.jpg\", \"mask_strategy\": \"0\"}\na close-up view of a sea fan in the water, camera tilts 
down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of a sea fan in the water.jpg\", \"mask_strategy\": \"0\"}\na close-up view of a sea fan in the water, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of a sea fan in the water.jpg\", \"mask_strategy\": \"0\"}\na close-up view of a sea fan in the water, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of a sea fan in the water.jpg\", \"mask_strategy\": \"0\"}\na close-up view of a sea fan in the water, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of a sea fan in the water.jpg\", \"mask_strategy\": \"0\"}\na visually captivating abstract video, rich in color, set against a dramatic black background{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a visually captivating abstract video, rich in color, set against a dramatic black background.jpg\", \"mask_strategy\": \"0\"}\na visually captivating abstract video, rich in color, set against a dramatic black background, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a visually captivating abstract video, rich in color, set against a dramatic black background.jpg\", \"mask_strategy\": \"0\"}\na visually captivating abstract video, rich in color, set against a dramatic black background, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a visually captivating abstract video, rich in color, set against a dramatic black background.jpg\", \"mask_strategy\": \"0\"}\na visually captivating abstract video, rich in color, set against a dramatic black background, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a visually captivating abstract video, rich in color, set against a dramatic black background.jpg\", \"mask_strategy\": \"0\"}\na visually captivating abstract video, rich in 
color, set against a dramatic black background, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a visually captivating abstract video, rich in color, set against a dramatic black background.jpg\", \"mask_strategy\": \"0\"}\na visually captivating abstract video, rich in color, set against a dramatic black background, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a visually captivating abstract video, rich in color, set against a dramatic black background.jpg\", \"mask_strategy\": \"0\"}\na visually captivating abstract video, rich in color, set against a dramatic black background, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a visually captivating abstract video, rich in color, set against a dramatic black background.jpg\", \"mask_strategy\": \"0\"}\na visually captivating abstract video, rich in color, set against a dramatic black background, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a visually captivating abstract video, rich in color, set against a dramatic black background.jpg\", \"mask_strategy\": \"0\"}\na purple and yellow abstract painting with a black background{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a purple and yellow abstract painting with a black background.jpg\", \"mask_strategy\": \"0\"}\na purple and yellow abstract painting with a black background, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a purple and yellow abstract painting with a black background.jpg\", \"mask_strategy\": \"0\"}\na purple and yellow abstract painting with a black background, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a purple and yellow abstract painting with a black background.jpg\", \"mask_strategy\": \"0\"}\na purple and yellow abstract painting with a black background, camera tilts up{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a purple and yellow abstract painting with a black background.jpg\", \"mask_strategy\": \"0\"}\na purple and yellow abstract painting with a black background, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a purple and yellow abstract painting with a black background.jpg\", \"mask_strategy\": \"0\"}\na purple and yellow abstract painting with a black background, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a purple and yellow abstract painting with a black background.jpg\", \"mask_strategy\": \"0\"}\na purple and yellow abstract painting with a black background, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a purple and yellow abstract painting with a black background.jpg\", \"mask_strategy\": \"0\"}\na purple and yellow abstract painting with a black background, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a purple and yellow abstract painting with a black background.jpg\", \"mask_strategy\": \"0\"}\na dynamic video of a blurry neon light in the dark, radiating captivating colors{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dynamic video of a blurry neon light in the dark, radiating captivating colors.jpg\", \"mask_strategy\": \"0\"}\na dynamic video of a blurry neon light in the dark, radiating captivating colors, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dynamic video of a blurry neon light in the dark, radiating captivating colors.jpg\", \"mask_strategy\": \"0\"}\na dynamic video of a blurry neon light in the dark, radiating captivating colors, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dynamic video of a blurry neon light in the dark, radiating captivating colors.jpg\", \"mask_strategy\": \"0\"}\na dynamic video of a blurry neon light in the dark, radiating captivating 
colors, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dynamic video of a blurry neon light in the dark, radiating captivating colors.jpg\", \"mask_strategy\": \"0\"}\na dynamic video of a blurry neon light in the dark, radiating captivating colors, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dynamic video of a blurry neon light in the dark, radiating captivating colors.jpg\", \"mask_strategy\": \"0\"}\na dynamic video of a blurry neon light in the dark, radiating captivating colors, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dynamic video of a blurry neon light in the dark, radiating captivating colors.jpg\", \"mask_strategy\": \"0\"}\na dynamic video of a blurry neon light in the dark, radiating captivating colors, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dynamic video of a blurry neon light in the dark, radiating captivating colors.jpg\", \"mask_strategy\": \"0\"}\na dynamic video of a blurry neon light in the dark, radiating captivating colors, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dynamic video of a blurry neon light in the dark, radiating captivating colors.jpg\", \"mask_strategy\": \"0\"}\na view of a star trail in the night sky{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a star trail in the night sky.jpg\", \"mask_strategy\": \"0\"}\na view of a star trail in the night sky, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a star trail in the night sky.jpg\", \"mask_strategy\": \"0\"}\na view of a star trail in the night sky, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a star trail in the night sky.jpg\", \"mask_strategy\": \"0\"}\na view of a star trail in the night sky, camera tilts up{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a star trail in the night sky.jpg\", \"mask_strategy\": \"0\"}\na view of a star trail in the night sky, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a star trail in the night sky.jpg\", \"mask_strategy\": \"0\"}\na view of a star trail in the night sky, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a star trail in the night sky.jpg\", \"mask_strategy\": \"0\"}\na view of a star trail in the night sky, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a star trail in the night sky.jpg\", \"mask_strategy\": \"0\"}\na view of a star trail in the night sky, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a star trail in the night sky.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a small town on the edge of the ocean{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a small town on the edge of the ocean.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a small town on the edge of the ocean, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a small town on the edge of the ocean.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a small town on the edge of the ocean, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a small town on the edge of the ocean.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a small town on the edge of the ocean, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a small town on the edge of the ocean.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a small town on the edge of the ocean, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a small town on the edge of the 
ocean.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a small town on the edge of the ocean, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a small town on the edge of the ocean.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a small town on the edge of the ocean, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a small town on the edge of the ocean.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a small town on the edge of the ocean, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a small town on the edge of the ocean.jpg\", \"mask_strategy\": \"0\"}\nColorful buildings on the seaside cliffs{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/Colorful buildings on the seaside cliffs.jpg\", \"mask_strategy\": \"0\"}\nColorful buildings on the seaside cliffs, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/Colorful buildings on the seaside cliffs.jpg\", \"mask_strategy\": \"0\"}\nColorful buildings on the seaside cliffs, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/Colorful buildings on the seaside cliffs.jpg\", \"mask_strategy\": \"0\"}\nColorful buildings on the seaside cliffs, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/Colorful buildings on the seaside cliffs.jpg\", \"mask_strategy\": \"0\"}\nColorful buildings on the seaside cliffs, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/Colorful buildings on the seaside cliffs.jpg\", \"mask_strategy\": \"0\"}\nColorful buildings on the seaside cliffs, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/Colorful buildings on the seaside cliffs.jpg\", \"mask_strategy\": \"0\"}\nColorful buildings on the seaside cliffs, camera zooms out{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/Colorful buildings on the seaside cliffs.jpg\", \"mask_strategy\": \"0\"}\nColorful buildings on the seaside cliffs, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/Colorful buildings on the seaside cliffs.jpg\", \"mask_strategy\": \"0\"}\na bunch of houses that are on a hillside{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bunch of houses that are on a hillside.jpg\", \"mask_strategy\": \"0\"}\na bunch of houses that are on a hillside, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bunch of houses that are on a hillside.jpg\", \"mask_strategy\": \"0\"}\na bunch of houses that are on a hillside, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bunch of houses that are on a hillside.jpg\", \"mask_strategy\": \"0\"}\na bunch of houses that are on a hillside, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bunch of houses that are on a hillside.jpg\", \"mask_strategy\": \"0\"}\na bunch of houses that are on a hillside, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bunch of houses that are on a hillside.jpg\", \"mask_strategy\": \"0\"}\na bunch of houses that are on a hillside, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bunch of houses that are on a hillside.jpg\", \"mask_strategy\": \"0\"}\na bunch of houses that are on a hillside, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bunch of houses that are on a hillside.jpg\", \"mask_strategy\": \"0\"}\na bunch of houses that are on a hillside, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bunch of houses that are on a hillside.jpg\", \"mask_strategy\": \"0\"}\na building that is sitting on the side of a pond{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a building that is sitting on the side of a pond.jpg\", \"mask_strategy\": \"0\"}\na building that is sitting on the side of a pond, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a building that is sitting on the side of a pond.jpg\", \"mask_strategy\": \"0\"}\na building that is sitting on the side of a pond, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a building that is sitting on the side of a pond.jpg\", \"mask_strategy\": \"0\"}\na building that is sitting on the side of a pond, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a building that is sitting on the side of a pond.jpg\", \"mask_strategy\": \"0\"}\na building that is sitting on the side of a pond, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a building that is sitting on the side of a pond.jpg\", \"mask_strategy\": \"0\"}\na building that is sitting on the side of a pond, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a building that is sitting on the side of a pond.jpg\", \"mask_strategy\": \"0\"}\na building that is sitting on the side of a pond, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a building that is sitting on the side of a pond.jpg\", \"mask_strategy\": \"0\"}\na building that is sitting on the side of a pond, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a building that is sitting on the side of a pond.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a busy city with a bridge in the background{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a busy city with a bridge in the background.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a busy city with a bridge in the background, camera pans left{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a busy city with a bridge in the background.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a busy city with a bridge in the background, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a busy city with a bridge in the background.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a busy city with a bridge in the background, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a busy city with a bridge in the background.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a busy city with a bridge in the background, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a busy city with a bridge in the background.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a busy city with a bridge in the background, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a busy city with a bridge in the background.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a busy city with a bridge in the background, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a busy city with a bridge in the background.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a busy city with a bridge in the background, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a busy city with a bridge in the background.jpg\", \"mask_strategy\": \"0\"}\na bridge that is over a body of water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is over a body of water.jpg\", \"mask_strategy\": \"0\"}\na bridge that is over a body of water, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is over a body of water.jpg\", \"mask_strategy\": \"0\"}\na bridge that is over a body of water, camera 
pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is over a body of water.jpg\", \"mask_strategy\": \"0\"}\na bridge that is over a body of water, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is over a body of water.jpg\", \"mask_strategy\": \"0\"}\na bridge that is over a body of water, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is over a body of water.jpg\", \"mask_strategy\": \"0\"}\na bridge that is over a body of water, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is over a body of water.jpg\", \"mask_strategy\": \"0\"}\na bridge that is over a body of water, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is over a body of water.jpg\", \"mask_strategy\": \"0\"}\na bridge that is over a body of water, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is over a body of water.jpg\", \"mask_strategy\": \"0\"}\na pile of wood sitting next to a log house{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pile of wood sitting next to a log house.jpg\", \"mask_strategy\": \"0\"}\na pile of wood sitting next to a log house, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pile of wood sitting next to a log house.jpg\", \"mask_strategy\": \"0\"}\na pile of wood sitting next to a log house, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pile of wood sitting next to a log house.jpg\", \"mask_strategy\": \"0\"}\na pile of wood sitting next to a log house, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pile of wood sitting next to a log house.jpg\", \"mask_strategy\": \"0\"}\na pile of wood sitting next to a log house, camera tilts down{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pile of wood sitting next to a log house.jpg\", \"mask_strategy\": \"0\"}\na pile of wood sitting next to a log house, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pile of wood sitting next to a log house.jpg\", \"mask_strategy\": \"0\"}\na pile of wood sitting next to a log house, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pile of wood sitting next to a log house.jpg\", \"mask_strategy\": \"0\"}\na pile of wood sitting next to a log house, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pile of wood sitting next to a log house.jpg\", \"mask_strategy\": \"0\"}\na view of a snowy mountain side with many buildings{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a snowy mountain side with many buildings.jpg\", \"mask_strategy\": \"0\"}\na view of a snowy mountain side with many buildings, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a snowy mountain side with many buildings.jpg\", \"mask_strategy\": \"0\"}\na view of a snowy mountain side with many buildings, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a snowy mountain side with many buildings.jpg\", \"mask_strategy\": \"0\"}\na view of a snowy mountain side with many buildings, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a snowy mountain side with many buildings.jpg\", \"mask_strategy\": \"0\"}\na view of a snowy mountain side with many buildings, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a snowy mountain side with many buildings.jpg\", \"mask_strategy\": \"0\"}\na view of a snowy mountain side with many buildings, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a snowy mountain side with many 
buildings.jpg\", \"mask_strategy\": \"0\"}\na view of a snowy mountain side with many buildings, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a snowy mountain side with many buildings.jpg\", \"mask_strategy\": \"0\"}\na view of a snowy mountain side with many buildings, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a snowy mountain side with many buildings.jpg\", \"mask_strategy\": \"0\"}\nsan francisco skyline at sunset{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/san francisco skyline at sunset.jpg\", \"mask_strategy\": \"0\"}\nsan francisco skyline at sunset, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/san francisco skyline at sunset.jpg\", \"mask_strategy\": \"0\"}\nsan francisco skyline at sunset, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/san francisco skyline at sunset.jpg\", \"mask_strategy\": \"0\"}\nsan francisco skyline at sunset, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/san francisco skyline at sunset.jpg\", \"mask_strategy\": \"0\"}\nsan francisco skyline at sunset, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/san francisco skyline at sunset.jpg\", \"mask_strategy\": \"0\"}\nsan francisco skyline at sunset, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/san francisco skyline at sunset.jpg\", \"mask_strategy\": \"0\"}\nsan francisco skyline at sunset, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/san francisco skyline at sunset.jpg\", \"mask_strategy\": \"0\"}\nsan francisco skyline at sunset, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/san francisco skyline at sunset.jpg\", \"mask_strategy\": \"0\"}\na castle on top of a hill covered in snow{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a castle on top of a hill covered in snow.jpg\", \"mask_strategy\": \"0\"}\na castle on top of a hill covered in snow, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a castle on top of a hill covered in snow.jpg\", \"mask_strategy\": \"0\"}\na castle on top of a hill covered in snow, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a castle on top of a hill covered in snow.jpg\", \"mask_strategy\": \"0\"}\na castle on top of a hill covered in snow, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a castle on top of a hill covered in snow.jpg\", \"mask_strategy\": \"0\"}\na castle on top of a hill covered in snow, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a castle on top of a hill covered in snow.jpg\", \"mask_strategy\": \"0\"}\na castle on top of a hill covered in snow, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a castle on top of a hill covered in snow.jpg\", \"mask_strategy\": \"0\"}\na castle on top of a hill covered in snow, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a castle on top of a hill covered in snow.jpg\", \"mask_strategy\": \"0\"}\na castle on top of a hill covered in snow, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a castle on top of a hill covered in snow.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of big ben and the houses of parliament in london{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of big ben and the houses of parliament in london.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of big ben and the houses of parliament in london, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of big ben and the houses of parliament in london.jpg\", \"mask_strategy\": 
\"0\"}\nan aerial view of big ben and the houses of parliament in london, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of big ben and the houses of parliament in london.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of big ben and the houses of parliament in london, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of big ben and the houses of parliament in london.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of big ben and the houses of parliament in london, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of big ben and the houses of parliament in london.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of big ben and the houses of parliament in london, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of big ben and the houses of parliament in london.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of big ben and the houses of parliament in london, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of big ben and the houses of parliament in london.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of big ben and the houses of parliament in london, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of big ben and the houses of parliament in london.jpg\", \"mask_strategy\": \"0\"}\na beach with a lot of buildings on the side of a cliff{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a beach with a lot of buildings on the side of a cliff.jpg\", \"mask_strategy\": \"0\"}\na beach with a lot of buildings on the side of a cliff, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a beach with a lot of buildings on the side of a cliff.jpg\", \"mask_strategy\": \"0\"}\na beach with a lot of buildings on the side of a cliff, camera pans 
right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a beach with a lot of buildings on the side of a cliff.jpg\", \"mask_strategy\": \"0\"}\na beach with a lot of buildings on the side of a cliff, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a beach with a lot of buildings on the side of a cliff.jpg\", \"mask_strategy\": \"0\"}\na beach with a lot of buildings on the side of a cliff, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a beach with a lot of buildings on the side of a cliff.jpg\", \"mask_strategy\": \"0\"}\na beach with a lot of buildings on the side of a cliff, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a beach with a lot of buildings on the side of a cliff.jpg\", \"mask_strategy\": \"0\"}\na beach with a lot of buildings on the side of a cliff, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a beach with a lot of buildings on the side of a cliff.jpg\", \"mask_strategy\": \"0\"}\na beach with a lot of buildings on the side of a cliff, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a beach with a lot of buildings on the side of a cliff.jpg\", \"mask_strategy\": \"0\"}\nan alley way in an old european city{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an alley way in an old european city.jpg\", \"mask_strategy\": \"0\"}\nan alley way in an old european city, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an alley way in an old european city.jpg\", \"mask_strategy\": \"0\"}\nan alley way in an old european city, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an alley way in an old european city.jpg\", \"mask_strategy\": \"0\"}\nan alley way in an old european city, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an alley way in an old european 
city.jpg\", \"mask_strategy\": \"0\"}\nan alley way in an old european city, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an alley way in an old european city.jpg\", \"mask_strategy\": \"0\"}\nan alley way in an old european city, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an alley way in an old european city.jpg\", \"mask_strategy\": \"0\"}\nan alley way in an old european city, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an alley way in an old european city.jpg\", \"mask_strategy\": \"0\"}\nan alley way in an old european city, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an alley way in an old european city.jpg\", \"mask_strategy\": \"0\"}\nthe golden gate bridge in san franscisco is lit up by the setting sun{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the golden gate bridge in san franscisco is lit up by the setting sun.jpg\", \"mask_strategy\": \"0\"}\nthe golden gate bridge in san franscisco is lit up by the setting sun, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the golden gate bridge in san franscisco is lit up by the setting sun.jpg\", \"mask_strategy\": \"0\"}\nthe golden gate bridge in san franscisco is lit up by the setting sun, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the golden gate bridge in san franscisco is lit up by the setting sun.jpg\", \"mask_strategy\": \"0\"}\nthe golden gate bridge in san franscisco is lit up by the setting sun, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the golden gate bridge in san franscisco is lit up by the setting sun.jpg\", \"mask_strategy\": \"0\"}\nthe golden gate bridge in san franscisco is lit up by the setting sun, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the golden gate bridge in san 
franscisco is lit up by the setting sun.jpg\", \"mask_strategy\": \"0\"}\nthe golden gate bridge in san franscisco is lit up by the setting sun, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the golden gate bridge in san franscisco is lit up by the setting sun.jpg\", \"mask_strategy\": \"0\"}\nthe golden gate bridge in san franscisco is lit up by the setting sun, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the golden gate bridge in san franscisco is lit up by the setting sun.jpg\", \"mask_strategy\": \"0\"}\nthe golden gate bridge in san franscisco is lit up by the setting sun, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the golden gate bridge in san franscisco is lit up by the setting sun.jpg\", \"mask_strategy\": \"0\"}\nthe great wall of china in autumn{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the great wall of china in autumn.jpg\", \"mask_strategy\": \"0\"}\nthe great wall of china in autumn, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the great wall of china in autumn.jpg\", \"mask_strategy\": \"0\"}\nthe great wall of china in autumn, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the great wall of china in autumn.jpg\", \"mask_strategy\": \"0\"}\nthe great wall of china in autumn, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the great wall of china in autumn.jpg\", \"mask_strategy\": \"0\"}\nthe great wall of china in autumn, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the great wall of china in autumn.jpg\", \"mask_strategy\": \"0\"}\nthe great wall of china in autumn, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the great wall of china in autumn.jpg\", \"mask_strategy\": \"0\"}\nthe great wall of china in autumn, camera zooms 
out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the great wall of china in autumn.jpg\", \"mask_strategy\": \"0\"}\nthe great wall of china in autumn, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the great wall of china in autumn.jpg\", \"mask_strategy\": \"0\"}\nthe town of hallstatt is surrounded by mountains and water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the town of hallstatt is surrounded by mountains and water.jpg\", \"mask_strategy\": \"0\"}\nthe town of hallstatt is surrounded by mountains and water, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the town of hallstatt is surrounded by mountains and water.jpg\", \"mask_strategy\": \"0\"}\nthe town of hallstatt is surrounded by mountains and water, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the town of hallstatt is surrounded by mountains and water.jpg\", \"mask_strategy\": \"0\"}\nthe town of hallstatt is surrounded by mountains and water, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the town of hallstatt is surrounded by mountains and water.jpg\", \"mask_strategy\": \"0\"}\nthe town of hallstatt is surrounded by mountains and water, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the town of hallstatt is surrounded by mountains and water.jpg\", \"mask_strategy\": \"0\"}\nthe town of hallstatt is surrounded by mountains and water, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the town of hallstatt is surrounded by mountains and water.jpg\", \"mask_strategy\": \"0\"}\nthe town of hallstatt is surrounded by mountains and water, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the town of hallstatt is surrounded by mountains and water.jpg\", \"mask_strategy\": \"0\"}\nthe town of hallstatt is surrounded by mountains and 
water, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the town of hallstatt is surrounded by mountains and water.jpg\", \"mask_strategy\": \"0\"}\ntokyo skyline at night{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tokyo skyline at night.jpg\", \"mask_strategy\": \"0\"}\ntokyo skyline at night, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tokyo skyline at night.jpg\", \"mask_strategy\": \"0\"}\ntokyo skyline at night, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tokyo skyline at night.jpg\", \"mask_strategy\": \"0\"}\ntokyo skyline at night, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tokyo skyline at night.jpg\", \"mask_strategy\": \"0\"}\ntokyo skyline at night, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tokyo skyline at night.jpg\", \"mask_strategy\": \"0\"}\ntokyo skyline at night, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tokyo skyline at night.jpg\", \"mask_strategy\": \"0\"}\ntokyo skyline at night, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tokyo skyline at night.jpg\", \"mask_strategy\": \"0\"}\ntokyo skyline at night, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tokyo skyline at night.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a 
lighthouse.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse.jpg\", \"mask_strategy\": \"0\"}\na church sits on top of a hill under a cloudy sky{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a church sits on top of a hill under a cloudy sky.jpg\", \"mask_strategy\": \"0\"}\na church sits on top of a hill under a cloudy sky, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a church sits on top of a hill under a cloudy sky.jpg\", \"mask_strategy\": \"0\"}\na church sits on top of a hill under a cloudy sky, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a church sits on top of a hill under a cloudy sky.jpg\", \"mask_strategy\": \"0\"}\na church sits on top of a hill under a cloudy sky, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a church sits on top of a hill under a cloudy sky.jpg\", \"mask_strategy\": \"0\"}\na church sits on top of a hill under a cloudy sky, camera tilts down{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a church sits on top of a hill under a cloudy sky.jpg\", \"mask_strategy\": \"0\"}\na church sits on top of a hill under a cloudy sky, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a church sits on top of a hill under a cloudy sky.jpg\", \"mask_strategy\": \"0\"}\na church sits on top of a hill under a cloudy sky, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a church sits on top of a hill under a cloudy sky.jpg\", \"mask_strategy\": \"0\"}\na church sits on top of a hill under a cloudy sky, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a church sits on top of a hill under a cloudy sky.jpg\", \"mask_strategy\": \"0\"}\nthe parthenon in acropolis, greece{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the parthenon in acropolis, greece.jpg\", \"mask_strategy\": \"0\"}\nthe parthenon in acropolis, greece, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the parthenon in acropolis, greece.jpg\", \"mask_strategy\": \"0\"}\nthe parthenon in acropolis, greece, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the parthenon in acropolis, greece.jpg\", \"mask_strategy\": \"0\"}\nthe parthenon in acropolis, greece, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the parthenon in acropolis, greece.jpg\", \"mask_strategy\": \"0\"}\nthe parthenon in acropolis, greece, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the parthenon in acropolis, greece.jpg\", \"mask_strategy\": \"0\"}\nthe parthenon in acropolis, greece, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the parthenon in acropolis, greece.jpg\", \"mask_strategy\": \"0\"}\nthe parthenon in acropolis, greece, camera zooms out{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the parthenon in acropolis, greece.jpg\", \"mask_strategy\": \"0\"}\nthe parthenon in acropolis, greece, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the parthenon in acropolis, greece.jpg\", \"mask_strategy\": \"0\"}\na large crowd of people walking in a shopping mall{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large crowd of people walking in a shopping mall.jpg\", \"mask_strategy\": \"0\"}\na large crowd of people walking in a shopping mall, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large crowd of people walking in a shopping mall.jpg\", \"mask_strategy\": \"0\"}\na large crowd of people walking in a shopping mall, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large crowd of people walking in a shopping mall.jpg\", \"mask_strategy\": \"0\"}\na large crowd of people walking in a shopping mall, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large crowd of people walking in a shopping mall.jpg\", \"mask_strategy\": \"0\"}\na large crowd of people walking in a shopping mall, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large crowd of people walking in a shopping mall.jpg\", \"mask_strategy\": \"0\"}\na large crowd of people walking in a shopping mall, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large crowd of people walking in a shopping mall.jpg\", \"mask_strategy\": \"0\"}\na large crowd of people walking in a shopping mall, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large crowd of people walking in a shopping mall.jpg\", \"mask_strategy\": \"0\"}\na large crowd of people walking in a shopping mall, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large crowd of people walking in a shopping 
mall.jpg\", \"mask_strategy\": \"0\"}\nthe pyramids of giza, egypt{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the pyramids of giza, egypt.jpg\", \"mask_strategy\": \"0\"}\nthe pyramids of giza, egypt, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the pyramids of giza, egypt.jpg\", \"mask_strategy\": \"0\"}\nthe pyramids of giza, egypt, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the pyramids of giza, egypt.jpg\", \"mask_strategy\": \"0\"}\nthe pyramids of giza, egypt, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the pyramids of giza, egypt.jpg\", \"mask_strategy\": \"0\"}\nthe pyramids of giza, egypt, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the pyramids of giza, egypt.jpg\", \"mask_strategy\": \"0\"}\nthe pyramids of giza, egypt, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the pyramids of giza, egypt.jpg\", \"mask_strategy\": \"0\"}\nthe pyramids of giza, egypt, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the pyramids of giza, egypt.jpg\", \"mask_strategy\": \"0\"}\nthe pyramids of giza, egypt, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the pyramids of giza, egypt.jpg\", \"mask_strategy\": \"0\"}\na stage door painted with a star on the side of a brick wall{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a stage door painted with a star on the side of a brick wall.jpg\", \"mask_strategy\": \"0\"}\na stage door painted with a star on the side of a brick wall, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a stage door painted with a star on the side of a brick wall.jpg\", \"mask_strategy\": \"0\"}\na stage door painted with a star on the side of a brick wall, camera pans right{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a stage door painted with a star on the side of a brick wall.jpg\", \"mask_strategy\": \"0\"}\na stage door painted with a star on the side of a brick wall, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a stage door painted with a star on the side of a brick wall.jpg\", \"mask_strategy\": \"0\"}\na stage door painted with a star on the side of a brick wall, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a stage door painted with a star on the side of a brick wall.jpg\", \"mask_strategy\": \"0\"}\na stage door painted with a star on the side of a brick wall, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a stage door painted with a star on the side of a brick wall.jpg\", \"mask_strategy\": \"0\"}\na stage door painted with a star on the side of a brick wall, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a stage door painted with a star on the side of a brick wall.jpg\", \"mask_strategy\": \"0\"}\na stage door painted with a star on the side of a brick wall, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a stage door painted with a star on the side of a brick wall.jpg\", \"mask_strategy\": \"0\"}\na light house on the edge of the water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a light house on the edge of the water.jpg\", \"mask_strategy\": \"0\"}\na light house on the edge of the water, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a light house on the edge of the water.jpg\", \"mask_strategy\": \"0\"}\na light house on the edge of the water, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a light house on the edge of the water.jpg\", \"mask_strategy\": \"0\"}\na light house on the edge of the water, camera tilts up{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a light house on the edge of the water.jpg\", \"mask_strategy\": \"0\"}\na light house on the edge of the water, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a light house on the edge of the water.jpg\", \"mask_strategy\": \"0\"}\na light house on the edge of the water, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a light house on the edge of the water.jpg\", \"mask_strategy\": \"0\"}\na light house on the edge of the water, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a light house on the edge of the water.jpg\", \"mask_strategy\": \"0\"}\na light house on the edge of the water, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a light house on the edge of the water.jpg\", \"mask_strategy\": \"0\"}\nan asian city street at night with people and bicycles{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an asian city street at night with people and bicycles.jpg\", \"mask_strategy\": \"0\"}\nan asian city street at night with people and bicycles, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an asian city street at night with people and bicycles.jpg\", \"mask_strategy\": \"0\"}\nan asian city street at night with people and bicycles, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an asian city street at night with people and bicycles.jpg\", \"mask_strategy\": \"0\"}\nan asian city street at night with people and bicycles, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an asian city street at night with people and bicycles.jpg\", \"mask_strategy\": \"0\"}\nan asian city street at night with people and bicycles, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an asian city street at night with people and bicycles.jpg\", 
\"mask_strategy\": \"0\"}\nan asian city street at night with people and bicycles, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an asian city street at night with people and bicycles.jpg\", \"mask_strategy\": \"0\"}\nan asian city street at night with people and bicycles, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an asian city street at night with people and bicycles.jpg\", \"mask_strategy\": \"0\"}\nan asian city street at night with people and bicycles, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an asian city street at night with people and bicycles.jpg\", \"mask_strategy\": \"0\"}\na couple of wooden benches in the middle of a street{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a couple of wooden benches in the middle of a street.jpg\", \"mask_strategy\": \"0\"}\na couple of wooden benches in the middle of a street, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a couple of wooden benches in the middle of a street.jpg\", \"mask_strategy\": \"0\"}\na couple of wooden benches in the middle of a street, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a couple of wooden benches in the middle of a street.jpg\", \"mask_strategy\": \"0\"}\na couple of wooden benches in the middle of a street, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a couple of wooden benches in the middle of a street.jpg\", \"mask_strategy\": \"0\"}\na couple of wooden benches in the middle of a street, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a couple of wooden benches in the middle of a street.jpg\", \"mask_strategy\": \"0\"}\na couple of wooden benches in the middle of a street, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a couple of wooden benches in the middle of a street.jpg\", 
\"mask_strategy\": \"0\"}\na couple of wooden benches in the middle of a street, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a couple of wooden benches in the middle of a street.jpg\", \"mask_strategy\": \"0\"}\na couple of wooden benches in the middle of a street, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a couple of wooden benches in the middle of a street.jpg\", \"mask_strategy\": \"0\"}\na pagoda sits on top of a mountain in japan{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pagoda sits on top of a mountain in japan.jpg\", \"mask_strategy\": \"0\"}\na pagoda sits on top of a mountain in japan, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pagoda sits on top of a mountain in japan.jpg\", \"mask_strategy\": \"0\"}\na pagoda sits on top of a mountain in japan, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pagoda sits on top of a mountain in japan.jpg\", \"mask_strategy\": \"0\"}\na pagoda sits on top of a mountain in japan, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pagoda sits on top of a mountain in japan.jpg\", \"mask_strategy\": \"0\"}\na pagoda sits on top of a mountain in japan, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pagoda sits on top of a mountain in japan.jpg\", \"mask_strategy\": \"0\"}\na pagoda sits on top of a mountain in japan, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pagoda sits on top of a mountain in japan.jpg\", \"mask_strategy\": \"0\"}\na pagoda sits on top of a mountain in japan, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pagoda sits on top of a mountain in japan.jpg\", \"mask_strategy\": \"0\"}\na pagoda sits on top of a mountain in japan, camera static{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pagoda sits on top of a mountain in japan.jpg\", \"mask_strategy\": \"0\"}\na red bus driving down a snowy street at night{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a red bus driving down a snowy street at night.jpg\", \"mask_strategy\": \"0\"}\na red bus driving down a snowy street at night, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a red bus driving down a snowy street at night.jpg\", \"mask_strategy\": \"0\"}\na red bus driving down a snowy street at night, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a red bus driving down a snowy street at night.jpg\", \"mask_strategy\": \"0\"}\na red bus driving down a snowy street at night, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a red bus driving down a snowy street at night.jpg\", \"mask_strategy\": \"0\"}\na red bus driving down a snowy street at night, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a red bus driving down a snowy street at night.jpg\", \"mask_strategy\": \"0\"}\na red bus driving down a snowy street at night, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a red bus driving down a snowy street at night.jpg\", \"mask_strategy\": \"0\"}\na red bus driving down a snowy street at night, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a red bus driving down a snowy street at night.jpg\", \"mask_strategy\": \"0\"}\na red bus driving down a snowy street at night, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a red bus driving down a snowy street at night.jpg\", \"mask_strategy\": \"0\"}\na snow covered street{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a snow covered street.jpg\", \"mask_strategy\": \"0\"}\na snow covered street, camera pans left{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a snow covered street.jpg\", \"mask_strategy\": \"0\"}\na snow covered street, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a snow covered street.jpg\", \"mask_strategy\": \"0\"}\na snow covered street, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a snow covered street.jpg\", \"mask_strategy\": \"0\"}\na snow covered street, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a snow covered street.jpg\", \"mask_strategy\": \"0\"}\na snow covered street, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a snow covered street.jpg\", \"mask_strategy\": \"0\"}\na snow covered street, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a snow covered street.jpg\", \"mask_strategy\": \"0\"}\na snow covered street, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a snow covered street.jpg\", \"mask_strategy\": \"0\"}\na house with snow on the ground{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a house with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na house with snow on the ground, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a house with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na house with snow on the ground, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a house with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na house with snow on the ground, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a house with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na house with snow on the ground, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a house with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na house with snow on the ground, camera zooms 
in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a house with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na house with snow on the ground, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a house with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na house with snow on the ground, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a house with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\ncars parked on the side of the road during a snowstorm{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/cars parked on the side of the road during a snowstorm.jpg\", \"mask_strategy\": \"0\"}\ncars parked on the side of the road during a snowstorm, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/cars parked on the side of the road during a snowstorm.jpg\", \"mask_strategy\": \"0\"}\ncars parked on the side of the road during a snowstorm, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/cars parked on the side of the road during a snowstorm.jpg\", \"mask_strategy\": \"0\"}\ncars parked on the side of the road during a snowstorm, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/cars parked on the side of the road during a snowstorm.jpg\", \"mask_strategy\": \"0\"}\ncars parked on the side of the road during a snowstorm, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/cars parked on the side of the road during a snowstorm.jpg\", \"mask_strategy\": \"0\"}\ncars parked on the side of the road during a snowstorm, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/cars parked on the side of the road during a snowstorm.jpg\", \"mask_strategy\": \"0\"}\ncars parked on the side of the road during a snowstorm, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/cars parked on the side 
of the road during a snowstorm.jpg\", \"mask_strategy\": \"0\"}\ncars parked on the side of the road during a snowstorm, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/cars parked on the side of the road during a snowstorm.jpg\", \"mask_strategy\": \"0\"}\na group of statues on the side of a building{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of statues on the side of a building.jpg\", \"mask_strategy\": \"0\"}\na group of statues on the side of a building, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of statues on the side of a building.jpg\", \"mask_strategy\": \"0\"}\na group of statues on the side of a building, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of statues on the side of a building.jpg\", \"mask_strategy\": \"0\"}\na group of statues on the side of a building, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of statues on the side of a building.jpg\", \"mask_strategy\": \"0\"}\na group of statues on the side of a building, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of statues on the side of a building.jpg\", \"mask_strategy\": \"0\"}\na group of statues on the side of a building, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of statues on the side of a building.jpg\", \"mask_strategy\": \"0\"}\na group of statues on the side of a building, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of statues on the side of a building.jpg\", \"mask_strategy\": \"0\"}\na group of statues on the side of a building, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of statues on the side of a building.jpg\", \"mask_strategy\": \"0\"}\na city street at night during a snow storm{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a city street at night during a snow storm.jpg\", \"mask_strategy\": \"0\"}\na city street at night during a snow storm, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a city street at night during a snow storm.jpg\", \"mask_strategy\": \"0\"}\na city street at night during a snow storm, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a city street at night during a snow storm.jpg\", \"mask_strategy\": \"0\"}\na city street at night during a snow storm, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a city street at night during a snow storm.jpg\", \"mask_strategy\": \"0\"}\na city street at night during a snow storm, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a city street at night during a snow storm.jpg\", \"mask_strategy\": \"0\"}\na city street at night during a snow storm, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a city street at night during a snow storm.jpg\", \"mask_strategy\": \"0\"}\na city street at night during a snow storm, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a city street at night during a snow storm.jpg\", \"mask_strategy\": \"0\"}\na city street at night during a snow storm, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a city street at night during a snow storm.jpg\", \"mask_strategy\": \"0\"}\ntower bridge in london{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tower bridge in london.jpg\", \"mask_strategy\": \"0\"}\ntower bridge in london, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tower bridge in london.jpg\", \"mask_strategy\": \"0\"}\ntower bridge in london, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tower bridge in london.jpg\", 
\"mask_strategy\": \"0\"}\ntower bridge in london, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tower bridge in london.jpg\", \"mask_strategy\": \"0\"}\ntower bridge in london, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tower bridge in london.jpg\", \"mask_strategy\": \"0\"}\ntower bridge in london, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tower bridge in london.jpg\", \"mask_strategy\": \"0\"}\ntower bridge in london, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tower bridge in london.jpg\", \"mask_strategy\": \"0\"}\ntower bridge in london, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tower bridge in london.jpg\", \"mask_strategy\": \"0\"}\nchinese pagoda in the middle of a snowy day{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/chinese pagoda in the middle of a snowy day.jpg\", \"mask_strategy\": \"0\"}\nchinese pagoda in the middle of a snowy day, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/chinese pagoda in the middle of a snowy day.jpg\", \"mask_strategy\": \"0\"}\nchinese pagoda in the middle of a snowy day, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/chinese pagoda in the middle of a snowy day.jpg\", \"mask_strategy\": \"0\"}\nchinese pagoda in the middle of a snowy day, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/chinese pagoda in the middle of a snowy day.jpg\", \"mask_strategy\": \"0\"}\nchinese pagoda in the middle of a snowy day, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/chinese pagoda in the middle of a snowy day.jpg\", \"mask_strategy\": \"0\"}\nchinese pagoda in the middle of a snowy day, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/chinese pagoda in the 
middle of a snowy day.jpg\", \"mask_strategy\": \"0\"}\nchinese pagoda in the middle of a snowy day, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/chinese pagoda in the middle of a snowy day.jpg\", \"mask_strategy\": \"0\"}\nchinese pagoda in the middle of a snowy day, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/chinese pagoda in the middle of a snowy day.jpg\", \"mask_strategy\": \"0\"}\na dark alleyway with a bus driving down it{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dark alleyway with a bus driving down it.jpg\", \"mask_strategy\": \"0\"}\na dark alleyway with a bus driving down it, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dark alleyway with a bus driving down it.jpg\", \"mask_strategy\": \"0\"}\na dark alleyway with a bus driving down it, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dark alleyway with a bus driving down it.jpg\", \"mask_strategy\": \"0\"}\na dark alleyway with a bus driving down it, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dark alleyway with a bus driving down it.jpg\", \"mask_strategy\": \"0\"}\na dark alleyway with a bus driving down it, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dark alleyway with a bus driving down it.jpg\", \"mask_strategy\": \"0\"}\na dark alleyway with a bus driving down it, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dark alleyway with a bus driving down it.jpg\", \"mask_strategy\": \"0\"}\na dark alleyway with a bus driving down it, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dark alleyway with a bus driving down it.jpg\", \"mask_strategy\": \"0\"}\na dark alleyway with a bus driving down it, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dark 
alleyway with a bus driving down it.jpg\", \"mask_strategy\": \"0\"}\na monastery sits on top of a cliff in bhutan{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a monastery sits on top of a cliff in bhutan.jpg\", \"mask_strategy\": \"0\"}\na monastery sits on top of a cliff in bhutan, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a monastery sits on top of a cliff in bhutan.jpg\", \"mask_strategy\": \"0\"}\na monastery sits on top of a cliff in bhutan, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a monastery sits on top of a cliff in bhutan.jpg\", \"mask_strategy\": \"0\"}\na monastery sits on top of a cliff in bhutan, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a monastery sits on top of a cliff in bhutan.jpg\", \"mask_strategy\": \"0\"}\na monastery sits on top of a cliff in bhutan, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a monastery sits on top of a cliff in bhutan.jpg\", \"mask_strategy\": \"0\"}\na monastery sits on top of a cliff in bhutan, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a monastery sits on top of a cliff in bhutan.jpg\", \"mask_strategy\": \"0\"}\na monastery sits on top of a cliff in bhutan, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a monastery sits on top of a cliff in bhutan.jpg\", \"mask_strategy\": \"0\"}\na monastery sits on top of a cliff in bhutan, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a monastery sits on top of a cliff in bhutan.jpg\", \"mask_strategy\": \"0\"}\nthe dome of the rock in jerusalem{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the dome of the rock in jerusalem.jpg\", \"mask_strategy\": \"0\"}\nthe dome of the rock in jerusalem, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the dome 
of the rock in jerusalem.jpg\", \"mask_strategy\": \"0\"}\nthe dome of the rock in jerusalem, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the dome of the rock in jerusalem.jpg\", \"mask_strategy\": \"0\"}\nthe dome of the rock in jerusalem, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the dome of the rock in jerusalem.jpg\", \"mask_strategy\": \"0\"}\nthe dome of the rock in jerusalem, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the dome of the rock in jerusalem.jpg\", \"mask_strategy\": \"0\"}\nthe dome of the rock in jerusalem, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the dome of the rock in jerusalem.jpg\", \"mask_strategy\": \"0\"}\nthe dome of the rock in jerusalem, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the dome of the rock in jerusalem.jpg\", \"mask_strategy\": \"0\"}\nthe dome of the rock in jerusalem, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the dome of the rock in jerusalem.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a futuristic building on a cliff overlooking a body of water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a futuristic building on a cliff overlooking a body of water.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a futuristic building on a cliff overlooking a body of water, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a futuristic building on a cliff overlooking a body of water.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a futuristic building on a cliff overlooking a body of water, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a futuristic building on a cliff overlooking a body of water.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a 
futuristic building on a cliff overlooking a body of water, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a futuristic building on a cliff overlooking a body of water.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a futuristic building on a cliff overlooking a body of water, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a futuristic building on a cliff overlooking a body of water.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a futuristic building on a cliff overlooking a body of water, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a futuristic building on a cliff overlooking a body of water.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a futuristic building on a cliff overlooking a body of water, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a futuristic building on a cliff overlooking a body of water.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a futuristic building on a cliff overlooking a body of water, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a futuristic building on a cliff overlooking a body of water.jpg\", \"mask_strategy\": \"0\"}\na reflection of a city with buildings in the water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a reflection of a city with buildings in the water.jpg\", \"mask_strategy\": \"0\"}\na reflection of a city with buildings in the water, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a reflection of a city with buildings in the water.jpg\", \"mask_strategy\": \"0\"}\na reflection of a city with buildings in the water, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a reflection of a city with buildings in the water.jpg\", \"mask_strategy\": \"0\"}\na 
reflection of a city with buildings in the water, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a reflection of a city with buildings in the water.jpg\", \"mask_strategy\": \"0\"}\na reflection of a city with buildings in the water, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a reflection of a city with buildings in the water.jpg\", \"mask_strategy\": \"0\"}\na reflection of a city with buildings in the water, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a reflection of a city with buildings in the water.jpg\", \"mask_strategy\": \"0\"}\na reflection of a city with buildings in the water, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a reflection of a city with buildings in the water.jpg\", \"mask_strategy\": \"0\"}\na reflection of a city with buildings in the water, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a reflection of a city with buildings in the water.jpg\", \"mask_strategy\": \"0\"}\na bar with chairs and a television on the wall{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bar with chairs and a television on the wall.jpg\", \"mask_strategy\": \"0\"}\na bar with chairs and a television on the wall, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bar with chairs and a television on the wall.jpg\", \"mask_strategy\": \"0\"}\na bar with chairs and a television on the wall, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bar with chairs and a television on the wall.jpg\", \"mask_strategy\": \"0\"}\na bar with chairs and a television on the wall, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bar with chairs and a television on the wall.jpg\", \"mask_strategy\": \"0\"}\na bar with chairs and a television on the wall, camera tilts 
down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bar with chairs and a television on the wall.jpg\", \"mask_strategy\": \"0\"}\na bar with chairs and a television on the wall, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bar with chairs and a television on the wall.jpg\", \"mask_strategy\": \"0\"}\na bar with chairs and a television on the wall, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bar with chairs and a television on the wall.jpg\", \"mask_strategy\": \"0\"}\na bar with chairs and a television on the wall, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bar with chairs and a television on the wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with lots of books on a wall{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with lots of books on a wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with lots of books on a wall, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with lots of books on a wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with lots of books on a wall, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with lots of books on a wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with lots of books on a wall, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with lots of books on a wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with lots of books on a wall, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with lots of books on a wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with lots of books on a wall, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room 
filled with lots of books on a wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with lots of books on a wall, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with lots of books on a wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with lots of books on a wall, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with lots of books on a wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with furniture next to a stone wall{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with furniture next to a stone wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with furniture next to a stone wall, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with furniture next to a stone wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with furniture next to a stone wall, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with furniture next to a stone wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with furniture next to a stone wall, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with furniture next to a stone wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with furniture next to a stone wall, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with furniture next to a stone wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with furniture next to a stone wall, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with furniture next to a stone wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with furniture next to a stone wall, camera zooms out{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with furniture next to a stone wall.jpg\", \"mask_strategy\": \"0\"}\na living room filled with furniture next to a stone wall, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room filled with furniture next to a stone wall.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with sunlight coming through the window{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with sunlight coming through the window.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with sunlight coming through the window, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with sunlight coming through the window.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with sunlight coming through the window, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with sunlight coming through the window.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with sunlight coming through the window, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with sunlight coming through the window.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with sunlight coming through the window, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with sunlight coming through the window.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with sunlight coming through the window, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with sunlight coming through the window.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with sunlight coming through the window, camera zooms out{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with sunlight coming through the window.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with sunlight coming through the window, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with sunlight coming through the window.jpg\", \"mask_strategy\": \"0\"}\na room filled with lots of shelves filled with books{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with lots of shelves filled with books.jpg\", \"mask_strategy\": \"0\"}\na room filled with lots of shelves filled with books, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with lots of shelves filled with books.jpg\", \"mask_strategy\": \"0\"}\na room filled with lots of shelves filled with books, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with lots of shelves filled with books.jpg\", \"mask_strategy\": \"0\"}\na room filled with lots of shelves filled with books, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with lots of shelves filled with books.jpg\", \"mask_strategy\": \"0\"}\na room filled with lots of shelves filled with books, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with lots of shelves filled with books.jpg\", \"mask_strategy\": \"0\"}\na room filled with lots of shelves filled with books, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with lots of shelves filled with books.jpg\", \"mask_strategy\": \"0\"}\na room filled with lots of shelves filled with books, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with lots of shelves filled with books.jpg\", \"mask_strategy\": \"0\"}\na room filled with lots of shelves filled with books, 
camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with lots of shelves filled with books.jpg\", \"mask_strategy\": \"0\"}\nan art gallery with paintings on the walls{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an art gallery with paintings on the walls.jpg\", \"mask_strategy\": \"0\"}\nan art gallery with paintings on the walls, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an art gallery with paintings on the walls.jpg\", \"mask_strategy\": \"0\"}\nan art gallery with paintings on the walls, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an art gallery with paintings on the walls.jpg\", \"mask_strategy\": \"0\"}\nan art gallery with paintings on the walls, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an art gallery with paintings on the walls.jpg\", \"mask_strategy\": \"0\"}\nan art gallery with paintings on the walls, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an art gallery with paintings on the walls.jpg\", \"mask_strategy\": \"0\"}\nan art gallery with paintings on the walls, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an art gallery with paintings on the walls.jpg\", \"mask_strategy\": \"0\"}\nan art gallery with paintings on the walls, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an art gallery with paintings on the walls.jpg\", \"mask_strategy\": \"0\"}\nan art gallery with paintings on the walls, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an art gallery with paintings on the walls.jpg\", \"mask_strategy\": \"0\"}\na room with a lot of pictures on the walls{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a lot of pictures on the walls.jpg\", \"mask_strategy\": \"0\"}\na room with a lot of pictures on the walls, 
camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a lot of pictures on the walls.jpg\", \"mask_strategy\": \"0\"}\na room with a lot of pictures on the walls, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a lot of pictures on the walls.jpg\", \"mask_strategy\": \"0\"}\na room with a lot of pictures on the walls, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a lot of pictures on the walls.jpg\", \"mask_strategy\": \"0\"}\na room with a lot of pictures on the walls, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a lot of pictures on the walls.jpg\", \"mask_strategy\": \"0\"}\na room with a lot of pictures on the walls, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a lot of pictures on the walls.jpg\", \"mask_strategy\": \"0\"}\na room with a lot of pictures on the walls, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a lot of pictures on the walls.jpg\", \"mask_strategy\": \"0\"}\na room with a lot of pictures on the walls, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a lot of pictures on the walls.jpg\", \"mask_strategy\": \"0\"}\na painting of a cloudy sky next to an easel{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a painting of a cloudy sky next to an easel.jpg\", \"mask_strategy\": \"0\"}\na painting of a cloudy sky next to an easel, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a painting of a cloudy sky next to an easel.jpg\", \"mask_strategy\": \"0\"}\na painting of a cloudy sky next to an easel, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a painting of a cloudy sky next to an easel.jpg\", \"mask_strategy\": \"0\"}\na painting of a cloudy sky 
next to an easel, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a painting of a cloudy sky next to an easel.jpg\", \"mask_strategy\": \"0\"}\na painting of a cloudy sky next to an easel, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a painting of a cloudy sky next to an easel.jpg\", \"mask_strategy\": \"0\"}\na painting of a cloudy sky next to an easel, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a painting of a cloudy sky next to an easel.jpg\", \"mask_strategy\": \"0\"}\na painting of a cloudy sky next to an easel, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a painting of a cloudy sky next to an easel.jpg\", \"mask_strategy\": \"0\"}\na painting of a cloudy sky next to an easel, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a painting of a cloudy sky next to an easel.jpg\", \"mask_strategy\": \"0\"}\na living room with a christmas tree and a rocking chair{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a christmas tree and a rocking chair.jpg\", \"mask_strategy\": \"0\"}\na living room with a christmas tree and a rocking chair, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a christmas tree and a rocking chair.jpg\", \"mask_strategy\": \"0\"}\na living room with a christmas tree and a rocking chair, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a christmas tree and a rocking chair.jpg\", \"mask_strategy\": \"0\"}\na living room with a christmas tree and a rocking chair, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a christmas tree and a rocking chair.jpg\", \"mask_strategy\": \"0\"}\na living room with a christmas tree and a rocking chair, camera tilts down{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a christmas tree and a rocking chair.jpg\", \"mask_strategy\": \"0\"}\na living room with a christmas tree and a rocking chair, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a christmas tree and a rocking chair.jpg\", \"mask_strategy\": \"0\"}\na living room with a christmas tree and a rocking chair, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a christmas tree and a rocking chair.jpg\", \"mask_strategy\": \"0\"}\na living room with a christmas tree and a rocking chair, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a christmas tree and a rocking chair.jpg\", \"mask_strategy\": \"0\"}\na kitchen with a sink and a lot of glasses on the counter{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a kitchen with a sink and a lot of glasses on the counter.jpg\", \"mask_strategy\": \"0\"}\na kitchen with a sink and a lot of glasses on the counter, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a kitchen with a sink and a lot of glasses on the counter.jpg\", \"mask_strategy\": \"0\"}\na kitchen with a sink and a lot of glasses on the counter, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a kitchen with a sink and a lot of glasses on the counter.jpg\", \"mask_strategy\": \"0\"}\na kitchen with a sink and a lot of glasses on the counter, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a kitchen with a sink and a lot of glasses on the counter.jpg\", \"mask_strategy\": \"0\"}\na kitchen with a sink and a lot of glasses on the counter, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a kitchen with a sink and a lot of glasses on the counter.jpg\", \"mask_strategy\": \"0\"}\na kitchen with a sink and a lot 
of glasses on the counter, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a kitchen with a sink and a lot of glasses on the counter.jpg\", \"mask_strategy\": \"0\"}\na kitchen with a sink and a lot of glasses on the counter, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a kitchen with a sink and a lot of glasses on the counter.jpg\", \"mask_strategy\": \"0\"}\na kitchen with a sink and a lot of glasses on the counter, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a kitchen with a sink and a lot of glasses on the counter.jpg\", \"mask_strategy\": \"0\"}\na wooden table in front of a brick wall with bottles on the wall{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a wooden table in front of a brick wall with bottles on the wall.jpg\", \"mask_strategy\": \"0\"}\na wooden table in front of a brick wall with bottles on the wall, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a wooden table in front of a brick wall with bottles on the wall.jpg\", \"mask_strategy\": \"0\"}\na wooden table in front of a brick wall with bottles on the wall, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a wooden table in front of a brick wall with bottles on the wall.jpg\", \"mask_strategy\": \"0\"}\na wooden table in front of a brick wall with bottles on the wall, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a wooden table in front of a brick wall with bottles on the wall.jpg\", \"mask_strategy\": \"0\"}\na wooden table in front of a brick wall with bottles on the wall, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a wooden table in front of a brick wall with bottles on the wall.jpg\", \"mask_strategy\": \"0\"}\na wooden table in front of a brick wall with bottles on the wall, camera zooms in{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a wooden table in front of a brick wall with bottles on the wall.jpg\", \"mask_strategy\": \"0\"}\na wooden table in front of a brick wall with bottles on the wall, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a wooden table in front of a brick wall with bottles on the wall.jpg\", \"mask_strategy\": \"0\"}\na wooden table in front of a brick wall with bottles on the wall, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a wooden table in front of a brick wall with bottles on the wall.jpg\", \"mask_strategy\": \"0\"}\na room filled with paintings and statues{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with paintings and statues.jpg\", \"mask_strategy\": \"0\"}\na room filled with paintings and statues, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with paintings and statues.jpg\", \"mask_strategy\": \"0\"}\na room filled with paintings and statues, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with paintings and statues.jpg\", \"mask_strategy\": \"0\"}\na room filled with paintings and statues, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with paintings and statues.jpg\", \"mask_strategy\": \"0\"}\na room filled with paintings and statues, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with paintings and statues.jpg\", \"mask_strategy\": \"0\"}\na room filled with paintings and statues, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with paintings and statues.jpg\", \"mask_strategy\": \"0\"}\na room filled with paintings and statues, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with paintings and statues.jpg\", 
\"mask_strategy\": \"0\"}\na room filled with paintings and statues, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with paintings and statues.jpg\", \"mask_strategy\": \"0\"}\nan outdoor dining area surrounded by plants and a brick walkway{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an outdoor dining area surrounded by plants and a brick walkway.jpg\", \"mask_strategy\": \"0\"}\nan outdoor dining area surrounded by plants and a brick walkway, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an outdoor dining area surrounded by plants and a brick walkway.jpg\", \"mask_strategy\": \"0\"}\nan outdoor dining area surrounded by plants and a brick walkway, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an outdoor dining area surrounded by plants and a brick walkway.jpg\", \"mask_strategy\": \"0\"}\nan outdoor dining area surrounded by plants and a brick walkway, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an outdoor dining area surrounded by plants and a brick walkway.jpg\", \"mask_strategy\": \"0\"}\nan outdoor dining area surrounded by plants and a brick walkway, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an outdoor dining area surrounded by plants and a brick walkway.jpg\", \"mask_strategy\": \"0\"}\nan outdoor dining area surrounded by plants and a brick walkway, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an outdoor dining area surrounded by plants and a brick walkway.jpg\", \"mask_strategy\": \"0\"}\nan outdoor dining area surrounded by plants and a brick walkway, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an outdoor dining area surrounded by plants and a brick walkway.jpg\", \"mask_strategy\": \"0\"}\nan outdoor dining area surrounded by plants and a brick walkway, camera 
static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an outdoor dining area surrounded by plants and a brick walkway.jpg\", \"mask_strategy\": \"0\"}\na room filled with books and teddy bears{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with books and teddy bears.jpg\", \"mask_strategy\": \"0\"}\na room filled with books and teddy bears, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with books and teddy bears.jpg\", \"mask_strategy\": \"0\"}\na room filled with books and teddy bears, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with books and teddy bears.jpg\", \"mask_strategy\": \"0\"}\na room filled with books and teddy bears, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with books and teddy bears.jpg\", \"mask_strategy\": \"0\"}\na room filled with books and teddy bears, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with books and teddy bears.jpg\", \"mask_strategy\": \"0\"}\na room filled with books and teddy bears, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with books and teddy bears.jpg\", \"mask_strategy\": \"0\"}\na room filled with books and teddy bears, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with books and teddy bears.jpg\", \"mask_strategy\": \"0\"}\na room filled with books and teddy bears, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room filled with books and teddy bears.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with a plant in the corner{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with a plant in the corner.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with a plant in 
the corner, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with a plant in the corner.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with a plant in the corner, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with a plant in the corner.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with a plant in the corner, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with a plant in the corner.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with a plant in the corner, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with a plant in the corner.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with a plant in the corner, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with a plant in the corner.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with a plant in the corner, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with a plant in the corner.jpg\", \"mask_strategy\": \"0\"}\na table and chairs in a room with a plant in the corner, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table and chairs in a room with a plant in the corner.jpg\", \"mask_strategy\": \"0\"}\na living room with a couch, table, and a window{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a couch, table, and a window.jpg\", \"mask_strategy\": \"0\"}\na living room with a couch, table, and a window, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a couch, table, and a window.jpg\", \"mask_strategy\": \"0\"}\na living room with a couch, table, 
and a window, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a couch, table, and a window.jpg\", \"mask_strategy\": \"0\"}\na living room with a couch, table, and a window, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a couch, table, and a window.jpg\", \"mask_strategy\": \"0\"}\na living room with a couch, table, and a window, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a couch, table, and a window.jpg\", \"mask_strategy\": \"0\"}\na living room with a couch, table, and a window, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a couch, table, and a window.jpg\", \"mask_strategy\": \"0\"}\na living room with a couch, table, and a window, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a couch, table, and a window.jpg\", \"mask_strategy\": \"0\"}\na living room with a couch, table, and a window, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with a couch, table, and a window.jpg\", \"mask_strategy\": \"0\"}\na modern living room with wood floors and a tv{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a modern living room with wood floors and a tv.jpg\", \"mask_strategy\": \"0\"}\na modern living room with wood floors and a tv, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a modern living room with wood floors and a tv.jpg\", \"mask_strategy\": \"0\"}\na modern living room with wood floors and a tv, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a modern living room with wood floors and a tv.jpg\", \"mask_strategy\": \"0\"}\na modern living room with wood floors and a tv, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a modern 
living room with wood floors and a tv.jpg\", \"mask_strategy\": \"0\"}\na modern living room with wood floors and a tv, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a modern living room with wood floors and a tv.jpg\", \"mask_strategy\": \"0\"}\na modern living room with wood floors and a tv, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a modern living room with wood floors and a tv.jpg\", \"mask_strategy\": \"0\"}\na modern living room with wood floors and a tv, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a modern living room with wood floors and a tv.jpg\", \"mask_strategy\": \"0\"}\na modern living room with wood floors and a tv, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a modern living room with wood floors and a tv.jpg\", \"mask_strategy\": \"0\"}\na room with a desk and a chair in it{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a desk and a chair in it.jpg\", \"mask_strategy\": \"0\"}\na room with a desk and a chair in it, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a desk and a chair in it.jpg\", \"mask_strategy\": \"0\"}\na room with a desk and a chair in it, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a desk and a chair in it.jpg\", \"mask_strategy\": \"0\"}\na room with a desk and a chair in it, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a desk and a chair in it.jpg\", \"mask_strategy\": \"0\"}\na room with a desk and a chair in it, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a desk and a chair in it.jpg\", \"mask_strategy\": \"0\"}\na room with a desk and a chair in it, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a desk and a 
chair in it.jpg\", \"mask_strategy\": \"0\"}\na room with a desk and a chair in it, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a desk and a chair in it.jpg\", \"mask_strategy\": \"0\"}\na room with a desk and a chair in it, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a room with a desk and a chair in it.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a building{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a building.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a building, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a building.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a building, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a building.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a building, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a building.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a building, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a building.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a building, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a building.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a building, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a building.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a building, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a 
large waterfall in the middle of a building.jpg\", \"mask_strategy\": \"0\"}\na chair in a room next to some drawings{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a chair in a room next to some drawings.jpg\", \"mask_strategy\": \"0\"}\na chair in a room next to some drawings, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a chair in a room next to some drawings.jpg\", \"mask_strategy\": \"0\"}\na chair in a room next to some drawings, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a chair in a room next to some drawings.jpg\", \"mask_strategy\": \"0\"}\na chair in a room next to some drawings, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a chair in a room next to some drawings.jpg\", \"mask_strategy\": \"0\"}\na chair in a room next to some drawings, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a chair in a room next to some drawings.jpg\", \"mask_strategy\": \"0\"}\na chair in a room next to some drawings, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a chair in a room next to some drawings.jpg\", \"mask_strategy\": \"0\"}\na chair in a room next to some drawings, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a chair in a room next to some drawings.jpg\", \"mask_strategy\": \"0\"}\na chair in a room next to some drawings, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a chair in a room next to some drawings.jpg\", \"mask_strategy\": \"0\"}\na living room with hardwood floors and a white couch{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with hardwood floors and a white couch.jpg\", \"mask_strategy\": \"0\"}\na living room with hardwood floors and a white couch, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with 
hardwood floors and a white couch.jpg\", \"mask_strategy\": \"0\"}\na living room with hardwood floors and a white couch, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with hardwood floors and a white couch.jpg\", \"mask_strategy\": \"0\"}\na living room with hardwood floors and a white couch, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with hardwood floors and a white couch.jpg\", \"mask_strategy\": \"0\"}\na living room with hardwood floors and a white couch, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with hardwood floors and a white couch.jpg\", \"mask_strategy\": \"0\"}\na living room with hardwood floors and a white couch, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with hardwood floors and a white couch.jpg\", \"mask_strategy\": \"0\"}\na living room with hardwood floors and a white couch, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with hardwood floors and a white couch.jpg\", \"mask_strategy\": \"0\"}\na living room with hardwood floors and a white couch, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a living room with hardwood floors and a white couch.jpg\", \"mask_strategy\": \"0\"}\ntwo people in a canoe on a lake with mountains in the background{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two people in a canoe on a lake with mountains in the background.jpg\", \"mask_strategy\": \"0\"}\ntwo people in a canoe on a lake with mountains in the background, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two people in a canoe on a lake with mountains in the background.jpg\", \"mask_strategy\": \"0\"}\ntwo people in a canoe on a lake with mountains in the background, camera pans right{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two people in a canoe on a lake with mountains in the background.jpg\", \"mask_strategy\": \"0\"}\ntwo people in a canoe on a lake with mountains in the background, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two people in a canoe on a lake with mountains in the background.jpg\", \"mask_strategy\": \"0\"}\ntwo people in a canoe on a lake with mountains in the background, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two people in a canoe on a lake with mountains in the background.jpg\", \"mask_strategy\": \"0\"}\ntwo people in a canoe on a lake with mountains in the background, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two people in a canoe on a lake with mountains in the background.jpg\", \"mask_strategy\": \"0\"}\ntwo people in a canoe on a lake with mountains in the background, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two people in a canoe on a lake with mountains in the background.jpg\", \"mask_strategy\": \"0\"}\ntwo people in a canoe on a lake with mountains in the background, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two people in a canoe on a lake with mountains in the background.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a snowy road in a forest{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a snowy road in a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a snowy road in a forest, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a snowy road in a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a snowy road in a forest, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a snowy road in a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a snowy road 
in a forest, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a snowy road in a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a snowy road in a forest, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a snowy road in a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a snowy road in a forest, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a snowy road in a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a snowy road in a forest, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a snowy road in a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a snowy road in a forest, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a snowy road in a forest.jpg\", \"mask_strategy\": \"0\"}\na view of a waterfall from a distance{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a waterfall from a distance.jpg\", \"mask_strategy\": \"0\"}\na view of a waterfall from a distance, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a waterfall from a distance.jpg\", \"mask_strategy\": \"0\"}\na view of a waterfall from a distance, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a waterfall from a distance.jpg\", \"mask_strategy\": \"0\"}\na view of a waterfall from a distance, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a waterfall from a distance.jpg\", \"mask_strategy\": \"0\"}\na view of a waterfall from a distance, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a waterfall from a distance.jpg\", \"mask_strategy\": \"0\"}\na view of a waterfall from a distance, camera zooms 
in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a waterfall from a distance.jpg\", \"mask_strategy\": \"0\"}\na view of a waterfall from a distance, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a waterfall from a distance.jpg\", \"mask_strategy\": \"0\"}\na view of a waterfall from a distance, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a view of a waterfall from a distance.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a valley{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a valley.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a valley, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a valley.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a valley, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a valley.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a valley, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a valley.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a valley, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a valley.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a valley, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a valley.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a valley, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a valley.jpg\", 
\"mask_strategy\": \"0\"}\na group of hot air balloons flying over a valley, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a valley.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a group of islands in the middle of a lake{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a group of islands in the middle of a lake.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a group of islands in the middle of a lake, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a group of islands in the middle of a lake.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a group of islands in the middle of a lake, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a group of islands in the middle of a lake.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a group of islands in the middle of a lake, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a group of islands in the middle of a lake.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a group of islands in the middle of a lake, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a group of islands in the middle of a lake.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a group of islands in the middle of a lake, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a group of islands in the middle of a lake.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a group of islands in the middle of a lake, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a group of islands in the middle of a lake.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a group of islands in the middle of a lake, camera static{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a group of islands in the middle of a lake.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a rocky beach in indonesia{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a rocky beach in indonesia.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a rocky beach in indonesia, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a rocky beach in indonesia.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a rocky beach in indonesia, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a rocky beach in indonesia.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a rocky beach in indonesia, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a rocky beach in indonesia.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a rocky beach in indonesia, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a rocky beach in indonesia.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a rocky beach in indonesia, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a rocky beach in indonesia.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a rocky beach in indonesia, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a rocky beach in indonesia.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a rocky beach in indonesia, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a rocky beach in indonesia.jpg\", \"mask_strategy\": \"0\"}\nfireworks in the night sky over a city{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/fireworks in the night sky over a city.jpg\", \"mask_strategy\": \"0\"}\nfireworks in the night sky over a city, camera 
pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/fireworks in the night sky over a city.jpg\", \"mask_strategy\": \"0\"}\nfireworks in the night sky over a city, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/fireworks in the night sky over a city.jpg\", \"mask_strategy\": \"0\"}\nfireworks in the night sky over a city, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/fireworks in the night sky over a city.jpg\", \"mask_strategy\": \"0\"}\nfireworks in the night sky over a city, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/fireworks in the night sky over a city.jpg\", \"mask_strategy\": \"0\"}\nfireworks in the night sky over a city, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/fireworks in the night sky over a city.jpg\", \"mask_strategy\": \"0\"}\nfireworks in the night sky over a city, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/fireworks in the night sky over a city.jpg\", \"mask_strategy\": \"0\"}\nfireworks in the night sky over a city, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/fireworks in the night sky over a city.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse on a stormy day{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse on a stormy day.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse on a stormy day, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse on a stormy day.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse on a stormy day, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse on a stormy day.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes 
into a lighthouse on a stormy day, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse on a stormy day.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse on a stormy day, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse on a stormy day.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse on a stormy day, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse on a stormy day.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse on a stormy day, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse on a stormy day.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes into a lighthouse on a stormy day, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes into a lighthouse on a stormy day.jpg\", \"mask_strategy\": \"0\"}\na mountain range with a sky background{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with a sky background.jpg\", \"mask_strategy\": \"0\"}\na mountain range with a sky background, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with a sky background.jpg\", \"mask_strategy\": \"0\"}\na mountain range with a sky background, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with a sky background.jpg\", \"mask_strategy\": \"0\"}\na mountain range with a sky background, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with a sky background.jpg\", \"mask_strategy\": \"0\"}\na mountain range with a sky background, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a 
mountain range with a sky background.jpg\", \"mask_strategy\": \"0\"}\na mountain range with a sky background, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with a sky background.jpg\", \"mask_strategy\": \"0\"}\na mountain range with a sky background, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with a sky background.jpg\", \"mask_strategy\": \"0\"}\na mountain range with a sky background, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with a sky background.jpg\", \"mask_strategy\": \"0\"}\na large bonfire is burning in the night sky{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large bonfire is burning in the night sky.jpg\", \"mask_strategy\": \"0\"}\na large bonfire is burning in the night sky, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large bonfire is burning in the night sky.jpg\", \"mask_strategy\": \"0\"}\na large bonfire is burning in the night sky, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large bonfire is burning in the night sky.jpg\", \"mask_strategy\": \"0\"}\na large bonfire is burning in the night sky, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large bonfire is burning in the night sky.jpg\", \"mask_strategy\": \"0\"}\na large bonfire is burning in the night sky, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large bonfire is burning in the night sky.jpg\", \"mask_strategy\": \"0\"}\na large bonfire is burning in the night sky, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large bonfire is burning in the night sky.jpg\", \"mask_strategy\": \"0\"}\na large bonfire is burning in the night sky, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large 
bonfire is burning in the night sky.jpg\", \"mask_strategy\": \"0\"}\na large bonfire is burning in the night sky, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large bonfire is burning in the night sky.jpg\", \"mask_strategy\": \"0\"}\na close-up view of the flames of a fireplace{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of the flames of a fireplace.jpg\", \"mask_strategy\": \"0\"}\na close-up view of the flames of a fireplace, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of the flames of a fireplace.jpg\", \"mask_strategy\": \"0\"}\na close-up view of the flames of a fireplace, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of the flames of a fireplace.jpg\", \"mask_strategy\": \"0\"}\na close-up view of the flames of a fireplace, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of the flames of a fireplace.jpg\", \"mask_strategy\": \"0\"}\na close-up view of the flames of a fireplace, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of the flames of a fireplace.jpg\", \"mask_strategy\": \"0\"}\na close-up view of the flames of a fireplace, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of the flames of a fireplace.jpg\", \"mask_strategy\": \"0\"}\na close-up view of the flames of a fireplace, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of the flames of a fireplace.jpg\", \"mask_strategy\": \"0\"}\na close-up view of the flames of a fireplace, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of the flames of a fireplace.jpg\", \"mask_strategy\": \"0\"}\na farm in the middle of the day{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a farm in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na farm in the middle of the day, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a farm in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na farm in the middle of the day, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a farm in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na farm in the middle of the day, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a farm in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na farm in the middle of the day, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a farm in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na farm in the middle of the day, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a farm in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na farm in the middle of the day, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a farm in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na farm in the middle of the day, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a farm in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na flock of birds flying over a tree at sunset{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a flock of birds flying over a tree at sunset.jpg\", \"mask_strategy\": \"0\"}\na flock of birds flying over a tree at sunset, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a flock of birds flying over a tree at sunset.jpg\", \"mask_strategy\": \"0\"}\na flock of birds flying over a tree at sunset, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a flock of birds flying over a tree at sunset.jpg\", \"mask_strategy\": \"0\"}\na flock of 
birds flying over a tree at sunset, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a flock of birds flying over a tree at sunset.jpg\", \"mask_strategy\": \"0\"}\na flock of birds flying over a tree at sunset, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a flock of birds flying over a tree at sunset.jpg\", \"mask_strategy\": \"0\"}\na flock of birds flying over a tree at sunset, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a flock of birds flying over a tree at sunset.jpg\", \"mask_strategy\": \"0\"}\na flock of birds flying over a tree at sunset, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a flock of birds flying over a tree at sunset.jpg\", \"mask_strategy\": \"0\"}\na flock of birds flying over a tree at sunset, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a flock of birds flying over a tree at sunset.jpg\", \"mask_strategy\": \"0\"}\na captivating scene featuring a spiral galaxy shining brilliantly in the night sky{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a captivating scene featuring a spiral galaxy shining brilliantly in the night sky.jpg\", \"mask_strategy\": \"0\"}\na captivating scene featuring a spiral galaxy shining brilliantly in the night sky, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a captivating scene featuring a spiral galaxy shining brilliantly in the night sky.jpg\", \"mask_strategy\": \"0\"}\na captivating scene featuring a spiral galaxy shining brilliantly in the night sky, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a captivating scene featuring a spiral galaxy shining brilliantly in the night sky.jpg\", \"mask_strategy\": \"0\"}\na captivating scene featuring a spiral galaxy shining brilliantly in the night sky, camera tilts up{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a captivating scene featuring a spiral galaxy shining brilliantly in the night sky.jpg\", \"mask_strategy\": \"0\"}\na captivating scene featuring a spiral galaxy shining brilliantly in the night sky, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a captivating scene featuring a spiral galaxy shining brilliantly in the night sky.jpg\", \"mask_strategy\": \"0\"}\na captivating scene featuring a spiral galaxy shining brilliantly in the night sky, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a captivating scene featuring a spiral galaxy shining brilliantly in the night sky.jpg\", \"mask_strategy\": \"0\"}\na captivating scene featuring a spiral galaxy shining brilliantly in the night sky, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a captivating scene featuring a spiral galaxy shining brilliantly in the night sky.jpg\", \"mask_strategy\": \"0\"}\na captivating scene featuring a spiral galaxy shining brilliantly in the night sky, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a captivating scene featuring a spiral galaxy shining brilliantly in the night sky.jpg\", \"mask_strategy\": \"0\"}\na mountain with snow on it{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain with snow on it.jpg\", \"mask_strategy\": \"0\"}\na mountain with snow on it, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain with snow on it.jpg\", \"mask_strategy\": \"0\"}\na mountain with snow on it, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain with snow on it.jpg\", \"mask_strategy\": \"0\"}\na mountain with snow on it, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain with snow on it.jpg\", \"mask_strategy\": \"0\"}\na mountain with snow on it, camera tilts 
down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain with snow on it.jpg\", \"mask_strategy\": \"0\"}\na mountain with snow on it, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain with snow on it.jpg\", \"mask_strategy\": \"0\"}\na mountain with snow on it, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain with snow on it.jpg\", \"mask_strategy\": \"0\"}\na mountain with snow on it, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain with snow on it.jpg\", \"mask_strategy\": \"0\"}\na bridge that is in the middle of a river{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is in the middle of a river.jpg\", \"mask_strategy\": \"0\"}\na bridge that is in the middle of a river, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is in the middle of a river.jpg\", \"mask_strategy\": \"0\"}\na bridge that is in the middle of a river, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is in the middle of a river.jpg\", \"mask_strategy\": \"0\"}\na bridge that is in the middle of a river, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is in the middle of a river.jpg\", \"mask_strategy\": \"0\"}\na bridge that is in the middle of a river, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is in the middle of a river.jpg\", \"mask_strategy\": \"0\"}\na bridge that is in the middle of a river, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is in the middle of a river.jpg\", \"mask_strategy\": \"0\"}\na bridge that is in the middle of a river, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is in the middle of a 
river.jpg\", \"mask_strategy\": \"0\"}\na bridge that is in the middle of a river, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bridge that is in the middle of a river.jpg\", \"mask_strategy\": \"0\"}\na group of people standing on top of a green hill{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of people standing on top of a green hill.jpg\", \"mask_strategy\": \"0\"}\na group of people standing on top of a green hill, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of people standing on top of a green hill.jpg\", \"mask_strategy\": \"0\"}\na group of people standing on top of a green hill, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of people standing on top of a green hill.jpg\", \"mask_strategy\": \"0\"}\na group of people standing on top of a green hill, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of people standing on top of a green hill.jpg\", \"mask_strategy\": \"0\"}\na group of people standing on top of a green hill, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of people standing on top of a green hill.jpg\", \"mask_strategy\": \"0\"}\na group of people standing on top of a green hill, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of people standing on top of a green hill.jpg\", \"mask_strategy\": \"0\"}\na group of people standing on top of a green hill, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of people standing on top of a green hill.jpg\", \"mask_strategy\": \"0\"}\na group of people standing on top of a green hill, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of people standing on top of a green hill.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with a wooden pier in the 
water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with a wooden pier in the water.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with a wooden pier in the water, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with a wooden pier in the water.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with a wooden pier in the water, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with a wooden pier in the water.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with a wooden pier in the water, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with a wooden pier in the water.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with a wooden pier in the water, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with a wooden pier in the water.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with a wooden pier in the water, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with a wooden pier in the water.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with a wooden pier in the water, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with a wooden pier in the water.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with a wooden pier in the water, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with a wooden pier in the water.jpg\", \"mask_strategy\": \"0\"}\na lake surrounded by mountains and flowers{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a lake surrounded by mountains and flowers.jpg\", \"mask_strategy\": \"0\"}\na lake surrounded by mountains and flowers, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a lake surrounded by mountains and flowers.jpg\", \"mask_strategy\": 
\"0\"}\na lake surrounded by mountains and flowers, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a lake surrounded by mountains and flowers.jpg\", \"mask_strategy\": \"0\"}\na lake surrounded by mountains and flowers, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a lake surrounded by mountains and flowers.jpg\", \"mask_strategy\": \"0\"}\na lake surrounded by mountains and flowers, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a lake surrounded by mountains and flowers.jpg\", \"mask_strategy\": \"0\"}\na lake surrounded by mountains and flowers, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a lake surrounded by mountains and flowers.jpg\", \"mask_strategy\": \"0\"}\na lake surrounded by mountains and flowers, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a lake surrounded by mountains and flowers.jpg\", \"mask_strategy\": \"0\"}\na lake surrounded by mountains and flowers, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a lake surrounded by mountains and flowers.jpg\", \"mask_strategy\": \"0\"}\na hot-air balloon flying over a desert landscape{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a hot-air balloon flying over a desert landscape.jpg\", \"mask_strategy\": \"0\"}\na hot-air balloon flying over a desert landscape, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a hot-air balloon flying over a desert landscape.jpg\", \"mask_strategy\": \"0\"}\na hot-air balloon flying over a desert landscape, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a hot-air balloon flying over a desert landscape.jpg\", \"mask_strategy\": \"0\"}\na hot-air balloon flying over a desert landscape, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a hot-air balloon 
flying over a desert landscape.jpg\", \"mask_strategy\": \"0\"}\na hot-air balloon flying over a desert landscape, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a hot-air balloon flying over a desert landscape.jpg\", \"mask_strategy\": \"0\"}\na hot-air balloon flying over a desert landscape, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a hot-air balloon flying over a desert landscape.jpg\", \"mask_strategy\": \"0\"}\na hot-air balloon flying over a desert landscape, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a hot-air balloon flying over a desert landscape.jpg\", \"mask_strategy\": \"0\"}\na hot-air balloon flying over a desert landscape, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a hot-air balloon flying over a desert landscape.jpg\", \"mask_strategy\": \"0\"}\nseveral hot air balloons flying over a city{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/several hot air balloons flying over a city.jpg\", \"mask_strategy\": \"0\"}\nseveral hot air balloons flying over a city, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/several hot air balloons flying over a city.jpg\", \"mask_strategy\": \"0\"}\nseveral hot air balloons flying over a city, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/several hot air balloons flying over a city.jpg\", \"mask_strategy\": \"0\"}\nseveral hot air balloons flying over a city, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/several hot air balloons flying over a city.jpg\", \"mask_strategy\": \"0\"}\nseveral hot air balloons flying over a city, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/several hot air balloons flying over a city.jpg\", \"mask_strategy\": \"0\"}\nseveral hot air balloons flying over a city, camera zooms 
in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/several hot air balloons flying over a city.jpg\", \"mask_strategy\": \"0\"}\nseveral hot air balloons flying over a city, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/several hot air balloons flying over a city.jpg\", \"mask_strategy\": \"0\"}\nseveral hot air balloons flying over a city, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/several hot air balloons flying over a city.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a field{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a field.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a field, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a field.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a field, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a field.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a field, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a field.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a field, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a field.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a field, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a field.jpg\", \"mask_strategy\": \"0\"}\na group of hot air balloons flying over a field, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a field.jpg\", 
\"mask_strategy\": \"0\"}\na group of hot air balloons flying over a field, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of hot air balloons flying over a field.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes over a rocky cliff{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes over a rocky cliff.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes over a rocky cliff, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes over a rocky cliff.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes over a rocky cliff, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes over a rocky cliff.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes over a rocky cliff, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes over a rocky cliff.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes over a rocky cliff, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes over a rocky cliff.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes over a rocky cliff, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes over a rocky cliff.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes over a rocky cliff, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes over a rocky cliff.jpg\", \"mask_strategy\": \"0\"}\na large wave crashes over a rocky cliff, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave crashes over a rocky cliff.jpg\", \"mask_strategy\": \"0\"}\nthe sun is setting over a lake in the mountains{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is setting over a lake in the mountains.jpg\", \"mask_strategy\": \"0\"}\nthe sun is 
setting over a lake in the mountains, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is setting over a lake in the mountains.jpg\", \"mask_strategy\": \"0\"}\nthe sun is setting over a lake in the mountains, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is setting over a lake in the mountains.jpg\", \"mask_strategy\": \"0\"}\nthe sun is setting over a lake in the mountains, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is setting over a lake in the mountains.jpg\", \"mask_strategy\": \"0\"}\nthe sun is setting over a lake in the mountains, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is setting over a lake in the mountains.jpg\", \"mask_strategy\": \"0\"}\nthe sun is setting over a lake in the mountains, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is setting over a lake in the mountains.jpg\", \"mask_strategy\": \"0\"}\nthe sun is setting over a lake in the mountains, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is setting over a lake in the mountains.jpg\", \"mask_strategy\": \"0\"}\nthe sun is setting over a lake in the mountains, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is setting over a lake in the mountains.jpg\", \"mask_strategy\": \"0\"}\na mountain range with snow on the ground{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na mountain range with snow on the ground, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na mountain range with snow on the ground, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range 
with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na mountain range with snow on the ground, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na mountain range with snow on the ground, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na mountain range with snow on the ground, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na mountain range with snow on the ground, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\na mountain range with snow on the ground, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain range with snow on the ground.jpg\", \"mask_strategy\": \"0\"}\nsun rays shining through clouds over a lake{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/sun rays shining through clouds over a lake.jpg\", \"mask_strategy\": \"0\"}\nsun rays shining through clouds over a lake, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/sun rays shining through clouds over a lake.jpg\", \"mask_strategy\": \"0\"}\nsun rays shining through clouds over a lake, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/sun rays shining through clouds over a lake.jpg\", \"mask_strategy\": \"0\"}\nsun rays shining through clouds over a lake, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/sun rays shining through clouds over a lake.jpg\", \"mask_strategy\": \"0\"}\nsun rays shining through clouds over a lake, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/sun rays shining 
through clouds over a lake.jpg\", \"mask_strategy\": \"0\"}\nsun rays shining through clouds over a lake, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/sun rays shining through clouds over a lake.jpg\", \"mask_strategy\": \"0\"}\nsun rays shining through clouds over a lake, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/sun rays shining through clouds over a lake.jpg\", \"mask_strategy\": \"0\"}\nsun rays shining through clouds over a lake, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/sun rays shining through clouds over a lake.jpg\", \"mask_strategy\": \"0\"}\na boat sits on the shore of a lake with mt fuji in the background{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a boat sits on the shore of a lake with mt fuji in the background.jpg\", \"mask_strategy\": \"0\"}\na boat sits on the shore of a lake with mt fuji in the background, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a boat sits on the shore of a lake with mt fuji in the background.jpg\", \"mask_strategy\": \"0\"}\na boat sits on the shore of a lake with mt fuji in the background, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a boat sits on the shore of a lake with mt fuji in the background.jpg\", \"mask_strategy\": \"0\"}\na boat sits on the shore of a lake with mt fuji in the background, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a boat sits on the shore of a lake with mt fuji in the background.jpg\", \"mask_strategy\": \"0\"}\na boat sits on the shore of a lake with mt fuji in the background, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a boat sits on the shore of a lake with mt fuji in the background.jpg\", \"mask_strategy\": \"0\"}\na boat sits on the shore of a lake with mt fuji in the background, camera zooms 
in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a boat sits on the shore of a lake with mt fuji in the background.jpg\", \"mask_strategy\": \"0\"}\na boat sits on the shore of a lake with mt fuji in the background, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a boat sits on the shore of a lake with mt fuji in the background.jpg\", \"mask_strategy\": \"0\"}\na boat sits on the shore of a lake with mt fuji in the background, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a boat sits on the shore of a lake with mt fuji in the background.jpg\", \"mask_strategy\": \"0\"}\na foggy road with trees in the distance{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy road with trees in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy road with trees in the distance, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy road with trees in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy road with trees in the distance, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy road with trees in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy road with trees in the distance, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy road with trees in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy road with trees in the distance, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy road with trees in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy road with trees in the distance, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy road with trees in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy road with trees in the distance, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy road with trees in the distance.jpg\", 
\"mask_strategy\": \"0\"}\na foggy road with trees in the distance, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy road with trees in the distance.jpg\", \"mask_strategy\": \"0\"}\ntwo swans swimming on a lake in the fog{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two swans swimming on a lake in the fog.jpg\", \"mask_strategy\": \"0\"}\ntwo swans swimming on a lake in the fog, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two swans swimming on a lake in the fog.jpg\", \"mask_strategy\": \"0\"}\ntwo swans swimming on a lake in the fog, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two swans swimming on a lake in the fog.jpg\", \"mask_strategy\": \"0\"}\ntwo swans swimming on a lake in the fog, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two swans swimming on a lake in the fog.jpg\", \"mask_strategy\": \"0\"}\ntwo swans swimming on a lake in the fog, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two swans swimming on a lake in the fog.jpg\", \"mask_strategy\": \"0\"}\ntwo swans swimming on a lake in the fog, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two swans swimming on a lake in the fog.jpg\", \"mask_strategy\": \"0\"}\ntwo swans swimming on a lake in the fog, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two swans swimming on a lake in the fog.jpg\", \"mask_strategy\": \"0\"}\ntwo swans swimming on a lake in the fog, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two swans swimming on a lake in the fog.jpg\", \"mask_strategy\": \"0\"}\nthe sun is shining through the trees near a waterfall{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is shining through the trees near a waterfall.jpg\", \"mask_strategy\": \"0\"}\nthe sun is 
shining through the trees near a waterfall, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is shining through the trees near a waterfall.jpg\", \"mask_strategy\": \"0\"}\nthe sun is shining through the trees near a waterfall, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is shining through the trees near a waterfall.jpg\", \"mask_strategy\": \"0\"}\nthe sun is shining through the trees near a waterfall, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is shining through the trees near a waterfall.jpg\", \"mask_strategy\": \"0\"}\nthe sun is shining through the trees near a waterfall, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is shining through the trees near a waterfall.jpg\", \"mask_strategy\": \"0\"}\nthe sun is shining through the trees near a waterfall, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is shining through the trees near a waterfall.jpg\", \"mask_strategy\": \"0\"}\nthe sun is shining through the trees near a waterfall, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is shining through the trees near a waterfall.jpg\", \"mask_strategy\": \"0\"}\nthe sun is shining through the trees near a waterfall, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the sun is shining through the trees near a waterfall.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with palm trees on the shore{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with palm trees on the shore.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with palm trees on the shore, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with palm trees on the shore.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with palm trees on the shore, camera 
pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with palm trees on the shore.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with palm trees on the shore, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with palm trees on the shore.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with palm trees on the shore, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with palm trees on the shore.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with palm trees on the shore, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with palm trees on the shore.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with palm trees on the shore, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with palm trees on the shore.jpg\", \"mask_strategy\": \"0\"}\na sandy beach with palm trees on the shore, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sandy beach with palm trees on the shore.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a body of water and a beach{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a body of water and a beach.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a body of water and a beach, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a body of water and a beach.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a body of water and a beach, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a body of water and a beach.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a body of water and a beach, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a body of water and a beach.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of 
a body of water and a beach, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a body of water and a beach.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a body of water and a beach, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a body of water and a beach.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a body of water and a beach, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a body of water and a beach.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a body of water and a beach, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a body of water and a beach.jpg\", \"mask_strategy\": \"0\"}\na foggy field that has trees in the grass{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy field that has trees in the grass.jpg\", \"mask_strategy\": \"0\"}\na foggy field that has trees in the grass, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy field that has trees in the grass.jpg\", \"mask_strategy\": \"0\"}\na foggy field that has trees in the grass, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy field that has trees in the grass.jpg\", \"mask_strategy\": \"0\"}\na foggy field that has trees in the grass, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy field that has trees in the grass.jpg\", \"mask_strategy\": \"0\"}\na foggy field that has trees in the grass, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy field that has trees in the grass.jpg\", \"mask_strategy\": \"0\"}\na foggy field that has trees in the grass, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy field that has trees in the grass.jpg\", \"mask_strategy\": 
\"0\"}\na foggy field that has trees in the grass, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy field that has trees in the grass.jpg\", \"mask_strategy\": \"0\"}\na foggy field that has trees in the grass, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy field that has trees in the grass.jpg\", \"mask_strategy\": \"0\"}\na foggy landscape with trees and hills in the distance{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy landscape with trees and hills in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy landscape with trees and hills in the distance, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy landscape with trees and hills in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy landscape with trees and hills in the distance, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy landscape with trees and hills in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy landscape with trees and hills in the distance, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy landscape with trees and hills in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy landscape with trees and hills in the distance, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy landscape with trees and hills in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy landscape with trees and hills in the distance, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy landscape with trees and hills in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy landscape with trees and hills in the distance, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy landscape with trees and hills in the distance.jpg\", \"mask_strategy\": \"0\"}\na foggy landscape 
with trees and hills in the distance, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a foggy landscape with trees and hills in the distance.jpg\", \"mask_strategy\": \"0\"}\na large wave in the ocean with a lot of spray coming from it{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave in the ocean with a lot of spray coming from it.jpg\", \"mask_strategy\": \"0\"}\na large wave in the ocean with a lot of spray coming from it, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave in the ocean with a lot of spray coming from it.jpg\", \"mask_strategy\": \"0\"}\na large wave in the ocean with a lot of spray coming from it, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave in the ocean with a lot of spray coming from it.jpg\", \"mask_strategy\": \"0\"}\na large wave in the ocean with a lot of spray coming from it, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave in the ocean with a lot of spray coming from it.jpg\", \"mask_strategy\": \"0\"}\na large wave in the ocean with a lot of spray coming from it, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave in the ocean with a lot of spray coming from it.jpg\", \"mask_strategy\": \"0\"}\na large wave in the ocean with a lot of spray coming from it, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave in the ocean with a lot of spray coming from it.jpg\", \"mask_strategy\": \"0\"}\na large wave in the ocean with a lot of spray coming from it, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave in the ocean with a lot of spray coming from it.jpg\", \"mask_strategy\": \"0\"}\na large wave in the ocean with a lot of spray coming from it, camera static{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large wave in the ocean with a lot of spray coming from it.jpg\", \"mask_strategy\": \"0\"}\na waterfall is shown in the middle of a lush green hillside{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a waterfall is shown in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na waterfall is shown in the middle of a lush green hillside, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a waterfall is shown in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na waterfall is shown in the middle of a lush green hillside, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a waterfall is shown in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na waterfall is shown in the middle of a lush green hillside, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a waterfall is shown in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na waterfall is shown in the middle of a lush green hillside, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a waterfall is shown in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na waterfall is shown in the middle of a lush green hillside, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a waterfall is shown in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na waterfall is shown in the middle of a lush green hillside, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a waterfall is shown in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na waterfall is shown in the middle of a lush green hillside, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a waterfall is shown in the middle of a lush green hillside.jpg\", 
\"mask_strategy\": \"0\"}\nan aerial view of a curvy road in the middle of a forest{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a curvy road in the middle of a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a curvy road in the middle of a forest, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a curvy road in the middle of a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a curvy road in the middle of a forest, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a curvy road in the middle of a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a curvy road in the middle of a forest, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a curvy road in the middle of a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a curvy road in the middle of a forest, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a curvy road in the middle of a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a curvy road in the middle of a forest, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a curvy road in the middle of a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a curvy road in the middle of a forest, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a curvy road in the middle of a forest.jpg\", \"mask_strategy\": \"0\"}\nan aerial view of a curvy road in the middle of a forest, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an aerial view of a curvy road in the middle of a forest.jpg\", \"mask_strategy\": \"0\"}\na mountain covered in snow with evergreen trees{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain covered in snow with 
evergreen trees.jpg\", \"mask_strategy\": \"0\"}\na mountain covered in snow with evergreen trees, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain covered in snow with evergreen trees.jpg\", \"mask_strategy\": \"0\"}\na mountain covered in snow with evergreen trees, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain covered in snow with evergreen trees.jpg\", \"mask_strategy\": \"0\"}\na mountain covered in snow with evergreen trees, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain covered in snow with evergreen trees.jpg\", \"mask_strategy\": \"0\"}\na mountain covered in snow with evergreen trees, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain covered in snow with evergreen trees.jpg\", \"mask_strategy\": \"0\"}\na mountain covered in snow with evergreen trees, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain covered in snow with evergreen trees.jpg\", \"mask_strategy\": \"0\"}\na mountain covered in snow with evergreen trees, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain covered in snow with evergreen trees.jpg\", \"mask_strategy\": \"0\"}\na mountain covered in snow with evergreen trees, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a mountain covered in snow with evergreen trees.jpg\", \"mask_strategy\": \"0\"}\na very large waterfall in the middle of the day{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a very large waterfall in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na very large waterfall in the middle of the day, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a very large waterfall in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na very large waterfall in the middle of the day, 
camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a very large waterfall in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na very large waterfall in the middle of the day, camera tilts up{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a very large waterfall in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na very large waterfall in the middle of the day, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a very large waterfall in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na very large waterfall in the middle of the day, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a very large waterfall in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na very large waterfall in the middle of the day, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a very large waterfall in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na very large waterfall in the middle of the day, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a very large waterfall in the middle of the day.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a lush green hillside{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a lush green hillside, camera pans left{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a lush green hillside, camera pans right{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a lush green hillside, camera tilts up{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a lush green hillside, camera tilts down{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a lush green hillside, camera zooms in{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a lush green hillside, camera zooms out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na large waterfall in the middle of a lush green hillside, camera static{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large waterfall in the middle of a lush green hillside.jpg\", \"mask_strategy\": \"0\"}\na brown bear in the water with a fish in its mouth{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a brown bear in the water with a fish in its mouth.jpg\", \"mask_strategy\": \"0\"}\na close-up of a hippopotamus eating grass in a field{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up of a hippopotamus eating grass in a field.jpg\", \"mask_strategy\": \"0\"}\na sea turtle swimming in the ocean under the water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sea turtle swimming in the ocean under the water.jpg\", \"mask_strategy\": \"0\"}\ntwo bees are flying over a lavender plant{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two bees are flying over a lavender plant.jpg\", \"mask_strategy\": \"0\"}\nthe otter is standing in the water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the otter is standing in the water.jpg\", 
\"mask_strategy\": \"0\"}\na dog carrying a soccer ball in its mouth{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dog carrying a soccer ball in its mouth.jpg\", \"mask_strategy\": \"0\"}\nan eagle is flying over a mountain with trees in the background{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an eagle is flying over a mountain with trees in the background.jpg\", \"mask_strategy\": \"0\"}\na couple of horses are running in the dirt{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a couple of horses are running in the dirt.jpg\", \"mask_strategy\": \"0\"}\na highland cow with long horns standing in a field{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a highland cow with long horns standing in a field.jpg\", \"mask_strategy\": \"0\"}\na monkey is holding a banana in its mouth{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a monkey is holding a banana in its mouth.jpg\", \"mask_strategy\": \"0\"}\na large rhino grazing in the grass near a bush{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large rhino grazing in the grass near a bush.jpg\", \"mask_strategy\": \"0\"}\na butterfly sits on top of a purple flower{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a butterfly sits on top of a purple flower.jpg\", \"mask_strategy\": \"0\"}\nan alligator is covered in green plants in the water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an alligator is covered in green plants in the water.jpg\", \"mask_strategy\": \"0\"}\na red panda eating bamboo in a zoo{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a red panda eating bamboo in a zoo.jpg\", \"mask_strategy\": \"0\"}\na monochromatic video capturing a cat's gaze into the camera{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a monochromatic video capturing a cat's gaze into the camera.jpg\", \"mask_strategy\": \"0\"}\na frog sitting on 
top of water lily leaves{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a frog sitting on top of water lily leaves.jpg\", \"mask_strategy\": \"0\"}\na lion is roaring in the wild{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a lion is roaring in the wild.jpg\", \"mask_strategy\": \"0\"}\na seagull is flying towards a person's hand{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a seagull is flying towards a person's hand.jpg\", \"mask_strategy\": \"0\"}\na yellow and white jellyfish is floating in the ocean{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a yellow and white jellyfish is floating in the ocean.jpg\", \"mask_strategy\": \"0\"}\na group of jellyfish swimming in an aquarium{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of jellyfish swimming in an aquarium.jpg\", \"mask_strategy\": \"0\"}\na clown fish hiding in a purple anemone{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a clown fish hiding in a purple anemone.jpg\", \"mask_strategy\": \"0\"}\na snake sitting on the ground next to a bowl{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a snake sitting on the ground next to a bowl.jpg\", \"mask_strategy\": \"0\"}\na brown and white cow eating hay{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a brown and white cow eating hay.jpg\", \"mask_strategy\": \"0\"}\na seal swimming in the water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a seal swimming in the water.jpg\", \"mask_strategy\": \"0\"}\na panda bear is eating a piece of bamboo{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a panda bear is eating a piece of bamboo.jpg\", \"mask_strategy\": \"0\"}\na small bird sits on a moss covered branch{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a small bird sits on a moss covered branch.jpg\", \"mask_strategy\": \"0\"}\na bird with a fish in its beak 
flying over a field{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bird with a fish in its beak flying over a field.jpg\", \"mask_strategy\": \"0\"}\na large flock of birds flying in the sky{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large flock of birds flying in the sky.jpg\", \"mask_strategy\": \"0\"}\na bald eagle flying over a tree filled forest{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bald eagle flying over a tree filled forest.jpg\", \"mask_strategy\": \"0\"}\na giraffe walking in a field{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a giraffe walking in a field.jpg\", \"mask_strategy\": \"0\"}\na lioness yawning in a field{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a lioness yawning in a field.jpg\", \"mask_strategy\": \"0\"}\na little crab scurried on the sandy beach{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a little crab scurried on the sandy beach.jpg\", \"mask_strategy\": \"0\"}\na warthog is walking in the grass{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a warthog is walking in the grass.jpg\", \"mask_strategy\": \"0\"}\na penguin walking on a beach near the water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a penguin walking on a beach near the water.jpg\", \"mask_strategy\": \"0\"}\na tiger walking through a wooded area{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a tiger walking through a wooded area.jpg\", \"mask_strategy\": \"0\"}\na tiger walking on a dirt path in the woods{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a tiger walking on a dirt path in the woods.jpg\", \"mask_strategy\": \"0\"}\na small monkey holding a piece of food in it's mouth{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a small monkey holding a piece of food in it's mouth.jpg\", \"mask_strategy\": \"0\"}\na squirrel sitting on the ground eating 
a piece of bread{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a squirrel sitting on the ground eating a piece of bread.jpg\", \"mask_strategy\": \"0\"}\na group of fish swimming over a coral reef{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of fish swimming over a coral reef.jpg\", \"mask_strategy\": \"0\"}\na toad is sitting on top of some moss{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a toad is sitting on top of some moss.jpg\", \"mask_strategy\": \"0\"}\na great white shark swimming in the ocean{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a great white shark swimming in the ocean.jpg\", \"mask_strategy\": \"0\"}\na group of camels resting in the desert{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of camels resting in the desert.jpg\", \"mask_strategy\": \"0\"}\ntwo sheep grazing in the grass next to a wooden bridge{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two sheep grazing in the grass next to a wooden bridge.jpg\", \"mask_strategy\": \"0\"}\nan elephant walking through a forest{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an elephant walking through a forest.jpg\", \"mask_strategy\": \"0\"}\na white rooster standing in a grassy field{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a white rooster standing in a grassy field.jpg\", \"mask_strategy\": \"0\"}\na zebra walking across a dirt road near a field{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a zebra walking across a dirt road near a field.jpg\", \"mask_strategy\": \"0\"}\ncars are driving down a street lined with tall trees{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/cars are driving down a street lined with tall trees.jpg\", \"mask_strategy\": \"0\"}\nthe cars on the street are waiting for the traffic lights{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the cars on 
the street are waiting for the traffic lights.jpg\", \"mask_strategy\": \"0\"}\na bicycle leaning against a fence in the snow{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bicycle leaning against a fence in the snow.jpg\", \"mask_strategy\": \"0\"}\na blue fishing boat is navigating in the ocean next to a cruise ship{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a blue fishing boat is navigating in the ocean next to a cruise ship.jpg\", \"mask_strategy\": \"0\"}\na blue car driving down a dirt road near train tracks{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a blue car driving down a dirt road near train tracks.jpg\", \"mask_strategy\": \"0\"}\na sailboat is drifting on the ocean{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sailboat is drifting on the ocean.jpg\", \"mask_strategy\": \"0\"}\na couple of boats floating on a body of water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a couple of boats floating on a body of water.jpg\", \"mask_strategy\": \"0\"}\na city street with cars driving in the rain{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a city street with cars driving in the rain.jpg\", \"mask_strategy\": \"0\"}\na red and white tram traveling down a snowy street{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a red and white tram traveling down a snowy street.jpg\", \"mask_strategy\": \"0\"}\na city bus driving down a snowy street at night{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a city bus driving down a snowy street at night.jpg\", \"mask_strategy\": \"0\"}\na green toy car is sitting on the ground{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a green toy car is sitting on the ground.jpg\", \"mask_strategy\": \"0\"}\na train traveling down tracks through the woods with leaves on the ground{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a train traveling 
down tracks through the woods with leaves on the ground.jpg\", \"mask_strategy\": \"0\"}\na man in a small boat fishing in the ocean{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man in a small boat fishing in the ocean.jpg\", \"mask_strategy\": \"0\"}\nan airplane is flying through the sky at sunset{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an airplane is flying through the sky at sunset.jpg\", \"mask_strategy\": \"0\"}\nan old rusty car sits in the middle of a field{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an old rusty car sits in the middle of a field.jpg\", \"mask_strategy\": \"0\"}\na motorcycle driving down a road{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a motorcycle driving down a road.jpg\", \"mask_strategy\": \"0\"}\na blue train traveling through a lush green area{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a blue train traveling through a lush green area.jpg\", \"mask_strategy\": \"0\"}\na white car is swiftly driving on a dirt road near a bush, kicking up dust{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a white car is swiftly driving on a dirt road near a bush, kicking up dust.jpg\", \"mask_strategy\": \"0\"}\na large cargo ship sailing in the water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large cargo ship sailing in the water.jpg\", \"mask_strategy\": \"0\"}\nthe red Alfa sports car is speeding down the road{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/the red Alfa sports car is speeding down the road.jpg\", \"mask_strategy\": \"0\"}\ntwo cars that have been involved in a violent collision{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two cars that have been involved in a violent collision.jpg\", \"mask_strategy\": \"0\"}\na red double decker bus driving down a street{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a red double decker bus 
driving down a street.jpg\", \"mask_strategy\": \"0\"}\nA red sports car driving through sand, kicking up a large amount of dust{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A red sports car driving through sand, kicking up a large amount of dust.jpg\", \"mask_strategy\": \"0\"}\na yellow toy car parked on a rock near the water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a yellow toy car parked on a rock near the water.jpg\", \"mask_strategy\": \"0\"}\na space shuttle taking off into the sky{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a space shuttle taking off into the sky.jpg\", \"mask_strategy\": \"0\"}\na steam train traveling through the woods{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a steam train traveling through the woods.jpg\", \"mask_strategy\": \"0\"}\na group of buses parked at a bus station{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of buses parked at a bus station.jpg\", \"mask_strategy\": \"0\"}\nA bunch of cars are driving on a highway{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A bunch of cars are driving on a highway.jpg\", \"mask_strategy\": \"0\"}\na white and blue airplane flying in the sky{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a white and blue airplane flying in the sky.jpg\", \"mask_strategy\": \"0\"}\nA space station orbited above the Earth{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A space station orbited above the Earth.jpg\", \"mask_strategy\": \"0\"}\nA yellow boat is cruising in front of a bridge{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A yellow boat is cruising in front of a bridge.jpg\", \"mask_strategy\": \"0\"}\ntangerines in a metal bowl on a table{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/tangerines in a metal bowl on a table.jpg\", \"mask_strategy\": \"0\"}\na shadow of a hand reaching for a 
leaf{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a shadow of a hand reaching for a leaf.jpg\", \"mask_strategy\": \"0\"}\nA teddy bear is climbing over a wooden fence{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A teddy bear is climbing over a wooden fence.jpg\", \"mask_strategy\": \"0\"}\na book on fire with flames coming out of it{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a book on fire with flames coming out of it.jpg\", \"mask_strategy\": \"0\"}\na close-up of a pink rose with water droplets on it{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up of a pink rose with water droplets on it.jpg\", \"mask_strategy\": \"0\"}\na person is cooking meat on a grill with flames{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person is cooking meat on a grill with flames.jpg\", \"mask_strategy\": \"0\"}\na snowman wearing a santa hat and scarf{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a snowman wearing a santa hat and scarf.jpg\", \"mask_strategy\": \"0\"}\na person holding a sparkler in their hand{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person holding a sparkler in their hand.jpg\", \"mask_strategy\": \"0\"}\na teddy bear sitting on a moss covered ground{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a teddy bear sitting on a moss covered ground.jpg\", \"mask_strategy\": \"0\"}\na statue of a lion is sitting on a pedestal{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a statue of a lion is sitting on a pedestal.jpg\", \"mask_strategy\": \"0\"}\nmetal balls are suspended in the air{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/metal balls are suspended in the air.jpg\", \"mask_strategy\": \"0\"}\na close up of a bunch of green grapes{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close up of a bunch of green grapes.jpg\", \"mask_strategy\": 
\"0\"}\na close-up view of a green plant with unfurled fronds{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of a green plant with unfurled fronds.jpg\", \"mask_strategy\": \"0\"}\nan orange mushroom sitting on top of a tree stump in the woods{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an orange mushroom sitting on top of a tree stump in the woods.jpg\", \"mask_strategy\": \"0\"}\na stack of pancakes covered in syrup and fruit{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a stack of pancakes covered in syrup and fruit.jpg\", \"mask_strategy\": \"0\"}\na plate of spaghetti with spinach and tomatoes{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a plate of spaghetti with spinach and tomatoes.jpg\", \"mask_strategy\": \"0\"}\na pink lotus flower in the middle of a pond{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pink lotus flower in the middle of a pond.jpg\", \"mask_strategy\": \"0\"}\na person holding a sparkler in front of a sunset{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person holding a sparkler in front of a sunset.jpg\", \"mask_strategy\": \"0\"}\na pink rose is blooming in a garden{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pink rose is blooming in a garden.jpg\", \"mask_strategy\": \"0\"}\na snow man holding a lantern in the snow{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a snow man holding a lantern in the snow.jpg\", \"mask_strategy\": \"0\"}\na stack of chocolate cookies with a bite taken out of it{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a stack of chocolate cookies with a bite taken out of it.jpg\", \"mask_strategy\": \"0\"}\na white plate topped with eggs, toast, tomatoes, and a sausage{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a white plate topped with eggs, toast, tomatoes, and a sausage.jpg\", \"mask_strategy\": 
\"0\"}\na yellow water lily is floating in a pond{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a yellow water lily is floating in a pond.jpg\", \"mask_strategy\": \"0\"}\nan astronaut floating in space with the earth in the background{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an astronaut floating in space with the earth in the background.jpg\", \"mask_strategy\": \"0\"}\nA little girl, lost in thought, is quietly sitting on the bus{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A little girl, lost in thought, is quietly sitting on the bus.jpg\", \"mask_strategy\": \"0\"}\na man holding a tray in front of a brick wall{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man holding a tray in front of a brick wall.jpg\", \"mask_strategy\": \"0\"}\nan older man playing a saxophone on the street{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an older man playing a saxophone on the street.jpg\", \"mask_strategy\": \"0\"}\nan older man jogging by the water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an older man jogging by the water.jpg\", \"mask_strategy\": \"0\"}\na person riding a skateboard on a concrete floor{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person riding a skateboard on a concrete floor.jpg\", \"mask_strategy\": \"0\"}\na woman with long black hair is posing for a picture{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman with long black hair is posing for a picture.jpg\", \"mask_strategy\": \"0\"}\na woman sitting on the ground in front of a guitar{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman sitting on the ground in front of a guitar.jpg\", \"mask_strategy\": \"0\"}\na little girl wearing a purple helmet riding a blue bike{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a little girl wearing a purple helmet riding a blue bike.jpg\", \"mask_strategy\": 
\"0\"}\na young boy is jumping in the mud{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a young boy is jumping in the mud.jpg\", \"mask_strategy\": \"0\"}\na man sitting in the driver's seat of a car wearing sunglasses{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man sitting in the driver's seat of a car wearing sunglasses.jpg\", \"mask_strategy\": \"0\"}\na little boy jumping in the air over a puddle of water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a little boy jumping in the air over a puddle of water.jpg\", \"mask_strategy\": \"0\"}\na woman with afro hair is smiling while wearing earphones{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman with afro hair is smiling while wearing earphones.jpg\", \"mask_strategy\": \"0\"}\na smiling woman with her hands clasped{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a smiling woman with her hands clasped.jpg\", \"mask_strategy\": \"0\"}\na young boy standing in a field with horses in the background{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a young boy standing in a field with horses in the background.jpg\", \"mask_strategy\": \"0\"}\na young man is covered in colored powder{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a young man is covered in colored powder.jpg\", \"mask_strategy\": \"0\"}\na woman with curly hair is drinking a beer{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman with curly hair is drinking a beer.jpg\", \"mask_strategy\": \"0\"}\nan old man standing in the middle of a field holding a bunch of plants{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an old man standing in the middle of a field holding a bunch of plants.jpg\", \"mask_strategy\": \"0\"}\na man standing on a boat with a net{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man standing on a boat with a net.jpg\", \"mask_strategy\": 
\"0\"}\na woman in a hat is putting salt into a basket{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman in a hat is putting salt into a basket.jpg\", \"mask_strategy\": \"0\"}\na young girl smelling a pink flower{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a young girl smelling a pink flower.jpg\", \"mask_strategy\": \"0\"}\na young boy leaning on a wooden pole{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a young boy leaning on a wooden pole.jpg\", \"mask_strategy\": \"0\"}\na man in a hat sitting in front of a brick oven{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man in a hat sitting in front of a brick oven.jpg\", \"mask_strategy\": \"0\"}\na man in a mexican outfit holding an acoustic guitar{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man in a mexican outfit holding an acoustic guitar.jpg\", \"mask_strategy\": \"0\"}\na snowboarder is in the air doing a trick{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a snowboarder is in the air doing a trick.jpg\", \"mask_strategy\": \"0\"}\na man riding a horse with a spear in his hand{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man riding a horse with a spear in his hand.jpg\", \"mask_strategy\": \"0\"}\na woman carrying a bundle of plants over their head{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman carrying a bundle of plants over their head.jpg\", \"mask_strategy\": \"0\"}\na person jumping in the air over a fence{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person jumping in the air over a fence.jpg\", \"mask_strategy\": \"0\"}\na man on a surfboard riding a wave in the ocean{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man on a surfboard riding a wave in the ocean.jpg\", \"mask_strategy\": \"0\"}\na man sitting on steps playing an acoustic guitar{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man sitting on steps playing an acoustic guitar.jpg\", \"mask_strategy\": \"0\"}\na man swinging a tennis racquet at a tennis ball{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man swinging a tennis racquet at a tennis ball.jpg\", \"mask_strategy\": \"0\"}\na man riding a mountain bike on top of a rocky hill{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man riding a mountain bike on top of a rocky hill.jpg\", \"mask_strategy\": \"0\"}\na man riding a bike down a street{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man riding a bike down a street.jpg\", \"mask_strategy\": \"0\"}\na man is running on a dirt road{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man is running on a dirt road.jpg\", \"mask_strategy\": \"0\"}\nA man in a black suit and a sombrero, shouting loudly{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A man in a black suit and a sombrero, shouting loudly.jpg\", \"mask_strategy\": \"0\"}\na man standing on top of a sand dune in the desert{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man standing on top of a sand dune in the desert.jpg\", \"mask_strategy\": \"0\"}\na person riding a motorcycle down a road{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person riding a motorcycle down a road.jpg\", \"mask_strategy\": \"0\"}\na man standing on top of a mountain with a backpack{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man standing on top of a mountain with a backpack.jpg\", \"mask_strategy\": \"0\"}\na man with a skull face paint smoking a cigar and holding a guitar{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man with a skull face paint smoking a cigar and holding a guitar.jpg\", \"mask_strategy\": \"0\"}\na man in sunglasses laying on a wooden bench{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man 
in sunglasses laying on a wooden bench.jpg\", \"mask_strategy\": \"0\"}\nan older woman sitting in a room with a cigarette in her hand{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an older woman sitting in a room with a cigarette in her hand.jpg\", \"mask_strategy\": \"0\"}\na man sitting on the ground playing a musical instrument{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man sitting on the ground playing a musical instrument.jpg\", \"mask_strategy\": \"0\"}\na person riding a horse in a polo match{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person riding a horse in a polo match.jpg\", \"mask_strategy\": \"0\"}\na woman in a kimono holding an umbrella{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman in a kimono holding an umbrella.jpg\", \"mask_strategy\": \"0\"}\na person riding a dirt bike{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person riding a dirt bike.jpg\", \"mask_strategy\": \"0\"}\na person riding an atv on a dirt track{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person riding an atv on a dirt track.jpg\", \"mask_strategy\": \"0\"}\na person riding a wave on a surfboard{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person riding a wave on a surfboard.jpg\", \"mask_strategy\": \"0\"}\na woman in a wetsuit is swimming in the ocean{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman in a wetsuit is swimming in the ocean.jpg\", \"mask_strategy\": \"0\"}\na man snorkling in the ocean{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man snorkling in the ocean.jpg\", \"mask_strategy\": \"0\"}\na beautiful woman in a blue sari posing in front of a wall{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a beautiful woman in a blue sari posing in front of a wall.jpg\", \"mask_strategy\": \"0\"}\na woman wearing a shawl in front of a 
mountain{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman wearing a shawl in front of a mountain.jpg\", \"mask_strategy\": \"0\"}\na woman is making bread in an oven{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman is making bread in an oven.jpg\", \"mask_strategy\": \"0\"}\na woman smiles while holding a yellow flower{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman smiles while holding a yellow flower.jpg\", \"mask_strategy\": \"0\"}\nA young boy is lifting a bundle of dry grass wrapped in waterproof fabric over his head{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A young boy is lifting a bundle of dry grass wrapped in waterproof fabric over his head.jpg\", \"mask_strategy\": \"0\"}\ntwo people performing a sword fight in front of a forest{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two people performing a sword fight in front of a forest.jpg\", \"mask_strategy\": \"0\"}\na woman in a colorful shirt is cooking food{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman in a colorful shirt is cooking food.jpg\", \"mask_strategy\": \"0\"}\nan older woman is drinking a bottle of water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an older woman is drinking a bottle of water.jpg\", \"mask_strategy\": \"0\"}\na smiling woman sitting at a table with food and drinks{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a smiling woman sitting at a table with food and drinks.jpg\", \"mask_strategy\": \"0\"}\na woman wearing a hijab reading a book on the beach{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman wearing a hijab reading a book on the beach.jpg\", \"mask_strategy\": \"0\"}\na woman wearing a headscarf is reaching for an olive tree{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman wearing a headscarf is reaching for an olive tree.jpg\", \"mask_strategy\": 
\"0\"}\na woman in a white dress jumping in the air in a field of pink flowers{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman in a white dress jumping in the air in a field of pink flowers.jpg\", \"mask_strategy\": \"0\"}\na woman wearing a conical hat sits on a boat{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman wearing a conical hat sits on a boat.jpg\", \"mask_strategy\": \"0\"}\nan older woman sitting in front of an old building{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an older woman sitting in front of an old building.jpg\", \"mask_strategy\": \"0\"}\na woman is praying in front of a buddhist temple{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman is praying in front of a buddhist temple.jpg\", \"mask_strategy\": \"0\"}\na woman with green hair smiling for the camera{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman with green hair smiling for the camera.jpg\", \"mask_strategy\": \"0\"}\nA group of people in a yellow raft is rowing through turbulent waters{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A group of people in a yellow raft is rowing through turbulent waters.jpg\", \"mask_strategy\": \"0\"}\na man carrying a woman on his back in a field{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man carrying a woman on his back in a field.jpg\", \"mask_strategy\": \"0\"}\nan indian police officer talking to an old woman{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an indian police officer talking to an old woman.jpg\", \"mask_strategy\": \"0\"}\ntwo people scuba diving in the ocean{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two people scuba diving in the ocean.jpg\", \"mask_strategy\": \"0\"}\nA man and woman dressed as sugar skulls in a field of flowers, sharing a loving gaze with each other{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A 
man and woman dressed as sugar skulls in a field of flowers, sharing a loving gaze with each other.jpg\", \"mask_strategy\": \"0\"}\na group of people watching a cow race{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of people watching a cow race.jpg\", \"mask_strategy\": \"0\"}\na man and a child riding bumper cars in an amusement park{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man and a child riding bumper cars in an amusement park.jpg\", \"mask_strategy\": \"0\"}\na group of motorcyclists racing on a dirt track{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of motorcyclists racing on a dirt track.jpg\", \"mask_strategy\": \"0\"}\na man and a woman are boxing in a boxing ring{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man and a woman are boxing in a boxing ring.jpg\", \"mask_strategy\": \"0\"}\na man holding a baby in his arms{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man holding a baby in his arms.jpg\", \"mask_strategy\": \"0\"}\na man and a woman sitting on a bench playing instruments{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man and a woman sitting on a bench playing instruments.jpg\", \"mask_strategy\": \"0\"}\ntwo men are standing next to each other with a bicycle{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two men are standing next to each other with a bicycle.jpg\", \"mask_strategy\": \"0\"}\na man and a boy sitting on a beach near the ocean{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man and a boy sitting on a beach near the ocean.jpg\", \"mask_strategy\": \"0\"}\ntwo men in white clothing standing next to each other{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two men in white clothing standing next to each other.jpg\", \"mask_strategy\": \"0\"}\na group of men riding horses in a dusty arena{\"reference_path\": 
\"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of men riding horses in a dusty arena.jpg\", \"mask_strategy\": \"0\"}\na soccer player in a yellow and black shirt is chasing a soccer ball{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a soccer player in a yellow and black shirt is chasing a soccer ball.jpg\", \"mask_strategy\": \"0\"}\na group of women sitting on the steps of a building{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of women sitting on the steps of a building.jpg\", \"mask_strategy\": \"0\"}\na group of people gathered around a red checkered blanket{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of people gathered around a red checkered blanket.jpg\", \"mask_strategy\": \"0\"}\na group of people in orange jumpsuits running along a river{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of people in orange jumpsuits running along a river.jpg\", \"mask_strategy\": \"0\"}\na woman walking down a sidewalk with a bag{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman walking down a sidewalk with a bag.jpg\", \"mask_strategy\": \"0\"}\na busy street with cars and people on motorcycles{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a busy street with cars and people on motorcycles.jpg\", \"mask_strategy\": \"0\"}\na man in a mask is walking through a crowd of people{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man in a mask is walking through a crowd of people.jpg\", \"mask_strategy\": \"0\"}\na man and a woman walking under an umbrella next to a brick wall{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a man and a woman walking under an umbrella next to a brick wall.jpg\", \"mask_strategy\": \"0\"}\na group of people riding bikes down a street{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of people riding bikes down a street.jpg\", \"mask_strategy\": 
\"0\"}\nAn old person is holding a cup on the street, and people around are curiously looking at him{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/An old person is holding a cup on the street, and people around are curiously looking at him.jpg\", \"mask_strategy\": \"0\"}\ntwo young girls playing with leaves in the woods{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two young girls playing with leaves in the woods.jpg\", \"mask_strategy\": \"0\"}\nOne person is riding on the back of a horse led by another person{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/One person is riding on the back of a horse led by another person.jpg\", \"mask_strategy\": \"0\"}\nan older woman and a young girl are knitting together{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/an older woman and a young girl are knitting together.jpg\", \"mask_strategy\": \"0\"}\nthree geishas walking down the street in traditional clothing{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/three geishas walking down the street in traditional clothing.jpg\", \"mask_strategy\": \"0\"}\ntwo men riding bikes down a road near a forest{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two men riding bikes down a road near a forest.jpg\", \"mask_strategy\": \"0\"}\ntwo women carrying bowls on their heads{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two women carrying bowls on their heads.jpg\", \"mask_strategy\": \"0\"}\ntwo women eating pizza at a restaurant{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two women eating pizza at a restaurant.jpg\", \"mask_strategy\": \"0\"}\ntwo young women studying in a library{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two young women studying in a library.jpg\", \"mask_strategy\": \"0\"}\npink water lilies in a pond with leaves{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/pink water lilies 
in a pond with leaves.jpg\", \"mask_strategy\": \"0\"}\na group of succulents in a rock garden{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of succulents in a rock garden.jpg\", \"mask_strategy\": \"0\"}\na close up view of a bunch of snowdrop flowers{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close up view of a bunch of snowdrop flowers.jpg\", \"mask_strategy\": \"0\"}\na close up of leaves with water droplets on them{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close up of leaves with water droplets on them.jpg\", \"mask_strategy\": \"0\"}\na close-up of a sea anemone in the water{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up of a sea anemone in the water.jpg\", \"mask_strategy\": \"0\"}\na plant with water droplets on it{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a plant with water droplets on it.jpg\", \"mask_strategy\": \"0\"}\na group of cactus plants in the desert{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of cactus plants in the desert.jpg\", \"mask_strategy\": \"0\"}\na close-up view of a plant with spiky leaves{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of a plant with spiky leaves.jpg\", \"mask_strategy\": \"0\"}\nA budding and blossoming flower bud seedling{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A budding and blossoming flower bud seedling.jpg\", \"mask_strategy\": \"0\"}\na field of orange flowers near the ocean'{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a field of orange flowers near the ocean'.jpg\", \"mask_strategy\": \"0\"}\na close-up view of a bunch of pink flowers{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close-up view of a bunch of pink flowers.jpg\", \"mask_strategy\": \"0\"}\npink water lilies in a pond{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/pink water 
lilies in a pond.jpg\", \"mask_strategy\": \"0\"}\nreeds blowing in the wind against a cloudy sky{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/reeds blowing in the wind against a cloudy sky.jpg\", \"mask_strategy\": \"0\"}\ntwo tall cacti in the middle of the desert{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two tall cacti in the middle of the desert.jpg\", \"mask_strategy\": \"0\"}\na sea anemone on a coral reef{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a sea anemone on a coral reef.jpg\", \"mask_strategy\": \"0\"}\na dandelion blowing in the wind{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a dandelion blowing in the wind.jpg\", \"mask_strategy\": \"0\"}\nA boiling pot cooking vegetables{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A boiling pot cooking vegetables.jpg\", \"mask_strategy\": \"0\"}\na woman stirring food in a pan on the stove{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a woman stirring food in a pan on the stove.jpg\", \"mask_strategy\": \"0\"}\ntwo eggs are fried in a frying pan on the stove{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/two eggs are fried in a frying pan on the stove.jpg\", \"mask_strategy\": \"0\"}\nfried onion rings in a basket{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/fried onion rings in a basket.jpg\", \"mask_strategy\": \"0\"}\na pot is sitting on top of a campfire{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pot is sitting on top of a campfire.jpg\", \"mask_strategy\": \"0\"}\na chef is preparing a dish with mushrooms on a wooden board{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a chef is preparing a dish with mushrooms on a wooden board.jpg\", \"mask_strategy\": \"0\"}\na hand holding a slice of pizza{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a hand holding a slice of pizza.jpg\", 
\"mask_strategy\": \"0\"}\nA person is using tongs to pick up meat from a plate{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A person is using tongs to pick up meat from a plate.jpg\", \"mask_strategy\": \"0\"}\nThe meat is picked up from the grill with tongs{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/The meat is picked up from the grill with tongs.jpg\", \"mask_strategy\": \"0\"}\nA person is whisking eggs, and the egg whites and yolks are gently streaming out{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A person is whisking eggs, and the egg whites and yolks are gently streaming out.jpg\", \"mask_strategy\": \"0\"}\na person is putting sauce on a burger{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person is putting sauce on a burger.jpg\", \"mask_strategy\": \"0\"}\nA person is making dumplings{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A person is making dumplings.jpg\", \"mask_strategy\": \"0\"}\na pan filled with fried food{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a pan filled with fried food.jpg\", \"mask_strategy\": \"0\"}\nChopsticks are slowly picking up the buns from the plastic container{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/Chopsticks are slowly picking up the buns from the plastic container.jpg\", \"mask_strategy\": \"0\"}\na basket of french fries in a fryer{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a basket of french fries in a fryer.jpg\", \"mask_strategy\": \"0\"}\na table with lobsters and drinks on it{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a table with lobsters and drinks on it.jpg\", \"mask_strategy\": \"0\"}\na person pouring coffee into a pot on a stove{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person pouring coffee into a pot on a stove.jpg\", \"mask_strategy\": \"0\"}\na kettle is sitting on top of a 
campfire{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a kettle is sitting on top of a campfire.jpg\", \"mask_strategy\": \"0\"}\nChopsticks are picking up noodles from the bowl{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/Chopsticks are picking up noodles from the bowl.jpg\", \"mask_strategy\": \"0\"}\na person is cooking eggs on an outdoor grill{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person is cooking eggs on an outdoor grill.jpg\", \"mask_strategy\": \"0\"}\na person is cooking food in a wok on a stove{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person is cooking food in a wok on a stove.jpg\", \"mask_strategy\": \"0\"}\na person is holding up a burger with his hands{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person is holding up a burger with his hands.jpg\", \"mask_strategy\": \"0\"}\nA person is pouring water into a teacup{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/A person is pouring water into a teacup.jpg\", \"mask_strategy\": \"0\"}\na person pouring seasoning into a pot of food{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person pouring seasoning into a pot of food.jpg\", \"mask_strategy\": \"0\"}\na person holding a taco in their hand{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person holding a taco in their hand.jpg\", \"mask_strategy\": \"0\"}\na person slicing salmon on a cutting board{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person slicing salmon on a cutting board.jpg\", \"mask_strategy\": \"0\"}\na bunch of food is cooking on a grill over an open fire{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a bunch of food is cooking on a grill over an open fire.jpg\", \"mask_strategy\": \"0\"}\na close up of a piece of sushi on chopsticks{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a close up of a piece of sushi on 
chopsticks.jpg\", \"mask_strategy\": \"0\"}\na group of pots on a stove with flames in the background{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a group of pots on a stove with flames in the background.jpg\", \"mask_strategy\": \"0\"}\na person cooking vegetables in a pan on a stove{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person cooking vegetables in a pan on a stove.jpg\", \"mask_strategy\": \"0\"}\na large pot of soup filled with vegetables and meat{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a large pot of soup filled with vegetables and meat.jpg\", \"mask_strategy\": \"0\"}\na person holding chopsticks over a bowl of food{\"reference_path\": \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop/1-1/a person holding chopsticks over a bowl of food.jpg\", \"mask_strategy\": \"0\"}\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_category/animal.txt",
    "content": "a black dog wearing halloween costume\nspider making a web\nbat eating fruits while hanging\na snake crawling on a wooden flooring\na close up video of a dragonfly\nmacro shot of ladybug on green leaf plant\nchameleon eating ant\na bee feeding on nectars\nbird nests on a tree captured with moving camera\na squirrel eating nuts\nclose up video of snail\ntop view of a hermit crab crawling on a wooden surface\ncat licking another cat\nred dragonfly perched on green leaf\nclose up view of a brown caterpillar crawling on green leaf\nants eating dead spider\nan eagle on a tree branch\na frog eating an ant\nwhite rabbit near the fence\na gorilla eating a carrot\nclose up of wolf\na meerkat looking around\na hyena in a zoo\nlemur eating grass leaves\nan owl being trained by a man\na lizard on a bamboo\nbrown chicken hunting for its food\nvideo of parrots perched on bird stand\nunderwater footage of an octopus in a coral reef\na cute pomeranian dog playing with a soccer ball\nwhite fox on rock\nclose up footage of a horse figurine\ngiraffe feeding on a tree in a savannah\ncurious cat sitting and looking around\nhummingbird hawk moth flying near pink flowers\nclose up of a scorpion on a rock\nclose up on fish in net\nkoala eating leaves from a branch\na pod of dolphins swirling in the sea catching forage fish\nlow angle view of a hawk perched on a tree branch\na lion standing on wild grass\ndeer grazing in the field\nelephant herd in a savanna\nclose up on lobster under water\nhedgehog crossing road in forest\na sheep eating yellow flowers from behind a wire fence\ntwin sisters and a turtle\na pig wallowing in mud\nflock of goose eating on the lake water\ncow in a field irritated with flies\na close up shot of a fly\ncheetah lying on the grass\nclose up of a lemur\nclose up shot of a kangaroo itching in the sand\na tortoise covered with algae\nturkey in cage\na great blue heron bird in the lakeside\ncrab with shell in aquarium\na seagull walking on shore\nan 
american crocodile\na tiger walking inside a cage\nalligator in the nature\na raccoon climbing a tree\nwild rabbit in a green meadow\ngroup of ring tailed lemurs\na clouded leopard on a tree branch\nduck grooming its feathers\nan african penguin walking on a beach\na video of a peacock\nclose up shot of a wild bear\nbaby rhino plays with mom\nporcupine climbs tree branches\nclose up of a natterjack toad on a rock\na sleeping orangutan\nmother whale swimming with babies\na bear wearing red jersey\npink jellyfish swimming underwater in a blue sea\nbeautiful clown fish swimming\nanimation of disposable objects shaped as a whale\npaper cut out of a pair of hands a whale and a heart\nvertical video of camel roaming in the field during daytime\na still video of mosquito biting human\na curious sloth hanging from a tree branch\na plastic flamingo bird stumbles from the wind\na wolf in its natural habitat\na monkey sitting in the stone and scratching his head\nbat hanging upside down\na red panda eating leaves\nsnake on ground\na harbour seal swimming near the shore\nshark swimming in the sea\notter on branch while eating\ngoat standing over a rock\na troop of monkey on top of a mountain\na zebra eating grass on the field\na colorful butterfly perching on a bud\na snail crawling on a leaf\nzookeeper showering a baby elephant\na beetle emerging from the sand\na nine banded armadillo searching for food\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_category/architecture.txt",
    "content": "an apartment building with balcony\nasian garden and medieval castle\nilluminated tower in berlin\na wooden house overseeing the lake\na crowd of people in a plaza in front of a government building\na church interior\njewish friends posing with hanukkah menorah in a cabin house\na destroyed building after a missile attack in ukraine\nabandoned building in the woods\ndrone video of an abandoned school building in pripyat ukraine\nelegant university building\narchitecture and designs of buildings in central london\na pancake tower with chocolate syrup and strawberries on top\nan ancient white building\nfriends hanging out at a coffee house\nhouse front door with christmas decorations\ncity night dark building\na bird house hanging on a tree branch\nsacred sculpture in a temple\nhigh angle shot of a clock tower\nmodern wooden house interior\nthe interior of an abandoned building\nopera house overlooking sea\na concrete structure near the green trees\ndome like building in scotland\nlow angle shot of a building\ntower on hill\na miniature house\neiffel tower from the seine river\nlow angle footage of an apartment building\nisland with pier and antique building\nasian historic architecture\ndrone footage of a beautiful mansion\nmosque in the middle east\nbuilding a tent and hammock in the forest camping site\ntop view of a high rise building\nhouse covered in snow\nskyscraper at night\nhouse in village\na casino with people outside the building\nsilhouette of a building\na woman climbing a tree house\ndrone view of house near lake during golden hour\nan under construction concrete house\na watch tower by the sea\nexterior view of arabic style building\nvideo of a hotel building\nred paper lantern decorations hanging outside a building\nhouse on seashore\naerial footage of the palace of culture and science building in warsaw poland\naerial video of stuttgart tv tower in germany\naerial view of the highway and building in a city\ndrone shot of a skyscraper 
san francisco california usa\nwaterfall and house\nview of the sky through a building\ndrone footage of a house on top of the mountain\nabandoned house in the nature\nclouds hovering over a mansion\nlight house on the ocean\nbuddhist temple at sunrise\npeople walking by a graveyard near a mosque at sunset\nview of lifeguard tower on the beach\nscenic view of a house in the mountains\nthe landscape in front of a government building\naerial footage of a building and its surrounding landscape in winter\ntime lapse of a cloudy sky behind a transmission tower\nblue ocean near the brown castle\nfog over temple\nhouse in countryside top view\nbuilding under construction\nturkish flag waving on old tower\nthe georgian building\nclose up shot of a steel structure\nthe atrium and interior design of a multi floor building\ncity view reflected on a glass building\naerial view of a luxurious house with pool\nan unpaved road leading to the house\ndrone footage of a lookout tower in mountain landscape\nwind turbines on hill behind building\ntime lapse footage of the sun light in front of a small house porch\na building built with lots of stairways\novercast over house on seashore\nthe view of the sydney opera house from the other side of the harbor\ncandle on a jar and a house figurine on a surface\nvideo of a farm and house\na dilapidated building made of bricks\na view of a unique building from a moving vehicle\naerial footage of a tall building in cambodia\npush in shot of a huge house\na beach house built over a seawall protected from the sea waves\nexotic house surrounded by trees\ndrone video of a house surrounded by tropical vegetation\ndrone footage of a building beside a pond\nobservation tower on hill in forest\na tree house in the woods\na video of vessel structure during daytime\nfire in front of illuminated building at night\na footage of a wooden house on a wheat field\ntilt shot of a solar panel below a light tower\nwater tower on the desert\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_category/food.txt",
    "content": "freshly baked finger looking cookies\nvideo of fake blood in wine glass\nhalloween food art\na person slicing a vegetable\na serving of pumpkin dish in a plate\nclose up view of green leafy vegetable\na birthday cake in the plate\nvideo of a slice papaya fruit\na muffin with a burning candle and a love sign by a ceramic mug\na jack o lantern designed cookie\nbaked bread with chocolate\na broccoli soup on wooden table\na freshly brewed coffee on a pink mug\ngrabbing sourdough neapolitan style pizza slices\nperson cooking mushrooms in frying pan\nrice grains placed on a reusable cloth bag\nslices of kiwi fruit\ngrilling a steak on a pan grill\nclose up of bread popping out of a toaster\nman eating noodle\npreparing a cocktail drink\nclose up pasta with bacon on plate\nmilk and cinnamon rolls\nboy getting a dumpling using chopsticks\na mother preparing food with her kids\nman using his phone while eating\nfresh salmon salad on a plate\ncutting cucumbers into long thin slices as ingredient for sushi roll\na steaming cup of tea by the window\na glass filled with beer\na kid eating popcorn while watching tv\nclose up shot of fried fish on the plate\na man eating a donut\nperson making a vegetarian dish\nspreading cheese on bagel\nclose up view of a man drinking red wine\na couple having breakfast in a restaurant\na student eating her sandwich\ngirl peeling a banana\nred rice in a small bowl\npancake with blueberry on the top\ngreen apple fruit on white wooden table\na man eating a taco by the bar\nmaking of a burrito\nsqueezing lemon into salad\na chef cutting sushi rolls\nvideo of a delicious dessert\ndeep frying a crab on a wok in high fire\nclose up video of a orange juice\nvideo of a cooked chicken breast\nwoman holding a pineapple\na woman eating a bar of chocolate\ndecorating christmas cookie\nsqueezing a slice of fruit\ntuna sashimi on a plate\na strawberry fruit mixed in an alcoholic drink\npreparing hot dogs in a grill\na woman cutting a 
tomato\nan orange fruit cut in half\na coconut fruit with drinking straw\nwoman holding a dragon fruit\na woman pouring hot beverage on a cup\nwaffles with whipped cream and fruit\nfocus shot of an insect at the bottom of a fruit\npreparing a healthy broccoli dish\nman eating snack at picnic\nclose up video of a grilled shrimp skewer\na woman mixing a smoothie drinks\nclose up video of woman having a bite of jelly\nbusinessman drinking whiskey at the bar counter of a hotel lounge\ncutting an onion with a knife over a wooden chopping board\nfresh lemonade in bottles\ngrilling a meat on a charcoal grill\npeople enjoying asian cuisine\nclose up footage of a hot dish on a clay pot\npork ribs dish\nwaffle with strawberry and syrup for breakfast\ntofu dish with rose garnish\nuncooked pork meat\negg yolk being dumped over gourmet dish\ntasty brunch dish close up\nlittle boy pretending to eat the watermelon\nslicing roasted beef\nclose up of a chef adding teriyaki sauce to a dish\nflat lay mexican dish\na person placing an octopus dish on a marble surface\nclose up of tea leaves brewing in a glass kettle\nadding fresh herbs to soup dish\na scoop of roasted coffee beans\nfresh dim sum set up on a bamboo steam tray for cooking\na girl putting ketchup on food at the kitchen\ncooking on electric stove\na woman with a slice of a pie\ngrapes and wine on a wooden board\nman taking picture of his food\nhamburger and fries on restaurant table\nclose up video of japanese food\na cracker sandwich with cheese filling for snack\nbarista preparing matcha tea\nclose up of onion rings being deep fried\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_category/human.txt",
    "content": "people carving a pumpkin\npeople sitting on a sofa\na man with a muertos face painting\nman walking in the dark\nmen in front of their computer editing photos\nmen loading christmas tree on tow truck\nwoman washing the dishes\nwoman adding honey to the cinnamon rolls\ntwo women kissing and smiling\nthree women looking at watercolor paintings\na family wearing paper bag masks\na family posing for the camera\na boy covering a rose flower with a dome glass\nboy sitting on grass petting a dog\na girl in her tennis sportswear\na girl coloring the cardboard\nsilhouette of the couple during sunset\ncouple dancing with body paint\na child playing with water\na woman with her child sitting on a couch in the living room\na group of friend place doing hand gestures of agreement\nfriends having a group selfie\nfriends talking while on the basketball court\ngroup of people protesting\na group of campers with a cute dog\na group of photographers taking pictures at the north western gardens in llandudno north wales\na group of students laughing and talking\na group of martial artist warming up\na person playing golf\na person walking on a wet wooden bridge\nperson doing a leg exercise\nice hockey athlete on rink\na young athlete training in swimming\nchess player dusting a chessboard\nbaseball player holding his bat\na bearded man putting a vinyl record on a vinyl player\nan orchestra finishes a performance\npeople applauding the performance of the kids\nband performance at the recording studio\nfather and his children playing jenga game\npeople playing a board game\nman playing a video game\na man video recording the movie in theater\nman and a woman eating while watching a movie\nmovie crew talking together\na director explaining the movie scene\nman and woman listening to music on car\nman playing music\ncouple dancing slow dance with sun glare\na ballerina practicing in the dance studio\nfather and son holding hands\nfather and daughter talking together\na 
mother and her kids engaged in a video call\nmother and daughter reading a book together\na mother teaching her daughter playing a violin\nkid in a halloween costume\na happy kid playing the ukulele\na chef slicing a cucumber\nchef wearing his gloves properly\nbrother and sister using hammock\ngirl applying sunblock to her brother\na girl pushing the chair while her sister is on the chair\ncolleagues talking in office building\nfighter practice kicking\na woman fighter in her cosplay costume\nan engineer holding blueprints while talking with her colleague\na young woman looking at vr controllers with her friend\nworkmates teasing a colleague in the work\na male police officer talking on the radio\nteacher holding a marker while talking\nteacher writing on her notebook\na young student attending her online classes\na student showing his classmates his wand\na male vendor selling fruits\na shirtless male climber\na sound engineer listening to music\nfemale talking to a psychiatrist in a therapy session\nyoung female activist posing with flag\na man in a hoodie and woman with a red bandana talking to each other and smiling\na medium close up of women wearing kimonos\na male interviewer listening to a person talking\na social worker having a conversation with the foster parents\na farm worker harvesting onions\nworker packing street food\nworker and client at barber shop\nelderly man lifting kettlebell\nmom assisting son in riding a bicycle\ndad watching her daughter eat\nyoung guy with vr headset\npregnant woman exercising with trainer\na fortune teller talking to a client\nwizard doing a ritual on a woman\na footage of an actor on a movie scene\na man holding a best actor trophy\na singer of a music band\na young singer performing on stage\nyoung dancer practicing at home\nseller showing room to a couple\ncab driver talking to passenger\na policeman talking to the car driver\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_category/lifestyle.txt",
    "content": "kids celebrating halloween at home\nlittle boy helping mother in kitchen\nvideo of a indoor green plant\na girl arranges a christmas garland hanging by the kitchen cabinet\ncandle burning in dark room\ncouple having fun and goofing around the bedroom\ngirls jumping up and down in the bedroom\nwoman and man in pajamas working from home\na muslim family sitting and talking in the living room\nfamily enjoying snack time while sitting in the living room\nwoman holding an animal puppet and a little girl playing together at the living room\nkids playing in the indoor tent\nyoung people celebrating new year at the office\na woman writing on the sticky note in the office\na woman exercising at home over a yoga mat\ngirls preparing easter decorations at home\ndog on floor in room\nturning on a fluorescent light inside a room\ncolleagues talking to each other near the office windows\na woman recording herself while exercising at home\nmusic room\ndifferent kind of tools kept in a utility room\nsofa beds and other furniture\na girl finding her brother reading a book in the bedroom\nan elegant ceramic plant pot and hanging plant on indoor\nfurniture inside a bedroom\ninterior design of the bar section\nliving room with party decoration\nfirewood burning in dark room\na young woman playing the ukulele at home\nwoman painting at home\na woman in a locker room\nvideo of a bathroom interior\nthe interior design of a jewish synagogue\na woman in protective suit disinfecting the kitchen\nmodern minimalist home interior\nmodern interior design of a coffee shop\nperson arranging minimalist furniture\naerial shot of interior of the warehouse\na room of a manufacturing facility\ninterior of catholic\ninterior design of a restaurant\na female model in a changing room looking herself in mirror\nmen walking in the office hallway\npeople sitting in a conference room\nthe interior design of a shopping mall\nchandeliers in room\nlucerne railway station interior\na female 
fencer posing in a foggy room\na toolbox and a paint roller beside a huge package in a room\nbedroom in hotel\na woman lying in the operating room\na chef holding and checking kitchen utensils\na couple singing in the shower room together\na woman cleaning mess in the living room\nan empty meeting room with natural light\nperson dancing in a dark room\nclose up on blood in hospital room\na couple resting on their home floor\na young female staff at courier office\na man entering the gym locker room\na bored man sitting by the tv at home\nwoman dancing in indoor garden\nrubble in the interior of an abandoned house\nindoor farm in a greenhouse\nman doing handstand in indoor garden\nan abandoned indoor swimming pool\nhome decorations on top of a cabinet\ngraffiti art on the interior walls of an abandoned mansion\nindoor wall climbing activity\nsunlight inside a room\nteenage girl roller skating at indoor rink\nhome deco with lighted\nbaby in the shower room\nmen enjoying office christmas party\na bedroom with a brick wall\nactors prepping in the dressing room\nkids playing at an indoor playground\na person sanitizing an office space using smoke machine\nmother and daughter choosing clothes at home\na woman sitting by the indoor fire pit\nman standing on the corner of the room while looking around\nperson assembling furniture\na family stacking cardboard boxes in a room\nfamily having fun in the dining room\nperson disinfecting a room\na woman washing strawberries in the kitchen sink\nmodern office waiting room\nclose up view of a person slicing with a kitchen knife\nboiling coffee on a stove in the kitchen\nmodern equipment used in a home studio\ninterior of a recording studio\npeople working in a call center office\nband performing at a home concert\na group of people watching a concert in a room\npeople packing their furniture\nyoung employees in office holding a certificate\na criminal inside a dark room handcuffed in a table\ncouple browsing and looking for 
furniture in the store\nworkspace at home\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_category/plant.txt",
    "content": "video of a indoor green plant\nclose up view of a plant\nclose up shot of a burning plant\nplucking leaves from plant\na plant on gold pot with glass lid\na branch of a tree and a plant\na leafless tree\nclose up shot of fern leaf\nclose up video of strawberry plant\nplant with blooming flowers\nclose up video of flower petals\nwatering yellow plant\nbeautiful flower decoration\ncannabis flower in a jar\na footage of the tree leaves\na red leaf plant\nclose up view of a white christmas tree\nsnow pouring on a tree\nclose up shot of white flowers on the tree\nleaves in the trees daytime\na dead tree lying on a grass field\ntree branches in a flowing river\npurple flowers with leaves\na coconut tree by the house\nclose up on flower in winter\nbamboo leaves backlit by the sun\nclose up video of a wet flower\na man putting a flower in a box\ndropping flower petals on a wooden bowl\na close up shot of gypsophila flower\nvariety of succulent plants on a garden\nvariety of trees and plants in a botanical garden\nforest of deciduous trees\na stack of dried leaves burning in a forest\ntall forest trees on a misty morning\nclose up view of dewdrops on a leaf\nclose up view of white petaled flower\nremoving a pineapple leaf\na dragonfly perched on a leaf\nbutterfly pollinating flower\nperson visiting and checking a corn plant\nwoman picking beans from a plant\nwoman plucking mint leaves\nsingle tree in the middle of farmland\na plant on a soil\ndrone footage of a tree on farm field\na tractor harvesting lavender flower\npeople putting christmas ornaments on a christmas tree\njack o lantern hanging on a tree\ntree with halloween decoration\nflower field near the waterfall\ntruck carrying the tree logs\nraindrops falling on leaves\nshot of a palm tree swaying with the wind\nsquirrels on a tree branch\nperson holding a flower\na fallen tree trunk\ntree with golden leaves\ncherry tree\nwind blows through leaves of the tree in autumn\na leaf on a glass\nthe long 
trunks of tall trees in the forest\ntrees in the forest during sunny day\nclose up video of tree bark\nreflection of tree branches\ntrunks of many trees in the forest\ntree leaves providing shades from the sun\nleaves swaying in the wind\nlow angle shot of baobab tree\nbare trees in forest\na plant surrounded by fallen leaves\na couple preparing food and pruning a plant\na man cutting a tree bark\noranges on a tree branch\nplant connected on the stones\nvideo of a sawmill machine cutting tree log\nwomen drying flower petals\nmacro view of an agave plant\na video of a person tying a plant on a string\ngreen moss in forest nature\ncoconut tree near sea under blue sky\nthe canopy of a coconut tree\na man leaning on a tree at the beach\na full grown plant on a pot\ncandle wax dripping on flower petals\nclose up of leaves in autumn\na woman opening a book with a flower inside\na man holding leaves looking at the camera\na shadow of a swaying plant\na tree and concrete structure under a blue and cloudy sky\ntrimming excess leaves on a potted plant\nthe changing color of the tree leaves during autumn season\na gooseberry tree swayed by the wind\nforest trees and a medieval castle at sunset\nwoman cut down tree\nan old oak tree in a park across the street from a hotel\nwild flowers growing in a forest ground\na mossy fountain and green plants in a botanical garden\nmansion with beautiful garden\nants on a dragon fruit flower\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_category/scenery.txt",
    "content": "scenery of desert landscape\nlandscape agriculture farm tractor\nburning slash piles in the forest\ngraveyard at sunset\nview of a jack o lantern with pumpkins in a smoky garden\nsun view through a spider web\nview of the sea from an abandoned building\nclose up view of a full moon\nclose up view of lighted candles\nclose up view of swaying white flowers and leaves\nscenery of a relaxing beach\nselective focus video of grass during sunny day\naerial view of brown dry landscape\nfireworks display in the sky at night\na bonfire near river\nmountain view\nwaterfalls in between mountain\na picturesque view of nature\nexotic view of a riverfront city\ntall trees in the forest under the clear sky\nsnow on branches in forest\nstream in the nature\nan airplane flying above the sea of clouds\nscenic video of sunset\nview of houses with bush fence under a blue and cloudy sky\nscenic view from wooden pathway\nscenic view of a tropical beach\ndrone footage of waves crashing on beach shore\na scenic view of the golden hour at norway\ntime lapse video of foggy mountain forest\nbrown mountain during fall season\nvideo of ocean during daytime\nboat sailing in the ocean\ntop view of yachts\nbeautiful scenery of flowing waterfalls and river\nwild ducks paddling on the lake surface\na relaxing scenery of beach view under cloudy sky\nnatural rock formations on beach under cloudy sky\na palm tree against blue sky\nvideo of sailboat on a lake during sunset\naerial view of snow piles\ntime lapse of a sunset sky in the countryside\naerial footage of a statue\ntime lapse video of a farm during sunset\nclouds formation in the sky at sunset\naerial shot of a village\ndrone shot of a beautiful sunrise at the mountains\ntime lapse video of foggy morning during sunrise\nsun shining between tree leaves at sunrise\nvideo of lake during dawn\nvehicles traveling on roadway under cloudy sky\nview of golden domed church\na monument under the blue sky\nfirecrackers in the sky\nview of 
fruit signage in the farm\na dark clouds over shadowing the full moon\nview of the amazon river\na big river swamp in a dense forest\na blooming cherry blossom tree under a blue sky with white clouds\na river waterfall cascading down the plunge basin\nflooded landscape with palm trees\na blurry waterfall background\nwaterfall in the mountains\naerial footage of a city at night\npond by small waterfall in forest\naerial view of farmlands at the bay of lake\nrice terraces in the countryside\na highway built across an agricultural area in the countryside\ngloomy morning in the countryside\ndrone shot of an abandoned coliseum on a snowy mountain top\nboat sailing in the middle of ocean\ndrone shot of the grass field\nnatural landscape of mountain and sea with islets developed into a community\naerial view of zaporizhia in ukraine\naerial footage of a herd\nan aerial footage of a red sky\ngrass and plants growing in the remains of an abandoned house\nview from hill on city\naerial view on orthodox church\naerial view of bay in croatia\na footage of a frozen river\noverlooking view of a city at daylight\nview outside the cemetery\nclear sky with moon over meadow\nclouds over railway\naerial footage of moving vehicles on the road at night\naerial view of town and park\ntop view of skyscrapers\ntop view of the empire state building in manhattan\ntop view of the central park in new york city\nsheep running in a grass field\nclear sky over factory\nsmoke and fire in birds eye view\nview of a pathway with snow melting on its side\nferry under bridge on river near city in malaysia\nmountain slopes covered in green vegetation\npanoramic view of a town surrounded by snow covered mountains\naerial view of a palace\ntop view of vehicles driving on the intersection\na graveyard by a church in a mountain landscape\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_category/vehicles.txt",
    "content": "a modern railway station in malaysia use for public transportation\ndrone footage of amsterdam metro station\ntrain arriving at a station\nred vehicle driving on field\nclose up view of flashing emergency vehicle lighting\nvehicle with fertilizer on field\na highway built across an agricultural area in the countryside\ndrone footage of motorcycles driving on country road between agricultural fields\na road in the woods under fog\nfootage of a car driving through a wheat field\nvehicle stops for an ambulance passing through city traffic\nemergency vehicle parked outside the casino\nzombies attacking a woman and a boy inside a car\nwoman seating inside the car while chewing\nvideo of passengers riding a double decker bus during night\ntraffic in london street at night\nelderly couple checking engine of automobile\na green vintage automobile with an open hood parked in a parking area\nclose up of a prototype automobile with exposed engine on the back seat of the car\naerial view of road in forest\ntrain departing from station\naerial view of a train passing by a bridge\nvideo of a train tracks\nvideo footage of a subway\nvideo of blinking traffic lights\ncouple walking out on the subway\ntime lapse of a subway tunnel\nmonitor board inside the subway\nmetro train at night\nzoom in video of a tram passing by city\nyoung man using laptop in the tram\nman reading a book at bus stop\nclose up shot of a moving taxi\nnight travel in london street on a public bus\nred bus in a rainy city\nflow of traffic in the city\nclose up shot of a yellow taxi turning left\ntwo women calling for a taxi\ndrone view of an illuminated bridge across a river\npoliceman in police car talking on radio\nairplane taking off at night\nview through window in airplane\nan airplane in the sky\nhelicopter landing on the street\na pilot getting out of a helicopter\na helicopter flying under blue sky\nboat sailing in the middle of the ocean\ngirl playing with a toy boat\nsilhouette of a 
boat on sea during golden hour\na boat travelling around the lake\nroad on mountain ridge\nship sailing on danube river\nslow motion video of a ship water trail in the sea\ndrone footage of a wreck ship on shore\na white yacht traveling on a river and passing under the bridge\nfemale teenagers drinking champagne in the yacht\nvideo of yacht sailing in the ocean\nred combine harvester on road on field\na woman sitting on a bicycle while using a mobile phone\na woman sitting on a motorcycle looking around\nthree teenagers fixing a bicycle\na woman in a halloween costume posing on a motorcycle\na parked motorcycle on a foggy roadside\ncable car near sea shore\na truck travelling in the road\nfootage of the road without any traffic\na road sign\nlove padlocks on a bridge\ncamera moving at highway construction site\nvehicles driving on highway\na motorbike on highway at timelapse mode\npoint of view of a car driving through a tunnel\ntime lapse of heavy traffic on an avenue\nferry boat on city canal\nblack vintage car in museum\na zigzag road across a forest\npeople crossing the road\nvideo of a kayak boat in a river\na person paddling a wooden boat in a lake\na car charging in the parking area\ncars parked on the road\nfootage of the street with people and vehicle passing by in the rain\ntraffic on busy city street\na woman getting out of the car to walk with their dog\nyacht sailing through the ocean\npeople in queue to military ship\nman wearing motorcycle helmet looking at the camera\nempty seats in the bus\nempty boat on the water\ncargo train traveling on the mountainside\ncruise ship in harbor\ncounting down at traffic lights\npressing the car ignition\nfire truck driving on the road\na footage of a broken bicycle\ndrone footage of an ambulance on the road\nslow motion footage of a racing car\nship sailing on sea against sunset\nbig cargo ship passing on the shore\nback view of man and woman walking on unpaved road\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_dimension/appearance_style.txt",
    "content": "A beautiful coastal beach in spring, waves lapping on sand, Van Gogh style\nA beautiful coastal beach in spring, waves lapping on sand, oil painting\nA beautiful coastal beach in spring, waves lapping on sand by Hokusai, in the style of Ukiyo\nA beautiful coastal beach in spring, waves lapping on sand, black and white\nA beautiful coastal beach in spring, waves lapping on sand, pixel art\nA beautiful coastal beach in spring, waves lapping on sand, in cyberpunk style\nA beautiful coastal beach in spring, waves lapping on sand, animated style\nA beautiful coastal beach in spring, waves lapping on sand, watercolor painting\nA beautiful coastal beach in spring, waves lapping on sand, surrealism style\nThe bund Shanghai, Van Gogh style\nThe bund Shanghai, oil painting\nThe bund Shanghai by Hokusai, in the style of Ukiyo\nThe bund Shanghai, black and white\nThe bund Shanghai, pixel art\nThe bund Shanghai, in cyberpunk style\nThe bund Shanghai, animated style\nThe bund Shanghai, watercolor painting\nThe bund Shanghai, surrealism style\na shark is swimming in the ocean, Van Gogh style\na shark is swimming in the ocean, oil painting\na shark is swimming in the ocean by Hokusai, in the style of Ukiyo\na shark is swimming in the ocean, black and white\na shark is swimming in the ocean, pixel art\na shark is swimming in the ocean, in cyberpunk style\na shark is swimming in the ocean, animated style\na shark is swimming in the ocean, watercolor painting\na shark is swimming in the ocean, surrealism style\nA panda drinking coffee in a cafe in Paris, Van Gogh style\nA panda drinking coffee in a cafe in Paris, oil painting\nA panda drinking coffee in a cafe in Paris by Hokusai, in the style of Ukiyo\nA panda drinking coffee in a cafe in Paris, black and white\nA panda drinking coffee in a cafe in Paris, pixel art\nA panda drinking coffee in a cafe in Paris, in cyberpunk style\nA panda drinking coffee in a cafe in Paris, animated style\nA panda drinking coffee in a 
cafe in Paris, watercolor painting\nA panda drinking coffee in a cafe in Paris, surrealism style\nA cute happy Corgi playing in park, sunset, Van Gogh style\nA cute happy Corgi playing in park, sunset, oil painting\nA cute happy Corgi playing in park, sunset by Hokusai, in the style of Ukiyo\nA cute happy Corgi playing in park, sunset, black and white\nA cute happy Corgi playing in park, sunset, pixel art\nA cute happy Corgi playing in park, sunset, in cyberpunk style\nA cute happy Corgi playing in park, sunset, animated style\nA cute happy Corgi playing in park, sunset, watercolor painting\nA cute happy Corgi playing in park, sunset, surrealism style\nGwen Stacy reading a book, Van Gogh style\nGwen Stacy reading a book, oil painting\nGwen Stacy reading a book by Hokusai, in the style of Ukiyo\nGwen Stacy reading a book, black and white\nGwen Stacy reading a book, pixel art\nGwen Stacy reading a book, in cyberpunk style\nGwen Stacy reading a book, animated style\nGwen Stacy reading a book, watercolor painting\nGwen Stacy reading a book, surrealism style\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, Van Gogh style\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, oil painting\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background by Hokusai, in the style of Ukiyo\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, black and white\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, pixel art\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, in cyberpunk style\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, animated style\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, watercolor painting\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, surrealism style\nA 
couple in formal evening wear going home get caught in a heavy downpour with umbrellas, Van Gogh style\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, oil painting\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas by Hokusai, in the style of Ukiyo\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, black and white\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, pixel art\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, in cyberpunk style\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, animated style\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, watercolor painting\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, surrealism style\nAn astronaut flying in space, Van Gogh style\nAn astronaut flying in space, oil painting\nAn astronaut flying in space by Hokusai, in the style of Ukiyo\nAn astronaut flying in space, black and white\nAn astronaut flying in space, pixel art\nAn astronaut flying in space, in cyberpunk style\nAn astronaut flying in space, animated style\nAn astronaut flying in space, watercolor painting\nAn astronaut flying in space, surrealism style\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, Van Gogh style\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, oil painting\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. 
the canyons twist and bend through the high elevated mountain peaks by Hokusai, in the style of Ukiyo\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, black and white\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, pixel art\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, in cyberpunk style\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, animated style\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, watercolor painting\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, surrealism style\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_dimension/color.txt",
    "content": "a red bicycle\na green bicycle\na blue bicycle\na yellow bicycle\nan orange bicycle\na purple bicycle\na pink bicycle\na black bicycle\na white bicycle\na red car\na green car\na blue car\na yellow car\nan orange car\na purple car\na pink car\na black car\na white car\na red bird\na green bird\na blue bird\na yellow bird\nan orange bird\na purple bird\na pink bird\na black bird\na white bird\na black cat\na white cat\nan orange cat\na yellow cat\na red umbrella\na green umbrella\na blue umbrella\na yellow umbrella\nan orange umbrella\na purple umbrella\na pink umbrella\na black umbrella\na white umbrella\na red suitcase\na green suitcase\na blue suitcase\na yellow suitcase\nan orange suitcase\na purple suitcase\na pink suitcase\na black suitcase\na white suitcase\na red bowl\na green bowl\na blue bowl\na yellow bowl\nan orange bowl\na purple bowl\na pink bowl\na black bowl\na white bowl\na red chair\na green chair\na blue chair\na yellow chair\nan orange chair\na purple chair\na pink chair\na black chair\na white chair\na red clock\na green clock\na blue clock\na yellow clock\nan orange clock\na purple clock\na pink clock\na black clock\na white clock\na red vase\na green vase\na blue vase\na yellow vase\nan orange vase\na purple vase\na pink vase\na black vase\na white vase\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_dimension/human_action.txt",
    "content": "A person is riding a bike\nA person is marching\nA person is roller skating\nA person is tasting beer\nA person is clapping\nA person is drawing\nA person is petting animal (not cat)\nA person is eating watermelon\nA person is playing harp\nA person is wrestling\nA person is riding scooter\nA person is sweeping floor\nA person is skateboarding\nA person is dunking basketball\nA person is playing flute\nA person is stretching leg\nA person is tying tie\nA person is skydiving\nA person is shooting goal (soccer)\nA person is playing piano\nA person is finger snapping\nA person is canoeing or kayaking\nA person is laughing\nA person is digging\nA person is clay pottery making\nA person is shooting basketball\nA person is bending back\nA person is shaking hands\nA person is bandaging\nA person is push up\nA person is catching or throwing frisbee\nA person is playing trumpet\nA person is flying kite\nA person is filling eyebrows\nA person is shuffling cards\nA person is folding clothes\nA person is smoking\nA person is tai chi\nA person is squat\nA person is playing controller\nA person is throwing axe\nA person is giving or receiving award\nA person is air drumming\nA person is taking a shower\nA person is planting trees\nA person is sharpening knives\nA person is robot dancing\nA person is rock climbing\nA person is hula hooping\nA person is writing\nA person is bungee jumping\nA person is pushing cart\nA person is cleaning windows\nA person is cutting watermelon\nA person is cheerleading\nA person is washing hands\nA person is ironing\nA person is cutting nails\nA person is hugging\nA person is trimming or shaving beard\nA person is jogging\nA person is making bed\nA person is washing dishes\nA person is grooming dog\nA person is doing laundry\nA person is knitting\nA person is reading book\nA person is baby waking up\nA person is massaging legs\nA person is brushing teeth\nA person is crawling baby\nA person is motorcycling\nA person is driving car\nA 
person is sticking tongue out\nA person is shaking head\nA person is sword fighting\nA person is doing aerobics\nA person is strumming guitar\nA person is riding or walking with horse\nA person is archery\nA person is catching or throwing baseball\nA person is playing chess\nA person is rock scissors paper\nA person is using computer\nA person is arranging flowers\nA person is bending metal\nA person is ice skating\nA person is climbing a rope\nA person is crying\nA person is dancing ballet\nA person is getting a haircut\nA person is running on treadmill\nA person is kissing\nA person is counting money\nA person is barbequing\nA person is peeling apples\nA person is milking cow\nA person is shining shoes\nA person is making snowman\nA person is sailing\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_dimension/multiple_objects.txt",
    "content": "a bird and a cat\na cat and a dog\na dog and a horse\na horse and a sheep\na sheep and a cow\na cow and an elephant\nan elephant and a bear\na bear and a zebra\na zebra and a giraffe\na giraffe and a bird\na chair and a couch\na couch and a potted plant\na potted plant and a tv\na tv and a laptop\na laptop and a remote\na remote and a keyboard\na keyboard and a cell phone\na cell phone and a book\na book and a clock\na clock and a backpack\na backpack and an umbrella\nan umbrella and a handbag\na handbag and a tie\na tie and a suitcase\na suitcase and a vase\na vase and scissors\nscissors and a teddy bear\na teddy bear and a frisbee\na frisbee and skis\nskis and a snowboard\na snowboard and a sports ball\na sports ball and a kite\na kite and a baseball bat\na baseball bat and a baseball glove\na baseball glove and a skateboard\na skateboard and a surfboard\na surfboard and a tennis racket\na tennis racket and a bottle\na bottle and a chair\nan airplane and a train\na train and a boat\na boat and an airplane\na bicycle and a car\na car and a motorcycle\na motorcycle and a bus\na bus and a traffic light\na traffic light and a fire hydrant\na fire hydrant and a stop sign\na stop sign and a parking meter\na parking meter and a truck\na truck and a bicycle\na toilet and a hair drier\na hair drier and a toothbrush\na toothbrush and a sink\na sink and a toilet\na wine glass and a chair\na cup and a couch\na fork and a potted plant\na knife and a tv\na spoon and a laptop\na bowl and a remote\na banana and a keyboard\nan apple and a cell phone\na sandwich and a book\nan orange and a clock\nbroccoli and a backpack\na carrot and an umbrella\na hot dog and a handbag\na pizza and a tie\na donut and a suitcase\na cake and a vase\nan oven and scissors\na toaster and a teddy bear\na microwave and a frisbee\na refrigerator and skis\na bicycle and an airplane\na car and a train\na motorcycle and a boat\na person and a toilet\na person and a hair drier\na person and a 
toothbrush\na person and a sink\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_dimension/object_class.txt",
    "content": "a person\na bicycle\na car\na motorcycle\nan airplane\na bus\na train\na truck\na boat\na traffic light\na fire hydrant\na stop sign\na parking meter\na bench\na bird\na cat\na dog\na horse\na sheep\na cow\nan elephant\na bear\na zebra\na giraffe\na backpack\nan umbrella\na handbag\na tie\na suitcase\na frisbee\nskis\na snowboard\na sports ball\na kite\na baseball bat\na baseball glove\na skateboard\na surfboard\na tennis racket\na bottle\na wine glass\na cup\na fork\na knife\na spoon\na bowl\na banana\nan apple\na sandwich\nan orange\nbroccoli\na carrot\na hot dog\na pizza\na donut\na cake\na chair\na couch\na potted plant\na bed\na dining table\na toilet\na tv\na laptop\na remote\na keyboard\na cell phone\na microwave\nan oven\na toaster\na sink\na refrigerator\na book\na clock\na vase\nscissors\na teddy bear\na hair drier\na toothbrush\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_dimension/overall_consistency.txt",
    "content": "Close up of grapes on a rotating table.\nTurtle swimming in ocean.\nA storm trooper vacuuming the beach.\nA panda standing on a surfboard in the ocean in sunset.\nAn astronaut feeding ducks on a sunny afternoon, reflection from the water.\nTwo pandas discussing an academic paper.\nSunset time lapse at the beach with moving clouds and colors in the sky.\nA fat rabbit wearing a purple robe walking through a fantasy landscape.\nA koala bear playing piano in the forest.\nAn astronaut flying in space.\nFireworks.\nAn animated painting of fluffy white clouds moving in sky.\nFlying through fantasy landscapes.\nA bigfoot walking in the snowstorm.\nA squirrel eating a burger.\nA cat wearing sunglasses and working as a lifeguard at a pool.\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks.\nSplash of turquoise water in extreme slow motion, alpha channel included.\nan ice cream is melting on the table.\na drone flying over a snowy forest.\na shark is swimming in the ocean.\nAerial panoramic video from a drone of a fantasy land.\na teddy bear is swimming in the ocean.\ntime lapse of sunrise on mars.\ngolden fish swimming in the ocean.\nAn artist brush painting on a canvas close up.\nA drone view of celebration with Christmas tree and fireworks, starry sky - background.\nhappy dog wearing a yellow turtleneck, studio, portrait, facing camera, dark background\nOrigami dancers in white paper, 3D render, on white background, studio shot, dancing modern dance.\nCampfire at night in a snowy forest with starry sky in the background.\na fantasy landscape\nA 3D model of a 1800s victorian house.\nthis is how I do makeup in the morning.\nA raccoon that looks like a turtle, digital art.\nRobot dancing in Times Square.\nBusy freeway at night.\nBalloon full of water exploding in extreme slow motion.\nAn astronaut is riding a horse in the space in a 
photorealistic style.\nMacro slo-mo. Slow motion cropped closeup of roasted coffee beans falling into an empty bowl.\nSewing machine, old sewing machine working.\nMotion colour drop in water, ink swirling in water, colourful ink in water, abstraction fancy dream cloud of ink.\nFew big purple plums rotating on the turntable. water drops appear on the skin during rotation. isolated on the white background. close-up. macro.\nVampire makeup face of beautiful girl, red contact lenses.\nAshtray full of butts on table, smoke flowing on black background, close-up\nPacific coast, carmel by the sea ocean and waves.\nA teddy bear is playing drum kit in NYC Times Square.\nA corgi is playing drum kit.\nAn Iron man is playing the electronic guitar, high electronic guitar.\nA raccoon is playing the electronic guitar.\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background by Vincent van Gogh\nA corgi's head depicted as an explosion of a nebula\nA fantasy landscape\nA future where humans have achieved teleportation technology\nA jellyfish floating through the ocean, with bioluminescent tentacles\nA Mars rover moving on Mars\nA panda drinking coffee in a cafe in Paris\nA space shuttle launching into orbit, with flames and smoke billowing out from the engines\nA steam train moving on a mountainside\nA super cool giant robot in Cyberpunk Beijing\nA tropical beach at sunrise, with palm trees and crystal-clear water in the foreground\nCinematic shot of Van Gogh's selfie, Van Gogh style\nGwen Stacy reading a book\nIron Man flying in the sky\nThe bund Shanghai, oil painting\nYoda playing guitar on the stage\nA beautiful coastal beach in spring, waves lapping on sand by Hokusai, in the style of Ukiyo\nA beautiful coastal beach in spring, waves lapping on sand by Vincent van Gogh\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background\nA car moving slowly on an empty street, rainy evening\nA cat eating food out of a bowl\nA cat 
wearing sunglasses at a pool\nA confused panda in calculus class\nA cute fluffy panda eating Chinese food in a restaurant\nA cute happy Corgi playing in park, sunset\nA cute raccoon playing guitar in a boat on the ocean\nA happy fuzzy panda playing guitar nearby a campfire, snow mountain in the background\nA lightning striking atop of eiffel tower, dark clouds in the sky\nA modern art museum, with colorful paintings\nA panda cooking in the kitchen\nA panda playing on a swing set\nA polar bear is playing guitar\nA raccoon dressed in suit playing the trumpet, stage background\nA robot DJ is playing the turntable, in heavy raining futuristic tokyo rooftop cyberpunk night, sci-fi, fantasy\nA shark swimming in clear Caribbean ocean\nA super robot protecting city\nA teddy bear washing the dishes\nAn epic tornado attacking above a glowing city at night, the tornado is made of smoke\nAn oil painting of a couple in formal evening wear going home get caught in a heavy downpour with umbrellas\nClown fish swimming through the coral reef\nHyper-realistic spaceship landing on Mars\nThe bund Shanghai, vibrant color\nVincent van Gogh is painting in the room\nYellow flowers swing in the wind\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_dimension/scene.txt",
    "content": "alley\namusement park\naquarium\narch\nart gallery\nbathroom\nbakery shop\nballroom\nbar\nbarn\nbasement\nbeach\nbedroom\nbridge\nbotanical garden\ncafeteria\ncampsite\ncampus\ncarrousel\ncastle\ncemetery\nclassroom\ncliff\ncrosswalk\nconstruction site\ncorridor\ncourtyard\ndesert\ndowntown\ndriveway\nfarm\nfood court\nfootball field\nforest road\nfountain\ngas station\nglacier\ngolf course\nindoor gymnasium\nharbor\nhighway\nhospital\nhouse\niceberg\nindustrial area\njail cell\njunkyard\nkitchen\nindoor library\nlighthouse\nlaboratory\nmansion\nmarsh\nmountain\nindoor movie theater\nindoor museum\nmusic studio\nnursery\nocean\noffice\npalace\nparking lot\npharmacy\nphone booth\nraceway\nrestaurant\nriver\nscience museum\nshower\nski slope\nsky\nskyscraper\nbaseball stadium\nstaircase\nstreet\nsupermarket\nindoor swimming pool\ntower\noutdoor track\ntrain railway\ntrain station platform\nunderwater coral reef\nvalley\nvolcano\nwaterfall\nwindmill\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_dimension/spatial_relationship.txt",
    "content": "a bicycle on the left of a car, front view\na car on the right of a motorcycle, front view\na motorcycle on the left of a bus, front view\na bus on the right of a traffic light, front view\na traffic light on the left of a fire hydrant, front view\na fire hydrant on the right of a stop sign, front view\na stop sign on the left of a parking meter, front view\na parking meter on the right of a bench, front view\na bench on the left of a truck, front view\na truck on the right of a bicycle, front view\na bird on the left of a cat, front view\na cat on the right of a dog, front view\na dog on the left of a horse, front view\na horse on the right of a sheep, front view\na sheep on the left of a cow, front view\na cow on the right of an elephant, front view\nan elephant on the left of a bear, front view\na bear on the right of a zebra, front view\na zebra on the left of a giraffe, front view\na giraffe on the right of a bird, front view\na bottle on the left of a wine glass, front view\na wine glass on the right of a cup, front view\na cup on the left of a fork, front view\na fork on the right of a knife, front view\na knife on the left of a spoon, front view\na spoon on the right of a bowl, front view\na bowl on the left of a bottle, front view\na potted plant on the left of a remote, front view\na remote on the right of a clock, front view\na clock on the left of a vase, front view\na vase on the right of scissors, front view\nscissors on the left of a teddy bear, front view\na teddy bear on the right of a potted plant, front view\na frisbee on the left of a sports ball, front view\na sports ball on the right of a baseball bat, front view\na baseball bat on the left of a baseball glove, front view\na baseball glove on the right of a tennis racket, front view\na tennis racket on the left of a frisbee, front view\na toilet on the left of a hair drier, front view\na hair drier on the right of a toothbrush, front view\na toothbrush on the left of a sink, 
front view\na sink on the right of a toilet, front view\na chair on the left of a couch, front view\na couch on the right of a bed, front view\na bed on the left of a tv, front view\na tv on the right of a dining table, front view\na dining table on the left of a chair, front view\nan airplane on the left of a train, front view\na train on the right of a boat, front view\na boat on the left of an airplane, front view\nan oven on the top of a toaster, front view\nan oven on the bottom of a toaster, front view\na toaster on the top of a microwave, front view\na toaster on the bottom of a microwave, front view\na microwave on the top of an oven, front view\na microwave on the bottom of an oven, front view\na banana on the top of an apple, front view\na banana on the bottom of an apple, front view\nan apple on the top of a sandwich, front view\nan apple on the bottom of a sandwich, front view\na sandwich on the top of an orange, front view\na sandwich on the bottom of an orange, front view\nan orange on the top of a carrot, front view\nan orange on the bottom of a carrot, front view\na carrot on the top of a hot dog, front view\na carrot on the bottom of a hot dog, front view\na hot dog on the top of a pizza, front view\na hot dog on the bottom of a pizza, front view\na pizza on the top of a donut, front view\na pizza on the bottom of a donut, front view\na donut on the top of broccoli, front view\na donut on the bottom of broccoli, front view\nbroccoli on the top of a banana, front view\nbroccoli on the bottom of a banana, front view\nskis on the top of a snowboard, front view\nskis on the bottom of a snowboard, front view\na snowboard on the top of a kite, front view\na snowboard on the bottom of a kite, front view\na kite on the top of a skateboard, front view\na kite on the bottom of a skateboard, front view\na skateboard on the top of a surfboard, front view\na skateboard on the bottom of a surfboard, front view\na surfboard on the top of skis, front view\na 
surfboard on the bottom of skis, front view\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_dimension/subject_consistency.txt",
    "content": "a person swimming in ocean\na person giving a presentation to a room full of colleagues\na person washing the dishes\na person eating a burger\na person walking in the snowstorm\na person drinking coffee in a cafe\na person playing guitar\na bicycle leaning against a tree\na bicycle gliding through a snowy field\na bicycle slowing down to stop\na bicycle accelerating to gain speed\na car stuck in traffic during rush hour\na car turning a corner\na car slowing down to stop\na car accelerating to gain speed\na motorcycle cruising along a coastal highway\na motorcycle turning a corner\na motorcycle slowing down to stop\na motorcycle gliding through a snowy field\na motorcycle accelerating to gain speed\nan airplane soaring through a clear blue sky\nan airplane taking off\nan airplane landing smoothly on a runway\nan airplane accelerating to gain speed\na bus turning a corner\na bus stuck in traffic during rush hour\na bus accelerating to gain speed\na train speeding down the tracks\na train crossing over a tall bridge\na train accelerating to gain speed\na truck turning a corner\na truck anchored in a tranquil bay\na truck stuck in traffic during rush hour\na truck slowing down to stop\na truck accelerating to gain speed\na boat sailing smoothly on a calm lake\na boat slowing down to stop\na boat accelerating to gain speed\na bird soaring gracefully in the sky\na bird building a nest from twigs and leaves\na bird flying over a snowy forest\na cat grooming itself meticulously with its tongue\na cat playing in park\na cat drinking water\na cat running happily\na dog enjoying a peaceful walk\na dog playing in park\na dog drinking water\na dog running happily\na horse bending down to drink water from a river\na horse galloping across an open field\na horse taking a peaceful walk\na horse running to join a herd of its kind\na sheep bending down to drink water from a river\na sheep taking a peaceful walk\na sheep running to join a herd of its kind\na cow 
bending down to drink water from a river\na cow chewing cud while resting in a tranquil barn\na cow running to join a herd of its kind\nan elephant spraying itself with water using its trunk to cool down\nan elephant taking a peaceful walk\nan elephant running to join a herd of its kind\na bear catching a salmon in its powerful jaws\na bear sniffing the air for scents of food\na bear climbing a tree\na bear hunting for prey\na zebra bending down to drink water from a river\na zebra running to join a herd of its kind\na zebra taking a peaceful walk\na giraffe bending down to drink water from a river\na giraffe taking a peaceful walk\na giraffe running to join a herd of its kind\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_dimension/temporal_flickering.txt",
    "content": "In a still frame, a stop sign\na toilet, frozen in time\na laptop, frozen in time\nA tranquil tableau of alley\nA tranquil tableau of bar\nA tranquil tableau of barn\nA tranquil tableau of bathroom\nA tranquil tableau of bedroom\nA tranquil tableau of cliff\nIn a still frame, courtyard\nIn a still frame, gas station\nA tranquil tableau of house\nindoor gymnasium, frozen in time\nA tranquil tableau of indoor library\nA tranquil tableau of kitchen\nA tranquil tableau of palace\nIn a still frame, parking lot\nIn a still frame, phone booth\nA tranquil tableau of restaurant\nA tranquil tableau of tower\nA tranquil tableau of a bowl\nA tranquil tableau of an apple\nA tranquil tableau of a bench\nA tranquil tableau of a bed\nA tranquil tableau of a chair\nA tranquil tableau of a cup\nA tranquil tableau of a dining table\nIn a still frame, a pear\nA tranquil tableau of a bunch of grapes\nA tranquil tableau of a bowl on the kitchen counter\nA tranquil tableau of a beautiful, handcrafted ceramic bowl\nA tranquil tableau of an antique bowl\nA tranquil tableau of an exquisite mahogany dining table\nA tranquil tableau of a wooden bench in the park\nA tranquil tableau of a beautiful wrought-iron bench surrounded by blooming flowers\nIn a still frame, a park bench with a view of the lake\nA tranquil tableau of a vintage rocking chair was placed on the porch\nA tranquil tableau of the jail cell was small and dimly lit, with cold, steel bars\nA tranquil tableau of the phone booth was tucked away in a quiet alley\na dilapidated phone booth stood as a relic of a bygone era on the sidewalk, frozen in time\nA tranquil tableau of the old red barn stood weathered and iconic against the backdrop of the countryside\nA tranquil tableau of a picturesque barn was painted a warm shade of red and nestled in a picturesque meadow\nIn a still frame, within the desolate desert, an oasis unfolded, characterized by the stoic presence of palm trees and a motionless, glassy pool of 
water\nIn a still frame, the Parthenon's majestic Doric columns stand in serene solitude atop the Acropolis, framed by the tranquil Athenian landscape\nIn a still frame, the Temple of Hephaestus, with its timeless Doric grace, stands stoically against the backdrop of a quiet Athens\nIn a still frame, the ornate Victorian streetlamp stands solemnly, adorned with intricate ironwork and stained glass panels\nA tranquil tableau of the Stonehenge presented itself as an enigmatic puzzle, each colossal stone meticulously placed against the backdrop of tranquility\nIn a still frame, in the vast desert, an oasis nestled among dunes, featuring tall palm trees and an air of serenity\nstatic view on a desert scene with an oasis, palm trees, and a clear, calm pool of water\nA tranquil tableau of an ornate Victorian streetlamp standing on a cobblestone street corner, illuminating the empty night\nA tranquil tableau of a tranquil lakeside cabin nestled among tall pines, its reflection mirrored perfectly in the calm water\nIn a still frame, a vintage gas lantern, adorned with intricate details, gracing a historic cobblestone square\nIn a still frame, a tranquil Japanese tea ceremony room, with tatami mats, a delicate tea set, and a bonsai tree in the corner\nA tranquil tableau of the Parthenon stands resolute in its classical elegance, a timeless symbol of Athens' cultural legacy\nA tranquil tableau of in the heart of Plaka, the neoclassical architecture of the old city harmonizes with the ancient ruins\nA tranquil tableau of in the desolate beauty of the American Southwest, Chaco Canyon's ancient ruins whispered tales of an enigmatic civilization that once thrived amidst the arid landscapes\nA tranquil tableau of at the edge of the Arabian Desert, the ancient city of Petra beckoned with its enigmatic rock-carved façades\nIn a still frame, amidst the cobblestone streets, an Art Nouveau lamppost stood tall\nA tranquil tableau of in the quaint village square, a traditional 
wrought-iron streetlamp featured delicate filigree patterns and amber-hued glass panels\nA tranquil tableau of the lampposts were adorned with Art Deco motifs, their geometric shapes and frosted glass creating a sense of vintage glamour\nIn a still frame, in the picturesque square, a Gothic-style lamppost adorned with intricate stone carvings added a touch of medieval charm to the setting\nIn a still frame, in the heart of the old city, a row of ornate lantern-style streetlamps bathed the narrow alleyway in a warm, welcoming light\nA tranquil tableau of in the heart of the Utah desert, a massive sandstone arch spanned the horizon\nA tranquil tableau of in the Arizona desert, a massive stone bridge arched across a rugged canyon\nA tranquil tableau of in the corner of the minimalist tea room, a bonsai tree added a touch of nature's beauty to the otherwise simple and elegant space\nIn a still frame, amidst the hushed ambiance of the traditional tea room, a meticulously arranged tea set awaited, with porcelain cups, a bamboo whisk\nIn a still frame, nestled in the Zen garden, a rustic teahouse featured tatami seating and a traditional charcoal brazier\nA tranquil tableau of a country estate's library featured elegant wooden shelves\nA tranquil tableau of beneath the shade of a solitary oak tree, an old wooden park bench sat patiently\nA tranquil tableau of beside a tranquil pond, a weeping willow tree draped its branches gracefully over the water's surface, creating a serene tableau of reflection and calm\nA tranquil tableau of in the Zen garden, a perfectly raked gravel path led to a serene rock garden\nIn a still frame, a tranquil pond was fringed by weeping cherry trees, their blossoms drifting lazily onto the glassy surface\nIn a still frame, within the historic library's reading room, rows of antique leather chairs and mahogany tables offered a serene haven for literary contemplation\nA tranquil tableau of a peaceful orchid garden showcased a variety of delicate 
blooms\nA tranquil tableau of in the serene courtyard, a centuries-old stone well stood as a symbol of a bygone era, its mossy stones bearing witness to the passage of time\n"
  },
  {
    "path": "Open-Sora/assets/texts/VBench/prompts_per_dimension/temporal_style.txt",
    "content": "A beautiful coastal beach in spring, waves lapping on sand, in super slow motion\nA beautiful coastal beach in spring, waves lapping on sand, zoom in\nA beautiful coastal beach in spring, waves lapping on sand, zoom out\nA beautiful coastal beach in spring, waves lapping on sand, pan left\nA beautiful coastal beach in spring, waves lapping on sand, pan right\nA beautiful coastal beach in spring, waves lapping on sand, tilt up\nA beautiful coastal beach in spring, waves lapping on sand, tilt down\nA beautiful coastal beach in spring, waves lapping on sand, with an intense shaking effect\nA beautiful coastal beach in spring, waves lapping on sand, featuring a steady and smooth perspective\nA beautiful coastal beach in spring, waves lapping on sand, racking focus\nThe bund Shanghai, in super slow motion\nThe bund Shanghai, zoom in\nThe bund Shanghai, zoom out\nThe bund Shanghai, pan left\nThe bund Shanghai, pan right\nThe bund Shanghai, tilt up\nThe bund Shanghai, tilt down\nThe bund Shanghai, with an intense shaking effect\nThe bund Shanghai, featuring a steady and smooth perspective\nThe bund Shanghai, racking focus\na shark is swimming in the ocean, in super slow motion\na shark is swimming in the ocean, zoom in\na shark is swimming in the ocean, zoom out\na shark is swimming in the ocean, pan left\na shark is swimming in the ocean, pan right\na shark is swimming in the ocean, tilt up\na shark is swimming in the ocean, tilt down\na shark is swimming in the ocean, with an intense shaking effect\na shark is swimming in the ocean, featuring a steady and smooth perspective\na shark is swimming in the ocean, racking focus\nA panda drinking coffee in a cafe in Paris, in super slow motion\nA panda drinking coffee in a cafe in Paris, zoom in\nA panda drinking coffee in a cafe in Paris, zoom out\nA panda drinking coffee in a cafe in Paris, pan left\nA panda drinking coffee in a cafe in Paris, pan right\nA panda drinking coffee in a cafe in Paris, tilt up\nA 
panda drinking coffee in a cafe in Paris, tilt down\nA panda drinking coffee in a cafe in Paris, with an intense shaking effect\nA panda drinking coffee in a cafe in Paris, featuring a steady and smooth perspective\nA panda drinking coffee in a cafe in Paris, racking focus\nA cute happy Corgi playing in park, sunset, in super slow motion\nA cute happy Corgi playing in park, sunset, zoom in\nA cute happy Corgi playing in park, sunset, zoom out\nA cute happy Corgi playing in park, sunset, pan left\nA cute happy Corgi playing in park, sunset, pan right\nA cute happy Corgi playing in park, sunset, tilt up\nA cute happy Corgi playing in park, sunset, tilt down\nA cute happy Corgi playing in park, sunset, with an intense shaking effect\nA cute happy Corgi playing in park, sunset, featuring a steady and smooth perspective\nA cute happy Corgi playing in park, sunset, racking focus\nGwen Stacy reading a book, in super slow motion\nGwen Stacy reading a book, zoom in\nGwen Stacy reading a book, zoom out\nGwen Stacy reading a book, pan left\nGwen Stacy reading a book, pan right\nGwen Stacy reading a book, tilt up\nGwen Stacy reading a book, tilt down\nGwen Stacy reading a book, with an intense shaking effect\nGwen Stacy reading a book, featuring a steady and smooth perspective\nGwen Stacy reading a book, racking focus\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, in super slow motion\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, zoom in\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, zoom out\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, pan left\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, pan right\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, tilt up\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, tilt 
down\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, with an intense shaking effect\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, featuring a steady and smooth perspective\nA boat sailing leisurely along the Seine River with the Eiffel Tower in background, racking focus\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, in super slow motion\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, zoom in\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, zoom out\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, pan left\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, pan right\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, tilt up\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, tilt down\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, with an intense shaking effect\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, featuring a steady and smooth perspective\nA couple in formal evening wear going home get caught in a heavy downpour with umbrellas, racking focus\nAn astronaut flying in space, in super slow motion\nAn astronaut flying in space, zoom in\nAn astronaut flying in space, zoom out\nAn astronaut flying in space, pan left\nAn astronaut flying in space, pan right\nAn astronaut flying in space, tilt up\nAn astronaut flying in space, tilt down\nAn astronaut flying in space, with an intense shaking effect\nAn astronaut flying in space, featuring a steady and smooth perspective\nAn astronaut flying in space, racking focus\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. 
the canyons twist and bend through the high elevated mountain peaks, in super slow motion\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, zoom in\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, zoom out\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, pan left\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, pan right\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, tilt up\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, tilt down\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, with an intense shaking effect\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, featuring a steady and smooth perspective\nSnow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, racking focus\n"
  },
  {
    "path": "Open-Sora/assets/texts/imagenet_id.txt",
    "content": "207\n360\n387\n974\n88\n979\n417\n279\n"
  },
  {
    "path": "Open-Sora/assets/texts/imagenet_labels.txt",
    "content": "golden retriever\notter\nlesser panda\ngeyser\nmacaw\nvalley\nballoon\ngolden panda\n"
  },
  {
    "path": "Open-Sora/assets/texts/rand_types.txt",
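    "content": "random movie shot\nrandom movie shot\nrandom movie shot\nrandom movie shot\nrandom movie shot\nrandom task shot\nrandom task shot\nrandom task shot\nrandom task shot\nrandom task shot\nrandom game shot\nrandom game shot\nrandom game shot\nrandom game shot\nrandom game shot\nrandom driving shot\nrandom driving shot\nrandom driving shot\nrandom driving shot\nrandom driving shot\nrandom animal shot\nrandom animal shot\nrandom animal shot\nrandom animal shot\nrandom animal shot\nrandom forest shot\nrandom forest shot\nrandom forest shot\nrandom forest shot\nrandom forest shot\nrandom anime shot\nrandom anime shot\nrandom anime shot\nrandom anime shot\nrandom anime shot\nrandom dance shot\nrandom dance shot\nrandom dance shot\nrandom dance shot\nrandom dance shot\n"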
  },
  {
    "path": "Open-Sora/assets/texts/t2i_samples.txt",
    "content": "A small cactus with a happy face in the Sahara desert.\nBright scene, aerial view, ancient city, fantasy, gorgeous light, mirror reflection, high detail, wide angle lens.\nNature vs human nature, surreal, UHD, 8k, hyper details, rich colors, photograph.\nPoster of a mechanical cat, technical schematics viewed from front.\nLuffy from ONEPIECE, handsome face, fantasy.\nReal beautiful woman.\nAn alpaca made of colorful building blocks, cyberpunk.\nartistic\n"
  },
  {
    "path": "Open-Sora/assets/texts/t2i_sigma.txt",
    "content": "Eiffel Tower was made up of more than 2 million translucent straws to look like a cloud, with the bell tower at the top of the building, Michel installed huge foam-making machines in the forest to blow huge amounts of unpredictable wet clouds in the building's classic architecture.\nA gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.\nFull body shot, a French woman, Photography, French Streets background, backlighting, rim light, Fujifilm.\nClose-up photos of models, hazy light and shadow, laser metal hair accessories, soft and beautiful, light gold pupils, white eyelashes, low saturation, real skin details, clear pores and fine lines, light reflection and refraction, ultra-clear, cinematography, award-winning works.\nA litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in.\nLego model, future rocket station, intricate details, high resolution, unreal engine, UHD\nOne giant, sharp, metal square mirror in the center of the frame, four young people on the foreground, background sunny palm oil plantation, tropical, realistic style, photography, nostalgic, green tone, mysterious, dreamy, bright color.\nModern luxury contemporary luxury home interiors house, in the style of mimicking ruined materials, ray tracing, haunting houses, and stone, capture the essence of nature, gray and bronze, dynamic outdoor shots.\nOver the shoulder game perspective, game screen of Diablo 4, Inside the gorgeous palace is the wet ground, The necromancer knelt before the king, and a horde of skeletons he summoned stood at his side, cinematic light.\nA curvy timber house near a sea, designed by Zaha Hadid, represent the image of a cold, modern architecture, at night, white lighting, highly detailed.\n"
  },
  {
    "path": "Open-Sora/assets/texts/t2v_car.txt",
    "content": "|0|A car driving in the forest.|2|A car driving in the desert.|4|A car driving near the coast.|6|A car driving in the city.|8|A car driving near a mountain.|10|A car driving on the surface of a river.|12|A car driving on the surface of the earth.|14|A car driving in the universe.{\"reference_path\": \"https://cdn.openai.com/tmp/s/interp/d0.mp4\", \"mask_strategy\": \"0,0,0,0,16,0.4\"}\n"
  },
  {
    "path": "Open-Sora/assets/texts/t2v_latte.txt",
    "content": "Yellow and black tropical fish dart through the sea.\nAn epic tornado attacking above a glowing city at night.\nSlow pan upward of blazing oak fire in an indoor fireplace.\na cat wearing sunglasses and working as a lifeguard at a pool.\nSunset over the sea.\nA dog in an astronaut suit and sunglasses floating in space.\nAn astronaut flying in space, 4k, high resolution\n"
  },
  {
    "path": "Open-Sora/assets/texts/t2v_pllava.txt",
    "content": "a close-up shot of a woman standing in a room with a white wall and a plant on the left side. the woman has curly hair and is wearing a green tank top. she is looking to the side with a neutral expression on her face. the lighting in the room is soft and appears to be natural, coming from the left side of the frame. the focus is on the woman, with the background being out of focus. there are no texts or other objects in the video. the style of the video is a simple, candid portrait with a shallow depth of field.\na serene scene of a pond filled with water lilies. the water is a deep blue, providing a striking contrast to the pink and white flowers that float on its surface. the flowers, in full bloom, are the main focus of the video. they are scattered across the pond, with some closer to the camera and others further away, creating a sense of depth. the pond is surrounded by lush greenery, adding a touch of nature to the scene. the video is taken from a low angle, looking up at the flowers, which gives a unique perspective and emphasizes their beauty. the overall composition of the video suggests a peaceful and tranquil setting, likely a garden or a park.\na professional setting where a woman is presenting a slide from a presentation. she is standing in front of a projector screen, which displays a bar chart. the chart is colorful, with bars of different heights, indicating some sort of data comparison. the woman is holding a pointer, which she uses to highlight specific parts of the chart. she is dressed in a white blouse and black pants, and her hair is styled in a bun. the room has a modern design, with a sleek black floor and a white ceiling. the lighting is bright, illuminating the woman and the projector screen. the focus of the image is on the woman and the projector screen, with the background being out of focus. there are no texts visible in the image. 
the relative positions of the objects suggest that the woman is the main subject of the image, and the projector screen is the object of her attention. the image does not provide any information about the content of the presentation or the context of the meeting.\na bustling city street from the perspective of a car. the car, a sleek black sedan, is in motion, driving down the street. the dashboard of the car is visible in the foreground, providing a view of the road ahead. the street is lined with parked cars on both sides, their colors muted in the bright sunlight. buildings rise on either side of the street, their windows reflecting the sunlight. the sky above is a clear blue, and the sun is shining brightly, casting a warm glow on the scene. the street is busy with pedestrians and other vehicles, adding to the dynamic nature of the scene. the video does not contain any text. the relative positions of the objects suggest a typical city street scene with the car in the foreground, the parked cars on either side, and the buildings in the background. the sunlight illuminates the scene, highlighting the colors and details of the objects. the pedestrians and other vehicles are in motion, adding a sense of life and activity to the scene. the buildings provide a sense of depth and scale to the image. the video does not contain any text or countable objects. the\na serene scene in a park. the sun is shining brightly, casting a warm glow on the lush green trees and the grassy field. the camera is positioned low, looking up at the towering trees, which are the main focus of the image. the trees are dense and full of leaves, creating a canopy of green that fills the frame. the sunlight filters through the leaves, creating a beautiful pattern of light and shadow on the ground. the overall atmosphere of the video is peaceful and tranquil, evoking a sense of calm and relaxation.\na moment in a movie theater. 
a couple is seated in the middle of the theater, engrossed in the movie they are watching. the man is dressed in a casual outfit, complete with a pair of sunglasses, while the woman is wearing a cozy sweater. they are seated on a red theater seat, which stands out against the dark surroundings. the theater itself is dimly lit, with the screen displaying the movie they are watching. the couple appears to be enjoying the movie, their attention completely absorbed by the on-screen action. the theater is mostly empty, with only a few other seats visible in the background. the video does not contain any text or additional objects. the relative positions of the objects are such that the couple is in the foreground, while the screen and the other seats are in the background. the focus of the video is clearly on the couple and their shared experience of watching a movie in a theater.\na scene where a person is examining a dog. the person is wearing a blue shirt with the word \"volunteer\" printed on it. the dog is lying on its side, and the person is using a stethoscope to listen to the dog's heartbeat. the dog appears to be a golden retriever and is looking directly at the camera. the background is blurred, but it seems to be an indoor setting with a white wall. the person's focus is on the dog, and they seem to be checking its health. the dog's expression is calm, and it seems to be comfortable with the person's touch. the overall atmosphere of the video is calm and professional.\na close-up shot of a woman applying makeup. she is using a black brush to apply a dark powder to her face. the woman has blonde hair and is wearing a black top. the background is black, which contrasts with her skin tone and the makeup. the focus is on her face and the brush, with the rest of her body and the background being out of focus. the lighting is soft and even, highlighting the texture of the makeup and the woman's skin. there are no texts or other objects in the video. 
the woman's expression is neutral, and she is looking directly at the camera. the video does not contain any action, as it is a still shot of a woman applying makeup. the relative position of the woman and the brush is such that the brush is in her hand and is being used to apply the makeup to her face. the video does not contain any other objects or actions. the woman is the only person in the video, and she is the main subject. the video does not contain any sound. the description is based on the visible content of the video and does not include any assumptions or interpretations.\na young woman is seated in a black gaming chair in a room filled with computer monitors and other gaming equipment. she is wearing a red tank top and black pants, and her hair is styled in loose waves. the room is dimly lit, with the glow of the monitors casting a soft light on her face. she is holding a black game controller in her hands, and her attention is focused on the screen in front of her. the room is filled with other gaming equipment, including keyboards and mice, and there are other chairs and desks scattered around the room. the woman appears to be engrossed in her game, her posture relaxed yet focused. the room is quiet, the only sound coming from the beeps and boops of the game. the woman is the only person in the room, adding a sense of solitude to the scene. the video does not contain any text. the relative positions of the objects suggest a well-organized gaming setup, with the woman at the center, surrounded by her gaming equipment. the video does not contain any action, but the woman's focused expression suggests that she is in the middle of an intense g\na breathtaking aerial view of a coastal landscape at sunset. the sky, painted in hues of orange and pink, serves as a stunning backdrop to the scene. the sun, partially obscured by the horizon, casts a warm glow on the landscape below. 
the foreground of the image is dominated by a rocky cliff, its rugged surface adding a touch of raw beauty to the scene. the cliff's edge is adorned with patches of green vegetation, providing a stark contrast to the otherwise barren landscape. the middle ground of the image reveals a winding road that hugs the coastline. the road, appearing as a thin line against the vast expanse of the landscape, guides the viewer's eye towards the horizon. in the background, the silhouette of mountains can be seen, their peaks shrouded in a light mist. the mountains, along with the road, add depth to the image, creating a sense of distance and scale. overall, the video presents a serene and majestic coastal landscape, captured at the perfect moment of sunset. the colors\n"
  },
  {
    "path": "Open-Sora/assets/texts/t2v_ref.txt",
    "content": "Drone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff's edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.\nIn an ornate, historical hall, a massive tidal wave peaks and begins to crash. Two surfers, seizing the moment, skillfully navigate the face of the wave.\nPirate ship in a cosmic maelstrom nebula.\nDrone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff's edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.\nA sad small cactus with in the Sahara desert becomes happy.\nA car driving on a road in the middle of a desert.\n"
  },
  {
    "path": "Open-Sora/assets/texts/t2v_samples.txt",
    "content": "A soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color and against the vibrant turquoise of the sea. Seabirds can be seen taking flight around the cliff's precipices. As the drone slowly moves from different angles, the changing sunlight casts shifting shadows that highlight the rugged textures of the cliff and the surrounding calm sea. The water gently laps at the rock base and the greenery that clings to the top of the cliff, and the scene gives a sense of peaceful isolation at the fringes of the ocean. The video captures the essence of pristine natural beauty untouched by human structures.\nA majestic beauty of a waterfall cascading down a cliff into a serene lake. The waterfall, with its powerful flow, is the central focus of the video. The surrounding landscape is lush and green, with trees and foliage adding to the natural beauty of the scene. The camera angle provides a bird's eye view of the waterfall, allowing viewers to appreciate the full height and grandeur of the waterfall. The video is a stunning representation of nature's power and beauty.\nA vibrant scene of a snowy mountain landscape. The sky is filled with a multitude of colorful hot air balloons, each floating at different heights, creating a dynamic and lively atmosphere. The balloons are scattered across the sky, some closer to the viewer, others further away, adding depth to the scene.  Below, the mountainous terrain is blanketed in a thick layer of snow, with a few patches of bare earth visible here and there. The snow-covered mountains provide a stark contrast to the colorful balloons, enhancing the visual appeal of the scene.  In the foreground, a few cars can be seen driving along a winding road that cuts through the mountains. The cars are small compared to the vastness of the landscape, emphasizing the grandeur of the surroundings.  
The overall style of the video is a mix of adventure and tranquility, with the hot air balloons adding a touch of whimsy to the otherwise serene mountain landscape. The video is likely shot during the day, as the lighting is bright and even, casting soft shadows on the snow-covered mountains.\nThe vibrant beauty of a sunflower field. The sunflowers, with their bright yellow petals and dark brown centers, are in full bloom, creating a stunning contrast against the green leaves and stems. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. The sun is shining brightly, casting a warm glow on the flowers and highlighting their intricate details. The video is shot from a low angle, looking up at the sunflowers, which adds a sense of grandeur and awe to the scene. The sunflowers are the main focus of the video, with no other objects or people present. The video is a celebration of nature's beauty and the simple joy of a sunny day in the countryside.\nA serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell, is the main focus of the video, swimming gracefully towards the right side of the frame. The coral reef, teeming with life, is visible in the background, providing a vibrant and colorful backdrop to the turtle's journey. Several small fish, darting around the turtle, add a sense of movement and dynamism to the scene. The video is shot from a slightly elevated angle, providing a comprehensive view of the turtle's surroundings. The overall style of the video is calm and peaceful, capturing the beauty and tranquility of the underwater world.\nA vibrant underwater scene. A group of blue fish, with yellow fins, are swimming around a coral reef. The coral reef is a mix of brown and green, providing a natural habitat for the fish. The water is a deep blue, indicating a depth of around 30 feet. 
The fish are swimming in a circular pattern around the coral reef, indicating a sense of motion and activity. The overall scene is a beautiful representation of marine life.\nA bustling city street at night, filled with the glow of car headlights and the ambient light of streetlights. The scene is a blur of motion, with cars speeding by and pedestrians navigating the crosswalks. The cityscape is a mix of towering buildings and illuminated signs, creating a vibrant and dynamic atmosphere. The perspective of the video is from a high angle, providing a bird's eye view of the street and its surroundings. The overall style of the video is dynamic and energetic, capturing the essence of urban life at night.\nA snowy forest landscape with a dirt road running through it. The road is flanked by trees covered in snow, and the ground is also covered in snow. The sun is shining, creating a bright and serene atmosphere. The road appears to be empty, and there are no people or animals visible in the video. The style of the video is a natural landscape shot, with a focus on the beauty of the snowy forest and the peacefulness of the road.\nThe dynamic movement of tall, wispy grasses swaying in the wind. The sky above is filled with clouds, creating a dramatic backdrop. The sunlight pierces through the clouds, casting a warm glow on the scene. The grasses are a mix of green and brown, indicating a change in seasons. The overall style of the video is naturalistic, capturing the beauty of the landscape in a realistic manner. The focus is on the grasses and their movement, with the sky serving as a secondary element. The video does not contain any human or animal elements.\nA serene night scene in a forested area. The first frame shows a tranquil lake reflecting the star-filled sky above. The second frame reveals a beautiful sunset, casting a warm glow over the landscape. The third frame showcases the night sky, filled with stars and a vibrant Milky Way galaxy. 
The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. The style of the video is naturalistic, emphasizing the beauty of the night sky and the peacefulness of the forest.\n"
  },
  {
    "path": "Open-Sora/assets/texts/t2v_short.txt",
    "content": "A fat rabbit wearing a purple robe walking through a fantasy landscape\nWaves crashing against a lone lighthouse, ominous lighting\nA mystical forest showcasing the adventures of travelers who enter\nA blue-haired mage singing\nA surreal landscape with floating islands and waterfalls in the sky craft\nA blue bird standing in water\nA young man walks alone by the seaside\nPink rose on a glass surface with droplets, close-up\nDrone viewpoint, a subway train coming out of a tunnel\nSpace with all planets green and pink color with background of bright white stars\nA city floating in an astral space, with stars and nebulae\nSunrise on top of a high-rise building\nPink and cyan powder explosions\nDeer in the woods gaze into the camera under the sunlight\nIn a flash of lightning, a wizard appeared from thin air, his long robes billowing in the wind\nA futuristic cyberpunk cityscape at night with towering neon-lit skyscrapers\nA scene where the trees, flowers, and animals come together to create a symphony of nature\nA ghostly ship sailing through the clouds, navigating through a sea under a moonlit sky\nA sunset with a beautiful beach\nA young man walking alone in the forest\n"
  },
  {
    "path": "Open-Sora/assets/texts/t2v_sora.txt",
    "content": "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.\nSeveral giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.\nA movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.\nDrone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.\nAnimated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. 
The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.\nA gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.\nThis close-up shot of a Victoria crowned pigeon showcases its striking blue plumage and red chest. Its crest is made of delicate, lacy feathers, while its eye is a striking red color. The bird’s head is tilted slightly to the side, giving the impression of it looking regal and majestic. The background is blurred, drawing attention to the bird’s striking appearance.\nPhotorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.\nA young man at his 20s is sitting on a piece of cloud in the sky, reading a book.\nHistorical footage of California during the gold rush.\nA close up view of a glass sphere that has a zen garden within it. There is a small dwarf in the sphere who is raking the zen garden and creating patterns in the sand.\nExtreme close up of a 24 year old woman’s eye blinking, standing in Marrakech during magic hour, cinematic film shot in 70mm, depth of field, vivid colors, cinematic\nA cartoon kangaroo disco dances.\nA beautiful homemade video showing the people of Lagos, Nigeria in the year 2056. Shot with a mobile phone camera.\nA petri dish with a bamboo forest growing within it that has tiny red pandas running around.\nThe camera rotates around a large stack of vintage televisions all showing different programs — 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery.\n3D animation of a small, round, fluffy creature with big, expressive eyes explores a vibrant, enchanted forest. The creature, a whimsical blend of a rabbit and a squirrel, has soft blue fur and a bushy, striped tail. It hops along a sparkling stream, its eyes wide with wonder. 
The forest is alive with magical elements: flowers that glow and change colors, trees with leaves in shades of purple and silver, and small floating lights that resemble fireflies. The creature stops to interact playfully with a group of tiny, fairy-like beings dancing around a mushroom ring. The creature looks up in awe at a large, glowing tree that seems to be the heart of the forest.\nThe camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it’s tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.\nReflections in the window of a train traveling through the Tokyo suburbs.\nA drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.\nA large orange octopus is seen resting on the bottom of the ocean floor, blending in with the sandy and rocky terrain. Its tentacles are spread out around its body, and its eyes are closed. 
The octopus is unaware of a king crab that is crawling towards it from behind a rock, its claws raised and ready to attack. The crab is brown and spiny, with long legs and antennae. The scene is captured from a wide angle, showing the vastness and depth of the ocean. The water is clear and blue, with rays of sunlight filtering through. The shot is sharp and crisp, with a high dynamic range. The octopus and the crab are in focus, while the background is slightly blurred, creating a depth of field effect.\nA flock of paper airplanes flutters through a dense jungle, weaving around trees as if they were migrating birds.\nA cat waking up its sleeping owner demanding breakfast. The owner tries to ignore the cat, but the cat tries new tactics and finally the owner pulls out a secret stash of treats from under the pillow to hold the cat off a little longer.\nBorneo wildlife on the Kinabatangan River\nA Chinese Lunar New Year celebration video with Chinese Dragon.\nTour of an art gallery with many beautiful works of art in different styles.\nBeautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. 
Gorgeous sakura petals are flying through the wind along with snowflakes.\nA stop motion animation of a flower growing out of the windowsill of a suburban house.\nThe story of a robot’s life in a cyberpunk setting.\nAn extreme close-up of an gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat with a button-down shirt , he wears a brown beret and glasses and has a very professorial appearance, and the end he offers a subtle closed-mouth smile as if he found the answer to the mystery of life, the lighting is very cinematic with the golden light and the Parisian streets and city in the background, depth of field, cinematic 35mm film.\nA beautiful silhouette animation shows a wolf howling at the moon, feeling lonely, until it finds its pack.\nNew York City submerged like Atlantis. Fish, whales, sea turtles and sharks swim through the streets of New York.\nA litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in.\nStep-printing scene of a person running, cinematic film shot in 35mm.\nFive gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping at each other, playing.\nBasketball through hoop then explodes.\nArcheologists discover a generic plastic chair in the desert, excavating and dusting it with great care.\nA grandmother with neatly combed grey hair stands behind a colorful birthday cake with numerous candles at a wood dining room table, expression is one of pure joy and happiness, with a happy glow in her eye. 
She leans forward and blows out the candles with a gentle puff, the cake has pink frosting and sprinkles and the candles cease to flicker, the grandmother wears a light blue blouse adorned with floral patterns, several happy friends and family sitting at the table can be seen celebrating, out of focus. The scene is beautifully captured, cinematic, showing a 3/4 view of the grandmother and the dining room. Warm color tones and soft lighting enhance the mood.\nThe camera directly faces colorful buildings in Burano Italy. An adorable dalmation looks through a window on a building on the ground floor. Many people are walking and cycling along the canal streets in front of the buildings.\nAn adorable happy otter confidently stands on a surfboard wearing a yellow lifejacket, riding along turquoise tropical waters near lush tropical islands, 3D digital render art style.\nThis close-up shot of a chameleon showcases its striking color changing capabilities. The background is blurred, drawing attention to the animal’s striking appearance.\nA corgi vlogging itself in tropical Maui.\nA white and orange tabby cat is seen happily darting through a dense garden, as if chasing something. Its eyes are wide and happy as it jogs forward, scanning the branches, flowers, and leaves as it walks. The path is narrow as it makes its way between all the plants. the scene is captured from a ground-level angle, following the cat closely, giving a low and intimate perspective. The image is cinematic with warm tones and a grainy texture. The scattered daylight between the leaves and plants above creates a warm contrast, accentuating the cat’s orange fur. The shot is clear and sharp, with a shallow depth of field.\nAerial view of Santorini during the blue hour, showcasing the stunning architecture of white Cycladic buildings with blue domes. 
The caldera views are breathtaking, and the lighting creates a beautiful, serene atmosphere.\nTiltshift of a construction site filled with workers, equipment, and heavy machinery.\nA giant, towering cloud in the shape of a man looms over the earth. The cloud man shoots lighting bolts down to the earth.\nA Samoyed and a Golden Retriever dog are playfully romping through a futuristic neon city at night. The neon lights emitted from the nearby buildings glistens off of their fur.\nThe Glenfinnan Viaduct is a historic railway bridge in Scotland, UK, that crosses over the west highland line between the towns of Mallaig and Fort William. It is a stunning sight as a steam train leaves the bridge, traveling over the arch-covered viaduct. The landscape is dotted with lush greenery and rocky mountains, creating a picturesque backdrop for the train journey. The sky is blue and the sun is shining, making for a beautiful day to explore this majestic spot.\n"
  },
  {
    "path": "Open-Sora/assets/texts/ucf101_id.txt",
    "content": "0\n1\n2\n3\n4\n5\n"
  },
  {
    "path": "Open-Sora/assets/texts/ucf101_labels.txt",
    "content": "Apply Eye Makeup\nApply Lipstick\nArchery\nBaby Crawling\nBalance Beam\nBand Marching\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/acceleration/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/opensora/acceleration/checkpoint.py",
    "content": "from collections.abc import Iterable\n\nimport torch.nn as nn\nfrom torch.utils.checkpoint import checkpoint, checkpoint_sequential\n\n\ndef set_grad_checkpoint(model, use_fp32_attention=False, gc_step=1):\n    assert isinstance(model, nn.Module)\n\n    def set_attr(module):\n        module.grad_checkpointing = True\n        module.fp32_attention = use_fp32_attention\n        module.grad_checkpointing_step = gc_step\n\n    model.apply(set_attr)\n\n\ndef auto_grad_checkpoint(module, *args, **kwargs):\n    if getattr(module, \"grad_checkpointing\", False):\n        if not isinstance(module, Iterable):\n            return checkpoint(module, *args, use_reentrant=False, **kwargs)\n        gc_step = module[0].grad_checkpointing_step\n        return checkpoint_sequential(module, gc_step, *args, use_reentrant=False, **kwargs)\n    return module(*args, **kwargs)\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/acceleration/communications.py",
    "content": "import torch\nimport torch.distributed as dist\n\n\n# ====================\n# All-To-All\n# ====================\ndef _all_to_all(\n    input_: torch.Tensor,\n    world_size: int,\n    group: dist.ProcessGroup,\n    scatter_dim: int,\n    gather_dim: int,\n):\n    input_list = [t.contiguous() for t in torch.tensor_split(input_, world_size, scatter_dim)]\n    output_list = [torch.empty_like(input_list[0]) for _ in range(world_size)]\n    dist.all_to_all(output_list, input_list, group=group)\n    return torch.cat(output_list, dim=gather_dim).contiguous()\n\n\nclass _AllToAll(torch.autograd.Function):\n    \"\"\"All-to-all communication.\n\n    Args:\n        input_: input matrix\n        process_group: communication group\n        scatter_dim: scatter dimension\n        gather_dim: gather dimension\n    \"\"\"\n\n    @staticmethod\n    def forward(ctx, input_, process_group, scatter_dim, gather_dim):\n        ctx.process_group = process_group\n        ctx.scatter_dim = scatter_dim\n        ctx.gather_dim = gather_dim\n        ctx.world_size = dist.get_world_size(process_group)\n        output = _all_to_all(input_, ctx.world_size, process_group, scatter_dim, gather_dim)\n        return output\n\n    @staticmethod\n    def backward(ctx, grad_output):\n        grad_output = _all_to_all(\n            grad_output,\n            ctx.world_size,\n            ctx.process_group,\n            ctx.gather_dim,\n            ctx.scatter_dim,\n        )\n        return (\n            grad_output,\n            None,\n            None,\n            None,\n        )\n\n\ndef all_to_all(\n    input_: torch.Tensor,\n    process_group: dist.ProcessGroup,\n    scatter_dim: int = 2,\n    gather_dim: int = 1,\n):\n    return _AllToAll.apply(input_, process_group, scatter_dim, gather_dim)\n\n\ndef _gather(\n    input_: torch.Tensor,\n    world_size: int,\n    group: dist.ProcessGroup,\n    gather_dim: int,\n):\n    # NOTE: this definition is shadowed by the `_gather(input_, pg, dim)` defined below, so it is never called.\n    # The original body read `gather_list` before assigning it and passed an unsupported `gather_dim`\n    # keyword to `dist.gather`; `dist.all_gather` is used here instead so every rank receives the list.\n    # `gather_dim` is kept only for signature compatibility and is unused.\n    gather_list = 
[torch.empty_like(input_) for _ in range(world_size)]\n    dist.all_gather(gather_list, input_, group=group)\n    return gather_list\n\n\n# ====================\n# Gather-Split\n# ====================\n\n\ndef _split(input_, pg: dist.ProcessGroup, dim=-1):\n    # skip if only one rank involved\n    world_size = dist.get_world_size(pg)\n    rank = dist.get_rank(pg)\n    if world_size == 1:\n        return input_\n\n    # Split along last dimension.\n    dim_size = input_.size(dim)\n    assert dim_size % world_size == 0, (\n        f\"The dimension to split ({dim_size}) is not a multiple of world size ({world_size}), \"\n        f\"cannot split tensor evenly\"\n    )\n\n    tensor_list = torch.split(input_, dim_size // world_size, dim=dim)\n    output = tensor_list[rank].contiguous()\n\n    return output\n\n\ndef _gather(input_, pg: dist.ProcessGroup, dim=-1):\n    # skip if only one rank involved\n    input_ = input_.contiguous()\n    world_size = dist.get_world_size(pg)\n    dist.get_rank(pg)\n\n    if world_size == 1:\n        return input_\n\n    # all gather\n    tensor_list = [torch.empty_like(input_) for _ in range(world_size)]\n    assert input_.device.type == \"cuda\"\n    torch.distributed.all_gather(tensor_list, input_, group=pg)\n\n    # concat\n    output = torch.cat(tensor_list, dim=dim).contiguous()\n\n    return output\n\n\nclass _GatherForwardSplitBackward(torch.autograd.Function):\n    \"\"\"Gather the input from model parallel region and concatenate.\n\n    Args:\n        input_: input matrix.\n        process_group: parallel mode.\n        dim: dimension\n    \"\"\"\n\n    @staticmethod\n    def symbolic(graph, input_):\n        return _gather(input_)\n\n    @staticmethod\n    def forward(ctx, input_, process_group, dim, grad_scale):\n        ctx.mode = process_group\n        ctx.dim = dim\n        ctx.grad_scale = grad_scale\n        return _gather(input_, process_group, dim)\n\n    @staticmethod\n    def backward(ctx, 
grad_output):\n        if ctx.grad_scale == \"up\":\n            grad_output = grad_output * dist.get_world_size(ctx.mode)\n        elif ctx.grad_scale == \"down\":\n            grad_output = grad_output / dist.get_world_size(ctx.mode)\n\n        return _split(grad_output, ctx.mode, ctx.dim), None, None, None\n\n\nclass _SplitForwardGatherBackward(torch.autograd.Function):\n    \"\"\"\n    Split the input and keep only the chunk corresponding to this rank.\n\n    Args:\n        input_: input matrix.\n        process_group: parallel mode.\n        dim: dimension\n        grad_scale: \"up\" multiplies and \"down\" divides the gradient by the world size; any other value leaves it unscaled.\n    \"\"\"\n\n    @staticmethod\n    def symbolic(graph, input_):\n        return _split(input_)\n\n    @staticmethod\n    def forward(ctx, input_, process_group, dim, grad_scale):\n        ctx.mode = process_group\n        ctx.dim = dim\n        ctx.grad_scale = grad_scale\n        return _split(input_, process_group, dim)\n\n    @staticmethod\n    def backward(ctx, grad_output):\n        if ctx.grad_scale == \"up\":\n            grad_output = grad_output * dist.get_world_size(ctx.mode)\n        elif ctx.grad_scale == \"down\":\n            grad_output = grad_output / dist.get_world_size(ctx.mode)\n        return _gather(grad_output, ctx.mode, ctx.dim), None, None, None\n\n\ndef split_forward_gather_backward(input_, process_group, dim, grad_scale=None):\n    return _SplitForwardGatherBackward.apply(input_, process_group, dim, grad_scale)\n\n\ndef gather_forward_split_backward(input_, process_group, dim, grad_scale=None):\n    return _GatherForwardSplitBackward.apply(input_, process_group, dim, grad_scale)\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/acceleration/parallel_states.py",
    "content": "import torch.distributed as dist\n\n_GLOBAL_PARALLEL_GROUPS = dict()\n\n\ndef set_data_parallel_group(group: dist.ProcessGroup):\n    _GLOBAL_PARALLEL_GROUPS[\"data\"] = group\n\n\ndef get_data_parallel_group():\n    return _GLOBAL_PARALLEL_GROUPS.get(\"data\", dist.group.WORLD)\n\n\ndef set_sequence_parallel_group(group: dist.ProcessGroup):\n    _GLOBAL_PARALLEL_GROUPS[\"sequence\"] = group\n\n\ndef get_sequence_parallel_group():\n    return _GLOBAL_PARALLEL_GROUPS.get(\"sequence\", None)\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/acceleration/plugin.py",
    "content": "import random\nfrom typing import Optional\n\nimport numpy as np\nimport torch\nfrom colossalai.booster.plugin import LowLevelZeroPlugin\nfrom colossalai.cluster import ProcessGroupMesh\nfrom torch.utils.data import DataLoader\nfrom torch.utils.data.distributed import DistributedSampler\n\nDP_AXIS, SP_AXIS = 0, 1\n\n\nclass ZeroSeqParallelPlugin(LowLevelZeroPlugin):\n    def __init__(\n        self,\n        sp_size: int = 1,\n        stage: int = 2,\n        precision: str = \"fp16\",\n        initial_scale: float = 2**32,\n        min_scale: float = 1,\n        growth_factor: float = 2,\n        backoff_factor: float = 0.5,\n        growth_interval: int = 1000,\n        hysteresis: int = 2,\n        max_scale: float = 2**32,\n        max_norm: float = 0.0,\n        norm_type: float = 2.0,\n        reduce_bucket_size_in_m: int = 12,\n        communication_dtype: Optional[torch.dtype] = None,\n        overlap_communication: bool = True,\n        cpu_offload: bool = False,\n        master_weights: bool = True,\n        verbose: bool = False,\n    ) -> None:\n        super().__init__(\n            stage=stage,\n            precision=precision,\n            initial_scale=initial_scale,\n            min_scale=min_scale,\n            growth_factor=growth_factor,\n            backoff_factor=backoff_factor,\n            growth_interval=growth_interval,\n            hysteresis=hysteresis,\n            max_scale=max_scale,\n            max_norm=max_norm,\n            norm_type=norm_type,\n            reduce_bucket_size_in_m=reduce_bucket_size_in_m,\n            communication_dtype=communication_dtype,\n            overlap_communication=overlap_communication,\n            cpu_offload=cpu_offload,\n            master_weights=master_weights,\n            verbose=verbose,\n        )\n        self.sp_size = sp_size\n        assert self.world_size % sp_size == 0, \"world_size must be divisible by sp_size\"\n        self.dp_size = self.world_size // sp_size\n       
 self.pg_mesh = ProcessGroupMesh(self.dp_size, self.sp_size)\n        self.dp_group = self.pg_mesh.get_group_along_axis(DP_AXIS)\n        self.sp_group = self.pg_mesh.get_group_along_axis(SP_AXIS)\n        self.dp_rank = self.pg_mesh.coordinate(DP_AXIS)\n        self.sp_rank = self.pg_mesh.coordinate(SP_AXIS)\n\n    def __del__(self):\n        \"\"\"Destroy the prcess groups in ProcessGroupMesh\"\"\"\n        self.pg_mesh.destroy_mesh_process_groups()\n\n    def prepare_dataloader(\n        self,\n        dataset,\n        batch_size,\n        shuffle=False,\n        seed=1024,\n        drop_last=False,\n        pin_memory=False,\n        num_workers=0,\n        distributed_sampler_cls=None,\n        **kwargs,\n    ):\n        _kwargs = kwargs.copy()\n        distributed_sampler_cls = distributed_sampler_cls or DistributedSampler\n        sampler = distributed_sampler_cls(dataset, num_replicas=self.dp_size, rank=self.dp_rank, shuffle=shuffle)\n\n        # Deterministic dataloader\n        def seed_worker(worker_id):\n            worker_seed = seed\n            np.random.seed(worker_seed)\n            torch.manual_seed(worker_seed)\n            random.seed(worker_seed)\n\n        return DataLoader(\n            dataset,\n            batch_size=batch_size,\n            sampler=sampler,\n            worker_init_fn=seed_worker,\n            drop_last=drop_last,\n            pin_memory=pin_memory,\n            num_workers=num_workers,\n            **_kwargs,\n        )\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/acceleration/shardformer/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/opensora/acceleration/shardformer/modeling/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/opensora/acceleration/shardformer/modeling/t5.py",
    "content": "import torch\nimport torch.nn as nn\n\n\nclass T5LayerNorm(nn.Module):\n    def __init__(self, hidden_size, eps=1e-6):\n        \"\"\"\n        Construct a layernorm module in the T5 style. No bias and no subtraction of mean.\n        \"\"\"\n        super().__init__()\n        self.weight = nn.Parameter(torch.ones(hidden_size))\n        self.variance_epsilon = eps\n\n    def forward(self, hidden_states):\n        # T5 uses a layer_norm which only scales and doesn't shift, which is also known as Root Mean\n        # Square Layer Normalization https://arxiv.org/abs/1910.07467 thus varience is calculated\n        # w/o mean and there is no bias. Additionally we want to make sure that the accumulation for\n        # half-precision inputs is done in fp32\n\n        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)\n        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)\n\n        # convert into half-precision if necessary\n        if self.weight.dtype in [torch.float16, torch.bfloat16]:\n            hidden_states = hidden_states.to(self.weight.dtype)\n\n        return self.weight * hidden_states\n\n    @staticmethod\n    def from_native_module(module, *args, **kwargs):\n        assert module.__class__.__name__ == \"FusedRMSNorm\", (\n            \"Recovering T5LayerNorm requires the original layer to be apex's Fused RMS Norm.\"\n            \"Apex's fused norm is automatically used by Hugging Face Transformers https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L265C5-L265C48\"\n        )\n\n        layer_norm = T5LayerNorm(module.normalized_shape, eps=module.eps)\n        layer_norm.weight.data.copy_(module.weight.data)\n        layer_norm = layer_norm.to(module.weight.device)\n        return layer_norm\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/acceleration/shardformer/policy/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/opensora/acceleration/shardformer/policy/t5_encoder.py",
    "content": "from colossalai.shardformer.modeling.jit import get_jit_fused_dropout_add_func\nfrom colossalai.shardformer.modeling.t5 import get_jit_fused_T5_layer_ff_forward, get_T5_layer_self_attention_forward\nfrom colossalai.shardformer.policies.base_policy import Policy, SubModuleReplacementDescription\n\n\nclass T5EncoderPolicy(Policy):\n    def config_sanity_check(self):\n        assert not self.shard_config.enable_tensor_parallelism\n        assert not self.shard_config.enable_flash_attention\n\n    def preprocess(self):\n        return self.model\n\n    def module_policy(self):\n        from transformers.models.t5.modeling_t5 import T5LayerFF, T5LayerSelfAttention, T5Stack\n\n        policy = {}\n\n        # check whether apex is installed\n        try:\n            from opensora.acceleration.shardformer.modeling.t5 import T5LayerNorm\n\n            # recover hf from fused rms norm to T5 norm which is faster\n            self.append_or_create_submodule_replacement(\n                description=SubModuleReplacementDescription(\n                    suffix=\"layer_norm\",\n                    target_module=T5LayerNorm,\n                ),\n                policy=policy,\n                target_key=T5LayerFF,\n            )\n            self.append_or_create_submodule_replacement(\n                description=SubModuleReplacementDescription(suffix=\"layer_norm\", target_module=T5LayerNorm),\n                policy=policy,\n                target_key=T5LayerSelfAttention,\n            )\n            self.append_or_create_submodule_replacement(\n                description=SubModuleReplacementDescription(suffix=\"final_layer_norm\", target_module=T5LayerNorm),\n                policy=policy,\n                target_key=T5Stack,\n            )\n        except (ImportError, ModuleNotFoundError):\n            pass\n\n        # use jit operator\n        if self.shard_config.enable_jit_fused:\n            self.append_or_create_method_replacement(\n                
description={\n                    \"forward\": get_jit_fused_T5_layer_ff_forward(),\n                    \"dropout_add\": get_jit_fused_dropout_add_func(),\n                },\n                policy=policy,\n                target_key=T5LayerFF,\n            )\n            self.append_or_create_method_replacement(\n                description={\n                    \"forward\": get_T5_layer_self_attention_forward(),\n                    \"dropout_add\": get_jit_fused_dropout_add_func(),\n                },\n                policy=policy,\n                target_key=T5LayerSelfAttention,\n            )\n\n        return policy\n\n    def postprocess(self):\n        return self.model\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/datasets/__init__.py",
    "content": "from .datasets import IMG_FPS, BatchFeatureDataset, VariableVideoTextDataset, VideoTextDataset\nfrom .utils import get_transforms_image, get_transforms_video, is_img, is_vid, save_sample\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/datasets/aspect.py",
    "content": "import math\n\n\n# computation\ndef get_h_w(a, ts, eps=1e-4):\n    h = (ts * a) ** 0.5\n    h = h + eps\n    h = math.ceil(h) if math.ceil(h) % 2 == 0 else math.floor(h)\n    w = h / a\n    w = w + eps\n    w = math.ceil(w) if math.ceil(w) % 2 == 0 else math.floor(w)\n    return h, w\n\n\ndef get_aspect_ratios_dict(ars, ts=360 * 640):\n    est = {f\"{a:.2f}\": get_h_w(a, ts) for a in ars}\n    return est\n\n\ndef get_ar(ratio):\n    h, w = ratio.split(\":\")\n    return int(h) / int(w)\n\n\n# H:W\nASPECT_RATIO_MAP = {\n    \"3:8\": \"0.38\",\n    \"9:21\": \"0.43\",\n    \"12:25\": \"0.48\",\n    \"1:2\": \"0.50\",\n    \"9:17\": \"0.53\",\n    \"27:50\": \"0.54\",\n    \"9:16\": \"0.56\",\n    \"5:8\": \"0.62\",\n    \"2:3\": \"0.67\",\n    \"3:4\": \"0.75\",\n    \"1:1\": \"1.00\",\n    \"4:3\": \"1.33\",\n    \"3:2\": \"1.50\",\n    \"16:9\": \"1.78\",\n    \"17:9\": \"1.89\",\n    \"2:1\": \"2.00\",\n    \"50:27\": \"2.08\",\n}\n\n\nAR = [get_ar(ratio) for ratio in ASPECT_RATIO_MAP.keys()]\n\n# computed from above code\n# S = 8294400\nASPECT_RATIO_4K = {\n    \"0.38\": (1764, 4704),\n    \"0.43\": (1886, 4400),\n    \"0.48\": (1996, 4158),\n    \"0.50\": (2036, 4072),\n    \"0.53\": (2096, 3960),\n    \"0.54\": (2118, 3918),\n    \"0.62\": (2276, 3642),\n    \"0.56\": (2160, 3840),  # base\n    \"0.67\": (2352, 3528),\n    \"0.75\": (2494, 3326),\n    \"1.00\": (2880, 2880),\n    \"1.33\": (3326, 2494),\n    \"1.50\": (3528, 2352),\n    \"1.78\": (3840, 2160),\n    \"1.89\": (3958, 2096),\n    \"2.00\": (4072, 2036),\n    \"2.08\": (4156, 1994),\n}\n\n# S = 3686400\nASPECT_RATIO_2K = {\n    \"0.38\": (1176, 3136),\n    \"0.43\": (1256, 2930),\n    \"0.48\": (1330, 2770),\n    \"0.50\": (1358, 2716),\n    \"0.53\": (1398, 2640),\n    \"0.54\": (1412, 2612),\n    \"0.56\": (1440, 2560),  # base\n    \"0.62\": (1518, 2428),\n    \"0.67\": (1568, 2352),\n    \"0.75\": (1662, 2216),\n    \"1.00\": (1920, 1920),\n    \"1.33\": (2218, 1664),\n    
\"1.50\": (2352, 1568),\n    \"1.78\": (2560, 1440),\n    \"1.89\": (2638, 1396),\n    \"2.00\": (2716, 1358),\n    \"2.08\": (2772, 1330),\n}\n\n# S = 2073600\nASPECT_RATIO_1080P = {\n    \"0.38\": (882, 2352),\n    \"0.43\": (942, 2198),\n    \"0.48\": (998, 2080),\n    \"0.50\": (1018, 2036),\n    \"0.53\": (1048, 1980),\n    \"0.54\": (1058, 1958),\n    \"0.56\": (1080, 1920),  # base\n    \"0.62\": (1138, 1820),\n    \"0.67\": (1176, 1764),\n    \"0.75\": (1248, 1664),\n    \"1.00\": (1440, 1440),\n    \"1.33\": (1662, 1246),\n    \"1.50\": (1764, 1176),\n    \"1.78\": (1920, 1080),\n    \"1.89\": (1980, 1048),\n    \"2.00\": (2036, 1018),\n    \"2.08\": (2078, 998),\n}\n\n# S = 921600\nASPECT_RATIO_720P = {\n    \"0.38\": (588, 1568),\n    \"0.43\": (628, 1466),\n    \"0.48\": (666, 1388),\n    \"0.50\": (678, 1356),\n    \"0.53\": (698, 1318),\n    \"0.54\": (706, 1306),\n    \"0.56\": (720, 1280),  # base\n    \"0.62\": (758, 1212),\n    \"0.67\": (784, 1176),\n    \"0.75\": (832, 1110),\n    \"1.00\": (960, 960),\n    \"1.33\": (1108, 832),\n    \"1.50\": (1176, 784),\n    \"1.78\": (1280, 720),\n    \"1.89\": (1320, 698),\n    \"2.00\": (1358, 680),\n    \"2.08\": (1386, 666),\n}\n\n# S = 409920\nASPECT_RATIO_480P = {\n    \"0.38\": (392, 1046),\n    \"0.43\": (420, 980),\n    \"0.48\": (444, 925),\n    \"0.50\": (452, 904),\n    \"0.53\": (466, 880),\n    \"0.54\": (470, 870),\n    \"0.56\": (480, 854),  # base\n    \"0.62\": (506, 810),\n    \"0.67\": (522, 784),\n    \"0.75\": (554, 738),\n    \"1.00\": (640, 640),\n    \"1.33\": (740, 555),\n    \"1.50\": (784, 522),\n    \"1.78\": (854, 480),\n    \"1.89\": (880, 466),\n    \"2.00\": (906, 454),\n    \"2.08\": (924, 444),\n}\n\n# S = 230400\nASPECT_RATIO_360P = {\n    \"0.38\": (294, 784),\n    \"0.43\": (314, 732),\n    \"0.48\": (332, 692),\n    \"0.50\": (340, 680),\n    \"0.53\": (350, 662),\n    \"0.54\": (352, 652),\n    \"0.56\": (360, 640),  # base\n    \"0.62\": (380, 608),\n    \"0.67\": 
(392, 588),\n    \"0.75\": (416, 554),\n    \"1.00\": (480, 480),\n    \"1.33\": (554, 416),\n    \"1.50\": (588, 392),\n    \"1.78\": (640, 360),\n    \"1.89\": (660, 350),\n    \"2.00\": (678, 340),\n    \"2.08\": (692, 332),\n}\n\n# S = 102240\nASPECT_RATIO_240P = {\n    \"0.38\": (196, 522),\n    \"0.43\": (210, 490),\n    \"0.48\": (222, 462),\n    \"0.50\": (226, 452),\n    \"0.53\": (232, 438),\n    \"0.54\": (236, 436),\n    \"0.56\": (240, 426),  # base\n    \"0.62\": (252, 404),\n    \"0.67\": (262, 393),\n    \"0.75\": (276, 368),\n    \"1.00\": (320, 320),\n    \"1.33\": (370, 278),\n    \"1.50\": (392, 262),\n    \"1.78\": (426, 240),\n    \"1.89\": (440, 232),\n    \"2.00\": (452, 226),\n    \"2.08\": (462, 222),\n}\n\n# S = 36864\nASPECT_RATIO_144P = {\n    \"0.38\": (117, 312),\n    \"0.43\": (125, 291),\n    \"0.48\": (133, 277),\n    \"0.50\": (135, 270),\n    \"0.53\": (139, 262),\n    \"0.54\": (141, 260),\n    \"0.56\": (144, 256),  # base\n    \"0.62\": (151, 241),\n    \"0.67\": (156, 234),\n    \"0.75\": (166, 221),\n    \"1.00\": (192, 192),\n    \"1.33\": (221, 165),\n    \"1.50\": (235, 156),\n    \"1.78\": (256, 144),\n    \"1.89\": (263, 139),\n    \"2.00\": (271, 135),\n    \"2.08\": (277, 132),\n}\n\n# from PixArt\n# S = 8294400\nASPECT_RATIO_2880 = {\n    \"0.25\": (1408, 5760),\n    \"0.26\": (1408, 5568),\n    \"0.27\": (1408, 5376),\n    \"0.28\": (1408, 5184),\n    \"0.32\": (1600, 4992),\n    \"0.33\": (1600, 4800),\n    \"0.34\": (1600, 4672),\n    \"0.40\": (1792, 4480),\n    \"0.42\": (1792, 4288),\n    \"0.47\": (1920, 4096),\n    \"0.49\": (1920, 3904),\n    \"0.51\": (1920, 3776),\n    \"0.55\": (2112, 3840),\n    \"0.59\": (2112, 3584),\n    \"0.68\": (2304, 3392),\n    \"0.72\": (2304, 3200),\n    \"0.78\": (2496, 3200),\n    \"0.83\": (2496, 3008),\n    \"0.89\": (2688, 3008),\n    \"0.93\": (2688, 2880),\n    \"1.00\": (2880, 2880),\n    \"1.07\": (2880, 2688),\n    \"1.12\": (3008, 2688),\n    \"1.21\": (3008, 
2496),\n    \"1.28\": (3200, 2496),\n    \"1.39\": (3200, 2304),\n    \"1.47\": (3392, 2304),\n    \"1.70\": (3584, 2112),\n    \"1.82\": (3840, 2112),\n    \"2.03\": (3904, 1920),\n    \"2.13\": (4096, 1920),\n    \"2.39\": (4288, 1792),\n    \"2.50\": (4480, 1792),\n    \"2.92\": (4672, 1600),\n    \"3.00\": (4800, 1600),\n    \"3.12\": (4992, 1600),\n    \"3.68\": (5184, 1408),\n    \"3.82\": (5376, 1408),\n    \"3.95\": (5568, 1408),\n    \"4.00\": (5760, 1408),\n}\n\n# S = 4194304\nASPECT_RATIO_2048 = {\n    \"0.25\": (1024, 4096),\n    \"0.26\": (1024, 3968),\n    \"0.27\": (1024, 3840),\n    \"0.28\": (1024, 3712),\n    \"0.32\": (1152, 3584),\n    \"0.33\": (1152, 3456),\n    \"0.35\": (1152, 3328),\n    \"0.40\": (1280, 3200),\n    \"0.42\": (1280, 3072),\n    \"0.48\": (1408, 2944),\n    \"0.50\": (1408, 2816),\n    \"0.52\": (1408, 2688),\n    \"0.57\": (1536, 2688),\n    \"0.60\": (1536, 2560),\n    \"0.68\": (1664, 2432),\n    \"0.72\": (1664, 2304),\n    \"0.78\": (1792, 2304),\n    \"0.82\": (1792, 2176),\n    \"0.88\": (1920, 2176),\n    \"0.94\": (1920, 2048),\n    \"1.00\": (2048, 2048),\n    \"1.07\": (2048, 1920),\n    \"1.13\": (2176, 1920),\n    \"1.21\": (2176, 1792),\n    \"1.29\": (2304, 1792),\n    \"1.38\": (2304, 1664),\n    \"1.46\": (2432, 1664),\n    \"1.67\": (2560, 1536),\n    \"1.75\": (2688, 1536),\n    \"2.00\": (2816, 1408),\n    \"2.09\": (2944, 1408),\n    \"2.40\": (3072, 1280),\n    \"2.50\": (3200, 1280),\n    \"2.89\": (3328, 1152),\n    \"3.00\": (3456, 1152),\n    \"3.11\": (3584, 1152),\n    \"3.62\": (3712, 1024),\n    \"3.75\": (3840, 1024),\n    \"3.88\": (3968, 1024),\n    \"4.00\": (4096, 1024),\n}\n\n# S = 1048576\nASPECT_RATIO_1024 = {\n    \"0.25\": (512, 2048),\n    \"0.26\": (512, 1984),\n    \"0.27\": (512, 1920),\n    \"0.28\": (512, 1856),\n    \"0.32\": (576, 1792),\n    \"0.33\": (576, 1728),\n    \"0.35\": (576, 1664),\n    \"0.40\": (640, 1600),\n    \"0.42\": (640, 1536),\n    \"0.48\": (704, 1472),\n  
  \"0.50\": (704, 1408),\n    \"0.52\": (704, 1344),\n    \"0.57\": (768, 1344),\n    \"0.60\": (768, 1280),\n    \"0.68\": (832, 1216),\n    \"0.72\": (832, 1152),\n    \"0.78\": (896, 1152),\n    \"0.82\": (896, 1088),\n    \"0.88\": (960, 1088),\n    \"0.94\": (960, 1024),\n    \"1.00\": (1024, 1024),\n    \"1.07\": (1024, 960),\n    \"1.13\": (1088, 960),\n    \"1.21\": (1088, 896),\n    \"1.29\": (1152, 896),\n    \"1.38\": (1152, 832),\n    \"1.46\": (1216, 832),\n    \"1.67\": (1280, 768),\n    \"1.75\": (1344, 768),\n    \"2.00\": (1408, 704),\n    \"2.09\": (1472, 704),\n    \"2.40\": (1536, 640),\n    \"2.50\": (1600, 640),\n    \"2.89\": (1664, 576),\n    \"3.00\": (1728, 576),\n    \"3.11\": (1792, 576),\n    \"3.62\": (1856, 512),\n    \"3.75\": (1920, 512),\n    \"3.88\": (1984, 512),\n    \"4.00\": (2048, 512),\n}\n\n# S = 262144\nASPECT_RATIO_512 = {\n    \"0.25\": (256, 1024),\n    \"0.26\": (256, 992),\n    \"0.27\": (256, 960),\n    \"0.28\": (256, 928),\n    \"0.32\": (288, 896),\n    \"0.33\": (288, 864),\n    \"0.35\": (288, 832),\n    \"0.40\": (320, 800),\n    \"0.42\": (320, 768),\n    \"0.48\": (352, 736),\n    \"0.50\": (352, 704),\n    \"0.52\": (352, 672),\n    \"0.57\": (384, 672),\n    \"0.60\": (384, 640),\n    \"0.68\": (416, 608),\n    \"0.72\": (416, 576),\n    \"0.78\": (448, 576),\n    \"0.82\": (448, 544),\n    \"0.88\": (480, 544),\n    \"0.94\": (480, 512),\n    \"1.00\": (512, 512),\n    \"1.07\": (512, 480),\n    \"1.13\": (544, 480),\n    \"1.21\": (544, 448),\n    \"1.29\": (576, 448),\n    \"1.38\": (576, 416),\n    \"1.46\": (608, 416),\n    \"1.67\": (640, 384),\n    \"1.75\": (672, 384),\n    \"2.00\": (704, 352),\n    \"2.09\": (736, 352),\n    \"2.40\": (768, 320),\n    \"2.50\": (800, 320),\n    \"2.89\": (832, 288),\n    \"3.00\": (864, 288),\n    \"3.11\": (896, 288),\n    \"3.62\": (928, 256),\n    \"3.75\": (960, 256),\n    \"3.88\": (992, 256),\n    \"4.00\": (1024, 256),\n}\n\n# S = 65536\nASPECT_RATIO_256 = 
{\n    \"0.25\": (128, 512),\n    \"0.26\": (128, 496),\n    \"0.27\": (128, 480),\n    \"0.28\": (128, 464),\n    \"0.32\": (144, 448),\n    \"0.33\": (144, 432),\n    \"0.35\": (144, 416),\n    \"0.40\": (160, 400),\n    \"0.42\": (160, 384),\n    \"0.48\": (176, 368),\n    \"0.50\": (176, 352),\n    \"0.52\": (176, 336),\n    \"0.57\": (192, 336),\n    \"0.60\": (192, 320),\n    \"0.68\": (208, 304),\n    \"0.72\": (208, 288),\n    \"0.78\": (224, 288),\n    \"0.82\": (224, 272),\n    \"0.88\": (240, 272),\n    \"0.94\": (240, 256),\n    \"1.00\": (256, 256),\n    \"1.07\": (256, 240),\n    \"1.13\": (272, 240),\n    \"1.21\": (272, 224),\n    \"1.29\": (288, 224),\n    \"1.38\": (288, 208),\n    \"1.46\": (304, 208),\n    \"1.67\": (320, 192),\n    \"1.75\": (336, 192),\n    \"2.00\": (352, 176),\n    \"2.09\": (368, 176),\n    \"2.40\": (384, 160),\n    \"2.50\": (400, 160),\n    \"2.89\": (416, 144),\n    \"3.00\": (432, 144),\n    \"3.11\": (448, 144),\n    \"3.62\": (464, 128),\n    \"3.75\": (480, 128),\n    \"3.88\": (496, 128),\n    \"4.00\": (512, 128),\n}\n\n\ndef get_closest_ratio(height: float, width: float, ratios: dict):\n    aspect_ratio = height / width\n    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - aspect_ratio))\n    return closest_ratio\n\n\nASPECT_RATIOS = {\n    \"144p\": (36864, ASPECT_RATIO_144P),\n    \"256\": (65536, ASPECT_RATIO_256),\n    \"240p\": (102240, ASPECT_RATIO_240P),\n    \"360p\": (230400, ASPECT_RATIO_360P),\n    \"512\": (262144, ASPECT_RATIO_512),\n    \"480p\": (409920, ASPECT_RATIO_480P),\n    \"720p\": (921600, ASPECT_RATIO_720P),\n    \"1024\": (1048576, ASPECT_RATIO_1024),\n    \"1080p\": (2073600, ASPECT_RATIO_1080P),\n    \"2k\": (3686400, ASPECT_RATIO_2K),\n    \"2048\": (4194304, ASPECT_RATIO_2048),\n    \"2880\": (8294400, ASPECT_RATIO_2880),\n    \"4k\": (8294400, ASPECT_RATIO_4K),\n}\n\n\ndef get_num_pixels(name):\n    return ASPECT_RATIOS[name][0]\n\n\ndef 
get_image_size(resolution, ar_ratio):\n    if ar_ratio in ASPECT_RATIO_MAP:\n        ar_key = ASPECT_RATIO_MAP[ar_ratio]\n    else:\n        ar_key = ar_ratio\n    rs_dict = ASPECT_RATIOS[resolution][1]\n    assert ar_key in rs_dict, f\"Aspect ratio {ar_ratio} not found for resolution {resolution}\"\n    return rs_dict[ar_key]\n\n\nNUM_FRAMES_MAP = {\n    \"1x\": 51,\n    \"2x\": 102,\n    \"4x\": 204,\n    \"8x\": 408,\n    \"16x\": 816,\n    \"2s\": 51,\n    \"4s\": 102,\n    \"8s\": 204,\n    \"16s\": 408,\n    \"32s\": 816,\n}\n\n\ndef get_num_frames(num_frames):\n    if num_frames in NUM_FRAMES_MAP:\n        return NUM_FRAMES_MAP[num_frames]\n    else:\n        return int(num_frames)\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/datasets/bucket.py",
    "content": "from collections import OrderedDict\n\nimport numpy as np\n\nfrom opensora.utils.misc import get_logger\n\nfrom .aspect import ASPECT_RATIOS, get_closest_ratio\n\n\ndef find_approximate_hw(hw, hw_dict, approx=0.8):\n    for k, v in hw_dict.items():\n        if hw >= v * approx:\n            return k\n    return None\n\n\ndef find_closet_smaller_bucket(t, t_dict, frame_interval):\n    # process image\n    if t == 1:\n        if 1 in t_dict:\n            return 1\n        else:\n            return None\n    # process video\n    for k, v in t_dict.items():\n        if t >= v * frame_interval and v != 1:\n            return k\n    return None\n\n\nclass Bucket:\n    def __init__(self, bucket_config):\n        for key in bucket_config:\n            assert key in ASPECT_RATIOS, f\"Aspect ratio {key} not found.\"\n        # wrap config with OrderedDict\n        bucket_probs = OrderedDict()\n        bucket_bs = OrderedDict()\n        bucket_names = sorted(bucket_config.keys(), key=lambda x: ASPECT_RATIOS[x][0], reverse=True)\n        for key in bucket_names:\n            bucket_time_names = sorted(bucket_config[key].keys(), key=lambda x: x, reverse=True)\n            bucket_probs[key] = OrderedDict({k: bucket_config[key][k][0] for k in bucket_time_names})\n            bucket_bs[key] = OrderedDict({k: bucket_config[key][k][1] for k in bucket_time_names})\n\n        # first level: HW\n        num_bucket = 0\n        hw_criteria = dict()\n        t_criteria = dict()\n        ar_criteria = dict()\n        bucket_id = OrderedDict()\n        bucket_id_cnt = 0\n        for k1, v1 in bucket_probs.items():\n            hw_criteria[k1] = ASPECT_RATIOS[k1][0]\n            t_criteria[k1] = dict()\n            ar_criteria[k1] = dict()\n            bucket_id[k1] = dict()\n            for k2, _ in v1.items():\n                t_criteria[k1][k2] = k2\n                bucket_id[k1][k2] = bucket_id_cnt\n                bucket_id_cnt += 1\n                ar_criteria[k1][k2] 
= dict()\n                for k3, v3 in ASPECT_RATIOS[k1][1].items():\n                    ar_criteria[k1][k2][k3] = v3\n                    num_bucket += 1\n\n        self.bucket_probs = bucket_probs\n        self.bucket_bs = bucket_bs\n        self.bucket_id = bucket_id\n        self.hw_criteria = hw_criteria\n        self.t_criteria = t_criteria\n        self.ar_criteria = ar_criteria\n        self.num_bucket = num_bucket\n        get_logger().info(\"Number of buckets: %s\", num_bucket)\n\n    def get_bucket_id(self, T, H, W, frame_interval=1, seed=None):\n        # With seed=None the generators below are left unseeded instead of failing on `None + int`.\n        resolution = H * W\n        approx = 0.8\n\n        fail = True\n        for hw_id, t_criteria in self.bucket_probs.items():\n            if resolution < self.hw_criteria[hw_id] * approx:\n                continue\n\n            # if sample is an image\n            if T == 1:\n                if 1 in t_criteria:\n                    rng = np.random.default_rng(None if seed is None else seed + self.bucket_id[hw_id][1])\n                    if rng.random() < t_criteria[1]:\n                        fail = False\n                        t_id = 1\n                        break\n                else:\n                    continue\n\n            # otherwise, find suitable t_id for video\n            t_fail = True\n            for t_id, prob in t_criteria.items():\n                rng = np.random.default_rng(None if seed is None else seed + self.bucket_id[hw_id][t_id])\n                if isinstance(prob, tuple):\n                    prob_t = prob[1]\n                    if rng.random() > prob_t:\n                        continue\n                if T > t_id * frame_interval and t_id != 1:\n                    t_fail = False\n                    break\n            if t_fail:\n                continue\n\n            # leave the loop if prob is high enough\n            if isinstance(prob, tuple):\n                prob = prob[0]\n            if prob >= 1 or rng.random() < prob:\n                fail = False\n                break\n        if fail:\n            return None\n\n        # get aspect ratio id\n        ar_criteria = self.ar_criteria[hw_id][t_id]\n        ar_id = get_closest_ratio(H, W, ar_criteria)\n        return hw_id, t_id, ar_id\n\n    def get_thw(self, bucket_id):\n        assert len(bucket_id) == 3\n        T = self.t_criteria[bucket_id[0]][bucket_id[1]]\n        H, W = self.ar_criteria[bucket_id[0]][bucket_id[1]][bucket_id[2]]\n        return T, H, W\n\n    def get_prob(self, bucket_id):\n        return self.bucket_probs[bucket_id[0]][bucket_id[1]]\n\n    def get_batch_size(self, bucket_id):\n        return self.bucket_bs[bucket_id[0]][bucket_id[1]]\n\n    def __len__(self):\n        return self.num_bucket\n\n\ndef closet_smaller_bucket(value, bucket):\n    for i in range(1, len(bucket)):\n        if value < bucket[i]:\n            return bucket[i - 1]\n    return bucket[-1]\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/datasets/dataloader.py",
    "content": "import collections\nimport random\nfrom typing import Optional\n\nimport numpy as np\nimport torch\nfrom torch.distributed import ProcessGroup\nfrom torch.distributed.distributed_c10d import _get_default_group\nfrom torch.utils.data import DataLoader\n\nfrom .datasets import BatchFeatureDataset, VariableVideoTextDataset, VideoTextDataset\nfrom .sampler import BatchDistributedSampler, StatefulDistributedSampler, VariableVideoBatchSampler\n\n\n# Deterministic dataloader\ndef get_seed_worker(seed):\n    def seed_worker(worker_id):\n        worker_seed = seed\n        np.random.seed(worker_seed)\n        torch.manual_seed(worker_seed)\n        random.seed(worker_seed)\n\n    return seed_worker\n\n\ndef prepare_dataloader(\n    dataset,\n    batch_size=None,\n    shuffle=False,\n    seed=1024,\n    drop_last=False,\n    pin_memory=False,\n    num_workers=0,\n    process_group: Optional[ProcessGroup] = None,\n    bucket_config=None,\n    num_bucket_build_workers=1,\n    prefetch_factor=None,\n    **kwargs,\n):\n    _kwargs = kwargs.copy()\n    if isinstance(dataset, VariableVideoTextDataset):\n        batch_sampler = VariableVideoBatchSampler(\n            dataset,\n            bucket_config,\n            num_replicas=process_group.size(),\n            rank=process_group.rank(),\n            shuffle=shuffle,\n            seed=seed,\n            drop_last=drop_last,\n            verbose=True,\n            num_bucket_build_workers=num_bucket_build_workers,\n        )\n        return (\n            DataLoader(\n                dataset,\n                batch_sampler=batch_sampler,\n                worker_init_fn=get_seed_worker(seed),\n                pin_memory=pin_memory,\n                num_workers=num_workers,\n                collate_fn=collate_fn_default,\n                prefetch_factor=prefetch_factor,\n                **_kwargs,\n            ),\n            batch_sampler,\n        )\n    elif isinstance(dataset, VideoTextDataset):\n        
process_group = process_group or _get_default_group()\n        sampler = StatefulDistributedSampler(\n            dataset,\n            num_replicas=process_group.size(),\n            rank=process_group.rank(),\n            shuffle=shuffle,\n        )\n        return (\n            DataLoader(\n                dataset,\n                batch_size=batch_size,\n                sampler=sampler,\n                worker_init_fn=get_seed_worker(seed),\n                drop_last=drop_last,\n                pin_memory=pin_memory,\n                num_workers=num_workers,\n                collate_fn=collate_fn_default,\n                prefetch_factor=prefetch_factor,\n                **_kwargs,\n            ),\n            sampler,\n        )\n    elif isinstance(dataset, BatchFeatureDataset):\n        sampler = BatchDistributedSampler(\n            dataset,\n            num_replicas=process_group.size(),\n            rank=process_group.rank(),\n        )\n        return (\n            DataLoader(\n                dataset,\n                batch_size=1,\n                sampler=sampler,\n                worker_init_fn=get_seed_worker(seed),\n                pin_memory=pin_memory,\n                num_workers=num_workers,\n                collate_fn=collate_fn_batch,\n                prefetch_factor=prefetch_factor,\n                **_kwargs,\n            ),\n            sampler,\n        )\n    else:\n        raise ValueError(f\"Unsupported dataset type: {type(dataset)}\")\n\n\ndef collate_fn_default(batch):\n    # filter out None\n    batch = [x for x in batch if x is not None]\n\n    # HACK: for loading text features\n    use_mask = False\n    if \"mask\" in batch[0] and isinstance(batch[0][\"mask\"], int):\n        masks = [x.pop(\"mask\") for x in batch]\n\n        texts = [x.pop(\"text\") for x in batch]\n        texts = torch.cat(texts, dim=1)\n        use_mask = True\n\n    ret = torch.utils.data.default_collate(batch)\n\n    if use_mask:\n        ret[\"mask\"] = 
masks\n        ret[\"text\"] = texts\n    return ret\n\n\ndef collate_fn_batch(batch):\n    \"\"\"\n    Used only with BatchDistributedSampler\n    \"\"\"\n    # filter out None\n    batch = [x for x in batch if x is not None]\n    \n    res = torch.utils.data.default_collate(batch)\n\n    # squeeze the first dimension, which is due to torch.stack() in default_collate()\n    if isinstance(res, collections.abc.Mapping):\n        for k, v in res.items():\n            if isinstance(v, torch.Tensor):\n                res[k] = v.squeeze(0)\n    elif isinstance(res, collections.abc.Sequence):\n        res = [x.squeeze(0) if isinstance(x, torch.Tensor) else x for x in res]\n    elif isinstance(res, torch.Tensor):\n        res = res.squeeze(0)\n    else:\n        raise TypeError\n\n    return res\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/datasets/datasets.py",
    "content": "import os\nfrom glob import glob\n\nimport numpy as np\nimport torch\nfrom PIL import ImageFile\nfrom torchvision.datasets.folder import IMG_EXTENSIONS, pil_loader\n\nfrom opensora.registry import DATASETS\n\nfrom .read_video import read_video\nfrom .utils import VID_EXTENSIONS, get_transforms_image, get_transforms_video, read_file, temporal_random_crop\n\nImageFile.LOAD_TRUNCATED_IMAGES = True\nIMG_FPS = 120\n\n\n@DATASETS.register_module()\nclass VideoTextDataset(torch.utils.data.Dataset):\n    \"\"\"load video according to the csv file.\n\n    Args:\n        target_video_len (int): the number of video frames will be load.\n        align_transform (callable): Align different videos in a specified size.\n        temporal_sample (callable): Sample the target length of a video.\n    \"\"\"\n\n    def __init__(\n        self,\n        data_path=None,\n        num_frames=16,\n        frame_interval=1,\n        image_size=(256, 256),\n        transform_name=\"center\",\n    ):\n        self.data_path = data_path\n        self.data = read_file(data_path)\n        self.get_text = \"text\" in self.data.columns\n        self.num_frames = num_frames\n        self.frame_interval = frame_interval\n        self.image_size = image_size\n        self.transforms = {\n            \"image\": get_transforms_image(transform_name, image_size),\n            \"video\": get_transforms_video(transform_name, image_size),\n        }\n\n    def _print_data_number(self):\n        num_videos = 0\n        num_images = 0\n        for path in self.data[\"path\"]:\n            if self.get_type(path) == \"video\":\n                num_videos += 1\n            else:\n                num_images += 1\n        print(f\"Dataset contains {num_videos} videos and {num_images} images.\")\n\n    def get_type(self, path):\n        ext = os.path.splitext(path)[-1].lower()\n        if ext.lower() in VID_EXTENSIONS:\n            return \"video\"\n        else:\n            assert ext.lower() in 
IMG_EXTENSIONS, f\"Unsupported file format: {ext}\"\n            return \"image\"\n\n    def getitem(self, index):\n        sample = self.data.iloc[index]\n        path = sample[\"path\"]\n        file_type = self.get_type(path)\n\n        if file_type == \"video\":\n            # loading\n            vframes, vinfo = read_video(path, backend=\"av\")\n            video_fps = vinfo[\"video_fps\"] if \"video_fps\" in vinfo else 24\n\n            # Sampling video frames\n            video = temporal_random_crop(vframes, self.num_frames, self.frame_interval)\n\n            # transform\n            transform = self.transforms[\"video\"]\n            video = transform(video)  # T C H W\n        else:\n            # loading\n            image = pil_loader(path)\n            video_fps = IMG_FPS\n\n            # transform\n            transform = self.transforms[\"image\"]\n            image = transform(image)\n\n            # repeat\n            video = image.unsqueeze(0).repeat(self.num_frames, 1, 1, 1)\n\n        # TCHW -> CTHW\n        video = video.permute(1, 0, 2, 3)\n\n        ret = {\"video\": video, \"fps\": video_fps}\n        if self.get_text:\n            ret[\"text\"] = sample[\"text\"]\n        return ret\n\n    def __getitem__(self, index):\n        for _ in range(10):\n            try:\n                return self.getitem(index)\n            except Exception as e:\n                path = self.data.iloc[index][\"path\"]\n                print(f\"data {path}: {e}\")\n                index = np.random.randint(len(self))\n        raise RuntimeError(\"Too many bad data.\")\n\n    def __len__(self):\n        return len(self.data)\n\n\n@DATASETS.register_module()\nclass VariableVideoTextDataset(VideoTextDataset):\n    def __init__(\n        self,\n        data_path=None,\n        num_frames=None,\n        frame_interval=1,\n        image_size=(None, None),\n        transform_name=None,\n        dummy_text_feature=False,\n    ):\n        super().__init__(data_path, 
num_frames, frame_interval, image_size, transform_name=None)\n        self.transform_name = transform_name\n        self.data[\"id\"] = np.arange(len(self.data))\n        self.dummy_text_feature = dummy_text_feature\n\n    def get_data_info(self, index):\n        T = self.data.iloc[index][\"num_frames\"]\n        H = self.data.iloc[index][\"height\"]\n        W = self.data.iloc[index][\"width\"]\n        return T, H, W\n\n    def getitem(self, index):\n        # a hack to pass in the (time, height, width) info from sampler\n        index, num_frames, height, width = [int(val) for val in index.split(\"-\")]\n\n        sample = self.data.iloc[index]\n        path = sample[\"path\"]\n        file_type = self.get_type(path)\n        ar = height / width\n\n        video_fps = 24  # default fps\n        if file_type == \"video\":\n            # loading\n            vframes, vinfo = read_video(path, backend=\"av\")\n            video_fps = vinfo[\"video_fps\"] if \"video_fps\" in vinfo else 24\n\n            # Sampling video frames\n            video = temporal_random_crop(vframes, num_frames, self.frame_interval)\n            video = video.clone()\n            del vframes\n\n            video_fps = video_fps // self.frame_interval\n\n            # transform\n            transform = get_transforms_video(self.transform_name, (height, width))\n            video = transform(video)  # T C H W\n        else:\n            # loading\n            image = pil_loader(path)\n            video_fps = IMG_FPS\n\n            # transform\n            transform = get_transforms_image(self.transform_name, (height, width))\n            image = transform(image)\n\n            # repeat\n            video = image.unsqueeze(0)\n\n        # TCHW -> CTHW\n        video = video.permute(1, 0, 2, 3)\n        ret = {\n            \"video\": video,\n            \"num_frames\": num_frames,\n            \"height\": height,\n            \"width\": width,\n            \"ar\": ar,\n            \"fps\": 
video_fps,\n        }\n        if self.get_text:\n            ret[\"text\"] = sample[\"text\"]\n        if self.dummy_text_feature:\n            text_len = 50\n            ret[\"text\"] = torch.zeros((1, text_len, 1152))\n            ret[\"mask\"] = text_len\n        return ret\n\n    def __getitem__(self, index):\n        try:\n            return self.getitem(index)\n        except Exception:\n            # bad samples are returned as None and filtered out in collate_fn_default\n            return None\n\n\n@DATASETS.register_module()\nclass BatchFeatureDataset(torch.utils.data.Dataset):\n    \"\"\"\n    The dataset is composed of multiple .bin files.\n    Each .bin file is a list of batch data (like a buffer). All .bin files have the same length.\n    In each training iteration, one batch is fetched from the current buffer.\n    Once a buffer is consumed, load another one.\n    Avoid loading the same .bin on two different GPUs, i.e., one .bin is assigned to one GPU only.\n    \"\"\"\n\n    def __init__(self, data_path=None):\n        self.path_list = sorted(glob(data_path + \"/**/*.bin\"))\n\n        self._len_buffer = len(torch.load(self.path_list[0]))\n        self._num_buffers = len(self.path_list)\n        self.num_samples = self.len_buffer * len(self.path_list)\n\n        self.cur_file_idx = -1\n        self.cur_buffer = None\n\n    @property\n    def num_buffers(self):\n        return self._num_buffers\n\n    @property\n    def len_buffer(self):\n        return self._len_buffer\n\n    def _load_buffer(self, idx):\n        file_idx = idx // self.len_buffer\n        if file_idx != self.cur_file_idx:\n            self.cur_file_idx = file_idx\n            self.cur_buffer = torch.load(self.path_list[file_idx])\n\n    def __len__(self):\n        return self.num_samples\n\n    def __getitem__(self, idx):\n        self._load_buffer(idx)\n\n        # dict; keys are 'x', 'y', 'mask', 'fps', 'height', 'width', 'num_frames'\n        batch = self.cur_buffer[idx % self.len_buffer]\n\n        ret = {\n            \"video\": batch[\"x\"],\n            \"text\": batch[\"y\"],\n            \"mask\": 
batch[\"mask\"],\n            \"fps\": batch[\"fps\"],\n            \"height\": batch[\"height\"],\n            \"width\": batch[\"width\"],\n            \"num_frames\": batch[\"num_frames\"],\n        }\n        return ret\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/datasets/read_video.py",
    "content": "import gc\nimport math\nimport os\nimport re\nimport warnings\nfrom fractions import Fraction\nfrom typing import Any, Dict, List, Optional, Tuple, Union\n\nimport av\nimport cv2\nimport numpy as np\nimport torch\nfrom torchvision import get_video_backend\nfrom torchvision.io.video import _check_av_available\n\nMAX_NUM_FRAMES = 2500\n\n\ndef read_video_av(\n    filename: str,\n    start_pts: Union[float, Fraction] = 0,\n    end_pts: Optional[Union[float, Fraction]] = None,\n    pts_unit: str = \"pts\",\n    output_format: str = \"THWC\",\n) -> Tuple[torch.Tensor, torch.Tensor, Dict[str, Any]]:\n    \"\"\"\n    Reads a video from a file, returning both the video frames and the audio frames\n\n    This method is modified from torchvision.io.video.read_video, with the following changes:\n\n    1. will not extract audio frames and return empty for aframes\n    2. remove checks and only support pyav\n    3. add container.close() and gc.collect() to avoid thread leakage\n    4. try our best to avoid memory leak\n\n    Args:\n        filename (str): path to the video file\n        start_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional):\n            The start presentation time of the video\n        end_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional):\n            The end presentation time\n        pts_unit (str, optional): unit in which start_pts and end_pts values will be interpreted,\n            either 'pts' or 'sec'. Defaults to 'pts'.\n        output_format (str, optional): The format of the output video tensors. Can be either \"THWC\" (default) or \"TCHW\".\n\n    Returns:\n        vframes (Tensor[T, H, W, C] or Tensor[T, C, H, W]): the `T` video frames\n        aframes (Tensor[K, L]): the audio frames, where `K` is the number of channels and `L` is the number of points\n        info (Dict): metadata for the video and audio. 
Can contain the fields video_fps (float) and audio_fps (int)\n    \"\"\"\n    # format\n    output_format = output_format.upper()\n    if output_format not in (\"THWC\", \"TCHW\"):\n        raise ValueError(f\"output_format should be either 'THWC' or 'TCHW', got {output_format}.\")\n    # file existence\n    if not os.path.exists(filename):\n        raise RuntimeError(f\"File not found: {filename}\")\n    # backend check\n    assert get_video_backend() == \"pyav\", \"pyav backend is required for read_video_av\"\n    _check_av_available()\n    # end_pts check\n    if end_pts is None:\n        end_pts = float(\"inf\")\n    if end_pts < start_pts:\n        raise ValueError(f\"end_pts should be larger than start_pts, got start_pts={start_pts} and end_pts={end_pts}\")\n\n    # == get video info ==\n    info = {}\n    # TODO: creating an container leads to memory leak (1G for 8 workers 1 GPU)\n    container = av.open(filename, metadata_errors=\"ignore\")\n    # fps\n    video_fps = container.streams.video[0].average_rate\n    # guard against potentially corrupted files\n    if video_fps is not None:\n        info[\"video_fps\"] = float(video_fps)\n    iter_video = container.decode(**{\"video\": 0})\n    frame = next(iter_video).to_rgb().to_ndarray()\n    height, width = frame.shape[:2]\n    total_frames = container.streams.video[0].frames\n    if total_frames == 0:\n        total_frames = MAX_NUM_FRAMES\n        warnings.warn(f\"total_frames is 0, using {MAX_NUM_FRAMES} as a fallback\")\n    container.close()\n    del container\n\n    # HACK: must create before iterating stream\n    # use np.zeros will not actually allocate memory\n    # use np.ones will lead to a little memory leak\n    video_frames = np.zeros((total_frames, height, width, 3), dtype=np.uint8)\n\n    # == read ==\n    try:\n        # TODO: The reading has memory leak (4G for 8 workers 1 GPU)\n        container = av.open(filename, metadata_errors=\"ignore\")\n        assert container.streams.video is not 
None\n        video_frames = _read_from_stream(\n            video_frames,\n            container,\n            start_pts,\n            end_pts,\n            pts_unit,\n            container.streams.video[0],\n            {\"video\": 0},\n            filename=filename,\n        )\n    except av.AVError as e:\n        print(f\"[Warning] Error while reading video {filename}: {e}\")\n\n    vframes = torch.from_numpy(video_frames).clone()\n    del video_frames\n    if output_format == \"TCHW\":\n        # [T,H,W,C] --> [T,C,H,W]\n        vframes = vframes.permute(0, 3, 1, 2)\n\n    aframes = torch.empty((1, 0), dtype=torch.float32)\n    return vframes, aframes, info\n\n\ndef _read_from_stream(\n    video_frames,\n    container: \"av.container.Container\",\n    start_offset: float,\n    end_offset: float,\n    pts_unit: str,\n    stream: \"av.stream.Stream\",\n    stream_name: Dict[str, Optional[Union[int, Tuple[int, ...], List[int]]]],\n    filename: Optional[str] = None,\n) -> List[\"av.frame.Frame\"]:\n    if pts_unit == \"sec\":\n        # TODO: we should change all of this from ground up to simply take\n        # sec and convert to MS in C++\n        start_offset = int(math.floor(start_offset * (1 / stream.time_base)))\n        if end_offset != float(\"inf\"):\n            end_offset = int(math.ceil(end_offset * (1 / stream.time_base)))\n    else:\n        warnings.warn(\"The pts_unit 'pts' gives wrong results. 
Please use pts_unit 'sec'.\")\n\n    should_buffer = True\n    max_buffer_size = 5\n    if stream.type == \"video\":\n        # DivX-style packed B-frames can have out-of-order pts (2 frames in a single pkt)\n        # so need to buffer some extra frames to sort everything\n        # properly\n        extradata = stream.codec_context.extradata\n        # overly complicated way of finding if `divx_packed` is set, following\n        # https://github.com/FFmpeg/FFmpeg/commit/d5a21172283572af587b3d939eba0091484d3263\n        if extradata and b\"DivX\" in extradata:\n            # can't use regex directly because of some weird characters sometimes...\n            pos = extradata.find(b\"DivX\")\n            d = extradata[pos:]\n            o = re.search(rb\"DivX(\\d+)Build(\\d+)(\\w)\", d)\n            if o is None:\n                o = re.search(rb\"DivX(\\d+)b(\\d+)(\\w)\", d)\n            if o is not None:\n                should_buffer = o.group(3) == b\"p\"\n    seek_offset = start_offset\n    # some files don't seek to the right location, so better be safe here\n    seek_offset = max(seek_offset - 1, 0)\n    if should_buffer:\n        # FIXME this is kind of a hack, but we will jump to the previous keyframe\n        # so this will be safe\n        seek_offset = max(seek_offset - max_buffer_size, 0)\n    try:\n        # TODO check if stream needs to always be the video stream here or not\n        container.seek(seek_offset, any_frame=False, backward=True, stream=stream)\n    except av.AVError as e:\n        print(f\"[Warning] Error while seeking video {filename}: {e}\")\n        return []\n\n    # == main ==\n    buffer_count = 0\n    frames_pts = []\n    cnt = 0\n    try:\n        for _idx, frame in enumerate(container.decode(**stream_name)):\n            frames_pts.append(frame.pts)\n            video_frames[cnt] = frame.to_rgb().to_ndarray()\n            cnt += 1\n            if cnt >= len(video_frames):\n                break\n            if frame.pts >= 
end_offset:\n                if should_buffer and buffer_count < max_buffer_size:\n                    buffer_count += 1\n                    continue\n                break\n    except av.AVError as e:\n        print(f\"[Warning] Error while reading video {filename}: {e}\")\n\n    # garbage collection for thread leakage\n    container.close()\n    del container\n    # NOTE: manually garbage collect to close pyav threads\n    gc.collect()\n\n    # ensure that the results are sorted wrt the pts\n    # NOTE: here we assert frames_pts is sorted\n    start_ptr = 0\n    end_ptr = cnt\n    while start_ptr < end_ptr and frames_pts[start_ptr] < start_offset:\n        start_ptr += 1\n    while start_ptr < end_ptr and frames_pts[end_ptr - 1] > end_offset:\n        end_ptr -= 1\n    if start_offset > 0 and start_offset not in frames_pts[start_ptr:end_ptr]:\n        # if there is no frame that exactly matches the pts of start_offset\n        # add the last frame smaller than start_offset, to guarantee that\n        # we will have all the necessary data. 
This is most useful for audio\n        if start_ptr > 0:\n            start_ptr -= 1\n    result = video_frames[start_ptr:end_ptr].copy()\n    return result\n\n\ndef read_video_cv2(video_path):\n    cap = cv2.VideoCapture(video_path)\n\n    if not cap.isOpened():\n        raise ValueError(f\"Unable to open video: {video_path}\")\n    else:\n        fps = cap.get(cv2.CAP_PROP_FPS)\n        vinfo = {\n            \"video_fps\": fps,\n        }\n\n        frames = []\n        while True:\n            # Read a frame from the video\n            ret, frame = cap.read()\n\n            # If frame is not read correctly, break the loop\n            if not ret:\n                break\n\n            frames.append(frame[:, :, ::-1])  # BGR to RGB\n\n        # Release the video capture object (no windows are opened, so no GUI calls are needed)\n        cap.release()\n\n        frames = np.stack(frames)\n        frames = torch.from_numpy(frames)  # [T, H, W, C=3]\n        frames = frames.permute(0, 3, 1, 2)  # [T, C=3, H, W]\n        return frames, vinfo\n\n\ndef read_video(video_path, backend=\"av\"):\n    if backend == \"cv2\":\n        vframes, vinfo = read_video_cv2(video_path)\n    elif backend == \"av\":\n        vframes, _, vinfo = read_video_av(filename=video_path, pts_unit=\"sec\", output_format=\"TCHW\")\n    else:\n        raise ValueError(f\"Unsupported backend: {backend}\")\n\n    return vframes, vinfo\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/datasets/sampler.py",
    "content": "from collections import OrderedDict, defaultdict\nfrom pprint import pformat\nfrom typing import Iterator, List, Optional\n\nimport numpy as np\nimport torch\nimport torch.distributed as dist\nfrom torch.utils.data import Dataset, DistributedSampler\n\nfrom opensora.utils.misc import format_numel_str, get_logger\n\nfrom .aspect import get_num_pixels\nfrom .bucket import Bucket\nfrom .datasets import VariableVideoTextDataset\n\n\n# use pandarallel to accelerate bucket processing\n# NOTE: pandarallel should only access local variables\ndef apply(data, method=None, frame_interval=None, seed=None, num_bucket=None):\n    return method(\n        data[\"num_frames\"],\n        data[\"height\"],\n        data[\"width\"],\n        frame_interval,\n        seed + data[\"id\"] * num_bucket,\n    )\n\n\nclass StatefulDistributedSampler(DistributedSampler):\n    def __init__(\n        self,\n        dataset: Dataset,\n        num_replicas: Optional[int] = None,\n        rank: Optional[int] = None,\n        shuffle: bool = True,\n        seed: int = 0,\n        drop_last: bool = False,\n    ) -> None:\n        super().__init__(dataset, num_replicas, rank, shuffle, seed, drop_last)\n        self.start_index: int = 0\n\n    def __iter__(self) -> Iterator:\n        iterator = super().__iter__()\n        indices = list(iterator)\n        indices = indices[self.start_index :]\n        return iter(indices)\n\n    def __len__(self) -> int:\n        return self.num_samples - self.start_index\n\n    def reset(self) -> None:\n        self.start_index = 0\n\n    def state_dict(self, step) -> dict:\n        return {\"start_index\": step}\n\n    def load_state_dict(self, state_dict: dict) -> None:\n        self.__dict__.update(state_dict)\n\n\nclass VariableVideoBatchSampler(DistributedSampler):\n    def __init__(\n        self,\n        dataset: VariableVideoTextDataset,\n        bucket_config: dict,\n        num_replicas: Optional[int] = None,\n        rank: Optional[int] = 
None,\n        shuffle: bool = True,\n        seed: int = 0,\n        drop_last: bool = False,\n        verbose: bool = False,\n        num_bucket_build_workers: int = 1,\n    ) -> None:\n        super().__init__(\n            dataset=dataset, num_replicas=num_replicas, rank=rank, shuffle=shuffle, seed=seed, drop_last=drop_last\n        )\n        self.dataset = dataset\n        self.bucket = Bucket(bucket_config)\n        self.verbose = verbose\n        self.last_micro_batch_access_index = 0\n        self.approximate_num_batch = None\n\n        self._get_num_batch_cached_bucket_sample_dict = None\n        self.num_bucket_build_workers = num_bucket_build_workers\n\n    def __iter__(self) -> Iterator[List[int]]:\n        if self._get_num_batch_cached_bucket_sample_dict is not None:\n            bucket_sample_dict = self._get_num_batch_cached_bucket_sample_dict\n            self._get_num_batch_cached_bucket_sample_dict = None\n        else:\n            bucket_sample_dict = self.group_by_bucket()\n            if self.verbose:\n                self._print_bucket_info(bucket_sample_dict)\n\n        g = torch.Generator()\n        g.manual_seed(self.seed + self.epoch)\n        bucket_micro_batch_count = OrderedDict()\n        bucket_last_consumed = OrderedDict()\n\n        # process the samples\n        for bucket_id, data_list in bucket_sample_dict.items():\n            # handle droplast\n            bs_per_gpu = self.bucket.get_batch_size(bucket_id)\n            remainder = len(data_list) % bs_per_gpu\n\n            if remainder > 0:\n                if not self.drop_last:\n                    # if there is remainder, we pad to make it divisible\n                    data_list += data_list[: bs_per_gpu - remainder]\n                else:\n                    # we just drop the remainder to make it divisible\n                    data_list = data_list[:-remainder]\n            bucket_sample_dict[bucket_id] = data_list\n\n            # handle shuffle\n            if 
self.shuffle:\n                data_indices = torch.randperm(len(data_list), generator=g).tolist()\n                data_list = [data_list[i] for i in data_indices]\n                bucket_sample_dict[bucket_id] = data_list\n\n            # compute how many micro-batches each bucket has\n            num_micro_batches = len(data_list) // bs_per_gpu\n            bucket_micro_batch_count[bucket_id] = num_micro_batches\n\n        # compute the bucket access order\n        # each bucket may have more than one batch of data\n        # thus bucket_id may appear more than 1 time\n        bucket_id_access_order = []\n        for bucket_id, num_micro_batch in bucket_micro_batch_count.items():\n            bucket_id_access_order.extend([bucket_id] * num_micro_batch)\n\n        # randomize the access order\n        if self.shuffle:\n            bucket_id_access_order_indices = torch.randperm(len(bucket_id_access_order), generator=g).tolist()\n            bucket_id_access_order = [bucket_id_access_order[i] for i in bucket_id_access_order_indices]\n\n        # make the number of bucket accesses divisible by dp size\n        remainder = len(bucket_id_access_order) % self.num_replicas\n        if remainder > 0:\n            if self.drop_last:\n                bucket_id_access_order = bucket_id_access_order[: len(bucket_id_access_order) - remainder]\n            else:\n                bucket_id_access_order += bucket_id_access_order[: self.num_replicas - remainder]\n\n        # prepare each batch from its bucket\n        # according to the predefined bucket access order\n        num_iters = len(bucket_id_access_order) // self.num_replicas\n        start_iter_idx = self.last_micro_batch_access_index // self.num_replicas\n\n        # re-compute the micro-batch consumption\n        # this is useful when resuming from a state dict with a different number of GPUs\n        self.last_micro_batch_access_index = start_iter_idx * self.num_replicas\n        for i in 
range(self.last_micro_batch_access_index):\n            bucket_id = bucket_id_access_order[i]\n            bucket_bs = self.bucket.get_batch_size(bucket_id)\n            if bucket_id in bucket_last_consumed:\n                bucket_last_consumed[bucket_id] += bucket_bs\n            else:\n                bucket_last_consumed[bucket_id] = bucket_bs\n\n        for i in range(start_iter_idx, num_iters):\n            bucket_access_list = bucket_id_access_order[i * self.num_replicas : (i + 1) * self.num_replicas]\n            self.last_micro_batch_access_index += self.num_replicas\n\n            # compute the data samples consumed by each access\n            bucket_access_boundaries = []\n            for bucket_id in bucket_access_list:\n                bucket_bs = self.bucket.get_batch_size(bucket_id)\n                last_consumed_index = bucket_last_consumed.get(bucket_id, 0)\n                bucket_access_boundaries.append([last_consumed_index, last_consumed_index + bucket_bs])\n\n                # update consumption\n                if bucket_id in bucket_last_consumed:\n                    bucket_last_consumed[bucket_id] += bucket_bs\n                else:\n                    bucket_last_consumed[bucket_id] = bucket_bs\n\n            # compute the range of data accessed by each GPU\n            bucket_id = bucket_access_list[self.rank]\n            boundary = bucket_access_boundaries[self.rank]\n            cur_micro_batch = bucket_sample_dict[bucket_id][boundary[0] : boundary[1]]\n\n            # encode t, h, w into the sample index\n            real_t, real_h, real_w = self.bucket.get_thw(bucket_id)\n            cur_micro_batch = [f\"{idx}-{real_t}-{real_h}-{real_w}\" for idx in cur_micro_batch]\n            yield cur_micro_batch\n\n        self.reset()\n\n    def __len__(self) -> int:\n        return self.get_num_batch() // dist.get_world_size()\n\n    def group_by_bucket(self) -> dict:\n        bucket_sample_dict = OrderedDict()\n\n        from pandarallel 
import pandarallel\n\n        pandarallel.initialize(nb_workers=self.num_bucket_build_workers, progress_bar=False)\n        get_logger().info(\"Building buckets...\")\n        bucket_ids = self.dataset.data.parallel_apply(\n            apply,\n            axis=1,\n            method=self.bucket.get_bucket_id,\n            frame_interval=self.dataset.frame_interval,\n            seed=self.seed + self.epoch,\n            num_bucket=self.bucket.num_bucket,\n        )\n\n        # group by bucket\n        # each data sample is put into a bucket with a similar image/video size\n        for i in range(len(self.dataset)):\n            bucket_id = bucket_ids[i]\n            if bucket_id is None:\n                continue\n            if bucket_id not in bucket_sample_dict:\n                bucket_sample_dict[bucket_id] = []\n            bucket_sample_dict[bucket_id].append(i)\n        return bucket_sample_dict\n\n    def get_num_batch(self) -> int:\n        bucket_sample_dict = self.group_by_bucket()\n        self._get_num_batch_cached_bucket_sample_dict = bucket_sample_dict\n\n        # calculate the number of batches\n        if self.verbose:\n            self._print_bucket_info(bucket_sample_dict)\n        return self.approximate_num_batch\n\n    def _print_bucket_info(self, bucket_sample_dict: dict) -> None:\n        # collect statistics\n        total_samples = 0\n        total_batch = 0\n        num_aspect_dict = defaultdict(lambda: [0, 0])\n        num_hwt_dict = defaultdict(lambda: [0, 0])\n        for k, v in bucket_sample_dict.items():\n            size = len(v)\n            num_batch = size // self.bucket.get_batch_size(k[:-1])\n\n            total_samples += size\n            total_batch += num_batch\n\n            num_aspect_dict[k[-1]][0] += size\n            num_aspect_dict[k[-1]][1] += num_batch\n            num_hwt_dict[k[:-1]][0] += size\n            num_hwt_dict[k[:-1]][1] += num_batch\n\n        # sort\n        num_aspect_dict = 
dict(sorted(num_aspect_dict.items(), key=lambda x: x[0]))\n        num_hwt_dict = dict(\n            sorted(num_hwt_dict.items(), key=lambda x: (get_num_pixels(x[0][0]), x[0][1]), reverse=True)\n        )\n        num_hwt_img_dict = {k: v for k, v in num_hwt_dict.items() if k[1] == 1}\n        num_hwt_vid_dict = {k: v for k, v in num_hwt_dict.items() if k[1] > 1}\n\n        # log\n        if dist.get_rank() == 0 and self.verbose:\n            get_logger().info(\"Bucket Info:\")\n            get_logger().info(\n                \"Bucket [#sample, #batch] by aspect ratio:\\n%s\", pformat(num_aspect_dict, sort_dicts=False)\n            )\n            get_logger().info(\n                \"Image Bucket [#sample, #batch] by HxWxT:\\n%s\", pformat(num_hwt_img_dict, sort_dicts=False)\n            )\n            get_logger().info(\n                \"Video Bucket [#sample, #batch] by HxWxT:\\n%s\", pformat(num_hwt_vid_dict, sort_dicts=False)\n            )\n            get_logger().info(\n                \"#training batch: %s, #training sample: %s, #non empty bucket: %s\",\n                format_numel_str(total_batch),\n                format_numel_str(total_samples),\n                len(bucket_sample_dict),\n            )\n        self.approximate_num_batch = total_batch\n\n    def reset(self):\n        self.last_micro_batch_access_index = 0\n\n    def state_dict(self, num_steps: int) -> dict:\n        # the last_micro_batch_access_index in the __iter__ is often\n        # not accurate during multi-workers and data prefetching\n        # thus, we need the user to pass the actual steps which have been executed\n        # to calculate the correct last_micro_batch_access_index\n        return {\"seed\": self.seed, \"epoch\": self.epoch, \"last_micro_batch_access_index\": num_steps * self.num_replicas}\n\n    def load_state_dict(self, state_dict: dict) -> None:\n        self.__dict__.update(state_dict)\n\n\nclass BatchDistributedSampler(DistributedSampler):\n    \"\"\"\n    
Used with BatchDataset;\n    Suppose len_buffer == 5, num_buffers == 6, #GPUs == 3, then\n           | buffer {i}          | buffer {i+1}\n    ------ | ------------------- | -------------------\n    rank 0 |  0,  1,  2,  3,  4, |  5,  6,  7,  8,  9\n    rank 1 | 10, 11, 12, 13, 14, | 15, 16, 17, 18, 19\n    rank 2 | 20, 21, 22, 23, 24, | 25, 26, 27, 28, 29\n    \"\"\"\n\n    def __init__(self, dataset: Dataset, **kwargs):\n        super().__init__(dataset, **kwargs)\n        self.start_index = 0\n\n    def __iter__(self):\n        num_buffers = self.dataset.num_buffers\n        len_buffer = self.dataset.len_buffer\n        num_buffers_i = num_buffers // self.num_replicas\n        num_samples_i = len_buffer * num_buffers_i\n\n        indices_i = np.arange(self.start_index, num_samples_i) + self.rank * num_samples_i\n        indices_i = indices_i.tolist()\n\n        return iter(indices_i)\n\n    def reset(self):\n        self.start_index = 0\n\n    def state_dict(self, step) -> dict:\n        return {\"start_index\": step}\n\n    def load_state_dict(self, state_dict: dict):\n        self.start_index = state_dict[\"start_index\"] + 1\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/datasets/utils.py",
    "content": "import os\nimport re\n\nimport numpy as np\nimport pandas as pd\nimport requests\nimport torch\nimport torchvision\nimport torchvision.transforms as transforms\nfrom PIL import Image\nfrom torchvision.datasets.folder import IMG_EXTENSIONS, pil_loader\nfrom torchvision.io import write_video\nfrom torchvision.utils import save_image\n\nfrom . import video_transforms\n\nVID_EXTENSIONS = (\".mp4\", \".avi\", \".mov\", \".mkv\")\n\nregex = re.compile(\n    r\"^(?:http|ftp)s?://\"  # http:// or https://\n    r\"(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\\.)+(?:[A-Z]{2,6}\\.?|[A-Z0-9-]{2,}\\.?)|\"  # domain...\n    r\"localhost|\"  # localhost...\n    r\"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\"  # ...or ip\n    r\"(?::\\d+)?\"  # optional port\n    r\"(?:/?|[/?]\\S+)$\",\n    re.IGNORECASE,\n)\n\n\ndef is_img(path):\n    ext = os.path.splitext(path)[-1].lower()\n    return ext in IMG_EXTENSIONS\n\n\ndef is_vid(path):\n    ext = os.path.splitext(path)[-1].lower()\n    return ext in VID_EXTENSIONS\n\n\ndef is_url(url):\n    return re.match(regex, url) is not None\n\n\ndef read_file(input_path):\n    if input_path.endswith(\".csv\"):\n        return pd.read_csv(input_path)\n    elif input_path.endswith(\".parquet\"):\n        return pd.read_parquet(input_path)\n    else:\n        raise NotImplementedError(f\"Unsupported file format: {input_path}\")\n\n\ndef download_url(input_path):\n    output_dir = \"cache\"\n    os.makedirs(output_dir, exist_ok=True)\n    base_name = os.path.basename(input_path)\n    output_path = os.path.join(output_dir, base_name)\n    img_data = requests.get(input_path).content\n    with open(output_path, \"wb\") as handler:\n        handler.write(img_data)\n    print(f\"URL {input_path} downloaded to {output_path}\")\n    return output_path\n\n\ndef temporal_random_crop(vframes, num_frames, frame_interval):\n    temporal_sample = video_transforms.TemporalRandomCrop(num_frames * frame_interval)\n    total_frames = len(vframes)\n    
start_frame_ind, end_frame_ind = temporal_sample(total_frames)\n    assert (\n        end_frame_ind - start_frame_ind >= num_frames\n    ), f\"Not enough frames to sample, {end_frame_ind} - {start_frame_ind} < {num_frames}\"\n    frame_indice = np.linspace(start_frame_ind, end_frame_ind - 1, num_frames, dtype=int)\n    video = vframes[frame_indice]\n    return video\n\n\ndef get_transforms_video(name=\"center\", image_size=(256, 256)):\n    if name is None:\n        return None\n    elif name == \"center\":\n        assert image_size[0] == image_size[1], \"image_size must be square for center crop\"\n        transform_video = transforms.Compose(\n            [\n                video_transforms.ToTensorVideo(),  # TCHW\n                # video_transforms.RandomHorizontalFlipVideo(),\n                video_transforms.UCFCenterCropVideo(image_size[0]),\n                transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], inplace=True),\n            ]\n        )\n    elif name == \"resize_crop\":\n        transform_video = transforms.Compose(\n            [\n                video_transforms.ToTensorVideo(),  # TCHW\n                video_transforms.ResizeCrop(image_size),\n                transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], inplace=True),\n            ]\n        )\n    else:\n        raise NotImplementedError(f\"Transform {name} not implemented\")\n    return transform_video\n\n\ndef get_transforms_image(name=\"center\", image_size=(256, 256)):\n    if name is None:\n        return None\n    elif name == \"center\":\n        assert image_size[0] == image_size[1], \"Image size must be square for center crop\"\n        transform = transforms.Compose(\n            [\n                transforms.Lambda(lambda pil_image: center_crop_arr(pil_image, image_size[0])),\n                # transforms.RandomHorizontalFlip(),\n                transforms.ToTensor(),\n                transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], 
inplace=True),\n            ]\n        )\n    elif name == \"resize_crop\":\n        transform = transforms.Compose(\n            [\n                transforms.Lambda(lambda pil_image: resize_crop_to_fill(pil_image, image_size)),\n                transforms.ToTensor(),\n                transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], inplace=True),\n            ]\n        )\n    else:\n        raise NotImplementedError(f\"Transform {name} not implemented\")\n    return transform\n\n\ndef read_image_from_path(path, transform=None, transform_name=\"center\", num_frames=1, image_size=(256, 256)):\n    image = pil_loader(path)\n    if transform is None:\n        transform = get_transforms_image(image_size=image_size, name=transform_name)\n    image = transform(image)\n    video = image.unsqueeze(0).repeat(num_frames, 1, 1, 1)\n    video = video.permute(1, 0, 2, 3)\n    return video\n\n\ndef read_video_from_path(path, transform=None, transform_name=\"center\", image_size=(256, 256)):\n    vframes, aframes, info = torchvision.io.read_video(filename=path, pts_unit=\"sec\", output_format=\"TCHW\")\n    if transform is None:\n        transform = get_transforms_video(image_size=image_size, name=transform_name)\n    video = transform(vframes)  # T C H W\n    video = video.permute(1, 0, 2, 3)\n    return video\n\n\ndef read_from_path(path, image_size, transform_name=\"center\"):\n    if is_url(path):\n        path = download_url(path)\n    ext = os.path.splitext(path)[-1].lower()\n    if ext.lower() in VID_EXTENSIONS:\n        return read_video_from_path(path, image_size=image_size, transform_name=transform_name)\n    else:\n        assert ext.lower() in IMG_EXTENSIONS, f\"Unsupported file format: {ext}\"\n        return read_image_from_path(path, image_size=image_size, transform_name=transform_name)\n\n\ndef save_sample(x, save_path=None, fps=8, normalize=True, value_range=(-1, 1), force_video=False, verbose=True):\n    \"\"\"\n    Args:\n        x (Tensor): 
shape [C, T, H, W]\n    \"\"\"\n    assert x.ndim == 4\n\n    if not force_video and x.shape[1] == 1:  # T = 1: save as image\n        save_path += \".png\"\n        x = x.squeeze(1)\n        save_image([x], save_path, normalize=normalize, value_range=value_range)\n    else:\n        save_path += \".mp4\"\n        if normalize:\n            low, high = value_range\n            x.clamp_(min=low, max=high)\n            x.sub_(low).div_(max(high - low, 1e-5))\n\n        x = x.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 3, 0).to(\"cpu\", torch.uint8)\n        write_video(save_path, x, fps=fps, video_codec=\"h264\")\n    if verbose:\n        print(f\"Saved to {save_path}\")\n    return save_path\n\n\ndef center_crop_arr(pil_image, image_size):\n    \"\"\"\n    Center cropping implementation from ADM.\n    https://github.com/openai/guided-diffusion/blob/8fb3ad9197f16bbc40620447b2742e13458d2831/guided_diffusion/image_datasets.py#L126\n    \"\"\"\n    while min(*pil_image.size) >= 2 * image_size:\n        pil_image = pil_image.resize(tuple(x // 2 for x in pil_image.size), resample=Image.BOX)\n\n    scale = image_size / min(*pil_image.size)\n    pil_image = pil_image.resize(tuple(round(x * scale) for x in pil_image.size), resample=Image.BICUBIC)\n\n    arr = np.array(pil_image)\n    crop_y = (arr.shape[0] - image_size) // 2\n    crop_x = (arr.shape[1] - image_size) // 2\n    return Image.fromarray(arr[crop_y : crop_y + image_size, crop_x : crop_x + image_size])\n\n\ndef resize_crop_to_fill(pil_image, image_size):\n    w, h = pil_image.size  # PIL is (W, H)\n    th, tw = image_size\n    rh, rw = th / h, tw / w\n    if rh > rw:\n        sh, sw = th, round(w * rh)\n        image = pil_image.resize((sw, sh), Image.BICUBIC)\n        i = 0\n        j = int(round((sw - tw) / 2.0))\n    else:\n        sh, sw = round(h * rw), tw\n        image = pil_image.resize((sw, sh), Image.BICUBIC)\n        i = int(round((sh - th) / 2.0))\n        j = 0\n    arr = np.array(image)\n    
assert i + th <= arr.shape[0] and j + tw <= arr.shape[1]\n    return Image.fromarray(arr[i : i + th, j : j + tw])\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/datasets/video_transforms.py",
    "content": "# Copyright 2024 Vchitect/Latte\n\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n\n#     http://www.apache.org/licenses/LICENSE-2.0\n\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.# Modified from Latte\n\n# - This file is adapted from https://github.com/Vchitect/Latte/blob/main/datasets/video_transforms.py\n\n\nimport numbers\nimport random\n\nimport numpy as np\nimport torch\n\n\ndef _is_tensor_video_clip(clip):\n    if not torch.is_tensor(clip):\n        raise TypeError(\"clip should be Tensor. Got %s\" % type(clip))\n\n    if not clip.ndimension() == 4:\n        raise ValueError(\"clip should be 4D. Got %dD\" % clip.dim())\n\n    return True\n\n\ndef crop(clip, i, j, h, w):\n    \"\"\"\n    Args:\n        clip (torch.tensor): Video clip to be cropped. 
Size is (T, C, H, W)\n    \"\"\"\n    if len(clip.size()) != 4:\n        raise ValueError(\"clip should be a 4D tensor\")\n    return clip[..., i : i + h, j : j + w]\n\n\ndef resize(clip, target_size, interpolation_mode):\n    if len(target_size) != 2:\n        raise ValueError(f\"target size should be tuple (height, width), instead got {target_size}\")\n    return torch.nn.functional.interpolate(clip, size=target_size, mode=interpolation_mode, align_corners=False)\n\n\ndef resize_scale(clip, target_size, interpolation_mode):\n    if len(target_size) != 2:\n        raise ValueError(f\"target size should be tuple (height, width), instead got {target_size}\")\n    H, W = clip.size(-2), clip.size(-1)\n    scale_ = target_size[0] / min(H, W)\n    return torch.nn.functional.interpolate(clip, scale_factor=scale_, mode=interpolation_mode, align_corners=False)\n\n\ndef resized_crop(clip, i, j, h, w, size, interpolation_mode=\"bilinear\"):\n    \"\"\"\n    Do spatial cropping and resizing to the video clip\n    Args:\n        clip (torch.tensor): Video clip to be cropped. Size is (T, C, H, W)\n        i (int): i in (i,j) i.e coordinates of the upper left corner.\n        j (int): j in (i,j) i.e coordinates of the upper left corner.\n        h (int): Height of the cropped region.\n        w (int): Width of the cropped region.\n        size (tuple(int, int)): height and width of resized clip\n    Returns:\n        clip (torch.tensor): Resized and cropped clip. 
Size is (T, C, H, W)\n    \"\"\"\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    clip = crop(clip, i, j, h, w)\n    clip = resize(clip, size, interpolation_mode)\n    return clip\n\n\ndef center_crop(clip, crop_size):\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    h, w = clip.size(-2), clip.size(-1)\n    th, tw = crop_size\n    if h < th or w < tw:\n        raise ValueError(\"height and width must be no smaller than crop_size\")\n\n    i = int(round((h - th) / 2.0))\n    j = int(round((w - tw) / 2.0))\n    return crop(clip, i, j, th, tw)\n\n\ndef center_crop_using_short_edge(clip):\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    h, w = clip.size(-2), clip.size(-1)\n    if h < w:\n        th, tw = h, h\n        i = 0\n        j = int(round((w - tw) / 2.0))\n    else:\n        th, tw = w, w\n        i = int(round((h - th) / 2.0))\n        j = 0\n    return crop(clip, i, j, th, tw)\n\n\ndef resize_crop_to_fill(clip, target_size):\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    h, w = clip.size(-2), clip.size(-1)\n    th, tw = target_size[0], target_size[1]\n    rh, rw = th / h, tw / w\n    if rh > rw:\n        sh, sw = th, round(w * rh)\n        clip = resize(clip, (sh, sw), \"bilinear\")\n        i = 0\n        # round the half-difference, matching center_crop above\n        j = int(round((sw - tw) / 2.0))\n    else:\n        sh, sw = round(h * rw), tw\n        clip = resize(clip, (sh, sw), \"bilinear\")\n        # round the half-difference, matching center_crop above\n        i = int(round((sh - th) / 2.0))\n        j = 0\n    assert i + th <= clip.size(-2) and j + tw <= clip.size(-1)\n    return crop(clip, i, j, th, tw)\n\n\ndef random_shift_crop(clip):\n    \"\"\"\n    Slide along the long edge, with the short edge as crop size\n    \"\"\"\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    h, 
w = clip.size(-2), clip.size(-1)\n\n    if h <= w:\n        short_edge = h\n    else:\n        short_edge = w\n\n    th, tw = short_edge, short_edge\n\n    i = torch.randint(0, h - th + 1, size=(1,)).item()\n    j = torch.randint(0, w - tw + 1, size=(1,)).item()\n    return crop(clip, i, j, th, tw)\n\n\ndef to_tensor(clip):\n    \"\"\"\n    Convert tensor data type from uint8 to float, divide value by 255.0 and\n    permute the dimensions of clip tensor\n    Args:\n        clip (torch.tensor, dtype=torch.uint8): Size is (T, C, H, W)\n    Return:\n        clip (torch.tensor, dtype=torch.float): Size is (T, C, H, W)\n    \"\"\"\n    _is_tensor_video_clip(clip)\n    if not clip.dtype == torch.uint8:\n        raise TypeError(\"clip tensor should have data type uint8. Got %s\" % str(clip.dtype))\n    # return clip.float().permute(3, 0, 1, 2) / 255.0\n    return clip.float() / 255.0\n\n\ndef normalize(clip, mean, std, inplace=False):\n    \"\"\"\n    Args:\n        clip (torch.tensor): Video clip to be normalized. Size is (T, C, H, W)\n        mean (tuple): pixel RGB mean. Size is (3)\n        std (tuple): pixel standard deviation. Size is (3)\n    Returns:\n        normalized clip (torch.tensor): Size is (T, C, H, W)\n    \"\"\"\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    if not inplace:\n        clip = clip.clone()\n    mean = torch.as_tensor(mean, dtype=clip.dtype, device=clip.device)\n    # print(mean)\n    std = torch.as_tensor(std, dtype=clip.dtype, device=clip.device)\n    clip.sub_(mean[:, None, None, None]).div_(std[:, None, None, None])\n    return clip\n\n\ndef hflip(clip):\n    \"\"\"\n    Args:\n        clip (torch.tensor): Video clip to be normalized. 
Size is (T, C, H, W)\n    Returns:\n        flipped clip (torch.tensor): Size is (T, C, H, W)\n    \"\"\"\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    return clip.flip(-1)\n\n\nclass ResizeCrop:\n    def __init__(self, size):\n        if isinstance(size, numbers.Number):\n            self.size = (int(size), int(size))\n        else:\n            self.size = size\n\n    def __call__(self, clip):\n        clip = resize_crop_to_fill(clip, self.size)\n        return clip\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(size={self.size})\"\n\n\nclass RandomCropVideo:\n    def __init__(self, size):\n        if isinstance(size, numbers.Number):\n            self.size = (int(size), int(size))\n        else:\n            self.size = size\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor): Video clip to be cropped. Size is (T, C, H, W)\n        Returns:\n            torch.tensor: randomly cropped video clip.\n                size is (T, C, OH, OW)\n        \"\"\"\n        i, j, h, w = self.get_params(clip)\n        return crop(clip, i, j, h, w)\n\n    def get_params(self, clip):\n        h, w = clip.shape[-2:]\n        th, tw = self.size\n\n        if h < th or w < tw:\n            raise ValueError(f\"Required crop size {(th, tw)} is larger than input image size {(h, w)}\")\n\n        if w == tw and h == th:\n            return 0, 0, h, w\n\n        i = torch.randint(0, h - th + 1, size=(1,)).item()\n        j = torch.randint(0, w - tw + 1, size=(1,)).item()\n\n        return i, j, th, tw\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(size={self.size})\"\n\n\nclass CenterCropResizeVideo:\n    \"\"\"\n    First use the short side for cropping length,\n    center crop video, then resize to the specified size\n    \"\"\"\n\n    def __init__(\n        self,\n        size,\n        
interpolation_mode=\"bilinear\",\n    ):\n        if isinstance(size, tuple):\n            if len(size) != 2:\n                raise ValueError(f\"size should be tuple (height, width), instead got {size}\")\n            self.size = size\n        else:\n            self.size = (size, size)\n\n        self.interpolation_mode = interpolation_mode\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor): Video clip to be cropped. Size is (T, C, H, W)\n        Returns:\n            torch.tensor: scale resized / center cropped video clip.\n                size is (T, C, crop_size, crop_size)\n        \"\"\"\n        clip_center_crop = center_crop_using_short_edge(clip)\n        clip_center_crop_resize = resize(\n            clip_center_crop, target_size=self.size, interpolation_mode=self.interpolation_mode\n        )\n        return clip_center_crop_resize\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(size={self.size}, interpolation_mode={self.interpolation_mode})\"\n\n\nclass UCFCenterCropVideo:\n    \"\"\"\n    First scale to the specified size in equal proportion to the short edge,\n    then center cropping\n    \"\"\"\n\n    def __init__(\n        self,\n        size,\n        interpolation_mode=\"bilinear\",\n    ):\n        if isinstance(size, tuple):\n            if len(size) != 2:\n                raise ValueError(f\"size should be tuple (height, width), instead got {size}\")\n            self.size = size\n        else:\n            self.size = (size, size)\n\n        self.interpolation_mode = interpolation_mode\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor): Video clip to be cropped. 
Size is (T, C, H, W)\n        Returns:\n            torch.tensor: scale resized / center cropped video clip.\n                size is (T, C, crop_size, crop_size)\n        \"\"\"\n        clip_resize = resize_scale(clip=clip, target_size=self.size, interpolation_mode=self.interpolation_mode)\n        clip_center_crop = center_crop(clip_resize, self.size)\n        return clip_center_crop\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(size={self.size}, interpolation_mode={self.interpolation_mode})\"\n\n\nclass KineticsRandomCropResizeVideo:\n    \"\"\"\n    Slide along the long edge, with the short edge as crop size, then resize to the desired size.\n    \"\"\"\n\n    def __init__(\n        self,\n        size,\n        interpolation_mode=\"bilinear\",\n    ):\n        if isinstance(size, tuple):\n            if len(size) != 2:\n                raise ValueError(f\"size should be tuple (height, width), instead got {size}\")\n            self.size = size\n        else:\n            self.size = (size, size)\n\n        self.interpolation_mode = interpolation_mode\n\n    def __call__(self, clip):\n        clip_random_crop = random_shift_crop(clip)\n        clip_resize = resize(clip_random_crop, self.size, self.interpolation_mode)\n        return clip_resize\n\n\nclass CenterCropVideo:\n    def __init__(\n        self,\n        size,\n        interpolation_mode=\"bilinear\",\n    ):\n        if isinstance(size, tuple):\n            if len(size) != 2:\n                raise ValueError(f\"size should be tuple (height, width), instead got {size}\")\n            self.size = size\n        else:\n            self.size = (size, size)\n\n        self.interpolation_mode = interpolation_mode\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor): Video clip to be cropped. 
Size is (T, C, H, W)\n        Returns:\n            torch.tensor: center cropped video clip.\n                size is (T, C, crop_size, crop_size)\n        \"\"\"\n        clip_center_crop = center_crop(clip, self.size)\n        return clip_center_crop\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(size={self.size}, interpolation_mode={self.interpolation_mode})\"\n\n\nclass NormalizeVideo:\n    \"\"\"\n    Normalize the video clip by mean subtraction and division by standard deviation\n    Args:\n        mean (3-tuple): pixel RGB mean\n        std (3-tuple): pixel RGB standard deviation\n        inplace (boolean): whether to normalize in place\n    \"\"\"\n\n    def __init__(self, mean, std, inplace=False):\n        self.mean = mean\n        self.std = std\n        self.inplace = inplace\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor): video clip to be normalized. Size is (C, T, H, W)\n        \"\"\"\n        return normalize(clip, self.mean, self.std, self.inplace)\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(mean={self.mean}, std={self.std}, inplace={self.inplace})\"\n\n\nclass ToTensorVideo:\n    \"\"\"\n    Convert tensor data type from uint8 to float, divide value by 255.0 and\n    permute the dimensions of clip tensor\n    \"\"\"\n\n    def __init__(self):\n        pass\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor, dtype=torch.uint8): Size is (T, C, H, W)\n        Return:\n            clip (torch.tensor, dtype=torch.float): Size is (T, C, H, W)\n        \"\"\"\n        return to_tensor(clip)\n\n    def __repr__(self) -> str:\n        return self.__class__.__name__\n\n\nclass RandomHorizontalFlipVideo:\n    \"\"\"\n    Flip the video clip along the horizontal direction with a given probability\n    Args:\n        p (float): probability of the clip being flipped. 
Default value is 0.5\n    \"\"\"\n\n    def __init__(self, p=0.5):\n        self.p = p\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor): Size is (T, C, H, W)\n        Return:\n            clip (torch.tensor): Size is (T, C, H, W)\n        \"\"\"\n        if random.random() < self.p:\n            clip = hflip(clip)\n        return clip\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(p={self.p})\"\n\n\n#  ------------------------------------------------------------\n#  ---------------------  Sampling  ---------------------------\n#  ------------------------------------------------------------\nclass TemporalRandomCrop(object):\n    \"\"\"Temporally crop the given frame indices at a random location.\n\n    Args:\n            size (int): Desired length of frames will be seen in the model.\n    \"\"\"\n\n    def __init__(self, size):\n        self.size = size\n\n    def __call__(self, total_frames):\n        rand_end = max(0, total_frames - self.size - 1)\n        begin_index = random.randint(0, rand_end)\n        end_index = min(begin_index + self.size, total_frames)\n        return begin_index, end_index\n\n\nif __name__ == \"__main__\":\n    import os\n\n    import numpy as np\n    import torchvision.io as io\n    from torchvision import transforms\n    from torchvision.utils import save_image\n\n    vframes, aframes, info = io.read_video(filename=\"./v_Archery_g01_c03.avi\", pts_unit=\"sec\", output_format=\"TCHW\")\n\n    trans = transforms.Compose(\n        [\n            ToTensorVideo(),\n            RandomHorizontalFlipVideo(),\n            UCFCenterCropVideo(512),\n            # NormalizeVideo(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], inplace=True),\n            transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], inplace=True),\n        ]\n    )\n\n    target_video_len = 32\n    frame_interval = 1\n    total_frames = len(vframes)\n    print(total_frames)\n\n    
temporal_sample = TemporalRandomCrop(target_video_len * frame_interval)\n\n    # Sampling video frames\n    start_frame_ind, end_frame_ind = temporal_sample(total_frames)\n    # print(start_frame_ind)\n    # print(end_frame_ind)\n    assert end_frame_ind - start_frame_ind >= target_video_len\n    frame_indice = np.linspace(start_frame_ind, end_frame_ind - 1, target_video_len, dtype=int)\n    print(frame_indice)\n\n    select_vframes = vframes[frame_indice]\n    print(select_vframes.shape)\n    print(select_vframes.dtype)\n\n    select_vframes_trans = trans(select_vframes)\n    print(select_vframes_trans.shape)\n    print(select_vframes_trans.dtype)\n\n    select_vframes_trans_int = ((select_vframes_trans * 0.5 + 0.5) * 255).to(dtype=torch.uint8)\n    print(select_vframes_trans_int.dtype)\n    print(select_vframes_trans_int.permute(0, 2, 3, 1).shape)\n\n    io.write_video(\"./test.avi\", select_vframes_trans_int.permute(0, 2, 3, 1), fps=8)\n\n    for i in range(target_video_len):\n        save_image(\n            select_vframes_trans[i], os.path.join(\"./test000\", \"%04d.png\" % i), normalize=True, value_range=(-1, 1)\n        )\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/cache_functions/__init__.py",
    "content": "from .cache_cutfresh import cache_cutfresh\nfrom .fresh_ratio_scheduler import fresh_ratio_scheduler\nfrom .score_evaluate import score_evaluate\nfrom .global_force_fresh import global_force_fresh\nfrom .update_cache import update_cache\nfrom .force_init import force_init\nfrom .attention import cached_attention_forward\nfrom .cache_init import cache_init"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/cache_functions/attention.py",
    "content": "# Besides, re-arrange the attention module\nfrom torch.jit import Final\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom typing import Optional, Tuple, Union\nfrom xformers.ops.fmha.attn_bias import BlockDiagonalMask\ndef cached_attention_forward(\n    query: torch.Tensor,\n    key: torch.Tensor,\n    value: torch.Tensor,\n    attn_bias: Optional[Union[torch.Tensor, BlockDiagonalMask]] = None,\n    p: float = 0.0,\n    scale: Optional[float] = None\n) -> Tuple[torch.Tensor, torch.Tensor]:\n    # Returns (attention output, attention map averaged over dim 1, i.e. the\n    # dimension that was at index 2 of the inputs before the transpose below).\n    # Fall back to 1/sqrt(head_dim) only when the caller does not pass a scale.\n    if scale is None:\n        scale = 1.0 / query.shape[-1] ** 0.5\n    query = query * scale\n    query = query.transpose(1, 2)\n    key = key.transpose(1, 2)\n    value = value.transpose(1, 2)\n    #attn = query @ key.transpose(-2, -1)\n    attn = torch.matmul(query, key.transpose(-2, -1))\n    if attn_bias is not None:\n        attn_bias = attn_bias.materialize(shape= attn.shape, dtype= attn.dtype, device= attn.device)\n        attn = attn + attn_bias\n    #out_map = attn\n    attn_map = attn.softmax(-1)\n    attn = F.dropout(attn_map, p)\n    attn = torch.matmul(attn, value)\n    #attn = attn @ value\n\n    return attn.transpose(1, 2).contiguous(), attn_map.mean(dim=1)"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/cache_functions/cache_cutfresh.py",
    "content": "from .fresh_ratio_scheduler import fresh_ratio_scheduler\nfrom .score_evaluate import score_evaluate\n#from .token_merge import token_merge\nimport torch\ndef cache_cutfresh(cache_dic, tokens, current):\n    \"\"\"\n    indices: (B, N), the index tensor for the fresh tokens, tell where the 1st, 2nd, 3rd... tokens are\n    fresh_indices: (B, fresh_ratio * N), top fresh_ratio cut for indices\n    fresh_tokens: (B, fresh_ratio * N, D), the fresh tokens\n    \"\"\"\n    tick1 = torch.cuda.Event(enable_timing=True)\n    tick2 = torch.cuda.Event(enable_timing=True)\n    #tick3 = torch.cuda.Event(enable_timing=True)\n    #tick4 = torch.cuda.Event(enable_timing=True)\n\n    step = current['step']\n    layer = current['layer']\n    module = current['module']\n\n    fresh_ratio = fresh_ratio_scheduler(cache_dic, current)\n\n    fresh_ratio = torch.clamp(torch.tensor(fresh_ratio, device = tokens.device), min=0, max=1) # 0.03ms\n    # Generate the index tensor for fresh tokens\n    #tick1.record()\n    score = score_evaluate(cache_dic, tokens, current) # 0.26ms\n    #tick2.record()\n    #score = local_selection_with_space_time_bonus(cache_dic, score, 0.3, 2, time_mean=False)\n    indices = score.argsort(dim=-1, descending=True) # 0.12ms\n    #indices = cache_dic['indices_cache'][current['flag']][current['layer']]\n    topk = int(fresh_ratio * score.shape[1])\n    #topk = int(fresh_ratio * cache_dic['dynamic_size'][2] * cache_dic['dynamic_size'][3]) * cache_dic['dynamic_size'][1]\n    fresh_indices = indices[:, :topk] # indices of the top fresh_ratio fraction of tokens\n    stale_indices = indices[:, topk:] # indices of the remaining (1 - fresh_ratio) fraction of tokens\n    # (B, fresh_ratio *N)\n\n    # stale tokens index + 1 in each ***module***, fresh tokens index = 0\n    cache_dic['cache_index'][current['flag']][layer][module] += 1\n    cache_dic['cache_index'][current['flag']][layer][module].scatter_(dim=1, index=fresh_indices, \n                                                                    src = 
torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))\n    cache_dic['cache_index']['layer_index'][module] += 1\n    cache_dic['cache_index']['layer_index'][module].scatter_(dim=1, index=fresh_indices, \n                                                                    src = torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))\n    # 0.08ms\n    # select the fresh tokens out\n    fresh_indices_expand = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])\n    #stale_indices_expand = stale_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])\n    #if cache_dic['merge_weight'] != 0:\n    #    token_merge(cache_dic, tokens, current, fresh_indices, stale_indices)        \n    \n    if module in ['mlp', 'attn', 'cross-attn']:\n         \n        fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices_expand)\n        # 0.10ms\n        #torch.cuda.synchronize()\n        #print(tick1.elapsed_time(tick2))\n        return fresh_indices, fresh_tokens\n    else:\n        raise ValueError(\"Unrecognized module?\", module)\n    \nimport torch\nfrom einops import rearrange\n\ndef local_selection_with_space_time_bonus(cache_dic, score, bonus_ratio, grid_size=2, time_mean = False):\n    # Read the tensor shape from cache_dic\n    B, T, H, W = cache_dic['dynamic_size']\n    \n    # Reshape score to [B, T, H, W]\n    score = rearrange(score, \"B (T H W) -> B T H W\", T=T, H=H, W=W)\n    \n    # Compute the zero-padding needed so that H and W are both divisible by grid_size\n    pad_h = (grid_size - H % grid_size) % grid_size  # number of zeros to pad along H\n    pad_w = (grid_size - W % grid_size) % grid_size  # number of zeros to pad along W\n    \n    # Zero-pad the H and W dimensions\n    if pad_h > 0 or pad_w > 0:\n        score = torch.nn.functional.pad(score, (0, pad_w, 0, pad_h))  # (pad W on the right by pad_w, H at the bottom by pad_h)\n\n    # H and W after zero-padding\n    H_padded, W_padded = score.shape[2], score.shape[3]\n    \n    # Step 1: normalize over the H*W dimension so that every time step carries equal weight\n    score = score.view(B, T, -1)  # merge H and W into one dimension: 
[B, T, H*W]\n    score = torch.nn.functional.softmax(score, dim=-1)  # normalize over the H*W dimension\n    score = score.view(B, T, H_padded, W_padded)  # restore the [B, T, H_padded, W_padded] shape\n\n    # Step 2: split each spatial slice (i.e. each of the T time steps) into blocks\n    block_size = grid_size * grid_size\n    assert (H_padded * W_padded) % block_size == 0, f\"H_padded * W_padded must be divisible by the block size, shape: {B},{T},{H_padded},{W_padded}; block:{grid_size}*{grid_size};\" \n\n    # Reshape score so that it is grouped by block\n    score_reshaped = score.view(B, T, H_padded // grid_size, grid_size, W_padded // grid_size, grid_size)\n    score_reshaped = score_reshaped.permute(0, 1, 2, 4, 3, 5).contiguous()  # [B, T, H//grid_size, W//grid_size, grid_size, grid_size]\n    score_reshaped = score_reshaped.view(B, T, -1, block_size)  # [B, T, num_blocks, block_size]\n\n    # Step 3: find the maximum score in each block\n    max_scores, max_indices = score_reshaped.max(dim=-1, keepdim=True)  # [B, T, num_blocks, 1]\n    \n    # Step 4: build a mask marking the token with the maximum score\n    mask = torch.zeros_like(score_reshaped)\n    mask.scatter_(-1, max_indices, 1)  # set the mask to 1 at the index of the maximum score\n    \n    # Step 5: apply the bonus only to the token with the maximum score\n    score_reshaped = score_reshaped + (mask * max_scores * bonus_ratio)  # bonus applied to the maximum score only\n    \n    # Step 6: restore score to its original shape\n    score_modified = score_reshaped.view(B, T, H_padded // grid_size, W_padded // grid_size, grid_size, grid_size)\n    score_modified = score_modified.permute(0, 1, 2, 4, 3, 5).contiguous()\n    score_modified = score_modified.view(B, T, H_padded, W_padded)\n\n    # Step 7: remove the zero padding\n    if pad_h > 0 or pad_w > 0:\n        score_modified = score_modified[:, :, :H, :W]  # drop the padded zeros\n\n    if time_mean:\n        score_modified = score_modified.mean(dim = 1)\n        score_modified = score_modified.unsqueeze(1).expand(B, T, H, W)\n    # Finally, reshape score back to its original shape [B, (T H W)]\n    score_modified = rearrange(score_modified, \"B T H W -> B (T H W)\")\n    \n    return score_modified\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/cache_functions/cache_init.py",
    "content": "def cache_init(model_kwargs, num_steps):   \n    cache_dic = {}\n    cache = {}\n    indices_cache = {}\n    cache_index = {}\n    cache[-1]={}\n    cache[0]={}\n    indices_cache[-1]={}\n    indices_cache[0]={}\n    cache_index[-1]={}\n    cache_index[0]={}\n    cache_index['layer_index']={}\n    cache_dic['attn_map'] = {}\n    cache_dic['attn_map'][-1] = {}\n    cache_dic['attn_map'][0] = {}\n    cache_dic['cross_attn_map'] = {}\n    cache_dic['cross_attn_map'][-1] = {}\n    cache_dic['cross_attn_map'][0] = {}\n\n    for j in range(28):\n        cache[-1][j] = {}\n        indices_cache[-1] = {}\n        cache_index[-1][j] = {}\n        cache_dic['attn_map'][-1][j] = {}\n        cache_dic['cross_attn_map'][-1][j] = {}\n\n        cache[0][j] = {}\n        indices_cache[0] = {}\n        cache_index[0][j] = {}\n        cache_dic['attn_map'][0][j] = {}\n        cache_dic['cross_attn_map'][0][j] = {}\n\n    cache_dic['cache_type'] = model_kwargs['cache_type']\n    cache_dic['cache_index'] = cache_index\n    cache_dic['cache'] = cache\n    cache_dic['indices_cache'] = indices_cache\n    cache_dic['fresh_ratio_schedule'] = model_kwargs['ratio_scheduler']\n    cache_dic['fresh_ratio'] = model_kwargs['fresh_ratio']\n    cache_dic['fresh_threshold'] = model_kwargs['fresh_threshold']\n    cache_dic['force_fresh'] = model_kwargs['force_fresh']\n    cache_dic['soft_fresh_weight'] = model_kwargs['soft_fresh_weight']\n    #cache_dic['extra_flops'] = 0.0\n    #cache_dic['merge_weight'] = merge_weight\n    current = {}\n    current['num_steps'] = num_steps\n    return cache_dic, current\n    "
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/cache_functions/force_init.py",
    "content": "import torch\nfrom .force_scheduler import force_scheduler\ndef force_init(cache_dic, current, tokens):\n    cache_dic['cache_index'][current['flag']][current['layer']][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)\n    force_scheduler(cache_dic, current)\n    if current['layer'] == 0:\n        cache_dic['cache_index']['layer_index'][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/cache_functions/force_scheduler.py",
    "content": "import torch\ndef force_scheduler(cache_dic, current):\n    thresholds = {}\n    if cache_dic['fresh_ratio'] == 0:\n        # FORA\n        linear_step_weight = 0.0\n    else: \n        # TokenCache\n        linear_step_weight = 0.0 #N=6 0.2 #N=4 0.3\n    step_factor = torch.tensor(1 - linear_step_weight + 2 * linear_step_weight * current['step'] / current['num_steps'])\n    threshold = torch.round(cache_dic['fresh_threshold'] / step_factor)\n    #threshold = torch.round(4 / step_factor)\n    key_point = 2\n    if current['step'] in range(0,key_point):\n        threshold = 1\n    #thresholds = {\n    #    'spat-attn' : 3,\n    #    'temp-attn' : 3,\n    #   'cross-attn' : 6,\n    #          'mlp' : 3   }\n    thresholds = {\n        'spat-attn' : 1,\n        'temp-attn' : 1,\n       'cross-attn' : 1,\n              'mlp' : 1   }\n    #if current['step'] in range(150,175):\n    #    threshold = 4\n    #elif current['step'] in list(range(0,25)) + list(range(75,100)) + list(range(175,200)) + list(range(225,250)):\n    #    threshold = 3\n    #elif current['step'] in list(range(100,125)) + list(range(150,175)) + list(range(200,225)):\n    #    threshold = 4\n    #elif current['step'] in range(100,175):\n    #    threshold = 5\n    #elif current['step'] in range(200,225):\n    #    threshold = 5\n    #step_weight = 0.25\n    #if current['step'] >= 0.5 * (1 - step_weight) * current['num_steps']:\n    #    threshold =  int(cache_dic['fresh_threshold'] * (1 + step_weight))\n    #elif current['step'] <= 0.5 * (1 - step_weight) * current['num_steps']:\n    #    threshold = int(cache_dic['fresh_threshold'] * (1 - step_weight))\n    cache_dic['cal_threshold'] = thresholds\n    #return threshold"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/cache_functions/fresh_ratio_scheduler.py",
    "content": "import torch\ndef fresh_ratio_scheduler(cache_dic, current):\n    '''\n    Return the fresh ratio for the current step.\n    '''\n    fresh_ratio = cache_dic['fresh_ratio']\n    fresh_ratio_schedule = cache_dic['fresh_ratio_schedule']\n    step = current['step']\n    num_steps = current['num_steps']\n    threshold = cache_dic['fresh_threshold']\n    weight = 0.9\n    if fresh_ratio_schedule == 'constant':\n        return fresh_ratio\n    elif fresh_ratio_schedule == 'linear':\n        return fresh_ratio * (1 + weight - 2 * weight * step / num_steps)\n    elif fresh_ratio_schedule == 'exp':\n        #return 0.5 * (0.052 ** (step/num_steps))\n        return fresh_ratio * (weight ** (step / num_steps))\n    elif fresh_ratio_schedule == 'linear-mode':\n        mode = (step % threshold)/threshold - 0.5\n        mode_weight = 0.1\n        return fresh_ratio * (1 + weight - 2 * weight * step / num_steps + mode_weight * mode)\n    elif fresh_ratio_schedule == 'layerwise':\n        return fresh_ratio * (1 + weight - 2 * weight * current['layer'] / 27)\n    elif fresh_ratio_schedule == 'linear-layerwise':\n        step_weight = 0.0 #0.9\n        step_factor = 1 + step_weight - 2 * step_weight * step / num_steps\n\n        layer_weight = 0.0\n        layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27\n\n        module_weight = 1.5\n        module_time_weight = 0.33\n        module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='cross-attn' else (1 + module_time_weight * module_weight)\n        \n        type_weight = 0.0\n        type_factor = 1 + type_weight if current['flag'] == -1 else 1 - type_weight\n\n        return fresh_ratio * layer_factor * step_factor * module_factor * type_factor\n\n        #saved_weight = 0.25\n        ##earliest 50%\n        #if current['step'] % cache_dic['cal_threshold'] >=  (1- saved_weight) * cache_dic['cal_threshold']:\n        #    return fresh_ratio * layer_factor * 
step_factor / saved_weight\n        ## latest 50%\n        ##if current['step'] % cache_dic['cal_threshold'] <=  (saved_weight) * cache_dic['cal_threshold']:\n        ##    return fresh_ratio * layer_factor * step_factor / saved_weight\n#\n        #else :\n        #    return 0\n\n    else:\n        raise ValueError(\"unrecognized fresh ratio schedule\", fresh_ratio_schedule)\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/cache_functions/global_force_fresh.py",
    "content": "from .force_scheduler import force_scheduler\ndef global_force_fresh(cache_dic, current):\n    '''\n    Return whether to force fresh tokens globally.\n    '''\n    is_force_fresh = {}\n    fresh_thresholds = {}\n    first_step = (current['step'] == 0)\n    first_3steps = (current['step'] <= 2)\n    last_step = current['step'] == current['num_steps'] - 1\n    force_fresh = cache_dic['force_fresh']\n    if not first_step:\n        fresh_thresholds['spat-attn']  = cache_dic['cal_threshold']['spat-attn']\n        fresh_thresholds['temp-attn']  = cache_dic['cal_threshold']['temp-attn']\n        fresh_thresholds['cross-attn'] = cache_dic['cal_threshold']['cross-attn']\n        fresh_thresholds['mlp']        = cache_dic['cal_threshold']['mlp']\n    else:\n        fresh_thresholds['spat-attn']  = cache_dic['fresh_threshold']\n        fresh_thresholds['temp-attn']  = cache_dic['fresh_threshold']\n        fresh_thresholds['cross-attn'] = cache_dic['fresh_threshold']\n        fresh_thresholds['mlp']        = cache_dic['fresh_threshold']\n\n    if force_fresh == 'global':\n        if current['flag'] == -1:\n            is_force_fresh['attn'] =   (first_3steps or (current['step']% fresh_thresholds['temp-attn'] == 0))\n        else:\n            is_force_fresh['attn'] =   (first_3steps or (current['step']% fresh_thresholds['spat-attn'] == 0))\n\n        is_force_fresh['cross-attn'] = (first_3steps or (current['step']% fresh_thresholds['cross-attn'] == 0))\n        is_force_fresh['mlp'] =        (first_3steps or (current['step']% fresh_thresholds['mlp'] == 0))\n\n        return is_force_fresh\n    elif force_fresh == 'local':\n        return first_step\n    elif force_fresh == 'none':\n        return first_step\n    else:\n        raise ValueError(\"unrecognized force fresh strategy\", force_fresh)"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/cache_functions/score_evaluate.py",
    "content": "import torch\nimport torch.nn as nn\nfrom .scores import attn_score, similarity_score, norm_score\ndef score_evaluate(cache_dic, tokens, current) -> torch.Tensor:\n    '''\n    Return the score tensor (B, N) for the given tokens.\n    '''\n    #这里用match case 来做可读性更好，但是考虑到match case是3.10版本才有的,而且其加速性能未验证，先用if else\n    #fresh_ratio = cache_dic['fresh_ratio']\n    #cache_index = cache_dic['cache_index']\n    #start = torch.cuda.Event(enable_timing=True)\n    #end = torch.cuda.Event(enable_timing=True)\n    #start.record()\n    if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')):\n        # 0.4ms extra on 4090\n        # 从cache_index中找出达到cache_step达到fresh_threshold的tokens\n        force_fresh_mask = torch.as_tensor((cache_dic['cache_index'][current['flag']][current['layer']][current['module']] >= 2 * cache_dic['fresh_threshold']), dtype = int) # 2 because the threshold is for step, not module\n        force_len = force_fresh_mask.sum(dim=1)\n        force_indices = force_fresh_mask.argsort(dim = -1, descending = True)[:, :force_len.min()]\n        #在维度-1随机重排\n        force_indices = force_indices[:, torch.randperm(force_indices.shape[1])]\n\n    if cache_dic['cache_type'] == 'random':\n        score = torch.rand(int(tokens.shape[0]*0.5), tokens.shape[1], device=tokens.device)\n        score = torch.cat([score, score], dim=0).to(tokens.device)\n\n    elif cache_dic['cache_type'] == 'straight':\n        score = torch.ones(tokens.shape[0], tokens.shape[1]).to(tokens.device)\n    \n    elif cache_dic['cache_type'] == 'attention':\n        # cache_dic['attn_map'][step][layer] (B, N, N), the last dimention has get softmaxed\n        score = attn_score(cache_dic, current)\n        #score = score + 0.0 * torch.rand_like(score, device= score.device)\n    \n    elif cache_dic['cache_type'] == 'similarity':\n        score = similarity_score(cache_dic, current, tokens)\n\n    elif cache_dic['cache_type'] == 'norm':\n        score = 
norm_score(cache_dic, current, tokens)\n\n    elif cache_dic['cache_type'] == 'compress':\n        score1 = torch.rand(int(tokens.shape[0]*0.5), tokens.shape[1])\n        score1 = torch.cat([score1, score1], dim=0).to(tokens.device)\n        score2 = cache_dic['attn_map'][current['flag']][current['layer']].sum(dim=1)#.mean(dim=0) # (B, N)\n        # normalize\n        score2 = score2 / score2.max(dim=1, keepdim=True)[0]\n        score = 0.5 * score1 + 0.5 * score2\n    #end.record()\n    #torch.cuda.synchronize()\n    #print(f\"Time for score evaluation: {start.elapsed_time(end)} ms\")\n    if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')): # current['is_force_fresh'] is False, cause when it is True, no cut and fresh are needed\n            #print(torch.ones_like(force_indices, dtype=float, device=force_indices.device).dtype)\n        score.scatter_(dim=1, index=force_indices, src=torch.ones_like(force_indices, dtype=torch.float32, \n                                                                           device=force_indices.device))\n    \n    if (True and (cache_dic['force_fresh'] == 'global')):\n        soft_step_score = cache_dic['cache_index'][current['flag']][current['layer']][current['module']].float() / (cache_dic['fresh_threshold'])\n        soft_layer_score = cache_dic['cache_index']['layer_index'][current['module']].float() / (27)\n        score = score + cache_dic['soft_fresh_weight'] * soft_step_score #+ 0.1 *soft_layer_score\n    \n    return score.to(tokens.device)"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/cache_functions/scores.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\ndef attn_score(cache_dic, current):\n    #self_attn_score = 1- cache_dic['attn_map'][current['flag']][current['layer']].diagonal(dim1=1, dim2=2)\n    #self_attn_score = F.normalize(self_attn_score, dim=1, p=2)\n    #attention_score = F.normalize(cache_dic['attn_map'][current['flag']][current['layer']].sum(dim=1), dim=1, p=2)\n    #cross_attn_map = F.threshold(cache_dic['cross_attn_map'][current['flag']][current['layer']],threshold=0.0, value=0.0)\n    #cross_attention_score = F.normalize(cross_attn_map.sum(dim=-1), dim=-1, p=2)\n    cond_cmap, uncond_cmap = torch.split(cache_dic['cross_attn_map'][current['flag']][current['layer']], len(cache_dic['cross_attn_map'][current['flag']][current['layer']]) // 2, dim=0)\n    cond_weight = 0.5\n    cmap = cond_weight * cond_cmap + (1 - cond_weight) * uncond_cmap\n    cross_attention_entropy = -torch.sum(cmap * torch.log(cmap + 1e-7), dim=-1)\n    cross_attention_score   = F.normalize(1 + cross_attention_entropy, dim=1, p=2)\n    #score = self_attn_score\n    #score = attention_score\n    score = cross_attention_score.repeat(2, 1)\n    #cross_weight = 0.0\n    #score =  (1-cross_weight) * attention_score + cross_weight * cross_attention_score\n    return score\n\ndef similarity_score(cache_dic, current, tokens):\n    cosine_sim = F.cosine_similarity(tokens, cache_dic['cache'][current['flag']][current['layer']][current['module']], dim=-1)\n\n    return F.normalize(1- cosine_sim, dim=-1, p=2)\n\ndef norm_score(cache_dic, current, tokens):\n    norm = tokens.norm(dim=-1, p=2)\n    return F.normalize(norm, dim=-1, p=2)\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/cache_functions/token_merge.py",
    "content": "import torch\ndef token_merge(cache_dic, tokens, current, fresh_indices, stale_indices):\n        #fresh_tokens = torch.zeros_like(tokens).scatter_(dim=1, index=fresh_indices_expand, src=tokens.gather(dim=1, index=fresh_indices_expand))\n        #stale_tokens = torch.zeros_like(tokens).scatter_(dim=1, index=stale_indices_expand, src=tokens.gather(dim=1, index=stale_indices_expand))\n        #fresh_tokens = torch.nn.functional.normalize(fresh_tokens, p=2, dim=-1)\n        #stale_tokens = torch.nn.functional.normalize(stale_tokens, p=2, dim=-1)\n        #stale_fresh_similarity = stale_tokens @ fresh_tokens.transpose(1, 2)\n        #fresh_indices_expand = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])\n        if (current['layer'] % 1 == 0):\n            fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))\n            stale_tokens = torch.gather(input = tokens, dim = 1, index = stale_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))\n            method = 'similarity'\n            if method == 'distance':\n                descending = False\n                distance = torch.cdist(stale_tokens, fresh_tokens, p=1)\n                stale_fresh_dist, stale_fresh_indices_allstale = torch.min(distance, dim=2)\n\n            elif method == 'similarity':\n                descending = True\n                fresh_tokens = torch.nn.functional.normalize(fresh_tokens, p=2, dim=-1)\n                stale_tokens = torch.nn.functional.normalize(stale_tokens, p=2, dim=-1)\n                similarity = stale_tokens @ fresh_tokens.transpose(1, 2)\n                stale_fresh_dist, stale_fresh_indices_allstale = torch.max(similarity, dim=2)\n            \n            # 在dim =  1 上再次排序，保留 saved_topk_stale 个最小的\n            # 函数方案\n            #layer_weight = 1.0\n            #layer_factor = 1 - layer_weight + 2 * layer_weight * current['layer'] / 27\n            #layer_factor = 2 * 
torch.sigmoid(torch.tensor([1.0 * (current['layer'] - 13.5 )]))\n            #saved_topk_stale = int(cache_dic['merge_weight'] * stale_tokens.shape[1] * layer_factor)\n            # 阈值自适应方案\n            saved_topk_stale = int((stale_fresh_dist > 0.995).sum(dim=1).min())\n            merged_stale_sequence = torch.sort(stale_fresh_dist, dim=1, descending=descending)[1][:,:saved_topk_stale]\n            stale_fresh_indices = stale_fresh_indices_allstale.gather(1, merged_stale_sequence)\n            merged_stale_sequence = stale_indices.gather(1, merged_stale_sequence)\n            merged_stale_fresh_indices = fresh_indices.gather(1, stale_fresh_indices)\n\n            cache_dic['merged_stale_fresh_indices'] = merged_stale_fresh_indices # 距离从小到大的stale tokens 与其对应fresh tokens的index\n            cache_dic['merged_stale_sequence'] = merged_stale_sequence # 距离从小到大的stale tokens 的index\n            #print(torch.all(merged_stale_fresh_indices == merged_stale_sequence)) \n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/cache_functions/update_cache.py",
    "content": "import torch\ndef update_cache(fresh_indices, fresh_tokens, cache_dic, current, fresh_attn_map=None):\n    \"\"\"\n    Update the cache with fresh tokens based on the given index.\n    \n    Args:\n    indices (torch.Tensor): The index tensor for tokens. 从权重高到底的index\n    fresh_tokens (torch.Tensor): The fresh tokens to update the cache with.\n    cach_dic (dict): The cache dictionary containing cache data and indices.\n    current (dict): Dictionary containing the current step, layer, and module information.\n    fresh_attn_map (torch.Tensor): The attention map for the fresh tokens. attn模块里已经排好序了,直接盖上去就行\n    \"\"\"\n    step = current['step']\n    layer = current['layer']\n    module = current['module']\n    # Update the cached tokens at the positions\n    if module == 'attn':\n        indices = fresh_indices#.sort(dim=1, descending=False)[0]\n        cache_dic['attn_map'][current['flag']][layer].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_attn_map.shape[-1]), src=fresh_attn_map)\n    elif module == 'cross-attn':\n        indices = fresh_indices#.sort(dim=1, descending=False)[0]\n        cache_dic['cross_attn_map'][current['flag']][layer].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_attn_map.shape[-1]), src=fresh_attn_map)\n    elif module == 'mlp':\n        indices = fresh_indices\n\n    #if (indices.shape[1] != 0):\n    #    to_be_updated_fresh_tokens = torch.gather(input = cache_dic['cache'][current['flag']][layer][module], dim = 1, index = indices.unsqueeze(-1).expand(-1, -1, fresh_tokens.shape[-1]))\n    #    residual_token = (fresh_tokens - to_be_updated_fresh_tokens).mean(dim=1)\n    #    cache_dic['cache'][current['flag']][layer][module] = cache_dic['cache'][current['flag']][layer][module] + 0.0 * residual_token.unsqueeze(1)\n    \n    cache_dic['cache'][current['flag']][layer][module].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_tokens.shape[-1]), src=fresh_tokens)\n\n\n        
\n        "
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/dit/__init__.py",
    "content": "from .dit import DiT, DiT_XL_2, DiT_XL_2x2\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/dit/dit.py",
    "content": "# Modified from Meta DiT\n\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# DiT:   https://github.com/facebookresearch/DiT/tree/main\n# GLIDE: https://github.com/openai/glide-text2im\n# MAE:   https://github.com/facebookresearch/mae/blob/main/models_mae.py\n# --------------------------------------------------------\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.utils.checkpoint\nfrom einops import rearrange\nfrom timm.models.vision_transformer import Mlp\n\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.models.layers.blocks import (\n    Attention,\n    CaptionEmbedder,\n    FinalLayer,\n    LabelEmbedder,\n    PatchEmbed3D,\n    TimestepEmbedder,\n    approx_gelu,\n    get_1d_sincos_pos_embed,\n    get_2d_sincos_pos_embed,\n    get_layernorm,\n    modulate,\n)\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\n\nclass DiTBlock(nn.Module):\n    \"\"\"\n    A DiT block with adaptive layer norm zero (adaLN-Zero) conditioning.\n    \"\"\"\n\n    def __init__(\n        self,\n        hidden_size,\n        num_heads,\n        mlp_ratio=4.0,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n    ):\n        super().__init__()\n        self.hidden_size = hidden_size\n        self.num_heads = num_heads\n        self.enable_flash_attn = enable_flash_attn\n        mlp_hidden_dim = int(hidden_size * mlp_ratio)\n\n        self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.attn = Attention(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            enable_flash_attn=enable_flash_attn,\n        )\n        self.norm2 = get_layernorm(hidden_size, eps=1e-6, affine=False, 
use_kernel=enable_layernorm_kernel)\n        self.mlp = Mlp(in_features=hidden_size, hidden_features=mlp_hidden_dim, act_layer=approx_gelu, drop=0)\n        self.adaLN_modulation = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 6 * hidden_size, bias=True))\n\n    def forward(self, x, c):\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = self.adaLN_modulation(c).chunk(6, dim=1)\n        x = x + gate_msa.unsqueeze(1) * self.attn(modulate(self.norm1, x, shift_msa, scale_msa))\n        x = x + gate_mlp.unsqueeze(1) * self.mlp(modulate(self.norm2, x, shift_mlp, scale_mlp))\n        return x\n\n\n@MODELS.register_module()\nclass DiT(nn.Module):\n    \"\"\"\n    Diffusion model with a Transformer backbone.\n    \"\"\"\n\n    def __init__(\n        self,\n        input_size=(16, 32, 32),\n        in_channels=4,\n        patch_size=(1, 2, 2),\n        hidden_size=1152,\n        depth=28,\n        num_heads=16,\n        mlp_ratio=4.0,\n        class_dropout_prob=0.1,\n        learn_sigma=True,\n        condition=\"text\",\n        no_temporal_pos_emb=False,\n        caption_channels=512,\n        model_max_length=77,\n        dtype=torch.float32,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n    ):\n        super().__init__()\n        self.learn_sigma = learn_sigma\n        self.in_channels = in_channels\n        self.out_channels = in_channels * 2 if learn_sigma else in_channels\n        self.hidden_size = hidden_size\n        self.patch_size = patch_size\n        self.input_size = input_size\n        num_patches = np.prod([input_size[i] // patch_size[i] for i in range(3)])\n        self.num_patches = num_patches\n        self.num_temporal = input_size[0] // patch_size[0]\n        self.num_spatial = num_patches // self.num_temporal\n        self.num_heads = num_heads\n        self.dtype = dtype\n        self.use_text_encoder = not condition.startswith(\"label\")\n        if 
enable_flash_attn:\n            assert dtype in [\n                torch.float16,\n                torch.bfloat16,\n            ], f\"Flash attention only supports float16 and bfloat16, but got {self.dtype}\"\n        self.no_temporal_pos_emb = no_temporal_pos_emb\n        self.mlp_ratio = mlp_ratio\n        self.depth = depth\n        assert enable_sequence_parallelism is False, \"Sequence parallelism is not supported in DiT\"\n\n        self.register_buffer(\"pos_embed_spatial\", self.get_spatial_pos_embed())\n        self.register_buffer(\"pos_embed_temporal\", self.get_temporal_pos_embed())\n\n        self.x_embedder = PatchEmbed3D(patch_size, in_channels, embed_dim=hidden_size)\n        if not self.use_text_encoder:\n            num_classes = int(condition.split(\"_\")[-1])\n            self.y_embedder = LabelEmbedder(num_classes, hidden_size, class_dropout_prob)\n        else:\n            self.y_embedder = CaptionEmbedder(\n                in_channels=caption_channels,\n                hidden_size=hidden_size,\n                uncond_prob=class_dropout_prob,\n                act_layer=approx_gelu,\n                token_num=1,  # pooled token\n            )\n        self.t_embedder = TimestepEmbedder(hidden_size)\n        self.blocks = nn.ModuleList(\n            [\n                DiTBlock(\n                    hidden_size,\n                    num_heads,\n                    mlp_ratio=mlp_ratio,\n                    enable_flash_attn=enable_flash_attn,\n                    enable_layernorm_kernel=enable_layernorm_kernel,\n                )\n                for _ in range(depth)\n            ]\n        )\n        self.final_layer = FinalLayer(hidden_size, np.prod(self.patch_size), self.out_channels)\n\n        self.initialize_weights()\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_layernorm_kernel = enable_layernorm_kernel\n\n    def get_spatial_pos_embed(self):\n        pos_embed = get_2d_sincos_pos_embed(\n            
self.hidden_size,\n            self.input_size[1] // self.patch_size[1],\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def get_temporal_pos_embed(self):\n        pos_embed = get_1d_sincos_pos_embed(\n            self.hidden_size,\n            self.input_size[0] // self.patch_size[0],\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def unpatchify(self, x):\n        c = self.out_channels\n        t, h, w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        pt, ph, pw = self.patch_size\n\n        x = x.reshape(shape=(x.shape[0], t, h, w, pt, ph, pw, c))\n        x = rearrange(x, \"n t h w r p q c -> n c t r h p w q\")\n        imgs = x.reshape(shape=(x.shape[0], c, t * pt, h * ph, w * pw))\n        return imgs\n\n    def forward(self, x, t, y):\n        \"\"\"\n        Forward pass of DiT.\n        x: (B, C, T, H, W) tensor of inputs\n        t: (B,) tensor of diffusion timesteps\n        y: list of text\n        \"\"\"\n        # origin inputs should be float32, cast to specified dtype\n        x = x.to(self.dtype)\n        if self.use_text_encoder:\n            y = y.to(self.dtype)\n\n        # embedding\n        x = self.x_embedder(x)  # (B, N, D)\n        x = rearrange(x, \"b (t s) d -> b t s d\", t=self.num_temporal, s=self.num_spatial)\n        x = x + self.pos_embed_spatial\n        if not self.no_temporal_pos_emb:\n            x = rearrange(x, \"b t s d -> b s t d\")\n            x = x + self.pos_embed_temporal\n            x = rearrange(x, \"b s t d -> b (t s) d\")\n        else:\n            x = rearrange(x, \"b t s d -> b (t s) d\")\n\n        t = self.t_embedder(t, dtype=x.dtype)  # (N, D)\n        y = self.y_embedder(y, self.training)  # (N, D)\n        if self.use_text_encoder:\n            y = y.squeeze(1).squeeze(1)\n        condition = t + y\n\n        # 
blocks\n        for _, block in enumerate(self.blocks):\n            c = condition\n            x = auto_grad_checkpoint(block, x, c)  # (B, N, D)\n\n        # final process\n        x = self.final_layer(x, condition)  # (B, N, num_patches * out_channels)\n        x = self.unpatchify(x)  # (B, out_channels, T, H, W)\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                if module.weight.requires_grad:\n                    torch.nn.init.xavier_uniform_(module.weight)\n                    if module.bias is not None:\n                        nn.init.constant_(module.bias, 0)\n\n        self.apply(_basic_init)\n\n        # Initialize patch_embed like nn.Linear (instead of nn.Conv2d):\n        w = self.x_embedder.proj.weight.data\n        nn.init.xavier_uniform_(w.view([w.shape[0], -1]))\n        nn.init.constant_(self.x_embedder.proj.bias, 0)\n\n        # Initialize timestep embedding MLP:\n        nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)\n\n        # Zero-out adaLN modulation layers in DiT blocks:\n        for block in self.blocks:\n            nn.init.constant_(block.adaLN_modulation[-1].weight, 0)\n            nn.init.constant_(block.adaLN_modulation[-1].bias, 0)\n\n        # Zero-out output layers:\n        nn.init.constant_(self.final_layer.adaLN_modulation[-1].weight, 0)\n        nn.init.constant_(self.final_layer.adaLN_modulation[-1].bias, 0)\n        nn.init.constant_(self.final_layer.linear.weight, 0)\n        nn.init.constant_(self.final_layer.linear.bias, 0)\n\n        # Initialize text embedding layers:\n        if self.use_text_encoder:\n            nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)\n            
nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)\n\n\n@MODELS.register_module(\"DiT-XL/2\")\ndef DiT_XL_2(from_pretrained=None, **kwargs):\n    model = DiT(\n        depth=28,\n        hidden_size=1152,\n        patch_size=(1, 2, 2),\n        num_heads=16,\n        **kwargs,\n    )\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n\n\n@MODELS.register_module(\"DiT-XL/2x2\")\ndef DiT_XL_2x2(from_pretrained=None, **kwargs):\n    model = DiT(\n        depth=28,\n        hidden_size=1152,\n        patch_size=(2, 2, 2),\n        num_heads=16,\n        **kwargs,\n    )\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/latte/__init__.py",
    "content": "from .latte import Latte, Latte_XL_2, Latte_XL_2x2\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/latte/latte.py",
    "content": "# Copyright 2024 Vchitect/Latte\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n#\n# Modified from Latte\n#\n# This file is modified from https://github.com/Vchitect/Latte/blob/main/models/latte.py\n#\n# With references to:\n# Latte:  https://github.com/Vchitect/Latte\n# DiT:    https://github.com/facebookresearch/DiT/tree/main\n\n\nimport torch\nfrom einops import rearrange, repeat\n\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.models.dit import DiT\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\n\n@MODELS.register_module()\nclass Latte(DiT):\n    def forward(self, x, t, y):\n        \"\"\"\n        Forward pass of Latte.\n        x: (B, C, T, H, W) tensor of inputs\n        t: (B,) tensor of diffusion timesteps\n        y: list of text\n        \"\"\"\n        # origin inputs should be float32, cast to specified dtype\n        x = x.to(self.dtype)\n\n        # embedding\n        x = self.x_embedder(x)  # (B, N, D)\n        x = rearrange(x, \"b (t s) d -> b t s d\", t=self.num_temporal, s=self.num_spatial)\n        x = x + self.pos_embed_spatial\n        x = rearrange(x, \"b t s d -> b (t s) d\")\n\n        t = self.t_embedder(t, dtype=x.dtype)  # (N, D)\n        y = self.y_embedder(y, self.training)  # (N, D)\n        if self.use_text_encoder:\n            y = y.squeeze(1).squeeze(1)\n        condition = t + y\n        condition_spatial = repeat(condition, \"b d -> 
(b t) d\", t=self.num_temporal)\n        condition_temporal = repeat(condition, \"b d -> (b s) d\", s=self.num_spatial)\n\n        # blocks\n        for i, block in enumerate(self.blocks):\n            if i % 2 == 0:\n                # spatial\n                x = rearrange(x, \"b (t s) d -> (b t) s d\", t=self.num_temporal, s=self.num_spatial)\n                c = condition_spatial\n            else:\n                # temporal\n                x = rearrange(x, \"b (t s) d -> (b s) t d\", t=self.num_temporal, s=self.num_spatial)\n                c = condition_temporal\n                if i == 1:\n                    x = x + self.pos_embed_temporal\n\n            x = auto_grad_checkpoint(block, x, c)  # (B, N, D)\n\n            if i % 2 == 0:\n                x = rearrange(x, \"(b t) s d -> b (t s) d\", t=self.num_temporal, s=self.num_spatial)\n            else:\n                x = rearrange(x, \"(b s) t d -> b (t s) d\", t=self.num_temporal, s=self.num_spatial)\n\n        # final process\n        x = self.final_layer(x, condition)  # (B, N, num_patches * out_channels)\n        x = self.unpatchify(x)  # (B, out_channels, T, H, W)\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n\n@MODELS.register_module(\"Latte-XL/2\")\ndef Latte_XL_2(from_pretrained=None, **kwargs):\n    model = Latte(\n        depth=28,\n        hidden_size=1152,\n        patch_size=(1, 2, 2),\n        num_heads=16,\n        **kwargs,\n    )\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n\n\n@MODELS.register_module(\"Latte-XL/2x2\")\ndef Latte_XL_2x2(from_pretrained=None, **kwargs):\n    model = Latte(\n        depth=28,\n        hidden_size=1152,\n        patch_size=(2, 2, 2),\n        num_heads=16,\n        **kwargs,\n    )\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/layers/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/layers/blocks.py",
    "content": "# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# PixArt: https://github.com/PixArt-alpha/PixArt-alpha\n# Latte:  https://github.com/Vchitect/Latte\n# DiT:    https://github.com/facebookresearch/DiT/tree/main\n# GLIDE:  https://github.com/openai/glide-text2im\n# MAE:    https://github.com/facebookresearch/mae/blob/main/models_mae.py\n# --------------------------------------------------------\n\nimport functools\nimport math\nfrom typing import Optional\n\nimport numpy as np\nimport torch\nimport torch.distributed as dist\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.utils.checkpoint\nimport xformers.ops\nfrom einops import rearrange\nfrom timm.models.vision_transformer import Mlp\n\nfrom opensora.acceleration.communications import all_to_all, split_forward_gather_backward\nfrom opensora.acceleration.parallel_states import get_sequence_parallel_group\n\nfrom ..cache_functions.attention import cached_attention_forward\n\napprox_gelu = lambda: nn.GELU(approximate=\"tanh\")\n\n\nclass LlamaRMSNorm(nn.Module):\n    def __init__(self, hidden_size, eps=1e-6):\n        \"\"\"\n        LlamaRMSNorm is equivalent to T5LayerNorm\n        \"\"\"\n        super().__init__()\n        self.weight = nn.Parameter(torch.ones(hidden_size))\n        self.variance_epsilon = eps\n\n    def forward(self, hidden_states):\n        input_dtype = hidden_states.dtype\n        hidden_states = hidden_states.to(torch.float32)\n        variance = hidden_states.pow(2).mean(-1, keepdim=True)\n        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)\n        return self.weight * hidden_states.to(input_dtype)\n\n\ndef get_layernorm(hidden_size: torch.Tensor, eps: float, affine: bool, use_kernel: bool):\n    if use_kernel:\n        try:\n            from apex.normalization import 
FusedLayerNorm\n\n            return FusedLayerNorm(hidden_size, elementwise_affine=affine, eps=eps)\n        except ImportError:\n            raise RuntimeError(\"FusedLayerNorm not available. Please install apex.\")\n    else:\n        return nn.LayerNorm(hidden_size, eps, elementwise_affine=affine)\n\n\ndef modulate(norm_func, x, shift, scale):\n    # Suppose x is (B, N, D), shift is (B, D), scale is (B, D)\n    dtype = x.dtype\n    x = norm_func(x.to(torch.float32)).to(dtype)\n    x = x * (scale.unsqueeze(1) + 1) + shift.unsqueeze(1)\n    x = x.to(dtype)\n    return x\n\n\ndef t2i_modulate(x, shift, scale):\n    return x * (1 + scale) + shift\n\n\n# ===============================================\n# General-purpose Layers\n# ===============================================\n\n\nclass PatchEmbed3D(nn.Module):\n    \"\"\"Video to Patch Embedding.\n\n    Args:\n        patch_size (int): Patch token size. Default: (2,4,4).\n        in_chans (int): Number of input video channels. Default: 3.\n        embed_dim (int): Number of linear projection output channels. Default: 96.\n        norm_layer (nn.Module, optional): Normalization layer. 
Default: None\n    \"\"\"\n\n    def __init__(\n        self,\n        patch_size=(2, 4, 4),\n        in_chans=3,\n        embed_dim=96,\n        norm_layer=None,\n        flatten=True,\n    ):\n        super().__init__()\n        self.patch_size = patch_size\n        self.flatten = flatten\n\n        self.in_chans = in_chans\n        self.embed_dim = embed_dim\n\n        self.proj = nn.Conv3d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)\n        if norm_layer is not None:\n            self.norm = norm_layer(embed_dim)\n        else:\n            self.norm = None\n\n    def forward(self, x):\n        \"\"\"Forward function.\"\"\"\n        # padding\n        _, _, D, H, W = x.size()\n        if W % self.patch_size[2] != 0:\n            x = F.pad(x, (0, self.patch_size[2] - W % self.patch_size[2]))\n        if H % self.patch_size[1] != 0:\n            x = F.pad(x, (0, 0, 0, self.patch_size[1] - H % self.patch_size[1]))\n        if D % self.patch_size[0] != 0:\n            x = F.pad(x, (0, 0, 0, 0, 0, self.patch_size[0] - D % self.patch_size[0]))\n\n        x = self.proj(x)  # (B C T H W)\n        if self.norm is not None:\n            D, Wh, Ww = x.size(2), x.size(3), x.size(4)\n            x = x.flatten(2).transpose(1, 2)\n            x = self.norm(x)\n            x = x.transpose(1, 2).view(-1, self.embed_dim, D, Wh, Ww)\n        if self.flatten:\n            x = x.flatten(2).transpose(1, 2)  # BCTHW -> BNC\n        return x\n\n\nclass Attention(nn.Module):\n    def __init__(\n        self,\n        dim: int,\n        num_heads: int = 8,\n        qkv_bias: bool = False,\n        qk_norm: bool = False,\n        attn_drop: float = 0.0,\n        proj_drop: float = 0.0,\n        norm_layer: nn.Module = LlamaRMSNorm,\n        enable_flash_attn: bool = False,\n        rope=None,\n        qk_norm_legacy: bool = False,\n    ) -> None:\n        super().__init__()\n        assert dim % num_heads == 0, \"dim should be divisible by num_heads\"\n        
self.dim = dim\n        self.num_heads = num_heads\n        self.head_dim = dim // num_heads\n        self.scale = self.head_dim**-0.5\n        self.enable_flash_attn = False\n\n        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)\n        self.q_norm = norm_layer(self.head_dim) if qk_norm else nn.Identity()\n        self.k_norm = norm_layer(self.head_dim) if qk_norm else nn.Identity()\n        self.qk_norm_legacy = qk_norm_legacy\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n        self.rope = False\n        if rope is not None:\n            self.rope = True\n            self.rotary_emb = rope\n        \n        self.is_causal = False\n    \n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        B, N, C = x.shape\n        # flash attn is not memory efficient for small sequences, this is empirical\n        enable_flash_attn = self.enable_flash_attn and (N > B)\n        qkv = self.qkv(x)\n        qkv_shape = (B, N, 3, self.num_heads, self.head_dim)\n\n        qkv = qkv.view(qkv_shape).permute(2, 0, 3, 1, 4)\n        q, k, v = qkv.unbind(0)\n        if self.qk_norm_legacy:\n            # WARNING: this may be a bug\n            if self.rope:\n                q = self.rotary_emb(q)\n                k = self.rotary_emb(k)\n            q, k = self.q_norm(q), self.k_norm(k)\n        else:\n            q, k = self.q_norm(q), self.k_norm(k)\n            if self.rope:\n                q = self.rotary_emb(q)\n                k = self.rotary_emb(k)\n\n        if enable_flash_attn:\n            from flash_attn import flash_attn_func\n\n            # (B, #heads, N, #dim) -> (B, N, #heads, #dim)\n            q = q.permute(0, 2, 1, 3)\n            k = k.permute(0, 2, 1, 3)\n            v = v.permute(0, 2, 1, 3)\n            x = flash_attn_func(\n                q,\n                k,\n                v,\n                dropout_p=self.attn_drop.p if self.training else 
0.0,\n                softmax_scale=self.scale,\n                causal=self.is_causal,\n            )\n        else:\n            dtype = q.dtype\n            q = q * self.scale\n            #attn = q @ k.transpose(-2, -1)  # translate attn to float32\n            attn = torch.matmul(q,k.transpose(-2, -1))\n            attn = attn.to(torch.float32)\n            if self.is_causal:\n                causal_mask = torch.tril(torch.ones_like(attn), diagonal=0)\n                causal_mask = torch.where(causal_mask.bool(), 0, float('-inf'))\n                attn += causal_mask\n            attn = attn.softmax(dim=-1)\n            attn = attn.to(dtype)  # cast back attn to original dtype\n            attn = self.attn_drop(attn)\n            #x = attn @ v\n            x = torch.matmul(attn,v)\n\n        x_output_shape = (B, N, C)\n        if not enable_flash_attn:\n            x = x.transpose(1, 2)\n        x = x.reshape(x_output_shape)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass KVCompressAttention(nn.Module):\n    def __init__(\n        self,\n        dim: int,\n        num_heads: int = 8,\n        qkv_bias: bool = False,\n        qk_norm: bool = False,\n        attn_drop: float = 0.0,\n        proj_drop: float = 0.0,\n        norm_layer: nn.Module = LlamaRMSNorm,\n        enable_flash_attn: bool = False,\n        sampling=\"conv\",\n        sr_ratio=1,\n        mem_eff_attention=False,\n        attn_half=False,\n    ) -> None:\n        super().__init__()\n        assert dim % num_heads == 0, \"dim should be divisible by num_heads\"\n        self.dim = dim\n        self.num_heads = num_heads\n        self.head_dim = dim // num_heads\n        self.scale = self.head_dim**-0.5\n        self.enable_flash_attn = enable_flash_attn\n\n        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)\n\n        self.sr_ratio = sr_ratio\n        self.sampling = sampling\n        if sr_ratio > 1 and sampling == \"conv\":\n            # Avg Conv 
Init.\n            self.sr = nn.Conv2d(dim, dim, groups=dim, kernel_size=sr_ratio, stride=sr_ratio)\n            self.sr.weight.data.fill_(1 / sr_ratio**2)\n            self.sr.bias.data.zero_()\n            self.norm = nn.LayerNorm(dim)\n\n        self.q_norm = norm_layer(self.head_dim) if qk_norm else nn.Identity()\n        self.k_norm = norm_layer(self.head_dim) if qk_norm else nn.Identity()\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n        self.mem_eff_attention = mem_eff_attention\n        self.attn_half = attn_half\n\n    def downsample_2d(self, tensor, H, W, scale_factor, sampling=None):\n        if sampling is None or scale_factor == 1:\n            return tensor\n        B, N, C = tensor.shape\n\n        if sampling == \"uniform_every\":\n            return tensor[:, ::scale_factor], int(N // scale_factor)\n\n        tensor = tensor.reshape(B, H, W, C).permute(0, 3, 1, 2)\n        new_H, new_W = int(H / scale_factor), int(W / scale_factor)\n        new_N = new_H * new_W\n\n        if sampling == \"ave\":\n            tensor = F.interpolate(tensor, scale_factor=1 / scale_factor, mode=\"nearest\").permute(0, 2, 3, 1)\n        elif sampling == \"uniform\":\n            tensor = tensor[:, :, ::scale_factor, ::scale_factor].permute(0, 2, 3, 1)\n        elif sampling == \"conv\":\n            tensor = self.sr(tensor).reshape(B, C, -1).permute(0, 2, 1)\n            tensor = self.norm(tensor)\n        else:\n            raise ValueError\n\n        return tensor.reshape(B, new_N, C).contiguous(), new_N\n\n    def forward(self, x: torch.Tensor, mask=None, HW=None, block_id=None, **kwargs) -> torch.Tensor:\n        B, N, C = x.shape\n        new_N = N\n        H, W = HW\n        # flash attn is not memory efficient for small sequences, this is empirical\n        enable_flash_attn = self.enable_flash_attn and (N > B)\n\n        qkv = self.qkv(x).reshape(B, N, 3, C)\n      
  q, k, v = qkv.unbind(2)\n        dtype = q.dtype\n        # KV compression\n        if self.sr_ratio > 1:\n            k, new_N = self.downsample_2d(k, H, W, self.sr_ratio, sampling=self.sampling)\n            v, new_N = self.downsample_2d(v, H, W, self.sr_ratio, sampling=self.sampling)\n\n        q = q.reshape(B, N, self.num_heads, C // self.num_heads).to(dtype)\n        k = k.reshape(B, new_N, self.num_heads, C // self.num_heads).to(dtype)\n        v = v.reshape(B, new_N, self.num_heads, C // self.num_heads).to(dtype)\n\n        q, k = self.q_norm(q), self.k_norm(k)\n\n        if enable_flash_attn:\n            from flash_attn import flash_attn_func\n\n            x = flash_attn_func(\n                q,\n                k,\n                v,\n                dropout_p=self.attn_drop.p if self.training else 0.0,\n                softmax_scale=self.scale,\n            )\n\n        elif self.mem_eff_attention:\n            attn_bias = None\n            if mask is not None:\n                attn_bias = torch.zeros([B * self.num_heads, q.shape[1], k.shape[1]], dtype=q.dtype, device=q.device)\n                attn_bias.masked_fill_(mask.squeeze(1).repeat(self.num_heads, 1, 1) == 0, float(\"-inf\"))\n            x = xformers.ops.memory_efficient_attention(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n        else:\n            # (B, N, #heads, #dim) -> (B, #heads, N, #dim)\n            q = q.permute(0, 2, 1, 3)\n            k = k.permute(0, 2, 1, 3)\n            v = v.permute(0, 2, 1, 3)\n            dtype = q.dtype\n            q = q * self.scale\n            attn = q @ k.transpose(-2, -1)  # translate attn to float32\n            if not self.attn_half:\n                attn = attn.to(torch.float32)\n            attn = attn.softmax(dim=-1)\n            attn = attn.to(dtype)  # cast back attn to original dtype\n            attn = self.attn_drop(attn)\n            x = attn @ v\n\n        x_output_shape = (B, N, C)\n        if not enable_flash_attn:\n            
x = x.transpose(1, 2)\n        x = x.reshape(x_output_shape)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass SeqParallelAttention(Attention):\n    def __init__(\n        self,\n        dim: int,\n        num_heads: int = 8,\n        qkv_bias: bool = False,\n        qk_norm: bool = False,\n        attn_drop: float = 0.0,\n        proj_drop: float = 0.0,\n        norm_layer: nn.Module = LlamaRMSNorm,\n        enable_flash_attn: bool = False,\n        rope=None,\n    ) -> None:\n        assert rope is None, \"Rope is not supported in SeqParallelAttention\"\n        super().__init__(\n            dim=dim,\n            num_heads=num_heads,\n            qkv_bias=qkv_bias,\n            qk_norm=qk_norm,\n            attn_drop=attn_drop,\n            proj_drop=proj_drop,\n            norm_layer=norm_layer,\n            enable_flash_attn=enable_flash_attn,\n        )\n\n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        B, N, C = x.shape  # for sequence parallel here, the N is a local sequence length\n        qkv = self.qkv(x)\n        qkv_shape = (B, N, 3, self.num_heads, self.head_dim)\n        qkv = qkv.view(qkv_shape)\n\n        sp_group = get_sequence_parallel_group()\n\n        # apply all_to_all to gather sequence and split attention heads\n        # [B, SUB_N, 3, NUM_HEAD, HEAD_DIM] -> [B, N, 3, NUM_HEAD_PER_DEVICE, HEAD_DIM]\n        qkv = all_to_all(qkv, sp_group, scatter_dim=3, gather_dim=1)\n\n        if self.enable_flash_attn:\n            qkv_permute_shape = (\n                2,\n                0,\n                1,\n                3,\n                4,\n            )  # [3, B, N, NUM_HEAD_PER_DEVICE, HEAD_DIM]\n        else:\n            qkv_permute_shape = (\n                2,\n                0,\n                3,\n                1,\n                4,\n            )  # [3, B, NUM_HEAD_PER_DEVICE, N, HEAD_DIM]\n        qkv = qkv.permute(qkv_permute_shape)\n\n        # ERROR: Should qk_norm first\n   
     q, k, v = qkv.unbind(0)\n        q, k = self.q_norm(q), self.k_norm(k)\n        if self.enable_flash_attn:\n            from flash_attn import flash_attn_func\n\n            x = flash_attn_func(\n                q,\n                k,\n                v,\n                dropout_p=self.attn_drop.p if self.training else 0.0,\n                softmax_scale=self.scale,\n            )\n        else:\n            dtype = q.dtype\n            q = q * self.scale\n            attn = q @ k.transpose(-2, -1)  # translate attn to float32\n            attn = attn.to(torch.float32)\n            attn = attn.softmax(dim=-1)\n            attn = attn.to(dtype)  # cast back attn to original dtype\n            attn = self.attn_drop(attn)\n            x = attn @ v\n\n        if not self.enable_flash_attn:\n            x = x.transpose(1, 2)\n\n        # apply all to all to gather back attention heads and split sequence\n        # [B, N, NUM_HEAD_PER_DEVICE, HEAD_DIM]  -> [B, SUB_N, NUM_HEAD, HEAD_DIM]\n        x = all_to_all(x, sp_group, scatter_dim=1, gather_dim=2)\n\n        # reshape outputs back to [B, N, C]\n        x_output_shape = (B, N, C)\n        x = x.reshape(x_output_shape)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass MultiHeadCrossAttention(nn.Module):\n    def __init__(self, d_model, num_heads, attn_drop=0.0, proj_drop=0.0):\n        super(MultiHeadCrossAttention, self).__init__()\n        assert d_model % num_heads == 0, \"d_model must be divisible by num_heads\"\n\n        self.d_model = d_model\n        self.num_heads = num_heads\n        self.head_dim = d_model // num_heads\n\n        self.q_linear = nn.Linear(d_model, d_model)\n        self.kv_linear = nn.Linear(d_model, d_model * 2)\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(d_model, d_model)\n        self.proj_drop = nn.Dropout(proj_drop)\n    \n    def forward(self, x, cond, mask=None):\n        #start = 
torch.cuda.Event(enable_timing=True)\n        #end = torch.cuda.Event(enable_timing=True)\n        # query/value: img tokens; key: condition; mask: if padding tokens\n        B, N, C = x.shape\n        #start.record()\n        q = self.q_linear(x).view(1, -1, self.num_heads, self.head_dim)\n        kv = self.kv_linear(cond).view(1, -1, 2, self.num_heads, self.head_dim)\n        k, v = kv.unbind(2)\n\n        attn_bias = None\n        if mask is not None:\n            attn_bias = xformers.ops.fmha.BlockDiagonalMask.from_seqlens([N] * B, mask)\n        #x = xformers.ops.memory_efficient_attention(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n\n        x, cross_attn_map = cached_attention_forward(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n        x = x.view(B, -1, C)\n        cross_attn_map = cross_attn_map.view(B, -1, cross_attn_map.shape[-1])\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        #end.record()\n        #torch.cuda.synchronize()\n        #print(start.elapsed_time(end))\n        return x, cross_attn_map\n\n\nclass SeqParallelMultiHeadCrossAttention(MultiHeadCrossAttention):\n    def __init__(\n        self,\n        d_model,\n        num_heads,\n        attn_drop=0.0,\n        proj_drop=0.0,\n    ):\n        super().__init__(\n            d_model=d_model,\n            num_heads=num_heads,\n            attn_drop=attn_drop,\n            proj_drop=proj_drop,\n        )\n\n    def forward(self, x, cond, mask=None):\n        # query/value: img tokens; key: condition; mask: if padding tokens\n        sp_group = get_sequence_parallel_group()\n        sp_size = dist.get_world_size(sp_group)\n        B, SUB_N, C = x.shape  # [B, TS/p, C]\n        N = SUB_N * sp_size\n\n        # shape:\n        # q, k, v: [B, SUB_N, NUM_HEADS, HEAD_DIM]\n        q = self.q_linear(x).view(B, -1, self.num_heads, self.head_dim)\n        kv = self.kv_linear(cond).view(1, -1, 2, self.num_heads, self.head_dim)\n        kv = split_forward_gather_backward(kv, 
get_sequence_parallel_group(), dim=3, grad_scale=\"down\")\n        k, v = kv.unbind(2)\n\n        # apply all_to_all to gather sequence and split attention heads\n        q = all_to_all(q, sp_group, scatter_dim=2, gather_dim=1)\n\n        q = q.view(1, -1, self.num_heads // sp_size, self.head_dim)\n        k = k.view(1, -1, self.num_heads // sp_size, self.head_dim)\n        v = v.view(1, -1, self.num_heads // sp_size, self.head_dim)\n\n        # compute attention\n        attn_bias = None\n        if mask is not None:\n            attn_bias = xformers.ops.fmha.BlockDiagonalMask.from_seqlens([N] * B, mask)\n        x = xformers.ops.memory_efficient_attention(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n\n        # apply all to all to gather back attention heads and scatter sequence\n        x = x.view(B, -1, self.num_heads // sp_size, self.head_dim)\n        x = all_to_all(x, sp_group, scatter_dim=1, gather_dim=2)\n\n        # apply output projection\n        x = x.view(B, -1, C)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass FinalLayer(nn.Module):\n    \"\"\"\n    The final layer of DiT.\n    \"\"\"\n\n    def __init__(self, hidden_size, num_patch, out_channels):\n        super().__init__()\n        self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.linear = nn.Linear(hidden_size, num_patch * out_channels, bias=True)\n        self.adaLN_modulation = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 2 * hidden_size, bias=True))\n\n    def forward(self, x, c):\n        shift, scale = self.adaLN_modulation(c).chunk(2, dim=1)\n        x = modulate(self.norm_final, x, shift, scale)\n        x = self.linear(x)\n        return x\n\n\nclass T2IFinalLayer(nn.Module):\n    \"\"\"\n    The final layer of PixArt.\n    \"\"\"\n\n    def __init__(self, hidden_size, num_patch, out_channels, d_t=None, d_s=None):\n        super().__init__()\n        self.norm_final = nn.LayerNorm(hidden_size, 
elementwise_affine=False, eps=1e-6)\n        self.linear = nn.Linear(hidden_size, num_patch * out_channels, bias=True)\n        self.scale_shift_table = nn.Parameter(torch.randn(2, hidden_size) / hidden_size**0.5)\n        self.out_channels = out_channels\n        self.d_t = d_t\n        self.d_s = d_s\n\n    def t_mask_select(self, x_mask, x, masked_x, T, S):\n        # x: [B, (T, S), C]\n        # masked_x: [B, (T, S), C]\n        # x_mask: [B, T]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n        masked_x = rearrange(masked_x, \"B (T S) C -> B T S C\", T=T, S=S)\n        x = torch.where(x_mask[:, :, None, None], x, masked_x)\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n        return x\n\n    def forward(self, x, t, x_mask=None, t0=None, T=None, S=None):\n        if T is None:\n            T = self.d_t\n        if S is None:\n            S = self.d_s\n        shift, scale = (self.scale_shift_table[None] + t[:, None]).chunk(2, dim=1)\n        x = t2i_modulate(self.norm_final(x), shift, scale)\n        if x_mask is not None:\n            shift_zero, scale_zero = (self.scale_shift_table[None] + t0[:, None]).chunk(2, dim=1)\n            x_zero = t2i_modulate(self.norm_final(x), shift_zero, scale_zero)\n            x = self.t_mask_select(x_mask, x, x_zero, T, S)\n        x = self.linear(x)\n        return x\n\n\n# ===============================================\n# Embedding Layers for Timesteps and Class Labels\n# ===============================================\n\n\nclass TimestepEmbedder(nn.Module):\n    \"\"\"\n    Embeds scalar timesteps into vector representations.\n    \"\"\"\n\n    def __init__(self, hidden_size, frequency_embedding_size=256):\n        super().__init__()\n        self.mlp = nn.Sequential(\n            nn.Linear(frequency_embedding_size, hidden_size, bias=True),\n            nn.SiLU(),\n            nn.Linear(hidden_size, hidden_size, bias=True),\n        )\n        self.frequency_embedding_size = 
frequency_embedding_size\n\n    @staticmethod\n    def timestep_embedding(t, dim, max_period=10000):\n        \"\"\"\n        Create sinusoidal timestep embeddings.\n        :param t: a 1-D Tensor of N indices, one per batch element.\n                          These may be fractional.\n        :param dim: the dimension of the output.\n        :param max_period: controls the minimum frequency of the embeddings.\n        :return: an (N, D) Tensor of positional embeddings.\n        \"\"\"\n        # https://github.com/openai/glide-text2im/blob/main/glide_text2im/nn.py\n        half = dim // 2\n        freqs = torch.exp(-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half)\n        freqs = freqs.to(device=t.device)\n        args = t[:, None].float() * freqs[None]\n        embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)\n        if dim % 2:\n            embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)\n        return embedding\n\n    def forward(self, t, dtype):\n        t_freq = self.timestep_embedding(t, self.frequency_embedding_size)\n        if t_freq.dtype != dtype:\n            t_freq = t_freq.to(dtype)\n        t_emb = self.mlp(t_freq)\n        return t_emb\n\n\nclass LabelEmbedder(nn.Module):\n    \"\"\"\n    Embeds class labels into vector representations. 
Also handles label dropout for classifier-free guidance.\n    \"\"\"\n\n    def __init__(self, num_classes, hidden_size, dropout_prob):\n        super().__init__()\n        use_cfg_embedding = dropout_prob > 0\n        self.embedding_table = nn.Embedding(num_classes + use_cfg_embedding, hidden_size)\n        self.num_classes = num_classes\n        self.dropout_prob = dropout_prob\n\n    def token_drop(self, labels, force_drop_ids=None):\n        \"\"\"\n        Drops labels to enable classifier-free guidance.\n        \"\"\"\n        if force_drop_ids is None:\n            drop_ids = torch.rand(labels.shape[0], device=labels.device) < self.dropout_prob\n        else:\n            drop_ids = force_drop_ids == 1\n        labels = torch.where(drop_ids, self.num_classes, labels)\n        return labels\n\n    def forward(self, labels, train, force_drop_ids=None):\n        use_dropout = self.dropout_prob > 0\n        if (train and use_dropout) or (force_drop_ids is not None):\n            labels = self.token_drop(labels, force_drop_ids)\n        return self.embedding_table(labels)\n\n\nclass SizeEmbedder(TimestepEmbedder):\n    \"\"\"\n    Embeds scalar size values into vector representations.\n    \"\"\"\n\n    def __init__(self, hidden_size, frequency_embedding_size=256):\n        super().__init__(hidden_size=hidden_size, frequency_embedding_size=frequency_embedding_size)\n        self.mlp = nn.Sequential(\n            nn.Linear(frequency_embedding_size, hidden_size, bias=True),\n            nn.SiLU(),\n            nn.Linear(hidden_size, hidden_size, bias=True),\n        )\n        self.frequency_embedding_size = frequency_embedding_size\n        self.outdim = hidden_size\n\n    def forward(self, s, bs):\n        if s.ndim == 1:\n            s = s[:, None]\n        assert s.ndim == 2\n        if s.shape[0] != bs:\n            s = s.repeat(bs // s.shape[0], 1)\n            assert s.shape[0] == bs\n        b, dims = s.shape[0], s.shape[1]\n        s = rearrange(s, \"b d -> (b d)\")\n     
   s_freq = self.timestep_embedding(s, self.frequency_embedding_size).to(self.dtype)\n        s_emb = self.mlp(s_freq)\n        s_emb = rearrange(s_emb, \"(b d) d2 -> b (d d2)\", b=b, d=dims, d2=self.outdim)\n        return s_emb\n\n    @property\n    def dtype(self):\n        return next(self.parameters()).dtype\n\n\nclass CaptionEmbedder(nn.Module):\n    \"\"\"\n    Embeds class labels into vector representations. Also handles label dropout for classifier-free guidance.\n    \"\"\"\n\n    def __init__(\n        self,\n        in_channels,\n        hidden_size,\n        uncond_prob,\n        act_layer=nn.GELU(approximate=\"tanh\"),\n        token_num=120,\n    ):\n        super().__init__()\n        self.y_proj = Mlp(\n            in_features=in_channels,\n            hidden_features=hidden_size,\n            out_features=hidden_size,\n            act_layer=act_layer,\n            drop=0,\n        )\n        self.register_buffer(\n            \"y_embedding\",\n            torch.randn(token_num, in_channels) / in_channels**0.5,\n        )\n        self.uncond_prob = uncond_prob\n\n    def token_drop(self, caption, force_drop_ids=None):\n        \"\"\"\n        Drops labels to enable classifier-free guidance.\n        \"\"\"\n        if force_drop_ids is None:\n            drop_ids = torch.rand(caption.shape[0]).cuda() < self.uncond_prob\n        else:\n            drop_ids = force_drop_ids == 1\n        caption = torch.where(drop_ids[:, None, None, None], self.y_embedding, caption)\n        return caption\n\n    def forward(self, caption, train, force_drop_ids=None):\n        if train:\n            assert caption.shape[2:] == self.y_embedding.shape\n        use_dropout = self.uncond_prob > 0\n        if (train and use_dropout) or (force_drop_ids is not None):\n            caption = self.token_drop(caption, force_drop_ids)\n        caption = self.y_proj(caption)\n        return caption\n\n\nclass PositionEmbedding2D(nn.Module):\n    def __init__(self, dim: int) -> 
None:\n        super().__init__()\n        self.dim = dim\n        assert dim % 4 == 0, \"dim must be divisible by 4\"\n        half_dim = dim // 2\n        inv_freq = 1.0 / (10000 ** (torch.arange(0, half_dim, 2).float() / half_dim))\n        self.register_buffer(\"inv_freq\", inv_freq, persistent=False)\n\n    def _get_sin_cos_emb(self, t: torch.Tensor):\n        out = torch.einsum(\"i,d->id\", t, self.inv_freq)\n        emb_cos = torch.cos(out)\n        emb_sin = torch.sin(out)\n        return torch.cat((emb_sin, emb_cos), dim=-1)\n\n    @functools.lru_cache(maxsize=512)\n    def _get_cached_emb(\n        self,\n        device: torch.device,\n        dtype: torch.dtype,\n        h: int,\n        w: int,\n        scale: float = 1.0,\n        base_size: Optional[int] = None,\n    ):\n        grid_h = torch.arange(h, device=device) / scale\n        grid_w = torch.arange(w, device=device) / scale\n        if base_size is not None:\n            grid_h *= base_size / h\n            grid_w *= base_size / w\n        grid_h, grid_w = torch.meshgrid(\n            grid_w,\n            grid_h,\n            indexing=\"ij\",\n        )  # here w goes first\n        grid_h = grid_h.t().reshape(-1)\n        grid_w = grid_w.t().reshape(-1)\n        emb_h = self._get_sin_cos_emb(grid_h)\n        emb_w = self._get_sin_cos_emb(grid_w)\n        return torch.concat([emb_h, emb_w], dim=-1).unsqueeze(0).to(dtype)\n\n    def forward(\n        self,\n        x: torch.Tensor,\n        h: int,\n        w: int,\n        scale: Optional[float] = 1.0,\n        base_size: Optional[int] = None,\n    ) -> torch.Tensor:\n        return self._get_cached_emb(x.device, x.dtype, h, w, scale, base_size)\n\n\n# ===============================================\n# Sine/Cosine Positional Embedding Functions\n# ===============================================\n# https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py\n\n\ndef get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False, 
extra_tokens=0, scale=1.0, base_size=None):\n    \"\"\"\n    grid_size: int of the grid height and width\n    return:\n    pos_embed: [grid_size*grid_size, embed_dim] or [1+grid_size*grid_size, embed_dim] (w/ or w/o cls_token)\n    \"\"\"\n    if not isinstance(grid_size, tuple):\n        grid_size = (grid_size, grid_size)\n\n    grid_h = np.arange(grid_size[0], dtype=np.float32) / scale\n    grid_w = np.arange(grid_size[1], dtype=np.float32) / scale\n    if base_size is not None:\n        grid_h *= base_size / grid_size[0]\n        grid_w *= base_size / grid_size[1]\n    grid = np.meshgrid(grid_w, grid_h)  # here w goes first\n    grid = np.stack(grid, axis=0)\n\n    grid = grid.reshape([2, 1, grid_size[1], grid_size[0]])\n    pos_embed = get_2d_sincos_pos_embed_from_grid(embed_dim, grid)\n    if cls_token and extra_tokens > 0:\n        pos_embed = np.concatenate([np.zeros([extra_tokens, embed_dim]), pos_embed], axis=0)\n    return pos_embed\n\n\ndef get_2d_sincos_pos_embed_from_grid(embed_dim, grid):\n    assert embed_dim % 2 == 0\n\n    # use half of dimensions to encode grid_h\n    emb_h = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[0])  # (H*W, D/2)\n    emb_w = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[1])  # (H*W, D/2)\n\n    emb = np.concatenate([emb_h, emb_w], axis=1)  # (H*W, D)\n    return emb\n\n\ndef get_1d_sincos_pos_embed(embed_dim, length, scale=1.0):\n    pos = np.arange(0, length)[..., None] / scale\n    return get_1d_sincos_pos_embed_from_grid(embed_dim, pos)\n\n\ndef get_1d_sincos_pos_embed_from_grid(embed_dim, pos):\n    \"\"\"\n    embed_dim: output dimension for each position\n    pos: a list of positions to be encoded: size (M,)\n    out: (M, D)\n    \"\"\"\n    assert embed_dim % 2 == 0\n    omega = np.arange(embed_dim // 2, dtype=np.float64)\n    omega /= embed_dim / 2.0\n    omega = 1.0 / 10000**omega  # (D/2,)\n\n    pos = pos.reshape(-1)  # (M,)\n    out = np.einsum(\"m,d->md\", pos, omega)  # (M, D/2), outer 
product\n\n    emb_sin = np.sin(out)  # (M, D/2)\n    emb_cos = np.cos(out)  # (M, D/2)\n\n    emb = np.concatenate([emb_sin, emb_cos], axis=1)  # (M, D)\n    return emb\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/pixart/__init__.py",
    "content": "from .pixart import PixArt, PixArt_1B_2, PixArt_XL_2\nfrom .pixart_sigma import PixArt_Sigma_XL_2\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/pixart/pixart.py",
    "content": "# Adapted from PixArt\n#\n# Copyright (C) 2023  PixArt-alpha/PixArt-alpha\n#\n# This program is free software: you can redistribute it and/or modify\n# it under the terms of the GNU Affero General Public License as published\n# by the Free Software Foundation, either version 3 of the License, or\n# (at your option) any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU Affero General Public License for more details.\n#\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# PixArt: https://github.com/PixArt-alpha/PixArt-alpha\n# DiT:    https://github.com/facebookresearch/DiT/tree/main\n# --------------------------------------------------------\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom einops import rearrange\nfrom timm.models.layers import DropPath\nfrom timm.models.vision_transformer import Mlp\n\n# from .builder import MODELS\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.models.layers.blocks import (\n    Attention,\n    CaptionEmbedder,\n    MultiHeadCrossAttention,\n    PatchEmbed3D,\n    SeqParallelAttention,\n    SeqParallelMultiHeadCrossAttention,\n    SizeEmbedder,\n    T2IFinalLayer,\n    TimestepEmbedder,\n    approx_gelu,\n    get_1d_sincos_pos_embed,\n    get_2d_sincos_pos_embed,\n    get_layernorm,\n    t2i_modulate,\n)\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\n\nclass PixArtBlock(nn.Module):\n    \"\"\"\n    A PixArt block with adaptive layer norm (adaLN-single) conditioning.\n    \"\"\"\n\n    def __init__(\n        self,\n        hidden_size,\n        num_heads,\n        mlp_ratio=4.0,\n        
drop_path=0.0,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n    ):\n        super().__init__()\n        self.hidden_size = hidden_size\n        self.enable_flash_attn = enable_flash_attn\n        self._enable_sequence_parallelism = enable_sequence_parallelism\n\n        if enable_sequence_parallelism:\n            self.attn_cls = SeqParallelAttention\n            self.mha_cls = SeqParallelMultiHeadCrossAttention\n        else:\n            self.attn_cls = Attention\n            self.mha_cls = MultiHeadCrossAttention\n\n        self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.attn = self.attn_cls(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            enable_flash_attn=enable_flash_attn,\n        )\n        self.cross_attn = self.mha_cls(hidden_size, num_heads)\n        self.norm2 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.mlp = Mlp(\n            in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0\n        )\n        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()\n        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size**0.5)\n\n    def forward(self, x, y, t, mask=None):\n        B, N, C = x.shape\n\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (\n            self.scale_shift_table[None] + t.reshape(B, 6, -1)\n        ).chunk(6, dim=1)\n        x = x + self.drop_path(gate_msa * self.attn(t2i_modulate(self.norm1(x), shift_msa, scale_msa)).reshape(B, N, C))\n        x = x + self.cross_attn(x, y, mask)\n        x = x + self.drop_path(gate_mlp * self.mlp(t2i_modulate(self.norm2(x), shift_mlp, scale_mlp)))\n\n        return x\n\n\n@MODELS.register_module()\nclass PixArt(nn.Module):\n    \"\"\"\n   
 Diffusion model with a Transformer backbone.\n    \"\"\"\n\n    def __init__(\n        self,\n        input_size=(1, 32, 32),\n        in_channels=4,\n        patch_size=(1, 2, 2),\n        hidden_size=1152,\n        depth=28,\n        num_heads=16,\n        mlp_ratio=4.0,\n        class_dropout_prob=0.1,\n        pred_sigma=True,\n        drop_path: float = 0.0,\n        no_temporal_pos_emb=False,\n        caption_channels=4096,\n        model_max_length=120,\n        dtype=torch.float32,\n        freeze=None,\n        space_scale=1.0,\n        time_scale=1.0,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n        base_size=None,\n    ):\n        super().__init__()\n        assert enable_sequence_parallelism is False, \"Sequence parallelism is not supported in this version.\"\n        self.pred_sigma = pred_sigma\n        self.in_channels = in_channels\n        self.out_channels = in_channels * 2 if pred_sigma else in_channels\n        self.hidden_size = hidden_size\n        self.patch_size = patch_size\n        self.input_size = input_size\n        num_patches = np.prod([input_size[i] // patch_size[i] for i in range(3)])\n        self.num_patches = num_patches\n        self.num_temporal = input_size[0] // patch_size[0]\n        self.num_spatial = num_patches // self.num_temporal\n        if base_size is None:\n            self.base_size = int(np.sqrt(self.num_spatial))\n        else:\n            self.base_size = base_size // patch_size[1]\n        self.num_heads = num_heads\n        self.dtype = dtype\n        self.no_temporal_pos_emb = no_temporal_pos_emb\n        self.depth = depth\n        self.mlp_ratio = mlp_ratio\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_layernorm_kernel = enable_layernorm_kernel\n        self.space_scale = space_scale\n        self.time_scale = time_scale\n\n        self.x_embedder = PatchEmbed3D(patch_size, in_channels, hidden_size)\n  
      self.t_embedder = TimestepEmbedder(hidden_size)\n        self.t_block = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 6 * hidden_size, bias=True))\n        self.y_embedder = CaptionEmbedder(\n            in_channels=caption_channels,\n            hidden_size=hidden_size,\n            uncond_prob=class_dropout_prob,\n            act_layer=approx_gelu,\n            token_num=model_max_length,\n        )\n\n        self.register_buffer(\"pos_embed\", self.get_spatial_pos_embed())\n        self.register_buffer(\"pos_embed_temporal\", self.get_temporal_pos_embed())\n\n        drop_path = [x.item() for x in torch.linspace(0, drop_path, depth)]  # stochastic depth decay rule\n        self.blocks = nn.ModuleList(\n            [\n                PixArtBlock(\n                    hidden_size,\n                    num_heads,\n                    mlp_ratio=mlp_ratio,\n                    drop_path=drop_path[i],\n                    enable_flash_attn=enable_flash_attn,\n                    enable_layernorm_kernel=enable_layernorm_kernel,\n                )\n                for i in range(depth)\n            ]\n        )\n        self.final_layer = T2IFinalLayer(hidden_size, np.prod(self.patch_size), self.out_channels)\n\n        self.initialize_weights()\n        if freeze is not None:\n            assert freeze in [\"text\"]\n            if freeze == \"text\":\n                self.freeze_text()\n\n    def forward(self, x, timestep, y, mask=None, **kwargs):\n        \"\"\"\n        Forward pass of PixArt.\n        x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)\n        t: (N,) tensor of diffusion timesteps\n        y: (N, 1, 120, C) tensor of class labels\n        \"\"\"\n        dtype = self.x_embedder.proj.weight.dtype\n        B = x.size(0)\n        x = x.to(dtype)\n        timestep = timestep.to(dtype)\n        y = y.to(dtype)\n\n        # embedding\n        x = self.x_embedder(x)  # (B, N, D)\n        x = rearrange(x, \"b (t 
s) d -> b t s d\", t=self.num_temporal, s=self.num_spatial)\n        x = x + self.pos_embed\n        if not self.no_temporal_pos_emb:\n            x = rearrange(x, \"b t s d -> b s t d\")\n            x = x + self.pos_embed_temporal\n            x = rearrange(x, \"b s t d -> b (t s) d\")\n        else:\n            x = rearrange(x, \"b t s d -> b (t s) d\")\n\n        t = self.t_embedder(timestep, dtype=x.dtype)  # (N, D)\n        t0 = self.t_block(t)\n        y = self.y_embedder(y, self.training)  # (N, 1, L, D)\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, x.shape[-1])\n\n        # blocks\n        for block in self.blocks:\n            x = auto_grad_checkpoint(block, x, y, t0, y_lens)\n\n        # final process\n        x = self.final_layer(x, t)  # (N, T, patch_size ** 2 * out_channels)\n        x = self.unpatchify(x)  # (N, out_channels, H, W)\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n    def unpatchify(self, x):\n        c = self.out_channels\n        t, h, w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        pt, ph, pw = self.patch_size\n\n        x = x.reshape(shape=(x.shape[0], t, h, w, pt, ph, pw, c))\n        x = rearrange(x, \"n t h w r p q c -> n c t r h p w q\")\n        imgs = x.reshape(shape=(x.shape[0], c, t * pt, h * ph, w * pw))\n        return imgs\n\n    def get_spatial_pos_embed(self, grid_size=None):\n        if grid_size is None:\n            grid_size = self.input_size[1:]\n        pos_embed = get_2d_sincos_pos_embed(\n            self.hidden_size,\n            (grid_size[0] // 
self.patch_size[1], grid_size[1] // self.patch_size[2]),\n            scale=self.space_scale,\n            base_size=self.base_size,\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def get_temporal_pos_embed(self):\n        pos_embed = get_1d_sincos_pos_embed(\n            self.hidden_size,\n            self.input_size[0] // self.patch_size[0],\n            scale=self.time_scale,\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def freeze_text(self):\n        for n, p in self.named_parameters():\n            if \"cross_attn\" in n:\n                p.requires_grad = False\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                torch.nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.constant_(module.bias, 0)\n\n        self.apply(_basic_init)\n\n        # Initialize patch_embed like nn.Linear (instead of nn.Conv2d):\n        w = self.x_embedder.proj.weight.data\n        nn.init.xavier_uniform_(w.view([w.shape[0], -1]))\n\n        # Initialize timestep embedding MLP:\n        nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)\n        nn.init.normal_(self.t_block[1].weight, std=0.02)\n\n        # Initialize caption embedding MLP:\n        nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)\n        nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)\n\n        # Zero-out adaLN modulation layers in PixArt blocks:\n        for block in self.blocks:\n            nn.init.constant_(block.cross_attn.proj.weight, 0)\n            nn.init.constant_(block.cross_attn.proj.bias, 0)\n\n        # Zero-out output layers:\n        
nn.init.constant_(self.final_layer.linear.weight, 0)\n        nn.init.constant_(self.final_layer.linear.bias, 0)\n\n\n@MODELS.register_module()\nclass PixArtMS(PixArt):\n    def __init__(self, *args, **kwargs):\n        super().__init__(*args, **kwargs)\n\n        assert self.hidden_size % 3 == 0, \"hidden_size must be divisible by 3\"\n        self.csize_embedder = SizeEmbedder(self.hidden_size // 3)\n        self.ar_embedder = SizeEmbedder(self.hidden_size // 3)\n\n    def forward(self, x, timestep, y, mask=None, data_info=None):\n        \"\"\"\n        Forward pass of PixArt.\n        x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)\n        t: (N,) tensor of diffusion timesteps\n        y: (N, 1, 120, C) tensor of class labels\n        \"\"\"\n        x = x.to(self.dtype)\n        timestep = timestep.to(self.dtype)\n        y = y.to(self.dtype)\n\n        c_size = data_info[\"hw\"]\n        ar = data_info[\"ar\"]\n        pos_embed = self.get_spatial_pos_embed((x.shape[-2], x.shape[-1])).to(x.dtype)\n\n        # embedding\n        x = self.x_embedder(x)  # (B, N, D)\n        x = rearrange(x, \"b (t s) d -> b t s d\", t=self.num_temporal, s=self.num_spatial)\n        x = x + pos_embed.to(x.device)\n        if not self.no_temporal_pos_emb:\n            x = rearrange(x, \"b t s d -> b s t d\")\n            x = x + self.pos_embed_temporal\n            x = rearrange(x, \"b s t d -> b (t s) d\")\n        else:\n            x = rearrange(x, \"b t s d -> b (t s) d\")\n\n        t = self.t_embedder(timestep, dtype=x.dtype)  # (N, D)\n        B = x.shape[0]\n        csize = self.csize_embedder(c_size, B)\n        ar = self.ar_embedder(ar, B)\n        t = t + torch.cat([csize, ar], dim=1)\n\n        t0 = self.t_block(t)\n        y = self.y_embedder(y, self.training)  # (N, 1, L, D)\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            
mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, x.shape[-1])\n\n        # blocks\n        for block in self.blocks:\n            x = block(x, y, t0, y_lens)\n\n        # final process\n        x = self.final_layer(x, t)  # (N, T, patch_size ** 2 * out_channels)\n        x = self.unpatchify(x)  # (N, out_channels, H, W)\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n\n@MODELS.register_module(\"PixArt-XL/2\")\ndef PixArt_XL_2(from_pretrained=None, **kwargs):\n    model = PixArt(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n\n\n@MODELS.register_module(\"PixArt-1B/2\")\ndef PixArt_1B_2(from_pretrained=None, **kwargs):\n    model = PixArt(depth=28, hidden_size=1872, patch_size=(1, 2, 2), num_heads=26, **kwargs)\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n\n\n@MODELS.register_module(\"PixArtMS-XL/2\")\ndef PixArtMS_XL_2(from_pretrained=None, **kwargs):\n    model = PixArtMS(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/pixart/pixart_sigma.py",
    "content": "# Adapted from PixArt\n#\n# Copyright (C) 2023  PixArt-alpha/PixArt-alpha\n#\n# This program is free software: you can redistribute it and/or modify\n# it under the terms of the GNU Affero General Public License as published\n# by the Free Software Foundation, either version 3 of the License, or\n# (at your option) any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU Affero General Public License for more details.\n#\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# PixArt: https://github.com/PixArt-alpha/PixArt-alpha\n# DiT:    https://github.com/facebookresearch/DiT/tree/main\n# --------------------------------------------------------\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom einops import rearrange\nfrom timm.models.layers import DropPath\nfrom timm.models.vision_transformer import Mlp\n\n# from .builder import MODELS\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.models.layers.blocks import (\n    CaptionEmbedder,\n    KVCompressAttention,\n    MultiHeadCrossAttention,\n    PatchEmbed3D,\n    T2IFinalLayer,\n    TimestepEmbedder,\n    approx_gelu,\n    get_1d_sincos_pos_embed,\n    get_2d_sincos_pos_embed,\n    get_layernorm,\n    t2i_modulate,\n)\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\n\nclass PixArtBlock(nn.Module):\n    \"\"\"\n    A PixArt block with adaptive layer norm (adaLN-single) conditioning.\n    \"\"\"\n\n    def __init__(\n        self,\n        hidden_size,\n        num_heads,\n        mlp_ratio=4.0,\n        drop_path=0.0,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n  
      enable_sequence_parallelism=False,\n        qk_norm=False,\n        sampling=\"conv\",\n        sr_ratio=1,\n    ):\n        super().__init__()\n        self.hidden_size = hidden_size\n        self.enable_flash_attn = enable_flash_attn\n        self._enable_sequence_parallelism = enable_sequence_parallelism\n        assert not enable_sequence_parallelism, \"Sequence parallelism is not supported in this version.\"\n\n        self.attn_cls = KVCompressAttention\n        self.mha_cls = MultiHeadCrossAttention\n\n        self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.attn = self.attn_cls(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            enable_flash_attn=enable_flash_attn,\n            qk_norm=qk_norm,\n            sr_ratio=sr_ratio,\n            sampling=sampling,\n            attn_half=True,\n        )\n        self.cross_attn = self.mha_cls(hidden_size, num_heads)\n        self.norm2 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.mlp = Mlp(\n            in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0\n        )\n        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()\n        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size**0.5)\n        self.sampling = sampling\n        self.sr_ratio = sr_ratio\n\n    def forward(self, x, y, t, hw, mask=None):\n        B, N, C = x.shape\n\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (\n            self.scale_shift_table[None] + t.reshape(B, 6, -1)\n        ).chunk(6, dim=1)\n        x = x + self.drop_path(\n            gate_msa * self.attn(t2i_modulate(self.norm1(x), shift_msa, scale_msa), HW=hw).reshape(B, N, C)\n        )\n        x = x + self.cross_attn(x, y, mask)\n        x = x + self.drop_path(gate_mlp * 
self.mlp(t2i_modulate(self.norm2(x), shift_mlp, scale_mlp)))\n\n        return x\n\n\n@MODELS.register_module()\nclass PixArt_Sigma(nn.Module):\n    \"\"\"\n    Diffusion model with a Transformer backbone.\n    \"\"\"\n\n    def __init__(\n        self,\n        input_size=(1, 32, 32),\n        in_channels=4,\n        patch_size=(1, 2, 2),\n        hidden_size=1152,\n        depth=28,\n        num_heads=16,\n        mlp_ratio=4.0,\n        class_dropout_prob=0.1,\n        pred_sigma=True,\n        drop_path: float = 0.0,\n        no_temporal_pos_emb=False,\n        caption_channels=4096,\n        model_max_length=120,\n        dtype=torch.float32,\n        freeze=None,\n        qk_norm=False,\n        space_scale=1.0,\n        time_scale=1.0,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n        kv_compress_config=None,\n    ):\n        super().__init__()\n        assert enable_sequence_parallelism is False, \"Sequence parallelism is not supported in this version.\"\n        self.pred_sigma = pred_sigma\n        self.in_channels = in_channels\n        self.out_channels = in_channels * 2 if pred_sigma else in_channels\n        self.hidden_size = hidden_size\n        self.patch_size = patch_size\n        self.input_size = input_size\n        num_patches = np.prod([input_size[i] // patch_size[i] for i in range(3)])\n        self.num_patches = num_patches\n        self.num_temporal = input_size[0] // patch_size[0]\n        self.num_spatial = num_patches // self.num_temporal\n        self.base_size = int(np.sqrt(self.num_spatial))\n        self.num_heads = num_heads\n        self.dtype = dtype\n        self.no_temporal_pos_emb = no_temporal_pos_emb\n        self.depth = depth\n        self.mlp_ratio = mlp_ratio\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_layernorm_kernel = enable_layernorm_kernel\n        self.space_scale = space_scale\n        self.time_scale = 
time_scale\n\n        self.x_embedder = PatchEmbed3D(patch_size, in_channels, hidden_size)\n        self.t_embedder = TimestepEmbedder(hidden_size)\n        self.t_block = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 6 * hidden_size, bias=True))\n        self.y_embedder = CaptionEmbedder(\n            in_channels=caption_channels,\n            hidden_size=hidden_size,\n            uncond_prob=class_dropout_prob,\n            act_layer=approx_gelu,\n            token_num=model_max_length,\n        )\n\n        self.register_buffer(\"pos_embed\", self.get_spatial_pos_embed())\n        self.register_buffer(\"pos_embed_temporal\", self.get_temporal_pos_embed())\n\n        drop_path = [x.item() for x in torch.linspace(0, drop_path, depth)]  # stochastic depth decay rule\n\n        self.kv_compress_config = kv_compress_config\n        if kv_compress_config is None:\n            self.kv_compress_config = {\n                \"sampling\": None,\n                \"scale_factor\": 1,\n                \"kv_compress_layer\": [],\n            }\n\n        self.blocks = nn.ModuleList(\n            [\n                PixArtBlock(\n                    hidden_size,\n                    num_heads,\n                    mlp_ratio=mlp_ratio,\n                    drop_path=drop_path[i],\n                    enable_flash_attn=enable_flash_attn,\n                    enable_layernorm_kernel=enable_layernorm_kernel,\n                    qk_norm=qk_norm,\n                    sr_ratio=(\n                        int(self.kv_compress_config[\"scale_factor\"])\n                        if i in self.kv_compress_config[\"kv_compress_layer\"]\n                        else 1\n                    ),\n                    sampling=self.kv_compress_config[\"sampling\"],\n                )\n                for i in range(depth)\n            ]\n        )\n        self.final_layer = T2IFinalLayer(hidden_size, np.prod(self.patch_size), self.out_channels)\n\n        self.initialize_weights()\n        if 
freeze is not None:\n            assert freeze in [\"text\"]\n            if freeze == \"text\":\n                self.freeze_text()\n\n    def forward(self, x, timestep, y, mask=None):\n        \"\"\"\n        Forward pass of PixArt.\n        x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)\n        t: (N,) tensor of diffusion timesteps\n        y: (N, 1, 120, C) tensor of class labels\n        \"\"\"\n        x = x.to(self.dtype)\n        timestep = timestep.to(self.dtype)\n        y = y.to(self.dtype)\n        pos_embed = self.get_spatial_pos_embed((x.shape[-2], x.shape[-1])).to(x.dtype)\n        hw = (x.shape[-2] // self.patch_size[-2], x.shape[-1] // self.patch_size[-1])\n\n        # embedding\n        x = self.x_embedder(x)  # (B, N, D)\n        x = rearrange(x, \"b (t s) d -> b t s d\", t=self.num_temporal, s=self.num_spatial)\n        x = x + pos_embed.to(x.device)\n        if not self.no_temporal_pos_emb:\n            x = rearrange(x, \"b t s d -> b s t d\")\n            x = x + self.pos_embed_temporal\n            x = rearrange(x, \"b s t d -> b (t s) d\")\n        else:\n            x = rearrange(x, \"b t s d -> b (t s) d\")\n\n        t = self.t_embedder(timestep, dtype=x.dtype)  # (N, D)\n        t0 = self.t_block(t)\n        y = self.y_embedder(y, self.training)  # (N, 1, L, D)\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, x.shape[-1])\n\n        # blocks\n        for block in self.blocks:\n            x = auto_grad_checkpoint(block, x, y, t0, hw, y_lens)\n\n        # final process\n        x = self.final_layer(x, t)  # (N, T, 
patch_size ** 2 * out_channels)\n        x = self.unpatchify(x)  # (N, out_channels, H, W)\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n    def unpatchify(self, x):\n        c = self.out_channels\n        t, h, w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        pt, ph, pw = self.patch_size\n\n        x = x.reshape(shape=(x.shape[0], t, h, w, pt, ph, pw, c))\n        x = rearrange(x, \"n t h w r p q c -> n c t r h p w q\")\n        imgs = x.reshape(shape=(x.shape[0], c, t * pt, h * ph, w * pw))\n        return imgs\n\n    def get_spatial_pos_embed(self, grid_size=None):\n        if grid_size is None:\n            grid_size = self.input_size[1:]\n        pos_embed = get_2d_sincos_pos_embed(\n            self.hidden_size,\n            (grid_size[0] // self.patch_size[1], grid_size[1] // self.patch_size[2]),\n            scale=self.space_scale,\n            base_size=self.base_size,\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def get_temporal_pos_embed(self):\n        pos_embed = get_1d_sincos_pos_embed(\n            self.hidden_size,\n            self.input_size[0] // self.patch_size[0],\n            scale=self.time_scale,\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def freeze_text(self):\n        for n, p in self.named_parameters():\n            if \"cross_attn\" in n:\n                p.requires_grad = False\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                torch.nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.constant_(module.bias, 0)\n\n        self.apply(_basic_init)\n\n        # Initialize patch_embed like 
nn.Linear (instead of nn.Conv2d):\n        w = self.x_embedder.proj.weight.data\n        nn.init.xavier_uniform_(w.view([w.shape[0], -1]))\n\n        # Initialize timestep embedding MLP:\n        nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)\n        nn.init.normal_(self.t_block[1].weight, std=0.02)\n\n        # Initialize caption embedding MLP:\n        nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)\n        nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)\n\n        # Zero-out cross-attention output projections in each block:\n        for block in self.blocks:\n            nn.init.constant_(block.cross_attn.proj.weight, 0)\n            nn.init.constant_(block.cross_attn.proj.bias, 0)\n\n        # Zero-out output layers:\n        nn.init.constant_(self.final_layer.linear.weight, 0)\n        nn.init.constant_(self.final_layer.linear.bias, 0)\n\n\n@MODELS.register_module(\"PixArt-Sigma-XL/2\")\ndef PixArt_Sigma_XL_2(from_pretrained=None, **kwargs):\n    model = PixArt_Sigma(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/stdit/__init__.py",
    "content": "from .stdit import STDiT\nfrom .stdit2 import STDiT2\nfrom .stdit3 import STDiT3\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/stdit/stdit.py",
    "content": "import numpy as np\nimport torch\nimport torch.distributed as dist\nimport torch.nn as nn\nfrom einops import rearrange\nfrom timm.models.layers import DropPath\nfrom timm.models.vision_transformer import Mlp\n\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.acceleration.communications import gather_forward_split_backward, split_forward_gather_backward\nfrom opensora.acceleration.parallel_states import get_sequence_parallel_group\nfrom opensora.models.layers.blocks import (\n    Attention,\n    CaptionEmbedder,\n    MultiHeadCrossAttention,\n    PatchEmbed3D,\n    SeqParallelAttention,\n    SeqParallelMultiHeadCrossAttention,\n    T2IFinalLayer,\n    TimestepEmbedder,\n    approx_gelu,\n    get_1d_sincos_pos_embed,\n    get_2d_sincos_pos_embed,\n    get_layernorm,\n    t2i_modulate,\n)\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\n\nclass STDiTBlock(nn.Module):\n    def __init__(\n        self,\n        hidden_size,\n        num_heads,\n        d_s=None,\n        d_t=None,\n        mlp_ratio=4.0,\n        drop_path=0.0,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n    ):\n        super().__init__()\n        self.hidden_size = hidden_size\n        self.enable_flash_attn = enable_flash_attn\n        self._enable_sequence_parallelism = enable_sequence_parallelism\n\n        if enable_sequence_parallelism:\n            self.attn_cls = SeqParallelAttention\n            self.mha_cls = SeqParallelMultiHeadCrossAttention\n        else:\n            self.attn_cls = Attention\n            self.mha_cls = MultiHeadCrossAttention\n\n        self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.attn = self.attn_cls(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            
enable_flash_attn=enable_flash_attn,\n        )\n        self.cross_attn = self.mha_cls(hidden_size, num_heads)\n        self.norm2 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.mlp = Mlp(\n            in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0\n        )\n        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()\n        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size**0.5)\n\n        # temporal attention\n        self.d_s = d_s\n        self.d_t = d_t\n\n        if self._enable_sequence_parallelism:\n            sp_size = dist.get_world_size(get_sequence_parallel_group())\n            # make sure d_t is divisible by sp_size\n            assert d_t % sp_size == 0\n            self.d_t = d_t // sp_size\n\n        self.attn_temp = self.attn_cls(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            enable_flash_attn=self.enable_flash_attn,\n        )\n\n    def t_mask_select(self, x, masked_x, x_mask):\n        # x: [B, (T, S), C]\n        # masked_x: [B, (T, S), C]\n        # x_mask: [B, T]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=self.d_t, S=self.d_s)\n        masked_x = rearrange(masked_x, \"B (T S) C -> B T S C\", T=self.d_t, S=self.d_s)\n        x = torch.where(x_mask[:, :, None, None], x, masked_x)\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n        return x\n\n    def forward(self, x, y, t, mask=None, tpe=None, x_mask=None, t0=None):\n        B, N, C = x.shape\n\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (\n            self.scale_shift_table[None] + t.reshape(B, 6, -1)\n        ).chunk(6, dim=1)\n        x_m = t2i_modulate(self.norm1(x), shift_msa, scale_msa)\n        if x_mask is not None:\n            shift_msa_zero, scale_msa_zero, gate_msa_zero, shift_mlp_zero, scale_mlp_zero, 
gate_mlp_zero = (\n                self.scale_shift_table[None] + t0.reshape(B, 6, -1)\n            ).chunk(6, dim=1)\n            x_m_zero = t2i_modulate(self.norm1(x), shift_msa_zero, scale_msa_zero)\n            x_m = self.t_mask_select(x_m, x_m_zero, x_mask)\n\n        # spatial branch\n        x_s = rearrange(x_m, \"B (T S) C -> (B T) S C\", T=self.d_t, S=self.d_s)\n        x_s = self.attn(x_s)\n        x_s = rearrange(x_s, \"(B T) S C -> B (T S) C\", T=self.d_t, S=self.d_s)\n\n        if x_mask is not None:\n            x_s_zero = gate_msa_zero * x_s\n            x_s = gate_msa * x_s\n            x_s = self.t_mask_select(x_s, x_s_zero, x_mask)\n        else:\n            x_s = gate_msa * x_s\n\n        x = x + self.drop_path(x_s)\n\n        # temporal branch\n        x_t = rearrange(x, \"B (T S) C -> (B S) T C\", T=self.d_t, S=self.d_s)\n        if tpe is not None:\n            x_t = x_t + tpe\n        x_t = self.attn_temp(x_t)\n        x_t = rearrange(x_t, \"(B S) T C -> B (T S) C\", T=self.d_t, S=self.d_s)\n        x = x + self.drop_path(gate_msa * x_t)\n\n        # cross attn\n        x = x + self.cross_attn(x, y, mask)\n\n        # mlp\n        x_m = t2i_modulate(self.norm2(x), shift_mlp, scale_mlp)\n        if x_mask is not None:\n            x_m_zero = t2i_modulate(self.norm2(x), shift_mlp_zero, scale_mlp_zero)\n            x_m = self.t_mask_select(x_m, x_m_zero, x_mask)\n\n        x_mlp = self.mlp(x_m)\n        if x_mask is not None:\n            x_mlp_zero = gate_mlp_zero * x_mlp\n            x_mlp = gate_mlp * x_mlp\n            x_mlp = self.t_mask_select(x_mlp, x_mlp_zero, x_mask)\n        else:\n            x_mlp = gate_mlp * x_mlp\n\n        x = x + self.drop_path(x_mlp)\n\n        return x\n\n\n@MODELS.register_module()\nclass STDiT(nn.Module):\n    def __init__(\n        self,\n        input_size=(1, 32, 32),\n        in_channels=4,\n        patch_size=(1, 2, 2),\n        hidden_size=1152,\n        depth=28,\n        num_heads=16,\n        
mlp_ratio=4.0,\n        class_dropout_prob=0.1,\n        pred_sigma=True,\n        drop_path=0.0,\n        no_temporal_pos_emb=False,\n        caption_channels=4096,\n        model_max_length=120,\n        dtype=torch.float32,\n        space_scale=1.0,\n        time_scale=1.0,\n        freeze=None,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n    ):\n        super().__init__()\n        self.pred_sigma = pred_sigma\n        self.in_channels = in_channels\n        self.out_channels = in_channels * 2 if pred_sigma else in_channels\n        self.hidden_size = hidden_size\n        self.patch_size = patch_size\n        self.input_size = input_size\n        num_patches = np.prod([input_size[i] // patch_size[i] for i in range(3)])\n        self.num_patches = num_patches\n        self.num_temporal = input_size[0] // patch_size[0]\n        self.num_spatial = num_patches // self.num_temporal\n        self.num_heads = num_heads\n        self.dtype = dtype\n        self.no_temporal_pos_emb = no_temporal_pos_emb\n        self.depth = depth\n        self.mlp_ratio = mlp_ratio\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_layernorm_kernel = enable_layernorm_kernel\n        self.space_scale = space_scale\n        self.time_scale = time_scale\n\n        self.register_buffer(\"pos_embed\", self.get_spatial_pos_embed())\n        self.register_buffer(\"pos_embed_temporal\", self.get_temporal_pos_embed())\n\n        self.x_embedder = PatchEmbed3D(patch_size, in_channels, hidden_size)\n        self.t_embedder = TimestepEmbedder(hidden_size)\n        self.t_block = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 6 * hidden_size, bias=True))\n        self.y_embedder = CaptionEmbedder(\n            in_channels=caption_channels,\n            hidden_size=hidden_size,\n            uncond_prob=class_dropout_prob,\n            act_layer=approx_gelu,\n            token_num=model_max_length,\n      
  )\n\n        drop_path = [x.item() for x in torch.linspace(0, drop_path, depth)]\n        self.blocks = nn.ModuleList(\n            [\n                STDiTBlock(\n                    self.hidden_size,\n                    self.num_heads,\n                    mlp_ratio=self.mlp_ratio,\n                    drop_path=drop_path[i],\n                    enable_flash_attn=self.enable_flash_attn,\n                    enable_layernorm_kernel=self.enable_layernorm_kernel,\n                    enable_sequence_parallelism=enable_sequence_parallelism,\n                    d_t=self.num_temporal,\n                    d_s=self.num_spatial,\n                )\n                for i in range(self.depth)\n            ]\n        )\n        self.final_layer = T2IFinalLayer(\n            hidden_size,\n            np.prod(self.patch_size),\n            self.out_channels,\n            d_t=self.num_temporal,\n            d_s=self.num_spatial,\n        )\n\n        # init model\n        self.initialize_weights()\n        self.initialize_temporal()\n        if freeze is not None:\n            assert freeze in [\"not_temporal\", \"text\"]\n            if freeze == \"not_temporal\":\n                self.freeze_not_temporal()\n            elif freeze == \"text\":\n                self.freeze_text()\n\n        # sequence parallel related configs\n        self.enable_sequence_parallelism = enable_sequence_parallelism\n        if enable_sequence_parallelism:\n            self.sp_rank = dist.get_rank(get_sequence_parallel_group())\n        else:\n            self.sp_rank = None\n\n    def forward(self, x, timestep, y, mask=None, x_mask=None, **kwargs):\n        \"\"\"\n        Forward pass of STDiT.\n        Args:\n            x (torch.Tensor): latent representation of video; of shape [B, C, T, H, W]\n            timestep (torch.Tensor): diffusion time steps; of shape [B]\n            y (torch.Tensor): representation of prompts; of shape [B, 1, N_token, C]\n            mask (torch.Tensor): 
mask for selecting prompt tokens; of shape [B, N_token]\n\n        Returns:\n            x (torch.Tensor): output latent representation; of shape [B, C, T, H, W]\n        \"\"\"\n        dtype = self.x_embedder.proj.weight.dtype\n        x = x.to(dtype)\n        timestep = timestep.to(dtype)\n        y = y.to(dtype)\n\n        # embedding\n        x = self.x_embedder(x)  # [B, N, C]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=self.num_temporal, S=self.num_spatial)\n        x = x + self.pos_embed\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n\n        # shard over the sequence dim if sp is enabled\n        if self.enable_sequence_parallelism:\n            x = split_forward_gather_backward(x, get_sequence_parallel_group(), dim=1, grad_scale=\"down\")\n\n        t = self.t_embedder(timestep, dtype=x.dtype)  # [B, C]\n        t_mlp = self.t_block(t)  # [B, 6*C]\n        if x_mask is not None:\n            t0_timestep = torch.zeros_like(timestep)\n            t0 = self.t_embedder(t0_timestep, dtype=x.dtype)\n            t0_mlp = self.t_block(t0)\n        else:\n            t0 = None\n            t0_mlp = None\n        y = self.y_embedder(y, self.training)  # [B, 1, N_token, C]\n\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, x.shape[-1])\n\n        # blocks\n        for i, block in enumerate(self.blocks):\n            if i == 0:\n                if self.enable_sequence_parallelism:\n                    tpe = torch.chunk(\n                        self.pos_embed_temporal, dist.get_world_size(get_sequence_parallel_group()), dim=1\n                    
)[self.sp_rank].contiguous()\n                else:\n                    tpe = self.pos_embed_temporal\n            else:\n                tpe = None\n            x = auto_grad_checkpoint(block, x, y, t_mlp, y_lens, tpe, x_mask, t0_mlp)\n\n        if self.enable_sequence_parallelism:\n            x = gather_forward_split_backward(x, get_sequence_parallel_group(), dim=1, grad_scale=\"up\")\n        # x.shape: [B, N, C]\n\n        # final process\n        x = self.final_layer(x, t, x_mask, t0)  # [B, N, C=T_p * H_p * W_p * C_out]\n        x = self.unpatchify(x)  # [B, C_out, T, H, W]\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n    def unpatchify(self, x):\n        \"\"\"\n        Args:\n            x (torch.Tensor): of shape [B, N, C]\n\n        Return:\n            x (torch.Tensor): of shape [B, C_out, T, H, W]\n        \"\"\"\n\n        N_t, N_h, N_w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        T_p, H_p, W_p = self.patch_size\n        x = rearrange(\n            x,\n            \"B (N_t N_h N_w) (T_p H_p W_p C_out) -> B C_out (N_t T_p) (N_h H_p) (N_w W_p)\",\n            N_t=N_t,\n            N_h=N_h,\n            N_w=N_w,\n            T_p=T_p,\n            H_p=H_p,\n            W_p=W_p,\n            C_out=self.out_channels,\n        )\n        return x\n\n    def unpatchify_old(self, x):\n        c = self.out_channels\n        t, h, w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        pt, ph, pw = self.patch_size\n\n        x = x.reshape(shape=(x.shape[0], t, h, w, pt, ph, pw, c))\n        x = rearrange(x, \"n t h w r p q c -> n c t r h p w q\")\n        imgs = x.reshape(shape=(x.shape[0], c, t * pt, h * ph, w * pw))\n        return imgs\n\n    def get_spatial_pos_embed(self, grid_size=None):\n        if grid_size is None:\n            grid_size = self.input_size[1:]\n        pos_embed = get_2d_sincos_pos_embed(\n            self.hidden_size,\n            
(grid_size[0] // self.patch_size[1], grid_size[1] // self.patch_size[2]),\n            scale=self.space_scale,\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def get_temporal_pos_embed(self):\n        pos_embed = get_1d_sincos_pos_embed(\n            self.hidden_size,\n            self.input_size[0] // self.patch_size[0],\n            scale=self.time_scale,\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def freeze_not_temporal(self):\n        for n, p in self.named_parameters():\n            if \"attn_temp\" not in n:\n                p.requires_grad = False\n\n    def freeze_text(self):\n        for n, p in self.named_parameters():\n            if \"cross_attn\" in n:\n                p.requires_grad = False\n\n    def initialize_temporal(self):\n        for block in self.blocks:\n            nn.init.constant_(block.attn_temp.proj.weight, 0)\n            nn.init.constant_(block.attn_temp.proj.bias, 0)\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                torch.nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.constant_(module.bias, 0)\n\n        self.apply(_basic_init)\n\n        # Initialize patch_embed like nn.Linear (instead of nn.Conv2d):\n        w = self.x_embedder.proj.weight.data\n        nn.init.xavier_uniform_(w.view([w.shape[0], -1]))\n\n        # Initialize timestep embedding MLP:\n        nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)\n        nn.init.normal_(self.t_block[1].weight, std=0.02)\n\n        # Initialize caption embedding MLP:\n        nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)\n        
nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)\n\n        # Zero-out cross-attention output projections in each block:\n        for block in self.blocks:\n            nn.init.constant_(block.cross_attn.proj.weight, 0)\n            nn.init.constant_(block.cross_attn.proj.bias, 0)\n\n        # Zero-out output layers:\n        nn.init.constant_(self.final_layer.linear.weight, 0)\n        nn.init.constant_(self.final_layer.linear.bias, 0)\n\n\n@MODELS.register_module(\"STDiT-XL/2\")\ndef STDiT_XL_2(from_pretrained=None, **kwargs):\n    model = STDiT(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/stdit/stdit2.py",
    "content": "import os\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom einops import rearrange\nfrom rotary_embedding_torch import RotaryEmbedding\nfrom timm.models.layers import DropPath\nfrom timm.models.vision_transformer import Mlp\nfrom transformers import PretrainedConfig, PreTrainedModel\n\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.models.layers.blocks import (\n    Attention,\n    CaptionEmbedder,\n    MultiHeadCrossAttention,\n    PatchEmbed3D,\n    PositionEmbedding2D,\n    SizeEmbedder,\n    T2IFinalLayer,\n    TimestepEmbedder,\n    approx_gelu,\n    get_2d_sincos_pos_embed,\n    get_layernorm,\n    t2i_modulate,\n)\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\n\nclass STDiT2Block(nn.Module):\n    def __init__(\n        self,\n        hidden_size,\n        num_heads,\n        mlp_ratio=4.0,\n        drop_path=0.0,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n        rope=None,\n        qk_norm=False,\n        qk_norm_legacy=False,\n    ):\n        super().__init__()\n        self.hidden_size = hidden_size\n        self.enable_flash_attn = enable_flash_attn\n        self._enable_sequence_parallelism = enable_sequence_parallelism\n\n        # spatial branch\n        self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.attn = Attention(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            enable_flash_attn=enable_flash_attn,\n            qk_norm=qk_norm,\n            qk_norm_legacy=qk_norm_legacy,\n        )\n        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size**0.5)\n\n        # cross attn\n        self.cross_attn = MultiHeadCrossAttention(hidden_size, num_heads)\n\n        # mlp branch\n        self.norm2 = get_layernorm(hidden_size, 
eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.mlp = Mlp(\n            in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0\n        )\n        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()\n\n        # temporal branch\n        self.norm_temp = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)  # new\n        self.attn_temp = Attention(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            enable_flash_attn=self.enable_flash_attn,\n            rope=rope,\n            qk_norm=qk_norm,\n            qk_norm_legacy=qk_norm_legacy,\n        )\n        self.scale_shift_table_temporal = nn.Parameter(torch.randn(3, hidden_size) / hidden_size**0.5)  # new\n\n    def t_mask_select(self, x_mask, x, masked_x, T, S):\n        # x: [B, (T, S), C]\n        # masked_x: [B, (T, S), C]\n        # x_mask: [B, T]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n        masked_x = rearrange(masked_x, \"B (T S) C -> B T S C\", T=T, S=S)\n        x = torch.where(x_mask[:, :, None, None], x, masked_x)\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n        return x\n\n    def forward(self, x, y, t, t_tmp, mask=None, x_mask=None, t0=None, t0_tmp=None, T=None, S=None):\n        B, N, C = x.shape\n\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (\n            self.scale_shift_table[None] + t.reshape(B, 6, -1)\n        ).chunk(6, dim=1)\n        shift_tmp, scale_tmp, gate_tmp = (self.scale_shift_table_temporal[None] + t_tmp.reshape(B, 3, -1)).chunk(\n            3, dim=1\n        )\n        if x_mask is not None:\n            shift_msa_zero, scale_msa_zero, gate_msa_zero, shift_mlp_zero, scale_mlp_zero, gate_mlp_zero = (\n                self.scale_shift_table[None] + t0.reshape(B, 6, -1)\n            ).chunk(6, dim=1)\n            shift_tmp_zero, scale_tmp_zero, 
gate_tmp_zero = (\n                self.scale_shift_table_temporal[None] + t0_tmp.reshape(B, 3, -1)\n            ).chunk(3, dim=1)\n\n        # modulate\n        x_m = t2i_modulate(self.norm1(x), shift_msa, scale_msa)\n        if x_mask is not None:\n            x_m_zero = t2i_modulate(self.norm1(x), shift_msa_zero, scale_msa_zero)\n            x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n\n        # spatial branch\n        x_s = rearrange(x_m, \"B (T S) C -> (B T) S C\", T=T, S=S)\n        x_s = self.attn(x_s)\n        x_s = rearrange(x_s, \"(B T) S C -> B (T S) C\", T=T, S=S)\n        if x_mask is not None:\n            x_s_zero = gate_msa_zero * x_s\n            x_s = gate_msa * x_s\n            x_s = self.t_mask_select(x_mask, x_s, x_s_zero, T, S)\n        else:\n            x_s = gate_msa * x_s\n        x = x + self.drop_path(x_s)\n\n        # modulate\n        x_m = t2i_modulate(self.norm_temp(x), shift_tmp, scale_tmp)\n        if x_mask is not None:\n            x_m_zero = t2i_modulate(self.norm_temp(x), shift_tmp_zero, scale_tmp_zero)\n            x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n\n        # temporal branch\n        x_t = rearrange(x_m, \"B (T S) C -> (B S) T C\", T=T, S=S)\n        x_t = self.attn_temp(x_t)\n        x_t = rearrange(x_t, \"(B S) T C -> B (T S) C\", T=T, S=S)\n        if x_mask is not None:\n            x_t_zero = gate_tmp_zero * x_t\n            x_t = gate_tmp * x_t\n            x_t = self.t_mask_select(x_mask, x_t, x_t_zero, T, S)\n        else:\n            x_t = gate_tmp * x_t\n        x = x + self.drop_path(x_t)\n\n        # cross attn\n        x = x + self.cross_attn(x, y, mask)\n\n        # modulate\n        x_m = t2i_modulate(self.norm2(x), shift_mlp, scale_mlp)\n        if x_mask is not None:\n            x_m_zero = t2i_modulate(self.norm2(x), shift_mlp_zero, scale_mlp_zero)\n            x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n\n        # mlp\n        x_mlp = self.mlp(x_m)\n        if 
x_mask is not None:\n            x_mlp_zero = gate_mlp_zero * x_mlp\n            x_mlp = gate_mlp * x_mlp\n            x_mlp = self.t_mask_select(x_mask, x_mlp, x_mlp_zero, T, S)\n        else:\n            x_mlp = gate_mlp * x_mlp\n        x = x + self.drop_path(x_mlp)\n\n        return x\n\n\nclass STDiT2Config(PretrainedConfig):\n    model_type = \"STDiT2\"\n\n    def __init__(\n        self,\n        input_size=(None, None, None),\n        input_sq_size=32,\n        in_channels=4,\n        patch_size=(1, 2, 2),\n        hidden_size=1152,\n        depth=28,\n        num_heads=16,\n        mlp_ratio=4.0,\n        class_dropout_prob=0.1,\n        pred_sigma=True,\n        drop_path=0.0,\n        no_temporal_pos_emb=False,\n        caption_channels=4096,\n        model_max_length=120,\n        freeze=None,\n        qk_norm=False,\n        qk_norm_legacy=False,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        **kwargs,\n    ):\n        self.input_size = input_size\n        self.input_sq_size = input_sq_size\n        self.in_channels = in_channels\n        self.patch_size = patch_size\n        self.hidden_size = hidden_size\n        self.depth = depth\n        self.num_heads = num_heads\n        self.mlp_ratio = mlp_ratio\n        self.class_dropout_prob = class_dropout_prob\n        self.pred_sigma = pred_sigma\n        self.drop_path = drop_path\n        self.no_temporal_pos_emb = no_temporal_pos_emb\n        self.caption_channels = caption_channels\n        self.model_max_length = model_max_length\n        self.freeze = freeze\n        self.qk_norm = qk_norm\n        self.qk_norm_legacy = qk_norm_legacy\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_layernorm_kernel = enable_layernorm_kernel\n        super().__init__(**kwargs)\n\n\n@MODELS.register_module()\nclass STDiT2(PreTrainedModel):\n    config_class = STDiT2Config\n\n    def __init__(self, config):\n        super().__init__(config)\n        
self.pred_sigma = config.pred_sigma\n        self.in_channels = config.in_channels\n        self.out_channels = config.in_channels * 2 if config.pred_sigma else config.in_channels\n        self.hidden_size = config.hidden_size\n        self.num_heads = config.num_heads\n        self.no_temporal_pos_emb = config.no_temporal_pos_emb\n        self.depth = config.depth\n        self.mlp_ratio = config.mlp_ratio\n        self.enable_flash_attn = config.enable_flash_attn\n        self.enable_layernorm_kernel = config.enable_layernorm_kernel\n\n        # support dynamic input\n        self.patch_size = config.patch_size\n        self.input_size = config.input_size\n        self.input_sq_size = config.input_sq_size\n        self.pos_embed = PositionEmbedding2D(config.hidden_size)\n\n        self.x_embedder = PatchEmbed3D(config.patch_size, config.in_channels, config.hidden_size)\n        self.t_embedder = TimestepEmbedder(config.hidden_size)\n        self.t_block = nn.Sequential(nn.SiLU(), nn.Linear(config.hidden_size, 6 * config.hidden_size, bias=True))\n        self.t_block_temp = nn.Sequential(\n            nn.SiLU(), nn.Linear(config.hidden_size, 3 * config.hidden_size, bias=True)\n        )  # new\n        self.y_embedder = CaptionEmbedder(\n            in_channels=config.caption_channels,\n            hidden_size=config.hidden_size,\n            uncond_prob=config.class_dropout_prob,\n            act_layer=approx_gelu,\n            token_num=config.model_max_length,\n        )\n\n        drop_path = [x.item() for x in torch.linspace(0, config.drop_path, config.depth)]\n        self.rope = RotaryEmbedding(dim=self.hidden_size // self.num_heads)  # new\n        self.blocks = nn.ModuleList(\n            [\n                STDiT2Block(\n                    self.hidden_size,\n                    self.num_heads,\n                    mlp_ratio=self.mlp_ratio,\n                    drop_path=drop_path[i],\n                    enable_flash_attn=self.enable_flash_attn,\n        
            enable_layernorm_kernel=self.enable_layernorm_kernel,\n                    rope=self.rope.rotate_queries_or_keys,\n                    qk_norm=config.qk_norm,\n                    qk_norm_legacy=config.qk_norm_legacy,\n                )\n                for i in range(self.depth)\n            ]\n        )\n        self.final_layer = T2IFinalLayer(config.hidden_size, np.prod(self.patch_size), self.out_channels)\n\n        # multi_res\n        assert self.hidden_size % 3 == 0, \"hidden_size must be divisible by 3\"\n        self.csize_embedder = SizeEmbedder(self.hidden_size // 3)\n        self.ar_embedder = SizeEmbedder(self.hidden_size // 3)\n        self.fl_embedder = SizeEmbedder(self.hidden_size)  # new\n        self.fps_embedder = SizeEmbedder(self.hidden_size)  # new\n\n        # init model\n        self.initialize_weights()\n        self.initialize_temporal()\n        if config.freeze is not None:\n            assert config.freeze in [\"not_temporal\", \"text\"]\n            if config.freeze == \"not_temporal\":\n                self.freeze_not_temporal()\n            elif config.freeze == \"text\":\n                self.freeze_text()\n\n    def get_dynamic_size(self, x):\n        _, _, T, H, W = x.size()\n        if T % self.patch_size[0] != 0:\n            T += self.patch_size[0] - T % self.patch_size[0]\n        if H % self.patch_size[1] != 0:\n            H += self.patch_size[1] - H % self.patch_size[1]\n        if W % self.patch_size[2] != 0:\n            W += self.patch_size[2] - W % self.patch_size[2]\n        T = T // self.patch_size[0]\n        H = H // self.patch_size[1]\n        W = W // self.patch_size[2]\n        return (T, H, W)\n\n    def forward(\n        self, x, timestep, y, mask=None, x_mask=None, num_frames=None, height=None, width=None, ar=None, fps=None\n    ):\n        \"\"\"\n        Forward pass of STDiT.\n        Args:\n            x (torch.Tensor): latent representation of video; of shape [B, C, T, H, W]\n            
timestep (torch.Tensor): diffusion time steps; of shape [B]\n            y (torch.Tensor): representation of prompts; of shape [B, 1, N_token, C]\n            mask (torch.Tensor): mask for selecting prompt tokens; of shape [B, N_token]\n\n        Returns:\n            x (torch.Tensor): output latent representation; of shape [B, C, T, H, W]\n        \"\"\"\n        B = x.shape[0]\n        dtype = self.x_embedder.proj.weight.dtype\n        x = x.to(dtype)\n        timestep = timestep.to(dtype)\n        y = y.to(dtype)\n\n        # === process data info ===\n        # 1. get dynamic size\n        hw = torch.cat([height[:, None], width[:, None]], dim=1)\n        rs = (height[0].item() * width[0].item()) ** 0.5\n        csize = self.csize_embedder(hw, B)\n\n        # 2. get aspect ratio\n        ar = ar.unsqueeze(1)\n        ar = self.ar_embedder(ar, B)\n        data_info = torch.cat([csize, ar], dim=1)\n\n        # 3. get number of frames\n        fl = num_frames.unsqueeze(1)\n        fps = fps.unsqueeze(1)\n        fl = self.fl_embedder(fl, B)\n        fl = fl + self.fps_embedder(fps, B)\n\n        # === get dynamic shape size ===\n        _, _, Tx, Hx, Wx = x.size()\n        T, H, W = self.get_dynamic_size(x)\n        S = H * W\n        scale = rs / self.input_sq_size\n        base_size = round(S**0.5)\n        pos_emb = self.pos_embed(x, H, W, scale=scale, base_size=base_size)\n\n        # embedding\n        x = self.x_embedder(x)  # [B, N, C]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n        x = x + pos_emb\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n\n        # prepare adaLN\n        t = self.t_embedder(timestep, dtype=x.dtype)  # [B, C]\n        t_spc = t + data_info  # [B, C]\n        t_tmp = t + fl  # [B, C]\n        t_spc_mlp = self.t_block(t_spc)  # [B, 6*C]\n        t_tmp_mlp = self.t_block_temp(t_tmp)  # [B, 3*C]\n        if x_mask is not None:\n            t0_timestep = torch.zeros_like(timestep)\n            t0 = 
self.t_embedder(t0_timestep, dtype=x.dtype)\n            t0_spc = t0 + data_info\n            t0_tmp = t0 + fl\n            t0_spc_mlp = self.t_block(t0_spc)\n            t0_tmp_mlp = self.t_block_temp(t0_tmp)\n        else:\n            t0_spc = None\n            t0_tmp = None\n            t0_spc_mlp = None\n            t0_tmp_mlp = None\n\n        # prepare y\n        y = self.y_embedder(y, self.training)  # [B, 1, N_token, C]\n\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, x.shape[-1])\n\n        # blocks\n        for _, block in enumerate(self.blocks):\n            x = auto_grad_checkpoint(\n                block,\n                x,\n                y,\n                t_spc_mlp,\n                t_tmp_mlp,\n                y_lens,\n                x_mask,\n                t0_spc_mlp,\n                t0_tmp_mlp,\n                T,\n                S,\n            )\n            # x.shape: [B, N, C]\n\n        # final process\n        x = self.final_layer(x, t, x_mask, t0_spc, T, S)  # [B, N, C=T_p * H_p * W_p * C_out]\n        x = self.unpatchify(x, T, H, W, Tx, Hx, Wx)  # [B, C_out, T, H, W]\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n    def unpatchify(self, x, N_t, N_h, N_w, R_t, R_h, R_w):\n        \"\"\"\n        Args:\n            x (torch.Tensor): of shape [B, N, C]\n\n        Return:\n            x (torch.Tensor): of shape [B, C_out, T, H, W]\n        \"\"\"\n\n        # N_t, N_h, N_w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        T_p, H_p, W_p = self.patch_size\n        x = 
rearrange(\n            x,\n            \"B (N_t N_h N_w) (T_p H_p W_p C_out) -> B C_out (N_t T_p) (N_h H_p) (N_w W_p)\",\n            N_t=N_t,\n            N_h=N_h,\n            N_w=N_w,\n            T_p=T_p,\n            H_p=H_p,\n            W_p=W_p,\n            C_out=self.out_channels,\n        )\n        # unpad\n        x = x[:, :, :R_t, :R_h, :R_w]\n        return x\n\n    def unpatchify_old(self, x):\n        c = self.out_channels\n        t, h, w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        pt, ph, pw = self.patch_size\n\n        x = x.reshape(shape=(x.shape[0], t, h, w, pt, ph, pw, c))\n        x = rearrange(x, \"n t h w r p q c -> n c t r h p w q\")\n        imgs = x.reshape(shape=(x.shape[0], c, t * pt, h * ph, w * pw))\n        return imgs\n\n    def get_spatial_pos_embed(self, H, W, scale=1.0, base_size=None):\n        pos_embed = get_2d_sincos_pos_embed(\n            self.hidden_size,\n            (H, W),\n            scale=scale,\n            base_size=base_size,\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def freeze_not_temporal(self):\n        for n, p in self.named_parameters():\n            if \"attn_temp\" not in n:\n                p.requires_grad = False\n\n    def freeze_text(self):\n        for n, p in self.named_parameters():\n            if \"cross_attn\" in n:\n                p.requires_grad = False\n\n    def initialize_temporal(self):\n        for block in self.blocks:\n            nn.init.constant_(block.attn_temp.proj.weight, 0)\n            nn.init.constant_(block.attn_temp.proj.bias, 0)\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                torch.nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.constant_(module.bias, 0)\n\n        
self.apply(_basic_init)\n\n        # Initialize patch_embed like nn.Linear (instead of nn.Conv2d):\n        w = self.x_embedder.proj.weight.data\n        nn.init.xavier_uniform_(w.view([w.shape[0], -1]))\n\n        # Initialize timestep embedding MLP:\n        nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)\n        nn.init.normal_(self.t_block[1].weight, std=0.02)\n        nn.init.normal_(self.t_block_temp[1].weight, std=0.02)\n\n        # Initialize caption embedding MLP:\n        nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)\n        nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)\n\n        # Zero-out the cross-attention output projection in each block:\n        for block in self.blocks:\n            nn.init.constant_(block.cross_attn.proj.weight, 0)\n            nn.init.constant_(block.cross_attn.proj.bias, 0)\n\n        # Zero-out output layers:\n        nn.init.constant_(self.final_layer.linear.weight, 0)\n        nn.init.constant_(self.final_layer.linear.bias, 0)\n\n\n@MODELS.register_module(\"STDiT2-XL/2\")\ndef STDiT2_XL_2(from_pretrained=None, **kwargs):\n    if from_pretrained is not None:\n        if os.path.isdir(from_pretrained) or os.path.isfile(from_pretrained):\n            # if it is a directory or a file, we load the checkpoint manually\n            config = STDiT2Config(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n            model = STDiT2(config)\n            load_checkpoint(model, from_pretrained)\n            return model\n        else:\n            # otherwise, we load the model from the Hugging Face Hub\n            return STDiT2.from_pretrained(from_pretrained)\n    else:\n        # create a new model\n        config = STDiT2Config(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n        model = STDiT2(config)\n    return model\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/stdit/stdit3 copy.py",
    "content": "import os\n\nimport numpy as np\nimport torch\nimport torch.distributed as dist\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom einops import rearrange\nfrom rotary_embedding_torch import RotaryEmbedding\nfrom timm.models.layers import DropPath\nfrom timm.models.vision_transformer import Mlp\nfrom transformers import PretrainedConfig, PreTrainedModel\n\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.acceleration.communications import gather_forward_split_backward, split_forward_gather_backward\nfrom opensora.acceleration.parallel_states import get_sequence_parallel_group\nfrom opensora.models.layers.blocks import (\n    Attention,\n    CaptionEmbedder,\n    MultiHeadCrossAttention,\n    PatchEmbed3D,\n    PositionEmbedding2D,\n    SeqParallelAttention,\n    SeqParallelMultiHeadCrossAttention,\n    SizeEmbedder,\n    T2IFinalLayer,\n    TimestepEmbedder,\n    approx_gelu,\n    get_layernorm,\n    t2i_modulate,\n)\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\nfrom ...models.cache_functions import global_force_fresh, cache_cutfresh, update_cache, force_init, score_evaluate\n\nclass STDiT3Block(nn.Module):\n    def __init__(\n        self,\n        hidden_size,\n        num_heads,\n        mlp_ratio=4.0,\n        drop_path=0.0,\n        rope=None,\n        qk_norm=False,\n        temporal=False,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n    ):\n        super().__init__()\n        self.temporal = temporal\n        self.hidden_size = hidden_size\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_sequence_parallelism = enable_sequence_parallelism\n\n        if self.enable_sequence_parallelism and not temporal:\n            attn_cls = SeqParallelAttention\n            mha_cls = SeqParallelMultiHeadCrossAttention\n        else:\n            attn_cls = Attention\n   
         mha_cls = MultiHeadCrossAttention\n\n        self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.attn = attn_cls(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            qk_norm=qk_norm,\n            rope=rope,\n            enable_flash_attn=enable_flash_attn,\n        )\n        self.cross_attn = mha_cls(hidden_size, num_heads)\n        self.norm2 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.mlp = Mlp(\n            in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0\n        )\n        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()\n        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size**0.5)\n\n    def t_mask_select(self, x_mask, x, masked_x, T, S):\n        # x: [B, (T, S), C]\n        # masked_x: [B, (T, S), C]\n        # x_mask: [B, T]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n        masked_x = rearrange(masked_x, \"B (T S) C -> B T S C\", T=T, S=S)\n        x = torch.where(x_mask[:, :, None, None], x, masked_x)\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n        return x\n\n    def forward(\n        self,\n        x,\n        y,\n        t,\n        current,\n        cache_dic,\n        mask=None,  # text mask\n        x_mask=None,  # temporal mask\n        t0=None,  # t with timestep=0\n        T=None,  # number of frames\n        S=None,  # number of pixel patches\n    ):\n        # prepare modulate parameters\n        B, N, C = x.shape\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (\n            self.scale_shift_table[None] + t.reshape(B, 6, -1)\n        ).chunk(6, dim=1)\n        if x_mask is not None:\n            shift_msa_zero, scale_msa_zero, gate_msa_zero, shift_mlp_zero, scale_mlp_zero, gate_mlp_zero = (\n        
        self.scale_shift_table[None] + t0.reshape(B, 6, -1)\n            ).chunk(6, dim=1)\n        #attn_tick = torch.cuda.Event(enable_timing=True)\n        #cross_attn_tick = torch.cuda.Event(enable_timing=True)\n        #end_cross_attn_tick = torch.cuda.Event(enable_timing=True)\n        #mlp_tick = torch.cuda.Event(enable_timing=True)\n        #end = torch.cuda.Event(enable_timing=True)\n        if self.temporal:\n            current['flag'] = -1\n        else:\n            current['flag'] = 0\n        is_force_fresh = global_force_fresh(cache_dic, current)\n        current['is_force_fresh'] = is_force_fresh\n        #print(is_force_fresh)\n        if is_force_fresh:\n            # modulate (attention)\n            current['module'] = 'attn'\n            #attn_tick.record()\n            x_m = t2i_modulate(self.norm1(x), shift_msa, scale_msa)\n            if x_mask is not None:\n                x_m_zero = t2i_modulate(self.norm1(x), shift_msa_zero, scale_msa_zero)\n                x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n\n            # attention\n            if self.temporal:\n                x_m = rearrange(x_m, \"B (T S) C -> (B S) T C\", T=T, S=S)\n                x_m = self.attn(x_m)\n                x_m = rearrange(x_m, \"(B S) T C -> B (T S) C\", T=T, S=S)\n            else:\n                x_m = rearrange(x_m, \"B (T S) C -> (B T) S C\", T=T, S=S)\n                x_m = self.attn(x_m)\n                x_m = rearrange(x_m, \"(B T) S C -> B (T S) C\", T=T, S=S)\n\n            cache_dic['cache'][current['flag']][current['layer']][current['module']] = x_m\n            force_init(cache_dic, current, x)\n            # modulate (attention)\n            x_m_s = gate_msa * x_m\n            if x_mask is not None:\n                x_m_s_zero = gate_msa_zero * x_m\n                x_m_s = self.t_mask_select(x_mask, x_m_s, x_m_s_zero, T, S)\n\n            # residual\n            x = x + self.drop_path(x_m_s)\n\n            # cross attention\n          
  current['module'] = 'cross-attn'\n            #cross_attn_tick.record()\n            cache_dic['cache'][current['flag']][current['layer']][current['module']], cache_dic['cross_attn_map'][current['flag']][current['layer']] = self.cross_attn(x, y, mask)\n            force_init(cache_dic, current, x)\n\n            x = x + cache_dic['cache'][current['flag']][current['layer']][current['module']]\n\n            # modulate (MLP)\n            current['module'] = 'mlp'\n            #mlp_tick.record()\n            x_m = t2i_modulate(self.norm2(x), shift_mlp, scale_mlp)\n            if x_mask is not None:\n                x_m_zero = t2i_modulate(self.norm2(x), shift_mlp_zero, scale_mlp_zero)\n                x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n\n            # MLP\n            x_m = self.mlp(x_m)\n            cache_dic['cache'][current['flag']][current['layer']][current['module']] = x_m\n            # modulate (MLP)\n            x_m_s = gate_mlp * x_m\n            if x_mask is not None:\n                x_m_s_zero = gate_mlp_zero * x_m\n                x_m_s = self.t_mask_select(x_mask, x_m_s, x_m_s_zero, T, S)\n\n            # residual\n            force_init(cache_dic, current, x)\n            x = x + self.drop_path(x_m_s)\n            #end.record()\n            #torch.cuda.synchronize()\n            #print(attn_tick.elapsed_time(cross_attn_tick),cross_attn_tick.elapsed_time(mlp_tick),mlp_tick.elapsed_time(end))\n        else:\n            # modulate (attention)\n            current['module'] = 'attn'\n            #attn_tick.record()\n            #cal_attn = current['step'] % cache_dic['cal_threshold'] == 1\n            cal_attn = True\n            if cal_attn:\n                x_m = t2i_modulate(self.norm1(x), shift_msa, scale_msa)\n                if x_mask is not None:\n                    x_m_zero = t2i_modulate(self.norm1(x), shift_msa_zero, scale_msa_zero)\n                    x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n\n               
 # attention\n                if self.temporal:\n                    x_m = rearrange(x_m, \"B (T S) C -> (B S) T C\", T=T, S=S)\n                    x_m = self.attn(x_m)\n                    x_m = rearrange(x_m, \"(B S) T C -> B (T S) C\", T=T, S=S)\n                else:\n                    x_m = rearrange(x_m, \"B (T S) C -> (B T) S C\", T=T, S=S)\n                    x_m = self.attn(x_m)\n                    x_m = rearrange(x_m, \"(B T) S C -> B (T S) C\", T=T, S=S)\n\n                cache_dic['cache'][current['flag']][current['layer']][current['module']] = x_m\n            \n            x_m = cache_dic['cache'][current['flag']][current['layer']][current['module']]\n            \n            # modulate (attention)\n            x_m_s = gate_msa * x_m\n            if x_mask is not None:\n                x_m_s_zero = gate_msa_zero * x_m\n                x_m_s = self.t_mask_select(x_mask, x_m_s, x_m_s_zero, T, S)\n\n            # residual\n            x = x + self.drop_path(x_m_s)\n\n            # cross attention\n            current['module'] = 'cross-attn'\n\n            #cache_dic['cache'][flag][current['layer']][current['module']] = self.cross_attn(x, y, mask)\n            #x = x + cache_dic['cache'][flag][current['layer']][current['module']]\n\n            fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x, current) # 0.6ms\n\n            fresh_tokens, fresh_cross_attn_map = self.cross_attn(fresh_tokens, y, mask) # 0.45ms\n            #cross_attn_tick.record()\n            update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current, fresh_attn_map=fresh_cross_attn_map) # 0.3ms\n            #cache_dic['cache'][-1][current['layer']][current['module']] = self.cross_attn(x, y, mask)\n            x = x + cache_dic['cache'][current['flag']][current['layer']][current['module']] \n\n            # modulate (MLP)\n            current['module'] = 'mlp'\n            #mlp_tick.record()\n            x_m = t2i_modulate(self.norm2(x), 
shift_mlp, scale_mlp)\n            if x_mask is not None:\n                x_m_zero = t2i_modulate(self.norm2(x), shift_mlp_zero, scale_mlp_zero)\n                x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n            # MLP\n            fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x_m, current)\n            fresh_tokens = self.mlp(fresh_tokens)\n            update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current, fresh_attn_map=fresh_cross_attn_map)\n\n            x_m = cache_dic['cache'][current['flag']][current['layer']][current['module']]\n            # modulate (MLP)\n            x_m_s = gate_mlp * x_m\n            if x_mask is not None:\n                x_m_s_zero = gate_mlp_zero * x_m\n                x_m_s = self.t_mask_select(x_mask, x_m_s, x_m_s_zero, T, S)\n\n            # residual\n            x = x + self.drop_path(x_m_s)\n            #end.record()\n            #torch.cuda.synchronize()\n            #print(\"Cached:\",attn_tick.elapsed_time(cross_attn_tick),cross_attn_tick.elapsed_time(mlp_tick),mlp_tick.elapsed_time(end))\n            #print(cross_attn_tick.elapsed_time(end_cross_attn_tick))\n        return x\n\n\nclass STDiT3Config(PretrainedConfig):\n    model_type = \"STDiT3\"\n\n    def __init__(\n        self,\n        input_size=(None, None, None),\n        input_sq_size=512,\n        in_channels=4,\n        patch_size=(1, 2, 2),\n        hidden_size=1152,\n        depth=28,\n        num_heads=16,\n        mlp_ratio=4.0,\n        class_dropout_prob=0.1,\n        pred_sigma=True,\n        drop_path=0.0,\n        caption_channels=4096,\n        model_max_length=300,\n        qk_norm=True,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n        only_train_temporal=False,\n        freeze_y_embedder=False,\n        skip_y_embedder=False,\n        **kwargs,\n    ):\n        self.input_size = input_size\n        
self.input_sq_size = input_sq_size\n        self.in_channels = in_channels\n        self.patch_size = patch_size\n        self.hidden_size = hidden_size\n        self.depth = depth\n        self.num_heads = num_heads\n        self.mlp_ratio = mlp_ratio\n        self.class_dropout_prob = class_dropout_prob\n        self.pred_sigma = pred_sigma\n        self.drop_path = drop_path\n        self.caption_channels = caption_channels\n        self.model_max_length = model_max_length\n        self.qk_norm = qk_norm\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_layernorm_kernel = enable_layernorm_kernel\n        self.enable_sequence_parallelism = enable_sequence_parallelism\n        self.only_train_temporal = only_train_temporal\n        self.freeze_y_embedder = freeze_y_embedder\n        self.skip_y_embedder = skip_y_embedder\n        super().__init__(**kwargs)\n\n\nclass STDiT3(PreTrainedModel):\n    config_class = STDiT3Config\n\n    def __init__(self, config):\n        super().__init__(config)\n        self.pred_sigma = config.pred_sigma\n        self.in_channels = config.in_channels\n        self.out_channels = config.in_channels * 2 if config.pred_sigma else config.in_channels\n\n        # model size related\n        self.depth = config.depth\n        self.mlp_ratio = config.mlp_ratio\n        self.hidden_size = config.hidden_size\n        self.num_heads = config.num_heads\n\n        # computation related\n        self.drop_path = config.drop_path\n        self.enable_flash_attn = config.enable_flash_attn\n        self.enable_layernorm_kernel = config.enable_layernorm_kernel\n        self.enable_sequence_parallelism = config.enable_sequence_parallelism\n\n        # input size related\n        self.patch_size = config.patch_size\n        self.input_sq_size = config.input_sq_size\n        self.pos_embed = PositionEmbedding2D(config.hidden_size)\n        self.rope = RotaryEmbedding(dim=self.hidden_size // self.num_heads)\n\n        # 
embedding\n        self.x_embedder = PatchEmbed3D(config.patch_size, config.in_channels, config.hidden_size)\n        self.t_embedder = TimestepEmbedder(config.hidden_size)\n        self.fps_embedder = SizeEmbedder(self.hidden_size)\n        self.t_block = nn.Sequential(\n            nn.SiLU(),\n            nn.Linear(config.hidden_size, 6 * config.hidden_size, bias=True),\n        )\n        self.y_embedder = CaptionEmbedder(\n            in_channels=config.caption_channels,\n            hidden_size=config.hidden_size,\n            uncond_prob=config.class_dropout_prob,\n            act_layer=approx_gelu,\n            token_num=config.model_max_length,\n        )\n\n        # spatial blocks\n        drop_path = [x.item() for x in torch.linspace(0, self.drop_path, config.depth)]\n        self.spatial_blocks = nn.ModuleList(\n            [\n                STDiT3Block(\n                    hidden_size=config.hidden_size,\n                    num_heads=config.num_heads,\n                    mlp_ratio=config.mlp_ratio,\n                    drop_path=drop_path[i],\n                    qk_norm=config.qk_norm,\n                    enable_flash_attn=config.enable_flash_attn,\n                    enable_layernorm_kernel=config.enable_layernorm_kernel,\n                    enable_sequence_parallelism=config.enable_sequence_parallelism,\n                )\n                for i in range(config.depth)\n            ]\n        )\n\n        # temporal blocks\n        drop_path = [x.item() for x in torch.linspace(0, self.drop_path, config.depth)]\n        self.temporal_blocks = nn.ModuleList(\n            [\n                STDiT3Block(\n                    hidden_size=config.hidden_size,\n                    num_heads=config.num_heads,\n                    mlp_ratio=config.mlp_ratio,\n                    drop_path=drop_path[i],\n                    qk_norm=config.qk_norm,\n                    enable_flash_attn=config.enable_flash_attn,\n                    
enable_layernorm_kernel=config.enable_layernorm_kernel,\n                    enable_sequence_parallelism=config.enable_sequence_parallelism,\n                    # temporal\n                    temporal=True,\n                    rope=self.rope.rotate_queries_or_keys,\n                )\n                for i in range(config.depth)\n            ]\n        )\n\n        # final layer\n        self.final_layer = T2IFinalLayer(config.hidden_size, np.prod(self.patch_size), self.out_channels)\n\n        self.initialize_weights()\n        if config.only_train_temporal:\n            for param in self.parameters():\n                param.requires_grad = False\n            for block in self.temporal_blocks:\n                for param in block.parameters():\n                    param.requires_grad = True\n\n        if config.freeze_y_embedder:\n            for param in self.y_embedder.parameters():\n                param.requires_grad = False\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                torch.nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.constant_(module.bias, 0)\n\n        self.apply(_basic_init)\n\n        # Initialize fps_embedder\n        nn.init.normal_(self.fps_embedder.mlp[0].weight, std=0.02)\n        nn.init.constant_(self.fps_embedder.mlp[0].bias, 0)\n        nn.init.constant_(self.fps_embedder.mlp[2].weight, 0)\n        nn.init.constant_(self.fps_embedder.mlp[2].bias, 0)\n\n        # Initialize temporal blocks\n        for block in self.temporal_blocks:\n            nn.init.constant_(block.attn.proj.weight, 0)\n            nn.init.constant_(block.cross_attn.proj.weight, 0)\n            nn.init.constant_(block.mlp.fc2.weight, 0)\n\n    def get_dynamic_size(self, x):\n        _, _, T, H, W = x.size()\n        if T % self.patch_size[0] != 0:\n            T += 
self.patch_size[0] - T % self.patch_size[0]\n        if H % self.patch_size[1] != 0:\n            H += self.patch_size[1] - H % self.patch_size[1]\n        if W % self.patch_size[2] != 0:\n            W += self.patch_size[2] - W % self.patch_size[2]\n        T = T // self.patch_size[0]\n        H = H // self.patch_size[1]\n        W = W // self.patch_size[2]\n        return (T, H, W)\n\n    def encode_text(self, y, mask=None):\n        y = self.y_embedder(y, self.training)  # [B, 1, N_token, C]\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, self.hidden_size)\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, self.hidden_size)\n        return y, y_lens\n\n    def forward(self, x, timestep, y, mask=None, x_mask=None, fps=None, height=None, width=None, cache_dic=None, current=None, **kwargs):\n        dtype = self.x_embedder.proj.weight.dtype\n        B = x.size(0)\n        x = x.to(dtype)\n        timestep = timestep.to(dtype)\n        y = y.to(dtype)\n\n        # === get pos embed ===\n        _, _, Tx, Hx, Wx = x.size()\n        T, H, W = self.get_dynamic_size(x)\n        cache_dic['dynamic_size'] = (B,T,H,W)\n        # adjust for sequence parallelism\n        # we need to ensure H * W is divisible by sequence parallel size\n        # for simplicity, we can adjust the height to make it divisible\n        if self.enable_sequence_parallelism:\n            sp_size = dist.get_world_size(get_sequence_parallel_group())\n            if H % sp_size != 0:\n                h_pad_size = sp_size - H % sp_size\n            else:\n                h_pad_size = 0\n\n            if h_pad_size > 0:\n                hx_pad_size = h_pad_size * self.patch_size[1]\n\n 
               # pad x along the H dimension\n                H += h_pad_size\n                x = F.pad(x, (0, 0, 0, hx_pad_size))\n\n        S = H * W\n        base_size = round(S**0.5)\n        resolution_sq = (height[0].item() * width[0].item()) ** 0.5\n        scale = resolution_sq / self.input_sq_size\n        pos_emb = self.pos_embed(x, H, W, scale=scale, base_size=base_size)\n\n        # === get timestep embed ===\n        t = self.t_embedder(timestep, dtype=x.dtype)  # [B, C]\n        fps = self.fps_embedder(fps.unsqueeze(1), B)\n        t = t + fps\n        t_mlp = self.t_block(t)\n        t0 = t0_mlp = None\n        if x_mask is not None:\n            t0_timestep = torch.zeros_like(timestep)\n            t0 = self.t_embedder(t0_timestep, dtype=x.dtype)\n            t0 = t0 + fps\n            t0_mlp = self.t_block(t0)\n\n        # === get y embed ===\n        if self.config.skip_y_embedder:\n            y_lens = mask\n            if isinstance(y_lens, torch.Tensor):\n                y_lens = y_lens.long().tolist()\n        else:\n            y, y_lens = self.encode_text(y, mask)\n\n        # === get x embed ===\n        x = self.x_embedder(x)  # [B, N, C]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n        x = x + pos_emb\n\n        # shard over the sequence dim if sp is enabled\n        if self.enable_sequence_parallelism:\n            x = split_forward_gather_backward(x, get_sequence_parallel_group(), dim=2, grad_scale=\"down\")\n            S = S // dist.get_world_size(get_sequence_parallel_group())\n\n        x = rearrange(x, \"B T S C -> B (T S) C\", T=T, S=S)\n\n        # === blocks ===\n        for i, (spatial_block, temporal_block) in enumerate(zip(self.spatial_blocks, self.temporal_blocks)):\n            current['layer'] = i\n            #x = auto_grad_checkpoint(spatial_block,  x, y, t_mlp, current, cache_dic, y_lens, x_mask, t0_mlp, T, S)\n            #x = auto_grad_checkpoint(temporal_block, x, y, t_mlp, current, cache_dic, 
y_lens, x_mask, t0_mlp, T, S)\n            x = spatial_block(x, y, t_mlp, current, cache_dic, y_lens, x_mask, t0_mlp, T, S)\n            x = temporal_block(x, y, t_mlp, current, cache_dic, y_lens, x_mask, t0_mlp, T, S)\n\n        if self.enable_sequence_parallelism:\n            x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n            x = gather_forward_split_backward(x, get_sequence_parallel_group(), dim=2, grad_scale=\"up\")\n            S = S * dist.get_world_size(get_sequence_parallel_group())\n            x = rearrange(x, \"B T S C -> B (T S) C\", T=T, S=S)\n\n        # === final layer ===\n        x = self.final_layer(x, t, x_mask, t0, T, S)\n        x = self.unpatchify(x, T, H, W, Tx, Hx, Wx)\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n    def unpatchify(self, x, N_t, N_h, N_w, R_t, R_h, R_w):\n        \"\"\"\n        Args:\n            x (torch.Tensor): of shape [B, N, C]\n\n        Return:\n            x (torch.Tensor): of shape [B, C_out, T, H, W]\n        \"\"\"\n\n        # N_t, N_h, N_w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        T_p, H_p, W_p = self.patch_size\n        x = rearrange(\n            x,\n            \"B (N_t N_h N_w) (T_p H_p W_p C_out) -> B C_out (N_t T_p) (N_h H_p) (N_w W_p)\",\n            N_t=N_t,\n            N_h=N_h,\n            N_w=N_w,\n            T_p=T_p,\n            H_p=H_p,\n            W_p=W_p,\n            C_out=self.out_channels,\n        )\n        # unpad\n        x = x[:, :, :R_t, :R_h, :R_w]\n        return x\n\n\n@MODELS.register_module(\"STDiT3-XL/2\")\ndef STDiT3_XL_2(from_pretrained=None, **kwargs):\n    force_huggingface = kwargs.pop(\"force_huggingface\", False)\n    if force_huggingface or from_pretrained is not None and not os.path.exists(from_pretrained):\n        model = STDiT3.from_pretrained(from_pretrained, **kwargs)\n    else:\n        config = STDiT3Config(depth=28, hidden_size=1152, patch_size=(1, 2, 2), 
num_heads=16, **kwargs)\n        model = STDiT3(config)\n        if from_pretrained is not None:\n            load_checkpoint(model, from_pretrained)\n    return model\n\n\n@MODELS.register_module(\"STDiT3-3B/2\")\ndef STDiT3_3B_2(from_pretrained=None, **kwargs):\n    force_huggingface = kwargs.pop(\"force_huggingface\", False)\n    if force_huggingface or from_pretrained is not None and not os.path.exists(from_pretrained):\n        model = STDiT3.from_pretrained(from_pretrained, **kwargs)\n    else:\n        config = STDiT3Config(depth=28, hidden_size=1872, patch_size=(1, 2, 2), num_heads=26, **kwargs)\n        model = STDiT3(config)\n        if from_pretrained is not None:\n            load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/stdit/stdit3.py",
    "content": "import os\n\nimport numpy as np\nimport torch\nimport torch.distributed as dist\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom einops import rearrange\nfrom rotary_embedding_torch import RotaryEmbedding\nfrom timm.models.layers import DropPath\nfrom timm.models.vision_transformer import Mlp\nfrom transformers import PretrainedConfig, PreTrainedModel\n\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.acceleration.communications import gather_forward_split_backward, split_forward_gather_backward\nfrom opensora.acceleration.parallel_states import get_sequence_parallel_group\nfrom opensora.models.layers.blocks import (\n    Attention,\n    CaptionEmbedder,\n    MultiHeadCrossAttention,\n    PatchEmbed3D,\n    PositionEmbedding2D,\n    SeqParallelAttention,\n    SeqParallelMultiHeadCrossAttention,\n    SizeEmbedder,\n    T2IFinalLayer,\n    TimestepEmbedder,\n    approx_gelu,\n    get_layernorm,\n    t2i_modulate,\n)\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\nfrom ...models.cache_functions import global_force_fresh, cache_cutfresh, update_cache, force_init, score_evaluate\n\nclass STDiT3Block(nn.Module):\n    def __init__(\n        self,\n        hidden_size,\n        num_heads,\n        mlp_ratio=4.0,\n        drop_path=0.0,\n        rope=None,\n        qk_norm=False,\n        temporal=False,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n    ):\n        super().__init__()\n        self.temporal = temporal\n        self.hidden_size = hidden_size\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_sequence_parallelism = enable_sequence_parallelism\n\n        if self.enable_sequence_parallelism and not temporal:\n            attn_cls = SeqParallelAttention\n            mha_cls = SeqParallelMultiHeadCrossAttention\n        else:\n            attn_cls = Attention\n   
         mha_cls = MultiHeadCrossAttention\n\n        self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.attn = attn_cls(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            qk_norm=qk_norm,\n            rope=rope,\n            enable_flash_attn=enable_flash_attn,\n        )\n        self.cross_attn = mha_cls(hidden_size, num_heads)\n        self.norm2 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.mlp = Mlp(\n            in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0\n        )\n        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()\n        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size**0.5)\n\n    def t_mask_select(self, x_mask, x, masked_x, T, S):\n        # x: [B, (T, S), C]\n        # masked_x: [B, (T, S), C]\n        # x_mask: [B, T]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n        masked_x = rearrange(masked_x, \"B (T S) C -> B T S C\", T=T, S=S)\n        x = torch.where(x_mask[:, :, None, None], x, masked_x)\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n        return x\n\n    def forward(\n        self,\n        x,\n        y,\n        t,\n        current,\n        cache_dic,\n        mask=None,  # text mask\n        x_mask=None,  # temporal mask\n        t0=None,  # t with timestamp=0\n        T=None,  # number of frames\n        S=None,  # number of pixel patches\n    ):\n        # prepare modulate parameters\n        B, N, C = x.shape\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (\n            self.scale_shift_table[None] + t.reshape(B, 6, -1)\n        ).chunk(6, dim=1)\n        if x_mask is not None:\n            shift_msa_zero, scale_msa_zero, gate_msa_zero, shift_mlp_zero, scale_mlp_zero, gate_mlp_zero = (\n        
        self.scale_shift_table[None] + t0.reshape(B, 6, -1)\n            ).chunk(6, dim=1)\n        #attn_tick = torch.cuda.Event(enable_timing=True)\n        #cross_attn_tick = torch.cuda.Event(enable_timing=True)\n        #end_cross_attn_tick = torch.cuda.Event(enable_timing=True)\n        #mlp_tick = torch.cuda.Event(enable_timing=True)\n        #end = torch.cuda.Event(enable_timing=True)\n        if self.temporal:\n            current['flag'] = -1\n        else:\n            current['flag'] = 0\n        is_force_fresh = global_force_fresh(cache_dic, current)\n        current['is_force_fresh'] = is_force_fresh\n        #print(is_force_fresh)\n        \n        # modulate (attention)\n        current['module'] = 'attn'\n\n        if is_force_fresh[current['module']]:\n            #attn_tick.record()\n            x_m = t2i_modulate(self.norm1(x), shift_msa, scale_msa)\n            if x_mask is not None:\n                x_m_zero = t2i_modulate(self.norm1(x), shift_msa_zero, scale_msa_zero)\n                x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n\n            # attention\n            if self.temporal:\n                x_m = rearrange(x_m, \"B (T S) C -> (B S) T C\", T=T, S=S)\n                x_m = self.attn(x_m)\n                x_m = rearrange(x_m, \"(B S) T C -> B (T S) C\", T=T, S=S)\n            else:\n                x_m = rearrange(x_m, \"B (T S) C -> (B T) S C\", T=T, S=S)\n                x_m = self.attn(x_m)\n                x_m = rearrange(x_m, \"(B T) S C -> B (T S) C\", T=T, S=S)\n\n            cache_dic['cache'][current['flag']][current['layer']][current['module']] = x_m\n            force_init(cache_dic, current, x)\n        else:            \n            x_m = cache_dic['cache'][current['flag']][current['layer']][current['module']]\n            \n        # modulate (attention)\n        x_m_s = gate_msa * x_m\n        if x_mask is not None:\n            x_m_s_zero = gate_msa_zero * x_m\n            x_m_s = self.t_mask_select(x_mask, 
x_m_s, x_m_s_zero, T, S)\n        # residual\n        x = x + self.drop_path(x_m_s)\n\n\n        # cross attention\n        current['module'] = 'cross-attn'\n\n        if is_force_fresh[current['module']]:\n            #cross_attn_tick.record()\n            cache_dic['cache'][current['flag']][current['layer']][current['module']], cache_dic['cross_attn_map'][current['flag']][current['layer']] = self.cross_attn(x, y, mask)\n            force_init(cache_dic, current, x)\n\n        else:\n            fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x, current) # 0.6ms\n            fresh_tokens, fresh_cross_attn_map = self.cross_attn(fresh_tokens, y, mask) # 0.45ms\n            #cross_attn_tick.record()\n            update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current, fresh_attn_map=fresh_cross_attn_map) # 0.3ms\n            #cache_dic['cache'][-1][current['layer']][current['module']] = self.cross_attn(x, y, mask)\n        x = x + cache_dic['cache'][current['flag']][current['layer']][current['module']]\n\n        # modulate (MLP)\n        current['module'] = 'mlp'\n\n        #mlp_tick.record()\n        x_m = t2i_modulate(self.norm2(x), shift_mlp, scale_mlp)\n        if x_mask is not None:\n            x_m_zero = t2i_modulate(self.norm2(x), shift_mlp_zero, scale_mlp_zero)\n            x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n        \n        # MLP\n        if is_force_fresh[current['module']]:\n            x_m = self.mlp(x_m)\n            cache_dic['cache'][current['flag']][current['layer']][current['module']] = x_m\n            force_init(cache_dic, current, x)\n        \n        else:\n            fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x_m, current)\n            fresh_tokens = self.mlp(fresh_tokens)\n            update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current)\n\n        # modulate (MLP)\n        x_m_s = gate_mlp * 
cache_dic['cache'][current['flag']][current['layer']][current['module']]\n\n        if x_mask is not None:\n            # use the cached MLP output here: when the cache is reused, x_m still holds the pre-MLP input\n            x_m_s_zero = gate_mlp_zero * cache_dic['cache'][current['flag']][current['layer']][current['module']]\n            x_m_s = self.t_mask_select(x_mask, x_m_s, x_m_s_zero, T, S)\n\n        # residual\n        x = x + self.drop_path(x_m_s)\n\n        #end.record()\n        #torch.cuda.synchronize()\n        #print(\"Cached:\",attn_tick.elapsed_time(cross_attn_tick),cross_attn_tick.elapsed_time(mlp_tick),mlp_tick.elapsed_time(end))\n        #print(cross_attn_tick.elapsed_time(end_cross_attn_tick))\n        return x\n\n\nclass STDiT3Config(PretrainedConfig):\n    model_type = \"STDiT3\"\n\n    def __init__(\n        self,\n        input_size=(None, None, None),\n        input_sq_size=512,\n        in_channels=4,\n        patch_size=(1, 2, 2),\n        hidden_size=1152,\n        depth=28,\n        num_heads=16,\n        mlp_ratio=4.0,\n        class_dropout_prob=0.1,\n        pred_sigma=True,\n        drop_path=0.0,\n        caption_channels=4096,\n        model_max_length=300,\n        qk_norm=True,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n        only_train_temporal=False,\n        freeze_y_embedder=False,\n        skip_y_embedder=False,\n        **kwargs,\n    ):\n        self.input_size = input_size\n        self.input_sq_size = input_sq_size\n        self.in_channels = in_channels\n        self.patch_size = patch_size\n        self.hidden_size = hidden_size\n        self.depth = depth\n        self.num_heads = num_heads\n        self.mlp_ratio = mlp_ratio\n        self.class_dropout_prob = class_dropout_prob\n        self.pred_sigma = pred_sigma\n        self.drop_path = drop_path\n        self.caption_channels = caption_channels\n        self.model_max_length = model_max_length\n        self.qk_norm = qk_norm\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_layernorm_kernel = 
enable_layernorm_kernel\n        self.enable_sequence_parallelism = enable_sequence_parallelism\n        self.only_train_temporal = only_train_temporal\n        self.freeze_y_embedder = freeze_y_embedder\n        self.skip_y_embedder = skip_y_embedder\n        super().__init__(**kwargs)\n\n\nclass STDiT3(PreTrainedModel):\n    config_class = STDiT3Config\n\n    def __init__(self, config):\n        super().__init__(config)\n        self.pred_sigma = config.pred_sigma\n        self.in_channels = config.in_channels\n        self.out_channels = config.in_channels * 2 if config.pred_sigma else config.in_channels\n\n        # model size related\n        self.depth = config.depth\n        self.mlp_ratio = config.mlp_ratio\n        self.hidden_size = config.hidden_size\n        self.num_heads = config.num_heads\n\n        # computation related\n        self.drop_path = config.drop_path\n        self.enable_flash_attn = config.enable_flash_attn\n        self.enable_layernorm_kernel = config.enable_layernorm_kernel\n        self.enable_sequence_parallelism = config.enable_sequence_parallelism\n\n        # input size related\n        self.patch_size = config.patch_size\n        self.input_sq_size = config.input_sq_size\n        self.pos_embed = PositionEmbedding2D(config.hidden_size)\n        self.rope = RotaryEmbedding(dim=self.hidden_size // self.num_heads)\n\n        # embedding\n        self.x_embedder = PatchEmbed3D(config.patch_size, config.in_channels, config.hidden_size)\n        self.t_embedder = TimestepEmbedder(config.hidden_size)\n        self.fps_embedder = SizeEmbedder(self.hidden_size)\n        self.t_block = nn.Sequential(\n            nn.SiLU(),\n            nn.Linear(config.hidden_size, 6 * config.hidden_size, bias=True),\n        )\n        self.y_embedder = CaptionEmbedder(\n            in_channels=config.caption_channels,\n            hidden_size=config.hidden_size,\n            uncond_prob=config.class_dropout_prob,\n            act_layer=approx_gelu,\n  
          token_num=config.model_max_length,\n        )\n\n        # spatial blocks\n        drop_path = [x.item() for x in torch.linspace(0, self.drop_path, config.depth)]\n        self.spatial_blocks = nn.ModuleList(\n            [\n                STDiT3Block(\n                    hidden_size=config.hidden_size,\n                    num_heads=config.num_heads,\n                    mlp_ratio=config.mlp_ratio,\n                    drop_path=drop_path[i],\n                    qk_norm=config.qk_norm,\n                    enable_flash_attn=config.enable_flash_attn,\n                    enable_layernorm_kernel=config.enable_layernorm_kernel,\n                    enable_sequence_parallelism=config.enable_sequence_parallelism,\n                )\n                for i in range(config.depth)\n            ]\n        )\n\n        # temporal blocks\n        drop_path = [x.item() for x in torch.linspace(0, self.drop_path, config.depth)]\n        self.temporal_blocks = nn.ModuleList(\n            [\n                STDiT3Block(\n                    hidden_size=config.hidden_size,\n                    num_heads=config.num_heads,\n                    mlp_ratio=config.mlp_ratio,\n                    drop_path=drop_path[i],\n                    qk_norm=config.qk_norm,\n                    enable_flash_attn=config.enable_flash_attn,\n                    enable_layernorm_kernel=config.enable_layernorm_kernel,\n                    enable_sequence_parallelism=config.enable_sequence_parallelism,\n                    # temporal\n                    temporal=True,\n                    rope=self.rope.rotate_queries_or_keys,\n                )\n                for i in range(config.depth)\n            ]\n        )\n\n        # final layer\n        self.final_layer = T2IFinalLayer(config.hidden_size, np.prod(self.patch_size), self.out_channels)\n\n        self.initialize_weights()\n        if config.only_train_temporal:\n            for param in self.parameters():\n                
param.requires_grad = False\n            for block in self.temporal_blocks:\n                for param in block.parameters():\n                    param.requires_grad = True\n\n        if config.freeze_y_embedder:\n            for param in self.y_embedder.parameters():\n                param.requires_grad = False\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                torch.nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.constant_(module.bias, 0)\n\n        self.apply(_basic_init)\n\n        # Initialize fps_embedder\n        nn.init.normal_(self.fps_embedder.mlp[0].weight, std=0.02)\n        nn.init.constant_(self.fps_embedder.mlp[0].bias, 0)\n        nn.init.constant_(self.fps_embedder.mlp[2].weight, 0)\n        nn.init.constant_(self.fps_embedder.mlp[2].bias, 0)\n\n        # Zero-initialize output projections of temporal blocks\n        for block in self.temporal_blocks:\n            nn.init.constant_(block.attn.proj.weight, 0)\n            nn.init.constant_(block.cross_attn.proj.weight, 0)\n            nn.init.constant_(block.mlp.fc2.weight, 0)\n\n    def get_dynamic_size(self, x):\n        _, _, T, H, W = x.size()\n        if T % self.patch_size[0] != 0:\n            T += self.patch_size[0] - T % self.patch_size[0]\n        if H % self.patch_size[1] != 0:\n            H += self.patch_size[1] - H % self.patch_size[1]\n        if W % self.patch_size[2] != 0:\n            W += self.patch_size[2] - W % self.patch_size[2]\n        T = T // self.patch_size[0]\n        H = H // self.patch_size[1]\n        W = W // self.patch_size[2]\n        return (T, H, W)\n\n    def encode_text(self, y, mask=None):\n        y = self.y_embedder(y, self.training)  # [B, 1, N_token, C]\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n 
           mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, self.hidden_size)\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, self.hidden_size)\n        return y, y_lens\n\n    def forward(self, x, timestep, y, mask=None, x_mask=None, fps=None, height=None, width=None, cache_dic=None, current=None, **kwargs):\n        dtype = self.x_embedder.proj.weight.dtype\n        B = x.size(0)\n        x = x.to(dtype)\n        timestep = timestep.to(dtype)\n        y = y.to(dtype)\n\n        # === get pos embed ===\n        _, _, Tx, Hx, Wx = x.size()\n        T, H, W = self.get_dynamic_size(x)\n        cache_dic['dynamic_size'] = (B,T,H,W)\n        # adjust for sequence parallelism\n        # we need to ensure H * W is divisible by sequence parallel size\n        # for simplicity, we can adjust the height to make it divisible\n        if self.enable_sequence_parallelism:\n            sp_size = dist.get_world_size(get_sequence_parallel_group())\n            if H % sp_size != 0:\n                h_pad_size = sp_size - H % sp_size\n            else:\n                h_pad_size = 0\n\n            if h_pad_size > 0:\n                hx_pad_size = h_pad_size * self.patch_size[1]\n\n                # pad x along the H dimension\n                H += h_pad_size\n                x = F.pad(x, (0, 0, 0, hx_pad_size))\n\n        S = H * W\n        base_size = round(S**0.5)\n        resolution_sq = (height[0].item() * width[0].item()) ** 0.5\n        scale = resolution_sq / self.input_sq_size\n        pos_emb = self.pos_embed(x, H, W, scale=scale, base_size=base_size)\n\n        # === get timestep embed ===\n        t = self.t_embedder(timestep, dtype=x.dtype)  # [B, C]\n        fps = self.fps_embedder(fps.unsqueeze(1), B)\n        t = t + fps\n        t_mlp = self.t_block(t)\n        t0 = t0_mlp = None\n        if x_mask 
is not None:\n            t0_timestep = torch.zeros_like(timestep)\n            t0 = self.t_embedder(t0_timestep, dtype=x.dtype)\n            t0 = t0 + fps\n            t0_mlp = self.t_block(t0)\n\n        # === get y embed ===\n        if self.config.skip_y_embedder:\n            y_lens = mask\n            if isinstance(y_lens, torch.Tensor):\n                y_lens = y_lens.long().tolist()\n        else:\n            y, y_lens = self.encode_text(y, mask)\n\n        # === get x embed ===\n        x = self.x_embedder(x)  # [B, N, C]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n        x = x + pos_emb\n\n        # shard over the sequence dim if sp is enabled\n        if self.enable_sequence_parallelism:\n            x = split_forward_gather_backward(x, get_sequence_parallel_group(), dim=2, grad_scale=\"down\")\n            S = S // dist.get_world_size(get_sequence_parallel_group())\n\n        x = rearrange(x, \"B T S C -> B (T S) C\", T=T, S=S)\n\n        # === blocks ===\n        for i, (spatial_block, temporal_block) in enumerate(zip(self.spatial_blocks, self.temporal_blocks)):\n            current['layer'] = i\n            #x = auto_grad_checkpoint(spatial_block,  x, y, t_mlp, current, cache_dic, y_lens, x_mask, t0_mlp, T, S)\n            #x = auto_grad_checkpoint(temporal_block, x, y, t_mlp, current, cache_dic, y_lens, x_mask, t0_mlp, T, S)\n            x = spatial_block(x, y, t_mlp, current, cache_dic, y_lens, x_mask, t0_mlp, T, S)\n            x = temporal_block(x, y, t_mlp, current, cache_dic, y_lens, x_mask, t0_mlp, T, S)\n\n        if self.enable_sequence_parallelism:\n            x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n            x = gather_forward_split_backward(x, get_sequence_parallel_group(), dim=2, grad_scale=\"up\")\n            S = S * dist.get_world_size(get_sequence_parallel_group())\n            x = rearrange(x, \"B T S C -> B (T S) C\", T=T, S=S)\n\n        # === final layer ===\n        x = self.final_layer(x, 
t, x_mask, t0, T, S)\n        x = self.unpatchify(x, T, H, W, Tx, Hx, Wx)\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n    def unpatchify(self, x, N_t, N_h, N_w, R_t, R_h, R_w):\n        \"\"\"\n        Args:\n            x (torch.Tensor): of shape [B, N, C]\n\n        Return:\n            x (torch.Tensor): of shape [B, C_out, T, H, W]\n        \"\"\"\n\n        # N_t, N_h, N_w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        T_p, H_p, W_p = self.patch_size\n        x = rearrange(\n            x,\n            \"B (N_t N_h N_w) (T_p H_p W_p C_out) -> B C_out (N_t T_p) (N_h H_p) (N_w W_p)\",\n            N_t=N_t,\n            N_h=N_h,\n            N_w=N_w,\n            T_p=T_p,\n            H_p=H_p,\n            W_p=W_p,\n            C_out=self.out_channels,\n        )\n        # unpad\n        x = x[:, :, :R_t, :R_h, :R_w]\n        return x\n\n\n@MODELS.register_module(\"STDiT3-XL/2\")\ndef STDiT3_XL_2(from_pretrained=None, **kwargs):\n    force_huggingface = kwargs.pop(\"force_huggingface\", False)\n    if force_huggingface or from_pretrained is not None and not os.path.exists(from_pretrained):\n        model = STDiT3.from_pretrained(from_pretrained, **kwargs)\n    else:\n        config = STDiT3Config(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n        model = STDiT3(config)\n        if from_pretrained is not None:\n            load_checkpoint(model, from_pretrained)\n    return model\n\n\n@MODELS.register_module(\"STDiT3-3B/2\")\ndef STDiT3_3B_2(from_pretrained=None, **kwargs):\n    force_huggingface = kwargs.pop(\"force_huggingface\", False)\n    if force_huggingface or from_pretrained is not None and not os.path.exists(from_pretrained):\n        model = STDiT3.from_pretrained(from_pretrained, **kwargs)\n    else:\n        config = STDiT3Config(depth=28, hidden_size=1872, patch_size=(1, 2, 2), num_heads=26, **kwargs)\n        model = STDiT3(config)\n    
    if from_pretrained is not None:\n            load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/text_encoder/__init__.py",
    "content": "from .classes import ClassEncoder\nfrom .clip import ClipEncoder\nfrom .t5 import T5Encoder\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/text_encoder/classes.py",
    "content": "import torch\n\nfrom opensora.registry import MODELS\n\n\n@MODELS.register_module(\"classes\")\nclass ClassEncoder:\n    def __init__(self, num_classes, model_max_length=None, device=\"cuda\", dtype=torch.float):\n        self.num_classes = num_classes\n        self.y_embedder = None\n\n        self.model_max_length = model_max_length\n        self.output_dim = None\n        self.device = device\n\n    def encode(self, text):\n        return dict(y=torch.tensor([int(t) for t in text]).to(self.device))\n\n    def null(self, n):\n        return torch.tensor([self.num_classes] * n).to(self.device)\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/text_encoder/clip.py",
    "content": "# Copyright 2024 Vchitect/Latte\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.# Modified from Latte\n#\n# This file is adapted from the Latte project.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# Latte: https://github.com/Vchitect/Latte\n# DiT:   https://github.com/facebookresearch/DiT/tree/main\n# --------------------------------------------------------\n\n\nimport torch\nimport torch.nn as nn\nimport transformers\nfrom transformers import CLIPTextModel, CLIPTokenizer\n\nfrom opensora.registry import MODELS\n\ntransformers.logging.set_verbosity_error()\n\n\nclass AbstractEncoder(nn.Module):\n    def __init__(self):\n        super().__init__()\n\n    def encode(self, *args, **kwargs):\n        raise NotImplementedError\n\n\nclass FrozenCLIPEmbedder(AbstractEncoder):\n    \"\"\"Uses the CLIP transformer encoder for text (from Hugging Face)\"\"\"\n\n    def __init__(self, path=\"openai/clip-vit-huge-patch14\", device=\"cuda\", max_length=77):\n        super().__init__()\n        self.tokenizer = CLIPTokenizer.from_pretrained(path)\n        self.transformer = CLIPTextModel.from_pretrained(path)\n        self.device = device\n        self.max_length = max_length\n        self._freeze()\n\n    def _freeze(self):\n        self.transformer = self.transformer.eval()\n        for param in 
self.parameters():\n            param.requires_grad = False\n\n    def forward(self, text):\n        batch_encoding = self.tokenizer(\n            text,\n            truncation=True,\n            max_length=self.max_length,\n            return_length=True,\n            return_overflowing_tokens=False,\n            padding=\"max_length\",\n            return_tensors=\"pt\",\n        )\n        tokens = batch_encoding[\"input_ids\"].to(self.device)\n        outputs = self.transformer(input_ids=tokens)\n\n        z = outputs.last_hidden_state\n        pooled_z = outputs.pooler_output\n        return z, pooled_z\n\n    def encode(self, text):\n        return self(text)\n\n\n@MODELS.register_module(\"clip\")\nclass ClipEncoder:\n    \"\"\"\n    Embeds text prompt into vector representations. Also handles text dropout for classifier-free guidance.\n    \"\"\"\n\n    def __init__(\n        self,\n        from_pretrained,\n        model_max_length=77,\n        device=\"cuda\",\n        dtype=torch.float,\n    ):\n        super().__init__()\n        assert from_pretrained is not None, \"Please specify the path to the CLIP model\"\n\n        self.text_encoder = FrozenCLIPEmbedder(path=from_pretrained, max_length=model_max_length).to(device, dtype)\n        self.y_embedder = None\n\n        self.model_max_length = model_max_length\n        self.output_dim = self.text_encoder.transformer.config.hidden_size\n\n    def encode(self, text):\n        _, pooled_embeddings = self.text_encoder.encode(text)\n        y = pooled_embeddings.unsqueeze(1).unsqueeze(1)\n        return dict(y=y)\n\n    def null(self, n):\n        null_y = self.y_embedder.y_embedding[None].repeat(n, 1, 1)[:, None]\n        return null_y\n\n    def to(self, dtype):\n        self.text_encoder = self.text_encoder.to(dtype)\n        return self\n"
  },
  {
    "path": "Open-Sora/build/lib/opensora/models/text_encoder/t5.py",
    "content": "# Adapted from PixArt\n#\n# Copyright (C) 2023  PixArt-alpha/PixArt-alpha\n#\n# This program is free software: you can redistribute it and/or modify\n# it under the terms of the GNU Affero General Public License as published\n# by the Free Software Foundation, either version 3 of the License, or\n# (at your option) any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU Affero General Public License for more details.\n#\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# PixArt: https://github.com/PixArt-alpha/PixArt-alpha\n# T5:     https://github.com/google-research/text-to-text-transfer-transformer\n# --------------------------------------------------------\n\nimport html\nimport re\n\nimport ftfy\nimport torch\nfrom transformers import AutoTokenizer, T5EncoderModel\n\nfrom opensora.registry import MODELS\n\n\nclass T5Embedder:\n    def __init__(\n        self,\n        device,\n        from_pretrained=None,\n        *,\n        cache_dir=None,\n        hf_token=None,\n        use_text_preprocessing=True,\n        t5_model_kwargs=None,\n        torch_dtype=None,\n        use_offload_folder=None,\n        model_max_length=120,\n        local_files_only=False,\n    ):\n        self.device = torch.device(device)\n        self.torch_dtype = torch_dtype or torch.bfloat16\n        self.cache_dir = cache_dir\n\n        if t5_model_kwargs is None:\n            t5_model_kwargs = {\n                \"low_cpu_mem_usage\": True,\n                \"torch_dtype\": self.torch_dtype,\n            }\n\n            if use_offload_folder is not None:\n                t5_model_kwargs[\"offload_folder\"] = use_offload_folder\n                
t5_model_kwargs[\"device_map\"] = {\n                    \"shared\": self.device,\n                    \"encoder.embed_tokens\": self.device,\n                    \"encoder.block.0\": self.device,\n                    \"encoder.block.1\": self.device,\n                    \"encoder.block.2\": self.device,\n                    \"encoder.block.3\": self.device,\n                    \"encoder.block.4\": self.device,\n                    \"encoder.block.5\": self.device,\n                    \"encoder.block.6\": self.device,\n                    \"encoder.block.7\": self.device,\n                    \"encoder.block.8\": self.device,\n                    \"encoder.block.9\": self.device,\n                    \"encoder.block.10\": self.device,\n                    \"encoder.block.11\": self.device,\n                    \"encoder.block.12\": \"disk\",\n                    \"encoder.block.13\": \"disk\",\n                    \"encoder.block.14\": \"disk\",\n                    \"encoder.block.15\": \"disk\",\n                    \"encoder.block.16\": \"disk\",\n                    \"encoder.block.17\": \"disk\",\n                    \"encoder.block.18\": \"disk\",\n                    \"encoder.block.19\": \"disk\",\n                    \"encoder.block.20\": \"disk\",\n                    \"encoder.block.21\": \"disk\",\n                    \"encoder.block.22\": \"disk\",\n                    \"encoder.block.23\": \"disk\",\n                    \"encoder.final_layer_norm\": \"disk\",\n                    \"encoder.dropout\": \"disk\",\n                }\n            else:\n                t5_model_kwargs[\"device_map\"] = {\n                    \"shared\": self.device,\n                    \"encoder\": self.device,\n                }\n\n        self.use_text_preprocessing = use_text_preprocessing\n        self.hf_token = hf_token\n\n        self.tokenizer = AutoTokenizer.from_pretrained(\n            from_pretrained,\n            cache_dir=cache_dir,\n            
local_files_only=local_files_only,\n        )\n        self.model = T5EncoderModel.from_pretrained(\n            from_pretrained,\n            cache_dir=cache_dir,\n            local_files_only=local_files_only,\n            **t5_model_kwargs,\n        ).eval()\n        self.model_max_length = model_max_length\n\n    def get_text_embeddings(self, texts):\n        text_tokens_and_mask = self.tokenizer(\n            texts,\n            max_length=self.model_max_length,\n            padding=\"max_length\",\n            truncation=True,\n            return_attention_mask=True,\n            add_special_tokens=True,\n            return_tensors=\"pt\",\n        )\n\n        input_ids = text_tokens_and_mask[\"input_ids\"].to(self.device)\n        attention_mask = text_tokens_and_mask[\"attention_mask\"].to(self.device)\n        with torch.no_grad():\n            text_encoder_embs = self.model(\n                input_ids=input_ids,\n                attention_mask=attention_mask,\n            )[\"last_hidden_state\"].detach()\n        return text_encoder_embs, attention_mask\n\n\n@MODELS.register_module(\"t5\")\nclass T5Encoder:\n    def __init__(\n        self,\n        from_pretrained=None,\n        model_max_length=120,\n        device=\"cuda\",\n        dtype=torch.float,\n        cache_dir=None,\n        shardformer=False,\n        local_files_only=False,\n    ):\n        assert from_pretrained is not None, \"Please specify the path to the T5 model\"\n\n        self.t5 = T5Embedder(\n            device=device,\n            torch_dtype=dtype,\n            from_pretrained=from_pretrained,\n            cache_dir=cache_dir,\n            model_max_length=model_max_length,\n            local_files_only=local_files_only,\n        )\n        self.t5.model.to(dtype=dtype)\n        self.y_embedder = None\n\n        self.model_max_length = model_max_length\n        self.output_dim = self.t5.model.config.d_model\n        self.dtype = dtype\n\n        if shardformer:\n            
self.shardformer_t5()\n\n    def shardformer_t5(self):\n        from colossalai.shardformer import ShardConfig, ShardFormer\n\n        from opensora.acceleration.shardformer.policy.t5_encoder import T5EncoderPolicy\n        from opensora.utils.misc import requires_grad\n\n        shard_config = ShardConfig(\n            tensor_parallel_process_group=None,\n            pipeline_stage_manager=None,\n            enable_tensor_parallelism=False,\n            enable_fused_normalization=False,\n            enable_flash_attention=False,\n            enable_jit_fused=True,\n            enable_sequence_parallelism=False,\n            enable_sequence_overlap=False,\n        )\n        shard_former = ShardFormer(shard_config=shard_config)\n        optim_model, _ = shard_former.optimize(self.t5.model, policy=T5EncoderPolicy())\n        self.t5.model = optim_model.to(self.dtype)\n\n        # ensure the weights are frozen\n        requires_grad(self.t5.model, False)\n\n    def encode(self, text):\n        caption_embs, emb_masks = self.t5.get_text_embeddings(text)\n        caption_embs = caption_embs[:, None]\n        return dict(y=caption_embs, mask=emb_masks)\n\n    def null(self, n):\n        null_y = self.y_embedder.y_embedding[None].repeat(n, 1, 1)[:, None]\n        return null_y\n\n\ndef basic_clean(text):\n    text = ftfy.fix_text(text)\n    text = html.unescape(html.unescape(text))\n    return text.strip()\n\n\nBAD_PUNCT_REGEX = re.compile(\n    r\"[\" + \"#®•©™&@·º½¾¿¡§~\" + \"\\)\" + \"\\(\" + \"\\]\" + \"\\[\" + \"\\}\" + \"\\{\" + \"\\|\" + \"\\\\\" + \"\\/\" + \"\\*\" + r\"]{1,}\"\n)  # noqa\n\n\ndef clean_caption(caption):\n    import urllib.parse as ul\n\n    from bs4 import BeautifulSoup\n\n    caption = str(caption)\n    caption = ul.unquote_plus(caption)\n    caption = caption.strip().lower()\n    caption = re.sub(\"<person>\", \"person\", caption)\n    # urls:\n    caption = re.sub(\n        
r\"\\b((?:https?:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",  # noqa\n        \"\",\n        caption,\n    )  # regex for urls\n    caption = re.sub(\n        r\"\\b((?:www:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",  # noqa\n        \"\",\n        caption,\n    )  # regex for urls\n    # html:\n    caption = BeautifulSoup(caption, features=\"html.parser\").text\n\n    # @<nickname>\n    caption = re.sub(r\"@[\\w\\d]+\\b\", \"\", caption)\n\n    # 31C0—31EF CJK Strokes\n    # 31F0—31FF Katakana Phonetic Extensions\n    # 3200—32FF Enclosed CJK Letters and Months\n    # 3300—33FF CJK Compatibility\n    # 3400—4DBF CJK Unified Ideographs Extension A\n    # 4DC0—4DFF Yijing Hexagram Symbols\n    # 4E00—9FFF CJK Unified Ideographs\n    caption = re.sub(r\"[\\u31c0-\\u31ef]+\", \"\", caption)\n    caption = re.sub(r\"[\\u31f0-\\u31ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3200-\\u32ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3300-\\u33ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3400-\\u4dbf]+\", \"\", caption)\n    caption = re.sub(r\"[\\u4dc0-\\u4dff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u4e00-\\u9fff]+\", \"\", caption)\n    #######################################################\n\n    # all types of dash --> \"-\"\n    caption = re.sub(\n        r\"[\\u002D\\u058A\\u05BE\\u1400\\u1806\\u2010-\\u2015\\u2E17\\u2E1A\\u2E3A\\u2E3B\\u2E40\\u301C\\u3030\\u30A0\\uFE31\\uFE32\\uFE58\\uFE63\\uFF0D]+\",  # noqa\n        \"-\",\n        caption,\n    )\n\n    # normalize quotes to one standard\n    caption = re.sub(r\"[`´«»“”¨]\", '\"', caption)\n    caption = re.sub(r\"[‘’]\", \"'\", caption)\n\n    # &quot;\n    caption = re.sub(r\"&quot;?\", \"\", caption)\n    # &amp\n    caption = re.sub(r\"&amp\", \"\", caption)\n\n    # ip addresses:\n    caption = re.sub(r\"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\", \" \", 
caption)\n\n    # article ids:\n    caption = re.sub(r\"\\d:\\d\\d\\s+$\", \"\", caption)\n\n    # \\n\n    caption = re.sub(r\"\\\\n\", \" \", caption)\n\n    # \"#123\"\n    caption = re.sub(r\"#\\d{1,3}\\b\", \"\", caption)\n    # \"#12345..\"\n    caption = re.sub(r\"#\\d{5,}\\b\", \"\", caption)\n    # \"123456..\"\n    caption = re.sub(r\"\\b\\d{6,}\\b\", \"\", caption)\n    # filenames:\n    caption = re.sub(r\"[\\S]+\\.(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)\", \"\", caption)\n\n    #\n    caption = re.sub(r\"[\\\"\\']{2,}\", r'\"', caption)  # \"\"\"AUSVERKAUFT\"\"\"\n    caption = re.sub(r\"[\\.]{2,}\", r\" \", caption)  # \"\"\"AUSVERKAUFT\"\"\"\n\n    caption = re.sub(BAD_PUNCT_REGEX, r\" \", caption)  # ***AUSVERKAUFT***, #AUSVERKAUFT\n    caption = re.sub(r\"\\s+\\.\\s+\", r\" \", caption)  # \" . \"\n\n    # this-is-my-cute-cat / this_is_my_cute_cat\n    regex2 = re.compile(r\"(?:\\-|\\_)\")\n    if len(re.findall(regex2, caption)) > 3:\n        caption = re.sub(regex2, \" \", caption)\n\n    caption = basic_clean(caption)\n\n    caption = re.sub(r\"\\b[a-zA-Z]{1,3}\\d{3,15}\\b\", \"\", caption)  # jc6640\n    caption = re.sub(r\"\\b[a-zA-Z]+\\d+[a-zA-Z]+\\b\", \"\", caption)  # jc6640vc\n    caption = re.sub(r\"\\b\\d+[a-zA-Z]+\\d+\\b\", \"\", caption)  # 6640vc231\n\n    caption = re.sub(r\"(worldwide\\s+)?(free\\s+)?shipping\", \"\", caption)\n    caption = re.sub(r\"(free\\s)?download(\\sfree)?\", \"\", caption)\n    caption = re.sub(r\"\\bclick\\b\\s(?:for|on)\\s\\w+\", \"\", caption)\n    caption = re.sub(r\"\\b(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)(\\simage[s]?)?\", \"\", caption)\n    caption = re.sub(r\"\\bpage\\s+\\d+\\b\", \"\", caption)\n\n    caption = re.sub(r\"\\b\\d*[a-zA-Z]+\\d+[a-zA-Z]+\\d+[a-zA-Z\\d]*\\b\", r\" \", caption)  # j2d1a2a...\n\n    caption = re.sub(r\"\\b\\d+\\.?\\d*[xх×]\\d+\\.?\\d*\\b\", \"\", caption)\n\n    caption = re.sub(r\"\\b\\s+\\:\\s+\", r\": \", caption)\n    caption = re.sub(r\"(\\D[,\\./])\\b\", 
r\"\\1 \", caption)\n    caption = re.sub(r\"\\s+\", \" \", caption)\n\n    caption.strip()\n\n    caption = re.sub(r\"^[\\\"\\']([\\w\\W]+)[\\\"\\']$\", r\"\\1\", caption)\n    caption = re.sub(r\"^[\\'\\_,\\-\\:;]\", r\"\", caption)\n    caption = re.sub(r\"[\\'\\_,\\-\\:\\-\\+]$\", r\"\", caption)\n    caption = re.sub(r\"^\\.\\S+$\", \"\", caption)\n\n    return caption.strip()\n\n\ndef text_preprocessing(text, use_text_preprocessing: bool = True):\n    if use_text_preprocessing:\n        # The exact text cleaning as was in the training stage:\n        text = clean_caption(text)\n        text = clean_caption(text)\n        return text\n    else:\n        return text.lower().strip()\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/acceleration/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/acceleration/llava/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/acceleration/llava/policies/__init__.py",
    "content": "from .llama import LlavaLlamaForCausalLMPolicy\nfrom .mistral import LlavaMistralForCausalLMPolicy\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/acceleration/llava/policies/llama.py",
    "content": "from typing import Dict, Union\n\nimport torch.nn as nn\nfrom colossalai.shardformer.layer import Linear1D_Col, Linear1D_Row\nfrom colossalai.shardformer.policies.base_policy import ModulePolicyDescription, Policy, SubModuleReplacementDescription\n\n__all__ = [\"LlavaLlamaPolicy\", \"LlavaLlamaForCausalLMPolicy\"]\n\n\nclass LlavaLlamaPolicy(Policy):\n    def config_sanity_check(self):\n        pass\n\n    def preprocess(self):\n        if self.shard_config.enable_tensor_parallelism:\n            # Resize embedding\n            self.model.config.vocab_size\n            self.shard_config.tensor_parallel_size\n\n            # if vocab_size % world_size != 0:\n            #     new_vocab_size = vocab_size + world_size - vocab_size % world_size\n            #     self.model.resize_token_embeddings(new_vocab_size)\n\n        return self.model\n\n    def module_policy(self) -> Dict[Union[str, nn.Module], ModulePolicyDescription]:\n        from transformers.models.llama.modeling_llama import LlamaDecoderLayer\n\n        policy = {}\n\n        if self.shard_config.enable_tensor_parallelism:\n            decoder_attribute_replacement = {\n                \"self_attn.hidden_size\": self.model.config.hidden_size // self.shard_config.tensor_parallel_size,\n                \"self_attn.num_heads\": self.model.config.num_attention_heads // self.shard_config.tensor_parallel_size,\n            }\n            if getattr(self.model.config, \"num_key_value_heads\", False):\n                decoder_attribute_replacement[\"self_attn.num_key_value_heads\"] = (\n                    self.model.config.num_key_value_heads // self.shard_config.tensor_parallel_size\n                )\n\n            policy[LlamaDecoderLayer] = ModulePolicyDescription(\n                attribute_replacement=decoder_attribute_replacement,\n                sub_module_replacement=[\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.q_proj\",\n           
             target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.k_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.v_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.o_proj\",\n                        target_module=Linear1D_Row,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"mlp.gate_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"mlp.up_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"mlp.down_proj\",\n                        target_module=Linear1D_Row,\n                    ),\n                ],\n            )\n\n        return policy\n\n    def postprocess(self):\n        return self.model\n\n\nclass LlavaLlamaForCausalLMPolicy(LlavaLlamaPolicy):\n    def module_policy(self):\n        from transformers import LlamaForCausalLM\n\n        policy = super().module_policy()\n        if self.shard_config.enable_tensor_parallelism:\n            # add a new item for causal lm\n            new_item = {\n                LlamaForCausalLM: ModulePolicyDescription(\n                    sub_module_replacement=[\n                        SubModuleReplacementDescription(\n                            suffix=\"lm_head\", target_module=Linear1D_Col, kwargs={\"gather_output\": True}\n                        )\n                    ],\n                )\n            }\n            policy.update(new_item)\n        
return policy\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/acceleration/llava/policies/mistral.py",
    "content": "import warnings\nfrom typing import Dict, Union\n\nimport torch.nn as nn\nfrom colossalai.shardformer.layer import Linear1D_Col, Linear1D_Row, VocabParallelEmbedding1D\nfrom colossalai.shardformer.policies.base_policy import ModulePolicyDescription, Policy, SubModuleReplacementDescription\n\n__all__ = [\"LlavaMistralPolicy\", \"LlavaMistralForCausalLMPolicy\"]\n\n\nclass LlavaMistralPolicy(Policy):\n    def config_sanity_check(self):\n        pass\n\n    def preprocess(self):\n        if self.shard_config.enable_tensor_parallelism:\n            # Resize embedding\n            vocab_size = self.model.config.vocab_size\n            world_size = self.shard_config.tensor_parallel_size\n\n            if vocab_size % world_size != 0:\n                new_vocab_size = vocab_size + world_size - vocab_size % world_size\n                self.model.resize_token_embeddings(new_vocab_size)\n\n        return self.model\n\n    def module_policy(self) -> Dict[Union[str, nn.Module], ModulePolicyDescription]:\n        from transformers.models.mistral.modeling_mistral import MistralDecoderLayer, MistralModel\n\n        policy = {}\n\n        if self.shard_config.enable_sequence_parallelism:\n            self.shard_config.enable_sequence_parallelism = False\n            warnings.warn(\n                \"Mistral doesn't support sequence parallelism now, will ignore the sequence parallelism flag.\"\n            )\n\n        if self.shard_config.enable_tensor_parallelism:\n            decoder_attribute_replacement = {\n                \"self_attn.hidden_size\": self.model.config.hidden_size // self.shard_config.tensor_parallel_size,\n                \"self_attn.num_heads\": self.model.config.num_attention_heads // self.shard_config.tensor_parallel_size,\n                \"self_attn.num_key_value_heads\": self.model.config.num_key_value_heads\n                // self.shard_config.tensor_parallel_size,\n            }\n\n            policy[MistralDecoderLayer] = 
ModulePolicyDescription(\n                attribute_replacement=decoder_attribute_replacement,\n                sub_module_replacement=[\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.q_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.k_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.v_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.o_proj\",\n                        target_module=Linear1D_Row,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"mlp.gate_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"mlp.up_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"mlp.down_proj\",\n                        target_module=Linear1D_Row,\n                    ),\n                ],\n            )\n\n            self.append_or_create_submodule_replacement(\n                description=SubModuleReplacementDescription(\n                    suffix=\"embed_tokens\",\n                    target_module=VocabParallelEmbedding1D,\n                ),\n                policy=policy,\n                target_key=MistralModel,\n            )\n\n        return policy\n\n    def postprocess(self):\n        return self.model\n\n\nclass LlavaMistralForCausalLMPolicy(LlavaMistralPolicy):\n    def module_policy(self):\n        from transformers 
import MistralForCausalLM\n\n        policy = super().module_policy()\n\n        if self.shard_config.enable_tensor_parallelism:\n            # add a new item for causal lm\n            new_item = {\n                MistralForCausalLM: ModulePolicyDescription(\n                    sub_module_replacement=[\n                        SubModuleReplacementDescription(\n                            suffix=\"lm_head\", target_module=Linear1D_Col, kwargs=dict(gather_output=True)\n                        )\n                    ]\n                )\n            }\n            policy.update(new_item)\n        return policy\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/camera_motion/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/camera_motion/camera_motion.py",
    "content": "import os\n\nimport numpy as np\nimport torch\n\nfrom .utils import load_video\nfrom .visualizer import Visualizer\n\n\ndef transform(vector):\n    x = np.mean([item[0] for item in vector])\n    y = np.mean([item[1] for item in vector])\n    return [x, y]\n\n\nclass CameraPredict:\n    def __init__(self, device, submodules_list, factor=0.25):\n        self.device = device\n        self.grid_size = 10\n        self.factor = factor\n        try:\n            self.model = torch.hub.load(submodules_list[\"repo\"], submodules_list[\"model\"]).to(self.device)\n        except:\n            # workaround for CERTIFICATE_VERIFY_FAILED (see: https://github.com/pytorch/pytorch/issues/33288#issuecomment-954160699)\n            import ssl\n\n            ssl._create_default_https_context = ssl._create_unverified_context\n            self.model = torch.hub.load(submodules_list[\"repo\"], submodules_list[\"model\"]).to(self.device)\n\n    def infer(self, video_path, save_video=False, save_dir=\"./saved_videos\"):\n        # load video\n        video = load_video(video_path, return_tensor=False)\n        # set scale\n        height, width = video.shape[1], video.shape[2]\n        self.scale = min(height, width)\n        video = torch.from_numpy(video).permute(0, 3, 1, 2)[None].float().to(self.device)  # B T C H W\n        pred_tracks, pred_visibility = self.model(video, grid_size=self.grid_size)  # B T N 2,  B T N 1\n\n        if save_video:\n            video_name = os.path.basename(video_path)[:-4]\n            vis = Visualizer(save_dir=save_dir, pad_value=120, linewidth=3)\n            vis.visualize(video, pred_tracks, pred_visibility, filename=video_name)\n\n        return pred_tracks[0].long().detach().cpu().numpy()\n\n    def transform_class(self, vector, min_reso):  # 768*0.05\n        scale = min_reso * self.factor\n        x, y = vector\n        direction = []\n        if x > scale:\n            direction.append(\"right\")\n        elif x < -scale:\n         
   direction.append(\"left\")\n\n        if y > scale:\n            direction.append(\"down\")\n        elif y < -scale:\n            direction.append(\"up\")\n\n        return direction if direction else [\"static\"]\n\n    def get_edge_point(self, track):\n        middle = self.grid_size // 2\n        top = [list(track[0, i, :]) for i in range(middle - 2, middle + 2)]\n        down = [list(track[self.grid_size - 1, i, :]) for i in range(middle - 2, middle + 2)]\n        left = [list(track[i, 0, :]) for i in range(middle - 2, middle + 2)]\n        right = [list(track[i, self.grid_size - 1, :]) for i in range(middle - 2, middle + 2)]\n\n        return top, down, left, right\n\n    def get_edge_direction(self, track1, track2):\n        edge_points1 = self.get_edge_point(track1)\n        edge_points2 = self.get_edge_point(track2)\n\n        vector_results = []\n        for points1, points2 in zip(edge_points1, edge_points2):\n            vectors = [[end[0] - start[0], end[1] - start[1]] for start, end in zip(points1, points2)]\n            vector_results.append(vectors)\n        vector_results = list(map(transform, vector_results))\n        class_results = [self.transform_class(vector, min_reso=self.scale) for vector in vector_results]\n\n        return class_results\n\n    def classify_top_down(self, top, down):\n        results = []\n        classes = [f\"{item_t}_{item_d}\" for item_t in top for item_d in down]\n\n        results_mapping = {\n            \"left_left\": \"pan_right\",\n            \"right_right\": \"pan_left\",\n            \"down_down\": \"tilt_up\",\n            \"up_up\": \"tilt_down\",\n            \"up_down\": \"zoom_in\",\n            \"down_up\": \"zoom_out\",\n            \"static_static\": \"static\",\n        }\n        results = [results_mapping.get(cls) for cls in classes if cls in results_mapping]\n        return results if results else [\"None\"]\n\n    def classify_left_right(self, left, right):\n        results = []\n        classes 
= [f\"{item_l}_{item_r}\" for item_l in left for item_r in right]\n        results_mapping = {\n            \"left_left\": \"pan_right\",\n            \"right_right\": \"pan_left\",\n            \"down_down\": \"tilt_up\",\n            \"up_up\": \"tilt_down\",\n            \"left_right\": \"zoom_in\",\n            \"right_left\": \"zoom_out\",\n            \"static_static\": \"static\",\n        }\n        results = [results_mapping.get(cls) for cls in classes if cls in results_mapping]\n        return results if results else [\"None\"]\n\n    def camera_classify(self, track1, track2):\n        top, down, left, right = self.get_edge_direction(track1, track2)\n\n        top_results = self.classify_top_down(top, down)\n        left_results = self.classify_left_right(left, right)\n\n        results = list(set(top_results + left_results))\n        if \"None\" in results and len(results) > 1:\n            results.remove(\"None\")\n        if \"static\" in results and len(results) > 1:\n            results.remove(\"static\")\n        if len(results) == 1 and results[0] == \"None\":  # Tom added this to deal with edge cases\n            results = [\"Undetermined\"]\n        return results\n\n    def predict(self, video_path):\n        pred_track = self.infer(video_path)\n        track1 = pred_track[0].reshape((self.grid_size, self.grid_size, 2))\n        track2 = pred_track[-1].reshape((self.grid_size, self.grid_size, 2))\n        results = self.camera_classify(track1, track2)\n        return results\n\n\ndef compute_camera_motion(device, submodules_dict, video_paths, factor):\n    camera = CameraPredict(device, submodules_dict, factor)\n    # predict_results = camera.predict(video_path)\n    # return predict_results\n    all_predictions = []\n    for video_path in video_paths:\n        camera_motion_types = camera.predict(video_path)\n        all_predictions.append(\"+\".join(camera_motion_types))\n    return all_predictions\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/camera_motion/detect.py",
    "content": "# Originally developed by https://github.com/Vchitect/VBench based on https://github.com/facebookresearch/co-tracker.\n\nimport argparse\nfrom typing import List\n\nimport pandas as pd\n\nfrom .camera_motion import compute_camera_motion\n\n\ndef process(paths: List[str], threshold: float) -> List[str]:\n    device = \"cuda\"\n    submodules = {\"repo\": \"facebookresearch/co-tracker\", \"model\": \"cotracker2\"}\n    camera_motion_types = compute_camera_motion(device, submodules, paths, factor=threshold)\n    return camera_motion_types\n\n\ndef main(args):\n    output_file = args.input.replace(\".csv\", \"_cmotion.csv\")\n    data = pd.read_csv(args.input)\n    data[\"cmotion\"] = process(data[\"path\"], args.threshold)\n    data.to_csv(output_file, index=False)\n    print(f\"Output saved to {output_file}\")\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str)\n    parser.add_argument(\"--threshold\", type=float, default=0.25)\n    args = parser.parse_args()\n    main(args)\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/camera_motion/utils.py",
    "content": "import numpy as np\nimport torch\nfrom decord import VideoReader\nfrom PIL import Image, ImageSequence\n\n\ndef get_frame_indices(num_frames, vlen, sample=\"rand\", fix_start=None, input_fps=1, max_num_frames=-1):\n    if sample in [\"rand\", \"middle\"]:  # uniform sampling\n        acc_samples = min(num_frames, vlen)\n        # split the video into `acc_samples` intervals, and sample from each interval.\n        intervals = np.linspace(start=0, stop=vlen, num=acc_samples + 1).astype(int)\n        ranges = []\n        for idx, interv in enumerate(intervals[:-1]):\n            ranges.append((interv, intervals[idx + 1] - 1))\n        if sample == \"rand\":\n            try:\n                frame_indices = [random.choice(range(x[0], x[1])) for x in ranges]\n            except:\n                frame_indices = np.random.permutation(vlen)[:acc_samples]\n                frame_indices.sort()\n                frame_indices = list(frame_indices)\n        elif fix_start is not None:\n            frame_indices = [x[0] + fix_start for x in ranges]\n        elif sample == \"middle\":\n            frame_indices = [(x[0] + x[1]) // 2 for x in ranges]\n        else:\n            raise NotImplementedError\n\n        if len(frame_indices) < num_frames:  # padded with last frame\n            padded_frame_indices = [frame_indices[-1]] * num_frames\n            padded_frame_indices[: len(frame_indices)] = frame_indices\n            frame_indices = padded_frame_indices\n    elif \"fps\" in sample:  # fps0.5, sequentially sample frames at 0.5 fps\n        output_fps = float(sample[3:])\n        duration = float(vlen) / input_fps\n        delta = 1 / output_fps  # gap between frames, this is also the clip length each frame represents\n        frame_seconds = np.arange(0 + delta / 2, duration + delta / 2, delta)\n        frame_indices = np.around(frame_seconds * input_fps).astype(int)\n        frame_indices = [e for e in frame_indices if e < vlen]\n        if 
max_num_frames > 0 and len(frame_indices) > max_num_frames:\n            frame_indices = frame_indices[:max_num_frames]\n            # frame_indices = np.linspace(0 + delta / 2, duration + delta / 2, endpoint=False, num=max_num_frames)\n    else:\n        raise ValueError\n    return frame_indices\n\n\ndef load_video(video_path, data_transform=None, num_frames=None, return_tensor=True, width=None, height=None):\n    \"\"\"\n    Load a video from a given path and apply optional data transformations.\n\n    The function supports loading video in GIF (.gif), PNG (.png), and MP4 (.mp4) formats.\n    Depending on the format, it processes and extracts frames accordingly.\n\n    Parameters:\n    - video_path (str): The file path to the video or image to be loaded.\n    - data_transform (callable, optional): A function that applies transformations to the video data.\n\n    Returns:\n    - frames (torch.Tensor): A tensor containing the video frames with shape (T, C, H, W),\n      where T is the number of frames, C is the number of channels, H is the height, and W is the width.\n\n    Raises:\n    - NotImplementedError: If the video format is not supported.\n\n    The function first determines the format of the video file by its extension.\n    For GIFs, it iterates over each frame and converts them to RGB.\n    For PNGs, it reads the single frame, converts it to RGB.\n    For MP4s, it reads the frames using the VideoReader class and converts them to NumPy arrays.\n    If a data_transform is provided, it is applied to the buffer before converting it to a tensor.\n    Finally, the tensor is permuted to match the expected (T, C, H, W) format.\n    \"\"\"\n    if video_path.endswith(\".gif\"):\n        frame_ls = []\n        img = Image.open(video_path)\n        for frame in ImageSequence.Iterator(img):\n            frame = frame.convert(\"RGB\")\n            frame = np.array(frame).astype(np.uint8)\n            frame_ls.append(frame)\n        buffer = 
np.array(frame_ls).astype(np.uint8)\n    elif video_path.endswith(\".png\"):\n        frame = Image.open(video_path)\n        frame = frame.convert(\"RGB\")\n        frame = np.array(frame).astype(np.uint8)\n        frame_ls = [frame]\n        buffer = np.array(frame_ls)\n    elif video_path.endswith(\".mp4\"):\n        import decord\n\n        decord.bridge.set_bridge(\"native\")\n        if width:\n            video_reader = VideoReader(video_path, width=width, height=height, num_threads=1)\n        else:\n            video_reader = VideoReader(video_path, num_threads=1)\n        frames = video_reader.get_batch(range(len(video_reader)))  # (T, H, W, C), torch.uint8\n\n        buffer = frames.asnumpy().astype(np.uint8)\n    else:\n        raise NotImplementedError\n\n    frames = buffer\n    if num_frames:\n        frame_indices = get_frame_indices(num_frames, len(frames), sample=\"middle\")\n        frames = frames[frame_indices]\n\n    if data_transform:\n        frames = data_transform(frames)\n    elif return_tensor:\n        frames = torch.Tensor(frames)\n        frames = frames.permute(0, 3, 1, 2)  # (T, C, H, W), torch.uint8\n\n    return frames\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/camera_motion/visualizer.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the license found in the cotracker github repo. https://github.com/facebookresearch/co-tracker.\nimport os\n\nimport imageio\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nimport torchvision.transforms as transforms\nfrom matplotlib import cm\nfrom PIL import Image, ImageDraw\n\n\ndef read_video_from_path(path):\n    try:\n        reader = imageio.get_reader(path)\n    except Exception as e:\n        print(\"Error opening video file: \", e)\n        return None\n    frames = []\n    for i, im in enumerate(reader):\n        frames.append(np.array(im))\n    return np.stack(frames)\n\n\ndef draw_circle(rgb, coord, radius, color=(255, 0, 0), visible=True):\n    # Create a draw object\n    draw = ImageDraw.Draw(rgb)\n    # Calculate the bounding box of the circle\n    left_up_point = (coord[0] - radius, coord[1] - radius)\n    right_down_point = (coord[0] + radius, coord[1] + radius)\n    # Draw the circle\n    draw.ellipse(\n        [left_up_point, right_down_point],\n        fill=tuple(color) if visible else None,\n        outline=tuple(color),\n    )\n    return rgb\n\n\ndef draw_line(rgb, coord_y, coord_x, color, linewidth):\n    draw = ImageDraw.Draw(rgb)\n    draw.line(\n        (coord_y[0], coord_y[1], coord_x[0], coord_x[1]),\n        fill=tuple(color),\n        width=linewidth,\n    )\n    return rgb\n\n\ndef add_weighted(rgb, alpha, original, beta, gamma):\n    return (rgb * alpha + original * beta + gamma).astype(\"uint8\")\n\n\nclass Visualizer:\n    def __init__(\n        self,\n        save_dir: str = \"./results\",\n        grayscale: bool = False,\n        pad_value: int = 0,\n        fps: int = 10,\n        mode: str = \"rainbow\",  # 'cool', 'optical_flow'\n        linewidth: int = 2,\n        show_first_frame: int = 10,\n        tracks_leave_trace: int = 0,  # -1 for 
infinite\n    ):\n        self.mode = mode\n        self.save_dir = save_dir\n        if mode == \"rainbow\":\n            self.color_map = cm.get_cmap(\"gist_rainbow\")\n        elif mode == \"cool\":\n            self.color_map = cm.get_cmap(mode)\n        self.show_first_frame = show_first_frame\n        self.grayscale = grayscale\n        self.tracks_leave_trace = tracks_leave_trace\n        self.pad_value = pad_value\n        self.linewidth = linewidth\n        self.fps = fps\n\n    def visualize(\n        self,\n        video: torch.Tensor,  # (B,T,C,H,W)\n        tracks: torch.Tensor,  # (B,T,N,2)\n        visibility: torch.Tensor = None,  # (B, T, N, 1) bool\n        gt_tracks: torch.Tensor = None,  # (B,T,N,2)\n        segm_mask: torch.Tensor = None,  # (B,1,H,W)\n        filename: str = \"video\",\n        writer=None,  # tensorboard Summary Writer, used for visualization during training\n        step: int = 0,\n        query_frame: int = 0,\n        save_video: bool = True,\n        compensate_for_camera_motion: bool = False,\n    ):\n        if compensate_for_camera_motion:\n            assert segm_mask is not None\n        if segm_mask is not None:\n            coords = tracks[0, query_frame].round().long()\n            segm_mask = segm_mask[0, query_frame][coords[:, 1], coords[:, 0]].long()\n\n        video = F.pad(\n            video,\n            (self.pad_value, self.pad_value, self.pad_value, self.pad_value),\n            \"constant\",\n            255,\n        )\n        print(\"video shape after pad is: \", video.shape)\n        tracks = tracks + self.pad_value\n\n        print(tracks)\n        print(\"tracks shape after pad is: \", tracks.shape)\n\n        if self.grayscale:\n            transform = transforms.Grayscale()\n            video = transform(video)\n            video = video.repeat(1, 1, 3, 1, 1)\n\n        res_video = self.draw_tracks_on_video(\n            video=video,\n            tracks=tracks,\n            
visibility=visibility,\n            segm_mask=segm_mask,\n            gt_tracks=gt_tracks,\n            query_frame=query_frame,\n            compensate_for_camera_motion=compensate_for_camera_motion,\n        )\n        if save_video:\n            self.save_video(res_video, filename=filename, writer=writer, step=step)\n        return res_video\n\n    def save_video(self, video, filename, writer=None, step=0):\n        if writer is not None:\n            writer.add_video(\n                filename,\n                video.to(torch.uint8),\n                global_step=step,\n                fps=self.fps,\n            )\n        else:\n            os.makedirs(self.save_dir, exist_ok=True)\n            wide_list = list(video.unbind(1))\n            wide_list = [wide[0].permute(1, 2, 0).cpu().numpy() for wide in wide_list]\n\n            # Prepare the video file path\n            save_path = os.path.join(self.save_dir, f\"{filename}.mp4\")\n\n            # Create a writer object\n            video_writer = imageio.get_writer(save_path, fps=self.fps)\n\n            # Write frames to the video file\n            for frame in wide_list[2:-1]:\n                video_writer.append_data(frame)\n\n            video_writer.close()\n\n            print(f\"Video saved to {save_path}\")\n\n    def draw_tracks_on_video(\n        self,\n        video: torch.Tensor,\n        tracks: torch.Tensor,\n        visibility: torch.Tensor = None,\n        segm_mask: torch.Tensor = None,\n        gt_tracks=None,\n        query_frame: int = 0,\n        compensate_for_camera_motion=False,\n    ):\n        B, T, C, H, W = video.shape\n        _, _, N, D = tracks.shape\n\n        assert D == 2\n        assert C == 3\n        video = video[0].permute(0, 2, 3, 1).byte().detach().cpu().numpy()  # S, H, W, C\n        tracks = tracks[0].long().detach().cpu().numpy()  # S, N, 2\n        if gt_tracks is not None:\n            gt_tracks = gt_tracks[0].detach().cpu().numpy()\n\n        res_video = []\n\n    
    # process input video\n        for rgb in video:\n            res_video.append(rgb.copy())\n        vector_colors = np.zeros((T, N, 3))\n\n        if self.mode == \"optical_flow\":\n            import flow_vis\n\n            vector_colors = flow_vis.flow_to_color(tracks - tracks[query_frame][None])\n        elif segm_mask is None:\n            if self.mode == \"rainbow\":\n                y_min, y_max = (\n                    tracks[query_frame, :, 1].min(),\n                    tracks[query_frame, :, 1].max(),\n                )\n                norm = plt.Normalize(y_min, y_max)\n                for n in range(N):\n                    color = self.color_map(norm(tracks[query_frame, n, 1]))\n                    color = np.array(color[:3])[None] * 255\n                    vector_colors[:, n] = np.repeat(color, T, axis=0)\n            else:\n                # color changes with time\n                for t in range(T):\n                    color = np.array(self.color_map(t / T)[:3])[None] * 255\n                    vector_colors[t] = np.repeat(color, N, axis=0)\n        else:\n            if self.mode == \"rainbow\":\n                vector_colors[:, segm_mask <= 0, :] = 255\n\n                y_min, y_max = (\n                    tracks[0, segm_mask > 0, 1].min(),\n                    tracks[0, segm_mask > 0, 1].max(),\n                )\n                norm = plt.Normalize(y_min, y_max)\n                for n in range(N):\n                    if segm_mask[n] > 0:\n                        color = self.color_map(norm(tracks[0, n, 1]))\n                        color = np.array(color[:3])[None] * 255\n                        vector_colors[:, n] = np.repeat(color, T, axis=0)\n\n            else:\n                # color changes with segm class\n                segm_mask = segm_mask.cpu()\n                color = np.zeros((segm_mask.shape[0], 3), dtype=np.float32)\n                color[segm_mask > 0] = np.array(self.color_map(1.0)[:3]) * 255.0\n                
color[segm_mask <= 0] = np.array(self.color_map(0.0)[:3]) * 255.0\n                vector_colors = np.repeat(color[None], T, axis=0)\n\n        #  draw tracks\n        if self.tracks_leave_trace != 0:\n            for t in range(query_frame + 1, T):\n                first_ind = max(0, t - self.tracks_leave_trace) if self.tracks_leave_trace >= 0 else 0\n                curr_tracks = tracks[first_ind : t + 1]\n                curr_colors = vector_colors[first_ind : t + 1]\n                if compensate_for_camera_motion:\n                    diff = (tracks[first_ind : t + 1, segm_mask <= 0] - tracks[t : t + 1, segm_mask <= 0]).mean(1)[\n                        :, None\n                    ]\n\n                    curr_tracks = curr_tracks - diff\n                    curr_tracks = curr_tracks[:, segm_mask > 0]\n                    curr_colors = curr_colors[:, segm_mask > 0]\n\n                res_video[t] = self._draw_pred_tracks(\n                    res_video[t],\n                    curr_tracks,\n                    curr_colors,\n                )\n                if gt_tracks is not None:\n                    res_video[t] = self._draw_gt_tracks(res_video[t], gt_tracks[first_ind : t + 1])\n\n        #  draw points\n        for t in range(query_frame, T):\n            img = Image.fromarray(np.uint8(res_video[t]))\n            for i in range(N):\n                coord = (tracks[t, i, 0], tracks[t, i, 1])\n                visibile = True\n                if visibility is not None:\n                    visibile = visibility[0, t, i]\n                if coord[0] != 0 and coord[1] != 0:\n                    if not compensate_for_camera_motion or (compensate_for_camera_motion and segm_mask[i] > 0):\n                        img = draw_circle(\n                            img,\n                            coord=coord,\n                            radius=int(self.linewidth * 2),\n                            color=vector_colors[t, i].astype(int),\n                            
visible=visibile,\n                        )\n            res_video[t] = np.array(img)\n\n        #  construct the final rgb sequence\n        if self.show_first_frame > 0:\n            res_video = [res_video[0]] * self.show_first_frame + res_video[1:]\n        return torch.from_numpy(np.stack(res_video)).permute(0, 3, 1, 2)[None].byte()\n\n    def _draw_pred_tracks(\n        self,\n        rgb: np.ndarray,  # H x W x 3\n        tracks: np.ndarray,  # T x N x 2\n        vector_colors: np.ndarray,\n        alpha: float = 0.5,\n    ):\n        T, N, _ = tracks.shape\n        rgb = Image.fromarray(np.uint8(rgb))\n        for s in range(T - 1):\n            vector_color = vector_colors[s]\n            original = rgb.copy()\n            alpha = (s / T) ** 2\n            for i in range(N):\n                coord_y = (int(tracks[s, i, 0]), int(tracks[s, i, 1]))\n                coord_x = (int(tracks[s + 1, i, 0]), int(tracks[s + 1, i, 1]))\n                if coord_y[0] != 0 and coord_y[1] != 0:\n                    rgb = draw_line(\n                        rgb,\n                        coord_y,\n                        coord_x,\n                        vector_color[i].astype(int),\n                        self.linewidth,\n                    )\n            if self.tracks_leave_trace > 0:\n                rgb = Image.fromarray(np.uint8(add_weighted(np.array(rgb), alpha, np.array(original), 1 - alpha, 0)))\n        rgb = np.array(rgb)\n        return rgb\n\n    def _draw_gt_tracks(\n        self,\n        rgb: np.ndarray,  # H x W x 3,\n        gt_tracks: np.ndarray,  # T x N x 2\n    ):\n        T, N, _ = gt_tracks.shape\n        # keep a handle on the full array: the loop below rebinds gt_tracks to a single point\n        all_gt_tracks = gt_tracks\n        color = np.array((211, 0, 0))\n        rgb = Image.fromarray(np.uint8(rgb))\n        for t in range(T):\n            for i in range(N):\n                gt_tracks = all_gt_tracks[t][i]\n                #  draw a red cross\n                if gt_tracks[0] > 0 and gt_tracks[1] > 0:\n                    length = self.linewidth * 3\n                    coord_y = 
(int(gt_tracks[0]) + length, int(gt_tracks[1]) + length)\n                    coord_x = (int(gt_tracks[0]) - length, int(gt_tracks[1]) - length)\n                    rgb = draw_line(\n                        rgb,\n                        coord_y,\n                        coord_x,\n                        color,\n                        self.linewidth,\n                    )\n                    coord_y = (int(gt_tracks[0]) - length, int(gt_tracks[1]) + length)\n                    coord_x = (int(gt_tracks[0]) + length, int(gt_tracks[1]) - length)\n                    rgb = draw_line(\n                        rgb,\n                        coord_y,\n                        coord_x,\n                        color,\n                        self.linewidth,\n                    )\n        rgb = np.array(rgb)\n        return rgb\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/camera_motion_detect.py",
    "content": "# ref: https://github.com/antiboredom/camera-motion-detector\n\nimport argparse\n\nimport cv2\nimport numpy as np\nimport pandas as pd\nfrom tqdm import tqdm\n\ntqdm.pandas()\n\n\ndef apply(df, func, **kwargs):\n    if pandas_has_parallel:\n        return df.parallel_apply(func, **kwargs)\n    return df.progress_apply(func, **kwargs)\n\n\ntry:\n    from pandarallel import pandarallel\n\n    pandarallel.initialize(progress_bar=True)\n    pandas_has_parallel = True\nexcept ImportError:\n    pandas_has_parallel = False\n\n\ndef make_empty(new_w, new_h):\n    empty = []\n    for y in range(new_h):\n        xvals = []\n        for x in range(new_w):\n            xvals.append([x, y])\n        empty.append(xvals)\n\n    empty = np.array(empty)\n    return empty\n\n\ndef get_type(mag, ang, zoom_in, tau_static=1.0, tau_zoom=(0.4, 0.6)):\n    if mag < tau_static:\n        return \"static\"\n    if zoom_in < tau_zoom[0]:\n        return \"zoom out\"\n    if zoom_in > tau_zoom[1]:\n        return \"zoom in\"\n    if ang < 45 or ang >= 315:\n        return \"pan left\"\n    if 45 <= ang < 135:\n        return \"tilt up\"\n    if 135 <= ang < 225:\n        return \"pan right\"\n    if 225 <= ang < 315:\n        return \"tilt down\"\n    return \"unknown\"\n\n\ndef get_video_type(frame_types):\n    # count the number of each type\n    counts = {}\n    max_count = 0\n    max_type = None\n    for frame_type in frame_types:\n        if frame_type not in counts:\n            counts[frame_type] = 0\n        counts[frame_type] += 1\n        if counts[frame_type] > max_count:\n            max_count = counts[frame_type]\n            max_type = frame_type\n    if max_count > len(frame_types) / 2:\n        return max_type\n    if \"static\" in counts:\n        return \"unknown\"\n    if \"zoom in\" not in counts and \"zoom out\" not in counts:\n        return \"pan/tilt\"\n    return \"dynamic\"\n\n\ndef process(path: str, frame_interval=15) -> str:\n    cap = 
cv2.VideoCapture(path)\n    count = 0\n    prvs = None\n    frame_types = []\n    while cap.isOpened():\n        ret, frame = cap.read()\n        if ret:\n            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)\n            if count == 0:\n                prvs = frame\n                h, w = frame.shape\n                empty = make_empty(w, h)\n                empty_dists = np.sqrt(\n                    np.square(empty.ravel()[::2] - (w / 2)) + np.square(empty.ravel()[1::2] - (h / 2))\n                )\n            else:\n                flow = cv2.calcOpticalFlowFarneback(prvs, frame, None, 0.5, 3, 15, 3, 5, 1.2, 0)\n                mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1], angleInDegrees=True)\n                mean_mag = np.median(mag)\n                mean_ang = np.median(ang)\n\n                flow_coords = flow + empty\n                xvals = flow_coords.ravel()[::2] - (w / 2)\n                yvals = flow_coords.ravel()[1::2] - (h / 2)\n                dists = np.sqrt(np.square(xvals) + np.square(yvals))\n                dist_diff = dists >= empty_dists\n                zoom_in_factor = np.count_nonzero(dist_diff) / len(dist_diff)\n                frame_types.append(get_type(mean_mag, mean_ang, zoom_in_factor))\n            count += frame_interval\n            cap.set(cv2.CAP_PROP_POS_FRAMES, count)\n        else:\n            cap.release()\n            break\n    video_type = get_video_type(frame_types)\n    return video_type\n\n\ndef main(args):\n    output_file = args.input.replace(\".csv\", \"_cmotion.csv\")\n    data = pd.read_csv(args.input)\n    data[\"cmotion\"] = apply(data[\"path\"], process)\n    data.to_csv(output_file, index=False)\n    print(f\"Output saved to {output_file}\")\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str)\n    parser.add_argument(\"--disable-parallel\", action=\"store_true\")\n    args = parser.parse_args()\n    if 
args.disable_parallel:\n        pandas_has_parallel = False\n    main(args)\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/caption_gpt4.py",
    "content": "import argparse\nimport base64\nimport csv\nimport os\nfrom io import BytesIO\n\nimport requests\nimport tqdm\n\nfrom .utils import IMG_EXTENSIONS, PROMPTS, VID_EXTENSIONS, VideoTextDataset\n\n\ndef to_base64(image):\n    buffer = BytesIO()\n    image.save(buffer, format=\"JPEG\")\n    return base64.b64encode(buffer.getvalue()).decode(\"utf-8\")\n\n\ndef get_caption(frame, prompt, api_key):\n    headers = {\"Content-Type\": \"application/json\", \"Authorization\": f\"Bearer {api_key}\"}\n    payload = {\n        \"model\": \"gpt-4-vision-preview\",\n        \"messages\": [\n            {\n                \"role\": \"user\",\n                \"content\": [\n                    {\n                        \"type\": \"text\",\n                        \"text\": prompt,\n                    },\n                    {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image/jpeg;base64,{frame[0]}\"}},\n                    {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image/jpeg;base64,{frame[1]}\"}},\n                    {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image/jpeg;base64,{frame[2]}\"}},\n                ],\n            }\n        ],\n        \"max_tokens\": 300,\n    }\n    response = requests.post(\"https://api.openai.com/v1/chat/completions\", headers=headers, json=payload, timeout=60)\n    caption = response.json()[\"choices\"][0][\"message\"][\"content\"]\n    caption = caption.replace(\"\\n\", \" \")\n    return caption\n\n\ndef main(args):\n    # ======================================================\n    # 1. 
read video list\n    # ======================================================\n    dataset = VideoTextDataset(args.input)\n    output_file = os.path.splitext(args.input)[0] + \"_caption.csv\"\n    f = open(output_file, \"w\")\n    writer = csv.writer(f)\n    writer.writerow([\"video\", \"text\"])\n\n    # make sure that the prompt type matches the data type\n    data_extension = \".\" + dataset.data[\"path\"].iloc[0].split(\".\")[-1]\n    prompt_type = PROMPTS[args.prompt][\"type\"]\n    if prompt_type == \"image\":\n        assert (\n            data_extension.lower() in IMG_EXTENSIONS\n        ), \"The prompt is suitable for an image dataset but the data is not image.\"\n    elif prompt_type == \"video\":\n        assert (\n            data_extension.lower() in VID_EXTENSIONS\n        ), \"The prompt is suitable for a video dataset but the data is not video.\"\n    else:\n        raise ValueError(f\"Found invalid prompt type {prompt_type}\")\n\n    # ======================================================\n    # 2. generate captions\n    # ======================================================\n    for sample in tqdm.tqdm(dataset):\n        prompt = PROMPTS[args.prompt][\"text\"]\n        if \"text\" in args.prompt:\n            prompt = prompt.format(sample[\"text\"])\n        frames = sample[\"image\"]\n        frames = [to_base64(frame) for frame in frames]\n        caption = get_caption(frames, prompt, args.key)\n\n        writer.writerow((sample[\"path\"], caption))\n    f.close()\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str, help=\"Path to the input CSV file\")\n    parser.add_argument(\"--prompt\", type=str, default=\"video-f3-detail-3ex\")\n    parser.add_argument(\"--key\", type=str)\n    args = parser.parse_args()\n\n    main(args)\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/caption_llama3.py",
    "content": "import argparse\nimport csv\nimport os\nimport warnings\nfrom datetime import timedelta\n\nimport pandas as pd\nimport torch\nimport torch.distributed as dist\nfrom torch.utils.data import Dataset\nfrom tqdm import tqdm\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nfrom .utils import read_file\n\nos.system(f\"cp {__file__} ~/backup/\")  # optionally backup the script\nwarnings.filterwarnings(\"ignore\")\nos.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\nfrom torch.distributed.elastic.multiprocessing.errors import record\n\n\nclass CSVTextDataset(Dataset):\n    def __init__(self, csv_path):\n        self.df = pd.read_csv(csv_path)\n        # assert text is in the columns\n        assert \"text\" in self.df.columns, \"text column not found in the csv file\"\n\n    def __len__(self):\n        return len(self.df)\n\n    def __getitem__(self, idx):\n        if idx < 0 or idx >= len(self.df):\n            raise IndexError\n        return self.df.iloc[idx]\n\n    def set_rank_and_world_size(self, rank, world_size):\n        self.rank = rank\n        self.world_size = world_size\n        self.data_per_gpu = len(self) // world_size\n        self.start_index = rank * self.data_per_gpu\n        self.end_index = (rank + 1) * self.data_per_gpu if rank != world_size - 1 else len(self)\n        self.df = self.df.iloc[self.start_index : self.end_index]\n\n    def write_to_csv(self, output_file, data, new_key):\n        \"\"\"write the part of the df to a csv file corresponding to the rank and write self.data_list as a new column\"\"\"\n        writer = csv.writer(open(output_file, \"w\"))\n        columns = self.df.columns + [new_key]\n        writer.writerow(columns)\n        for index, row in self.df.iterrows():\n            if index < self.start_index or index >= self.end_index:\n                continue\n            writer.writerow([*row, data[index - self.start_index]])\n        writer.close()\n\n\ndef pad_left(sequences, 
padding_value=0):\n    # Determine the maximum length of the sequences\n    max_len = max([s.size(0) for s in sequences])\n    # Create a list to hold the padded sequences\n    padded_sequences = []\n    for sequence in sequences:\n        # Calculate the number of padding elements needed for this sequence\n        num_padding = max_len - sequence.size(0)\n        # Create a tensor of padding values\n        padding = torch.full((num_padding,), padding_value, dtype=sequence.dtype).to(sequence.device)\n        # Concatenate the padding and the sequence to pad on the left\n        padded_sequence = torch.cat([padding, sequence], dim=0)\n        padded_sequences.append(padded_sequence)\n    # Stack the padded sequences into a batch\n    batch = torch.stack(padded_sequences)\n    return batch\n\n\n@record\ndef main(args):\n    # ======================================================\n    # 1. init environment\n    # ======================================================\n    dist.init_process_group(backend=\"nccl\", timeout=timedelta(hours=24))\n    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())\n\n    # ======================================================\n    # 2. 
Prepare the rank-wise dataloader\n    # ======================================================\n    dataframe = read_file(args.input)\n    print(\"read data from {}\".format(args.input))\n    dataset = CSVTextDataset(args.input)\n    dataset.set_rank_and_world_size(dist.get_rank(), dist.get_world_size())\n\n    if os.getenv(\"DEBUG_ADDRESS\") is not None and dist.get_rank() == 2:\n        import ptvsd\n\n        print(\"waiting for debugger attachment\")\n        ptvsd.enable_attach(address=(\"localhost\", int(os.getenv(\"DEBUG_ADDRESS\"))), redirect_output=True)\n        ptvsd.wait_for_attach()\n\n    output_file = args.output_prefix + f\"_rank{dist.get_rank()}\" + f\"_{args.key}.csv\"\n    output_file_handle = open(output_file, \"w\", newline=\"\")\n    writer = csv.writer(output_file_handle)\n    # the header is the input columns plus one new column named after args.key\n    columns = list(dataframe.columns) + [args.key]\n\n    writer.writerow(columns)\n\n    print(\"the data processed on this rank will be saved to {}\".format(output_file))\n\n    def collate_fn(batch):\n        return batch\n\n    dataloader = torch.utils.data.DataLoader(\n        dataset,\n        # num_workers=2,\n        batch_size=args.batch_size,\n        collate_fn=collate_fn,\n        shuffle=False,\n    )\n\n    # ======================================================\n    # 3. 
process using llama3 and prompt\n    # ======================================================\n\n    print(\"Using model with the id {}\".format(args.model_id))\n    model_id = args.model_id\n    tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side=\"left\")\n    model = AutoModelForCausalLM.from_pretrained(\n        model_id,\n        torch_dtype=torch.bfloat16,\n        device_map=dist.get_rank() % torch.cuda.device_count(),\n    )\n    # .to(dist.get_rank() % torch.cuda.device_count())\n    dist.barrier()\n    print(\"======== Process data using LLAMA3 ========\")\n\n    def extract_batch(texts, prompt):\n        input_ids_list = [\n            tokenizer.apply_chat_template(\n                [{\"role\": \"system\", \"content\": prompt}, {\"role\": \"user\", \"content\": text}],\n                add_generation_prompt=True,\n                return_tensors=\"pt\",\n            ).to(model.device)[0]\n            for text in texts\n        ]\n\n        attention_mask_list = [\n            torch.ones(input_ids.shape, dtype=torch.long, device=model.device) for input_ids in input_ids_list\n        ]\n\n        # input_ids_batch = pad_left(\n        #     input_ids_list, padding_value=tokenizer.eos_token_id\n        # )\n\n        input_ids_batch = torch.nn.utils.rnn.pad_sequence(\n            input_ids_list, batch_first=True, padding_value=tokenizer.eos_token_id\n        )\n\n        attention_mask_batch = torch.nn.utils.rnn.pad_sequence(attention_mask_list, batch_first=True, padding_value=0)\n\n        # attention_mask_batch = pad_left(\n        #     attention_mask_list, padding_value=0\n        # )\n\n        terminators = [\n            tokenizer.eos_token_id,\n            tokenizer.convert_tokens_to_ids(\"<|eot_id|>\"),\n        ]\n        outputs = model.generate(\n            input_ids_batch,\n            max_new_tokens=512,\n            attention_mask=attention_mask_batch,\n            pad_token_id=tokenizer.eos_token_id,\n            
eos_token_id=terminators,\n            # do_sample=True,\n            # temperature=0.6,\n            # top_p=0.9,\n        )\n\n        responses = []\n        for i in range(len(texts)):\n            # drop the prompt tokens; any right-padding tokens left in the slice are\n            # special tokens and are removed by skip_special_tokens below\n            response = outputs[i][input_ids_list[i].shape[-1] :]\n            response = tokenizer.decode(response, skip_special_tokens=True)\n            responses.append(response)\n\n        return responses\n\n    print(\"Starting processing...\")\n    if args.prompt == \"\" and args.key == \"objects\":\n        prompt = (\n            \"You are an AI assistant to extract objects from user's text. \"\n            \"For example: user: 'In this video a dog is running around. In addition, a person is laughing at the dog.', you produce a list of objects separated by ',' and wrapped by '[' and ']': '[dog, person]' \"\n        )\n    elif args.prompt == \"\" and args.key == \"actions\":\n        prompt = (\n            \"You are an AI assistant to extract actions from user's text. \"\n            \"For example: user: 'In this video a dog is running around. 
In addition, a person is laughing at the dog.', you produce a list of actions separated by ',' and wrapped by '[' and ']': '[run, laugh]' \"\n        )\n    else:\n        prompt = args.prompt\n\n    print(\"Prompt: {}\".format(prompt))\n\n    for batch in tqdm(dataloader):\n        # get the text column from the batch\n        texts = [row[\"text\"] for row in batch]\n        list_keywords = extract_batch(texts, prompt)\n\n        for idx, keywords in enumerate(list_keywords):\n            try:\n                keywords_start = keywords.find(\"[\")\n                keywords_end = keywords.find(\"]\")\n                keywords = keywords[keywords_start + 1 : keywords_end]\n                if (\n                    \"\\n\" in keywords or len(keywords.strip()) == 0\n                ):  # empirically, the model emits newlines when no keywords are found\n                    keywords = \"NONE_FOUND\"\n            except Exception:\n                keywords = \"NONE_FOUND\"\n            row = batch[idx]\n            writer.writerow([*row, keywords])\n\n    output_file_handle.close()\n    dist.barrier()\n\n    if dist.get_rank() == 0:\n        collated_file = args.output_prefix + f\"_{args.key}.csv\"\n        print(\"All ranks are finished. 
Collating the processed data to {}\".format(collated_file))\n\n        csv_files = [args.output_prefix + f\"_rank{i}\" + f\"_{args.key}.csv\" for i in range(dist.get_world_size())]\n        # List to hold DataFrames\n        dataframes = []\n        # Read each CSV into a DataFrame and append to list\n        for file in csv_files:\n            df = pd.read_csv(file)\n            # replace NaN values in the ``key`` column with \"NONE_FOUND\"\n            df[args.key] = df[args.key].fillna(\"NONE_FOUND\")\n            dataframes.append(df)\n        # Concatenate all DataFrames\n        combined_df = pd.concat(dataframes, ignore_index=True)\n\n        # Save the combined DataFrame to a new CSV file\n        combined_df.to_csv(collated_file, index=False)\n        print(\"Collated data saved to {}\".format(collated_file))\n    # terminate distributed env\n    dist.destroy_process_group()\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"--model-id\", default=\"meta-llama/Meta-Llama-3-8B-Instruct\")\n    parser.add_argument(\"input\", type=str, help=\"Path to the input CSV file\")\n    parser.add_argument(\"--output_prefix\", type=str, help=\"Prefix for the per-rank and collated output CSV files\")\n    parser.add_argument(\"--prompt\", type=str, default=\"\")\n    parser.add_argument(\"--batch_size\", type=int, default=32)\n    parser.add_argument(\n        \"--key\", type=str, help=\"Name of the new output column; 'objects' and 'actions' select a built-in prompt\"\n    )\n    args = parser.parse_args()\n\n    main(args)\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/caption_llava.py",
    "content": "import argparse\nimport csv\nimport time\nimport warnings\nfrom datetime import timedelta\n\nimport torch\nimport torch.distributed as dist\nfrom colossalai.cluster import DistCoordinator, ProcessGroupMesh\nfrom colossalai.shardformer import ShardConfig, ShardFormer\nfrom colossalai.utils import get_current_device, set_seed\nfrom llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX\nfrom llava.conversation import conv_templates\nfrom llava.mm_utils import get_model_name_from_path, process_images, tokenizer_image_token\nfrom llava.model.builder import load_pretrained_model\nfrom llava.utils import disable_torch_init\nfrom torch.utils.data.distributed import DistributedSampler\nfrom tqdm import tqdm\n\nfrom ..datasets.utils import IMG_EXTENSIONS, VID_EXTENSIONS\nfrom .acceleration.llava.policies import LlavaLlamaForCausalLMPolicy, LlavaMistralForCausalLMPolicy\nfrom .utils import PROMPTS, Timer, VideoTextDataset, collate_fn\n\ndisable_torch_init()\n\n\nclass NoPaddingDistributedSampler(DistributedSampler):\n    def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True, seed=0, drop_last=False):\n        super().__init__(\n            dataset=dataset, num_replicas=num_replicas, rank=rank, seed=seed, shuffle=False, drop_last=False\n        )\n        remainder = len(self.dataset) % self.num_replicas\n        if remainder > 0 and (self.rank + 1) - remainder <= 0:\n            # if the dataset is not divisible by num_replicas\n            # the remaining items will be allocated to the first n ranks\n            self.num_samples = len(self.dataset) // self.num_replicas + 1\n        else:\n            self.num_samples = len(self.dataset) // self.num_replicas\n        self.total_size = len(dataset)\n\n    def __iter__(self):\n        if self.shuffle:\n            # deterministically shuffle based on epoch and seed\n            g = torch.Generator()\n            g.manual_seed(self.seed + self.epoch)\n            indices = 
torch.randperm(len(self.dataset), generator=g).tolist()  # type: ignore[arg-type]\n        else:\n            indices = list(range(len(self.dataset)))  # type: ignore[arg-type]\n\n        # total_size equals len(dataset), so no samples are dropped or padded here\n        indices = indices[: self.total_size]\n\n        # subsample\n        indices = indices[self.rank : self.total_size : self.num_replicas]\n        assert len(indices) == self.num_samples\n        return iter(indices)\n\n\n@torch.inference_mode()\ndef main(args):\n    # ======================================================\n    # 1. init environment\n    # ======================================================\n    # we set a very large timeout to prevent some processes from exiting early\n    dist.init_process_group(backend=\"nccl\", timeout=timedelta(hours=24))\n    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())\n    set_seed(1024)\n    coordinator = DistCoordinator()\n\n    # prepare the dp and tp groups\n    assert (\n        args.dp_size * args.tp_size == coordinator.world_size\n    ), f\"DP size {args.dp_size} * TP size {args.tp_size} must equal world size {coordinator.world_size}\"\n    mesh = ProcessGroupMesh(args.dp_size, args.tp_size)\n    dp_group = mesh.get_group_along_axis(0)\n    tp_group = mesh.get_group_along_axis(1)\n\n    # ======================================================\n    # 2. 
load model\n    # ======================================================\n    model_path = args.model_path\n    with warnings.catch_warnings():\n        warnings.simplefilter(\"ignore\")  # PyTorch's non-meta copying warning floods the console\n        tokenizer, model, image_processor, context_len = load_pretrained_model(\n            model_path=model_path,\n            model_base=None,\n            model_name=get_model_name_from_path(model_path),\n            device=get_current_device(),\n            torch_dtype=torch.float16,\n            attn_implementation=\"flash_attention_2\" if args.flash_attention else \"eager\",\n        )\n        dist.barrier()\n\n    # ======================================================\n    # 3. Apply system optimization\n    # ======================================================\n    tp_size = dist.get_world_size(tp_group)\n    shard_config = ShardConfig(\n        tensor_parallel_process_group=tp_group if tp_size > 1 else None,\n        enable_tensor_parallelism=tp_size > 1,\n    )\n    shard_former = ShardFormer(shard_config=shard_config)\n\n    # check the model type\n    model_name = model.__class__.__name__\n    print(model_name)\n    if model_name == \"LlavaLlamaForCausalLM\":\n        model = shard_former.optimize(model, policy=LlavaLlamaForCausalLMPolicy())[0].cuda()\n    elif model_name == \"LlavaMistralForCausalLM\":\n        model = shard_former.optimize(model, policy=LlavaMistralForCausalLMPolicy())[0].cuda()\n    else:\n        print(f\"The shardformer policy for {model_name} is not implemented, skipping\")\n    torch.cuda.empty_cache()\n\n    # ======================================================\n    # 4. 
Prepare dataloader\n    # ======================================================\n    # prepare prompt\n    query = PROMPTS[args.prompt][\"text\"]\n    if dist.get_rank() == 0:\n        print(f\"Prompt: {query}\")\n\n    if \"text\" in args.prompt:\n\n        def get_text_input_ids(text):\n            conv = conv_templates[\"chatml_direct\"].copy()\n            query_text = query.format(text)\n            conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + \"\\n\" + query_text)\n            prompt = conv.get_prompt()\n            # add num_frames images\n            t = prompt.split(\"<image>\")\n            prompt = t[0] + \"<image>\" * args.num_frames + t[1]\n            input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors=\"pt\")\n            input_ids = input_ids.unsqueeze(0)\n            return input_ids\n\n    else:\n        conv = conv_templates[\"chatml_direct\"].copy()\n        conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + \"\\n\" + query)\n        prompt = conv.get_prompt()\n        # add num_frames images\n        t = prompt.split(\"<image>\")\n        prompt = t[0] + \"<image>\" * args.num_frames + t[1]\n        input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors=\"pt\")\n        input_ids = input_ids.unsqueeze(0)\n\n        def get_text_input_ids(*args):\n            return input_ids\n\n    # build dataset\n    def transform(imgs):\n        imgs = process_images(imgs, image_processor, model.config)\n        imgs = imgs.to(dtype=torch.float16)\n        return imgs\n\n    dataset = VideoTextDataset(\n        args.input,\n        transform=transform,\n        num_frames=args.num_frames,\n        get_text_input_ids=get_text_input_ids,\n        resize=args.resize,\n    )\n\n    # make sure that the prompt type matches the data type\n    data_extension = \".\" + dataset.data[\"path\"].iloc[0].split(\".\")[-1]\n    prompt_type = PROMPTS[args.prompt][\"type\"]\n    if 
prompt_type == \"image\":\n        assert (\n            data_extension.lower() in IMG_EXTENSIONS\n        ), f\"The prompt is suitable for an image dataset but the data is not image. The first data is of format {data_extension}\"\n    elif prompt_type == \"video\":\n        assert (\n            data_extension.lower() in VID_EXTENSIONS\n        ), f\"The prompt is suitable for a video dataset but the data is not video. The first data is of format {data_extension}\"\n    else:\n        raise ValueError(f\"Found invalid prompt type {prompt_type}\")\n\n    total_num_videos = len(dataset)\n\n    # build sampler\n    dp_rank = dist.get_rank(dp_group)\n    dp_size = dist.get_world_size(dp_group)\n    sampler = NoPaddingDistributedSampler(dataset, rank=dp_rank, num_replicas=dp_size)\n\n    # build dataloader\n    dataloader = torch.utils.data.DataLoader(\n        dataset,\n        batch_size=args.bs,\n        shuffle=False,\n        num_workers=args.num_workers,\n        pin_memory=True,\n        prefetch_factor=args.prefetch_factor,\n        sampler=sampler,\n        collate_fn=collate_fn,\n    )\n\n    # prepare output file reader\n    output_file = args.input.replace(\".csv\", \"_caption.csv\")\n\n    # create csv writer\n    has_dp_writter = dist.get_rank(tp_group) == 0\n\n    if has_dp_writter:\n        # the dp writer takes care of the files processed on the current dp rank\n        # so we use write mode\n        output_file_split = output_file.replace(\".csv\", f\"_part{dp_rank}.csv\")\n        dp_file = open(output_file_split, \"w\")\n        dp_writer = csv.writer(dp_file)\n        dp_writer.writerow([\"path\", \"text\", \"num_frames\"])\n\n    # ======================================================\n    # 5. 
generate captions\n    # ======================================================\n    if dist.get_rank(tp_group) == 0:\n        pbar = tqdm(dataloader, position=dp_rank, desc=f\"Data Parallel Rank {dist.get_rank(dp_group)}\")\n    else:\n        pbar = dataloader\n\n    if args.profile:\n        encode_time = []\n        generate_time = []\n        output_length = []\n        total_time = []\n\n    for i, batch in enumerate(pbar):\n        # measure time\n        if args.profile:\n            torch.cuda.synchronize()\n            start_time = time.time()\n\n        video_files, frames, video_lengths, img_size_list, texts = batch\n\n        # encode the batch of inputs\n        with Timer() as encode_timer:\n            samples = []\n            for imgs, imgs_size, input_ids in zip(frames, img_size_list, texts):\n                imgs = imgs.cuda()\n                input_ids = input_ids.cuda()\n                _, _, _, _, inputs_embeds, _ = model.prepare_inputs_labels_for_multimodal(\n                    input_ids, None, None, None, None, images=imgs, image_sizes=imgs_size\n                )\n                samples.append(inputs_embeds)\n\n        # padding\n        max_len = max([sample.shape[1] for sample in samples])\n        attention_mask = torch.tensor(\n            [[0] * (max_len - samples[i].shape[1]) + [1] * samples[i].shape[1] for i in range(len(samples))]\n        ).to(model.device)\n        inputs_embeds = [\n            torch.cat(\n                [\n                    torch.zeros(\n                        (1, max_len - samples[i].shape[1], samples[i].shape[-1]),\n                        device=model.device,\n                        dtype=torch.float16,\n                    ),\n                    samples[i],\n                ],\n                dim=1,\n            )\n            for i in range(len(samples))\n        ]\n        inputs_embeds = torch.cat(inputs_embeds, dim=0)\n\n        # generate outputs\n        with Timer() as generate_timer:\n      
      output_ids = super(type(model), model).generate(\n                inputs_embeds=inputs_embeds,\n                attention_mask=attention_mask,\n                do_sample=False,  # sampling is not deterministic and may cause TP to hang\n                max_new_tokens=args.max_tokens,\n                use_cache=True,\n            )\n\n            # skip warmup and add profiling data\n            if args.profile and i >= args.profile_warmup:\n                output_length.append(output_ids.size(0) * output_ids.size(1))\n\n            outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)\n            outputs = [output.replace(\"\\n\", \" \").strip() for output in outputs]\n\n        # skip warmup and add profiling data\n        if args.profile and i >= args.profile_warmup:\n            # measure time\n            torch.cuda.synchronize()\n            time_taken = time.time() - start_time\n\n            total_time.append(time_taken)\n            encode_time.append(encode_timer.time_taken)\n            generate_time.append(generate_timer.time_taken)\n\n        # save results\n        if has_dp_writter:\n            result = list(zip(video_files, outputs, video_lengths))\n            for t in result:\n                dp_writer.writerow(t)\n\n    # display profiling info\n    if args.profile:\n        print(output_length)\n        num_samples_after_warmup = total_num_videos - args.bs * args.profile_warmup * dp_size\n        print(f\"throughput (samples/s): {num_samples_after_warmup / sum(total_time)}\")\n        print(f\"average encode time per sample: {sum(encode_time) / num_samples_after_warmup}\")\n        print(f\"average generate time per sample: {sum(generate_time) / num_samples_after_warmup}\")\n        print(f\"average number of tokens characters per sample: {sum(output_length) / num_samples_after_warmup}\")\n        print(f\"Max GPU allocated / GB: {torch.cuda.max_memory_allocated() / 1024**3}\")\n        print(f\"Max GPU reserved / GB: 
{torch.cuda.max_memory_reserved() / 1024**3}\")\n\n    # ======================================================\n    # 6. shutdown\n    # ======================================================\n    # close file writing\n    if has_dp_writter:\n        dp_file.close()\n    dist.barrier()\n\n    # terminate distributed env\n    dist.destroy_process_group()\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str, help=\"Path to the input CSV file\")\n    parser.add_argument(\"--model-path\", type=str, default=\"liuhaotian/llava-v1.6-34b\")\n    parser.add_argument(\"--prompt\", type=str, default=\"video-f1-detail-3ex\")\n    parser.add_argument(\"--resize\", type=int, default=336)\n    parser.add_argument(\"--num-frames\", type=int, default=1)\n    parser.add_argument(\"--max-tokens\", type=int, default=300)\n    # speed related\n    parser.add_argument(\"--bs\", type=int, default=16)\n    parser.add_argument(\"--tp-size\", type=int, default=2)\n    parser.add_argument(\"--dp-size\", type=int, default=4)\n    parser.add_argument(\"--num-workers\", type=int, default=8)\n    parser.add_argument(\"--prefetch-factor\", type=int, default=8, help=\"Prefetch factor\")\n    parser.add_argument(\n        \"--flash-attention\",\n        action=\"store_true\",\n        help=\"Whether to use flash attention. You can turn on this flag for llama model and off for mistral model.\",\n    )\n    # debug related\n    parser.add_argument(\"--profile\", action=\"store_true\")\n    parser.add_argument(\"--profile-warmup\", type=int, default=1)\n\n    args = parser.parse_args()\n    main(args)\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/caption/utils.py",
    "content": "import time\n\nimport pandas as pd\nimport torch\nimport torchvision.transforms as transforms\nfrom torchvision.datasets.folder import pil_loader\n\nfrom tools.datasets.utils import extract_frames, is_video\n\nIMG_EXTENSIONS = (\".jpg\", \".jpeg\", \".png\", \".ppm\", \".bmp\", \".pgm\", \".tif\", \".tiff\", \".webp\")\nVID_EXTENSIONS = (\".mp4\", \".avi\", \".mov\", \".mkv\")\nPROMPTS = {\n    \"image\": {\n        \"text\": \"Describe this image and its style to generate a succinct yet informative description. Pay attention to all objects in the image. The description should be useful for AI to re-generate the image. The description should be no more than five sentences. Remember do not exceed 5 sentences.\",\n        \"type\": \"image\",\n    },\n    \"image-text\": {\n        \"text\": \"Describe this image and its style in a very detailed manner. Pay attention to all objects in the image. The description should be useful for AI to re-generate the image. The description should be no more than six sentences. Some information about the image is '{}'.\",\n        \"type\": \"image\",\n    },\n    \"image-3ex\": {\n        \"text\": \"An image is given. Describe this image and its style to generate a succinct yet informative description. Pay attention to all objects in the image. The description should be useful for AI to re-generate the video. The description should be no more than five sentences. Here are some examples of good descriptions: 1. A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick and walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about. 2. 
Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field. 3. Drone view of waves crashing against the rugged cliffs along Big Sur's garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff's edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.\",\n        \"type\": \"image\",\n    },\n    \"video\": {\n        \"text\": \"Describe this video and its style in a very detailed manner. Pay attention to all objects in the video. The description should be useful for AI to re-generate the video. The description should be no more than six sentences.\",\n        \"type\": \"video\",\n    },\n    \"video-text\": {\n        \"text\": \"Describe this video and its style in a very detailed manner. Some information about the image is '{}'. Pay attention to all objects in the video. The description should be useful for AI to re-generate the video. The description should be no more than six sentences.\",\n        \"type\": \"video\",\n    },\n    \"video-f1-detail-3ex\": {\n        \"text\": \"A video is given by providing the middle frame. Describe this video and its style to generate a description. Pay attention to all objects in the video. The description should be useful for AI to re-generate the video. The description should be no more than six sentences. 
Here are some examples of good descriptions: 1. A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about. 2. Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field. 3. Drone view of waves crashing against the rugged cliffs along Big Sur's garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff's edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.\",\n        \"type\": \"video\",\n    },\n    \"video-f1-detail-2ex-text\": {\n        \"text\": \"A video is given by providing the middle frame. Some information about the image is '{}'. Describe this video and its style to generate a description. Pay attention to all objects in the video. Do not describe each frame individually. Do not reply with words like 'first frame'. The description should be useful for AI to re-generate the video. The description should be no more than six sentences. Here are some examples of good descriptions: 1. 
A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about. 2. Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.\",\n        \"type\": \"video\",\n    },\n    \"video-f3-detail-3ex\": {\n        \"text\": \"A video is given by providing three frames in chronological order. Describe this video and its style to generate a description. Pay attention to all objects in the video. Do not describe each frame individually. Do not reply with words like 'first frame'. The description should be useful for AI to re-generate the video. The description should be no more than six sentences. Here are some examples of good descriptions: 1. A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about. 2. 
Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field. 3. Drone view of waves crashing against the rugged cliffs along Big Sur's garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff's edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.\",\n        \"type\": \"video\",\n    },\n    \"video-f3-detail-2ex-text\": {\n        \"text\": \"A video is given by providing three frames in chronological order. Some information about the image is '{}'. Describe this video and its style to generate a description. Pay attention to all objects in the video. Do not describe each frame individually. Do not reply with words like 'first frame'. The description should be useful for AI to re-generate the video. The description should be no more than six sentences. Here are some examples of good descriptions: 1. A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about. 2. 
Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.\",\n        \"type\": \"video\",\n    },\n}\n\n\nNUM_FRAMES_POINTS = {\n    1: (0.5,),\n    2: (0.25, 0.75),\n    3: (0.1, 0.5, 0.9),\n}\n\n\ndef read_file(input_path):\n    if input_path.endswith(\".csv\"):\n        return pd.read_csv(input_path)\n    elif input_path.endswith(\".parquet\"):\n        return pd.read_parquet(input_path)\n    else:\n        raise NotImplementedError(f\"Unsupported file format: {input_path}\")\n\n\nclass VideoTextDataset(torch.utils.data.Dataset):\n    def __init__(self, csv_path, transform=None, num_frames=3, get_text_input_ids=None, resize=None):\n        self.csv_path = csv_path\n        self.transform = transform\n        self.data = read_file(csv_path)\n        self.points = NUM_FRAMES_POINTS[num_frames]\n        self.get_text_input_ids = get_text_input_ids\n        self.use_text = False\n        self.resize_size = resize\n        self.resize = transforms.Resize(resize, transforms.InterpolationMode.BICUBIC) if resize is not None else None\n        if \"text\" in self.data.columns:\n            self.use_text = True\n\n    def getitem(self, index):\n        sample = self.data.iloc[index]\n        path = sample[\"path\"]\n        if not is_video(path):\n            images = [pil_loader(path)]\n            length = 1\n        else:\n            images, length = extract_frames(sample[\"path\"], points=self.points, backend=\"opencv\", return_length=True)\n        if self.resize_size is not None:\n            images_r = []\n            for img in images:\n                if img.size[0] > self.resize_size or img.size[1] > self.resize_size:\n    
                img = self.resize(img)\n                images_r.append(img)\n            images = images_r\n        imgs_size = [img.size for img in images]\n        if self.transform is not None:\n            images = self.transform(images)\n\n        # we put images into a list as pytorch dataloader does not accept Pill\n        out = dict(path=path, image=images, length=length, img_size=imgs_size)\n        if self.get_text_input_ids is not None:\n            if self.use_text:\n                out[\"text\"] = self.get_text_input_ids(sample[\"text\"])\n            else:\n                out[\"text\"] = self.get_text_input_ids()\n        else:\n            if self.use_text:\n                out[\"text\"] = sample[\"text\"]\n            else:\n                out[\"text\"] = \"\"\n        return out\n\n    def __len__(self):\n        return len(self.data)\n\n    def __getitem__(self, index):\n        return self.getitem(index)\n\n\ndef collate_fn(batch):\n    paths = [item[\"path\"] for item in batch]\n    images = [item[\"image\"] for item in batch]\n    lengths = [item[\"length\"] for item in batch]\n    img_sizes = [item[\"img_size\"] for item in batch]\n    texts = [item[\"text\"] for item in batch]\n    return paths, images, lengths, img_sizes, texts\n\n\nclass Timer:\n    def __init__(self):\n        self.time_taken = 0\n        self.start_time = 0\n        self.end_time = 0\n\n    def __enter__(self):\n        self.start_time = time.time()\n        return self\n\n    def __exit__(self, exc_type, exc_value, exc_tb):\n        self.end_time = time.time()\n        self.time_taken = self.end_time - self.start_time\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/datasets/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/tools/datasets/analyze.py",
    "content": "import argparse\nimport os\n\nimport matplotlib.pyplot as plt\nimport pandas as pd\n\n\ndef read_file(input_path):\n    if input_path.endswith(\".csv\"):\n        return pd.read_csv(input_path)\n    elif input_path.endswith(\".parquet\"):\n        return pd.read_parquet(input_path)\n    else:\n        raise NotImplementedError(f\"Unsupported file format: {input_path}\")\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str, help=\"Path to the input dataset\")\n    parser.add_argument(\"--save-img\", type=str, default=\"samples/infos/\", help=\"Path to save the image\")\n    return parser.parse_args()\n\n\ndef plot_data(data, column, bins, name):\n    plt.clf()\n    data.hist(column=column, bins=bins)\n    os.makedirs(os.path.dirname(name), exist_ok=True)\n    plt.savefig(name)\n    print(f\"Saved {name}\")\n\n\ndef plot_categorical_data(data, column, name):\n    plt.clf()\n    data[column].value_counts().plot(kind=\"bar\")\n    os.makedirs(os.path.dirname(name), exist_ok=True)\n    plt.savefig(name)\n    print(f\"Saved {name}\")\n\n\nCOLUMNS = {\n    \"num_frames\": 100,\n    \"resolution\": 100,\n    \"text_len\": 100,\n    \"aes\": 100,\n    \"match\": 100,\n    \"flow\": 100,\n    \"cmotion\": None,\n}\n\n\ndef main(args):\n    data = read_file(args.input)\n\n    # === Image Data Info ===\n    image_index = data[\"num_frames\"] == 1\n    if image_index.sum() > 0:\n        print(\"=== Image Data Info ===\")\n        img_data = data[image_index]\n        print(f\"Number of images: {len(img_data)}\")\n        print(img_data.head())\n        print(img_data.describe())\n        if args.save_img:\n            for column in COLUMNS:\n                if column in img_data.columns and column not in [\"num_frames\", \"cmotion\"]:\n                    if COLUMNS[column] is None:\n                        plot_categorical_data(img_data, column, os.path.join(args.save_img, f\"image_{column}.png\"))\n        
            else:\n                        plot_data(img_data, column, COLUMNS[column], os.path.join(args.save_img, f\"image_{column}.png\"))\n\n    # === Video Data Info ===\n    if not image_index.all():\n        print(\"=== Video Data Info ===\")\n        video_data = data[~image_index]\n        print(f\"Number of videos: {len(video_data)}\")\n        if \"num_frames\" in video_data.columns:\n            total_num_frames = video_data[\"num_frames\"].sum()\n            print(f\"Number of frames: {total_num_frames}\")\n            DEFAULT_FPS = 30\n            total_hours = total_num_frames / DEFAULT_FPS / 3600\n            print(f\"Total hours (30 FPS): {int(total_hours)}\")\n        print(video_data.head())\n        print(video_data.describe())\n        if args.save_img:\n            for column in COLUMNS:\n                if column in video_data.columns:\n                    if COLUMNS[column] is None:\n                        plot_categorical_data(video_data, column, os.path.join(args.save_img, f\"video_{column}.png\"))\n                    else:\n                        plot_data(\n                            video_data, column, COLUMNS[column], os.path.join(args.save_img, f\"video_{column}.png\")\n                        )\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n    main(args)\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/datasets/convert.py",
    "content": "import argparse\nimport os\nimport time\n\nimport pandas as pd\nfrom torchvision.datasets import ImageNet\n\nIMG_EXTENSIONS = (\".jpg\", \".jpeg\", \".png\", \".ppm\", \".bmp\", \".pgm\", \".tif\", \".tiff\", \".webp\")\nVID_EXTENSIONS = (\".mp4\", \".avi\", \".mov\", \".mkv\", \".m2ts\")\n\n\ndef scan_recursively(root):\n    num = 0\n    for entry in os.scandir(root):\n        if entry.is_file():\n            yield entry\n        elif entry.is_dir():\n            num += 1\n            if num % 100 == 0:\n                print(f\"Scanned {num} directories.\")\n            yield from scan_recursively(entry.path)\n\n\ndef get_filelist(file_path, exts=None):\n    filelist = []\n    time_start = time.time()\n\n    # == OS Walk ==\n    # for home, dirs, files in os.walk(file_path):\n    #     for filename in files:\n    #         ext = os.path.splitext(filename)[-1].lower()\n    #         if exts is None or ext in exts:\n    #             filelist.append(os.path.join(home, filename))\n\n    # == Scandir ==\n    obj = scan_recursively(file_path)\n    for entry in obj:\n        if entry.is_file():\n            ext = os.path.splitext(entry.name)[-1].lower()\n            if exts is None or ext in exts:\n                filelist.append(entry.path)\n\n    time_end = time.time()\n    print(f\"Scanned {len(filelist)} files in {time_end - time_start:.2f} seconds.\")\n    return filelist\n\n\ndef split_by_capital(name):\n    # BoxingPunchingBag -> Boxing Punching Bag\n    new_name = \"\"\n    for i in range(len(name)):\n        if name[i].isupper() and i != 0:\n            new_name += \" \"\n        new_name += name[i]\n    return new_name\n\n\ndef process_imagenet(root, split):\n    root = os.path.expanduser(root)\n    data = ImageNet(root, split=split)\n    samples = [(path, data.classes[label][0]) for path, label in data.samples]\n    output = f\"imagenet_{split}.csv\"\n\n    df = pd.DataFrame(samples, columns=[\"path\", \"text\"])\n    df.to_csv(output, 
index=False)\n    print(f\"Saved {len(samples)} samples to {output}.\")\n\n\ndef process_ucf101(root, split):\n    root = os.path.expanduser(root)\n    video_lists = get_filelist(os.path.join(root, split))\n    classes = [x.split(\"/\")[-2] for x in video_lists]\n    classes = [split_by_capital(x) for x in classes]\n    samples = list(zip(video_lists, classes))\n    output = f\"ucf101_{split}.csv\"\n\n    df = pd.DataFrame(samples, columns=[\"path\", \"text\"])\n    df.to_csv(output, index=False)\n    print(f\"Saved {len(samples)} samples to {output}.\")\n\n\ndef process_vidprom(root, info):\n    root = os.path.expanduser(root)\n    video_lists = get_filelist(root)\n    video_set = set(video_lists)\n    # read info csv\n    infos = pd.read_csv(info)\n    abs_path = infos[\"uuid\"].apply(lambda x: os.path.join(root, f\"pika-{x}.mp4\"))\n    is_exist = abs_path.apply(lambda x: x in video_set)\n    df = pd.DataFrame(dict(path=abs_path[is_exist], text=infos[\"prompt\"][is_exist]))\n    df.to_csv(\"vidprom.csv\", index=False)\n    print(f\"Saved {len(df)} samples to vidprom.csv.\")\n\n\ndef process_general_images(root, output):\n    root = os.path.expanduser(root)\n    if not os.path.exists(root):\n        return\n    path_list = get_filelist(root, IMG_EXTENSIONS)\n    fname_list = [os.path.splitext(os.path.basename(x))[0] for x in path_list]\n    df = pd.DataFrame(dict(id=fname_list, path=path_list))\n\n    os.makedirs(os.path.dirname(output), exist_ok=True)\n    df.to_csv(output, index=False)\n    print(f\"Saved {len(df)} samples to {output}.\")\n\n\ndef process_general_videos(root, output):\n    root = os.path.expanduser(root)\n    if not os.path.exists(root):\n        return\n    path_list = get_filelist(root, VID_EXTENSIONS)\n    path_list = list(set(path_list))  # remove duplicates\n    fname_list = [os.path.splitext(os.path.basename(x))[0] for x in path_list]\n    relpath_list = [os.path.relpath(x, root) for x in path_list]\n    df = 
pd.DataFrame(dict(path=path_list, id=fname_list, relpath=relpath_list))\n\n    os.makedirs(os.path.dirname(output), exist_ok=True)\n    df.to_csv(output, index=False)\n    print(f\"Saved {len(df)} samples to {output}.\")\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"dataset\", type=str, choices=[\"imagenet\", \"ucf101\", \"vidprom\", \"image\", \"video\"])\n    parser.add_argument(\"root\", type=str)\n    parser.add_argument(\"--split\", type=str, default=\"train\")\n    parser.add_argument(\"--info\", type=str, default=None)\n    parser.add_argument(\"--output\", type=str, default=None, required=True, help=\"Output path\")\n    args = parser.parse_args()\n\n    if args.dataset == \"imagenet\":\n        process_imagenet(args.root, args.split)\n    elif args.dataset == \"ucf101\":\n        process_ucf101(args.root, args.split)\n    elif args.dataset == \"vidprom\":\n        process_vidprom(args.root, args.info)\n    elif args.dataset == \"image\":\n        process_general_images(args.root, args.output)\n    elif args.dataset == \"video\":\n        process_general_videos(args.root, args.output)\n    else:\n        raise ValueError(\"Invalid dataset\")\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/datasets/datautil.py",
    "content": "import argparse\nimport html\nimport json\nimport os\nimport random\nimport re\nfrom functools import partial\nfrom glob import glob\n\nimport cv2\nimport numpy as np\nimport pandas as pd\nfrom PIL import Image\nfrom tqdm import tqdm\n\nfrom opensora.datasets.read_video import read_video\n\nfrom .utils import IMG_EXTENSIONS\n\ntqdm.pandas()\n\ntry:\n    from pandarallel import pandarallel\n\n    PANDA_USE_PARALLEL = True\nexcept ImportError:\n    PANDA_USE_PARALLEL = False\n\n\ndef apply(df, func, **kwargs):\n    if PANDA_USE_PARALLEL:\n        return df.parallel_apply(func, **kwargs)\n    return df.progress_apply(func, **kwargs)\n\n\nTRAIN_COLUMNS = [\"path\", \"text\", \"num_frames\", \"fps\", \"height\", \"width\", \"aspect_ratio\", \"resolution\", \"text_len\"]\n\n# ======================================================\n# --info\n# ======================================================\n\n\ndef get_video_length(cap, method=\"header\"):\n    assert method in [\"header\", \"set\"]\n    if method == \"header\":\n        length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))\n    else:\n        cap.set(cv2.CAP_PROP_POS_AVI_RATIO, 1)\n        length = int(cap.get(cv2.CAP_PROP_POS_FRAMES))\n    return length\n\n\ndef get_info_old(path):\n    try:\n        ext = os.path.splitext(path)[1].lower()\n        if ext in IMG_EXTENSIONS:\n            im = cv2.imread(path)\n            if im is None:\n                return 0, 0, 0, np.nan, np.nan, np.nan\n            height, width = im.shape[:2]\n            num_frames, fps = 1, np.nan\n        else:\n            cap = cv2.VideoCapture(path)\n            num_frames, height, width, fps = (\n                get_video_length(cap, method=\"header\"),\n                int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),\n                int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),\n                float(cap.get(cv2.CAP_PROP_FPS)),\n            )\n        hw = height * width\n        aspect_ratio = height / width if width > 0 else 
np.nan\n        return num_frames, height, width, aspect_ratio, fps, hw\n    except:\n        return 0, 0, 0, np.nan, np.nan, np.nan\n\n\ndef get_info(path):\n    try:\n        ext = os.path.splitext(path)[1].lower()\n        if ext in IMG_EXTENSIONS:\n            return get_image_info(path)\n        else:\n            return get_video_info(path)\n    except:\n        return 0, 0, 0, np.nan, np.nan, np.nan\n\n\ndef get_image_info(path, backend=\"pillow\"):\n    if backend == \"pillow\":\n        try:\n            with open(path, \"rb\") as f:\n                img = Image.open(f)\n                img = img.convert(\"RGB\")\n            width, height = img.size\n            num_frames, fps = 1, np.nan\n            hw = height * width\n            aspect_ratio = height / width if width > 0 else np.nan\n            return num_frames, height, width, aspect_ratio, fps, hw\n        except:\n            return 0, 0, 0, np.nan, np.nan, np.nan\n    elif backend == \"cv2\":\n        try:\n            im = cv2.imread(path)\n            if im is None:\n                return 0, 0, 0, np.nan, np.nan, np.nan\n            height, width = im.shape[:2]\n            num_frames, fps = 1, np.nan\n            hw = height * width\n            aspect_ratio = height / width if width > 0 else np.nan\n            return num_frames, height, width, aspect_ratio, fps, hw\n        except:\n            return 0, 0, 0, np.nan, np.nan, np.nan\n    else:\n        raise ValueError\n\n\ndef get_video_info(path, backend=\"torchvision\"):\n    if backend == \"torchvision\":\n        try:\n            vframes, infos = read_video(path)\n            num_frames, height, width = vframes.shape[0], vframes.shape[2], vframes.shape[3]\n            if \"video_fps\" in infos:\n                fps = infos[\"video_fps\"]\n            else:\n                fps = np.nan\n            hw = height * width\n            aspect_ratio = height / width if width > 0 else np.nan\n            return num_frames, height, width, 
aspect_ratio, fps, hw\n        except:\n            return 0, 0, 0, np.nan, np.nan, np.nan\n    elif backend == \"cv2\":\n        try:\n            cap = cv2.VideoCapture(path)\n            num_frames, height, width, fps = (\n                get_video_length(cap, method=\"header\"),\n                int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),\n                int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),\n                float(cap.get(cv2.CAP_PROP_FPS)),\n            )\n            hw = height * width\n            aspect_ratio = height / width if width > 0 else np.nan\n            return num_frames, height, width, aspect_ratio, fps, hw\n        except:\n            return 0, 0, 0, np.nan, np.nan, np.nan\n    else:\n        raise ValueError\n\n\n# ======================================================\n# --refine-llm-caption\n# ======================================================\n\nLLAVA_PREFIX = [\n    \"The video shows\",\n    \"The video captures\",\n    \"The video features\",\n    \"The video depicts\",\n    \"The video presents\",\n    \"The video features\",\n    \"The video is \",\n    \"In the video,\",\n    \"The image shows\",\n    \"The image captures\",\n    \"The image features\",\n    \"The image depicts\",\n    \"The image presents\",\n    \"The image features\",\n    \"The image is \",\n    \"The image portrays\",\n    \"In the image,\",\n]\n\n\ndef remove_caption_prefix(caption):\n    for prefix in LLAVA_PREFIX:\n        if caption.startswith(prefix) or caption.startswith(prefix.lower()):\n            caption = caption[len(prefix) :].strip()\n            # the caption may be empty once the prefix is stripped\n            if caption and caption[0].islower():\n                caption = caption[0].upper() + caption[1:]\n            return caption\n    return caption\n\n\n# ======================================================\n# --merge-cmotion\n# ======================================================\n\nCMOTION_TEXT = {\n    \"static\": \"static\",\n    \"pan_right\": \"pan right\",\n    \"pan_left\": \"pan left\",\n    \"zoom_in\": 
\"zoom in\",\n    \"zoom_out\": \"zoom out\",\n    \"tilt_up\": \"tilt up\",\n    \"tilt_down\": \"tilt down\",\n    # \"pan/tilt\": \"The camera is panning.\",\n    # \"dynamic\": \"The camera is moving.\",\n    # \"unknown\": None,\n}\nCMOTION_PROBS = {\n    # hard-coded probabilities\n    \"static\": 1.0,\n    \"zoom_in\": 1.0,\n    \"zoom_out\": 1.0,\n    \"pan_left\": 1.0,\n    \"pan_right\": 1.0,\n    \"tilt_up\": 1.0,\n    \"tilt_down\": 1.0,\n    # \"dynamic\": 1.0,\n    # \"unknown\": 0.0,\n    # \"pan/tilt\": 1.0,\n}\n\n\ndef merge_cmotion(caption, cmotion):\n    text = CMOTION_TEXT[cmotion]\n    prob = CMOTION_PROBS[cmotion]\n    if text is not None and random.random() < prob:\n        caption = f\"{caption} Camera motion: {text}.\"\n    return caption\n\n\n# ======================================================\n# --lang\n# ======================================================\n\n\ndef build_lang_detector(lang_to_detect):\n    from lingua import Language, LanguageDetectorBuilder\n\n    lang_dict = dict(en=Language.ENGLISH)\n    assert lang_to_detect in lang_dict\n    valid_lang = lang_dict[lang_to_detect]\n    detector = LanguageDetectorBuilder.from_all_spoken_languages().with_low_accuracy_mode().build()\n\n    def detect_lang(caption):\n        confidence_values = detector.compute_language_confidence_values(caption)\n        confidence = [x.language for x in confidence_values[:5]]\n        if valid_lang not in confidence:\n            return False\n        return True\n\n    return detect_lang\n\n\n# ======================================================\n# --clean-caption\n# ======================================================\n\n\ndef basic_clean(text):\n    import ftfy\n\n    text = ftfy.fix_text(text)\n    text = html.unescape(html.unescape(text))\n    return text.strip()\n\n\nBAD_PUNCT_REGEX = re.compile(\n    r\"[\" + \"#®•©™&@·º½¾¿¡§~\" + \"\\)\" + \"\\(\" + \"\\]\" + \"\\[\" + \"\\}\" + \"\\{\" + \"\\|\" + \"\\\\\" + \"\\/\" + \"\\*\" + 
r\"]{1,}\"\n)  # noqa\n\n\ndef clean_caption(caption):\n    import urllib.parse as ul\n\n    from bs4 import BeautifulSoup\n\n    caption = str(caption)\n    caption = ul.unquote_plus(caption)\n    caption = caption.strip().lower()\n    caption = re.sub(\"<person>\", \"person\", caption)\n    # urls:\n    caption = re.sub(\n        r\"\\b((?:https?:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",  # noqa\n        \"\",\n        caption,\n    )  # regex for urls\n    caption = re.sub(\n        r\"\\b((?:www:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",  # noqa\n        \"\",\n        caption,\n    )  # regex for urls\n    # html:\n    caption = BeautifulSoup(caption, features=\"html.parser\").text\n\n    # @<nickname>\n    caption = re.sub(r\"@[\\w\\d]+\\b\", \"\", caption)\n\n    # 31C0—31EF CJK Strokes\n    # 31F0—31FF Katakana Phonetic Extensions\n    # 3200—32FF Enclosed CJK Letters and Months\n    # 3300—33FF CJK Compatibility\n    # 3400—4DBF CJK Unified Ideographs Extension A\n    # 4DC0—4DFF Yijing Hexagram Symbols\n    # 4E00—9FFF CJK Unified Ideographs\n    caption = re.sub(r\"[\\u31c0-\\u31ef]+\", \"\", caption)\n    caption = re.sub(r\"[\\u31f0-\\u31ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3200-\\u32ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3300-\\u33ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3400-\\u4dbf]+\", \"\", caption)\n    caption = re.sub(r\"[\\u4dc0-\\u4dff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u4e00-\\u9fff]+\", \"\", caption)\n    #######################################################\n\n    # все виды тире / all types of dash --> \"-\"\n    caption = re.sub(\n        r\"[\\u002D\\u058A\\u05BE\\u1400\\u1806\\u2010-\\u2015\\u2E17\\u2E1A\\u2E3A\\u2E3B\\u2E40\\u301C\\u3030\\u30A0\\uFE31\\uFE32\\uFE58\\uFE63\\uFF0D]+\",  # noqa\n        \"-\",\n        caption,\n    )\n\n    # кавычки к одному 
стандарту\n    caption = re.sub(r\"[`´«»“”¨]\", '\"', caption)\n    caption = re.sub(r\"[‘’]\", \"'\", caption)\n\n    # &quot;\n    caption = re.sub(r\"&quot;?\", \"\", caption)\n    # &amp\n    caption = re.sub(r\"&amp\", \"\", caption)\n\n    # ip adresses:\n    caption = re.sub(r\"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\", \" \", caption)\n\n    # article ids:\n    caption = re.sub(r\"\\d:\\d\\d\\s+$\", \"\", caption)\n\n    # \\n\n    caption = re.sub(r\"\\\\n\", \" \", caption)\n\n    # \"#123\"\n    caption = re.sub(r\"#\\d{1,3}\\b\", \"\", caption)\n    # \"#12345..\"\n    caption = re.sub(r\"#\\d{5,}\\b\", \"\", caption)\n    # \"123456..\"\n    caption = re.sub(r\"\\b\\d{6,}\\b\", \"\", caption)\n    # filenames:\n    caption = re.sub(r\"[\\S]+\\.(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)\", \"\", caption)\n\n    #\n    caption = re.sub(r\"[\\\"\\']{2,}\", r'\"', caption)  # \"\"\"AUSVERKAUFT\"\"\"\n    caption = re.sub(r\"[\\.]{2,}\", r\" \", caption)  # \"\"\"AUSVERKAUFT\"\"\"\n\n    caption = re.sub(BAD_PUNCT_REGEX, r\" \", caption)  # ***AUSVERKAUFT***, #AUSVERKAUFT\n    caption = re.sub(r\"\\s+\\.\\s+\", r\" \", caption)  # \" . 
\"\n\n    # this-is-my-cute-cat / this_is_my_cute_cat\n    regex2 = re.compile(r\"(?:\\-|\\_)\")\n    if len(re.findall(regex2, caption)) > 3:\n        caption = re.sub(regex2, \" \", caption)\n\n    caption = basic_clean(caption)\n\n    caption = re.sub(r\"\\b[a-zA-Z]{1,3}\\d{3,15}\\b\", \"\", caption)  # jc6640\n    caption = re.sub(r\"\\b[a-zA-Z]+\\d+[a-zA-Z]+\\b\", \"\", caption)  # jc6640vc\n    caption = re.sub(r\"\\b\\d+[a-zA-Z]+\\d+\\b\", \"\", caption)  # 6640vc231\n\n    caption = re.sub(r\"(worldwide\\s+)?(free\\s+)?shipping\", \"\", caption)\n    caption = re.sub(r\"(free\\s)?download(\\sfree)?\", \"\", caption)\n    caption = re.sub(r\"\\bclick\\b\\s(?:for|on)\\s\\w+\", \"\", caption)\n    caption = re.sub(r\"\\b(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)(\\simage[s]?)?\", \"\", caption)\n    caption = re.sub(r\"\\bpage\\s+\\d+\\b\", \"\", caption)\n\n    caption = re.sub(r\"\\b\\d*[a-zA-Z]+\\d+[a-zA-Z]+\\d+[a-zA-Z\\d]*\\b\", r\" \", caption)  # j2d1a2a...\n\n    caption = re.sub(r\"\\b\\d+\\.?\\d*[xх×]\\d+\\.?\\d*\\b\", \"\", caption)\n\n    caption = re.sub(r\"\\b\\s+\\:\\s+\", r\": \", caption)\n    caption = re.sub(r\"(\\D[,\\./])\\b\", r\"\\1 \", caption)\n    caption = re.sub(r\"\\s+\", \" \", caption)\n\n    caption.strip()\n\n    caption = re.sub(r\"^[\\\"\\']([\\w\\W]+)[\\\"\\']$\", r\"\\1\", caption)\n    caption = re.sub(r\"^[\\'\\_,\\-\\:;]\", r\"\", caption)\n    caption = re.sub(r\"[\\'\\_,\\-\\:\\-\\+]$\", r\"\", caption)\n    caption = re.sub(r\"^\\.\\S+$\", \"\", caption)\n\n    return caption.strip()\n\n\ndef text_preprocessing(text, use_text_preprocessing: bool = True):\n    if use_text_preprocessing:\n        # The exact text cleaning as was in the training stage:\n        text = clean_caption(text)\n        text = clean_caption(text)\n        return text\n    else:\n        return text.lower().strip()\n\n\n# ======================================================\n# load caption\n# 
======================================================\n\n\ndef load_caption(path, ext):\n    try:\n        assert ext in [\"json\"]\n        # use splitext so that dots in directory names do not truncate the path\n        json_path = os.path.splitext(path)[0] + \".json\"\n        with open(json_path, \"r\") as f:\n            data = json.load(f)\n        caption = data[\"caption\"]\n        return caption\n    except:\n        return \"\"\n\n\n# ======================================================\n# --score-to-text\n# ======================================================\n\nDROP_SCORE_PROB = 0.2\n\n\ndef score_to_text(data):\n    text = data[\"text\"]\n    scores = []\n    # aesthetic\n    if \"aes\" in data:\n        aes = data[\"aes\"]\n        if random.random() > DROP_SCORE_PROB:\n            score_text = f\"aesthetic score: {aes:.1f}\"\n            scores.append(score_text)\n    # motion\n    if \"flow\" in data:\n        flow = data[\"flow\"]\n        if random.random() > DROP_SCORE_PROB:\n            score_text = f\"motion score: {flow:.1f}\"\n            scores.append(score_text)\n    if len(scores) > 0:\n        text = f\"{text} [{', '.join(scores)}]\"\n    return text\n\n\n# ======================================================\n# read & write\n# ======================================================\n\n\ndef read_file(input_path):\n    if input_path.endswith(\".csv\"):\n        return pd.read_csv(input_path)\n    elif input_path.endswith(\".parquet\"):\n        return pd.read_parquet(input_path)\n    else:\n        raise NotImplementedError(f\"Unsupported file format: {input_path}\")\n\n\ndef save_file(data, output_path):\n    output_dir = os.path.dirname(output_path)\n    if not os.path.exists(output_dir) and output_dir != \"\":\n        os.makedirs(output_dir)\n    if output_path.endswith(\".csv\"):\n        return data.to_csv(output_path, index=False)\n    elif output_path.endswith(\".parquet\"):\n        return data.to_parquet(output_path, index=False)\n    else:\n        raise NotImplementedError(f\"Unsupported file format: 
{output_path}\")\n\n\ndef read_data(input_paths):\n    data = []\n    input_name = \"\"\n    input_list = []\n    for input_path in input_paths:\n        input_list.extend(glob(input_path))\n    print(\"Input files:\", input_list)\n    for i, input_path in enumerate(input_list):\n        if not os.path.exists(input_path):\n            continue\n        data.append(read_file(input_path))\n        input_name += os.path.basename(input_path).split(\".\")[0]\n        if i != len(input_list) - 1:\n            input_name += \"+\"\n        print(f\"Loaded {len(data[-1])} samples from '{input_path}'.\")\n    if len(data) == 0:\n        print(f\"No samples to process. Exit.\")\n        exit()\n    data = pd.concat(data, ignore_index=True, sort=False)\n    print(f\"Total number of samples: {len(data)}\")\n    return data, input_name\n\n\n# ======================================================\n# main\n# ======================================================\n# To add a new method, register it in the main, parse_args, and get_output_path functions, and update the doc at /tools/datasets/README.md#documentation\n\n\ndef main(args):\n    # reading data\n    data, input_name = read_data(args.input)\n\n    # make difference\n    if args.difference is not None:\n        data_diff = pd.read_csv(args.difference)\n        print(f\"Difference csv contains {len(data_diff)} samples.\")\n        data = data[~data[\"path\"].isin(data_diff[\"path\"])]\n        input_name += f\"-{os.path.basename(args.difference).split('.')[0]}\"\n        print(f\"Filtered number of samples: {len(data)}.\")\n\n    # make intersection\n    if args.intersection is not None:\n        data_new = pd.read_csv(args.intersection)\n        print(f\"Intersection csv contains {len(data_new)} samples.\")\n        cols_to_use = data_new.columns.difference(data.columns)\n\n        col_on = \"path\"\n        # if 'id' in data.columns and 'id' in data_new.columns:\n        #     col_on = 'id'\n        cols_to_use = 
cols_to_use.insert(0, col_on)\n        data = pd.merge(data, data_new[cols_to_use], on=col_on, how=\"inner\")\n        print(f\"Intersection number of samples: {len(data)}.\")\n\n    # get output path\n    output_path = get_output_path(args, input_name)\n\n    # preparation\n    if args.lang is not None:\n        detect_lang = build_lang_detector(args.lang)\n    if args.count_num_token == \"t5\":\n        from transformers import AutoTokenizer\n\n        tokenizer = AutoTokenizer.from_pretrained(\"DeepFloyd/t5-v1_1-xxl\")\n\n    # IO-related\n    if args.load_caption is not None:\n        assert \"path\" in data.columns\n        data[\"text\"] = apply(data[\"path\"], load_caption, ext=args.load_caption)\n    if args.info:\n        info = apply(data[\"path\"], get_info)\n        (\n            data[\"num_frames\"],\n            data[\"height\"],\n            data[\"width\"],\n            data[\"aspect_ratio\"],\n            data[\"fps\"],\n            data[\"resolution\"],\n        ) = zip(*info)\n    if args.video_info:\n        info = apply(data[\"path\"], get_video_info)\n        (\n            data[\"num_frames\"],\n            data[\"height\"],\n            data[\"width\"],\n            data[\"aspect_ratio\"],\n            data[\"fps\"],\n            data[\"resolution\"],\n        ) = zip(*info)\n    if args.ext:\n        assert \"path\" in data.columns\n        data = data[apply(data[\"path\"], os.path.exists)]\n\n    # filtering\n    if args.remove_url:\n        assert \"text\" in data.columns\n        data = data[~data[\"text\"].str.contains(r\"(?P<url>https?://[^\\s]+)\", regex=True)]\n    if args.lang is not None:\n        assert \"text\" in data.columns\n        data = data[data[\"text\"].progress_apply(detect_lang)]  # cannot parallelize\n    if args.remove_empty_path:\n        assert \"path\" in data.columns\n        data = data[data[\"path\"].str.len() > 0]\n        data = data[~data[\"path\"].isna()]\n    if args.remove_empty_caption:\n        assert 
\"text\" in data.columns\n        data = data[data[\"text\"].str.len() > 0]\n        data = data[~data[\"text\"].isna()]\n    if args.remove_path_duplication:\n        assert \"path\" in data.columns\n        data = data.drop_duplicates(subset=[\"path\"])\n    if args.path_subset:\n        data = data[data[\"path\"].str.contains(args.path_subset)]\n\n    # processing\n    if args.relpath is not None:\n        data[\"path\"] = apply(data[\"path\"], lambda x: os.path.relpath(x, args.relpath))\n    if args.abspath is not None:\n        data[\"path\"] = apply(data[\"path\"], lambda x: os.path.join(args.abspath, x))\n    if args.path_to_id:\n        data[\"id\"] = apply(data[\"path\"], lambda x: os.path.splitext(os.path.basename(x))[0])\n    if args.merge_cmotion:\n        data[\"text\"] = apply(data, lambda x: merge_cmotion(x[\"text\"], x[\"cmotion\"]), axis=1)\n    if args.refine_llm_caption:\n        assert \"text\" in data.columns\n        data[\"text\"] = apply(data[\"text\"], remove_caption_prefix)\n    if args.append_text is not None:\n        assert \"text\" in data.columns\n        data[\"text\"] = data[\"text\"] + args.append_text\n    if args.score_to_text:\n        data[\"text\"] = apply(data, score_to_text, axis=1)\n    if args.clean_caption:\n        assert \"text\" in data.columns\n        data[\"text\"] = apply(\n            data[\"text\"],\n            partial(text_preprocessing, use_text_preprocessing=True),\n        )\n    if args.count_num_token is not None:\n        assert \"text\" in data.columns\n        data[\"text_len\"] = apply(data[\"text\"], lambda x: len(tokenizer(x)[\"input_ids\"]))\n    if args.update_text is not None:\n        data_new = pd.read_csv(args.update_text)\n        num_updated = data.path.isin(data_new.path).sum()\n        print(f\"Number of updated samples: {num_updated}.\")\n        data = data.set_index(\"path\")\n        data_new = data_new[[\"path\", \"text\"]].set_index(\"path\")\n        data.update(data_new)\n        
data = data.reset_index()\n\n    # sort\n    if args.sort is not None:\n        data = data.sort_values(by=args.sort, ascending=False)\n    if args.sort_ascending is not None:\n        data = data.sort_values(by=args.sort_ascending, ascending=True)\n\n    # filtering\n    if args.filesize:\n        assert \"path\" in data.columns\n        data[\"filesize\"] = apply(data[\"path\"], lambda x: os.stat(x).st_size / 1024 / 1024)\n    if args.fsmax is not None:\n        assert \"filesize\" in data.columns\n        data = data[data[\"filesize\"] <= args.fsmax]\n    if args.remove_empty_caption:\n        assert \"text\" in data.columns\n        data = data[data[\"text\"].str.len() > 0]\n        data = data[~data[\"text\"].isna()]\n    if args.fmin is not None:\n        assert \"num_frames\" in data.columns\n        data = data[data[\"num_frames\"] >= args.fmin]\n    if args.fmax is not None:\n        assert \"num_frames\" in data.columns\n        data = data[data[\"num_frames\"] <= args.fmax]\n    if args.fpsmax is not None:\n        assert \"fps\" in data.columns\n        data = data[(data[\"fps\"] <= args.fpsmax) | np.isnan(data[\"fps\"])]\n    if args.hwmax is not None:\n        if \"resolution\" not in data.columns:\n            height = data[\"height\"]\n            width = data[\"width\"]\n            data[\"resolution\"] = height * width\n        data = data[data[\"resolution\"] <= args.hwmax]\n    if args.aesmin is not None:\n        assert \"aes\" in data.columns\n        data = data[data[\"aes\"] >= args.aesmin]\n    if args.matchmin is not None:\n        assert \"match\" in data.columns\n        data = data[data[\"match\"] >= args.matchmin]\n    if args.flowmin is not None:\n        assert \"flow\" in data.columns\n        data = data[data[\"flow\"] >= args.flowmin]\n    if args.remove_text_duplication:\n        data = data.drop_duplicates(subset=[\"text\"], keep=\"first\")\n    if args.img_only:\n        data = 
data[data[\"path\"].str.lower().str.endswith(IMG_EXTENSIONS)]\n    if args.vid_only:\n        data = data[~data[\"path\"].str.lower().str.endswith(IMG_EXTENSIONS)]\n\n    # process data\n    if args.shuffle:\n        data = data.sample(frac=1).reset_index(drop=True)  # shuffle\n    if args.head is not None:\n        data = data.head(args.head)\n\n    # train columns\n    if args.train_column:\n        all_columns = data.columns\n        columns_to_drop = all_columns.difference(TRAIN_COLUMNS)\n        data = data.drop(columns=columns_to_drop)\n\n    print(f\"Filtered number of samples: {len(data)}.\")\n\n    # shard data\n    if args.shard is not None:\n        sharded_data = np.array_split(data, args.shard)\n        for i in range(args.shard):\n            output_path_part = output_path.split(\".\")\n            output_path_s = \".\".join(output_path_part[:-1]) + f\"_{i}.\" + output_path_part[-1]\n            save_file(sharded_data[i], output_path_s)\n            print(f\"Saved {len(sharded_data[i])} samples to {output_path_s}.\")\n    else:\n        save_file(data, output_path)\n        print(f\"Saved {len(data)} samples to {output_path}.\")\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str, nargs=\"+\", help=\"path to the input dataset\")\n    parser.add_argument(\"--output\", type=str, default=None, help=\"output path\")\n    parser.add_argument(\"--format\", type=str, default=\"csv\", help=\"output format\", choices=[\"csv\", \"parquet\"])\n    parser.add_argument(\"--disable-parallel\", action=\"store_true\", help=\"disable parallel processing\")\n    parser.add_argument(\"--num-workers\", type=int, default=None, help=\"number of workers\")\n    parser.add_argument(\"--seed\", type=int, default=42, help=\"random seed\")\n\n    # special case\n    parser.add_argument(\"--shard\", type=int, default=None, help=\"shard the dataset\")\n    parser.add_argument(\"--sort\", type=str, default=None, help=\"sort 
by column\")\n    parser.add_argument(\"--sort-ascending\", type=str, default=None, help=\"sort by column (ascending order)\")\n    parser.add_argument(\"--difference\", type=str, default=None, help=\"get difference from the dataset\")\n    parser.add_argument(\n        \"--intersection\", type=str, default=None, help=\"keep the paths in csv from the dataset and merge columns\"\n    )\n    parser.add_argument(\"--train-column\", action=\"store_true\", help=\"only keep the train column\")\n\n    # IO-related\n    parser.add_argument(\"--info\", action=\"store_true\", help=\"get the basic information of each video and image\")\n    parser.add_argument(\"--video-info\", action=\"store_true\", help=\"get the basic information of each video\")\n    parser.add_argument(\"--ext\", action=\"store_true\", help=\"check if the file exists\")\n    parser.add_argument(\n        \"--load-caption\", type=str, default=None, choices=[\"json\", \"txt\"], help=\"load the caption from json or txt\"\n    )\n\n    # path processing\n    parser.add_argument(\"--relpath\", type=str, default=None, help=\"modify the path to relative path by root given\")\n    parser.add_argument(\"--abspath\", type=str, default=None, help=\"modify the path to absolute path by root given\")\n    parser.add_argument(\"--path-to-id\", action=\"store_true\", help=\"add id based on path\")\n    parser.add_argument(\n        \"--path-subset\", type=str, default=None, help=\"extract a subset data containing the given `path-subset` value\"\n    )\n    parser.add_argument(\n        \"--remove-empty-path\",\n        action=\"store_true\",\n        help=\"remove rows with empty path\",  # caused by transform, cannot read path\n    )\n\n    # caption filtering\n    parser.add_argument(\n        \"--remove-empty-caption\",\n        action=\"store_true\",\n        help=\"remove rows with empty caption\",\n    )\n    parser.add_argument(\"--remove-url\", action=\"store_true\", help=\"remove rows with url in caption\")\n   
 parser.add_argument(\"--lang\", type=str, default=None, help=\"remove rows with other language\")\n    parser.add_argument(\"--remove-path-duplication\", action=\"store_true\", help=\"remove rows with duplicated path\")\n    parser.add_argument(\"--remove-text-duplication\", action=\"store_true\", help=\"remove rows with duplicated caption\")\n\n    # caption processing\n    parser.add_argument(\"--refine-llm-caption\", action=\"store_true\", help=\"modify the caption generated by LLM\")\n    parser.add_argument(\n        \"--clean-caption\", action=\"store_true\", help=\"modify the caption according to T5 pipeline to suit training\"\n    )\n    parser.add_argument(\"--merge-cmotion\", action=\"store_true\", help=\"merge the camera motion to the caption\")\n    parser.add_argument(\n        \"--count-num-token\", type=str, choices=[\"t5\"], default=None, help=\"Count the number of tokens in the caption\"\n    )\n    parser.add_argument(\"--append-text\", type=str, default=None, help=\"append text to the caption\")\n    parser.add_argument(\"--score-to-text\", action=\"store_true\", help=\"convert score to text\")\n    parser.add_argument(\"--update-text\", type=str, default=None, help=\"update the text with the given text\")\n\n    # score filtering\n    parser.add_argument(\"--filesize\", action=\"store_true\", help=\"get the filesize of each video and image in MB\")\n    parser.add_argument(\"--fsmax\", type=int, default=None, help=\"filter the dataset by maximum filesize\")\n    parser.add_argument(\"--fmin\", type=int, default=None, help=\"filter the dataset by minimum number of frames\")\n    parser.add_argument(\"--fmax\", type=int, default=None, help=\"filter the dataset by maximum number of frames\")\n    parser.add_argument(\"--hwmax\", type=int, default=None, help=\"filter the dataset by maximum resolution\")\n    parser.add_argument(\"--aesmin\", type=float, default=None, help=\"filter the dataset by minimum aes score\")\n    
parser.add_argument(\"--matchmin\", type=float, default=None, help=\"filter the dataset by minimum match score\")\n    parser.add_argument(\"--flowmin\", type=float, default=None, help=\"filter the dataset by minimum flow score\")\n    parser.add_argument(\"--fpsmax\", type=float, default=None, help=\"filter the dataset by maximum fps\")\n    parser.add_argument(\"--img-only\", action=\"store_true\", help=\"only keep the image data\")\n    parser.add_argument(\"--vid-only\", action=\"store_true\", help=\"only keep the video data\")\n\n    # data processing\n    parser.add_argument(\"--shuffle\", default=False, action=\"store_true\", help=\"shuffle the dataset\")\n    parser.add_argument(\"--head\", type=int, default=None, help=\"return the first n rows of data\")\n\n    return parser.parse_args()\n\n\ndef get_output_path(args, input_name):\n    if args.output is not None:\n        return args.output\n    name = input_name\n    dir_path = os.path.dirname(args.input[0])\n\n    # sort\n    if args.sort is not None:\n        assert args.sort_ascending is None\n        name += \"_sort\"\n    if args.sort_ascending is not None:\n        assert args.sort is None\n        name += \"_sort\"\n\n    # IO-related\n    # for IO-related, the function must be wrapped in try-except\n    if args.info:\n        name += \"_info\"\n    if args.video_info:\n        name += \"_vinfo\"\n    if args.ext:\n        name += \"_ext\"\n    if args.load_caption:\n        name += f\"_load{args.load_caption}\"\n\n    # path processing\n    if args.relpath is not None:\n        name += \"_relpath\"\n    if args.abspath is not None:\n        name += \"_abspath\"\n    if args.remove_empty_path:\n        name += \"_noemptypath\"\n\n    # caption filtering\n    if args.remove_empty_caption:\n        name += \"_noempty\"\n    if args.remove_url:\n        name += \"_nourl\"\n    if args.lang is not None:\n        name += f\"_{args.lang}\"\n    if args.remove_path_duplication:\n        name += 
\"_noduppath\"\n    if args.remove_text_duplication:\n        name += \"_noduptext\"\n    if args.path_subset:\n        name += \"_subset\"\n\n    # caption processing\n    if args.refine_llm_caption:\n        name += \"_llm\"\n    if args.clean_caption:\n        name += \"_clean\"\n    if args.merge_cmotion:\n        name += \"_cmcaption\"\n    if args.count_num_token:\n        name += \"_ntoken\"\n    if args.append_text is not None:\n        name += \"_appendtext\"\n    if args.score_to_text:\n        name += \"_score2text\"\n    if args.update_text is not None:\n        name += \"_update\"\n\n    # score filtering\n    if args.filesize:\n        name += \"_filesize\"\n    if args.fsmax is not None:\n        name += f\"_fsmax{args.fsmax}\"\n    if args.fmin is not None:\n        name += f\"_fmin{args.fmin}\"\n    if args.fmax is not None:\n        name += f\"_fmax{args.fmax}\"\n    if args.fpsmax is not None:\n        name += f\"_fpsmax{args.fpsmax}\"\n    if args.hwmax is not None:\n        name += f\"_hwmax{args.hwmax}\"\n    if args.aesmin is not None:\n        name += f\"_aesmin{args.aesmin}\"\n    if args.matchmin is not None:\n        name += f\"_matchmin{args.matchmin}\"\n    if args.flowmin is not None:\n        name += f\"_flowmin{args.flowmin}\"\n    if args.img_only:\n        name += \"_img\"\n    if args.vid_only:\n        name += \"_vid\"\n\n    # processing\n    if args.shuffle:\n        name += f\"_shuffled_seed{args.seed}\"\n    if args.head is not None:\n        name += f\"_first_{args.head}_data\"\n\n    output_path = os.path.join(dir_path, f\"{name}.{args.format}\")\n    return output_path\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n    if args.disable_parallel:\n        PANDA_USE_PARALLEL = False\n    if PANDA_USE_PARALLEL:\n        if args.num_workers is not None:\n            pandarallel.initialize(nb_workers=args.num_workers, progress_bar=True)\n        else:\n            pandarallel.initialize(progress_bar=True)\n    if 
args.seed is not None:\n        random.seed(args.seed)\n        np.random.seed(args.seed)\n    main(args)\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/datasets/filter_panda10m.py",
    "content": "# TODO: remove this file before releasing\n\nimport argparse\nimport html\nimport os\nimport re\n\nimport pandas as pd\nfrom tqdm import tqdm\n\ntqdm.pandas()\n\ntry:\n    from pandarallel import pandarallel\n\n    pandarallel.initialize(progress_bar=True)\n    pandas_has_parallel = True\nexcept ImportError:\n    pandas_has_parallel = False\n\n\ndef apply(df, func, **kwargs):\n    if pandas_has_parallel:\n        return df.parallel_apply(func, **kwargs)\n    return df.progress_apply(func, **kwargs)\n\n\ndef basic_clean(text):\n    import ftfy\n\n    text = ftfy.fix_text(text)\n    text = html.unescape(html.unescape(text))\n    return text.strip()\n\n\nBAD_PUNCT_REGEX = re.compile(\n    r\"[\" + \"#®•©™&@·º½¾¿¡§~\" + \"\\)\" + \"\\(\" + \"\\]\" + \"\\[\" + \"\\}\" + \"\\{\" + \"\\|\" + \"\\\\\" + \"\\/\" + \"\\*\" + r\"]{1,}\"\n)  # noqa\n\n\ndef clean_caption(caption):\n    import urllib.parse as ul\n\n    from bs4 import BeautifulSoup\n\n    caption = str(caption)\n    caption = ul.unquote_plus(caption)\n    caption = caption.strip().lower()\n    caption = re.sub(\"<person>\", \"person\", caption)\n    # urls:\n    caption = re.sub(\n        r\"\\b((?:https?:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",  # noqa\n        \"\",\n        caption,\n    )  # regex for urls\n    caption = re.sub(\n        r\"\\b((?:www:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",  # noqa\n        \"\",\n        caption,\n    )  # regex for urls\n    # html:\n    caption = BeautifulSoup(caption, features=\"html.parser\").text\n\n    # @<nickname>\n    caption = re.sub(r\"@[\\w\\d]+\\b\", \"\", caption)\n\n    # 31C0—31EF CJK Strokes\n    # 31F0—31FF Katakana Phonetic Extensions\n    # 3200—32FF Enclosed CJK Letters and Months\n    # 3300—33FF CJK Compatibility\n    # 3400—4DBF CJK Unified Ideographs Extension A\n    # 4DC0—4DFF Yijing Hexagram Symbols\n    # 
4E00—9FFF CJK Unified Ideographs\n    caption = re.sub(r\"[\\u31c0-\\u31ef]+\", \"\", caption)\n    caption = re.sub(r\"[\\u31f0-\\u31ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3200-\\u32ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3300-\\u33ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3400-\\u4dbf]+\", \"\", caption)\n    caption = re.sub(r\"[\\u4dc0-\\u4dff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u4e00-\\u9fff]+\", \"\", caption)\n    #######################################################\n\n    # все виды тире / all types of dash --> \"-\"\n    caption = re.sub(\n        r\"[\\u002D\\u058A\\u05BE\\u1400\\u1806\\u2010-\\u2015\\u2E17\\u2E1A\\u2E3A\\u2E3B\\u2E40\\u301C\\u3030\\u30A0\\uFE31\\uFE32\\uFE58\\uFE63\\uFF0D]+\",  # noqa\n        \"-\",\n        caption,\n    )\n\n    # кавычки к одному стандарту\n    caption = re.sub(r\"[`´«»“”¨]\", '\"', caption)\n    caption = re.sub(r\"[‘’]\", \"'\", caption)\n\n    # &quot;\n    caption = re.sub(r\"&quot;?\", \"\", caption)\n    # &amp\n    caption = re.sub(r\"&amp\", \"\", caption)\n\n    # ip adresses:\n    caption = re.sub(r\"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\", \" \", caption)\n\n    # article ids:\n    caption = re.sub(r\"\\d:\\d\\d\\s+$\", \"\", caption)\n\n    # \\n\n    caption = re.sub(r\"\\\\n\", \" \", caption)\n\n    # \"#123\"\n    caption = re.sub(r\"#\\d{1,3}\\b\", \"\", caption)\n    # \"#12345..\"\n    caption = re.sub(r\"#\\d{5,}\\b\", \"\", caption)\n    # \"123456..\"\n    caption = re.sub(r\"\\b\\d{6,}\\b\", \"\", caption)\n    # filenames:\n    caption = re.sub(r\"[\\S]+\\.(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)\", \"\", caption)\n\n    #\n    caption = re.sub(r\"[\\\"\\']{2,}\", r'\"', caption)  # \"\"\"AUSVERKAUFT\"\"\"\n    caption = re.sub(r\"[\\.]{2,}\", r\" \", caption)  # \"\"\"AUSVERKAUFT\"\"\"\n\n    caption = re.sub(BAD_PUNCT_REGEX, r\" \", caption)  # ***AUSVERKAUFT***, #AUSVERKAUFT\n    caption = re.sub(r\"\\s+\\.\\s+\", r\" \", caption)  # \" 
. \"\n\n    # this-is-my-cute-cat / this_is_my_cute_cat\n    regex2 = re.compile(r\"(?:\\-|\\_)\")\n    if len(re.findall(regex2, caption)) > 3:\n        caption = re.sub(regex2, \" \", caption)\n\n    caption = basic_clean(caption)\n\n    caption = re.sub(r\"\\b[a-zA-Z]{1,3}\\d{3,15}\\b\", \"\", caption)  # jc6640\n    caption = re.sub(r\"\\b[a-zA-Z]+\\d+[a-zA-Z]+\\b\", \"\", caption)  # jc6640vc\n    caption = re.sub(r\"\\b\\d+[a-zA-Z]+\\d+\\b\", \"\", caption)  # 6640vc231\n\n    caption = re.sub(r\"(worldwide\\s+)?(free\\s+)?shipping\", \"\", caption)\n    caption = re.sub(r\"(free\\s)?download(\\sfree)?\", \"\", caption)\n    caption = re.sub(r\"\\bclick\\b\\s(?:for|on)\\s\\w+\", \"\", caption)\n    caption = re.sub(r\"\\b(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)(\\simage[s]?)?\", \"\", caption)\n    caption = re.sub(r\"\\bpage\\s+\\d+\\b\", \"\", caption)\n\n    caption = re.sub(r\"\\b\\d*[a-zA-Z]+\\d+[a-zA-Z]+\\d+[a-zA-Z\\d]*\\b\", r\" \", caption)  # j2d1a2a...\n\n    caption = re.sub(r\"\\b\\d+\\.?\\d*[xх×]\\d+\\.?\\d*\\b\", \"\", caption)\n\n    caption = re.sub(r\"\\b\\s+\\:\\s+\", r\": \", caption)\n    caption = re.sub(r\"(\\D[,\\./])\\b\", r\"\\1 \", caption)\n    caption = re.sub(r\"\\s+\", \" \", caption)\n\n    # str.strip() returns a new string, so the result must be assigned back\n    caption = caption.strip()\n\n    caption = re.sub(r\"^[\\\"\\']([\\w\\W]+)[\\\"\\']$\", r\"\\1\", caption)\n    caption = re.sub(r\"^[\\'\\_,\\-\\:;]\", r\"\", caption)\n    caption = re.sub(r\"[\\'\\_,\\-\\:\\-\\+]$\", r\"\", caption)\n    caption = re.sub(r\"^\\.\\S+$\", \"\", caption)\n\n    return caption.strip()\n\n\ndef get_10m_set():\n    meta_path_10m = \"/mnt/hdd/data/Panda-70M/raw/meta/train/panda70m_training_10m.csv\"\n    meta_10m = pd.read_csv(meta_path_10m)\n\n    def process_single_caption(row):\n        text_list = eval(row[\"caption\"])\n        clean_list = [clean_caption(x) for x in text_list]\n        return str(clean_list)\n\n    ret = apply(meta_10m, process_single_caption, axis=1)\n    # ret = 
meta_10m.progress_apply(process_single_caption, axis=1)\n    print(\"==> text processed.\")\n\n    text_list = []\n    for x in ret:\n        text_list += eval(x)\n        # text_set = text_set.union(set(eval(x)))\n    text_set = set(text_list)\n    # meta_10m['caption_new'] = ret\n    # meta_10m.to_csv('/mnt/hdd/data/Panda-70M/raw/meta/train/panda70m_training_10m_new-cap.csv')\n\n    # video_id_set = set(meta_10m['videoID'])\n    # id2t = {}\n    # for idx, row in tqdm(meta_10m.iterrows(), total=len(meta_10m)):\n    #     video_id = row['videoID']\n    #     text_list = eval(row['caption'])\n    #     id2t[video_id] = set(text_list)\n\n    print(f\"==> Loaded meta_10m from '{meta_path_10m}'\")\n    return text_set\n\n\ndef filter_panda10m_text(meta_path, text_set):\n    def process_single_row(row):\n        # path = row['path']\n        t = row[\"text\"]\n        # fname = os.path.basename(path)\n        # video_id = fname[:fname.rindex('_')]\n        if t not in text_set:\n            return False\n        return True\n\n    meta = pd.read_csv(meta_path)\n    ret = apply(meta, process_single_row, axis=1)\n    # ret = meta.progress_apply(process_single_row, axis=1)\n\n    meta = meta[ret]\n    wo_ext, ext = os.path.splitext(meta_path)\n    out_path = f\"{wo_ext}_filter-10m{ext}\"\n    meta.to_csv(out_path, index=False)\n    print(f\"New meta (shape={meta.shape}) saved to '{out_path}'.\")\n\n\ndef filter_panda10m_timestamp(meta_path):\n    meta_path_10m = \"/mnt/hdd/data/Panda-70M/raw/meta/train/panda70m_training_10m.csv\"\n    meta_10m = pd.read_csv(meta_path_10m)\n\n    id2t = {}\n    for idx, row in tqdm(meta_10m.iterrows(), total=len(meta_10m)):\n        video_id = row[\"videoID\"]\n        timestamp = eval(row[\"timestamp\"])\n        timestamp = [str(tuple(x)) for x in timestamp]\n        id2t[video_id] = timestamp\n\n    # video_id_set_10m = set(meta_10m['videoID'])\n    print(f\"==> Loaded meta_10m from '{meta_path_10m}'\")\n\n    def 
process_single_row(row):\n        path = row[\"path\"]\n        t = row[\"timestamp\"]\n        fname = os.path.basename(path)\n        video_id = fname[: fname.rindex(\"_\")]\n        if video_id not in id2t:\n            return False\n        if t not in id2t[video_id]:\n            return False\n        return True\n        # return video_id in video_id_set_10m\n\n    meta = pd.read_csv(meta_path)\n    ret = apply(meta, process_single_row, axis=1)\n\n    meta = meta[ret]\n    wo_ext, ext = os.path.splitext(meta_path)\n    out_path = f\"{wo_ext}_filter-10m{ext}\"\n    meta.to_csv(out_path, index=False)\n    print(f\"New meta (shape={meta.shape}) saved to '{out_path}'.\")\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"--meta_path\", type=str, nargs=\"+\")\n    parser.add_argument(\"--num_workers\", default=5, type=int)\n\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n\n    text_set = get_10m_set()\n    for x in args.meta_path:\n        filter_panda10m_text(x, text_set)\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/datasets/split.py",
    "content": "import argparse\nfrom typing import List\n\nimport pandas as pd\nfrom mmengine.config import Config\n\nfrom opensora.datasets.bucket import Bucket\n\n\ndef split_by_bucket(\n    bucket: Bucket,\n    input_files: List[str],\n    output_path: str,\n    limit: int,\n    frame_interval: int,\n):\n    print(f\"Split {len(input_files)} files into {len(bucket)} buckets\")\n    total_limit = len(bucket) * limit\n    bucket_cnt = {}\n    # get all bucket id\n    for hw_id, d in bucket.ar_criteria.items():\n        for t_id, v in d.items():\n            for ar_id in v.keys():\n                bucket_id = (hw_id, t_id, ar_id)\n                bucket_cnt[bucket_id] = 0\n    output_df = None\n    # split files\n    for path in input_files:\n        df = pd.read_csv(path)\n        if output_df is None:\n            output_df = pd.DataFrame(columns=df.columns)\n        for i in range(len(df)):\n            row = df.iloc[i]\n            t, h, w = row[\"num_frames\"], row[\"height\"], row[\"width\"]\n            bucket_id = bucket.get_bucket_id(t, h, w, frame_interval)\n            if bucket_id is None:\n                continue\n            if bucket_cnt[bucket_id] < limit:\n                bucket_cnt[bucket_id] += 1\n                output_df = pd.concat([output_df, pd.DataFrame([row])], ignore_index=True)\n                if len(output_df) >= total_limit:\n                    break\n        if len(output_df) >= total_limit:\n            break\n    assert len(output_df) <= total_limit\n    if len(output_df) == total_limit:\n        print(f\"All buckets are full ({total_limit} samples)\")\n    else:\n        print(f\"Only {len(output_df)} files are used\")\n    output_df.to_csv(output_path, index=False)\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str, nargs=\"+\")\n    parser.add_argument(\"-o\", \"--output\", required=True)\n    parser.add_argument(\"-c\", \"--config\", required=True)\n    
parser.add_argument(\"-l\", \"--limit\", default=200, type=int)\n    args = parser.parse_args()\n    assert args.limit > 0\n\n    cfg = Config.fromfile(args.config)\n    bucket_config = cfg.bucket_config\n    # rewrite bucket_config\n    for ar, d in bucket_config.items():\n        for frames, t in d.items():\n            p, bs = t\n            if p > 0.0:\n                p = 1.0\n            d[frames] = (p, bs)\n    bucket = Bucket(bucket_config)\n    split_by_bucket(bucket, args.input, args.output, args.limit, cfg.dataset.frame_interval)\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/datasets/transform.py",
    "content": "import argparse\nimport os\nimport random\n\nimport cv2\nimport numpy as np\nimport pandas as pd\nfrom tqdm import tqdm\n\nfrom .utils import IMG_EXTENSIONS, extract_frames\n\ntqdm.pandas()\n\ntry:\n    from pandarallel import pandarallel\n\n    pandarallel.initialize(progress_bar=True)\n    pandas_has_parallel = True\nexcept ImportError:\n    pandas_has_parallel = False\n\n\ndef apply(df, func, **kwargs):\n    if pandas_has_parallel:\n        return df.parallel_apply(func, **kwargs)\n    return df.progress_apply(func, **kwargs)\n\n\ndef get_new_path(path, input_dir, output):\n    path_new = os.path.join(output, os.path.relpath(path, input_dir))\n    os.makedirs(os.path.dirname(path_new), exist_ok=True)\n    return path_new\n\n\ndef resize(path, length, input_dir, output):\n    path_new = get_new_path(path, input_dir, output)\n    ext = os.path.splitext(path)[1].lower()\n    assert ext in IMG_EXTENSIONS\n    img = cv2.imread(path)\n    if img is not None:\n        h, w = img.shape[:2]\n        if min(h, w) > length:\n            if h > w:\n                new_h = length\n                new_w = int(w * new_h / h)\n            else:\n                new_w = length\n                new_h = int(h * new_w / w)\n            img = cv2.resize(img, (new_w, new_h))\n        cv2.imwrite(path_new, img)\n    else:\n        path_new = \"\"\n    return path_new\n\n\ndef rand_crop(path, input_dir, output):\n    ext = os.path.splitext(path)[1].lower()\n    path_new = get_new_path(path, input_dir, output)\n    assert ext in IMG_EXTENSIONS\n    img = cv2.imread(path)\n    if img is not None:\n        h, w = img.shape[:2]\n        width, height, _ = img.shape\n        pos = random.randint(0, 3)\n        if pos == 0:\n            img_cropped = img[: width // 2, : height // 2]\n        elif pos == 1:\n            img_cropped = img[width // 2 :, : height // 2]\n        elif pos == 2:\n            img_cropped = img[: width // 2, height // 2 :]\n        else:\n            
img_cropped = img[width // 2 :, height // 2 :]\n        cv2.imwrite(path_new, img_cropped)\n    else:\n        path_new = \"\"\n    return path_new\n\n\ndef main(args):\n    data = pd.read_csv(args.input)\n    if args.method == \"img_rand_crop\":\n        data[\"path\"] = apply(data[\"path\"], lambda x: rand_crop(x, args.input_dir, args.output))\n        output_csv = args.input.replace(\".csv\", \"_rand_crop.csv\")\n    elif args.method == \"img_resize\":\n        data[\"path\"] = apply(data[\"path\"], lambda x: resize(x, args.length, args.input_dir, args.output))\n        output_csv = args.input.replace(\".csv\", f\"_resized{args.length}.csv\")\n    elif args.method == \"vid_frame_extract\":\n        points = args.points if args.points is not None else args.points_index\n        num_points = len(points)\n        # repeat each row once per sampling point so the strided assignment below lines up\n        data = pd.DataFrame(np.repeat(data.values, num_points, axis=0), columns=data.columns)\n        data[\"point\"] = np.nan\n        for i, point in enumerate(points):\n            if isinstance(point, int):\n                data.loc[i::num_points, \"point\"] = point\n            else:\n                data.loc[i::num_points, \"point\"] = data.loc[i::num_points, \"num_frames\"] * point\n        data[\"path\"] = apply(data, lambda x: extract_frames(x[\"path\"], args.input_dir, args.output, x[\"point\"]), axis=1)\n        output_csv = args.input.replace(\".csv\", \"_vid_frame_extract.csv\")\n\n    data.to_csv(output_csv, index=False)\n    print(f\"Saved to {output_csv}\")\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"method\", type=str, choices=[\"img_resize\", \"img_rand_crop\", \"vid_frame_extract\"])\n    parser.add_argument(\"input\", type=str)\n    parser.add_argument(\"input_dir\", type=str)\n    parser.add_argument(\"output\", type=str)\n    parser.add_argument(\"--disable-parallel\", action=\"store_true\")\n    parser.add_argument(\"--length\", type=int, default=2160)\n    parser.add_argument(\"--seed\", type=int, 
default=42, help=\"seed for random\")\n    parser.add_argument(\"--points\", nargs=\"+\", type=float, default=None)\n    parser.add_argument(\"--points_index\", nargs=\"+\", type=int, default=None)\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n    random.seed(args.seed)\n    if args.disable_parallel:\n        pandas_has_parallel = False\n    main(args)\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/datasets/utils.py",
    "content": "import os\n\nimport cv2\nimport numpy as np\nfrom PIL import Image\n\nIMG_EXTENSIONS = (\".jpg\", \".jpeg\", \".png\", \".ppm\", \".bmp\", \".pgm\", \".tif\", \".tiff\", \".webp\")\nVID_EXTENSIONS = (\".mp4\", \".avi\", \".mov\", \".mkv\")\n\n\ndef is_video(filename):\n    ext = os.path.splitext(filename)[-1].lower()\n    return ext in VID_EXTENSIONS\n\n\ndef extract_frames(\n    video_path,\n    frame_inds=None,\n    points=None,\n    backend=\"opencv\",\n    return_length=False,\n    num_frames=None,\n):\n    \"\"\"\n    Args:\n        video_path (str): path to video\n        frame_inds (List[int]): indices of frames to extract\n        points (List[float]): values within [0, 1); multiply #frames to get frame indices\n    Return:\n        List[PIL.Image]\n    \"\"\"\n    assert backend in [\"av\", \"opencv\", \"decord\"]\n    assert (frame_inds is None) or (points is None)\n\n    if backend == \"av\":\n        import av\n\n        container = av.open(video_path)\n        if num_frames is not None:\n            total_frames = num_frames\n        else:\n            total_frames = container.streams.video[0].frames\n\n        if points is not None:\n            frame_inds = [int(p * total_frames) for p in points]\n\n        frames = []\n        for idx in frame_inds:\n            if idx >= total_frames:\n                idx = total_frames - 1\n            target_timestamp = int(idx * av.time_base / container.streams.video[0].average_rate)\n            container.seek(target_timestamp)\n            frame = next(container.decode(video=0)).to_image()\n            frames.append(frame)\n\n        if return_length:\n            return frames, total_frames\n        return frames\n\n    elif backend == \"decord\":\n        import decord\n\n        container = decord.VideoReader(video_path, num_threads=1)\n        if num_frames is not None:\n            total_frames = num_frames\n        else:\n            total_frames = len(container)\n\n        if points is 
not None:\n            frame_inds = [int(p * total_frames) for p in points]\n\n        frame_inds = np.array(frame_inds).astype(np.int32)\n        frame_inds[frame_inds >= total_frames] = total_frames - 1\n        frames = container.get_batch(frame_inds).asnumpy()  # [N, H, W, C]\n        frames = [Image.fromarray(x) for x in frames]\n\n        if return_length:\n            return frames, total_frames\n        return frames\n\n    elif backend == \"opencv\":\n        cap = cv2.VideoCapture(video_path)\n        if num_frames is not None:\n            total_frames = num_frames\n        else:\n            total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))\n\n        if points is not None:\n            frame_inds = [int(p * total_frames) for p in points]\n\n        frames = []\n        for idx in frame_inds:\n            if idx >= total_frames:\n                idx = total_frames - 1\n\n            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)\n\n            # HACK: sometimes OpenCV fails to read frames, return a black frame instead\n            try:\n                ret, frame = cap.read()\n                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n                frame = Image.fromarray(frame)\n            except Exception as e:\n                print(f\"[Warning] Error reading frame {idx} from {video_path}: {e}\")\n                # First, try to read the first frame\n                try:\n                    print(f\"[Warning] Try reading first frame.\")\n                    cap.set(cv2.CAP_PROP_POS_FRAMES, 0)\n                    ret, frame = cap.read()\n                    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n                    frame = Image.fromarray(frame)\n                # If that fails, return a black frame\n                except Exception as e:\n                    print(f\"[Warning] Error in reading first frame from {video_path}: {e}\")\n                    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))\n                    width = 
int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))\n                    frame = Image.new(\"RGB\", (width, height), (0, 0, 0))\n\n            # HACK: if height or width is 0, return a black frame instead\n            if frame.height == 0 or frame.width == 0:\n                height = width = 256\n                frame = Image.new(\"RGB\", (width, height), (0, 0, 0))\n\n            frames.append(frame)\n\n        if return_length:\n            return frames, total_frames\n        return frames\n    else:\n        raise ValueError\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/interpolation.py",
    "content": "# this script is modified from https://github.com/MCG-NKU/AMT/blob/main/demos/demo_2x.py\nimport argparse\nimport os\nimport os.path as osp\n\nimport cv2\nimport numpy as np\nimport torch\n\nfrom opensora.utils.ckpt_utils import download_model\n\nfrom .networks.amt_g import Model\nfrom .utils.utils import InputPadder, img2tensor, tensor2img\n\nhf_endpoint = os.environ.get(\"HF_ENDPOINT\")\nif hf_endpoint is None:\n    hf_endpoint = \"https://huggingface.co\"\nVID_EXT = [\".mp4\", \".avi\", \".mov\", \".mkv\", \".flv\", \".wmv\", \".webm\"]\nnetwork_cfg = {\n    \"params\": {\n        \"corr_radius\": 3,\n        \"corr_lvls\": 4,\n        \"num_flows\": 5,\n    },\n}\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\n\ndef init():\n    \"\"\"\n    initialize the device and the anchor resolution.\n    \"\"\"\n\n    if device == \"cuda\":\n        anchor_resolution = 1024 * 512\n        anchor_memory = 1500 * 1024**2\n        anchor_memory_bias = 2500 * 1024**2\n        vram_avail = torch.cuda.get_device_properties(device).total_memory\n        print(\"VRAM available: {:.1f} MB\".format(vram_avail / 1024**2))\n    else:\n        # Do not resize in cpu mode\n        anchor_resolution = 8192 * 8192\n        anchor_memory = 1\n        anchor_memory_bias = 0\n        vram_avail = 1\n\n    return anchor_resolution, anchor_memory, anchor_memory_bias, vram_avail\n\n\ndef get_input_video_from_path(input_path):\n    \"\"\"\n    Get the input video from the input_path.\n\n    params:\n        input_path: str, the path of the input video.\n        devices: str, the device to run the model.\n    returns:\n        inputs: list, the list of the input frames.\n        scale: float, the scale of the input frames.\n        padder: InputPadder, the padder to pad the input frames.\n    \"\"\"\n\n    anchor_resolution, anchor_memory, anchor_memory_bias, vram_avail = init()\n\n    if osp.splitext(input_path)[-1].lower() in VID_EXT:\n        vcap = 
cv2.VideoCapture(input_path)\n\n        inputs = []\n        w = int(vcap.get(cv2.CAP_PROP_FRAME_WIDTH))\n        h = int(vcap.get(cv2.CAP_PROP_FRAME_HEIGHT))\n        scale = anchor_resolution / (h * w) * np.sqrt((vram_avail - anchor_memory_bias) / anchor_memory)\n        scale = 1 if scale > 1 else scale\n        scale = 1 / np.floor(1 / np.sqrt(scale) * 16) * 16\n        if scale < 1:\n            print(f\"Due to the limited VRAM, the video will be scaled by {scale:.2f}\")\n        padding = int(16 / scale)\n        padder = InputPadder((h, w), padding)\n        while True:\n            ret, frame = vcap.read()\n            if ret is False:\n                break\n            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n            frame_t = img2tensor(frame).to(device)\n            frame_t = padder.pad(frame_t)\n            inputs.append(frame_t)\n        print(f\"Loading the [video] from {input_path}, the number of frames [{len(inputs)}]\")\n    else:\n        raise TypeError(\"Input should be a video.\")\n\n    return inputs, scale, padder\n\n\ndef load_model(ckpt):\n    \"\"\"\n    load the frame interpolation model.\n    \"\"\"\n    params = network_cfg.get(\"params\", {})\n    model = Model(**params)\n    model.load_state_dict(ckpt[\"state_dict\"])\n    model = model.to(device)\n    model.eval()\n    return model\n\n\ndef interpolater(model, inputs, scale, padder, iters=1):\n    \"\"\"\n    interpolating with the interpolation model.\n\n    params:\n        model: nn.Module, the frame interpolation model.\n        inputs: list, the list of the input frames.\n        scale: float, the scale of the input frames.\n        iters: int, the number of iterations of interpolation. 
The final frames model generating is 2 ** iters * (m - 1) + 1 and m is input frames.\n    returns:\n        outputs: list, the list of the output frames.\n    \"\"\"\n\n    print(\"Start frame interpolation:\")\n    embt = torch.tensor(1 / 2).float().view(1, 1, 1, 1).to(device)\n\n    for i in range(iters):\n        print(f\"Iter {i+1}. input_frames={len(inputs)} output_frames={2*len(inputs)-1}\")\n        outputs = [inputs[0]]\n        for in_0, in_1 in zip(inputs[:-1], inputs[1:]):\n            in_0 = in_0.to(device)\n            in_1 = in_1.to(device)\n            with torch.no_grad():\n                imgt_pred = model(in_0, in_1, embt, scale_factor=scale, eval=True)[\"imgt_pred\"]\n            outputs += [imgt_pred.cpu(), in_1.cpu()]\n        inputs = outputs\n\n    outputs = padder.unpad(*outputs)\n    return outputs\n\n\ndef write(outputs, input_path, output_path, fps=30):\n    \"\"\"\n    write results to the output_path.\n    \"\"\"\n\n    if osp.exists(output_path) is False:\n        os.makedirs(output_path)\n\n    size = outputs[0].shape[2:][::-1]\n\n    _, file_name_with_extension = os.path.split(input_path)\n    file_name, _ = os.path.splitext(file_name_with_extension)\n\n    save_video_path = f\"{output_path}/fps{fps}_{file_name}.mp4\"\n    fourcc = cv2.VideoWriter_fourcc(*\"mp4v\")\n    writer = cv2.VideoWriter(save_video_path, fourcc, fps, size)\n\n    for i, imgt_pred in enumerate(outputs):\n        imgt_pred = tensor2img(imgt_pred)\n        imgt_pred = cv2.cvtColor(imgt_pred, cv2.COLOR_RGB2BGR)\n        writer.write(imgt_pred)\n    print(f\"Demo video is saved to [{save_video_path}]\")\n\n    writer.release()\n\n\ndef process(\n    model,\n    image_path,\n    output_path,\n    fps,\n    iters,\n):\n    inputs, scale, padder = get_input_video_from_path(image_path)\n    outputs = interpolater(model, inputs, scale, padder, iters)\n    write(outputs, image_path, output_path, fps)\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    
parser.add_argument(\"input\", help=\"Input video.\")\n    parser.add_argument(\"--ckpt\", type=str, default=\"./pretrained_models/amt-g.pth\", help=\"The pretrained model.\")\n    parser.add_argument(\n        \"--niters\",\n        type=int,\n        default=1,\n        help=\"Number of interpolation iterations. Each iteration roughly doubles the number of frames.\",\n    )\n    parser.add_argument(\"--output_path\", type=str, default=\"samples\", help=\"Output path.\")\n    parser.add_argument(\"--fps\", type=int, default=8, help=\"Frame rate of the input video. The output frame rate is this value times 2**niters.\")\n    parser.add_argument(\"--folder\", action=\"store_true\", help=\"If the input is a folder, set this flag.\")\n    args = parser.parse_args()\n\n    times_frame = 2**args.niters\n    old_fps = args.fps\n    args.fps = args.fps * times_frame\n    print(f\"Interpolation will turn {old_fps}fps video to {args.fps}fps video.\")\n    args.input = os.path.expanduser(args.input)\n    args.ckpt = os.path.expanduser(args.ckpt)\n    args.folder = osp.splitext(args.input)[-1].lower() not in VID_EXT\n    args.ckpt = download_model(local_path=args.ckpt, url=hf_endpoint + \"/lalala125/AMT/resolve/main/amt-g.pth\")\n    return args\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n    ckpt_path = args.ckpt\n    input_path = args.input\n    output_path = args.output_path\n    iters = int(args.niters)\n    fps = int(args.fps)\n\n    model = load_model(ckpt_path)\n\n    if args.folder:\n        for file in os.listdir(input_path):\n            if osp.splitext(file)[-1].lower() in VID_EXT:\n                vid_path = os.path.join(input_path, file)\n                process(model, vid_path, output_path, fps, iters)\n    else:\n        process(model, input_path, output_path, fps, iters)\n\n    print(\"Interpolation is done.\")\n    print(f\"Output path: {output_path}\")\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/networks/__init__.py",
    "content": "from .amt_g import Model\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/networks/amt_g.py",
    "content": "import torch\nimport torch.nn as nn\n\nfrom .blocks.feat_enc import LargeEncoder\nfrom .blocks.ifrnet import Encoder, InitDecoder, IntermediateDecoder, resize\nfrom .blocks.multi_flow import MultiFlowDecoder, multi_flow_combine\nfrom .blocks.raft import BasicUpdateBlock, BidirCorrBlock, coords_grid\n\n\nclass Model(nn.Module):\n    def __init__(self, corr_radius=3, corr_lvls=4, num_flows=5, channels=[84, 96, 112, 128], skip_channels=84):\n        super(Model, self).__init__()\n        self.radius = corr_radius\n        self.corr_levels = corr_lvls\n        self.num_flows = num_flows\n\n        self.feat_encoder = LargeEncoder(output_dim=128, norm_fn=\"instance\", dropout=0.0)\n        self.encoder = Encoder(channels, large=True)\n        self.decoder4 = InitDecoder(channels[3], channels[2], skip_channels)\n        self.decoder3 = IntermediateDecoder(channels[2], channels[1], skip_channels)\n        self.decoder2 = IntermediateDecoder(channels[1], channels[0], skip_channels)\n        self.decoder1 = MultiFlowDecoder(channels[0], skip_channels, num_flows)\n\n        self.update4 = self._get_updateblock(112, None)\n        self.update3_low = self._get_updateblock(96, 2.0)\n        self.update2_low = self._get_updateblock(84, 4.0)\n\n        self.update3_high = self._get_updateblock(96, None)\n        self.update2_high = self._get_updateblock(84, None)\n\n        self.comb_block = nn.Sequential(\n            nn.Conv2d(3 * self.num_flows, 6 * self.num_flows, 7, 1, 3),\n            nn.PReLU(6 * self.num_flows),\n            nn.Conv2d(6 * self.num_flows, 3, 7, 1, 3),\n        )\n\n    def _get_updateblock(self, cdim, scale_factor=None):\n        return BasicUpdateBlock(\n            cdim=cdim,\n            hidden_dim=192,\n            flow_dim=64,\n            corr_dim=256,\n            corr_dim2=192,\n            fc_dim=188,\n            scale_factor=scale_factor,\n            corr_levels=self.corr_levels,\n            radius=self.radius,\n        )\n\n   
 def _corr_scale_lookup(self, corr_fn, coord, flow0, flow1, embt, downsample=1):\n        # convert t -> 0 to 0 -> 1 | convert t -> 1 to 1 -> 0\n        # based on linear assumption\n        t1_scale = 1.0 / embt\n        t0_scale = 1.0 / (1.0 - embt)\n        if downsample != 1:\n            inv = 1 / downsample\n            flow0 = inv * resize(flow0, scale_factor=inv)\n            flow1 = inv * resize(flow1, scale_factor=inv)\n\n        corr0, corr1 = corr_fn(coord + flow1 * t1_scale, coord + flow0 * t0_scale)\n        corr = torch.cat([corr0, corr1], dim=1)\n        flow = torch.cat([flow0, flow1], dim=1)\n        return corr, flow\n\n    def forward(self, img0, img1, embt, scale_factor=1.0, eval=False, **kwargs):\n        mean_ = torch.cat([img0, img1], 2).mean(1, keepdim=True).mean(2, keepdim=True).mean(3, keepdim=True)\n        img0 = img0 - mean_\n        img1 = img1 - mean_\n        img0_ = resize(img0, scale_factor) if scale_factor != 1.0 else img0\n        img1_ = resize(img1, scale_factor) if scale_factor != 1.0 else img1\n        b, _, h, w = img0_.shape\n        coord = coords_grid(b, h // 8, w // 8, img0.device)\n\n        fmap0, fmap1 = self.feat_encoder([img0_, img1_])  # [1, 128, H//8, W//8]\n        corr_fn = BidirCorrBlock(fmap0, fmap1, radius=self.radius, num_levels=self.corr_levels)\n\n        # f0_1: [1, c0, H//2, W//2] | f0_2: [1, c1, H//4, W//4]\n        # f0_3: [1, c2, H//8, W//8] | f0_4: [1, c3, H//16, W//16]\n        f0_1, f0_2, f0_3, f0_4 = self.encoder(img0_)\n        f1_1, f1_2, f1_3, f1_4 = self.encoder(img1_)\n\n        ######################################### the 4th decoder #########################################\n        up_flow0_4, up_flow1_4, ft_3_ = self.decoder4(f0_4, f1_4, embt)\n        corr_4, flow_4 = self._corr_scale_lookup(corr_fn, coord, up_flow0_4, up_flow1_4, embt, downsample=1)\n\n        # residue update with lookup corr\n        delta_ft_3_, delta_flow_4 = self.update4(ft_3_, flow_4, corr_4)\n        
delta_flow0_4, delta_flow1_4 = torch.chunk(delta_flow_4, 2, 1)\n        up_flow0_4 = up_flow0_4 + delta_flow0_4\n        up_flow1_4 = up_flow1_4 + delta_flow1_4\n        ft_3_ = ft_3_ + delta_ft_3_\n\n        ######################################### the 3rd decoder #########################################\n        up_flow0_3, up_flow1_3, ft_2_ = self.decoder3(ft_3_, f0_3, f1_3, up_flow0_4, up_flow1_4)\n        corr_3, flow_3 = self._corr_scale_lookup(corr_fn, coord, up_flow0_3, up_flow1_3, embt, downsample=2)\n\n        # residue update with lookup corr\n        delta_ft_2_, delta_flow_3 = self.update3_low(ft_2_, flow_3, corr_3)\n        delta_flow0_3, delta_flow1_3 = torch.chunk(delta_flow_3, 2, 1)\n        up_flow0_3 = up_flow0_3 + delta_flow0_3\n        up_flow1_3 = up_flow1_3 + delta_flow1_3\n        ft_2_ = ft_2_ + delta_ft_2_\n\n        # residue update with lookup corr (hr)\n        corr_3 = resize(corr_3, scale_factor=2.0)\n        up_flow_3 = torch.cat([up_flow0_3, up_flow1_3], dim=1)\n        delta_ft_2_, delta_up_flow_3 = self.update3_high(ft_2_, up_flow_3, corr_3)\n        ft_2_ += delta_ft_2_\n        up_flow0_3 += delta_up_flow_3[:, 0:2]\n        up_flow1_3 += delta_up_flow_3[:, 2:4]\n\n        ######################################### the 2nd decoder #########################################\n        up_flow0_2, up_flow1_2, ft_1_ = self.decoder2(ft_2_, f0_2, f1_2, up_flow0_3, up_flow1_3)\n        corr_2, flow_2 = self._corr_scale_lookup(corr_fn, coord, up_flow0_2, up_flow1_2, embt, downsample=4)\n\n        # residue update with lookup corr\n        delta_ft_1_, delta_flow_2 = self.update2_low(ft_1_, flow_2, corr_2)\n        delta_flow0_2, delta_flow1_2 = torch.chunk(delta_flow_2, 2, 1)\n        up_flow0_2 = up_flow0_2 + delta_flow0_2\n        up_flow1_2 = up_flow1_2 + delta_flow1_2\n        ft_1_ = ft_1_ + delta_ft_1_\n\n        # residue update with lookup corr (hr)\n        corr_2 = resize(corr_2, scale_factor=4.0)\n        up_flow_2 = 
torch.cat([up_flow0_2, up_flow1_2], dim=1)\n        delta_ft_1_, delta_up_flow_2 = self.update2_high(ft_1_, up_flow_2, corr_2)\n        ft_1_ += delta_ft_1_\n        up_flow0_2 += delta_up_flow_2[:, 0:2]\n        up_flow1_2 += delta_up_flow_2[:, 2:4]\n\n        ######################################### the 1st decoder #########################################\n        up_flow0_1, up_flow1_1, mask, img_res = self.decoder1(ft_1_, f0_1, f1_1, up_flow0_2, up_flow1_2)\n\n        if scale_factor != 1.0:\n            up_flow0_1 = resize(up_flow0_1, scale_factor=(1.0 / scale_factor)) * (1.0 / scale_factor)\n            up_flow1_1 = resize(up_flow1_1, scale_factor=(1.0 / scale_factor)) * (1.0 / scale_factor)\n            mask = resize(mask, scale_factor=(1.0 / scale_factor))\n            img_res = resize(img_res, scale_factor=(1.0 / scale_factor))\n\n        # Merge multiple predictions\n        imgt_pred = multi_flow_combine(self.comb_block, img0, img1, up_flow0_1, up_flow1_1, mask, img_res, mean_)\n        imgt_pred = torch.clamp(imgt_pred, 0, 1)\n\n        if eval:\n            return {\n                \"imgt_pred\": imgt_pred,\n            }\n        else:\n            up_flow0_1 = up_flow0_1.reshape(b, self.num_flows, 2, h, w)\n            up_flow1_1 = up_flow1_1.reshape(b, self.num_flows, 2, h, w)\n            return {\n                \"imgt_pred\": imgt_pred,\n                \"flow0_pred\": [up_flow0_1, up_flow0_2, up_flow0_3, up_flow0_4],\n                \"flow1_pred\": [up_flow1_1, up_flow1_2, up_flow1_3, up_flow1_4],\n                \"ft_pred\": [ft_1_, ft_2_, ft_3_],\n            }\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/networks/blocks/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/networks/blocks/feat_enc.py",
    "content": "import torch\nimport torch.nn as nn\n\n\nclass BottleneckBlock(nn.Module):\n    def __init__(self, in_planes, planes, norm_fn=\"group\", stride=1):\n        super(BottleneckBlock, self).__init__()\n\n        self.conv1 = nn.Conv2d(in_planes, planes // 4, kernel_size=1, padding=0)\n        self.conv2 = nn.Conv2d(planes // 4, planes // 4, kernel_size=3, padding=1, stride=stride)\n        self.conv3 = nn.Conv2d(planes // 4, planes, kernel_size=1, padding=0)\n        self.relu = nn.ReLU(inplace=True)\n\n        num_groups = planes // 8\n\n        if norm_fn == \"group\":\n            self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes // 4)\n            self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes // 4)\n            self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n            if not stride == 1:\n                self.norm4 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n\n        elif norm_fn == \"batch\":\n            self.norm1 = nn.BatchNorm2d(planes // 4)\n            self.norm2 = nn.BatchNorm2d(planes // 4)\n            self.norm3 = nn.BatchNorm2d(planes)\n            if not stride == 1:\n                self.norm4 = nn.BatchNorm2d(planes)\n\n        elif norm_fn == \"instance\":\n            self.norm1 = nn.InstanceNorm2d(planes // 4)\n            self.norm2 = nn.InstanceNorm2d(planes // 4)\n            self.norm3 = nn.InstanceNorm2d(planes)\n            if not stride == 1:\n                self.norm4 = nn.InstanceNorm2d(planes)\n\n        elif norm_fn == \"none\":\n            self.norm1 = nn.Sequential()\n            self.norm2 = nn.Sequential()\n            self.norm3 = nn.Sequential()\n            if not stride == 1:\n                self.norm4 = nn.Sequential()\n\n        if stride == 1:\n            self.downsample = None\n\n        else:\n            self.downsample = nn.Sequential(nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm4)\n\n    def 
forward(self, x):\n        y = x\n        y = self.relu(self.norm1(self.conv1(y)))\n        y = self.relu(self.norm2(self.conv2(y)))\n        y = self.relu(self.norm3(self.conv3(y)))\n\n        if self.downsample is not None:\n            x = self.downsample(x)\n\n        return self.relu(x + y)\n\n\nclass ResidualBlock(nn.Module):\n    def __init__(self, in_planes, planes, norm_fn=\"group\", stride=1):\n        super(ResidualBlock, self).__init__()\n\n        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, padding=1, stride=stride)\n        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1)\n        self.relu = nn.ReLU(inplace=True)\n\n        num_groups = planes // 8\n\n        if norm_fn == \"group\":\n            self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n            self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n            if not stride == 1:\n                self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n\n        elif norm_fn == \"batch\":\n            self.norm1 = nn.BatchNorm2d(planes)\n            self.norm2 = nn.BatchNorm2d(planes)\n            if not stride == 1:\n                self.norm3 = nn.BatchNorm2d(planes)\n\n        elif norm_fn == \"instance\":\n            self.norm1 = nn.InstanceNorm2d(planes)\n            self.norm2 = nn.InstanceNorm2d(planes)\n            if not stride == 1:\n                self.norm3 = nn.InstanceNorm2d(planes)\n\n        elif norm_fn == \"none\":\n            self.norm1 = nn.Sequential()\n            self.norm2 = nn.Sequential()\n            if not stride == 1:\n                self.norm3 = nn.Sequential()\n\n        if stride == 1:\n            self.downsample = None\n\n        else:\n            self.downsample = nn.Sequential(nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm3)\n\n    def forward(self, x):\n        y = x\n        y = self.relu(self.norm1(self.conv1(y)))\n        y = 
self.relu(self.norm2(self.conv2(y)))\n\n        if self.downsample is not None:\n            x = self.downsample(x)\n\n        return self.relu(x + y)\n\n\nclass SmallEncoder(nn.Module):\n    def __init__(self, output_dim=128, norm_fn=\"batch\", dropout=0.0):\n        super(SmallEncoder, self).__init__()\n        self.norm_fn = norm_fn\n\n        if self.norm_fn == \"group\":\n            self.norm1 = nn.GroupNorm(num_groups=8, num_channels=32)\n\n        elif self.norm_fn == \"batch\":\n            self.norm1 = nn.BatchNorm2d(32)\n\n        elif self.norm_fn == \"instance\":\n            self.norm1 = nn.InstanceNorm2d(32)\n\n        elif self.norm_fn == \"none\":\n            self.norm1 = nn.Sequential()\n\n        self.conv1 = nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3)\n        self.relu1 = nn.ReLU(inplace=True)\n\n        self.in_planes = 32\n        self.layer1 = self._make_layer(32, stride=1)\n        self.layer2 = self._make_layer(64, stride=2)\n        self.layer3 = self._make_layer(96, stride=2)\n\n        self.dropout = None\n        if dropout > 0:\n            self.dropout = nn.Dropout2d(p=dropout)\n\n        self.conv2 = nn.Conv2d(96, output_dim, kernel_size=1)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode=\"fan_out\", nonlinearity=\"relu\")\n            elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):\n                if m.weight is not None:\n                    nn.init.constant_(m.weight, 1)\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def _make_layer(self, dim, stride=1):\n        layer1 = BottleneckBlock(self.in_planes, dim, self.norm_fn, stride=stride)\n        layer2 = BottleneckBlock(dim, dim, self.norm_fn, stride=1)\n        layers = (layer1, layer2)\n\n        self.in_planes = dim\n        return nn.Sequential(*layers)\n\n    def forward(self, x):\n        # if input is 
list, combine batch dimension\n        is_list = isinstance(x, tuple) or isinstance(x, list)\n        if is_list:\n            batch_dim = x[0].shape[0]\n            x = torch.cat(x, dim=0)\n\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.relu1(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n        x = self.conv2(x)\n\n        if self.training and self.dropout is not None:\n            x = self.dropout(x)\n\n        if is_list:\n            x = torch.split(x, [batch_dim, batch_dim], dim=0)\n\n        return x\n\n\nclass BasicEncoder(nn.Module):\n    def __init__(self, output_dim=128, norm_fn=\"batch\", dropout=0.0):\n        super(BasicEncoder, self).__init__()\n        self.norm_fn = norm_fn\n\n        if self.norm_fn == \"group\":\n            self.norm1 = nn.GroupNorm(num_groups=8, num_channels=64)\n\n        elif self.norm_fn == \"batch\":\n            self.norm1 = nn.BatchNorm2d(64)\n\n        elif self.norm_fn == \"instance\":\n            self.norm1 = nn.InstanceNorm2d(64)\n\n        elif self.norm_fn == \"none\":\n            self.norm1 = nn.Sequential()\n\n        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)\n        self.relu1 = nn.ReLU(inplace=True)\n\n        self.in_planes = 64\n        self.layer1 = self._make_layer(64, stride=1)\n        self.layer2 = self._make_layer(72, stride=2)\n        self.layer3 = self._make_layer(128, stride=2)\n\n        # output convolution\n        self.conv2 = nn.Conv2d(128, output_dim, kernel_size=1)\n\n        self.dropout = None\n        if dropout > 0:\n            self.dropout = nn.Dropout2d(p=dropout)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode=\"fan_out\", nonlinearity=\"relu\")\n            elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):\n                if m.weight is not None:\n                    
nn.init.constant_(m.weight, 1)\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def _make_layer(self, dim, stride=1):\n        layer1 = ResidualBlock(self.in_planes, dim, self.norm_fn, stride=stride)\n        layer2 = ResidualBlock(dim, dim, self.norm_fn, stride=1)\n        layers = (layer1, layer2)\n\n        self.in_planes = dim\n        return nn.Sequential(*layers)\n\n    def forward(self, x):\n        # if input is list, combine batch dimension\n        is_list = isinstance(x, tuple) or isinstance(x, list)\n        if is_list:\n            batch_dim = x[0].shape[0]\n            x = torch.cat(x, dim=0)\n\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.relu1(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n\n        x = self.conv2(x)\n\n        if self.training and self.dropout is not None:\n            x = self.dropout(x)\n\n        if is_list:\n            x = torch.split(x, [batch_dim, batch_dim], dim=0)\n\n        return x\n\n\nclass LargeEncoder(nn.Module):\n    def __init__(self, output_dim=128, norm_fn=\"batch\", dropout=0.0):\n        super(LargeEncoder, self).__init__()\n        self.norm_fn = norm_fn\n\n        if self.norm_fn == \"group\":\n            self.norm1 = nn.GroupNorm(num_groups=8, num_channels=64)\n\n        elif self.norm_fn == \"batch\":\n            self.norm1 = nn.BatchNorm2d(64)\n\n        elif self.norm_fn == \"instance\":\n            self.norm1 = nn.InstanceNorm2d(64)\n\n        elif self.norm_fn == \"none\":\n            self.norm1 = nn.Sequential()\n\n        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)\n        self.relu1 = nn.ReLU(inplace=True)\n\n        self.in_planes = 64\n        self.layer1 = self._make_layer(64, stride=1)\n        self.layer2 = self._make_layer(112, stride=2)\n        self.layer3 = self._make_layer(160, stride=2)\n        self.layer3_2 = self._make_layer(160, stride=1)\n\n   
     # output convolution\n        self.conv2 = nn.Conv2d(self.in_planes, output_dim, kernel_size=1)\n\n        self.dropout = None\n        if dropout > 0:\n            self.dropout = nn.Dropout2d(p=dropout)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode=\"fan_out\", nonlinearity=\"relu\")\n            elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):\n                if m.weight is not None:\n                    nn.init.constant_(m.weight, 1)\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def _make_layer(self, dim, stride=1):\n        layer1 = ResidualBlock(self.in_planes, dim, self.norm_fn, stride=stride)\n        layer2 = ResidualBlock(dim, dim, self.norm_fn, stride=1)\n        layers = (layer1, layer2)\n\n        self.in_planes = dim\n        return nn.Sequential(*layers)\n\n    def forward(self, x):\n        # if input is list, combine batch dimension\n        is_list = isinstance(x, tuple) or isinstance(x, list)\n        if is_list:\n            batch_dim = x[0].shape[0]\n            x = torch.cat(x, dim=0)\n\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.relu1(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n        x = self.layer3_2(x)\n\n        x = self.conv2(x)\n\n        if self.training and self.dropout is not None:\n            x = self.dropout(x)\n\n        if is_list:\n            x = torch.split(x, [batch_dim, batch_dim], dim=0)\n\n        return x\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/networks/blocks/ifrnet.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom tools.frame_interpolation.utils.flow_utils import warp\n\n\ndef resize(x, scale_factor):\n    return F.interpolate(x, scale_factor=scale_factor, mode=\"bilinear\", align_corners=False)\n\n\ndef convrelu(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, bias=True):\n    return nn.Sequential(\n        nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias=bias),\n        nn.PReLU(out_channels),\n    )\n\n\nclass ResBlock(nn.Module):\n    def __init__(self, in_channels, side_channels, bias=True):\n        super(ResBlock, self).__init__()\n        self.side_channels = side_channels\n        self.conv1 = nn.Sequential(\n            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1, bias=bias), nn.PReLU(in_channels)\n        )\n        self.conv2 = nn.Sequential(\n            nn.Conv2d(side_channels, side_channels, kernel_size=3, stride=1, padding=1, bias=bias),\n            nn.PReLU(side_channels),\n        )\n        self.conv3 = nn.Sequential(\n            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1, bias=bias), nn.PReLU(in_channels)\n        )\n        self.conv4 = nn.Sequential(\n            nn.Conv2d(side_channels, side_channels, kernel_size=3, stride=1, padding=1, bias=bias),\n            nn.PReLU(side_channels),\n        )\n        self.conv5 = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1, bias=bias)\n        self.prelu = nn.PReLU(in_channels)\n\n    def forward(self, x):\n        out = self.conv1(x)\n\n        res_feat = out[:, : -self.side_channels, ...]\n        side_feat = out[:, -self.side_channels :, :, :]\n        side_feat = self.conv2(side_feat)\n        out = self.conv3(torch.cat([res_feat, side_feat], 1))\n\n        res_feat = out[:, : -self.side_channels, ...]\n        side_feat = out[:, -self.side_channels :, :, :]\n  
      side_feat = self.conv4(side_feat)\n        out = self.conv5(torch.cat([res_feat, side_feat], 1))\n\n        out = self.prelu(x + out)\n        return out\n\n\nclass Encoder(nn.Module):\n    def __init__(self, channels, large=False):\n        super(Encoder, self).__init__()\n        self.channels = channels\n        prev_ch = 3\n        for idx, ch in enumerate(channels, 1):\n            k = 7 if large and idx == 1 else 3\n            p = 3 if k == 7 else 1\n            self.register_module(\n                f\"pyramid{idx}\", nn.Sequential(convrelu(prev_ch, ch, k, 2, p), convrelu(ch, ch, 3, 1, 1))\n            )\n            prev_ch = ch\n\n    def forward(self, in_x):\n        fs = []\n        for idx in range(len(self.channels)):\n            out_x = getattr(self, f\"pyramid{idx+1}\")(in_x)\n            fs.append(out_x)\n            in_x = out_x\n        return fs\n\n\nclass InitDecoder(nn.Module):\n    def __init__(self, in_ch, out_ch, skip_ch) -> None:\n        super().__init__()\n        self.convblock = nn.Sequential(\n            convrelu(in_ch * 2 + 1, in_ch * 2),\n            ResBlock(in_ch * 2, skip_ch),\n            nn.ConvTranspose2d(in_ch * 2, out_ch + 4, 4, 2, 1, bias=True),\n        )\n\n    def forward(self, f0, f1, embt):\n        h, w = f0.shape[2:]\n        embt = embt.repeat(1, 1, h, w)\n        out = self.convblock(torch.cat([f0, f1, embt], 1))\n        flow0, flow1 = torch.chunk(out[:, :4, ...], 2, 1)\n        ft_ = out[:, 4:, ...]\n        return flow0, flow1, ft_\n\n\nclass IntermediateDecoder(nn.Module):\n    def __init__(self, in_ch, out_ch, skip_ch) -> None:\n        super().__init__()\n        self.convblock = nn.Sequential(\n            convrelu(in_ch * 3 + 4, in_ch * 3),\n            ResBlock(in_ch * 3, skip_ch),\n            nn.ConvTranspose2d(in_ch * 3, out_ch + 4, 4, 2, 1, bias=True),\n        )\n\n    def forward(self, ft_, f0, f1, flow0_in, flow1_in):\n        f0_warp = warp(f0, flow0_in)\n        f1_warp = warp(f1, 
flow1_in)\n        f_in = torch.cat([ft_, f0_warp, f1_warp, flow0_in, flow1_in], 1)\n        out = self.convblock(f_in)\n        flow0, flow1 = torch.chunk(out[:, :4, ...], 2, 1)\n        ft_ = out[:, 4:, ...]\n        flow0 = flow0 + 2.0 * resize(flow0_in, scale_factor=2.0)\n        flow1 = flow1 + 2.0 * resize(flow1_in, scale_factor=2.0)\n        return flow0, flow1, ft_\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/networks/blocks/multi_flow.py",
    "content": "import torch\nimport torch.nn as nn\n\nfrom tools.frame_interpolation.utils.flow_utils import warp\n\nfrom .ifrnet import ResBlock, convrelu, resize\n\n\ndef multi_flow_combine(comb_block, img0, img1, flow0, flow1, mask=None, img_res=None, mean=None):\n    \"\"\"\n    A parallel implementation of multiple flow field warping\n    comb_block: An nn.Seqential object.\n    img shape: [b, c, h, w]\n    flow shape: [b, 2*num_flows, h, w]\n    mask (opt):\n        If 'mask' is None, the function conduct a simple average.\n    img_res (opt):\n        If 'img_res' is None, the function adds zero instead.\n    mean (opt):\n        If 'mean' is None, the function adds zero instead.\n    \"\"\"\n    b, c, h, w = flow0.shape\n    num_flows = c // 2\n    flow0 = flow0.reshape(b, num_flows, 2, h, w).reshape(-1, 2, h, w)\n    flow1 = flow1.reshape(b, num_flows, 2, h, w).reshape(-1, 2, h, w)\n\n    mask = mask.reshape(b, num_flows, 1, h, w).reshape(-1, 1, h, w) if mask is not None else None\n    img_res = img_res.reshape(b, num_flows, 3, h, w).reshape(-1, 3, h, w) if img_res is not None else 0\n    img0 = torch.stack([img0] * num_flows, 1).reshape(-1, 3, h, w)\n    img1 = torch.stack([img1] * num_flows, 1).reshape(-1, 3, h, w)\n    mean = torch.stack([mean] * num_flows, 1).reshape(-1, 1, 1, 1) if mean is not None else 0\n\n    img0_warp = warp(img0, flow0)\n    img1_warp = warp(img1, flow1)\n    img_warps = mask * img0_warp + (1 - mask) * img1_warp + mean + img_res\n    img_warps = img_warps.reshape(b, num_flows, 3, h, w)\n    imgt_pred = img_warps.mean(1) + comb_block(img_warps.view(b, -1, h, w))\n    return imgt_pred\n\n\nclass MultiFlowDecoder(nn.Module):\n    def __init__(self, in_ch, skip_ch, num_flows=3):\n        super(MultiFlowDecoder, self).__init__()\n        self.num_flows = num_flows\n        self.convblock = nn.Sequential(\n            convrelu(in_ch * 3 + 4, in_ch * 3),\n            ResBlock(in_ch * 3, skip_ch),\n            nn.ConvTranspose2d(in_ch * 
3, 8 * num_flows, 4, 2, 1, bias=True),\n        )\n\n    def forward(self, ft_, f0, f1, flow0, flow1):\n        n = self.num_flows\n        f0_warp = warp(f0, flow0)\n        f1_warp = warp(f1, flow1)\n        out = self.convblock(torch.cat([ft_, f0_warp, f1_warp, flow0, flow1], 1))\n        delta_flow0, delta_flow1, mask, img_res = torch.split(out, [2 * n, 2 * n, n, 3 * n], 1)\n        mask = torch.sigmoid(mask)\n\n        flow0 = delta_flow0 + 2.0 * resize(flow0, scale_factor=2.0).repeat(1, self.num_flows, 1, 1)\n        flow1 = delta_flow1 + 2.0 * resize(flow1, scale_factor=2.0).repeat(1, self.num_flows, 1, 1)\n\n        return flow0, flow1, mask, img_res\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/networks/blocks/raft.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\ndef resize(x, scale_factor):\n    return F.interpolate(x, scale_factor=scale_factor, mode=\"bilinear\", align_corners=False)\n\n\ndef bilinear_sampler(img, coords, mask=False):\n    \"\"\"Wrapper for grid_sample, uses pixel coordinates\"\"\"\n    H, W = img.shape[-2:]\n    xgrid, ygrid = coords.split([1, 1], dim=-1)\n    xgrid = 2 * xgrid / (W - 1) - 1\n    ygrid = 2 * ygrid / (H - 1) - 1\n\n    grid = torch.cat([xgrid, ygrid], dim=-1)\n    img = F.grid_sample(img, grid, align_corners=True)\n\n    if mask:\n        mask = (xgrid > -1) & (ygrid > -1) & (xgrid < 1) & (ygrid < 1)\n        return img, mask.float()\n\n    return img\n\n\ndef coords_grid(batch, ht, wd, device):\n    coords = torch.meshgrid(torch.arange(ht, device=device), torch.arange(wd, device=device), indexing=\"ij\")\n    coords = torch.stack(coords[::-1], dim=0).float()\n    return coords[None].repeat(batch, 1, 1, 1)\n\n\nclass SmallUpdateBlock(nn.Module):\n    def __init__(self, cdim, hidden_dim, flow_dim, corr_dim, fc_dim, corr_levels=4, radius=3, scale_factor=None):\n        super(SmallUpdateBlock, self).__init__()\n        cor_planes = corr_levels * (2 * radius + 1) ** 2\n        self.scale_factor = scale_factor\n\n        self.convc1 = nn.Conv2d(2 * cor_planes, corr_dim, 1, padding=0)\n        self.convf1 = nn.Conv2d(4, flow_dim * 2, 7, padding=3)\n        self.convf2 = nn.Conv2d(flow_dim * 2, flow_dim, 3, padding=1)\n        self.conv = nn.Conv2d(corr_dim + flow_dim, fc_dim, 3, padding=1)\n\n        self.gru = nn.Sequential(\n            nn.Conv2d(fc_dim + 4 + cdim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n        )\n\n        self.feat_head = nn.Sequential(\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            
nn.Conv2d(hidden_dim, cdim, 3, padding=1),\n        )\n\n        self.flow_head = nn.Sequential(\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, 4, 3, padding=1),\n        )\n\n        self.lrelu = nn.LeakyReLU(negative_slope=0.1, inplace=True)\n\n    def forward(self, net, flow, corr):\n        net = resize(net, 1 / self.scale_factor) if self.scale_factor is not None else net\n        cor = self.lrelu(self.convc1(corr))\n        flo = self.lrelu(self.convf1(flow))\n        flo = self.lrelu(self.convf2(flo))\n        cor_flo = torch.cat([cor, flo], dim=1)\n        inp = self.lrelu(self.conv(cor_flo))\n        inp = torch.cat([inp, flow, net], dim=1)\n\n        out = self.gru(inp)\n        delta_net = self.feat_head(out)\n        delta_flow = self.flow_head(out)\n\n        if self.scale_factor is not None:\n            delta_net = resize(delta_net, scale_factor=self.scale_factor)\n            delta_flow = self.scale_factor * resize(delta_flow, scale_factor=self.scale_factor)\n\n        return delta_net, delta_flow\n\n\nclass BasicUpdateBlock(nn.Module):\n    def __init__(\n        self,\n        cdim,\n        hidden_dim,\n        flow_dim,\n        corr_dim,\n        corr_dim2,\n        fc_dim,\n        corr_levels=4,\n        radius=3,\n        scale_factor=None,\n        out_num=1,\n    ):\n        super(BasicUpdateBlock, self).__init__()\n        cor_planes = corr_levels * (2 * radius + 1) ** 2\n\n        self.scale_factor = scale_factor\n        self.convc1 = nn.Conv2d(2 * cor_planes, corr_dim, 1, padding=0)\n        self.convc2 = nn.Conv2d(corr_dim, corr_dim2, 3, padding=1)\n        self.convf1 = nn.Conv2d(4, flow_dim * 2, 7, padding=3)\n        self.convf2 = nn.Conv2d(flow_dim * 2, flow_dim, 3, padding=1)\n        self.conv = nn.Conv2d(flow_dim + corr_dim2, fc_dim, 3, padding=1)\n\n        self.gru = nn.Sequential(\n            nn.Conv2d(fc_dim + 4 + 
cdim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n        )\n\n        self.feat_head = nn.Sequential(\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, cdim, 3, padding=1),\n        )\n\n        self.flow_head = nn.Sequential(\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, 4 * out_num, 3, padding=1),\n        )\n\n        self.lrelu = nn.LeakyReLU(negative_slope=0.1, inplace=True)\n\n    def forward(self, net, flow, corr):\n        net = resize(net, 1 / self.scale_factor) if self.scale_factor is not None else net\n        cor = self.lrelu(self.convc1(corr))\n        cor = self.lrelu(self.convc2(cor))\n        flo = self.lrelu(self.convf1(flow))\n        flo = self.lrelu(self.convf2(flo))\n        cor_flo = torch.cat([cor, flo], dim=1)\n        inp = self.lrelu(self.conv(cor_flo))\n        inp = torch.cat([inp, flow, net], dim=1)\n\n        out = self.gru(inp)\n        delta_net = self.feat_head(out)\n        delta_flow = self.flow_head(out)\n\n        if self.scale_factor is not None:\n            delta_net = resize(delta_net, scale_factor=self.scale_factor)\n            delta_flow = self.scale_factor * resize(delta_flow, scale_factor=self.scale_factor)\n        return delta_net, delta_flow\n\n\nclass BidirCorrBlock:\n    def __init__(self, fmap1, fmap2, num_levels=4, radius=4):\n        self.num_levels = num_levels\n        self.radius = radius\n        self.corr_pyramid = []\n        self.corr_pyramid_T = []\n\n        corr = BidirCorrBlock.corr(fmap1, fmap2)\n        batch, h1, w1, dim, h2, w2 = corr.shape\n        corr_T = corr.clone().permute(0, 4, 5, 3, 1, 2)\n\n        corr = corr.reshape(batch * h1 * w1, dim, h2, w2)\n        corr_T = 
corr_T.reshape(batch * h2 * w2, dim, h1, w1)\n\n        self.corr_pyramid.append(corr)\n        self.corr_pyramid_T.append(corr_T)\n\n        for _ in range(self.num_levels - 1):\n            corr = F.avg_pool2d(corr, 2, stride=2)\n            corr_T = F.avg_pool2d(corr_T, 2, stride=2)\n            self.corr_pyramid.append(corr)\n            self.corr_pyramid_T.append(corr_T)\n\n    def __call__(self, coords0, coords1):\n        r = self.radius\n        coords0 = coords0.permute(0, 2, 3, 1)\n        coords1 = coords1.permute(0, 2, 3, 1)\n        assert coords0.shape == coords1.shape, f\"coords0 shape: [{coords0.shape}] is not equal to [{coords1.shape}]\"\n        batch, h1, w1, _ = coords0.shape\n\n        out_pyramid = []\n        out_pyramid_T = []\n        for i in range(self.num_levels):\n            corr = self.corr_pyramid[i]\n            corr_T = self.corr_pyramid_T[i]\n\n            dx = torch.linspace(-r, r, 2 * r + 1, device=coords0.device)\n            dy = torch.linspace(-r, r, 2 * r + 1, device=coords0.device)\n            delta = torch.stack(torch.meshgrid(dy, dx, indexing=\"ij\"), axis=-1)\n            delta_lvl = delta.view(1, 2 * r + 1, 2 * r + 1, 2)\n\n            centroid_lvl_0 = coords0.reshape(batch * h1 * w1, 1, 1, 2) / 2**i\n            centroid_lvl_1 = coords1.reshape(batch * h1 * w1, 1, 1, 2) / 2**i\n            coords_lvl_0 = centroid_lvl_0 + delta_lvl\n            coords_lvl_1 = centroid_lvl_1 + delta_lvl\n\n            corr = bilinear_sampler(corr, coords_lvl_0)\n            corr_T = bilinear_sampler(corr_T, coords_lvl_1)\n            corr = corr.view(batch, h1, w1, -1)\n            corr_T = corr_T.view(batch, h1, w1, -1)\n            out_pyramid.append(corr)\n            out_pyramid_T.append(corr_T)\n\n        out = torch.cat(out_pyramid, dim=-1)\n        out_T = torch.cat(out_pyramid_T, dim=-1)\n        return out.permute(0, 3, 1, 2).contiguous().float(), out_T.permute(0, 3, 1, 2).contiguous().float()\n\n    @staticmethod\n    def 
corr(fmap1, fmap2):\n        batch, dim, ht, wd = fmap1.shape\n        fmap1 = fmap1.view(batch, dim, ht * wd)\n        fmap2 = fmap2.view(batch, dim, ht * wd)\n\n        corr = torch.matmul(fmap1.transpose(1, 2), fmap2)\n        corr = corr.view(batch, ht, wd, 1, ht, wd)\n        return corr / torch.sqrt(torch.tensor(dim).float())\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/utils/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/utils/dist_utils.py",
    "content": "import os\n\nimport torch\n\n\ndef get_world_size():\n    \"\"\"Find OMPI world size without calling mpi functions\n    :rtype: int\n    \"\"\"\n    if os.environ.get(\"PMI_SIZE\") is not None:\n        return int(os.environ.get(\"PMI_SIZE\") or 1)\n    elif os.environ.get(\"OMPI_COMM_WORLD_SIZE\") is not None:\n        return int(os.environ.get(\"OMPI_COMM_WORLD_SIZE\") or 1)\n    else:\n        return torch.cuda.device_count()\n\n\ndef get_global_rank():\n    \"\"\"Find OMPI world rank without calling mpi functions\n    :rtype: int\n    \"\"\"\n    if os.environ.get(\"PMI_RANK\") is not None:\n        return int(os.environ.get(\"PMI_RANK\") or 0)\n    elif os.environ.get(\"OMPI_COMM_WORLD_RANK\") is not None:\n        return int(os.environ.get(\"OMPI_COMM_WORLD_RANK\") or 0)\n    else:\n        return 0\n\n\ndef get_local_rank():\n    \"\"\"Find OMPI local rank without calling mpi functions\n    :rtype: int\n    \"\"\"\n    if os.environ.get(\"MPI_LOCALRANKID\") is not None:\n        return int(os.environ.get(\"MPI_LOCALRANKID\") or 0)\n    elif os.environ.get(\"OMPI_COMM_WORLD_LOCAL_RANK\") is not None:\n        return int(os.environ.get(\"OMPI_COMM_WORLD_LOCAL_RANK\") or 0)\n    else:\n        return 0\n\n\ndef get_master_ip():\n    if os.environ.get(\"AZ_BATCH_MASTER_NODE\") is not None:\n        return os.environ.get(\"AZ_BATCH_MASTER_NODE\").split(\":\")[0]\n    elif os.environ.get(\"AZ_BATCHAI_MPI_MASTER_NODE\") is not None:\n        return os.environ.get(\"AZ_BATCHAI_MPI_MASTER_NODE\")\n    else:\n        return \"127.0.0.1\"\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/utils/flow_utils.py",
    "content": "import numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom PIL import ImageFile\n\nImageFile.LOAD_TRUNCATED_IMAGES = True\n\n\ndef warp(img, flow):\n    B, _, H, W = flow.shape\n    xx = torch.linspace(-1.0, 1.0, W).view(1, 1, 1, W).expand(B, -1, H, -1)\n    yy = torch.linspace(-1.0, 1.0, H).view(1, 1, H, 1).expand(B, -1, -1, W)\n    grid = torch.cat([xx, yy], 1).to(img)\n    flow_ = torch.cat([flow[:, 0:1, :, :] / ((W - 1.0) / 2.0), flow[:, 1:2, :, :] / ((H - 1.0) / 2.0)], 1)\n    grid_ = (grid + flow_).permute(0, 2, 3, 1)\n    output = F.grid_sample(input=img, grid=grid_, mode=\"bilinear\", padding_mode=\"border\", align_corners=True)\n    return output\n\n\ndef make_colorwheel():\n    \"\"\"\n    Generates a color wheel for optical flow visualization as presented in:\n        Baker et al. \"A Database and Evaluation Methodology for Optical Flow\" (ICCV, 2007)\n        URL: http://vision.middlebury.edu/flow/flowEval-iccv07.pdf\n    Code follows the original C++ source code of Daniel Scharstein.\n    Code follows the Matlab source code of Deqing Sun.\n    Returns:\n        np.ndarray: Color wheel\n    \"\"\"\n\n    RY = 15\n    YG = 6\n    GC = 4\n    CB = 11\n    BM = 13\n    MR = 6\n\n    ncols = RY + YG + GC + CB + BM + MR\n    colorwheel = np.zeros((ncols, 3))\n    col = 0\n\n    # RY\n    colorwheel[0:RY, 0] = 255\n    colorwheel[0:RY, 1] = np.floor(255 * np.arange(0, RY) / RY)\n    col = col + RY\n    # YG\n    colorwheel[col : col + YG, 0] = 255 - np.floor(255 * np.arange(0, YG) / YG)\n    colorwheel[col : col + YG, 1] = 255\n    col = col + YG\n    # GC\n    colorwheel[col : col + GC, 1] = 255\n    colorwheel[col : col + GC, 2] = np.floor(255 * np.arange(0, GC) / GC)\n    col = col + GC\n    # CB\n    colorwheel[col : col + CB, 1] = 255 - np.floor(255 * np.arange(CB) / CB)\n    colorwheel[col : col + CB, 2] = 255\n    col = col + CB\n    # BM\n    colorwheel[col : col + BM, 2] = 255\n    colorwheel[col : col + BM, 0] = 
np.floor(255 * np.arange(0, BM) / BM)\n    col = col + BM\n    # MR\n    colorwheel[col : col + MR, 2] = 255 - np.floor(255 * np.arange(MR) / MR)\n    colorwheel[col : col + MR, 0] = 255\n    return colorwheel\n\n\ndef flow_uv_to_colors(u, v, convert_to_bgr=False):\n    \"\"\"\n    Applies the flow color wheel to (possibly clipped) flow components u and v.\n    According to the C++ source code of Daniel Scharstein\n    According to the Matlab source code of Deqing Sun\n    Args:\n        u (np.ndarray): Input horizontal flow of shape [H,W]\n        v (np.ndarray): Input vertical flow of shape [H,W]\n        convert_to_bgr (bool, optional): Convert output image to BGR. Defaults to False.\n    Returns:\n        np.ndarray: Flow visualization image of shape [H,W,3]\n    \"\"\"\n    flow_image = np.zeros((u.shape[0], u.shape[1], 3), np.uint8)\n    colorwheel = make_colorwheel()  # shape [55x3]\n    ncols = colorwheel.shape[0]\n    rad = np.sqrt(np.square(u) + np.square(v))\n    a = np.arctan2(-v, -u) / np.pi\n    fk = (a + 1) / 2 * (ncols - 1)\n    k0 = np.floor(fk).astype(np.int32)\n    k1 = k0 + 1\n    k1[k1 == ncols] = 0\n    f = fk - k0\n    for i in range(colorwheel.shape[1]):\n        tmp = colorwheel[:, i]\n        col0 = tmp[k0] / 255.0\n        col1 = tmp[k1] / 255.0\n        col = (1 - f) * col0 + f * col1\n        idx = rad <= 1\n        col[idx] = 1 - rad[idx] * (1 - col[idx])\n        col[~idx] = col[~idx] * 0.75  # out of range\n        # Note the 2-i => BGR instead of RGB\n        ch_idx = 2 - i if convert_to_bgr else i\n        flow_image[:, :, ch_idx] = np.floor(255 * col)\n    return flow_image\n\n\ndef flow_to_image(flow_uv, clip_flow=None, convert_to_bgr=False):\n    \"\"\"\n    Expects a two-dimensional flow image of shape [H,W,2].\n    Args:\n        flow_uv (np.ndarray): Flow UV image of shape [H,W,2]\n        clip_flow (float, optional): Clip maximum of flow values. Defaults to None.\n        convert_to_bgr (bool, optional): Convert output image to BGR. 
Defaults to False.\n    Returns:\n        np.ndarray: Flow visualization image of shape [H,W,3]\n    \"\"\"\n    assert flow_uv.ndim == 3, \"input flow must have three dimensions\"\n    assert flow_uv.shape[2] == 2, \"input flow must have shape [H,W,2]\"\n    if clip_flow is not None:\n        flow_uv = np.clip(flow_uv, 0, clip_flow)\n    u = flow_uv[:, :, 0]\n    v = flow_uv[:, :, 1]\n    rad = np.sqrt(np.square(u) + np.square(v))\n    rad_max = np.max(rad)\n    epsilon = 1e-5\n    u = u / (rad_max + epsilon)\n    v = v / (rad_max + epsilon)\n    return flow_uv_to_colors(u, v, convert_to_bgr)\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/frame_interpolation/utils/utils.py",
    "content": "import random\nimport re\nimport sys\n\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom imageio import imread, imwrite\nfrom PIL import ImageFile\n\nImageFile.LOAD_TRUNCATED_IMAGES = True\n\n\nclass AverageMeter:\n    def __init__(self):\n        self.reset()\n\n    def reset(self):\n        self.val = 0.0\n        self.avg = 0.0\n        self.sum = 0.0\n        self.count = 0\n\n    def update(self, val, n=1):\n        self.val = val\n        self.sum += val * n\n        self.count += n\n        self.avg = self.sum / self.count\n\n\nclass AverageMeterGroups:\n    def __init__(self) -> None:\n        self.meter_dict = dict()\n\n    def update(self, dict, n=1):\n        for name, val in dict.items():\n            if self.meter_dict.get(name) is None:\n                self.meter_dict[name] = AverageMeter()\n            self.meter_dict[name].update(val, n)\n\n    def reset(self, name=None):\n        if name is None:\n            for v in self.meter_dict.values():\n                v.reset()\n        else:\n            meter = self.meter_dict.get(name)\n            if meter is not None:\n                meter.reset()\n\n    def avg(self, name):\n        meter = self.meter_dict.get(name)\n        if meter is not None:\n            return meter.avg\n\n\nclass InputPadder:\n    \"\"\"Pads images such that dimensions are divisible by divisor\"\"\"\n\n    def __init__(self, dims, divisor=16):\n        self.ht, self.wd = dims[-2:]\n        pad_ht = (((self.ht // divisor) + 1) * divisor - self.ht) % divisor\n        pad_wd = (((self.wd // divisor) + 1) * divisor - self.wd) % divisor\n        self._pad = [pad_wd // 2, pad_wd - pad_wd // 2, pad_ht // 2, pad_ht - pad_ht // 2]\n\n    def pad(self, *inputs):\n        if len(inputs) == 1:\n            return F.pad(inputs[0], self._pad, mode=\"replicate\")\n        else:\n            return [F.pad(x, self._pad, mode=\"replicate\") for x in inputs]\n\n    def unpad(self, *inputs):\n        if 
len(inputs) == 1:\n            return self._unpad(inputs[0])\n        else:\n            return [self._unpad(x) for x in inputs]\n\n    def _unpad(self, x):\n        ht, wd = x.shape[-2:]\n        c = [self._pad[2], ht - self._pad[3], self._pad[0], wd - self._pad[1]]\n        return x[..., c[0] : c[1], c[2] : c[3]]\n\n\ndef img2tensor(img):\n    if img.shape[-1] > 3:\n        img = img[:, :, :3]\n    return torch.tensor(img).permute(2, 0, 1).unsqueeze(0) / 255.0\n\n\ndef tensor2img(img_t):\n    return (img_t * 255.0).detach().squeeze(0).permute(1, 2, 0).cpu().numpy().clip(0, 255).astype(np.uint8)\n\n\ndef seed_all(seed):\n    random.seed(seed)\n    np.random.seed(seed)\n    torch.manual_seed(seed)\n    torch.cuda.manual_seed_all(seed)\n\n\ndef read(file):\n    if file.endswith(\".float3\"):\n        return readFloat(file)\n    elif file.endswith(\".flo\"):\n        return readFlow(file)\n    elif file.endswith(\".ppm\"):\n        return readImage(file)\n    elif file.endswith(\".pgm\"):\n        return readImage(file)\n    elif file.endswith(\".png\"):\n        return readImage(file)\n    elif file.endswith(\".jpg\"):\n        return readImage(file)\n    elif file.endswith(\".pfm\"):\n        return readPFM(file)[0]\n    else:\n        raise Exception(\"don't know how to read %s\" % file)\n\n\ndef write(file, data):\n    if file.endswith(\".float3\"):\n        return writeFloat(file, data)\n    elif file.endswith(\".flo\"):\n        return writeFlow(file, data)\n    elif file.endswith(\".ppm\"):\n        return writeImage(file, data)\n    elif file.endswith(\".pgm\"):\n        return writeImage(file, data)\n    elif file.endswith(\".png\"):\n        return writeImage(file, data)\n    elif file.endswith(\".jpg\"):\n        return writeImage(file, data)\n    elif file.endswith(\".pfm\"):\n        return writePFM(file, data)\n    else:\n        raise Exception(\"don't know how to write %s\" % file)\n\n\ndef readPFM(file):\n    file = open(file, \"rb\")\n\n    color = 
None\n    width = None\n    height = None\n    scale = None\n    endian = None\n\n    header = file.readline().rstrip()\n    if header.decode(\"ascii\") == \"PF\":\n        color = True\n    elif header.decode(\"ascii\") == \"Pf\":\n        color = False\n    else:\n        raise Exception(\"Not a PFM file.\")\n\n    dim_match = re.match(r\"^(\\d+)\\s(\\d+)\\s$\", file.readline().decode(\"ascii\"))\n    if dim_match:\n        width, height = list(map(int, dim_match.groups()))\n    else:\n        raise Exception(\"Malformed PFM header.\")\n\n    scale = float(file.readline().decode(\"ascii\").rstrip())\n    if scale < 0:\n        endian = \"<\"\n        scale = -scale\n    else:\n        endian = \">\"\n\n    data = np.fromfile(file, endian + \"f\")\n    shape = (height, width, 3) if color else (height, width)\n\n    data = np.reshape(data, shape)\n    data = np.flipud(data)\n    return data, scale\n\n\ndef writePFM(file, image, scale=1):\n    file = open(file, \"wb\")\n\n    color = None\n\n    if image.dtype.name != \"float32\":\n        raise Exception(\"Image dtype must be float32.\")\n\n    image = np.flipud(image)\n\n    if len(image.shape) == 3 and image.shape[2] == 3:\n        color = True\n    elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1:\n        color = False\n    else:\n        raise Exception(\"Image must have H x W x 3, H x W x 1 or H x W dimensions.\")\n\n    file.write((\"PF\\n\" if color else \"Pf\\n\").encode())\n    file.write(\"%d %d\\n\".encode() % (image.shape[1], image.shape[0]))\n\n    endian = image.dtype.byteorder\n\n    if endian == \"<\" or endian == \"=\" and sys.byteorder == \"little\":\n        scale = -scale\n\n    file.write(\"%f\\n\".encode() % scale)\n\n    image.tofile(file)\n\n\ndef readFlow(name):\n    if name.endswith(\".pfm\") or name.endswith(\".PFM\"):\n        return readPFM(name)[0][:, :, 0:2]\n\n    f = open(name, \"rb\")\n\n    header = f.read(4)\n    if header.decode(\"utf-8\") != 
\"PIEH\":\n        raise Exception(\"Flow file header does not contain PIEH\")\n\n    width = np.fromfile(f, np.int32, 1).squeeze()\n    height = np.fromfile(f, np.int32, 1).squeeze()\n\n    flow = np.fromfile(f, np.float32, width * height * 2).reshape((height, width, 2))\n\n    return flow.astype(np.float32)\n\n\ndef readImage(name):\n    if name.endswith(\".pfm\") or name.endswith(\".PFM\"):\n        data = readPFM(name)[0]\n        if len(data.shape) == 3:\n            return data[:, :, 0:3]\n        else:\n            return data\n    return imread(name)\n\n\ndef writeImage(name, data):\n    if name.endswith(\".pfm\") or name.endswith(\".PFM\"):\n        return writePFM(name, data, 1)\n    return imwrite(name, data)\n\n\ndef writeFlow(name, flow):\n    f = open(name, \"wb\")\n    f.write(\"PIEH\".encode(\"utf-8\"))\n    np.array([flow.shape[1], flow.shape[0]], dtype=np.int32).tofile(f)\n    flow = flow.astype(np.float32)\n    flow.tofile(f)\n\n\ndef readFloat(name):\n    f = open(name, \"rb\")\n\n    if (f.readline().decode(\"utf-8\")) != \"float\\n\":\n        raise Exception(\"float file %s did not contain <float> keyword\" % name)\n\n    dim = int(f.readline())\n\n    dims = []\n    count = 1\n    for i in range(0, dim):\n        d = int(f.readline())\n        dims.append(d)\n        count *= d\n\n    dims = list(reversed(dims))\n\n    data = np.fromfile(f, np.float32, count).reshape(dims)\n    if dim > 2:\n        data = np.transpose(data, (2, 1, 0))\n        data = np.transpose(data, (1, 0, 2))\n\n    return data\n\n\ndef writeFloat(name, data):\n    f = open(name, \"wb\")\n\n    dim = len(data.shape)\n    if dim > 3:\n        raise Exception(\"bad float file dimension: %d\" % dim)\n\n    f.write((\"float\\n\").encode(\"ascii\"))\n    f.write((\"%d\\n\" % dim).encode(\"ascii\"))\n\n    if dim == 1:\n        f.write((\"%d\\n\" % data.shape[0]).encode(\"ascii\"))\n    else:\n        f.write((\"%d\\n\" % data.shape[1]).encode(\"ascii\"))\n        
f.write((\"%d\\n\" % data.shape[0]).encode(\"ascii\"))\n        for i in range(2, dim):\n            f.write((\"%d\\n\" % data.shape[i]).encode(\"ascii\"))\n\n    data = data.astype(np.float32)\n    if dim == 2:\n        data.tofile(f)\n\n    else:\n        np.transpose(data, (2, 0, 1)).tofile(f)\n\n\ndef check_dim_and_resize(tensor_list):\n    shape_list = []\n    for t in tensor_list:\n        shape_list.append(t.shape[2:])\n\n    if len(set(shape_list)) > 1:\n        desired_shape = shape_list[0]\n        print(f\"Inconsistent size of input video frames. All frames will be resized to {desired_shape}\")\n\n        resize_tensor_list = []\n        for t in tensor_list:\n            resize_tensor_list.append(torch.nn.functional.interpolate(t, size=tuple(desired_shape), mode=\"bilinear\"))\n\n        tensor_list = resize_tensor_list\n\n    return tensor_list\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/scene_cut/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/tools/scene_cut/convert_id_to_path.py",
    "content": "import argparse\nimport json\nimport os\nfrom functools import partial\n\nimport cv2\nimport numpy as np\nimport pandas as pd\nfrom mmengine.logging import print_log\nfrom moviepy.editor import VideoFileClip\nfrom pandarallel import pandarallel\nfrom tqdm import tqdm\n\ntqdm.pandas()\n\n\ndef is_intact_video(video_path, mode=\"moviepy\", verbose=False, logger=None):\n    if not os.path.exists(video_path):\n        if verbose:\n            print_log(f\"Could not find '{video_path}'\", logger=logger)\n        return False\n\n    if mode == \"moviepy\":\n        try:\n            VideoFileClip(video_path)\n            if verbose:\n                print_log(f\"The video file '{video_path}' is intact.\", logger=logger)\n            return True\n        except Exception as e:\n            if verbose:\n                print_log(f\"Error: {e}\", logger=logger)\n                print_log(f\"The video file '{video_path}' is not intact.\", logger=logger)\n            return False\n    elif mode == \"cv2\":\n        try:\n            cap = cv2.VideoCapture(video_path)\n            if cap.isOpened():\n                if verbose:\n                    print_log(f\"The video file '{video_path}' is intact.\", logger=logger)\n                return True\n        except Exception as e:\n            if verbose:\n                print_log(f\"Error: {e}\", logger=logger)\n                print_log(f\"The video file '{video_path}' is not intact.\", logger=logger)\n            return False\n    else:\n        raise ValueError\n\n\ndef has_downloaded_success(json_path):\n    if not os.path.exists(json_path):\n        return False\n\n    try:\n        with open(json_path, \"r\") as f:\n            data = json.load(f)\n            if \"success\" not in data or isinstance(data[\"success\"], bool) is False or data[\"success\"] is False:\n                return False\n    except Exception:\n        return False\n\n    return True\n\n\ndef parse_args():\n    parser = 
argparse.ArgumentParser()\n    parser.add_argument(\"meta_path\", type=str)\n    parser.add_argument(\"--folder_path\", type=str, required=True)\n    parser.add_argument(\"--mode\", type=str, default=None)\n    parser.add_argument(\"--num_workers\", type=int, default=None, help=\"#workers for pandarallel\")\n\n    args = parser.parse_args()\n    return args\n\n\ndef main():\n    args = parse_args()\n\n    meta_path = args.meta_path\n    folder_path = args.folder_path\n    mode = args.mode\n\n    def is_intact(row, mode=None):\n        video_id = row[\"id\"]\n        video_path = os.path.join(folder_path, f\"{video_id}.mp4\")\n        row[\"path\"] = video_path\n\n        if mode == \".mp4\":\n            if is_intact_video(video_path):\n                return True, video_path\n            return False, video_path\n        elif mode == \".json\":\n            # json_path = os.path.join(root_raw, f\"data/{split}/{video_id}.json\")\n            json_path = os.path.join(folder_path, f\"{video_id}.json\")\n            if has_downloaded_success(json_path):\n                return True, video_path\n            return False, video_path\n        elif mode is None:\n            return True, video_path\n        else:\n            raise ValueError\n\n    meta_dirpath = os.path.dirname(meta_path)\n    meta_fname = os.path.basename(meta_path)\n    wo_ext, ext = os.path.splitext(meta_fname)\n\n    if args.num_workers is not None:\n        pandarallel.initialize(progress_bar=True, nb_workers=args.num_workers)\n    else:\n        pandarallel.initialize(progress_bar=True)\n    is_intact_partial = partial(is_intact, mode=mode)\n\n    meta = pd.read_csv(meta_path)\n    ret = meta.parallel_apply(is_intact_partial, axis=1)\n    intact, paths = list(zip(*ret))\n\n    meta[\"intact\"] = intact\n    meta[\"path\"] = paths\n    out_path = os.path.join(meta_dirpath, f\"{wo_ext}_path_intact.csv\")\n    meta.to_csv(out_path, index=False)\n    print(f\"New meta (shape={meta.shape}) with intact 
info saved to '{out_path}'\")\n\n    meta_format = meta[np.array(intact)].drop(\"intact\", axis=1)\n    out_path = os.path.join(meta_dirpath, f\"{wo_ext}_path-filtered.csv\")\n    meta_format.to_csv(out_path, index=False)\n    print(f\"New meta (shape={meta_format.shape}) with format info saved to '{out_path}'\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/scene_cut/cut.py",
    "content": "import cv2  # isort:skip\n\nimport argparse\nimport os\nimport subprocess\nfrom functools import partial\n\nimport pandas as pd\nfrom imageio_ffmpeg import get_ffmpeg_exe\nfrom pandarallel import pandarallel\nfrom scenedetect import FrameTimecode\nfrom tqdm import tqdm\n\ntqdm.pandas()\n\n\ndef print_log(s, logger=None):\n    if logger is not None:\n        logger.info(s)\n    else:\n        print(s)\n\n\ndef process_single_row(row, args):\n    video_path = row[\"path\"]\n\n    logger = None\n\n    # check mp4 integrity\n    # if not is_intact_video(video_path, logger=logger):\n    #     return False\n    try:\n        if \"timestamp\" in row:\n            timestamp = row[\"timestamp\"]\n            if not (timestamp.startswith(\"[\") and timestamp.endswith(\"]\")):\n                return False\n            scene_list = eval(timestamp)\n            scene_list = [(FrameTimecode(s, fps=100), FrameTimecode(t, fps=100)) for s, t in scene_list]\n        else:\n            scene_list = [None]\n        if args.drop_invalid_timestamps:\n            return True\n    except Exception as e:\n        if args.drop_invalid_timestamps:\n            return False\n\n    if \"relpath\" in row:\n        save_dir = os.path.dirname(os.path.join(args.save_dir, row[\"relpath\"]))\n        os.makedirs(save_dir, exist_ok=True)\n    else:\n        save_dir = args.save_dir\n\n    shorter_size = args.shorter_size\n    if (shorter_size is not None) and (\"height\" in row) and (\"width\" in row):\n        min_size = min(row[\"height\"], row[\"width\"])\n        if min_size <= shorter_size:\n            shorter_size = None\n\n    split_video(\n        video_path,\n        scene_list,\n        save_dir=save_dir,\n        min_seconds=args.min_seconds,\n        max_seconds=args.max_seconds,\n        target_fps=args.target_fps,\n        shorter_size=shorter_size,\n        logger=logger,\n    )\n    return True\n\ndef split_video(\n    video_path,\n    scene_list,\n    save_dir,\n    
min_seconds=2,\n    max_seconds=15,\n    target_fps=30,\n    shorter_size=None,\n    verbose=False,\n    logger=None,\n):\n    \"\"\"\n    scenes shorter than min_seconds will be ignored;\n    scenes longer than max_seconds will be cut to save the beginning max_seconds.\n    Currently, the saved file name pattern is f'{fname}_scene-{idx}'.mp4\n\n    Args:\n        scene_list (List[Tuple[FrameTimecode, FrameTimecode]]): each element is (s, t): start and end of a scene.\n        min_seconds (float | None)\n        max_seconds (float | None)\n        target_fps (int | None)\n        shorter_size (int | None)\n    \"\"\"\n    FFMPEG_PATH = get_ffmpeg_exe()\n\n    save_path_list = []\n    for idx, scene in enumerate(scene_list):\n        if scene is not None:\n            s, t = scene  # FrameTimecode\n            if min_seconds is not None:\n                if (t - s).get_seconds() < min_seconds:\n                    continue\n\n            duration = t - s\n            if max_seconds is not None:\n                fps = s.framerate\n                max_duration = FrameTimecode(max_seconds, fps=fps)\n                duration = min(max_duration, duration)\n\n        # save path\n        fname = os.path.basename(video_path)\n        fname_wo_ext = os.path.splitext(fname)[0]\n        # TODO: fname pattern\n        save_path = os.path.join(save_dir, f\"{fname_wo_ext}_scene-{idx}.mp4\")\n        if os.path.exists(save_path):\n            # print_log(f\"File '{save_path}' already exists. Skip.\", logger=logger)\n            continue\n        \n        # ffmpeg cmd\n        cmd = [FFMPEG_PATH]\n\n        # Only show ffmpeg output for the first call, which will display any\n        # errors if it fails, and then break the loop. 
We only show error messages\n        # for the remaining calls.\n        # cmd += ['-v', 'error']\n\n        # clip to cut\n        # Note: -ss after -i is very slow; put -ss before -i !!!\n        if scene is None:\n            cmd += [\"-nostdin\", \"-y\", \"-i\", video_path]\n        else:\n            cmd += [\"-nostdin\", \"-y\", \"-ss\", str(s.get_seconds()), \"-i\", video_path, \"-t\", str(duration.get_seconds())]\n\n        # target fps\n        if target_fps is not None:\n            cmd += [\"-r\", f\"{target_fps}\"]\n\n        # aspect ratio\n        if shorter_size is not None:\n            cmd += [\"-vf\", f\"scale='if(gt(iw,ih),-2,{shorter_size})':'if(gt(iw,ih),{shorter_size},-2)'\"]\n            # cmd += ['-vf', f\"scale='if(gt(iw,ih),{shorter_size},trunc(ow/a/2)*2)':-2\"]\n\n        cmd += [\"-map\", \"0:v\", save_path]\n        # print(cmd)\n        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)\n        stdout, stderr = proc.communicate()\n        # stdout = stdout.decode(\"utf-8\")\n        # print_log(stdout, logger=logger)\n\n        save_path_list.append(save_path)\n        if verbose:\n            print_log(f\"Video clip saved to '{save_path}'\", logger=logger)\n\n    return save_path_list\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"meta_path\", type=str)\n    parser.add_argument(\"--save_dir\", type=str)\n    parser.add_argument(\n        \"--min_seconds\", type=float, default=None, help=\"if not None, clip shorter than min_seconds is ignored\"\n    )\n    parser.add_argument(\n        \"--max_seconds\", type=float, default=None, help=\"if not None, clip longer than max_seconds is truncated\"\n    )\n    parser.add_argument(\"--target_fps\", type=int, default=None, help=\"target fps of clips\")\n    parser.add_argument(\n        \"--shorter_size\", type=int, default=None, help=\"resize the shorter size by keeping ratio; will not do upscale\"\n    )\n    
parser.add_argument(\"--num_workers\", type=int, default=None, help=\"#workers for pandarallel\")\n    parser.add_argument(\"--disable_parallel\", action=\"store_true\", help=\"disable parallel processing\")\n    parser.add_argument(\"--drop_invalid_timestamps\", action=\"store_true\", help=\"drop rows with invalid timestamps\")\n    args = parser.parse_args()\n    return args\n\n\ndef main():\n    args = parse_args()\n    meta_path = args.meta_path\n    if not os.path.exists(meta_path):\n        print(f\"Meta file '{meta_path}' not found. Exit.\")\n        exit()\n\n    # create save_dir\n    os.makedirs(args.save_dir, exist_ok=True)\n\n    # initialize pandarallel\n    if not args.disable_parallel:\n        if args.num_workers is not None:\n            pandarallel.initialize(progress_bar=True, nb_workers=args.num_workers)\n        else:\n            pandarallel.initialize(progress_bar=True)\n    process_single_row_partial = partial(process_single_row, args=args)\n\n    # process\n    meta = pd.read_csv(args.meta_path)\n    if not args.disable_parallel:\n        results = meta.parallel_apply(process_single_row_partial, axis=1)\n    else:\n        results = meta.apply(process_single_row_partial, axis=1)\n    if args.drop_invalid_timestamps:\n        meta = meta[results]\n        assert args.meta_path.endswith(\"timestamp.csv\"), \"Only support *timestamp.csv\"\n        meta.to_csv(args.meta_path.replace(\"timestamp.csv\", \"correct_timestamp.csv\"), index=False)\n        print(f\"Corrected timestamp file saved to '{args.meta_path.replace('timestamp.csv', 'correct_timestamp.csv')}'\")\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/scene_cut/scene_detect.py",
    "content": "import argparse\nimport os\n\nimport numpy as np\nimport pandas as pd\nfrom pandarallel import pandarallel\nfrom scenedetect import AdaptiveDetector, detect\nfrom tqdm import tqdm\n\ntqdm.pandas()\n\n\ndef process_single_row(row):\n    # windows\n    # from scenedetect import detect, ContentDetector, AdaptiveDetector\n\n    video_path = row[\"path\"]\n\n    detector = AdaptiveDetector(\n        adaptive_threshold=3.0,\n        # luma_only=True,\n    )\n    # detector = ContentDetector()\n    # TODO: catch error here\n    try:\n        scene_list = detect(video_path, detector, start_in_scene=True)\n        timestamp = [(s.get_timecode(), t.get_timecode()) for s, t in scene_list]\n        return True, str(timestamp)\n    except Exception as e:\n        print(f\"Video '{video_path}' with error {e}\")\n        return False, \"\"\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"meta_path\", type=str)\n    parser.add_argument(\"--num_workers\", type=int, default=None, help=\"#workers for pandarallel\")\n\n    args = parser.parse_args()\n    return args\n\n\ndef main():\n    args = parse_args()\n    meta_path = args.meta_path\n    if not os.path.exists(meta_path):\n        print(f\"Meta file '{meta_path}' not found. Exit.\")\n        exit()\n\n    if args.num_workers is not None:\n        pandarallel.initialize(progress_bar=True, nb_workers=args.num_workers)\n    else:\n        pandarallel.initialize(progress_bar=True)\n\n    meta = pd.read_csv(meta_path)\n    ret = meta.parallel_apply(process_single_row, axis=1)\n\n    succ, timestamps = list(zip(*ret))\n    meta[\"timestamp\"] = timestamps\n    meta = meta[np.array(succ)]\n\n    wo_ext, ext = os.path.splitext(meta_path)\n    out_path = f\"{wo_ext}_timestamp{ext}\"\n    meta.to_csv(out_path, index=False)\n    print(f\"New meta (shape={meta.shape}) with timestamp saved to '{out_path}'.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/scoring/aesthetic/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/tools/scoring/aesthetic/inference.py",
    "content": "# adapted from https://github.com/christophschuhmann/improved-aesthetic-predictor/blob/main/simple_inference.py\nimport cv2  # isort:skip\n\nimport argparse\nimport gc\nimport os\nfrom datetime import timedelta\n\nimport clip\nimport numpy as np\nimport pandas as pd\nimport torch\nimport torch.distributed as dist\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom einops import rearrange\nfrom torch.utils.data import DataLoader, DistributedSampler\nfrom torchvision.datasets.folder import pil_loader\nfrom tqdm import tqdm\n\nfrom tools.datasets.utils import extract_frames, is_video\n\nNUM_FRAMES_POINTS = {\n    1: (0.5,),\n    2: (0.25, 0.5),\n    3: (0.1, 0.5, 0.9),\n}\n\n\ndef merge_scores(gathered_list: list, meta: pd.DataFrame, column):\n    # reorder\n    indices_list = list(map(lambda x: x[0], gathered_list))\n    scores_list = list(map(lambda x: x[1], gathered_list))\n\n    flat_indices = []\n    for x in zip(*indices_list):\n        flat_indices.extend(x)\n    flat_scores = []\n    for x in zip(*scores_list):\n        flat_scores.extend(x)\n    flat_indices = np.array(flat_indices)\n    flat_scores = np.array(flat_scores)\n\n    # filter duplicates\n    unique_indices, unique_indices_idx = np.unique(flat_indices, return_index=True)\n    meta.loc[unique_indices, column] = flat_scores[unique_indices_idx]\n\n    # drop indices in meta not in unique_indices\n    meta = meta.loc[unique_indices]\n    return meta\n\n\nclass VideoTextDataset(torch.utils.data.Dataset):\n    def __init__(self, meta_path, transform=None, num_frames=3):\n        self.meta_path = meta_path\n        self.meta = pd.read_csv(meta_path)\n        self.transform = transform\n        self.points = NUM_FRAMES_POINTS[num_frames]\n\n    def __getitem__(self, index):\n        sample = self.meta.iloc[index]\n        path = sample[\"path\"]\n\n        # extract frames\n        if not is_video(path):\n            images = [pil_loader(path)]\n        else:\n            
num_frames = sample[\"num_frames\"] if \"num_frames\" in sample else None\n            images = extract_frames(sample[\"path\"], points=self.points, backend=\"opencv\", num_frames=num_frames)\n\n        # transform\n        images = [self.transform(img) for img in images]\n\n        # stack\n        images = torch.stack(images)\n\n        ret = dict(index=index, images=images)\n        return ret\n\n    def __len__(self):\n        return len(self.meta)\n\n\nclass MLP(nn.Module):\n    def __init__(self, input_size):\n        super().__init__()\n        self.input_size = input_size\n        self.layers = nn.Sequential(\n            nn.Linear(self.input_size, 1024),\n            nn.Dropout(0.2),\n            nn.Linear(1024, 128),\n            nn.Dropout(0.2),\n            nn.Linear(128, 64),\n            nn.Dropout(0.1),\n            nn.Linear(64, 16),\n            nn.Linear(16, 1),\n        )\n\n    def forward(self, x):\n        return self.layers(x)\n\n\nclass AestheticScorer(nn.Module):\n    def __init__(self, input_size, device):\n        super().__init__()\n        self.mlp = MLP(input_size)\n        self.clip, self.preprocess = clip.load(\"ViT-L/14\", device=device)\n\n        self.eval()\n        self.to(device)\n\n    def forward(self, x):\n        image_features = self.clip.encode_image(x)\n        image_features = F.normalize(image_features, p=2, dim=-1).float()\n        return self.mlp(image_features)\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"meta_path\", type=str, help=\"Path to the input CSV file\")\n    parser.add_argument(\"--bs\", type=int, default=1024, help=\"Batch size\")\n    parser.add_argument(\"--num_workers\", type=int, default=16, help=\"Number of workers\")\n    parser.add_argument(\"--prefetch_factor\", type=int, default=3, help=\"Prefetch factor\")\n    parser.add_argument(\"--num_frames\", type=int, default=3, help=\"Number of frames to extract\")\n    
parser.add_argument(\"--skip_if_existing\", action=\"store_true\")\n    args = parser.parse_args()\n\n    return args\n\n\ndef main():\n    args = parse_args()\n\n    meta_path = args.meta_path\n    if not os.path.exists(meta_path):\n        print(f\"Meta file '{meta_path}' not found. Exit.\")\n        exit()\n\n    wo_ext, ext = os.path.splitext(meta_path)\n    out_path = f\"{wo_ext}_aes{ext}\"\n    if args.skip_if_existing and os.path.exists(out_path):\n        print(f\"Output meta file '{out_path}' already exists. Exit.\")\n        exit()\n\n    dist.init_process_group(backend=\"nccl\", timeout=timedelta(hours=24))\n    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())\n\n    # build model\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n    model = AestheticScorer(768, device)\n    model.mlp.load_state_dict(torch.load(\"pretrained_models/aesthetic.pth\", map_location=device))\n    preprocess = model.preprocess\n\n    # build dataset\n    dataset = VideoTextDataset(args.meta_path, transform=preprocess, num_frames=args.num_frames)\n    dataloader = DataLoader(\n        dataset,\n        batch_size=args.bs,\n        num_workers=args.num_workers,\n        sampler=DistributedSampler(\n            dataset,\n            num_replicas=dist.get_world_size(),\n            rank=dist.get_rank(),\n            shuffle=False,\n            drop_last=False,\n        ),\n    )\n\n    # compute aesthetic scores\n    indices_list = []\n    scores_list = []\n    model.eval()\n    for batch in tqdm(dataloader, disable=dist.get_rank() != 0):\n        indices = batch[\"index\"]\n        images = batch[\"images\"].to(device, non_blocking=True)\n\n        B = images.shape[0]\n        images = rearrange(images, \"B N C H W -> (B N) C H W\")\n\n        # compute score\n        with torch.no_grad():\n            scores = model(images)\n\n        scores = rearrange(scores, \"(B N) 1 -> B N\", B=B)\n        scores = scores.mean(dim=1)\n        scores_np = 
scores.to(torch.float32).cpu().numpy()\n\n        indices_list.extend(indices.tolist())\n        scores_list.extend(scores_np.tolist())\n\n    # save local results\n    meta_local = merge_scores([(indices_list, scores_list)], dataset.meta, column=\"aes\")\n    save_dir_local = os.path.join(os.path.dirname(out_path), \"parts\")\n    os.makedirs(save_dir_local, exist_ok=True)\n    out_path_local = os.path.join(\n        save_dir_local, os.path.basename(out_path).replace(\".csv\", f\"_part_{dist.get_rank()}.csv\")\n    )\n    meta_local.to_csv(out_path_local, index=False)\n\n    # wait for all ranks to finish data processing\n    dist.barrier()\n\n    torch.cuda.empty_cache()\n    gc.collect()\n    gathered_list = [None] * dist.get_world_size()\n    dist.all_gather_object(gathered_list, (indices_list, scores_list))\n    if dist.get_rank() == 0:\n        meta_new = merge_scores(gathered_list, dataset.meta, column=\"aes\")\n        meta_new.to_csv(out_path, index=False)\n        print(f\"New meta with aesthetic scores saved to '{out_path}'.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/build/lib/tools/scoring/matching/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/tools/scoring/matching/inference.py",
    "content": "import argparse\nimport os\n\nimport clip\nimport colossalai\nimport numpy as np\nimport pandas as pd\nimport torch\nimport torch.distributed as dist\nimport torch.nn.functional as F\nfrom torch.utils.data import DataLoader, DistributedSampler\nfrom torchvision.datasets.folder import pil_loader\nfrom tqdm import tqdm\n\nfrom tools.datasets.utils import extract_frames, is_video\n\n\ndef merge_scores(gathered_list: list, meta: pd.DataFrame, column):\n    # reorder\n    indices_list = list(map(lambda x: x[0], gathered_list))\n    scores_list = list(map(lambda x: x[1], gathered_list))\n\n    flat_indices = []\n    for x in zip(*indices_list):\n        flat_indices.extend(x)\n    flat_scores = []\n    for x in zip(*scores_list):\n        flat_scores.extend(x)\n    flat_indices = np.array(flat_indices)\n    flat_scores = np.array(flat_scores)\n\n    # filter duplicates\n    unique_indices, unique_indices_idx = np.unique(flat_indices, return_index=True)\n    meta.loc[unique_indices, column] = flat_scores[unique_indices_idx]\n    return meta\n\n\nclass VideoTextDataset(torch.utils.data.Dataset):\n    def __init__(self, meta_path, transform):\n        self.meta_path = meta_path\n        self.meta = pd.read_csv(meta_path)\n        self.transform = transform\n\n    def __getitem__(self, index):\n        row = self.meta.iloc[index]\n        path = row[\"path\"]\n\n        if is_video(path):\n            img = extract_frames(path, points=[0.5], backend=\"opencv\")[0]\n        else:\n            img = pil_loader(path)\n\n        img = self.transform(img)\n\n        text = row[\"text\"]\n        text = clip.tokenize(text, truncate=True).squeeze()\n\n        return img, text, index\n\n    def __len__(self):\n        return len(self.meta)\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"meta_path\", type=str, help=\"Path to the input CSV file\")\n    parser.add_argument(\"--bs\", type=int, default=16, help=\"Batch size\")\n   
 parser.add_argument(\"--num_workers\", type=int, default=16, help=\"Number of workers\")\n    parser.add_argument(\"--skip_if_existing\", action=\"store_true\")\n    args = parser.parse_args()\n    return args\n\n\ndef main():\n    args = parse_args()\n\n    meta_path = args.meta_path\n    if not os.path.exists(meta_path):\n        print(f\"Meta file '{meta_path}' not found. Exit.\")\n        exit()\n\n    wo_ext, ext = os.path.splitext(meta_path)\n    out_path = f\"{wo_ext}_match{ext}\"\n    if args.skip_if_existing and os.path.exists(out_path):\n        print(f\"Output meta file '{out_path}' already exists. Exit.\")\n        exit()\n\n    colossalai.launch_from_torch({})\n\n    # build model\n    device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n    model, preprocess = clip.load(\"ViT-L/14\", device=device)\n    logit_scale = model.logit_scale.exp().item()\n\n    # build dataset\n    dataset = VideoTextDataset(meta_path=meta_path, transform=preprocess)\n    dataloader = DataLoader(\n        dataset,\n        batch_size=args.bs,\n        num_workers=args.num_workers,\n        sampler=DistributedSampler(\n            dataset,\n            num_replicas=dist.get_world_size(),\n            rank=dist.get_rank(),\n            shuffle=False,\n            drop_last=False,\n        ),\n    )\n\n    # compute scores\n    indices_list = []\n    scores_list = []\n    model.eval()\n    for imgs, text, indices in tqdm(dataloader, disable=dist.get_rank() != 0):\n        imgs = imgs.to(device)\n        text = text.to(device)\n\n        with torch.no_grad():\n            feat_img = model.encode_image(imgs)\n            feat_text = model.encode_text(text)\n\n        feat_img = F.normalize(feat_img, dim=1)\n        feat_text = F.normalize(feat_text, dim=1)\n        clip_scores = logit_scale * (feat_img * feat_text).sum(dim=1)\n        clip_scores = clip_scores.cpu().tolist()\n        indices_list.extend(indices)\n        
scores_list.extend(clip_scores)\n\n    gathered_list = [None] * dist.get_world_size()\n    dist.all_gather_object(gathered_list, (indices_list, scores_list))\n    if dist.get_rank() == 0:\n        meta_new = merge_scores(gathered_list, dataset.meta, column=\"match\")\n        meta_new.to_csv(out_path, index=False)\n        print(f\"New meta with matching scores saved to '{out_path}'.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/__init__.py",
    "content": "import os\n\nfrom .utils import get_prompt_from_filename, init_submodules, save_json, load_json\nimport importlib\nfrom itertools import chain\nfrom pathlib import Path\n\nclass VBench(object):\n    def __init__(self, device, full_info_dir, output_path):\n        self.device = device                        # cuda or cpu\n        self.full_info_dir = full_info_dir          # full json file that VBench originally provides\n        self.output_path = output_path              # output directory to save VBench results\n        if not os.path.exists(self.output_path):\n            os.makedirs(self.output_path, exist_ok=False)\n\n    def build_full_dimension_list(self, ):\n        return [\"subject_consistency\", \"background_consistency\", \"aesthetic_quality\", \"imaging_quality\", \"object_class\", \"multiple_objects\", \"color\", \"spatial_relationship\", \"scene\", \"temporal_style\", 'overall_consistency', \"human_action\", \"temporal_flickering\", \"motion_smoothness\", \"dynamic_degree\", \"appearance_style\"]        \n\n    def check_dimension_requires_extra_info(self, dimension_list):\n        dim_custom_not_supported = set(dimension_list) & set([\n            'background_consistency', 'object_class', 'multiple_objects', 'scene', 'appearance_style', 'color', 'spatial_relationship'\n        ])\n        assert len(dim_custom_not_supported) == 0, f\"dimensions : {dim_custom_not_supported} not supported for custom input\"\n\n\n    def build_full_info_json(self, videos_path, name, dimension_list, prompt_list=[], special_str='', verbose=False, mode='vbench_standard', **kwargs):\n        cur_full_info_list=[] # to save the prompt and video path info for the current dimensions\n        if mode=='custom_input':\n            self.check_dimension_requires_extra_info(dimension_list)\n            if os.path.isfile(videos_path):\n                cur_full_info_list = [{\"prompt_en\": get_prompt_from_filename(videos_path), \"dimension\": dimension_list, 
\"video_list\": [videos_path]}]\n                if len(prompt_list) == 1:\n                    cur_full_info_list[0][\"prompt_en\"] = prompt_list[0]\n            else:\n                video_names = os.listdir(videos_path)\n\n                cur_full_info_list = []\n\n                for filename in video_names:\n                    postfix = Path(os.path.join(videos_path, filename)).suffix\n                    if postfix.lower() not in ['.mp4', '.gif', '.jpg', '.png']:\n                        continue\n                    cur_full_info_list.append({\n                        \"prompt_en\": get_prompt_from_filename(filename), \n                        \"dimension\": dimension_list, \n                        \"video_list\": [os.path.join(videos_path, filename)]\n                    })\n\n                if len(prompt_list) > 0:\n                    prompt_list = {os.path.join(videos_path, path): prompt_list[path] for path in prompt_list}\n                    assert len(prompt_list) >= len(cur_full_info_list), \"\"\"\n                        Number of prompts should match with number of videos.\\n\n                        Got {len(prompt_list)=}, {len(cur_full_info_list)=}\\n\n                        To read the prompt from filename, delete --prompt_file and --prompt_list\n                        \"\"\"\n\n                    all_video_path = [os.path.abspath(file) for file in list(chain.from_iterable(vid[\"video_list\"] for vid in cur_full_info_list))]\n                    backslash = \"\\n\"\n                    assert len(set(all_video_path) - set([os.path.abspath(path_key) for path_key in prompt_list])) == 0, f\"\"\"\n                    The prompts for the following videos are not found in the prompt file: \\n\n                    {backslash.join(set(all_video_path) - set([os.path.abspath(path_key) for path_key in prompt_list]))}\n                    \"\"\"\n\n                    video_map = {}\n                    for prompt_key in prompt_list:\n               
         video_map[os.path.abspath(prompt_key)] = prompt_list[prompt_key]\n\n                    for video_info in cur_full_info_list:\n                        video_info[\"prompt_en\"] = video_map[os.path.abspath(video_info[\"video_list\"][0])]\n\n        elif mode=='vbench_category':\n            self.check_dimension_requires_extra_info(dimension_list)\n            CUR_DIR = os.path.dirname(os.path.abspath(__file__))\n            category_supported = [ Path(category).stem for category in os.listdir(f'prompts/prompts_per_category') ]# TODO: probably need refactoring again\n            if 'category' not in kwargs:\n                category = category_supported\n            else:\n                category = kwargs['category']\n\n            assert category is not None, \"Please specify the category to be evaluated with --category\"\n            assert category in category_supported, f'''\n            The following category is not supported, {category}.\n            '''\n\n            video_names = os.listdir(videos_path)\n            postfix = Path(video_names[0]).suffix\n\n            with open(f'{CUR_DIR}/prompts_per_category/{category}.txt', 'r') as f:\n                video_prompts = [line.strip() for line in f.readlines()]\n\n            for prompt in video_prompts:\n                video_list = []\n                for filename in video_names:\n                    if (not Path(filename).stem.startswith(prompt)):\n                        continue\n                    postfix = Path(os.path.join(videos_path, filename)).suffix\n                    if postfix.lower() not in ['.mp4', '.gif', '.jpg', '.png']:\n                        continue\n                    video_list.append(os.path.join(videos_path, filename))\n\n                cur_full_info_list.append({\n                    \"prompt_en\": prompt, \n                    \"dimension\": dimension_list, \n                    \"video_list\": video_list \n                })\n\n        else:\n            
full_info_list = load_json(self.full_info_dir)\n            video_names = os.listdir(videos_path)\n            postfix = Path(video_names[0]).suffix\n            for prompt_dict in full_info_list:\n                # if the prompt belongs to any dimension we want to evaluate\n                if set(dimension_list) & set(prompt_dict[\"dimension\"]): \n                    prompt = prompt_dict['prompt_en']\n                    prompt_dict['video_list'] = []\n                    for i in range(5): # video index for the same prompt\n                        intended_video_name = f'{prompt}{special_str}-{str(i)}{postfix}'\n                        if intended_video_name in video_names: # if the video exists\n                            intended_video_path = os.path.join(videos_path, intended_video_name)\n                            prompt_dict['video_list'].append(intended_video_path)\n                            if verbose:\n                                print(f'Successfully found video: {intended_video_name}')\n                        else:\n                            print(f'WARNING!!! This required video is not found! Missing benchmark videos can lead to unfair evaluation result. 
The missing video is: {intended_video_name}')\n                    cur_full_info_list.append(prompt_dict)\n\n        \n        cur_full_info_path = os.path.join(self.output_path, name+'_full_info.json')\n        save_json(cur_full_info_list, cur_full_info_path)\n        print(f'Evaluation meta data saved to {cur_full_info_path}')\n        return cur_full_info_path\n\n\n    def evaluate(self, videos_path, name, prompt_list=[], dimension_list=None, local=False, read_frame=False, mode='vbench_standard', **kwargs):\n        results_dict = {}\n        if dimension_list is None:\n            dimension_list = self.build_full_dimension_list()\n        submodules_dict = init_submodules(dimension_list, local=local, read_frame=read_frame)\n\n        cur_full_info_path = self.build_full_info_json(videos_path, name, dimension_list, prompt_list, mode=mode, **kwargs)\n        \n        for dimension in dimension_list:\n            try:\n                dimension_module = importlib.import_module(f'vbench.{dimension}')\n                evaluate_func = getattr(dimension_module, f'compute_{dimension}')\n            except Exception as e:\n                raise NotImplementedError(f'UnImplemented dimension {dimension}!, {e}')\n            submodules_list = submodules_dict[dimension]\n            print(f'cur_full_info_path: {cur_full_info_path}') # TODO: to delete\n            results = evaluate_func(cur_full_info_path, self.device, submodules_list, **kwargs)\n            results_dict[dimension] = results\n        output_name = os.path.join(self.output_path, name+'_eval_results.json')\n        save_json(results_dict, output_name)\n        print(f'Evaluation results saved to {output_name}')\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/aesthetic_quality.py",
    "content": "import os\nimport clip\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport subprocess\nfrom urllib.request import urlretrieve\nfrom vbench.utils import load_video, load_dimension_info, clip_transform\nfrom tqdm import tqdm\n\n\ndef get_aesthetic_model(cache_folder):\n    \"\"\"load the aethetic model\"\"\"\n    path_to_model = cache_folder + \"/sa_0_4_vit_l_14_linear.pth\"\n    if not os.path.exists(path_to_model):\n        os.makedirs(cache_folder, exist_ok=True)\n        url_model = (\n            \"https://github.com/LAION-AI/aesthetic-predictor/blob/main/sa_0_4_vit_l_14_linear.pth?raw=true\"\n        )\n        # download aesthetic predictor\n        if not os.path.isfile(path_to_model):\n            try:\n                print(f'trying urlretrieve to download {url_model} to {path_to_model}')\n                urlretrieve(url_model, path_to_model) # unable to download https://github.com/LAION-AI/aesthetic-predictor/blob/main/sa_0_4_vit_l_14_linear.pth?raw=true to pretrained/aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth \n            except:\n                print(f'unable to download {url_model} to {path_to_model} using urlretrieve, trying wget')\n                wget_command = ['wget', url_model, '-P', os.path.dirname(path_to_model)]\n                subprocess.run(wget_command)\n    m = nn.Linear(768, 1)\n    s = torch.load(path_to_model)\n    m.load_state_dict(s)\n    m.eval()\n    return m\n\n\ndef laion_aesthetic(aesthetic_model, clip_model, video_list, device):\n    aesthetic_model.eval()\n    clip_model.eval()\n    aesthetic_avg = 0.0\n    num = 0\n    video_results = []\n    for video_path in tqdm(video_list):\n        images = load_video(video_path)\n        image_transform = clip_transform(224)\n        images = image_transform(images)\n        images = images.to(device)\n        image_feats = clip_model.encode_image(images).to(torch.float32)\n        image_feats = F.normalize(image_feats, dim=-1, 
p=2)\n        aesthetic_scores = aesthetic_model(image_feats).squeeze()\n        normalized_aesthetic_scores = aesthetic_scores/10\n        cur_avg = torch.mean(normalized_aesthetic_scores, dim=0, keepdim=True)\n        aesthetic_avg += cur_avg.item()\n        num += 1\n        video_results.append({'video_path': video_path, 'video_results': cur_avg.item()})\n    aesthetic_avg /= num\n    return aesthetic_avg, video_results\n\n\ndef compute_aesthetic_quality(json_dir, device, submodules_list, **kwargs):\n    vit_path = submodules_list[0]\n    aes_path = submodules_list[1]\n    aesthetic_model = get_aesthetic_model(aes_path).to(device)\n    clip_model, preprocess = clip.load(vit_path, device=device)\n    video_list, _ = load_dimension_info(json_dir, dimension='aesthetic_quality', lang='en')\n    all_results, video_results = laion_aesthetic(aesthetic_model, clip_model, video_list, device)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/appearance_style.py",
    "content": "import os\nimport json\nimport numpy as np\nfrom tqdm import tqdm\n\nimport torch\nimport clip\nfrom PIL import Image\nfrom vbench.utils import load_video, load_dimension_info, clip_transform, read_frames_decord_by_fps, clip_transform_Image\n\ndef get_text_features(model, input_text, tokenizer, text_feature_dict={}):\n    if input_text in text_feature_dict:\n        return text_feature_dict[input_text]\n    text_template= f\"{input_text}\"\n    with torch.no_grad():\n        text_features = model.encode_text(text_template).float()\n        text_features /= text_features.norm(dim=-1, keepdim=True)      \n        text_feature_dict[input_text] = text_features\n    return text_features\n\ndef get_vid_features(model, input_frames):\n    with torch.no_grad():\n        clip_feat = model.encode_vision(input_frames,test=True).float()\n        clip_feat /= clip_feat.norm(dim=-1, keepdim=True)    \n    return clip_feat\n\ndef get_predict_label(clip_feature, text_feats_tensor, top=5):\n    label_probs = (100.0 * clip_feature @ text_feats_tensor.T).softmax(dim=-1)\n    top_probs, top_labels = label_probs.cpu().topk(top, dim=-1)\n    return top_probs, top_labels\n\ndef appearance_style(clip_model, video_dict, device, sample=\"rand\"):\n    sim = 0.0\n    cnt = 0\n    video_results = []\n    image_transform = clip_transform_Image(224)\n    for info in tqdm(video_dict):\n        if 'auxiliary_info' not in info:\n            raise \"Auxiliary info is not in json, please check your json.\"\n        query = info['auxiliary_info']['appearance_style']\n        text = clip.tokenize([query]).to(device)\n        video_list = info['video_list']\n        for video_path in video_list:\n            cur_video = []\n            with torch.no_grad():\n                video_arrays = load_video(video_path, return_tensor=False)\n                images = [Image.fromarray(i) for i in video_arrays]\n                for image in images:\n                    image = 
image_transform(image)\n                    image = image.to(device)\n                    logits_per_image, logits_per_text = clip_model(image.unsqueeze(0), text)\n                    cur_sim = float(logits_per_text[0][0].cpu())\n                    cur_sim = cur_sim / 100\n                    cur_video.append(cur_sim)\n                    sim += cur_sim\n                    cnt +=1\n                video_sim = np.mean(cur_video)\n                video_results.append({'video_path': video_path, 'video_results': video_sim, 'frame_results':cur_video})\n    sim_per_frame = sim / cnt\n    return sim_per_frame, video_results\n\ndef compute_appearance_style(json_dir, device, submodules_list, **kwargs):\n    clip_model, preprocess = clip.load(device=device, **submodules_list)\n    _, video_dict = load_dimension_info(json_dir, dimension='appearance_style', lang='en')\n    all_results, video_results = appearance_style(clip_model, video_dict, device)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/background_consistency.py",
    "content": "import os\nimport json\nimport logging\nimport numpy as np\nimport clip\nfrom PIL import Image\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom vbench.utils import load_video, load_dimension_info, clip_transform\nfrom tqdm import tqdm\n\n\ndef background_consistency(clip_model, preprocess, video_list, device, read_frame):\n    sim = 0.0\n    cnt = 0\n    video_results = []\n    image_transform = clip_transform(224)\n    for video_path in tqdm(video_list):\n        video_sim = 0.0\n        if read_frame:\n            video_path = video_path[:-4].replace('videos', 'frames').replace(' ', '_')\n            tmp_paths = [os.path.join(video_path, f) for f in sorted(os.listdir(video_path))]\n            images = []\n            for tmp_path in tmp_paths:\n                images.append(preprocess(Image.open(tmp_path)))\n            images = torch.stack(images)\n        else:\n            images = load_video(video_path)\n            images = image_transform(images)\n        images = images.to(device)\n        image_features = clip_model.encode_image(images)\n        image_features = F.normalize(image_features, dim=-1, p=2)\n        for i in range(len(image_features)):\n            image_feature = image_features[i].unsqueeze(0)\n            if i == 0:\n                first_image_feature = image_feature\n            else:\n                sim_pre = max(0.0, F.cosine_similarity(former_image_feature, image_feature).item())\n                sim_fir = max(0.0, F.cosine_similarity(first_image_feature, image_feature).item())\n                cur_sim = (sim_pre + sim_fir) / 2\n                video_sim += cur_sim\n                cnt += 1\n            former_image_feature = image_feature\n        sim_per_image = video_sim / (len(image_features) - 1)\n        sim += video_sim\n        video_results.append({'video_path': video_path, 'video_results': sim_per_image})\n    sim_per_video = sim / (len(video_list) - 1)\n    sim_per_frame = sim / 
cnt\n    return sim_per_frame, video_results\n\n\ndef compute_background_consistency(json_dir, device, submodules_list, **kwargs):\n    vit_path, read_frame = submodules_list[0], submodules_list[1]\n    clip_model, preprocess = clip.load(vit_path, device=device)\n    video_list, _ = load_dimension_info(json_dir, dimension='background_consistency', lang='en')\n    all_results, video_results = background_consistency(clip_model, preprocess, video_list, device, read_frame)\n    return all_results, video_results\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/cli/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/cli/evaluate.py",
    "content": "import torch\nimport os\nfrom vbench import VBench\nfrom datetime import datetime\nimport argparse\nimport json\n\nCUR_DIR = os.path.dirname(os.path.abspath(__file__))\ndef register_subparsers(subparser):\n    parser = subparser.add_parser('evaluate', formatter_class=argparse.RawTextHelpFormatter)\n    parser.add_argument(\n        \"--output_path\",\n        type=str,\n        default='./evaluation_results/',\n        help=\"output path to save the evaluation results\",\n    )\n    parser.add_argument(\n        \"--full_json_dir\",\n        type=str,\n        default=f'{CUR_DIR}/../VBench_full_info.json',\n        help=\"path to save the json file that contains the prompt and dimension information\",\n    )\n    parser.add_argument(\n        \"--videos_path\",\n        type=str,\n        required=True,\n        help=\"folder that contains the sampled videos\",\n    )\n    parser.add_argument(\n        \"--dimension\",\n        nargs='+',\n        required=True,\n        help=\"list of evaluation dimensions, usage: --dimension <dim_1> <dim_2>\",\n    )\n    parser.add_argument(\n        \"--load_ckpt_from_local\",\n        type=bool,\n        required=False,\n        help=\"whether load checkpoints from local default paths (assuming you have downloaded the checkpoints locally\",\n    )\n    parser.add_argument(\n        \"--read_frame\",\n        type=bool,\n        required=False,\n        help=\"whether directly read frames, or directly read videos\",\n    )\n    parser.add_argument(\n        \"--mode\",\n        choices=['custom_input', 'vbench_standard', 'vbench_category'],\n        default='vbench_standard',\n        help=\"\"\"This flags determine the mode of evaluations, choose one of the following:\n        1. \"custom_input\": receive input prompt from either --prompt/--prompt_file flags or the filename\n        2. \"vbench_standard\": evaluate on standard prompt suite of VBench\n        3. 
\"vbench_category\": evaluate on specific category\n        \"\"\",\n    )\n    parser.add_argument(\n        \"--custom_input\",\n        action=\"store_true\",\n        required=False,\n        help=\"(deprecated) use --mode=\\\"custom_input\\\" instead\",\n    )\n    parser.add_argument(\n        \"--prompt\",\n        type=str,\n        default=\"\",\n        help=\"\"\"Specify the input prompt\n        If not specified, filenames will be used as input prompts\n        * Mutually exclusive to --prompt_file.\n        ** This option must be used with --custom_input flag\n        \"\"\"\n    )\n    parser.add_argument(\n        \"--prompt_file\",\n        type=str,\n        required=False,\n        help=\"\"\"Specify the path of the file that contains prompt lists\n        If not specified, filenames will be used as input prompts\n        * Mutually exclusive to --prompt.\n        ** This option must be used with --custom_input flag\n        \"\"\"\n    )\n    parser.add_argument(\n        \"--category\",\n        type=str,\n        required=False,\n        help=\"\"\"This is for mode=='vbench_category'\n        The category to evaluate on, usage: --category=animal.\n        \"\"\",\n    )\n\n    ## for dimension specific params ###\n    parser.add_argument(\n        \"--imaging_quality_preprocessing_mode\",\n        type=str,\n        required=False,\n        default='longer',\n        help=\"\"\"This is for setting preprocessing in imaging_quality\n        1. 'shorter': if the shorter side is more than 512, the image is resized so that the shorter side is 512.\n        2. 'longer': if the longer side is more than 512, the image is resized so that the longer side is 512.\n        3. 'shorter_centercrop': if the shorter side is more than 512, the image is resized so that the shorter side is 512. \n        Then the center 512 x 512 after resized is used for evaluation.\n        4. 
'None': no preprocessing\n        \"\"\",\n    )\n    parser.set_defaults(func=evaluate)\n\ndef evaluate(args):\n    print(f'args: {args}')\n\n    device = torch.device(\"cuda\")\n    my_VBench = VBench(device, args.full_json_dir, args.output_path)\n    \n    print('start evaluation')\n    \n    current_time = datetime.now().strftime('%Y-%m-%d-%H:%M:%S')\n\n    kwargs = {}\n\n    prompt = []\n\n    assert not args.custom_input, \"(Deprecated) use --mode=custom_input instead\"\n    \n    if (args.prompt_file is not None) and (args.prompt != \"\"):\n        raise Exception(\"--prompt_file and --prompt cannot be used together\")\n    if (args.prompt_file is not None or args.prompt != \"\") and (not args.mode=='custom_input'):\n        raise Exception(\"must set --mode=custom_input for using external prompt\")\n\n    if args.prompt_file:\n        with open(args.prompt_file, 'r') as f:\n            prompt = json.load(f)\n        assert isinstance(prompt, dict), \"Invalid prompt file format. The correct format is {\\\"video_path\\\": prompt, ... }\"\n    elif args.prompt != \"\":\n        prompt = [args.prompt]\n\n    if args.category != \"\":\n        kwargs['category'] = args.category\n\n    kwargs['imaging_quality_preprocessing_mode'] = args.imaging_quality_preprocessing_mode\n\n    my_VBench.evaluate(\n        videos_path = args.videos_path,\n        name = f'results_{current_time}',\n        prompt_list=prompt, # pass in [] to read prompt from filename\n        dimension_list = args.dimension,\n        local=args.load_ckpt_from_local,\n        read_frame=args.read_frame,\n        mode=args.mode,\n        **kwargs\n    )\n    print('done')\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/cli/static_filter.py",
    "content": "import os\nimport cv2\nimport glob\nimport numpy as np\nimport torch\nfrom tqdm import tqdm\nfrom pathlib import Path\nimport json\nimport shutil\n\nimport logging\nlogging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(__name__)\n\nfrom vbench.utils import CACHE_DIR, get_prompt_from_filename, load_json\nfrom vbench.third_party.RAFT.core.raft import RAFT\nfrom vbench.third_party.RAFT.core.utils_core.utils import InputPadder\n\n\nCUR_DIR = os.path.dirname(os.path.abspath(__file__))\nDEVICE = 'cuda'\n\n\nclass StaticFilter:\n    def __init__(self, args, device):\n        self.args = args\n        self.device = device\n        self.load_model()\n\n\n    def load_model(self):\n        self.model = torch.nn.DataParallel(RAFT(self.args))\n        self.model.load_state_dict(torch.load(self.args.model))\n\n        self.model = self.model.module\n        self.model.to(self.device)\n        self.model.eval()\n\n\n    def get_score(self, img, flo):\n        img = img[0].permute(1,2,0).cpu().numpy()\n        flo = flo[0].permute(1,2,0).cpu().numpy()\n\n        u = flo[:,:,0]\n        v = flo[:,:,1]\n        rad = np.sqrt(np.square(u) + np.square(v))\n        \n        h, w = rad.shape\n        rad_flat = rad.flatten()\n        cut_index = int(h*w*0.02)\n\n        max_rad = np.mean(abs(np.sort(-rad_flat))[:cut_index])\n\n        return max_rad\n\n\n    def check_static(self, score_list):\n        thres = self.params[\"thres\"]\n        count_num = self.params[\"count_num\"]\n        count = 0\n        for score in score_list[:-2]:\n            if score > thres:\n                count += 1\n            if count > count_num:\n                return False\n        for score in score_list[-2:]:\n            if score > thres*count_num*2:\n                return False\n        return True\n    \n\n    def set_params(self, frame, count):\n        scale = min(list(frame.shape)[-2:])\n        self.params 
= {\"thres\":3.0*(scale/256.0), \"count_num\":round(2*(count/16.0))}\n\n\n    def infer(self, path):\n        with torch.no_grad():\n            frames = self.get_frames(path)\n            self.set_params(frame=frames[0], count=len(frames))\n            static_score = []\n            for image1, image2 in zip(frames[:-1]+[frames[0],frames[-1]], frames[1:]+[frames[-1],frames[0]]):\n                padder = InputPadder(image1.shape)\n                image1, image2 = padder.pad(image1, image2)\n                _, flow_up = self.model(image1, image2, iters=20, test_mode=True)\n                max_rad = self.get_score(image1, flow_up)\n                static_score.append(max_rad)\n            whether_static = self.check_static(static_score)\n            return whether_static\n\n\n    def get_frames(self, video_path):\n        frame_list = []\n        video = cv2.VideoCapture(video_path)\n        while video.isOpened():\n            success, frame = video.read()\n            if success:\n                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # convert to rgb\n                frame = torch.from_numpy(frame.astype(np.uint8)).permute(2, 0, 1).float()\n                frame = frame[None].to(DEVICE)\n                frame_list.append(frame)\n            else:\n                break\n        video.release()\n        assert frame_list != []\n        return frame_list\n\ndef check_and_move(args, filter_results, target_path=None):\n    if target_path is None:\n         target_path = os.path.join(args.result_path, \"filtered_videos\")\n    os.makedirs(target_path, exist_ok=True)\n    for prompt, v in filter_results.items():\n        if v[\"static_count\"] < 5 and args.filter_scope=='temporal_flickering':\n            logger.warning(f\"Prompt: '{prompt}' has fewer than 5 filter results.\")\n        for i, video_path in enumerate(v[\"static_path\"]):\n            target_name = os.path.join(target_path, f\"{prompt}-{i}.mp4\")\n            shutil.copy(video_path, 
target_name)\n    logger.info(f\"All filtered videos are saved in the '{target_path}' path\")\n\ndef static_filter(args):\n    static_filter = StaticFilter(args, device=DEVICE)\n    prompt_dict = {}\n    prompt_list = []\n    paths = sorted(glob.glob(os.path.join(args.videos_path, \"*.mp4\")))\n    \n    if args.filter_scope=='temporal_flickering':\n        full_prompt_list = load_json(f\"{CUR_DIR}/../VBench_full_info.json\")\n        for prompt in full_prompt_list:\n            if 'temporal_flickering' in prompt['dimension']:\n                prompt_dict[prompt['prompt_en']] = {\"static_count\":0, \"static_path\":[]}\n                prompt_list.append(prompt['prompt_en'])\n\n    elif args.filter_scope=='all':\n        for prompt in paths:\n            prompt = get_prompt_from_filename(prompt)\n            prompt_dict[prompt] = {\"static_count\":0, \"static_path\":[]}\n            prompt_list.append(prompt)\n\n    else:\n        assert os.path.isfile(args.filter_scope) and Path(args.filter_scope).suffix.lower() == '.json', f\"\"\"\n        --filter_scope flag is not correctly set, set to 'all' to filter all videos in the --videos_path directory, \n        or provide the correct path to the JSON file\n        \"\"\"\n        full_prompt_list = load_json(args.filter_scope)\n        for prompt in full_prompt_list:\n            prompt = get_prompt_from_filename(prompt)\n            prompt_dict[prompt] = {\"static_count\":0, \"static_path\":[]}\n            prompt_list.append(prompt)\n    \n    for path in tqdm(paths):\n        name = get_prompt_from_filename(path)\n        if name in prompt_list:\n            if prompt_dict[name][\"static_count\"] < 5 or args.filter_scope != 'temporal_flickering':\n                if static_filter.infer(path):\n                    prompt_dict[name][\"static_count\"] += 1\n                    prompt_dict[name][\"static_path\"].append(path)\n\n    os.makedirs(args.result_path, exist_ok=True)\n    info_file = 
os.path.join(args.result_path, args.store_name)\n    # use a context manager so the file handle is flushed and closed\n    with open(info_file, \"w\") as f:\n        json.dump(prompt_dict, f)\n    logger.info(f\"Filtered results info is saved in the '{info_file}' file\")\n    check_and_move(args, prompt_dict)\n\ndef register_subparsers(subparser):\n    parser = subparser.add_parser('static_filter')\n    parser.add_argument('--model', type=str, default=f\"{CACHE_DIR}/raft_model/models/raft-things.pth\", help=\"restore checkpoint\")\n    parser.add_argument('--videos_path', default=\"\", required=True, help=\"video path for filtering\")\n    parser.add_argument('--result_path', type=str, default=\"./filter_results\", help='result save path')\n    parser.add_argument('--store_name', type=str, default=\"filtered_static_video.json\", help='result file name')\n    parser.add_argument('--small', action='store_true', help='use small model')\n    parser.add_argument('--mixed_precision', action='store_true', help='use mixed precision')\n    parser.add_argument('--alternate_corr', action='store_true', help='use efficient correlation implementation')\n    parser.add_argument('--filter_scope', default='temporal_flickering', help=f'''For specifying the scope for filtering videos\n        1. 'temporal_flickering' (default): filter videos based on matches with temporal_flickering dimension of VBench.\n        2. 'all': filter all videos in the --videos_path directory.\n        3. '$filename': if a filepath to a JSON file is provided, only the filenames listed in the JSON file will be filtered.\n                >       usage: --filter_scope example.json\n    ''')\n    parser.set_defaults(func=static_filter)\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/cli/vbench.py",
    "content": "import argparse\nimport importlib\nimport subprocess\n\nvbench_cmd = ['evaluate', 'static_filter']\n\ndef main():\n    parser = argparse.ArgumentParser(prog=\"vbench\", formatter_class=argparse.RawTextHelpFormatter)\n    subparsers = parser.add_subparsers(title='vbench subcommands')\n\n    for cmd in vbench_cmd:\n        module = importlib.import_module(f'vbench.cli.{cmd}')\n        module.register_subparsers(subparsers)\n    parser.set_defaults(func=help)\n    args = parser.parse_args()\n    args.func(args)\n\ndef help(args):\n    subprocess.run(['vbench', '-h'], check=True)\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/color.py",
    "content": "import os\nimport json\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom vbench.utils import load_video, load_dimension_info, read_frames_decord_by_fps\nfrom vbench.third_party.grit_model import DenseCaptioning\n\nimport logging\nlogging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(__name__)\n\ndef get_dect_from_grit(model, image_arrays):\n    pred = []\n    if type(image_arrays) is not list and type(image_arrays) is not np.ndarray:\n        image_arrays = image_arrays.numpy()\n    with torch.no_grad():\n        for frame in image_arrays:\n            ret = model.run_caption_tensor(frame)\n            cur_pred = []\n            if len(ret[0])<1:\n                cur_pred.append(['',''])\n            else:\n                for idx, cap_det in enumerate(ret[0]):\n                    cur_pred.append([cap_det[0], cap_det[2][0]])\n            pred.append(cur_pred)\n    return pred\n\ndef check_generate(color_key, object_key, predictions):\n    cur_object_color, cur_object = 0, 0\n    for frame_pred in predictions:\n        object_flag, color_flag = False, False\n        for pred in frame_pred:\n            if object_key == pred[1]:\n                for color_query in [\"white\",\"red\",\"pink\",\"blue\",\"silver\",\"purple\",\"orange\",\"green\",\"gray\",\"yellow\",\"black\",\"grey\"]:\n                    if color_query in pred[0]:\n                        object_flag =True\n                if color_key in pred[0]:\n                    color_flag = True\n        if color_flag:\n            cur_object_color+=1\n        if object_flag:\n            cur_object +=1\n    return cur_object, cur_object_color\n\ndef color(model, video_dict, device):\n    success_frame_count_all, video_count = 0, 0\n    video_results = []\n    for info in tqdm(video_dict):\n        if 'auxiliary_info' not in info:\n            raise \"Auxiliary info is not in json, please check your 
json.\"\n        # print(info)\n        color_info = info['auxiliary_info']['color']\n        object_info = info['prompt']\n        object_info = object_info.replace('a ','').replace('an ','').replace(color_info,'').strip()\n        for video_path in info['video_list']:\n            video_arrays = load_video(video_path, num_frames=16, return_tensor=False)\n            cur_video_pred = get_dect_from_grit(model ,video_arrays)\n            cur_object, cur_object_color = check_generate(color_info, object_info, cur_video_pred)\n            if cur_object>0:\n                cur_success_frame_rate = cur_object_color/cur_object\n                success_frame_count_all += cur_success_frame_rate\n                video_count += 1\n                video_results.append({'video_path': video_path, 'video_results': cur_success_frame_rate})\n    success_rate = success_frame_count_all / video_count\n    return success_rate, video_results\n        \n\ndef compute_color(json_dir, device, submodules_dict, **kwargs):\n    dense_caption_model = DenseCaptioning(device)\n    dense_caption_model.initialize_model(**submodules_dict)\n    logger.info(\"Initialize detection model success\")\n    _, prompt_dict_ls = load_dimension_info(json_dir, dimension='color', lang='en')\n    all_results, video_results = color(dense_caption_model, prompt_dict_ls, device)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/dynamic_degree.py",
    "content": "import argparse\nimport os\nimport cv2\nimport glob\nimport numpy as np\nimport torch\nfrom tqdm import tqdm\nfrom easydict import EasyDict as edict\n\nfrom vbench.utils import load_dimension_info\n\nfrom vbench.third_party.RAFT.core.raft import RAFT\nfrom vbench.third_party.RAFT.core.utils_core.utils import InputPadder\n\nclass DynamicDegree:\n    def __init__(self, args, device):\n        self.args = args\n        self.device = device\n        self.load_model()\n    \n\n    def load_model(self):\n        self.model = torch.nn.DataParallel(RAFT(self.args))\n        self.model.load_state_dict(torch.load(self.args.model))\n\n        self.model = self.model.module\n        self.model.to(self.device)\n        self.model.eval()\n\n\n\n    def get_score(self, img, flo):\n        img = img[0].permute(1,2,0).cpu().numpy()\n        flo = flo[0].permute(1,2,0).cpu().numpy()\n\n        u = flo[:,:,0]\n        v = flo[:,:,1]\n        rad = np.sqrt(np.square(u) + np.square(v))\n        \n        h, w = rad.shape\n        rad_flat = rad.flatten()\n        cut_index = int(h*w*0.05)\n\n        max_rad = np.mean(abs(np.sort(-rad_flat))[:cut_index])\n\n        return max_rad.item()\n\n\n    def set_params(self, frame, count):\n        scale = min(list(frame.shape)[-2:])\n        self.params = {\"thres\":6.0*(scale/256.0), \"count_num\":round(4*(count/16.0))}\n\n\n    def infer(self, video_path):\n        with torch.no_grad():\n            if video_path.endswith('.mp4'):\n                frames = self.get_frames(video_path)\n            elif os.path.isdir(video_path):\n                frames = self.get_frames_from_img_folder(video_path)\n            else:\n                raise NotImplementedError\n            self.set_params(frame=frames[0], count=len(frames))\n            static_score = []\n            for image1, image2 in zip(frames[:-1], frames[1:]):\n                padder = InputPadder(image1.shape)\n                image1, image2 = padder.pad(image1, 
image2)\n                _, flow_up = self.model(image1, image2, iters=20, test_mode=True)\n                max_rad = self.get_score(image1, flow_up)\n                static_score.append(max_rad)\n            whether_move = self.check_move(static_score)\n            return whether_move\n\n\n    def check_move(self, score_list):\n        thres = self.params[\"thres\"]\n        count_num = self.params[\"count_num\"]\n        count = 0\n        for score in score_list:\n            if score > thres:\n                count += 1\n            if count >= count_num:\n                return True\n        return False\n\n\n    def get_frames(self, video_path):\n        frame_list = []\n        video = cv2.VideoCapture(video_path)\n        fps = video.get(cv2.CAP_PROP_FPS) # get fps\n        interval = round(fps/8)\n        while video.isOpened():\n            success, frame = video.read()\n            if success:\n                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # convert to rgb\n                frame = torch.from_numpy(frame.astype(np.uint8)).permute(2, 0, 1).float()\n                frame = frame[None].to(self.device)\n                frame_list.append(frame)\n            else:\n                break\n        video.release()\n        assert frame_list != []\n        frame_list = self.extract_frame(frame_list, interval)\n        return frame_list \n    \n    \n    def extract_frame(self, frame_list, interval=1):\n        extract = []\n        for i in range(0, len(frame_list), interval):\n            extract.append(frame_list[i])\n        return extract\n\n\n    def get_frames_from_img_folder(self, img_folder):\n        exts = ['jpg', 'png', 'jpeg', 'bmp', 'tif', \n        'tiff', 'JPG', 'PNG', 'JPEG', 'BMP', \n        'TIF', 'TIFF']\n        frame_list = []\n        imgs = sorted([p for p in glob.glob(os.path.join(img_folder, \"*\")) if os.path.splitext(p)[1][1:] in exts])\n        # imgs = sorted(glob.glob(os.path.join(img_folder, \"*.png\")))\n        for 
img in imgs:\n            frame = cv2.imread(img, cv2.IMREAD_COLOR)\n            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n            frame = torch.from_numpy(frame.astype(np.uint8)).permute(2, 0, 1).float()\n            frame = frame[None].to(self.device)\n            frame_list.append(frame)\n        assert frame_list != []\n        return frame_list\n\n\n\ndef dynamic_degree(dynamic, video_list):\n    sim = []\n    video_results = []\n    for video_path in tqdm(video_list):\n        score_per_video = dynamic.infer(video_path)\n        video_results.append({'video_path': video_path, 'video_results': score_per_video})\n        sim.append(score_per_video)\n    avg_score = np.mean(sim)\n    return avg_score, video_results\n\n\n\ndef compute_dynamic_degree(json_dir, device, submodules_list, **kwargs):\n    model_path = submodules_list[\"model\"] \n    # set_args\n    args_new = edict({\"model\":model_path, \"small\":False, \"mixed_precision\":False, \"alternate_corr\":False})\n    dynamic = DynamicDegree(args_new, device)\n    video_list, _ = load_dimension_info(json_dir, dimension='dynamic_degree', lang='en')\n    all_results, video_results = dynamic_degree(dynamic, video_list)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/human_action.py",
    "content": "import os\nimport json\nimport numpy as np\nimport clip\nfrom PIL import Image\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom vbench.utils import load_video, load_dimension_info\nfrom vbench.third_party.umt.datasets.video_transforms import (\n    Compose, Resize, CenterCrop, Normalize,\n    create_random_augment, random_short_side_scale_jitter, \n    random_crop, random_resized_crop_with_shift, random_resized_crop,\n    horizontal_flip, random_short_side_scale_jitter, uniform_crop, \n)\nfrom vbench.third_party.umt.datasets.volume_transforms import ClipToTensor\nfrom timm.models import create_model\nfrom vbench.third_party.umt.models.modeling_finetune import vit_large_patch16_224\nfrom tqdm import tqdm\n\ndef build_dict():\n    CUR_DIR = os.path.dirname(os.path.abspath(__file__))\n    path = f'{CUR_DIR}/third_party/umt/kinetics_400_categories.txt'\n    results = {}\n    with open(path, 'r') as f:\n        cat_list = f.readlines()\n        cat_list = [c.strip() for c in cat_list]\n        for line in cat_list:\n            cat, number = line.split('\\t')\n            results[number] = cat.lower()\n    return results\n\n\ndef human_action(umt_path, video_list, device):\n    state_dict = torch.load(umt_path, map_location='cpu')\n    model = create_model(\n        \"vit_large_patch16_224\",\n        pretrained=False,\n        num_classes=400,\n        all_frames=16,\n        tubelet_size=1,\n        use_learnable_pos_emb=False,\n        fc_drop_rate=0.,\n        drop_rate=0.,\n        drop_path_rate=0.2,\n        attn_drop_rate=0.,\n        drop_block_rate=None,\n        use_checkpoint=False,\n        checkpoint_num=16,\n        use_mean_pooling=True,\n        init_scale=0.001,\n    )\n    data_transform = Compose([\n        Resize(256, interpolation='bilinear'),\n        CenterCrop(size=(224, 224)),\n        ClipToTensor(),\n        Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])\n    ])\n    model = 
model.to(device)\n    model.load_state_dict(state_dict, strict=False)\n    model.eval()\n    cat_dict = build_dict()\n    cnt= 0\n    cor_num = 0\n    video_results = []\n    for video_path in tqdm(video_list):\n        video_label_ls = video_path.split('/')[-1].lower().split('-')[0].split(\"person is \")[-1].split('_')[0]\n        cnt += 1\n        images = load_video(video_path, data_transform, num_frames=16)\n        images = images.unsqueeze(0)\n        images = images.to(device)\n        with torch.no_grad():\n            logits = torch.sigmoid(model(images))\n            results, indices = torch.topk(logits, 5, dim=1)\n        indices = indices.squeeze().tolist()\n        results = results.squeeze().tolist()\n        results = [round(f, 4) for f in results]\n        cat_ls = []\n        for i in range(5):\n            if results[i] >= 0.85:\n                cat_ls.append(cat_dict[str(indices[i])])\n        flag = False\n        for cat in cat_ls:\n            if cat == video_label_ls:\n                cor_num += 1\n                flag = True\n                # print(f\"{cnt}: {video_path} correct, top-5: {cat_ls}, logits: {results}\", flush=True)\n                break\n        if flag is False:\n            # print(f\"{cnt}: {video_path} false, gt: {video_label_ls}, top-5: {cat_ls}, logits: {results}\", flush=True)\n            pass\n        video_results.append({'video_path': video_path, 'video_results': flag})\n    # print(f\"cor num: {cor_num}, total: {cnt}\")\n    acc = cor_num / cnt\n    return acc, video_results\n\n\ndef compute_human_action(json_dir, device, submodules_list, **kwargs):\n    umt_path = submodules_list[0]\n    video_list, _ = load_dimension_info(json_dir, dimension='human_action', lang='en')\n    all_results, video_results = human_action(umt_path, video_list, device)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/imaging_quality.py",
    "content": "import torch\nfrom tqdm import tqdm\nfrom torchvision import transforms\nfrom pyiqa.archs.musiq_arch import MUSIQ\nfrom vbench.utils import load_video, load_dimension_info\n\ndef transform(images, preprocess_mode='shorter'):\n    if preprocess_mode.startswith('shorter'):\n        _, _, h, w = images.size()\n        if min(h,w) > 512:\n            scale = 512./min(h,w)\n            images = transforms.Resize(size=( int(scale * h), int(scale * w) ))(images)\n            if preprocess_mode == 'shorter_centercrop':\n                images = transforms.CenterCrop(512)(images)\n\n    elif preprocess_mode == 'longer':\n        _, _, h, w = images.size()\n        if max(h,w) > 512:\n            scale = 512./max(h,w)\n            images = transforms.Resize(size=( int(scale * h), int(scale * w) ))(images)\n\n    elif preprocess_mode == 'None':\n        return images / 255.\n\n    else:\n        raise ValueError(\"Please recheck imaging_quality_mode\")\n    return images / 255.\n\ndef technical_quality(model, video_list, device, **kwargs):\n    preprocess_mode = kwargs['imaging_quality_preprocessing_mode']\n    video_results = []\n    for video_path in tqdm(video_list):\n        images = load_video(video_path)\n        images = transform(images, preprocess_mode)\n        acc_score_video = 0.\n        for i in range(len(images)):\n            frame = images[i].unsqueeze(0).to(device)\n            score = model(frame)\n            acc_score_video += float(score)\n        video_results.append({'video_path': video_path, 'video_results': acc_score_video/len(images)})\n    average_score = sum([o['video_results'] for o in video_results]) / len(video_results)\n    average_score = average_score / 100.\n    return average_score, video_results\n\n\ndef compute_imaging_quality(json_dir, device, submodules_list, **kwargs):\n    model_path = submodules_list['model_path']\n\n    model = MUSIQ(pretrained_model_path=model_path)\n    model.to(device)\n    model.training = 
False\n    \n    video_list, _ = load_dimension_info(json_dir, dimension='imaging_quality', lang='en')\n    all_results, video_results = technical_quality(model, video_list, device, **kwargs)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/motion_smoothness.py",
    "content": "import os\nimport cv2\nimport glob\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom omegaconf import OmegaConf\n\nfrom vbench.utils import load_dimension_info\n\nfrom vbench.third_party.amt.utils.utils import (\n    img2tensor, tensor2img,\n    check_dim_and_resize\n    )\nfrom vbench.third_party.amt.utils.build_utils import build_from_cfg\nfrom vbench.third_party.amt.utils.utils import InputPadder\n\n\nclass FrameProcess:\n    def __init__(self):\n        pass\n\n\n    def get_frames(self, video_path):\n        frame_list = []\n        video = cv2.VideoCapture(video_path)\n        while video.isOpened():\n            success, frame = video.read()\n            if success:\n                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # convert to rgb\n                frame_list.append(frame)\n            else:\n                break\n        video.release()\n        assert frame_list != []\n        return frame_list \n    \n\n    def get_frames_from_img_folder(self, img_folder):\n        exts = ['jpg', 'png', 'jpeg', 'bmp', 'tif', \n                'tiff', 'JPG', 'PNG', 'JPEG', 'BMP', \n                'TIF', 'TIFF']\n        frame_list = []\n        imgs = sorted([p for p in glob.glob(os.path.join(img_folder, \"*\")) if os.path.splitext(p)[1][1:] in exts])\n        # imgs = sorted(glob.glob(os.path.join(img_folder, \"*.png\")))\n        for img in imgs:\n            frame = cv2.imread(img, cv2.IMREAD_COLOR)\n            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n            frame_list.append(frame)\n        assert frame_list != []\n        return frame_list\n\n\n    def extract_frame(self, frame_list, start_from=0):\n        extract = []\n        for i in range(start_from, len(frame_list), 2):\n            extract.append(frame_list[i])\n        return extract\n\n\nclass MotionSmoothness:\n    def __init__(self, config, ckpt, device):\n        self.device = device\n        self.config = config\n        self.ckpt = ckpt\n        
self.niters = 1\n        self.initialization()\n        self.load_model()\n\n    \n    def load_model(self):\n        cfg_path = self.config\n        ckpt_path = self.ckpt\n        network_cfg = OmegaConf.load(cfg_path).network\n        network_name = network_cfg.name\n        print(f'Loading [{network_name}] from [{ckpt_path}]...')\n        self.model = build_from_cfg(network_cfg)\n        ckpt = torch.load(ckpt_path)\n        self.model.load_state_dict(ckpt['state_dict'])\n        self.model = self.model.to(self.device)\n        self.model.eval()\n\n\n    def initialization(self):\n        if self.device == 'cuda':\n            self.anchor_resolution = 1024 * 512\n            self.anchor_memory = 1500 * 1024**2\n            self.anchor_memory_bias = 2500 * 1024**2\n            self.vram_avail = torch.cuda.get_device_properties(self.device).total_memory\n            print(\"VRAM available: {:.1f} MB\".format(self.vram_avail / 1024 ** 2))\n        else:\n            # Do not resize in cpu mode\n            self.anchor_resolution = 8192*8192\n            self.anchor_memory = 1\n            self.anchor_memory_bias = 0\n            self.vram_avail = 1\n\n        self.embt = torch.tensor(1/2).float().view(1, 1, 1, 1).to(self.device)\n        self.fp = FrameProcess()\n\n\n    def motion_score(self, video_path):\n        iters = int(self.niters)\n        # get inputs\n        if video_path.endswith('.mp4'):\n            frames = self.fp.get_frames(video_path)\n        elif os.path.isdir(video_path):\n            frames = self.fp.get_frames_from_img_folder(video_path)\n        else:\n            raise NotImplementedError\n        frame_list = self.fp.extract_frame(frames, start_from=0)\n        # print(f'Loading [images] from [{video_path}], the number of images = [{len(frame_list)}]')\n        inputs = [img2tensor(frame).to(self.device) for frame in frame_list]\n        assert len(inputs) > 1, f\"The number of input should be more than one (current {len(inputs)})\"\n     
   inputs = check_dim_and_resize(inputs)\n        h, w = inputs[0].shape[-2:]\n        scale = self.anchor_resolution / (h * w) * np.sqrt((self.vram_avail - self.anchor_memory_bias) / self.anchor_memory)\n        scale = 1 if scale > 1 else scale\n        scale = 1 / np.floor(1 / np.sqrt(scale) * 16) * 16\n        if scale < 1:\n            print(f\"Due to the limited VRAM, the video will be scaled by {scale:.2f}\")\n        padding = int(16 / scale)\n        padder = InputPadder(inputs[0].shape, padding)\n        inputs = padder.pad(*inputs)\n\n        # -----------------------  Interpolater ----------------------- \n        # print(f'Start frame interpolation:')\n        for i in range(iters):\n            # print(f'Iter {i+1}. input_frames={len(inputs)} output_frames={2*len(inputs)-1}')\n            outputs = [inputs[0]]\n            for in_0, in_1 in zip(inputs[:-1], inputs[1:]):\n                in_0 = in_0.to(self.device)\n                in_1 = in_1.to(self.device)\n                with torch.no_grad():\n                    imgt_pred = self.model(in_0, in_1, self.embt, scale_factor=scale, eval=True)['imgt_pred']\n                outputs += [imgt_pred.cpu(), in_1.cpu()]\n            inputs = outputs\n\n        # -----------------------  cal_vfi_score ----------------------- \n        outputs = padder.unpad(*outputs)\n        outputs = [tensor2img(out) for out in outputs]\n        vfi_score = self.vfi_score(frames, outputs)\n        norm = (255.0 - vfi_score)/255.0\n        return norm\n\n\n    def vfi_score(self, ori_frames, interpolate_frames):\n        ori = self.fp.extract_frame(ori_frames, start_from=1)\n        interpolate = self.fp.extract_frame(interpolate_frames, start_from=1)\n        scores = []\n        for i in range(len(interpolate)):\n            scores.append(self.get_diff(ori[i], interpolate[i]))\n        return np.mean(np.array(scores))\n\n\n    def get_diff(self, img1, img2):\n        img = cv2.absdiff(img1, img2)\n        return 
np.mean(img)\n\n\n\ndef motion_smoothness(motion, video_list):\n    sim = []\n    video_results = []\n    for video_path in tqdm(video_list):\n        score_per_video = motion.motion_score(video_path)\n        video_results.append({'video_path': video_path, 'video_results': score_per_video})\n        sim.append(score_per_video)\n    avg_score = np.mean(sim)\n    return avg_score, video_results\n\n\n\ndef compute_motion_smoothness(json_dir, device, submodules_list, **kwargs):\n    config = submodules_list[\"config\"] # pretrained/amt_model/AMT-S.yaml\n    ckpt = submodules_list[\"ckpt\"] # pretrained/amt_model/amt-s.pth\n    motion = MotionSmoothness(config, ckpt, device)\n    video_list, _ = load_dimension_info(json_dir, dimension='motion_smoothness', lang='en')\n    all_results, video_results = motion_smoothness(motion, video_list)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/multiple_objects.py",
    "content": "import os\nimport json\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom vbench.utils import load_video, load_dimension_info\nfrom vbench.third_party.grit_model import DenseCaptioning\n\nimport logging\nlogging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(__name__)\n\ndef get_dect_from_grit(model, image_arrays):\n    pred = []\n    if type(image_arrays) is not list:\n        image_arrays = image_arrays.numpy()\n    with torch.no_grad():\n        for frame in image_arrays:\n            ret = model.run_caption_tensor(frame)\n            if len(ret[0])>0:\n                pred.append(set(ret[0][0][2]))\n            else:\n                pred.append(set([]))\n    return pred\n\ndef check_generate(key_info, predictions):\n    cur_cnt = 0\n    key_a, key_b = key_info.split(' and ')\n    key_a = key_a.strip()\n    key_b = key_b.strip()\n    for pred in predictions:\n        if key_a in pred and key_b in pred:\n            cur_cnt+=1\n    return cur_cnt\n\ndef multiple_objects(model, video_dict, device):\n    success_frame_count, frame_count = 0,0\n    video_results = []\n    for info in tqdm(video_dict):\n        if 'auxiliary_info' not in info:\n            raise \"Auxiliary info is not in json, please check your json.\"\n        object_info = info['auxiliary_info']['object']\n        for video_path in info['video_list']:\n            video_tensor = load_video(video_path, num_frames=16)\n            cur_video_pred = get_dect_from_grit(model, video_tensor.permute(0,2,3,1))\n            cur_success_frame_count = check_generate(object_info, cur_video_pred)\n            cur_success_frame_rate = cur_success_frame_count/len(cur_video_pred)\n            success_frame_count += cur_success_frame_count\n            frame_count += len(cur_video_pred)\n            video_results.append({'video_path': video_path, 'video_results': cur_success_frame_rate})\n    success_rate = 
success_frame_count / frame_count\n    return success_rate, video_results\n        \n\ndef compute_multiple_objects(json_dir, device, submodules_dict, **kwargs):\n    dense_caption_model = DenseCaptioning(device)\n    dense_caption_model.initialize_model_det(**submodules_dict)\n    logger.info(\"Initialize detection model success\")\n    _, prompt_dict_ls = load_dimension_info(json_dir, dimension='multiple_objects', lang='en')\n    all_results, video_results = multiple_objects(dense_caption_model, prompt_dict_ls, device)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/object_class.py",
    "content": "import os\nimport json\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom vbench.utils import load_video, load_dimension_info\nfrom vbench.third_party.grit_model import DenseCaptioning\n\nimport logging\nlogging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(__name__)\n\ndef get_dect_from_grit(model, image_arrays):\n    pred = []\n    if type(image_arrays) is not list:\n        image_arrays = image_arrays.numpy()\n    with torch.no_grad():\n        for frame in image_arrays:\n            try:\n                pred.append(set(model.run_caption_tensor(frame)[0][0][2]))\n            except:\n                pred.append(set())\n    return pred\n\ndef check_generate(key_info, predictions):\n    cur_cnt = 0\n    for pred in predictions:\n        if key_info in pred:\n            cur_cnt+=1\n    return cur_cnt\n\ndef object_class(model, video_dict, device):\n    success_frame_count, frame_count = 0,0\n    video_results = []\n    for info in tqdm(video_dict):\n        if 'auxiliary_info' not in info:\n            raise \"Auxiliary info is not in json, please check your json.\"\n        object_info = info['auxiliary_info']['object']\n        for video_path in info['video_list']:\n            video_tensor = load_video(video_path, num_frames=16)\n            cur_video_pred = get_dect_from_grit(model, video_tensor.permute(0,2,3,1))\n            cur_success_frame_count = check_generate(object_info, cur_video_pred)\n            cur_success_frame_rate = cur_success_frame_count/len(cur_video_pred)\n            success_frame_count += cur_success_frame_count\n            frame_count += len(cur_video_pred)\n            video_results.append({'video_path': video_path, 'video_results': cur_success_frame_rate})\n    success_rate = success_frame_count / frame_count\n    return success_rate, video_results\n        \n\ndef compute_object_class(json_dir, device, submodules_dict, 
**kwargs):\n    dense_caption_model = DenseCaptioning(device)\n    dense_caption_model.initialize_model_det(**submodules_dict)\n    logger.info(\"Initialize detection model success\")\n    _, prompt_dict_ls = load_dimension_info(json_dir, dimension='object_class', lang='en')\n    all_results, video_results = object_class(dense_caption_model, prompt_dict_ls, device)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/overall_consistency.py",
    "content": "import os\nimport json\nimport numpy as np\n\nimport torch\nimport clip\nfrom tqdm import tqdm\nfrom vbench.utils import load_video, load_dimension_info, clip_transform, read_frames_decord_by_fps, CACHE_DIR\nfrom vbench.third_party.ViCLIP.viclip import ViCLIP\nfrom vbench.third_party.ViCLIP.simple_tokenizer import SimpleTokenizer\n\ndef get_text_features(model, input_text, tokenizer, text_feature_dict={}):\n    if input_text in text_feature_dict:\n        return text_feature_dict[input_text]\n    text_template= f\"{input_text}\"\n    with torch.no_grad():\n        text_features = model.encode_text(text_template).float()\n        text_features /= text_features.norm(dim=-1, keepdim=True)      \n        text_feature_dict[input_text] = text_features\n    return text_features\n\ndef get_vid_features(model, input_frames):\n    with torch.no_grad():\n        clip_feat = model.encode_vision(input_frames,test=True).float()\n        clip_feat /= clip_feat.norm(dim=-1, keepdim=True)    \n    return clip_feat\n\ndef get_predict_label(clip_feature, text_feats_tensor, top=5):\n    label_probs = (100.0 * clip_feature @ text_feats_tensor.T).softmax(dim=-1)\n    top_probs, top_labels = label_probs.cpu().topk(top, dim=-1)\n    return top_probs, top_labels\n\ndef overall_consistency(clip_model, video_dict, tokenizer, device, sample=\"middle\"):\n    sim = []\n    video_results = []\n    image_transform = clip_transform(224)\n    for info in tqdm(video_dict):\n        query = info['prompt']\n        text = clip.tokenize([query]).to(device)\n        video_list = info['video_list']\n        for video_path in video_list:\n            cur_video = []\n            with torch.no_grad():\n                images = read_frames_decord_by_fps(video_path, num_frames=8, sample=sample)\n                images = image_transform(images)\n                images = images.to(device)\n                clip_feat = get_vid_features(clip_model,images.unsqueeze(0))\n                text_feat = 
get_text_features(clip_model, query, tokenizer)\n                logit_per_text =  clip_feat @ text_feat.T\n                score_per_video =  float(logit_per_text[0][0].cpu())\n                sim.append(score_per_video)\n                video_results.append({'video_path': video_path, 'video_results': score_per_video})\n    avg_score = np.mean(sim)\n    return avg_score, video_results\n\ndef compute_overall_consistency(json_dir, device, submodules_list, **kwargs):\n    tokenizer = SimpleTokenizer(os.path.join(CACHE_DIR, \"ViCLIP/bpe_simple_vocab_16e6.txt.gz\"))\n    viclip = ViCLIP(tokenizer= tokenizer, **submodules_list).to(device)\n    _, video_dict = load_dimension_info(json_dir, dimension='overall_consistency', lang='en')\n    all_results, video_results = overall_consistency(viclip, video_dict, tokenizer, device)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/scene.py",
    "content": "import os\nimport json\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom vbench.utils import load_video, load_dimension_info, tag2text_transform\nfrom vbench.third_party.tag2Text.tag2text import tag2text_caption\n\nimport logging\nlogging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(__name__)\n\ndef get_caption(model, image_arrays):\n    caption, tag_predict = model.generate(image_arrays, tag_input = None, return_tag_predict = True)\n    return caption\n\ndef check_generate(key_info, predictions):\n    cur_cnt = 0\n    key = key_info['scene']\n    for pred in predictions:\n        q_flag = [q in pred for q in key.split(' ')]\n        if len(q_flag) == sum(q_flag):\n            cur_cnt +=1\n    return cur_cnt\n\ndef scene(model, video_dict, device):\n    success_frame_count, frame_count = 0,0\n    video_results = []\n    transform = tag2text_transform(384)\n    for info in tqdm(video_dict):\n        if 'auxiliary_info' not in info:\n            raise \"Auxiliary info is not in json, please check your json.\"\n        scene_info = info['auxiliary_info']['scene']\n        for video_path in info['video_list']:\n            video_array = load_video(video_path, num_frames=16, return_tensor=False, width=384, height=384)\n            video_tensor_list = []\n            for i in video_array:\n                video_tensor_list.append(transform(i).to(device).unsqueeze(0))\n            video_tensor = torch.cat(video_tensor_list)\n            cur_video_pred = get_caption(model, video_tensor)\n            cur_success_frame_count = check_generate(scene_info, cur_video_pred)\n            cur_success_frame_rate = cur_success_frame_count/len(cur_video_pred)\n            success_frame_count += cur_success_frame_count\n            frame_count += len(cur_video_pred)\n            video_results.append({'video_path': video_path, 'video_results': cur_success_frame_rate})\n    
success_rate = success_frame_count / frame_count\n    return success_rate, video_results\n        \n\ndef compute_scene(json_dir, device, submodules_dict, **kwargs):\n    model = tag2text_caption(**submodules_dict)\n    model.eval()\n    model = model.to(device)\n    logger.info(\"Initialize caption model success\")\n    _, prompt_dict_ls = load_dimension_info(json_dir, dimension='scene', lang='en')\n    all_results, video_results = scene(model, prompt_dict_ls, device)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/spatial_relationship.py",
    "content": "import os\nimport json\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom vbench.utils import load_video, load_dimension_info\nfrom vbench.third_party.grit_model import DenseCaptioning\n\nimport logging\nlogging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(__name__)\n\ndef get_position_score(locality, obj1,obj2, iou_threshold=0.1):\n    # input obj1 and obj2 should be [x0,y0,x1,y1]\n    # Calculate centers of bounding boxes\n    box1 = {\n        'x_min': obj1[0],\n        'y_min': obj1[1],\n        'x_max': obj1[2],\n        'y_max': obj1[3],\n        'width': obj1[2] - obj1[0],\n        'height': obj1[3] - obj1[1]\n    }\n\n    box2 = {\n        'x_min': obj2[0],\n        'y_min': obj2[1],\n        'x_max': obj2[2],\n        'y_max': obj2[3],\n        'width': obj2[2] - obj2[0],\n        'height': obj2[3] - obj2[1]\n    }\n    \n    # Get the object center\n    box1_center = ((box1['x_min'] + box1['x_max']) / 2, (box1['y_min'] + box1['y_max']) / 2)\n    box2_center = ((box2['x_min'] + box2['x_max']) / 2, (box2['y_min'] + box2['y_max']) / 2)\n\n    # Calculate horizontal and vertical distances\n    x_distance = box2_center[0] - box1_center[0]\n    y_distance = box2_center[1] - box1_center[1]\n\n    # Calculate IoU\n    x_overlap = max(0, min(box1['x_max'], box2['x_max']) - max(box1['x_min'], box2['x_min']))\n    y_overlap = max(0, min(box1['y_max'], box2['y_max']) - max(box1['y_min'], box2['y_min']))\n    intersection = x_overlap * y_overlap\n    box1_area = (box1['x_max'] - box1['x_min']) * (box1['y_max'] - box1['y_min'])\n    box2_area = (box2['x_max'] - box2['x_min']) * (box2['y_max'] - box2['y_min'])\n    union = box1_area + box2_area - intersection\n    iou = intersection / union\n\n    # get max object width and max object height\n    max_width = max(box1['width'], box2['width'])\n    max_height = max(box1['height'], box2['height'])\n\n    score=0\n 
   if locality in 'on the right of' or locality in 'on the left of':\n        if abs(x_distance) > abs(y_distance) and iou < iou_threshold:\n            score=1\n        elif abs(x_distance) > abs(y_distance) and iou >= iou_threshold:\n            score=iou_threshold/iou\n        else:\n            score=0\n    elif locality in 'on the bottom of' or locality in 'on the top of':\n        if abs(y_distance) > abs(x_distance) and iou < iou_threshold:\n            score=1\n        elif abs(y_distance) > abs(x_distance) and iou >= iou_threshold:\n            score=iou_threshold/iou\n        else:\n            score = 0\n    return score\n\ndef get_dect_from_grit(model, image_arrays):\n    pred = []\n    if type(image_arrays) is not list:\n        image_arrays = image_arrays.numpy()\n    with torch.no_grad():\n        for frame in image_arrays:\n            ret = model.run_caption_tensor(frame)\n            pred_cur = []\n            if len(ret[0])>0:\n                for info in ret[0]:\n                    pred_cur.append([info[0],info[1]])\n            pred.append(pred_cur)\n    return pred\n\ndef check_generate(key_info, predictions):\n    key_a = key_info['object_a']\n    key_b = key_info['object_b']\n    relation = key_info['relationship']\n    frame_score =[]\n    for frame_pred in predictions:\n        # filter the target object\n        frame_obj_locats = []\n        cur_score = [0]\n        for item in frame_pred:\n            if (key_a == item[0]) or (key_b == item[0]):\n                frame_obj_locats.append(item[1])\n            for c_obj1 in range(len(frame_obj_locats)-1):\n                for c_obj2 in range(c_obj1+1 ,len(frame_obj_locats)):\n                    score_obj1_obj2 = get_position_score(relation, frame_obj_locats[c_obj1], frame_obj_locats[c_obj2])\n                    cur_score.append(score_obj1_obj2)\n        frame_score.append(max(cur_score))\n    return frame_score\n\ndef spatial_relationship(model, video_dict, device):\n    video_results = 
[]\n    frame_score_overall = []\n    for info in tqdm(video_dict):\n        if 'auxiliary_info' not in info:\n            raise ValueError(\"Auxiliary info is not in json, please check your json.\")\n        object_info = info['auxiliary_info']['spatial_relationship']\n        for video_path in info['video_list']:\n            video_tensor = load_video(video_path, num_frames=16)\n            cur_video_pred = get_dect_from_grit(model, video_tensor.permute(0,2,3,1))\n            cur_video_frame_score = check_generate(object_info, cur_video_pred)\n            cur_success_frame_rate = np.mean(cur_video_frame_score)\n            frame_score_overall.extend(cur_video_frame_score)\n            video_results.append({'video_path': video_path, 'video_results': cur_success_frame_rate, 'frame_results':cur_video_frame_score})\n    success_rate = np.mean(frame_score_overall)\n    return success_rate, video_results\n        \n\ndef compute_spatial_relationship(json_dir, device, submodules_dict, **kwargs):\n    dense_caption_model = DenseCaptioning(device)\n    dense_caption_model.initialize_model_det(**submodules_dict)\n    logger.info(\"Initialize detection model success\")\n    _, prompt_dict_ls = load_dimension_info(json_dir, dimension='spatial_relationship', lang='en')\n    all_results, video_results = spatial_relationship(dense_caption_model, prompt_dict_ls, device)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/subject_consistency.py",
    "content": "import io\nimport os\nimport cv2\nimport json\nimport numpy as np\nfrom PIL import Image\nfrom tqdm import tqdm\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torchvision.transforms as transforms\n\nfrom vbench.utils import load_video, load_dimension_info, dino_transform, dino_transform_Image\nimport logging\nlogging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(__name__)\n\ndef subject_consistency(model, video_list, device, read_frame):\n    sim = 0.0\n    cnt = 0\n    video_results = []\n    if read_frame:\n        image_transform = dino_transform_Image(224)\n    else:\n        image_transform = dino_transform(224)\n    for video_path in tqdm(video_list):\n        video_sim = 0.0\n        if read_frame:\n            video_path = video_path[:-4].replace('videos', 'frames').replace(' ', '_')\n            tmp_paths = [os.path.join(video_path, f) for f in sorted(os.listdir(video_path))]\n            images = []\n            for tmp_path in tmp_paths:\n                images.append(image_transform(Image.open(tmp_path)))\n        else:\n            images = load_video(video_path)\n            images = image_transform(images)\n        for i in range(len(images)):\n            with torch.no_grad():\n                image = images[i].unsqueeze(0)\n                image = image.to(device)\n                image_features = model(image)\n                image_features = F.normalize(image_features, dim=-1, p=2)\n                if i == 0:\n                    first_image_features = image_features\n                else:\n                    sim_pre = max(0.0, F.cosine_similarity(former_image_features, image_features).item())\n                    sim_fir = max(0.0, F.cosine_similarity(first_image_features, image_features).item())\n                    cur_sim = (sim_pre + sim_fir) / 2\n                    video_sim += cur_sim\n                    cnt 
+= 1\n            former_image_features = image_features\n        sim += video_sim\n        video_results.append({'video_path': video_path, 'video_results': video_sim})\n    sim_per_frame = sim / cnt\n    return sim_per_frame, video_results\n\n\ndef compute_subject_consistency(json_dir, device, submodules_list, **kwargs):\n    dino_model = torch.hub.load(**submodules_list).to(device)\n    read_frame = submodules_list['read_frame']\n    logger.info(\"Initialize DINO success\")\n    video_list, _ = load_dimension_info(json_dir, dimension='subject_consistency', lang='en')\n    all_results, video_results = subject_consistency(dino_model, video_list, device, read_frame)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/temporal_flickering.py",
    "content": "import numpy as np\nfrom tqdm import tqdm\nimport cv2\nfrom vbench.utils import load_dimension_info\n\n\ndef get_frames(video_path):\n        frames = []\n        video = cv2.VideoCapture(video_path)\n        while video.isOpened():\n            success, frame = video.read()\n            if success:\n                frames.append(frame)\n            else:\n                break\n        video.release()\n        assert frames != []\n        return frames\n\n\ndef mae_seq(frames):\n    ssds = []\n    for i in range(len(frames)-1):\n        ssds.append(calculate_mae(frames[i], frames[i+1]))\n    return np.array(ssds)\n\n\ndef calculate_mae(img1, img2):\n    \"\"\"Computing the mean absolute error (MAE) between two images.\"\"\"\n    if img1.shape != img2.shape:\n        print(\"Images don't have the same shape.\")\n        return\n    return np.mean(cv2.absdiff(np.array(img1, dtype=np.float32), np.array(img2, dtype=np.float32)))\n\n\ndef cal_score(video_path):\n    \"\"\"please ensure the video is static\"\"\"\n    frames = get_frames(video_path)\n    score_seq = mae_seq(frames)\n    return (255.0 - np.mean(score_seq).item())/255.0\n\n\ndef temporal_flickering(video_list):\n    sim = []\n    video_results = []\n    for video_path in tqdm(video_list):\n        try:\n            score_per_video = cal_score(video_path)\n        except AssertionError:\n            continue\n        video_results.append({'video_path': video_path, 'video_results': score_per_video})\n        sim.append(score_per_video)\n    avg_score = np.mean(sim)\n    return avg_score, video_results\n\n\ndef compute_temporal_flickering(json_dir, device, submodules_list, **kwargs):\n    video_list, _ = load_dimension_info(json_dir, dimension='temporal_flickering', lang='en')\n    all_results, video_results = temporal_flickering(video_list)\n    return all_results, video_results\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/temporal_style.py",
    "content": "import os\nimport json\nimport numpy as np\n\nimport torch\nimport clip\nfrom tqdm import tqdm\nfrom vbench.utils import load_video, load_dimension_info, clip_transform, read_frames_decord_by_fps, CACHE_DIR\nfrom vbench.third_party.ViCLIP.viclip import ViCLIP\nfrom vbench.third_party.ViCLIP.simple_tokenizer import SimpleTokenizer\n\ndef get_text_features(model, input_text, tokenizer, text_feature_dict={}):\n    if input_text in text_feature_dict:\n        return text_feature_dict[input_text]\n    text_template= f\"{input_text}\"\n    with torch.no_grad():\n        text_features = model.encode_text(text_template).float()\n        text_features /= text_features.norm(dim=-1, keepdim=True)      \n        text_feature_dict[input_text] = text_features\n    return text_features\n\ndef get_vid_features(model, input_frames):\n    with torch.no_grad():\n        clip_feat = model.encode_vision(input_frames,test=True).float()\n        clip_feat /= clip_feat.norm(dim=-1, keepdim=True)    \n    return clip_feat\n\ndef get_predict_label(clip_feature, text_feats_tensor, top=5):\n    label_probs = (100.0 * clip_feature @ text_feats_tensor.T).softmax(dim=-1)\n    top_probs, top_labels = label_probs.cpu().topk(top, dim=-1)\n    return top_probs, top_labels\n\ndef temporal_style(clip_model, video_dict, tokenizer, device, sample=\"middle\"):\n    sim = []\n    video_results = []\n    image_transform = clip_transform(224)\n    for info in tqdm(video_dict):\n        query = info['prompt']\n        text = clip.tokenize([query]).to(device)\n        video_list = info['video_list']\n        for video_path in video_list:\n            cur_video = []\n            with torch.no_grad():\n                # images = load_video(video_path, num_frames=8)\n                images = read_frames_decord_by_fps(video_path, num_frames=8, sample=sample)\n                images = image_transform(images)\n                images = images.to(device)\n                clip_feat = 
get_vid_features(clip_model,images.unsqueeze(0))\n                text_feat = get_text_features(clip_model, query, tokenizer)\n                logit_per_text =  clip_feat @ text_feat.T\n                score_per_video =  float(logit_per_text[0][0].cpu())\n                sim.append(score_per_video)\n                video_results.append({'video_path': video_path, 'video_results': score_per_video})\n    avg_score = np.mean(sim)\n    return avg_score, video_results\n\ndef compute_temporal_style(json_dir, device, submodules_list, **kwargs):\n    tokenizer = SimpleTokenizer(os.path.join(CACHE_DIR, \"ViCLIP/bpe_simple_vocab_16e6.txt.gz\"))\n    viclip = ViCLIP(tokenizer= tokenizer, **submodules_list).to(device)\n    _, video_dict = load_dimension_info(json_dir, dimension='temporal_style', lang='en')\n    all_results, video_results = temporal_style(viclip, video_dict, tokenizer, device)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/0.txt",
    "content": "\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/RAFT/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/RAFT/core/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/RAFT/core/corr.py",
    "content": "import torch\nimport torch.nn.functional as F\nfrom .utils_core.utils import bilinear_sampler, coords_grid\n\ntry:\n    import alt_cuda_corr\nexcept:\n    # alt_cuda_corr is not compiled\n    pass\n\n\nclass CorrBlock:\n    def __init__(self, fmap1, fmap2, num_levels=4, radius=4):\n        self.num_levels = num_levels\n        self.radius = radius\n        self.corr_pyramid = []\n\n        # all pairs correlation\n        corr = CorrBlock.corr(fmap1, fmap2)\n\n        batch, h1, w1, dim, h2, w2 = corr.shape\n        corr = corr.reshape(batch*h1*w1, dim, h2, w2)\n        \n        self.corr_pyramid.append(corr)\n        for i in range(self.num_levels-1):\n            corr = F.avg_pool2d(corr, 2, stride=2)\n            self.corr_pyramid.append(corr)\n\n    def __call__(self, coords):\n        r = self.radius\n        coords = coords.permute(0, 2, 3, 1)\n        batch, h1, w1, _ = coords.shape\n\n        out_pyramid = []\n        for i in range(self.num_levels):\n            corr = self.corr_pyramid[i]\n            dx = torch.linspace(-r, r, 2*r+1, device=coords.device)\n            dy = torch.linspace(-r, r, 2*r+1, device=coords.device)\n            delta = torch.stack(torch.meshgrid(dy, dx), axis=-1)\n\n            centroid_lvl = coords.reshape(batch*h1*w1, 1, 1, 2) / 2**i\n            delta_lvl = delta.view(1, 2*r+1, 2*r+1, 2)\n            coords_lvl = centroid_lvl + delta_lvl\n\n            corr = bilinear_sampler(corr, coords_lvl)\n            corr = corr.view(batch, h1, w1, -1)\n            out_pyramid.append(corr)\n\n        out = torch.cat(out_pyramid, dim=-1)\n        return out.permute(0, 3, 1, 2).contiguous().float()\n\n    @staticmethod\n    def corr(fmap1, fmap2):\n        batch, dim, ht, wd = fmap1.shape\n        fmap1 = fmap1.view(batch, dim, ht*wd)\n        fmap2 = fmap2.view(batch, dim, ht*wd) \n        \n        corr = torch.matmul(fmap1.transpose(1,2), fmap2)\n        corr = corr.view(batch, ht, wd, 1, ht, wd)\n        return corr  / 
torch.sqrt(torch.tensor(dim).float())\n\n\nclass AlternateCorrBlock:\n    def __init__(self, fmap1, fmap2, num_levels=4, radius=4):\n        self.num_levels = num_levels\n        self.radius = radius\n\n        self.pyramid = [(fmap1, fmap2)]\n        for i in range(self.num_levels):\n            fmap1 = F.avg_pool2d(fmap1, 2, stride=2)\n            fmap2 = F.avg_pool2d(fmap2, 2, stride=2)\n            self.pyramid.append((fmap1, fmap2))\n\n    def __call__(self, coords):\n        coords = coords.permute(0, 2, 3, 1)\n        B, H, W, _ = coords.shape\n        dim = self.pyramid[0][0].shape[1]\n\n        corr_list = []\n        for i in range(self.num_levels):\n            r = self.radius\n            fmap1_i = self.pyramid[0][0].permute(0, 2, 3, 1).contiguous()\n            fmap2_i = self.pyramid[i][1].permute(0, 2, 3, 1).contiguous()\n\n            coords_i = (coords / 2**i).reshape(B, 1, H, W, 2).contiguous()\n            corr, = alt_cuda_corr.forward(fmap1_i, fmap2_i, coords_i, r)\n            corr_list.append(corr.squeeze(1))\n\n        corr = torch.stack(corr_list, dim=1)\n        corr = corr.reshape(B, -1, H, W)\n        return corr / torch.sqrt(torch.tensor(dim).float())\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/RAFT/core/datasets.py",
    "content": "# Data loading based on https://github.com/NVIDIA/flownet2-pytorch\n\nimport numpy as np\nimport torch\nimport torch.utils.data as data\nimport torch.nn.functional as F\n\nimport os\nimport math\nimport random\nfrom glob import glob\nimport os.path as osp\n\nfrom utils_core import frame_utils\nfrom utils_core.augmentor import FlowAugmentor, SparseFlowAugmentor\n\n\nclass FlowDataset(data.Dataset):\n    def __init__(self, aug_params=None, sparse=False):\n        self.augmentor = None\n        self.sparse = sparse\n        if aug_params is not None:\n            if sparse:\n                self.augmentor = SparseFlowAugmentor(**aug_params)\n            else:\n                self.augmentor = FlowAugmentor(**aug_params)\n\n        self.is_test = False\n        self.init_seed = False\n        self.flow_list = []\n        self.image_list = []\n        self.extra_info = []\n\n    def __getitem__(self, index):\n\n        if self.is_test:\n            img1 = frame_utils.read_gen(self.image_list[index][0])\n            img2 = frame_utils.read_gen(self.image_list[index][1])\n            img1 = np.array(img1).astype(np.uint8)[..., :3]\n            img2 = np.array(img2).astype(np.uint8)[..., :3]\n            img1 = torch.from_numpy(img1).permute(2, 0, 1).float()\n            img2 = torch.from_numpy(img2).permute(2, 0, 1).float()\n            return img1, img2, self.extra_info[index]\n\n        if not self.init_seed:\n            worker_info = torch.utils.data.get_worker_info()\n            if worker_info is not None:\n                torch.manual_seed(worker_info.id)\n                np.random.seed(worker_info.id)\n                random.seed(worker_info.id)\n                self.init_seed = True\n\n        index = index % len(self.image_list)\n        valid = None\n        if self.sparse:\n            flow, valid = frame_utils.readFlowKITTI(self.flow_list[index])\n        else:\n            flow = frame_utils.read_gen(self.flow_list[index])\n\n        img1 = 
frame_utils.read_gen(self.image_list[index][0])\n        img2 = frame_utils.read_gen(self.image_list[index][1])\n\n        flow = np.array(flow).astype(np.float32)\n        img1 = np.array(img1).astype(np.uint8)\n        img2 = np.array(img2).astype(np.uint8)\n\n        # grayscale images\n        if len(img1.shape) == 2:\n            img1 = np.tile(img1[...,None], (1, 1, 3))\n            img2 = np.tile(img2[...,None], (1, 1, 3))\n        else:\n            img1 = img1[..., :3]\n            img2 = img2[..., :3]\n\n        if self.augmentor is not None:\n            if self.sparse:\n                img1, img2, flow, valid = self.augmentor(img1, img2, flow, valid)\n            else:\n                img1, img2, flow = self.augmentor(img1, img2, flow)\n\n        img1 = torch.from_numpy(img1).permute(2, 0, 1).float()\n        img2 = torch.from_numpy(img2).permute(2, 0, 1).float()\n        flow = torch.from_numpy(flow).permute(2, 0, 1).float()\n\n        if valid is not None:\n            valid = torch.from_numpy(valid)\n        else:\n            valid = (flow[0].abs() < 1000) & (flow[1].abs() < 1000)\n\n        return img1, img2, flow, valid.float()\n\n\n    def __rmul__(self, v):\n        self.flow_list = v * self.flow_list\n        self.image_list = v * self.image_list\n        return self\n        \n    def __len__(self):\n        return len(self.image_list)\n        \n\nclass MpiSintel(FlowDataset):\n    def __init__(self, aug_params=None, split='training', root='datasets/Sintel', dstype='clean'):\n        super(MpiSintel, self).__init__(aug_params)\n        flow_root = osp.join(root, split, 'flow')\n        image_root = osp.join(root, split, dstype)\n\n        if split == 'test':\n            self.is_test = True\n\n        for scene in os.listdir(image_root):\n            image_list = sorted(glob(osp.join(image_root, scene, '*.png')))\n            for i in range(len(image_list)-1):\n                self.image_list += [ [image_list[i], image_list[i+1]] ]\n         
       self.extra_info += [ (scene, i) ] # scene and frame_id\n\n            if split != 'test':\n                self.flow_list += sorted(glob(osp.join(flow_root, scene, '*.flo')))\n\n\nclass FlyingChairs(FlowDataset):\n    def __init__(self, aug_params=None, split='train', root='datasets/FlyingChairs_release/data'):\n        super(FlyingChairs, self).__init__(aug_params)\n\n        images = sorted(glob(osp.join(root, '*.ppm')))\n        flows = sorted(glob(osp.join(root, '*.flo')))\n        assert (len(images)//2 == len(flows))\n\n        split_list = np.loadtxt('chairs_split.txt', dtype=np.int32)\n        for i in range(len(flows)):\n            xid = split_list[i]\n            if (split=='training' and xid==1) or (split=='validation' and xid==2):\n                self.flow_list += [ flows[i] ]\n                self.image_list += [ [images[2*i], images[2*i+1]] ]\n\n\nclass FlyingThings3D(FlowDataset):\n    def __init__(self, aug_params=None, root='datasets/FlyingThings3D', dstype='frames_cleanpass'):\n        super(FlyingThings3D, self).__init__(aug_params)\n\n        for cam in ['left']:\n            for direction in ['into_future', 'into_past']:\n                image_dirs = sorted(glob(osp.join(root, dstype, 'TRAIN/*/*')))\n                image_dirs = sorted([osp.join(f, cam) for f in image_dirs])\n\n                flow_dirs = sorted(glob(osp.join(root, 'optical_flow/TRAIN/*/*')))\n                flow_dirs = sorted([osp.join(f, direction, cam) for f in flow_dirs])\n\n                for idir, fdir in zip(image_dirs, flow_dirs):\n                    images = sorted(glob(osp.join(idir, '*.png')) )\n                    flows = sorted(glob(osp.join(fdir, '*.pfm')) )\n                    for i in range(len(flows)-1):\n                        if direction == 'into_future':\n                            self.image_list += [ [images[i], images[i+1]] ]\n                            self.flow_list += [ flows[i] ]\n                        elif direction == 
'into_past':\n                            self.image_list += [ [images[i+1], images[i]] ]\n                            self.flow_list += [ flows[i+1] ]\n      \n\nclass KITTI(FlowDataset):\n    def __init__(self, aug_params=None, split='training', root='datasets/KITTI'):\n        super(KITTI, self).__init__(aug_params, sparse=True)\n        if split == 'testing':\n            self.is_test = True\n\n        root = osp.join(root, split)\n        images1 = sorted(glob(osp.join(root, 'image_2/*_10.png')))\n        images2 = sorted(glob(osp.join(root, 'image_2/*_11.png')))\n\n        for img1, img2 in zip(images1, images2):\n            frame_id = img1.split('/')[-1]\n            self.extra_info += [ [frame_id] ]\n            self.image_list += [ [img1, img2] ]\n\n        if split == 'training':\n            self.flow_list = sorted(glob(osp.join(root, 'flow_occ/*_10.png')))\n\n\nclass HD1K(FlowDataset):\n    def __init__(self, aug_params=None, root='datasets/HD1k'):\n        super(HD1K, self).__init__(aug_params, sparse=True)\n\n        seq_ix = 0\n        while 1:\n            flows = sorted(glob(os.path.join(root, 'hd1k_flow_gt', 'flow_occ/%06d_*.png' % seq_ix)))\n            images = sorted(glob(os.path.join(root, 'hd1k_input', 'image_2/%06d_*.png' % seq_ix)))\n\n            if len(flows) == 0:\n                break\n\n            for i in range(len(flows)-1):\n                self.flow_list += [flows[i]]\n                self.image_list += [ [images[i], images[i+1]] ]\n\n            seq_ix += 1\n\n\ndef fetch_dataloader(args, TRAIN_DS='C+T+K+S+H'):\n    \"\"\" Create the data loader for the corresponding training set \"\"\"\n\n    if args.stage == 'chairs':\n        aug_params = {'crop_size': args.image_size, 'min_scale': -0.1, 'max_scale': 1.0, 'do_flip': True}\n        train_dataset = FlyingChairs(aug_params, split='training')\n    \n    elif args.stage == 'things':\n        aug_params = {'crop_size': args.image_size, 'min_scale': -0.4, 'max_scale': 0.8, 
'do_flip': True}\n        clean_dataset = FlyingThings3D(aug_params, dstype='frames_cleanpass')\n        final_dataset = FlyingThings3D(aug_params, dstype='frames_finalpass')\n        train_dataset = clean_dataset + final_dataset\n\n    elif args.stage == 'sintel':\n        aug_params = {'crop_size': args.image_size, 'min_scale': -0.2, 'max_scale': 0.6, 'do_flip': True}\n        things = FlyingThings3D(aug_params, dstype='frames_cleanpass')\n        sintel_clean = MpiSintel(aug_params, split='training', dstype='clean')\n        sintel_final = MpiSintel(aug_params, split='training', dstype='final')        \n\n        if TRAIN_DS == 'C+T+K+S+H':\n            kitti = KITTI({'crop_size': args.image_size, 'min_scale': -0.3, 'max_scale': 0.5, 'do_flip': True})\n            hd1k = HD1K({'crop_size': args.image_size, 'min_scale': -0.5, 'max_scale': 0.2, 'do_flip': True})\n            train_dataset = 100*sintel_clean + 100*sintel_final + 200*kitti + 5*hd1k + things\n\n        elif TRAIN_DS == 'C+T+K/S':\n            train_dataset = 100*sintel_clean + 100*sintel_final + things\n\n    elif args.stage == 'kitti':\n        aug_params = {'crop_size': args.image_size, 'min_scale': -0.2, 'max_scale': 0.4, 'do_flip': False}\n        train_dataset = KITTI(aug_params, split='training')\n\n    train_loader = data.DataLoader(train_dataset, batch_size=args.batch_size, \n        pin_memory=False, shuffle=True, num_workers=4, drop_last=True)\n\n    print('Training with %d image pairs' % len(train_dataset))\n    return train_loader\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/RAFT/core/extractor.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass ResidualBlock(nn.Module):\n    def __init__(self, in_planes, planes, norm_fn='group', stride=1):\n        super(ResidualBlock, self).__init__()\n  \n        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, padding=1, stride=stride)\n        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1)\n        self.relu = nn.ReLU(inplace=True)\n\n        num_groups = planes // 8\n\n        if norm_fn == 'group':\n            self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n            self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n            if not stride == 1:\n                self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n        \n        elif norm_fn == 'batch':\n            self.norm1 = nn.BatchNorm2d(planes)\n            self.norm2 = nn.BatchNorm2d(planes)\n            if not stride == 1:\n                self.norm3 = nn.BatchNorm2d(planes)\n        \n        elif norm_fn == 'instance':\n            self.norm1 = nn.InstanceNorm2d(planes)\n            self.norm2 = nn.InstanceNorm2d(planes)\n            if not stride == 1:\n                self.norm3 = nn.InstanceNorm2d(planes)\n\n        elif norm_fn == 'none':\n            self.norm1 = nn.Sequential()\n            self.norm2 = nn.Sequential()\n            if not stride == 1:\n                self.norm3 = nn.Sequential()\n\n        if stride == 1:\n            self.downsample = None\n        \n        else:    \n            self.downsample = nn.Sequential(\n                nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm3)\n\n\n    def forward(self, x):\n        y = x\n        y = self.relu(self.norm1(self.conv1(y)))\n        y = self.relu(self.norm2(self.conv2(y)))\n\n        if self.downsample is not None:\n            x = self.downsample(x)\n\n        return self.relu(x+y)\n\n\n\nclass 
BottleneckBlock(nn.Module):\n    def __init__(self, in_planes, planes, norm_fn='group', stride=1):\n        super(BottleneckBlock, self).__init__()\n  \n        self.conv1 = nn.Conv2d(in_planes, planes//4, kernel_size=1, padding=0)\n        self.conv2 = nn.Conv2d(planes//4, planes//4, kernel_size=3, padding=1, stride=stride)\n        self.conv3 = nn.Conv2d(planes//4, planes, kernel_size=1, padding=0)\n        self.relu = nn.ReLU(inplace=True)\n\n        num_groups = planes // 8\n\n        if norm_fn == 'group':\n            self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes//4)\n            self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes//4)\n            self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n            if not stride == 1:\n                self.norm4 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n        \n        elif norm_fn == 'batch':\n            self.norm1 = nn.BatchNorm2d(planes//4)\n            self.norm2 = nn.BatchNorm2d(planes//4)\n            self.norm3 = nn.BatchNorm2d(planes)\n            if not stride == 1:\n                self.norm4 = nn.BatchNorm2d(planes)\n        \n        elif norm_fn == 'instance':\n            self.norm1 = nn.InstanceNorm2d(planes//4)\n            self.norm2 = nn.InstanceNorm2d(planes//4)\n            self.norm3 = nn.InstanceNorm2d(planes)\n            if not stride == 1:\n                self.norm4 = nn.InstanceNorm2d(planes)\n\n        elif norm_fn == 'none':\n            self.norm1 = nn.Sequential()\n            self.norm2 = nn.Sequential()\n            self.norm3 = nn.Sequential()\n            if not stride == 1:\n                self.norm4 = nn.Sequential()\n\n        if stride == 1:\n            self.downsample = None\n        \n        else:    \n            self.downsample = nn.Sequential(\n                nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm4)\n\n\n    def forward(self, x):\n        y = x\n        y = 
self.relu(self.norm1(self.conv1(y)))\n        y = self.relu(self.norm2(self.conv2(y)))\n        y = self.relu(self.norm3(self.conv3(y)))\n\n        if self.downsample is not None:\n            x = self.downsample(x)\n\n        return self.relu(x+y)\n\nclass BasicEncoder(nn.Module):\n    def __init__(self, output_dim=128, norm_fn='batch', dropout=0.0):\n        super(BasicEncoder, self).__init__()\n        self.norm_fn = norm_fn\n\n        if self.norm_fn == 'group':\n            self.norm1 = nn.GroupNorm(num_groups=8, num_channels=64)\n            \n        elif self.norm_fn == 'batch':\n            self.norm1 = nn.BatchNorm2d(64)\n\n        elif self.norm_fn == 'instance':\n            self.norm1 = nn.InstanceNorm2d(64)\n\n        elif self.norm_fn == 'none':\n            self.norm1 = nn.Sequential()\n\n        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)\n        self.relu1 = nn.ReLU(inplace=True)\n\n        self.in_planes = 64\n        self.layer1 = self._make_layer(64,  stride=1)\n        self.layer2 = self._make_layer(96, stride=2)\n        self.layer3 = self._make_layer(128, stride=2)\n\n        # output convolution\n        self.conv2 = nn.Conv2d(128, output_dim, kernel_size=1)\n\n        self.dropout = None\n        if dropout > 0:\n            self.dropout = nn.Dropout2d(p=dropout)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n            elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):\n                if m.weight is not None:\n                    nn.init.constant_(m.weight, 1)\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def _make_layer(self, dim, stride=1):\n        layer1 = ResidualBlock(self.in_planes, dim, self.norm_fn, stride=stride)\n        layer2 = ResidualBlock(dim, dim, self.norm_fn, stride=1)\n        layers = (layer1, layer2)\n    
    \n        self.in_planes = dim\n        return nn.Sequential(*layers)\n\n\n    def forward(self, x):\n\n        # if input is list, combine batch dimension\n        is_list = isinstance(x, tuple) or isinstance(x, list)\n        if is_list:\n            batch_dim = x[0].shape[0]\n            x = torch.cat(x, dim=0)\n\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.relu1(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n\n        x = self.conv2(x)\n\n        if self.training and self.dropout is not None:\n            x = self.dropout(x)\n\n        if is_list:\n            x = torch.split(x, [batch_dim, batch_dim], dim=0)\n\n        return x\n\n\nclass SmallEncoder(nn.Module):\n    def __init__(self, output_dim=128, norm_fn='batch', dropout=0.0):\n        super(SmallEncoder, self).__init__()\n        self.norm_fn = norm_fn\n\n        if self.norm_fn == 'group':\n            self.norm1 = nn.GroupNorm(num_groups=8, num_channels=32)\n            \n        elif self.norm_fn == 'batch':\n            self.norm1 = nn.BatchNorm2d(32)\n\n        elif self.norm_fn == 'instance':\n            self.norm1 = nn.InstanceNorm2d(32)\n\n        elif self.norm_fn == 'none':\n            self.norm1 = nn.Sequential()\n\n        self.conv1 = nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3)\n        self.relu1 = nn.ReLU(inplace=True)\n\n        self.in_planes = 32\n        self.layer1 = self._make_layer(32,  stride=1)\n        self.layer2 = self._make_layer(64, stride=2)\n        self.layer3 = self._make_layer(96, stride=2)\n\n        self.dropout = None\n        if dropout > 0:\n            self.dropout = nn.Dropout2d(p=dropout)\n        \n        self.conv2 = nn.Conv2d(96, output_dim, kernel_size=1)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n            elif isinstance(m, (nn.BatchNorm2d, 
nn.InstanceNorm2d, nn.GroupNorm)):\n                if m.weight is not None:\n                    nn.init.constant_(m.weight, 1)\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def _make_layer(self, dim, stride=1):\n        layer1 = BottleneckBlock(self.in_planes, dim, self.norm_fn, stride=stride)\n        layer2 = BottleneckBlock(dim, dim, self.norm_fn, stride=1)\n        layers = (layer1, layer2)\n    \n        self.in_planes = dim\n        return nn.Sequential(*layers)\n\n\n    def forward(self, x):\n\n        # if input is list, combine batch dimension\n        is_list = isinstance(x, tuple) or isinstance(x, list)\n        if is_list:\n            batch_dim = x[0].shape[0]\n            x = torch.cat(x, dim=0)\n\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.relu1(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n        x = self.conv2(x)\n\n        if self.training and self.dropout is not None:\n            x = self.dropout(x)\n\n        if is_list:\n            x = torch.split(x, [batch_dim, batch_dim], dim=0)\n\n        return x\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/RAFT/core/raft.py",
    "content": "import numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom .update import BasicUpdateBlock, SmallUpdateBlock\nfrom .extractor import BasicEncoder, SmallEncoder\nfrom .corr import CorrBlock, AlternateCorrBlock\nfrom .utils_core.utils import bilinear_sampler, coords_grid, upflow8\n\ntry:\n    autocast = torch.cuda.amp.autocast\nexcept:\n    # dummy autocast for PyTorch < 1.6\n    class autocast:\n        def __init__(self, enabled):\n            pass\n        def __enter__(self):\n            pass\n        def __exit__(self, *args):\n            pass\n\n\nclass RAFT(nn.Module):\n    def __init__(self, args):\n        super(RAFT, self).__init__()\n        self.args = args\n\n        if args.small:\n            self.hidden_dim = hdim = 96\n            self.context_dim = cdim = 64\n            args.corr_levels = 4\n            args.corr_radius = 3\n        \n        else:\n            self.hidden_dim = hdim = 128\n            self.context_dim = cdim = 128\n            args.corr_levels = 4\n            args.corr_radius = 4\n\n        if 'dropout' not in self.args:\n            self.args.dropout = 0\n\n        if 'alternate_corr' not in self.args:\n            self.args.alternate_corr = False\n\n        # feature network, context network, and update block\n        if args.small:\n            self.fnet = SmallEncoder(output_dim=128, norm_fn='instance', dropout=args.dropout)        \n            self.cnet = SmallEncoder(output_dim=hdim+cdim, norm_fn='none', dropout=args.dropout)\n            self.update_block = SmallUpdateBlock(self.args, hidden_dim=hdim)\n\n        else:\n            self.fnet = BasicEncoder(output_dim=256, norm_fn='instance', dropout=args.dropout)        \n            self.cnet = BasicEncoder(output_dim=hdim+cdim, norm_fn='batch', dropout=args.dropout)\n            self.update_block = BasicUpdateBlock(self.args, hidden_dim=hdim)\n\n    def freeze_bn(self):\n        for m in self.modules():\n            
if isinstance(m, nn.BatchNorm2d):\n                m.eval()\n\n    def initialize_flow(self, img):\n        \"\"\" Flow is represented as difference between two coordinate grids flow = coords1 - coords0\"\"\"\n        N, C, H, W = img.shape\n        coords0 = coords_grid(N, H//8, W//8, device=img.device)\n        coords1 = coords_grid(N, H//8, W//8, device=img.device)\n\n        # optical flow computed as difference: flow = coords1 - coords0\n        return coords0, coords1\n\n    def upsample_flow(self, flow, mask):\n        \"\"\" Upsample flow field [H/8, W/8, 2] -> [H, W, 2] using convex combination \"\"\"\n        N, _, H, W = flow.shape\n        mask = mask.view(N, 1, 9, 8, 8, H, W)\n        mask = torch.softmax(mask, dim=2)\n\n        up_flow = F.unfold(8 * flow, [3,3], padding=1)\n        up_flow = up_flow.view(N, 2, 9, 1, 1, H, W)\n\n        up_flow = torch.sum(mask * up_flow, dim=2)\n        up_flow = up_flow.permute(0, 1, 4, 2, 5, 3)\n        return up_flow.reshape(N, 2, 8*H, 8*W)\n\n\n    def forward(self, image1, image2, iters=12, flow_init=None, upsample=True, test_mode=False):\n        \"\"\" Estimate optical flow between pair of frames \"\"\"\n\n        image1 = 2 * (image1 / 255.0) - 1.0\n        image2 = 2 * (image2 / 255.0) - 1.0\n\n        image1 = image1.contiguous()\n        image2 = image2.contiguous()\n\n        hdim = self.hidden_dim\n        cdim = self.context_dim\n\n        # run the feature network\n        with autocast(enabled=self.args.mixed_precision):\n            fmap1, fmap2 = self.fnet([image1, image2])        \n        \n        fmap1 = fmap1.float()\n        fmap2 = fmap2.float()\n        if self.args.alternate_corr:\n            corr_fn = AlternateCorrBlock(fmap1, fmap2, radius=self.args.corr_radius)\n        else:\n            corr_fn = CorrBlock(fmap1, fmap2, radius=self.args.corr_radius)\n\n        # run the context network\n        with autocast(enabled=self.args.mixed_precision):\n            cnet = self.cnet(image1)\n   
         net, inp = torch.split(cnet, [hdim, cdim], dim=1)\n            net = torch.tanh(net)\n            inp = torch.relu(inp)\n\n        coords0, coords1 = self.initialize_flow(image1)\n\n        if flow_init is not None:\n            coords1 = coords1 + flow_init\n\n        flow_predictions = []\n        for itr in range(iters):\n            coords1 = coords1.detach()\n            corr = corr_fn(coords1) # index correlation volume\n\n            flow = coords1 - coords0\n            with autocast(enabled=self.args.mixed_precision):\n                net, up_mask, delta_flow = self.update_block(net, inp, corr, flow)\n\n            # F(t+1) = F(t) + \\Delta(t)\n            coords1 = coords1 + delta_flow\n\n            # upsample predictions\n            if up_mask is None:\n                flow_up = upflow8(coords1 - coords0)\n            else:\n                flow_up = self.upsample_flow(coords1 - coords0, up_mask)\n            \n            flow_predictions.append(flow_up)\n\n        if test_mode:\n            return coords1 - coords0, flow_up\n            \n        return flow_predictions\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/RAFT/core/update.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass FlowHead(nn.Module):\n    def __init__(self, input_dim=128, hidden_dim=256):\n        super(FlowHead, self).__init__()\n        self.conv1 = nn.Conv2d(input_dim, hidden_dim, 3, padding=1)\n        self.conv2 = nn.Conv2d(hidden_dim, 2, 3, padding=1)\n        self.relu = nn.ReLU(inplace=True)\n\n    def forward(self, x):\n        return self.conv2(self.relu(self.conv1(x)))\n\nclass ConvGRU(nn.Module):\n    def __init__(self, hidden_dim=128, input_dim=192+128):\n        super(ConvGRU, self).__init__()\n        self.convz = nn.Conv2d(hidden_dim+input_dim, hidden_dim, 3, padding=1)\n        self.convr = nn.Conv2d(hidden_dim+input_dim, hidden_dim, 3, padding=1)\n        self.convq = nn.Conv2d(hidden_dim+input_dim, hidden_dim, 3, padding=1)\n\n    def forward(self, h, x):\n        hx = torch.cat([h, x], dim=1)\n\n        z = torch.sigmoid(self.convz(hx))\n        r = torch.sigmoid(self.convr(hx))\n        q = torch.tanh(self.convq(torch.cat([r*h, x], dim=1)))\n\n        h = (1-z) * h + z * q\n        return h\n\nclass SepConvGRU(nn.Module):\n    def __init__(self, hidden_dim=128, input_dim=192+128):\n        super(SepConvGRU, self).__init__()\n        self.convz1 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (1,5), padding=(0,2))\n        self.convr1 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (1,5), padding=(0,2))\n        self.convq1 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (1,5), padding=(0,2))\n\n        self.convz2 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (5,1), padding=(2,0))\n        self.convr2 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (5,1), padding=(2,0))\n        self.convq2 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (5,1), padding=(2,0))\n\n\n    def forward(self, h, x):\n        # horizontal\n        hx = torch.cat([h, x], dim=1)\n        z = torch.sigmoid(self.convz1(hx))\n        r = torch.sigmoid(self.convr1(hx))\n        q = 
torch.tanh(self.convq1(torch.cat([r*h, x], dim=1)))        \n        h = (1-z) * h + z * q\n\n        # vertical\n        hx = torch.cat([h, x], dim=1)\n        z = torch.sigmoid(self.convz2(hx))\n        r = torch.sigmoid(self.convr2(hx))\n        q = torch.tanh(self.convq2(torch.cat([r*h, x], dim=1)))       \n        h = (1-z) * h + z * q\n\n        return h\n\nclass SmallMotionEncoder(nn.Module):\n    def __init__(self, args):\n        super(SmallMotionEncoder, self).__init__()\n        cor_planes = args.corr_levels * (2*args.corr_radius + 1)**2\n        self.convc1 = nn.Conv2d(cor_planes, 96, 1, padding=0)\n        self.convf1 = nn.Conv2d(2, 64, 7, padding=3)\n        self.convf2 = nn.Conv2d(64, 32, 3, padding=1)\n        self.conv = nn.Conv2d(128, 80, 3, padding=1)\n\n    def forward(self, flow, corr):\n        cor = F.relu(self.convc1(corr))\n        flo = F.relu(self.convf1(flow))\n        flo = F.relu(self.convf2(flo))\n        cor_flo = torch.cat([cor, flo], dim=1)\n        out = F.relu(self.conv(cor_flo))\n        return torch.cat([out, flow], dim=1)\n\nclass BasicMotionEncoder(nn.Module):\n    def __init__(self, args):\n        super(BasicMotionEncoder, self).__init__()\n        cor_planes = args.corr_levels * (2*args.corr_radius + 1)**2\n        self.convc1 = nn.Conv2d(cor_planes, 256, 1, padding=0)\n        self.convc2 = nn.Conv2d(256, 192, 3, padding=1)\n        self.convf1 = nn.Conv2d(2, 128, 7, padding=3)\n        self.convf2 = nn.Conv2d(128, 64, 3, padding=1)\n        self.conv = nn.Conv2d(64+192, 128-2, 3, padding=1)\n\n    def forward(self, flow, corr):\n        cor = F.relu(self.convc1(corr))\n        cor = F.relu(self.convc2(cor))\n        flo = F.relu(self.convf1(flow))\n        flo = F.relu(self.convf2(flo))\n\n        cor_flo = torch.cat([cor, flo], dim=1)\n        out = F.relu(self.conv(cor_flo))\n        return torch.cat([out, flow], dim=1)\n\nclass SmallUpdateBlock(nn.Module):\n    def __init__(self, args, hidden_dim=96):\n        
super(SmallUpdateBlock, self).__init__()\n        self.encoder = SmallMotionEncoder(args)\n        self.gru = ConvGRU(hidden_dim=hidden_dim, input_dim=82+64)\n        self.flow_head = FlowHead(hidden_dim, hidden_dim=128)\n\n    def forward(self, net, inp, corr, flow):\n        motion_features = self.encoder(flow, corr)\n        inp = torch.cat([inp, motion_features], dim=1)\n        net = self.gru(net, inp)\n        delta_flow = self.flow_head(net)\n\n        return net, None, delta_flow\n\nclass BasicUpdateBlock(nn.Module):\n    def __init__(self, args, hidden_dim=128, input_dim=128):\n        super(BasicUpdateBlock, self).__init__()\n        self.args = args\n        self.encoder = BasicMotionEncoder(args)\n        self.gru = SepConvGRU(hidden_dim=hidden_dim, input_dim=128+hidden_dim)\n        self.flow_head = FlowHead(hidden_dim, hidden_dim=256)\n\n        self.mask = nn.Sequential(\n            nn.Conv2d(128, 256, 3, padding=1),\n            nn.ReLU(inplace=True),\n            nn.Conv2d(256, 64*9, 1, padding=0))\n\n    def forward(self, net, inp, corr, flow, upsample=True):\n        motion_features = self.encoder(flow, corr)\n        inp = torch.cat([inp, motion_features], dim=1)\n\n        net = self.gru(net, inp)\n        delta_flow = self.flow_head(net)\n\n        # scale mask to balance gradients\n        mask = .25 * self.mask(net)\n        return net, mask, delta_flow\n\n\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/RAFT/core/utils_core/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/RAFT/core/utils_core/augmentor.py",
    "content": "import numpy as np\nimport random\nimport math\nfrom PIL import Image\n\nimport cv2\ncv2.setNumThreads(0)\ncv2.ocl.setUseOpenCL(False)\n\nimport torch\nfrom torchvision.transforms import ColorJitter\nimport torch.nn.functional as F\n\n\nclass FlowAugmentor:\n    def __init__(self, crop_size, min_scale=-0.2, max_scale=0.5, do_flip=True):\n        \n        # spatial augmentation params\n        self.crop_size = crop_size\n        self.min_scale = min_scale\n        self.max_scale = max_scale\n        self.spatial_aug_prob = 0.8\n        self.stretch_prob = 0.8\n        self.max_stretch = 0.2\n\n        # flip augmentation params\n        self.do_flip = do_flip\n        self.h_flip_prob = 0.5\n        self.v_flip_prob = 0.1\n\n        # photometric augmentation params\n        self.photo_aug = ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.5/3.14)\n        self.asymmetric_color_aug_prob = 0.2\n        self.eraser_aug_prob = 0.5\n\n    def color_transform(self, img1, img2):\n        \"\"\" Photometric augmentation \"\"\"\n\n        # asymmetric\n        if np.random.rand() < self.asymmetric_color_aug_prob:\n            img1 = np.array(self.photo_aug(Image.fromarray(img1)), dtype=np.uint8)\n            img2 = np.array(self.photo_aug(Image.fromarray(img2)), dtype=np.uint8)\n\n        # symmetric\n        else:\n            image_stack = np.concatenate([img1, img2], axis=0)\n            image_stack = np.array(self.photo_aug(Image.fromarray(image_stack)), dtype=np.uint8)\n            img1, img2 = np.split(image_stack, 2, axis=0)\n\n        return img1, img2\n\n    def eraser_transform(self, img1, img2, bounds=[50, 100]):\n        \"\"\" Occlusion augmentation \"\"\"\n\n        ht, wd = img1.shape[:2]\n        if np.random.rand() < self.eraser_aug_prob:\n            mean_color = np.mean(img2.reshape(-1, 3), axis=0)\n            for _ in range(np.random.randint(1, 3)):\n                x0 = np.random.randint(0, wd)\n                y0 = 
np.random.randint(0, ht)\n                dx = np.random.randint(bounds[0], bounds[1])\n                dy = np.random.randint(bounds[0], bounds[1])\n                img2[y0:y0+dy, x0:x0+dx, :] = mean_color\n\n        return img1, img2\n\n    def spatial_transform(self, img1, img2, flow):\n        # randomly sample scale\n        ht, wd = img1.shape[:2]\n        min_scale = np.maximum(\n            (self.crop_size[0] + 8) / float(ht), \n            (self.crop_size[1] + 8) / float(wd))\n\n        scale = 2 ** np.random.uniform(self.min_scale, self.max_scale)\n        scale_x = scale\n        scale_y = scale\n        if np.random.rand() < self.stretch_prob:\n            scale_x *= 2 ** np.random.uniform(-self.max_stretch, self.max_stretch)\n            scale_y *= 2 ** np.random.uniform(-self.max_stretch, self.max_stretch)\n        \n        scale_x = np.clip(scale_x, min_scale, None)\n        scale_y = np.clip(scale_y, min_scale, None)\n\n        if np.random.rand() < self.spatial_aug_prob:\n            # rescale the images\n            img1 = cv2.resize(img1, None, fx=scale_x, fy=scale_y, interpolation=cv2.INTER_LINEAR)\n            img2 = cv2.resize(img2, None, fx=scale_x, fy=scale_y, interpolation=cv2.INTER_LINEAR)\n            flow = cv2.resize(flow, None, fx=scale_x, fy=scale_y, interpolation=cv2.INTER_LINEAR)\n            flow = flow * [scale_x, scale_y]\n\n        if self.do_flip:\n            if np.random.rand() < self.h_flip_prob: # h-flip\n                img1 = img1[:, ::-1]\n                img2 = img2[:, ::-1]\n                flow = flow[:, ::-1] * [-1.0, 1.0]\n\n            if np.random.rand() < self.v_flip_prob: # v-flip\n                img1 = img1[::-1, :]\n                img2 = img2[::-1, :]\n                flow = flow[::-1, :] * [1.0, -1.0]\n\n        y0 = np.random.randint(0, img1.shape[0] - self.crop_size[0])\n        x0 = np.random.randint(0, img1.shape[1] - self.crop_size[1])\n        \n        img1 = img1[y0:y0+self.crop_size[0], 
x0:x0+self.crop_size[1]]\n        img2 = img2[y0:y0+self.crop_size[0], x0:x0+self.crop_size[1]]\n        flow = flow[y0:y0+self.crop_size[0], x0:x0+self.crop_size[1]]\n\n        return img1, img2, flow\n\n    def __call__(self, img1, img2, flow):\n        img1, img2 = self.color_transform(img1, img2)\n        img1, img2 = self.eraser_transform(img1, img2)\n        img1, img2, flow = self.spatial_transform(img1, img2, flow)\n\n        img1 = np.ascontiguousarray(img1)\n        img2 = np.ascontiguousarray(img2)\n        flow = np.ascontiguousarray(flow)\n\n        return img1, img2, flow\n\nclass SparseFlowAugmentor:\n    def __init__(self, crop_size, min_scale=-0.2, max_scale=0.5, do_flip=False):\n        # spatial augmentation params\n        self.crop_size = crop_size\n        self.min_scale = min_scale\n        self.max_scale = max_scale\n        self.spatial_aug_prob = 0.8\n        self.stretch_prob = 0.8\n        self.max_stretch = 0.2\n\n        # flip augmentation params\n        self.do_flip = do_flip\n        self.h_flip_prob = 0.5\n        self.v_flip_prob = 0.1\n\n        # photometric augmentation params\n        self.photo_aug = ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.3/3.14)\n        self.asymmetric_color_aug_prob = 0.2\n        self.eraser_aug_prob = 0.5\n        \n    def color_transform(self, img1, img2):\n        image_stack = np.concatenate([img1, img2], axis=0)\n        image_stack = np.array(self.photo_aug(Image.fromarray(image_stack)), dtype=np.uint8)\n        img1, img2 = np.split(image_stack, 2, axis=0)\n        return img1, img2\n\n    def eraser_transform(self, img1, img2):\n        ht, wd = img1.shape[:2]\n        if np.random.rand() < self.eraser_aug_prob:\n            mean_color = np.mean(img2.reshape(-1, 3), axis=0)\n            for _ in range(np.random.randint(1, 3)):\n                x0 = np.random.randint(0, wd)\n                y0 = np.random.randint(0, ht)\n                dx = np.random.randint(50, 100)\n   
             dy = np.random.randint(50, 100)\n                img2[y0:y0+dy, x0:x0+dx, :] = mean_color\n\n        return img1, img2\n\n    def resize_sparse_flow_map(self, flow, valid, fx=1.0, fy=1.0):\n        ht, wd = flow.shape[:2]\n        coords = np.meshgrid(np.arange(wd), np.arange(ht))\n        coords = np.stack(coords, axis=-1)\n\n        coords = coords.reshape(-1, 2).astype(np.float32)\n        flow = flow.reshape(-1, 2).astype(np.float32)\n        valid = valid.reshape(-1).astype(np.float32)\n\n        coords0 = coords[valid>=1]\n        flow0 = flow[valid>=1]\n\n        ht1 = int(round(ht * fy))\n        wd1 = int(round(wd * fx))\n\n        coords1 = coords0 * [fx, fy]\n        flow1 = flow0 * [fx, fy]\n\n        xx = np.round(coords1[:,0]).astype(np.int32)\n        yy = np.round(coords1[:,1]).astype(np.int32)\n\n        v = (xx > 0) & (xx < wd1) & (yy > 0) & (yy < ht1)\n        xx = xx[v]\n        yy = yy[v]\n        flow1 = flow1[v]\n\n        flow_img = np.zeros([ht1, wd1, 2], dtype=np.float32)\n        valid_img = np.zeros([ht1, wd1], dtype=np.int32)\n\n        flow_img[yy, xx] = flow1\n        valid_img[yy, xx] = 1\n\n        return flow_img, valid_img\n\n    def spatial_transform(self, img1, img2, flow, valid):\n        # randomly sample scale\n\n        ht, wd = img1.shape[:2]\n        min_scale = np.maximum(\n            (self.crop_size[0] + 1) / float(ht), \n            (self.crop_size[1] + 1) / float(wd))\n\n        scale = 2 ** np.random.uniform(self.min_scale, self.max_scale)\n        scale_x = np.clip(scale, min_scale, None)\n        scale_y = np.clip(scale, min_scale, None)\n\n        if np.random.rand() < self.spatial_aug_prob:\n            # rescale the images\n            img1 = cv2.resize(img1, None, fx=scale_x, fy=scale_y, interpolation=cv2.INTER_LINEAR)\n            img2 = cv2.resize(img2, None, fx=scale_x, fy=scale_y, interpolation=cv2.INTER_LINEAR)\n            flow, valid = self.resize_sparse_flow_map(flow, valid, fx=scale_x, 
fy=scale_y)\n\n        if self.do_flip:\n            if np.random.rand() < 0.5: # h-flip\n                img1 = img1[:, ::-1]\n                img2 = img2[:, ::-1]\n                flow = flow[:, ::-1] * [-1.0, 1.0]\n                valid = valid[:, ::-1]\n\n        margin_y = 20\n        margin_x = 50\n\n        y0 = np.random.randint(0, img1.shape[0] - self.crop_size[0] + margin_y)\n        x0 = np.random.randint(-margin_x, img1.shape[1] - self.crop_size[1] + margin_x)\n\n        y0 = np.clip(y0, 0, img1.shape[0] - self.crop_size[0])\n        x0 = np.clip(x0, 0, img1.shape[1] - self.crop_size[1])\n\n        img1 = img1[y0:y0+self.crop_size[0], x0:x0+self.crop_size[1]]\n        img2 = img2[y0:y0+self.crop_size[0], x0:x0+self.crop_size[1]]\n        flow = flow[y0:y0+self.crop_size[0], x0:x0+self.crop_size[1]]\n        valid = valid[y0:y0+self.crop_size[0], x0:x0+self.crop_size[1]]\n        return img1, img2, flow, valid\n\n\n    def __call__(self, img1, img2, flow, valid):\n        img1, img2 = self.color_transform(img1, img2)\n        img1, img2 = self.eraser_transform(img1, img2)\n        img1, img2, flow, valid = self.spatial_transform(img1, img2, flow, valid)\n\n        img1 = np.ascontiguousarray(img1)\n        img2 = np.ascontiguousarray(img2)\n        flow = np.ascontiguousarray(flow)\n        valid = np.ascontiguousarray(valid)\n\n        return img1, img2, flow, valid\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/RAFT/core/utils_core/flow_viz.py",
    "content": "# Flow visualization code used from https://github.com/tomrunia/OpticalFlow_Visualization\n\n\n# MIT License\n#\n# Copyright (c) 2018 Tom Runia\n#\n# Permission is hereby granted, free of charge, to any person obtaining a copy\n# of this software and associated documentation files (the \"Software\"), to deal\n# in the Software without restriction, including without limitation the rights\n# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n# copies of the Software, and to permit persons to whom the Software is\n# furnished to do so, subject to conditions.\n#\n# Author: Tom Runia\n# Date Created: 2018-08-03\n\nimport numpy as np\n\ndef make_colorwheel():\n    \"\"\"\n    Generates a color wheel for optical flow visualization as presented in:\n        Baker et al. \"A Database and Evaluation Methodology for Optical Flow\" (ICCV, 2007)\n        URL: http://vision.middlebury.edu/flow/flowEval-iccv07.pdf\n\n    Code follows the original C++ source code of Daniel Scharstein.\n    Code follows the the Matlab source code of Deqing Sun.\n\n    Returns:\n        np.ndarray: Color wheel\n    \"\"\"\n\n    RY = 15\n    YG = 6\n    GC = 4\n    CB = 11\n    BM = 13\n    MR = 6\n\n    ncols = RY + YG + GC + CB + BM + MR\n    colorwheel = np.zeros((ncols, 3))\n    col = 0\n\n    # RY\n    colorwheel[0:RY, 0] = 255\n    colorwheel[0:RY, 1] = np.floor(255*np.arange(0,RY)/RY)\n    col = col+RY\n    # YG\n    colorwheel[col:col+YG, 0] = 255 - np.floor(255*np.arange(0,YG)/YG)\n    colorwheel[col:col+YG, 1] = 255\n    col = col+YG\n    # GC\n    colorwheel[col:col+GC, 1] = 255\n    colorwheel[col:col+GC, 2] = np.floor(255*np.arange(0,GC)/GC)\n    col = col+GC\n    # CB\n    colorwheel[col:col+CB, 1] = 255 - np.floor(255*np.arange(CB)/CB)\n    colorwheel[col:col+CB, 2] = 255\n    col = col+CB\n    # BM\n    colorwheel[col:col+BM, 2] = 255\n    colorwheel[col:col+BM, 0] = np.floor(255*np.arange(0,BM)/BM)\n    col = col+BM\n    # MR\n    
colorwheel[col:col+MR, 2] = 255 - np.floor(255*np.arange(MR)/MR)\n    colorwheel[col:col+MR, 0] = 255\n    return colorwheel\n\n\ndef flow_uv_to_colors(u, v, convert_to_bgr=False):\n    \"\"\"\n    Applies the flow color wheel to (possibly clipped) flow components u and v.\n\n    According to the C++ source code of Daniel Scharstein\n    According to the Matlab source code of Deqing Sun\n\n    Args:\n        u (np.ndarray): Input horizontal flow of shape [H,W]\n        v (np.ndarray): Input vertical flow of shape [H,W]\n        convert_to_bgr (bool, optional): Convert output image to BGR. Defaults to False.\n\n    Returns:\n        np.ndarray: Flow visualization image of shape [H,W,3]\n    \"\"\"\n    flow_image = np.zeros((u.shape[0], u.shape[1], 3), np.uint8)\n    colorwheel = make_colorwheel()  # shape [55x3]\n    ncols = colorwheel.shape[0]\n    rad = np.sqrt(np.square(u) + np.square(v))\n    a = np.arctan2(-v, -u)/np.pi\n    fk = (a+1) / 2*(ncols-1)\n    k0 = np.floor(fk).astype(np.int32)\n    k1 = k0 + 1\n    k1[k1 == ncols] = 0\n    f = fk - k0\n    for i in range(colorwheel.shape[1]):\n        tmp = colorwheel[:,i]\n        col0 = tmp[k0] / 255.0\n        col1 = tmp[k1] / 255.0\n        col = (1-f)*col0 + f*col1\n        idx = (rad <= 1)\n        col[idx]  = 1 - rad[idx] * (1-col[idx])\n        col[~idx] = col[~idx] * 0.75   # out of range\n        # Note the 2-i => BGR instead of RGB\n        ch_idx = 2-i if convert_to_bgr else i\n        flow_image[:,:,ch_idx] = np.floor(255 * col)\n    return flow_image\n\n\ndef flow_to_image(flow_uv, clip_flow=None, convert_to_bgr=False):\n    \"\"\"\n    Expects a two dimensional flow image of shape.\n\n    Args:\n        flow_uv (np.ndarray): Flow UV image of shape [H,W,2]\n        clip_flow (float, optional): Clip maximum of flow values. Defaults to None.\n        convert_to_bgr (bool, optional): Convert output image to BGR. 
Defaults to False.\n\n    Returns:\n        np.ndarray: Flow visualization image of shape [H,W,3]\n    \"\"\"\n    assert flow_uv.ndim == 3, 'input flow must have three dimensions'\n    assert flow_uv.shape[2] == 2, 'input flow must have shape [H,W,2]'\n    if clip_flow is not None:\n        flow_uv = np.clip(flow_uv, 0, clip_flow)\n    u = flow_uv[:,:,0]\n    v = flow_uv[:,:,1]\n    rad = np.sqrt(np.square(u) + np.square(v))\n    rad_max = np.max(rad)\n    epsilon = 1e-5\n    u = u / (rad_max + epsilon)\n    v = v / (rad_max + epsilon)\n    return flow_uv_to_colors(u, v, convert_to_bgr)"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/RAFT/core/utils_core/frame_utils.py",
    "content": "import numpy as np\nfrom PIL import Image\nfrom os.path import *\nimport re\n\nimport cv2\ncv2.setNumThreads(0)\ncv2.ocl.setUseOpenCL(False)\n\nTAG_CHAR = np.array([202021.25], np.float32)\n\ndef readFlow(fn):\n    \"\"\" Read .flo file in Middlebury format\"\"\"\n    # Code adapted from:\n    # http://stackoverflow.com/questions/28013200/reading-middlebury-flow-files-with-python-bytes-array-numpy\n\n    # WARNING: this will work on little-endian architectures (eg Intel x86) only!\n    # print 'fn = %s'%(fn)\n    with open(fn, 'rb') as f:\n        magic = np.fromfile(f, np.float32, count=1)\n        if 202021.25 != magic:\n            print('Magic number incorrect. Invalid .flo file')\n            return None\n        else:\n            w = np.fromfile(f, np.int32, count=1)\n            h = np.fromfile(f, np.int32, count=1)\n            # print 'Reading %d x %d flo file\\n' % (w, h)\n            data = np.fromfile(f, np.float32, count=2*int(w)*int(h))\n            # Reshape data into 3D array (columns, rows, bands)\n            # The reshape here is for visualization, the original code is (w,h,2)\n            return np.resize(data, (int(h), int(w), 2))\n\ndef readPFM(file):\n    file = open(file, 'rb')\n\n    color = None\n    width = None\n    height = None\n    scale = None\n    endian = None\n\n    header = file.readline().rstrip()\n    if header == b'PF':\n        color = True\n    elif header == b'Pf':\n        color = False\n    else:\n        raise Exception('Not a PFM file.')\n\n    dim_match = re.match(rb'^(\\d+)\\s(\\d+)\\s$', file.readline())\n    if dim_match:\n        width, height = map(int, dim_match.groups())\n    else:\n        raise Exception('Malformed PFM header.')\n\n    scale = float(file.readline().rstrip())\n    if scale < 0: # little-endian\n        endian = '<'\n        scale = -scale\n    else:\n        endian = '>' # big-endian\n\n    data = np.fromfile(file, endian + 'f')\n    shape = (height, width, 3) if color else 
(height, width)\n\n    data = np.reshape(data, shape)\n    data = np.flipud(data)\n    return data\n\ndef writeFlow(filename,uv,v=None):\n    \"\"\" Write optical flow to file.\n    \n    If v is None, uv is assumed to contain both u and v channels,\n    stacked in depth.\n    Original code by Deqing Sun, adapted from Daniel Scharstein.\n    \"\"\"\n    nBands = 2\n\n    if v is None:\n        assert(uv.ndim == 3)\n        assert(uv.shape[2] == 2)\n        u = uv[:,:,0]\n        v = uv[:,:,1]\n    else:\n        u = uv\n\n    assert(u.shape == v.shape)\n    height,width = u.shape\n    f = open(filename,'wb')\n    # write the header\n    f.write(TAG_CHAR)\n    np.array(width).astype(np.int32).tofile(f)\n    np.array(height).astype(np.int32).tofile(f)\n    # arrange into matrix form\n    tmp = np.zeros((height, width*nBands))\n    tmp[:,np.arange(width)*2] = u\n    tmp[:,np.arange(width)*2 + 1] = v\n    tmp.astype(np.float32).tofile(f)\n    f.close()\n\n\ndef readFlowKITTI(filename):\n    flow = cv2.imread(filename, cv2.IMREAD_ANYDEPTH|cv2.IMREAD_COLOR)\n    flow = flow[:,:,::-1].astype(np.float32)\n    flow, valid = flow[:, :, :2], flow[:, :, 2]\n    flow = (flow - 2**15) / 64.0\n    return flow, valid\n\ndef readDispKITTI(filename):\n    disp = cv2.imread(filename, cv2.IMREAD_ANYDEPTH) / 256.0\n    valid = disp > 0.0\n    flow = np.stack([-disp, np.zeros_like(disp)], -1)\n    return flow, valid\n\n\ndef writeFlowKITTI(filename, uv):\n    uv = 64.0 * uv + 2**15\n    valid = np.ones([uv.shape[0], uv.shape[1], 1])\n    uv = np.concatenate([uv, valid], axis=-1).astype(np.uint16)\n    cv2.imwrite(filename, uv[..., ::-1])\n    \n\ndef read_gen(file_name, pil=False):\n    ext = splitext(file_name)[-1]\n    if ext == '.png' or ext == '.jpeg' or ext == '.ppm' or ext == '.jpg':\n        return Image.open(file_name)\n    elif ext == '.bin' or ext == '.raw':\n        return np.load(file_name)\n    elif ext == '.flo':\n        return readFlow(file_name).astype(np.float32)\n    
elif ext == '.pfm':\n        flow = readPFM(file_name).astype(np.float32)\n        if len(flow.shape) == 2:\n            return flow\n        else:\n            return flow[:, :, :-1]\n    return []"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/RAFT/core/utils_core/utils.py",
    "content": "import torch\nimport torch.nn.functional as F\nimport numpy as np\nfrom scipy import interpolate\n\n\nclass InputPadder:\n    \"\"\" Pads images such that dimensions are divisible by 8 \"\"\"\n    def __init__(self, dims, mode='sintel'):\n        self.ht, self.wd = dims[-2:]\n        pad_ht = (((self.ht // 8) + 1) * 8 - self.ht) % 8\n        pad_wd = (((self.wd // 8) + 1) * 8 - self.wd) % 8\n        if mode == 'sintel':\n            self._pad = [pad_wd//2, pad_wd - pad_wd//2, pad_ht//2, pad_ht - pad_ht//2]\n        else:\n            self._pad = [pad_wd//2, pad_wd - pad_wd//2, 0, pad_ht]\n\n    def pad(self, *inputs):\n        return [F.pad(x, self._pad, mode='replicate') for x in inputs]\n\n    def unpad(self,x):\n        ht, wd = x.shape[-2:]\n        c = [self._pad[2], ht-self._pad[3], self._pad[0], wd-self._pad[1]]\n        return x[..., c[0]:c[1], c[2]:c[3]]\n\ndef forward_interpolate(flow):\n    flow = flow.detach().cpu().numpy()\n    dx, dy = flow[0], flow[1]\n\n    ht, wd = dx.shape\n    x0, y0 = np.meshgrid(np.arange(wd), np.arange(ht))\n\n    x1 = x0 + dx\n    y1 = y0 + dy\n    \n    x1 = x1.reshape(-1)\n    y1 = y1.reshape(-1)\n    dx = dx.reshape(-1)\n    dy = dy.reshape(-1)\n\n    valid = (x1 > 0) & (x1 < wd) & (y1 > 0) & (y1 < ht)\n    x1 = x1[valid]\n    y1 = y1[valid]\n    dx = dx[valid]\n    dy = dy[valid]\n\n    flow_x = interpolate.griddata(\n        (x1, y1), dx, (x0, y0), method='nearest', fill_value=0)\n\n    flow_y = interpolate.griddata(\n        (x1, y1), dy, (x0, y0), method='nearest', fill_value=0)\n\n    flow = np.stack([flow_x, flow_y], axis=0)\n    return torch.from_numpy(flow).float()\n\n\ndef bilinear_sampler(img, coords, mode='bilinear', mask=False):\n    \"\"\" Wrapper for grid_sample, uses pixel coordinates \"\"\"\n    H, W = img.shape[-2:]\n    xgrid, ygrid = coords.split([1,1], dim=-1)\n    xgrid = 2*xgrid/(W-1) - 1\n    ygrid = 2*ygrid/(H-1) - 1\n\n    grid = torch.cat([xgrid, ygrid], dim=-1)\n    img = 
F.grid_sample(img, grid, mode=mode, align_corners=True)\n\n    if mask:\n        mask = (xgrid > -1) & (ygrid > -1) & (xgrid < 1) & (ygrid < 1)\n        return img, mask.float()\n\n    return img\n\n\ndef coords_grid(batch, ht, wd, device):\n    coords = torch.meshgrid(torch.arange(ht, device=device), torch.arange(wd, device=device))\n    coords = torch.stack(coords[::-1], dim=0).float()\n    return coords[None].repeat(batch, 1, 1, 1)\n\n\ndef upflow8(flow, mode='bilinear'):\n    new_size = (8 * flow.shape[2], 8 * flow.shape[3])\n    return 8 * F.interpolate(flow, size=new_size, mode=mode, align_corners=True)\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/ViCLIP/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/ViCLIP/simple_tokenizer.py",
    "content": "import gzip\nimport html\nimport os\nimport subprocess\nfrom functools import lru_cache\nimport ftfy\nimport regex as re\nfrom vbench.utils import CACHE_DIR\n\ndef default_bpe():\n    tokenizer_file = os.path.join(CACHE_DIR, \"ViCLIP/bpe_simple_vocab_16e6.txt.gz\")\n    if not os.path.exists(tokenizer_file):\n        print(f'Downloading ViCLIP tokenizer to {tokenizer_file}')\n        wget_command = ['wget', 'https://raw.githubusercontent.com/openai/CLIP/main/clip/bpe_simple_vocab_16e6.txt.gz', '-P', os.path.dirname(tokenizer_file)]\n        subprocess.run(wget_command)\n    return tokenizer_file\n\n\n@lru_cache()\ndef bytes_to_unicode():\n    \"\"\"\n    Returns list of utf-8 byte and a corresponding list of unicode strings.\n    The reversible bpe codes work on unicode strings.\n    This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.\n    When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.\n    This is a signficant percentage of your normal, say, 32K bpe vocab.\n    To avoid that, we want lookup tables between utf-8 bytes and unicode strings.\n    And avoids mapping to whitespace/control characters the bpe code barfs on.\n    \"\"\"\n    bs = list(range(ord(\"!\"), ord(\"~\")+1))+list(range(ord(\"¡\"), ord(\"¬\")+1))+list(range(ord(\"®\"), ord(\"ÿ\")+1))\n    cs = bs[:]\n    n = 0\n    for b in range(2**8):\n        if b not in bs:\n            bs.append(b)\n            cs.append(2**8+n)\n            n += 1\n    cs = [chr(n) for n in cs]\n    return dict(zip(bs, cs))\n\n\ndef get_pairs(word):\n    \"\"\"Return set of symbol pairs in a word.\n    Word is represented as tuple of symbols (symbols being variable-length strings).\n    \"\"\"\n    pairs = set()\n    prev_char = word[0]\n    for char in word[1:]:\n        pairs.add((prev_char, char))\n        prev_char = char\n    return pairs\n\n\ndef basic_clean(text):\n    text = ftfy.fix_text(text)\n    text = 
html.unescape(html.unescape(text))\n    return text.strip()\n\n\ndef whitespace_clean(text):\n    text = re.sub(r'\\s+', ' ', text)\n    text = text.strip()\n    return text\n\n\nclass SimpleTokenizer(object):\n    def __init__(self, bpe_path: str = default_bpe()):\n        self.byte_encoder = bytes_to_unicode()\n        self.byte_decoder = {v: k for k, v in self.byte_encoder.items()}\n        merges = gzip.open(bpe_path).read().decode(\"utf-8\").split('\\n')\n        merges = merges[1:49152-256-2+1]\n        merges = [tuple(merge.split()) for merge in merges]\n        vocab = list(bytes_to_unicode().values())\n        vocab = vocab + [v+'</w>' for v in vocab]\n        for merge in merges:\n            vocab.append(''.join(merge))\n        vocab.extend(['<|startoftext|>', '<|endoftext|>'])\n        self.encoder = dict(zip(vocab, range(len(vocab))))\n        self.decoder = {v: k for k, v in self.encoder.items()}\n        self.bpe_ranks = dict(zip(merges, range(len(merges))))\n        self.cache = {'<|startoftext|>': '<|startoftext|>', '<|endoftext|>': '<|endoftext|>'}\n        self.pat = re.compile(r\"\"\"<\\|startoftext\\|>|<\\|endoftext\\|>|'s|'t|'re|'ve|'m|'ll|'d|[\\p{L}]+|[\\p{N}]|[^\\s\\p{L}\\p{N}]+\"\"\", re.IGNORECASE)\n\n    def bpe(self, token):\n        if token in self.cache:\n            return self.cache[token]\n        word = tuple(token[:-1]) + ( token[-1] + '</w>',)\n        pairs = get_pairs(word)\n\n        if not pairs:\n            return token+'</w>'\n\n        while True:\n            bigram = min(pairs, key = lambda pair: self.bpe_ranks.get(pair, float('inf')))\n            if bigram not in self.bpe_ranks:\n                break\n            first, second = bigram\n            new_word = []\n            i = 0\n            while i < len(word):\n                try:\n                    j = word.index(first, i)\n                    new_word.extend(word[i:j])\n                    i = j\n                except:\n                    
new_word.extend(word[i:])\n                    break\n\n                if word[i] == first and i < len(word)-1 and word[i+1] == second:\n                    new_word.append(first+second)\n                    i += 2\n                else:\n                    new_word.append(word[i])\n                    i += 1\n            new_word = tuple(new_word)\n            word = new_word\n            if len(word) == 1:\n                break\n            else:\n                pairs = get_pairs(word)\n        word = ' '.join(word)\n        self.cache[token] = word\n        return word\n\n    def encode(self, text):\n        bpe_tokens = []\n        text = whitespace_clean(basic_clean(text)).lower()\n        for token in re.findall(self.pat, text):\n            token = ''.join(self.byte_encoder[b] for b in token.encode('utf-8'))\n            bpe_tokens.extend(self.encoder[bpe_token] for bpe_token in self.bpe(token).split(' '))\n        return bpe_tokens\n\n    def decode(self, tokens):\n        text = ''.join([self.decoder[token] for token in tokens])\n        text = bytearray([self.byte_decoder[c] for c in text]).decode('utf-8', errors=\"replace\").replace('</w>', ' ')\n        return text\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/ViCLIP/viclip.py",
    "content": "import os\nimport logging\n\nimport torch\nfrom einops import rearrange\nfrom torch import nn\nimport math\n\nfrom .simple_tokenizer import SimpleTokenizer as _Tokenizer\nfrom .viclip_vision import clip_joint_l14\nfrom .viclip_text import clip_text_l14\n\nlogger = logging.getLogger(__name__)\n\n\nclass ViCLIP(nn.Module):\n    \"\"\"docstring for ViCLIP\"\"\"\n\n    def __init__(self,  tokenizer=None, pretrain=os.path.join(os.path.dirname(os.path.abspath(__file__)), \"ViClip-InternVid-10M-FLT.pth\"), freeze_text=True):\n        super(ViCLIP, self).__init__()\n        if tokenizer:\n            self.tokenizer = tokenizer\n        else:\n            self.tokenizer = _Tokenizer()\n        self.max_txt_l = 32\n        \n        self.vision_encoder_name = 'vit_l14'\n    \n        self.vision_encoder_pretrained = False\n        self.inputs_image_res = 224\n        self.vision_encoder_kernel_size = 1\n        self.vision_encoder_center = True\n        self.video_input_num_frames = 8\n        self.vision_encoder_drop_path_rate = 0.1\n        self.vision_encoder_checkpoint_num = 24\n        self.is_pretrain = pretrain\n        self.vision_width = 1024\n        self.text_width = 768 \n        self.embed_dim = 768 \n        self.masking_prob = 0.9\n        \n        self.text_encoder_name = 'vit_l14'\n        self.text_encoder_pretrained = False#'bert-base-uncased'\n        self.text_encoder_d_model = 768\n\n        self.text_encoder_vocab_size = 49408\n        \n        \n        # create modules.\n        self.vision_encoder = self.build_vision_encoder()\n        self.text_encoder = self.build_text_encoder()\n\n        self.temp = nn.parameter.Parameter(torch.ones([]) * 1 / 100.0)\n        self.temp_min = 1 / 100.0\n\n        if pretrain:\n            logger.info(f\"Load pretrained weights from {pretrain}\")\n            state_dict = torch.load(pretrain, map_location='cpu')['model']\n            self.load_state_dict(state_dict)\n        \n        # Freeze 
weights\n        if freeze_text:\n            self.freeze_text()\n            \n\n\n    def freeze_text(self):\n        \"\"\"freeze text encoder\"\"\"\n        for p in self.text_encoder.parameters():\n            p.requires_grad = False\n\n    def no_weight_decay(self):\n        ret = {\"temp\"}\n        ret.update(\n            {\"vision_encoder.\" + k for k in self.vision_encoder.no_weight_decay()}\n        )\n        ret.update(\n            {\"text_encoder.\" + k for k in self.text_encoder.no_weight_decay()}\n        )\n\n        return ret\n\n    def forward(self, image, text, raw_text, idx, log_generation=None, return_sims=False):\n        \"\"\"forward and calculate loss.\n\n        Args:\n            image (torch.Tensor): The input images. Shape: [B,T,C,H,W].\n            text (dict): TODO\n            idx (torch.Tensor): TODO\n\n        Returns: TODO\n\n        \"\"\"\n        self.clip_contrastive_temperature()\n\n        vision_embeds = self.encode_vision(image)\n        text_embeds = self.encode_text(raw_text)\n        if return_sims:\n            sims = torch.nn.functional.normalize(vision_embeds, dim=-1) @ \\\n                  torch.nn.functional.normalize(text_embeds, dim=-1).transpose(0, 1)\n            return sims\n\n        # calculate loss\n\n        ## VTC loss\n        loss_vtc = self.clip_loss.vtc_loss(\n            vision_embeds, text_embeds, idx, self.temp, all_gather=True\n        )\n\n        return dict(\n            loss_vtc=loss_vtc,\n        )\n\n    def encode_vision(self, image, test=False):\n        \"\"\"encode image / videos as features.\n\n        Args:\n            image (torch.Tensor): The input images.\n            test (bool): Whether testing.\n\n        Returns: tuple.\n            - vision_embeds (torch.Tensor): The features of all patches. Shape: [B,T,L,C].\n            - pooled_vision_embeds (torch.Tensor): The pooled features. 
Shape: [B,T,C].\n\n        \"\"\"\n        if image.ndim == 5:\n            image = image.permute(0, 2, 1, 3, 4).contiguous()\n        else:\n            image = image.unsqueeze(2)\n\n        if not test and self.masking_prob > 0.0:\n            return self.vision_encoder(\n                image, masking_prob=self.masking_prob\n            )\n\n        return self.vision_encoder(image)\n\n    def encode_text(self, text):\n        \"\"\"encode text.\n        Args:\n            text (str | List[str]): Raw input string(s). They are tokenized here with\n                `self.text_encoder.tokenize` and padded or truncated to `self.max_txt_l` tokens.\n        Returns:\n            text_embeds (torch.Tensor): The projected features taken at the end-of-text token. Shape: [B,C].\n\n        \"\"\"\n        device = next(self.text_encoder.parameters()).device\n        text = self.text_encoder.tokenize(\n            text, context_length=self.max_txt_l\n        ).to(device)\n        text_embeds = self.text_encoder(text)\n        return text_embeds\n\n    @torch.no_grad()\n    def clip_contrastive_temperature(self, min_val=0.001, max_val=0.5):\n        \"\"\"Seems only used during pre-training\"\"\"\n        self.temp.clamp_(min=self.temp_min)\n\n    def build_vision_encoder(self):\n        \"\"\"build vision encoder\n        Returns: (vision_encoder, vision_layernorm). 
Each is a `nn.Module`.\n\n        \"\"\"\n        encoder_name = self.vision_encoder_name\n        if encoder_name != \"vit_l14\":\n            raise ValueError(f\"Not implemented: {encoder_name}\")\n        vision_encoder = clip_joint_l14(\n            pretrained=self.vision_encoder_pretrained,\n            input_resolution=self.inputs_image_res,\n            kernel_size=self.vision_encoder_kernel_size,\n            center=self.vision_encoder_center,\n            num_frames=self.video_input_num_frames,\n            drop_path=self.vision_encoder_drop_path_rate,\n            checkpoint_num=self.vision_encoder_checkpoint_num,\n        )\n        return vision_encoder\n\n    def build_text_encoder(self):\n        \"\"\"build text_encoder and possibly video-to-text multimodal fusion encoder.\n        Returns: nn.Module. The text encoder\n\n        \"\"\"\n        encoder_name = self.text_encoder_name\n        if encoder_name != \"vit_l14\":\n            raise ValueError(f\"Not implemented: {encoder_name}\")\n        text_encoder = clip_text_l14(\n            pretrained=self.text_encoder_pretrained,\n            embed_dim=self.text_encoder_d_model,\n            context_length=self.max_txt_l,\n            vocab_size=self.text_encoder_vocab_size,\n            checkpoint_num=0,\n        )\n\n        return text_encoder\n\n    def get_text_encoder(self):\n        \"\"\"get text encoder, used for text and cross-modal encoding\"\"\"\n        encoder = self.text_encoder\n        return encoder.bert if hasattr(encoder, \"bert\") else encoder\n    \n    def get_text_features(self, input_text, tokenizer, text_feature_dict={}):\n        if input_text in text_feature_dict:\n            return text_feature_dict[input_text]\n        text_template = f\"{input_text}\"\n        with torch.no_grad():\n            # text_token = tokenizer.encode(text_template).cuda()\n            text_features = self.encode_text(text_template).float()\n            text_features /= 
text_features.norm(dim=-1, keepdim=True)      \n            text_feature_dict[input_text] = text_features\n        return text_features\n\n    def get_vid_features(self, input_frames):\n        with torch.no_grad():\n            clip_feat = self.encode_vision(input_frames,test=True).float()\n            clip_feat /= clip_feat.norm(dim=-1, keepdim=True)    \n        return clip_feat\n\n    def get_predict_label(self, clip_feature, text_feats_tensor, top=5):\n        label_probs = (100.0 * clip_feature @ text_feats_tensor.T).softmax(dim=-1)\n        top_probs, top_labels = label_probs.cpu().topk(top, dim=-1)\n        return top_probs, top_labels\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/ViCLIP/viclip_text.py",
    "content": "import os\nimport logging\nfrom collections import OrderedDict\nfrom pkg_resources import packaging\nfrom .simple_tokenizer import SimpleTokenizer as _Tokenizer\n\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom torch import nn\nimport torch.utils.checkpoint as checkpoint\nimport functools\n\nlogger = logging.getLogger(__name__)\n\n\nMODEL_PATH = 'https://huggingface.co/laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K'\n_MODELS = {\n    \"ViT-L/14\": os.path.join(MODEL_PATH, \"vit_l14_text.pth\"),\n}\n\n\nclass LayerNorm(nn.LayerNorm):\n    \"\"\"Subclass torch's LayerNorm to handle fp16.\"\"\"\n\n    def forward(self, x: torch.Tensor):\n        orig_type = x.dtype\n        ret = super().forward(x.type(torch.float32))\n        return ret.type(orig_type)\n\n\nclass QuickGELU(nn.Module):\n    def forward(self, x: torch.Tensor):\n        return x * torch.sigmoid(1.702 * x)\n\n\nclass ResidualAttentionBlock(nn.Module):\n    def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor = None):\n        super().__init__()\n\n        self.attn = nn.MultiheadAttention(d_model, n_head)\n        self.ln_1 = LayerNorm(d_model)\n        self.mlp = nn.Sequential(OrderedDict([\n            (\"c_fc\", nn.Linear(d_model, d_model * 4)),\n            (\"gelu\", QuickGELU()),\n            (\"c_proj\", nn.Linear(d_model * 4, d_model))\n        ]))\n        self.ln_2 = LayerNorm(d_model)\n        self.attn_mask = attn_mask\n\n    def attention(self, x: torch.Tensor):\n        self.attn_mask = self.attn_mask.to(dtype=x.dtype, device=x.device) if self.attn_mask is not None else None\n        return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]\n\n    def forward(self, x: torch.Tensor):\n        x = x + self.attention(self.ln_1(x))\n        x = x + self.mlp(self.ln_2(x))\n        return x\n\n\nclass Transformer(nn.Module):\n    def __init__(self, width: int, layers: int, heads: int, attn_mask: torch.Tensor = None,\n           
      checkpoint_num: int = 0):\n        super().__init__()\n        self.width = width\n        self.layers = layers\n        self.resblocks = nn.Sequential(*[ResidualAttentionBlock(width, heads, attn_mask) for _ in range(layers)])\n\n        self.checkpoint_num = checkpoint_num\n\n    def forward(self, x: torch.Tensor):\n        if self.checkpoint_num > 0:\n            segments = min(self.checkpoint_num, len(self.resblocks))\n            return checkpoint.checkpoint_sequential(self.resblocks, segments, x)\n        else:\n            return self.resblocks(x)\n\n\nclass CLIP_TEXT(nn.Module):\n    def __init__(\n            self,\n            embed_dim: int,\n            context_length: int,\n            vocab_size: int,\n            transformer_width: int,\n            transformer_heads: int,\n            transformer_layers: int,\n            checkpoint_num: int,\n        ):\n        super().__init__()\n\n        self.context_length = context_length\n        self._tokenizer = _Tokenizer()\n\n        self.transformer = Transformer(\n            width=transformer_width,\n            layers=transformer_layers,\n            heads=transformer_heads,\n            attn_mask=self.build_attention_mask(),\n            checkpoint_num=checkpoint_num,\n        )\n\n        self.vocab_size = vocab_size\n        self.token_embedding = nn.Embedding(vocab_size, transformer_width)\n        self.positional_embedding = nn.Parameter(torch.empty(self.context_length, transformer_width))\n        self.ln_final = LayerNorm(transformer_width)\n\n        self.text_projection = nn.Parameter(torch.empty(transformer_width, embed_dim))\n    \n    def no_weight_decay(self):\n        return {'token_embedding', 'positional_embedding'}\n\n    @functools.lru_cache(maxsize=None)\n    def build_attention_mask(self):\n        # lazily create causal attention mask, with full attention between the vision tokens\n        # pytorch uses additive attention mask; fill with -inf\n        mask = 
torch.empty(self.context_length, self.context_length)\n        mask.fill_(float(\"-inf\"))\n        mask.triu_(1)  # zero out the lower diagonal\n        return mask\n\n    def tokenize(self, texts, context_length=77, truncate=True):\n        \"\"\"\n        Returns the tokenized representation of given input string(s)\n        Parameters\n        ----------\n        texts : Union[str, List[str]]\n            An input string or a list of input strings to tokenize\n        context_length : int\n            The context length to use; all CLIP models use 77 as the context length\n        truncate: bool\n            Whether to truncate the text in case its encoding is longer than the context length\n        Returns\n        -------\n        A two-dimensional tensor containing the resulting tokens, shape = [number of input strings, context_length].\n        We return LongTensor when torch version is <1.8.0, since older index_select requires indices to be long.\n        \"\"\"\n        if isinstance(texts, str):\n            texts = [texts]\n\n        sot_token = self._tokenizer.encoder[\"<|startoftext|>\"]\n        eot_token = self._tokenizer.encoder[\"<|endoftext|>\"]\n        all_tokens = [[sot_token] + self._tokenizer.encode(text) + [eot_token] for text in texts]\n        if packaging.version.parse(torch.__version__) < packaging.version.parse(\"1.8.0\"):\n            result = torch.zeros(len(all_tokens), context_length, dtype=torch.long)\n        else:\n            result = torch.zeros(len(all_tokens), context_length, dtype=torch.int)\n\n        for i, tokens in enumerate(all_tokens):\n            if len(tokens) > context_length:\n                if truncate:\n                    tokens = tokens[:context_length]\n                    tokens[-1] = eot_token\n                else:\n                    raise RuntimeError(f\"Input {texts[i]} is too long for context length {context_length}\")\n            result[i, :len(tokens)] = torch.tensor(tokens)\n\n        return 
result\n\n    def forward(self, text):\n        x = self.token_embedding(text)  # [batch_size, n_ctx, d_model]\n\n        x = x + self.positional_embedding\n        x = x.permute(1, 0, 2)  # NLD -> LND\n        x = self.transformer(x)\n        x = x.permute(1, 0, 2)  # LND -> NLD\n        x = self.ln_final(x)\n\n        # x.shape = [batch_size, n_ctx, transformer.width]\n        # take features from the eot embedding (eot_token is the highest number in each sequence)\n        x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection\n\n        return x\n\n\ndef clip_text_b16(\n    embed_dim=512,\n    context_length=77,\n    vocab_size=49408,\n    transformer_width=512,\n    transformer_heads=8,\n    transformer_layers=12,\n):\n    raise NotImplementedError\n    model = CLIP_TEXT(\n        embed_dim,\n        context_length,\n        vocab_size,\n        transformer_width,\n        transformer_heads,\n        transformer_layers\n    )\n    pretrained = _MODELS[\"ViT-B/16\"]\n    logger.info(f\"Load pretrained weights from {pretrained}\")\n    state_dict = torch.load(pretrained, map_location='cpu')\n    model.load_state_dict(state_dict, strict=False)\n    return model.eval()\n\n\ndef clip_text_l14(\n    embed_dim=768,\n    context_length=77,\n    vocab_size=49408,\n    transformer_width=768,\n    transformer_heads=12,\n    transformer_layers=12,\n    checkpoint_num=0,\n    pretrained=True,\n):\n    model = CLIP_TEXT(\n        embed_dim,\n        context_length,\n        vocab_size,\n        transformer_width,\n        transformer_heads,\n        transformer_layers,\n        checkpoint_num,\n    )\n    if pretrained:\n        if isinstance(pretrained, str) and pretrained != \"bert-base-uncased\":\n            pretrained = _MODELS[pretrained]\n        else:\n            pretrained = _MODELS[\"ViT-L/14\"]\n        logger.info(f\"Load pretrained weights from {pretrained}\")\n        state_dict = torch.load(pretrained, map_location='cpu')\n        if 
context_length != state_dict[\"positional_embedding\"].size(0):\n            # assert context_length < state_dict[\"positional_embedding\"].size(0), \"Cannot increase context length.\"\n            print(f\"Resize positional embedding from {state_dict['positional_embedding'].size(0)} to {context_length}\")\n            if context_length < state_dict[\"positional_embedding\"].size(0):\n                state_dict[\"positional_embedding\"] = state_dict[\"positional_embedding\"][:context_length]\n            else:\n                state_dict[\"positional_embedding\"] = F.pad(\n                    state_dict[\"positional_embedding\"],\n                    (0, 0, 0, context_length - state_dict[\"positional_embedding\"].size(0)),\n                    value=0,\n                )\n\n        message = model.load_state_dict(state_dict, strict=False)\n        print(f\"Load pretrained weights from {pretrained}: {message}\")\n    return model.eval()\n\n\ndef clip_text_l14_336(\n    embed_dim=768,\n    context_length=77,\n    vocab_size=49408,\n    transformer_width=768,\n    transformer_heads=12,\n    transformer_layers=12,\n):\n    raise NotImplementedError\n    model = CLIP_TEXT(\n        embed_dim,\n        context_length,\n        vocab_size,\n        transformer_width,\n        transformer_heads,\n        transformer_layers\n    )\n    pretrained = _MODELS[\"ViT-L/14_336\"]\n    logger.info(f\"Load pretrained weights from {pretrained}\")\n    state_dict = torch.load(pretrained, map_location='cpu')\n    model.load_state_dict(state_dict, strict=False)\n    return model.eval()\n\n\ndef build_clip(config):\n    model_cls = config.text_encoder.clip_teacher\n    model = eval(model_cls)()\n    return model\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/ViCLIP/viclip_vision.py",
    "content": "#!/usr/bin/env python\nimport os\nimport logging\nfrom collections import OrderedDict\n\nimport torch\nfrom torch import nn\nfrom einops import rearrange\nfrom timm.models.layers import DropPath\nfrom timm.models.registry import register_model\n\nimport torch.utils.checkpoint as checkpoint\n\nlogger = logging.getLogger(__name__)\n\ndef load_temp_embed_with_mismatch(temp_embed_old, temp_embed_new, add_zero=True):\n    \"\"\"\n    Add/Remove extra temporal_embeddings as needed.\n    https://arxiv.org/abs/2104.00650 shows adding zero paddings works.\n\n    temp_embed_old: (1, num_frames_old, 1, d)\n    temp_embed_new: (1, num_frames_new, 1, d)\n    add_zero: bool, if True, add zero, else, interpolate trained embeddings.\n    \"\"\"\n    # TODO zero pad\n    num_frms_new = temp_embed_new.shape[1]\n    num_frms_old = temp_embed_old.shape[1]\n    logger.info(f\"Load temporal_embeddings, lengths: {num_frms_old}-->{num_frms_new}\")\n    if num_frms_new > num_frms_old:\n        if add_zero:\n            temp_embed_new[\n                :, :num_frms_old\n            ] = temp_embed_old  # untrained embeddings are zeros.\n        else:\n            temp_embed_new = interpolate_temporal_pos_embed(temp_embed_old, num_frms_new)\n    elif num_frms_new < num_frms_old:\n        temp_embed_new = temp_embed_old[:, :num_frms_new]\n    else:  # =\n        temp_embed_new = temp_embed_old\n    return temp_embed_new\n\n\nMODEL_PATH = 'https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/viclip/'\n_MODELS = {\n    \"ViT-L/14\": os.path.join(MODEL_PATH, \"ViClip-InternVid-10M-FLT.pth\"),\n}\n\n\nclass QuickGELU(nn.Module):\n    def forward(self, x):\n        return x * torch.sigmoid(1.702 * x)\n\n\nclass ResidualAttentionBlock(nn.Module):\n    def __init__(self, d_model, n_head, drop_path=0., attn_mask=None, dropout=0.):\n        super().__init__()\n\n        self.drop_path1 = DropPath(drop_path) if drop_path > 0. 
else nn.Identity()\n        self.drop_path2 = DropPath(drop_path) if drop_path > 0. else nn.Identity()\n        self.attn = nn.MultiheadAttention(d_model, n_head, dropout=dropout)\n        self.ln_1 = nn.LayerNorm(d_model)\n        self.mlp = nn.Sequential(OrderedDict([\n            (\"c_fc\", nn.Linear(d_model, d_model * 4)),\n            (\"gelu\", QuickGELU()),\n            (\"drop1\", nn.Dropout(dropout)),\n            (\"c_proj\", nn.Linear(d_model * 4, d_model)),\n            (\"drop2\", nn.Dropout(dropout)),\n        ]))\n        self.ln_2 = nn.LayerNorm(d_model)\n        self.attn_mask = attn_mask\n\n    def attention(self, x):\n        self.attn_mask = self.attn_mask.to(dtype=x.dtype, device=x.device) if self.attn_mask is not None else None\n        return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]\n\n    def forward(self, x):\n        x = x + self.drop_path1(self.attention(self.ln_1(x)))\n        x = x + self.drop_path2(self.mlp(self.ln_2(x)))\n        return x\n\n\nclass Transformer(nn.Module):\n    def __init__(self, width, layers, heads, drop_path=0., checkpoint_num=0, dropout=0.):\n        super().__init__()\n        dpr = [x.item() for x in torch.linspace(0, drop_path, layers)]\n        self.resblocks = nn.ModuleList()\n        for idx in range(layers):\n            self.resblocks.append(ResidualAttentionBlock(width, heads, drop_path=dpr[idx], dropout=dropout))\n        self.checkpoint_num = checkpoint_num\n\n    def forward(self, x):\n        for idx, blk in enumerate(self.resblocks):\n            if idx < self.checkpoint_num:\n                x = checkpoint.checkpoint(blk, x)\n            else:\n                x = blk(x)\n        return x\n\n\nclass VisionTransformer(nn.Module):\n    def __init__(\n        self, input_resolution, patch_size, width, layers, heads, output_dim=None, \n        kernel_size=1, num_frames=8, drop_path=0, checkpoint_num=0, dropout=0.,\n        temp_embed=True,\n    ):\n        super().__init__()\n 
       self.output_dim = output_dim\n        self.conv1 = nn.Conv3d(\n            3, width, \n            (kernel_size, patch_size, patch_size), \n            (kernel_size, patch_size, patch_size), \n            (0, 0, 0), bias=False\n        )\n\n        scale = width ** -0.5\n        self.class_embedding = nn.Parameter(scale * torch.randn(width))\n        self.positional_embedding = nn.Parameter(scale * torch.randn((input_resolution // patch_size) ** 2 + 1, width))\n        self.ln_pre = nn.LayerNorm(width)\n        if temp_embed:\n            self.temporal_positional_embedding = nn.Parameter(torch.zeros(1, num_frames, width))\n        \n        self.transformer = Transformer(\n            width, layers, heads, drop_path=drop_path, checkpoint_num=checkpoint_num,\n            dropout=dropout)\n\n        self.ln_post = nn.LayerNorm(width)\n        if output_dim is not None:\n            self.proj = nn.Parameter(torch.empty(width, output_dim))\n        else:\n            self.proj = None\n        \n        self.dropout = nn.Dropout(dropout)\n\n    def get_num_layers(self):\n        return len(self.transformer.resblocks)\n\n    @torch.jit.ignore\n    def no_weight_decay(self):\n        return {'positional_embedding', 'class_embedding', 'temporal_positional_embedding'}\n    \n    def mask_tokens(self, inputs, masking_prob=0.0):\n        B, L, _ = inputs.shape\n\n        # This is different from text as we are masking a fix number of tokens\n        Lm = int(masking_prob * L)\n        masked_indices = torch.zeros(B, L)\n        indices = torch.argsort(torch.rand_like(masked_indices), dim=-1)[:, :Lm]\n        batch_indices = (\n            torch.arange(masked_indices.shape[0]).unsqueeze(-1).expand_as(indices)\n        )\n        masked_indices[batch_indices, indices] = 1\n\n        masked_indices = masked_indices.bool()\n\n        return inputs[~masked_indices].reshape(B, -1, inputs.shape[-1])\n\n    def forward(self, x, masking_prob=0.0):\n        x = self.conv1(x)  # 
shape = [*, width, grid, grid]\n        B, C, T, H, W = x.shape\n        x = x.permute(0, 2, 3, 4, 1).reshape(B * T, H * W, C)\n\n        x = torch.cat([self.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device), x], dim=1)  # shape = [*, grid ** 2 + 1, width]\n        x = x + self.positional_embedding.to(x.dtype)\n\n        # temporal pos\n        cls_tokens = x[:B, :1, :]\n        x = x[:, 1:]\n        x = rearrange(x, '(b t) n m -> (b n) t m', b=B, t=T)\n        if hasattr(self, 'temporal_positional_embedding'):\n            if x.size(1) == 1:\n                # This is a workaround for unused parameter issue\n                x = x + self.temporal_positional_embedding.mean(1)\n            else:\n                x = x + self.temporal_positional_embedding\n        x = rearrange(x, '(b n) t m -> b (n t) m', b=B, t=T)\n\n        if masking_prob > 0.0:\n            x = self.mask_tokens(x, masking_prob)\n\n        x = torch.cat((cls_tokens, x), dim=1)\n\n        x = self.ln_pre(x)\n\n        x = x.permute(1, 0, 2)  #BND -> NBD\n        x = self.transformer(x)\n\n        x = self.ln_post(x)\n\n        if self.proj is not None:\n            x = self.dropout(x[0]) @ self.proj\n        else:\n            x = x.permute(1, 0, 2)  #NBD -> BND\n\n        return x\n\n\ndef inflate_weight(weight_2d, time_dim, center=True):\n    logger.info(f'Init center: {center}')\n    if center:\n        weight_3d = torch.zeros(*weight_2d.shape)\n        weight_3d = weight_3d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1)\n        middle_idx = time_dim // 2\n        weight_3d[:, :, middle_idx, :, :] = weight_2d\n    else:\n        weight_3d = weight_2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1)\n        weight_3d = weight_3d / time_dim\n    return weight_3d\n\n\ndef load_state_dict(model, state_dict, input_resolution=224, patch_size=16, center=True):\n    state_dict_3d = model.state_dict()\n    for k in state_dict.keys():\n        if k in 
state_dict_3d.keys() and state_dict[k].shape != state_dict_3d[k].shape:\n            if len(state_dict_3d[k].shape) <= 2:\n                logger.info(f'Ignore: {k}')\n                continue\n            logger.info(f'Inflate: {k}, {state_dict[k].shape} => {state_dict_3d[k].shape}')\n            time_dim = state_dict_3d[k].shape[2]\n            state_dict[k] = inflate_weight(state_dict[k], time_dim, center=center)\n\n    pos_embed_checkpoint = state_dict['positional_embedding']\n    embedding_size = pos_embed_checkpoint.shape[-1]\n    num_patches = (input_resolution // patch_size) ** 2\n    orig_size = int((pos_embed_checkpoint.shape[-2] - 1) ** 0.5)\n    new_size = int(num_patches ** 0.5)\n    if orig_size != new_size:\n        logger.info(f'Pos_emb from {orig_size} to {new_size}')\n        extra_tokens = pos_embed_checkpoint[:1]\n        pos_tokens = pos_embed_checkpoint[1:]\n        pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2)\n        pos_tokens = torch.nn.functional.interpolate(\n            pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)\n        pos_tokens = pos_tokens.permute(0, 2, 3, 1).flatten(0, 2)\n        new_pos_embed = torch.cat((extra_tokens, pos_tokens), dim=0)\n        state_dict['positional_embedding'] = new_pos_embed\n    \n    message = model.load_state_dict(state_dict, strict=False)\n    logger.info(f\"Load pretrained weights: {message}\")\n\n\n@register_model\ndef clip_joint_b16(\n    pretrained=True, input_resolution=224, kernel_size=1,\n    center=True, num_frames=8, drop_path=0.\n):\n    model = VisionTransformer(\n        input_resolution=input_resolution, patch_size=16, \n        width=768, layers=12, heads=12, output_dim=512,\n        kernel_size=kernel_size, num_frames=num_frames, \n        drop_path=drop_path,\n    )\n    raise NotImplementedError\n    if pretrained:\n        logger.info('load pretrained weights')\n        state_dict = 
torch.load(_MODELS[\"ViT-B/16\"], map_location='cpu')\n        load_state_dict(model, state_dict, input_resolution=input_resolution, patch_size=16, center=center)\n    return model.eval()\n\n\n@register_model\ndef clip_joint_l14(\n    pretrained=False, input_resolution=224, kernel_size=1,\n    center=True, num_frames=8, drop_path=0., checkpoint_num=0,\n    dropout=0.,\n):\n    model = VisionTransformer(\n        input_resolution=input_resolution, patch_size=14,\n        width=1024, layers=24, heads=16, output_dim=768,\n        kernel_size=kernel_size, num_frames=num_frames, \n        drop_path=drop_path, checkpoint_num=checkpoint_num,\n        dropout=dropout,\n    )\n    if pretrained:\n        if isinstance(pretrained, str):\n            model_name = pretrained\n        else:\n            model_name = \"ViT-L/14\"\n        logger.info('load pretrained weights')\n        state_dict = torch.load(_MODELS[model_name], map_location='cpu')\n        load_state_dict(model, state_dict, input_resolution=input_resolution, patch_size=14, center=center)\n    return model.eval()\n\n\n@register_model\ndef clip_joint_l14_336(\n    pretrained=True, input_resolution=336, kernel_size=1,\n    center=True, num_frames=8, drop_path=0.\n):\n    raise NotImplementedError\n    model = VisionTransformer(\n        input_resolution=input_resolution, patch_size=14, \n        width=1024, layers=24, heads=16, output_dim=768,\n        kernel_size=kernel_size, num_frames=num_frames,\n        drop_path=drop_path,\n    )\n    if pretrained:\n        logger.info('load pretrained weights')\n        state_dict = torch.load(_MODELS[\"ViT-L/14_336\"], map_location='cpu')\n        load_state_dict(model, state_dict, input_resolution=input_resolution, patch_size=14, center=center)\n    return model.eval()\n\n\ndef interpolate_pos_embed_vit(state_dict, new_model):\n    key = \"vision_encoder.temporal_positional_embedding\"\n    if key in state_dict:\n        vision_temp_embed_new = 
new_model.state_dict()[key]\n        vision_temp_embed_new = vision_temp_embed_new.unsqueeze(2)  # [1, n, d] -> [1, n, 1, d]\n        vision_temp_embed_old = state_dict[key]\n        vision_temp_embed_old = vision_temp_embed_old.unsqueeze(2)\n\n        state_dict[key] = load_temp_embed_with_mismatch(\n            vision_temp_embed_old, vision_temp_embed_new, add_zero=False\n        ).squeeze(2)\n\n    key = \"text_encoder.positional_embedding\"\n    if key in state_dict:\n        text_temp_embed_new = new_model.state_dict()[key]\n        text_temp_embed_new = text_temp_embed_new.unsqueeze(0).unsqueeze(2)  # [n, d] -> [1, n, 1, d]\n        text_temp_embed_old = state_dict[key]\n        text_temp_embed_old = text_temp_embed_old.unsqueeze(0).unsqueeze(2)\n\n        state_dict[key] = load_temp_embed_with_mismatch(\n            text_temp_embed_old, text_temp_embed_new, add_zero=False\n        ).squeeze(2).squeeze(0)\n    return state_dict\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/benchmarks/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/benchmarks/adobe240.py",
    "content": "import sys\nimport tqdm\nimport torch\nimport argparse\nimport numpy as np\nfrom omegaconf import OmegaConf\n\nsys.path.append('.')\nfrom utils.build_utils import build_from_cfg\nfrom datasets.adobe_datasets import Adobe240_Dataset\nfrom metrics.psnr_ssim import calculate_psnr, calculate_ssim\n\nparser = argparse.ArgumentParser(\n                prog = 'AMT',\n                description = 'Adobe240 evaluation',\n                )\nparser.add_argument('-c', '--config', default='cfgs/AMT-S_gopro.yaml') \nparser.add_argument('-p', '--ckpt', default='pretrained/gopro_amt-s.pth',) \nparser.add_argument('-r', '--root', default='data/Adobe240/test_frames',) \nargs = parser.parse_args()\n\ndevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\ncfg_path = args.config\nckpt_path = args.ckpt\nroot = args.root\n\nnetwork_cfg = OmegaConf.load(cfg_path).network\nnetwork_name = network_cfg.name\nmodel = build_from_cfg(network_cfg)\nckpt = torch.load(ckpt_path)\nmodel.load_state_dict(ckpt['state_dict'])\nmodel = model.to(device)\nmodel.eval()\n\ndataset = Adobe240_Dataset(dataset_dir=root, augment=False)\n\npsnr_list = []\nssim_list = []\npbar = tqdm.tqdm(dataset, total=len(dataset))\nfor data in pbar:\n    input_dict = {}\n    for k, v in data.items():\n        input_dict[k] = v.to(device).unsqueeze(0)\n    with torch.no_grad():\n        imgt_pred = model(**input_dict)['imgt_pred']\n        psnr = calculate_psnr(imgt_pred, input_dict['imgt'])\n        ssim = calculate_ssim(imgt_pred, input_dict['imgt'])\n    psnr_list.append(psnr)\n    ssim_list.append(ssim)\n    avg_psnr = np.mean(psnr_list)\n    avg_ssim = np.mean(ssim_list)\n    desc_str = f'[{network_name}/Adobe240] psnr: {avg_psnr:.02f}, ssim: {avg_ssim:.04f}'\n    pbar.set_description_str(desc_str)\n\n\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/benchmarks/gopro.py",
    "content": "import sys\nimport tqdm\nimport torch\nimport argparse\nimport numpy as np\nfrom omegaconf import OmegaConf\n\nsys.path.append('.')\nfrom utils.build_utils import build_from_cfg\nfrom datasets.gopro_datasets import GoPro_Test_Dataset\nfrom metrics.psnr_ssim import calculate_psnr, calculate_ssim\n\nparser = argparse.ArgumentParser(\n                prog = 'AMT',\n                description = 'GOPRO evaluation',\n                )\nparser.add_argument('-c', '--config', default='cfgs/AMT-S_gopro.yaml') \nparser.add_argument('-p', '--ckpt', default='pretrained/gopro_amt-s.pth',) \nparser.add_argument('-r', '--root', default='data/GOPRO',) \nargs = parser.parse_args()\n\ndevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\ncfg_path = args.config\nckpt_path = args.ckpt\nroot = args.root\n\nnetwork_cfg = OmegaConf.load(cfg_path).network\nnetwork_name = network_cfg.name\nmodel = build_from_cfg(network_cfg)\nckpt = torch.load(ckpt_path)\nmodel.load_state_dict(ckpt['state_dict'])\nmodel = model.to(device)\nmodel.eval()\n\ndataset = GoPro_Test_Dataset(dataset_dir=root)\n\npsnr_list = []\nssim_list = []\npbar = tqdm.tqdm(dataset, total=len(dataset))\nfor data in pbar:\n    input_dict = {}\n    for k, v in data.items():\n        input_dict[k] = v.to(device).unsqueeze(0)\n    with torch.no_grad():\n        imgt_pred = model(**input_dict)['imgt_pred']\n        psnr = calculate_psnr(imgt_pred, input_dict['imgt'])\n        ssim = calculate_ssim(imgt_pred, input_dict['imgt'])\n    psnr_list.append(psnr)\n    ssim_list.append(ssim)\n    avg_psnr = np.mean(psnr_list)\n    avg_ssim = np.mean(ssim_list)\n    desc_str = f'[{network_name}/GOPRO] psnr: {avg_psnr:.02f}, ssim: {avg_ssim:.04f}'\n    pbar.set_description_str(desc_str)\n\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/benchmarks/snu_film.py",
    "content": "import os\nimport sys\nimport tqdm\nimport torch\nimport argparse\nimport numpy as np\nimport os.path as osp\nfrom omegaconf import OmegaConf\n\nsys.path.append('.')\nfrom utils.build_utils import build_from_cfg\nfrom metrics.psnr_ssim import calculate_psnr, calculate_ssim\nfrom utils.utils import InputPadder, read, img2tensor\n\n\ndef parse_path(path):\n    path_list = path.split('/')\n    new_path = osp.join(*path_list[-3:])\n    return new_path\n\nparser = argparse.ArgumentParser(\n                prog = 'AMT',\n                description = 'SNU-FILM evaluation',\n                )\nparser.add_argument('-c', '--config', default='cfgs/AMT-S.yaml') \nparser.add_argument('-p', '--ckpt', default='pretrained/amt-s.pth')\nparser.add_argument('-r', '--root', default='data/SNU_FILM') \nargs = parser.parse_args()\n\ndevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\ncfg_path = args.config\nckpt_path = args.ckpt\nroot = args.root\n\nnetwork_cfg = OmegaConf.load(cfg_path).network\nnetwork_name = network_cfg.name\nmodel = build_from_cfg(network_cfg)\nckpt = torch.load(ckpt_path)\nmodel.load_state_dict(ckpt['state_dict'])\nmodel = model.to(device)\nmodel.eval()\n\ndivisor = 20; scale_factor = 0.8\nsplits = ['easy', 'medium', 'hard', 'extreme']\nfor split in splits:\n    with open(os.path.join(root, f'test-{split}.txt'), \"r\") as fr:\n        file_list = [l.strip().split(' ') for l in fr.readlines()]\n    pbar = tqdm.tqdm(file_list, total=len(file_list))\n    \n    psnr_list = []; ssim_list = []\n    for name in pbar:\n        img0 = img2tensor(read(osp.join(root, parse_path(name[0])))).to(device)\n        imgt = img2tensor(read(osp.join(root, parse_path(name[1])))).to(device)\n        img1 = img2tensor(read(osp.join(root, parse_path(name[2])))).to(device)\n        padder = InputPadder(img0.shape, divisor)\n        img0, img1 = padder.pad(img0, img1)\n            \n        embt = torch.tensor(1/2).float().view(1, 1, 1, 1).to(device)\n     
   imgt_pred = model(img0, img1, embt, scale_factor=scale_factor, eval=True)['imgt_pred']\n        imgt_pred = padder.unpad(imgt_pred)\n\n        psnr = calculate_psnr(imgt_pred, imgt).detach().cpu().numpy()\n        ssim = calculate_ssim(imgt_pred, imgt).detach().cpu().numpy()\n\n        psnr_list.append(psnr)\n        ssim_list.append(ssim)\n        avg_psnr = np.mean(psnr_list)\n        avg_ssim = np.mean(ssim_list)\n        desc_str = f'[{network_name}/SNU-FILM] [{split}] psnr: {avg_psnr:.02f}, ssim: {avg_ssim:.04f}'\n        pbar.set_description_str(desc_str)\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/benchmarks/speed_parameters.py",
    "content": "import sys\nimport time\nimport torch\nimport argparse\nfrom omegaconf import OmegaConf\n\nsys.path.append('.')\nfrom utils.build_utils import build_from_cfg\n\nparser = argparse.ArgumentParser(\n                prog = 'AMT',\n                description = 'Speed&parameter benchmark',\n                )\nparser.add_argument('-c', '--config', default='cfgs/AMT-S.yaml') \nargs = parser.parse_args()\n\ncfg_path = args.config\nnetwork_cfg = OmegaConf.load(cfg_path).network\nmodel = build_from_cfg(network_cfg)\nmodel = model.cuda()\nmodel.eval()\n\nimg0 = torch.randn(1, 3, 256, 448).cuda()\nimg1 = torch.randn(1, 3, 256, 448).cuda()\nembt = torch.tensor(1/2).float().view(1, 1, 1, 1).cuda()\n\nwith torch.no_grad():\n    for i in range(100):\n        out = model(img0, img1, embt, eval=True)\n    torch.cuda.synchronize()\n    time_stamp = time.time()\n    for i in range(1000):\n        out = model(img0, img1, embt, eval=True)\n    torch.cuda.synchronize()\n    print('Time: {:.5f}s'.format((time.time() - time_stamp) / 1))\n\ntotal = sum([param.nelement() for param in model.parameters()])\nprint('Parameters: {:.2f}M'.format(total / 1e6))\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/benchmarks/ucf101.py",
    "content": "import os\nimport sys\nimport tqdm\nimport torch\nimport argparse\nimport numpy as np\nimport os.path as osp\nfrom omegaconf import OmegaConf\n\nsys.path.append('.')\nfrom utils.utils import read, img2tensor\nfrom utils.build_utils import build_from_cfg\nfrom metrics.psnr_ssim import calculate_psnr, calculate_ssim\n\nparser = argparse.ArgumentParser(\n                prog = 'AMT',\n                description = 'UCF101 evaluation',\n                )\nparser.add_argument('-c', '--config', default='cfgs/AMT-S.yaml') \nparser.add_argument('-p', '--ckpt', default='pretrained/amt-s.pth') \nparser.add_argument('-r', '--root', default='data/ucf101_interp_ours') \nargs = parser.parse_args()\n\ndevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\ncfg_path = args.config\nckpt_path = args.ckpt\nroot = args.root\n\nnetwork_cfg = OmegaConf.load(cfg_path).network\nnetwork_name = network_cfg.name\nmodel = build_from_cfg(network_cfg)\nckpt = torch.load(ckpt_path)\nmodel.load_state_dict(ckpt['state_dict'])\nmodel = model.to(device)\nmodel.eval()\n\ndirs = sorted(os.listdir(root))\npsnr_list = []\nssim_list = []\npbar = tqdm.tqdm(dirs, total=len(dirs))\nfor d in pbar:\n    dir_path = osp.join(root, d)\n    I0 = img2tensor(read(osp.join(dir_path, 'frame_00.png'))).to(device)\n    I1 = img2tensor(read(osp.join(dir_path, 'frame_01_gt.png'))).to(device)\n    I2 = img2tensor(read(osp.join(dir_path, 'frame_02.png'))).to(device)\n    embt = torch.tensor(1/2).float().view(1, 1, 1, 1).to(device)\n\n    I1_pred = model(I0, I2, embt, eval=True)['imgt_pred']\n\n    psnr = calculate_psnr(I1_pred, I1).detach().cpu().numpy()\n    ssim = calculate_ssim(I1_pred, I1).detach().cpu().numpy()\n\n    psnr_list.append(psnr)\n    ssim_list.append(ssim)\n    \n    avg_psnr = np.mean(psnr_list)\n    avg_ssim = np.mean(ssim_list)\n    desc_str = f'[{network_name}/UCF101] psnr: {avg_psnr:.02f}, ssim: {avg_ssim:.04f}'\n    pbar.set_description_str(desc_str)"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/benchmarks/vimeo90k.py",
    "content": "import sys\nimport tqdm\nimport torch\nimport argparse\nimport numpy as np\nimport os.path as osp\nfrom omegaconf import OmegaConf\n\nsys.path.append('.')\nfrom utils.utils import read, img2tensor\nfrom utils.build_utils import build_from_cfg\nfrom metrics.psnr_ssim import calculate_psnr, calculate_ssim\n\nparser = argparse.ArgumentParser(\n                prog = 'AMT',\n                description = 'Vimeo90K evaluation',\n                )\nparser.add_argument('-c', '--config', default='cfgs/AMT-S.yaml') \nparser.add_argument('-p', '--ckpt', default='pretrained/amt-s.pth',) \nparser.add_argument('-r', '--root', default='data/vimeo_triplet',) \nargs = parser.parse_args()\n\ndevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\ncfg_path = args.config\nckpt_path = args.ckpt\nroot = args.root\n\nnetwork_cfg = OmegaConf.load(cfg_path).network\nnetwork_name = network_cfg.name\nmodel = build_from_cfg(network_cfg)\nckpt = torch.load(ckpt_path)\nmodel.load_state_dict(ckpt['state_dict'])\nmodel = model.to(device)\nmodel.eval()\n\nwith open(osp.join(root, 'tri_testlist.txt'), 'r') as fr:\n    file_list = fr.readlines()\n\npsnr_list = []\nssim_list = []\n\npbar = tqdm.tqdm(file_list, total=len(file_list))\nfor name in pbar:\n    name = str(name).strip()\n    if(len(name) <= 1):\n        continue\n    dir_path = osp.join(root, 'sequences', name)\n    I0 = img2tensor(read(osp.join(dir_path, 'im1.png'))).to(device)\n    I1 = img2tensor(read(osp.join(dir_path, 'im2.png'))).to(device)\n    I2 = img2tensor(read(osp.join(dir_path, 'im3.png'))).to(device)\n    embt = torch.tensor(1/2).float().view(1, 1, 1, 1).to(device)\n\n    I1_pred = model(I0, I2, embt, \n                        scale_factor=1.0, eval=True)['imgt_pred']\n\n    psnr = calculate_psnr(I1_pred, I1).detach().cpu().numpy()\n    ssim = calculate_ssim(I1_pred, I1).detach().cpu().numpy()\n\n    psnr_list.append(psnr)\n    ssim_list.append(ssim)\n    avg_psnr = np.mean(psnr_list)\n    
avg_ssim = np.mean(ssim_list)\n    desc_str = f'[{network_name}/Vimeo90K] psnr: {avg_psnr:.02f}, ssim: {avg_ssim:.04f}'\n    pbar.set_description_str(desc_str)\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/benchmarks/vimeo90k_tta.py",
    "content": "import sys\nimport tqdm\nimport torch\nimport argparse\nimport numpy as np\nimport os.path as osp\nfrom omegaconf import OmegaConf\n\nsys.path.append('.')\nfrom utils.utils import read, img2tensor\nfrom utils.build_utils import build_from_cfg\nfrom metrics.psnr_ssim import calculate_psnr, calculate_ssim\n\nparser = argparse.ArgumentParser(\n                prog = 'AMT',\n                description = 'Vimeo90K evaluation (with Test-Time Augmentation)',\n                )\nparser.add_argument('-c', '--config', default='cfgs/AMT-S.yaml') \nparser.add_argument('p', '--ckpt', default='pretrained/amt-s.pth',) \nparser.add_argument('-r', '--root', default='data/vimeo_triplet',) \nargs = parser.parse_args()\n\ndevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\ncfg_path = args.config\nckpt_path = args.ckpt\nroot = args.root\n\nnetwork_cfg = OmegaConf.load(cfg_path).network\nnetwork_name = network_cfg.name\nmodel = build_from_cfg(network_cfg)\nckpt = torch.load(ckpt_path)\nmodel.load_state_dict(ckpt['state_dict'])\nmodel = model.to(device)\nmodel.eval()\n\nwith open(osp.join(root, 'tri_testlist.txt'), 'r') as fr:\n    file_list = fr.readlines()\n\npsnr_list = []\nssim_list = []\n\npbar = tqdm.tqdm(file_list, total=len(file_list))\nfor name in pbar:\n    name = str(name).strip()\n    if(len(name) <= 1):\n        continue\n    dir_path = osp.join(root, 'sequences', name)\n    I0 = img2tensor(read(osp.join(dir_path, 'im1.png'))).to(device)\n    I1 = img2tensor(read(osp.join(dir_path, 'im2.png'))).to(device)\n    I2 = img2tensor(read(osp.join(dir_path, 'im3.png'))).to(device)\n    embt = torch.tensor(1/2).float().view(1, 1, 1, 1).to(device)\n\n    I1_pred1 = model(I0, I2, embt, \n                        scale_factor=1.0, eval=True)['imgt_pred']\n    I1_pred2 = model(torch.flip(I0, [2]), torch.flip(I2, [2]), embt, \n                        scale_factor=1.0, eval=True)['imgt_pred']\n    I1_pred = I1_pred1 / 2 + torch.flip(I1_pred2, [2]) / 2\n   
 psnr = calculate_psnr(I1_pred, I1).detach().cpu().numpy()\n    ssim = calculate_ssim(I1_pred, I1).detach().cpu().numpy()\n\n    psnr_list.append(psnr)\n    ssim_list.append(ssim)\n    avg_psnr = np.mean(psnr_list)\n    avg_ssim = np.mean(ssim_list)\n    desc_str = f'[{network_name}/Vimeo90K] psnr: {avg_psnr:.02f}, ssim: {avg_ssim:.04f}'\n    pbar.set_description_str(desc_str)\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/benchmarks/xiph.py",
    "content": "import os\nimport sys\nimport cv2\nimport tqdm\nimport glob\nimport torch\nimport argparse\nimport numpy as np\nimport os.path as osp\nfrom omegaconf import OmegaConf\n\nsys.path.append('.')\nfrom utils.utils import InputPadder, read, img2tensor\nfrom utils.build_utils import build_from_cfg\nfrom metrics.psnr_ssim import calculate_psnr, calculate_ssim\n\nparser = argparse.ArgumentParser(\n                prog = 'AMT',\n                description = 'Xiph evaluation',\n                )\nparser.add_argument('-c', '--config', default='cfgs/AMT-S.yaml') \nparser.add_argument('-p', '--ckpt', default='pretrained/amt-s.pth') \nparser.add_argument('-r', '--root', default='data/xiph') \nargs = parser.parse_args()\n\ndevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\ncfg_path = args.config\nckpt_path = args.ckpt\nroot = args.root\n\nnetwork_cfg = OmegaConf.load(cfg_path).network\nnetwork_name = network_cfg.name\nmodel = build_from_cfg(network_cfg)\nckpt = torch.load(ckpt_path)\nmodel.load_state_dict(ckpt['state_dict'], False)\nmodel = model.to(device)\nmodel.eval()\n\n############################################# Prepare Dataset #############################################\ndownload_links = [\n    'https://media.xiph.org/video/derf/ElFuente/Netflix_BoxingPractice_4096x2160_60fps_10bit_420.y4m',\n    'https://media.xiph.org/video/derf/ElFuente/Netflix_Crosswalk_4096x2160_60fps_10bit_420.y4m',\n    'https://media.xiph.org/video/derf/Chimera/Netflix_DrivingPOV_4096x2160_60fps_10bit_420.y4m',\n    'https://media.xiph.org/video/derf/ElFuente/Netflix_FoodMarket_4096x2160_60fps_10bit_420.y4m',\n    'https://media.xiph.org/video/derf/ElFuente/Netflix_FoodMarket2_4096x2160_60fps_10bit_420.y4m',\n    'https://media.xiph.org/video/derf/ElFuente/Netflix_RitualDance_4096x2160_60fps_10bit_420.y4m',\n    'https://media.xiph.org/video/derf/ElFuente/Netflix_SquareAndTimelapse_4096x2160_60fps_10bit_420.y4m',\n    
'https://media.xiph.org/video/derf/ElFuente/Netflix_Tango_4096x2160_60fps_10bit_420.y4m',\n]\nfile_list = ['BoxingPractice', 'Crosswalk', 'DrivingPOV', 'FoodMarket', 'FoodMarket2', 'RitualDance', \n             'SquareAndTimelapse', 'Tango']\n\nfor file_name, link in zip(file_list, download_links):\n    data_dir = osp.join(root, file_name)\n    if osp.exists(data_dir) is False:\n        os.makedirs(data_dir)\n    if len(glob.glob(f'{data_dir}/*.png')) < 100:\n        os.system(f'ffmpeg -i {link} -pix_fmt rgb24 -vframes 100 {data_dir}/%03d.png')\n############################################### Prepare End ###############################################\n\n\ndivisor = 32; scale_factor = 0.5\nfor category in ['resized-2k', 'cropped-4k']:\n    psnr_list = []\n    ssim_list = []\n    pbar = tqdm.tqdm(file_list, total=len(file_list))\n    for flie_name in pbar:\n        dir_name = osp.join(root, flie_name)\n        for intFrame in range(2, 99, 2):\n            img0 = read(f'{dir_name}/{intFrame - 1:03d}.png')\n            img1 = read(f'{dir_name}/{intFrame + 1:03d}.png')\n            imgt = read(f'{dir_name}/{intFrame:03d}.png')\n\n            if category == 'resized-2k':\n                img0 = cv2.resize(src=img0, dsize=(2048, 1080), fx=0.0, fy=0.0, interpolation=cv2.INTER_AREA)\n                img1 = cv2.resize(src=img1, dsize=(2048, 1080), fx=0.0, fy=0.0, interpolation=cv2.INTER_AREA)\n                imgt = cv2.resize(src=imgt, dsize=(2048, 1080), fx=0.0, fy=0.0, interpolation=cv2.INTER_AREA)\n\n            elif category == 'cropped-4k':\n                img0 = img0[540:-540, 1024:-1024, :]\n                img1 = img1[540:-540, 1024:-1024, :]\n                imgt = imgt[540:-540, 1024:-1024, :]\n            img0 = img2tensor(img0).to(device)\n            imgt = img2tensor(imgt).to(device)\n            img1 = img2tensor(img1).to(device)\n            embt = torch.tensor(1/2).float().view(1, 1, 1, 1).to(device)\n            \n            padder = 
InputPadder(img0.shape, divisor)\n            img0, img1 = padder.pad(img0, img1)\n\n            with torch.no_grad():\n                imgt_pred = model(img0, img1, embt, scale_factor=scale_factor, eval=True)['imgt_pred']\n                imgt_pred = padder.unpad(imgt_pred)\n\n            psnr = calculate_psnr(imgt_pred, imgt)\n            ssim = calculate_ssim(imgt_pred, imgt)\n\n            # Append before averaging so the running mean includes the current frame\n            # and np.mean is never called on an empty list\n            psnr_list.append(psnr)\n            ssim_list.append(ssim)\n            avg_psnr = np.mean(psnr_list)\n            avg_ssim = np.mean(ssim_list)\n            desc_str = f'[{network_name}/Xiph] [{category}/{flie_name}] psnr: {avg_psnr:.02f}, ssim: {avg_ssim:.04f}'\n\n            pbar.set_description_str(desc_str)"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/datasets/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/datasets/adobe_datasets.py",
    "content": "'''\n    This code is partially borrowed from IFRNet (https://github.com/ltkong218/IFRNet). \n'''\nimport os\nimport sys\nimport torch\nimport numpy as np\nfrom torch.utils.data import Dataset\nsys.path.append('.')\nfrom utils.utils import read, img2tensor\nfrom datasets.gopro_datasets import (\n    random_resize_woflow, random_crop_woflow, center_crop_woflow,\n    random_reverse_channel_woflow, random_vertical_flip_woflow,\n    random_horizontal_flip_woflow, random_rotate_woflow, \n    random_reverse_time_woflow\n)\n\n\nclass Adobe240_Dataset(Dataset):\n    def __init__(self, dataset_dir='data/adobe240/test_frames', interFrames=7, augment=True):\n        super().__init__()\n        self.augment = augment\n        self.interFrames = interFrames\n        self.setLength = interFrames + 2\n        self.dataset_dir = os.path.join(dataset_dir)\n        video_list = os.listdir(self.dataset_dir)[9::10]\n        self.frames_list = []\n        self.file_list = []\n        for video in video_list:\n            frames = sorted(os.listdir(os.path.join(self.dataset_dir, video)))\n            n_sets = (len(frames) - self.setLength) // (interFrames + 1)  + 1\n            videoInputs = [frames[(interFrames + 1) * i: (interFrames + 1) * i + self.setLength] for i in range(n_sets)]\n            videoInputs = [[os.path.join(video, f) for f in group] for group in videoInputs]\n            self.file_list.extend(videoInputs)\n\n    def __getitem__(self, idx):\n        clip_idx = idx // self.interFrames\n        embt_idx = idx % self.interFrames\n        imgpaths = [os.path.join(self.dataset_dir, fp) for fp in self.file_list[clip_idx]]\n        pick_idxs = list(range(0, self.setLength, self.interFrames + 1))\n        imgt_beg = self.setLength // 2 - self.interFrames // 2\n        imgt_end = self.setLength // 2 + self.interFrames // 2 + self.interFrames % 2\n        imgt_idx = list(range(imgt_beg, imgt_end)) \n        input_paths = [imgpaths[idx] for idx in pick_idxs]\n     
   imgt_paths = [imgpaths[idx] for idx in imgt_idx]\n        \n        img0 = np.array(read(input_paths[0]))\n        imgt = np.array(read(imgt_paths[embt_idx]))\n        img1 = np.array(read(input_paths[1]))\n        embt = torch.from_numpy(np.array((embt_idx  + 1) / (self.interFrames + 1)\n                                         ).reshape(1, 1, 1).astype(np.float32))\n\n        if self.augment == True:\n            img0, imgt, img1 = random_resize_woflow(img0, imgt, img1, p=0.1)\n            img0, imgt, img1 = random_crop_woflow(img0, imgt, img1, crop_size=(224, 224))\n            img0, imgt, img1 = random_reverse_channel_woflow(img0, imgt, img1, p=0.5)\n            img0, imgt, img1 = random_vertical_flip_woflow(img0, imgt, img1, p=0.3)\n            img0, imgt, img1 = random_horizontal_flip_woflow(img0, imgt, img1, p=0.5)\n            img0, imgt, img1 = random_rotate_woflow(img0, imgt, img1, p=0.05)\n            img0, imgt, img1, embt = random_reverse_time_woflow(img0, imgt, img1, \n                                                                embt=embt, p=0.5)\n        else:\n            img0, imgt, img1 = center_crop_woflow(img0, imgt, img1, crop_size=(512, 512))\n            \n        img0 = img2tensor(img0).squeeze(0)\n        imgt = img2tensor(imgt).squeeze(0)\n        img1 = img2tensor(img1).squeeze(0)\n        \n        return {'img0': img0.float(), \n                'imgt': imgt.float(), \n                'img1': img1.float(),  \n                'embt': embt}\n\n    def __len__(self):\n        return len(self.file_list) * self.interFrames\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/datasets/gopro_datasets.py",
    "content": "'''\n    This code is partially borrowed from IFRNet (https://github.com/ltkong218/IFRNet). \n    In the consideration of the difficulty in flow supervision generation, we abort \n    flow loss in the 8x case.\n'''\nimport os\nimport cv2\nimport torch\nimport random\nimport numpy as np\nfrom torch.utils.data import Dataset\nfrom utils.utils import read, img2tensor\n\ndef random_resize_woflow(img0, imgt, img1, p=0.1):\n    if random.uniform(0, 1) < p:\n        img0 = cv2.resize(img0, dsize=None, fx=2.0, fy=2.0, interpolation=cv2.INTER_LINEAR)\n        imgt = cv2.resize(imgt, dsize=None, fx=2.0, fy=2.0, interpolation=cv2.INTER_LINEAR)\n        img1 = cv2.resize(img1, dsize=None, fx=2.0, fy=2.0, interpolation=cv2.INTER_LINEAR)\n    return img0, imgt, img1\n\ndef random_crop_woflow(img0, imgt, img1, crop_size=(224, 224)):\n    h, w = crop_size[0], crop_size[1]\n    ih, iw, _ = img0.shape\n    x = np.random.randint(0, ih-h+1)\n    y = np.random.randint(0, iw-w+1)\n    img0 = img0[x: x + h, y : y + w, :]\n    imgt = imgt[x: x + h, y : y + w, :]\n    img1 = img1[x: x + h, y : y + w, :]\n    return img0, imgt, img1\n\ndef center_crop_woflow(img0, imgt, img1, crop_size=(512, 512)):\n    h, w = crop_size[0], crop_size[1]\n    ih, iw, _ = img0.shape\n    img0 = img0[ih // 2 - h // 2: ih // 2 + h // 2, iw // 2 - w // 2: iw // 2 +  w // 2, :]\n    imgt = imgt[ih // 2 - h // 2: ih // 2 + h // 2, iw // 2 - w // 2: iw // 2 +  w // 2, :]\n    img1 = img1[ih // 2 - h // 2: ih // 2 + h // 2, iw // 2 - w // 2: iw // 2 +  w // 2, :]\n    return img0, imgt, img1\n\ndef random_reverse_channel_woflow(img0, imgt, img1, p=0.5):\n    if random.uniform(0, 1) < p:\n        img0 = img0[:, :, ::-1]\n        imgt = imgt[:, :, ::-1]\n        img1 = img1[:, :, ::-1]\n    return img0, imgt, img1\n\ndef random_vertical_flip_woflow(img0, imgt, img1, p=0.3):\n    if random.uniform(0, 1) < p:\n        img0 = img0[::-1]\n        imgt = imgt[::-1]\n        img1 = img1[::-1]\n    return 
img0, imgt, img1\n\ndef random_horizontal_flip_woflow(img0, imgt, img1, p=0.5):\n    if random.uniform(0, 1) < p:\n        img0 = img0[:, ::-1]\n        imgt = imgt[:, ::-1]\n        img1 = img1[:, ::-1]\n    return img0, imgt, img1\n\ndef random_rotate_woflow(img0, imgt, img1, p=0.05):\n    if random.uniform(0, 1) < p:\n        img0 = img0.transpose((1, 0, 2))\n        imgt = imgt.transpose((1, 0, 2))\n        img1 = img1.transpose((1, 0, 2))\n    return img0, imgt, img1\n\ndef random_reverse_time_woflow(img0, imgt, img1, embt, p=0.5):\n    if random.uniform(0, 1) < p:\n        tmp = img1\n        img1 = img0\n        img0 = tmp\n        embt = 1 - embt\n    return img0, imgt, img1, embt\n\nclass GoPro_Train_Dataset(Dataset):\n    def __init__(self, dataset_dir='data/GOPRO', interFrames=7, augment=True):\n        self.dataset_dir = dataset_dir + '/train'\n        self.interFrames = interFrames\n        self.augment = augment\n        self.setLength = interFrames + 2\n        video_list = [\n            'GOPR0372_07_00', 'GOPR0374_11_01', 'GOPR0378_13_00', 'GOPR0384_11_01', \n            'GOPR0384_11_04', 'GOPR0477_11_00', 'GOPR0868_11_02', 'GOPR0884_11_00', \n            'GOPR0372_07_01', 'GOPR0374_11_02', 'GOPR0379_11_00', 'GOPR0384_11_02', \n            'GOPR0385_11_00', 'GOPR0857_11_00', 'GOPR0871_11_01', 'GOPR0374_11_00', \n            'GOPR0374_11_03', 'GOPR0380_11_00', 'GOPR0384_11_03', 'GOPR0386_11_00', \n            'GOPR0868_11_01', 'GOPR0881_11_00']\n        self.frames_list = []\n        self.file_list = []\n        for video in video_list:\n            frames = sorted(os.listdir(os.path.join(self.dataset_dir, video)))\n            n_sets = (len(frames) - self.setLength) // (interFrames+1)  + 1\n            videoInputs = [frames[(interFrames + 1) * i: (interFrames + 1) * i + self.setLength\n                                                        ] for i in range(n_sets)]\n            videoInputs = [[os.path.join(video, f) for f in group] for group in 
videoInputs]\n            self.file_list.extend(videoInputs)\n\n    def __len__(self):\n        return len(self.file_list) * self.interFrames\n\n    def __getitem__(self, idx):\n        clip_idx = idx // self.interFrames\n        embt_idx = idx % self.interFrames\n        imgpaths = [os.path.join(self.dataset_dir, fp) for fp in self.file_list[clip_idx]]\n        pick_idxs = list(range(0, self.setLength, self.interFrames + 1))\n        imgt_beg = self.setLength // 2 - self.interFrames // 2\n        imgt_end = self.setLength // 2 + self.interFrames // 2 + self.interFrames % 2\n        imgt_idx = list(range(imgt_beg, imgt_end)) \n        input_paths = [imgpaths[idx] for idx in pick_idxs]\n        imgt_paths = [imgpaths[idx] for idx in imgt_idx]\n        \n        embt = torch.from_numpy(np.array((embt_idx  + 1) / (self.interFrames+1)\n                                         ).reshape(1, 1, 1).astype(np.float32))\n        img0 = np.array(read(input_paths[0]))\n        imgt = np.array(read(imgt_paths[embt_idx]))\n        img1 = np.array(read(input_paths[1]))\n\n        if self.augment == True:\n            img0, imgt, img1 = random_resize_woflow(img0, imgt, img1, p=0.1)\n            img0, imgt, img1 = random_crop_woflow(img0, imgt, img1, crop_size=(224, 224))\n            img0, imgt, img1 = random_reverse_channel_woflow(img0, imgt, img1, p=0.5)\n            img0, imgt, img1 = random_vertical_flip_woflow(img0, imgt, img1, p=0.3)\n            img0, imgt, img1 = random_horizontal_flip_woflow(img0, imgt, img1, p=0.5)\n            img0, imgt, img1 = random_rotate_woflow(img0, imgt, img1, p=0.05)\n            img0, imgt, img1, embt = random_reverse_time_woflow(img0, imgt, img1, \n                                                                embt=embt, p=0.5)\n        else:\n            img0, imgt, img1 = center_crop_woflow(img0, imgt, img1, crop_size=(512, 512))\n            \n        img0 = img2tensor(img0.copy()).squeeze(0)\n        imgt = 
img2tensor(imgt.copy()).squeeze(0)\n        img1 = img2tensor(img1.copy()).squeeze(0)\n        \n        return {'img0': img0.float(), \n                'imgt': imgt.float(), \n                'img1': img1.float(),  \n                'embt': embt}\n\nclass GoPro_Test_Dataset(Dataset):\n    def __init__(self, dataset_dir='data/GOPRO', interFrames=7):\n        self.dataset_dir = dataset_dir + '/test'\n        self.interFrames = interFrames\n        self.setLength = interFrames + 2\n        video_list = [\n            'GOPR0384_11_00', 'GOPR0385_11_01', 'GOPR0410_11_00', \n            'GOPR0862_11_00', 'GOPR0869_11_00', 'GOPR0881_11_01', \n            'GOPR0384_11_05', 'GOPR0396_11_00', 'GOPR0854_11_00', \n            'GOPR0868_11_00', 'GOPR0871_11_00']\n        self.frames_list = []\n        self.file_list = []\n        for video in video_list:\n            frames = sorted(os.listdir(os.path.join(self.dataset_dir, video)))\n            n_sets = (len(frames) - self.setLength)//(interFrames+1)  + 1\n            videoInputs = [frames[(interFrames + 1) * i:(interFrames + 1) * i + self.setLength\n                                                        ] for i in range(n_sets)]\n            videoInputs = [[os.path.join(video, f) for f in group] for group in videoInputs]\n            self.file_list.extend(videoInputs)\n\n    def __len__(self):\n        return len(self.file_list) * self.interFrames\n\n    def __getitem__(self, idx):\n        clip_idx = idx // self.interFrames\n        embt_idx = idx % self.interFrames\n        imgpaths = [os.path.join(self.dataset_dir, fp) for fp in self.file_list[clip_idx]]\n        pick_idxs = list(range(0, self.setLength, self.interFrames + 1))\n        imgt_beg = self.setLength // 2 - self.interFrames // 2\n        imgt_end = self.setLength // 2 + self.interFrames // 2 + self.interFrames % 2\n        imgt_idx = list(range(imgt_beg, imgt_end)) \n        input_paths = [imgpaths[idx] for idx in pick_idxs]\n        imgt_paths = 
[imgpaths[idx] for idx in imgt_idx]\n\n        img0 = np.array(read(input_paths[0]))\n        imgt = np.array(read(imgt_paths[embt_idx]))\n        img1 = np.array(read(input_paths[1]))\n\n        img0, imgt, img1 = center_crop_woflow(img0, imgt, img1, crop_size=(512, 512))\n\n        img0 = img2tensor(img0).squeeze(0)\n        imgt = img2tensor(imgt).squeeze(0)\n        img1 = img2tensor(img1).squeeze(0)\n        \n        embt = torch.from_numpy(np.array((embt_idx + 1) / (self.interFrames + 1)\n                                         ).reshape(1, 1, 1).astype(np.float32))\n        return {'img0': img0.float(), \n                'imgt': imgt.float(), \n                'img1': img1.float(),  \n                'embt': embt}"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/datasets/vimeo_datasets.py",
    "content": "'''\n    This code is partially borrowed from IFRNet (https://github.com/ltkong218/IFRNet). \n'''\nimport os\nimport cv2\nimport torch\nimport random\nimport numpy as np\nfrom torch.utils.data import Dataset\nfrom utils.utils import read\n\n\ndef random_resize(img0, imgt, img1, flow, p=0.1):\n    if random.uniform(0, 1) < p:\n        img0 = cv2.resize(img0, dsize=None, fx=2.0, fy=2.0, interpolation=cv2.INTER_LINEAR)\n        imgt = cv2.resize(imgt, dsize=None, fx=2.0, fy=2.0, interpolation=cv2.INTER_LINEAR)\n        img1 = cv2.resize(img1, dsize=None, fx=2.0, fy=2.0, interpolation=cv2.INTER_LINEAR)\n        flow = cv2.resize(flow, dsize=None, fx=2.0, fy=2.0, interpolation=cv2.INTER_LINEAR) * 2.0\n    return img0, imgt, img1, flow\n\ndef random_crop(img0, imgt, img1, flow, crop_size=(224, 224)):\n    h, w = crop_size[0], crop_size[1]\n    ih, iw, _ = img0.shape\n    x = np.random.randint(0, ih-h+1)\n    y = np.random.randint(0, iw-w+1)\n    img0 = img0[x:x+h, y:y+w, :]\n    imgt = imgt[x:x+h, y:y+w, :]\n    img1 = img1[x:x+h, y:y+w, :]\n    flow = flow[x:x+h, y:y+w, :]\n    return img0, imgt, img1, flow\n\ndef random_reverse_channel(img0, imgt, img1, flow, p=0.5):\n    if random.uniform(0, 1) < p:\n        img0 = img0[:, :, ::-1]\n        imgt = imgt[:, :, ::-1]\n        img1 = img1[:, :, ::-1]\n    return img0, imgt, img1, flow\n\ndef random_vertical_flip(img0, imgt, img1, flow, p=0.3):\n    if random.uniform(0, 1) < p:\n        img0 = img0[::-1]\n        imgt = imgt[::-1]\n        img1 = img1[::-1]\n        flow = flow[::-1]\n        flow = np.concatenate((flow[:, :, 0:1], -flow[:, :, 1:2], flow[:, :, 2:3], -flow[:, :, 3:4]), 2)\n    return img0, imgt, img1, flow\n\ndef random_horizontal_flip(img0, imgt, img1, flow, p=0.5):\n    if random.uniform(0, 1) < p:\n        img0 = img0[:, ::-1]\n        imgt = imgt[:, ::-1]\n        img1 = img1[:, ::-1]\n        flow = flow[:, ::-1]\n        flow = np.concatenate((-flow[:, :, 0:1], flow[:, :, 1:2], 
-flow[:, :, 2:3], flow[:, :, 3:4]), 2)\n    return img0, imgt, img1, flow\n\ndef random_rotate(img0, imgt, img1, flow, p=0.05):\n    if random.uniform(0, 1) < p:\n        img0 = img0.transpose((1, 0, 2))\n        imgt = imgt.transpose((1, 0, 2))\n        img1 = img1.transpose((1, 0, 2))\n        flow = flow.transpose((1, 0, 2))\n        flow = np.concatenate((flow[:, :, 1:2], flow[:, :, 0:1], flow[:, :, 3:4], flow[:, :, 2:3]), 2)\n    return img0, imgt, img1, flow\n\ndef random_reverse_time(img0, imgt, img1, flow, p=0.5):\n    if random.uniform(0, 1) < p:\n        tmp = img1\n        img1 = img0\n        img0 = tmp\n        flow = np.concatenate((flow[:, :, 2:4], flow[:, :, 0:2]), 2)\n    return img0, imgt, img1, flow\n\n\nclass Vimeo90K_Train_Dataset(Dataset):\n    def __init__(self, \n                 dataset_dir='data/vimeo_triplet', \n                 flow_dir=None, \n                 augment=True, \n                 crop_size=(224, 224)):\n        self.dataset_dir = dataset_dir\n        self.augment = augment\n        self.crop_size = crop_size\n        self.img0_list = []\n        self.imgt_list = []\n        self.img1_list = []\n        self.flow_t0_list = []\n        self.flow_t1_list = []\n        if flow_dir is None:\n            flow_dir = 'flow'\n        with open(os.path.join(dataset_dir, 'tri_trainlist.txt'), 'r') as f:\n            for i in f:\n                name = str(i).strip()\n                if(len(name) <= 1):\n                    continue\n                self.img0_list.append(os.path.join(dataset_dir, 'sequences', name, 'im1.png'))\n                self.imgt_list.append(os.path.join(dataset_dir, 'sequences', name, 'im2.png'))\n                self.img1_list.append(os.path.join(dataset_dir, 'sequences', name, 'im3.png'))\n                self.flow_t0_list.append(os.path.join(dataset_dir, flow_dir, name, 'flow_t0.flo'))\n                self.flow_t1_list.append(os.path.join(dataset_dir, flow_dir, name, 'flow_t1.flo'))\n\n    def 
__len__(self):\n        return len(self.imgt_list)\n\n    def __getitem__(self, idx):\n        img0 = read(self.img0_list[idx])\n        imgt = read(self.imgt_list[idx])\n        img1 = read(self.img1_list[idx])\n        flow_t0 = read(self.flow_t0_list[idx])\n        flow_t1 = read(self.flow_t1_list[idx])\n        flow = np.concatenate((flow_t0, flow_t1), 2).astype(np.float64)\n\n        if self.augment == True:\n            img0, imgt, img1, flow = random_resize(img0, imgt, img1, flow, p=0.1)\n            img0, imgt, img1, flow = random_crop(img0, imgt, img1, flow, crop_size=self.crop_size)\n            img0, imgt, img1, flow = random_reverse_channel(img0, imgt, img1, flow, p=0.5)\n            img0, imgt, img1, flow = random_vertical_flip(img0, imgt, img1, flow, p=0.3)\n            img0, imgt, img1, flow = random_horizontal_flip(img0, imgt, img1, flow, p=0.5)\n            img0, imgt, img1, flow = random_rotate(img0, imgt, img1, flow, p=0.05)\n            img0, imgt, img1, flow = random_reverse_time(img0, imgt, img1, flow, p=0.5)\n                \n        \n        img0 = torch.from_numpy(img0.transpose((2, 0, 1)).astype(np.float32) / 255.0)\n        imgt = torch.from_numpy(imgt.transpose((2, 0, 1)).astype(np.float32) / 255.0)\n        img1 = torch.from_numpy(img1.transpose((2, 0, 1)).astype(np.float32) / 255.0)\n        flow = torch.from_numpy(flow.transpose((2, 0, 1)).astype(np.float32))\n        embt = torch.from_numpy(np.array(1/2).reshape(1, 1, 1).astype(np.float32))\n\n        return {'img0': img0.float(), 'imgt': imgt.float(), 'img1': img1.float(), 'flow': flow.float(), 'embt': embt}\n\n\nclass Vimeo90K_Test_Dataset(Dataset):\n    def __init__(self, dataset_dir='data/vimeo_triplet'):\n        self.dataset_dir = dataset_dir\n        self.img0_list = []\n        self.imgt_list = []\n        self.img1_list = []\n        self.flow_t0_list = []\n        self.flow_t1_list = []\n        with open(os.path.join(dataset_dir, 'tri_testlist.txt'), 'r') as f:\n         
   for i in f:\n                name = str(i).strip()\n                if(len(name) <= 1):\n                    continue\n                self.img0_list.append(os.path.join(dataset_dir, 'sequences', name, 'im1.png'))\n                self.imgt_list.append(os.path.join(dataset_dir, 'sequences', name, 'im2.png'))\n                self.img1_list.append(os.path.join(dataset_dir, 'sequences', name, 'im3.png'))\n                self.flow_t0_list.append(os.path.join(dataset_dir, 'flow', name, 'flow_t0.flo'))\n                self.flow_t1_list.append(os.path.join(dataset_dir, 'flow', name, 'flow_t1.flo'))\n\n    def __len__(self):\n        return len(self.imgt_list)\n\n    def __getitem__(self, idx):\n        img0 = read(self.img0_list[idx])\n        imgt = read(self.imgt_list[idx])\n        img1 = read(self.img1_list[idx])\n        flow_t0 = read(self.flow_t0_list[idx])\n        flow_t1 = read(self.flow_t1_list[idx])\n        flow = np.concatenate((flow_t0, flow_t1), 2)\n\n        img0 = torch.from_numpy(img0.transpose((2, 0, 1)).astype(np.float32) / 255.0)\n        imgt = torch.from_numpy(imgt.transpose((2, 0, 1)).astype(np.float32) / 255.0)\n        img1 = torch.from_numpy(img1.transpose((2, 0, 1)).astype(np.float32) / 255.0)\n        flow = torch.from_numpy(flow.transpose((2, 0, 1)).astype(np.float32))\n        embt = torch.from_numpy(np.array(1/2).reshape(1, 1, 1).astype(np.float32))\n        \n        return {'img0': img0.float(), \n                'imgt': imgt.float(), \n                'img1': img1.float(), \n                'flow': flow.float(), \n                'embt': embt}\n\n\n\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/flow_generation/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/flow_generation/gen_flow.py",
    "content": "import os\nimport sys\nimport torch\nimport argparse\nimport numpy as np\nimport os.path as osp\nimport torch.nn.functional as F\n\nsys.path.append('.')\nfrom utils.utils import read, write\nfrom flow_generation.liteflownet.run import estimate\n\nparser = argparse.ArgumentParser(\n                prog = 'AMT',\n                description = 'Flow generation',\n                )\nparser.add_argument('-r', '--root', default='data/vimeo_triplet') \nargs = parser.parse_args()\n\nvimeo90k_dir = args.root\nvimeo90k_sequences_dir = osp.join(vimeo90k_dir, 'sequences')\nvimeo90k_flow_dir = osp.join(vimeo90k_dir, 'flow')\n\ndef pred_flow(img1, img2):\n    img1 = torch.from_numpy(img1).float().permute(2, 0, 1) / 255.0\n    img2 = torch.from_numpy(img2).float().permute(2, 0, 1) / 255.0\n\n    flow = estimate(img1, img2)\n\n    flow = flow.permute(1, 2, 0).cpu().numpy()\n    return flow\n\nprint('Built Flow Path')\nif not osp.exists(vimeo90k_flow_dir):\n    os.makedirs(vimeo90k_flow_dir)\n\nfor sequences_path in sorted(os.listdir(vimeo90k_sequences_dir)):\n    vimeo90k_sequences_path_dir = osp.join(vimeo90k_sequences_dir, sequences_path)\n    vimeo90k_flow_path_dir = osp.join(vimeo90k_flow_dir, sequences_path)\n    if not osp.exists(vimeo90k_flow_path_dir):\n        os.mkdir(vimeo90k_flow_path_dir)\n        \n    for sequences_id in sorted(os.listdir(vimeo90k_sequences_path_dir)):\n        vimeo90k_flow_id_dir = osp.join(vimeo90k_flow_path_dir, sequences_id)\n        if not osp.exists(vimeo90k_flow_id_dir):\n            os.mkdir(vimeo90k_flow_id_dir)\n\nfor sequences_path in sorted(os.listdir(vimeo90k_sequences_dir)):\n    vimeo90k_sequences_path_dir = os.path.join(vimeo90k_sequences_dir, sequences_path)\n    vimeo90k_flow_path_dir = os.path.join(vimeo90k_flow_dir, sequences_path)\n    \n    for sequences_id in sorted(os.listdir(vimeo90k_sequences_path_dir)):\n        vimeo90k_sequences_id_dir = os.path.join(vimeo90k_sequences_path_dir, sequences_id)\n        
vimeo90k_flow_id_dir = os.path.join(vimeo90k_flow_path_dir, sequences_id)\n        \n        img0_path = vimeo90k_sequences_id_dir + '/im1.png'\n        imgt_path = vimeo90k_sequences_id_dir + '/im2.png'\n        img1_path = vimeo90k_sequences_id_dir + '/im3.png'\n        flow_t0_path = vimeo90k_flow_id_dir + '/flow_t0.flo'\n        flow_t1_path = vimeo90k_flow_id_dir + '/flow_t1.flo'\n        \n        img0 = read(img0_path)\n        imgt = read(imgt_path)\n        img1 = read(img1_path)\n        \n        flow_t0 = pred_flow(imgt, img0)\n        flow_t1 = pred_flow(imgt, img1)\n        \n        write(flow_t0_path, flow_t0)\n        write(flow_t1_path, flow_t1)\n        \n    print('Written Sequences {}'.format(sequences_path))"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/flow_generation/liteflownet/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/flow_generation/liteflownet/run.py",
    "content": "#!/usr/bin/env python\n\nimport getopt\nimport math\nimport numpy\nimport PIL\nimport PIL.Image\nimport sys\nimport torch\n\ntry:\n    from .correlation import correlation # the custom cost volume layer\nexcept:\n    sys.path.insert(0, './correlation'); import correlation # you should consider upgrading python\n# end\n\n##########################################################\n\nassert(int(str('').join(torch.__version__.split('.')[0:2])) >= 13) # requires at least pytorch version 1.3.0\n\ntorch.set_grad_enabled(False) # make sure to not compute gradients for computational performance\n\ntorch.backends.cudnn.enabled = True # make sure to use cudnn for computational performance\n\n##########################################################\n\narguments_strModel = 'default' # 'default', or 'kitti', or 'sintel'\narguments_strOne = './images/one.png'\narguments_strTwo = './images/two.png'\narguments_strOut = './out.flo'\n\nfor strOption, strArgument in getopt.getopt(sys.argv[1:], '', [ strParameter[2:] + '=' for strParameter in sys.argv[1::2] ])[0]:\n    if strOption == '--model' and strArgument != '': arguments_strModel = strArgument # which model to use\n    if strOption == '--one' and strArgument != '': arguments_strOne = strArgument # path to the first frame\n    if strOption == '--two' and strArgument != '': arguments_strTwo = strArgument # path to the second frame\n    if strOption == '--out' and strArgument != '': arguments_strOut = strArgument # path to where the output should be stored\n# end\n\n##########################################################\n\nbackwarp_tenGrid = {}\n\ndef backwarp(tenInput, tenFlow):\n    if str(tenFlow.shape) not in backwarp_tenGrid:\n        tenHor = torch.linspace(-1.0 + (1.0 / tenFlow.shape[3]), 1.0 - (1.0 / tenFlow.shape[3]), tenFlow.shape[3]).view(1, 1, 1, -1).repeat(1, 1, tenFlow.shape[2], 1)\n        tenVer = torch.linspace(-1.0 + (1.0 / tenFlow.shape[2]), 1.0 - (1.0 / tenFlow.shape[2]), 
tenFlow.shape[2]).view(1, 1, -1, 1).repeat(1, 1, 1, tenFlow.shape[3])\n\n        backwarp_tenGrid[str(tenFlow.shape)] = torch.cat([ tenHor, tenVer ], 1).cuda()\n    # end\n\n    tenFlow = torch.cat([ tenFlow[:, 0:1, :, :] / ((tenInput.shape[3] - 1.0) / 2.0), tenFlow[:, 1:2, :, :] / ((tenInput.shape[2] - 1.0) / 2.0) ], 1)\n\n    return torch.nn.functional.grid_sample(input=tenInput, grid=(backwarp_tenGrid[str(tenFlow.shape)] + tenFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='zeros', align_corners=False)\n# end\n\n##########################################################\n\nclass Network(torch.nn.Module):\n    def __init__(self):\n        super().__init__()\n\n        class Features(torch.nn.Module):\n            def __init__(self):\n                super().__init__()\n\n                self.netOne = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=3, out_channels=32, kernel_size=7, stride=1, padding=3),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1)\n                )\n\n                self.netTwo = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=2, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1)\n                )\n\n                self.netThr = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=2, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, 
padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1)\n                )\n\n                self.netFou = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=64, out_channels=96, kernel_size=3, stride=2, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=96, out_channels=96, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1)\n                )\n\n                self.netFiv = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=96, out_channels=128, kernel_size=3, stride=2, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1)\n                )\n\n                self.netSix = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=128, out_channels=192, kernel_size=3, stride=2, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1)\n                )\n            # end\n\n            def forward(self, tenInput):\n                tenOne = self.netOne(tenInput)\n                tenTwo = self.netTwo(tenOne)\n                tenThr = self.netThr(tenTwo)\n                tenFou = self.netFou(tenThr)\n                tenFiv = self.netFiv(tenFou)\n                tenSix = self.netSix(tenFiv)\n\n                return [ tenOne, tenTwo, tenThr, tenFou, tenFiv, tenSix ]\n            # end\n        # end\n\n        class Matching(torch.nn.Module):\n            def __init__(self, intLevel):\n                super().__init__()\n\n                self.fltBackwarp = [ 0.0, 0.0, 10.0, 5.0, 2.5, 1.25, 0.625 ][intLevel]\n\n                if intLevel != 2:\n                    self.netFeat = torch.nn.Sequential()\n\n                elif intLevel == 2:\n                    self.netFeat = torch.nn.Sequential(\n                        torch.nn.Conv2d(in_channels=32, out_channels=64, 
kernel_size=1, stride=1, padding=0),\n                        torch.nn.LeakyReLU(inplace=False, negative_slope=0.1)\n                    )\n\n                # end\n\n                if intLevel == 6:\n                    self.netUpflow = None\n\n                elif intLevel != 6:\n                    self.netUpflow = torch.nn.ConvTranspose2d(in_channels=2, out_channels=2, kernel_size=4, stride=2, padding=1, bias=False, groups=2)\n\n                # end\n\n                if intLevel >= 4:\n                    self.netUpcorr = None\n\n                elif intLevel < 4:\n                    self.netUpcorr = torch.nn.ConvTranspose2d(in_channels=49, out_channels=49, kernel_size=4, stride=2, padding=1, bias=False, groups=49)\n\n                # end\n\n                self.netMain = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=49, out_channels=128, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=128, out_channels=64, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=32, out_channels=2, kernel_size=[ 0, 0, 7, 5, 5, 3, 3 ][intLevel], stride=1, padding=[ 0, 0, 3, 2, 2, 1, 1 ][intLevel])\n                )\n            # end\n\n            def forward(self, tenOne, tenTwo, tenFeaturesOne, tenFeaturesTwo, tenFlow):\n                tenFeaturesOne = self.netFeat(tenFeaturesOne)\n                tenFeaturesTwo = self.netFeat(tenFeaturesTwo)\n\n                if tenFlow is not None:\n                    tenFlow = self.netUpflow(tenFlow)\n                # end\n\n                if tenFlow is not None:\n                    tenFeaturesTwo = 
backwarp(tenInput=tenFeaturesTwo, tenFlow=tenFlow * self.fltBackwarp)\n                # end\n\n                if self.netUpcorr is None:\n                    tenCorrelation = torch.nn.functional.leaky_relu(input=correlation.FunctionCorrelation(tenOne=tenFeaturesOne, tenTwo=tenFeaturesTwo, intStride=1), negative_slope=0.1, inplace=False)\n\n                elif self.netUpcorr is not None:\n                    tenCorrelation = self.netUpcorr(torch.nn.functional.leaky_relu(input=correlation.FunctionCorrelation(tenOne=tenFeaturesOne, tenTwo=tenFeaturesTwo, intStride=2), negative_slope=0.1, inplace=False))\n\n                # end\n\n                return (tenFlow if tenFlow is not None else 0.0) + self.netMain(tenCorrelation)\n            # end\n        # end\n\n        class Subpixel(torch.nn.Module):\n            def __init__(self, intLevel):\n                super().__init__()\n\n                self.fltBackward = [ 0.0, 0.0, 10.0, 5.0, 2.5, 1.25, 0.625 ][intLevel]\n\n                if intLevel != 2:\n                    self.netFeat = torch.nn.Sequential()\n\n                elif intLevel == 2:\n                    self.netFeat = torch.nn.Sequential(\n                        torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=1, stride=1, padding=0),\n                        torch.nn.LeakyReLU(inplace=False, negative_slope=0.1)\n                    )\n\n                # end\n\n                self.netMain = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=[ 0, 0, 130, 130, 194, 258, 386 ][intLevel], out_channels=128, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=128, out_channels=64, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3, stride=1, padding=1),\n                 
   torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=32, out_channels=2, kernel_size=[ 0, 0, 7, 5, 5, 3, 3 ][intLevel], stride=1, padding=[ 0, 0, 3, 2, 2, 1, 1 ][intLevel])\n                )\n            # end\n\n            def forward(self, tenOne, tenTwo, tenFeaturesOne, tenFeaturesTwo, tenFlow):\n                tenFeaturesOne = self.netFeat(tenFeaturesOne)\n                tenFeaturesTwo = self.netFeat(tenFeaturesTwo)\n\n                if tenFlow is not None:\n                    tenFeaturesTwo = backwarp(tenInput=tenFeaturesTwo, tenFlow=tenFlow * self.fltBackward)\n                # end\n\n                return (tenFlow if tenFlow is not None else 0.0) + self.netMain(torch.cat([ tenFeaturesOne, tenFeaturesTwo, tenFlow ], 1))\n            # end\n        # end\n\n        class Regularization(torch.nn.Module):\n            def __init__(self, intLevel):\n                super().__init__()\n\n                self.fltBackward = [ 0.0, 0.0, 10.0, 5.0, 2.5, 1.25, 0.625 ][intLevel]\n\n                self.intUnfold = [ 0, 0, 7, 5, 5, 3, 3 ][intLevel]\n\n                if intLevel >= 5:\n                    self.netFeat = torch.nn.Sequential()\n\n                elif intLevel < 5:\n                    self.netFeat = torch.nn.Sequential(\n                        torch.nn.Conv2d(in_channels=[ 0, 0, 32, 64, 96, 128, 192 ][intLevel], out_channels=128, kernel_size=1, stride=1, padding=0),\n                        torch.nn.LeakyReLU(inplace=False, negative_slope=0.1)\n                    )\n\n                # end\n\n                self.netMain = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=[ 0, 0, 131, 131, 131, 131, 195 ][intLevel], out_channels=128, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1),\n                    
torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=128, out_channels=64, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1)\n                )\n\n                if intLevel >= 5:\n                    self.netDist = torch.nn.Sequential(\n                        torch.nn.Conv2d(in_channels=32, out_channels=[ 0, 0, 49, 25, 25, 9, 9 ][intLevel], kernel_size=[ 0, 0, 7, 5, 5, 3, 3 ][intLevel], stride=1, padding=[ 0, 0, 3, 2, 2, 1, 1 ][intLevel])\n                    )\n\n                elif intLevel < 5:\n                    self.netDist = torch.nn.Sequential(\n                        torch.nn.Conv2d(in_channels=32, out_channels=[ 0, 0, 49, 25, 25, 9, 9 ][intLevel], kernel_size=([ 0, 0, 7, 5, 5, 3, 3 ][intLevel], 1), stride=1, padding=([ 0, 0, 3, 2, 2, 1, 1 ][intLevel], 0)),\n                        torch.nn.Conv2d(in_channels=[ 0, 0, 49, 25, 25, 9, 9 ][intLevel], out_channels=[ 0, 0, 49, 25, 25, 9, 9 ][intLevel], kernel_size=(1, [ 0, 0, 7, 5, 5, 3, 3 ][intLevel]), stride=1, padding=(0, [ 0, 0, 3, 2, 2, 1, 1 ][intLevel]))\n                    )\n\n                # end\n\n                self.netScaleX = torch.nn.Conv2d(in_channels=[ 0, 0, 49, 25, 25, 9, 9 ][intLevel], out_channels=1, kernel_size=1, stride=1, padding=0)\n                self.netScaleY = torch.nn.Conv2d(in_channels=[ 0, 0, 49, 25, 25, 9, 9 ][intLevel], 
out_channels=1, kernel_size=1, stride=1, padding=0)\n            # eny\n\n            def forward(self, tenOne, tenTwo, tenFeaturesOne, tenFeaturesTwo, tenFlow):\n                tenDifference = ((tenOne - backwarp(tenInput=tenTwo, tenFlow=tenFlow * self.fltBackward)) ** 2).sum(1, True).sqrt().detach()\n\n                tenDist = self.netDist(self.netMain(torch.cat([ tenDifference, tenFlow - tenFlow.view(tenFlow.shape[0], 2, -1).mean(2, True).view(tenFlow.shape[0], 2, 1, 1), self.netFeat(tenFeaturesOne) ], 1)))\n                tenDist = (tenDist ** 2).neg()\n                tenDist = (tenDist - tenDist.max(1, True)[0]).exp()\n\n                tenDivisor = tenDist.sum(1, True).reciprocal()\n\n                tenScaleX = self.netScaleX(tenDist * torch.nn.functional.unfold(input=tenFlow[:, 0:1, :, :], kernel_size=self.intUnfold, stride=1, padding=int((self.intUnfold - 1) / 2)).view_as(tenDist)) * tenDivisor\n                tenScaleY = self.netScaleY(tenDist * torch.nn.functional.unfold(input=tenFlow[:, 1:2, :, :], kernel_size=self.intUnfold, stride=1, padding=int((self.intUnfold - 1) / 2)).view_as(tenDist)) * tenDivisor\n\n                return torch.cat([ tenScaleX, tenScaleY ], 1)\n            # end\n        # end\n\n        self.netFeatures = Features()\n        self.netMatching = torch.nn.ModuleList([ Matching(intLevel) for intLevel in [ 2, 3, 4, 5, 6 ] ])\n        self.netSubpixel = torch.nn.ModuleList([ Subpixel(intLevel) for intLevel in [ 2, 3, 4, 5, 6 ] ])\n        self.netRegularization = torch.nn.ModuleList([ Regularization(intLevel) for intLevel in [ 2, 3, 4, 5, 6 ] ])\n\n        self.load_state_dict({ strKey.replace('module', 'net'): tenWeight for strKey, tenWeight in torch.hub.load_state_dict_from_url(url='http://content.sniklaus.com/github/pytorch-liteflownet/network-' + arguments_strModel + '.pytorch').items() })\n        # self.load_state_dict(torch.load('./liteflownet/network-default.pth'))\n    # end\n\n    def forward(self, tenOne, tenTwo):\n   
     tenOne[:, 0, :, :] = tenOne[:, 0, :, :] - 0.411618\n        tenOne[:, 1, :, :] = tenOne[:, 1, :, :] - 0.434631\n        tenOne[:, 2, :, :] = tenOne[:, 2, :, :] - 0.454253\n\n        tenTwo[:, 0, :, :] = tenTwo[:, 0, :, :] - 0.410782\n        tenTwo[:, 1, :, :] = tenTwo[:, 1, :, :] - 0.433645\n        tenTwo[:, 2, :, :] = tenTwo[:, 2, :, :] - 0.452793\n\n        tenFeaturesOne = self.netFeatures(tenOne)\n        tenFeaturesTwo = self.netFeatures(tenTwo)\n\n        tenOne = [ tenOne ]\n        tenTwo = [ tenTwo ]\n\n        for intLevel in [ 1, 2, 3, 4, 5 ]:\n            tenOne.append(torch.nn.functional.interpolate(input=tenOne[-1], size=(tenFeaturesOne[intLevel].shape[2], tenFeaturesOne[intLevel].shape[3]), mode='bilinear', align_corners=False))\n            tenTwo.append(torch.nn.functional.interpolate(input=tenTwo[-1], size=(tenFeaturesTwo[intLevel].shape[2], tenFeaturesTwo[intLevel].shape[3]), mode='bilinear', align_corners=False))\n        # end\n\n        tenFlow = None\n\n        for intLevel in [ -1, -2, -3, -4, -5 ]:\n            tenFlow = self.netMatching[intLevel](tenOne[intLevel], tenTwo[intLevel], tenFeaturesOne[intLevel], tenFeaturesTwo[intLevel], tenFlow)\n            tenFlow = self.netSubpixel[intLevel](tenOne[intLevel], tenTwo[intLevel], tenFeaturesOne[intLevel], tenFeaturesTwo[intLevel], tenFlow)\n            tenFlow = self.netRegularization[intLevel](tenOne[intLevel], tenTwo[intLevel], tenFeaturesOne[intLevel], tenFeaturesTwo[intLevel], tenFlow)\n        # end\n\n        return tenFlow * 20.0\n    # end\n# end\n\nnetNetwork = None\n\n##########################################################\n\ndef estimate(tenOne, tenTwo):\n    global netNetwork\n\n    if netNetwork is None:\n        netNetwork = Network().cuda().eval()\n    # end\n\n    assert(tenOne.shape[1] == tenTwo.shape[1])\n    assert(tenOne.shape[2] == tenTwo.shape[2])\n\n    intWidth = tenOne.shape[2]\n    intHeight = tenOne.shape[1]\n\n    # assert(intWidth == 1024) # remember that 
there is no guarantee for correctness, comment this line out if you acknowledge this and want to continue\n    # assert(intHeight == 436) # remember that there is no guarantee for correctness, comment this line out if you acknowledge this and want to continue\n\n    tenPreprocessedOne = tenOne.cuda().view(1, 3, intHeight, intWidth)\n    tenPreprocessedTwo = tenTwo.cuda().view(1, 3, intHeight, intWidth)\n\n    intPreprocessedWidth = int(math.floor(math.ceil(intWidth / 32.0) * 32.0))\n    intPreprocessedHeight = int(math.floor(math.ceil(intHeight / 32.0) * 32.0))\n\n    tenPreprocessedOne = torch.nn.functional.interpolate(input=tenPreprocessedOne, size=(intPreprocessedHeight, intPreprocessedWidth), mode='bilinear', align_corners=False)\n    tenPreprocessedTwo = torch.nn.functional.interpolate(input=tenPreprocessedTwo, size=(intPreprocessedHeight, intPreprocessedWidth), mode='bilinear', align_corners=False)\n\n    tenFlow = torch.nn.functional.interpolate(input=netNetwork(tenPreprocessedOne, tenPreprocessedTwo), size=(intHeight, intWidth), mode='bilinear', align_corners=False)\n\n    tenFlow[:, 0, :, :] *= float(intWidth) / float(intPreprocessedWidth)\n    tenFlow[:, 1, :, :] *= float(intHeight) / float(intPreprocessedHeight)\n\n    return tenFlow[0, :, :, :].cpu()\n# end\n\n##########################################################\n\nif __name__ == '__main__':\n    tenOne = torch.FloatTensor(numpy.ascontiguousarray(numpy.array(PIL.Image.open(arguments_strOne))[:, :, ::-1].transpose(2, 0, 1).astype(numpy.float32) * (1.0 / 255.0)))\n    tenTwo = torch.FloatTensor(numpy.ascontiguousarray(numpy.array(PIL.Image.open(arguments_strTwo))[:, :, ::-1].transpose(2, 0, 1).astype(numpy.float32) * (1.0 / 255.0)))\n\n    tenOutput = estimate(tenOne, tenTwo)\n\n    objOutput = open(arguments_strOut, 'wb')\n\n    numpy.array([ 80, 73, 69, 72 ], numpy.uint8).tofile(objOutput)\n    numpy.array([ tenOutput.shape[2], tenOutput.shape[1] ], numpy.int32).tofile(objOutput)\n    
numpy.array(tenOutput.numpy().transpose(1, 2, 0), numpy.float32).tofile(objOutput)\n\n    objOutput.close()\n# end"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/losses/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/losses/loss.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\n\n\nclass Loss(nn.Module):\n    def __init__(self, loss_weight, keys, mapping=None) -> None:\n        '''\n            mapping: map the kwargs keys into desired ones.\n        '''\n        super().__init__()\n        self.loss_weight = loss_weight\n        self.keys = keys\n        self.mapping = mapping\n        if isinstance(mapping, dict):\n            self.mapping = {k: v for k, v in mapping.items() if v in keys}\n\n    \n    def forward(self, **kwargs):\n        params = {k: v for k, v in kwargs.items() if k in self.keys}\n        if self.mapping is not None:\n            for k, v in kwargs.items(): \n                if self.mapping.get(k) is not None: \n                    params[self.mapping[k]] = v \n        \n        return self._forward(**params) * self.loss_weight\n\n    def _forward(self, **kwargs):\n        pass\n\n\nclass CharbonnierLoss(Loss):\n    def __init__(self, loss_weight, keys) -> None:\n        super().__init__(loss_weight, keys)\n        \n    def _forward(self, imgt_pred, imgt):    \n        diff = imgt_pred - imgt\n        loss = ((diff ** 2 + 1e-6) ** 0.5).mean()\n        return loss\n\n\nclass AdaCharbonnierLoss(Loss):\n    def __init__(self, loss_weight, keys) -> None:\n        super().__init__(loss_weight, keys)\n        \n    def _forward(self, imgt_pred, imgt, weight):   \n        alpha = weight / 2\n        epsilon = 10 ** (-(10 * weight - 1) / 3)\n\n        diff = imgt_pred - imgt\n        loss = ((diff ** 2 + epsilon ** 2) ** alpha).mean()\n        return loss\n  \n  \nclass TernaryLoss(Loss):\n    def __init__(self, loss_weight, keys, patch_size=7):\n        super().__init__(loss_weight, keys)\n        self.patch_size = patch_size\n        out_channels = patch_size * patch_size\n        self.w = np.eye(out_channels).reshape((patch_size, patch_size, 1, out_channels))\n        self.w = np.transpose(self.w, (3, 2, 0, 1))\n        
self.w = torch.tensor(self.w, dtype=torch.float32)\n\n    def transform(self, tensor):\n        self.w = self.w.to(tensor.device)\n        tensor_ = tensor.mean(dim=1, keepdim=True)\n        patches = F.conv2d(tensor_, self.w, padding=self.patch_size//2, bias=None)\n        loc_diff = patches - tensor_\n        loc_diff_norm = loc_diff / torch.sqrt(0.81 + loc_diff ** 2)\n        return loc_diff_norm\n\n    def valid_mask(self, tensor):\n        padding = self.patch_size//2\n        b, c, h, w = tensor.size()\n        inner = torch.ones(b, 1, h - 2 * padding, w - 2 * padding).type_as(tensor)\n        mask = F.pad(inner, [padding] * 4)\n        return mask\n  \n    def _forward(self, imgt_pred, imgt):\n        loc_diff_x = self.transform(imgt_pred)\n        loc_diff_y = self.transform(imgt)\n        diff = loc_diff_x - loc_diff_y.detach()\n        dist = (diff ** 2 / (0.1 + diff ** 2)).mean(dim=1, keepdim=True)\n        mask = self.valid_mask(imgt_pred)\n        loss = (dist * mask).mean()\n        return loss\n \n\nclass GeometryLoss(Loss):\n    def __init__(self, loss_weight, keys, patch_size=3):\n        super().__init__(loss_weight, keys)\n        self.patch_size = patch_size\n        out_channels = patch_size * patch_size\n        self.w = np.eye(out_channels).reshape((patch_size, patch_size, 1, out_channels))\n        self.w = np.transpose(self.w, (3, 2, 0, 1))\n        self.w = torch.tensor(self.w).float()\n\n    def transform(self, tensor):\n        b, c, h, w = tensor.size()\n        self.w = self.w.to(tensor.device)\n        tensor_ = tensor.reshape(b*c, 1, h, w)\n        patches = F.conv2d(tensor_, self.w, padding=self.patch_size // 2, bias=None)\n        loc_diff = patches - tensor_\n        loc_diff_ = loc_diff.reshape(b, c*(self.patch_size ** 2), h, w)\n        loc_diff_norm = loc_diff_ / torch.sqrt(0.81 + loc_diff_ ** 2)\n        return loc_diff_norm\n\n    def valid_mask(self, tensor):\n        padding = self.patch_size // 2\n        b, c, h, w = 
tensor.size()\n        inner = torch.ones(b, 1, h - 2 * padding, w - 2 * padding).type_as(tensor)\n        mask = F.pad(inner, [padding] * 4)\n        return mask\n\n    def _forward(self, ft_pred, ft_gt):\n        loss = 0.\n        for pred, gt in zip(ft_pred, ft_gt):\n            loc_diff_x = self.transform(pred)\n            loc_diff_y = self.transform(gt)\n            diff = loc_diff_x - loc_diff_y\n            dist = (diff ** 2 / (0.1 + diff ** 2)).mean(dim=1, keepdim=True)\n            mask = self.valid_mask(pred)\n            loss = loss + (dist * mask).mean()\n        return loss\n    \n\nclass IFRFlowLoss(Loss):\n    def __init__(self, loss_weight, keys, beta=0.3) -> None:\n        super().__init__(loss_weight, keys)\n        self.beta = beta\n        self.ada_cb_loss = AdaCharbonnierLoss(1.0, ['imgt_pred', 'imgt', 'weight'])\n    \n    def _forward(self, flow0_pred, flow1_pred, flow):\n        \n        robust_weight0 = self.get_robust_weight(flow0_pred[0], flow[:, 0:2])\n        robust_weight1 = self.get_robust_weight(flow1_pred[0], flow[:, 2:4])\n        loss = 0\n        for lvl in range(1, len(flow0_pred)):\n            scale_factor = 2**lvl\n            loss = loss + self.ada_cb_loss(**{\n                'imgt_pred': self.resize(flow0_pred[lvl], scale_factor),\n                'imgt': flow[:, 0:2],\n                'weight': robust_weight0\n            })\n            loss = loss + self.ada_cb_loss(**{\n                'imgt_pred': self.resize(flow1_pred[lvl], scale_factor),\n                'imgt': flow[:, 2:4],\n                'weight': robust_weight1\n            })\n        return loss\n    \n    def resize(self, x, scale_factor):\n        return scale_factor * F.interpolate(x, scale_factor=scale_factor, mode=\"bilinear\", align_corners=False)\n    \n    def get_robust_weight(self, flow_pred, flow_gt):\n        epe = ((flow_pred.detach() - flow_gt) ** 2).sum(dim=1, keepdim=True) ** 0.5\n        robust_weight = torch.exp(-self.beta * epe)\n      
  return robust_weight\n\n\nclass MultipleFlowLoss(Loss):\n    def __init__(self, loss_weight, keys, beta=0.3) -> None:\n        super().__init__(loss_weight, keys)\n        self.beta = beta\n        self.ada_cb_loss = AdaCharbonnierLoss(1.0, ['imgt_pred', 'imgt', 'weight'])\n    \n    def _forward(self, flow0_pred, flow1_pred, flow):\n        \n        robust_weight0 = self.get_mutli_flow_robust_weight(flow0_pred[0], flow[:, 0:2])\n        robust_weight1 = self.get_mutli_flow_robust_weight(flow1_pred[0], flow[:, 2:4])\n        loss = 0\n        for lvl in range(1, len(flow0_pred)):\n            scale_factor = 2**lvl\n            loss = loss + self.ada_cb_loss(**{\n                'imgt_pred': self.resize(flow0_pred[lvl], scale_factor),\n                'imgt': flow[:, 0:2],\n                'weight': robust_weight0\n            })\n            loss = loss + self.ada_cb_loss(**{\n                'imgt_pred': self.resize(flow1_pred[lvl], scale_factor),\n                'imgt': flow[:, 2:4],\n                'weight': robust_weight1\n            })\n        return loss\n    \n    def resize(self, x, scale_factor):\n        return scale_factor * F.interpolate(x, scale_factor=scale_factor, mode=\"bilinear\", align_corners=False)\n\n    def get_mutli_flow_robust_weight(self, flow_pred, flow_gt):\n        b, num_flows, c, h, w = flow_pred.shape\n        flow_pred = flow_pred.view(b, num_flows, c, h, w)\n        flow_gt = flow_gt.repeat(1, num_flows, 1, 1).view(b, num_flows, c, h, w)\n        epe = ((flow_pred.detach() - flow_gt) ** 2).sum(dim=2, keepdim=True).max(1)[0] ** 0.5\n        robust_weight = torch.exp(-self.beta * epe)\n        return robust_weight"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/metrics/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/metrics/psnr_ssim.py",
    "content": "import torch\nimport torch.nn.functional as F\nfrom math import exp\n\ndevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n\ndef gaussian(window_size, sigma):\n    gauss = torch.Tensor([exp(-(x - window_size//2)**2/float(2*sigma**2)) for x in range(window_size)])\n    return gauss/gauss.sum()\n\n\ndef create_window(window_size, channel=1):\n    _1D_window = gaussian(window_size, 1.5).unsqueeze(1)\n    _2D_window = _1D_window.mm(_1D_window.t()).float().unsqueeze(0).unsqueeze(0).to(device)\n    window = _2D_window.expand(channel, 1, window_size, window_size).contiguous()\n    return window\n\n\ndef create_window_3d(window_size, channel=1):\n    _1D_window = gaussian(window_size, 1.5).unsqueeze(1)\n    _2D_window = _1D_window.mm(_1D_window.t())\n    _3D_window = _2D_window.unsqueeze(2) @ (_1D_window.t())\n    window = _3D_window.expand(1, channel, window_size, window_size, window_size).contiguous().to(device)\n    return window\n\n\ndef ssim(img1, img2, window_size=11, window=None, size_average=True, full=False, val_range=None):\n    if val_range is None:\n        if torch.max(img1) > 128:\n            max_val = 255\n        else:\n            max_val = 1\n\n        if torch.min(img1) < -0.5:\n            min_val = -1\n        else:\n            min_val = 0\n        L = max_val - min_val\n    else:\n        L = val_range\n\n    padd = 0\n    (_, channel, height, width) = img1.size()\n    if window is None:\n        real_size = min(window_size, height, width)\n        window = create_window(real_size, channel=channel).to(img1.device)\n\n    mu1 = F.conv2d(F.pad(img1, (5, 5, 5, 5), mode='replicate'), window, padding=padd, groups=channel)\n    mu2 = F.conv2d(F.pad(img2, (5, 5, 5, 5), mode='replicate'), window, padding=padd, groups=channel)\n\n    mu1_sq = mu1.pow(2)\n    mu2_sq = mu2.pow(2)\n    mu1_mu2 = mu1 * mu2\n\n    sigma1_sq = F.conv2d(F.pad(img1 * img1, (5, 5, 5, 5), 'replicate'), window, padding=padd, groups=channel) - mu1_sq\n 
   sigma2_sq = F.conv2d(F.pad(img2 * img2, (5, 5, 5, 5), 'replicate'), window, padding=padd, groups=channel) - mu2_sq\n    sigma12 = F.conv2d(F.pad(img1 * img2, (5, 5, 5, 5), 'replicate'), window, padding=padd, groups=channel) - mu1_mu2\n\n    C1 = (0.01 * L) ** 2\n    C2 = (0.03 * L) ** 2\n\n    v1 = 2.0 * sigma12 + C2\n    v2 = sigma1_sq + sigma2_sq + C2\n    cs = torch.mean(v1 / v2)\n\n    ssim_map = ((2 * mu1_mu2 + C1) * v1) / ((mu1_sq + mu2_sq + C1) * v2)\n\n    if size_average:\n        ret = ssim_map.mean()\n    else:\n        ret = ssim_map.mean(1).mean(1).mean(1)\n\n    if full:\n        return ret, cs\n    return ret\n\n\ndef calculate_ssim(img1, img2, window_size=11, window=None, size_average=True, full=False, val_range=None):\n    if val_range is None:\n        if torch.max(img1) > 128:\n            max_val = 255\n        else:\n            max_val = 1\n\n        if torch.min(img1) < -0.5:\n            min_val = -1\n        else:\n            min_val = 0\n        L = max_val - min_val\n    else:\n        L = val_range\n\n    padd = 0\n    (_, _, height, width) = img1.size()\n    if window is None:\n        real_size = min(window_size, height, width)\n        window = create_window_3d(real_size, channel=1).to(img1.device)\n\n    img1 = img1.unsqueeze(1)\n    img2 = img2.unsqueeze(1)\n\n    mu1 = F.conv3d(F.pad(img1, (5, 5, 5, 5, 5, 5), mode='replicate'), window, padding=padd, groups=1)\n    mu2 = F.conv3d(F.pad(img2, (5, 5, 5, 5, 5, 5), mode='replicate'), window, padding=padd, groups=1)\n\n    mu1_sq = mu1.pow(2)\n    mu2_sq = mu2.pow(2)\n    mu1_mu2 = mu1 * mu2\n\n    sigma1_sq = F.conv3d(F.pad(img1 * img1, (5, 5, 5, 5, 5, 5), 'replicate'), window, padding=padd, groups=1) - mu1_sq\n    sigma2_sq = F.conv3d(F.pad(img2 * img2, (5, 5, 5, 5, 5, 5), 'replicate'), window, padding=padd, groups=1) - mu2_sq\n    sigma12 = F.conv3d(F.pad(img1 * img2, (5, 5, 5, 5, 5, 5), 'replicate'), window, padding=padd, groups=1) - mu1_mu2\n\n    C1 = (0.01 * L) ** 2\n    C2 = 
(0.03 * L) ** 2\n\n    v1 = 2.0 * sigma12 + C2\n    v2 = sigma1_sq + sigma2_sq + C2\n    cs = torch.mean(v1 / v2)\n\n    ssim_map = ((2 * mu1_mu2 + C1) * v1) / ((mu1_sq + mu2_sq + C1) * v2)\n\n    if size_average:\n        ret = ssim_map.mean()\n    else:\n        ret = ssim_map.mean(1).mean(1).mean(1)\n\n    if full:\n        return ret, cs\n    return ret.detach().cpu().numpy()\n\n\n\ndef calculate_psnr(img1, img2):\n    psnr = -10 * torch.log10(((img1 - img2) * (img1 - img2)).mean())\n    return psnr.detach().cpu().numpy()\n\n\ndef calculate_ie(img1, img2):\n    ie = torch.abs(torch.round(img1 * 255.0) - torch.round(img2 * 255.0)).mean()\n    return ie.detach().cpu().numpy()\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/networks/AMT-G.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom vbench.third_party.amt.networks.blocks.raft import (\n    coords_grid,\n    BasicUpdateBlock, BidirCorrBlock\n)\nfrom vbench.third_party.amt.networks.blocks.feat_enc import (\n    LargeEncoder\n)\nfrom vbench.third_party.amt.networks.blocks.ifrnet import (\n    resize,\n    Encoder,\n    InitDecoder,\n    IntermediateDecoder\n)\nfrom vbench.third_party.amt.networks.blocks.multi_flow import (\n    multi_flow_combine,\n    MultiFlowDecoder\n)\n\n\nclass Model(nn.Module):\n    def __init__(self, \n                 corr_radius=3, \n                 corr_lvls=4, \n                 num_flows=5, \n                 channels=[84, 96, 112, 128], \n                 skip_channels=84):\n        super(Model, self).__init__()\n        self.radius = corr_radius\n        self.corr_levels = corr_lvls\n        self.num_flows = num_flows\n\n        self.feat_encoder = LargeEncoder(output_dim=128, norm_fn='instance', dropout=0.)\n        self.encoder = Encoder(channels, large=True)\n        self.decoder4 = InitDecoder(channels[3], channels[2], skip_channels)\n        self.decoder3 = IntermediateDecoder(channels[2], channels[1], skip_channels)\n        self.decoder2 = IntermediateDecoder(channels[1], channels[0], skip_channels)\n        self.decoder1 = MultiFlowDecoder(channels[0], skip_channels, num_flows)\n\n        self.update4 = self._get_updateblock(112, None)\n        self.update3_low = self._get_updateblock(96, 2.0)\n        self.update2_low = self._get_updateblock(84, 4.0)\n        \n        self.update3_high = self._get_updateblock(96, None)\n        self.update2_high = self._get_updateblock(84, None)\n        \n        self.comb_block = nn.Sequential(\n            nn.Conv2d(3*self.num_flows, 6*self.num_flows, 7, 1, 3),\n            nn.PReLU(6*self.num_flows),\n            nn.Conv2d(6*self.num_flows, 3, 7, 1, 3),\n        )\n\n    def _get_updateblock(self, cdim, scale_factor=None):\n     
   return BasicUpdateBlock(cdim=cdim, hidden_dim=192, flow_dim=64, \n                                corr_dim=256, corr_dim2=192, fc_dim=188, \n                                scale_factor=scale_factor, corr_levels=self.corr_levels, \n                                radius=self.radius)\n\n    def _corr_scale_lookup(self, corr_fn, coord, flow0, flow1, embt, downsample=1):\n        # convert t -> 0 to 0 -> 1 | convert t -> 1 to 1 -> 0\n        # based on linear assumption\n        t1_scale = 1. / embt\n        t0_scale = 1. / (1. - embt)\n        if downsample != 1:\n            inv = 1 / downsample\n            flow0 = inv * resize(flow0, scale_factor=inv)\n            flow1 = inv * resize(flow1, scale_factor=inv)\n            \n        corr0, corr1 = corr_fn(coord + flow1 * t1_scale, coord + flow0 * t0_scale) \n        corr = torch.cat([corr0, corr1], dim=1)\n        flow = torch.cat([flow0, flow1], dim=1)\n        return corr, flow\n    \n    def forward(self, img0, img1, embt, scale_factor=1.0, eval=False, **kwargs):\n        mean_ = torch.cat([img0, img1], 2).mean(1, keepdim=True).mean(2, keepdim=True).mean(3, keepdim=True)\n        img0 = img0 - mean_\n        img1 = img1 - mean_\n        img0_ = resize(img0, scale_factor) if scale_factor != 1.0 else img0\n        img1_ = resize(img1, scale_factor) if scale_factor != 1.0 else img1\n        b, _, h, w = img0_.shape\n        coord = coords_grid(b, h // 8, w // 8, img0.device)\n        \n        fmap0, fmap1 = self.feat_encoder([img0_, img1_]) # [1, 128, H//8, W//8]\n        corr_fn = BidirCorrBlock(fmap0, fmap1, radius=self.radius, num_levels=self.corr_levels)\n\n        # f0_1: [1, c0, H//2, W//2] | f0_2: [1, c1, H//4, W//4]\n        # f0_3: [1, c2, H//8, W//8] | f0_4: [1, c3, H//16, W//16]\n        f0_1, f0_2, f0_3, f0_4 = self.encoder(img0_)\n        f1_1, f1_2, f1_3, f1_4 = self.encoder(img1_)\n\n        ######################################### the 4th decoder #########################################\n      
  up_flow0_4, up_flow1_4, ft_3_ = self.decoder4(f0_4, f1_4, embt)\n        corr_4, flow_4 = self._corr_scale_lookup(corr_fn, coord, \n                                                 up_flow0_4, up_flow1_4, \n                                                 embt, downsample=1)\n\n        # residue update with lookup corr\n        delta_ft_3_, delta_flow_4 = self.update4(ft_3_, flow_4, corr_4)\n        delta_flow0_4, delta_flow1_4 = torch.chunk(delta_flow_4, 2, 1)\n        up_flow0_4 = up_flow0_4 + delta_flow0_4\n        up_flow1_4 = up_flow1_4 + delta_flow1_4\n        ft_3_ = ft_3_ + delta_ft_3_\n\n        ######################################### the 3rd decoder #########################################\n        up_flow0_3, up_flow1_3, ft_2_ = self.decoder3(ft_3_, f0_3, f1_3, up_flow0_4, up_flow1_4)\n        corr_3, flow_3 = self._corr_scale_lookup(corr_fn, \n                                                 coord, up_flow0_3, up_flow1_3, \n                                                 embt, downsample=2)\n\n        # residue update with lookup corr\n        delta_ft_2_, delta_flow_3 = self.update3_low(ft_2_, flow_3, corr_3)\n        delta_flow0_3, delta_flow1_3 = torch.chunk(delta_flow_3, 2, 1)\n        up_flow0_3 = up_flow0_3 + delta_flow0_3\n        up_flow1_3 = up_flow1_3 + delta_flow1_3\n        ft_2_ = ft_2_ + delta_ft_2_\n        \n        # residue update with lookup corr (hr)\n        corr_3 = resize(corr_3, scale_factor=2.0)\n        up_flow_3 = torch.cat([up_flow0_3, up_flow1_3], dim=1)\n        delta_ft_2_, delta_up_flow_3 = self.update3_high(ft_2_, up_flow_3, corr_3)\n        ft_2_ += delta_ft_2_\n        up_flow0_3 += delta_up_flow_3[:, 0:2]\n        up_flow1_3 += delta_up_flow_3[:, 2:4]\n        \n        ######################################### the 2nd decoder #########################################\n        up_flow0_2, up_flow1_2, ft_1_  = self.decoder2(ft_2_, f0_2, f1_2, up_flow0_3, up_flow1_3)\n        corr_2, flow_2 = 
self._corr_scale_lookup(corr_fn, \n                                                 coord, up_flow0_2, up_flow1_2, \n                                                 embt, downsample=4)\n        \n        # residue update with lookup corr\n        delta_ft_1_, delta_flow_2 = self.update2_low(ft_1_, flow_2, corr_2)\n        delta_flow0_2, delta_flow1_2 = torch.chunk(delta_flow_2, 2, 1)\n        up_flow0_2 = up_flow0_2 + delta_flow0_2\n        up_flow1_2 = up_flow1_2 + delta_flow1_2\n        ft_1_ = ft_1_ + delta_ft_1_\n        \n        # residue update with lookup corr (hr)\n        corr_2 = resize(corr_2, scale_factor=4.0)\n        up_flow_2 = torch.cat([up_flow0_2, up_flow1_2], dim=1)\n        delta_ft_1_, delta_up_flow_2 = self.update2_high(ft_1_, up_flow_2, corr_2)\n        ft_1_ += delta_ft_1_\n        up_flow0_2 += delta_up_flow_2[:, 0:2]\n        up_flow1_2 += delta_up_flow_2[:, 2:4]\n        \n        ######################################### the 1st decoder #########################################\n        up_flow0_1, up_flow1_1, mask, img_res = self.decoder1(ft_1_, f0_1, f1_1, up_flow0_2, up_flow1_2)\n        \n        if scale_factor != 1.0: \n            up_flow0_1 = resize(up_flow0_1, scale_factor=(1.0/scale_factor)) * (1.0/scale_factor)\n            up_flow1_1 = resize(up_flow1_1, scale_factor=(1.0/scale_factor)) * (1.0/scale_factor)\n            mask = resize(mask, scale_factor=(1.0/scale_factor))\n            img_res = resize(img_res, scale_factor=(1.0/scale_factor))\n\n        # Merge multiple predictions \n        imgt_pred = multi_flow_combine(self.comb_block, img0, img1, up_flow0_1, up_flow1_1, \n                                                                        mask, img_res, mean_)\n        imgt_pred = torch.clamp(imgt_pred, 0, 1)\n\n        if eval:\n            return  { 'imgt_pred': imgt_pred, }\n        else:\n            up_flow0_1 = up_flow0_1.reshape(b, self.num_flows, 2, h, w)\n            up_flow1_1 = up_flow1_1.reshape(b, 
self.num_flows, 2, h, w)\n            return {\n                'imgt_pred': imgt_pred,\n                'flow0_pred': [up_flow0_1, up_flow0_2, up_flow0_3, up_flow0_4],\n                'flow1_pred': [up_flow1_1, up_flow1_2, up_flow1_3, up_flow1_4],\n                'ft_pred': [ft_1_, ft_2_, ft_3_],\n            }\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/networks/AMT-L.py",
    "content": "import torch\nimport torch.nn as nn\nfrom vbench.third_party.amt.networks.blocks.raft import (\n    coords_grid,\n    BasicUpdateBlock, BidirCorrBlock\n)\nfrom vbench.third_party.amt.networks.blocks.feat_enc import (\n    BasicEncoder,\n)\nfrom vbench.third_party.amt.networks.blocks.ifrnet import (\n    resize,\n    Encoder,\n    InitDecoder,\n    IntermediateDecoder\n)\nfrom vbench.third_party.amt.networks.blocks.multi_flow import (\n    multi_flow_combine,\n    MultiFlowDecoder\n)\n\nclass Model(nn.Module):\n    def __init__(self, \n                 corr_radius=3, \n                 corr_lvls=4, \n                 num_flows=5,\n                 channels=[48, 64, 72, 128], \n                 skip_channels=48\n                 ):\n        super(Model, self).__init__()\n        self.radius = corr_radius\n        self.corr_levels = corr_lvls\n        self.num_flows = num_flows\n\n        self.feat_encoder = BasicEncoder(output_dim=128, norm_fn='instance', dropout=0.)\n        self.encoder = Encoder([48, 64, 72, 128], large=True)\n        \n        self.decoder4 = InitDecoder(channels[3], channels[2], skip_channels)\n        self.decoder3 = IntermediateDecoder(channels[2], channels[1], skip_channels)\n        self.decoder2 = IntermediateDecoder(channels[1], channels[0], skip_channels)\n        self.decoder1 = MultiFlowDecoder(channels[0], skip_channels, num_flows)\n\n        self.update4 = self._get_updateblock(72, None)\n        self.update3 = self._get_updateblock(64, 2.0)\n        self.update2 = self._get_updateblock(48, 4.0)\n        \n        self.comb_block = nn.Sequential(\n            nn.Conv2d(3*self.num_flows, 6*self.num_flows, 7, 1, 3),\n            nn.PReLU(6*self.num_flows),\n            nn.Conv2d(6*self.num_flows, 3, 7, 1, 3),\n        )\n\n    def _get_updateblock(self, cdim, scale_factor=None):\n        return BasicUpdateBlock(cdim=cdim, hidden_dim=128, flow_dim=48, \n                                corr_dim=256, corr_dim2=160, 
fc_dim=124, \n                                scale_factor=scale_factor, corr_levels=self.corr_levels, \n                                radius=self.radius)\n\n    def _corr_scale_lookup(self, corr_fn, coord, flow0, flow1, embt, downsample=1):\n        # convert t -> 0 to 0 -> 1 | convert t -> 1 to 1 -> 0\n        # based on linear assumption\n        t1_scale = 1. / embt\n        t0_scale = 1. / (1. - embt)\n        if downsample != 1:\n            inv = 1 / downsample\n            flow0 = inv * resize(flow0, scale_factor=inv)\n            flow1 = inv * resize(flow1, scale_factor=inv)\n            \n        corr0, corr1 = corr_fn(coord + flow1 * t1_scale, coord + flow0 * t0_scale) \n        corr = torch.cat([corr0, corr1], dim=1)\n        flow = torch.cat([flow0, flow1], dim=1)\n        return corr, flow\n    \n    def forward(self, img0, img1, embt, scale_factor=1.0, eval=False, **kwargs):\n        mean_ = torch.cat([img0, img1], 2).mean(1, keepdim=True).mean(2, keepdim=True).mean(3, keepdim=True)\n        img0 = img0 - mean_\n        img1 = img1 - mean_\n        img0_ = resize(img0, scale_factor) if scale_factor != 1.0 else img0\n        img1_ = resize(img1, scale_factor) if scale_factor != 1.0 else img1\n        b, _, h, w = img0_.shape\n        coord = coords_grid(b, h // 8, w // 8, img0.device)\n        \n        fmap0, fmap1 = self.feat_encoder([img0_, img1_]) # [1, 128, H//8, W//8]\n        corr_fn = BidirCorrBlock(fmap0, fmap1, radius=self.radius, num_levels=self.corr_levels)\n\n        # f0_1: [1, c0, H//2, W//2] | f0_2: [1, c1, H//4, W//4]\n        # f0_3: [1, c2, H//8, W//8] | f0_4: [1, c3, H//16, W//16]\n        f0_1, f0_2, f0_3, f0_4 = self.encoder(img0_)\n        f1_1, f1_2, f1_3, f1_4 = self.encoder(img1_)\n\n        ######################################### the 4th decoder #########################################\n        up_flow0_4, up_flow1_4, ft_3_ = self.decoder4(f0_4, f1_4, embt)\n        corr_4, flow_4 = self._corr_scale_lookup(corr_fn, 
coord, \n                                                 up_flow0_4, up_flow1_4, \n                                                 embt, downsample=1)\n\n        # residue update with lookup corr\n        delta_ft_3_, delta_flow_4 = self.update4(ft_3_, flow_4, corr_4)\n        delta_flow0_4, delta_flow1_4 = torch.chunk(delta_flow_4, 2, 1)\n        up_flow0_4 = up_flow0_4 + delta_flow0_4\n        up_flow1_4 = up_flow1_4 + delta_flow1_4\n        ft_3_ = ft_3_ + delta_ft_3_\n\n        ######################################### the 3rd decoder #########################################\n        up_flow0_3, up_flow1_3, ft_2_ = self.decoder3(ft_3_, f0_3, f1_3, up_flow0_4, up_flow1_4)\n        corr_3, flow_3 = self._corr_scale_lookup(corr_fn, \n                                                 coord, up_flow0_3, up_flow1_3, \n                                                 embt, downsample=2)\n\n        # residue update with lookup corr\n        delta_ft_2_, delta_flow_3 = self.update3(ft_2_, flow_3, corr_3)\n        delta_flow0_3, delta_flow1_3 = torch.chunk(delta_flow_3, 2, 1)\n        up_flow0_3 = up_flow0_3 + delta_flow0_3\n        up_flow1_3 = up_flow1_3 + delta_flow1_3\n        ft_2_ = ft_2_ + delta_ft_2_\n\n        ######################################### the 2nd decoder #########################################\n        up_flow0_2, up_flow1_2, ft_1_  = self.decoder2(ft_2_, f0_2, f1_2, up_flow0_3, up_flow1_3)\n        corr_2, flow_2 = self._corr_scale_lookup(corr_fn, \n                                                 coord, up_flow0_2, up_flow1_2, \n                                                 embt, downsample=4)\n        \n        # residue update with lookup corr\n        delta_ft_1_, delta_flow_2 = self.update2(ft_1_, flow_2, corr_2)\n        delta_flow0_2, delta_flow1_2 = torch.chunk(delta_flow_2, 2, 1)\n        up_flow0_2 = up_flow0_2 + delta_flow0_2\n        up_flow1_2 = up_flow1_2 + delta_flow1_2\n        ft_1_ = ft_1_ + delta_ft_1_\n\n        
######################################### the 1st decoder #########################################\n        up_flow0_1, up_flow1_1, mask, img_res = self.decoder1(ft_1_, f0_1, f1_1, up_flow0_2, up_flow1_2)\n        \n        if scale_factor != 1.0: \n            up_flow0_1 = resize(up_flow0_1, scale_factor=(1.0/scale_factor)) * (1.0/scale_factor)\n            up_flow1_1 = resize(up_flow1_1, scale_factor=(1.0/scale_factor)) * (1.0/scale_factor)\n            mask = resize(mask, scale_factor=(1.0/scale_factor))\n            img_res = resize(img_res, scale_factor=(1.0/scale_factor))\n\n        # Merge multiple predictions \n        imgt_pred = multi_flow_combine(self.comb_block, img0, img1, up_flow0_1, up_flow1_1, \n                                                                        mask, img_res, mean_)\n        imgt_pred = torch.clamp(imgt_pred, 0, 1)\n\n        if eval:\n            return  { 'imgt_pred': imgt_pred, }\n        else:\n            up_flow0_1 = up_flow0_1.reshape(b, self.num_flows, 2, h, w)\n            up_flow1_1 = up_flow1_1.reshape(b, self.num_flows, 2, h, w)\n            return {\n                'imgt_pred': imgt_pred,\n                'flow0_pred': [up_flow0_1, up_flow0_2, up_flow0_3, up_flow0_4],\n                'flow1_pred': [up_flow1_1, up_flow1_2, up_flow1_3, up_flow1_4],\n                'ft_pred': [ft_1_, ft_2_, ft_3_],\n            }\n    \n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/networks/AMT-S.py",
    "content": "import torch\nimport torch.nn as nn\nfrom vbench.third_party.amt.networks.blocks.raft import (\n    SmallUpdateBlock,\n    coords_grid,\n    BidirCorrBlock\n)\nfrom vbench.third_party.amt.networks.blocks.feat_enc import (\n    SmallEncoder\n)\nfrom vbench.third_party.amt.networks.blocks.ifrnet import (\n    resize,\n    Encoder,\n    InitDecoder,\n    IntermediateDecoder\n)\nfrom vbench.third_party.amt.networks.blocks.multi_flow import (\n    multi_flow_combine,\n    MultiFlowDecoder\n)\n\nclass Model(nn.Module):\n    def __init__(self, \n                 corr_radius=3, \n                 corr_lvls=4, \n                 num_flows=3, \n                 channels=[20, 32, 44, 56], \n                 skip_channels=20):\n        super(Model, self).__init__()\n        self.radius = corr_radius\n        self.corr_levels = corr_lvls\n        self.num_flows = num_flows\n        self.channels = channels\n        self.skip_channels = skip_channels\n\n        self.feat_encoder = SmallEncoder(output_dim=84, norm_fn='instance', dropout=0.)\n        self.encoder = Encoder(channels)\n\n        self.decoder4 = InitDecoder(channels[3], channels[2], skip_channels)\n        self.decoder3 = IntermediateDecoder(channels[2], channels[1], skip_channels)\n        self.decoder2 = IntermediateDecoder(channels[1], channels[0], skip_channels)\n        self.decoder1 = MultiFlowDecoder(channels[0], skip_channels, num_flows)\n\n        self.update4 = self._get_updateblock(44)\n        self.update3 = self._get_updateblock(32, 2)\n        self.update2 = self._get_updateblock(20, 4)\n        \n        self.comb_block = nn.Sequential(\n            nn.Conv2d(3*num_flows, 6*num_flows, 3, 1, 1),\n            nn.PReLU(6*num_flows),\n            nn.Conv2d(6*num_flows, 3, 3, 1, 1),\n        )\n\n    def _get_updateblock(self, cdim, scale_factor=None):\n        return SmallUpdateBlock(cdim=cdim, hidden_dim=76, flow_dim=20, corr_dim=64, \n                                fc_dim=68, 
scale_factor=scale_factor, \n                                corr_levels=self.corr_levels, radius=self.radius)\n\n    def _corr_scale_lookup(self, corr_fn, coord, flow0, flow1, embt, downsample=1):\n        # convert t -> 0 to 0 -> 1 | convert t -> 1 to 1 -> 0\n        # based on linear assumption\n        t1_scale = 1. / embt\n        t0_scale = 1. / (1. - embt)\n        if downsample != 1:\n            inv = 1 / downsample\n            flow0 = inv * resize(flow0, scale_factor=inv)\n            flow1 = inv * resize(flow1, scale_factor=inv)\n            \n        corr0, corr1 = corr_fn(coord + flow1 * t1_scale, coord + flow0 * t0_scale) \n        corr = torch.cat([corr0, corr1], dim=1)\n        flow = torch.cat([flow0, flow1], dim=1)\n        return corr, flow\n\n    def forward(self, img0, img1, embt, scale_factor=1.0, eval=False, **kwargs):\n        mean_ = torch.cat([img0, img1], 2).mean(1, keepdim=True).mean(2, keepdim=True).mean(3, keepdim=True)\n        img0 = img0 - mean_\n        img1 = img1 - mean_\n        img0_ = resize(img0, scale_factor) if scale_factor != 1.0 else img0\n        img1_ = resize(img1, scale_factor) if scale_factor != 1.0 else img1\n        b, _, h, w = img0_.shape\n        coord = coords_grid(b, h // 8, w // 8, img0.device)\n        \n        fmap0, fmap1 = self.feat_encoder([img0_, img1_]) # [1, 128, H//8, W//8]\n        corr_fn = BidirCorrBlock(fmap0, fmap1, radius=self.radius, num_levels=self.corr_levels)\n\n        # f0_1: [1, c0, H//2, W//2] | f0_2: [1, c1, H//4, W//4]\n        # f0_3: [1, c2, H//8, W//8] | f0_4: [1, c3, H//16, W//16]\n        f0_1, f0_2, f0_3, f0_4 = self.encoder(img0_)\n        f1_1, f1_2, f1_3, f1_4 = self.encoder(img1_)\n\n        ######################################### the 4th decoder #########################################\n        up_flow0_4, up_flow1_4, ft_3_ = self.decoder4(f0_4, f1_4, embt)\n        corr_4, flow_4 = self._corr_scale_lookup(corr_fn, coord, \n                                              
   up_flow0_4, up_flow1_4, \n                                                 embt, downsample=1)\n\n        # residue update with lookup corr\n        delta_ft_3_, delta_flow_4 = self.update4(ft_3_, flow_4, corr_4)\n        delta_flow0_4, delta_flow1_4 = torch.chunk(delta_flow_4, 2, 1)\n        up_flow0_4 = up_flow0_4 + delta_flow0_4\n        up_flow1_4 = up_flow1_4 + delta_flow1_4\n        ft_3_ = ft_3_ + delta_ft_3_\n\n        ######################################### the 3rd decoder #########################################\n        up_flow0_3, up_flow1_3, ft_2_ = self.decoder3(ft_3_, f0_3, f1_3, up_flow0_4, up_flow1_4)\n        corr_3, flow_3 = self._corr_scale_lookup(corr_fn, \n                                                 coord, up_flow0_3, up_flow1_3, \n                                                 embt, downsample=2)\n\n        # residue update with lookup corr\n        delta_ft_2_, delta_flow_3 = self.update3(ft_2_, flow_3, corr_3)\n        delta_flow0_3, delta_flow1_3 = torch.chunk(delta_flow_3, 2, 1)\n        up_flow0_3 = up_flow0_3 + delta_flow0_3\n        up_flow1_3 = up_flow1_3 + delta_flow1_3\n        ft_2_ = ft_2_ + delta_ft_2_\n\n        ######################################### the 2nd decoder #########################################\n        up_flow0_2, up_flow1_2, ft_1_  = self.decoder2(ft_2_, f0_2, f1_2, up_flow0_3, up_flow1_3)\n        corr_2, flow_2 = self._corr_scale_lookup(corr_fn, \n                                                 coord, up_flow0_2, up_flow1_2, \n                                                 embt, downsample=4)\n        \n        # residue update with lookup corr\n        delta_ft_1_, delta_flow_2 = self.update2(ft_1_, flow_2, corr_2)\n        delta_flow0_2, delta_flow1_2 = torch.chunk(delta_flow_2, 2, 1)\n        up_flow0_2 = up_flow0_2 + delta_flow0_2\n        up_flow1_2 = up_flow1_2 + delta_flow1_2\n        ft_1_ = ft_1_ + delta_ft_1_\n\n        ######################################### the 1st decoder 
#########################################\n        up_flow0_1, up_flow1_1, mask, img_res = self.decoder1(ft_1_, f0_1, f1_1, up_flow0_2, up_flow1_2)\n        \n        if scale_factor != 1.0: \n            up_flow0_1 = resize(up_flow0_1, scale_factor=(1.0/scale_factor)) * (1.0/scale_factor)\n            up_flow1_1 = resize(up_flow1_1, scale_factor=(1.0/scale_factor)) * (1.0/scale_factor)\n            mask = resize(mask, scale_factor=(1.0/scale_factor))\n            img_res = resize(img_res, scale_factor=(1.0/scale_factor))\n        \n        # Merge multiple predictions \n        imgt_pred = multi_flow_combine(self.comb_block, img0, img1, up_flow0_1, up_flow1_1, \n                                                                        mask, img_res, mean_)\n        imgt_pred = torch.clamp(imgt_pred, 0, 1)\n\n        if eval:\n            return  { 'imgt_pred': imgt_pred, }\n        else:\n            up_flow0_1 = up_flow0_1.reshape(b, self.num_flows, 2, h, w)\n            up_flow1_1 = up_flow1_1.reshape(b, self.num_flows, 2, h, w)\n            return {\n                'imgt_pred': imgt_pred,\n                'flow0_pred': [up_flow0_1, up_flow0_2, up_flow0_3, up_flow0_4],\n                'flow1_pred': [up_flow1_1, up_flow1_2, up_flow1_3, up_flow1_4],\n                'ft_pred': [ft_1_, ft_2_, ft_3_],\n            }\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/networks/blocks/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/networks/blocks/feat_enc.py",
    "content": "import torch\nimport torch.nn as nn\n\n\nclass BottleneckBlock(nn.Module):\n    def __init__(self, in_planes, planes, norm_fn='group', stride=1):\n        super(BottleneckBlock, self).__init__()\n  \n        self.conv1 = nn.Conv2d(in_planes, planes//4, kernel_size=1, padding=0)\n        self.conv2 = nn.Conv2d(planes//4, planes//4, kernel_size=3, padding=1, stride=stride)\n        self.conv3 = nn.Conv2d(planes//4, planes, kernel_size=1, padding=0)\n        self.relu = nn.ReLU(inplace=True)\n\n        num_groups = planes // 8\n\n        if norm_fn == 'group':\n            self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes//4)\n            self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes//4)\n            self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n            if not stride == 1:\n                self.norm4 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n        \n        elif norm_fn == 'batch':\n            self.norm1 = nn.BatchNorm2d(planes//4)\n            self.norm2 = nn.BatchNorm2d(planes//4)\n            self.norm3 = nn.BatchNorm2d(planes)\n            if not stride == 1:\n                self.norm4 = nn.BatchNorm2d(planes)\n        \n        elif norm_fn == 'instance':\n            self.norm1 = nn.InstanceNorm2d(planes//4)\n            self.norm2 = nn.InstanceNorm2d(planes//4)\n            self.norm3 = nn.InstanceNorm2d(planes)\n            if not stride == 1:\n                self.norm4 = nn.InstanceNorm2d(planes)\n\n        elif norm_fn == 'none':\n            self.norm1 = nn.Sequential()\n            self.norm2 = nn.Sequential()\n            self.norm3 = nn.Sequential()\n            if not stride == 1:\n                self.norm4 = nn.Sequential()\n\n        if stride == 1:\n            self.downsample = None\n        \n        else:    \n            self.downsample = nn.Sequential(\n                nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), 
self.norm4)\n\n\n    def forward(self, x):\n        y = x\n        y = self.relu(self.norm1(self.conv1(y)))\n        y = self.relu(self.norm2(self.conv2(y)))\n        y = self.relu(self.norm3(self.conv3(y)))\n\n        if self.downsample is not None:\n            x = self.downsample(x)\n\n        return self.relu(x+y)\n\n\nclass ResidualBlock(nn.Module):\n    def __init__(self, in_planes, planes, norm_fn='group', stride=1):\n        super(ResidualBlock, self).__init__()\n  \n        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, padding=1, stride=stride)\n        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1)\n        self.relu = nn.ReLU(inplace=True)\n\n        num_groups = planes // 8\n\n        if norm_fn == 'group':\n            self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n            self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n            if not stride == 1:\n                self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n        \n        elif norm_fn == 'batch':\n            self.norm1 = nn.BatchNorm2d(planes)\n            self.norm2 = nn.BatchNorm2d(planes)\n            if not stride == 1:\n                self.norm3 = nn.BatchNorm2d(planes)\n        \n        elif norm_fn == 'instance':\n            self.norm1 = nn.InstanceNorm2d(planes)\n            self.norm2 = nn.InstanceNorm2d(planes)\n            if not stride == 1:\n                self.norm3 = nn.InstanceNorm2d(planes)\n\n        elif norm_fn == 'none':\n            self.norm1 = nn.Sequential()\n            self.norm2 = nn.Sequential()\n            if not stride == 1:\n                self.norm3 = nn.Sequential()\n\n        if stride == 1:\n            self.downsample = None\n        \n        else:    \n            self.downsample = nn.Sequential(\n                nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm3)\n\n\n    def forward(self, x):\n        y = x\n        y = 
self.relu(self.norm1(self.conv1(y)))\n        y = self.relu(self.norm2(self.conv2(y)))\n\n        if self.downsample is not None:\n            x = self.downsample(x)\n\n        return self.relu(x+y)\n\n\nclass SmallEncoder(nn.Module):\n    def __init__(self, output_dim=128, norm_fn='batch', dropout=0.0):\n        super(SmallEncoder, self).__init__()\n        self.norm_fn = norm_fn\n\n        if self.norm_fn == 'group':\n            self.norm1 = nn.GroupNorm(num_groups=8, num_channels=32)\n            \n        elif self.norm_fn == 'batch':\n            self.norm1 = nn.BatchNorm2d(32)\n\n        elif self.norm_fn == 'instance':\n            self.norm1 = nn.InstanceNorm2d(32)\n\n        elif self.norm_fn == 'none':\n            self.norm1 = nn.Sequential()\n\n        self.conv1 = nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3)\n        self.relu1 = nn.ReLU(inplace=True)\n\n        self.in_planes = 32\n        self.layer1 = self._make_layer(32,  stride=1)\n        self.layer2 = self._make_layer(64, stride=2)\n        self.layer3 = self._make_layer(96, stride=2)\n\n        self.dropout = None\n        if dropout > 0:\n            self.dropout = nn.Dropout2d(p=dropout)\n        \n        self.conv2 = nn.Conv2d(96, output_dim, kernel_size=1)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n            elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):\n                if m.weight is not None:\n                    nn.init.constant_(m.weight, 1)\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def _make_layer(self, dim, stride=1):\n        layer1 = BottleneckBlock(self.in_planes, dim, self.norm_fn, stride=stride)\n        layer2 = BottleneckBlock(dim, dim, self.norm_fn, stride=1)\n        layers = (layer1, layer2)\n    \n        self.in_planes = dim\n        return 
nn.Sequential(*layers)\n\n\n    def forward(self, x):\n\n        # if input is list, combine batch dimension\n        is_list = isinstance(x, tuple) or isinstance(x, list)\n        if is_list:\n            batch_dim = x[0].shape[0]\n            x = torch.cat(x, dim=0)\n\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.relu1(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n        x = self.conv2(x)\n\n        if self.training and self.dropout is not None:\n            x = self.dropout(x)\n\n        if is_list:\n            x = torch.split(x, [batch_dim, batch_dim], dim=0)\n\n        return x\n\nclass BasicEncoder(nn.Module):\n    def __init__(self, output_dim=128, norm_fn='batch', dropout=0.0):\n        super(BasicEncoder, self).__init__()\n        self.norm_fn = norm_fn\n\n        if self.norm_fn == 'group':\n            self.norm1 = nn.GroupNorm(num_groups=8, num_channels=64)\n            \n        elif self.norm_fn == 'batch':\n            self.norm1 = nn.BatchNorm2d(64)\n\n        elif self.norm_fn == 'instance':\n            self.norm1 = nn.InstanceNorm2d(64)\n\n        elif self.norm_fn == 'none':\n            self.norm1 = nn.Sequential()\n\n        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)\n        self.relu1 = nn.ReLU(inplace=True)\n\n        self.in_planes = 64\n        self.layer1 = self._make_layer(64,  stride=1)\n        self.layer2 = self._make_layer(72, stride=2)\n        self.layer3 = self._make_layer(128, stride=2)\n\n        # output convolution\n        self.conv2 = nn.Conv2d(128, output_dim, kernel_size=1)\n\n        self.dropout = None\n        if dropout > 0:\n            self.dropout = nn.Dropout2d(p=dropout)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n            elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):\n   
             if m.weight is not None:\n                    nn.init.constant_(m.weight, 1)\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def _make_layer(self, dim, stride=1):\n        layer1 = ResidualBlock(self.in_planes, dim, self.norm_fn, stride=stride)\n        layer2 = ResidualBlock(dim, dim, self.norm_fn, stride=1)\n        layers = (layer1, layer2)\n        \n        self.in_planes = dim\n        return nn.Sequential(*layers)\n\n\n    def forward(self, x):\n\n        # if input is list, combine batch dimension\n        is_list = isinstance(x, tuple) or isinstance(x, list)\n        if is_list:\n            batch_dim = x[0].shape[0]\n            x = torch.cat(x, dim=0)\n\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.relu1(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n\n        x = self.conv2(x)\n\n        if self.training and self.dropout is not None:\n            x = self.dropout(x)\n\n        if is_list:\n            x = torch.split(x, [batch_dim, batch_dim], dim=0)\n\n        return x\n\nclass LargeEncoder(nn.Module):\n    def __init__(self, output_dim=128, norm_fn='batch', dropout=0.0):\n        super(LargeEncoder, self).__init__()\n        self.norm_fn = norm_fn\n\n        if self.norm_fn == 'group':\n            self.norm1 = nn.GroupNorm(num_groups=8, num_channels=64)\n            \n        elif self.norm_fn == 'batch':\n            self.norm1 = nn.BatchNorm2d(64)\n\n        elif self.norm_fn == 'instance':\n            self.norm1 = nn.InstanceNorm2d(64)\n\n        elif self.norm_fn == 'none':\n            self.norm1 = nn.Sequential()\n\n        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)\n        self.relu1 = nn.ReLU(inplace=True)\n\n        self.in_planes = 64\n        self.layer1 = self._make_layer(64, stride=1)\n        self.layer2 = self._make_layer(112, stride=2)\n        self.layer3 = self._make_layer(160, 
stride=2)\n        self.layer3_2 = self._make_layer(160, stride=1)\n\n        # output convolution\n        self.conv2 = nn.Conv2d(self.in_planes, output_dim, kernel_size=1)\n\n        self.dropout = None\n        if dropout > 0:\n            self.dropout = nn.Dropout2d(p=dropout)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n            elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):\n                if m.weight is not None:\n                    nn.init.constant_(m.weight, 1)\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def _make_layer(self, dim, stride=1):\n        layer1 = ResidualBlock(self.in_planes, dim, self.norm_fn, stride=stride)\n        layer2 = ResidualBlock(dim, dim, self.norm_fn, stride=1)\n        layers = (layer1, layer2)\n        \n        self.in_planes = dim\n        return nn.Sequential(*layers)\n\n\n    def forward(self, x):\n\n        # if input is list, combine batch dimension\n        is_list = isinstance(x, tuple) or isinstance(x, list)\n        if is_list:\n            batch_dim = x[0].shape[0]\n            x = torch.cat(x, dim=0)\n\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.relu1(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n        x = self.layer3_2(x)\n\n        x = self.conv2(x)\n\n        if self.training and self.dropout is not None:\n            x = self.dropout(x)\n\n        if is_list:\n            x = torch.split(x, [batch_dim, batch_dim], dim=0)\n\n        return x\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/networks/blocks/ifrnet.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom vbench.third_party.amt.utils.flow_utils import warp\n\n\ndef resize(x, scale_factor):\n    return F.interpolate(x, scale_factor=scale_factor, mode=\"bilinear\", align_corners=False)\n\ndef convrelu(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, bias=True):\n    return nn.Sequential(\n        nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias=bias), \n        nn.PReLU(out_channels)\n    )\n\nclass ResBlock(nn.Module):\n    def __init__(self, in_channels, side_channels, bias=True):\n        super(ResBlock, self).__init__()\n        self.side_channels = side_channels\n        self.conv1 = nn.Sequential(\n            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1, bias=bias), \n            nn.PReLU(in_channels)\n        )\n        self.conv2 = nn.Sequential(\n            nn.Conv2d(side_channels, side_channels, kernel_size=3, stride=1, padding=1, bias=bias), \n            nn.PReLU(side_channels)\n        )\n        self.conv3 = nn.Sequential(\n            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1, bias=bias), \n            nn.PReLU(in_channels)\n        )\n        self.conv4 = nn.Sequential(\n            nn.Conv2d(side_channels, side_channels, kernel_size=3, stride=1, padding=1, bias=bias), \n            nn.PReLU(side_channels)\n        )\n        self.conv5 = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1, bias=bias)\n        self.prelu = nn.PReLU(in_channels)\n\n    def forward(self, x):\n        out = self.conv1(x)\n\n        res_feat = out[:, :-self.side_channels, ...]\n        side_feat = out[:, -self.side_channels:, :, :]\n        side_feat = self.conv2(side_feat)\n        out = self.conv3(torch.cat([res_feat, side_feat], 1))\n\n        res_feat = out[:, :-self.side_channels, ...]\n        side_feat = out[:, 
-self.side_channels:, :, :]\n        side_feat = self.conv4(side_feat)\n        out = self.conv5(torch.cat([res_feat, side_feat], 1))\n\n        out = self.prelu(x + out)\n        return out\n    \nclass Encoder(nn.Module):\n    def __init__(self, channels, large=False):\n        super(Encoder, self).__init__()\n        self.channels = channels        \n        prev_ch = 3\n        for idx, ch in enumerate(channels, 1):\n            k = 7 if large and idx == 1 else 3\n            p = 3 if k ==7 else 1\n            self.register_module(f'pyramid{idx}', \n            nn.Sequential(\n                convrelu(prev_ch, ch, k, 2, p),\n                convrelu(ch, ch, 3, 1, 1)\n            ))\n            prev_ch = ch\n                \n    def forward(self, in_x):\n        fs = []\n        for idx in range(len(self.channels)):\n            out_x = getattr(self, f'pyramid{idx+1}')(in_x)\n            fs.append(out_x)\n            in_x = out_x\n        return fs\n    \nclass InitDecoder(nn.Module):\n    def __init__(self, in_ch, out_ch, skip_ch) -> None:\n        super().__init__()\n        self.convblock = nn.Sequential(\n            convrelu(in_ch*2+1, in_ch*2), \n            ResBlock(in_ch*2, skip_ch), \n            nn.ConvTranspose2d(in_ch*2, out_ch+4, 4, 2, 1, bias=True)\n        )\n    def forward(self, f0, f1, embt):\n        h, w = f0.shape[2:]\n        embt = embt.repeat(1, 1, h, w)\n        out = self.convblock(torch.cat([f0, f1, embt], 1))\n        flow0, flow1 = torch.chunk(out[:, :4, ...], 2, 1)\n        ft_ = out[:, 4:, ...]\n        return flow0, flow1, ft_\n    \nclass IntermediateDecoder(nn.Module):\n    def __init__(self, in_ch, out_ch, skip_ch) -> None:\n        super().__init__()\n        self.convblock = nn.Sequential(\n            convrelu(in_ch*3+4, in_ch*3), \n            ResBlock(in_ch*3, skip_ch), \n            nn.ConvTranspose2d(in_ch*3, out_ch+4, 4, 2, 1, bias=True)\n        )\n    def forward(self, ft_, f0, f1, flow0_in, flow1_in):\n        
f0_warp = warp(f0, flow0_in)\n        f1_warp = warp(f1, flow1_in)\n        f_in = torch.cat([ft_, f0_warp, f1_warp, flow0_in, flow1_in], 1)\n        out = self.convblock(f_in)\n        flow0, flow1 = torch.chunk(out[:, :4, ...], 2, 1)\n        ft_ = out[:, 4:, ...]\n        flow0 = flow0 + 2.0 * resize(flow0_in, scale_factor=2.0)\n        flow1 = flow1 + 2.0 * resize(flow1_in, scale_factor=2.0)\n        return flow0, flow1, ft_\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/networks/blocks/multi_flow.py",
    "content": "import torch\nimport torch.nn as nn\nfrom vbench.third_party.amt.utils.flow_utils import warp\nfrom vbench.third_party.amt.networks.blocks.ifrnet import (\n    convrelu, resize,\n    ResBlock,\n)\n\n\ndef multi_flow_combine(comb_block, img0, img1, flow0, flow1, \n                       mask=None, img_res=None, mean=None):\n        '''\n            A parallel implementation of multiple flow field warping \n            comb_block: An nn.Seqential object.\n            img shape: [b, c, h, w]\n            flow shape: [b, 2*num_flows, h, w]\n            mask (opt):\n                If 'mask' is None, the function conduct a simple average.\n            img_res (opt):\n                If 'img_res' is None, the function adds zero instead. \n            mean (opt):\n                If 'mean' is None, the function adds zero instead.       \n        '''\n        b, c, h, w = flow0.shape\n        num_flows = c // 2\n        flow0   =   flow0.reshape(b, num_flows, 2, h, w).reshape(-1, 2, h, w)\n        flow1   =   flow1.reshape(b, num_flows, 2, h, w).reshape(-1, 2, h, w)\n        \n        mask    =    mask.reshape(b, num_flows, 1, h, w\n                            ).reshape(-1, 1, h, w) if mask is not None else None\n        img_res = img_res.reshape(b, num_flows, 3, h, w\n                            ).reshape(-1, 3, h, w)  if img_res is not None else 0\n        img0 = torch.stack([img0] * num_flows, 1).reshape(-1, 3, h, w)\n        img1 = torch.stack([img1] * num_flows, 1).reshape(-1, 3, h, w)\n        mean = torch.stack([mean] * num_flows, 1).reshape(-1, 1, 1, 1\n                                                    ) if mean is not None else 0\n        \n        img0_warp = warp(img0, flow0)\n        img1_warp = warp(img1, flow1)\n        img_warps = mask * img0_warp + (1 - mask) * img1_warp + mean + img_res\n        img_warps = img_warps.reshape(b, num_flows, 3, h, w)\n        imgt_pred = img_warps.mean(1) + comb_block(img_warps.view(b, -1, h, w))\n        
return imgt_pred\n\n\nclass MultiFlowDecoder(nn.Module):\n    def __init__(self, in_ch, skip_ch, num_flows=3):\n        super(MultiFlowDecoder, self).__init__()\n        self.num_flows = num_flows\n        self.convblock = nn.Sequential(\n            convrelu(in_ch*3+4, in_ch*3), \n            ResBlock(in_ch*3, skip_ch), \n            nn.ConvTranspose2d(in_ch*3, 8*num_flows, 4, 2, 1, bias=True)\n        )\n        \n    def forward(self, ft_, f0, f1, flow0, flow1):\n        n = self.num_flows\n        f0_warp = warp(f0, flow0)\n        f1_warp = warp(f1, flow1)\n        out = self.convblock(torch.cat([ft_, f0_warp, f1_warp, flow0, flow1], 1))\n        delta_flow0, delta_flow1, mask, img_res = torch.split(out, [2*n, 2*n, n, 3*n], 1)\n        mask = torch.sigmoid(mask)\n        \n        flow0 = delta_flow0 + 2.0 * resize(flow0, scale_factor=2.0\n                                           ).repeat(1, self.num_flows, 1, 1)\n        flow1 = delta_flow1 + 2.0 * resize(flow1, scale_factor=2.0\n                                           ).repeat(1, self.num_flows, 1, 1)\n        \n        return flow0, flow1, mask, img_res\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/amt/networks/blocks/raft.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\ndef resize(x, scale_factor):\n    return F.interpolate(x, scale_factor=scale_factor, mode=\"bilinear\", align_corners=False)\n\n\ndef bilinear_sampler(img, coords, mask=False):\n    \"\"\" Wrapper for grid_sample, uses pixel coordinates \"\"\"\n    H, W = img.shape[-2:]\n    xgrid, ygrid = coords.split([1,1], dim=-1)\n    xgrid = 2*xgrid/(W-1) - 1\n    ygrid = 2*ygrid/(H-1) - 1\n\n    grid = torch.cat([xgrid, ygrid], dim=-1)\n    img = F.grid_sample(img, grid, align_corners=True)\n\n    if mask:\n        mask = (xgrid > -1) & (ygrid > -1) & (xgrid < 1) & (ygrid < 1)\n        return img, mask.float()\n\n    return img\n\n\ndef coords_grid(batch, ht, wd, device):\n    coords = torch.meshgrid(torch.arange(ht, device=device), \n                            torch.arange(wd, device=device), \n                            indexing='ij')\n    coords = torch.stack(coords[::-1], dim=0).float()\n    return coords[None].repeat(batch, 1, 1, 1)\n\n\nclass SmallUpdateBlock(nn.Module):\n    def __init__(self, cdim, hidden_dim, flow_dim, corr_dim, fc_dim,\n                 corr_levels=4, radius=3, scale_factor=None):\n        super(SmallUpdateBlock, self).__init__()\n        cor_planes = corr_levels * (2 * radius + 1) **2\n        self.scale_factor = scale_factor\n\n        self.convc1 = nn.Conv2d(2 * cor_planes, corr_dim, 1, padding=0)\n        self.convf1 = nn.Conv2d(4, flow_dim*2, 7, padding=3)\n        self.convf2 = nn.Conv2d(flow_dim*2, flow_dim, 3, padding=1)\n        self.conv = nn.Conv2d(corr_dim+flow_dim, fc_dim, 3, padding=1)\n\n        self.gru = nn.Sequential(\n            nn.Conv2d(fc_dim+4+cdim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n        )\n\n        self.feat_head = nn.Sequential(\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n            
nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, cdim, 3, padding=1),\n        )\n\n        self.flow_head = nn.Sequential(\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, 4, 3, padding=1),\n        )\n\n        self.lrelu = nn.LeakyReLU(negative_slope=0.1, inplace=True)\n            \n    def forward(self, net, flow, corr):\n        net = resize(net, 1 / self.scale_factor\n                      ) if self.scale_factor is not None else net\n        cor = self.lrelu(self.convc1(corr))\n        flo = self.lrelu(self.convf1(flow))\n        flo = self.lrelu(self.convf2(flo))\n        cor_flo = torch.cat([cor, flo], dim=1)\n        inp = self.lrelu(self.conv(cor_flo))\n        inp = torch.cat([inp, flow, net], dim=1)\n\n        out = self.gru(inp)\n        delta_net = self.feat_head(out)\n        delta_flow = self.flow_head(out)\n        \n        if self.scale_factor is not None:\n            delta_net = resize(delta_net, scale_factor=self.scale_factor)\n            delta_flow = self.scale_factor * resize(delta_flow, scale_factor=self.scale_factor)\n        \n        return delta_net, delta_flow\n\n\nclass BasicUpdateBlock(nn.Module):\n    def __init__(self, cdim, hidden_dim, flow_dim, corr_dim, corr_dim2, \n                 fc_dim, corr_levels=4, radius=3, scale_factor=None, out_num=1):\n        super(BasicUpdateBlock, self).__init__()\n        cor_planes = corr_levels * (2 * radius + 1) **2\n\n        self.scale_factor = scale_factor\n        self.convc1 = nn.Conv2d(2 * cor_planes, corr_dim, 1, padding=0)\n        self.convc2 = nn.Conv2d(corr_dim, corr_dim2, 3, padding=1)\n        self.convf1 = nn.Conv2d(4, flow_dim*2, 7, padding=3)\n        self.convf2 = nn.Conv2d(flow_dim*2, flow_dim, 3, padding=1)\n        self.conv = nn.Conv2d(flow_dim+corr_dim2, fc_dim, 3, padding=1)\n\n        self.gru = nn.Sequential(\n            
nn.Conv2d(fc_dim+4+cdim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n        )\n\n        self.feat_head = nn.Sequential(\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, cdim, 3, padding=1),\n        )\n\n        self.flow_head = nn.Sequential(\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, 4*out_num, 3, padding=1),\n        )\n\n        self.lrelu = nn.LeakyReLU(negative_slope=0.1, inplace=True)\n            \n    def forward(self, net, flow, corr):\n        net = resize(net, 1 / self.scale_factor\n                      ) if self.scale_factor is not None else net\n        cor = self.lrelu(self.convc1(corr))\n        cor = self.lrelu(self.convc2(cor))\n        flo = self.lrelu(self.convf1(flow))\n        flo = self.lrelu(self.convf2(flo))\n        cor_flo = torch.cat([cor, flo], dim=1)\n        inp = self.lrelu(self.conv(cor_flo))\n        inp = torch.cat([inp, flow, net], dim=1)\n\n        out = self.gru(inp)\n        delta_net = self.feat_head(out)\n        delta_flow = self.flow_head(out)\n        \n        if self.scale_factor is not None:\n            delta_net = resize(delta_net, scale_factor=self.scale_factor)\n            delta_flow = self.scale_factor * resize(delta_flow, scale_factor=self.scale_factor)\n        return delta_net, delta_flow\n\n\nclass BidirCorrBlock:\n    def __init__(self, fmap1, fmap2, num_levels=4, radius=4):\n        self.num_levels = num_levels\n        self.radius = radius\n        self.corr_pyramid = []\n        self.corr_pyramid_T = []\n\n        corr = BidirCorrBlock.corr(fmap1, fmap2)\n        batch, h1, w1, dim, h2, w2 = corr.shape\n        corr_T = corr.clone().permute(0, 4, 5, 3, 1, 2)\n\n        corr = 
corr.reshape(batch*h1*w1, dim, h2, w2)\n        corr_T = corr_T.reshape(batch*h2*w2, dim, h1, w1)\n        \n        self.corr_pyramid.append(corr)\n        self.corr_pyramid_T.append(corr_T)\n\n        for _ in range(self.num_levels-1):\n            corr = F.avg_pool2d(corr, 2, stride=2)\n            corr_T = F.avg_pool2d(corr_T, 2, stride=2)\n            self.corr_pyramid.append(corr)\n            self.corr_pyramid_T.append(corr_T)\n\n    def __call__(self, coords0, coords1):\n        r = self.radius\n        coords0 = coords0.permute(0, 2, 3, 1)\n        coords1 = coords1.permute(0, 2, 3, 1)\n        assert coords0.shape == coords1.shape, f\"coords0 shape: [{coords0.shape}] is not equal to [{coords1.shape}]\"\n        batch, h1, w1, _ = coords0.shape\n\n        out_pyramid = []\n        out_pyramid_T = []\n        for i in range(self.num_levels):\n            corr = self.corr_pyramid[i]\n            corr_T = self.corr_pyramid_T[i]\n\n            dx = torch.linspace(-r, r, 2*r+1, device=coords0.device)\n            dy = torch.linspace(-r, r, 2*r+1, device=coords0.device)\n            delta = torch.stack(torch.meshgrid(dy, dx, indexing='ij'), axis=-1)\n            delta_lvl = delta.view(1, 2*r+1, 2*r+1, 2)\n\n            centroid_lvl_0 = coords0.reshape(batch*h1*w1, 1, 1, 2) / 2**i\n            centroid_lvl_1 = coords1.reshape(batch*h1*w1, 1, 1, 2) / 2**i\n            coords_lvl_0 = centroid_lvl_0 + delta_lvl\n            coords_lvl_1 = centroid_lvl_1 + delta_lvl\n\n            corr = bilinear_sampler(corr, coords_lvl_0)\n            corr_T = bilinear_sampler(corr_T, coords_lvl_1)\n            corr = corr.view(batch, h1, w1, -1)\n            corr_T = corr_T.view(batch, h1, w1, -1)\n            out_pyramid.append(corr)\n            out_pyramid_T.append(corr_T)\n\n        out = torch.cat(out_pyramid, dim=-1)\n        out_T = torch.cat(out_pyramid_T, dim=-1)\n        return out.permute(0, 3, 1, 2).contiguous().float(), out_T.permute(0, 3, 1, 
2).contiguous().float()\n\n    @staticmethod\n    def corr(fmap1, fmap2):\n        batch, dim, ht, wd = fmap1.shape\n        fmap1 = fmap1.view(batch, dim, ht*wd)\n        fmap2 = fmap2.view(batch, dim, ht*wd) \n        \n        corr = torch.matmul(fmap1.transpose(1,2), fmap2)\n        corr = corr.view(batch, ht, wd, 1, ht, wd)\n        return corr  / torch.sqrt(torch.tensor(dim).float())"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_model.py",
    "content": "import os\nimport sys\n\nfrom .grit_src.image_dense_captions import image_caption_api, init_demo, dense_pred_to_caption, dense_pred_to_caption_only_name,dense_pred_to_caption_tuple\nfrom detectron2.data.detection_utils import read_image\n\nclass DenseCaptioning():\n    def __init__(self, device):\n        self.device = device\n        self.demo =  None\n\n\n    def initialize_model(self, model_weight):\n        self.demo = init_demo(self.device, model_weight=model_weight)\n        \n    def initialize_model_det(self, model_weight):\n        self.demo = init_demo(self.device, model_weight = model_weight, task=\"ObjectDet\")\n    \n    def image_dense_caption(self, image_src):\n        dense_caption = image_caption_api(image_src, self.device)\n        print('\\033[1;35m' + '*' * 100 + '\\033[0m')\n        print(\"Step2, Dense Caption:\\n\")\n        print(dense_caption)\n        print('\\033[1;35m' + '*' * 100 + '\\033[0m')\n        return dense_caption\n    \n    def run_caption_api(self,image_src):\n        img = read_image(image_src, format=\"BGR\")\n        print(img.shape)\n        predictions, visualized_output = self.demo.run_on_image(img)\n        new_caption = dense_pred_to_caption_only_name(predictions)\n        return new_caption\n\n    def run_caption_tensor(self,img):\n        predictions, visualized_output = self.demo.run_on_image(img)\n        new_caption = dense_pred_to_caption_tuple(predictions)\n        return new_caption, visualized_output\n\n    def run_det_tensor(self,img):\n        predictions, visualized_output = self.demo.run_on_image(img)\n        return predictions, visualized_output\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/__init__.py",
    "content": "from .modeling.meta_arch.centernet_detector import CenterNetDetector\nfrom .modeling.dense_heads.centernet import CenterNet\nfrom .modeling.roi_heads.custom_roi_heads import CustomROIHeads, CustomCascadeROIHeads\n\nfrom .modeling.backbone.fpn_p5 import build_p67_resnet_fpn_backbone\nfrom .modeling.backbone.dla import build_dla_backbone\nfrom .modeling.backbone.dlafpn import build_dla_fpn3_backbone\nfrom .modeling.backbone.bifpn import build_resnet_bifpn_backbone\nfrom .modeling.backbone.bifpn_fcos import build_fcos_resnet_bifpn_backbone\nfrom .modeling.backbone.res2net import build_p67_res2net_fpn_backbone\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/config.py",
    "content": "from detectron2.config import CfgNode as CN\n\ndef add_centernet_config(cfg):\n    _C = cfg\n\n    _C.MODEL.CENTERNET = CN()\n    _C.MODEL.CENTERNET.NUM_CLASSES = 80\n    _C.MODEL.CENTERNET.IN_FEATURES = [\"p3\", \"p4\", \"p5\", \"p6\", \"p7\"]\n    _C.MODEL.CENTERNET.FPN_STRIDES = [8, 16, 32, 64, 128]\n    _C.MODEL.CENTERNET.PRIOR_PROB = 0.01\n    _C.MODEL.CENTERNET.INFERENCE_TH = 0.05\n    _C.MODEL.CENTERNET.CENTER_NMS = False\n    _C.MODEL.CENTERNET.NMS_TH_TRAIN = 0.6\n    _C.MODEL.CENTERNET.NMS_TH_TEST = 0.6\n    _C.MODEL.CENTERNET.PRE_NMS_TOPK_TRAIN = 1000\n    _C.MODEL.CENTERNET.POST_NMS_TOPK_TRAIN = 100\n    _C.MODEL.CENTERNET.PRE_NMS_TOPK_TEST = 1000\n    _C.MODEL.CENTERNET.POST_NMS_TOPK_TEST = 100\n    _C.MODEL.CENTERNET.NORM = \"GN\"\n    _C.MODEL.CENTERNET.USE_DEFORMABLE = False\n    _C.MODEL.CENTERNET.NUM_CLS_CONVS = 4\n    _C.MODEL.CENTERNET.NUM_BOX_CONVS = 4\n    _C.MODEL.CENTERNET.NUM_SHARE_CONVS = 0\n    _C.MODEL.CENTERNET.LOC_LOSS_TYPE = 'giou'\n    _C.MODEL.CENTERNET.SIGMOID_CLAMP = 1e-4\n    _C.MODEL.CENTERNET.HM_MIN_OVERLAP = 0.8\n    _C.MODEL.CENTERNET.MIN_RADIUS = 4\n    _C.MODEL.CENTERNET.SOI = [[0, 80], [64, 160], [128, 320], [256, 640], [512, 10000000]]\n    _C.MODEL.CENTERNET.POS_WEIGHT = 1.\n    _C.MODEL.CENTERNET.NEG_WEIGHT = 1.\n    _C.MODEL.CENTERNET.REG_WEIGHT = 2.\n    _C.MODEL.CENTERNET.HM_FOCAL_BETA = 4\n    _C.MODEL.CENTERNET.HM_FOCAL_ALPHA = 0.25\n    _C.MODEL.CENTERNET.LOSS_GAMMA = 2.0\n    _C.MODEL.CENTERNET.WITH_AGN_HM = False\n    _C.MODEL.CENTERNET.ONLY_PROPOSAL = False\n    _C.MODEL.CENTERNET.AS_PROPOSAL = False\n    _C.MODEL.CENTERNET.IGNORE_HIGH_FP = -1.\n    _C.MODEL.CENTERNET.MORE_POS = False\n    _C.MODEL.CENTERNET.MORE_POS_THRESH = 0.2\n    _C.MODEL.CENTERNET.MORE_POS_TOPK = 9\n    _C.MODEL.CENTERNET.NOT_NORM_REG = True\n    _C.MODEL.CENTERNET.NOT_NMS = False\n    _C.MODEL.CENTERNET.NO_REDUCE = False\n\n    _C.MODEL.ROI_BOX_HEAD.USE_SIGMOID_CE = False\n    _C.MODEL.ROI_BOX_HEAD.PRIOR_PROB = 0.01\n    
_C.MODEL.ROI_BOX_HEAD.USE_EQL_LOSS = False\n    _C.MODEL.ROI_BOX_HEAD.CAT_FREQ_PATH = \\\n        'datasets/lvis/lvis_v1_train_cat_info.json'\n    _C.MODEL.ROI_BOX_HEAD.EQL_FREQ_CAT = 200\n    _C.MODEL.ROI_BOX_HEAD.USE_FED_LOSS = False\n    _C.MODEL.ROI_BOX_HEAD.FED_LOSS_NUM_CAT = 50\n    _C.MODEL.ROI_BOX_HEAD.FED_LOSS_FREQ_WEIGHT = 0.5\n    _C.MODEL.ROI_BOX_HEAD.MULT_PROPOSAL_SCORE = False\n\n    _C.MODEL.BIFPN = CN()\n    _C.MODEL.BIFPN.NUM_LEVELS = 5\n    _C.MODEL.BIFPN.NUM_BIFPN = 6\n    _C.MODEL.BIFPN.NORM = 'GN'\n    _C.MODEL.BIFPN.OUT_CHANNELS = 160\n    _C.MODEL.BIFPN.SEPARABLE_CONV = False\n\n    _C.MODEL.DLA = CN()\n    _C.MODEL.DLA.OUT_FEATURES = ['dla2']\n    _C.MODEL.DLA.USE_DLA_UP = True\n    _C.MODEL.DLA.NUM_LAYERS = 34\n    _C.MODEL.DLA.MS_OUTPUT = False\n    _C.MODEL.DLA.NORM = 'BN'\n    _C.MODEL.DLA.DLAUP_IN_FEATURES = ['dla3', 'dla4', 'dla5']\n    _C.MODEL.DLA.DLAUP_NODE = 'conv'\n\n    _C.SOLVER.RESET_ITER = False\n    _C.SOLVER.TRAIN_ITER = -1\n\n    _C.INPUT.CUSTOM_AUG = ''\n    _C.INPUT.TRAIN_SIZE = 640\n    _C.INPUT.TEST_SIZE = 640\n    _C.INPUT.SCALE_RANGE = (0.1, 2.)\n    # 'default' for fixed short/ long edge, 'square' for max size=INPUT.SIZE\n    _C.INPUT.TEST_INPUT_TYPE = 'default' \n    \n    _C.DEBUG = False\n    _C.SAVE_DEBUG = False\n    _C.SAVE_PTH = False\n    _C.VIS_THRESH = 0.3\n    _C.DEBUG_SHOW_NAME = False\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/backbone/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/backbone/bifpn.py",
    "content": "# Modified from https://github.com/rwightman/efficientdet-pytorch/blob/master/effdet/efficientdet.py\n# The original file is under Apache-2.0 License\nimport math\nfrom os.path import join\nimport numpy as np\nfrom collections import OrderedDict\nfrom typing import List\n\nimport torch\nfrom torch import nn\nimport torch.utils.model_zoo as model_zoo\nimport torch.nn.functional as F\nimport fvcore.nn.weight_init as weight_init\n\nfrom detectron2.layers import ShapeSpec, Conv2d\nfrom detectron2.modeling.backbone.resnet import build_resnet_backbone\nfrom detectron2.modeling.backbone.build import BACKBONE_REGISTRY\nfrom detectron2.layers.batch_norm import get_norm\nfrom detectron2.modeling.backbone import Backbone\nfrom .dlafpn import dla34\n\ndef get_fpn_config(base_reduction=8):\n    \"\"\"BiFPN config with sum.\"\"\"\n    p = {\n        'nodes': [\n            {'reduction': base_reduction << 3, 'inputs_offsets': [3, 4]},\n            {'reduction': base_reduction << 2, 'inputs_offsets': [2, 5]},\n            {'reduction': base_reduction << 1, 'inputs_offsets': [1, 6]},\n            {'reduction': base_reduction, 'inputs_offsets': [0, 7]},\n            {'reduction': base_reduction << 1, 'inputs_offsets': [1, 7, 8]},\n            {'reduction': base_reduction << 2, 'inputs_offsets': [2, 6, 9]},\n            {'reduction': base_reduction << 3, 'inputs_offsets': [3, 5, 10]},\n            {'reduction': base_reduction << 4, 'inputs_offsets': [4, 11]},\n        ],\n        'weight_method': 'fastattn',\n    }\n    return p\n\n\ndef swish(x, inplace: bool = False):\n    \"\"\"Swish - Described in: https://arxiv.org/abs/1710.05941\n    \"\"\"\n    return x.mul_(x.sigmoid()) if inplace else x.mul(x.sigmoid())\n\n\nclass Swish(nn.Module):\n    def __init__(self, inplace: bool = False):\n        super(Swish, self).__init__()\n        self.inplace = inplace\n\n    def forward(self, x):\n        return swish(x, self.inplace)\n\n\nclass 
SequentialAppend(nn.Sequential):\n    def __init__(self, *args):\n        super(SequentialAppend, self).__init__(*args)\n\n    def forward(self, x):\n        for module in self:\n            x.append(module(x))\n        return x\n\n\nclass SequentialAppendLast(nn.Sequential):\n    def __init__(self, *args):\n        super(SequentialAppendLast, self).__init__(*args)\n\n    # def forward(self, x: List[torch.Tensor]):\n    def forward(self, x):\n        for module in self:\n            x.append(module(x[-1]))\n        return x\n\n\nclass ConvBnAct2d(nn.Module):\n    def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1, padding='', bias=False,\n                 norm='', act_layer=Swish):\n        super(ConvBnAct2d, self).__init__()\n        # self.conv = create_conv2d(\n        #     in_channels, out_channels, kernel_size, stride=stride, dilation=dilation, padding=padding, bias=bias)\n        self.conv = Conv2d(\n            in_channels, out_channels, kernel_size=kernel_size, stride=stride, \n            padding=kernel_size // 2, bias=(norm == ''))\n        self.bn = get_norm(norm, out_channels)\n        self.act = None if act_layer is None else act_layer(inplace=True)\n\n    def forward(self, x):\n        x = self.conv(x)\n        if self.bn is not None:\n            x = self.bn(x)\n        if self.act is not None:\n            x = self.act(x)\n        return x\n\n\nclass SeparableConv2d(nn.Module):\n    \"\"\" Separable Conv\n    \"\"\"\n    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, dilation=1, padding='', bias=False,\n                 channel_multiplier=1.0, pw_kernel_size=1, act_layer=Swish,\n                 norm=''):\n        super(SeparableConv2d, self).__init__()\n\n        # self.conv_dw = create_conv2d(\n        #     in_channels, int(in_channels * channel_multiplier), kernel_size,\n        #     stride=stride, dilation=dilation, padding=padding, depthwise=True)\n\n        self.conv_dw = Conv2d(\n        
    in_channels, int(in_channels * channel_multiplier), \n            kernel_size=kernel_size, stride=stride, padding=kernel_size // 2, bias=bias,\n            groups=in_channels)  # depthwise: one group per input channel\n        # print('conv_dw', kernel_size, stride) \n        # self.conv_pw = create_conv2d(\n        #     int(in_channels * channel_multiplier), out_channels, pw_kernel_size, padding=padding, bias=bias)\n        \n        self.conv_pw = Conv2d(\n            int(in_channels * channel_multiplier), out_channels, \n            kernel_size=pw_kernel_size, padding=pw_kernel_size // 2, bias=(norm==''))\n        # print('conv_pw', pw_kernel_size) \n\n        self.bn = get_norm(norm, out_channels)\n        self.act = None if act_layer is None else act_layer(inplace=True)\n\n    def forward(self, x):\n        x = self.conv_dw(x)\n        x = self.conv_pw(x)\n        if self.bn is not None:\n            x = self.bn(x)\n        if self.act is not None:\n            x = self.act(x)\n        return x\n\n\nclass ResampleFeatureMap(nn.Sequential):\n    def __init__(self, in_channels, out_channels, reduction_ratio=1., pad_type='', pooling_type='max',\n                 norm='', apply_bn=False, conv_after_downsample=False,\n                 redundant_bias=False):\n        super(ResampleFeatureMap, self).__init__()\n        pooling_type = pooling_type or 'max'\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.reduction_ratio = reduction_ratio\n        self.conv_after_downsample = conv_after_downsample\n\n        conv = None\n        if in_channels != out_channels:\n            conv = ConvBnAct2d(\n                in_channels, out_channels, kernel_size=1, padding=pad_type,\n                norm=norm if apply_bn else '', \n                bias=not apply_bn or redundant_bias, act_layer=None)\n\n        if reduction_ratio > 1:\n            stride_size = int(reduction_ratio)\n            if conv is not None and not self.conv_after_downsample:\n                
self.add_module('conv', conv)\n            self.add_module(\n                'downsample',\n                # create_pool2d(\n                #     pooling_type, kernel_size=stride_size + 1, stride=stride_size, padding=pad_type)\n                # nn.MaxPool2d(kernel_size=stride_size + 1, stride=stride_size, padding=pad_type)\n                nn.MaxPool2d(kernel_size=stride_size, stride=stride_size)\n                )\n            if conv is not None and self.conv_after_downsample:\n                self.add_module('conv', conv)\n        else:\n            if conv is not None:\n                self.add_module('conv', conv)\n            if reduction_ratio < 1:\n                scale = int(1 // reduction_ratio)\n                self.add_module('upsample', nn.UpsamplingNearest2d(scale_factor=scale))\n\n\nclass FpnCombine(nn.Module):\n    def __init__(self, feature_info, fpn_config, fpn_channels, inputs_offsets, target_reduction, pad_type='',\n                 pooling_type='max', norm='', apply_bn_for_resampling=False,\n                 conv_after_downsample=False, redundant_bias=False, weight_method='attn'):\n        super(FpnCombine, self).__init__()\n        self.inputs_offsets = inputs_offsets\n        self.weight_method = weight_method\n\n        self.resample = nn.ModuleDict()\n        for idx, offset in enumerate(inputs_offsets):\n            in_channels = fpn_channels\n            if offset < len(feature_info):\n                in_channels = feature_info[offset]['num_chs']\n                input_reduction = feature_info[offset]['reduction']\n            else:\n                node_idx = offset - len(feature_info)\n                # print('node_idx, len', node_idx, len(fpn_config['nodes']))\n                input_reduction = fpn_config['nodes'][node_idx]['reduction']\n            reduction_ratio = target_reduction / input_reduction\n            self.resample[str(offset)] = ResampleFeatureMap(\n                in_channels, fpn_channels, 
reduction_ratio=reduction_ratio, pad_type=pad_type,\n                pooling_type=pooling_type, norm=norm,\n                apply_bn=apply_bn_for_resampling, conv_after_downsample=conv_after_downsample,\n                redundant_bias=redundant_bias)\n\n        if weight_method == 'attn' or weight_method == 'fastattn':\n            # WSM\n            self.edge_weights = nn.Parameter(torch.ones(len(inputs_offsets)), requires_grad=True)\n        else:\n            self.edge_weights = None\n\n    def forward(self, x):\n        dtype = x[0].dtype\n        nodes = []\n        for offset in self.inputs_offsets:\n            input_node = x[offset]\n            input_node = self.resample[str(offset)](input_node)\n            nodes.append(input_node)\n\n        if self.weight_method == 'attn':\n            normalized_weights = torch.softmax(self.edge_weights.type(dtype), dim=0)\n            x = torch.stack(nodes, dim=-1) * normalized_weights\n        elif self.weight_method == 'fastattn':\n            edge_weights = nn.functional.relu(self.edge_weights.type(dtype))\n            weights_sum = torch.sum(edge_weights)\n            x = torch.stack(\n                [(nodes[i] * edge_weights[i]) / (weights_sum + 0.0001) for i in range(len(nodes))], dim=-1)\n        elif self.weight_method == 'sum':\n            x = torch.stack(nodes, dim=-1)\n        else:\n            raise ValueError('unknown weight_method {}'.format(self.weight_method))\n        x = torch.sum(x, dim=-1)\n        return x\n\n\nclass BiFpnLayer(nn.Module):\n    def __init__(self, feature_info, fpn_config, fpn_channels, num_levels=5, pad_type='',\n                 pooling_type='max', norm='', act_layer=Swish,\n                 apply_bn_for_resampling=False, conv_after_downsample=True, conv_bn_relu_pattern=False,\n                 separable_conv=True, redundant_bias=False):\n        super(BiFpnLayer, self).__init__()\n        self.fpn_config = fpn_config\n        self.num_levels = num_levels\n        
self.conv_bn_relu_pattern = False\n\n        self.feature_info = []\n        self.fnode = SequentialAppend()\n        for i, fnode_cfg in enumerate(fpn_config['nodes']):\n            # logging.debug('fnode {} : {}'.format(i, fnode_cfg))\n            # print('fnode {} : {}'.format(i, fnode_cfg))\n            fnode_layers = OrderedDict()\n\n            # combine features\n            reduction = fnode_cfg['reduction']\n            fnode_layers['combine'] = FpnCombine(\n                feature_info, fpn_config, fpn_channels, fnode_cfg['inputs_offsets'], target_reduction=reduction,\n                pad_type=pad_type, pooling_type=pooling_type, norm=norm,\n                apply_bn_for_resampling=apply_bn_for_resampling, conv_after_downsample=conv_after_downsample,\n                redundant_bias=redundant_bias, weight_method=fpn_config['weight_method'])\n            self.feature_info.append(dict(num_chs=fpn_channels, reduction=reduction))\n\n            # after combine ops\n            after_combine = OrderedDict()\n            if not conv_bn_relu_pattern:\n                after_combine['act'] = act_layer(inplace=True)\n                conv_bias = redundant_bias\n                conv_act = None\n            else:\n                conv_bias = False\n                conv_act = act_layer\n            conv_kwargs = dict(\n                in_channels=fpn_channels, out_channels=fpn_channels, kernel_size=3, padding=pad_type,\n                bias=conv_bias, norm=norm, act_layer=conv_act)\n            after_combine['conv'] = SeparableConv2d(**conv_kwargs) if separable_conv else ConvBnAct2d(**conv_kwargs)\n            fnode_layers['after_combine'] = nn.Sequential(after_combine)\n\n            self.fnode.add_module(str(i), nn.Sequential(fnode_layers))\n\n        self.feature_info = self.feature_info[-num_levels::]\n\n    def forward(self, x):\n        x = self.fnode(x)\n        return x[-self.num_levels::]\n\n\nclass BiFPN(Backbone):\n    def __init__(\n        self, cfg, 
bottom_up, in_features, out_channels, norm='', \n        num_levels=5, num_bifpn=4, separable_conv=False,\n    ):\n        super(BiFPN, self).__init__()\n        assert isinstance(bottom_up, Backbone)\n        \n        # Feature map strides and channels from the bottom up network (e.g. ResNet)\n        input_shapes = bottom_up.output_shape()\n        in_strides = [input_shapes[f].stride for f in in_features]\n        in_channels = [input_shapes[f].channels for f in in_features]\n\n        self.num_levels = num_levels\n        self.num_bifpn = num_bifpn\n        self.bottom_up = bottom_up\n        self.in_features = in_features\n        self._size_divisibility = 128\n        levels = [int(math.log2(s)) for s in in_strides]\n        self._out_feature_strides = {\n            \"p{}\".format(int(math.log2(s))): s for s in in_strides}\n        if len(in_features) < num_levels:\n            for l in range(num_levels - len(in_features)):\n                s = l + levels[-1]\n                self._out_feature_strides[\"p{}\".format(s + 1)] = 2 ** (s + 1)\n        self._out_features = list(sorted(self._out_feature_strides.keys()))\n        self._out_feature_channels = {k: out_channels for k in self._out_features}\n        \n        # print('self._out_feature_strides', self._out_feature_strides)\n        # print('self._out_feature_channels', self._out_feature_channels)\n        \n        feature_info = [\n            {'num_chs': in_channels[level], 'reduction': in_strides[level]} \\\n            for level in range(len(self.in_features))\n        ]\n        # self.config = config\n        fpn_config = get_fpn_config()\n        self.resample = SequentialAppendLast()\n        for level in range(num_levels):\n            if level < len(feature_info):\n                in_chs = in_channels[level] # feature_info[level]['num_chs']\n                reduction = in_strides[level] # feature_info[level]['reduction']\n            else:\n                # Adds a coarser level by 
downsampling the last feature map\n                reduction_ratio = 2\n                self.resample.add_module(str(level), ResampleFeatureMap(\n                    in_channels=in_chs,\n                    out_channels=out_channels,\n                    pad_type='same',\n                    pooling_type=None,\n                    norm=norm,\n                    reduction_ratio=reduction_ratio,\n                    apply_bn=True,\n                    conv_after_downsample=False,\n                    redundant_bias=False,\n                ))\n                in_chs = out_channels\n                reduction = int(reduction * reduction_ratio)\n                feature_info.append(dict(num_chs=in_chs, reduction=reduction))\n\n        self.cell = nn.Sequential()\n        for rep in range(self.num_bifpn):\n            # logging.debug('building cell {}'.format(rep))\n            # print('building cell {}'.format(rep))\n            fpn_layer = BiFpnLayer(\n                feature_info=feature_info,\n                fpn_config=fpn_config,\n                fpn_channels=out_channels,\n                num_levels=self.num_levels,\n                pad_type='same',\n                pooling_type=None,\n                norm=norm,\n                act_layer=Swish,\n                separable_conv=separable_conv,\n                apply_bn_for_resampling=True,\n                conv_after_downsample=False,\n                conv_bn_relu_pattern=False,\n                redundant_bias=False,\n            )\n            self.cell.add_module(str(rep), fpn_layer)\n            feature_info = fpn_layer.feature_info\n        # import pdb; pdb.set_trace()\n\n    @property\n    def size_divisibility(self):\n        return self._size_divisibility\n\n    def forward(self, x):\n        # print('input shapes', x.shape)\n        bottom_up_features = self.bottom_up(x)\n        x = [bottom_up_features[f] for f in self.in_features]\n        assert len(self.resample) == self.num_levels - len(x)\n        x = 
self.resample(x)\n        shapes = [xx.shape for xx in x]\n        # print('resample shapes', shapes)\n        x = self.cell(x)\n        out = {f: xx for f, xx in zip(self._out_features, x)}\n        # import pdb; pdb.set_trace()\n        return out\n\n\n@BACKBONE_REGISTRY.register()\ndef build_resnet_bifpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    bottom_up = build_resnet_backbone(cfg, input_shape)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    backbone = BiFPN(\n        cfg=cfg,\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=cfg.MODEL.BIFPN.OUT_CHANNELS,\n        norm=cfg.MODEL.BIFPN.NORM,\n        num_levels=cfg.MODEL.BIFPN.NUM_LEVELS,\n        num_bifpn=cfg.MODEL.BIFPN.NUM_BIFPN,\n        separable_conv=cfg.MODEL.BIFPN.SEPARABLE_CONV,\n    )\n    return backbone\n\n@BACKBONE_REGISTRY.register()\ndef build_p37_dla_bifpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    bottom_up = dla34(cfg)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    assert cfg.MODEL.BIFPN.NUM_LEVELS == 5\n\n    backbone = BiFPN(\n        cfg=cfg,\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=cfg.MODEL.BIFPN.OUT_CHANNELS,\n        norm=cfg.MODEL.BIFPN.NORM,\n        num_levels=cfg.MODEL.BIFPN.NUM_LEVELS,\n        num_bifpn=cfg.MODEL.BIFPN.NUM_BIFPN,\n        separable_conv=cfg.MODEL.BIFPN.SEPARABLE_CONV,\n    )\n    return backbone\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/backbone/bifpn_fcos.py",
    "content": "# This file is modified from https://github.com/aim-uofa/AdelaiDet/blob/master/adet/modeling/backbone/bifpn.py\n# The original file is under 2-clause BSD License for academic use, and *non-commercial use*.\nimport torch\nimport torch.nn.functional as F\nfrom torch import nn\n\nfrom detectron2.layers import Conv2d, ShapeSpec, get_norm\n\nfrom detectron2.modeling.backbone import Backbone, build_resnet_backbone\nfrom detectron2.modeling import BACKBONE_REGISTRY\nfrom .dlafpn import dla34\n\n__all__ = []\n\n\ndef swish(x):\n    return x * x.sigmoid()\n\n\ndef split_name(name):\n    for i, c in enumerate(name):\n        if not c.isalpha():\n            return name[:i], int(name[i:])\n    raise ValueError()\n\n\nclass FeatureMapResampler(nn.Module):\n    def __init__(self, in_channels, out_channels, stride, norm=\"\"):\n        super(FeatureMapResampler, self).__init__()\n        if in_channels != out_channels:\n            self.reduction = Conv2d(\n                in_channels, out_channels, kernel_size=1,\n                bias=(norm == \"\"),\n                norm=get_norm(norm, out_channels),\n                activation=None\n            )\n        else:\n            self.reduction = None\n\n        assert stride <= 2\n        self.stride = stride\n\n    def forward(self, x):\n        if self.reduction is not None:\n            x = self.reduction(x)\n\n        if self.stride == 2:\n            x = F.max_pool2d(\n                x, kernel_size=self.stride + 1,\n                stride=self.stride, padding=1\n            )\n        elif self.stride == 1:\n            pass\n        else:\n            raise NotImplementedError()\n        return x\n\n\nclass BackboneWithTopLevels(Backbone):\n    def __init__(self, backbone, out_channels, num_top_levels, norm=\"\"):\n        super(BackboneWithTopLevels, self).__init__()\n        self.backbone = backbone\n        backbone_output_shape = backbone.output_shape()\n\n        self._out_feature_channels = {name: 
shape.channels for name, shape in backbone_output_shape.items()}\n        self._out_feature_strides = {name: shape.stride for name, shape in backbone_output_shape.items()}\n        self._out_features = list(self._out_feature_strides.keys())\n\n        last_feature_name = max(self._out_feature_strides.keys(), key=lambda x: split_name(x)[1])\n        self.last_feature_name = last_feature_name\n        self.num_top_levels = num_top_levels\n\n        last_channels = self._out_feature_channels[last_feature_name]\n        last_stride = self._out_feature_strides[last_feature_name]\n\n        prefix, suffix = split_name(last_feature_name)\n        prev_channels = last_channels\n        for i in range(num_top_levels):\n            name = prefix + str(suffix + i + 1)\n            self.add_module(name, FeatureMapResampler(\n                prev_channels, out_channels, 2, norm\n            ))\n            prev_channels = out_channels\n\n            self._out_feature_channels[name] = out_channels\n            self._out_feature_strides[name] = last_stride * 2 ** (i + 1)\n            self._out_features.append(name)\n\n    def forward(self, x):\n        outputs = self.backbone(x)\n        last_features = outputs[self.last_feature_name]\n        prefix, suffix = split_name(self.last_feature_name)\n\n        x = last_features\n        for i in range(self.num_top_levels):\n            name = prefix + str(suffix + i + 1)\n            x = self.__getattr__(name)(x)\n            outputs[name] = x\n\n        return outputs\n\n\nclass SingleBiFPN(Backbone):\n    \"\"\"\n    This module implements Feature Pyramid Network.\n    It creates pyramid features built on top of some input feature maps.\n    \"\"\"\n\n    def __init__(\n        self, in_channels_list, out_channels, norm=\"\"\n    ):\n        \"\"\"\n        Args:\n            bottom_up (Backbone): module representing the bottom up subnetwork.\n                Must be a subclass of :class:`Backbone`. 
The multi-scale feature\n                maps generated by the bottom up network, and listed in `in_features`,\n                are used to generate FPN levels.\n            in_features (list[str]): names of the input feature maps coming\n                from the backbone to which FPN is attached. For example, if the\n                backbone produces [\"res2\", \"res3\", \"res4\"], any *contiguous* sublist\n                of these may be used; order must be from high to low resolution.\n            out_channels (int): number of channels in the output feature maps.\n            norm (str): the normalization to use.\n        \"\"\"\n        super(SingleBiFPN, self).__init__()\n\n        self.out_channels = out_channels\n        # build 5-levels bifpn\n        if len(in_channels_list) == 5:\n            self.nodes = [\n                {'feat_level': 3, 'inputs_offsets': [3, 4]},\n                {'feat_level': 2, 'inputs_offsets': [2, 5]},\n                {'feat_level': 1, 'inputs_offsets': [1, 6]},\n                {'feat_level': 0, 'inputs_offsets': [0, 7]},\n                {'feat_level': 1, 'inputs_offsets': [1, 7, 8]},\n                {'feat_level': 2, 'inputs_offsets': [2, 6, 9]},\n                {'feat_level': 3, 'inputs_offsets': [3, 5, 10]},\n                {'feat_level': 4, 'inputs_offsets': [4, 11]},\n            ]\n        elif len(in_channels_list) == 3:\n            self.nodes = [\n                {'feat_level': 1, 'inputs_offsets': [1, 2]},\n                {'feat_level': 0, 'inputs_offsets': [0, 3]},\n                {'feat_level': 1, 'inputs_offsets': [1, 3, 4]},\n                {'feat_level': 2, 'inputs_offsets': [2, 5]},\n            ]\n        else:\n            raise NotImplementedError\n\n        node_info = [_ for _ in in_channels_list]\n\n        num_output_connections = [0 for _ in in_channels_list]\n        for fnode in self.nodes:\n            feat_level = fnode[\"feat_level\"]\n            inputs_offsets = fnode[\"inputs_offsets\"]\n 
           inputs_offsets_str = \"_\".join(map(str, inputs_offsets))\n            for input_offset in inputs_offsets:\n                num_output_connections[input_offset] += 1\n\n                in_channels = node_info[input_offset]\n                if in_channels != out_channels:\n                    lateral_conv = Conv2d(\n                        in_channels,\n                        out_channels,\n                        kernel_size=1,\n                        norm=get_norm(norm, out_channels)\n                    )\n                    self.add_module(\n                        \"lateral_{}_f{}\".format(input_offset, feat_level), lateral_conv\n                    )\n            node_info.append(out_channels)\n            num_output_connections.append(0)\n\n            # generate attention weights\n            name = \"weights_f{}_{}\".format(feat_level, inputs_offsets_str)\n            self.__setattr__(name, nn.Parameter(\n                    torch.ones(len(inputs_offsets), dtype=torch.float32),\n                    requires_grad=True\n                ))\n\n            # generate convolutions after combination\n            name = \"outputs_f{}_{}\".format(feat_level, inputs_offsets_str)\n            self.add_module(name, Conv2d(\n                out_channels,\n                out_channels,\n                kernel_size=3,\n                padding=1,\n                norm=get_norm(norm, out_channels),\n                bias=(norm == \"\")\n            ))\n\n    def forward(self, feats):\n        \"\"\"\n        Args:\n            input (dict[str->Tensor]): mapping feature map name (e.g., \"p5\") to\n                feature map tensor for each feature level in high to low resolution order.\n        Returns:\n            dict[str->Tensor]:\n                mapping from feature map name to FPN feature map tensor\n                in high to low resolution order. 
Returned feature names follow the FPN\n                paper convention: \"p<stage>\", where stage has stride = 2 ** stage e.g.,\n                [\"n2\", \"n3\", ..., \"n6\"].\n        \"\"\"\n        feats = [_ for _ in feats]\n        num_levels = len(feats)\n        num_output_connections = [0 for _ in feats]\n        for fnode in self.nodes:\n            feat_level = fnode[\"feat_level\"]\n            inputs_offsets = fnode[\"inputs_offsets\"]\n            inputs_offsets_str = \"_\".join(map(str, inputs_offsets))\n            input_nodes = []\n            _, _, target_h, target_w = feats[feat_level].size()\n            for input_offset in inputs_offsets:\n                num_output_connections[input_offset] += 1\n                input_node = feats[input_offset]\n\n                # reduction\n                if input_node.size(1) != self.out_channels:\n                    name = \"lateral_{}_f{}\".format(input_offset, feat_level)\n                    input_node = self.__getattr__(name)(input_node)\n\n                # maybe downsample\n                _, _, h, w = input_node.size()\n                if h > target_h and w > target_w:\n                    height_stride_size = int((h - 1) // target_h + 1)\n                    width_stride_size = int((w - 1) // target_w + 1)\n                    assert height_stride_size == width_stride_size == 2\n                    input_node = F.max_pool2d(\n                        input_node, kernel_size=(height_stride_size + 1, width_stride_size + 1),\n                        stride=(height_stride_size, width_stride_size), padding=1\n                    )\n                elif h <= target_h and w <= target_w:\n                    if h < target_h or w < target_w:\n                        input_node = F.interpolate(\n                            input_node,\n                            size=(target_h, target_w),\n                            mode=\"nearest\"\n                        )\n                else:\n                    
raise NotImplementedError()\n                input_nodes.append(input_node)\n\n            # attention: fast normalized fusion. ReLU keeps the learned weights\n            # non-negative, then they are normalized to sum to (almost) one.\n            name = \"weights_f{}_{}\".format(feat_level, inputs_offsets_str)\n            weights = F.relu(self.__getattr__(name))\n            norm_weights = weights / (weights.sum() + 0.0001)\n\n            new_node = torch.stack(input_nodes, dim=-1)\n            new_node = (norm_weights * new_node).sum(dim=-1)\n            new_node = swish(new_node)\n\n            name = \"outputs_f{}_{}\".format(feat_level, inputs_offsets_str)\n            feats.append(self.__getattr__(name)(new_node))\n\n            num_output_connections.append(0)\n\n        # for each input level, return the most recently created node at that level\n        output_feats = []\n        for idx in range(num_levels):\n            for i, fnode in enumerate(reversed(self.nodes)):\n                if fnode['feat_level'] == idx:\n                    output_feats.append(feats[-1 - i])\n                    break\n            else:\n                raise ValueError(\"no output node for feature level {}\".format(idx))\n        return output_feats\n\n\nclass BiFPN(Backbone):\n    \"\"\"\n    This module implements a bidirectional Feature Pyramid Network (BiFPN).\n    It creates pyramid features built on top of some input feature maps\n    by stacking `num_repeats` SingleBiFPN layers.\n    \"\"\"\n\n    def __init__(\n        self, bottom_up, in_features, out_channels, num_top_levels, num_repeats, norm=\"\"\n    ):\n        \"\"\"\n        Args:\n            bottom_up (Backbone): module representing the bottom up subnetwork.\n                Must be a subclass of :class:`Backbone`. The multi-scale feature\n                maps generated by the bottom up network, and listed in `in_features`,\n                are used to generate FPN levels.\n            in_features (list[str]): names of the input feature maps coming\n                from the backbone to which FPN is attached. 
For example, if the\n                backbone produces [\"res2\", \"res3\", \"res4\"], any *contiguous* sublist\n                of these may be used; order must be from high to low resolution.\n            out_channels (int): number of channels in the output feature maps.\n            num_top_levels (int): the number of the top levels (p6 or p7).\n            num_repeats (int): the number of repeats of BiFPN.\n            norm (str): the normalization to use.\n        \"\"\"\n        super(BiFPN, self).__init__()\n        assert isinstance(bottom_up, Backbone)\n\n        # add extra feature levels (i.e., 6 and 7)\n        self.bottom_up = BackboneWithTopLevels(\n            bottom_up, out_channels,\n            num_top_levels, norm\n        )\n        bottom_up_output_shapes = self.bottom_up.output_shape()\n\n        in_features = sorted(in_features, key=lambda x: split_name(x)[1])\n        self._size_divisibility = 128 #bottom_up_output_shapes[in_features[-1]].stride\n        self.out_channels = out_channels\n        self.min_level = split_name(in_features[0])[1]\n\n        # add the names for top blocks\n        prefix, last_suffix = split_name(in_features[-1])\n        for i in range(num_top_levels):\n            in_features.append(prefix + str(last_suffix + i + 1))\n        self.in_features = in_features\n\n        # generate output features\n        self._out_features = [\"p{}\".format(split_name(name)[1]) for name in in_features]\n        self._out_feature_strides = {\n            out_name: bottom_up_output_shapes[in_name].stride\n            for out_name, in_name in zip(self._out_features, in_features)\n        }\n        self._out_feature_channels = {k: out_channels for k in self._out_features}\n\n        # build bifpn\n        self.repeated_bifpn = nn.ModuleList()\n        for i in range(num_repeats):\n            if i == 0:\n                in_channels_list = [\n                    bottom_up_output_shapes[name].channels for name in in_features\n          
      ]\n            else:\n                in_channels_list = [\n                    self._out_feature_channels[name] for name in self._out_features\n                ]\n            self.repeated_bifpn.append(SingleBiFPN(\n                in_channels_list, out_channels, norm\n            ))\n\n    @property\n    def size_divisibility(self):\n        return self._size_divisibility\n\n    def forward(self, x):\n        \"\"\"\n        Args:\n            x: input passed directly to the bottom up subnetwork, e.g. an image\n                batch tensor of shape (N, C, H, W).\n        Returns:\n            dict[str->Tensor]:\n                mapping from feature map name to FPN feature map tensor\n                in high to low resolution order. Returned feature names follow the FPN\n                paper convention: \"p<stage>\", where stage has stride = 2 ** stage e.g.,\n                [\"p2\", \"p3\", ..., \"p6\"].\n        \"\"\"\n        bottom_up_features = self.bottom_up(x)\n        feats = [bottom_up_features[f] for f in self.in_features]\n\n        for bifpn in self.repeated_bifpn:\n            feats = bifpn(feats)\n\n        return dict(zip(self._out_features, feats))\n\n\ndef _assert_strides_are_log2_contiguous(strides):\n    \"\"\"\n    Assert that each stride is 2x its preceding stride, i.e. 
\"contiguous in log2\".\n    \"\"\"\n    for i, stride in enumerate(strides[1:], 1):\n        assert stride == 2 * strides[i - 1], \"Strides {} {} are not log2 contiguous\".format(\n            stride, strides[i - 1]\n        )\n\n\n@BACKBONE_REGISTRY.register()\ndef build_fcos_resnet_bifpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    bottom_up = build_resnet_backbone(cfg, input_shape)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.BIFPN.OUT_CHANNELS\n    num_repeats = cfg.MODEL.BIFPN.NUM_BIFPN\n    top_levels = 2\n\n    backbone = BiFPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        num_top_levels=top_levels,\n        num_repeats=num_repeats,\n        norm=cfg.MODEL.BIFPN.NORM\n    )\n    return backbone\n\n\n\n@BACKBONE_REGISTRY.register()\ndef build_p35_fcos_resnet_bifpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    bottom_up = build_resnet_backbone(cfg, input_shape)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.BIFPN.OUT_CHANNELS\n    num_repeats = cfg.MODEL.BIFPN.NUM_BIFPN\n    top_levels = 0\n\n    backbone = BiFPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        num_top_levels=top_levels,\n        num_repeats=num_repeats,\n        norm=cfg.MODEL.BIFPN.NORM\n    )\n    return backbone\n\n\n@BACKBONE_REGISTRY.register()\ndef build_p35_fcos_dla_bifpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n   
 \"\"\"\n    bottom_up = dla34(cfg)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.BIFPN.OUT_CHANNELS\n    num_repeats = cfg.MODEL.BIFPN.NUM_BIFPN\n    top_levels = 0\n\n    backbone = BiFPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        num_top_levels=top_levels,\n        num_repeats=num_repeats,\n        norm=cfg.MODEL.BIFPN.NORM\n    )\n    return backbone\n\n@BACKBONE_REGISTRY.register()\ndef build_p37_fcos_dla_bifpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    bottom_up = dla34(cfg)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.BIFPN.OUT_CHANNELS\n    num_repeats = cfg.MODEL.BIFPN.NUM_BIFPN\n    assert cfg.MODEL.BIFPN.NUM_LEVELS == 5\n    top_levels = 2\n\n    backbone = BiFPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        num_top_levels=top_levels,\n        num_repeats=num_repeats,\n        norm=cfg.MODEL.BIFPN.NORM\n    )\n    return backbone\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/backbone/dla.py",
    "content": "import numpy as np\nimport math\nfrom os.path import join\nimport fvcore.nn.weight_init as weight_init\nimport torch\nimport torch.nn.functional as F\nfrom torch import nn\nimport torch.utils.model_zoo as model_zoo\n\nfrom detectron2.modeling.backbone.resnet import (\n    BasicStem, BottleneckBlock, DeformBottleneckBlock)\nfrom detectron2.layers import (\n    Conv2d,\n    DeformConv,\n    FrozenBatchNorm2d,\n    ModulatedDeformConv,\n    ShapeSpec,\n    get_norm,\n)\n\nfrom detectron2.modeling.backbone.backbone import Backbone\nfrom detectron2.modeling.backbone.build import BACKBONE_REGISTRY\nfrom detectron2.modeling.backbone.fpn import FPN\n\n__all__ = [\n    \"BottleneckBlock\",\n    \"DeformBottleneckBlock\",\n    \"BasicStem\",\n]\n\nDCNV1 = False\n\nHASH = {\n    34: 'ba72cf86',\n    60: '24839fc4',\n}\n\ndef get_model_url(data, name, hash):\n    return join('http://dl.yf.io/dla/models', data, '{}-{}.pth'.format(name, hash))\n\nclass BasicBlock(nn.Module):\n    def __init__(self, inplanes, planes, stride=1, dilation=1, norm='BN'):\n        super(BasicBlock, self).__init__()\n        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3,\n                               stride=stride, padding=dilation,\n                               bias=False, dilation=dilation)\n        self.bn1 = get_norm(norm, planes)\n        self.relu = nn.ReLU(inplace=True)\n        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,\n                               stride=1, padding=dilation,\n                               bias=False, dilation=dilation)\n        self.bn2 = get_norm(norm, planes)\n        self.stride = stride\n\n    def forward(self, x, residual=None):\n        if residual is None:\n            residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n\n        out += residual\n        out = self.relu(out)\n\n        return out\n\nclass 
Bottleneck(nn.Module):\n    expansion = 2\n\n    def __init__(self, inplanes, planes, stride=1, dilation=1, norm='BN'):\n        super(Bottleneck, self).__init__()\n        expansion = Bottleneck.expansion\n        bottle_planes = planes // expansion\n        self.conv1 = nn.Conv2d(inplanes, bottle_planes,\n                               kernel_size=1, bias=False)\n        self.bn1 = get_norm(norm, bottle_planes)\n        self.conv2 = nn.Conv2d(bottle_planes, bottle_planes, kernel_size=3,\n                               stride=stride, padding=dilation,\n                               bias=False, dilation=dilation)\n        self.bn2 = get_norm(norm, bottle_planes)\n        self.conv3 = nn.Conv2d(bottle_planes, planes,\n                               kernel_size=1, bias=False)\n        self.bn3 = get_norm(norm, planes)\n        self.relu = nn.ReLU(inplace=True)\n        self.stride = stride\n\n    def forward(self, x, residual=None):\n        if residual is None:\n            residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n        out = self.relu(out)\n\n        out = self.conv3(out)\n        out = self.bn3(out)\n\n        out += residual\n        out = self.relu(out)\n\n        return out\n\nclass Root(nn.Module):\n    def __init__(self, in_channels, out_channels, kernel_size, residual, norm='BN'):\n        super(Root, self).__init__()\n        self.conv = nn.Conv2d(\n            in_channels, out_channels, 1,\n            stride=1, bias=False, padding=(kernel_size - 1) // 2)\n        self.bn = get_norm(norm, out_channels)\n        self.relu = nn.ReLU(inplace=True)\n        self.residual = residual\n\n    def forward(self, *x):\n        children = x\n        x = self.conv(torch.cat(x, 1))\n        x = self.bn(x)\n        if self.residual:\n            x += children[0]\n        x = self.relu(x)\n\n        return x\n\n\nclass Tree(nn.Module):\n    def 
__init__(self, levels, block, in_channels, out_channels, stride=1,\n                 level_root=False, root_dim=0, root_kernel_size=1,\n                 dilation=1, root_residual=False, norm='BN'):\n        super(Tree, self).__init__()\n        if root_dim == 0:\n            root_dim = 2 * out_channels\n        if level_root:\n            root_dim += in_channels\n        if levels == 1:\n            self.tree1 = block(in_channels, out_channels, stride,\n                               dilation=dilation, norm=norm)\n            self.tree2 = block(out_channels, out_channels, 1,\n                               dilation=dilation, norm=norm)\n        else:\n            self.tree1 = Tree(levels - 1, block, in_channels, out_channels,\n                              stride, root_dim=0,\n                              root_kernel_size=root_kernel_size,\n                              dilation=dilation, root_residual=root_residual, \n                              norm=norm)\n            self.tree2 = Tree(levels - 1, block, out_channels, out_channels,\n                              root_dim=root_dim + out_channels,\n                              root_kernel_size=root_kernel_size,\n                              dilation=dilation, root_residual=root_residual, \n                              norm=norm)\n        if levels == 1:\n            self.root = Root(root_dim, out_channels, root_kernel_size,\n                             root_residual, norm=norm)\n        self.level_root = level_root\n        self.root_dim = root_dim\n        self.downsample = None\n        self.project = None\n        self.levels = levels\n        if stride > 1:\n            self.downsample = nn.MaxPool2d(stride, stride=stride)\n        if in_channels != out_channels:\n            self.project = nn.Sequential(\n                nn.Conv2d(in_channels, out_channels,\n                          kernel_size=1, stride=1, bias=False),\n                get_norm(norm, out_channels)\n            )\n\n    def 
forward(self, x, residual=None, children=None):\n        children = [] if children is None else children\n        bottom = self.downsample(x) if self.downsample else x\n        residual = self.project(bottom) if self.project else bottom\n        if self.level_root:\n            children.append(bottom)\n        x1 = self.tree1(x, residual)\n        if self.levels == 1:\n            x2 = self.tree2(x1)\n            x = self.root(x2, x1, *children)\n        else:\n            children.append(x1)\n            x = self.tree2(x1, children=children)\n        return x\n\nclass DLA(nn.Module):\n    def __init__(self, num_layers, levels, channels, \n        block=BasicBlock, residual_root=False, norm='BN'):\n        \"\"\"\n        Args:\n        \"\"\"\n        super(DLA, self).__init__()\n        self.norm = norm\n        self.channels = channels\n        self.base_layer = nn.Sequential(\n            nn.Conv2d(3, channels[0], kernel_size=7, stride=1,\n                      padding=3, bias=False),\n            get_norm(self.norm, channels[0]),\n            nn.ReLU(inplace=True))\n        self.level0 = self._make_conv_level(\n            channels[0], channels[0], levels[0])\n        self.level1 = self._make_conv_level(\n            channels[0], channels[1], levels[1], stride=2)\n        self.level2 = Tree(levels[2], block, channels[1], channels[2], 2,\n                           level_root=False,\n                           root_residual=residual_root, norm=norm)\n        self.level3 = Tree(levels[3], block, channels[2], channels[3], 2,\n                           level_root=True, root_residual=residual_root, \n                           norm=norm)\n        self.level4 = Tree(levels[4], block, channels[3], channels[4], 2,\n                           level_root=True, root_residual=residual_root, \n                           norm=norm)\n        self.level5 = Tree(levels[5], block, channels[4], channels[5], 2,\n                           level_root=True, 
root_residual=residual_root, \n                           norm=norm)\n        self.load_pretrained_model(\n            data='imagenet', name='dla{}'.format(num_layers), \n            hash=HASH[num_layers])\n\n    def load_pretrained_model(self, data, name, hash):\n        model_url = get_model_url(data, name, hash)\n        model_weights = model_zoo.load_url(model_url)\n        num_classes = len(model_weights[list(model_weights.keys())[-1]])\n        self.fc = nn.Conv2d(\n            self.channels[-1], num_classes,\n            kernel_size=1, stride=1, padding=0, bias=True)\n        print('Loading pretrained')\n        self.load_state_dict(model_weights, strict=False)\n\n    def _make_conv_level(self, inplanes, planes, convs, stride=1, dilation=1):\n        modules = []\n        for i in range(convs):\n            modules.extend([\n                nn.Conv2d(inplanes, planes, kernel_size=3,\n                          stride=stride if i == 0 else 1,\n                          padding=dilation, bias=False, dilation=dilation),\n                get_norm(self.norm, planes),\n                nn.ReLU(inplace=True)])\n            inplanes = planes\n        return nn.Sequential(*modules)\n\n    def forward(self, x):\n        y = []\n        x = self.base_layer(x)\n        for i in range(6):\n            x = getattr(self, 'level{}'.format(i))(x)\n            y.append(x)\n        return y\n\n\ndef fill_up_weights(up):\n    w = up.weight.data\n    f = math.ceil(w.size(2) / 2)\n    c = (2 * f - 1 - f % 2) / (2. 
* f)\n    for i in range(w.size(2)):\n        for j in range(w.size(3)):\n            w[0, 0, i, j] = \\\n                (1 - math.fabs(i / f - c)) * (1 - math.fabs(j / f - c))\n    for c in range(1, w.size(0)):\n        w[c, 0, :, :] = w[0, 0, :, :]\n\n\nclass _DeformConv(nn.Module):\n    def __init__(self, chi, cho, norm='BN'):\n        super(_DeformConv, self).__init__()\n        self.actf = nn.Sequential(\n            get_norm(norm, cho),\n            nn.ReLU(inplace=True)\n        )\n        if DCNV1:\n            self.offset = Conv2d(\n                chi, 18, kernel_size=3, stride=1,\n                padding=1, dilation=1)\n            self.conv = DeformConv(\n                chi, cho, kernel_size=(3,3), stride=1, padding=1,\n                dilation=1, deformable_groups=1)\n        else:\n            self.offset = Conv2d(\n                chi, 27, kernel_size=3, stride=1,\n                padding=1, dilation=1)\n            self.conv = ModulatedDeformConv(\n                chi, cho, kernel_size=3, stride=1, padding=1,\n                dilation=1, deformable_groups=1)\n        nn.init.constant_(self.offset.weight, 0)\n        nn.init.constant_(self.offset.bias, 0)\n        \n    def forward(self, x):\n        if DCNV1:\n            offset = self.offset(x)\n            x = self.conv(x, offset)\n        else:\n            offset_mask = self.offset(x)\n            offset_x, offset_y, mask = torch.chunk(offset_mask, 3, dim=1)\n            offset = torch.cat((offset_x, offset_y), dim=1)\n            mask = mask.sigmoid()\n            x = self.conv(x, offset, mask)\n        x = self.actf(x)\n        return x\n\n\nclass IDAUp(nn.Module):\n    def __init__(self, o, channels, up_f, norm='BN'):\n        super(IDAUp, self).__init__()\n        for i in range(1, len(channels)):\n            c = channels[i]\n            f = int(up_f[i])  \n            proj = _DeformConv(c, o, norm=norm)\n            node = _DeformConv(o, o, norm=norm)\n     \n            up = 
nn.ConvTranspose2d(o, o, f * 2, stride=f, \n                                    padding=f // 2, output_padding=0,\n                                    groups=o, bias=False)\n            fill_up_weights(up)\n\n            setattr(self, 'proj_' + str(i), proj)\n            setattr(self, 'up_' + str(i), up)\n            setattr(self, 'node_' + str(i), node)\n                 \n        \n    def forward(self, layers, startp, endp):\n        for i in range(startp + 1, endp):\n            upsample = getattr(self, 'up_' + str(i - startp))\n            project = getattr(self, 'proj_' + str(i - startp))\n            layers[i] = upsample(project(layers[i]))\n            node = getattr(self, 'node_' + str(i - startp))\n            layers[i] = node(layers[i] + layers[i - 1])\n\n\nclass DLAUp(nn.Module):\n    def __init__(self, startp, channels, scales, in_channels=None, norm='BN'):\n        super(DLAUp, self).__init__()\n        self.startp = startp\n        if in_channels is None:\n            in_channels = channels\n        self.channels = channels\n        channels = list(channels)\n        scales = np.array(scales, dtype=int)\n        for i in range(len(channels) - 1):\n            j = -i - 2\n            setattr(self, 'ida_{}'.format(i),\n                    IDAUp(channels[j], in_channels[j:],\n                          scales[j:] // scales[j], norm=norm))\n            scales[j + 1:] = scales[j]\n            in_channels[j + 1:] = [channels[j] for _ in channels[j + 1:]]\n\n    def forward(self, layers):\n        out = [layers[-1]] # start with 32\n        for i in range(len(layers) - self.startp - 1):\n            ida = getattr(self, 'ida_{}'.format(i))\n            ida(layers, len(layers) -i - 2, len(layers))\n            out.insert(0, layers[-1])\n        return out\n\nDLA_CONFIGS = {\n    34: ([1, 1, 1, 2, 2, 1], [16, 32, 64, 128, 256, 512], BasicBlock),\n    60: ([1, 1, 1, 2, 3, 1], [16, 32, 128, 256, 512, 1024], Bottleneck)\n}\n\n\nclass DLASeg(Backbone):\n    def 
__init__(self, num_layers, out_features, use_dla_up=True, \n        ms_output=False, norm='BN'):\n        super(DLASeg, self).__init__()\n        # depth = 34\n        levels, channels, Block = DLA_CONFIGS[num_layers]\n        self.base = DLA(num_layers=num_layers,\n            levels=levels, channels=channels, block=Block, norm=norm)\n        down_ratio = 4\n        self.first_level = int(np.log2(down_ratio))\n        self.ms_output = ms_output\n        self.last_level = 5 if not self.ms_output else 6\n        channels = self.base.channels\n        scales = [2 ** i for i in range(len(channels[self.first_level:]))]\n        self.use_dla_up = use_dla_up\n        if self.use_dla_up:\n            self.dla_up = DLAUp(\n                self.first_level, channels[self.first_level:], scales, \n                norm=norm)\n        out_channel = channels[self.first_level]\n        if not self.ms_output: # stride 4 DLA\n            self.ida_up = IDAUp(\n                out_channel, channels[self.first_level:self.last_level], \n                [2 ** i for i in range(self.last_level - self.first_level)], \n                norm=norm)\n        self._out_features = out_features\n        self._out_feature_channels = {\n            'dla{}'.format(i): channels[i] for i in range(6)}\n        self._out_feature_strides = {\n            'dla{}'.format(i): 2 ** i for i in range(6)}\n        self._size_divisibility = 32\n\n    @property\n    def size_divisibility(self):\n        return self._size_divisibility\n\n    def forward(self, x):\n        x = self.base(x)\n        if self.use_dla_up:\n            x = self.dla_up(x)\n        if not self.ms_output: # stride 4 dla\n            y = []\n            for i in range(self.last_level - self.first_level):\n                y.append(x[i].clone())\n            self.ida_up(y, 0, len(y))\n            ret = {}\n            for i in range(self.last_level - self.first_level):\n                out_feature = 'dla{}'.format(i)\n                if 
out_feature in self._out_features:\n                    ret[out_feature] = y[i]\n        else:\n            ret = {}\n            st = self.first_level if self.use_dla_up else 0\n            for i in range(self.last_level - st):\n                out_feature = 'dla{}'.format(i + st)\n                if out_feature in self._out_features:\n                    ret[out_feature] = x[i]\n        \n        return ret\n\n\n@BACKBONE_REGISTRY.register()\ndef build_dla_backbone(cfg, input_shape):\n    \"\"\"\n    Create a DLASeg instance from config.\n\n    Returns:\n        DLASeg: a :class:`DLASeg` instance.\n    \"\"\"\n    return DLASeg(\n        out_features=cfg.MODEL.DLA.OUT_FEATURES, \n        num_layers=cfg.MODEL.DLA.NUM_LAYERS,\n        use_dla_up=cfg.MODEL.DLA.USE_DLA_UP,\n        ms_output=cfg.MODEL.DLA.MS_OUTPUT,\n        norm=cfg.MODEL.DLA.NORM)\n\nclass LastLevelP6P7(nn.Module):\n    \"\"\"\n    This module is used in RetinaNet to generate extra layers, P6 and P7 from\n    C5 feature.\n    \"\"\"\n\n    def __init__(self, in_channels, out_channels):\n        super().__init__()\n        self.num_levels = 2\n        self.in_feature = \"dla5\"\n        self.p6 = nn.Conv2d(in_channels, out_channels, 3, 2, 1)\n        self.p7 = nn.Conv2d(out_channels, out_channels, 3, 2, 1)\n        for module in [self.p6, self.p7]:\n            weight_init.c2_xavier_fill(module)\n\n    def forward(self, c5):\n        p6 = self.p6(c5)\n        p7 = self.p7(F.relu(p6))\n        return [p6, p7]\n\n@BACKBONE_REGISTRY.register()\ndef build_retinanet_dla_fpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    bottom_up = build_dla_backbone(cfg, input_shape)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.FPN.OUT_CHANNELS\n    in_channels_p6p7 = bottom_up.output_shape()['dla5'].channels\n    backbone = 
FPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        norm=cfg.MODEL.FPN.NORM,\n        top_block=LastLevelP6P7(in_channels_p6p7, out_channels),\n        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,\n    )\n    return backbone\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/backbone/dlafpn.py",
    "content": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\n# this file is from https://github.com/ucbdrive/dla/blob/master/dla.py.\n\nimport math\nfrom os.path import join\nimport numpy as np\n\nimport torch\nfrom torch import nn\nimport torch.utils.model_zoo as model_zoo\nimport torch.nn.functional as F\nimport fvcore.nn.weight_init as weight_init\n\nfrom detectron2.modeling.backbone import FPN\nfrom detectron2.layers import ShapeSpec, ModulatedDeformConv, Conv2d\nfrom detectron2.modeling.backbone.build import BACKBONE_REGISTRY\nfrom detectron2.layers.batch_norm import get_norm\nfrom detectron2.modeling.backbone import Backbone\n\nWEB_ROOT = 'http://dl.yf.io/dla/models'\n\n\ndef get_model_url(data, name, hash):\n    return join(\n        'http://dl.yf.io/dla/models', data, '{}-{}.pth'.format(name, hash))\n\n\ndef conv3x3(in_planes, out_planes, stride=1):\n    \"3x3 convolution with padding\"\n    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,\n                     padding=1, bias=False)\n\n\nclass BasicBlock(nn.Module):\n    def __init__(self, cfg, inplanes, planes, stride=1, dilation=1):\n        super(BasicBlock, self).__init__()\n        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3,\n                               stride=stride, padding=dilation,\n                               bias=False, dilation=dilation)\n        self.bn1 = get_norm(cfg.MODEL.DLA.NORM, planes)\n        self.relu = nn.ReLU(inplace=True)\n        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,\n                               stride=1, padding=dilation,\n                               bias=False, dilation=dilation)\n        self.bn2 = get_norm(cfg.MODEL.DLA.NORM, planes)\n        self.stride = stride\n\n    def forward(self, x, residual=None):\n        if residual is None:\n            residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = 
self.bn2(out)\n\n        out += residual\n        out = self.relu(out)\n\n        return out\n\n\nclass Bottleneck(nn.Module):\n    expansion = 2\n\n    def __init__(self, cfg, inplanes, planes, stride=1, dilation=1):\n        super(Bottleneck, self).__init__()\n        expansion = Bottleneck.expansion\n        bottle_planes = planes // expansion\n        self.conv1 = nn.Conv2d(inplanes, bottle_planes,\n                               kernel_size=1, bias=False)\n        self.bn1 = get_norm(cfg.MODEL.DLA.NORM, bottle_planes)\n        self.conv2 = nn.Conv2d(bottle_planes, bottle_planes, kernel_size=3,\n                               stride=stride, padding=dilation,\n                               bias=False, dilation=dilation)\n        self.bn2 = get_norm(cfg.MODEL.DLA.NORM, bottle_planes)\n        self.conv3 = nn.Conv2d(bottle_planes, planes,\n                               kernel_size=1, bias=False)\n        self.bn3 = get_norm(cfg.MODEL.DLA.NORM, planes)\n        self.relu = nn.ReLU(inplace=True)\n        self.stride = stride\n\n    def forward(self, x, residual=None):\n        if residual is None:\n            residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n        out = self.relu(out)\n\n        out = self.conv3(out)\n        out = self.bn3(out)\n\n        out += residual\n        out = self.relu(out)\n\n        return out\n\n\nclass Root(nn.Module):\n    def __init__(self, cfg, in_channels, out_channels, kernel_size, residual):\n        super(Root, self).__init__()\n        self.conv = nn.Conv2d(\n            in_channels, out_channels, kernel_size,\n            stride=1, bias=False, padding=(kernel_size - 1) // 2)\n        self.bn = get_norm(cfg.MODEL.DLA.NORM, out_channels)\n        self.relu = nn.ReLU(inplace=True)\n        self.residual = residual\n\n    def forward(self, *x):\n        children = x\n        x = self.conv(torch.cat(x, 1))\n     
   x = self.bn(x)\n        if self.residual:\n            x += children[0]\n        x = self.relu(x)\n\n        return x\n\n\nclass Tree(nn.Module):\n    def __init__(self, cfg, levels, block, in_channels, out_channels, stride=1,\n                 level_root=False, root_dim=0, root_kernel_size=1,\n                 dilation=1, root_residual=False):\n        super(Tree, self).__init__()\n        if root_dim == 0:\n            root_dim = 2 * out_channels\n        if level_root:\n            root_dim += in_channels\n        if levels == 1:\n            self.tree1 = block(cfg, in_channels, out_channels, stride,\n                               dilation=dilation)\n            self.tree2 = block(cfg, out_channels, out_channels, 1,\n                               dilation=dilation)\n        else:\n            self.tree1 = Tree(cfg, levels - 1, block, in_channels, out_channels,\n                              stride, root_dim=0,\n                              root_kernel_size=root_kernel_size,\n                              dilation=dilation, root_residual=root_residual)\n            self.tree2 = Tree(cfg, levels - 1, block, out_channels, out_channels,\n                              root_dim=root_dim + out_channels,\n                              root_kernel_size=root_kernel_size,\n                              dilation=dilation, root_residual=root_residual)\n        if levels == 1:\n            self.root = Root(cfg, root_dim, out_channels, root_kernel_size,\n                             root_residual)\n        self.level_root = level_root\n        self.root_dim = root_dim\n        self.downsample = None\n        self.project = None\n        self.levels = levels\n        if stride > 1:\n            self.downsample = nn.MaxPool2d(stride, stride=stride)\n        if in_channels != out_channels:\n            self.project = nn.Sequential(\n                nn.Conv2d(in_channels, out_channels,\n                          kernel_size=1, stride=1, bias=False),\n                
get_norm(cfg.MODEL.DLA.NORM, out_channels)\n            )\n\n    def forward(self, x, residual=None, children=None):\n        if self.training and residual is not None:\n            x = x + residual.sum() * 0.0\n        children = [] if children is None else children\n        bottom = self.downsample(x) if self.downsample else x\n        residual = self.project(bottom) if self.project else bottom\n        if self.level_root:\n            children.append(bottom)\n        x1 = self.tree1(x, residual)\n        if self.levels == 1:\n            x2 = self.tree2(x1)\n            x = self.root(x2, x1, *children)\n        else:\n            children.append(x1)\n            x = self.tree2(x1, children=children)\n        return x\n\n\nclass DLA(Backbone):\n    def __init__(self, cfg, levels, channels, block=BasicBlock, residual_root=False):\n        super(DLA, self).__init__()\n        self.cfg = cfg\n        self.channels = channels\n\n        self._out_features = [\"dla{}\".format(i) for i in range(6)]\n        self._out_feature_channels = {k: channels[i] for i, k in enumerate(self._out_features)}\n        self._out_feature_strides = {k: 2 ** i for i, k in enumerate(self._out_features)}\n\n        self.base_layer = nn.Sequential(\n            nn.Conv2d(3, channels[0], kernel_size=7, stride=1,\n                      padding=3, bias=False),\n            get_norm(cfg.MODEL.DLA.NORM, channels[0]),\n            nn.ReLU(inplace=True))\n        self.level0 = self._make_conv_level(\n            channels[0], channels[0], levels[0])\n        self.level1 = self._make_conv_level(\n            channels[0], channels[1], levels[1], stride=2)\n        self.level2 = Tree(cfg, levels[2], block, channels[1], channels[2], 2,\n                           level_root=False,\n                           root_residual=residual_root)\n        self.level3 = Tree(cfg, levels[3], block, channels[2], channels[3], 2,\n                           level_root=True, root_residual=residual_root)\n        
self.level4 = Tree(cfg, levels[4], block, channels[3], channels[4], 2,\n                           level_root=True, root_residual=residual_root)\n        self.level5 = Tree(cfg, levels[5], block, channels[4], channels[5], 2,\n                           level_root=True, root_residual=residual_root)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n                m.weight.data.normal_(0, math.sqrt(2. / n))\n\n        self.load_pretrained_model(\n            data='imagenet', name='dla34', hash='ba72cf86')\n\n    def load_pretrained_model(self, data, name, hash):\n        model_url = get_model_url(data, name, hash)\n        model_weights = model_zoo.load_url(model_url)\n        del model_weights['fc.weight']\n        del model_weights['fc.bias']\n        print('Loading pretrained DLA!')\n        self.load_state_dict(model_weights, strict=True)\n\n    def _make_conv_level(self, inplanes, planes, convs, stride=1, dilation=1):\n        modules = []\n        for i in range(convs):\n            modules.extend([\n                nn.Conv2d(inplanes, planes, kernel_size=3,\n                          stride=stride if i == 0 else 1,\n                          padding=dilation, bias=False, dilation=dilation),\n                get_norm(self.cfg.MODEL.DLA.NORM, planes),\n                nn.ReLU(inplace=True)])\n            inplanes = planes\n        return nn.Sequential(*modules)\n\n    def forward(self, x):\n        y = {}\n        x = self.base_layer(x)\n        for i in range(6):\n            name = 'level{}'.format(i)\n            x = getattr(self, name)(x)\n            y['dla{}'.format(i)] = x\n        return y\n\n\ndef fill_up_weights(up):\n    w = up.weight.data\n    f = math.ceil(w.size(2) / 2)\n    c = (2 * f - 1 - f % 2) / (2. 
* f)\n    for i in range(w.size(2)):\n        for j in range(w.size(3)):\n            w[0, 0, i, j] = \\\n                (1 - math.fabs(i / f - c)) * (1 - math.fabs(j / f - c))\n    for c in range(1, w.size(0)):\n        w[c, 0, :, :] = w[0, 0, :, :]\n\n\nclass Conv(nn.Module):\n    def __init__(self, chi, cho, norm):\n        super(Conv, self).__init__()\n        self.conv = nn.Sequential(\n            nn.Conv2d(chi, cho, kernel_size=1, stride=1, bias=False),\n            get_norm(norm, cho),\n            nn.ReLU(inplace=True))\n    \n    def forward(self, x):\n        return self.conv(x)\n\n\nclass DeformConv(nn.Module):\n    def __init__(self, chi, cho, norm):\n        super(DeformConv, self).__init__()\n        self.actf = nn.Sequential(\n            get_norm(norm, cho),\n            nn.ReLU(inplace=True)\n        )\n        self.offset = Conv2d(\n            chi, 27, kernel_size=3, stride=1,\n            padding=1, dilation=1)\n        self.conv = ModulatedDeformConv(\n            chi, cho, kernel_size=3, stride=1, padding=1,\n            dilation=1, deformable_groups=1)\n        nn.init.constant_(self.offset.weight, 0)\n        nn.init.constant_(self.offset.bias, 0)\n\n    def forward(self, x):\n        offset_mask = self.offset(x)\n        offset_x, offset_y, mask = torch.chunk(offset_mask, 3, dim=1)\n        offset = torch.cat((offset_x, offset_y), dim=1)\n        mask = mask.sigmoid()\n        x = self.conv(x, offset, mask)\n        x = self.actf(x)\n        return x\n\n\nclass IDAUp(nn.Module):\n    def __init__(self, o, channels, up_f, norm='FrozenBN', node_type=Conv):\n        super(IDAUp, self).__init__()\n        for i in range(1, len(channels)):\n            c = channels[i]\n            f = int(up_f[i])  \n            proj = node_type(c, o, norm)\n            node = node_type(o, o, norm)\n     \n            up = nn.ConvTranspose2d(o, o, f * 2, stride=f, \n                                    padding=f // 2, output_padding=0,\n                         
           groups=o, bias=False)\n            fill_up_weights(up)\n\n            setattr(self, 'proj_' + str(i), proj)\n            setattr(self, 'up_' + str(i), up)\n            setattr(self, 'node_' + str(i), node)\n                 \n        \n    def forward(self, layers, startp, endp):\n        for i in range(startp + 1, endp):\n            upsample = getattr(self, 'up_' + str(i - startp))\n            project = getattr(self, 'proj_' + str(i - startp))\n            layers[i] = upsample(project(layers[i]))\n            node = getattr(self, 'node_' + str(i - startp))\n            layers[i] = node(layers[i] + layers[i - 1])\n\n\nDLAUP_NODE_MAP = {\n    'conv': Conv,\n    'dcn': DeformConv,\n}\n\nclass DLAUP(Backbone):\n    def __init__(self, bottom_up, in_features, norm, dlaup_node='conv'):\n        super(DLAUP, self).__init__()\n        assert isinstance(bottom_up, Backbone)\n        self.bottom_up = bottom_up\n        input_shapes = bottom_up.output_shape()\n        in_strides = [input_shapes[f].stride for f in in_features]\n        in_channels = [input_shapes[f].channels for f in in_features] \n        in_levels = [int(math.log2(input_shapes[f].stride)) for f in in_features]\n        self.in_features = in_features\n        out_features = ['dlaup{}'.format(l) for l in in_levels]\n        self._out_features = out_features\n        self._out_feature_channels = {\n            'dlaup{}'.format(l): in_channels[i] for i, l in enumerate(in_levels)}\n        self._out_feature_strides = {\n            'dlaup{}'.format(l): 2 ** l for l in in_levels}\n\n        print('self._out_features', self._out_features)\n        print('self._out_feature_channels', self._out_feature_channels)\n        print('self._out_feature_strides', self._out_feature_strides)\n        self._size_divisibility = 32\n\n        node_type = DLAUP_NODE_MAP[dlaup_node]\n\n        self.startp = int(math.log2(in_strides[0]))\n        self.channels = in_channels\n        channels = list(in_channels)\n        
scales = np.array([2 ** i for i in range(len(out_features))], dtype=int)\n        for i in range(len(channels) - 1):\n            j = -i - 2\n            setattr(self, 'ida_{}'.format(i),\n                    IDAUp(channels[j], in_channels[j:],\n                          scales[j:] // scales[j],\n                          norm=norm,\n                          node_type=node_type))\n            scales[j + 1:] = scales[j]\n            in_channels[j + 1:] = [channels[j] for _ in channels[j + 1:]]\n\n    @property\n    def size_divisibility(self):\n        return self._size_divisibility\n\n    def forward(self, x):\n        bottom_up_features = self.bottom_up(x)\n        layers = [bottom_up_features[f] for f in self.in_features]\n        out = [layers[-1]] # start with 32\n        for i in range(len(layers) - 1):\n            ida = getattr(self, 'ida_{}'.format(i))\n            ida(layers, len(layers) - i - 2, len(layers))\n            out.insert(0, layers[-1])\n        ret = {}\n        for k, v in zip(self._out_features, out):\n            ret[k] = v\n        # import pdb; pdb.set_trace()\n        return ret\n\n\ndef dla34(cfg, pretrained=None):  # DLA-34\n    model = DLA(cfg, [1, 1, 1, 2, 2, 1],\n                [16, 32, 64, 128, 256, 512],\n                block=BasicBlock)\n    return model\n\n\nclass LastLevelP6P7(nn.Module):\n    \"\"\"\n    This module is used in RetinaNet to generate extra layers, P6 and P7 from\n    C5 feature.\n    \"\"\"\n\n    def __init__(self, in_channels, out_channels):\n        super().__init__()\n        self.num_levels = 2\n        self.in_feature = \"dla5\"\n        self.p6 = nn.Conv2d(in_channels, out_channels, 3, 2, 1)\n        self.p7 = nn.Conv2d(out_channels, out_channels, 3, 2, 1)\n        for module in [self.p6, self.p7]:\n            weight_init.c2_xavier_fill(module)\n\n    def forward(self, c5):\n        p6 = self.p6(c5)\n        p7 = self.p7(F.relu(p6))\n        return [p6, p7]\n\n\n@BACKBONE_REGISTRY.register()\ndef 
build_dla_fpn3_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n\n    depth_to_creator = {\"dla34\": dla34}\n    bottom_up = depth_to_creator['dla{}'.format(cfg.MODEL.DLA.NUM_LAYERS)](cfg)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.FPN.OUT_CHANNELS\n\n    backbone = FPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        norm=cfg.MODEL.FPN.NORM,\n        top_block=None,\n        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,\n    )\n\n    return backbone\n\n@BACKBONE_REGISTRY.register()\ndef build_dla_fpn5_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n\n    depth_to_creator = {\"dla34\": dla34}\n    bottom_up = depth_to_creator['dla{}'.format(cfg.MODEL.DLA.NUM_LAYERS)](cfg)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.FPN.OUT_CHANNELS\n    in_channels_top = bottom_up.output_shape()['dla5'].channels\n\n    backbone = FPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        norm=cfg.MODEL.FPN.NORM,\n        top_block=LastLevelP6P7(in_channels_top, out_channels),\n        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,\n    )\n\n    return backbone\n\n\n@BACKBONE_REGISTRY.register()\ndef build_dlaup_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n\n    depth_to_creator = {\"dla34\": dla34}\n    bottom_up = depth_to_creator['dla{}'.format(cfg.MODEL.DLA.NUM_LAYERS)](cfg)\n\n    backbone = DLAUP(\n        bottom_up=bottom_up,\n        
in_features=cfg.MODEL.DLA.DLAUP_IN_FEATURES,\n        norm=cfg.MODEL.DLA.NORM,\n        dlaup_node=cfg.MODEL.DLA.DLAUP_NODE,\n    )\n\n    return backbone\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/backbone/fpn_p5.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nimport math\nimport fvcore.nn.weight_init as weight_init\nimport torch.nn.functional as F\nfrom torch import nn\n\nfrom detectron2.layers import Conv2d, ShapeSpec, get_norm\n\nfrom detectron2.modeling.backbone import Backbone\nfrom detectron2.modeling.backbone.fpn import FPN \nfrom detectron2.modeling.backbone.build import BACKBONE_REGISTRY\nfrom detectron2.modeling.backbone.resnet import build_resnet_backbone\n\n\nclass LastLevelP6P7_P5(nn.Module):\n    \"\"\"\n    This module is used in RetinaNet to generate extra layers, P6 and P7 from\n    C5 feature.\n    \"\"\"\n\n    def __init__(self, in_channels, out_channels):\n        super().__init__()\n        self.num_levels = 2\n        self.in_feature = \"p5\"\n        self.p6 = nn.Conv2d(in_channels, out_channels, 3, 2, 1)\n        self.p7 = nn.Conv2d(out_channels, out_channels, 3, 2, 1)\n        for module in [self.p6, self.p7]:\n            weight_init.c2_xavier_fill(module)\n\n    def forward(self, c5):\n        p6 = self.p6(c5)\n        p7 = self.p7(F.relu(p6))\n        return [p6, p7]\n\n\n@BACKBONE_REGISTRY.register()\ndef build_p67_resnet_fpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    bottom_up = build_resnet_backbone(cfg, input_shape)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.FPN.OUT_CHANNELS\n    backbone = FPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        norm=cfg.MODEL.FPN.NORM,\n        top_block=LastLevelP6P7_P5(out_channels, out_channels),\n        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,\n    )\n    return backbone\n\n@BACKBONE_REGISTRY.register()\ndef build_p35_resnet_fpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a 
detectron2 CfgNode\n\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    bottom_up = build_resnet_backbone(cfg, input_shape)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.FPN.OUT_CHANNELS\n    backbone = FPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        norm=cfg.MODEL.FPN.NORM,\n        top_block=None,\n        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,\n    )\n    return backbone\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/backbone/res2net.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n# This file is modified from https://github.com/Res2Net/Res2Net-detectron2/blob/master/detectron2/modeling/backbone/resnet.py\n# The original file is under Apache-2.0 License\nimport numpy as np\nimport fvcore.nn.weight_init as weight_init\nimport torch\nimport torch.nn.functional as F\nfrom torch import nn\n\nfrom detectron2.layers import (\n    CNNBlockBase,\n    Conv2d,\n    DeformConv,\n    ModulatedDeformConv,\n    ShapeSpec,\n    get_norm,\n)\n\nfrom detectron2.modeling.backbone import Backbone\nfrom detectron2.modeling.backbone.fpn import FPN \nfrom detectron2.modeling.backbone.build import BACKBONE_REGISTRY\nfrom .fpn_p5 import LastLevelP6P7_P5\nfrom .bifpn import BiFPN\n\n__all__ = [\n    \"ResNetBlockBase\",\n    \"BasicBlock\",\n    \"BottleneckBlock\",\n    \"DeformBottleneckBlock\",\n    \"BasicStem\",\n    \"ResNet\",\n    \"make_stage\",\n    \"build_res2net_backbone\",\n]\n\n\nResNetBlockBase = CNNBlockBase\n\"\"\"\nAlias for backward compatibiltiy.\n\"\"\"\n\n\nclass BasicBlock(CNNBlockBase):\n    \"\"\"\n    The basic residual block for ResNet-18 and ResNet-34, with two 3x3 conv layers\n    and a projection shortcut if needed.\n    \"\"\"\n\n    def __init__(self, in_channels, out_channels, *, stride=1, norm=\"BN\"):\n        \"\"\"\n        Args:\n            in_channels (int): Number of input channels.\n            out_channels (int): Number of output channels.\n            stride (int): Stride for the first conv.\n            norm (str or callable): normalization for all conv layers.\n                See :func:`layers.get_norm` for supported format.\n        \"\"\"\n        super().__init__(in_channels, out_channels, stride)\n\n        if in_channels != out_channels:\n            self.shortcut = Conv2d(\n                in_channels,\n                out_channels,\n                kernel_size=1,\n                stride=stride,\n                bias=False,\n   
             norm=get_norm(norm, out_channels),\n            )\n        else:\n            self.shortcut = None\n\n        self.conv1 = Conv2d(\n            in_channels,\n            out_channels,\n            kernel_size=3,\n            stride=stride,\n            padding=1,\n            bias=False,\n            norm=get_norm(norm, out_channels),\n        )\n\n        self.conv2 = Conv2d(\n            out_channels,\n            out_channels,\n            kernel_size=3,\n            stride=1,\n            padding=1,\n            bias=False,\n            norm=get_norm(norm, out_channels),\n        )\n\n        for layer in [self.conv1, self.conv2, self.shortcut]:\n            if layer is not None:  # shortcut can be None\n                weight_init.c2_msra_fill(layer)\n\n    def forward(self, x):\n        out = self.conv1(x)\n        out = F.relu_(out)\n        out = self.conv2(out)\n\n        if self.shortcut is not None:\n            shortcut = self.shortcut(x)\n        else:\n            shortcut = x\n\n        out += shortcut\n        out = F.relu_(out)\n        return out\n\n\nclass BottleneckBlock(CNNBlockBase):\n    \"\"\"\n    The standard bottle2neck residual block used by Res2Net-50, 101 and 152.\n    \"\"\"\n\n    def __init__(\n        self,\n        in_channels,\n        out_channels,\n        *,\n        bottleneck_channels,\n        stride=1,\n        num_groups=1,\n        norm=\"BN\",\n        stride_in_1x1=False,\n        dilation=1,\n        basewidth=26, \n        scale=4,\n    ):\n        \"\"\"\n        Args:\n            bottleneck_channels (int): number of output channels for the 3x3\n                \"bottleneck\" conv layers.\n            num_groups (int): number of groups for the 3x3 conv layer.\n            norm (str or callable): normalization for all conv layers.\n                See :func:`layers.get_norm` for supported format.\n            stride_in_1x1 (bool): when stride>1, whether to put stride in the\n                first 1x1 
convolution or the bottleneck 3x3 convolution.\n            dilation (int): the dilation rate of the 3x3 conv layer.\n        \"\"\"\n        super().__init__(in_channels, out_channels, stride)\n\n        if in_channels != out_channels:\n            self.shortcut = nn.Sequential(\n                nn.AvgPool2d(kernel_size=stride, stride=stride, \n                    ceil_mode=True, count_include_pad=False),\n                Conv2d(\n                    in_channels,\n                    out_channels,\n                    kernel_size=1,\n                    stride=1,\n                    bias=False,\n                    norm=get_norm(norm, out_channels),\n                )\n            )\n        else:\n            self.shortcut = None\n\n        # The original MSRA ResNet models have stride in the first 1x1 conv\n        # The subsequent fb.torch.resnet and Caffe2 ResNe[X]t implementations have\n        # stride in the 3x3 conv\n        stride_1x1, stride_3x3 = (stride, 1) if stride_in_1x1 else (1, stride)\n        width = bottleneck_channels//scale\n\n        self.conv1 = Conv2d(\n            in_channels,\n            bottleneck_channels,\n            kernel_size=1,\n            stride=stride_1x1,\n            bias=False,\n            norm=get_norm(norm, bottleneck_channels),\n        )\n        if scale == 1:\n          self.nums = 1\n        else:\n          self.nums = scale -1\n        if self.in_channels!=self.out_channels and stride_3x3!=2:\n            self.pool = nn.AvgPool2d(kernel_size=3, stride = stride_3x3, padding=1)\n\n        convs = []\n        bns = []\n        for i in range(self.nums):\n            convs.append(nn.Conv2d(\n                            width, \n                            width, \n                            kernel_size=3, \n                            stride=stride_3x3, \n                            padding=1 * dilation, \n                            bias=False,\n                            groups=num_groups,\n                      
      dilation=dilation,\n                            ))\n            bns.append(get_norm(norm, width))\n        self.convs = nn.ModuleList(convs)\n        self.bns = nn.ModuleList(bns)\n\n        self.conv3 = Conv2d(\n            bottleneck_channels,\n            out_channels,\n            kernel_size=1,\n            bias=False,\n            norm=get_norm(norm, out_channels),\n        )\n        self.scale = scale\n        self.width = width\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.stride_3x3 = stride_3x3\n        for layer in [self.conv1, self.conv3]:\n            if layer is not None:  # shortcut can be None\n                weight_init.c2_msra_fill(layer)\n        if self.shortcut is not None:\n            for layer in self.shortcut.modules():\n                if isinstance(layer, Conv2d):\n                    weight_init.c2_msra_fill(layer)\n                \n        for layer in self.convs:\n            if layer is not None:  # shortcut can be None\n                weight_init.c2_msra_fill(layer)\n\n        # Zero-initialize the last normalization in each residual branch,\n        # so that at the beginning, the residual branch starts with zeros,\n        # and each residual block behaves like an identity.\n        # See Sec 5.1 in \"Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour\":\n        # \"For BN layers, the learnable scaling coefficient γ is initialized\n        # to be 1, except for each residual block's last BN\n        # where γ is initialized to be 0.\"\n\n        # nn.init.constant_(self.conv3.norm.weight, 0)\n        # TODO this somehow hurts performance when training GN models from scratch.\n        # Add it as an option when we need to use this code to train a backbone.\n\n    def forward(self, x):\n        out = self.conv1(x)\n        out = F.relu_(out)\n\n        spx = torch.split(out, self.width, 1)\n        for i in range(self.nums):\n            if i==0 or 
self.in_channels!=self.out_channels:\n                sp = spx[i]\n            else:\n                sp = sp + spx[i]\n            sp = self.convs[i](sp)\n            sp = F.relu_(self.bns[i](sp))\n            if i==0:\n                out = sp\n            else:\n                out = torch.cat((out, sp), 1)\n        if self.scale!=1 and self.stride_3x3==1:\n            out = torch.cat((out, spx[self.nums]), 1)\n        elif self.scale != 1 and self.stride_3x3==2:\n            out = torch.cat((out, self.pool(spx[self.nums])), 1)\n\n        out = self.conv3(out)\n\n        if self.shortcut is not None:\n            shortcut = self.shortcut(x)\n        else:\n            shortcut = x\n\n        out += shortcut\n        out = F.relu_(out)\n        return out\n\n\nclass DeformBottleneckBlock(ResNetBlockBase):\n    \"\"\"\n    Not implemented for res2net yet.\n    Similar to :class:`BottleneckBlock`, but with deformable conv in the 3x3 convolution.\n    \"\"\"\n\n    def __init__(\n        self,\n        in_channels,\n        out_channels,\n        *,\n        bottleneck_channels,\n        stride=1,\n        num_groups=1,\n        norm=\"BN\",\n        stride_in_1x1=False,\n        dilation=1,\n        deform_modulated=False,\n        deform_num_groups=1,\n        basewidth=26, \n        scale=4,\n    ):\n        super().__init__(in_channels, out_channels, stride)\n        self.deform_modulated = deform_modulated\n\n        if in_channels != out_channels:\n            # self.shortcut = Conv2d(\n            #     in_channels,\n            #     out_channels,\n            #     kernel_size=1,\n            #     stride=stride,\n            #     bias=False,\n            #     norm=get_norm(norm, out_channels),\n            # )\n            self.shortcut = nn.Sequential(\n                nn.AvgPool2d(kernel_size=stride, stride=stride, \n                    ceil_mode=True, count_include_pad=False),\n                Conv2d(\n                    in_channels,\n                
    out_channels,\n                    kernel_size=1,\n                    stride=1,\n                    bias=False,\n                    norm=get_norm(norm, out_channels),\n                )\n            )\n        else:\n            self.shortcut = None\n\n        stride_1x1, stride_3x3 = (stride, 1) if stride_in_1x1 else (1, stride)\n        width = bottleneck_channels//scale\n\n        self.conv1 = Conv2d(\n            in_channels,\n            bottleneck_channels,\n            kernel_size=1,\n            stride=stride_1x1,\n            bias=False,\n            norm=get_norm(norm, bottleneck_channels),\n        )\n\n        if scale == 1:\n          self.nums = 1\n        else:\n          self.nums = scale -1\n        if self.in_channels!=self.out_channels and stride_3x3!=2:\n            self.pool = nn.AvgPool2d(kernel_size=3, stride = stride_3x3, padding=1)\n\n        if deform_modulated:\n            deform_conv_op = ModulatedDeformConv\n            # offset channels are 2 or 3 (if with modulated) * kernel_size * kernel_size\n            offset_channels = 27\n        else:\n            deform_conv_op = DeformConv\n            offset_channels = 18\n\n        # self.conv2_offset = Conv2d(\n        #     bottleneck_channels,\n        #     offset_channels * deform_num_groups,\n        #     kernel_size=3,\n        #     stride=stride_3x3,\n        #     padding=1 * dilation,\n        #     dilation=dilation,\n        # )\n        # self.conv2 = deform_conv_op(\n        #     bottleneck_channels,\n        #     bottleneck_channels,\n        #     kernel_size=3,\n        #     stride=stride_3x3,\n        #     padding=1 * dilation,\n        #     bias=False,\n        #     groups=num_groups,\n        #     dilation=dilation,\n        #     deformable_groups=deform_num_groups,\n        #     norm=get_norm(norm, bottleneck_channels),\n        # )\n\n        conv2_offsets = []\n        convs = []\n        bns = []\n        for i in range(self.nums):\n            
conv2_offsets.append(Conv2d(\n                            width, \n                            offset_channels * deform_num_groups, \n                            kernel_size=3, \n                            stride=stride_3x3, \n                            padding=1 * dilation, \n                            bias=False,\n                            groups=num_groups,\n                            dilation=dilation,\n                            ))\n            convs.append(deform_conv_op(\n                            width, \n                            width, \n                            kernel_size=3, \n                            stride=stride_3x3, \n                            padding=1 * dilation, \n                            bias=False,\n                            groups=num_groups,\n                            dilation=dilation,\n                            deformable_groups=deform_num_groups,\n                            ))\n            bns.append(get_norm(norm, width))\n        self.conv2_offsets = nn.ModuleList(conv2_offsets)\n        self.convs = nn.ModuleList(convs)\n        self.bns = nn.ModuleList(bns)\n\n        self.conv3 = Conv2d(\n            bottleneck_channels,\n            out_channels,\n            kernel_size=1,\n            bias=False,\n            norm=get_norm(norm, out_channels),\n        )\n        self.scale = scale\n        self.width = width\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.stride_3x3 = stride_3x3\n        # for layer in [self.conv1, self.conv2, self.conv3, self.shortcut]:\n        #     if layer is not None:  # shortcut can be None\n        #         weight_init.c2_msra_fill(layer)\n\n        # nn.init.constant_(self.conv2_offset.weight, 0)\n        # nn.init.constant_(self.conv2_offset.bias, 0)\n        for layer in [self.conv1, self.conv3]:\n            if layer is not None:  # shortcut can be None\n                weight_init.c2_msra_fill(layer)\n        if 
self.shortcut is not None:\n            for layer in self.shortcut.modules():\n                if isinstance(layer, Conv2d):\n                    weight_init.c2_msra_fill(layer)\n                \n        for layer in self.convs:\n            if layer is not None:  # shortcut can be None\n                weight_init.c2_msra_fill(layer)\n\n        for layer in self.conv2_offsets:\n            if layer.weight is not None:\n                nn.init.constant_(layer.weight, 0)\n            if layer.bias is not None:\n                nn.init.constant_(layer.bias, 0)\n\n    def forward(self, x):\n        out = self.conv1(x)\n        out = F.relu_(out)\n\n        # if self.deform_modulated:\n        #     offset_mask = self.conv2_offset(out)\n        #     offset_x, offset_y, mask = torch.chunk(offset_mask, 3, dim=1)\n        #     offset = torch.cat((offset_x, offset_y), dim=1)\n        #     mask = mask.sigmoid()\n        #     out = self.conv2(out, offset, mask)\n        # else:\n        #     offset = self.conv2_offset(out)\n        #     out = self.conv2(out, offset)\n        # out = F.relu_(out)\n\n        spx = torch.split(out, self.width, 1)\n        for i in range(self.nums):\n            if i==0 or self.in_channels!=self.out_channels:\n                sp = spx[i].contiguous()\n            else:\n                sp = sp + spx[i].contiguous()\n            \n            # sp = self.convs[i](sp)\n            if self.deform_modulated:\n                offset_mask = self.conv2_offsets[i](sp)\n                offset_x, offset_y, mask = torch.chunk(offset_mask, 3, dim=1)\n                offset = torch.cat((offset_x, offset_y), dim=1)\n                mask = mask.sigmoid()\n                sp = self.convs[i](sp, offset, mask)\n            else:\n                offset = self.conv2_offsets[i](sp)\n                sp = self.convs[i](sp, offset)\n            sp = F.relu_(self.bns[i](sp))\n            if i==0:\n                out = sp\n            else:\n                out 
= torch.cat((out, sp), 1)\n        if self.scale!=1 and self.stride_3x3==1:\n            out = torch.cat((out, spx[self.nums]), 1)\n        elif self.scale != 1 and self.stride_3x3==2:\n            out = torch.cat((out, self.pool(spx[self.nums])), 1)\n\n        out = self.conv3(out)\n\n        if self.shortcut is not None:\n            shortcut = self.shortcut(x)\n        else:\n            shortcut = x\n\n        out += shortcut\n        out = F.relu_(out)\n        return out\n\n\ndef make_stage(block_class, num_blocks, first_stride, *, in_channels, out_channels, **kwargs):\n    \"\"\"\n    Create a list of blocks just like those in a ResNet stage.\n    Args:\n        block_class (type): a subclass of CNNBlockBase\n        num_blocks (int): number of blocks in the stage.\n        first_stride (int): the stride of the first block. The other blocks will have stride=1.\n        in_channels (int): input channels of the entire stage.\n        out_channels (int): output channels of **every block** in the stage.\n        kwargs: other arguments passed to the constructor of every block.\n    Returns:\n        list[nn.Module]: a list of block modules.\n    \"\"\"\n    assert \"stride\" not in kwargs, \"Stride of blocks in make_stage cannot be changed.\"\n    blocks = []\n    for i in range(num_blocks):\n        blocks.append(\n            block_class(\n                in_channels=in_channels,\n                out_channels=out_channels,\n                stride=first_stride if i == 0 else 1,\n                **kwargs,\n            )\n        )\n        in_channels = out_channels\n    return blocks\n\n\nclass BasicStem(CNNBlockBase):\n    \"\"\"\n    A deep ResNet stem (layers before the first residual block): three 3x3 convs\n    instead of a single 7x7 conv, followed by max pooling.\n    \"\"\"\n\n    def __init__(self, in_channels=3, out_channels=64, norm=\"BN\"):\n        \"\"\"\n        Args:\n            norm (str or callable): norm after the first conv layer.\n                See :func:`layers.get_norm` for supported format.\n        \"\"\"\n        
super().__init__(in_channels, out_channels, 4)\n        self.in_channels = in_channels\n        self.conv1 = nn.Sequential(\n            Conv2d(\n                in_channels,\n                32,\n                kernel_size=3,\n                stride=2,\n                padding=1,\n                bias=False,\n                ),\n            get_norm(norm, 32),\n            nn.ReLU(inplace=True),\n            Conv2d(\n                32,\n                32,\n                kernel_size=3,\n                stride=1,\n                padding=1,\n                bias=False,\n                ),\n            get_norm(norm, 32),\n            nn.ReLU(inplace=True),\n            Conv2d(\n                32,\n                out_channels,\n                kernel_size=3,\n                stride=1,\n                padding=1,\n                bias=False,\n                ),\n        )\n        self.bn1 = get_norm(norm, out_channels)\n\n        for layer in self.conv1:\n            if isinstance(layer, Conv2d):\n                weight_init.c2_msra_fill(layer)\n\n    def forward(self, x):\n        x = self.conv1(x)\n        x = self.bn1(x)\n        x = F.relu_(x)\n        x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)\n        return x\n\n\nclass ResNet(Backbone):\n    def __init__(self, stem, stages, num_classes=None, out_features=None):\n        \"\"\"\n        Args:\n            stem (nn.Module): a stem module\n            stages (list[list[CNNBlockBase]]): several (typically 4) stages,\n                each contains multiple :class:`CNNBlockBase`.\n            num_classes (None or int): if None, will not perform classification.\n                Otherwise, will create a linear layer.\n            out_features (list[str]): name of the layers whose outputs should\n                be returned in forward. 
Can be anything in \"stem\", \"linear\", or \"res2\" ...\n                If None, will return the output of the last layer.\n        \"\"\"\n        super(ResNet, self).__init__()\n        self.stem = stem\n        self.num_classes = num_classes\n\n        current_stride = self.stem.stride\n        self._out_feature_strides = {\"stem\": current_stride}\n        self._out_feature_channels = {\"stem\": self.stem.out_channels}\n\n        self.stages_and_names = []\n        for i, blocks in enumerate(stages):\n            assert len(blocks) > 0, len(blocks)\n            for block in blocks:\n                assert isinstance(block, CNNBlockBase), block\n\n            name = \"res\" + str(i + 2)\n            stage = nn.Sequential(*blocks)\n\n            self.add_module(name, stage)\n            self.stages_and_names.append((stage, name))\n\n            self._out_feature_strides[name] = current_stride = int(\n                current_stride * np.prod([k.stride for k in blocks])\n            )\n            self._out_feature_channels[name] = curr_channels = blocks[-1].out_channels\n\n        if num_classes is not None:\n            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))\n            self.linear = nn.Linear(curr_channels, num_classes)\n\n            # Sec 5.1 in \"Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour\":\n            # \"The 1000-way fully-connected layer is initialized by\n            # drawing weights from a zero-mean Gaussian with standard deviation of 0.01.\"\n            nn.init.normal_(self.linear.weight, std=0.01)\n            name = \"linear\"\n\n        if out_features is None:\n            out_features = [name]\n        self._out_features = out_features\n        assert len(self._out_features)\n        children = [x[0] for x in self.named_children()]\n        for out_feature in self._out_features:\n            assert out_feature in children, \"Available children: {}\".format(\", \".join(children))\n\n    def forward(self, x):\n        
outputs = {}\n        x = self.stem(x)\n        if \"stem\" in self._out_features:\n            outputs[\"stem\"] = x\n        for stage, name in self.stages_and_names:\n            x = stage(x)\n            if name in self._out_features:\n                outputs[name] = x\n        if self.num_classes is not None:\n            x = self.avgpool(x)\n            x = torch.flatten(x, 1)\n            x = self.linear(x)\n            if \"linear\" in self._out_features:\n                outputs[\"linear\"] = x\n        return outputs\n\n    def output_shape(self):\n        return {\n            name: ShapeSpec(\n                channels=self._out_feature_channels[name], stride=self._out_feature_strides[name]\n            )\n            for name in self._out_features\n        }\n\n    def freeze(self, freeze_at=0):\n        \"\"\"\n        Freeze the first several stages of the ResNet. Commonly used in\n        fine-tuning.\n        Args:\n            freeze_at (int): number of stem and stages to freeze.\n                `1` means freezing the stem. 
`2` means freezing the stem and\n                the first stage, etc.\n        Returns:\n            nn.Module: this ResNet itself\n        \"\"\"\n        if freeze_at >= 1:\n            self.stem.freeze()\n        for idx, (stage, _) in enumerate(self.stages_and_names, start=2):\n            if freeze_at >= idx:\n                for block in stage.children():\n                    block.freeze()\n        return self\n\n\n@BACKBONE_REGISTRY.register()\ndef build_res2net_backbone(cfg, input_shape):\n    \"\"\"\n    Create a Res2Net instance from config.\n    Returns:\n        ResNet: a :class:`ResNet` instance.\n    \"\"\"\n    # need registration of new blocks/stems?\n    norm = cfg.MODEL.RESNETS.NORM\n    stem = BasicStem(\n        in_channels=input_shape.channels,\n        out_channels=cfg.MODEL.RESNETS.STEM_OUT_CHANNELS,\n        norm=norm,\n    )\n\n    # fmt: off\n    freeze_at           = cfg.MODEL.BACKBONE.FREEZE_AT\n    out_features        = cfg.MODEL.RESNETS.OUT_FEATURES\n    depth               = cfg.MODEL.RESNETS.DEPTH\n    num_groups          = cfg.MODEL.RESNETS.NUM_GROUPS\n    width_per_group     = cfg.MODEL.RESNETS.WIDTH_PER_GROUP\n    scale              = 4\n    bottleneck_channels = num_groups * width_per_group * scale\n    in_channels         = cfg.MODEL.RESNETS.STEM_OUT_CHANNELS\n    out_channels        = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS\n    stride_in_1x1       = cfg.MODEL.RESNETS.STRIDE_IN_1X1\n    res5_dilation       = cfg.MODEL.RESNETS.RES5_DILATION\n    deform_on_per_stage = cfg.MODEL.RESNETS.DEFORM_ON_PER_STAGE\n    deform_modulated    = cfg.MODEL.RESNETS.DEFORM_MODULATED\n    deform_num_groups   = cfg.MODEL.RESNETS.DEFORM_NUM_GROUPS\n    # fmt: on\n    assert res5_dilation in {1, 2}, \"res5_dilation cannot be {}.\".format(res5_dilation)\n\n    num_blocks_per_stage = {\n        18: [2, 2, 2, 2],\n        34: [3, 4, 6, 3],\n        50: [3, 4, 6, 3],\n        101: [3, 4, 23, 3],\n        152: [3, 8, 36, 3],\n    }[depth]\n\n    if depth in 
[18, 34]:\n        assert out_channels == 64, \"Must set MODEL.RESNETS.RES2_OUT_CHANNELS = 64 for R18/R34\"\n        assert not any(\n            deform_on_per_stage\n        ), \"MODEL.RESNETS.DEFORM_ON_PER_STAGE unsupported for R18/R34\"\n        assert res5_dilation == 1, \"Must set MODEL.RESNETS.RES5_DILATION = 1 for R18/R34\"\n        assert num_groups == 1, \"Must set MODEL.RESNETS.NUM_GROUPS = 1 for R18/R34\"\n\n    stages = []\n\n    # Avoid creating variables without gradients\n    # It consumes extra memory and may cause allreduce to fail\n    out_stage_idx = [{\"res2\": 2, \"res3\": 3, \"res4\": 4, \"res5\": 5}[f] for f in out_features]\n    max_stage_idx = max(out_stage_idx)\n    for idx, stage_idx in enumerate(range(2, max_stage_idx + 1)):\n        dilation = res5_dilation if stage_idx == 5 else 1\n        first_stride = 1 if idx == 0 or (stage_idx == 5 and dilation == 2) else 2\n        stage_kargs = {\n            \"num_blocks\": num_blocks_per_stage[idx],\n            \"first_stride\": first_stride,\n            \"in_channels\": in_channels,\n            \"out_channels\": out_channels,\n            \"norm\": norm,\n        }\n        # Use BasicBlock for R18 and R34.\n        if depth in [18, 34]:\n            stage_kargs[\"block_class\"] = BasicBlock\n        else:\n            stage_kargs[\"bottleneck_channels\"] = bottleneck_channels\n            stage_kargs[\"stride_in_1x1\"] = stride_in_1x1\n            stage_kargs[\"dilation\"] = dilation\n            stage_kargs[\"num_groups\"] = num_groups\n            stage_kargs[\"scale\"] = scale\n\n            if deform_on_per_stage[idx]:\n                stage_kargs[\"block_class\"] = DeformBottleneckBlock\n                stage_kargs[\"deform_modulated\"] = deform_modulated\n                stage_kargs[\"deform_num_groups\"] = deform_num_groups\n            else:\n                stage_kargs[\"block_class\"] = BottleneckBlock\n        blocks = make_stage(**stage_kargs)\n        in_channels = 
out_channels\n        out_channels *= 2\n        bottleneck_channels *= 2\n        stages.append(blocks)\n    return ResNet(stem, stages, out_features=out_features).freeze(freeze_at)\n\n\n@BACKBONE_REGISTRY.register()\ndef build_p67_res2net_fpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    bottom_up = build_res2net_backbone(cfg, input_shape)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.FPN.OUT_CHANNELS\n    backbone = FPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        norm=cfg.MODEL.FPN.NORM,\n        top_block=LastLevelP6P7_P5(out_channels, out_channels),\n        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,\n    )\n    return backbone\n\n\n@BACKBONE_REGISTRY.register()\ndef build_res2net_bifpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    bottom_up = build_res2net_backbone(cfg, input_shape)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    backbone = BiFPN(\n        cfg=cfg,\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=cfg.MODEL.BIFPN.OUT_CHANNELS,\n        norm=cfg.MODEL.BIFPN.NORM,\n        num_levels=cfg.MODEL.BIFPN.NUM_LEVELS,\n        num_bifpn=cfg.MODEL.BIFPN.NUM_BIFPN,\n        separable_conv=cfg.MODEL.BIFPN.SEPARABLE_CONV,\n    )\n    return backbone\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/debug.py",
    "content": "import cv2\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\n\nCOLORS = ((np.random.rand(1300, 3) * 0.4 + 0.6) * 255).astype(\n  np.uint8).reshape(1300, 1, 1, 3)\n\ndef _get_color_image(heatmap):\n  heatmap = heatmap.reshape(\n    heatmap.shape[0], heatmap.shape[1], heatmap.shape[2], 1)\n  if heatmap.shape[0] == 1:\n      color_map = (heatmap * np.ones((1, 1, 1, 3), np.uint8) * 255).max(\n          axis=0).astype(np.uint8) # H, W, 3\n  else:\n      color_map = (heatmap * COLORS[:heatmap.shape[0]]).max(axis=0).astype(np.uint8) # H, W, 3\n\n  return color_map\n\ndef _blend_image(image, color_map, a=0.7):\n  color_map = cv2.resize(color_map, (image.shape[1], image.shape[0]))\n  ret = np.clip(image * (1 - a) + color_map * a, 0, 255).astype(np.uint8)\n  return ret\n\ndef _blend_image_heatmaps(image, color_maps, a=0.7):\n    merges = np.zeros((image.shape[0], image.shape[1], 3), np.float32)\n    for color_map in color_maps:\n        color_map = cv2.resize(color_map, (image.shape[1], image.shape[0]))\n        merges = np.maximum(merges, color_map)\n    ret = np.clip(image * (1 - a) + merges * a, 0, 255).astype(np.uint8)\n    return ret\n\ndef _decompose_level(x, shapes_per_level, N):\n    '''\n    x: LNHiWi x C\n    '''\n    x = x.view(x.shape[0], -1)\n    ret = []\n    st = 0\n    for l in range(len(shapes_per_level)):\n        ret.append([])\n        h = shapes_per_level[l][0].int().item()\n        w = shapes_per_level[l][1].int().item()\n        for i in range(N):\n            ret[l].append(x[st + h * w * i:st + h * w * (i + 1)].view(\n                h, w, -1).permute(2, 0, 1))\n        st += h * w * N\n    return ret\n\ndef _imagelist_to_tensor(images):\n    images = [x for x in images]\n    image_sizes = [x.shape[-2:] for x in images]\n    h = max([size[0] for size in image_sizes])\n    w = max([size[1] for size in image_sizes])\n    S = 32\n    h, w = ((h - 1) // S + 1) * S, ((w - 1) // S + 1) * S\n    images = [F.pad(x, (0, w - 
x.shape[2], 0, h - x.shape[1], 0, 0)) \\\n        for x in images]\n    images = torch.stack(images)\n    return images\n\n\ndef _ind2il(ind, shapes_per_level, N):\n    r = ind\n    l = 0\n    S = 0\n    while r - S >= N * shapes_per_level[l][0] * shapes_per_level[l][1]:\n        S += N * shapes_per_level[l][0] * shapes_per_level[l][1]\n        l += 1\n    i = (r - S) // (shapes_per_level[l][0] * shapes_per_level[l][1])\n    return i, l\n\ndef debug_train(\n    images, gt_instances, flattened_hms, reg_targets, labels, pos_inds,\n    shapes_per_level, locations, strides):\n    '''\n    images: N x 3 x H x W\n    flattened_hms: LNHiWi x C\n    shapes_per_level: L x 2 [(H_i, W_i)]\n    locations: LNHiWi x 2\n    '''\n    reg_inds = torch.nonzero(\n        reg_targets.max(dim=1)[0] > 0).squeeze(1)\n    N = len(images)\n    images = _imagelist_to_tensor(images)\n    repeated_locations = [torch.cat([loc] * N, dim=0) \\\n        for loc in locations]\n    locations = torch.cat(repeated_locations, dim=0)\n    gt_hms = _decompose_level(flattened_hms, shapes_per_level, N)\n    masks = flattened_hms.new_zeros((flattened_hms.shape[0], 1))\n    masks[pos_inds] = 1\n    masks = _decompose_level(masks, shapes_per_level, N)\n    for i in range(len(images)):\n        image = images[i].detach().cpu().numpy().transpose(1, 2, 0)\n        color_maps = []\n        for l in range(len(gt_hms)):\n            color_map = _get_color_image(\n                gt_hms[l][i].detach().cpu().numpy())\n            color_maps.append(color_map)\n            cv2.imshow('gthm_{}'.format(l), color_map)\n        blend = _blend_image_heatmaps(image.copy(), color_maps)\n        if gt_instances is not None:\n            bboxes = gt_instances[i].gt_boxes.tensor\n            for j in range(len(bboxes)):\n                bbox = bboxes[j]\n                cv2.rectangle(\n                    blend, \n                    (int(bbox[0]), int(bbox[1])),\n                    (int(bbox[2]), int(bbox[3])),\n              
      (0, 0, 255), 3, cv2.LINE_AA)\n    \n        for j in range(len(pos_inds)):\n            image_id, l = _ind2il(pos_inds[j], shapes_per_level, N)\n            if image_id != i:\n                continue\n            loc = locations[pos_inds[j]]\n            cv2.drawMarker(\n                blend, (int(loc[0]), int(loc[1])), (0, 255, 255),\n                markerSize=(l + 1) * 16)\n        \n        for j in range(len(reg_inds)):\n            image_id, l = _ind2il(reg_inds[j], shapes_per_level, N)\n            if image_id != i:\n                continue\n            # Scale into a new tensor; an in-place multiply would modify reg_targets.\n            ltrb = reg_targets[reg_inds[j]] * strides[l]\n            loc = locations[reg_inds[j]]\n            bbox = [(loc[0] - ltrb[0]), (loc[1] - ltrb[1]),\n                    (loc[0] + ltrb[2]), (loc[1] + ltrb[3])]\n            cv2.rectangle(\n                blend, \n                (int(bbox[0]), int(bbox[1])),\n                (int(bbox[2]), int(bbox[3])),\n                (255, 0, 0), 1, cv2.LINE_AA)  \n            cv2.circle(blend, (int(loc[0]), int(loc[1])), 2, (255, 0, 0), -1)\n\n        cv2.imshow('blend', blend)\n        cv2.waitKey()\n\n\ndef debug_test(\n    images, logits_pred, reg_pred, agn_hm_pred=[], preds=[], \n    vis_thresh=0.3, debug_show_name=False, mult_agn=False):\n    '''\n    images: N x 3 x H x W\n    logits_pred: list of L per-level class heatmaps, each N x C x H_l x W_l\n        (entries may be None)\n    agn_hm_pred: list of L per-level class-agnostic heatmaps, each\n        N x 1 x H_l x W_l (entries may be None)\n    '''\n    N = len(images)\n    for i in range(len(images)):\n        image = images[i].detach().cpu().numpy().transpose(1, 2, 0)\n        result = image.copy().astype(np.uint8)\n        pred_image = image.copy().astype(np.uint8)\n        color_maps = []\n        L = len(logits_pred)\n        for l in range(L):\n            if logits_pred[0] is not None:\n                stride = min(image.shape[0], image.shape[1]) / min(\n                    logits_pred[l][i].shape[1], logits_pred[l][i].shape[2])\n            else:\n                stride = min(image.shape[0], 
image.shape[1]) / min(\n                    agn_hm_pred[l][i].shape[1], agn_hm_pred[l][i].shape[2])\n            stride = stride if stride < 60 else 64 if stride < 100 else 128\n            if logits_pred[0] is not None:\n                if mult_agn:\n                    logits_pred[l][i] = logits_pred[l][i] * agn_hm_pred[l][i]\n                color_map = _get_color_image(\n                    logits_pred[l][i].detach().cpu().numpy())\n                color_maps.append(color_map)\n                cv2.imshow('predhm_{}'.format(l), color_map)\n\n            if debug_show_name:\n                from detectron2.data.datasets.lvis_v1_categories import LVIS_CATEGORIES \n                cat2name = [x['name'] for x in LVIS_CATEGORIES]\n            for j in range(len(preds[i].scores) if preds is not None else 0):\n                if preds[i].scores[j] > vis_thresh:\n                    bbox = preds[i].proposal_boxes[j] \\\n                        if preds[i].has('proposal_boxes') else \\\n                        preds[i].pred_boxes[j]\n                    bbox = bbox.tensor[0].detach().cpu().numpy().astype(np.int32)\n                    cat = int(preds[i].pred_classes[j]) \\\n                        if preds[i].has('pred_classes') else 0\n                    cl = COLORS[cat, 0, 0]\n                    cv2.rectangle(\n                        pred_image, (int(bbox[0]), int(bbox[1])), \n                        (int(bbox[2]), int(bbox[3])), \n                        (int(cl[0]), int(cl[1]), int(cl[2])), 2, cv2.LINE_AA)\n                    if debug_show_name:\n                        txt = '{}{:.1f}'.format(\n                            cat2name[cat] if cat > 0 else '', \n                            preds[i].scores[j])\n                        font = cv2.FONT_HERSHEY_SIMPLEX\n                        cat_size = cv2.getTextSize(txt, font, 0.5, 2)[0]\n                        cv2.rectangle(\n                            pred_image,\n                            (int(bbox[0]), 
int(bbox[1] - cat_size[1] - 2)),\n                            (int(bbox[0] + cat_size[0]), int(bbox[1] - 2)), \n                            (int(cl[0]), int(cl[1]), int(cl[2])), -1)\n                        cv2.putText(\n                            pred_image, txt, (int(bbox[0]), int(bbox[1] - 2)), \n                            font, 0.5, (0, 0, 0), thickness=1, lineType=cv2.LINE_AA)\n\n\n            if agn_hm_pred[l] is not None:\n                agn_hm_ = agn_hm_pred[l][i, 0, :, :, None].detach().cpu().numpy()\n                agn_hm_ = (agn_hm_ * np.array([255, 255, 255]).reshape(\n                    1, 1, 3)).astype(np.uint8)\n                cv2.imshow('agn_hm_{}'.format(l), agn_hm_)\n        blend = _blend_image_heatmaps(image.copy(), color_maps)\n        cv2.imshow('blend', blend)\n        cv2.imshow('preds', pred_image)\n        cv2.waitKey()\n\nglobal cnt\ncnt = 0\n\ndef debug_second_stage(images, instances, proposals=None, vis_thresh=0.3, \n    save_debug=False, debug_show_name=False):\n    images = _imagelist_to_tensor(images)\n    if debug_show_name:\n        from detectron2.data.datasets.lvis_v1_categories import LVIS_CATEGORIES\n        cat2name = [x['name'] for x in LVIS_CATEGORIES]\n    for i in range(len(images)):\n        image = images[i].detach().cpu().numpy().transpose(1, 2, 0).astype(np.uint8).copy()\n        if instances[i].has('gt_boxes'):\n            bboxes = instances[i].gt_boxes.tensor.cpu().numpy()\n            scores = np.ones(bboxes.shape[0])\n            cats = instances[i].gt_classes.cpu().numpy()\n        else:\n            bboxes = instances[i].pred_boxes.tensor.cpu().numpy()\n            scores = instances[i].scores.cpu().numpy()\n            cats = instances[i].pred_classes.cpu().numpy()\n        for j in range(len(bboxes)):\n            if scores[j] > vis_thresh:\n                bbox = bboxes[j]\n                cl = COLORS[cats[j], 0, 0]\n                cl = (int(cl[0]), int(cl[1]), int(cl[2]))\n                
cv2.rectangle(\n                    image, \n                    (int(bbox[0]), int(bbox[1])),\n                    (int(bbox[2]), int(bbox[3])),\n                    cl, 2, cv2.LINE_AA)\n                if debug_show_name:\n                    cat = cats[j]\n                    txt = '{}{:.1f}'.format(\n                        cat2name[cat] if cat > 0 else '', \n                        scores[j])\n                    font = cv2.FONT_HERSHEY_SIMPLEX\n                    cat_size = cv2.getTextSize(txt, font, 0.5, 2)[0]\n                    cv2.rectangle(\n                        image,\n                        (int(bbox[0]), int(bbox[1] - cat_size[1] - 2)),\n                        (int(bbox[0] + cat_size[0]), int(bbox[1] - 2)), \n                        (int(cl[0]), int(cl[1]), int(cl[2])), -1)\n                    cv2.putText(\n                        image, txt, (int(bbox[0]), int(bbox[1] - 2)), \n                        font, 0.5, (0, 0, 0), thickness=1, lineType=cv2.LINE_AA)\n        if proposals is not None:\n            proposal_image = images[i].detach().cpu().numpy().transpose(1, 2, 0).astype(np.uint8).copy()\n            bboxes = proposals[i].proposal_boxes.tensor.cpu().numpy()\n            if proposals[i].has('scores'):\n                scores = proposals[i].scores.cpu().numpy()\n            else:\n                scores = proposals[i].objectness_logits.sigmoid().cpu().numpy()\n            for j in range(len(bboxes)):\n                if scores[j] > vis_thresh:\n                    bbox = bboxes[j]\n                    cl = (209, 159, 83)\n                    cv2.rectangle(\n                        proposal_image, \n                        (int(bbox[0]), int(bbox[1])),\n                        (int(bbox[2]), int(bbox[3])),\n                        cl, 2, cv2.LINE_AA)\n                            \n        cv2.imshow('image', image)\n        if proposals is not None:\n            cv2.imshow('proposals', proposal_image)\n            if save_debug:\n         
       global cnt\n                cnt += 1\n                cv2.imwrite('output/save_debug/{}.jpg'.format(cnt), proposal_image)\n        cv2.waitKey()"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/dense_heads/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/dense_heads/centernet.py",
    "content": "\nimport math\nimport json\nimport copy\nfrom typing import List, Dict\nimport numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom detectron2.modeling.proposal_generator.build import PROPOSAL_GENERATOR_REGISTRY\nfrom detectron2.layers import ShapeSpec, cat\nfrom detectron2.structures import Instances, Boxes\nfrom detectron2.modeling import detector_postprocess\nfrom detectron2.utils.comm import get_world_size\nfrom detectron2.config import configurable\n\nfrom ..layers.heatmap_focal_loss import heatmap_focal_loss_jit\nfrom ..layers.heatmap_focal_loss import  binary_heatmap_focal_loss\nfrom ..layers.iou_loss import IOULoss\nfrom ..layers.ml_nms import ml_nms\nfrom ..debug import debug_train, debug_test\nfrom .utils import reduce_sum, _transpose\nfrom .centernet_head import CenterNetHead\n\n__all__ = [\"CenterNet\"]\n\nINF = 100000000\n\n@PROPOSAL_GENERATOR_REGISTRY.register()\nclass CenterNet(nn.Module):\n    @configurable\n    def __init__(self, \n        # input_shape: Dict[str, ShapeSpec],\n        in_channels=256,\n        *,\n        num_classes=80,\n        in_features=(\"p3\", \"p4\", \"p5\", \"p6\", \"p7\"),\n        strides=(8, 16, 32, 64, 128),\n        score_thresh=0.05,\n        hm_min_overlap=0.8,\n        loc_loss_type='giou',\n        min_radius=4,\n        hm_focal_alpha=0.25,\n        hm_focal_beta=4,\n        loss_gamma=2.0,\n        reg_weight=2.0,\n        not_norm_reg=True,\n        with_agn_hm=False,\n        only_proposal=False,\n        as_proposal=False,\n        not_nms=False,\n        pos_weight=1.,\n        neg_weight=1.,\n        sigmoid_clamp=1e-4,\n        ignore_high_fp=-1.,\n        center_nms=False,\n        sizes_of_interest=[[0,80],[64,160],[128,320],[256,640],[512,10000000]],\n        more_pos=False,\n        more_pos_thresh=0.2,\n        more_pos_topk=9,\n        pre_nms_topk_train=1000,\n        pre_nms_topk_test=1000,\n        post_nms_topk_train=100,\n        
post_nms_topk_test=100,\n        nms_thresh_train=0.6,\n        nms_thresh_test=0.6,\n        no_reduce=False,\n        debug=False,\n        vis_thresh=0.5,\n        pixel_mean=[103.530,116.280,123.675],\n        pixel_std=[1.0,1.0,1.0],\n        device='cuda',\n        centernet_head=None,\n    ):\n        super().__init__()\n        self.num_classes = num_classes\n        self.in_features = in_features\n        self.strides = strides\n        self.score_thresh = score_thresh\n        self.min_radius = min_radius\n        self.hm_focal_alpha = hm_focal_alpha\n        self.hm_focal_beta = hm_focal_beta\n        self.loss_gamma = loss_gamma\n        self.reg_weight = reg_weight\n        self.not_norm_reg = not_norm_reg\n        self.with_agn_hm = with_agn_hm\n        self.only_proposal = only_proposal\n        self.as_proposal = as_proposal\n        self.not_nms = not_nms\n        self.pos_weight = pos_weight\n        self.neg_weight = neg_weight\n        self.sigmoid_clamp = sigmoid_clamp\n        self.ignore_high_fp = ignore_high_fp\n        self.center_nms = center_nms\n        self.sizes_of_interest = sizes_of_interest\n        self.more_pos = more_pos\n        self.more_pos_thresh = more_pos_thresh\n        self.more_pos_topk = more_pos_topk\n        self.pre_nms_topk_train = pre_nms_topk_train\n        self.pre_nms_topk_test = pre_nms_topk_test\n        self.post_nms_topk_train = post_nms_topk_train\n        self.post_nms_topk_test = post_nms_topk_test\n        self.nms_thresh_train = nms_thresh_train\n        self.nms_thresh_test = nms_thresh_test\n        self.no_reduce = no_reduce\n        self.debug = debug\n        self.vis_thresh = vis_thresh\n        if self.center_nms:\n            self.not_nms = True\n        self.iou_loss = IOULoss(loc_loss_type)\n        assert (not self.only_proposal) or self.with_agn_hm\n        # delta for rendering heatmap\n        self.delta = (1 - hm_min_overlap) / (1 + hm_min_overlap)\n        if centernet_head is None:\n    
        self.centernet_head = CenterNetHead(\n                in_channels=in_channels,\n                num_levels=len(in_features),\n                with_agn_hm=with_agn_hm,\n                only_proposal=only_proposal)\n        else:\n            self.centernet_head = centernet_head\n        if self.debug:\n            pixel_mean = torch.Tensor(pixel_mean).to(\n                torch.device(device)).view(3, 1, 1)\n            pixel_std = torch.Tensor(pixel_std).to(\n                torch.device(device)).view(3, 1, 1)\n            self.denormalizer = lambda x: x * pixel_std + pixel_mean\n\n    @classmethod\n    def from_config(cls, cfg, input_shape):\n        ret = {\n            # 'input_shape': input_shape,\n            'in_channels': input_shape[\n                cfg.MODEL.CENTERNET.IN_FEATURES[0]].channels,\n            'num_classes': cfg.MODEL.CENTERNET.NUM_CLASSES,\n            'in_features': cfg.MODEL.CENTERNET.IN_FEATURES,\n            'strides': cfg.MODEL.CENTERNET.FPN_STRIDES,\n            'score_thresh': cfg.MODEL.CENTERNET.INFERENCE_TH,\n            'loc_loss_type': cfg.MODEL.CENTERNET.LOC_LOSS_TYPE,\n            'hm_min_overlap': cfg.MODEL.CENTERNET.HM_MIN_OVERLAP,\n            'min_radius': cfg.MODEL.CENTERNET.MIN_RADIUS,\n            'hm_focal_alpha': cfg.MODEL.CENTERNET.HM_FOCAL_ALPHA,\n            'hm_focal_beta': cfg.MODEL.CENTERNET.HM_FOCAL_BETA,\n            'loss_gamma': cfg.MODEL.CENTERNET.LOSS_GAMMA,\n            'reg_weight': cfg.MODEL.CENTERNET.REG_WEIGHT,\n            'not_norm_reg': cfg.MODEL.CENTERNET.NOT_NORM_REG,\n            'with_agn_hm': cfg.MODEL.CENTERNET.WITH_AGN_HM,\n            'only_proposal': cfg.MODEL.CENTERNET.ONLY_PROPOSAL,\n            'as_proposal': cfg.MODEL.CENTERNET.AS_PROPOSAL,\n            'not_nms': cfg.MODEL.CENTERNET.NOT_NMS,\n            'pos_weight': cfg.MODEL.CENTERNET.POS_WEIGHT,\n            'neg_weight': cfg.MODEL.CENTERNET.NEG_WEIGHT,\n            'sigmoid_clamp': cfg.MODEL.CENTERNET.SIGMOID_CLAMP,\n       
     'ignore_high_fp': cfg.MODEL.CENTERNET.IGNORE_HIGH_FP,\n            'center_nms': cfg.MODEL.CENTERNET.CENTER_NMS,\n            'sizes_of_interest': cfg.MODEL.CENTERNET.SOI,\n            'more_pos': cfg.MODEL.CENTERNET.MORE_POS,\n            'more_pos_thresh': cfg.MODEL.CENTERNET.MORE_POS_THRESH,\n            'more_pos_topk': cfg.MODEL.CENTERNET.MORE_POS_TOPK,\n            'pre_nms_topk_train': cfg.MODEL.CENTERNET.PRE_NMS_TOPK_TRAIN,\n            'pre_nms_topk_test': cfg.MODEL.CENTERNET.PRE_NMS_TOPK_TEST,\n            'post_nms_topk_train': cfg.MODEL.CENTERNET.POST_NMS_TOPK_TRAIN,\n            'post_nms_topk_test': cfg.MODEL.CENTERNET.POST_NMS_TOPK_TEST,\n            'nms_thresh_train': cfg.MODEL.CENTERNET.NMS_TH_TRAIN,\n            'nms_thresh_test': cfg.MODEL.CENTERNET.NMS_TH_TEST,\n            'no_reduce': cfg.MODEL.CENTERNET.NO_REDUCE,\n            'debug': cfg.DEBUG,\n            'vis_thresh': cfg.VIS_THRESH,\n            'pixel_mean': cfg.MODEL.PIXEL_MEAN,\n            'pixel_std': cfg.MODEL.PIXEL_STD,\n            'device': cfg.MODEL.DEVICE,\n            'centernet_head': CenterNetHead(\n                cfg, [input_shape[f] for f in cfg.MODEL.CENTERNET.IN_FEATURES]),\n        }\n        return ret\n\n\n    def forward(self, images, features_dict, gt_instances):\n        features = [features_dict[f] for f in self.in_features]\n        clss_per_level, reg_pred_per_level, agn_hm_pred_per_level = \\\n            self.centernet_head(features)\n        grids = self.compute_grids(features)\n        shapes_per_level = grids[0].new_tensor(\n                    [(x.shape[2], x.shape[3]) for x in reg_pred_per_level])\n        \n        if not self.training:\n            return self.inference(\n                images, clss_per_level, reg_pred_per_level, \n                agn_hm_pred_per_level, grids)\n        else:\n            pos_inds, labels, reg_targets, flattened_hms = \\\n                self._get_ground_truth(\n                    grids, shapes_per_level, 
gt_instances)\n            # logits_pred: M x F, reg_pred: M x 4, agn_hm_pred: M\n            logits_pred, reg_pred, agn_hm_pred = self._flatten_outputs(\n                clss_per_level, reg_pred_per_level, agn_hm_pred_per_level)\n\n            if self.more_pos:\n                # add more pixels as positive if \\\n                #   1. they are within the center3x3 region of an object\n                #   2. their regression losses are small (<self.more_pos_thresh)\n                pos_inds, labels = self._add_more_pos(\n                    reg_pred, gt_instances, shapes_per_level)\n            \n            losses = self.losses(\n                pos_inds, labels, reg_targets, flattened_hms,\n                logits_pred, reg_pred, agn_hm_pred)\n            \n            proposals = None\n            if self.only_proposal:\n                agn_hm_pred_per_level = [x.sigmoid() for x in agn_hm_pred_per_level]\n                proposals = self.predict_instances(\n                    grids, agn_hm_pred_per_level, reg_pred_per_level, \n                    images.image_sizes, [None for _ in agn_hm_pred_per_level])\n            elif self.as_proposal: # category specific bbox as agnostic proposals\n                clss_per_level = [x.sigmoid() for x in clss_per_level]\n                proposals = self.predict_instances(\n                    grids, clss_per_level, reg_pred_per_level, \n                    images.image_sizes, agn_hm_pred_per_level)\n            if self.only_proposal or self.as_proposal:\n                for p in range(len(proposals)):\n                    proposals[p].proposal_boxes = proposals[p].get('pred_boxes')\n                    proposals[p].objectness_logits = proposals[p].get('scores')\n                    proposals[p].remove('pred_boxes')\n                    proposals[p].remove('scores')\n                    proposals[p].remove('pred_classes')\n\n            if self.debug:\n                debug_train(\n                    [self.denormalizer(x) 
for x in images], \n                    gt_instances, flattened_hms, reg_targets, \n                    labels, pos_inds, shapes_per_level, grids, self.strides)\n            return proposals, losses\n\n\n    def losses(\n        self, pos_inds, labels, reg_targets, flattened_hms,\n        logits_pred, reg_pred, agn_hm_pred):\n        '''\n        Inputs:\n            pos_inds: N\n            labels: N\n            reg_targets: M x 4\n            flattened_hms: M x C\n            logits_pred: M x C\n            reg_pred: M x 4\n            agn_hm_pred: M x 1 or None\n            N: number of positive locations in all images\n            M: number of pixels from all FPN levels\n            C: number of classes\n        '''\n        assert (torch.isfinite(reg_pred).all().item())\n        num_pos_local = pos_inds.numel()\n        num_gpus = get_world_size()\n        if self.no_reduce:\n            total_num_pos = num_pos_local * num_gpus\n        else:\n            total_num_pos = reduce_sum(\n                pos_inds.new_tensor([num_pos_local])).item()\n        num_pos_avg = max(total_num_pos / num_gpus, 1.0)\n\n        losses = {}\n        if not self.only_proposal:\n            pos_loss, neg_loss = heatmap_focal_loss_jit(\n                logits_pred, flattened_hms, pos_inds, labels,\n                alpha=self.hm_focal_alpha, \n                beta=self.hm_focal_beta, \n                gamma=self.loss_gamma, \n                reduction='sum',\n                sigmoid_clamp=self.sigmoid_clamp,\n                ignore_high_fp=self.ignore_high_fp,\n            )\n            pos_loss = self.pos_weight * pos_loss / num_pos_avg\n            neg_loss = self.neg_weight * neg_loss / num_pos_avg\n            losses['loss_centernet_pos'] = pos_loss\n            losses['loss_centernet_neg'] = neg_loss\n        \n        reg_inds = torch.nonzero(reg_targets.max(dim=1)[0] >= 0).squeeze(1)\n        reg_pred = reg_pred[reg_inds]\n        reg_targets_pos = reg_targets[reg_inds]\n  
      reg_weight_map = flattened_hms.max(dim=1)[0]\n        reg_weight_map = reg_weight_map[reg_inds]\n        reg_weight_map = reg_weight_map * 0 + 1 \\\n            if self.not_norm_reg else reg_weight_map\n        if self.no_reduce:\n            reg_norm = max(reg_weight_map.sum(), 1)\n        else:\n            reg_norm = max(reduce_sum(reg_weight_map.sum()).item() / num_gpus, 1)\n        \n        reg_loss = self.reg_weight * self.iou_loss(\n            reg_pred, reg_targets_pos, reg_weight_map,\n            reduction='sum') / reg_norm\n        losses['loss_centernet_loc'] = reg_loss\n\n        if self.with_agn_hm:\n            cat_agn_heatmap = flattened_hms.max(dim=1)[0] # M\n            agn_pos_loss, agn_neg_loss = binary_heatmap_focal_loss(\n                agn_hm_pred, cat_agn_heatmap, pos_inds,\n                alpha=self.hm_focal_alpha, \n                beta=self.hm_focal_beta, \n                gamma=self.loss_gamma,\n                sigmoid_clamp=self.sigmoid_clamp,\n                ignore_high_fp=self.ignore_high_fp,\n            )\n            agn_pos_loss = self.pos_weight * agn_pos_loss / num_pos_avg\n            agn_neg_loss = self.neg_weight * agn_neg_loss / num_pos_avg\n            losses['loss_centernet_agn_pos'] = agn_pos_loss\n            losses['loss_centernet_agn_neg'] = agn_neg_loss\n    \n        if self.debug:\n            print('losses', losses)\n            print('total_num_pos', total_num_pos)\n        return losses\n\n\n    def compute_grids(self, features):\n        grids = []\n        for level, feature in enumerate(features):\n            h, w = feature.size()[-2:]\n            shifts_x = torch.arange(\n                0, w * self.strides[level], \n                step=self.strides[level],\n                dtype=torch.float32, device=feature.device)\n            shifts_y = torch.arange(\n                0, h * self.strides[level], \n                step=self.strides[level],\n                dtype=torch.float32, 
device=feature.device)\n            shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)\n            shift_x = shift_x.reshape(-1)\n            shift_y = shift_y.reshape(-1)\n            grids_per_level = torch.stack((shift_x, shift_y), dim=1) + \\\n                self.strides[level] // 2\n            grids.append(grids_per_level)\n        return grids\n\n\n    def _get_ground_truth(self, grids, shapes_per_level, gt_instances):\n        '''\n        Input:\n            grids: list of tensors [(hl x wl, 2)]_l\n            shapes_per_level: list of tuples L x 2:\n            gt_instances: gt instances\n        Return:\n            pos_inds: N\n            labels: N\n            reg_targets: M x 4\n            flattened_hms: M x C or M x 1\n            N: number of objects in all images\n            M: number of pixels from all FPN levels\n        '''\n\n        # get positive pixel index\n        if not self.more_pos:\n            pos_inds, labels = self._get_label_inds(\n                gt_instances, shapes_per_level) \n        else:\n            pos_inds, labels = None, None\n        heatmap_channels = self.num_classes\n        L = len(grids)\n        num_loc_list = [len(loc) for loc in grids]\n        strides = torch.cat([\n            shapes_per_level.new_ones(num_loc_list[l]) * self.strides[l] \\\n            for l in range(L)]).float() # M\n        reg_size_ranges = torch.cat([\n            shapes_per_level.new_tensor(self.sizes_of_interest[l]).float().view(\n            1, 2).expand(num_loc_list[l], 2) for l in range(L)]) # M x 2\n        grids = torch.cat(grids, dim=0) # M x 2\n        M = grids.shape[0]\n\n        reg_targets = []\n        flattened_hms = []\n        for i in range(len(gt_instances)): # images\n            boxes = gt_instances[i].gt_boxes.tensor # N x 4\n            area = gt_instances[i].gt_boxes.area() # N\n            gt_classes = gt_instances[i].gt_classes # N in [0, self.num_classes]\n\n            N = boxes.shape[0]\n            if 
N == 0:\n                reg_targets.append(grids.new_zeros((M, 4)) - INF)\n                flattened_hms.append(\n                    grids.new_zeros((\n                        M, 1 if self.only_proposal else heatmap_channels)))\n                continue\n            \n            l = grids[:, 0].view(M, 1) - boxes[:, 0].view(1, N) # M x N\n            t = grids[:, 1].view(M, 1) - boxes[:, 1].view(1, N) # M x N\n            r = boxes[:, 2].view(1, N) - grids[:, 0].view(M, 1) # M x N\n            b = boxes[:, 3].view(1, N) - grids[:, 1].view(M, 1) # M x N\n            reg_target = torch.stack([l, t, r, b], dim=2) # M x N x 4\n\n            centers = ((boxes[:, [0, 1]] + boxes[:, [2, 3]]) / 2) # N x 2\n            centers_expanded = centers.view(1, N, 2).expand(M, N, 2) # M x N x 2\n            strides_expanded = strides.view(M, 1, 1).expand(M, N, 2)\n            centers_discret = ((centers_expanded / strides_expanded).int() * \\\n                strides_expanded).float() + strides_expanded / 2 # M x N x 2\n            \n            is_peak = (((grids.view(M, 1, 2).expand(M, N, 2) - \\\n                centers_discret) ** 2).sum(dim=2) == 0) # M x N\n            is_in_boxes = reg_target.min(dim=2)[0] > 0 # M x N\n            is_center3x3 = self.get_center3x3(\n                grids, centers, strides) & is_in_boxes # M x N\n            is_cared_in_the_level = self.assign_reg_fpn(\n                reg_target, reg_size_ranges) # M x N\n            reg_mask = is_center3x3 & is_cared_in_the_level # M x N\n\n            dist2 = ((grids.view(M, 1, 2).expand(M, N, 2) - \\\n                centers_expanded) ** 2).sum(dim=2) # M x N\n            dist2[is_peak] = 0\n            radius2 = self.delta ** 2 * 2 * area # N\n            radius2 = torch.clamp(\n                radius2, min=self.min_radius ** 2)\n            weighted_dist2 = dist2 / radius2.view(1, N).expand(M, N) # M x N            \n            reg_target = self._get_reg_targets(\n                reg_target, 
weighted_dist2.clone(), reg_mask, area) # M x 4\n\n            if self.only_proposal:\n                flattened_hm = self._create_agn_heatmaps_from_dist(\n                    weighted_dist2.clone()) # M x 1\n            else:\n                flattened_hm = self._create_heatmaps_from_dist(\n                    weighted_dist2.clone(), gt_classes, \n                    channels=heatmap_channels) # M x C\n\n            reg_targets.append(reg_target)\n            flattened_hms.append(flattened_hm)\n        \n        # transpose im first training_targets to level first ones\n        reg_targets = _transpose(reg_targets, num_loc_list)\n        flattened_hms = _transpose(flattened_hms, num_loc_list)\n        for l in range(len(reg_targets)):\n            reg_targets[l] = reg_targets[l] / float(self.strides[l])\n        reg_targets = cat([x for x in reg_targets], dim=0) # MB x 4\n        flattened_hms = cat([x for x in flattened_hms], dim=0) # MB x C\n        \n        return pos_inds, labels, reg_targets, flattened_hms\n\n\n    def _get_label_inds(self, gt_instances, shapes_per_level):\n        '''\n        Inputs:\n            gt_instances: [n_i], sum n_i = N\n            shapes_per_level: L x 2 [(h_l, w_l)]_L\n        Returns:\n            pos_inds: N'\n            labels: N'\n        '''\n        pos_inds = []\n        labels = []\n        L = len(self.strides)\n        B = len(gt_instances)\n        shapes_per_level = shapes_per_level.long()\n        loc_per_level = (shapes_per_level[:, 0] * shapes_per_level[:, 1]).long() # L\n        level_bases = []\n        s = 0\n        for l in range(L):\n            level_bases.append(s)\n            s = s + B * loc_per_level[l]\n        level_bases = shapes_per_level.new_tensor(level_bases).long() # L\n        strides_default = shapes_per_level.new_tensor(self.strides).float() # L\n        for im_i in range(B):\n            targets_per_im = gt_instances[im_i]\n            bboxes = targets_per_im.gt_boxes.tensor # n x 4\n      
      n = bboxes.shape[0]\n            centers = ((bboxes[:, [0, 1]] + bboxes[:, [2, 3]]) / 2) # n x 2\n            centers = centers.view(n, 1, 2).expand(n, L, 2)\n            strides = strides_default.view(1, L, 1).expand(n, L, 2)\n            centers_inds = (centers / strides).long() # n x L x 2\n            Ws = shapes_per_level[:, 1].view(1, L).expand(n, L)\n            pos_ind = level_bases.view(1, L).expand(n, L) + \\\n                       im_i * loc_per_level.view(1, L).expand(n, L) + \\\n                       centers_inds[:, :, 1] * Ws + \\\n                       centers_inds[:, :, 0] # n x L\n            is_cared_in_the_level = self.assign_fpn_level(bboxes)\n            pos_ind = pos_ind[is_cared_in_the_level].view(-1)\n            label = targets_per_im.gt_classes.view(\n                n, 1).expand(n, L)[is_cared_in_the_level].view(-1)\n\n            pos_inds.append(pos_ind) # n'\n            labels.append(label) # n'\n        pos_inds = torch.cat(pos_inds, dim=0).long()\n        labels = torch.cat(labels, dim=0)\n        return pos_inds, labels # N, N\n\n\n    def assign_fpn_level(self, boxes):\n        '''\n        Inputs:\n            boxes: n x 4\n            size_ranges: L x 2\n        Return:\n            is_cared_in_the_level: n x L\n        '''\n        size_ranges = boxes.new_tensor(\n            self.sizes_of_interest).view(len(self.sizes_of_interest), 2) # L x 2\n        crit = ((boxes[:, 2:] - boxes[:, :2]) **2).sum(dim=1) ** 0.5 / 2 # n\n        n, L = crit.shape[0], size_ranges.shape[0]\n        crit = crit.view(n, 1).expand(n, L)\n        size_ranges_expand = size_ranges.view(1, L, 2).expand(n, L, 2)\n        is_cared_in_the_level = (crit >= size_ranges_expand[:, :, 0]) & \\\n            (crit <= size_ranges_expand[:, :, 1])\n        return is_cared_in_the_level\n    \n\n    def assign_reg_fpn(self, reg_targets_per_im, size_ranges):\n        '''\n        TODO (Xingyi): merge it with assign_fpn_level\n        Inputs:\n            
reg_targets_per_im: M x N x 4\n            size_ranges: M x 2\n        '''\n        crit = ((reg_targets_per_im[:, :, :2] + \\\n            reg_targets_per_im[:, :, 2:])**2).sum(dim=2) ** 0.5 / 2 # M x N\n        is_cared_in_the_level = (crit >= size_ranges[:, [0]]) & \\\n            (crit <= size_ranges[:, [1]])\n        return is_cared_in_the_level\n\n\n    def _get_reg_targets(self, reg_targets, dist, mask, area):\n        '''\n          reg_targets (M x N x 4): long tensor\n          dist (M x N)\n          is_*: M x N\n        '''\n        dist[mask == 0] = INF * 1.0\n        min_dist, min_inds = dist.min(dim=1) # M\n        reg_targets_per_im = reg_targets[\n            range(len(reg_targets)), min_inds] # M x N x 4 --> M x 4\n        reg_targets_per_im[min_dist == INF] = - INF\n        return reg_targets_per_im\n\n\n    def _create_heatmaps_from_dist(self, dist, labels, channels):\n        '''\n        dist: M x N\n        labels: N\n        return:\n          heatmaps: M x C\n        '''\n        heatmaps = dist.new_zeros((dist.shape[0], channels))\n        for c in range(channels):\n            inds = (labels == c) # N\n            if inds.int().sum() == 0:\n                continue\n            heatmaps[:, c] = torch.exp(-dist[:, inds].min(dim=1)[0])\n            zeros = heatmaps[:, c] < 1e-4\n            heatmaps[zeros, c] = 0\n        return heatmaps\n\n\n    def _create_agn_heatmaps_from_dist(self, dist):\n        '''\n        TODO (Xingyi): merge it with _create_heatmaps_from_dist\n        dist: M x N\n        return:\n          heatmaps: M x 1\n        '''\n        heatmaps = dist.new_zeros((dist.shape[0], 1))\n        heatmaps[:, 0] = torch.exp(-dist.min(dim=1)[0])\n        zeros = heatmaps < 1e-4\n        heatmaps[zeros] = 0\n        return heatmaps\n\n\n    def _flatten_outputs(self, clss, reg_pred, agn_hm_pred):\n        # Reshape: (N, F, Hl, Wl) -> (N, Hl, Wl, F) -> (sum_l N*Hl*Wl, F)\n        clss = cat([x.permute(0, 2, 3, 1).reshape(-1, 
x.shape[1]) \\\n            for x in clss], dim=0) if clss[0] is not None else None\n        reg_pred = cat(\n            [x.permute(0, 2, 3, 1).reshape(-1, 4) for x in reg_pred], dim=0)            \n        agn_hm_pred = cat([x.permute(0, 2, 3, 1).reshape(-1) \\\n            for x in agn_hm_pred], dim=0) if self.with_agn_hm else None\n        return clss, reg_pred, agn_hm_pred\n\n\n    def get_center3x3(self, locations, centers, strides):\n        '''\n        Inputs:\n            locations: M x 2\n            centers: N x 2\n            strides: M\n        '''\n        M, N = locations.shape[0], centers.shape[0]\n        locations_expanded = locations.view(M, 1, 2).expand(M, N, 2) # M x N x 2\n        centers_expanded = centers.view(1, N, 2).expand(M, N, 2) # M x N x 2\n        strides_expanded = strides.view(M, 1, 1).expand(M, N, 2) # M x N\n        centers_discret = ((centers_expanded / strides_expanded).int() * \\\n            strides_expanded).float() + strides_expanded / 2 # M x N x 2\n        dist_x = (locations_expanded[:, :, 0] - centers_discret[:, :, 0]).abs()\n        dist_y = (locations_expanded[:, :, 1] - centers_discret[:, :, 1]).abs()\n        return (dist_x <= strides_expanded[:, :, 0]) & \\\n            (dist_y <= strides_expanded[:, :, 0])\n\n\n    def inference(self, images, clss_per_level, reg_pred_per_level, \n        agn_hm_pred_per_level, grids):\n        logits_pred = [x.sigmoid() if x is not None else None \\\n            for x in clss_per_level]\n        agn_hm_pred_per_level = [x.sigmoid() if x is not None else None \\\n            for x in agn_hm_pred_per_level]\n\n        if self.only_proposal:\n            proposals = self.predict_instances(\n                grids, agn_hm_pred_per_level, reg_pred_per_level, \n                images.image_sizes, [None for _ in agn_hm_pred_per_level])\n        else:\n            proposals = self.predict_instances(\n                grids, logits_pred, reg_pred_per_level, \n                
images.image_sizes, agn_hm_pred_per_level)\n        if self.as_proposal or self.only_proposal:\n            for p in range(len(proposals)):\n                proposals[p].proposal_boxes = proposals[p].get('pred_boxes')\n                proposals[p].objectness_logits = proposals[p].get('scores')\n                proposals[p].remove('pred_boxes')\n\n        if self.debug:\n            debug_test(\n                [self.denormalizer(x) for x in images], \n                logits_pred, reg_pred_per_level, \n                agn_hm_pred_per_level, preds=proposals,\n                vis_thresh=self.vis_thresh, \n                debug_show_name=False)\n        return proposals, {}\n\n\n    def predict_instances(\n        self, grids, logits_pred, reg_pred, image_sizes, agn_hm_pred, \n        is_proposal=False):\n        sampled_boxes = []\n        for l in range(len(grids)):\n            sampled_boxes.append(self.predict_single_level(\n                grids[l], logits_pred[l], reg_pred[l] * self.strides[l],\n                image_sizes, agn_hm_pred[l], l, is_proposal=is_proposal))\n        boxlists = list(zip(*sampled_boxes))\n        boxlists = [Instances.cat(boxlist) for boxlist in boxlists]\n        boxlists = self.nms_and_topK(\n            boxlists, nms=not self.not_nms)\n        return boxlists\n\n\n    def predict_single_level(\n        self, grids, heatmap, reg_pred, image_sizes, agn_hm, level, \n        is_proposal=False):\n        N, C, H, W = heatmap.shape\n        # put in the same format as grids\n        if self.center_nms:\n            heatmap_nms = nn.functional.max_pool2d(\n                heatmap, (3, 3), stride=1, padding=1)\n            heatmap = heatmap * (heatmap_nms == heatmap).float()\n        heatmap = heatmap.permute(0, 2, 3, 1) # N x H x W x C\n        heatmap = heatmap.reshape(N, -1, C) # N x HW x C\n        box_regression = reg_pred.view(N, 4, H, W).permute(0, 2, 3, 1) # N x H x W x 4 \n        box_regression = box_regression.reshape(N, -1, 4)\n\n 
       candidate_inds = heatmap > self.score_thresh # 0.05\n        pre_nms_top_n = candidate_inds.view(N, -1).sum(1) # N\n        pre_nms_topk = self.pre_nms_topk_train if self.training else self.pre_nms_topk_test\n        pre_nms_top_n = pre_nms_top_n.clamp(max=pre_nms_topk) # N\n\n        if agn_hm is not None:\n            agn_hm = agn_hm.view(N, 1, H, W).permute(0, 2, 3, 1)\n            agn_hm = agn_hm.reshape(N, -1)\n            heatmap = heatmap * agn_hm[:, :, None]\n\n        results = []\n        for i in range(N):\n            per_box_cls = heatmap[i] # HW x C\n            per_candidate_inds = candidate_inds[i] # n\n            per_box_cls = per_box_cls[per_candidate_inds] # n\n\n            per_candidate_nonzeros = per_candidate_inds.nonzero() # n\n            per_box_loc = per_candidate_nonzeros[:, 0] # n\n            per_class = per_candidate_nonzeros[:, 1] # n\n\n            per_box_regression = box_regression[i] # HW x 4\n            per_box_regression = per_box_regression[per_box_loc] # n x 4\n            per_grids = grids[per_box_loc] # n x 2\n\n            per_pre_nms_top_n = pre_nms_top_n[i] # 1\n\n            if per_candidate_inds.sum().item() > per_pre_nms_top_n.item():\n                per_box_cls, top_k_indices = \\\n                    per_box_cls.topk(per_pre_nms_top_n, sorted=False)\n                per_class = per_class[top_k_indices]\n                per_box_regression = per_box_regression[top_k_indices]\n                per_grids = per_grids[top_k_indices]\n            \n            detections = torch.stack([\n                per_grids[:, 0] - per_box_regression[:, 0],\n                per_grids[:, 1] - per_box_regression[:, 1],\n                per_grids[:, 0] + per_box_regression[:, 2],\n                per_grids[:, 1] + per_box_regression[:, 3],\n            ], dim=1) # n x 4\n\n            # avoid invalid boxes in RoI heads\n            detections[:, 2] = torch.max(detections[:, 2], detections[:, 0] + 0.01)\n            
detections[:, 3] = torch.max(detections[:, 3], detections[:, 1] + 0.01)\n            boxlist = Instances(image_sizes[i])\n            boxlist.scores = torch.sqrt(per_box_cls) \\\n                if self.with_agn_hm else per_box_cls # n\n            # import pdb; pdb.set_trace()\n            boxlist.pred_boxes = Boxes(detections)\n            boxlist.pred_classes = per_class\n            results.append(boxlist)\n        return results\n\n\n    def nms_and_topK(self, boxlists, nms=True):\n        num_images = len(boxlists)\n        results = []\n        for i in range(num_images):\n            nms_thresh = self.nms_thresh_train if self.training else \\\n                self.nms_thresh_test\n            result = ml_nms(boxlists[i], nms_thresh) if nms else boxlists[i]\n            if self.debug:\n                print('#proposals before nms', len(boxlists[i]))\n                print('#proposals after nms', len(result))\n            num_dets = len(result)\n            post_nms_topk = self.post_nms_topk_train if self.training else \\\n                self.post_nms_topk_test\n            if num_dets > post_nms_topk:\n                cls_scores = result.scores\n                image_thresh, _ = torch.kthvalue(\n                    cls_scores.float().cpu(),\n                    num_dets - post_nms_topk + 1\n                )\n                keep = cls_scores >= image_thresh.item()\n                keep = torch.nonzero(keep).squeeze(1)\n                result = result[keep]\n            if self.debug:\n                print('#proposals after filter', len(result))\n            results.append(result)\n        return results\n\n\n    def _add_more_pos(self, reg_pred, gt_instances, shapes_per_level):\n        labels, level_masks, c33_inds, c33_masks, c33_regs = \\\n            self._get_c33_inds(gt_instances, shapes_per_level)\n        N, L, K = labels.shape[0], len(self.strides), 9\n        c33_inds[c33_masks == 0] = 0\n        reg_pred_c33 = reg_pred[c33_inds].detach() # N x 
L x K\n        invalid_reg = c33_masks == 0\n        c33_regs_expand = c33_regs.view(N * L * K, 4).clamp(min=0)\n        if N > 0:\n            with torch.no_grad():\n                c33_reg_loss = self.iou_loss(\n                    reg_pred_c33.view(N * L * K, 4), \n                    c33_regs_expand, None,\n                    reduction='none').view(N, L, K).detach() # N x L x K\n        else:\n            c33_reg_loss = reg_pred_c33.new_zeros((N, L, K)).detach()\n        c33_reg_loss[invalid_reg] = INF # N x L x K\n        c33_reg_loss.view(N * L, K)[level_masks.view(N * L), 4] = 0 # real center\n        c33_reg_loss = c33_reg_loss.view(N, L * K)\n        if N == 0:\n            loss_thresh = c33_reg_loss.new_ones((N)).float()\n        else:\n            loss_thresh = torch.kthvalue(\n                c33_reg_loss, self.more_pos_topk, dim=1)[0] # N\n        loss_thresh[loss_thresh > self.more_pos_thresh] = self.more_pos_thresh # N\n        new_pos = c33_reg_loss.view(N, L, K) < \\\n            loss_thresh.view(N, 1, 1).expand(N, L, K)\n        pos_inds = c33_inds[new_pos].view(-1) # P\n        labels = labels.view(N, 1, 1).expand(N, L, K)[new_pos].view(-1)\n        return pos_inds, labels\n        \n    \n    def _get_c33_inds(self, gt_instances, shapes_per_level):\n        '''\n        TODO (Xingyi): The current implementation is ugly. 
Refactor.\n        Get the center (and the 3x3 region near center) locations of each objects\n        Inputs:\n            gt_instances: [n_i], sum n_i = N\n            shapes_per_level: L x 2 [(h_l, w_l)]_L\n        '''\n        labels = []\n        level_masks = []\n        c33_inds = []\n        c33_masks = []\n        c33_regs = []\n        L = len(self.strides)\n        B = len(gt_instances)\n        shapes_per_level = shapes_per_level.long()\n        loc_per_level = (shapes_per_level[:, 0] * shapes_per_level[:, 1]).long() # L\n        level_bases = []\n        s = 0\n        for l in range(L):\n            level_bases.append(s)\n            s = s + B * loc_per_level[l]\n        level_bases = shapes_per_level.new_tensor(level_bases).long() # L\n        strides_default = shapes_per_level.new_tensor(self.strides).float() # L\n        K = 9\n        dx = shapes_per_level.new_tensor([-1, 0, 1, -1, 0, 1, -1, 0, 1]).long()\n        dy = shapes_per_level.new_tensor([-1, -1, -1, 0, 0, 0, 1, 1, 1]).long()\n        for im_i in range(B):\n            targets_per_im = gt_instances[im_i]\n            bboxes = targets_per_im.gt_boxes.tensor # n x 4\n            n = bboxes.shape[0]\n            if n == 0:\n                continue\n            centers = ((bboxes[:, [0, 1]] + bboxes[:, [2, 3]]) / 2) # n x 2\n            centers = centers.view(n, 1, 2).expand(n, L, 2)\n\n            strides = strides_default.view(1, L, 1).expand(n, L, 2) # \n            centers_inds = (centers / strides).long() # n x L x 2\n            center_grids = centers_inds * strides + strides // 2# n x L x 2\n            l = center_grids[:, :, 0] - bboxes[:, 0].view(n, 1).expand(n, L)\n            t = center_grids[:, :, 1] - bboxes[:, 1].view(n, 1).expand(n, L)\n            r = bboxes[:, 2].view(n, 1).expand(n, L) - center_grids[:, :, 0]\n            b = bboxes[:, 3].view(n, 1).expand(n, L) - center_grids[:, :, 1] # n x L\n            reg = torch.stack([l, t, r, b], dim=2) # n x L x 4\n            reg = 
reg / strides_default.view(1, L, 1).expand(n, L, 4).float()\n            \n            Ws = shapes_per_level[:, 1].view(1, L).expand(n, L)\n            Hs = shapes_per_level[:, 0].view(1, L).expand(n, L)\n            expand_Ws = Ws.view(n, L, 1).expand(n, L, K)\n            expand_Hs = Hs.view(n, L, 1).expand(n, L, K)\n            label = targets_per_im.gt_classes.view(n).clone()\n            mask = reg.min(dim=2)[0] >= 0 # n x L\n            mask = mask & self.assign_fpn_level(bboxes)\n            labels.append(label) # n\n            level_masks.append(mask) # n x L\n\n            Dy = dy.view(1, 1, K).expand(n, L, K)\n            Dx = dx.view(1, 1, K).expand(n, L, K)\n            c33_ind = level_bases.view(1, L, 1).expand(n, L, K) + \\\n                       im_i * loc_per_level.view(1, L, 1).expand(n, L, K) + \\\n                       (centers_inds[:, :, 1:2].expand(n, L, K) + Dy) * expand_Ws + \\\n                       (centers_inds[:, :, 0:1].expand(n, L, K) + Dx) # n x L x K\n            \n            c33_mask = \\\n                ((centers_inds[:, :, 1:2].expand(n, L, K) + dy) < expand_Hs) & \\\n                ((centers_inds[:, :, 1:2].expand(n, L, K) + dy) >= 0) & \\\n                ((centers_inds[:, :, 0:1].expand(n, L, K) + dx) < expand_Ws) & \\\n                ((centers_inds[:, :, 0:1].expand(n, L, K) + dx) >= 0)\n            # TODO (Xingyi): think about better way to implement this\n            # Currently it hard codes the 3x3 region\n            c33_reg = reg.view(n, L, 1, 4).expand(n, L, K, 4).clone()\n            c33_reg[:, :, [0, 3, 6], 0] -= 1\n            c33_reg[:, :, [0, 3, 6], 2] += 1\n            c33_reg[:, :, [2, 5, 8], 0] += 1\n            c33_reg[:, :, [2, 5, 8], 2] -= 1\n            c33_reg[:, :, [0, 1, 2], 1] -= 1\n            c33_reg[:, :, [0, 1, 2], 3] += 1\n            c33_reg[:, :, [6, 7, 8], 1] += 1\n            c33_reg[:, :, [6, 7, 8], 3] -= 1\n            c33_mask = c33_mask & (c33_reg.min(dim=3)[0] >= 0) # n x L x K\n     
       c33_inds.append(c33_ind)\n            c33_masks.append(c33_mask)\n            c33_regs.append(c33_reg)\n        \n        if len(level_masks) > 0:\n            labels = torch.cat(labels, dim=0)\n            level_masks = torch.cat(level_masks, dim=0)\n            c33_inds = torch.cat(c33_inds, dim=0).long()\n            c33_regs = torch.cat(c33_regs, dim=0)\n            c33_masks = torch.cat(c33_masks, dim=0)\n        else:\n            labels = shapes_per_level.new_zeros((0)).long()\n            level_masks = shapes_per_level.new_zeros((0, L)).bool()\n            c33_inds = shapes_per_level.new_zeros((0, L, K)).long()\n            c33_regs = shapes_per_level.new_zeros((0, L, K, 4)).float()\n            c33_masks = shapes_per_level.new_zeros((0, L, K)).bool()\n        return labels, level_masks, c33_inds, c33_masks, c33_regs # N x L, N x L x K\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/dense_heads/centernet_head.py",
    "content": "import math\nfrom typing import List\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom detectron2.layers import ShapeSpec, get_norm\nfrom detectron2.config import configurable\nfrom ..layers.deform_conv import DFConv2d\n\n__all__ = [\"CenterNetHead\"]\n\nclass Scale(nn.Module):\n    def __init__(self, init_value=1.0):\n        super(Scale, self).__init__()\n        self.scale = nn.Parameter(torch.FloatTensor([init_value]))\n\n    def forward(self, input):\n        return input * self.scale\n\nclass CenterNetHead(nn.Module):\n    @configurable\n    def __init__(self, \n        # input_shape: List[ShapeSpec],\n        in_channels,\n        num_levels,\n        *,\n        num_classes=80,\n        with_agn_hm=False,\n        only_proposal=False,\n        norm='GN',\n        num_cls_convs=4,\n        num_box_convs=4,\n        num_share_convs=0,\n        use_deformable=False,\n        prior_prob=0.01):\n        super().__init__()\n        self.num_classes = num_classes\n        self.with_agn_hm = with_agn_hm\n        self.only_proposal = only_proposal\n        self.out_kernel = 3\n\n        head_configs = {\n            \"cls\": (num_cls_convs if not self.only_proposal else 0, \\\n                use_deformable),\n            \"bbox\": (num_box_convs, use_deformable),\n            \"share\": (num_share_convs, use_deformable)}\n\n        # in_channels = [s.channels for s in input_shape]\n        # assert len(set(in_channels)) == 1, \\\n        #     \"Each level must have the same channel!\"\n        # in_channels = in_channels[0]\n        channels = {\n            'cls': in_channels,\n            'bbox': in_channels,\n            'share': in_channels,\n        }\n        for head in head_configs:\n            tower = []\n            num_convs, use_deformable = head_configs[head]\n            channel = channels[head]\n            for i in range(num_convs):\n                if use_deformable and i == num_convs - 1:\n         
           conv_func = DFConv2d\n                else:\n                    conv_func = nn.Conv2d\n                tower.append(conv_func(\n                        in_channels if i == 0 else channel,\n                        channel, \n                        kernel_size=3, stride=1,\n                        padding=1, bias=True\n                ))\n                if norm == 'GN' and channel % 32 != 0:\n                    tower.append(nn.GroupNorm(25, channel))\n                elif norm != '':\n                    tower.append(get_norm(norm, channel))\n                tower.append(nn.ReLU())\n            self.add_module('{}_tower'.format(head),\n                            nn.Sequential(*tower))\n\n        self.bbox_pred = nn.Conv2d(\n            in_channels, 4, kernel_size=self.out_kernel,\n            stride=1, padding=self.out_kernel // 2\n        )\n\n        self.scales = nn.ModuleList(\n            [Scale(init_value=1.0) for _ in range(num_levels)])\n\n        for modules in [\n            self.cls_tower, self.bbox_tower,\n            self.share_tower,\n            self.bbox_pred,\n        ]:\n            for l in modules.modules():\n                if isinstance(l, nn.Conv2d):\n                    torch.nn.init.normal_(l.weight, std=0.01)\n                    torch.nn.init.constant_(l.bias, 0)\n        \n        torch.nn.init.constant_(self.bbox_pred.bias, 8.)\n        prior_prob = prior_prob\n        bias_value = -math.log((1 - prior_prob) / prior_prob)\n\n        if self.with_agn_hm:\n            self.agn_hm = nn.Conv2d(\n                in_channels, 1, kernel_size=self.out_kernel,\n                stride=1, padding=self.out_kernel // 2\n            )\n            torch.nn.init.constant_(self.agn_hm.bias, bias_value)\n            torch.nn.init.normal_(self.agn_hm.weight, std=0.01)\n\n        if not self.only_proposal:\n            cls_kernel_size = self.out_kernel\n            self.cls_logits = nn.Conv2d(\n                in_channels, 
self.num_classes,\n                kernel_size=cls_kernel_size, \n                stride=1,\n                padding=cls_kernel_size // 2,\n            )\n\n            torch.nn.init.constant_(self.cls_logits.bias, bias_value)\n            torch.nn.init.normal_(self.cls_logits.weight, std=0.01)\n\n    @classmethod\n    def from_config(cls, cfg, input_shape):\n        ret = {\n            # 'input_shape': input_shape,\n            'in_channels': [s.channels for s in input_shape][0],\n            'num_levels': len(input_shape),\n            'num_classes': cfg.MODEL.CENTERNET.NUM_CLASSES,\n            'with_agn_hm': cfg.MODEL.CENTERNET.WITH_AGN_HM,\n            'only_proposal': cfg.MODEL.CENTERNET.ONLY_PROPOSAL,\n            'norm': cfg.MODEL.CENTERNET.NORM,\n            'num_cls_convs': cfg.MODEL.CENTERNET.NUM_CLS_CONVS,\n            'num_box_convs': cfg.MODEL.CENTERNET.NUM_BOX_CONVS,\n            'num_share_convs': cfg.MODEL.CENTERNET.NUM_SHARE_CONVS,\n            'use_deformable': cfg.MODEL.CENTERNET.USE_DEFORMABLE,\n            'prior_prob': cfg.MODEL.CENTERNET.PRIOR_PROB,\n        }\n        return ret\n\n    def forward(self, x):\n        clss = []\n        bbox_reg = []\n        agn_hms = []\n        for l, feature in enumerate(x):\n            feature = self.share_tower(feature)\n            cls_tower = self.cls_tower(feature)\n            bbox_tower = self.bbox_tower(feature)\n            if not self.only_proposal:\n                clss.append(self.cls_logits(cls_tower))\n            else:\n                clss.append(None)\n\n            if self.with_agn_hm:\n                agn_hms.append(self.agn_hm(bbox_tower))\n            else:\n                agn_hms.append(None)\n            reg = self.bbox_pred(bbox_tower)\n            reg = self.scales[l](reg)\n            bbox_reg.append(F.relu(reg))\n        \n        return clss, bbox_reg, agn_hms"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/dense_heads/utils.py",
    "content": "import cv2\nimport torch\nfrom torch import nn\nfrom detectron2.utils.comm import get_world_size\nfrom detectron2.structures import pairwise_iou, Boxes\n# from .data import CenterNetCrop\nimport torch.nn.functional as F\nimport numpy as np\nfrom detectron2.structures import Boxes, ImageList, Instances\n\n__all__ = ['reduce_sum', '_transpose']\n\nINF = 1000000000\n\ndef _transpose(training_targets, num_loc_list):\n    '''\n    This function is used to transpose image first training targets to \n        level first ones\n    :return: level first training targets\n    '''\n    for im_i in range(len(training_targets)):\n        training_targets[im_i] = torch.split(\n            training_targets[im_i], num_loc_list, dim=0)\n\n    targets_level_first = []\n    for targets_per_level in zip(*training_targets):\n        targets_level_first.append(\n            torch.cat(targets_per_level, dim=0))\n    return targets_level_first\n\n\ndef reduce_sum(tensor):\n    world_size = get_world_size()\n    if world_size < 2:\n        return tensor\n    tensor = tensor.clone()\n    torch.distributed.all_reduce(tensor, op=torch.distributed.ReduceOp.SUM)\n    return tensor"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/layers/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/layers/deform_conv.py",
    "content": "import torch\nfrom torch import nn\n\nfrom detectron2.layers import Conv2d\n\n\nclass _NewEmptyTensorOp(torch.autograd.Function):\n    @staticmethod\n    def forward(ctx, x, new_shape):\n        ctx.shape = x.shape\n        return x.new_empty(new_shape)\n\n    @staticmethod\n    def backward(ctx, grad):\n        shape = ctx.shape\n        return _NewEmptyTensorOp.apply(grad, shape), None\n\n\nclass DFConv2d(nn.Module):\n    \"\"\"Deformable convolutional layer\"\"\"\n    def __init__(\n            self,\n            in_channels,\n            out_channels,\n            with_modulated_dcn=True,\n            kernel_size=3,\n            stride=1,\n            groups=1,\n            dilation=1,\n            deformable_groups=1,\n            bias=False,\n            padding=None\n    ):\n        super(DFConv2d, self).__init__()\n        if isinstance(kernel_size, (list, tuple)):\n            assert isinstance(stride, (list, tuple))\n            assert isinstance(dilation, (list, tuple))\n            assert len(kernel_size) == 2\n            assert len(stride) == 2\n            assert len(dilation) == 2\n            padding = (\n                dilation[0] * (kernel_size[0] - 1) // 2,\n                dilation[1] * (kernel_size[1] - 1) // 2\n            )\n            offset_base_channels = kernel_size[0] * kernel_size[1]\n        else:\n            padding = dilation * (kernel_size - 1) // 2\n            offset_base_channels = kernel_size * kernel_size\n        if with_modulated_dcn:\n            from detectron2.layers.deform_conv import ModulatedDeformConv\n            offset_channels = offset_base_channels * 3  # default: 27\n            conv_block = ModulatedDeformConv\n        else:\n            from detectron2.layers.deform_conv import DeformConv\n            offset_channels = offset_base_channels * 2  # default: 18\n            conv_block = DeformConv\n        self.offset = Conv2d(\n            in_channels,\n            deformable_groups * 
offset_channels,\n            kernel_size=kernel_size,\n            stride=stride,\n            padding=padding,\n            groups=1,\n            dilation=dilation\n        )\n        nn.init.constant_(self.offset.weight, 0)\n        nn.init.constant_(self.offset.bias, 0)\n        '''\n        for l in [self.offset, ]:\n            nn.init.kaiming_uniform_(l.weight, a=1)\n            torch.nn.init.constant_(l.bias, 0.)\n        '''\n        self.conv = conv_block(\n            in_channels,\n            out_channels,\n            kernel_size=kernel_size,\n            stride=stride,\n            padding=padding,\n            dilation=dilation,\n            groups=groups,\n            deformable_groups=deformable_groups,\n            bias=bias\n        )\n        self.with_modulated_dcn = with_modulated_dcn\n        self.kernel_size = kernel_size\n        self.stride = stride\n        self.padding = padding\n        self.dilation = dilation\n        self.offset_split = offset_base_channels * deformable_groups * 2\n\n    def forward(self, x, return_offset=False):\n        if x.numel() > 0:\n            if not self.with_modulated_dcn:\n                offset_mask = self.offset(x)\n                x = self.conv(x, offset_mask)\n            else:\n                offset_mask = self.offset(x)\n                offset = offset_mask[:, :self.offset_split, :, :]\n                mask = offset_mask[:, self.offset_split:, :, :].sigmoid()\n                x = self.conv(x, offset, mask)\n            if return_offset:\n                return x, offset_mask\n            return x\n        # get output shape\n        output_shape = [\n            (i + 2 * p - (di * (k - 1) + 1)) // d + 1\n            for i, p, di, k, d in zip(\n                x.shape[-2:],\n                self.padding,\n                self.dilation,\n                self.kernel_size,\n                self.stride\n            )\n        ]\n        output_shape = [x.shape[0], self.conv.weight.shape[0]] + 
output_shape\n        return _NewEmptyTensorOp.apply(x, output_shape)"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/layers/heatmap_focal_loss.py",
    "content": "import torch\nfrom torch.nn import functional as F\n\n# TODO: merge these two function\ndef heatmap_focal_loss(\n    inputs,\n    targets,\n    pos_inds,\n    labels,\n    alpha: float = -1,\n    beta: float = 4,\n    gamma: float = 2,\n    reduction: str = 'sum',\n    sigmoid_clamp: float = 1e-4,\n    ignore_high_fp: float = -1.,\n):\n    \"\"\"\n    Loss used in RetinaNet for dense detection: https://arxiv.org/abs/1708.02002.\n    Args:\n        inputs:  (sum_l N*Hl*Wl, C)\n        targets: (sum_l N*Hl*Wl, C)\n        pos_inds: N\n        labels: N\n    Returns:\n        Loss tensor with the reduction option applied.\n    \"\"\"\n    pred = torch.clamp(inputs.sigmoid_(), min=sigmoid_clamp, max=1-sigmoid_clamp)\n    neg_weights = torch.pow(1 - targets, beta)\n    pos_pred_pix = pred[pos_inds] # N x C\n    pos_pred = pos_pred_pix.gather(1, labels.unsqueeze(1))\n    pos_loss = torch.log(pos_pred) * torch.pow(1 - pos_pred, gamma)\n    neg_loss = torch.log(1 - pred) * torch.pow(pred, gamma) * neg_weights\n\n    if ignore_high_fp > 0:\n        not_high_fp = (pred < ignore_high_fp).float()\n        neg_loss = not_high_fp * neg_loss\n\n    if reduction == \"sum\":\n        pos_loss = pos_loss.sum()\n        neg_loss = neg_loss.sum()\n\n    if alpha >= 0:\n        pos_loss = alpha * pos_loss\n        neg_loss = (1 - alpha) * neg_loss\n\n    return - pos_loss, - neg_loss\n\nheatmap_focal_loss_jit = torch.jit.script(heatmap_focal_loss)\n# heatmap_focal_loss_jit = heatmap_focal_loss\n\ndef binary_heatmap_focal_loss(\n    inputs,\n    targets,\n    pos_inds,\n    alpha: float = -1,\n    beta: float = 4,\n    gamma: float = 2,\n    sigmoid_clamp: float = 1e-4,\n    ignore_high_fp: float = -1.,\n):\n    \"\"\"\n    Args:\n        inputs:  (sum_l N*Hl*Wl,)\n        targets: (sum_l N*Hl*Wl,)\n        pos_inds: N\n    Returns:\n        Loss tensor with the reduction option applied.\n    \"\"\"\n    pred = torch.clamp(inputs.sigmoid_(), min=sigmoid_clamp, 
max=1-sigmoid_clamp)\n    neg_weights = torch.pow(1 - targets, beta)\n    for i, ind in enumerate(pos_inds):\n        if ind >= pred.shape[0]:\n            print('%'*100)\n            print(pred.shape, ind, pos_inds)\n            pos_inds[i] = pred.shape[0] - 1\n    pos_pred = pred[pos_inds] # N\n    pos_loss = torch.log(pos_pred) * torch.pow(1 - pos_pred, gamma)\n    neg_loss = torch.log(1 - pred) * torch.pow(pred, gamma) * neg_weights\n    if ignore_high_fp > 0:\n        not_high_fp = (pred < ignore_high_fp).float()\n        neg_loss = not_high_fp * neg_loss\n\n    pos_loss = - pos_loss.sum()\n    neg_loss = - neg_loss.sum()\n\n    if alpha >= 0:\n        pos_loss = alpha * pos_loss\n        neg_loss = (1 - alpha) * neg_loss\n\n    return pos_loss, neg_loss\n\n# binary_heatmap_focal_loss_jit = torch.jit.script(binary_heatmap_focal_loss)"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/layers/iou_loss.py",
    "content": "import torch\nfrom torch import nn\n\n\nclass IOULoss(nn.Module):\n    def __init__(self, loc_loss_type='iou'):\n        super(IOULoss, self).__init__()\n        self.loc_loss_type = loc_loss_type\n\n    def forward(self, pred, target, weight=None, reduction='sum'):\n        pred_left = pred[:, 0]\n        pred_top = pred[:, 1]\n        pred_right = pred[:, 2]\n        pred_bottom = pred[:, 3]\n\n        target_left = target[:, 0]\n        target_top = target[:, 1]\n        target_right = target[:, 2]\n        target_bottom = target[:, 3]\n\n        target_aera = (target_left + target_right) * \\\n                      (target_top + target_bottom)\n        pred_aera = (pred_left + pred_right) * \\\n                    (pred_top + pred_bottom)\n\n        w_intersect = torch.min(pred_left, target_left) + \\\n                      torch.min(pred_right, target_right)\n        h_intersect = torch.min(pred_bottom, target_bottom) + \\\n                      torch.min(pred_top, target_top)\n\n        g_w_intersect = torch.max(pred_left, target_left) + \\\n                        torch.max(pred_right, target_right)\n        g_h_intersect = torch.max(pred_bottom, target_bottom) + \\\n                        torch.max(pred_top, target_top)\n        ac_uion = g_w_intersect * g_h_intersect\n\n        area_intersect = w_intersect * h_intersect\n        area_union = target_aera + pred_aera - area_intersect\n\n        ious = (area_intersect + 1.0) / (area_union + 1.0)\n        gious = ious - (ac_uion - area_union) / ac_uion\n        if self.loc_loss_type == 'iou':\n            losses = -torch.log(ious)\n        elif self.loc_loss_type == 'linear_iou':\n            losses = 1 - ious\n        elif self.loc_loss_type == 'giou':\n            losses = 1 - gious\n        else:\n            raise NotImplementedError\n\n        if weight is not None:\n            losses = losses * weight\n        else:\n            losses = losses\n\n        if reduction == 'sum':\n        
    return losses.sum()\n        elif reduction == 'batch':\n            return losses.sum(dim=[1])\n        elif reduction == 'none':\n            return losses\n        else:\n            raise NotImplementedError\n\n\ndef giou_loss(\n    boxes1: torch.Tensor,\n    boxes2: torch.Tensor,\n    reduction: str = \"none\",\n    eps: float = 1e-7,\n) -> torch.Tensor:\n    \"\"\"\n    Generalized Intersection over Union Loss (Hamid Rezatofighi et. al)\n    https://arxiv.org/abs/1902.09630\n    Gradient-friendly IoU loss with an additional penalty that is non-zero when the\n    boxes do not overlap and scales with the size of their smallest enclosing box.\n    This loss is symmetric, so the boxes1 and boxes2 arguments are interchangeable.\n    Args:\n        boxes1, boxes2 (Tensor): box locations in XYXY format, shape (N, 4) or (4,).\n        reduction: 'none' | 'mean' | 'sum'\n                 'none': No reduction will be applied to the output.\n                 'mean': The output will be averaged.\n                 'sum': The output will be summed.\n        eps (float): small number to prevent division by zero\n    \"\"\"\n\n    x1, y1, x2, y2 = boxes1.unbind(dim=-1)\n    x1g, y1g, x2g, y2g = boxes2.unbind(dim=-1)\n\n    assert (x2 >= x1).all(), \"bad box: x1 larger than x2\"\n    assert (y2 >= y1).all(), \"bad box: y1 larger than y2\"\n\n    # Intersection keypoints\n    xkis1 = torch.max(x1, x1g)\n    ykis1 = torch.max(y1, y1g)\n    xkis2 = torch.min(x2, x2g)\n    ykis2 = torch.min(y2, y2g)\n\n    intsctk = torch.zeros_like(x1)\n    mask = (ykis2 > ykis1) & (xkis2 > xkis1)\n    intsctk[mask] = (xkis2[mask] - xkis1[mask]) * (ykis2[mask] - ykis1[mask])\n    unionk = (x2 - x1) * (y2 - y1) + (x2g - x1g) * (y2g - y1g) - intsctk\n    iouk = intsctk / (unionk + eps)\n\n    # smallest enclosing box\n    xc1 = torch.min(x1, x1g)\n    yc1 = torch.min(y1, y1g)\n    xc2 = torch.max(x2, x2g)\n    yc2 = torch.max(y2, y2g)\n\n    area_c = (xc2 - xc1) * (yc2 - yc1)\n    miouk = iouk 
- ((area_c - unionk) / (area_c + eps))\n\n    loss = 1 - miouk\n\n    if reduction == \"mean\":\n        loss = loss.mean()\n    elif reduction == \"sum\":\n        loss = loss.sum()\n\n    return loss"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/layers/ml_nms.py",
    "content": "from detectron2.layers import batched_nms\n\n\ndef ml_nms(boxlist, nms_thresh, max_proposals=-1,\n           score_field=\"scores\", label_field=\"labels\"):\n    \"\"\"\n    Performs per-class non-maximum suppression on a detectron2 Instances\n    object. Scores are read from boxlist.scores; boxes and labels come from\n    pred_boxes/pred_classes when present, otherwise from proposal_boxes with\n    every box treated as the same class.\n    Arguments:\n        boxlist (Instances)\n        nms_thresh (float): IoU threshold; if <= 0, boxlist is returned unchanged\n        max_proposals (int): if > 0, then only the top max_proposals are kept\n            after non-maximum suppression\n        score_field (str): currently unused\n        label_field (str): currently unused\n    \"\"\"\n    if nms_thresh <= 0:\n        return boxlist\n    if boxlist.has('pred_boxes'):\n        boxes = boxlist.pred_boxes.tensor\n        labels = boxlist.pred_classes\n    else:\n        boxes = boxlist.proposal_boxes.tensor\n        labels = boxlist.proposal_boxes.tensor.new_zeros(\n            len(boxlist.proposal_boxes.tensor))\n    scores = boxlist.scores\n    \n    keep = batched_nms(boxes, scores, labels, nms_thresh)\n    if max_proposals > 0:\n        keep = keep[: max_proposals]\n    boxlist = boxlist[keep]\n    return boxlist\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/meta_arch/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/meta_arch/centernet_detector.py",
    "content": "import math\nimport json\nimport numpy as np\nimport torch\nfrom torch import nn\n\nfrom detectron2.modeling.meta_arch.build import META_ARCH_REGISTRY\nfrom detectron2.modeling import build_backbone, build_proposal_generator\nfrom detectron2.modeling import detector_postprocess\nfrom detectron2.structures import ImageList\n\n@META_ARCH_REGISTRY.register()\nclass CenterNetDetector(nn.Module):\n    def __init__(self, cfg):\n        super().__init__()\n        self.mean, self.std = cfg.MODEL.PIXEL_MEAN, cfg.MODEL.PIXEL_STD\n        self.register_buffer(\"pixel_mean\", torch.Tensor(cfg.MODEL.PIXEL_MEAN).view(-1, 1, 1))\n        self.register_buffer(\"pixel_std\", torch.Tensor(cfg.MODEL.PIXEL_STD).view(-1, 1, 1))\n        \n        self.backbone = build_backbone(cfg)\n        self.proposal_generator = build_proposal_generator(\n            cfg, self.backbone.output_shape()) # TODO: change to a more precise name\n    \n    \n    def forward(self, batched_inputs):\n        if not self.training:\n            return self.inference(batched_inputs)\n        images = self.preprocess_image(batched_inputs)\n        features = self.backbone(images.tensor)\n        gt_instances = [x[\"instances\"].to(self.device) for x in batched_inputs]\n\n        _, proposal_losses = self.proposal_generator(\n            images, features, gt_instances)\n        return proposal_losses\n\n\n    @property\n    def device(self):\n        return self.pixel_mean.device\n\n\n    @torch.no_grad()\n    def inference(self, batched_inputs, do_postprocess=True):\n        images = self.preprocess_image(batched_inputs)\n        inp = images.tensor\n        features = self.backbone(inp)\n        proposals, _ = self.proposal_generator(images, features, None)\n\n        processed_results = []\n        for results_per_image, input_per_image, image_size in zip(\n            proposals, batched_inputs, images.image_sizes):\n            if do_postprocess:\n                height = 
input_per_image.get(\"height\", image_size[0])\n                width = input_per_image.get(\"width\", image_size[1])\n                r = detector_postprocess(results_per_image, height, width)\n                processed_results.append({\"instances\": r})\n            else:\n                r = results_per_image\n                processed_results.append(r)\n        return processed_results\n\n    def preprocess_image(self, batched_inputs):\n        \"\"\"\n        Normalize, pad and batch the input images.\n        \"\"\"\n        images = [x[\"image\"].to(self.device) for x in batched_inputs]\n        images = [(x - self.pixel_mean) / self.pixel_std for x in images]\n        images = ImageList.from_tensors(images, self.backbone.size_divisibility)\n        return images\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/roi_heads/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/roi_heads/custom_fast_rcnn.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n# Part of the code is from https://github.com/tztztztztz/eql.detectron2/blob/master/projects/EQL/eql/fast_rcnn.py\nimport logging\nimport math\nimport json\nfrom typing import Dict, Union\nimport torch\nfrom fvcore.nn import giou_loss, smooth_l1_loss\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom detectron2.config import configurable\nfrom detectron2.layers import Linear, ShapeSpec, batched_nms, cat, nonzero_tuple\nfrom detectron2.modeling.box_regression import Box2BoxTransform\nfrom detectron2.structures import Boxes, Instances\nfrom detectron2.utils.events import get_event_storage\nfrom detectron2.modeling.roi_heads.fast_rcnn import FastRCNNOutputLayers\nfrom detectron2.modeling.roi_heads.fast_rcnn import fast_rcnn_inference\nfrom detectron2.modeling.roi_heads.fast_rcnn import _log_classification_stats\nfrom detectron2.utils.comm import get_world_size\nfrom .fed_loss import load_class_freq, get_fed_loss_inds\n\n__all__ = [\"CustomFastRCNNOutputLayers\"]\n\nclass CustomFastRCNNOutputLayers(FastRCNNOutputLayers):\n    def __init__(\n        self, \n        cfg, \n        input_shape: ShapeSpec,\n        **kwargs\n    ):\n        super().__init__(cfg, input_shape, **kwargs)\n\n        self.cfg = cfg\n\n    def losses(self, predictions, proposals):\n        \"\"\"\n        enable advanced loss\n        \"\"\"\n        scores, proposal_deltas = predictions\n        gt_classes = (\n            cat([p.gt_classes for p in proposals], dim=0) if len(proposals) else torch.empty(0)\n        )\n        num_classes = self.num_classes\n        _log_classification_stats(scores, gt_classes)\n\n        if len(proposals):\n            proposal_boxes = cat([p.proposal_boxes.tensor for p in proposals], dim=0)  # Nx4\n            assert not proposal_boxes.requires_grad, \"Proposals should not require gradients!\"\n            gt_boxes = cat(\n                [(p.gt_boxes if 
p.has(\"gt_boxes\") else p.proposal_boxes).tensor for p in proposals],\n                dim=0,\n            )\n        else:\n            proposal_boxes = gt_boxes = torch.empty((0, 4), device=proposal_deltas.device)\n\n        loss_cls = self.softmax_cross_entropy_loss(scores, gt_classes)\n        return {\n            \"loss_cls\": loss_cls, \n            \"loss_box_reg\": self.box_reg_loss(\n                proposal_boxes, gt_boxes, proposal_deltas, gt_classes)\n        }\n\n\n    def sigmoid_cross_entropy_loss(self, pred_class_logits, gt_classes):\n        if pred_class_logits.numel() == 0:\n            return pred_class_logits.new_zeros([1])[0] # This is more robust than .sum() * 0.\n\n        B = pred_class_logits.shape[0]\n        C = pred_class_logits.shape[1] - 1\n\n        target = pred_class_logits.new_zeros(B, C + 1)\n        target[range(len(gt_classes)), gt_classes] = 1 # B x (C + 1)\n        target = target[:, :C] # B x C\n\n        weight = 1\n\n        cls_loss = F.binary_cross_entropy_with_logits(\n            pred_class_logits[:, :-1], target, reduction='none') # B x C\n        loss =  torch.sum(cls_loss * weight) / B  \n        return loss\n        \n    \n    def softmax_cross_entropy_loss(self, pred_class_logits, gt_classes):\n        \"\"\"\n        change _no_instance handling\n        \"\"\"\n        if pred_class_logits.numel() == 0:\n            return pred_class_logits.new_zeros([1])[0]\n\n        loss = F.cross_entropy(\n            pred_class_logits, gt_classes, reduction=\"mean\")\n        return loss\n\n\n    def inference(self, predictions, proposals):\n        \"\"\"\n        enable use proposal boxes\n        \"\"\"\n        boxes = self.predict_boxes(predictions, proposals)\n        scores = self.predict_probs(predictions, proposals)\n        if self.cfg.MODEL.ROI_BOX_HEAD.MULT_PROPOSAL_SCORE:\n            proposal_scores = [p.get('objectness_logits') for p in proposals]\n            scores = [(s * ps[:, None]) ** 0.5 \\\n        
        for s, ps in zip(scores, proposal_scores)]\n        image_shapes = [x.image_size for x in proposals]\n        return fast_rcnn_inference(\n            boxes,\n            scores,\n            image_shapes,\n            self.test_score_thresh,\n            self.test_nms_thresh,\n            self.test_topk_per_image,\n        )\n\n\n    def predict_probs(self, predictions, proposals):\n        \"\"\"\n        support sigmoid\n        \"\"\"\n        scores, _ = predictions\n        num_inst_per_image = [len(p) for p in proposals]\n        probs = F.softmax(scores, dim=-1)\n        return probs.split(num_inst_per_image, dim=0)\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/roi_heads/custom_roi_heads.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nimport numpy as np\nimport json\nimport math\nimport torch\nfrom torch import nn\nfrom torch.autograd.function import Function\nfrom typing import Dict, List, Optional, Tuple, Union\n\nfrom detectron2.layers import ShapeSpec\nfrom detectron2.structures import Boxes, Instances, pairwise_iou\nfrom detectron2.utils.events import get_event_storage\n\nfrom detectron2.modeling.box_regression import Box2BoxTransform\nfrom detectron2.modeling.roi_heads.fast_rcnn import fast_rcnn_inference\nfrom detectron2.modeling.roi_heads.roi_heads import ROI_HEADS_REGISTRY, StandardROIHeads\nfrom detectron2.modeling.roi_heads.cascade_rcnn import CascadeROIHeads\nfrom detectron2.modeling.roi_heads.box_head import build_box_head\nfrom .custom_fast_rcnn import CustomFastRCNNOutputLayers\n\n\n@ROI_HEADS_REGISTRY.register()\nclass CustomROIHeads(StandardROIHeads):\n    @classmethod\n    def _init_box_head(self, cfg, input_shape):\n        ret = super()._init_box_head(cfg, input_shape)\n        del ret['box_predictor']\n        ret['box_predictor'] = CustomFastRCNNOutputLayers(\n            cfg, ret['box_head'].output_shape)\n        self.debug = cfg.DEBUG\n        if self.debug:\n            self.debug_show_name = cfg.DEBUG_SHOW_NAME\n            self.save_debug = cfg.SAVE_DEBUG\n            self.vis_thresh = cfg.VIS_THRESH\n            self.pixel_mean = torch.Tensor(cfg.MODEL.PIXEL_MEAN).to(\n                torch.device(cfg.MODEL.DEVICE)).view(3, 1, 1)\n            self.pixel_std = torch.Tensor(cfg.MODEL.PIXEL_STD).to(\n                torch.device(cfg.MODEL.DEVICE)).view(3, 1, 1)\n        return ret\n\n    def forward(self, images, features, proposals, targets=None):\n        \"\"\"\n        enable debug\n        \"\"\"\n        if not self.debug:\n            del images\n        if self.training:\n            assert targets\n            proposals = self.label_and_sample_proposals(proposals, targets)\n 
       del targets\n\n        if self.training:\n            losses = self._forward_box(features, proposals)\n            losses.update(self._forward_mask(features, proposals))\n            losses.update(self._forward_keypoint(features, proposals))\n            return proposals, losses\n        else:\n            pred_instances = self._forward_box(features, proposals)\n            pred_instances = self.forward_with_given_boxes(features, pred_instances)\n            if self.debug:\n                from ..debug import debug_second_stage\n                denormalizer = lambda x: x * self.pixel_std + self.pixel_mean\n                debug_second_stage(\n                    [denormalizer(images[0].clone())],\n                    pred_instances, proposals=proposals,\n                    debug_show_name=self.debug_show_name)\n            return pred_instances, {}\n\n\n@ROI_HEADS_REGISTRY.register()\nclass CustomCascadeROIHeads(CascadeROIHeads):\n    @classmethod\n    def _init_box_head(self, cfg, input_shape):\n        self.mult_proposal_score = cfg.MODEL.ROI_BOX_HEAD.MULT_PROPOSAL_SCORE\n        ret = super()._init_box_head(cfg, input_shape)\n        del ret['box_predictors']\n        cascade_bbox_reg_weights = cfg.MODEL.ROI_BOX_CASCADE_HEAD.BBOX_REG_WEIGHTS\n        box_predictors = []\n        for box_head, bbox_reg_weights in zip(ret['box_heads'], cascade_bbox_reg_weights):\n            box_predictors.append(\n                CustomFastRCNNOutputLayers(\n                    cfg, box_head.output_shape,\n                    box2box_transform=Box2BoxTransform(weights=bbox_reg_weights)\n                ))\n        ret['box_predictors'] = box_predictors\n        self.debug = cfg.DEBUG\n        if self.debug:\n            self.debug_show_name = cfg.DEBUG_SHOW_NAME\n            self.save_debug = cfg.SAVE_DEBUG\n            self.vis_thresh = cfg.VIS_THRESH\n            self.pixel_mean = torch.Tensor(cfg.MODEL.PIXEL_MEAN).to(\n                
torch.device(cfg.MODEL.DEVICE)).view(3, 1, 1)\n            self.pixel_std = torch.Tensor(cfg.MODEL.PIXEL_STD).to(\n                torch.device(cfg.MODEL.DEVICE)).view(3, 1, 1)\n        return ret\n\n\n    def _forward_box(self, features, proposals, targets=None):\n        \"\"\"\n        Add mult proposal scores at testing\n        \"\"\"\n        if (not self.training) and self.mult_proposal_score:\n            if len(proposals) > 0 and proposals[0].has('scores'):\n                proposal_scores = [\n                    p.get('scores') for p in proposals]\n            else:\n                proposal_scores = [\n                    p.get('objectness_logits') for p in proposals]\n        \n        features = [features[f] for f in self.box_in_features]\n        head_outputs = []  # (predictor, predictions, proposals)\n        prev_pred_boxes = None\n        image_sizes = [x.image_size for x in proposals]\n        for k in range(self.num_cascade_stages):\n            if k > 0:\n                proposals = self._create_proposals_from_boxes(prev_pred_boxes, image_sizes)\n                if self.training:\n                    proposals = self._match_and_label_boxes(proposals, k, targets)\n            predictions = self._run_stage(features, proposals, k)\n            prev_pred_boxes = self.box_predictor[k].predict_boxes(predictions, proposals)\n            head_outputs.append((self.box_predictor[k], predictions, proposals))\n\n        if self.training:\n            losses = {}\n            storage = get_event_storage()\n            for stage, (predictor, predictions, proposals) in enumerate(head_outputs):\n                with storage.name_scope(\"stage{}\".format(stage)):\n                    stage_losses = predictor.losses(predictions, proposals)\n                losses.update({k + \"_stage{}\".format(stage): v for k, v in stage_losses.items()})\n            return losses\n        else:\n            # Each is a list[Tensor] of length #image. 
Each tensor is Ri x (K+1)\n            scores_per_stage = [h[0].predict_probs(h[1], h[2]) for h in head_outputs]\n            scores = [\n                sum(list(scores_per_image)) * (1.0 / self.num_cascade_stages)\n                for scores_per_image in zip(*scores_per_stage)\n            ]\n            \n            if self.mult_proposal_score:\n                scores = [(s * ps[:, None]) ** 0.5 \\\n                    for s, ps in zip(scores, proposal_scores)]\n\n            predictor, predictions, proposals = head_outputs[-1]\n            boxes = predictor.predict_boxes(predictions, proposals)\n            pred_instances, _ = fast_rcnn_inference(\n                boxes,\n                scores,\n                image_sizes,\n                predictor.test_score_thresh,\n                predictor.test_nms_thresh,\n                predictor.test_topk_per_image,\n            )\n            \n            return pred_instances\n\n    def forward(self, images, features, proposals, targets=None):\n        '''\n        enable debug\n        '''\n        if not self.debug:\n            del images\n        if self.training:\n            proposals = self.label_and_sample_proposals(proposals, targets)\n\n        if self.training:\n            losses = self._forward_box(features, proposals, targets)\n            losses.update(self._forward_mask(features, proposals))\n            losses.update(self._forward_keypoint(features, proposals))\n            return proposals, losses\n        else:\n            # import pdb; pdb.set_trace()\n            pred_instances = self._forward_box(features, proposals)\n            pred_instances = self.forward_with_given_boxes(features, pred_instances)\n            if self.debug:\n                from ..debug import debug_second_stage\n                denormalizer = lambda x: x * self.pixel_std + self.pixel_mean\n                debug_second_stage(\n                    [denormalizer(x.clone()) for x in images],\n                    
pred_instances, proposals=proposals,\n                    save_debug=self.save_debug,\n                    debug_show_name=self.debug_show_name,\n                    vis_thresh=self.vis_thresh)\n            return pred_instances, {}\n\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/grit_src/centernet2/centernet/modeling/roi_heads/fed_loss.py",
    "content": "import torch\nimport json\nimport numpy as np\nfrom torch.nn import functional as F\n\ndef load_class_freq(\n    path='datasets/lvis/lvis_v1_train_cat_info.json', \n    freq_weight=0.5):\n    cat_info = json.load(open(path, 'r'))\n    cat_info = torch.tensor(\n        [c['image_count'] for c in sorted(cat_info, key=lambda x: x['id'])])\n    freq_weight = cat_info.float() ** freq_weight\n    return freq_weight\n\ndef get_fed_loss_inds(\n    gt_classes, num_sample_cats=50, C=1203, \\\n    weight=None, fed_cls_inds=-1):\n    appeared = torch.unique(gt_classes) # C'\n    prob = appeared.new_ones(C + 1).float()\n    prob[-1] = 0\n    if len(appeared) < num_sample_cats:\n        if weight is not None:\n            prob[:C] = weight.float().clone()\n        prob[appeared] = 0\n        if fed_cls_inds > 0:\n            prob[fed_cls_inds:] = 0\n        more_appeared = torch.multinomial(\n            prob, num_sample_cats - len(appeared),\n            replacement=False)\n        appeared = torch.cat([appeared, more_appeared])\n    return appeared"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/tag2Text/__init__.py",
    "content": "import sys\nsys.path.append('third_party/grit_src')\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/tag2Text/med.py",
    "content": "'''\n * Copyright (c) 2022, salesforce.com, inc.\n * All rights reserved.\n * SPDX-License-Identifier: BSD-3-Clause\n * For full license text, see LICENSE.txt file in the repo root or https://opensource.org/licenses/BSD-3-Clause\n * By Junnan Li\n * Based on huggingface code base\n * https://github.com/huggingface/transformers/blob/v4.15.0/src/transformers/models/bert\n'''\n\nimport math\nimport os\nimport warnings\nfrom dataclasses import dataclass\nfrom typing import Optional, Tuple\n\nimport torch\nfrom torch import Tensor, device, dtype, nn\nimport torch.utils.checkpoint\nfrom torch import nn\nfrom torch.nn import CrossEntropyLoss\nimport torch.nn.functional as F\n\nfrom transformers.activations import ACT2FN\nfrom transformers.file_utils import (\n    ModelOutput,\n)\nfrom transformers.modeling_outputs import (\n    BaseModelOutputWithPastAndCrossAttentions,\n    BaseModelOutputWithPoolingAndCrossAttentions,\n    CausalLMOutputWithCrossAttentions,\n    MaskedLMOutput,\n    MultipleChoiceModelOutput,\n    NextSentencePredictorOutput,\n    QuestionAnsweringModelOutput,\n    SequenceClassifierOutput,\n    TokenClassifierOutput,\n)\nfrom transformers.modeling_utils import (\n    PreTrainedModel,\n    apply_chunking_to_forward,\n    find_pruneable_heads_and_indices,\n    prune_linear_layer,\n)\nfrom transformers.utils import logging\nfrom transformers.models.bert.configuration_bert import BertConfig\n\n\nlogger = logging.get_logger(__name__)\n\n\nclass BertEmbeddings_nopos(nn.Module):\n    \"\"\"Construct the embeddings from word and position embeddings.\"\"\"\n\n    def __init__(self, config):\n        super().__init__()\n        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)\n        # self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)\n\n        # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load\n 
       # any TensorFlow checkpoint file\n        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)\n        self.dropout = nn.Dropout(config.hidden_dropout_prob)\n\n        # position_ids (1, len position emb) is contiguous in memory and exported when serialized\n        # self.register_buffer(\"position_ids\", torch.arange(config.max_position_embeddings).expand((1, -1)))\n        # self.position_embedding_type = getattr(config, \"position_embedding_type\", \"absolute\")\n        \n        self.config = config\n\n    def forward(\n        self, input_ids=None, position_ids=None, inputs_embeds=None, past_key_values_length=0\n    ):\n        if input_ids is not None:\n            input_shape = input_ids.size()\n        else:\n            input_shape = inputs_embeds.size()[:-1]\n\n        seq_length = input_shape[1]\n\n        # if position_ids is None:\n            # position_ids = self.position_ids[:, past_key_values_length : seq_length + past_key_values_length]\n\n        if inputs_embeds is None:\n            inputs_embeds = self.word_embeddings(input_ids)\n\n        embeddings = inputs_embeds\n\n        # if self.position_embedding_type == \"absolute\":\n        #     position_embeddings = self.position_embeddings(position_ids)\n        #     # print('add position_embeddings!!!!')\n        #     embeddings += position_embeddings\n        embeddings = self.LayerNorm(embeddings)\n        embeddings = self.dropout(embeddings)\n        return embeddings\n\n\n\n\nclass BertEmbeddings(nn.Module):\n    \"\"\"Construct the embeddings from word and position embeddings.\"\"\"\n\n    def __init__(self, config):\n        super().__init__()\n        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)\n        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)\n\n        # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and 
be able to load\n        # any TensorFlow checkpoint file\n        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)\n        self.dropout = nn.Dropout(config.hidden_dropout_prob)\n\n        # position_ids (1, len position emb) is contiguous in memory and exported when serialized\n        self.register_buffer(\"position_ids\", torch.arange(config.max_position_embeddings).expand((1, -1)))\n        self.position_embedding_type = getattr(config, \"position_embedding_type\", \"absolute\")\n        \n        self.config = config\n\n    def forward(\n        self, input_ids=None, position_ids=None, inputs_embeds=None, past_key_values_length=0\n    ):\n        if input_ids is not None:\n            input_shape = input_ids.size()\n        else:\n            input_shape = inputs_embeds.size()[:-1]\n\n        seq_length = input_shape[1]\n\n        if position_ids is None:\n            position_ids = self.position_ids[:, past_key_values_length : seq_length + past_key_values_length]\n\n        if inputs_embeds is None:\n            inputs_embeds = self.word_embeddings(input_ids)\n\n        embeddings = inputs_embeds\n\n        if self.position_embedding_type == \"absolute\":\n            position_embeddings = self.position_embeddings(position_ids)\n            # print('add position_embeddings!!!!')\n            embeddings += position_embeddings\n        embeddings = self.LayerNorm(embeddings)\n        embeddings = self.dropout(embeddings)\n        return embeddings\n\n\nclass BertSelfAttention(nn.Module):\n    def __init__(self, config, is_cross_attention):\n        super().__init__()\n        self.config = config\n        if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, \"embedding_size\"):\n            raise ValueError(\n                \"The hidden size (%d) is not a multiple of the number of attention \"\n                \"heads (%d)\" % (config.hidden_size, config.num_attention_heads)\n            )\n        \n     
   self.num_attention_heads = config.num_attention_heads\n        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)\n        self.all_head_size = self.num_attention_heads * self.attention_head_size\n\n        self.query = nn.Linear(config.hidden_size, self.all_head_size)\n        if is_cross_attention:\n            self.key = nn.Linear(config.encoder_width, self.all_head_size)\n            self.value = nn.Linear(config.encoder_width, self.all_head_size)\n        else:\n            self.key = nn.Linear(config.hidden_size, self.all_head_size)\n            self.value = nn.Linear(config.hidden_size, self.all_head_size)\n\n        self.dropout = nn.Dropout(config.attention_probs_dropout_prob)\n        self.position_embedding_type = getattr(config, \"position_embedding_type\", \"absolute\")\n        if self.position_embedding_type == \"relative_key\" or self.position_embedding_type == \"relative_key_query\":\n            self.max_position_embeddings = config.max_position_embeddings\n            self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size)\n        self.save_attention = False   \n            \n    def save_attn_gradients(self, attn_gradients):\n        self.attn_gradients = attn_gradients\n        \n    def get_attn_gradients(self):\n        return self.attn_gradients\n    \n    def save_attention_map(self, attention_map):\n        self.attention_map = attention_map\n        \n    def get_attention_map(self):\n        return self.attention_map\n    \n    def transpose_for_scores(self, x):\n        new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)\n        x = x.view(*new_x_shape)\n        return x.permute(0, 2, 1, 3)\n\n    def forward(\n        self,\n        hidden_states,\n        attention_mask=None,\n        head_mask=None,\n        encoder_hidden_states=None,\n        encoder_attention_mask=None,\n        past_key_value=None,\n        
output_attentions=False,\n    ):\n        mixed_query_layer = self.query(hidden_states)\n\n        # If this is instantiated as a cross-attention module, the keys\n        # and values come from an encoder; the attention mask needs to be\n        # such that the encoder's padding tokens are not attended to.\n        is_cross_attention = encoder_hidden_states is not None\n\n        if is_cross_attention:\n            # print(self.key.weight.shape)\n            key_layer = self.transpose_for_scores(self.key(encoder_hidden_states))\n            value_layer = self.transpose_for_scores(self.value(encoder_hidden_states))\n            attention_mask = encoder_attention_mask\n        elif past_key_value is not None:\n            key_layer = self.transpose_for_scores(self.key(hidden_states))\n            value_layer = self.transpose_for_scores(self.value(hidden_states))\n            key_layer = torch.cat([past_key_value[0], key_layer], dim=2)\n            value_layer = torch.cat([past_key_value[1], value_layer], dim=2)\n        else:\n            key_layer = self.transpose_for_scores(self.key(hidden_states))\n            value_layer = self.transpose_for_scores(self.value(hidden_states))\n\n        query_layer = self.transpose_for_scores(mixed_query_layer)\n       \n        if key_layer.shape[0] > query_layer.shape[0]:\n            key_layer = key_layer[:query_layer.shape[0], :, :, :]\n            attention_mask = attention_mask[:query_layer.shape[0], :, :]\n            value_layer = value_layer[:query_layer.shape[0], :, :, :]\n        attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))\n\n        past_key_value = (key_layer, value_layer)\n\n        # Take the dot product between \"query\" and \"key\" to get the raw attention scores.\n        attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))\n\n        if self.position_embedding_type == \"relative_key\" or self.position_embedding_type == \"relative_key_query\":\n            
seq_length = hidden_states.size()[1]\n            position_ids_l = torch.arange(seq_length, dtype=torch.long, device=hidden_states.device).view(-1, 1)\n            position_ids_r = torch.arange(seq_length, dtype=torch.long, device=hidden_states.device).view(1, -1)\n            distance = position_ids_l - position_ids_r\n            positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)\n            positional_embedding = positional_embedding.to(dtype=query_layer.dtype)  # fp16 compatibility\n\n            if self.position_embedding_type == \"relative_key\":\n                relative_position_scores = torch.einsum(\"bhld,lrd->bhlr\", query_layer, positional_embedding)\n                attention_scores = attention_scores + relative_position_scores\n            elif self.position_embedding_type == \"relative_key_query\":\n                relative_position_scores_query = torch.einsum(\"bhld,lrd->bhlr\", query_layer, positional_embedding)\n                relative_position_scores_key = torch.einsum(\"bhrd,lrd->bhlr\", key_layer, positional_embedding)\n                attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key\n\n        attention_scores = attention_scores / math.sqrt(self.attention_head_size)\n        if attention_mask is not None:\n            # Apply the attention mask is (precomputed for all layers in BertModel forward() function)\n            attention_scores = attention_scores + attention_mask\n\n        # Normalize the attention scores to probabilities.\n        attention_probs = nn.Softmax(dim=-1)(attention_scores)\n        \n        if is_cross_attention and self.save_attention:\n            self.save_attention_map(attention_probs)\n            attention_probs.register_hook(self.save_attn_gradients)         \n\n        # This is actually dropping out entire tokens to attend to, which might\n        # seem a bit unusual, but is taken from the original Transformer 
paper.\n        attention_probs_dropped = self.dropout(attention_probs)\n\n        # Mask heads if we want to\n        if head_mask is not None:\n            attention_probs_dropped = attention_probs_dropped * head_mask\n\n        context_layer = torch.matmul(attention_probs_dropped, value_layer)\n\n        context_layer = context_layer.permute(0, 2, 1, 3).contiguous()\n        new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)\n        context_layer = context_layer.view(*new_context_layer_shape)\n\n        outputs = (context_layer, attention_probs) if output_attentions else (context_layer,)\n\n        outputs = outputs + (past_key_value,)\n        return outputs\n\n\nclass BertSelfOutput(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n        self.dense = nn.Linear(config.hidden_size, config.hidden_size)\n        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)\n        self.dropout = nn.Dropout(config.hidden_dropout_prob)\n\n    def forward(self, hidden_states, input_tensor):\n        hidden_states = self.dense(hidden_states)\n        hidden_states = self.dropout(hidden_states)\n        hidden_states = self.LayerNorm(hidden_states + input_tensor)\n        return hidden_states\n\n\nclass BertAttention(nn.Module):\n    def __init__(self, config, is_cross_attention=False):\n        super().__init__()\n        self.self = BertSelfAttention(config, is_cross_attention)\n        self.output = BertSelfOutput(config)\n        self.pruned_heads = set()\n\n    def prune_heads(self, heads):\n        if len(heads) == 0:\n            return\n        heads, index = find_pruneable_heads_and_indices(\n            heads, self.self.num_attention_heads, self.self.attention_head_size, self.pruned_heads\n        )\n\n        # Prune linear layers\n        self.self.query = prune_linear_layer(self.self.query, index)\n        self.self.key = prune_linear_layer(self.self.key, index)\n        self.self.value = 
prune_linear_layer(self.self.value, index)\n        self.output.dense = prune_linear_layer(self.output.dense, index, dim=1)\n\n        # Update hyper params and store pruned heads\n        self.self.num_attention_heads = self.self.num_attention_heads - len(heads)\n        self.self.all_head_size = self.self.attention_head_size * self.self.num_attention_heads\n        self.pruned_heads = self.pruned_heads.union(heads)\n\n    def forward(\n        self,\n        hidden_states,\n        attention_mask=None,\n        head_mask=None,\n        encoder_hidden_states=None,\n        encoder_attention_mask=None,\n        past_key_value=None,\n        output_attentions=False,\n    ):\n        self_outputs = self.self(\n            hidden_states,\n            attention_mask,\n            head_mask,\n            encoder_hidden_states,\n            encoder_attention_mask,\n            past_key_value,\n            output_attentions,\n        )\n        attention_output = self.output(self_outputs[0], hidden_states)\n        outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them\n        return outputs\n\n\nclass BertIntermediate(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n        self.dense = nn.Linear(config.hidden_size, config.intermediate_size)\n        if isinstance(config.hidden_act, str):\n            self.intermediate_act_fn = ACT2FN[config.hidden_act]\n        else:\n            self.intermediate_act_fn = config.hidden_act\n\n    def forward(self, hidden_states):\n        hidden_states = self.dense(hidden_states)\n        hidden_states = self.intermediate_act_fn(hidden_states)\n        return hidden_states\n\n\nclass BertOutput(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n        self.dense = nn.Linear(config.intermediate_size, config.hidden_size)\n        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)\n        self.dropout = 
nn.Dropout(config.hidden_dropout_prob)\n\n    def forward(self, hidden_states, input_tensor):\n        hidden_states = self.dense(hidden_states)\n        hidden_states = self.dropout(hidden_states)\n        hidden_states = self.LayerNorm(hidden_states + input_tensor)\n        return hidden_states\n\n\nclass BertLayer(nn.Module):\n    def __init__(self, config, layer_num):\n        super().__init__()\n        self.config = config\n        self.chunk_size_feed_forward = config.chunk_size_feed_forward\n        self.seq_len_dim = 1\n        self.attention = BertAttention(config)      \n        self.layer_num = layer_num          \n        if self.config.add_cross_attention:\n            self.crossattention = BertAttention(config, is_cross_attention=self.config.add_cross_attention)\n        self.intermediate = BertIntermediate(config)\n        self.output = BertOutput(config)\n\n    def forward(\n        self,\n        hidden_states,\n        attention_mask=None,\n        head_mask=None,\n        encoder_hidden_states=None,\n        encoder_attention_mask=None,\n        past_key_value=None,\n        output_attentions=False,\n        mode=None,\n    ):\n        \n        if mode == 'mlr':\n\n            assert encoder_hidden_states is not None, \"encoder_hidden_states must be given for cross-attention layers\"\n\n            # print('attention_output.shape',attention_output.shape)\n            # print('encoder_hidden_states.shape',encoder_hidden_states.shape)\n            cross_attention_outputs = self.crossattention(\n                hidden_states,\n                attention_mask,\n                head_mask,\n                encoder_hidden_states,\n                encoder_attention_mask,\n                output_attentions=output_attentions,\n            )\n            attention_output = cross_attention_outputs[0]\n            outputs = cross_attention_outputs[1:-1]  # add cross attentions if we output attention weights  \n\n            present_key_value = 
cross_attention_outputs[-1]\n\n        else:\n            # decoder uni-directional self-attention cached key/values tuple is at positions 1,2\n            self_attn_past_key_value = past_key_value[:2] if past_key_value is not None else None\n            self_attention_outputs = self.attention(\n                hidden_states,\n                attention_mask,\n                head_mask,\n                output_attentions=output_attentions,\n                past_key_value=self_attn_past_key_value,\n            )\n            attention_output = self_attention_outputs[0]\n\n            outputs = self_attention_outputs[1:-1]\n            present_key_value = self_attention_outputs[-1]\n\n            if mode=='multimodal':\n                assert encoder_hidden_states is not None, \"encoder_hidden_states must be given for cross-attention layers\"\n\n                cross_attention_outputs = self.crossattention(\n                    attention_output,\n                    attention_mask,\n                    head_mask,\n                    encoder_hidden_states,\n                    encoder_attention_mask,\n                    output_attentions=output_attentions,\n                )\n                attention_output = cross_attention_outputs[0]\n                outputs = outputs + cross_attention_outputs[1:-1]  # add cross attentions if we output attention weights                               \n        layer_output = apply_chunking_to_forward(\n            self.feed_forward_chunk, self.chunk_size_feed_forward, self.seq_len_dim, attention_output\n        )\n        outputs = (layer_output,) + outputs\n\n        outputs = outputs + (present_key_value,)\n\n        return outputs\n\n    def feed_forward_chunk(self, attention_output):\n        intermediate_output = self.intermediate(attention_output)\n        layer_output = self.output(intermediate_output, attention_output)\n        return layer_output\n\n\nclass BertEncoder(nn.Module):\n    def __init__(self, config):\n        
super().__init__()\n        self.config = config\n        self.layer = nn.ModuleList([BertLayer(config,i) for i in range(config.num_hidden_layers)])\n        self.gradient_checkpointing = False\n\n    def forward(\n        self,\n        hidden_states,\n        attention_mask=None,\n        head_mask=None,\n        encoder_hidden_states=None,\n        encoder_attention_mask=None,\n        past_key_values=None,\n        use_cache=None,\n        output_attentions=False,\n        output_hidden_states=False,\n        return_dict=True,\n        mode='multimodal',\n    ):\n        all_hidden_states = () if output_hidden_states else None\n        all_self_attentions = () if output_attentions else None\n        all_cross_attentions = () if output_attentions and self.config.add_cross_attention else None\n\n        next_decoder_cache = () if use_cache else None\n               \n        for i in range(self.config.num_hidden_layers):\n            layer_module = self.layer[i]\n            if output_hidden_states:\n                all_hidden_states = all_hidden_states + (hidden_states,)\n\n            layer_head_mask = head_mask[i] if head_mask is not None else None\n            past_key_value = past_key_values[i] if past_key_values is not None else None\n\n            if self.gradient_checkpointing and self.training:\n\n                if use_cache:\n                    logger.warn(\n                        \"`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...\"\n                    )\n                    use_cache = False\n\n                def create_custom_forward(module):\n                    def custom_forward(*inputs):\n                        return module(*inputs, past_key_value, output_attentions)\n\n                    return custom_forward\n\n                layer_outputs = torch.utils.checkpoint.checkpoint(\n                    create_custom_forward(layer_module),\n                    hidden_states,\n                    attention_mask,\n                    layer_head_mask,\n                    encoder_hidden_states,\n                    encoder_attention_mask,\n                    mode=mode,\n                )\n            else:\n                layer_outputs = layer_module(\n                    hidden_states,\n                    attention_mask,\n                    layer_head_mask,\n                    encoder_hidden_states,\n                    encoder_attention_mask,\n                    past_key_value,\n                    output_attentions,\n                    mode=mode,\n                )\n\n            hidden_states = layer_outputs[0]\n            if use_cache:\n                next_decoder_cache += (layer_outputs[-1],)\n            if output_attentions:\n                all_self_attentions = all_self_attentions + (layer_outputs[1],)\n\n        if output_hidden_states:\n            all_hidden_states = all_hidden_states + (hidden_states,)\n\n        if not return_dict:\n            return tuple(\n                v\n                for v in [\n                    hidden_states,\n                    next_decoder_cache,\n                    all_hidden_states,\n                    all_self_attentions,\n                    all_cross_attentions,\n                ]\n                if v is not None\n            )\n        return BaseModelOutputWithPastAndCrossAttentions(\n            last_hidden_state=hidden_states,\n            past_key_values=next_decoder_cache,\n            
hidden_states=all_hidden_states,\n            attentions=all_self_attentions,\n            cross_attentions=all_cross_attentions,\n        )\n\n\nclass BertPooler(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n        self.dense = nn.Linear(config.hidden_size, config.hidden_size)\n        self.activation = nn.Tanh()\n\n    def forward(self, hidden_states):\n        # We \"pool\" the model by simply taking the hidden state corresponding\n        # to the first token.\n        first_token_tensor = hidden_states[:, 0]\n        pooled_output = self.dense(first_token_tensor)\n        pooled_output = self.activation(pooled_output)\n        return pooled_output\n\n\nclass BertPredictionHeadTransform(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n        self.dense = nn.Linear(config.hidden_size, config.hidden_size)\n        if isinstance(config.hidden_act, str):\n            self.transform_act_fn = ACT2FN[config.hidden_act]\n        else:\n            self.transform_act_fn = config.hidden_act\n        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)\n\n    def forward(self, hidden_states):\n        hidden_states = self.dense(hidden_states)\n        hidden_states = self.transform_act_fn(hidden_states)\n        hidden_states = self.LayerNorm(hidden_states)\n        return hidden_states\n\n\nclass BertLMPredictionHead(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n        self.transform = BertPredictionHeadTransform(config)\n\n        # The output weights are the same as the input embeddings, but there is\n        # an output-only bias for each token.\n        self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)\n\n        self.bias = nn.Parameter(torch.zeros(config.vocab_size))\n\n        # Need a link between the two variables so that the bias is correctly resized with `resize_token_embeddings`\n        self.decoder.bias = self.bias\n\n    def 
forward(self, hidden_states):\n        hidden_states = self.transform(hidden_states)\n        hidden_states = self.decoder(hidden_states)\n        return hidden_states\n\n\nclass BertOnlyMLMHead(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n        self.predictions = BertLMPredictionHead(config)\n\n    def forward(self, sequence_output):\n        prediction_scores = self.predictions(sequence_output)\n        return prediction_scores\n\n\nclass BertPreTrainedModel(PreTrainedModel):\n    \"\"\"\n    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained\n    models.\n    \"\"\"\n\n    config_class = BertConfig\n    base_model_prefix = \"bert\"\n    _keys_to_ignore_on_load_missing = [r\"position_ids\"]\n\n    def _init_weights(self, module):\n        \"\"\" Initialize the weights \"\"\"\n        if isinstance(module, (nn.Linear, nn.Embedding)):\n            # Slightly different from the TF version which uses truncated_normal for initialization\n            # cf https://github.com/pytorch/pytorch/pull/5617\n            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)\n        elif isinstance(module, nn.LayerNorm):\n            module.bias.data.zero_()\n            module.weight.data.fill_(1.0)\n        if isinstance(module, nn.Linear) and module.bias is not None:\n            module.bias.data.zero_()\n\n\nclass BertModel(BertPreTrainedModel):\n    \"\"\"\n    The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of\n    cross-attention is added between the self-attention layers, following the architecture described in `Attention is\n    all you need <https://arxiv.org/abs/1706.03762>`__ by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit,\n    Llion Jones, Aidan N. 
Gomez, Lukasz Kaiser and Illia Polosukhin.\n\n    To use cross-attention, the model needs to be configured with :obj:`add_cross_attention` set to :obj:`True`; an :obj:`encoder_hidden_states` is then expected as an\n    input to the forward pass.\n    \"\"\"\n\n    def __init__(self, config, add_pooling_layer=True):\n        super().__init__(config)\n        self.config = config\n\n        self.embeddings = BertEmbeddings(config)\n        \n        self.encoder = BertEncoder(config)\n\n        self.pooler = BertPooler(config) if add_pooling_layer else None\n\n        self.init_weights()\n \n\n    def get_input_embeddings(self):\n        return self.embeddings.word_embeddings\n\n    def set_input_embeddings(self, value):\n        self.embeddings.word_embeddings = value\n\n    def _prune_heads(self, heads_to_prune):\n        \"\"\"\n        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer}. See base\n        class PreTrainedModel.\n        \"\"\"\n        for layer, heads in heads_to_prune.items():\n            self.encoder.layer[layer].attention.prune_heads(heads)\n\n    \n    def get_extended_attention_mask(self, attention_mask: Tensor, input_shape: Tuple[int], device: device, is_decoder: bool) -> Tensor:\n        \"\"\"\n        Makes broadcastable attention and causal masks so that future and masked tokens are ignored.\n\n        Arguments:\n            attention_mask (:obj:`torch.Tensor`):\n                Mask with ones indicating tokens to attend to, zeros for tokens to ignore.\n            input_shape (:obj:`Tuple[int]`):\n                The shape of the input to the model.\n            device: (:obj:`torch.device`):\n                The device of the input to the model.\n            is_decoder (:obj:`bool`):\n                Whether to apply a causal mask in addition to the padding mask.\n\n        Returns:\n            :obj:`torch.Tensor` The extended attention mask, with the same dtype as :obj:`attention_mask.dtype`.\n        \"\"\"\n        # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]\n        # ourselves in which case 
we just need to make it broadcastable to all heads.\n        if attention_mask.dim() == 3:\n            extended_attention_mask = attention_mask[:, None, :, :]\n        elif attention_mask.dim() == 2:\n            # Provided a padding mask of dimensions [batch_size, seq_length]\n            # - if the model is a decoder, apply a causal mask in addition to the padding mask\n            # - if the model is an encoder, make the mask broadcastable to [batch_size, num_heads, seq_length, seq_length]\n            if is_decoder:\n                batch_size, seq_length = input_shape\n\n                seq_ids = torch.arange(seq_length, device=device)\n                causal_mask = seq_ids[None, None, :].repeat(batch_size, seq_length, 1) <= seq_ids[None, :, None]\n                # in case past_key_values are used we need to add a prefix ones mask to the causal mask\n                # causal and attention masks must have same type with pytorch version < 1.3\n                causal_mask = causal_mask.to(attention_mask.dtype)\n   \n                if causal_mask.shape[1] < attention_mask.shape[1]:\n                    prefix_seq_len = attention_mask.shape[1] - causal_mask.shape[1]\n                    causal_mask = torch.cat(\n                        [\n                            torch.ones((batch_size, seq_length, prefix_seq_len), device=device, dtype=causal_mask.dtype),\n                            causal_mask,\n                        ],\n                        axis=-1,\n                    )                     \n\n                extended_attention_mask = causal_mask[:, None, :, :] * attention_mask[:, None, None, :]\n            else:\n                extended_attention_mask = attention_mask[:, None, None, :]\n        else:\n            raise ValueError(\n                \"Wrong shape for input_ids (shape {}) or attention_mask (shape {})\".format(\n                    input_shape, attention_mask.shape\n                )\n            )\n\n        # Since attention_mask 
is 1.0 for positions we want to attend and 0.0 for\n        # masked positions, this operation will create a tensor which is 0.0 for\n        # positions we want to attend and -10000.0 for masked positions.\n        # Since we are adding it to the raw scores before the softmax, this is\n        # effectively the same as removing these entirely.\n        extended_attention_mask = extended_attention_mask.to(dtype=self.dtype)  # fp16 compatibility\n        extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0\n        return extended_attention_mask\n    \n    def forward(\n        self,\n        input_ids=None,\n        attention_mask=None,\n        position_ids=None,\n        head_mask=None,\n        inputs_embeds=None,\n        encoder_embeds=None,\n        encoder_hidden_states=None,\n        encoder_attention_mask=None,\n        past_key_values=None,\n        use_cache=None,\n        output_attentions=None,\n        output_hidden_states=None,\n        return_dict=None,\n        is_decoder=False,\n        mode='multimodal',\n    ):\n        r\"\"\"\n        encoder_hidden_states  (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):\n            Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if\n            the model is configured as a decoder.\n        encoder_attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):\n            Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in\n            the cross-attention if the model is configured as a decoder. 
Mask values selected in ``[0, 1]``:\n            - 1 for tokens that are **not masked**,\n            - 0 for tokens that are **masked**.\n        past_key_values (:obj:`tuple(tuple(torch.FloatTensor))` of length :obj:`config.n_layers` with each tuple having 4 tensors of shape :obj:`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):\n            Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.\n            If :obj:`past_key_values` are used, the user can optionally input only the last :obj:`decoder_input_ids`\n            (those that don't have their past key value states given to this model) of shape :obj:`(batch_size, 1)`\n            instead of all :obj:`decoder_input_ids` of shape :obj:`(batch_size, sequence_length)`.\n        use_cache (:obj:`bool`, `optional`):\n            If set to :obj:`True`, :obj:`past_key_values` key value states are returned and can be used to speed up\n            decoding (see :obj:`past_key_values`).\n        \"\"\"\n        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions\n        output_hidden_states = (\n            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states\n        )\n        return_dict = return_dict if return_dict is not None else self.config.use_return_dict\n\n        if is_decoder:\n            use_cache = use_cache if use_cache is not None else self.config.use_cache\n        else:\n            use_cache = False\n\n        if input_ids is not None and inputs_embeds is not None:\n            raise ValueError(\"You cannot specify both input_ids and inputs_embeds at the same time\")\n        elif input_ids is not None:\n            input_shape = input_ids.size()\n            batch_size, seq_length = input_shape\n            device = input_ids.device\n        elif inputs_embeds is not None:\n            input_shape = inputs_embeds.size()[:-1]\n  
          batch_size, seq_length = input_shape\n            device = inputs_embeds.device\n        elif encoder_embeds is not None:    \n            input_shape = encoder_embeds.size()[:-1]\n            batch_size, seq_length = input_shape \n            device = encoder_embeds.device\n        else:\n            raise ValueError(\"You have to specify either input_ids or inputs_embeds or encoder_embeds\")\n\n        # past_key_values_length\n        past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0\n\n        if attention_mask is None:\n            attention_mask = torch.ones(((batch_size, seq_length + past_key_values_length)), device=device)\n            \n        # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]\n        # ourselves in which case we just need to make it broadcastable to all heads.\n        extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape, \n                                                                                 device, is_decoder)\n\n        # If a 2D or 3D attention mask is provided for the cross-attention\n        # we need to make broadcastable to [batch_size, num_heads, seq_length, seq_length]\n        if encoder_hidden_states is not None:\n            if type(encoder_hidden_states) == list:\n                encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states[0].size()\n            else:\n                encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states.size()\n            encoder_hidden_shape = (encoder_batch_size, encoder_sequence_length)\n            \n            if type(encoder_attention_mask) == list:\n                encoder_extended_attention_mask = [self.invert_attention_mask(mask) for mask in encoder_attention_mask]\n            elif encoder_attention_mask is None:\n                encoder_attention_mask = torch.ones(encoder_hidden_shape, 
device=device)\n                encoder_extended_attention_mask = self.invert_attention_mask(encoder_attention_mask)\n            else:    \n                encoder_extended_attention_mask = self.invert_attention_mask(encoder_attention_mask)\n        else:\n            encoder_extended_attention_mask = None\n\n        # Prepare head mask if needed\n        # 1.0 in head_mask indicate we keep the head\n        # attention_probs has shape bsz x n_heads x N x N\n        # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]\n        # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]\n        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)\n        \n        if encoder_embeds is None:\n            embedding_output = self.embeddings(\n                input_ids=input_ids,\n                position_ids=position_ids,\n                inputs_embeds=inputs_embeds,\n                past_key_values_length=past_key_values_length,\n            )\n        else:\n            embedding_output = encoder_embeds\n            \n        encoder_outputs = self.encoder(\n            embedding_output,\n            attention_mask=extended_attention_mask,\n            head_mask=head_mask,\n            encoder_hidden_states=encoder_hidden_states,\n            encoder_attention_mask=encoder_extended_attention_mask,\n            past_key_values=past_key_values,\n            use_cache=use_cache,\n            output_attentions=output_attentions,\n            output_hidden_states=output_hidden_states,\n            return_dict=return_dict,\n            mode=mode,\n        )\n        sequence_output = encoder_outputs[0]\n        pooled_output = self.pooler(sequence_output) if self.pooler is not None else None\n\n        if not return_dict:\n            return (sequence_output, pooled_output) + encoder_outputs[1:]\n\n        return BaseModelOutputWithPoolingAndCrossAttentions(\n            
last_hidden_state=sequence_output,\n            pooler_output=pooled_output,\n            past_key_values=encoder_outputs.past_key_values,\n            hidden_states=encoder_outputs.hidden_states,\n            attentions=encoder_outputs.attentions,\n            cross_attentions=encoder_outputs.cross_attentions,\n        )\n\n\nclass BertLMHeadModel(BertPreTrainedModel):\n\n    _keys_to_ignore_on_load_unexpected = [r\"pooler\"]\n    _keys_to_ignore_on_load_missing = [r\"position_ids\", r\"predictions.decoder.bias\"]\n\n    def __init__(self, config):\n        super().__init__(config)\n\n        self.bert = BertModel(config, add_pooling_layer=False)\n        self.cls = BertOnlyMLMHead(config)\n\n        self.init_weights()\n\n    def get_output_embeddings(self):\n        return self.cls.predictions.decoder\n\n    def set_output_embeddings(self, new_embeddings):\n        self.cls.predictions.decoder = new_embeddings\n\n    def forward(\n        self,\n        input_ids=None,\n        attention_mask=None,\n        position_ids=None,\n        head_mask=None,\n        inputs_embeds=None,\n        encoder_hidden_states=None,\n        encoder_attention_mask=None,\n        labels=None,\n        past_key_values=None,\n        use_cache=None,\n        output_attentions=None,\n        output_hidden_states=None,\n        return_dict=None,\n        return_logits=False,            \n        is_decoder=True,\n        reduction='mean',\n        mode='multimodal', \n    ):\n        r\"\"\"\n        encoder_hidden_states  (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):\n            Sequence of hidden-states at the output of the last layer of the encoder. 
Used in the cross-attention if\n            the model is configured as a decoder.\n        encoder_attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):\n            Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in\n            the cross-attention if the model is configured as a decoder. Mask values selected in ``[0, 1]``:\n            - 1 for tokens that are **not masked**,\n            - 0 for tokens that are **masked**.\n        labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):\n            Labels for computing the left-to-right language modeling loss (next word prediction). Indices should be in\n            ``[-100, 0, ..., config.vocab_size]`` (see ``input_ids`` docstring). Tokens with indices set to ``-100`` are\n            ignored (masked); the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``.\n        past_key_values (:obj:`tuple(tuple(torch.FloatTensor))` of length :obj:`config.n_layers` with each tuple having 4 tensors of shape :obj:`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):\n            Contains precomputed key and value hidden states of the attention blocks. 
Can be used to speed up decoding.\n            If :obj:`past_key_values` are used, the user can optionally input only the last :obj:`decoder_input_ids`\n            (those that don't have their past key value states given to this model) of shape :obj:`(batch_size, 1)`\n            instead of all :obj:`decoder_input_ids` of shape :obj:`(batch_size, sequence_length)`.\n        use_cache (:obj:`bool`, `optional`):\n            If set to :obj:`True`, :obj:`past_key_values` key value states are returned and can be used to speed up\n            decoding (see :obj:`past_key_values`).\n        Returns:\n        Example::\n            >>> from transformers import BertTokenizer, BertLMHeadModel, BertConfig\n            >>> import torch\n            >>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')\n            >>> config = BertConfig.from_pretrained(\"bert-base-cased\")\n            >>> model = BertLMHeadModel.from_pretrained('bert-base-cased', config=config)\n            >>> inputs = tokenizer(\"Hello, my dog is cute\", return_tensors=\"pt\")\n            >>> outputs = model(**inputs)\n            >>> prediction_logits = outputs.logits\n        \"\"\"\n        return_dict = return_dict if return_dict is not None else self.config.use_return_dict\n        if labels is not None:\n            use_cache = False\n\n        outputs = self.bert(\n            input_ids,\n            attention_mask=attention_mask,\n            position_ids=position_ids,\n            head_mask=head_mask,\n            inputs_embeds=inputs_embeds,\n            encoder_hidden_states=encoder_hidden_states,\n            encoder_attention_mask=encoder_attention_mask,\n            past_key_values=past_key_values,\n            use_cache=use_cache,\n            output_attentions=output_attentions,\n            output_hidden_states=output_hidden_states,\n            return_dict=return_dict,\n            is_decoder=is_decoder,\n            mode=mode,\n        )\n        \n        sequence_output = 
outputs[0]\n        prediction_scores = self.cls(sequence_output)\n        # sequence_output.shape torch.Size([85, 30, 768])\n        # prediction_scores.shape torch.Size([85, 30, 30524])\n        # labels.shape torch.Size([85, 30])\n\n\n        if return_logits:\n            return prediction_scores[:, :-1, :].contiguous()  \n\n        lm_loss = None\n        if labels is not None:\n            # we are doing next-token prediction; shift prediction scores and input ids by one\n            shifted_prediction_scores = prediction_scores[:, :-1, :].contiguous()\n            labels = labels[:, 1:].contiguous()\n            loss_fct = CrossEntropyLoss(reduction=reduction, label_smoothing=0.1) \n            lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))\n            if reduction=='none':\n                lm_loss = lm_loss.view(prediction_scores.size(0),-1).sum(1)               \n\n        if not return_dict:\n            output = (prediction_scores,) + outputs[2:]\n            return ((lm_loss,) + output) if lm_loss is not None else output\n\n        return CausalLMOutputWithCrossAttentions(\n            loss=lm_loss,\n            logits=prediction_scores,\n            past_key_values=outputs.past_key_values,\n            hidden_states=outputs.hidden_states,\n            attentions=outputs.attentions,\n            cross_attentions=outputs.cross_attentions,\n        )\n\n    def prepare_inputs_for_generation(self, input_ids, past=None, attention_mask=None, **model_kwargs):\n        input_shape = input_ids.shape\n        # if model is used as a decoder in encoder-decoder model, the decoder attention mask is created on the fly\n        if attention_mask is None:\n            attention_mask = input_ids.new_ones(input_shape)\n\n        # cut decoder_input_ids if past is used\n        if past is not None:\n            input_ids = input_ids[:, -1:]\n\n        return {\n            \"input_ids\": input_ids, \n            
\"attention_mask\": attention_mask, \n            \"past_key_values\": past,\n            \"encoder_hidden_states\": model_kwargs.get(\"encoder_hidden_states\", None),\n            \"encoder_attention_mask\": model_kwargs.get(\"encoder_attention_mask\", None),\n            \"is_decoder\": True,\n        }\n\n    def _reorder_cache(self, past, beam_idx):\n        reordered_past = ()\n        for layer_past in past:\n            reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),)\n        return reordered_past\n\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/tag2Text/swin_transformer.py",
    "content": "# --------------------------------------------------------\n# Swin Transformer\n# Copyright (c) 2021 Microsoft\n# Licensed under The MIT License [see LICENSE for details]\n# Written by Ze Liu\n# --------------------------------------------------------\n\nimport numpy as np\nfrom scipy import interpolate\n\nimport torch\nimport torch.nn as nn\nimport torch.utils.checkpoint as checkpoint\nfrom timm.models.layers import DropPath, to_2tuple, trunc_normal_\n\n\nclass Mlp(nn.Module):\n    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.fc1 = nn.Linear(in_features, hidden_features)\n        self.act = act_layer()\n        self.fc2 = nn.Linear(hidden_features, out_features)\n        self.drop = nn.Dropout(drop)\n\n    def forward(self, x):\n        x = self.fc1(x)\n        x = self.act(x)\n        x = self.drop(x)\n        x = self.fc2(x)\n        x = self.drop(x)\n        return x\n\n\ndef window_partition(x, window_size):\n    \"\"\"\n    Args:\n        x: (B, H, W, C)\n        window_size (int): window size\n\n    Returns:\n        windows: (num_windows*B, window_size, window_size, C)\n    \"\"\"\n    B, H, W, C = x.shape\n    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)\n    windows = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)\n    return windows\n\n\ndef window_reverse(windows, window_size, H, W):\n    \"\"\"\n    Args:\n        windows: (num_windows*B, window_size, window_size, C)\n        window_size (int): Window size\n        H (int): Height of image\n        W (int): Width of image\n\n    Returns:\n        x: (B, H, W, C)\n    \"\"\"\n    B = int(windows.shape[0] / (H * W / window_size / window_size))\n    x = windows.view(B, H // window_size, W // window_size, window_size, 
window_size, -1)\n    x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, H, W, -1)\n    return x\n\n\nclass WindowAttention(nn.Module):\n    r\"\"\" Window based multi-head self attention (W-MSA) module with relative position bias.\n    It supports both of shifted and non-shifted window.\n\n    Args:\n        dim (int): Number of input channels.\n        window_size (tuple[int]): The height and width of the window.\n        num_heads (int): Number of attention heads.\n        qkv_bias (bool, optional):  If True, add a learnable bias to query, key, value. Default: True\n        qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set\n        attn_drop (float, optional): Dropout ratio of attention weight. Default: 0.0\n        proj_drop (float, optional): Dropout ratio of output. Default: 0.0\n    \"\"\"\n\n    def __init__(self, dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0., proj_drop=0.):\n\n        super().__init__()\n        self.dim = dim\n        self.window_size = window_size  # Wh, Ww\n        self.num_heads = num_heads\n        head_dim = dim // num_heads\n        self.scale = qk_scale or head_dim ** -0.5\n\n        # define a parameter table of relative position bias\n        self.relative_position_bias_table = nn.Parameter(\n            torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads))  # 2*Wh-1 * 2*Ww-1, nH\n\n        # get pair-wise relative position index for each token inside the window\n        coords_h = torch.arange(self.window_size[0])\n        coords_w = torch.arange(self.window_size[1])\n        coords = torch.stack(torch.meshgrid([coords_h, coords_w]))  # 2, Wh, Ww\n        coords_flatten = torch.flatten(coords, 1)  # 2, Wh*Ww\n        relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :]  # 2, Wh*Ww, Wh*Ww\n        relative_coords = relative_coords.permute(1, 2, 0).contiguous()  # Wh*Ww, Wh*Ww, 2\n        relative_coords[:, :, 0] += 
self.window_size[0] - 1  # shift to start from 0\n        relative_coords[:, :, 1] += self.window_size[1] - 1\n        relative_coords[:, :, 0] *= 2 * self.window_size[1] - 1\n        relative_position_index = relative_coords.sum(-1)  # Wh*Ww, Wh*Ww\n        self.register_buffer(\"relative_position_index\", relative_position_index)\n\n        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n        trunc_normal_(self.relative_position_bias_table, std=.02)\n        self.softmax = nn.Softmax(dim=-1)\n\n    def forward(self, x, mask=None):\n        \"\"\"\n        Args:\n            x: input features with shape of (num_windows*B, N, C)\n            mask: (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None\n        \"\"\"\n        B_, N, C = x.shape\n        qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)\n        q, k, v = qkv[0], qkv[1], qkv[2]  # make torchscript happy (cannot use tensor as tuple)\n\n        q = q * self.scale\n        attn = (q @ k.transpose(-2, -1))\n\n        relative_position_bias = self.relative_position_bias_table[self.relative_position_index.view(-1)].view(\n            self.window_size[0] * self.window_size[1], self.window_size[0] * self.window_size[1], -1)  # Wh*Ww,Wh*Ww,nH\n        relative_position_bias = relative_position_bias.permute(2, 0, 1).contiguous()  # nH, Wh*Ww, Wh*Ww\n        attn = attn + relative_position_bias.unsqueeze(0)\n\n        if mask is not None:\n            nW = mask.shape[0]\n            attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)\n            attn = attn.view(-1, self.num_heads, N, N)\n            attn = self.softmax(attn)\n        else:\n            attn = self.softmax(attn)\n\n        attn = self.attn_drop(attn)\n\n        x = (attn @ v).transpose(1, 2).reshape(B_, N, C)\n   
     x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n    def extra_repr(self) -> str:\n        return f'dim={self.dim}, window_size={self.window_size}, num_heads={self.num_heads}'\n\n    def flops(self, N):\n        # calculate flops for 1 window with token length of N\n        flops = 0\n        # qkv = self.qkv(x)\n        flops += N * self.dim * 3 * self.dim\n        # attn = (q @ k.transpose(-2, -1))\n        flops += self.num_heads * N * (self.dim // self.num_heads) * N\n        #  x = (attn @ v)\n        flops += self.num_heads * N * N * (self.dim // self.num_heads)\n        # x = self.proj(x)\n        flops += N * self.dim * self.dim\n        return flops\n\n\nclass SwinTransformerBlock(nn.Module):\n    r\"\"\" Swin Transformer Block.\n\n    Args:\n        dim (int): Number of input channels.\n        input_resolution (tuple[int]): Input resulotion.\n        num_heads (int): Number of attention heads.\n        window_size (int): Window size.\n        shift_size (int): Shift size for SW-MSA.\n        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim.\n        qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True\n        qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.\n        drop (float, optional): Dropout rate. Default: 0.0\n        attn_drop (float, optional): Attention dropout rate. Default: 0.0\n        drop_path (float, optional): Stochastic depth rate. Default: 0.0\n        act_layer (nn.Module, optional): Activation layer. Default: nn.GELU\n        norm_layer (nn.Module, optional): Normalization layer.  
Default: nn.LayerNorm\n    \"\"\"\n\n    def __init__(self, dim, input_resolution, num_heads, window_size=7, shift_size=0,\n                 mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0., drop_path=0.,\n                 act_layer=nn.GELU, norm_layer=nn.LayerNorm):\n        super().__init__()\n        self.dim = dim\n        self.input_resolution = input_resolution\n        self.num_heads = num_heads\n        self.window_size = window_size\n        self.shift_size = shift_size\n        self.mlp_ratio = mlp_ratio\n        if min(self.input_resolution) <= self.window_size:\n            # if window size is larger than input resolution, we don't partition windows\n            self.shift_size = 0\n            self.window_size = min(self.input_resolution)\n        assert 0 <= self.shift_size < self.window_size, \"shift_size must in 0-window_size\"\n\n        self.norm1 = norm_layer(dim)\n        self.attn = WindowAttention(\n            dim, window_size=to_2tuple(self.window_size), num_heads=num_heads,\n            qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)\n\n        self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity()\n        self.norm2 = norm_layer(dim)\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)\n\n        if self.shift_size > 0:\n            # calculate attention mask for SW-MSA\n            H, W = self.input_resolution\n            img_mask = torch.zeros((1, H, W, 1))  # 1 H W 1\n            h_slices = (slice(0, -self.window_size),\n                        slice(-self.window_size, -self.shift_size),\n                        slice(-self.shift_size, None))\n            w_slices = (slice(0, -self.window_size),\n                        slice(-self.window_size, -self.shift_size),\n                        slice(-self.shift_size, None))\n            cnt = 0\n            for h in h_slices:\n                for w in w_slices:\n                    img_mask[:, h, w, :] = cnt\n                    cnt += 1\n\n            mask_windows = window_partition(img_mask, self.window_size)  # nW, window_size, window_size, 1\n            mask_windows = mask_windows.view(-1, self.window_size * self.window_size)\n            attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)\n            attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, float(0.0))\n        else:\n            attn_mask = None\n\n        self.register_buffer(\"attn_mask\", attn_mask)\n\n    def forward(self, x):\n        H, W = self.input_resolution\n        B, L, C = x.shape\n        assert L == H * W, \"input feature has wrong size\"\n\n        shortcut = x\n        x = self.norm1(x)\n        x = x.view(B, H, W, C)\n\n        # cyclic shift\n        if self.shift_size > 0:\n            shifted_x = torch.roll(x, shifts=(-self.shift_size, -self.shift_size), dims=(1, 2))\n        else:\n            shifted_x = x\n\n        # partition windows\n        x_windows = window_partition(shifted_x, self.window_size)  # nW*B, window_size, window_size, C\n        
x_windows = x_windows.view(-1, self.window_size * self.window_size, C)  # nW*B, window_size*window_size, C\n\n        # W-MSA/SW-MSA\n        attn_windows = self.attn(x_windows, mask=self.attn_mask)  # nW*B, window_size*window_size, C\n\n        # merge windows\n        attn_windows = attn_windows.view(-1, self.window_size, self.window_size, C)\n        shifted_x = window_reverse(attn_windows, self.window_size, H, W)  # B H' W' C\n\n        # reverse cyclic shift\n        if self.shift_size > 0:\n            x = torch.roll(shifted_x, shifts=(self.shift_size, self.shift_size), dims=(1, 2))\n        else:\n            x = shifted_x\n        x = x.view(B, H * W, C)\n\n        # FFN\n        x = shortcut + self.drop_path(x)\n        x = x + self.drop_path(self.mlp(self.norm2(x)))\n\n        return x\n\n    def extra_repr(self) -> str:\n        return f\"dim={self.dim}, input_resolution={self.input_resolution}, num_heads={self.num_heads}, \" \\\n               f\"window_size={self.window_size}, shift_size={self.shift_size}, mlp_ratio={self.mlp_ratio}\"\n\n    def flops(self):\n        flops = 0\n        H, W = self.input_resolution\n        # norm1\n        flops += self.dim * H * W\n        # W-MSA/SW-MSA\n        nW = H * W / self.window_size / self.window_size\n        flops += nW * self.attn.flops(self.window_size * self.window_size)\n        # mlp\n        flops += 2 * H * W * self.dim * self.dim * self.mlp_ratio\n        # norm2\n        flops += self.dim * H * W\n        return flops\n\n\nclass PatchMerging(nn.Module):\n    r\"\"\" Patch Merging Layer.\n\n    Args:\n        input_resolution (tuple[int]): Resolution of input feature.\n        dim (int): Number of input channels.\n        norm_layer (nn.Module, optional): Normalization layer.  
Default: nn.LayerNorm\n    \"\"\"\n\n    def __init__(self, input_resolution, dim, norm_layer=nn.LayerNorm):\n        super().__init__()\n        self.input_resolution = input_resolution\n        self.dim = dim\n        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)\n        self.norm = norm_layer(4 * dim)\n\n    def forward(self, x):\n        \"\"\"\n        x: B, H*W, C\n        \"\"\"\n        H, W = self.input_resolution\n        B, L, C = x.shape\n        assert L == H * W, \"input feature has wrong size\"\n        assert H % 2 == 0 and W % 2 == 0, f\"x size ({H}*{W}) must be even.\"\n\n        x = x.view(B, H, W, C)\n\n        x0 = x[:, 0::2, 0::2, :]  # B H/2 W/2 C\n        x1 = x[:, 1::2, 0::2, :]  # B H/2 W/2 C\n        x2 = x[:, 0::2, 1::2, :]  # B H/2 W/2 C\n        x3 = x[:, 1::2, 1::2, :]  # B H/2 W/2 C\n        x = torch.cat([x0, x1, x2, x3], -1)  # B H/2 W/2 4*C\n        x = x.view(B, -1, 4 * C)  # B H/2*W/2 4*C\n\n        x = self.norm(x)\n        x = self.reduction(x)\n\n        return x\n\n    def extra_repr(self) -> str:\n        return f\"input_resolution={self.input_resolution}, dim={self.dim}\"\n\n    def flops(self):\n        H, W = self.input_resolution\n        flops = H * W * self.dim\n        flops += (H // 2) * (W // 2) * 4 * self.dim * 2 * self.dim\n        return flops\n\n\nclass BasicLayer(nn.Module):\n    \"\"\" A basic Swin Transformer layer for one stage.\n\n    Args:\n        dim (int): Number of input channels.\n        input_resolution (tuple[int]): Input resolution.\n        depth (int): Number of blocks.\n        num_heads (int): Number of attention heads.\n        window_size (int): Local window size.\n        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim.\n        qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True\n        qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.\n        drop (float, optional): Dropout rate. 
Default: 0.0\n        attn_drop (float, optional): Attention dropout rate. Default: 0.0\n        drop_path (float | tuple[float], optional): Stochastic depth rate. Default: 0.0\n        norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm\n        downsample (nn.Module | None, optional): Downsample layer at the end of the layer. Default: None\n        use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False.\n    \"\"\"\n\n    def __init__(self, dim, input_resolution, depth, num_heads, window_size,\n                 mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0.,\n                 drop_path=0., norm_layer=nn.LayerNorm, downsample=None, use_checkpoint=False):\n\n        super().__init__()\n        self.dim = dim\n        self.input_resolution = input_resolution\n        self.depth = depth\n        self.use_checkpoint = use_checkpoint\n\n        # build blocks\n        self.blocks = nn.ModuleList([\n            SwinTransformerBlock(dim=dim, input_resolution=input_resolution,\n                                 num_heads=num_heads, window_size=window_size,\n                                 shift_size=0 if (i % 2 == 0) else window_size // 2,\n                                 mlp_ratio=mlp_ratio,\n                                 qkv_bias=qkv_bias, qk_scale=qk_scale,\n                                 drop=drop, attn_drop=attn_drop,\n                                 drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path,\n                                 norm_layer=norm_layer)\n            for i in range(depth)])\n\n        # patch merging layer\n        if downsample is not None:\n            self.downsample = downsample(input_resolution, dim=dim, norm_layer=norm_layer)\n        else:\n            self.downsample = None\n\n    def forward(self, x):\n        for blk in self.blocks:\n            if self.use_checkpoint:\n                x = checkpoint.checkpoint(blk, x)\n            else:\n    
            x = blk(x)\n        if self.downsample is not None:\n            x = self.downsample(x)\n        return x\n\n    def extra_repr(self) -> str:\n        return f\"dim={self.dim}, input_resolution={self.input_resolution}, depth={self.depth}\"\n\n    def flops(self):\n        flops = 0\n        for blk in self.blocks:\n            flops += blk.flops()\n        if self.downsample is not None:\n            flops += self.downsample.flops()\n        return flops\n\n\nclass PatchEmbed(nn.Module):\n    r\"\"\" Image to Patch Embedding\n\n    Args:\n        img_size (int): Image size.  Default: 224.\n        patch_size (int): Patch token size. Default: 4.\n        in_chans (int): Number of input image channels. Default: 3.\n        embed_dim (int): Number of linear projection output channels. Default: 96.\n        norm_layer (nn.Module, optional): Normalization layer. Default: None\n    \"\"\"\n\n    def __init__(self, img_size=224, patch_size=4, in_chans=3, embed_dim=96, norm_layer=None):\n        super().__init__()\n        img_size = to_2tuple(img_size)\n        patch_size = to_2tuple(patch_size)\n        patches_resolution = [img_size[0] // patch_size[0], img_size[1] // patch_size[1]]\n        self.img_size = img_size\n        self.patch_size = patch_size\n        self.patches_resolution = patches_resolution\n        self.num_patches = patches_resolution[0] * patches_resolution[1]\n\n        self.in_chans = in_chans\n        self.embed_dim = embed_dim\n\n        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)\n        if norm_layer is not None:\n            self.norm = norm_layer(embed_dim)\n        else:\n            self.norm = None\n\n    def forward(self, x):\n        B, C, H, W = x.shape\n        # FIXME look at relaxing size constraints\n        assert H == self.img_size[0] and W == self.img_size[1], \\\n            f\"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]}).\"\n        
x = self.proj(x).flatten(2).transpose(1, 2)  # B Ph*Pw C\n        if self.norm is not None:\n            x = self.norm(x)\n        return x\n\n    def flops(self):\n        Ho, Wo = self.patches_resolution\n        flops = Ho * Wo * self.embed_dim * self.in_chans * (self.patch_size[0] * self.patch_size[1])\n        if self.norm is not None:\n            flops += Ho * Wo * self.embed_dim\n        return flops\n\n\nclass SwinTransformer(nn.Module):\n    r\"\"\" Swin Transformer\n        A PyTorch impl of : `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows`  -\n          https://arxiv.org/pdf/2103.14030\n\n    Args:\n        img_size (int | tuple(int)): Input image size. Default 224\n        patch_size (int | tuple(int)): Patch size. Default: 4\n        in_chans (int): Number of input image channels. Default: 3\n        num_classes (int): Number of classes for classification head. Default: 1000\n        embed_dim (int): Patch embedding dimension. Default: 96\n        depths (tuple(int)): Depth of each Swin Transformer layer.\n        num_heads (tuple(int)): Number of attention heads in different layers.\n        window_size (int): Window size. Default: 7\n        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4\n        qkv_bias (bool): If True, add a learnable bias to query, key, value. Default: True\n        qk_scale (float): Override default qk scale of head_dim ** -0.5 if set. Default: None\n        drop_rate (float): Dropout rate. Default: 0\n        attn_drop_rate (float): Attention dropout rate. Default: 0\n        drop_path_rate (float): Stochastic depth rate. Default: 0.1\n        norm_layer (nn.Module): Normalization layer. Default: nn.LayerNorm.\n        ape (bool): If True, add absolute position embedding to the patch embedding. Default: False\n        patch_norm (bool): If True, add normalization after patch embedding. Default: True\n        use_checkpoint (bool): Whether to use checkpointing to save memory. 
Default: False\n    \"\"\"\n\n    def __init__(self, img_size=224, patch_size=4, in_chans=3, num_classes=1000,\n                 embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24],\n                 window_size=7, mlp_ratio=4., qkv_bias=True, qk_scale=None,\n                 drop_rate=0., attn_drop_rate=0., drop_path_rate=0.1,\n                 norm_layer=nn.LayerNorm, ape=False, patch_norm=True,\n                 use_checkpoint=False, **kwargs):\n        super().__init__()\n\n        self.num_classes = num_classes\n        self.num_layers = len(depths)\n        self.embed_dim = embed_dim\n        self.ape = ape\n        self.patch_norm = patch_norm\n        self.num_features = int(embed_dim * 2 ** (self.num_layers - 1))\n        self.mlp_ratio = mlp_ratio\n\n        # split image into non-overlapping patches\n        self.patch_embed = PatchEmbed(\n            img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim,\n            norm_layer=norm_layer if self.patch_norm else None)\n        num_patches = self.patch_embed.num_patches\n        patches_resolution = self.patch_embed.patches_resolution\n        self.patches_resolution = patches_resolution\n\n        # absolute position embedding\n        if self.ape:\n            self.absolute_pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))\n            trunc_normal_(self.absolute_pos_embed, std=.02)\n\n        self.pos_drop = nn.Dropout(p=drop_rate)\n\n        # stochastic depth\n        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]  # stochastic depth decay rule\n\n        # build layers\n        self.layers = nn.ModuleList()\n        for i_layer in range(self.num_layers):\n            layer = BasicLayer(dim=int(embed_dim * 2 ** i_layer),\n                               input_resolution=(patches_resolution[0] // (2 ** i_layer),\n                                                 patches_resolution[1] // (2 ** i_layer)),\n                        
       depth=depths[i_layer],\n                               num_heads=num_heads[i_layer],\n                               window_size=window_size,\n                               mlp_ratio=self.mlp_ratio,\n                               qkv_bias=qkv_bias, qk_scale=qk_scale,\n                               drop=drop_rate, attn_drop=attn_drop_rate,\n                               drop_path=dpr[sum(depths[:i_layer]):sum(depths[:i_layer + 1])],\n                               norm_layer=norm_layer,\n                               downsample=PatchMerging if (i_layer < self.num_layers - 1) else None,\n                               use_checkpoint=use_checkpoint)\n            self.layers.append(layer)\n\n        self.norm = norm_layer(self.num_features)\n        self.avgpool = nn.AdaptiveAvgPool1d(1)\n        # self.head = nn.Linear(self.num_features, num_classes) if num_classes > 0 else nn.Identity()\n\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            trunc_normal_(m.weight, std=.02)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n\n    @torch.jit.ignore\n    def no_weight_decay(self):\n        return {'absolute_pos_embed'}\n\n    @torch.jit.ignore\n    def no_weight_decay_keywords(self):\n        return {'relative_position_bias_table'}\n\n    def forward(self, x, idx_to_group_img=None, image_atts=None, **kwargs):\n        x = self.patch_embed(x)\n        if self.ape:\n            x = x + self.absolute_pos_embed\n        x = self.pos_drop(x)\n\n        for layer in self.layers:\n            x = layer(x)\n\n        x = self.norm(x)  # B L C\n\n        x_cls = self.avgpool(x.transpose(1, 2))  # B C 1\n\n        if idx_to_group_img is None:\n            return torch.cat([x_cls.transpose(1, 2), x], dim=1)\n 
       else:\n            x_bs = torch.gather(x, dim=0, index=idx_to_group_img.view(-1, 1, 1).expand(-1, x.shape[1], x.shape[2]))\n            weights = image_atts[:, 1:].unsqueeze(2)  # B L 1\n            x_bs_cls = torch.sum((weights * x_bs).transpose(1, 2), dim=-1, keepdim=True)   # B C 1\n            x_bs_cls = x_bs_cls / torch.sum(weights.transpose(1, 2), dim=-1, keepdim=True)  # avgpool\n\n            return torch.cat([x_bs_cls.transpose(1, 2), x_bs], dim=1), \\\n                   torch.cat([x_cls.transpose(1, 2), x], dim=1)\n\n    def flops(self):\n        flops = 0\n        flops += self.patch_embed.flops()\n        for i, layer in enumerate(self.layers):\n            flops += layer.flops()\n        flops += self.num_features * self.patches_resolution[0] * self.patches_resolution[1] // (2 ** self.num_layers)\n        flops += self.num_features * self.num_classes\n        return flops\n\n\ndef interpolate_relative_pos_embed(rel_pos_bias, dst_num_pos, param_name=''):\n    # from: https://github.com/microsoft/unilm/blob/8a0a1c1f4e7326938ea7580a00d56d7f17d65612/beit/run_class_finetuning.py#L348\n\n    # rel_pos_bias: relative_position_bias_table\n    src_num_pos, num_attn_heads = rel_pos_bias.size()\n\n    num_extra_tokens = 0\n    src_size = int((src_num_pos - num_extra_tokens) ** 0.5)\n    dst_size = int((dst_num_pos - num_extra_tokens) ** 0.5)\n    if src_size != dst_size:\n        print(\"Position interpolate %s from %dx%d to %dx%d\" % (param_name, src_size, src_size, dst_size, dst_size))\n\n        # extra_tokens = rel_pos_bias[-num_extra_tokens:, :]\n        # rel_pos_bias = rel_pos_bias[:-num_extra_tokens, :]\n\n        def geometric_progression(a, r, n):\n            return a * (1.0 - r ** n) / (1.0 - r)\n\n        left, right = 1.01, 1.5\n        while right - left > 1e-6:\n            q = (left + right) / 2.0\n            gp = geometric_progression(1, q, src_size // 2)\n            if gp > dst_size // 2:\n                right = q\n            
else:\n                left = q\n\n        # if q > 1.090307:\n        #     q = 1.090307\n\n        dis = []\n        cur = 1\n        for i in range(src_size // 2):\n            dis.append(cur)\n            cur += q ** (i + 1)\n\n        r_ids = [-_ for _ in reversed(dis)]\n\n        x = r_ids + [0] + dis\n        y = r_ids + [0] + dis\n\n        t = dst_size // 2.0\n        dx = np.arange(-t, t + 0.1, 1.0)\n        dy = np.arange(-t, t + 0.1, 1.0)\n\n        # print(\"Original positions = %s\" % str(x))\n        # print(\"Target positions = %s\" % str(dx))\n\n        all_rel_pos_bias = []\n\n        for i in range(num_attn_heads):\n            z = rel_pos_bias[:, i].view(src_size, src_size).float().numpy()\n            f = interpolate.interp2d(x, y, z, kind='cubic')\n            all_rel_pos_bias.append(\n                torch.Tensor(f(dx, dy)).contiguous().view(-1, 1).to(rel_pos_bias.device))\n\n        rel_pos_bias = torch.cat(all_rel_pos_bias, dim=-1)\n\n    return rel_pos_bias"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/tag2Text/tag2text.py",
    "content": "'''\n * Tag2Text\n * Written by Xinyu Huang\n'''\nimport warnings\nwarnings.filterwarnings(\"ignore\")\n\nfrom .vit import VisionTransformer, interpolate_pos_embed\nfrom .swin_transformer import SwinTransformer, interpolate_relative_pos_embed\nfrom .med import BertConfig, BertModel, BertLMHeadModel\nfrom transformers import BertTokenizer\n\nimport torch\nfrom torch import nn\nimport torch.nn.functional as F\n\nimport os\nCUR_DIR = os.path.dirname(os.path.abspath(__file__))\nfrom urllib.parse import urlparse\nfrom timm.models.hub import download_cached_file\nfrom .tag_class import tra_array\nimport json\nimport math\nimport numpy as np\n\ndef read_json(rpath):\n    with open(rpath, 'r') as f:\n        return json.load(f)\n\ndelete_tag_index = [127, 3351, 3265, 3338, 3355, 3359]\n        \nclass Tag2Text_Caption(nn.Module):\n    def __init__(self,                 \n                 med_config = f'{CUR_DIR}/med_config.json',  \n                 image_size = 384,\n                 vit = 'base',\n                 vit_grad_ckpt = False,\n                 vit_ckpt_layer = 0,\n                 prompt = 'a picture of ',\n                 threshold = 0.7,\n                 ):\n        \"\"\"\n        Args:\n            med_config (str): path for the mixture of encoder-decoder model's configuration file\n            image_size (int): input image size\n            vit (str): model size of vision transformer\n        \"\"\"            \n        super().__init__()\n\n        if vit=='swin_b':\n            if image_size == 224:\n                vision_config_path = 'configs/swin/config_swinB_224.json'\n            elif image_size == 384:\n                vision_config_path = f'{CUR_DIR}/config_swinB_384.json'\n            vision_config = read_json(vision_config_path)\n            assert image_size == vision_config['image_res']\n\n            vision_width = vision_config['vision_width']\n\n            self.visual_encoder = 
SwinTransformer(img_size=vision_config['image_res'],\n                                            patch_size=4,\n                                            in_chans=3,\n                                            embed_dim=vision_config['embed_dim'],\n                                            depths=vision_config['depths'],\n                                            num_heads=vision_config['num_heads'],\n                                            window_size=vision_config['window_size'],\n                                            mlp_ratio=4.,\n                                            qkv_bias=True,\n                                            drop_rate=0.0,\n                                            drop_path_rate=0.1,\n                                            ape=False,\n                                            patch_norm=True,\n                                            use_checkpoint=False)\n        \n        else:\n            self.visual_encoder, vision_width = create_vit(vit,image_size, vit_grad_ckpt, vit_ckpt_layer)\n\n\n        self.tokenizer = init_tokenizer()   \n\n        # create the decoder\n        decoder_config = BertConfig.from_json_file(med_config)\n        decoder_config.encoder_width = 768\n        self.text_decoder = BertLMHeadModel(config=decoder_config)     \n\n        # create encoder\n        encoder_config = BertConfig.from_json_file(med_config)\n        encoder_config.encoder_width = vision_width\n        self.tag_encoder = BertModel(config=encoder_config, add_pooling_layer=False)\n        \n        self.prompt = prompt\n        self.prompt_length = len(self.tokenizer(self.prompt).input_ids)-1\n\n        self.threshold = threshold\n        num_features = 768\n        self.num_class = 3429\n\n        q2l_config = BertConfig.from_json_file(f'{CUR_DIR}/q2l_config.json')\n        q2l_config.encoder_width = vision_width\n        self.vision_multi = BertModel.from_pretrained('bert-base-uncased',config=q2l_config, 
add_pooling_layer=False)\n        self.vision_multi.resize_token_embeddings(len(self.tokenizer)) \n        self.label_embed = nn.Embedding(self.num_class, q2l_config.hidden_size)\n        self.fc =  GroupWiseLinear(self.num_class, num_features, bias=True)\n        self.del_selfattention()\n\n        tie_encoder_decoder_weights(self.tag_encoder,self.vision_multi,'',' ')\n        self.tag_array = tra_array\n\n    def del_selfattention(self):\n        del self.vision_multi.embeddings\n        for layer in self.vision_multi.encoder.layer:\n            del layer.attention\n        \n    def generate(self, image, sample=False, num_beams=3, max_length=30, min_length=10, top_p=0.9, repetition_penalty=1.0, tag_input = None, return_tag_predict = False):\n        image_embeds = self.visual_encoder(image)\n        image_atts = torch.ones(image_embeds.size()[:-1],dtype=torch.long).to(image.device)\n\n        #==============generate tag==============#\n        if tag_input == None:\n            image_spatial_embeds = image_embeds[:,1:,:]\n            image_cls_embeds = image_embeds[:,0,:]\n\n            bs = image_spatial_embeds.shape[0]\n            label_embed = self.label_embed.weight.unsqueeze(0).repeat(bs,1,1)\n            mlr_tagembedding = self.vision_multi(encoder_embeds = label_embed,\n                                encoder_hidden_states = image_embeds,\n                                encoder_attention_mask = image_atts,      \n                                return_dict = False,\n                                mode = 'mlr',\n                                )  \n\n            logits = self.fc(mlr_tagembedding[0])\n            \n            targets = torch.where(torch.sigmoid(logits) > self.threshold , torch.tensor(1.0).to(image.device), torch.zeros(self.num_class).to(image.device))\n\n            tag = targets.cpu().numpy()\n            tag[:,delete_tag_index] = 0\n            bs = image.size(0)\n            tag_input = []\n            for b in range(bs):\n           
     index = np.argwhere(tag[b] == 1)\n                token = self.tag_array[index].squeeze(axis = 1)\n                tag_input.append(' | '.join(token))            \n        #========================================#\n        \n        if not sample:\n            image_embeds = image_embeds.repeat_interleave(num_beams,dim=0)\n            image_atts = image_atts.repeat_interleave(num_beams,dim=0)\n            tag_input_temp = []\n            for tag in tag_input:\n                for i in range(num_beams):\n                    tag_input_temp.append(tag)\n            tag_input = tag_input_temp\n\n\n        tag_input_tokenzier = self.tokenizer(tag_input, padding='max_length', truncation=True, max_length=40, \n                              return_tensors=\"pt\").to(image.device)  \n        \n        encoder_input_ids = tag_input_tokenzier.input_ids\n        encoder_input_ids[:,0] = self.tokenizer.enc_token_id\n        # print(encoder_input_ids.size(), tag_input_tokenzier.attention_mask.size(),image_embeds.size(),  image_atts.size())\n        # import pdb\n        # pdb.set_trace()\n        output_tagembedding = self.tag_encoder(encoder_input_ids,\n                                       attention_mask = tag_input_tokenzier.attention_mask,\n                                       encoder_hidden_states = image_embeds,\n                                       encoder_attention_mask = image_atts,      \n                                       return_dict = True,\n                                      )  \n        \n        prompt = [self.prompt] * image.size(0)\n        input_ids = self.tokenizer(prompt, return_tensors=\"pt\").input_ids.to(image.device) \n        input_ids[:,0] = self.tokenizer.bos_token_id\n        input_ids = input_ids[:, :-1] \n\n        if sample:\n            #nucleus sampling\n            model_kwargs = {\"encoder_hidden_states\": output_tagembedding.last_hidden_state, \"encoder_attention_mask\":None}\n            outputs = 
self.text_decoder.generate(input_ids=input_ids,\n                                                max_length=max_length,\n                                                min_length=min_length,\n                                                do_sample=True,\n                                                top_p=top_p,\n                                                num_return_sequences=1,\n                                                eos_token_id=self.tokenizer.sep_token_id,\n                                                pad_token_id=self.tokenizer.pad_token_id, \n                                                repetition_penalty=1.1,                                            \n                                                **model_kwargs)\n        else:\n            #beam search\n            model_kwargs = {\"encoder_hidden_states\": output_tagembedding.last_hidden_state, \"encoder_attention_mask\":None}\n            outputs = self.text_decoder.generate(input_ids=input_ids,\n                                                max_length=max_length,\n                                                min_length=min_length,\n                                                num_beams=num_beams,\n                                                eos_token_id=self.tokenizer.sep_token_id,\n                                                pad_token_id=self.tokenizer.pad_token_id,     \n                                                repetition_penalty=repetition_penalty,\n                                                **model_kwargs)            \n            \n        captions = []    \n        for output in outputs:\n            caption = self.tokenizer.decode(output, skip_special_tokens=True)    \n            captions.append(caption[len(self.prompt):])\n        if return_tag_predict == True:\n            if sample:\n                return captions, tag_input\n            else:\n                return captions, tag_input[0:int(len(tag_input)/num_beams)]            \n       
 return captions\n\n\ndef tag2text_caption(pretrained='',**kwargs):\n    model = Tag2Text_Caption(**kwargs)\n    if pretrained:\n        if kwargs['vit'] == 'swin_b':\n            model,msg = load_checkpoint_swinbase(model,pretrained,kwargs)\n        else:\n            model,msg = load_checkpoint(model,pretrained)\n        # print('vit:',kwargs['vit'])\n        # print('msg_v2',msg)\n    return model    \n\n\nimport logging\nfrom typing import List\n\n# module-level logger used by tie_encoder_decoder_weights below\nlogger = logging.getLogger(__name__)\n\ndef tie_encoder_decoder_weights(encoder: nn.Module, decoder: nn.Module, base_model_prefix: str, skip_key:str):\n    uninitialized_encoder_weights: List[str] = []\n    if decoder.__class__ != encoder.__class__:\n        logger.info(\n            f\"{decoder.__class__} and {encoder.__class__} are not equal. In this case make sure that all encoder weights are correctly initialized.\"\n        )\n\n    def tie_encoder_to_decoder_recursively(\n        decoder_pointer: nn.Module,\n        encoder_pointer: nn.Module,\n        module_name: str,\n        uninitialized_encoder_weights: List[str],\n        skip_key: str,\n        depth=0,\n    ):\n        assert isinstance(decoder_pointer, nn.Module) and isinstance(\n            encoder_pointer, nn.Module\n        ), f\"{decoder_pointer} and {encoder_pointer} have to be of type torch.nn.Module\"\n        if hasattr(decoder_pointer, \"weight\") and skip_key not in module_name:\n            assert hasattr(encoder_pointer, \"weight\")\n            encoder_pointer.weight = decoder_pointer.weight\n            if hasattr(decoder_pointer, \"bias\"):\n                assert hasattr(encoder_pointer, \"bias\")\n                encoder_pointer.bias = decoder_pointer.bias                \n            # print(module_name+' is tied')    \n            return\n\n        encoder_modules = encoder_pointer._modules\n        decoder_modules = decoder_pointer._modules\n        if len(decoder_modules) > 0:\n            assert (\n                len(encoder_modules) > 0\n            ), f\"Encoder module 
{encoder_pointer} does not match decoder module {decoder_pointer}\"\n\n            all_encoder_weights = set([module_name + \"/\" + sub_name for sub_name in encoder_modules.keys()])\n            encoder_layer_pos = 0\n            for name, module in decoder_modules.items():\n                if name.isdigit():\n                    encoder_name = str(int(name) + encoder_layer_pos)\n                    decoder_name = name\n                    if not isinstance(decoder_modules[decoder_name], type(encoder_modules[encoder_name])) and len(\n                        encoder_modules\n                    ) != len(decoder_modules):\n                        # this can happen if the name corresponds to the position in a list module list of layers\n                        # in this case the decoder has added a cross-attention that the encoder does not have\n                        # thus skip this step and subtract one layer pos from encoder\n                        encoder_layer_pos -= 1\n                        continue\n                elif name not in encoder_modules:\n                    continue\n                elif depth > 500:\n                    raise ValueError(\n                        \"Max depth of recursive function `tie_encoder_to_decoder` reached. 
It seems that there is a circular dependency between two or more `nn.Modules` of your model.\"\n                    )\n                else:\n                    decoder_name = encoder_name = name\n                tie_encoder_to_decoder_recursively(\n                    decoder_modules[decoder_name],\n                    encoder_modules[encoder_name],\n                    module_name + \"/\" + name,\n                    uninitialized_encoder_weights,\n                    skip_key,\n                    depth=depth + 1,\n                )\n                all_encoder_weights.remove(module_name + \"/\" + encoder_name)\n\n            uninitialized_encoder_weights += list(all_encoder_weights)\n\n    # tie weights recursively\n    tie_encoder_to_decoder_recursively(decoder, encoder, base_model_prefix, uninitialized_encoder_weights, skip_key)  \n\n\nclass GroupWiseLinear(nn.Module):\n    # could be changed to: \n    # output = torch.einsum('ijk,zjk->ij', x, self.W)\n    # or output = torch.einsum('ijk,jk->ij', x, self.W[0])\n    def __init__(self, num_class, hidden_dim, bias=True):\n        super().__init__()\n        self.num_class = num_class\n        self.hidden_dim = hidden_dim\n        self.bias = bias\n\n        self.W = nn.Parameter(torch.Tensor(1, num_class, hidden_dim))\n        if bias:\n            self.b = nn.Parameter(torch.Tensor(1, num_class))\n        self.reset_parameters()\n\n    def reset_parameters(self):\n        stdv = 1. 
/ math.sqrt(self.W.size(2))\n        for i in range(self.num_class):\n            self.W[0][i].data.uniform_(-stdv, stdv)\n        if self.bias:\n            for i in range(self.num_class):\n                self.b[0][i].data.uniform_(-stdv, stdv)\n\n    def forward(self, x):\n        # x: B,K,d\n        x = (self.W * x).sum(-1)\n        if self.bias:\n            x = x + self.b\n        return x\n\n\ndef init_tokenizer():\n    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\n    tokenizer.add_special_tokens({'bos_token':'[DEC]'})\n    tokenizer.add_special_tokens({'additional_special_tokens':['[ENC]']})       \n    tokenizer.enc_token_id = tokenizer.additional_special_tokens_ids[0]  \n    return tokenizer\n\n\ndef create_vit(vit, image_size, use_grad_checkpointing=False, ckpt_layer=0, drop_path_rate=0):\n        \n    assert vit in ['base', 'large'], \"vit parameter must be base or large\"\n    if vit=='base':\n        vision_width = 768\n        visual_encoder = VisionTransformer(img_size=image_size, patch_size=16, embed_dim=vision_width, depth=12, \n                                           num_heads=12, use_grad_checkpointing=use_grad_checkpointing, ckpt_layer=ckpt_layer,\n                                           drop_path_rate=drop_path_rate\n                                          )   \n    elif vit=='large':\n        vision_width = 1024\n        # fall back to 0.1 only when no drop_path_rate is given (`0.1 or x` always evaluated to 0.1)\n        visual_encoder = VisionTransformer(img_size=image_size, patch_size=16, embed_dim=vision_width, depth=24, \n                                           num_heads=16, use_grad_checkpointing=use_grad_checkpointing, ckpt_layer=ckpt_layer,\n                                           drop_path_rate=drop_path_rate or 0.1\n                                          )   \n    return visual_encoder, vision_width\n\ndef is_url(url_or_filename):\n    parsed = urlparse(url_or_filename)\n    return parsed.scheme in (\"http\", \"https\")\n\ndef load_checkpoint(model,url_or_filename):\n    if 
is_url(url_or_filename):\n        cached_file = download_cached_file(url_or_filename, check_hash=False, progress=True)\n        checkpoint = torch.load(cached_file, map_location='cpu') \n    elif os.path.isfile(url_or_filename):        \n        checkpoint = torch.load(url_or_filename, map_location='cpu') \n    else:\n        raise RuntimeError('checkpoint url or path is invalid')\n        \n    state_dict = checkpoint['model']\n    \n    state_dict['visual_encoder.pos_embed'] = interpolate_pos_embed(state_dict['visual_encoder.pos_embed'],model.visual_encoder) \n    if 'visual_encoder_m.pos_embed' in model.state_dict().keys():\n        state_dict['visual_encoder_m.pos_embed'] = interpolate_pos_embed(state_dict['visual_encoder_m.pos_embed'],\n                                                                         model.visual_encoder_m)    \n    for key in model.state_dict().keys():\n        if key in state_dict.keys():\n            if state_dict[key].shape!=model.state_dict()[key].shape:\n                del state_dict[key]\n    \n    msg = model.load_state_dict(state_dict,strict=False)\n    # print('load checkpoint from %s'%url_or_filename)  \n    return model,msg\n    \n\ndef load_checkpoint_swinbase(model,url_or_filename,kwargs):\n    if kwargs['image_size'] == 224:\n        vision_config_path = 'configs/swin/config_swinB_224.json'\n    elif kwargs['image_size'] == 384:\n        vision_config_path = f'{CUR_DIR}/config_swinB_384.json'\n    elif kwargs['image_size'] == 480:\n        vision_config_path = 'configs/swin/config_swinB_480.json'\n    elif kwargs['image_size'] == 576:\n        vision_config_path = 'configs/swin/config_swinB_576.json'\n    elif kwargs['image_size'] == 608:\n        vision_config_path = 'configs/swin/config_swinB_608.json'\n    else:\n        # fail early instead of hitting an unbound vision_config_path below\n        raise ValueError('unsupported image_size for swin_b: %s' % kwargs['image_size'])\n    window_size = read_json(vision_config_path)['window_size']\n    # print('--------------')\n    # print(url_or_filename)\n    # print('--------------')\n    if is_url(url_or_filename):\n        cached_file = 
download_cached_file(url_or_filename, check_hash=False, progress=True)\n        checkpoint = torch.load(cached_file, map_location='cpu') \n    elif os.path.isfile(url_or_filename):        \n        checkpoint = torch.load(url_or_filename, map_location='cpu') \n    else:\n        raise RuntimeError('checkpoint url or path is invalid')\n        \n    state_dict = checkpoint['model']\n\n    for k in list(state_dict.keys()):\n        if 'relative_position_bias_table' in k:\n            dst_num_pos = (2 * window_size - 1) ** 2\n            state_dict[k] = interpolate_relative_pos_embed(state_dict[k], dst_num_pos, param_name=k)\n        elif ('relative_position_index' in k) or ('attn_mask' in k):\n            del state_dict[k]\n    \n    msg = model.load_state_dict(state_dict,strict=False)\n    print('load checkpoint from %s'%url_or_filename)  \n    return model,msg\n    \n\n\n\n\nif __name__==\"__main__\":\n    model = Tag2Text_Caption()\n    import pdb\n    pdb.set_trace()\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/tag2Text/tag_class.py",
    "content": "import numpy as np\n\n\ntra_array = ['tennis',\n'bear cub',\n'observatory',\n'bicycle',\n'hillside',\n'judge',\n'watercolor illustration',\n'granite',\n'lobster',\n'livery',\n'stone',\n'ceramic',\n'ranch',\n'cloth',\n'smile',\n'building',\n'tattoo',\n'cricketer',\n'cheek',\n'pear',\n'source',\n'winter',\n'surface',\n'spray',\n'ceremony',\n'magic',\n'curve',\n'container',\n'fair',\n'medicine',\n'baby',\n'tennis racquet',\n'ornament',\n'bamboo',\n'duckling',\n'song',\n'safari',\n'team presentation',\n'daffodil',\n'cross',\n'toothpaste',\n'shield',\n'fashion model',\n'capsule',\n'map',\n'creek',\n'glass house',\n'glass plate',\n'siding',\n'corner',\n'water buffalo',\n'bison',\n'figure skater',\n'diploma',\n'tire',\n'race',\n'cable car',\n'brain',\n'gas stove',\n'soap bubble',\n'palette',\n'snowboard',\n'school child',\n'trench coat',\n'monk',\n'fiber',\n'kitchen window',\n'sunglass',\n'coffee',\n'security',\n'strawberry',\n'penguin',\n'tree root',\n'loaf',\n'engagement ring',\n'lamb',\n'vector cartoon illustration',\n'sandwich',\n'mountain village',\n'shape',\n'charm',\n'fiction',\n'knot',\n'greenhouse',\n'sushi',\n'text',\n'disaster',\n'trophy',\n'gang',\n'strap',\n'soccer game',\n'cardinal',\n'tee',\n'turtle',\n'water surface',\n'grassland',\n'dolphin',\n'store',\n'dirt',\n'iceberg',\n'pergola',\n'farmer market',\n'publicity portrait',\n'tote bag',\n'teenage girl',\n'view mirror',\n'session',\n'commuter',\n'dressing room',\n'tricycle',\n'christmas ball',\n'headlight',\n'police',\n'armchair',\n'chart',\n'yacht',\n'saw',\n'printer',\n'rock band',\n'gingerbread house',\n'tag',\n'table lamp',\n'hockey game',\n'slope',\n'font',\n'wicker basket',\n'jewelry',\n'quarter',\n'software',\n'weapon',\n'pin',\n'worship',\n'painter',\n'goal',\n'morning light',\n'bike',\n'baseball bat',\n'elevator',\n'cuisine',\n'sausage',\n'stunt',\n'wrestler',\n'statue',\n'landing',\n'pillar',\n'willow tree',\n'sea wave',\n'chicken',\n'peanut',\n'muscle',\n'bob',\n'tv 
genre',\n'bathroom window',\n'radish',\n'textile',\n'pelican',\n'marketplace',\n'crest',\n'elevation map',\n'gift',\n'parish',\n'traffic light',\n'campfire',\n'fog',\n'award winner',\n'beach ball',\n'mat',\n'white house',\n'plaster',\n'moped',\n'football team',\n'solution',\n'bicyclist',\n'bit',\n'playground',\n'darkness',\n'cake',\n'maple leave',\n'mold',\n'cracker',\n'blueberry',\n'rubble',\n'container ship',\n'pedestrian bridge',\n'snail',\n'parrot',\n'form',\n'circuit',\n'highlight',\n'pickup truck',\n'koala',\n'rain',\n'system',\n'weather',\n'raincoat',\n'soccer team',\n'windshield',\n'thunderstorm',\n'mike',\n'bird house',\n'bridge',\n'grandfather',\n'restroom',\n'animation',\n'wilderness',\n'clown',\n'banana',\n'brown',\n'braid',\n'dining room',\n'kindergarten',\n'launch event',\n'purple',\n'school',\n'stairwell',\n'brooch',\n'movie poster image',\n'mountain river',\n'shelf',\n'wicket',\n'headboard',\n'buddha',\n'flower field',\n'dugout',\n'cd',\n'bald eagle',\n'lagoon',\n'seaweed',\n'agriculture',\n'emergency service',\n'maple tree',\n'parachute',\n'continent',\n'amusement park',\n'remote',\n'bun',\n'tackle',\n'hospital',\n'garage door',\n'birthday party',\n'friendship',\n'go',\n'mausoleum',\n'jeep',\n'raccoon',\n'step',\n'ice hockey team',\n'cigarette',\n'lace dress',\n'forest floor',\n'mall',\n'captain',\n'milk',\n'golf course',\n'meal',\n'picnic table',\n'sail',\n'volleyball',\n'canal',\n'terrace',\n'computer desk',\n'caravan',\n'hotel',\n'cheerleader',\n'nurse',\n'museum',\n'marsh',\n'fox',\n'plateau',\n'night',\n'twin',\n'letter logo',\n'autumn tree',\n'powder',\n'convention',\n'creature',\n'lighthouse',\n'shop window',\n'jacket',\n'stork',\n'taxi',\n'trade',\n'blackboard',\n'olive',\n'road sign',\n'resort',\n'snowflake',\n'cemetery',\n'travel',\n'evening dress',\n'picnic',\n'drink',\n'winter morning',\n'football player',\n'snack',\n'boxing glove',\n'dinner party',\n'airline',\n'swing',\n'port',\n'wheelbarrow',\n'bathroom 
sink',\n'sweater',\n'ambulance',\n'gear',\n'oil',\n'wii controller',\n'array',\n'home office',\n'car show',\n'mixture',\n'profession',\n'tree frog',\n'square',\n'facility',\n'coral reef',\n'sea wall',\n'pizza',\n'exhibit',\n'demolition',\n'trout',\n'ring',\n'coffee shop',\n'bracelet',\n'bean',\n'lip',\n'fencing',\n'landscape',\n'sitting',\n'package',\n'metal',\n'bust',\n'king',\n'hair',\n'window seat',\n'wildlife',\n'trunk',\n'greenery',\n'stencil',\n'fire hydrant',\n'bridesmaid',\n'plaza',\n'alps',\n'tower bridge',\n'crop top',\n'crossing',\n'cinema',\n'pedestrian crossing',\n'family',\n'shopping cart',\n'stomach',\n'church building',\n'screen door',\n'skater',\n'soccer field',\n'kettle',\n'mussel',\n'raindrop',\n'candy cane',\n'water lily',\n'flower girl',\n'desert',\n'enclosure',\n'christmas light',\n'kitchen',\n'caterpillar',\n'plaid',\n'bath',\n'bush',\n'mud',\n'ballet',\n'knee',\n'adult',\n'raft',\n'sea view',\n'cactus',\n'office chair',\n'overall',\n'rim',\n'scaffolding',\n'pig',\n'cover',\n'poster page',\n'sprinkle',\n'chandelier',\n'algae',\n'traffic',\n'surfboard',\n'book',\n'filming',\n'flash',\n'mansion',\n'camouflage',\n'trouser',\n'ticket',\n'weed',\n'cab',\n'trench',\n'elephant',\n'huddle',\n'sphere',\n'christmas decoration',\n'city',\n'launch',\n'doll',\n'christmas ornament',\n'fabric',\n'bikini',\n'biplane',\n'breakfast',\n'neighbourhood',\n'race track',\n'foliage',\n'avocado',\n'school bus',\n'footwear',\n'highway',\n'ocean view',\n'art vector illustration',\n'wall clock',\n'curtain',\n'teenager',\n'kitchen area',\n'robot',\n'tusk',\n'lounge chair',\n'beam',\n'paddle',\n'camel',\n'lid',\n'world map',\n'city view',\n'newlywed',\n'cargo ship',\n'yellow',\n'exhibition',\n'bend',\n'novel',\n'wool',\n'ontario',\n'bread',\n'campus',\n'coastline',\n'cutting board',\n'booth',\n'table top',\n'carpet',\n'beach chair',\n'workout',\n'street food',\n'fun',\n'costumer film designer',\n'gadget',\n'artist',\n'fishing 
village',\n'builder',\n'violinist',\n'iphone',\n'spider web',\n'traffic sign',\n'ruin',\n'rescue',\n'clipboard',\n'seal',\n'film director',\n'paw',\n'nursery',\n'intersection',\n'tomato sauce',\n'taste',\n'paddy field',\n'christmas tree',\n'wave',\n'stool',\n'watering can',\n'rug',\n'daytime',\n'subway station',\n'craft',\n'pine forest',\n'black',\n'planet',\n'motif',\n'christmas market',\n'glass window',\n'college',\n'wheat',\n'damage',\n'rectangle',\n'picture frame',\n'chess',\n'guest room',\n'street corner',\n'religion',\n'seed',\n'puzzle',\n'freeway',\n'beauty',\n'ocean',\n'watch',\n'mother',\n'garage',\n'quote',\n'dj',\n'supporter',\n'hip hop artist',\n'muffin',\n'eiffel tower',\n'cash',\n'firefighter',\n'cauliflower',\n'bunker',\n'sled',\n'manicure',\n'shark',\n'stall',\n'jungle',\n'family home',\n'tour bus',\n'chimney',\n'touchdown',\n'roundabout',\n'coyote',\n'street scene',\n'tank',\n'wedding dress',\n'mantle',\n'bedroom window',\n'coconut',\n'chapel',\n'goat',\n'living space',\n'rock wall',\n'polka dot',\n'railway',\n'mandala',\n'mango',\n'lesson',\n'mountain landscape',\n'team photo',\n'bookshelf',\n'meter',\n'bulldog',\n'evening sun',\n'stick',\n'card',\n'pink',\n'fish pond',\n'paint',\n'pill',\n'cart',\n'pea',\n'van',\n'album',\n'football college game',\n'mountain pass',\n'doughnut',\n'ski slope',\n'match',\n'official',\n'shadow',\n'organ',\n'celebration',\n'coin',\n'log cabin',\n'firework display',\n'present',\n'twig',\n'chef',\n'confetti',\n'footpath',\n'tour',\n'ponytail',\n'artwork',\n'race car',\n'club',\n'season',\n'hose',\n'pencil',\n'aircraft',\n'rock formation',\n'wardrobe',\n'participant',\n'politician',\n'engineer',\n'peace',\n'filter',\n'sailing boat',\n'water bottle',\n'service dog',\n'poodle',\n'loki',\n'statesman',\n'sleeping bag',\n'outskirt',\n'clock',\n'factory',\n'oak tree',\n'physician',\n'color',\n'room',\n'stairway',\n'company',\n'lady',\n'graph',\n'faucet',\n'tablecloth',\n'subway train',\n'chocolate chip 
cookie',\n'headquarters',\n'screw',\n'goggle',\n'halloween',\n'city street',\n'swirl',\n'cord',\n'forward',\n'bone',\n'bedding',\n'archway',\n'wig',\n'lobby',\n'mask',\n'attic',\n'kitchen table',\n'skylight',\n'fire',\n'exit',\n'oil painting',\n'passenger',\n'meditation',\n'salmon',\n'fedora',\n'rubber stamp',\n'orange juice',\n'arch',\n'scientist',\n'stroll',\n'manhattan',\n'float',\n'baseball uniform',\n'circle',\n'church',\n'decker bus',\n'competitor',\n'zoo',\n'basketball team',\n'tourist',\n'daughter',\n'silverware',\n'ceiling fan',\n'birth',\n'vase',\n'jack',\n'mushroom',\n'spiral',\n'cage',\n'limb',\n'salad',\n'ad',\n'control',\n'earth',\n'party',\n'bolt',\n'tractor',\n'barley',\n'wedding photo',\n'hawk',\n'warehouse',\n'vegetable garden',\n'chocolate cake',\n'cabbage',\n'floor window',\n'baby shower',\n'magnifying glass',\n'table',\n'stethoscope',\n'reading',\n'mission',\n'croissant',\n'gift box',\n'rocket',\n'forest road',\n'cooking',\n'suite',\n'hill country',\n'motorcycle',\n'baseball player',\n'angle',\n'drug',\n'sport association',\n'championship',\n'family portrait',\n'florist',\n'softball',\n'egret',\n'office',\n'plywood',\n'jockey',\n'mosque',\n'brunch',\n'beanie',\n'office building',\n'pattern',\n'calendar',\n'indoor',\n'pepper',\n'ledge',\n'trail',\n'fuel',\n'laptop computer',\n'tennis shoe',\n'deck chair',\n'guitarist',\n'barn',\n'surgery',\n'cartoon illustration',\n'nebula',\n'railroad',\n'mountain goat',\n'goose',\n'car door',\n'cheer',\n'liquid',\n'hardwood floor',\n'pathway',\n'acorn',\n'gull',\n'airliner',\n'couch',\n'lake house',\n'spaghetti',\n'promenade',\n'collection',\n'garden',\n'bank',\n'robin',\n'tennis ball',\n'peony',\n'gymnast',\n'lavender',\n'deck',\n'test',\n'riverside',\n'rapper',\n'domino',\n'bride',\n'mouse',\n'basil',\n'wedding couple',\n'ocean wave',\n'arm',\n'kitchen floor',\n'grove',\n'family member',\n'backyard',\n'raspberry',\n'forest fire',\n'officer',\n'hibiscus',\n'canyon',\n'composer',\n'signature',\n'olive 
oil',\n'hibiscus flower',\n'rose',\n'vector icon',\n'sunrise',\n'horseback',\n'motor scooter',\n'office worker',\n'tradition',\n'ingredient',\n'washing machine',\n'lighting',\n'bagel',\n'sailboat',\n'policeman',\n'mare',\n'graphic',\n'halloween pumpkin',\n'stock',\n'pilot',\n'education',\n'team',\n'body',\n'horse',\n'kimono',\n'bazaar',\n'bag',\n'recording studio',\n'parsley',\n'entrance',\n'denim',\n'vet',\n'horse farm',\n'charcoal',\n'architecture',\n'glass vase',\n'puppy',\n'estuary',\n'television show host',\n'city bus',\n'shoulder',\n'beast',\n'balance',\n'golfer',\n'roadside',\n'denim jacket',\n'stone wall',\n'counter top',\n'app icon',\n'toast',\n'head coach',\n'ham',\n'warrior',\n'gem',\n'refrigerator',\n'snowman',\n'construction worker',\n'coal',\n'website',\n'morning fog',\n'mustard',\n'human',\n'owl',\n'puppy dog',\n'piggy bank',\n'vegetation',\n'pirate',\n'action film',\n'marshmallow',\n'thanksgiving',\n'business',\n'disease',\n'signage',\n'greeting',\n'skate park',\n'tile',\n'mouth',\n'spinach',\n'vacation',\n'leader',\n'shrine',\n'walker',\n'science fiction film',\n'bill',\n'rabbit',\n'motor boat',\n'bar',\n'radio',\n'barge',\n'tail',\n'chainsaw',\n'gallery',\n'rainbow',\n'pasta',\n'padlock',\n'web',\n'pastry',\n'ink',\n'reef',\n'school uniform',\n'shawl',\n'treasure',\n'peach',\n'dinner table',\n'injury',\n'harbor',\n'witch',\n'car dealership',\n'litter',\n'gesture',\n'documentary',\n'marriage',\n'sea shell',\n'priest',\n'dome',\n'kit',\n'icon',\n'seaside',\n'bucket',\n'entertainment',\n'stable',\n'hat',\n'puddle',\n'sock',\n'shopper',\n'technology',\n'harbour',\n'orbit',\n'antler',\n'tube',\n'flag waving',\n'cook',\n'tight',\n'commander',\n'farmland',\n'switch',\n'hiker',\n'wedding ceremony',\n'award ceremony',\n'champion',\n'chopstick',\n'farmhouse',\n'performer',\n'spike',\n'accident',\n'cruise ship',\n'passenger train',\n'attraction',\n'entertainer',\n'rear 
view',\n'sidewalk',\n'parade',\n'racing',\n'plane',\n'ritual',\n'peacock',\n'pocket',\n'plum',\n'drop',\n'carrot',\n'floor',\n'sunset',\n'troop',\n'architect',\n'coffee table',\n'dust',\n'outline',\n'leather',\n'charity event',\n'heat',\n'whale',\n'laundry',\n'coconut tree',\n'crosswalk',\n'pony',\n'ant',\n'pipe',\n'string',\n'coat',\n'angel',\n'beef',\n'church tower',\n'dish',\n'pitch',\n'cupboard',\n'thermometer',\n'dirt field',\n'fireworks',\n'minute',\n'cane',\n'pajama',\n'flower garden',\n'autumn',\n'trash can',\n'dachshund',\n'banana tree',\n'tray',\n'moose',\n'roadway',\n'carnival',\n'antenna',\n'pole',\n'castle wall',\n'ram',\n'cattle',\n'hay',\n'cookie',\n'swimmer',\n'baseball team',\n'strait',\n'hedge',\n'jet',\n'fire pit',\n'octopus',\n'calf',\n'cube',\n'opera',\n'cardboard box',\n'tiara',\n'kitchen sink',\n'prairie',\n'bowl',\n'galaxy',\n'straw hat',\n'linen',\n'ski resort',\n'stitch',\n'street lamp',\n'motorist',\n'icicle',\n'stain',\n'flora',\n'drain',\n'kitchen cabinet',\n'decor',\n'bouquet',\n'pound',\n'interior design',\n'nail polish',\n'figurine',\n'tomb',\n'disc',\n'twist',\n'blouse',\n'ribbon',\n'figure',\n'burger',\n'cork',\n'soccer goalkeeper',\n'train bridge',\n'drinking water',\n'dew',\n'baker',\n'storm cloud',\n'tarmac',\n'tv drama',\n'sponge',\n'magnet',\n'sailor',\n'entry',\n'swan',\n'exercise',\n'sloth',\n'jewel',\n'scuba diver',\n'bite',\n'cat tree',\n'tent',\n'can',\n'tennis match',\n'ecosystem',\n'picket fence',\n'palm',\n'train car',\n'frying pan',\n'rally',\n'tablet pc',\n'reindeer',\n'image',\n'wolf',\n'chin',\n'conservatory',\n'flood water',\n'cityscape',\n'beach sand',\n'car park',\n'pavement',\n'farm field',\n'swimming',\n'winter storm',\n'stem',\n'pillow',\n'inning',\n'gorilla',\n'desk',\n'avenue',\n'fern',\n'money',\n'pearl',\n'train station',\n'skillet',\n'nap',\n'barber',\n'library',\n'freezer',\n'label',\n'rainforest',\n'parking sign',\n'mirror',\n'wing',\n'noodle',\n'press 
room',\n'sculpture',\n'tablet',\n'viewer',\n'prayer',\n'mini',\n'mechanic',\n'laugh',\n'rice field',\n'hand',\n'mustache',\n'mountain road',\n'catwalk',\n'conference',\n'cape',\n'installation',\n'musician',\n'stream',\n'machine',\n'speech',\n'crocodile',\n'soccer match',\n'town square',\n'passport',\n'post box',\n'point',\n'stone building',\n'motorway',\n'mix',\n'dentist',\n'businessperson',\n'happiness',\n'boat',\n'vineyard',\n'treadmill',\n'glass wall',\n'water droplet',\n'coffee mug',\n'graduate',\n'sunflower',\n'parliament',\n'shepherd',\n'movie',\n'wine',\n'orchard',\n'tulip',\n'motherboard',\n'cup',\n'broom',\n'spot',\n'drawing',\n'polo shirt',\n'graduation',\n'film producer',\n'moonlight',\n'glow',\n'film format',\n't shirt',\n'rock face',\n'sword',\n'clinic',\n'festival day',\n'meadow',\n'staple',\n'pupil',\n'training ground',\n'rider',\n'flower',\n'foal',\n'wharf',\n'foot bridge',\n'shooting',\n'top',\n'mast',\n'police car',\n'robe',\n'wedding bouquet',\n'stop sign',\n'birthday cake',\n'glitter',\n'butter',\n'scooter',\n'tundra',\n'superhero',\n'pocket watch',\n'inscription',\n'youngster',\n'fruit tree',\n'movie poster',\n'engine',\n'foundation',\n'motorcyclist',\n'take',\n'woman',\n'antelope',\n'country artist',\n'road trip',\n'typewriter',\n'tuxedo',\n'brand',\n'pine',\n'bathroom',\n'paradise',\n'texture',\n'balloon',\n'dining table',\n'home',\n'computer screen',\n'actor',\n'clip',\n'tv tower',\n'panorama',\n'summit',\n'cat',\n'plot',\n'eagle',\n'dancer',\n'pup',\n'studio shot',\n'tear',\n'bird bath',\n'classroom',\n'bookstore',\n'city wall',\n'tv programme',\n'blade',\n'easel',\n'buttercream',\n'sweet',\n'designer',\n'diamond',\n'handshake',\n'herb',\n'corn field',\n'seafront',\n'concrete',\n'street artist',\n'gas',\n'stamp',\n'window display',\n'paper',\n'note',\n'pint',\n'quarry',\n'research',\n'fixture',\n'manager',\n'soil',\n'leopard',\n'board game',\n'ladder',\n'stop light',\n'island',\n'ramp',\n'football 
match',\n'icing',\n'drill',\n'currency',\n'summer evening',\n'topping',\n'pyramid',\n'pomegranate',\n'cell',\n'ivy',\n'squad',\n'scenery',\n'computer',\n'locomotive',\n'surf',\n'mascot',\n'dune',\n'path',\n'duck',\n'twilight',\n'wire',\n'bow tie',\n'strike',\n'cormorant',\n'car wash',\n'crane',\n'market',\n'philosopher',\n'alarm clock',\n'camera',\n'birch',\n'greeting card',\n'plain',\n'clay',\n'donut',\n'lock',\n'moth',\n'laboratory',\n'fan',\n'violin',\n'jazz fusion artist',\n'mountain biker',\n'terrain',\n'magazine',\n'pickup',\n'comedy film',\n'smartphone',\n'film',\n'bed',\n'microwave oven',\n'tournament',\n'lawn',\n'car window',\n'alligator',\n'screen',\n'jetty',\n'shopping bag',\n'landscape view',\n'cabinetry',\n'friendly match',\n'thing',\n'petal',\n'shopping center',\n'transport',\n'ballet dancer',\n'shoreline',\n'princess',\n'car seat',\n'parking meter',\n'green',\n'vodka',\n'band',\n'rock',\n'costume',\n'warning sign',\n'strip',\n'plaque',\n'wheelchair',\n'headband',\n'ginger',\n'dice',\n'media',\n'hairdresser',\n'press',\n'living room',\n'stove',\n'player',\n'cherry',\n'workshop',\n'carving',\n'embroidery',\n'doodle',\n'adventure',\n'rugby player',\n'monument',\n'brush',\n'marker',\n'loft',\n'postcard',\n'collage',\n'ball',\n'professor',\n'dresser',\n'gig',\n'festival',\n'blackbird',\n'makeup artist',\n'video camera',\n'sticker',\n'peak',\n'wildflower',\n'santa hat',\n'rodeo',\n'wedding photographer',\n'guy',\n'staff',\n'waterfall',\n'operation',\n'defender',\n'falcon',\n'haze',\n'individual',\n'gentleman',\n'greyhound',\n'rocking chair',\n'rice',\n'garbage',\n'platter',\n'chocolate',\n'splash',\n'business suit',\n'cheetah',\n'valley',\n'maze',\n'trampoline',\n'garland',\n'slalom',\n'unicorn',\n'tree stump',\n'painting',\n'romance',\n'fight',\n'alcohol',\n'ghost',\n'fondant',\n'spa',\n'shutter',\n'death',\n'demonstration',\n'cotton',\n'pier',\n'flea 
market',\n'history',\n'savannah',\n'fist',\n'aisle',\n'crew',\n'jug',\n'pose',\n'anchor',\n'teapot',\n'boat house',\n'business team',\n'tripod',\n'bee',\n'pebble',\n'mattress',\n'canvas',\n'hallway',\n'campaign',\n'pod',\n'lake district',\n'article',\n'white',\n'sofa',\n'honey',\n'marathon',\n'pancake',\n'tourist attraction',\n'wedding gown',\n'battle',\n'shelving',\n'sea',\n'sheet music',\n'pie',\n'yarn',\n'construction site',\n'flyer',\n'tie',\n'star',\n'lettuce',\n'martial artist',\n'dart',\n'straw',\n'reflection',\n'conference room',\n'temperature',\n'rugby',\n'mosquito',\n'physicist',\n'rock climber',\n'crash',\n'backdrop',\n'toilet seat',\n'sand castle',\n'water park',\n'toy car',\n'waste',\n'luxury',\n'hangar',\n'rv',\n'tree trunk',\n'board',\n'gold',\n'project picture',\n'cap',\n'cottage',\n'relief',\n'attire',\n'microscope',\n'battery',\n'roll',\n'line',\n'parking garage',\n'crystal',\n'broadcasting',\n'brick wall',\n'lab',\n'flooring',\n'meeting',\n'3d cg rendering',\n'desktop computer',\n'cowboy',\n'sailing ship',\n'junction',\n'hairstyle',\n'homework',\n'profile',\n'model',\n'flower pot',\n'street light',\n'salt lake',\n'maple',\n'space',\n'blizzard',\n'throw',\n'zebras',\n'brochure',\n'constellation',\n'beak',\n'kilt',\n'pond',\n'blue sky',\n'sneaker',\n'sand dune',\n'morning sun',\n'almond',\n'grill',\n'curl',\n'basketball girl game',\n'chameleon',\n'toilet bowl',\n'prince',\n'keyboard',\n'queen',\n'computer monitor',\n'writing',\n'crown',\n'basilica',\n'kiss',\n'house',\n'parking',\n'football competition',\n'shell',\n'sport equipment',\n'comedy',\n'baboon',\n'vendor',\n'rise building',\n'wrap',\n'food truck',\n'cat bed',\n'rickshaw',\n'flare',\n'teal',\n'nectar',\n'eclipse',\n'vehicle',\n'steam locomotive',\n'gorge',\n'cow',\n'christmas card',\n'demonstrator',\n'memorial',\n'towel',\n'jewellery',\n'train',\n'frisbee',\n'baseball game',\n'fur',\n'afternoon 
sun',\n'community',\n'sparkler',\n'bandage',\n'firework',\n'dollar',\n'pasture',\n'video',\n'bus',\n'tree house',\n'seashore',\n'field',\n'hamburger',\n'souvenir',\n'hedgehog',\n'worm',\n'pine cone',\n'osprey',\n'dinosaur',\n'vegetable',\n'junk',\n'poster',\n'army',\n'winger',\n'bundle',\n'stage',\n'growth',\n'wedding party',\n'service',\n'blanket',\n'ruler',\n'eye',\n'credit card',\n'castle',\n'diner',\n'hut',\n'elk',\n'hard rock artist',\n'nun',\n'dog breed',\n'nest',\n'drama film',\n'number icon',\n'water tank',\n'giraffe',\n'altar',\n'pavilion',\n'tv personality',\n'suv',\n'street vendor',\n'street sign',\n'ditch',\n'debris',\n'foam',\n'takeoff',\n'spice',\n'mountain lake',\n'tea',\n'orchestra',\n'spacecraft',\n'counter',\n'abbey',\n'mountain',\n'hydrangea',\n'racer',\n'orange tree',\n'tide',\n'cowboy hat',\n'rapid',\n'town',\n'wild',\n'herd',\n'vein',\n'driveway',\n'jar',\n'bark',\n'illustration',\n'horror film',\n'corn',\n'stroller',\n'industry',\n'mountain stream',\n'gym',\n'neckline',\n'pan',\n'client',\n'spectator',\n'eggplant',\n'camper',\n'fawn',\n'hoodie',\n'meat',\n'lemonade',\n'food market',\n'slum',\n'comic book character',\n'flower market',\n'love',\n'palace',\n'gun',\n'heel',\n'shopping street',\n'shooting basketball guard',\n'family photo',\n'rooftop',\n'laundry basket',\n'airport runway',\n'horn',\n'face mask',\n'flight',\n'appetizer',\n'violet',\n'country lane',\n'cement',\n'instrument',\n'tv actor',\n'spark',\n'celebrity',\n'award',\n'country house',\n'standing',\n'auction',\n'date',\n'engagement',\n'puck',\n'advertisement',\n'chair',\n'zebra',\n'driftwood',\n'bumblebee',\n'maple leaf',\n'bonnet',\n'orange',\n'water tower',\n'door',\n'singer',\n'floor plan',\n'discussion',\n'theatre',\n'pilgrim',\n'mug',\n'branch',\n'window sill',\n'baseball pitcher',\n'bakery',\n'lollipop',\n'basketball player',\n'toilet paper',\n'chalkboard',\n'cabin',\n'sign',\n'night sky',\n'cannon',\n'fishing net',\n'submarine',\n'suit',\n'fur coat',\n'wine 
bottle',\n'folder',\n'street art',\n'suspension bridge',\n'evening sky',\n'billboard',\n'postage stamp',\n'newspaper',\n'transportation',\n'surgeon',\n'light',\n'park',\n'horizon',\n'road',\n'sand bar',\n'trumpet',\n'lounge',\n'cloud forest',\n'birthday celebration',\n'balcony',\n'anime',\n'beehive',\n'umbrella',\n'goldfish',\n'baseball cap',\n'waterhole',\n'ceiling',\n'carousel',\n'backpack',\n'plant pot',\n'atmosphere',\n'sunflower field',\n'spire',\n'vision',\n'woodpecker',\n'chip',\n'pool table',\n'lotus flower',\n'cone',\n'humpback whale',\n'reservoir',\n'hunt',\n'piano',\n'plate',\n'dining area',\n'luggage',\n'skier',\n'dance floor',\n'crow',\n'stair',\n'overpass',\n'opera house',\n'bear',\n'jazz artist',\n'water',\n'vessel',\n'cast',\n'yard',\n'cathedral',\n'basketball hoop',\n'graveyard',\n'sound',\n'berry',\n'onlooker',\n'fauna',\n'birch tree',\n'retail',\n'hill',\n'skeleton',\n'journalist',\n'frost',\n'basket',\n'nail',\n'dusk',\n'trash',\n'dawn',\n'clover',\n'hen',\n'volcano',\n'basketball coach',\n'home decor',\n'charge',\n'haircut',\n'sense',\n'university',\n'lizard',\n'daisy',\n'tablet computer',\n'grass field',\n'prison',\n'metal artist',\n'bathroom mirror',\n'window frame',\n'chest',\n'flavor',\n'pop country artist',\n'market square',\n'monkey',\n'blog',\n'deer',\n'speech bubble',\n'dog',\n'independence day',\n'girl',\n'boy',\n'tartan',\n'furniture',\n'appliance',\n'office window',\n'fish boat',\n'sand box',\n'tv sitcom',\n'drama',\n'sleigh',\n'depression',\n'paper towel',\n'baseball',\n'protestor',\n'grape',\n'wedding cake',\n'invitation',\n'accessory',\n'pick',\n'grandparent',\n'racket',\n'tea plantation',\n'outdoors',\n'egg',\n'glass bowl',\n'sun',\n'organization',\n'lion',\n'panel',\n'station',\n'wallpaper',\n'helicopter',\n'salt',\n'vanity',\n'patio',\n'lunch',\n'street performer',\n'mountain range',\n'soup',\n'bacon',\n'power station',\n'cantilever 
bridge',\n'hummingbird',\n'shirt',\n'rope',\n'hip',\n'chalk',\n'pendant',\n'choir',\n'tv',\n'lichen',\n'railway bridge',\n'art gallery',\n'bartender',\n'wagon',\n'baby elephant',\n'accordion',\n'horseshoe',\n'building site',\n'clutch',\n'harvest',\n'savanna',\n'geranium',\n'business woman',\n'paddock',\n'patch',\n'beech tree',\n'war',\n'suburbs',\n'hospital bed',\n'motorcycle racer',\n'moss',\n'gravel',\n'government agency',\n'dollar bill',\n'father',\n'fjord',\n'concert',\n'nut',\n'wedding photography',\n'finish line',\n'home plate',\n'food',\n'nose',\n'thumb',\n'village',\n'dining room table',\n'bumper',\n'monster',\n'blackberry',\n'lime',\n'conflict',\n'gala',\n'wallet',\n'wrist',\n'hug',\n'mermaid',\n'lava',\n'lawyer',\n'folk rock artist',\n'arena',\n'onion',\n'toothbrush',\n'fashion',\n'perfume',\n'flip',\n'triangle',\n'woodland',\n'mail',\n'grasshopper',\n'studio',\n'wood floor',\n'den',\n'racquet',\n'cello',\n'lemur',\n'astronaut',\n'glass table',\n'blood',\n'dvd',\n'planter',\n'silver',\n'leash',\n'master bedroom',\n'forest',\n'batter',\n'shoe',\n'engraving',\n'opening',\n'product',\n'toe',\n'cocktail',\n'mallard duck',\n'bike ride',\n'oasis',\n'wedding ring',\n'cinematographer',\n'holly',\n'autograph',\n'fence',\n'ice cube',\n'cove',\n'pineapple',\n'aurora',\n'glass bead',\n'produce',\n'apartment building',\n'cob',\n'miniature',\n'cockpit',\n'flashlight',\n'frog',\n'sheep',\n'groom',\n'steel',\n'watermelon',\n'clip art',\n'paper plate',\n'ostrich',\n'contour',\n'mural',\n'cub',\n'paisley bandanna',\n'winery',\n'turn',\n'handle',\n'satellite',\n'post',\n'pork',\n'child',\n'asphalt',\n'grocery store',\n'vulture',\n'trolley',\n'nightclub',\n'brick',\n'trailer',\n'compass',\n'cereal',\n'cafe',\n'cartoon character',\n'sugar',\n'fiction book',\n'glass floor',\n'umpire',\n'guitar',\n'hamster',\n'protester',\n'airplane',\n'garment',\n'blazer',\n'railway line',\n'wedding',\n'shoe box',\n'parking lot',\n'construction',\n'graduation 
ceremony',\n'tram',\n'telescope',\n'copper',\n'pain',\n'autumn forest',\n'guest house',\n'partner',\n'crayon',\n'dip',\n'boot',\n'corridor',\n'computer keyboard',\n'hockey player',\n'chicken coop',\n'bus station',\n'gathering',\n'ankle',\n'bunk bed',\n'wood table',\n'football coach',\n'monarch',\n'pharmacy',\n'legging',\n'mannequin',\n'female',\n'train track',\n'stack',\n'canopy',\n'design element',\n'grandmother',\n'symbol',\n'beach hut',\n'zucchini',\n'bomb',\n'businessman',\n'skyscraper',\n'tongue',\n'case',\n'sparkle',\n'highland',\n'ballroom',\n'prom',\n'estate',\n'customer',\n'archipelago',\n'cheese',\n'debate',\n'carriage',\n'bulldozer',\n'pumpkin',\n'sitting room',\n'gas station',\n'wedding reception',\n'camp',\n'dog bed',\n'tower',\n'property',\n'river bed',\n'pop latin artist',\n'fridge',\n'wine glass',\n'coast',\n'beer',\n'tow truck',\n'fire truck',\n'mountain bike',\n'thigh',\n'heron',\n'boat ride',\n'gondola',\n'turquoise',\n'lake',\n'llama',\n'kitty',\n'tin',\n'waiting room',\n'coffee cup',\n'socialite',\n'guard',\n'tap',\n'waterway',\n'forehead',\n'list',\n'erosion',\n'box',\n'sea lion',\n'pollen',\n'dam',\n'wasp',\n'salon',\n'tennis tournament',\n'flower box',\n'aquarium',\n'rain cloud',\n'clothing store',\n'lead singer',\n'cupcake',\n'tortoise',\n'lettering',\n'sport facility',\n'dance',\n'dog house',\n'nature',\n'football',\n'rooster',\n'footballer',\n'railway track',\n'crowd',\n'fishing rod',\n'silhouette',\n'wind turbine',\n'sari',\n'bus window',\n'cloud',\n'charity',\n'medal',\n'yoga',\n'event',\n'veil',\n'fashion menswear milan week',\n'news',\n'knife',\n'print',\n'screen tv',\n'walnut',\n'fungus',\n'ice cream',\n'computer mouse',\n'play',\n'tribe',\n'picture',\n'video game',\n'business card',\n'music festival',\n'rack',\n'envelope',\n'shower',\n'dirt road',\n'mine',\n'oyster',\n'monarch butterfly',\n'dude',\n'fruit salad',\n'podium',\n'fork',\n'lace',\n'test match',\n'boulder',\n'cricket 
player',\n'staircase',\n'peninsula',\n'shopping',\n'popcorn',\n'oak',\n'market stall',\n'pine tree',\n'mountaineer',\n'student',\n'closet',\n'hood',\n'handstand',\n'centerpiece',\n'insect',\n'patient',\n'makeover',\n'tennis player',\n'sheet',\n'park bench',\n'apple',\n'organism',\n'hook',\n'turkey',\n'tangerine',\n'sibling',\n'shopping mall',\n'bird',\n'scarf',\n'smoothie',\n'net',\n'grass',\n'napkin',\n'ray',\n'eyebrow',\n'laptop keyboard',\n'motorbike',\n'woman hand',\n'oven',\n'book cover',\n'easter egg',\n'microwave',\n'sand',\n'snapshot',\n'soccer ball',\n'makeup',\n'knight',\n'bowling ball',\n'shower curtain',\n'flame',\n'lightning',\n'running',\n'power plant',\n'crib',\n'cartoon',\n'moat',\n'fashion girl',\n'wedding invitation',\n'bottle',\n'cliff',\n'monastery',\n'file photo',\n'apartment',\n'casino',\n'cream',\n'sweatshirt',\n'storm',\n'cruise',\n'teddy bear',\n'shovel',\n'wind farm',\n'writer',\n'dock',\n'professional',\n'hotel room',\n'job',\n'monitor',\n'donkey',\n'pass',\n'interview',\n'duchess',\n'mark',\n'plank',\n'beard',\n'zombie',\n'trio',\n'channel',\n'cricket team',\n'windmill',\n'vest',\n'diagram',\n'cable',\n'winter scene',\n'golden gate bridge',\n'buffalo',\n'studio portrait',\n'pagoda',\n'whiskey',\n'freight train',\n'kite',\n'future',\n'steam train',\n'phone box',\n'headset',\n'wood',\n'snowboarder',\n'paper bag',\n'slide',\n'grapefruit',\n'seating',\n'morning',\n'bronze sculpture',\n'theatre actor',\n'stump',\n'jean',\n'landmark',\n'jam',\n'waist',\n'watercolor',\n'hammock',\n'light fixture',\n'ice',\n'basin',\n'beverage',\n'shelter',\n'premiere',\n'mound',\n'ear',\n'bronze',\n'sunlight',\n'street',\n'energy',\n'barn door',\n'hike',\n'fleet',\n'claw',\n'beach',\n'pepperoni',\n'bin',\n'trainer',\n'buffet',\n'archive',\n'toddler',\n'referee',\n'bay window',\n'dove',\n'production company',\n'evening light',\n'gate',\n'farm',\n'reed',\n'fruit stand',\n'explorer',\n'snow storm',\n'throw pillow',\n'button',\n'display 
case',\n'bookcase',\n'lead',\n'lipstick',\n'basketball court',\n'cargo',\n'ensemble',\n'pope',\n'clock tower',\n'teen',\n'speaker',\n'rat',\n'laptop',\n'ski',\n'mess',\n'stadium',\n'ferry boat',\n'bunny',\n'waterfront',\n'downtown',\n'sink',\n'press conference',\n'dinner',\n'condiment',\n'thread',\n'audience',\n'grid',\n'car',\n'plastic',\n'people',\n'barbecue',\n'pigeon',\n'urinal',\n'seagull',\n'volunteer',\n'hockey',\n'fir tree',\n'pollution',\n'trial',\n'collar',\n'area',\n'meeting room',\n'circus',\n'yogurt',\n'orangutan',\n'viaduct',\n'comedian',\n'drone',\n'scissor',\n'pop rock artist',\n'biscuit',\n'panda',\n'water feature',\n'air balloon',\n'remote control',\n'watercolor painting',\n'show',\n'walk',\n'post office',\n'bike path',\n'rap gangsta artist',\n'microphone',\n'crack',\n'sunset sky',\n'glass',\n'tv show',\n'cartoon style',\n'stripe',\n'foyer',\n'signal',\n'calligraphy',\n'bulb',\n'gardener',\n'coffee bean',\n'spider',\n'tapestry',\n'city skyline',\n'necklace',\n'kitten',\n'traveler',\n'veteran',\n'frosting',\n'fry',\n'tennis court',\n'tank top',\n'butterfly house',\n'mist',\n'drummer',\n'water level',\n'scale',\n'baseball glove',\n'music video performer',\n'champagne',\n'camping',\n'clothing',\n'water drop',\n'telephone box',\n'pen',\n'morning mist',\n'fire engine',\n'porch',\n'opening ceremony',\n'style',\n'palm tree',\n'fashion show',\n'universe',\n'scratch',\n'axe',\n'ottoman',\n'explosion',\n'rib',\n'boutique',\n'game',\n'cucumber',\n'fruit',\n'stone bridge',\n'nature reserve',\n'track',\n'train window',\n'punch',\n'telephone pole',\n'velvet',\n'sauce',\n'moon',\n'contrast',\n'flamingo',\n'bat',\n'vending machine',\n'ship',\n'equestrian',\n'shade',\n'comforter',\n'pallet',\n'sparrow',\n'wii',\n'glaze',\n'grocery',\n'steeple',\n'soccer player',\n'contract',\n'advertising',\n'runner',\n'chimpanzee',\n'world',\n'seat',\n'project',\n'chihuahua',\n'bubble',\n'willow',\n'pedestal',\n'soul hip hop 
artist',\n'curb',\n'drawer',\n'leaf',\n'banner',\n'launch party',\n'coach',\n'government',\n'snowball',\n'toy',\n'portrait',\n'doctor',\n'whiteboard',\n'electronic',\n'tiger',\n'graffiti',\n'column',\n'nightstand',\n'whistle',\n'maxi dress',\n'bench',\n'wetsuit',\n'bird feeder',\n'football game',\n'basketball',\n'class',\n'bathroom door',\n'store window',\n'text message',\n'wreath',\n'street view',\n'binocular',\n'pet',\n'facade',\n'drought',\n'lemon',\n'new year',\n'night view',\n'airplane window',\n'specie',\n'rule',\n'jaw',\n'wheat field',\n'diet',\n'pop artist',\n'habitat',\n'screenshot',\n'scoreboard',\n'shore',\n'mane',\n'quilt',\n'ski lift',\n'orchid',\n'turban',\n'christmas',\n'airport',\n'marina',\n'glass door',\n'glass bottle',\n'restaurant',\n'conductor',\n'logo',\n'sleep',\n'tape',\n'tomato',\n'river bank',\n'lilac',\n'tooth',\n'training',\n'pottery',\n'shop',\n'steam engine',\n'mason jar',\n'base',\n'procession',\n'border',\n'shoot',\n'footprint',\n'hotdog',\n'bull',\n'stocking',\n'recreation',\n'automobile model',\n'design',\n'country pop artist',\n'river',\n'retriever',\n'department store',\n'auditorium',\n'sport car',\n'supermarket',\n'belt',\n'cricket',\n'window box',\n'dress shirt',\n'letter',\n'residence',\n'megaphone',\n'pant',\n'wildfire',\n'bird nest',\n'crab',\n'swimsuit',\n'candle',\n'funeral',\n'mill',\n'national park',\n'plant',\n'cop',\n'power line',\n'perch',\n'blue',\n'finger',\n'ferris wheel',\n'globe',\n'skateboard',\n'helmet',\n'movie theater',\n'uniform',\n'hammer',\n'material',\n'kid',\n'well',\n'butterfly',\n'sideline',\n'fashion fall show',\n'planet earth',\n'lift',\n'male',\n'sauna',\n'gray',\n'flour',\n'sand sculpture',\n'program',\n'cabinet',\n'infant',\n'wheel',\n'aircraft model',\n'dough',\n'garlic',\n'skate',\n'arrow',\n'wrapping paper',\n'ripple',\n'lamp',\n'iron',\n'banknote',\n'beaver',\n'ferry',\n'courtyard',\n'bassist',\n'countryside',\n'steak',\n'comfort',\n'boxer',\n'laundry room',\n'campsite',\n'brick 
building',\n'golf',\n'subway',\n'headphone',\n'fort',\n'handbag',\n'drum',\n'flood',\n'saddle',\n'bass',\n'labyrinth',\n'needle',\n'sun ray',\n'app',\n'menu',\n'president',\n'cardigan',\n'dandelion',\n'wetland',\n'ice hockey player',\n'number',\n'city hall',\n'fishing',\n'portrait session',\n'pug',\n'key',\n'art print',\n'minister',\n'hurdle',\n'emergency',\n'painting artist',\n'flag pole',\n'evening',\n'purse',\n'recipe',\n'golf ball',\n'coloring book',\n'mountain peak',\n'senior',\n'holiday',\n'bud',\n'cousin',\n'pantry',\n'lap',\n'skin',\n'flag',\n'tissue paper',\n'ridge',\n'wire fence',\n'surfer',\n'climber',\n'photograph',\n'sewing machine',\n'cooler',\n'actress',\n'apple tree',\n'cancer',\n'starfish',\n'automobile make',\n'dumbbell',\n'brace',\n'tunnel',\n'window',\n'paint artist',\n'composition',\n'school student',\n'condo',\n'convertible',\n'cushion',\n'selfie',\n'territory',\n'guide',\n'tree',\n'court',\n'shrimp',\n'stone house',\n'dress',\n'eyelash',\n'juice',\n'broccoli',\n'chain',\n'tourism',\n'mountain top',\n'concept car',\n'film premiere',\n'light bulb',\n'cafeteria',\n'badge',\n'flower bed',\n'theater',\n'root',\n'racecar driver',\n'basketball boy game',\n'glove',\n'skyline',\n'wall',\n'glacier',\n'airport terminal',\n'bug',\n'trim',\n'railway station',\n'briefcase',\n'flat',\n'fountain',\n'person',\n'lane',\n'asparagus',\n'art',\n'lantern',\n'dishwasher',\n'director',\n'snake',\n'lecture',\n'game controller',\n'tree branch',\n'pub',\n'bathing suit',\n'queue',\n'belly',\n'poppy',\n'bow',\n'pitcher',\n'ice cream cone',\n'cave',\n'candy',\n'road bridge',\n'host',\n'traffic jam',\n'earring',\n'file',\n'foot',\n'watermark overlay stamp',\n'mailbox',\n'supercar',\n'railing',\n'bedroom',\n'seafood',\n'waffle',\n'bronze statue',\n'plan',\n'flow',\n'marble',\n'basketball game',\n'automobile',\n'scene',\n'cypress tree',\n'soldier',\n'skateboarder',\n'glass building',\n'cherry 
tree',\n'pump',\n'grain',\n'wildebeest',\n'loop',\n'frame',\n'bathtub',\n'saxophone',\n'diver',\n'stalk',\n'lily',\n'bead',\n'alley',\n'flock',\n'family room',\n'manufacturing',\n'pointer',\n'worker',\n'navy',\n'potato',\n'teacher',\n'photography',\n'dolly',\n'boardwalk',\n'water fountain',\n'athlete',\n'side dish',\n'bay',\n'ice hockey',\n'phone',\n'hero',\n'face',\n'gold medal',\n'blind',\n'swamp',\n'researcher',\n'swim',\n'meatball',\n'iguana',\n'leather jacket',\n'jellyfish',\n'site',\n'smoke',\n'traffic signal',\n'melon',\n'beetle',\n'calculator',\n'skirt',\n'plantation',\n'sculptor',\n'barrier',\n'catcher',\n'security guard',\n'sketch',\n'awning',\n'steering wheel',\n'mountain view',\n'bus stop',\n'pool',\n'leg',\n'spotlight',\n'apron',\n'mineral',\n'inlet',\n'sleeve',\n'torch',\n'emotion',\n'march',\n'police officer',\n'performance',\n'lamp post',\n'fishing boat',\n'summer',\n'presentation',\n'saucer',\n'suitcase',\n'supermodel',\n'goalkeeper',\n'shrub',\n'rock artist',\n'document',\n'beach house',\n'man',\n'blue artist',\n'cigar',\n'railroad track',\n'gown',\n'mosaic',\n'bungalow',\n'alphabet',\n'baseball field',\n'shed',\n'pedestrian',\n'rail',\n'soap',\n'kitchen counter',\n'dessert',\n'dunk',\n'blossom',\n'conversation',\n'fruit market',\n'glass jar',\n'military',\n'beer bottle',\n'photographer',\n'tennis racket',\n'competition',\n'escalator',\n'bell tower',\n'stilt',\n'ballerina',\n'television',\n'feather',\n'fence post',\n'rear',\n'dahlia',\n'red carpet',\n'tub',\n'hole',\n'fortress',\n'pack',\n'telephone',\n'cardboard',\n'city park',\n'platform',\n'college student',\n'arch bridge',\n'wind',\n'blender',\n'bloom',\n'ice rink',\n'birthday',\n'raven',\n'fairy',\n'embankment',\n'hall',\n'flower shop',\n'suburb',\n'barrel',\n'biker',\n'steam',\n'dragonfly',\n'formation',\n'electricity',\n'business people',\n'symmetry',\n'walkway',\n'fisherman',\n'gas mask',\n'loch',\n'youth',\n'hanger',\n'dot',\n'fish',\n'street market',\n'animation film',\n'crime fiction 
film',\n'boar',\n'emblem',\n'halloween costume',\n'kangaroo',\n'couple',\n'spoon',\n'squirrel',\n'neon sign',\n'sky',\n'office desk',\n'beauty salon',\n'breakwater',\n'fashion look',\n'toaster',\n'author',\n'news conference',\n'outdoor',\n'canoe',\n'dragon',\n'tool',\n'shopping centre',\n'ladybug',\n'swimming pool',\n'landscaping',\n'ski pole',\n'red',\n'truck',\n'fly',\n'temple',\n'level',\n'sunday',\n'railroad bridge',\n'car mirror',\n'lawn mower',\n'flute',\n'aircraft carrier',\n'fashion menswear london week',\n'sunshine',\n'tile floor',\n'skull',\n'fossil',\n'flower arrangement',\n'diaper',\n'sea turtle',\n'cherry blossom',\n'fireman',\n'shack',\n'lens',\n'waiter',\n'animal',\n'basement',\n'snow',\n'autumn park',\n'glass box',\n'kick',\n'head',\n'anniversary',\n'vine',\n'back',\n'paper lantern',\n'fish tank',\n'cellphone',\n'silk',\n'coral',\n'notebook',\n'photo',\n'gazebo',\n'ketchup',\n'driver',\n'farmer',\n'bonfire',\n'chestnut',\n'photoshoot',\n'football field',\n'olive tree',\n'pheasant',\n'sandal',\n'toilet',\n'fireplace',\n'music',\n'deity',\n'fish market',\n'fig',\n'bell',\n'neck',\n'grave',\n'villa',\n'cyclist',\n'crate',\n'grey',\n'asphalt road',\n'soccer',\n'hostel',\n'municipality',\n'courthouse',\n'roof',\n'end table',\n'pot',\n'sedan',\n'structure',\n'folk artist',\n'sport',\n'sport team',\n'protest',\n'syringe',\n'fashion designer',\n'jersey',\n'heart shape',\n'kayak',\n'stare',\n'sit with',\n'direct',\n'read',\n'photograph',\n'spin',\n'teach',\n'laugh',\n'carve',\n'grow on',\n'warm',\n'watch',\n'stretch',\n'smell',\n'decorate',\n'shine',\n'light',\n'dance',\n'send',\n'park',\n'chase',\n'collect',\n'lead',\n'kiss',\n'lead 
to',\n'lick',\n'smile',\n'cheer',\n'sit',\n'point',\n'block',\n'rock',\n'drop',\n'cut',\n'ski',\n'wrap',\n'lose',\n'serve',\n'provide',\n'sleep',\n'dress',\n'embrace',\n'burn',\n'pack',\n'stir',\n'create',\n'touch',\n'wash',\n'stick',\n'reveal',\n'shop',\n'train',\n'paint',\n'groom',\n'hunt',\n'bloom',\n'play',\n'pay',\n'brush',\n'shoot',\n'hold',\n'picture',\n'carry',\n'sip',\n'contain',\n'turn',\n'pour',\n'pitch',\n'give',\n'add',\n'blow',\n'look in',\n'show',\n'walk',\n'illuminate',\n'kneel',\n'cover',\n'drag',\n'post',\n'present',\n'fit',\n'operate',\n'fish',\n'race',\n'write',\n'deliver',\n'peel',\n'push',\n'run',\n'sit around',\n'buy',\n'jump',\n'walk on',\n'attend',\n'clean',\n'sell',\n'ride on',\n'mount',\n'host',\n'dry',\n'plant',\n'sing',\n'row',\n'shake',\n'perch',\n'ride',\n'fight',\n'skateboard',\n'live',\n'call',\n'surround',\n'practice',\n'play on',\n'work on',\n'step',\n'relax',\n'hit',\n'fall in',\n'flow',\n'greet',\n'launch',\n'wear',\n'hang on',\n'drive',\n'sit in',\n'break',\n'learn',\n'fly',\n'connect',\n'display',\n'locate',\n'compete',\n'go for',\n'sail',\n'lift',\n'toast',\n'help',\n'run on',\n'reflect',\n'pose',\n'scratch',\n'frame',\n'dribble',\n'herd',\n'enter',\n'exit',\n'place',\n'inspect',\n'build',\n'pick',\n'fill',\n'grind',\n'skate',\n'offer',\n'float',\n'sit by',\n'stand',\n'release',\n'rest',\n'singe',\n'climb',\n'tie',\n'mark',\n'lay',\n'stand around',\n'capture',\n'set',\n'land',\n'swinge',\n'run in',\n'kick',\n'lean',\n'head',\n'sign',\n'approach',\n'swim',\n'close',\n'crash',\n'control',\n'fall',\n'remove',\n'repair',\n'open',\n'appear',\n'travel',\n'load',\n'miss',\n'check',\n'surf',\n'moor',\n'smoke',\n'drink',\n'board',\n'seat',\n'feed',\n'rise',\n'sit on',\n'swing',\n'grow',\n'strike',\n'date',\n'slide',\n'share',\n'graze',\n'jump in',\n'lie',\n'extrude',\n'roll',\n'move',\n'gather',\n'eat',\n'pull',\n'run through',\n'squeeze',\n'lay on',\n'draw',\n'play 
with',\n'wave',\n'assemble',\n'perform',\n'march',\n'score',\n'attach',\n'adjust',\n'hang',\n'hug',\n'sleep on',\n'throw',\n'live in',\n'talk',\n'pet',\n'work',\n'run with',\n'see',\n'flip',\n'catch',\n'cook',\n'receive',\n'celebrate',\n'look',\n'classic',\n'bridal',\n'indoor',\n'industrial',\n'teenage',\n'mini',\n'grassy',\n'aged',\n'long',\n'warm',\n'light',\n'handsome',\n'happy',\n'three',\n'pregnant',\n'circular',\n'urban',\n'silver',\n'ceramic',\n'3d',\n'green',\n'blonde',\n'golden',\n'dark',\n'tropical',\n'ripe',\n'deep',\n'fat',\n'musical',\n'giant',\n'medical',\n'medieval',\n'bare',\n'stunning',\n'bold',\n'geographical',\n'huge',\n'plastic',\n'foggy',\n'stormy',\n'gothic',\n'biological',\n'empty',\n'clear',\n'antique',\n'pink',\n'steep',\n'brown',\n'striped',\n'aerial',\n'rainy',\n'cool',\n'flying',\n'commercial',\n'purple',\n'trendy',\n'blank',\n'haired',\n'dead',\n'wooden',\n'flat',\n'high',\n'beige',\n'panoramic',\n'angry',\n'dozen',\n'rural',\n'solar',\n'big',\n'small',\n'stained',\n'thick',\n'many',\n'fresh',\n'clean',\n'strong',\n'abstract',\n'crowded',\n'retro',\n'dry',\n'gorgeous',\n'martial',\n'modern',\n'blue',\n'cloudy',\n'low',\n'four',\n'outdoor',\n'single',\n'much',\n'beautiful',\n'snowy',\n'pretty',\n'new',\n'short',\n'sunny',\n'closed',\n'rocky',\n'red',\n'two',\n'double',\n'male',\n'gray',\n'five',\n'colorful',\n'automotive',\n'various',\n'one',\n'old',\n'rusty',\n'tall',\n'wild',\n'narrow',\n'natural',\n'several',\n'frozen',\n'textured',\n'lush',\n'young',\n'hot',\n'mixed',\n'white',\n'float',\n'quiet',\n'round',\n'bright',\n'religious',\n'female',\n'historical',\n'shiny',\n'traditional',\n'tourist',\n'yellow',\n'bald',\n'coastal',\n'lovely',\n'little',\n'broken',\n'romantic',\n'wide',\n'royal',\n'rich',\n'open',\n'cute',\n'ancient',\n'cold',\n'political',\n'elderly',\n'gold',\n'full',\n'rustic',\n'metallic',\n'floral',\n'sad',\n'wet',\n'fancy',\n'senior',\n'tiny',\n'stylish',\n'large',\n'frosty',\n'orange',\n'transparent',\n'electronic',\n
'shallow',\n'scared',\n'armed',\n'dirty',\n'historic',\n'black',\n'few',\n'windy',\n'some',\n'square',\n'ornamental',\n'sandy',\n'thin']\n\n\ntra_array = np.array(tra_array)\n\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/tag2Text/vit.py",
    "content": "'''\n * Copyright (c) 2022, salesforce.com, inc.\n * All rights reserved.\n * SPDX-License-Identifier: BSD-3-Clause\n * For full license text, see LICENSE.txt file in the repo root or https://opensource.org/licenses/BSD-3-Clause\n * By Junnan Li\n * Based on timm code base\n * https://github.com/rwightman/pytorch-image-models/tree/master/timm\n'''\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom functools import partial\n\nfrom timm.models.vision_transformer import _cfg, PatchEmbed\nfrom timm.models.registry import register_model\nfrom timm.models.layers import trunc_normal_, DropPath\nfrom timm.models.helpers import named_apply, adapt_input_conv\n\nfrom fairscale.nn.checkpoint.checkpoint_activations import checkpoint_wrapper\n\nclass Mlp(nn.Module):\n    \"\"\" MLP as used in Vision Transformer, MLP-Mixer and related networks\n    \"\"\"\n    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.fc1 = nn.Linear(in_features, hidden_features)\n        self.act = act_layer()\n        self.fc2 = nn.Linear(hidden_features, out_features)\n        self.drop = nn.Dropout(drop)\n\n    def forward(self, x):\n        x = self.fc1(x)\n        x = self.act(x)\n        x = self.drop(x)\n        x = self.fc2(x)\n        x = self.drop(x)\n        return x\n\n\nclass Attention(nn.Module):\n    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0.):\n        super().__init__()\n        self.num_heads = num_heads\n        head_dim = dim // num_heads\n        # NOTE scale factor was wrong in my original version, can set manually to be compat with prev weights\n        self.scale = qk_scale or head_dim ** -0.5\n        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)\n        self.attn_drop = 
nn.Dropout(attn_drop)\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n        self.attn_gradients = None\n        self.attention_map = None\n        \n    def save_attn_gradients(self, attn_gradients):\n        self.attn_gradients = attn_gradients\n        \n    def get_attn_gradients(self):\n        return self.attn_gradients\n    \n    def save_attention_map(self, attention_map):\n        self.attention_map = attention_map\n        \n    def get_attention_map(self):\n        return self.attention_map\n    \n    def forward(self, x, register_hook=False):\n        B, N, C = x.shape\n        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)\n        q, k, v = qkv[0], qkv[1], qkv[2]   # make torchscript happy (cannot use tensor as tuple)\n\n        attn = (q @ k.transpose(-2, -1)) * self.scale\n        attn = attn.softmax(dim=-1)\n        attn = self.attn_drop(attn)\n                \n        if register_hook:\n            self.save_attention_map(attn)\n            attn.register_hook(self.save_attn_gradients)        \n\n        x = (attn @ v).transpose(1, 2).reshape(B, N, C)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass Block(nn.Module):\n\n    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,\n                 drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, use_grad_checkpointing=False):\n        super().__init__()\n        self.norm1 = norm_layer(dim)\n        self.attn = Attention(\n            dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)\n        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here\n        self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity()\n        self.norm2 = norm_layer(dim)\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)\n\n        if use_grad_checkpointing:\n            self.attn = checkpoint_wrapper(self.attn)\n            self.mlp = checkpoint_wrapper(self.mlp)\n\n    def forward(self, x, register_hook=False):\n        x = x + self.drop_path(self.attn(self.norm1(x), register_hook=register_hook))\n        x = x + self.drop_path(self.mlp(self.norm2(x)))\n        return x\n\n    \nclass VisionTransformer(nn.Module):\n    \"\"\" Vision Transformer\n    A PyTorch impl of : `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale`  -\n        https://arxiv.org/abs/2010.11929\n    \"\"\"\n    def __init__(self, img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dim=768, depth=12,\n                 num_heads=12, mlp_ratio=4., qkv_bias=True, qk_scale=None, representation_size=None,\n                 drop_rate=0., attn_drop_rate=0., drop_path_rate=0., norm_layer=None, \n                 use_grad_checkpointing=False, ckpt_layer=0):\n        \"\"\"\n        Args:\n            img_size (int, tuple): input image size\n            patch_size (int, tuple): patch size\n            in_chans (int): number of input channels\n            num_classes (int): number of classes for classification head\n            embed_dim (int): embedding dimension\n            depth (int): depth of transformer\n            num_heads (int): number of attention heads\n            mlp_ratio (int): ratio of mlp hidden dim to embedding dim\n            qkv_bias (bool): enable bias for qkv if True\n            qk_scale (float): override default qk scale of head_dim ** -0.5 if set\n            representation_size (Optional[int]): enable and set representation layer (pre-logits) to this value if set\n            drop_rate (float): dropout rate\n            attn_drop_rate (float): 
attention dropout rate\n            drop_path_rate (float): stochastic depth rate\n            norm_layer: (nn.Module): normalization layer\n        \"\"\"\n        super().__init__()\n        self.num_features = self.embed_dim = embed_dim  # num_features for consistency with other models\n        norm_layer = norm_layer or partial(nn.LayerNorm, eps=1e-6)\n\n        self.patch_embed = PatchEmbed(\n            img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim)\n\n        num_patches = self.patch_embed.num_patches\n\n        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))\n        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))\n        self.pos_drop = nn.Dropout(p=drop_rate)\n\n        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)]  # stochastic depth decay rule\n        self.blocks = nn.ModuleList([\n            Block(\n                dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,\n                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer,\n                use_grad_checkpointing=(use_grad_checkpointing and i>=depth-ckpt_layer)\n            )\n            for i in range(depth)])\n        self.norm = norm_layer(embed_dim)\n\n        trunc_normal_(self.pos_embed, std=.02)\n        trunc_normal_(self.cls_token, std=.02)\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            trunc_normal_(m.weight, std=.02)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n\n    @torch.jit.ignore\n    def no_weight_decay(self):\n        return {'pos_embed', 'cls_token'}\n\n    def forward(self, x, register_blk=-1):\n        B = x.shape[0]\n        x = 
self.patch_embed(x)\n\n        cls_tokens = self.cls_token.expand(B, -1, -1)  # stole cls_tokens impl from Phil Wang, thanks\n        x = torch.cat((cls_tokens, x), dim=1)\n  \n        x = x + self.pos_embed[:,:x.size(1),:]\n        x = self.pos_drop(x)\n\n        for i,blk in enumerate(self.blocks):\n            x = blk(x, register_blk==i)\n        x = self.norm(x)\n        \n        return x\n\n    @torch.jit.ignore()\n    def load_pretrained(self, checkpoint_path, prefix=''):\n        _load_weights(self, checkpoint_path, prefix)\n        \n\n@torch.no_grad()\ndef _load_weights(model: VisionTransformer, checkpoint_path: str, prefix: str = ''):\n    \"\"\" Load weights from .npz checkpoints for official Google Brain Flax implementation\n    \"\"\"\n    import numpy as np\n\n    def _n2p(w, t=True):\n        if w.ndim == 4 and w.shape[0] == w.shape[1] == w.shape[2] == 1:\n            w = w.flatten()\n        if t:\n            if w.ndim == 4:\n                w = w.transpose([3, 2, 0, 1])\n            elif w.ndim == 3:\n                w = w.transpose([2, 0, 1])\n            elif w.ndim == 2:\n                w = w.transpose([1, 0])\n        return torch.from_numpy(w)\n\n    w = np.load(checkpoint_path)\n    if not prefix and 'opt/target/embedding/kernel' in w:\n        prefix = 'opt/target/'\n\n    if hasattr(model.patch_embed, 'backbone'):\n        # hybrid\n        backbone = model.patch_embed.backbone\n        stem_only = not hasattr(backbone, 'stem')\n        stem = backbone if stem_only else backbone.stem\n        stem.conv.weight.copy_(adapt_input_conv(stem.conv.weight.shape[1], _n2p(w[f'{prefix}conv_root/kernel'])))\n        stem.norm.weight.copy_(_n2p(w[f'{prefix}gn_root/scale']))\n        stem.norm.bias.copy_(_n2p(w[f'{prefix}gn_root/bias']))\n        if not stem_only:\n            for i, stage in enumerate(backbone.stages):\n                for j, block in enumerate(stage.blocks):\n                    bp = f'{prefix}block{i + 1}/unit{j + 1}/'\n           
         for r in range(3):\n                        getattr(block, f'conv{r + 1}').weight.copy_(_n2p(w[f'{bp}conv{r + 1}/kernel']))\n                        getattr(block, f'norm{r + 1}').weight.copy_(_n2p(w[f'{bp}gn{r + 1}/scale']))\n                        getattr(block, f'norm{r + 1}').bias.copy_(_n2p(w[f'{bp}gn{r + 1}/bias']))\n                    if block.downsample is not None:\n                        block.downsample.conv.weight.copy_(_n2p(w[f'{bp}conv_proj/kernel']))\n                        block.downsample.norm.weight.copy_(_n2p(w[f'{bp}gn_proj/scale']))\n                        block.downsample.norm.bias.copy_(_n2p(w[f'{bp}gn_proj/bias']))\n        embed_conv_w = _n2p(w[f'{prefix}embedding/kernel'])\n    else:\n        embed_conv_w = adapt_input_conv(\n            model.patch_embed.proj.weight.shape[1], _n2p(w[f'{prefix}embedding/kernel']))\n    model.patch_embed.proj.weight.copy_(embed_conv_w)\n    model.patch_embed.proj.bias.copy_(_n2p(w[f'{prefix}embedding/bias']))\n    model.cls_token.copy_(_n2p(w[f'{prefix}cls'], t=False))\n    pos_embed_w = _n2p(w[f'{prefix}Transformer/posembed_input/pos_embedding'], t=False)\n    if pos_embed_w.shape != model.pos_embed.shape:\n        # resize_pos_embed is not defined in this module; use the timm helper\n        from timm.models.vision_transformer import resize_pos_embed\n        pos_embed_w = resize_pos_embed(  # resize pos embedding when different size from pretrained weights\n            pos_embed_w, model.pos_embed, getattr(model, 'num_tokens', 1), model.patch_embed.grid_size)\n    model.pos_embed.copy_(pos_embed_w)\n    model.norm.weight.copy_(_n2p(w[f'{prefix}Transformer/encoder_norm/scale']))\n    model.norm.bias.copy_(_n2p(w[f'{prefix}Transformer/encoder_norm/bias']))\n#     if isinstance(model.head, nn.Linear) and model.head.bias.shape[0] == w[f'{prefix}head/bias'].shape[-1]:\n#         model.head.weight.copy_(_n2p(w[f'{prefix}head/kernel']))\n#         model.head.bias.copy_(_n2p(w[f'{prefix}head/bias']))\n#     if isinstance(getattr(model.pre_logits, 'fc', None), nn.Linear) and f'{prefix}pre_logits/bias' in w:\n#         
model.pre_logits.fc.weight.copy_(_n2p(w[f'{prefix}pre_logits/kernel']))\n#         model.pre_logits.fc.bias.copy_(_n2p(w[f'{prefix}pre_logits/bias']))\n    for i, block in enumerate(model.blocks.children()):\n        block_prefix = f'{prefix}Transformer/encoderblock_{i}/'\n        mha_prefix = block_prefix + 'MultiHeadDotProductAttention_1/'\n        block.norm1.weight.copy_(_n2p(w[f'{block_prefix}LayerNorm_0/scale']))\n        block.norm1.bias.copy_(_n2p(w[f'{block_prefix}LayerNorm_0/bias']))\n        block.attn.qkv.weight.copy_(torch.cat([\n            _n2p(w[f'{mha_prefix}{n}/kernel'], t=False).flatten(1).T for n in ('query', 'key', 'value')]))\n        block.attn.qkv.bias.copy_(torch.cat([\n            _n2p(w[f'{mha_prefix}{n}/bias'], t=False).reshape(-1) for n in ('query', 'key', 'value')]))\n        block.attn.proj.weight.copy_(_n2p(w[f'{mha_prefix}out/kernel']).flatten(1))\n        block.attn.proj.bias.copy_(_n2p(w[f'{mha_prefix}out/bias']))\n        for r in range(2):\n            getattr(block.mlp, f'fc{r + 1}').weight.copy_(_n2p(w[f'{block_prefix}MlpBlock_3/Dense_{r}/kernel']))\n            getattr(block.mlp, f'fc{r + 1}').bias.copy_(_n2p(w[f'{block_prefix}MlpBlock_3/Dense_{r}/bias']))\n        block.norm2.weight.copy_(_n2p(w[f'{block_prefix}LayerNorm_2/scale']))\n        block.norm2.bias.copy_(_n2p(w[f'{block_prefix}LayerNorm_2/bias']))\n\n            \ndef interpolate_pos_embed(pos_embed_checkpoint, visual_encoder):        \n    # interpolate position embedding\n    embedding_size = pos_embed_checkpoint.shape[-1]\n    num_patches = visual_encoder.patch_embed.num_patches\n    num_extra_tokens = visual_encoder.pos_embed.shape[-2] - num_patches\n    # height (== width) for the checkpoint position embedding\n    orig_size = int((pos_embed_checkpoint.shape[-2] - num_extra_tokens) ** 0.5)\n    # height (== width) for the new position embedding\n    new_size = int(num_patches ** 0.5)\n\n    if orig_size!=new_size:\n        # class_token and dist_token are kept 
unchanged\n        extra_tokens = pos_embed_checkpoint[:, :num_extra_tokens]\n        # only the position tokens are interpolated\n        pos_tokens = pos_embed_checkpoint[:, num_extra_tokens:]\n        pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2)\n        pos_tokens = torch.nn.functional.interpolate(\n            pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)\n        pos_tokens = pos_tokens.permute(0, 2, 3, 1).flatten(1, 2)\n        new_pos_embed = torch.cat((extra_tokens, pos_tokens), dim=1)\n        print('reshape position embedding from %d to %d'%(orig_size ** 2,new_size ** 2))\n        \n        return new_pos_embed    \n    else:\n        return pos_embed_checkpoint"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/__init__.py",
    "content": "from .build import build_dataset, build_pretraining_dataset"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/build.py",
    "content": "import os\nfrom torchvision import transforms\nfrom .transforms import *\nfrom .masking_generator import TubeMaskingGenerator, RandomMaskingGenerator\nfrom .mae import VideoMAE\nfrom .kinetics import VideoClsDataset\nfrom .kinetics_sparse import VideoClsDataset_sparse\nfrom .ssv2 import SSVideoClsDataset, SSRawFrameClsDataset\n\n\nclass DataAugmentationForVideoMAE(object):\n    def __init__(self, args):\n        self.input_mean = [0.485, 0.456, 0.406]  # IMAGENET_DEFAULT_MEAN\n        self.input_std = [0.229, 0.224, 0.225]  # IMAGENET_DEFAULT_STD\n        normalize = GroupNormalize(self.input_mean, self.input_std)\n        self.train_augmentation = GroupMultiScaleCrop(args.input_size, [1, .875, .75, .66])\n        if args.color_jitter > 0:\n            self.transform = transforms.Compose([                            \n                self.train_augmentation,\n                GroupColorJitter(args.color_jitter),\n                GroupRandomHorizontalFlip(flip=args.flip),\n                Stack(roll=False),\n                ToTorchFormatTensor(div=True),\n                normalize,\n            ])\n        else:\n            self.transform = transforms.Compose([                            \n                self.train_augmentation,\n                GroupRandomHorizontalFlip(flip=args.flip),\n                Stack(roll=False),\n                ToTorchFormatTensor(div=True),\n                normalize,\n            ])\n        if args.mask_type == 'tube':\n            self.masked_position_generator = TubeMaskingGenerator(\n                args.window_size, args.mask_ratio\n            )\n        elif args.mask_type == 'random':\n            self.masked_position_generator = RandomMaskingGenerator(\n                args.window_size, args.mask_ratio\n            )\n        elif args.mask_type == 'attention':\n            self.masked_position_generator = None\n\n    def __call__(self, images):\n        process_data, _ = self.transform(images)\n        if 
self.masked_position_generator is None:\n            return process_data, -1\n        else:\n            return process_data, self.masked_position_generator()\n\n    def __repr__(self):\n        repr = \"(DataAugmentationForVideoMAE,\\n\"\n        repr += \"  transform = %s,\\n\" % str(self.transform)\n        repr += \"  Masked position generator = %s,\\n\" % str(self.masked_position_generator)\n        repr += \")\"\n        return repr\n\n\ndef build_pretraining_dataset(args):\n    transform = DataAugmentationForVideoMAE(args)\n    dataset = VideoMAE(\n        root=None,\n        setting=args.data_path,\n        prefix=args.prefix,\n        split=args.split,\n        video_ext='mp4',\n        is_color=True,\n        modality='rgb',\n        num_segments=args.num_segments,\n        new_length=args.num_frames,\n        new_step=args.sampling_rate,\n        transform=transform,\n        temporal_jitter=False,\n        video_loader=True,\n        use_decord=args.use_decord,\n        lazy_init=False,\n        num_sample=args.num_sample)\n    print(\"Data Aug = %s\" % str(transform))\n    return dataset\n\n\ndef build_dataset(is_train, test_mode, args):\n    print(f'Use Dataset: {args.data_set}')\n    if args.data_set in [\n            'Kinetics',\n            'Kinetics_sparse',\n            'mitv1_sparse'\n        ]:\n        mode = None\n        anno_path = None\n        if is_train is True:\n            mode = 'train'\n            anno_path = os.path.join(args.data_path, 'train.csv')\n        elif test_mode is True:\n            mode = 'test'\n            anno_path = os.path.join(args.data_path, 'test.csv') \n        else:  \n            mode = 'validation'\n            anno_path = os.path.join(args.data_path, 'val.csv') \n\n        if 'sparse' in args.data_set:\n            func = VideoClsDataset_sparse\n        else:\n            func = VideoClsDataset\n\n        dataset = func(\n            anno_path=anno_path,\n            prefix=args.prefix,\n            
split=args.split,\n            mode=mode,\n            clip_len=args.num_frames,\n            frame_sample_rate=args.sampling_rate,\n            num_segment=1,\n            test_num_segment=args.test_num_segment,\n            test_num_crop=args.test_num_crop,\n            num_crop=1 if not test_mode else 3,\n            keep_aspect_ratio=True,\n            crop_size=args.input_size,\n            short_side_size=args.short_side_size,\n            new_height=256,\n            new_width=320,\n            args=args)\n        \n        nb_classes = args.nb_classes\n    \n    elif args.data_set == 'SSV2':\n        mode = None\n        anno_path = None\n        if is_train is True:\n            mode = 'train'\n            anno_path = os.path.join(args.data_path, 'train.csv')\n        elif test_mode is True:\n            mode = 'test'\n            anno_path = os.path.join(args.data_path, 'test.csv') \n        else:  \n            mode = 'validation'\n            anno_path = os.path.join(args.data_path, 'val.csv') \n\n        if args.use_decord:\n            func = SSVideoClsDataset\n        else:\n            func = SSRawFrameClsDataset\n\n        dataset = func(\n            anno_path=anno_path,\n            prefix=args.prefix,\n            split=args.split,\n            mode=mode,\n            clip_len=1,\n            num_segment=args.num_frames,\n            test_num_segment=args.test_num_segment,\n            test_num_crop=args.test_num_crop,\n            num_crop=1 if not test_mode else 3,\n            keep_aspect_ratio=True,\n            crop_size=args.input_size,\n            short_side_size=args.short_side_size,\n            new_height=256,\n            new_width=320,\n            args=args)\n        nb_classes = 174\n\n    elif args.data_set == 'UCF101':\n        mode = None\n        anno_path = None\n        if is_train is True:\n            mode = 'train'\n            anno_path = os.path.join(args.data_path, 'train.csv')\n        elif test_mode is True:\n        
    mode = 'test'\n            anno_path = os.path.join(args.data_path, 'test.csv') \n        else:  \n            mode = 'validation'\n            anno_path = os.path.join(args.data_path, 'val.csv') \n\n        dataset = VideoClsDataset(\n            anno_path=anno_path,\n            prefix=args.prefix,\n            split=args.split,\n            mode=mode,\n            clip_len=args.num_frames,\n            frame_sample_rate=args.sampling_rate,\n            num_segment=1,\n            test_num_segment=args.test_num_segment,\n            test_num_crop=args.test_num_crop,\n            num_crop=1 if not test_mode else 3,\n            keep_aspect_ratio=True,\n            crop_size=args.input_size,\n            short_side_size=args.short_side_size,\n            new_height=256,\n            new_width=320,\n            args=args)\n        nb_classes = 101\n    \n    elif args.data_set == 'HMDB51':\n        mode = None\n        anno_path = None\n        if is_train is True:\n            mode = 'train'\n            anno_path = os.path.join(args.data_path, 'train.csv')\n        elif test_mode is True:\n            mode = 'test'\n            anno_path = os.path.join(args.data_path, 'test.csv') \n        else:  \n            mode = 'validation'\n            anno_path = os.path.join(args.data_path, 'val.csv') \n\n        dataset = VideoClsDataset(\n            anno_path=anno_path,\n            prefix=args.prefix,\n            split=args.split,\n            mode=mode,\n            clip_len=args.num_frames,\n            frame_sample_rate=args.sampling_rate,\n            num_segment=1,\n            test_num_segment=args.test_num_segment,\n            test_num_crop=args.test_num_crop,\n            num_crop=1 if not test_mode else 3,\n            keep_aspect_ratio=True,\n            crop_size=args.input_size,\n            short_side_size=args.short_side_size,\n            new_height=256,\n            new_width=320,\n            args=args)\n        nb_classes = 51\n    else:\n      
  print(f'Wrong: {args.data_set}')\n        raise NotImplementedError()\n    assert nb_classes == args.nb_classes\n    print(\"Number of the class = %d\" % args.nb_classes)\n\n    return dataset, nb_classes\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/kinetics.py",
    "content": "import os\nimport os\nimport io\nimport numpy as np\nfrom numpy.lib.function_base import disp\nimport torch\nfrom torchvision import transforms\nimport warnings\nfrom decord import VideoReader, cpu\nfrom torch.utils.data import Dataset\nfrom .random_erasing import RandomErasing\nfrom .video_transforms import (\n    Compose, Resize, CenterCrop, Normalize,\n    create_random_augment, random_short_side_scale_jitter, \n    random_crop, random_resized_crop_with_shift, random_resized_crop,\n    horizontal_flip, random_short_side_scale_jitter, uniform_crop, \n)\nfrom .volume_transforms import ClipToTensor\n\ntry:\n    from petrel_client.client import Client\n    has_client = True\nexcept ImportError:\n    has_client = False\n\nclass VideoClsDataset(Dataset):\n    \"\"\"Load your own video classification dataset.\"\"\"\n\n    def __init__(self, anno_path, prefix='', split=' ', mode='train', clip_len=8,\n                 frame_sample_rate=2, crop_size=224, short_side_size=256,\n                 new_height=256, new_width=340, keep_aspect_ratio=True,\n                 num_segment=1, num_crop=1, test_num_segment=10, test_num_crop=3,\n                 args=None):\n        self.anno_path = anno_path\n        self.prefix = prefix\n        self.split = split\n        self.mode = mode\n        self.clip_len = clip_len\n        self.frame_sample_rate = frame_sample_rate\n        self.crop_size = crop_size\n        self.short_side_size = short_side_size\n        self.new_height = new_height\n        self.new_width = new_width\n        self.keep_aspect_ratio = keep_aspect_ratio\n        self.num_segment = num_segment\n        self.test_num_segment = test_num_segment\n        self.num_crop = num_crop\n        self.test_num_crop = test_num_crop\n        self.args = args\n        self.aug = False\n        self.rand_erase = False\n        assert num_segment == 1\n        if self.mode in ['train']:\n            self.aug = True\n            if self.args.reprob > 0:\n         
       self.rand_erase = True\n        if VideoReader is None:\n            raise ImportError(\"Unable to import `decord` which is required to read videos.\")\n\n        import pandas as pd\n        cleaned = pd.read_csv(self.anno_path, header=None, delimiter=self.split)\n        self.dataset_samples = list(cleaned.values[:, 0])\n        self.label_array = list(cleaned.values[:, 1])\n\n        self.client = None\n        if has_client:\n            self.client = Client('~/petreloss.conf')\n\n        if (mode == 'train'):\n            pass\n\n        elif (mode == 'validation'):\n            self.data_transform = Compose([\n                Resize(self.short_side_size, interpolation='bilinear'),\n                CenterCrop(size=(self.crop_size, self.crop_size)),\n                ClipToTensor(),\n                Normalize(mean=[0.485, 0.456, 0.406],\n                                           std=[0.229, 0.224, 0.225])\n            ])\n        elif mode == 'test':\n            self.data_resize = Compose([\n                Resize(size=(short_side_size), interpolation='bilinear')\n            ])\n            self.data_transform = Compose([\n                ClipToTensor(),\n                Normalize(mean=[0.485, 0.456, 0.406],\n                                           std=[0.229, 0.224, 0.225])\n            ])\n            self.test_seg = []\n            self.test_dataset = []\n            self.test_label_array = []\n            for ck in range(self.test_num_segment):\n                for cp in range(self.test_num_crop):\n                    for idx in range(len(self.label_array)):\n                        sample_label = self.label_array[idx]\n                        self.test_label_array.append(sample_label)\n                        self.test_dataset.append(self.dataset_samples[idx])\n                        self.test_seg.append((ck, cp))\n\n    def __getitem__(self, index):\n        if self.mode == 'train':\n            args = self.args \n            scale_t = 1\n\n  
          sample = self.dataset_samples[index]\n            buffer = self.loadvideo_decord(sample, sample_rate_scale=scale_t) # T H W C\n            if len(buffer) == 0:\n                while len(buffer) == 0:\n                    warnings.warn(\"video {} not correctly loaded during training\".format(sample))\n                    index = np.random.randint(self.__len__())\n                    sample = self.dataset_samples[index]\n                    buffer = self.loadvideo_decord(sample, sample_rate_scale=scale_t)\n\n            if args.num_sample > 1:\n                frame_list = []\n                label_list = []\n                index_list = []\n                for _ in range(args.num_sample):\n                    new_frames = self._aug_frame(buffer, args)\n                    label = self.label_array[index]\n                    frame_list.append(new_frames)\n                    label_list.append(label)\n                    index_list.append(index)\n                return frame_list, label_list, index_list, {}\n            else:\n                buffer = self._aug_frame(buffer, args)\n            \n            return buffer, self.label_array[index], index, {}\n\n        elif self.mode == 'validation':\n            sample = self.dataset_samples[index]\n            buffer = self.loadvideo_decord(sample)\n            if len(buffer) == 0:\n                while len(buffer) == 0:\n                    warnings.warn(\"video {} not correctly loaded during validation\".format(sample))\n                    index = np.random.randint(self.__len__())\n                    sample = self.dataset_samples[index]\n                    buffer = self.loadvideo_decord(sample)\n            buffer = self.data_transform(buffer)\n            return buffer, self.label_array[index], sample.split(\"/\")[-1].split(\".\")[0]\n\n        elif self.mode == 'test':\n            sample = self.test_dataset[index]\n            chunk_nb, split_nb = self.test_seg[index]\n            buffer = 
self.loadvideo_decord(sample, chunk_nb=chunk_nb)\n\n            while len(buffer) == 0:\n                warnings.warn(\"video {}, temporal {}, spatial {} not found during testing\".format(\\\n                    str(self.test_dataset[index]), chunk_nb, split_nb))\n                index = np.random.randint(self.__len__())\n                sample = self.test_dataset[index]\n                chunk_nb, split_nb = self.test_seg[index]\n                buffer = self.loadvideo_decord(sample, chunk_nb=chunk_nb)\n\n            buffer = self.data_resize(buffer)\n            if isinstance(buffer, list):\n                buffer = np.stack(buffer, 0)\n\n            if self.test_num_crop == 1:\n                spatial_step = 1.0 * (max(buffer.shape[1], buffer.shape[2]) - self.short_side_size) / 2\n                spatial_start = int(spatial_step)\n            else:\n                spatial_step = 1.0 * (max(buffer.shape[1], buffer.shape[2]) - self.short_side_size) \\\n                                    / (self.test_num_crop - 1)\n                spatial_start = int(split_nb * spatial_step)\n            if buffer.shape[1] >= buffer.shape[2]:\n                buffer = buffer[:, spatial_start:spatial_start + self.short_side_size, :, :]\n            else:\n                buffer = buffer[:, :, spatial_start:spatial_start + self.short_side_size, :]\n\n            buffer = self.data_transform(buffer)\n            return buffer, self.test_label_array[index], sample.split(\"/\")[-1].split(\".\")[0], \\\n                   chunk_nb, split_nb\n        else:\n            raise NameError('mode {} unknown'.format(self.mode))\n\n    def _aug_frame(\n        self,\n        buffer,\n        args,\n    ):\n\n        aug_transform = create_random_augment(\n            input_size=(self.crop_size, self.crop_size),\n            auto_augment=args.aa,\n            interpolation=args.train_interpolation,\n        )\n\n        buffer = [\n            transforms.ToPILImage()(frame) for frame in buffer\n  
      ]\n\n        buffer = aug_transform(buffer)\n\n        buffer = [transforms.ToTensor()(img) for img in buffer]\n        buffer = torch.stack(buffer) # T C H W\n        buffer = buffer.permute(0, 2, 3, 1) # T H W C \n        \n        # T H W C \n        buffer = tensor_normalize(\n            buffer, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]\n        )\n        # T H W C -> C T H W.\n        buffer = buffer.permute(3, 0, 1, 2)\n        # Perform data augmentation.\n        scl, asp = (\n            [0.08, 1.0],\n            [0.75, 1.3333],\n        )\n\n        buffer = spatial_sampling(\n            buffer,\n            spatial_idx=-1,\n            min_scale=256,\n            max_scale=320,\n            crop_size=self.crop_size,\n            random_horizontal_flip=False if args.data_set == 'SSV2' else True ,\n            inverse_uniform_sampling=False,\n            aspect_ratio=asp,\n            scale=scl,\n            motion_shift=False\n        )\n\n        if self.rand_erase:\n            erase_transform = RandomErasing(\n                args.reprob,\n                mode=args.remode,\n                max_count=args.recount,\n                num_splits=args.recount,\n                device=\"cpu\",\n            )\n            buffer = buffer.permute(1, 0, 2, 3)\n            buffer = erase_transform(buffer)\n            buffer = buffer.permute(1, 0, 2, 3)\n\n        return buffer\n\n\n    def loadvideo_decord(self, sample, sample_rate_scale=1, chunk_nb=0):\n        \"\"\"Load video content using Decord\"\"\"\n        fname = sample\n        fname = os.path.join(self.prefix, fname)\n\n        try:\n            if self.keep_aspect_ratio:\n                if fname.startswith('s3'):\n                    video_bytes = self.client.get(fname)\n                    vr = VideoReader(io.BytesIO(video_bytes),\n                                     num_threads=1,\n                                     ctx=cpu(0))\n                else:\n                    vr = 
VideoReader(fname, num_threads=1, ctx=cpu(0))\n            else:\n                if fname.startswith('s3:'):\n                    video_bytes = self.client.get(fname)\n                    vr = VideoReader(io.BytesIO(video_bytes),\n                                     width=self.new_width,\n                                     height=self.new_height,\n                                     num_threads=1,\n                                     ctx=cpu(0))\n                else:\n                    vr = VideoReader(fname, width=self.new_width, height=self.new_height,\n                                    num_threads=1, ctx=cpu(0))\n\n            # handle temporal segments\n            converted_len = int(self.clip_len * self.frame_sample_rate)\n            seg_len = len(vr) // self.num_segment\n\n            if self.mode == 'test':\n                temporal_step = max(1.0 * (len(vr) - converted_len) / (self.test_num_segment - 1), 0)\n                temporal_start = int(chunk_nb * temporal_step)\n\n                bound = min(temporal_start + converted_len, len(vr))\n                all_index = [x for x in range(temporal_start, bound, self.frame_sample_rate)]\n                while len(all_index) < self.clip_len:\n                    all_index.append(all_index[-1])\n                vr.seek(0)\n                buffer = vr.get_batch(all_index).asnumpy()\n                return buffer\n\n            all_index = []\n            for i in range(self.num_segment):\n                if seg_len <= converted_len:\n                    index = np.linspace(0, seg_len, num=seg_len // self.frame_sample_rate)\n                    index = np.concatenate((index, np.ones(self.clip_len - seg_len // self.frame_sample_rate) * seg_len))\n                    index = np.clip(index, 0, seg_len - 1).astype(np.int64)\n                else:\n                    if self.mode == 'validation':\n                        end_idx = (seg_len - converted_len) // 2\n                    else:\n                 
       end_idx = np.random.randint(converted_len, seg_len)\n                    str_idx = end_idx - converted_len\n                    index = np.linspace(str_idx, end_idx, num=self.clip_len)\n                    index = np.clip(index, str_idx, end_idx - 1).astype(np.int64)\n                index = index + i*seg_len\n                all_index.extend(list(index))\n\n            all_index = all_index[::int(sample_rate_scale)]\n            vr.seek(0)\n            buffer = vr.get_batch(all_index).asnumpy()\n            return buffer\n        except:\n            print(\"video cannot be loaded by decord: \", fname)\n            return []\n\n    def __len__(self):\n        if self.mode != 'test':\n            return len(self.dataset_samples)\n        else:\n            return len(self.test_dataset)\n\n\ndef spatial_sampling(\n    frames,\n    spatial_idx=-1,\n    min_scale=256,\n    max_scale=320,\n    crop_size=224,\n    random_horizontal_flip=True,\n    inverse_uniform_sampling=False,\n    aspect_ratio=None,\n    scale=None,\n    motion_shift=False,\n):\n    \"\"\"\n    Perform spatial sampling on the given video frames. If spatial_idx is\n    -1, perform random scale, random crop, and random flip on the given\n    frames. If spatial_idx is 0, 1, or 2, perform spatial uniform sampling\n    with the given spatial_idx.\n    Args:\n        frames (tensor): frames of images sampled from the video. The\n            dimension is `num frames` x `height` x `width` x `channel`.\n        spatial_idx (int): if -1, perform random spatial sampling. 
If 0, 1,\n            or 2, perform left, center, right crop if width is larger than\n            height, and perform top, center, bottom crop if height is larger\n            than width.\n        min_scale (int): the minimal size of scaling.\n        max_scale (int): the maximal size of scaling.\n        crop_size (int): the size of height and width used to crop the\n            frames.\n        inverse_uniform_sampling (bool): if True, sample uniformly in\n            [1 / max_scale, 1 / min_scale] and take a reciprocal to get the\n            scale. If False, take a uniform sample from [min_scale,\n            max_scale].\n        aspect_ratio (list): Aspect ratio range for resizing.\n        scale (list): Scale range for resizing.\n        motion_shift (bool): Whether to apply motion shift for resizing.\n    Returns:\n        frames (tensor): spatially sampled frames.\n    \"\"\"\n    assert spatial_idx in [-1, 0, 1, 2]\n    if spatial_idx == -1:\n        if aspect_ratio is None and scale is None:\n            frames, _ = random_short_side_scale_jitter(\n                images=frames,\n                min_size=min_scale,\n                max_size=max_scale,\n                inverse_uniform_sampling=inverse_uniform_sampling,\n            )\n            frames, _ = random_crop(frames, crop_size)\n        else:\n            transform_func = (\n                random_resized_crop_with_shift\n                if motion_shift\n                else random_resized_crop\n            )\n            frames = transform_func(\n                images=frames,\n                target_height=crop_size,\n                target_width=crop_size,\n                scale=scale,\n                ratio=aspect_ratio,\n            )\n        if random_horizontal_flip:\n            frames, _ = horizontal_flip(0.5, frames)\n    else:\n        # The testing is deterministic and no jitter should be performed.\n        # min_scale, max_scale, and crop_size are expected to be the same.\n        
assert len({min_scale, max_scale, crop_size}) == 1\n        frames, _ = random_short_side_scale_jitter(\n            frames, min_scale, max_scale\n        )\n        frames, _ = uniform_crop(frames, crop_size, spatial_idx)\n    return frames\n\n\ndef tensor_normalize(tensor, mean, std):\n    \"\"\"\n    Normalize a given tensor by subtracting the mean and dividing the std.\n    Args:\n        tensor (tensor): tensor to normalize.\n        mean (tensor or list): mean value to subtract.\n        std (tensor or list): std to divide.\n    \"\"\"\n    if tensor.dtype == torch.uint8:\n        tensor = tensor.float()\n        tensor = tensor / 255.0\n    if type(mean) == list:\n        mean = torch.tensor(mean)\n    if type(std) == list:\n        std = torch.tensor(std)\n    tensor = tensor - mean\n    tensor = tensor / std\n    return tensor\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/kinetics_sparse.py",
    "content": "import os\nimport os\nimport io\nimport random\nimport numpy as np\nfrom numpy.lib.function_base import disp\nimport torch\nfrom torchvision import transforms\nimport warnings\nfrom decord import VideoReader, cpu\nfrom torch.utils.data import Dataset\nfrom .random_erasing import RandomErasing\nfrom .video_transforms import (\n    Compose, Resize, CenterCrop, Normalize,\n    create_random_augment, random_short_side_scale_jitter, \n    random_crop, random_resized_crop_with_shift, random_resized_crop,\n    horizontal_flip, random_short_side_scale_jitter, uniform_crop, \n)\nfrom .volume_transforms import ClipToTensor\n\ntry:\n    from petrel_client.client import Client\n    has_client = True\nexcept ImportError:\n    has_client = False\n\nclass VideoClsDataset_sparse(Dataset):\n    \"\"\"Load your own video classification dataset.\"\"\"\n\n    def __init__(self, anno_path, prefix='', split=' ', mode='train', clip_len=8,\n                 frame_sample_rate=2, crop_size=224, short_side_size=256,\n                 new_height=256, new_width=340, keep_aspect_ratio=True,\n                 num_segment=1, num_crop=1, test_num_segment=10, test_num_crop=3,\n                 args=None):\n        self.anno_path = anno_path\n        self.prefix = prefix\n        self.split = split\n        self.mode = mode\n        self.clip_len = clip_len\n        self.frame_sample_rate = frame_sample_rate\n        self.crop_size = crop_size\n        self.short_side_size = short_side_size\n        self.new_height = new_height\n        self.new_width = new_width\n        self.keep_aspect_ratio = keep_aspect_ratio\n        self.num_segment = num_segment\n        self.test_num_segment = test_num_segment\n        self.num_crop = num_crop\n        self.test_num_crop = test_num_crop\n        self.args = args\n        self.aug = False\n        self.rand_erase = False\n        assert num_segment == 1\n        if self.mode in ['train']:\n            self.aug = True\n            if 
self.args.reprob > 0:\n                self.rand_erase = True\n        if VideoReader is None:\n            raise ImportError(\"Unable to import `decord` which is required to read videos.\")\n\n        import pandas as pd\n        cleaned = pd.read_csv(self.anno_path, header=None, delimiter=self.split)\n        self.dataset_samples = list(cleaned.values[:, 0])\n        self.label_array = list(cleaned.values[:, 1])\n\n        self.client = None\n        if has_client:\n            self.client = Client('~/petreloss.conf')\n\n        if (mode == 'train'):\n            pass\n\n        elif (mode == 'validation'):\n            self.data_transform = Compose([\n                Resize(self.short_side_size, interpolation='bilinear'),\n                CenterCrop(size=(self.crop_size, self.crop_size)),\n                ClipToTensor(),\n                Normalize(mean=[0.485, 0.456, 0.406],\n                                           std=[0.229, 0.224, 0.225])\n            ])\n        elif mode == 'test':\n            self.data_resize = Compose([\n                Resize(size=(short_side_size), interpolation='bilinear')\n            ])\n            self.data_transform = Compose([\n                ClipToTensor(),\n                Normalize(mean=[0.485, 0.456, 0.406],\n                                           std=[0.229, 0.224, 0.225])\n            ])\n            self.test_seg = []\n            self.test_dataset = []\n            self.test_label_array = []\n            for ck in range(self.test_num_segment):\n                for cp in range(self.test_num_crop):\n                    for idx in range(len(self.label_array)):\n                        sample_label = self.label_array[idx]\n                        self.test_label_array.append(sample_label)\n                        self.test_dataset.append(self.dataset_samples[idx])\n                        self.test_seg.append((ck, cp))\n\n    def __getitem__(self, index):\n        if self.mode == 'train':\n            args = 
self.args \n\n            sample = self.dataset_samples[index]\n            buffer = self.loadvideo_decord(sample, chunk_nb=-1) # T H W C\n            if len(buffer) == 0:\n                while len(buffer) == 0:\n                    warnings.warn(\"video {} not correctly loaded during training\".format(sample))\n                    index = np.random.randint(self.__len__())\n                    sample = self.dataset_samples[index]\n                    buffer = self.loadvideo_decord(sample, chunk_nb=-1)\n\n            if args.num_sample > 1:\n                frame_list = []\n                label_list = []\n                index_list = []\n                for _ in range(args.num_sample):\n                    new_frames = self._aug_frame(buffer, args)\n                    label = self.label_array[index]\n                    frame_list.append(new_frames)\n                    label_list.append(label)\n                    index_list.append(index)\n                return frame_list, label_list, index_list, {}\n            else:\n                buffer = self._aug_frame(buffer, args)\n            \n            return buffer, self.label_array[index], index, {}\n\n        elif self.mode == 'validation':\n            sample = self.dataset_samples[index]\n            buffer = self.loadvideo_decord(sample, chunk_nb=0)\n            if len(buffer) == 0:\n                while len(buffer) == 0:\n                    warnings.warn(\"video {} not correctly loaded during validation\".format(sample))\n                    index = np.random.randint(self.__len__())\n                    sample = self.dataset_samples[index]\n                    buffer = self.loadvideo_decord(sample, chunk_nb=0)\n            buffer = self.data_transform(buffer)\n            return buffer, self.label_array[index], sample.split(\"/\")[-1].split(\".\")[0]\n\n        elif self.mode == 'test':\n            sample = self.test_dataset[index]\n            chunk_nb, split_nb = self.test_seg[index]\n            
buffer = self.loadvideo_decord(sample, chunk_nb=chunk_nb)\n\n            while len(buffer) == 0:\n                warnings.warn(\"video {}, temporal {}, spatial {} not found during testing\".format(\\\n                    str(self.test_dataset[index]), chunk_nb, split_nb))\n                index = np.random.randint(self.__len__())\n                sample = self.test_dataset[index]\n                chunk_nb, split_nb = self.test_seg[index]\n                buffer = self.loadvideo_decord(sample, chunk_nb=chunk_nb)\n\n            buffer = self.data_resize(buffer)\n            if isinstance(buffer, list):\n                buffer = np.stack(buffer, 0)\n            if self.test_num_crop == 1:\n                spatial_step = 1.0 * (max(buffer.shape[1], buffer.shape[2]) - self.short_side_size) / 2\n                spatial_start = int(spatial_step)\n            else:\n                spatial_step = 1.0 * (max(buffer.shape[1], buffer.shape[2]) - self.short_side_size) \\\n                                    / (self.test_num_crop - 1)\n                spatial_start = int(split_nb * spatial_step)\n            if buffer.shape[1] >= buffer.shape[2]:\n                buffer = buffer[:, spatial_start:spatial_start + self.short_side_size, :, :]\n            else:\n                buffer = buffer[:, :, spatial_start:spatial_start + self.short_side_size, :]\n\n            buffer = self.data_transform(buffer)\n            return buffer, self.test_label_array[index], sample.split(\"/\")[-1].split(\".\")[0], \\\n                   chunk_nb, split_nb\n        else:\n            raise NameError('mode {} unknown'.format(self.mode))\n\n    def _aug_frame(\n        self,\n        buffer,\n        args,\n    ):\n\n        aug_transform = create_random_augment(\n            input_size=(self.crop_size, self.crop_size),\n            auto_augment=args.aa,\n            interpolation=args.train_interpolation,\n        )\n\n        buffer = [\n            transforms.ToPILImage()(frame) for frame in 
buffer\n        ]\n\n        buffer = aug_transform(buffer)\n\n        buffer = [transforms.ToTensor()(img) for img in buffer]\n        buffer = torch.stack(buffer) # T C H W\n        buffer = buffer.permute(0, 2, 3, 1) # T H W C \n        \n        # T H W C \n        buffer = tensor_normalize(\n            buffer, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]\n        )\n        # T H W C -> C T H W.\n        buffer = buffer.permute(3, 0, 1, 2)\n        # Perform data augmentation.\n        scl, asp = (\n            [0.08, 1.0],\n            [0.75, 1.3333],\n        )\n\n        buffer = spatial_sampling(\n            buffer,\n            spatial_idx=-1,\n            min_scale=256,\n            max_scale=320,\n            crop_size=self.crop_size,\n            random_horizontal_flip=False if args.data_set == 'SSV2' else True ,\n            inverse_uniform_sampling=False,\n            aspect_ratio=asp,\n            scale=scl,\n            motion_shift=False\n        )\n\n        if self.rand_erase:\n            erase_transform = RandomErasing(\n                args.reprob,\n                mode=args.remode,\n                max_count=args.recount,\n                num_splits=args.recount,\n                device=\"cpu\",\n            )\n            buffer = buffer.permute(1, 0, 2, 3)\n            buffer = erase_transform(buffer)\n            buffer = buffer.permute(1, 0, 2, 3)\n\n        return buffer\n\n    def _get_seq_frames(self, video_size, num_frames, clip_idx=-1):\n        seg_size = max(0., float(video_size - 1) / num_frames)\n        max_frame = int(video_size) - 1\n        seq = []\n        # index from 1, must add 1\n        if clip_idx == -1:\n            for i in range(num_frames):\n                start = int(np.round(seg_size * i))\n                end = int(np.round(seg_size * (i + 1)))\n                idx = min(random.randint(start, end), max_frame)\n                seq.append(idx)\n        else:\n            num_segment = 1\n            if 
self.mode == 'test':\n                num_segment = self.test_num_segment\n            duration = seg_size / (num_segment + 1)\n            for i in range(num_frames):\n                start = int(np.round(seg_size * i))\n                frame_index = start + int(duration * (clip_idx + 1))\n                idx = min(frame_index, max_frame)\n                seq.append(idx)\n        return seq\n\n    def loadvideo_decord(self, sample, chunk_nb=0):\n        \"\"\"Load video content using Decord\"\"\"\n        fname = sample\n        fname = os.path.join(self.prefix, fname)\n\n        try:\n            if self.keep_aspect_ratio:\n                if fname.startswith('s3'):\n                    video_bytes = self.client.get(fname)\n                    vr = VideoReader(io.BytesIO(video_bytes),\n                                     num_threads=1,\n                                     ctx=cpu(0))\n                else:\n                    vr = VideoReader(fname, num_threads=1, ctx=cpu(0))\n            else:\n                if fname.startswith('s3:'):\n                    video_bytes = self.client.get(fname)\n                    vr = VideoReader(io.BytesIO(video_bytes),\n                                     width=self.new_width,\n                                     height=self.new_height,\n                                     num_threads=1,\n                                     ctx=cpu(0))\n                else:\n                    vr = VideoReader(fname, width=self.new_width, height=self.new_height,\n                                    num_threads=1, ctx=cpu(0))\n\n            all_index = self._get_seq_frames(len(vr), self.clip_len, clip_idx=chunk_nb)\n            vr.seek(0)\n            buffer = vr.get_batch(all_index).asnumpy()\n            return buffer\n        except:\n            print(\"video cannot be loaded by decord: \", fname)\n            return []\n\n    def __len__(self):\n        if self.mode != 'test':\n            return len(self.dataset_samples)\n      
  else:\n            return len(self.test_dataset)\n\n\ndef spatial_sampling(\n    frames,\n    spatial_idx=-1,\n    min_scale=256,\n    max_scale=320,\n    crop_size=224,\n    random_horizontal_flip=True,\n    inverse_uniform_sampling=False,\n    aspect_ratio=None,\n    scale=None,\n    motion_shift=False,\n):\n    \"\"\"\n    Perform spatial sampling on the given video frames. If spatial_idx is\n    -1, perform random scale, random crop, and random flip on the given\n    frames. If spatial_idx is 0, 1, or 2, perform spatial uniform sampling\n    with the given spatial_idx.\n    Args:\n        frames (tensor): frames of images sampled from the video. The\n            dimension is `num frames` x `height` x `width` x `channel`.\n        spatial_idx (int): if -1, perform random spatial sampling. If 0, 1,\n            or 2, perform left, center, right crop if width is larger than\n            height, and perform top, center, bottom crop if height is larger\n            than width.\n        min_scale (int): the minimal size of scaling.\n        max_scale (int): the maximal size of scaling.\n        crop_size (int): the size of height and width used to crop the\n            frames.\n        inverse_uniform_sampling (bool): if True, sample uniformly in\n            [1 / max_scale, 1 / min_scale] and take a reciprocal to get the\n            scale. 
If False, take a uniform sample from [min_scale,\n            max_scale].\n        aspect_ratio (list): Aspect ratio range for resizing.\n        scale (list): Scale range for resizing.\n        motion_shift (bool): Whether to apply motion shift for resizing.\n    Returns:\n        frames (tensor): spatially sampled frames.\n    \"\"\"\n    assert spatial_idx in [-1, 0, 1, 2]\n    if spatial_idx == -1:\n        if aspect_ratio is None and scale is None:\n            frames, _ = random_short_side_scale_jitter(\n                images=frames,\n                min_size=min_scale,\n                max_size=max_scale,\n                inverse_uniform_sampling=inverse_uniform_sampling,\n            )\n            frames, _ = random_crop(frames, crop_size)\n        else:\n            transform_func = (\n                random_resized_crop_with_shift\n                if motion_shift\n                else random_resized_crop\n            )\n            frames = transform_func(\n                images=frames,\n                target_height=crop_size,\n                target_width=crop_size,\n                scale=scale,\n                ratio=aspect_ratio,\n            )\n        if random_horizontal_flip:\n            frames, _ = horizontal_flip(0.5, frames)\n    else:\n        # The testing is deterministic and no jitter should be performed.\n        # min_scale, max_scale, and crop_size are expect to be the same.\n        assert len({min_scale, max_scale, crop_size}) == 1\n        frames, _ = random_short_side_scale_jitter(\n            frames, min_scale, max_scale\n        )\n        frames, _ = uniform_crop(frames, crop_size, spatial_idx)\n    return frames\n\n\ndef tensor_normalize(tensor, mean, std):\n    \"\"\"\n    Normalize a given tensor by subtracting the mean and dividing the std.\n    Args:\n        tensor (tensor): tensor to normalize.\n        mean (tensor or list): mean value to subtract.\n        std (tensor or list): std to divide.\n    \"\"\"\n    if 
tensor.dtype == torch.uint8:\n        # uint8 input is treated as pixel values in [0, 255]; rescale to [0, 1].\n        tensor = tensor.float()\n        tensor = tensor / 255.0\n    if isinstance(mean, list):\n        mean = torch.tensor(mean)\n    if isinstance(std, list):\n        std = torch.tensor(std)\n    tensor = tensor - mean\n    tensor = tensor / std\n    return tensor\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/mae.py",
    "content": "import os\nimport cv2\nimport io\nimport numpy as np\nimport torch\nimport decord\nfrom PIL import Image\nfrom decord import VideoReader, cpu\nimport random\n\ntry:\n    from petrel_client.client import Client\n    has_client = True\nexcept ImportError:\n    has_client = False\n\n\nclass VideoMAE(torch.utils.data.Dataset):\n    \"\"\"Load your own video classification dataset.\n    Parameters\n    ----------\n    root : str, required.\n        Path to the root folder storing the dataset.\n    setting : str, required.\n        A text file describing the dataset, each line per video sample.\n        There are three items in each line: (1) video path; (2) video length and (3) video label.\n    prefix : str, required.\n        The prefix for loading data.\n    split : str, required.\n        The split character for metadata.\n    train : bool, default True.\n        Whether to load the training or validation set.\n    test_mode : bool, default False.\n        Whether to perform evaluation on the test set.\n        Usually there is three-crop or ten-crop evaluation strategy involved.\n    name_pattern : str, default None.\n        The naming pattern of the decoded video frames.\n        For example, img_00012.jpg.\n    video_ext : str, default 'mp4'.\n        If video_loader is set to True, please specify the video format accordinly.\n    is_color : bool, default True.\n        Whether the loaded image is color or grayscale.\n    modality : str, default 'rgb'.\n        Input modalities, we support only rgb video frames for now.\n        Will add support for rgb difference image and optical flow image later.\n    num_segments : int, default 1.\n        Number of segments to evenly divide the video into clips.\n        A useful technique to obtain global video-level information.\n        Limin Wang, etal, Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, ECCV 2016.\n    num_crop : int, default 1.\n        Number of crops for 
each image. default is 1.\n        Common choices are three crops and ten crops during evaluation.\n    new_length : int, default 1.\n        The length of input video clip. Default is a single image, but it can be multiple video frames.\n        For example, new_length=16 means we will extract a video clip of consecutive 16 frames.\n    new_step : int, default 1.\n        Temporal sampling rate. For example, new_step=1 means we will extract a video clip of consecutive frames.\n        new_step=2 means we will extract a video clip of every other frame.\n    temporal_jitter : bool, default False.\n        Whether to temporally jitter if new_step > 1.\n    video_loader : bool, default False.\n        Whether to use video loader to load data.\n    use_decord : bool, default True.\n        Whether to use Decord video loader to load data. Otherwise load image.\n    transform : function, default None.\n        A function that takes data and label and transforms them.\n    data_aug : str, default 'v1'.\n        Different types of data augmentation auto. 
Supports v1, v2, v3 and v4.\n    lazy_init : bool, default False.\n        If set to True, build a dataset instance without loading any dataset.\n    \"\"\"\n    def __init__(self,\n                 root,\n                 setting,\n                 prefix='',\n                 split=' ',\n                 train=True,\n                 test_mode=False,\n                 name_pattern='img_%05d.jpg',\n                 video_ext='mp4',\n                 is_color=True,\n                 modality='rgb',\n                 num_segments=1,\n                 num_crop=1,\n                 new_length=1,\n                 new_step=1,\n                 transform=None,\n                 temporal_jitter=False,\n                 video_loader=False,\n                 use_decord=True,\n                 lazy_init=False,\n                 num_sample=1,\n                 ):\n\n        super(VideoMAE, self).__init__()\n        self.root = root\n        self.setting = setting\n        self.prefix = prefix\n        self.split = split\n        self.train = train\n        self.test_mode = test_mode\n        self.is_color = is_color\n        self.modality = modality\n        self.num_segments = num_segments\n        self.num_crop = num_crop\n        self.new_length = new_length\n        self.new_step = new_step\n        self.skip_length = self.new_length * self.new_step\n        self.temporal_jitter = temporal_jitter\n        self.name_pattern = name_pattern\n        self.video_loader = video_loader\n        self.video_ext = video_ext\n        self.use_decord = use_decord\n        self.transform = transform\n        self.lazy_init = lazy_init\n        self.num_sample = num_sample\n\n        # sparse sampling, num_segments != 1\n        if self.num_segments != 1:\n            print('Use sparse sampling, change frame and stride')\n            self.new_length = self.num_segments\n            self.skip_length = 1\n\n        self.client = None\n        if has_client:\n            self.client = 
Client('~/petreloss.conf')\n\n        if not self.lazy_init:\n            self.clips = self._make_dataset(root, setting)\n            if len(self.clips) == 0:\n                raise(RuntimeError(\"Found 0 video clips in subfolders of: \" + root + \"\\n\"\n                                   \"Check your data directory (opt.data-dir).\"))\n\n    def __getitem__(self, index):\n        while True:\n            try:\n                images = None\n                if self.use_decord:\n                    directory, target = self.clips[index]\n                    if self.video_loader:\n                        if '.' in directory.split('/')[-1]:\n                            # data in the \"setting\" file already have extension, e.g., demo.mp4\n                            video_name = directory\n                        else:\n                            # data in the \"setting\" file do not have extension, e.g., demo\n                            # So we need to provide extension (i.e., .mp4) to complete the file name.\n                            video_name = '{}.{}'.format(directory, self.video_ext)\n\n                        video_name = os.path.join(self.prefix, video_name)\n                        if video_name.startswith('s3'):\n                            video_bytes = self.client.get(video_name)\n                            decord_vr = VideoReader(io.BytesIO(video_bytes),\n                                                    num_threads=1,\n                                                    ctx=cpu(0))\n                        else:\n                            decord_vr = decord.VideoReader(video_name, num_threads=1, ctx=cpu(0))\n                        duration = len(decord_vr)\n                        \n                    segment_indices, skip_offsets = self._sample_train_indices(duration)\n                    images = self._video_TSN_decord_batch_loader(directory, decord_vr, duration, segment_indices, skip_offsets)\n                \n                else:\n      
              video_name, total_frame, target = self.clips[index]\n                    video_name = os.path.join(self.prefix, video_name)\n\n                    segment_indices, skip_offsets = self._sample_train_indices(total_frame)\n                    frame_id_list = self._get_frame_id_list(total_frame, segment_indices, skip_offsets)\n                    images = []\n                    for idx in frame_id_list:\n                        # name_pattern is printf-style (default 'img_%05d.jpg'), so it must be\n                        # filled with the % operator; str.format would leave it unchanged.\n                        frame_fname = os.path.join(video_name, self.name_pattern % idx)\n                        img_bytes = self.client.get(frame_fname)\n                        img_np = np.frombuffer(img_bytes, np.uint8)\n                        img = cv2.imdecode(img_np, cv2.IMREAD_COLOR)\n                        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n                        images.append(Image.fromarray(img))\n                if images is not None:\n                    break\n            except Exception as e:\n                print(\"Failed to load video from {} with error {}\".format(\n                    video_name, e))\n            # Loading failed: retry with a randomly chosen clip.\n            index = random.randint(0, len(self.clips) - 1)\n\n        if self.num_sample > 1:\n            process_data_list = []\n            mask_list = []\n            for _ in range(self.num_sample):\n                process_data, mask = self.transform((images, None))\n                process_data = process_data.view((self.new_length, 3) + process_data.size()[-2:]).transpose(0, 1)\n                process_data_list.append(process_data)\n                mask_list.append(mask)\n            return process_data_list, mask_list\n        else:\n            process_data, mask = self.transform((images, None))  # T*C,H,W\n            process_data = process_data.view((self.new_length, 3) + process_data.size()[-2:]).transpose(0, 1)  # T*C,H,W -> T,C,H,W -> C,T,H,W\n            return (process_data, mask)\n\n    def __len__(self):\n        return len(self.clips)\n\n    def _make_dataset(self, directory, setting):\n     
   if not os.path.exists(setting):\n            raise(RuntimeError(\"Setting file %s doesn't exist. Check opt.train-list and opt.val-list. \" % (setting)))\n        clips = []\n\n        print(f'Load dataset using decord: {self.use_decord}')\n        with open(setting) as split_f:\n            data = split_f.readlines()\n            for line in data:\n                line_info = line.split(self.split)\n                if len(line_info) < 2:\n                    raise(RuntimeError('Video input format is not correct, missing one or more element. %s' % line))\n                if self.use_decord:\n                    # line format: video_path, video_label\n                    clip_path = os.path.join(line_info[0])\n                    target = int(line_info[1])\n                    item = (clip_path, target)\n                else:\n                    # line format: video_path, video_duration, video_label\n                    clip_path = os.path.join(line_info[0])\n                    total_frame = int(line_info[1])\n                    target = int(line_info[2])\n                    item = (clip_path, total_frame, target)\n                clips.append(item)\n        return clips\n\n    def _sample_train_indices(self, num_frames):\n        average_duration = (num_frames - self.skip_length + 1) // self.num_segments\n        if average_duration > 0:\n            offsets = np.multiply(list(range(self.num_segments)),\n                                  average_duration)\n            offsets = offsets + np.random.randint(average_duration,\n                                                  size=self.num_segments)\n        elif num_frames > max(self.num_segments, self.skip_length):\n            offsets = np.sort(np.random.randint(\n                num_frames - self.skip_length + 1,\n                size=self.num_segments))\n        else:\n            offsets = np.zeros((self.num_segments,))\n\n        if self.temporal_jitter:\n            skip_offsets = np.random.randint(\n    
            self.new_step, size=self.skip_length // self.new_step)\n        else:\n            skip_offsets = np.zeros(\n                self.skip_length // self.new_step, dtype=int)\n        return offsets + 1, skip_offsets\n\n    def _get_frame_id_list(self, duration, indices, skip_offsets):\n        frame_id_list = []\n        for seg_ind in indices:\n            offset = int(seg_ind)\n            for i, _ in enumerate(range(0, self.skip_length, self.new_step)):\n                if offset + skip_offsets[i] <= duration:\n                    frame_id = offset + skip_offsets[i] - 1\n                else:\n                    frame_id = offset - 1\n                frame_id_list.append(frame_id)\n                if offset + self.new_step < duration:\n                    offset += self.new_step\n        return frame_id_list\n\n    def _video_TSN_decord_batch_loader(self, directory, video_reader, duration, indices, skip_offsets):\n        # Frame ids are computed exactly as in _get_frame_id_list, so reuse it.\n        frame_id_list = self._get_frame_id_list(duration, indices, skip_offsets)\n        try:\n            video_data = video_reader.get_batch(frame_id_list).asnumpy()\n            sampled_list = [Image.fromarray(video_data[vid, :, :, :]).convert('RGB') for vid, _ in enumerate(frame_id_list)]\n        except Exception as e:\n            raise RuntimeError('Error occurred in reading frames {} from video {} of duration {}.'.format(frame_id_list, directory, duration)) from e\n        return sampled_list"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/masking_generator.py",
    "content": "import numpy as np\n\n\nclass TubeMaskingGenerator:\n    def __init__(self, input_size, mask_ratio):\n        self.frames, self.height, self.width = input_size\n        self.num_patches_per_frame = self.height * self.width\n        self.total_patches = self.frames * self.num_patches_per_frame \n        self.num_masks_per_frame = int(mask_ratio * self.num_patches_per_frame)\n        self.total_masks = self.frames * self.num_masks_per_frame\n\n    def __repr__(self):\n        repr_str = \"Maks: total patches {}, mask patches {}\".format(\n            self.total_patches, self.total_masks\n        )\n        return repr_str\n\n    def __call__(self):\n        mask_per_frame = np.hstack([\n            np.zeros(self.num_patches_per_frame - self.num_masks_per_frame),\n            np.ones(self.num_masks_per_frame),\n        ])\n        np.random.shuffle(mask_per_frame)\n        mask = np.tile(mask_per_frame, (self.frames, 1)).flatten()\n        return mask \n\n\nclass RandomMaskingGenerator:\n    def __init__(self, input_size, mask_ratio):\n        if not isinstance(input_size, tuple):\n            input_size = (input_size, ) * 3\n\n        self.frames, self.height, self.width = input_size\n\n        self.num_patches = self.frames * self.height * self.width  # 8x14x14\n        self.num_mask = int(mask_ratio * self.num_patches)\n\n    def __repr__(self):\n        repr_str = \"Maks: total patches {}, mask patches {}\".format(\n            self.num_patches, self.num_mask)\n        return repr_str\n\n    def __call__(self):\n        mask = np.hstack([\n            np.zeros(self.num_patches - self.num_mask),\n            np.ones(self.num_mask),\n        ])\n        np.random.shuffle(mask)\n        return mask  # [196*8]\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/mixup.py",
    "content": "\"\"\" Mixup and Cutmix\n\nPapers:\nmixup: Beyond Empirical Risk Minimization (https://arxiv.org/abs/1710.09412)\n\nCutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (https://arxiv.org/abs/1905.04899)\n\nCode Reference:\nCutMix: https://github.com/clovaai/CutMix-PyTorch\n\nHacked together by / Copyright 2019, Ross Wightman\n\"\"\"\nimport numpy as np\nimport torch\n\n\ndef one_hot(x, num_classes, on_value=1., off_value=0., device='cuda'):\n    x = x.long().view(-1, 1)\n    return torch.full((x.size()[0], num_classes), off_value, device=device).scatter_(1, x, on_value)\n\n\ndef mixup_target(target, num_classes, lam=1., smoothing=0.0, device='cuda'):\n    off_value = smoothing / num_classes\n    on_value = 1. - smoothing + off_value\n    y1 = one_hot(target, num_classes, on_value=on_value, off_value=off_value, device=device)\n    y2 = one_hot(target.flip(0), num_classes, on_value=on_value, off_value=off_value, device=device)\n    return y1 * lam + y2 * (1. - lam)\n\n\ndef rand_bbox(img_shape, lam, margin=0., count=None):\n    \"\"\" Standard CutMix bounding-box\n    Generates a random square bbox based on lambda value. 
This impl includes\n    support for enforcing a border margin as percent of bbox dimensions.\n\n    Args:\n        img_shape (tuple): Image shape as tuple\n        lam (float): Cutmix lambda value\n        margin (float): Percentage of bbox dimension to enforce as margin (reduce amount of box outside image)\n        count (int): Number of bboxes to generate\n    \"\"\"\n    ratio = np.sqrt(1 - lam)\n    img_h, img_w = img_shape[-2:]\n    cut_h, cut_w = int(img_h * ratio), int(img_w * ratio)\n    margin_y, margin_x = int(margin * cut_h), int(margin * cut_w)\n    cy = np.random.randint(0 + margin_y, img_h - margin_y, size=count)\n    cx = np.random.randint(0 + margin_x, img_w - margin_x, size=count)\n    yl = np.clip(cy - cut_h // 2, 0, img_h)\n    yh = np.clip(cy + cut_h // 2, 0, img_h)\n    xl = np.clip(cx - cut_w // 2, 0, img_w)\n    xh = np.clip(cx + cut_w // 2, 0, img_w)\n    return yl, yh, xl, xh\n\n\ndef rand_bbox_minmax(img_shape, minmax, count=None):\n    \"\"\" Min-Max CutMix bounding-box\n    Inspired by Darknet cutmix impl, generates a random rectangular bbox\n    based on min/max percent values applied to each dimension of the input image.\n\n    Typical values for minmax are in the .2-.3 range for min and the .8-.9 range for max.\n\n    Args:\n        img_shape (tuple): Image shape as tuple\n        minmax (tuple or list): Min and max bbox ratios (as percent of image size)\n        count (int): Number of bboxes to generate\n    \"\"\"\n    assert len(minmax) == 2\n    img_h, img_w = img_shape[-2:]\n    cut_h = np.random.randint(int(img_h * minmax[0]), int(img_h * minmax[1]), size=count)\n    cut_w = np.random.randint(int(img_w * minmax[0]), int(img_w * minmax[1]), size=count)\n    yl = np.random.randint(0, img_h - cut_h, size=count)\n    xl = np.random.randint(0, img_w - cut_w, size=count)\n    yu = yl + cut_h\n    xu = xl + cut_w\n    return yl, yu, xl, xu\n\n\ndef cutmix_bbox_and_lam(img_shape, lam, ratio_minmax=None, correct_lam=True, count=None):\n    
\"\"\" Generate bbox and apply lambda correction.\n    \"\"\"\n    if ratio_minmax is not None:\n        yl, yu, xl, xu = rand_bbox_minmax(img_shape, ratio_minmax, count=count)\n    else:\n        yl, yu, xl, xu = rand_bbox(img_shape, lam, count=count)\n    if correct_lam or ratio_minmax is not None:\n        bbox_area = (yu - yl) * (xu - xl)\n        lam = 1. - bbox_area / float(img_shape[-2] * img_shape[-1])\n    return (yl, yu, xl, xu), lam\n\n\nclass Mixup:\n    \"\"\" Mixup/Cutmix that applies different params to each element or whole batch\n\n    Args:\n        mixup_alpha (float): mixup alpha value, mixup is active if > 0.\n        cutmix_alpha (float): cutmix alpha value, cutmix is active if > 0.\n        cutmix_minmax (List[float]): cutmix min/max image ratio, cutmix is active and uses this vs alpha if not None.\n        prob (float): probability of applying mixup or cutmix per batch or element\n        switch_prob (float): probability of switching to cutmix instead of mixup when both are active\n        mode (str): how to apply mixup/cutmix params (per 'batch', 'pair' (pair of elements), 'elem' (element)\n        correct_lam (bool): apply lambda correction when cutmix bbox clipped by image borders\n        label_smoothing (float): apply label smoothing to the mixed target tensor\n        num_classes (int): number of classes for target\n    \"\"\"\n    def __init__(self, mixup_alpha=1., cutmix_alpha=0., cutmix_minmax=None, prob=1.0, switch_prob=0.5,\n                 mode='batch', correct_lam=True, label_smoothing=0.1, num_classes=1000):\n        self.mixup_alpha = mixup_alpha\n        self.cutmix_alpha = cutmix_alpha\n        self.cutmix_minmax = cutmix_minmax\n        if self.cutmix_minmax is not None:\n            assert len(self.cutmix_minmax) == 2\n            # force cutmix alpha == 1.0 when minmax active to keep logic simple & safe\n            self.cutmix_alpha = 1.0\n        self.mix_prob = prob\n        self.switch_prob = switch_prob\n        
self.label_smoothing = label_smoothing\n        self.num_classes = num_classes\n        self.mode = mode\n        self.correct_lam = correct_lam  # correct lambda based on clipped area for cutmix\n        self.mixup_enabled = True  # set to false to disable mixing (intended to be set by train loop)\n\n    def _params_per_elem(self, batch_size):\n        lam = np.ones(batch_size, dtype=np.float32)\n        # The np.bool alias was removed in NumPy 1.24; use the builtin bool instead.\n        use_cutmix = np.zeros(batch_size, dtype=bool)\n        if self.mixup_enabled:\n            if self.mixup_alpha > 0. and self.cutmix_alpha > 0.:\n                use_cutmix = np.random.rand(batch_size) < self.switch_prob\n                lam_mix = np.where(\n                    use_cutmix,\n                    np.random.beta(self.cutmix_alpha, self.cutmix_alpha, size=batch_size),\n                    np.random.beta(self.mixup_alpha, self.mixup_alpha, size=batch_size))\n            elif self.mixup_alpha > 0.:\n                lam_mix = np.random.beta(self.mixup_alpha, self.mixup_alpha, size=batch_size)\n            elif self.cutmix_alpha > 0.:\n                use_cutmix = np.ones(batch_size, dtype=bool)\n                lam_mix = np.random.beta(self.cutmix_alpha, self.cutmix_alpha, size=batch_size)\n            else:\n                assert False, \"One of mixup_alpha > 0., cutmix_alpha > 0., cutmix_minmax not None should be true.\"\n            lam = np.where(np.random.rand(batch_size) < self.mix_prob, lam_mix.astype(np.float32), lam)\n        return lam, use_cutmix\n\n    def _params_per_batch(self):\n        lam = 1.\n        use_cutmix = False\n        if self.mixup_enabled and np.random.rand() < self.mix_prob:\n            if self.mixup_alpha > 0. 
and self.cutmix_alpha > 0.:\n                use_cutmix = np.random.rand() < self.switch_prob\n                lam_mix = np.random.beta(self.cutmix_alpha, self.cutmix_alpha) if use_cutmix else \\\n                    np.random.beta(self.mixup_alpha, self.mixup_alpha)\n            elif self.mixup_alpha > 0.:\n                lam_mix = np.random.beta(self.mixup_alpha, self.mixup_alpha)\n            elif self.cutmix_alpha > 0.:\n                use_cutmix = True\n                lam_mix = np.random.beta(self.cutmix_alpha, self.cutmix_alpha)\n            else:\n                assert False, \"One of mixup_alpha > 0., cutmix_alpha > 0., cutmix_minmax not None should be true.\"\n            lam = float(lam_mix)\n        return lam, use_cutmix\n\n    def _mix_elem(self, x):\n        batch_size = len(x)\n        lam_batch, use_cutmix = self._params_per_elem(batch_size)\n        x_orig = x.clone()  # need to keep an unmodified original for mixing source\n        for i in range(batch_size):\n            j = batch_size - i - 1\n            lam = lam_batch[i]\n            if lam != 1.:\n                if use_cutmix[i]:\n                    (yl, yh, xl, xh), lam = cutmix_bbox_and_lam(\n                        x[i].shape, lam, ratio_minmax=self.cutmix_minmax, correct_lam=self.correct_lam)\n                    x[i][..., yl:yh, xl:xh] = x_orig[j][..., yl:yh, xl:xh]\n                    lam_batch[i] = lam\n                else:\n                    x[i] = x[i] * lam + x_orig[j] * (1 - lam)\n        return torch.tensor(lam_batch, device=x.device, dtype=x.dtype).unsqueeze(1)\n\n    def _mix_pair(self, x):\n        batch_size = len(x)\n        lam_batch, use_cutmix = self._params_per_elem(batch_size // 2)\n        x_orig = x.clone()  # need to keep an unmodified original for mixing source\n        for i in range(batch_size // 2):\n            j = batch_size - i - 1\n            lam = lam_batch[i]\n            if lam != 1.:\n                if use_cutmix[i]:\n                    (yl, 
yh, xl, xh), lam = cutmix_bbox_and_lam(\n                        x[i].shape, lam, ratio_minmax=self.cutmix_minmax, correct_lam=self.correct_lam)\n                    x[i][:, yl:yh, xl:xh] = x_orig[j][:, yl:yh, xl:xh]\n                    x[j][:, yl:yh, xl:xh] = x_orig[i][:, yl:yh, xl:xh]\n                    lam_batch[i] = lam\n                else:\n                    x[i] = x[i] * lam + x_orig[j] * (1 - lam)\n                    x[j] = x[j] * lam + x_orig[i] * (1 - lam)\n        lam_batch = np.concatenate((lam_batch, lam_batch[::-1]))\n        return torch.tensor(lam_batch, device=x.device, dtype=x.dtype).unsqueeze(1)\n\n    def _mix_batch(self, x):\n        lam, use_cutmix = self._params_per_batch()\n        if lam == 1.:\n            return 1.\n        if use_cutmix:\n            (yl, yh, xl, xh), lam = cutmix_bbox_and_lam(\n                x.shape, lam, ratio_minmax=self.cutmix_minmax, correct_lam=self.correct_lam)\n            x[..., yl:yh, xl:xh] = x.flip(0)[..., yl:yh, xl:xh]\n        else:\n            x_flipped = x.flip(0).mul_(1. 
- lam)\n            x.mul_(lam).add_(x_flipped)\n        return lam\n\n    def __call__(self, x, target):\n        assert len(x) % 2 == 0, 'Batch size should be even when using this'\n        if self.mode == 'elem':\n            lam = self._mix_elem(x)\n        elif self.mode == 'pair':\n            lam = self._mix_pair(x)\n        else:\n            lam = self._mix_batch(x)\n        target = mixup_target(target, self.num_classes, lam, self.label_smoothing, x.device)\n        return x, target\n\n\nclass FastCollateMixup(Mixup):\n    \"\"\" Fast Collate w/ Mixup/Cutmix that applies different params to each element or whole batch\n\n    A Mixup impl that's performed while collating the batches.\n    \"\"\"\n\n    def _mix_elem_collate(self, output, batch, half=False):\n        batch_size = len(batch)\n        num_elem = batch_size // 2 if half else batch_size\n        assert len(output) == num_elem\n        lam_batch, use_cutmix = self._params_per_elem(num_elem)\n        for i in range(num_elem):\n            j = batch_size - i - 1\n            lam = lam_batch[i]\n            mixed = batch[i][0]\n            if lam != 1.:\n                if use_cutmix[i]:\n                    if not half:\n                        mixed = mixed.copy()\n                    (yl, yh, xl, xh), lam = cutmix_bbox_and_lam(\n                        output.shape, lam, ratio_minmax=self.cutmix_minmax, correct_lam=self.correct_lam)\n                    mixed[:, yl:yh, xl:xh] = batch[j][0][:, yl:yh, xl:xh]\n                    lam_batch[i] = lam\n                else:\n                    mixed = mixed.astype(np.float32) * lam + batch[j][0].astype(np.float32) * (1 - lam)\n                    np.rint(mixed, out=mixed)\n            output[i] += torch.from_numpy(mixed.astype(np.uint8))\n        if half:\n            lam_batch = np.concatenate((lam_batch, np.ones(num_elem)))\n        return torch.tensor(lam_batch).unsqueeze(1)\n\n    def _mix_pair_collate(self, output, batch):\n        batch_size = 
len(batch)\n        lam_batch, use_cutmix = self._params_per_elem(batch_size // 2)\n        for i in range(batch_size // 2):\n            j = batch_size - i - 1\n            lam = lam_batch[i]\n            mixed_i = batch[i][0]\n            mixed_j = batch[j][0]\n            assert 0 <= lam <= 1.0\n            if lam < 1.:\n                if use_cutmix[i]:\n                    (yl, yh, xl, xh), lam = cutmix_bbox_and_lam(\n                        output.shape, lam, ratio_minmax=self.cutmix_minmax, correct_lam=self.correct_lam)\n                    patch_i = mixed_i[:, yl:yh, xl:xh].copy()\n                    mixed_i[:, yl:yh, xl:xh] = mixed_j[:, yl:yh, xl:xh]\n                    mixed_j[:, yl:yh, xl:xh] = patch_i\n                    lam_batch[i] = lam\n                else:\n                    mixed_temp = mixed_i.astype(np.float32) * lam + mixed_j.astype(np.float32) * (1 - lam)\n                    mixed_j = mixed_j.astype(np.float32) * lam + mixed_i.astype(np.float32) * (1 - lam)\n                    mixed_i = mixed_temp\n                    np.rint(mixed_j, out=mixed_j)\n                    np.rint(mixed_i, out=mixed_i)\n            output[i] += torch.from_numpy(mixed_i.astype(np.uint8))\n            output[j] += torch.from_numpy(mixed_j.astype(np.uint8))\n        lam_batch = np.concatenate((lam_batch, lam_batch[::-1]))\n        return torch.tensor(lam_batch).unsqueeze(1)\n\n    def _mix_batch_collate(self, output, batch):\n        batch_size = len(batch)\n        lam, use_cutmix = self._params_per_batch()\n        if use_cutmix:\n            (yl, yh, xl, xh), lam = cutmix_bbox_and_lam(\n                output.shape, lam, ratio_minmax=self.cutmix_minmax, correct_lam=self.correct_lam)\n        for i in range(batch_size):\n            j = batch_size - i - 1\n            mixed = batch[i][0]\n            if lam != 1.:\n                if use_cutmix:\n                    mixed = mixed.copy()  # don't want to modify the original while iterating\n                   
 mixed[..., yl:yh, xl:xh] = batch[j][0][..., yl:yh, xl:xh]\n                else:\n                    mixed = mixed.astype(np.float32) * lam + batch[j][0].astype(np.float32) * (1 - lam)\n                    np.rint(mixed, out=mixed)\n            output[i] += torch.from_numpy(mixed.astype(np.uint8))\n        return lam\n\n    def __call__(self, batch, _=None):\n        batch_size = len(batch)\n        assert batch_size % 2 == 0, 'Batch size should be even when using this'\n        half = 'half' in self.mode\n        if half:\n            batch_size //= 2\n        output = torch.zeros((batch_size, *batch[0][0].shape), dtype=torch.uint8)\n        if self.mode == 'elem' or self.mode == 'half':\n            lam = self._mix_elem_collate(output, batch, half=half)\n        elif self.mode == 'pair':\n            lam = self._mix_pair_collate(output, batch)\n        else:\n            lam = self._mix_batch_collate(output, batch)\n        target = torch.tensor([b[1] for b in batch], dtype=torch.int64)\n        target = mixup_target(target, self.num_classes, lam, self.label_smoothing, device='cpu')\n        target = target[:batch_size]\n        return output, target\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/rand_augment.py",
    "content": "\"\"\"\nThis implementation is based on\nhttps://github.com/rwightman/pytorch-image-models/blob/master/timm/data/auto_augment.py\npulished under an Apache License 2.0.\n\nCOMMENT FROM ORIGINAL:\nAutoAugment, RandAugment, and AugMix for PyTorch\nThis code implements the searched ImageNet policies with various tweaks and\nimprovements and does not include any of the search code. AA and RA\nImplementation adapted from:\n    https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/autoaugment.py\nAugMix adapted from:\n    https://github.com/google-research/augmix\nPapers:\n    AutoAugment: Learning Augmentation Policies from Data\n    https://arxiv.org/abs/1805.09501\n    Learning Data Augmentation Strategies for Object Detection\n    https://arxiv.org/abs/1906.11172\n    RandAugment: Practical automated data augmentation...\n    https://arxiv.org/abs/1909.13719\n    AugMix: A Simple Data Processing Method to Improve Robustness and\n    Uncertainty https://arxiv.org/abs/1912.02781\n\nHacked together by / Copyright 2020 Ross Wightman\n\"\"\"\n\nimport math\nimport numpy as np\nimport random\nimport re\nimport PIL\nfrom PIL import Image, ImageEnhance, ImageOps\n\n_PIL_VER = tuple([int(x) for x in PIL.__version__.split(\".\")[:2]])\n\n_FILL = (128, 128, 128)\n\n# This signifies the max integer that the controller RNN could predict for the\n# augmentation scheme.\n_MAX_LEVEL = 10.0\n\n_HPARAMS_DEFAULT = {\n    \"translate_const\": 250,\n    \"img_mean\": _FILL,\n}\n\n_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)\n\n\ndef _interpolation(kwargs):\n    interpolation = kwargs.pop(\"resample\", Image.BILINEAR)\n    if isinstance(interpolation, (list, tuple)):\n        return random.choice(interpolation)\n    else:\n        return interpolation\n\n\ndef _check_args_tf(kwargs):\n    if \"fillcolor\" in kwargs and _PIL_VER < (5, 0):\n        kwargs.pop(\"fillcolor\")\n    kwargs[\"resample\"] = _interpolation(kwargs)\n\n\ndef 
shear_x(img, factor, **kwargs):\n    _check_args_tf(kwargs)\n    return img.transform(\n        img.size, Image.AFFINE, (1, factor, 0, 0, 1, 0), **kwargs\n    )\n\n\ndef shear_y(img, factor, **kwargs):\n    _check_args_tf(kwargs)\n    return img.transform(\n        img.size, Image.AFFINE, (1, 0, 0, factor, 1, 0), **kwargs\n    )\n\n\ndef translate_x_rel(img, pct, **kwargs):\n    pixels = pct * img.size[0]\n    _check_args_tf(kwargs)\n    return img.transform(\n        img.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0), **kwargs\n    )\n\n\ndef translate_y_rel(img, pct, **kwargs):\n    pixels = pct * img.size[1]\n    _check_args_tf(kwargs)\n    return img.transform(\n        img.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels), **kwargs\n    )\n\n\ndef translate_x_abs(img, pixels, **kwargs):\n    _check_args_tf(kwargs)\n    return img.transform(\n        img.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0), **kwargs\n    )\n\n\ndef translate_y_abs(img, pixels, **kwargs):\n    _check_args_tf(kwargs)\n    return img.transform(\n        img.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels), **kwargs\n    )\n\n\ndef rotate(img, degrees, **kwargs):\n    _check_args_tf(kwargs)\n    if _PIL_VER >= (5, 2):\n        return img.rotate(degrees, **kwargs)\n    elif _PIL_VER >= (5, 0):\n        w, h = img.size\n        post_trans = (0, 0)\n        rotn_center = (w / 2.0, h / 2.0)\n        angle = -math.radians(degrees)\n        matrix = [\n            round(math.cos(angle), 15),\n            round(math.sin(angle), 15),\n            0.0,\n            round(-math.sin(angle), 15),\n            round(math.cos(angle), 15),\n            0.0,\n        ]\n\n        def transform(x, y, matrix):\n            (a, b, c, d, e, f) = matrix\n            return a * x + b * y + c, d * x + e * y + f\n\n        matrix[2], matrix[5] = transform(\n            -rotn_center[0] - post_trans[0],\n            -rotn_center[1] - post_trans[1],\n            matrix,\n        )\n        matrix[2] += rotn_center[0]\n        
matrix[5] += rotn_center[1]\n        return img.transform(img.size, Image.AFFINE, matrix, **kwargs)\n    else:\n        return img.rotate(degrees, resample=kwargs[\"resample\"])\n\n\ndef auto_contrast(img, **__):\n    return ImageOps.autocontrast(img)\n\n\ndef invert(img, **__):\n    return ImageOps.invert(img)\n\n\ndef equalize(img, **__):\n    return ImageOps.equalize(img)\n\n\ndef solarize(img, thresh, **__):\n    return ImageOps.solarize(img, thresh)\n\n\ndef solarize_add(img, add, thresh=128, **__):\n    lut = []\n    for i in range(256):\n        if i < thresh:\n            lut.append(min(255, i + add))\n        else:\n            lut.append(i)\n    if img.mode in (\"L\", \"RGB\"):\n        if img.mode == \"RGB\" and len(lut) == 256:\n            lut = lut + lut + lut\n        return img.point(lut)\n    else:\n        return img\n\n\ndef posterize(img, bits_to_keep, **__):\n    if bits_to_keep >= 8:\n        return img\n    return ImageOps.posterize(img, bits_to_keep)\n\n\ndef contrast(img, factor, **__):\n    return ImageEnhance.Contrast(img).enhance(factor)\n\n\ndef color(img, factor, **__):\n    return ImageEnhance.Color(img).enhance(factor)\n\n\ndef brightness(img, factor, **__):\n    return ImageEnhance.Brightness(img).enhance(factor)\n\n\ndef sharpness(img, factor, **__):\n    return ImageEnhance.Sharpness(img).enhance(factor)\n\n\ndef _randomly_negate(v):\n    \"\"\"With 50% prob, negate the value\"\"\"\n    return -v if random.random() > 0.5 else v\n\n\ndef _rotate_level_to_arg(level, _hparams):\n    # range [-30, 30]\n    level = (level / _MAX_LEVEL) * 30.0\n    level = _randomly_negate(level)\n    return (level,)\n\n\ndef _enhance_level_to_arg(level, _hparams):\n    # range [0.1, 1.9]\n    return ((level / _MAX_LEVEL) * 1.8 + 0.1,)\n\n\ndef _enhance_increasing_level_to_arg(level, _hparams):\n    # the 'no change' level is 1.0, moving away from that towards 0. 
or 2.0 increases the enhancement blend\n    # range [0.1, 1.9]\n    level = (level / _MAX_LEVEL) * 0.9\n    level = 1.0 + _randomly_negate(level)\n    return (level,)\n\n\ndef _shear_level_to_arg(level, _hparams):\n    # range [-0.3, 0.3]\n    level = (level / _MAX_LEVEL) * 0.3\n    level = _randomly_negate(level)\n    return (level,)\n\n\ndef _translate_abs_level_to_arg(level, hparams):\n    translate_const = hparams[\"translate_const\"]\n    level = (level / _MAX_LEVEL) * float(translate_const)\n    level = _randomly_negate(level)\n    return (level,)\n\n\ndef _translate_rel_level_to_arg(level, hparams):\n    # default range [-0.45, 0.45]\n    translate_pct = hparams.get(\"translate_pct\", 0.45)\n    level = (level / _MAX_LEVEL) * translate_pct\n    level = _randomly_negate(level)\n    return (level,)\n\n\ndef _posterize_level_to_arg(level, _hparams):\n    # As per Tensorflow TPU EfficientNet impl\n    # range [0, 4], 'keep 0 up to 4 MSB of original image'\n    # intensity/severity of augmentation decreases with level\n    return (int((level / _MAX_LEVEL) * 4),)\n\n\ndef _posterize_increasing_level_to_arg(level, hparams):\n    # As per Tensorflow models research and UDA impl\n    # range [4, 0], 'keep 4 down to 0 MSB of original image',\n    # intensity/severity of augmentation increases with level\n    return (4 - _posterize_level_to_arg(level, hparams)[0],)\n\n\ndef _posterize_original_level_to_arg(level, _hparams):\n    # As per original AutoAugment paper description\n    # range [4, 8], 'keep 4 up to 8 MSB of image'\n    # intensity/severity of augmentation decreases with level\n    return (int((level / _MAX_LEVEL) * 4) + 4,)\n\n\ndef _solarize_level_to_arg(level, _hparams):\n    # range [0, 256]\n    # intensity/severity of augmentation decreases with level\n    return (int((level / _MAX_LEVEL) * 256),)\n\n\ndef _solarize_increasing_level_to_arg(level, _hparams):\n    # range [0, 256]\n    # intensity/severity of augmentation increases with level\n    return 
(256 - _solarize_level_to_arg(level, _hparams)[0],)\n\n\ndef _solarize_add_level_to_arg(level, _hparams):\n    # range [0, 110]\n    return (int((level / _MAX_LEVEL) * 110),)\n\n\nLEVEL_TO_ARG = {\n    \"AutoContrast\": None,\n    \"Equalize\": None,\n    \"Invert\": None,\n    \"Rotate\": _rotate_level_to_arg,\n    # There are several variations of the posterize level scaling in various Tensorflow/Google repositories/papers\n    \"Posterize\": _posterize_level_to_arg,\n    \"PosterizeIncreasing\": _posterize_increasing_level_to_arg,\n    \"PosterizeOriginal\": _posterize_original_level_to_arg,\n    \"Solarize\": _solarize_level_to_arg,\n    \"SolarizeIncreasing\": _solarize_increasing_level_to_arg,\n    \"SolarizeAdd\": _solarize_add_level_to_arg,\n    \"Color\": _enhance_level_to_arg,\n    \"ColorIncreasing\": _enhance_increasing_level_to_arg,\n    \"Contrast\": _enhance_level_to_arg,\n    \"ContrastIncreasing\": _enhance_increasing_level_to_arg,\n    \"Brightness\": _enhance_level_to_arg,\n    \"BrightnessIncreasing\": _enhance_increasing_level_to_arg,\n    \"Sharpness\": _enhance_level_to_arg,\n    \"SharpnessIncreasing\": _enhance_increasing_level_to_arg,\n    \"ShearX\": _shear_level_to_arg,\n    \"ShearY\": _shear_level_to_arg,\n    \"TranslateX\": _translate_abs_level_to_arg,\n    \"TranslateY\": _translate_abs_level_to_arg,\n    \"TranslateXRel\": _translate_rel_level_to_arg,\n    \"TranslateYRel\": _translate_rel_level_to_arg,\n}\n\n\nNAME_TO_OP = {\n    \"AutoContrast\": auto_contrast,\n    \"Equalize\": equalize,\n    \"Invert\": invert,\n    \"Rotate\": rotate,\n    \"Posterize\": posterize,\n    \"PosterizeIncreasing\": posterize,\n    \"PosterizeOriginal\": posterize,\n    \"Solarize\": solarize,\n    \"SolarizeIncreasing\": solarize,\n    \"SolarizeAdd\": solarize_add,\n    \"Color\": color,\n    \"ColorIncreasing\": color,\n    \"Contrast\": contrast,\n    \"ContrastIncreasing\": contrast,\n    \"Brightness\": brightness,\n    
\"BrightnessIncreasing\": brightness,\n    \"Sharpness\": sharpness,\n    \"SharpnessIncreasing\": sharpness,\n    \"ShearX\": shear_x,\n    \"ShearY\": shear_y,\n    \"TranslateX\": translate_x_abs,\n    \"TranslateY\": translate_y_abs,\n    \"TranslateXRel\": translate_x_rel,\n    \"TranslateYRel\": translate_y_rel,\n}\n\n\nclass AugmentOp:\n    \"\"\"\n    Apply for video.\n    \"\"\"\n\n    def __init__(self, name, prob=0.5, magnitude=10, hparams=None):\n        hparams = hparams or _HPARAMS_DEFAULT\n        self.aug_fn = NAME_TO_OP[name]\n        self.level_fn = LEVEL_TO_ARG[name]\n        self.prob = prob\n        self.magnitude = magnitude\n        self.hparams = hparams.copy()\n        self.kwargs = {\n            \"fillcolor\": hparams[\"img_mean\"]\n            if \"img_mean\" in hparams\n            else _FILL,\n            \"resample\": hparams[\"interpolation\"]\n            if \"interpolation\" in hparams\n            else _RANDOM_INTERPOLATION,\n        }\n\n        # If magnitude_std is > 0, we introduce some randomness\n        # in the usually fixed policy and sample magnitude from a normal distribution\n        # with mean `magnitude` and std-dev of `magnitude_std`.\n        # NOTE This is my own hack, being tested, not in papers or reference impls.\n        self.magnitude_std = self.hparams.get(\"magnitude_std\", 0)\n\n    def __call__(self, img_list):\n        if self.prob < 1.0 and random.random() > self.prob:\n            return img_list\n        magnitude = self.magnitude\n        if self.magnitude_std and self.magnitude_std > 0:\n            magnitude = random.gauss(magnitude, self.magnitude_std)\n        magnitude = min(_MAX_LEVEL, max(0, magnitude))  # clip to valid range\n        level_args = (\n            self.level_fn(magnitude, self.hparams)\n            if self.level_fn is not None\n            else ()\n        )\n\n        if isinstance(img_list, list):\n            return [\n                self.aug_fn(img, *level_args, 
**self.kwargs) for img in img_list\n            ]\n        else:\n            return self.aug_fn(img_list, *level_args, **self.kwargs)\n\n\n_RAND_TRANSFORMS = [\n    \"AutoContrast\",\n    \"Equalize\",\n    \"Invert\",\n    \"Rotate\",\n    \"Posterize\",\n    \"Solarize\",\n    \"SolarizeAdd\",\n    \"Color\",\n    \"Contrast\",\n    \"Brightness\",\n    \"Sharpness\",\n    \"ShearX\",\n    \"ShearY\",\n    \"TranslateXRel\",\n    \"TranslateYRel\",\n]\n\n\n_RAND_INCREASING_TRANSFORMS = [\n    \"AutoContrast\",\n    \"Equalize\",\n    \"Invert\",\n    \"Rotate\",\n    \"PosterizeIncreasing\",\n    \"SolarizeIncreasing\",\n    \"SolarizeAdd\",\n    \"ColorIncreasing\",\n    \"ContrastIncreasing\",\n    \"BrightnessIncreasing\",\n    \"SharpnessIncreasing\",\n    \"ShearX\",\n    \"ShearY\",\n    \"TranslateXRel\",\n    \"TranslateYRel\",\n]\n\n\n# These experimental weights are based loosely on the relative improvements mentioned in paper.\n# They may not result in increased performance, but could likely be tuned to so.\n_RAND_CHOICE_WEIGHTS_0 = {\n    \"Rotate\": 0.3,\n    \"ShearX\": 0.2,\n    \"ShearY\": 0.2,\n    \"TranslateXRel\": 0.1,\n    \"TranslateYRel\": 0.1,\n    \"Color\": 0.025,\n    \"Sharpness\": 0.025,\n    \"AutoContrast\": 0.025,\n    \"Solarize\": 0.005,\n    \"SolarizeAdd\": 0.005,\n    \"Contrast\": 0.005,\n    \"Brightness\": 0.005,\n    \"Equalize\": 0.005,\n    \"Posterize\": 0,\n    \"Invert\": 0,\n}\n\n\ndef _select_rand_weights(weight_idx=0, transforms=None):\n    transforms = transforms or _RAND_TRANSFORMS\n    assert weight_idx == 0  # only one set of weights currently\n    rand_weights = _RAND_CHOICE_WEIGHTS_0\n    probs = [rand_weights[k] for k in transforms]\n    probs /= np.sum(probs)\n    return probs\n\n\ndef rand_augment_ops(magnitude=10, hparams=None, transforms=None):\n    hparams = hparams or _HPARAMS_DEFAULT\n    transforms = transforms or _RAND_TRANSFORMS\n    return [\n        AugmentOp(name, prob=0.5, magnitude=magnitude, 
hparams=hparams)\n        for name in transforms\n    ]\n\n\nclass RandAugment:\n    def __init__(self, ops, num_layers=2, choice_weights=None):\n        self.ops = ops\n        self.num_layers = num_layers\n        self.choice_weights = choice_weights\n\n    def __call__(self, img):\n        # no replacement when using weighted choice\n        ops = np.random.choice(\n            self.ops,\n            self.num_layers,\n            replace=self.choice_weights is None,\n            p=self.choice_weights,\n        )\n        for op in ops:\n            img = op(img)\n        return img\n\n\ndef rand_augment_transform(config_str, hparams):\n    \"\"\"\n    RandAugment: Practical automated data augmentation... - https://arxiv.org/abs/1909.13719\n\n    Create a RandAugment transform\n    :param config_str: String defining configuration of random augmentation. Consists of multiple sections separated by\n    dashes ('-'). The first section defines the specific variant of rand augment (currently only 'rand'). 
The remaining\n    sections, in no particular order, determine:\n        'm' - integer magnitude of rand augment\n        'n' - integer num layers (number of transform ops selected per image)\n        'w' - integer probability weight index (index of a set of weights to influence choice of op)\n        'mstd' -  float std deviation of magnitude noise applied\n        'inc' - integer (bool), use augmentations that increase in severity with magnitude (default: 0)\n    Ex 'rand-m9-n3-mstd0.5' results in RandAugment with magnitude 9, num_layers 3, magnitude_std 0.5\n    'rand-mstd1-w0' results in magnitude_std 1.0, weights 0, default magnitude of 10 and num_layers 2\n    :param hparams: Other hparams (kwargs) for the RandAugmentation scheme\n    :return: A PyTorch compatible Transform\n    \"\"\"\n    magnitude = _MAX_LEVEL  # default to _MAX_LEVEL for magnitude (currently 10)\n    num_layers = 2  # default to 2 ops per image\n    weight_idx = None  # default to no probability weights for op choice\n    transforms = _RAND_TRANSFORMS\n    config = config_str.split(\"-\")\n    assert config[0] == \"rand\"\n    config = config[1:]\n    for c in config:\n        cs = re.split(r\"(\\d.*)\", c)\n        if len(cs) < 2:\n            continue\n        key, val = cs[:2]\n        if key == \"mstd\":\n            # noise param injected via hparams for now\n            hparams.setdefault(\"magnitude_std\", float(val))\n        elif key == \"inc\":\n            # val is a string, so convert to int first; bool(\"0\") would be True\n            if bool(int(val)):\n                transforms = _RAND_INCREASING_TRANSFORMS\n        elif key == \"m\":\n            magnitude = int(val)\n        elif key == \"n\":\n            num_layers = int(val)\n        elif key == \"w\":\n            weight_idx = int(val)\n        else:\n            # a bare `assert NotImplementedError` always passes; raise instead\n            raise NotImplementedError(\"Unknown RandAugment config section: \" + c)\n    ra_ops = rand_augment_ops(\n        magnitude=magnitude, hparams=hparams, transforms=transforms\n    )\n    choice_weights = (\n        None if weight_idx is None else _select_rand_weights(weight_idx)\n    )\n    
return RandAugment(ra_ops, num_layers, choice_weights=choice_weights)\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/random_erasing.py",
    "content": "\"\"\"\nThis implementation is based on\nhttps://github.com/rwightman/pytorch-image-models/blob/master/timm/data/random_erasing.py\npulished under an Apache License 2.0.\n\"\"\"\nimport math\nimport random\nimport torch\n\n\ndef _get_pixels(\n    per_pixel, rand_color, patch_size, dtype=torch.float32, device=\"cuda\"\n):\n    # NOTE I've seen CUDA illegal memory access errors being caused by the normal_()\n    # paths, flip the order so normal is run on CPU if this becomes a problem\n    # Issue has been fixed in master https://github.com/pytorch/pytorch/issues/19508\n    if per_pixel:\n        return torch.empty(patch_size, dtype=dtype, device=device).normal_()\n    elif rand_color:\n        return torch.empty(\n            (patch_size[0], 1, 1), dtype=dtype, device=device\n        ).normal_()\n    else:\n        return torch.zeros((patch_size[0], 1, 1), dtype=dtype, device=device)\n\n\nclass RandomErasing:\n    \"\"\"Randomly selects a rectangle region in an image and erases its pixels.\n        'Random Erasing Data Augmentation' by Zhong et al.\n        See https://arxiv.org/pdf/1708.04896.pdf\n        This variant of RandomErasing is intended to be applied to either a batch\n        or single image tensor after it has been normalized by dataset mean and std.\n    Args:\n         probability: Probability that the Random Erasing operation will be performed.\n         min_area: Minimum percentage of erased area wrt input image area.\n         max_area: Maximum percentage of erased area wrt input image area.\n         min_aspect: Minimum aspect ratio of erased area.\n         mode: pixel color mode, one of 'const', 'rand', or 'pixel'\n            'const' - erase block is constant color of 0 for all channels\n            'rand'  - erase block is same per-channel random (normal) color\n            'pixel' - erase block is per-pixel random (normal) color\n        max_count: maximum number of erasing blocks per image, area per box is scaled by count.\n  
          per-image count is randomly chosen between 1 and this value.\n    \"\"\"\n\n    def __init__(\n        self,\n        probability=0.5,\n        min_area=0.02,\n        max_area=1 / 3,\n        min_aspect=0.3,\n        max_aspect=None,\n        mode=\"const\",\n        min_count=1,\n        max_count=None,\n        num_splits=0,\n        device=\"cuda\",\n        cube=True,\n    ):\n        self.probability = probability\n        self.min_area = min_area\n        self.max_area = max_area\n        max_aspect = max_aspect or 1 / min_aspect\n        self.log_aspect_ratio = (math.log(min_aspect), math.log(max_aspect))\n        self.min_count = min_count\n        self.max_count = max_count or min_count\n        self.num_splits = num_splits\n        mode = mode.lower()\n        self.rand_color = False\n        self.per_pixel = False\n        self.cube = cube\n        if mode == \"rand\":\n            self.rand_color = True  # per block random normal\n        elif mode == \"pixel\":\n            self.per_pixel = True  # per pixel random normal\n        else:\n            assert not mode or mode == \"const\"\n        self.device = device\n\n    def _erase(self, img, chan, img_h, img_w, dtype):\n        if random.random() > self.probability:\n            return\n        area = img_h * img_w\n        count = (\n            self.min_count\n            if self.min_count == self.max_count\n            else random.randint(self.min_count, self.max_count)\n        )\n        for _ in range(count):\n            for _ in range(10):\n                target_area = (\n                    random.uniform(self.min_area, self.max_area) * area / count\n                )\n                aspect_ratio = math.exp(random.uniform(*self.log_aspect_ratio))\n                h = int(round(math.sqrt(target_area * aspect_ratio)))\n                w = int(round(math.sqrt(target_area / aspect_ratio)))\n                if w < img_w and h < img_h:\n                    top = random.randint(0, 
img_h - h)\n                    left = random.randint(0, img_w - w)\n                    img[:, top : top + h, left : left + w] = _get_pixels(\n                        self.per_pixel,\n                        self.rand_color,\n                        (chan, h, w),\n                        dtype=dtype,\n                        device=self.device,\n                    )\n                    break\n\n    def _erase_cube(\n        self,\n        img,\n        batch_start,\n        batch_size,\n        chan,\n        img_h,\n        img_w,\n        dtype,\n    ):\n        if random.random() > self.probability:\n            return\n        area = img_h * img_w\n        count = (\n            self.min_count\n            if self.min_count == self.max_count\n            else random.randint(self.min_count, self.max_count)\n        )\n        for _ in range(count):\n            for _ in range(100):\n                target_area = (\n                    random.uniform(self.min_area, self.max_area) * area / count\n                )\n                aspect_ratio = math.exp(random.uniform(*self.log_aspect_ratio))\n                h = int(round(math.sqrt(target_area * aspect_ratio)))\n                w = int(round(math.sqrt(target_area / aspect_ratio)))\n                if w < img_w and h < img_h:\n                    top = random.randint(0, img_h - h)\n                    left = random.randint(0, img_w - w)\n                    for i in range(batch_start, batch_size):\n                        img_instance = img[i]\n                        img_instance[\n                            :, top : top + h, left : left + w\n                        ] = _get_pixels(\n                            self.per_pixel,\n                            self.rand_color,\n                            (chan, h, w),\n                            dtype=dtype,\n                            device=self.device,\n                        )\n                    break\n\n    def __call__(self, input):\n        if 
len(input.size()) == 3:\n            self._erase(input, *input.size(), input.dtype)\n        else:\n            batch_size, chan, img_h, img_w = input.size()\n            # skip first slice of batch if num_splits is set (for clean portion of samples)\n            batch_start = (\n                batch_size // self.num_splits if self.num_splits > 1 else 0\n            )\n            if self.cube:\n                self._erase_cube(\n                    input,\n                    batch_start,\n                    batch_size,\n                    chan,\n                    img_h,\n                    img_w,\n                    input.dtype,\n                )\n            else:\n                for i in range(batch_start, batch_size):\n                    self._erase(input[i], chan, img_h, img_w, input.dtype)\n        return input\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/ssv2.py",
    "content": "import os\nimport io\nimport cv2\nimport numpy as np\nimport torch\nfrom torchvision import transforms\nimport warnings\nfrom decord import VideoReader, cpu\nfrom torch.utils.data import Dataset\nfrom .random_erasing import RandomErasing\nfrom .video_transforms import (\n    Compose, Resize, CenterCrop, Normalize,\n    create_random_augment, random_short_side_scale_jitter, \n    random_crop, random_resized_crop_with_shift, random_resized_crop,\n    horizontal_flip, random_short_side_scale_jitter, uniform_crop, \n)\nfrom .volume_transforms import ClipToTensor\n\ntry:\n    from petrel_client.client import Client\n    has_client = True\nexcept ImportError:\n    has_client = False\n\n\nclass SSRawFrameClsDataset(Dataset):\n    \"\"\"Load your own raw frame classification dataset.\"\"\"\n\n    def __init__(self, anno_path, prefix='', split=' ', mode='train', clip_len=8,\n                 crop_size=224, short_side_size=256, new_height=256, new_width=340,\n                 keep_aspect_ratio=True, num_segment=1, num_crop=1, test_num_segment=10,\n                 test_num_crop=3, filename_tmpl='img_{:05}.jpg', args=None):\n        self.anno_path = anno_path\n        self.prefix = prefix\n        self.split = split\n        self.mode = mode\n        self.clip_len = clip_len\n        self.crop_size = crop_size\n        self.short_side_size = short_side_size\n        self.new_height = new_height\n        self.new_width = new_width\n        self.keep_aspect_ratio = keep_aspect_ratio\n        self.num_segment = num_segment\n        self.test_num_segment = test_num_segment\n        self.num_crop = num_crop\n        self.test_num_crop = test_num_crop\n        self.filename_tmpl = filename_tmpl\n        self.args = args\n        self.aug = False\n        self.rand_erase = False\n\n        self.client = None\n        if has_client:\n            self.client = Client('~/petreloss.conf')\n\n        if self.mode in ['train']:\n            self.aug = True\n            if 
self.args.reprob > 0:\n                self.rand_erase = True\n        if VideoReader is None:\n            raise ImportError(\n                \"Unable to import `decord` which is required to read videos.\")\n\n        import pandas as pd\n        cleaned = pd.read_csv(self.anno_path, header=None, delimiter=self.split)\n        self.dataset_samples = list(cleaned.values[:, 0])\n        self.total_frames = list(cleaned.values[:, 1])\n        self.label_array = list(cleaned.values[:, -1])\n\n        if (mode == 'train'):\n            pass\n\n        elif (mode == 'validation'):\n            self.data_transform = Compose([\n                Resize(self.short_side_size,\n                                        interpolation='bilinear'),\n                CenterCrop(size=(self.crop_size,\n                                                  self.crop_size)),\n                ClipToTensor(),\n                Normalize(mean=[0.485, 0.456, 0.406],\n                                           std=[0.229, 0.224, 0.225])\n            ])\n        elif mode == 'test':\n            self.data_resize = Compose([\n                Resize(size=(short_side_size),\n                                        interpolation='bilinear')\n            ])\n            self.data_transform = Compose([\n                ClipToTensor(),\n                Normalize(mean=[0.485, 0.456, 0.406],\n                                           std=[0.229, 0.224, 0.225])\n            ])\n            self.test_seg = []\n            self.test_dataset = []\n            self.test_total_frames = []\n            self.test_label_array = []\n            for ck in range(self.test_num_segment):\n                for cp in range(self.test_num_crop):\n                    for idx in range(len(self.label_array)):\n                        self.test_seg.append((ck, cp))\n                        self.test_dataset.append(self.dataset_samples[idx])\n                        self.test_total_frames.append(self.total_frames[idx])\n         
               self.test_label_array.append(self.label_array[idx])\n\n    def __getitem__(self, index):\n        if self.mode == 'train':\n            args = self.args\n            scale_t = 1\n\n            sample = self.dataset_samples[index]\n            total_frame = self.total_frames[index]\n            buffer = self.load_frame(sample,\n                                     total_frame,\n                                     sample_rate_scale=scale_t)  # T H W C\n            if len(buffer) == 0:\n                while len(buffer) == 0:\n                    warnings.warn(\n                        \"video {} not correctly loaded during training\".format(\n                            sample))\n                    index = np.random.randint(self.__len__())\n                    sample = self.dataset_samples[index]\n                    total_frame = self.total_frames[index]\n                    buffer = self.load_frame(sample,\n                                             total_frame,\n                                             sample_rate_scale=scale_t)\n\n            if args.num_sample > 1:\n                frame_list = []\n                label_list = []\n                index_list = []\n                for _ in range(args.num_sample):\n                    new_frames = self._aug_frame(buffer, args)\n                    label = self.label_array[index]\n                    frame_list.append(new_frames)\n                    label_list.append(label)\n                    index_list.append(index)\n                return frame_list, label_list, index_list, {}\n            else:\n                buffer = self._aug_frame(buffer, args)\n\n            return buffer, self.label_array[index], index, {}\n\n        elif self.mode == 'validation':\n            sample = self.dataset_samples[index]\n            total_frame = self.total_frames[index]\n            buffer = self.load_frame(sample, total_frame)\n            if len(buffer) == 0:\n                while len(buffer) == 
0:\n                    warnings.warn(\n                        \"video {} not correctly loaded during validation\".\n                        format(sample))\n                    index = np.random.randint(self.__len__())\n                    sample = self.dataset_samples[index]\n                    # refresh the frame count so it matches the resampled video\n                    total_frame = self.total_frames[index]\n                    buffer = self.load_frame(sample, total_frame)\n            buffer = self.data_transform(buffer)\n            return buffer, self.label_array[index], sample.split(\n                \"/\")[-1].split(\".\")[0]\n\n        elif self.mode == 'test':\n            sample = self.test_dataset[index]\n            total_frame = self.test_total_frames[index]\n            chunk_nb, split_nb = self.test_seg[index]\n            buffer = self.load_frame(sample, total_frame)\n\n            while len(buffer) == 0:\n                warnings.warn(\"video {}, temporal {}, spatial {} not found during testing\".format(\\\n                    str(self.test_dataset[index]), chunk_nb, split_nb))\n                index = np.random.randint(self.__len__())\n                sample = self.test_dataset[index]\n                total_frame = self.test_total_frames[index]\n                chunk_nb, split_nb = self.test_seg[index]\n                buffer = self.load_frame(sample, total_frame)\n\n            buffer = self.data_resize(buffer)\n            if isinstance(buffer, list):\n                buffer = np.stack(buffer, 0)\n\n            spatial_step = 1.0 * (max(buffer.shape[1], buffer.shape[2]) - self.short_side_size) \\\n                                / (self.test_num_crop - 1)\n            temporal_start = chunk_nb\n            spatial_start = int(split_nb * spatial_step)\n            if buffer.shape[1] >= buffer.shape[2]:\n                buffer = buffer[temporal_start::self.test_num_segment, \\\n                       spatial_start:spatial_start + self.short_side_size, :, :]\n            else:\n                buffer = buffer[temporal_start::self.test_num_segment, \\\n                       :, 
spatial_start:spatial_start + self.short_side_size, :]\n\n            buffer = self.data_transform(buffer)\n            return buffer, self.test_label_array[index], sample.split(\"/\")[-1].split(\".\")[0], \\\n                   chunk_nb, split_nb\n        else:\n            raise NameError('mode {} unknown'.format(self.mode))\n\n    def _aug_frame(\n        self,\n        buffer,\n        args,\n    ):\n\n        aug_transform = create_random_augment(\n            input_size=(self.crop_size, self.crop_size),\n            auto_augment=args.aa,\n            interpolation=args.train_interpolation,\n        )\n\n        buffer = [transforms.ToPILImage()(frame) for frame in buffer]\n\n        buffer = aug_transform(buffer)\n\n        buffer = [transforms.ToTensor()(img) for img in buffer]\n        buffer = torch.stack(buffer)  # T C H W\n        buffer = buffer.permute(0, 2, 3, 1)  # T H W C\n\n        # T H W C\n        buffer = tensor_normalize(buffer, [0.485, 0.456, 0.406],\n                                  [0.229, 0.224, 0.225])\n        # T H W C -> C T H W.\n        buffer = buffer.permute(3, 0, 1, 2)\n        # Perform data augmentation.\n        scl, asp = (\n            [0.08, 1.0],\n            [0.75, 1.3333],\n        )\n\n        buffer = spatial_sampling(\n            buffer,\n            spatial_idx=-1,\n            min_scale=256,\n            max_scale=320,\n            crop_size=self.crop_size,\n            random_horizontal_flip=False if args.data_set == 'SSV2' else True,\n            inverse_uniform_sampling=False,\n            aspect_ratio=asp,\n            scale=scl,\n            motion_shift=False)\n\n        if self.rand_erase:\n            erase_transform = RandomErasing(\n                args.reprob,\n                mode=args.remode,\n                max_count=args.recount,\n                num_splits=args.recount,\n                device=\"cpu\",\n            )\n            buffer = buffer.permute(1, 0, 2, 3)\n            buffer = 
erase_transform(buffer)\n            buffer = buffer.permute(1, 0, 2, 3)\n\n        return buffer\n\n    def load_frame(self, sample, num_frames, sample_rate_scale=1):\n        \"\"\"Load the sampled raw frames of one video through the storage client.\"\"\"\n        fname = sample\n        fname = os.path.join(self.prefix, fname)\n\n        if self.mode == 'test':\n            tick = num_frames / float(self.num_segment)\n            all_index = []\n            for t_seg in range(self.test_num_segment):\n                tmp_index = [\n                    int(t_seg * tick / self.test_num_segment + tick * x)\n                    for x in range(self.num_segment)\n                ]\n                all_index.extend(tmp_index)\n            all_index = list(np.sort(np.array(all_index)))\n            imgs = []\n            for idx in all_index:\n                frame_fname = os.path.join(fname, self.filename_tmpl.format(idx + 1))\n                img_bytes = self.client.get(frame_fname)\n                img_np = np.frombuffer(img_bytes, np.uint8)\n                img = cv2.imdecode(img_np, cv2.IMREAD_COLOR)\n                cv2.cvtColor(img, cv2.COLOR_BGR2RGB, img)\n                imgs.append(img)\n            buffer = np.array(imgs)\n            return buffer\n\n        # handle temporal segments\n        average_duration = num_frames // self.num_segment\n        all_index = []\n        if average_duration > 0:\n            if self.mode == 'validation':\n                all_index = list(\n                    np.multiply(list(range(self.num_segment)),\n                                average_duration) +\n                    np.ones(self.num_segment, dtype=int) *\n                    (average_duration // 2))\n            else:\n                all_index = list(\n                    np.multiply(list(range(self.num_segment)),\n                                average_duration) +\n                    np.random.randint(average_duration, size=self.num_segment))\n        elif num_frames > self.num_segment:\n        
    if self.mode == 'validation':\n                all_index = list(range(self.num_segment))\n            else:\n                all_index = list(\n                    np.sort(\n                        np.random.randint(num_frames, size=self.num_segment)))\n        else:\n            all_index = [0] * (self.num_segment - num_frames) + list(\n                range(num_frames))\n        all_index = list(np.array(all_index))\n        imgs = []\n        for idx in all_index:\n            frame_fname = os.path.join(fname, self.filename_tmpl.format(idx + 1))\n            img_bytes = self.client.get(frame_fname)\n            img_np = np.frombuffer(img_bytes, np.uint8)\n            img = cv2.imdecode(img_np, cv2.IMREAD_COLOR)\n            cv2.cvtColor(img, cv2.COLOR_BGR2RGB, img)\n            imgs.append(img)\n        buffer = np.array(imgs)\n        return buffer\n\n    def __len__(self):\n        if self.mode != 'test':\n            return len(self.dataset_samples)\n        else:\n            return len(self.test_dataset)\n\n\nclass SSVideoClsDataset(Dataset):\n    \"\"\"Load your own video classification dataset.\"\"\"\n\n    def __init__(self, anno_path, prefix='', split=' ', mode='train', clip_len=8,\n                crop_size=224, short_side_size=256, new_height=256,\n                new_width=340, keep_aspect_ratio=True, num_segment=1,\n                num_crop=1, test_num_segment=10, test_num_crop=3, args=None):\n        self.anno_path = anno_path\n        self.prefix = prefix\n        self.split = split\n        self.mode = mode\n        self.clip_len = clip_len\n        self.crop_size = crop_size\n        self.short_side_size = short_side_size\n        self.new_height = new_height\n        self.new_width = new_width\n        self.keep_aspect_ratio = keep_aspect_ratio\n        self.num_segment = num_segment\n        self.test_num_segment = test_num_segment\n        self.num_crop = num_crop\n        self.test_num_crop = test_num_crop\n        self.args = args\n     
   self.aug = False\n        self.rand_erase = False\n        \n        self.client = None\n        if has_client:\n            self.client = Client('~/petreloss.conf')\n\n        if self.mode in ['train']:\n            self.aug = True\n            if self.args.reprob > 0:\n                self.rand_erase = True\n        if VideoReader is None:\n            raise ImportError(\"Unable to import `decord` which is required to read videos.\")\n\n        import pandas as pd\n        cleaned = pd.read_csv(self.anno_path, header=None, delimiter=self.split)\n        self.dataset_samples = list(cleaned.values[:, 0])\n        self.label_array = list(cleaned.values[:, 1])\n\n        if (mode == 'train'):\n            pass\n\n        elif (mode == 'validation'):\n            self.data_transform = Compose([\n                Resize(self.short_side_size, interpolation='bilinear'),\n                CenterCrop(size=(self.crop_size, self.crop_size)),\n                ClipToTensor(),\n                Normalize(mean=[0.485, 0.456, 0.406],\n                                        std=[0.229, 0.224, 0.225])\n            ])\n        elif mode == 'test':\n            self.data_resize = Compose([\n                Resize(size=(short_side_size), interpolation='bilinear')\n            ])\n            self.data_transform = Compose([\n                ClipToTensor(),\n                Normalize(mean=[0.485, 0.456, 0.406],\n                                        std=[0.229, 0.224, 0.225])\n            ])\n            self.test_seg = []\n            self.test_dataset = []\n            self.test_label_array = []\n            for ck in range(self.test_num_segment):\n                for cp in range(self.test_num_crop):\n                    for idx in range(len(self.label_array)):\n                        sample_label = self.label_array[idx]\n                        self.test_label_array.append(sample_label)\n                        self.test_dataset.append(self.dataset_samples[idx])\n                 
       self.test_seg.append((ck, cp))\n\n    def __getitem__(self, index):\n        if self.mode == 'train':\n            args = self.args \n            scale_t = 1\n\n            sample = self.dataset_samples[index]\n            buffer = self.loadvideo_decord(sample, sample_rate_scale=scale_t) # T H W C\n            if len(buffer) == 0:\n                while len(buffer) == 0:\n                    warnings.warn(\"video {} not correctly loaded during training\".format(sample))\n                    index = np.random.randint(self.__len__())\n                    sample = self.dataset_samples[index]\n                    buffer = self.loadvideo_decord(sample, sample_rate_scale=scale_t)\n\n            if args.num_sample > 1:\n                frame_list = []\n                label_list = []\n                index_list = []\n                for _ in range(args.num_sample):\n                    new_frames = self._aug_frame(buffer, args)\n                    label = self.label_array[index]\n                    frame_list.append(new_frames)\n                    label_list.append(label)\n                    index_list.append(index)\n                return frame_list, label_list, index_list, {}\n            else:\n                buffer = self._aug_frame(buffer, args)\n            \n            return buffer, self.label_array[index], index, {}\n\n        elif self.mode == 'validation':\n            sample = self.dataset_samples[index]\n            buffer = self.loadvideo_decord(sample)\n            if len(buffer) == 0:\n                while len(buffer) == 0:\n                    warnings.warn(\"video {} not correctly loaded during validation\".format(sample))\n                    index = np.random.randint(self.__len__())\n                    sample = self.dataset_samples[index]\n                    buffer = self.loadvideo_decord(sample)\n            buffer = self.data_transform(buffer)\n            return buffer, self.label_array[index], 
sample.split(\"/\")[-1].split(\".\")[0]\n\n        elif self.mode == 'test':\n            sample = self.test_dataset[index]\n            chunk_nb, split_nb = self.test_seg[index]\n            buffer = self.loadvideo_decord(sample)\n\n            while len(buffer) == 0:\n                warnings.warn(\"video {}, temporal {}, spatial {} not found during testing\".format(\\\n                    str(self.test_dataset[index]), chunk_nb, split_nb))\n                index = np.random.randint(self.__len__())\n                sample = self.test_dataset[index]\n                chunk_nb, split_nb = self.test_seg[index]\n                buffer = self.loadvideo_decord(sample)\n\n            buffer = self.data_resize(buffer)\n            if isinstance(buffer, list):\n                buffer = np.stack(buffer, 0)\n\n            spatial_step = 1.0 * (max(buffer.shape[1], buffer.shape[2]) - self.short_side_size) \\\n                                / (self.test_num_crop - 1)\n            temporal_start = chunk_nb # 0/1\n            spatial_start = int(split_nb * spatial_step)\n            if buffer.shape[1] >= buffer.shape[2]:\n                buffer = buffer[temporal_start::2, \\\n                       spatial_start:spatial_start + self.short_side_size, :, :]\n            else:\n                buffer = buffer[temporal_start::2, \\\n                       :, spatial_start:spatial_start + self.short_side_size, :]\n\n            buffer = self.data_transform(buffer)\n            return buffer, self.test_label_array[index], sample.split(\"/\")[-1].split(\".\")[0], \\\n                   chunk_nb, split_nb\n        else:\n            raise NameError('mode {} unknown'.format(self.mode))\n\n    def _aug_frame(\n        self,\n        buffer,\n        args,\n    ):\n\n        aug_transform = create_random_augment(\n            input_size=(self.crop_size, self.crop_size),\n            auto_augment=args.aa,\n            interpolation=args.train_interpolation,\n        )\n\n        buffer = 
[\n            transforms.ToPILImage()(frame) for frame in buffer\n        ]\n\n        buffer = aug_transform(buffer)\n\n        buffer = [transforms.ToTensor()(img) for img in buffer]\n        buffer = torch.stack(buffer) # T C H W\n        buffer = buffer.permute(0, 2, 3, 1) # T H W C \n        \n        # T H W C \n        buffer = tensor_normalize(\n            buffer, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]\n        )\n        # T H W C -> C T H W.\n        buffer = buffer.permute(3, 0, 1, 2)\n        # Perform data augmentation.\n        scl, asp = (\n            [0.08, 1.0],\n            [0.75, 1.3333],\n        )\n\n        buffer = spatial_sampling(\n            buffer,\n            spatial_idx=-1,\n            min_scale=256,\n            max_scale=320,\n            crop_size=self.crop_size,\n            random_horizontal_flip=False if args.data_set == 'SSV2' else True,\n            inverse_uniform_sampling=False,\n            aspect_ratio=asp,\n            scale=scl,\n            motion_shift=False\n        )\n\n        if self.rand_erase:\n            erase_transform = RandomErasing(\n                args.reprob,\n                mode=args.remode,\n                max_count=args.recount,\n                num_splits=args.recount,\n                device=\"cpu\",\n            )\n            buffer = buffer.permute(1, 0, 2, 3)\n            buffer = erase_transform(buffer)\n            buffer = buffer.permute(1, 0, 2, 3)\n\n        return buffer\n\n\n    def loadvideo_decord(self, sample, sample_rate_scale=1):\n        \"\"\"Load video content using Decord\"\"\"\n        fname = sample\n        fname = os.path.join(self.prefix, fname)\n\n        try:\n            if self.keep_aspect_ratio:\n                if fname.startswith('s3'):\n                    video_bytes = self.client.get(fname)\n                    vr = VideoReader(io.BytesIO(video_bytes),\n                                     num_threads=1,\n                                     
ctx=cpu(0))\n                else:\n                    vr = VideoReader(fname, num_threads=1, ctx=cpu(0))\n            else:\n                if fname.startswith('s3:'):\n                    video_bytes = self.client.get(fname)\n                    vr = VideoReader(io.BytesIO(video_bytes),\n                                     width=self.new_width,\n                                     height=self.new_height,\n                                     num_threads=1,\n                                     ctx=cpu(0))\n                else:\n                    vr = VideoReader(fname, width=self.new_width, height=self.new_height,\n                                    num_threads=1, ctx=cpu(0))\n        except:\n            print(\"video cannot be loaded by decord: \", fname)\n            return []\n\n        if self.mode == 'test':\n            tick = len(vr) / float(self.num_segment)\n            all_index = list(np.array([int(tick / 2.0 + tick * x) for x in range(self.num_segment)] +\n                               [int(tick * x) for x in range(self.num_segment)]))\n            while len(all_index) < (self.num_segment * self.test_num_segment):\n                all_index.append(all_index[-1])\n            all_index = np.sort(np.array(all_index))\n            vr.seek(0)\n            buffer = vr.get_batch(all_index).asnumpy()\n            return buffer\n        elif self.mode == 'validation':\n            tick = len(vr) / float(self.num_segment)\n            all_index = np.array([int(tick / 2.0 + tick * x) for x in range(self.num_segment)])\n            vr.seek(0)\n            buffer = vr.get_batch(all_index).asnumpy()\n            return buffer\n\n        # handle temporal segments\n        average_duration = len(vr) // self.num_segment\n        if average_duration > 0:\n            all_index = list(np.multiply(list(range(self.num_segment)), average_duration) + np.random.randint(average_duration,\n                                                                             
                           size=self.num_segment))\n        elif len(vr) > self.num_segment:\n            all_index = list(np.sort(np.random.randint(len(vr), size=self.num_segment)))\n        else:\n            all_index = list(np.zeros((self.num_segment,)))\n        vr.seek(0)\n        buffer = vr.get_batch(all_index).asnumpy()\n        return buffer\n\n    def __len__(self):\n        if self.mode != 'test':\n            return len(self.dataset_samples)\n        else:\n            return len(self.test_dataset)\n\n\ndef spatial_sampling(\n    frames,\n    spatial_idx=-1,\n    min_scale=256,\n    max_scale=320,\n    crop_size=224,\n    random_horizontal_flip=True,\n    inverse_uniform_sampling=False,\n    aspect_ratio=None,\n    scale=None,\n    motion_shift=False,\n):\n    \"\"\"\n    Perform spatial sampling on the given video frames. If spatial_idx is\n    -1, perform random scale, random crop, and random flip on the given\n    frames. If spatial_idx is 0, 1, or 2, perform spatial uniform sampling\n    with the given spatial_idx.\n    Args:\n        frames (tensor): frames of images sampled from the video. The\n            dimension is `num frames` x `height` x `width` x `channel`.\n        spatial_idx (int): if -1, perform random spatial sampling. If 0, 1,\n            or 2, perform left, center, right crop if width is larger than\n            height, and perform top, center, bottom crop if height is larger\n            than width.\n        min_scale (int): the minimal size of scaling.\n        max_scale (int): the maximal size of scaling.\n        crop_size (int): the size of height and width used to crop the\n            frames.\n        inverse_uniform_sampling (bool): if True, sample uniformly in\n            [1 / max_scale, 1 / min_scale] and take a reciprocal to get the\n            scale. 
If False, take a uniform sample from [min_scale,\n            max_scale].\n        aspect_ratio (list): Aspect ratio range for resizing.\n        scale (list): Scale range for resizing.\n        motion_shift (bool): Whether to apply motion shift for resizing.\n    Returns:\n        frames (tensor): spatially sampled frames.\n    \"\"\"\n    assert spatial_idx in [-1, 0, 1, 2]\n    if spatial_idx == -1:\n        if aspect_ratio is None and scale is None:\n            frames, _ = random_short_side_scale_jitter(\n                images=frames,\n                min_size=min_scale,\n                max_size=max_scale,\n                inverse_uniform_sampling=inverse_uniform_sampling,\n            )\n            frames, _ = random_crop(frames, crop_size)\n        else:\n            transform_func = (\n                random_resized_crop_with_shift\n                if motion_shift\n                else random_resized_crop\n            )\n            frames = transform_func(\n                images=frames,\n                target_height=crop_size,\n                target_width=crop_size,\n                scale=scale,\n                ratio=aspect_ratio,\n            )\n        if random_horizontal_flip:\n            frames, _ = horizontal_flip(0.5, frames)\n    else:\n        # The testing is deterministic and no jitter should be performed.\n        # min_scale, max_scale, and crop_size are expected to be the same.\n        assert len({min_scale, max_scale, crop_size}) == 1\n        frames, _ = random_short_side_scale_jitter(\n            frames, min_scale, max_scale\n        )\n        frames, _ = uniform_crop(frames, crop_size, spatial_idx)\n    return frames\n\n\ndef tensor_normalize(tensor, mean, std):\n    \"\"\"\n    Normalize a given tensor by subtracting the mean and dividing the std.\n    Args:\n        tensor (tensor): tensor to normalize.\n        mean (tensor or list): mean value to subtract.\n        std (tensor or list): std to divide.\n    \"\"\"\n    if 
tensor.dtype == torch.uint8:\n        tensor = tensor.float()\n        tensor = tensor / 255.0\n    if isinstance(mean, list):\n        mean = torch.tensor(mean)\n    if isinstance(std, list):\n        std = torch.tensor(std)\n    tensor = tensor - mean\n    tensor = tensor / std\n    return tensor\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/transforms.py",
    "content": "import torch\nimport torchvision.transforms.functional as F\nimport warnings\nimport random\nimport numpy as np\nimport torchvision\nfrom PIL import Image, ImageOps\nimport numbers\n\n\nclass GroupRandomCrop(object):\n    def __init__(self, size):\n        if isinstance(size, numbers.Number):\n            self.size = (int(size), int(size))\n        else:\n            self.size = size\n\n    def __call__(self, img_tuple):\n        img_group, label = img_tuple\n        \n        w, h = img_group[0].size\n        th, tw = self.size\n\n        out_images = list()\n\n        x1 = random.randint(0, w - tw)\n        y1 = random.randint(0, h - th)\n\n        for img in img_group:\n            assert(img.size[0] == w and img.size[1] == h)\n            if w == tw and h == th:\n                out_images.append(img)\n            else:\n                out_images.append(img.crop((x1, y1, x1 + tw, y1 + th)))\n\n        return (out_images, label)\n\n\nclass GroupCenterCrop(object):\n    def __init__(self, size):\n        self.worker = torchvision.transforms.CenterCrop(size)\n\n    def __call__(self, img_tuple):\n        img_group, label = img_tuple\n        return ([self.worker(img) for img in img_group], label)\n\n\nclass GroupRandomHorizontalFlip(object):\n    def __init__(self, flip=False):\n        self.flip = flip\n\n    def __call__(self, img_tuple):\n        v = random.random()\n        if self.flip and v < 0.5:\n            img_group, label = img_tuple\n            ret = [img.transpose(Image.FLIP_LEFT_RIGHT) for img in img_group]\n            return (ret, label)\n        else:\n            return img_tuple\n\n\nclass GroupNormalize(object):\n    def __init__(self, mean, std):\n        self.mean = mean\n        self.std = std\n\n    def __call__(self, tensor_tuple):\n        tensor, label = tensor_tuple\n        rep_mean = self.mean * (tensor.size()[0]//len(self.mean))\n        rep_std = self.std * (tensor.size()[0]//len(self.std))\n        \n        # 
TODO: make efficient\n        for t, m, s in zip(tensor, rep_mean, rep_std):\n            t.sub_(m).div_(s)\n\n        return (tensor,label)\n\n\nclass GroupGrayScale(object):\n    def __init__(self, size):\n        self.worker = torchvision.transforms.Grayscale(size)\n\n    def __call__(self, img_tuple):\n        img_group, label = img_tuple\n        return ([self.worker(img) for img in img_group], label)\n\n\nclass GroupColorJitter(object):\n    def __init__(self, size):\n        self.worker = torchvision.transforms.ColorJitter(\n            brightness=size, contrast=size, saturation=size\n        )\n\n    def __call__(self, img_tuple):\n        img_group, label = img_tuple\n        return ([self.worker(img) for img in img_group], label)\n\n    \nclass GroupScale(object):\n    \"\"\" Rescales the input PIL.Image to the given 'size'.\n    'size' will be the size of the smaller edge.\n    For example, if height > width, then image will be\n    rescaled to (size * height / width, size)\n    size: size of the smaller edge\n    interpolation: Default: PIL.Image.BILINEAR\n    \"\"\"\n\n    def __init__(self, size, interpolation=Image.BILINEAR):\n        self.worker = torchvision.transforms.Resize(size, interpolation)\n\n    def __call__(self, img_tuple):\n        img_group, label = img_tuple\n        return ([self.worker(img) for img in img_group], label)\n\n\nclass GroupMultiScaleCrop(object):\n\n    def __init__(self, input_size, scales=None, max_distort=1, fix_crop=True, more_fix_crop=True):\n        self.scales = scales if scales is not None else [1, .875, .75, .66]\n        self.max_distort = max_distort\n        self.fix_crop = fix_crop\n        self.more_fix_crop = more_fix_crop\n        self.input_size = input_size if not isinstance(input_size, int) else [input_size, input_size]\n        self.interpolation = Image.BILINEAR\n\n    def __call__(self, img_tuple):\n        img_group, label = img_tuple\n        \n        im_size = img_group[0].size\n\n        crop_w, 
crop_h, offset_w, offset_h = self._sample_crop_size(im_size)\n        crop_img_group = [img.crop((offset_w, offset_h, offset_w + crop_w, offset_h + crop_h)) for img in img_group]\n        ret_img_group = [img.resize((self.input_size[0], self.input_size[1]), self.interpolation) for img in crop_img_group]\n        return (ret_img_group, label)\n\n    def _sample_crop_size(self, im_size):\n        image_w, image_h = im_size[0], im_size[1]\n\n        # find a crop size\n        base_size = min(image_w, image_h)\n        crop_sizes = [int(base_size * x) for x in self.scales]\n        crop_h = [self.input_size[1] if abs(x - self.input_size[1]) < 3 else x for x in crop_sizes]\n        crop_w = [self.input_size[0] if abs(x - self.input_size[0]) < 3 else x for x in crop_sizes]\n\n        pairs = []\n        for i, h in enumerate(crop_h):\n            for j, w in enumerate(crop_w):\n                if abs(i - j) <= self.max_distort:\n                    pairs.append((w, h))\n\n        crop_pair = random.choice(pairs)\n        if not self.fix_crop:\n            w_offset = random.randint(0, image_w - crop_pair[0])\n            h_offset = random.randint(0, image_h - crop_pair[1])\n        else:\n            w_offset, h_offset = self._sample_fix_offset(image_w, image_h, crop_pair[0], crop_pair[1])\n\n        return crop_pair[0], crop_pair[1], w_offset, h_offset\n\n    def _sample_fix_offset(self, image_w, image_h, crop_w, crop_h):\n        offsets = self.fill_fix_offset(self.more_fix_crop, image_w, image_h, crop_w, crop_h)\n        return random.choice(offsets)\n\n    @staticmethod\n    def fill_fix_offset(more_fix_crop, image_w, image_h, crop_w, crop_h):\n        w_step = (image_w - crop_w) // 4\n        h_step = (image_h - crop_h) // 4\n\n        ret = list()\n        ret.append((0, 0))  # upper left\n        ret.append((4 * w_step, 0))  # upper right\n        ret.append((0, 4 * h_step))  # lower left\n        ret.append((4 * w_step, 4 * h_step))  # lower right\n        
ret.append((2 * w_step, 2 * h_step))  # center\n\n        if more_fix_crop:\n            ret.append((0, 2 * h_step))  # center left\n            ret.append((4 * w_step, 2 * h_step))  # center right\n            ret.append((2 * w_step, 4 * h_step))  # lower center\n            ret.append((2 * w_step, 0 * h_step))  # upper center\n\n            ret.append((1 * w_step, 1 * h_step))  # upper left quarter\n            ret.append((3 * w_step, 1 * h_step))  # upper right quarter\n            ret.append((1 * w_step, 3 * h_step))  # lower left quarter\n            ret.append((3 * w_step, 3 * h_step))  # lower right quarter\n        return ret\n\n\nclass Stack(object):\n\n    def __init__(self, roll=False):\n        self.roll = roll\n\n    def __call__(self, img_tuple):\n        img_group, label = img_tuple\n        \n        if img_group[0].mode == 'L':\n            return (np.concatenate([np.expand_dims(x, 2) for x in img_group], axis=2), label)\n        elif img_group[0].mode == 'RGB':\n            if self.roll:\n                return (np.concatenate([np.array(x)[:, :, ::-1] for x in img_group], axis=2), label)\n            else:\n                return (np.concatenate(img_group, axis=2), label)\n\n\nclass ToTorchFormatTensor(object):\n    \"\"\" Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255]\n    to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] \"\"\"\n    def __init__(self, div=True):\n        self.div = div\n\n    def __call__(self, pic_tuple):\n        pic, label = pic_tuple\n        \n        if isinstance(pic, np.ndarray):\n            # handle numpy array\n            img = torch.from_numpy(pic).permute(2, 0, 1).contiguous()\n        else:\n            # handle PIL Image\n            img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))\n            img = img.view(pic.size[1], pic.size[0], len(pic.mode))\n            # put it from HWC to CHW format\n            # yikes, this transpose takes 80% of the 
loading time/CPU\n            img = img.transpose(0, 1).transpose(0, 2).contiguous()\n        return (img.float().div(255.) if self.div else img.float(), label)\n\n\nclass IdentityTransform(object):\n\n    def __call__(self, data):\n        return data\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/video_transforms.py",
    "content": "#!/usr/bin/env python3\nimport math\nimport numpy as np\nimport random\nimport torch\nimport torchvision.transforms.functional as F\nfrom PIL import Image\nfrom torchvision import transforms\n\nfrom .rand_augment import rand_augment_transform\nfrom .random_erasing import RandomErasing\n\nimport numbers\nimport PIL\nimport torchvision\n\nimport vbench.third_party.umt.functional as FF\n\n_pil_interpolation_to_str = {\n    Image.NEAREST: \"PIL.Image.NEAREST\",\n    Image.BILINEAR: \"PIL.Image.BILINEAR\",\n    Image.BICUBIC: \"PIL.Image.BICUBIC\",\n    Image.LANCZOS: \"PIL.Image.LANCZOS\",\n    Image.HAMMING: \"PIL.Image.HAMMING\",\n    Image.BOX: \"PIL.Image.BOX\",\n}\n\n\n_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)\n\n\ndef _pil_interp(method):\n    if method == \"bicubic\":\n        return Image.BICUBIC\n    elif method == \"lanczos\":\n        return Image.LANCZOS\n    elif method == \"hamming\":\n        return Image.HAMMING\n    else:\n        return Image.BILINEAR\n\n\ndef random_short_side_scale_jitter(\n    images, min_size, max_size, boxes=None, inverse_uniform_sampling=False\n):\n    \"\"\"\n    Perform a spatial short scale jittering on the given images and\n    corresponding boxes.\n    Args:\n        images (tensor): images to perform scale jitter. Dimension is\n            `num frames` x `channel` x `height` x `width`.\n        min_size (int): the minimal size to scale the frames.\n        max_size (int): the maximal size to scale the frames.\n        boxes (ndarray): optional. Corresponding boxes to images.\n            Dimension is `num boxes` x 4.\n        inverse_uniform_sampling (bool): if True, sample uniformly in\n            [1 / max_scale, 1 / min_scale] and take a reciprocal to get the\n            scale. 
If False, take a uniform sample from [min_scale, max_scale].\n    Returns:\n        (tensor): the scaled images with dimension of\n            `num frames` x `channel` x `new height` x `new width`.\n        (ndarray or None): the scaled boxes with dimension of\n            `num boxes` x 4.\n    \"\"\"\n    if inverse_uniform_sampling:\n        size = int(\n            round(1.0 / np.random.uniform(1.0 / max_size, 1.0 / min_size))\n        )\n    else:\n        size = int(round(np.random.uniform(min_size, max_size)))\n\n    height = images.shape[2]\n    width = images.shape[3]\n    if (width <= height and width == size) or (\n        height <= width and height == size\n    ):\n        return images, boxes\n    new_width = size\n    new_height = size\n    if width < height:\n        new_height = int(math.floor((float(height) / width) * size))\n        if boxes is not None:\n            boxes = boxes * float(new_height) / height\n    else:\n        new_width = int(math.floor((float(width) / height) * size))\n        if boxes is not None:\n            boxes = boxes * float(new_width) / width\n\n    return (\n        torch.nn.functional.interpolate(\n            images,\n            size=(new_height, new_width),\n            mode=\"bilinear\",\n            align_corners=False,\n        ),\n        boxes,\n    )\n\n\ndef crop_boxes(boxes, x_offset, y_offset):\n    \"\"\"\n    Perform crop on the bounding boxes given the offsets.\n    Args:\n        boxes (ndarray or None): bounding boxes to perform crop. 
The dimension\n            is `num boxes` x 4.\n        x_offset (int): cropping offset in the x axis.\n        y_offset (int): cropping offset in the y axis.\n    Returns:\n        cropped_boxes (ndarray or None): the cropped boxes with dimension of\n            `num boxes` x 4.\n    \"\"\"\n    cropped_boxes = boxes.copy()\n    cropped_boxes[:, [0, 2]] = boxes[:, [0, 2]] - x_offset\n    cropped_boxes[:, [1, 3]] = boxes[:, [1, 3]] - y_offset\n\n    return cropped_boxes\n\n\ndef random_crop(images, size, boxes=None):\n    \"\"\"\n    Perform random spatial crop on the given images and corresponding boxes.\n    Args:\n        images (tensor): images to perform random crop. The dimension is\n            `num frames` x `channel` x `height` x `width`.\n        size (int): the size of height and width to crop on the image.\n        boxes (ndarray or None): optional. Corresponding boxes to images.\n            Dimension is `num boxes` x 4.\n    Returns:\n        cropped (tensor): cropped images with dimension of\n            `num frames` x `channel` x `size` x `size`.\n        cropped_boxes (ndarray or None): the cropped boxes with dimension of\n            `num boxes` x 4.\n    \"\"\"\n    if images.shape[2] == size and images.shape[3] == size:\n        # Return a (images, boxes) pair so callers can always unpack two values.\n        return images, boxes\n    height = images.shape[2]\n    width = images.shape[3]\n    y_offset = 0\n    if height > size:\n        y_offset = int(np.random.randint(0, height - size))\n    x_offset = 0\n    if width > size:\n        x_offset = int(np.random.randint(0, width - size))\n    cropped = images[\n        :, :, y_offset : y_offset + size, x_offset : x_offset + size\n    ]\n\n    cropped_boxes = (\n        crop_boxes(boxes, x_offset, y_offset) if boxes is not None else None\n    )\n\n    return cropped, cropped_boxes\n\n\ndef horizontal_flip(prob, images, boxes=None):\n    \"\"\"\n    Perform horizontal flip on the given images and corresponding boxes.\n    Args:\n        prob (float): probability to flip the images.\n      
  images (tensor): images to perform horizontal flip, the dimension is\n            `num frames` x `channel` x `height` x `width`.\n        boxes (ndarray or None): optional. Corresponding boxes to images.\n            Dimension is `num boxes` x 4.\n    Returns:\n        images (tensor): images with dimension of\n            `num frames` x `channel` x `height` x `width`.\n        flipped_boxes (ndarray or None): the flipped boxes with dimension of\n            `num boxes` x 4.\n    \"\"\"\n    if boxes is None:\n        flipped_boxes = None\n    else:\n        flipped_boxes = boxes.copy()\n\n    if np.random.uniform() < prob:\n        images = images.flip((-1))\n\n        if len(images.shape) == 3:\n            width = images.shape[2]\n        elif len(images.shape) == 4:\n            width = images.shape[3]\n        else:\n            raise NotImplementedError(\"Dimension is not supported\")\n        if boxes is not None:\n            flipped_boxes[:, [0, 2]] = width - boxes[:, [2, 0]] - 1\n\n    return images, flipped_boxes\n\n\ndef uniform_crop(images, size, spatial_idx, boxes=None, scale_size=None):\n    \"\"\"\n    Perform uniform spatial sampling on the images and corresponding boxes.\n    Args:\n        images (tensor): images to perform uniform crop. The dimension is\n            `num frames` x `channel` x `height` x `width`.\n        size (int): size of height and width to crop the images.\n        spatial_idx (int): 0, 1, or 2 for left, center, and right crop if width\n            is larger than height. Or 0, 1, or 2 for top, center, and bottom\n            crop if height is larger than width.\n        boxes (ndarray or None): optional. Corresponding boxes to images.\n            Dimension is `num boxes` x 4.\n        scale_size (int): optional. 
If not None, resize the images to scale_size before\n            performing any crop.\n    Returns:\n        cropped (tensor): images with dimension of\n            `num frames` x `channel` x `size` x `size`.\n        cropped_boxes (ndarray or None): the cropped boxes with dimension of\n            `num boxes` x 4.\n    \"\"\"\n    assert spatial_idx in [0, 1, 2]\n    ndim = len(images.shape)\n    if ndim == 3:\n        images = images.unsqueeze(0)\n    height = images.shape[2]\n    width = images.shape[3]\n\n    if scale_size is not None:\n        if width <= height:\n            width, height = scale_size, int(height / width * scale_size)\n        else:\n            width, height = int(width / height * scale_size), scale_size\n        images = torch.nn.functional.interpolate(\n            images,\n            size=(height, width),\n            mode=\"bilinear\",\n            align_corners=False,\n        )\n\n    y_offset = int(math.ceil((height - size) / 2))\n    x_offset = int(math.ceil((width - size) / 2))\n\n    if height > width:\n        if spatial_idx == 0:\n            y_offset = 0\n        elif spatial_idx == 2:\n            y_offset = height - size\n    else:\n        if spatial_idx == 0:\n            x_offset = 0\n        elif spatial_idx == 2:\n            x_offset = width - size\n    cropped = images[\n        :, :, y_offset : y_offset + size, x_offset : x_offset + size\n    ]\n    cropped_boxes = (\n        crop_boxes(boxes, x_offset, y_offset) if boxes is not None else None\n    )\n    if ndim == 3:\n        cropped = cropped.squeeze(0)\n    return cropped, cropped_boxes\n\n\ndef clip_boxes_to_image(boxes, height, width):\n    \"\"\"\n    Clip an array of boxes to an image with the given height and width.\n    Args:\n        boxes (ndarray): bounding boxes to perform clipping.\n            Dimension is `num boxes` x 4.\n        height (int): given image height.\n        width (int): given image width.\n    Returns:\n        clipped_boxes (ndarray): 
the clipped boxes with dimension of\n            `num boxes` x 4.\n    \"\"\"\n    clipped_boxes = boxes.copy()\n    clipped_boxes[:, [0, 2]] = np.minimum(\n        width - 1.0, np.maximum(0.0, boxes[:, [0, 2]])\n    )\n    clipped_boxes[:, [1, 3]] = np.minimum(\n        height - 1.0, np.maximum(0.0, boxes[:, [1, 3]])\n    )\n    return clipped_boxes\n\n\ndef blend(images1, images2, alpha):\n    \"\"\"\n    Blend two images with a given weight alpha.\n    Args:\n        images1 (tensor): the first images to be blended, the dimension is\n            `num frames` x `channel` x `height` x `width`.\n        images2 (tensor): the second images to be blended, the dimension is\n            `num frames` x `channel` x `height` x `width`.\n        alpha (float): the blending weight.\n    Returns:\n        (tensor): blended images, the dimension is\n            `num frames` x `channel` x `height` x `width`.\n    \"\"\"\n    return images1 * alpha + images2 * (1 - alpha)\n\n\ndef grayscale(images):\n    \"\"\"\n    Get the grayscale for the input images. The channels of images should be\n    in order BGR.\n    Args:\n        images (tensor): the input images for getting grayscale. Dimension is\n            `num frames` x `channel` x `height` x `width`.\n    Returns:\n        img_gray (tensor): grayscale images, the dimension is\n            `num frames` x `channel` x `height` x `width`.\n    \"\"\"\n    # R -> 0.299, G -> 0.587, B -> 0.114.\n    # Copy the input so the caller's tensor is not modified in place.\n    img_gray = images.clone().detach()\n    gray_channel = (\n        0.299 * images[:, 2] + 0.587 * images[:, 1] + 0.114 * images[:, 0]\n    )\n    img_gray[:, 0] = gray_channel\n    img_gray[:, 1] = gray_channel\n    img_gray[:, 2] = gray_channel\n    return img_gray\n\n\ndef color_jitter(images, img_brightness=0, img_contrast=0, img_saturation=0):\n    \"\"\"\n    Perform color jittering on the input images. The channels of images\n    should be in order BGR.\n    Args:\n        images (tensor): images to perform color jitter. 
Dimension is\n            `num frames` x `channel` x `height` x `width`.\n        img_brightness (float): jitter ratio for brightness.\n        img_contrast (float): jitter ratio for contrast.\n        img_saturation (float): jitter ratio for saturation.\n    Returns:\n        images (tensor): the jittered images, the dimension is\n            `num frames` x `channel` x `height` x `width`.\n    \"\"\"\n\n    jitter = []\n    if img_brightness != 0:\n        jitter.append(\"brightness\")\n    if img_contrast != 0:\n        jitter.append(\"contrast\")\n    if img_saturation != 0:\n        jitter.append(\"saturation\")\n\n    if len(jitter) > 0:\n        order = np.random.permutation(np.arange(len(jitter)))\n        for idx in range(0, len(jitter)):\n            if jitter[order[idx]] == \"brightness\":\n                images = brightness_jitter(img_brightness, images)\n            elif jitter[order[idx]] == \"contrast\":\n                images = contrast_jitter(img_contrast, images)\n            elif jitter[order[idx]] == \"saturation\":\n                images = saturation_jitter(img_saturation, images)\n    return images\n\n\ndef brightness_jitter(var, images):\n    \"\"\"\n    Perform brightness jittering on the input images. The channels of images\n    should be in order BGR.\n    Args:\n        var (float): jitter ratio for brightness.\n        images (tensor): images to perform color jitter. Dimension is\n            `num frames` x `channel` x `height` x `width`.\n    Returns:\n        images (tensor): the jittered images, the dimension is\n            `num frames` x `channel` x `height` x `width`.\n    \"\"\"\n    alpha = 1.0 + np.random.uniform(-var, var)\n\n    # zeros_like keeps the device of the input images.\n    img_bright = torch.zeros_like(images)\n    images = blend(images, img_bright, alpha)\n    return images\n\n\ndef contrast_jitter(var, images):\n    \"\"\"\n    Perform contrast jittering on the input images. 
The channels of images\n    should be in order BGR.\n    Args:\n        var (float): jitter ratio for contrast.\n        images (tensor): images to perform color jitter. Dimension is\n            `num frames` x `channel` x `height` x `width`.\n    Returns:\n        images (tensor): the jittered images, the dimension is\n            `num frames` x `channel` x `height` x `width`.\n    \"\"\"\n    alpha = 1.0 + np.random.uniform(-var, var)\n\n    img_gray = grayscale(images)\n    img_gray[:] = torch.mean(img_gray, dim=(1, 2, 3), keepdim=True)\n    images = blend(images, img_gray, alpha)\n    return images\n\n\ndef saturation_jitter(var, images):\n    \"\"\"\n    Perform saturation jittering on the input images. The channels of images\n    should be in order BGR.\n    Args:\n        var (float): jitter ratio for saturation.\n        images (tensor): images to perform color jitter. Dimension is\n            `num frames` x `channel` x `height` x `width`.\n    Returns:\n        images (tensor): the jittered images, the dimension is\n            `num frames` x `channel` x `height` x `width`.\n    \"\"\"\n    alpha = 1.0 + np.random.uniform(-var, var)\n    img_gray = grayscale(images)\n    images = blend(images, img_gray, alpha)\n\n    return images\n\n\ndef lighting_jitter(images, alphastd, eigval, eigvec):\n    \"\"\"\n    Perform AlexNet-style PCA jitter on the given images.\n    Args:\n        images (tensor): images to perform lighting jitter. 
Dimension is\n            `num frames` x `channel` x `height` x `width`.\n        alphastd (float): jitter ratio for PCA jitter.\n        eigval (list): eigenvalues for PCA jitter.\n        eigvec (list[list]): eigenvectors for PCA jitter.\n    Returns:\n        out_images (tensor): the jittered images, the dimension is\n            `num frames` x `channel` x `height` x `width`.\n    \"\"\"\n    if alphastd == 0:\n        return images\n    # generate alpha1, alpha2, alpha3.\n    alpha = np.random.normal(0, alphastd, size=(1, 3))\n    eig_vec = np.array(eigvec)\n    eig_val = np.reshape(eigval, (1, 3))\n    rgb = np.sum(\n        eig_vec * np.repeat(alpha, 3, axis=0) * np.repeat(eig_val, 3, axis=0),\n        axis=1,\n    )\n    out_images = torch.zeros_like(images)\n    if len(images.shape) == 3:\n        # C H W\n        channel_dim = 0\n    elif len(images.shape) == 4:\n        # T C H W\n        channel_dim = 1\n    else:\n        raise NotImplementedError(f\"Unsupported dimension {len(images.shape)}\")\n\n    for idx in range(images.shape[channel_dim]):\n        # C H W\n        if len(images.shape) == 3:\n            out_images[idx] = images[idx] + rgb[2 - idx]\n        # T C H W\n        elif len(images.shape) == 4:\n            out_images[:, idx] = images[:, idx] + rgb[2 - idx]\n        else:\n            raise NotImplementedError(\n                f\"Unsupported dimension {len(images.shape)}\"\n            )\n\n    return out_images\n\n\ndef color_normalization(images, mean, stddev):\n    \"\"\"\n    Perform color normalization on the given images.\n    Args:\n        images (tensor): images to perform color normalization. 
Dimension is\n            `num frames` x `channel` x `height` x `width`.\n        mean (list): mean values for normalization.\n        stddev (list): standard deviations for normalization.\n\n    Returns:\n        out_images (tensor): the normalized images, the dimension is\n            `num frames` x `channel` x `height` x `width`.\n    \"\"\"\n    if len(images.shape) == 3:\n        assert (\n            len(mean) == images.shape[0]\n        ), \"channel mean not computed properly\"\n        assert (\n            len(stddev) == images.shape[0]\n        ), \"channel stddev not computed properly\"\n    elif len(images.shape) == 4:\n        assert (\n            len(mean) == images.shape[1]\n        ), \"channel mean not computed properly\"\n        assert (\n            len(stddev) == images.shape[1]\n        ), \"channel stddev not computed properly\"\n    else:\n        raise NotImplementedError(f\"Unsupported dimension {len(images.shape)}\")\n\n    out_images = torch.zeros_like(images)\n    for idx in range(len(mean)):\n        # C H W\n        if len(images.shape) == 3:\n            out_images[idx] = (images[idx] - mean[idx]) / stddev[idx]\n        elif len(images.shape) == 4:\n            out_images[:, idx] = (images[:, idx] - mean[idx]) / stddev[idx]\n        else:\n            raise NotImplementedError(\n                f\"Unsupported dimension {len(images.shape)}\"\n            )\n    return out_images\n\n\ndef _get_param_spatial_crop(\n    scale, ratio, height, width, num_repeat=10, log_scale=True, switch_hw=False\n):\n    \"\"\"\n    Given scale, ratio, height and width, return sampled coordinates of the videos.\n    \"\"\"\n    for _ in range(num_repeat):\n        area = height * width\n        target_area = random.uniform(*scale) * area\n        if log_scale:\n            log_ratio = (math.log(ratio[0]), math.log(ratio[1]))\n            aspect_ratio = math.exp(random.uniform(*log_ratio))\n        else:\n            aspect_ratio = 
random.uniform(*ratio)\n\n        w = int(round(math.sqrt(target_area * aspect_ratio)))\n        h = int(round(math.sqrt(target_area / aspect_ratio)))\n\n        if np.random.uniform() < 0.5 and switch_hw:\n            w, h = h, w\n\n        if 0 < w <= width and 0 < h <= height:\n            i = random.randint(0, height - h)\n            j = random.randint(0, width - w)\n            return i, j, h, w\n\n    # Fallback to central crop\n    in_ratio = float(width) / float(height)\n    if in_ratio < min(ratio):\n        w = width\n        h = int(round(w / min(ratio)))\n    elif in_ratio > max(ratio):\n        h = height\n        w = int(round(h * max(ratio)))\n    else:  # whole image\n        w = width\n        h = height\n    i = (height - h) // 2\n    j = (width - w) // 2\n    return i, j, h, w\n\n\ndef random_resized_crop(\n    images,\n    target_height,\n    target_width,\n    scale=(0.8, 1.0),\n    ratio=(3.0 / 4.0, 4.0 / 3.0),\n):\n    \"\"\"\n    Crop the given images to a random size and aspect ratio. A crop of random\n    size (default: 0.8 to 1.0 of the original size) and a random aspect\n    ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. This\n    crop is finally resized to the given size. 
This is popularly used to train the\n    Inception networks.\n\n    Args:\n        images: Images to perform resizing and cropping.\n        target_height: Desired height after cropping.\n        target_width: Desired width after cropping.\n        scale: Scale range of Inception-style area based random resizing.\n        ratio: Aspect ratio range of Inception-style area based random resizing.\n    \"\"\"\n\n    height = images.shape[2]\n    width = images.shape[3]\n\n    i, j, h, w = _get_param_spatial_crop(scale, ratio, height, width)\n    cropped = images[:, :, i : i + h, j : j + w]\n    return torch.nn.functional.interpolate(\n        cropped,\n        size=(target_height, target_width),\n        mode=\"bilinear\",\n        align_corners=False,\n    )\n\n\ndef random_resized_crop_with_shift(\n    images,\n    target_height,\n    target_width,\n    scale=(0.8, 1.0),\n    ratio=(3.0 / 4.0, 4.0 / 3.0),\n):\n    \"\"\"\n    This is similar to random_resized_crop. However, it samples two different\n    boxes (for cropping) for the first and last frame. 
It then linearly\n    interpolates the two boxes for other frames.\n\n    Args:\n        images: Images to perform resizing and cropping.\n        target_height: Desired height after cropping.\n        target_width: Desired width after cropping.\n        scale: Scale range of Inception-style area based random resizing.\n        ratio: Aspect ratio range of Inception-style area based random resizing.\n    \"\"\"\n    t = images.shape[1]\n    height = images.shape[2]\n    width = images.shape[3]\n\n    i, j, h, w = _get_param_spatial_crop(scale, ratio, height, width)\n    i_, j_, h_, w_ = _get_param_spatial_crop(scale, ratio, height, width)\n    i_s = [int(i) for i in torch.linspace(i, i_, steps=t).tolist()]\n    j_s = [int(i) for i in torch.linspace(j, j_, steps=t).tolist()]\n    h_s = [int(i) for i in torch.linspace(h, h_, steps=t).tolist()]\n    w_s = [int(i) for i in torch.linspace(w, w_, steps=t).tolist()]\n    out = torch.zeros((3, t, target_height, target_width))\n    for ind in range(t):\n        out[:, ind : ind + 1, :, :] = torch.nn.functional.interpolate(\n            images[\n                :,\n                ind : ind + 1,\n                i_s[ind] : i_s[ind] + h_s[ind],\n                j_s[ind] : j_s[ind] + w_s[ind],\n            ],\n            size=(target_height, target_width),\n            mode=\"bilinear\",\n            align_corners=False,\n        )\n    return out\n\n\ndef create_random_augment(\n    input_size,\n    auto_augment=None,\n    interpolation=\"bilinear\",\n):\n    \"\"\"\n    Get video randaug transform.\n\n    Args:\n        input_size: The size of the input video in tuple.\n        auto_augment: Parameters for randaug. 
An example:\n            \"rand-m7-n4-mstd0.5-inc1\" (m is the magnitude and n is the number\n            of operations to apply).\n        interpolation: Interpolation method.\n    \"\"\"\n    if isinstance(input_size, tuple):\n        img_size = input_size[-2:]\n    else:\n        img_size = input_size\n\n    if auto_augment:\n        assert isinstance(auto_augment, str)\n        if isinstance(img_size, tuple):\n            img_size_min = min(img_size)\n        else:\n            img_size_min = img_size\n        aa_params = {\"translate_const\": int(img_size_min * 0.45)}\n        if interpolation and interpolation != \"random\":\n            aa_params[\"interpolation\"] = _pil_interp(interpolation)\n        if auto_augment.startswith(\"rand\"):\n            return transforms.Compose(\n                [rand_augment_transform(auto_augment, aa_params)]\n            )\n    raise NotImplementedError\n\n\ndef random_sized_crop_img(\n    im,\n    size,\n    jitter_scale=(0.08, 1.0),\n    jitter_aspect=(3.0 / 4.0, 4.0 / 3.0),\n    max_iter=10,\n):\n    \"\"\"\n    Performs Inception-style cropping (used for training).\n    \"\"\"\n    assert (\n        len(im.shape) == 3\n    ), \"Currently only support image for random_sized_crop\"\n    h, w = im.shape[1:3]\n    i, j, h, w = _get_param_spatial_crop(\n        scale=jitter_scale,\n        ratio=jitter_aspect,\n        height=h,\n        width=w,\n        num_repeat=max_iter,\n        log_scale=False,\n        switch_hw=True,\n    )\n    cropped = im[:, i : i + h, j : j + w]\n    return torch.nn.functional.interpolate(\n        cropped.unsqueeze(0),\n        size=(size, size),\n        mode=\"bilinear\",\n        align_corners=False,\n    ).squeeze(0)\n\n\n# The following code is modified from the timm library; we will replace it\n# with a dependency on PyTorchVideo.\n# https://github.com/facebookresearch/pytorchvideo\nclass RandomResizedCropAndInterpolation:\n    \"\"\"Crop the given PIL Image to random 
size and aspect ratio with random interpolation.\n    A crop of random size (default: of 0.08 to 1.0) of the original size and a random\n    aspect ratio (default: of 3/4 to 4/3) of the original aspect ratio is made. This crop\n    is finally resized to given size.\n    This is popularly used to train the Inception networks.\n    Args:\n        size: expected output size of each edge\n        scale: range of size of the origin size cropped\n        ratio: range of aspect ratio of the origin aspect ratio cropped\n        interpolation: Default: PIL.Image.BILINEAR\n    \"\"\"\n\n    def __init__(\n        self,\n        size,\n        scale=(0.08, 1.0),\n        ratio=(3.0 / 4.0, 4.0 / 3.0),\n        interpolation=\"bilinear\",\n    ):\n        if isinstance(size, tuple):\n            self.size = size\n        else:\n            self.size = (size, size)\n        if (scale[0] > scale[1]) or (ratio[0] > ratio[1]):\n            print(\"range should be of kind (min, max)\")\n\n        if interpolation == \"random\":\n            self.interpolation = _RANDOM_INTERPOLATION\n        else:\n            self.interpolation = _pil_interp(interpolation)\n        self.scale = scale\n        self.ratio = ratio\n\n    @staticmethod\n    def get_params(img, scale, ratio):\n        \"\"\"Get parameters for ``crop`` for a random sized crop.\n        Args:\n            img (PIL Image): Image to be cropped.\n            scale (tuple): range of size of the origin size cropped\n            ratio (tuple): range of aspect ratio of the origin aspect ratio cropped\n        Returns:\n            tuple: params (i, j, h, w) to be passed to ``crop`` for a random\n                sized crop.\n        \"\"\"\n        area = img.size[0] * img.size[1]\n\n        for _ in range(10):\n            target_area = random.uniform(*scale) * area\n            log_ratio = (math.log(ratio[0]), math.log(ratio[1]))\n            aspect_ratio = math.exp(random.uniform(*log_ratio))\n\n            w = 
int(round(math.sqrt(target_area * aspect_ratio)))\n            h = int(round(math.sqrt(target_area / aspect_ratio)))\n\n            if w <= img.size[0] and h <= img.size[1]:\n                i = random.randint(0, img.size[1] - h)\n                j = random.randint(0, img.size[0] - w)\n                return i, j, h, w\n\n        # Fallback to central crop\n        in_ratio = img.size[0] / img.size[1]\n        if in_ratio < min(ratio):\n            w = img.size[0]\n            h = int(round(w / min(ratio)))\n        elif in_ratio > max(ratio):\n            h = img.size[1]\n            w = int(round(h * max(ratio)))\n        else:  # whole image\n            w = img.size[0]\n            h = img.size[1]\n        i = (img.size[1] - h) // 2\n        j = (img.size[0] - w) // 2\n        return i, j, h, w\n\n    def __call__(self, img):\n        \"\"\"\n        Args:\n            img (PIL Image): Image to be cropped and resized.\n        Returns:\n            PIL Image: Randomly cropped and resized image.\n        \"\"\"\n        i, j, h, w = self.get_params(img, self.scale, self.ratio)\n        if isinstance(self.interpolation, (tuple, list)):\n            interpolation = random.choice(self.interpolation)\n        else:\n            interpolation = self.interpolation\n        return F.resized_crop(img, i, j, h, w, self.size, interpolation)\n\n    def __repr__(self):\n        if isinstance(self.interpolation, (tuple, list)):\n            interpolate_str = \" \".join(\n                [_pil_interpolation_to_str[x] for x in self.interpolation]\n            )\n        else:\n            interpolate_str = _pil_interpolation_to_str[self.interpolation]\n        format_string = self.__class__.__name__ + \"(size={0}\".format(self.size)\n        format_string += \", scale={0}\".format(\n            tuple(round(s, 4) for s in self.scale)\n        )\n        format_string += \", ratio={0}\".format(\n            tuple(round(r, 4) for r in self.ratio)\n        )\n        format_string 
+= \", interpolation={0})\".format(interpolate_str)\n        return format_string\n\n\ndef transforms_imagenet_train(\n    img_size=224,\n    scale=None,\n    ratio=None,\n    hflip=0.5,\n    vflip=0.0,\n    color_jitter=0.4,\n    auto_augment=None,\n    interpolation=\"random\",\n    use_prefetcher=False,\n    mean=(0.485, 0.456, 0.406),\n    std=(0.229, 0.224, 0.225),\n    re_prob=0.0,\n    re_mode=\"const\",\n    re_count=1,\n    re_num_splits=0,\n    separate=False,\n):\n    \"\"\"\n    If separate==True, the transforms are returned as a tuple of 3 separate transforms\n    for use in a mixing dataset that passes\n     * all data through the first (primary) transform, called the 'clean' data\n     * a portion of the data through the secondary transform\n     * normalizes and converts the branches above with the third, final transform\n    \"\"\"\n    if isinstance(img_size, tuple):\n        img_size = img_size[-2:]\n    else:\n        img_size = img_size\n\n    scale = tuple(scale or (0.08, 1.0))  # default imagenet scale range\n    ratio = tuple(\n        ratio or (3.0 / 4.0, 4.0 / 3.0)\n    )  # default imagenet ratio range\n    primary_tfl = [\n        RandomResizedCropAndInterpolation(\n            img_size, scale=scale, ratio=ratio, interpolation=interpolation\n        )\n    ]\n    if hflip > 0.0:\n        primary_tfl += [transforms.RandomHorizontalFlip(p=hflip)]\n    if vflip > 0.0:\n        primary_tfl += [transforms.RandomVerticalFlip(p=vflip)]\n\n    secondary_tfl = []\n    if auto_augment:\n        assert isinstance(auto_augment, str)\n        if isinstance(img_size, tuple):\n            img_size_min = min(img_size)\n        else:\n            img_size_min = img_size\n        aa_params = dict(\n            translate_const=int(img_size_min * 0.45),\n            img_mean=tuple([min(255, round(255 * x)) for x in mean]),\n        )\n        if interpolation and interpolation != \"random\":\n            aa_params[\"interpolation\"] = 
_pil_interp(interpolation)\n        if auto_augment.startswith(\"rand\"):\n            secondary_tfl += [rand_augment_transform(auto_augment, aa_params)]\n        elif auto_augment.startswith(\"augmix\"):\n            raise NotImplementedError(\"Augmix not implemented\")\n        else:\n            raise NotImplementedError(\"Auto aug not implemented\")\n    elif color_jitter is not None:\n        # color jitter is enabled when not using AA\n        if isinstance(color_jitter, (list, tuple)):\n            # color jitter should be a 3-tuple/list if spec brightness/contrast/saturation\n            # or 4 if also augmenting hue\n            assert len(color_jitter) in (3, 4)\n        else:\n            # if it's a scalar, duplicate for brightness, contrast, and saturation, no hue\n            color_jitter = (float(color_jitter),) * 3\n        secondary_tfl += [transforms.ColorJitter(*color_jitter)]\n\n    final_tfl = []\n    final_tfl += [\n        transforms.ToTensor(),\n        transforms.Normalize(mean=torch.tensor(mean), std=torch.tensor(std)),\n    ]\n    if re_prob > 0.0:\n        final_tfl.append(\n            RandomErasing(\n                re_prob,\n                mode=re_mode,\n                max_count=re_count,\n                num_splits=re_num_splits,\n                device=\"cpu\",\n                cube=False,\n            )\n        )\n\n    if separate:\n        return (\n            transforms.Compose(primary_tfl),\n            transforms.Compose(secondary_tfl),\n            transforms.Compose(final_tfl),\n        )\n    else:\n        return transforms.Compose(primary_tfl + secondary_tfl + final_tfl)\n\n############################################################################################################\n############################################################################################################\n\nclass Compose(object):\n    \"\"\"Composes several transforms\n    Args:\n    transforms (list of ``Transform`` objects): list 
of transforms\n    to compose\n    \"\"\"\n\n    def __init__(self, transforms):\n        self.transforms = transforms\n\n    def __call__(self, clip):\n        for t in self.transforms:\n            clip = t(clip)\n        return clip\n\n\nclass RandomHorizontalFlip(object):\n    \"\"\"Horizontally flip the list of given images randomly\n    with a probability 0.5\n    \"\"\"\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n        img (PIL.Image or numpy.ndarray): List of images to be flipped\n        in format (h, w, c) in numpy.ndarray\n        Returns:\n        PIL.Image or numpy.ndarray: Randomly flipped clip\n        \"\"\"\n        if random.random() < 0.5:\n            if isinstance(clip[0], np.ndarray):\n                return [np.fliplr(img) for img in clip]\n            elif isinstance(clip[0], PIL.Image.Image):\n                return [\n                    img.transpose(PIL.Image.FLIP_LEFT_RIGHT) for img in clip\n                ]\n            else:\n                raise TypeError('Expected numpy.ndarray or PIL.Image' +\n                                ' but got list of {0}'.format(type(clip[0])))\n        return clip\n\n\nclass RandomResize(object):\n    \"\"\"Resizes a list of (H x W x C) numpy.ndarray by a random scaling factor\n    The larger the original image is, the longer it takes to\n    interpolate\n    Args:\n    ratio (tuple): (min, max) range the scaling factor is drawn from\n    interpolation (str): Can be one of 'nearest', 'bilinear'\n    defaults to nearest\n    \"\"\"\n\n    def __init__(self, ratio=(3. / 4., 4. 
/ 3.), interpolation='nearest'):\n        self.ratio = ratio\n        self.interpolation = interpolation\n\n    def __call__(self, clip):\n        scaling_factor = random.uniform(self.ratio[0], self.ratio[1])\n\n        if isinstance(clip[0], np.ndarray):\n            im_h, im_w, im_c = clip[0].shape\n        elif isinstance(clip[0], PIL.Image.Image):\n            im_w, im_h = clip[0].size\n\n        new_w = int(im_w * scaling_factor)\n        new_h = int(im_h * scaling_factor)\n        new_size = (new_w, new_h)\n        resized = FF.resize_clip(\n            clip, new_size, interpolation=self.interpolation)\n        return resized\n\n\nclass Resize(object):\n    \"\"\"Resizes a list of (H x W x C) numpy.ndarray to the final size\n    The larger the original image is, the more times it takes to\n    interpolate\n    Args:\n    interpolation (str): Can be one of 'nearest', 'bilinear'\n    defaults to nearest\n    size (tuple): (widht, height)\n    \"\"\"\n\n    def __init__(self, size, interpolation='nearest'):\n        self.size = size\n        self.interpolation = interpolation\n\n    def __call__(self, clip):\n        resized = FF.resize_clip(\n            clip, self.size, interpolation=self.interpolation)\n        return resized\n\n\nclass RandomCrop(object):\n    \"\"\"Extract random crop at the same location for a list of images\n    Args:\n    size (sequence or int): Desired output size for the\n    crop in format (h, w)\n    \"\"\"\n\n    def __init__(self, size):\n        if isinstance(size, numbers.Number):\n            size = (size, size)\n\n        self.size = size\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n        img (PIL.Image or numpy.ndarray): List of images to be cropped\n        in format (h, w, c) in numpy.ndarray\n        Returns:\n        PIL.Image or numpy.ndarray: Cropped list of images\n        \"\"\"\n        h, w = self.size\n        if isinstance(clip[0], np.ndarray):\n            im_h, im_w, im_c = clip[0].shape\n   
     elif isinstance(clip[0], PIL.Image.Image):\n            im_w, im_h = clip[0].size\n        else:\n            raise TypeError('Expected numpy.ndarray or PIL.Image' +\n                            ' but got list of {0}'.format(type(clip[0])))\n        if w > im_w or h > im_h:\n            error_msg = (\n                'Initial image size should be larger than '\n                'cropped size but got cropped sizes : ({w}, {h}) while '\n                'initial image is ({im_w}, {im_h})'.format(\n                    im_w=im_w, im_h=im_h, w=w, h=h))\n            raise ValueError(error_msg)\n\n        x1 = random.randint(0, im_w - w)\n        y1 = random.randint(0, im_h - h)\n        cropped = FF.crop_clip(clip, y1, x1, h, w)\n\n        return cropped\n\n\nclass ThreeCrop(object):\n    \"\"\"Extract three crops, evenly spaced along one axis, from a list of images\n    Args:\n    size (sequence or int): Desired output size for the\n    crop in format (h, w)\n    \"\"\"\n\n    def __init__(self, size):\n        if isinstance(size, numbers.Number):\n            size = (size, size)\n\n        self.size = size\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n        img (PIL.Image or numpy.ndarray): List of images to be cropped\n        in format (h, w, c) in numpy.ndarray\n        Returns:\n        PIL.Image or numpy.ndarray: Cropped list of images\n        \"\"\"\n        h, w = self.size\n        if isinstance(clip[0], np.ndarray):\n            im_h, im_w, im_c = clip[0].shape\n        elif isinstance(clip[0], PIL.Image.Image):\n            im_w, im_h = clip[0].size\n        else:\n            raise TypeError('Expected numpy.ndarray or PIL.Image' +\n                            ' but got list of {0}'.format(type(clip[0])))\n        if w != im_w and h != im_h:\n            clip = FF.resize_clip(clip, self.size, interpolation=\"bilinear\")\n            im_h, im_w, im_c = clip[0].shape\n\n        # Clamp the step at zero; np.max(x, 0) would treat 0 as an axis.\n        step = max((max(im_w, im_h) - self.size[0]) // 2, 0)\n        
cropped = []\n        for i in range(3):\n            if (im_h > self.size[0]):\n                x1 = 0\n                y1 = i * step\n                cropped.extend(FF.crop_clip(clip, y1, x1, h, w))\n            else:\n                x1 = i * step\n                y1 = 0\n                cropped.extend(FF.crop_clip(clip, y1, x1, h, w))\n        return cropped\n\n\nclass RandomRotation(object):\n    \"\"\"Rotate entire clip randomly by a random angle within\n    given bounds\n    Args:\n    degrees (sequence or int): Range of degrees to select from\n    If degrees is a number instead of sequence like (min, max),\n    the range of degrees, will be (-degrees, +degrees).\n    \"\"\"\n\n    def __init__(self, degrees):\n        if isinstance(degrees, numbers.Number):\n            if degrees < 0:\n                raise ValueError('If degrees is a single number,'\n                                 'must be positive')\n            degrees = (-degrees, degrees)\n        else:\n            if len(degrees) != 2:\n                raise ValueError('If degrees is a sequence,'\n                                 'it must be of len 2.')\n\n        self.degrees = degrees\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n        img (PIL.Image or numpy.ndarray): List of images to be cropped\n        in format (h, w, c) in numpy.ndarray\n        Returns:\n        PIL.Image or numpy.ndarray: Cropped list of images\n        \"\"\"\n        import skimage\n        angle = random.uniform(self.degrees[0], self.degrees[1])\n        if isinstance(clip[0], np.ndarray):\n            rotated = [skimage.transform.rotate(img, angle) for img in clip]\n        elif isinstance(clip[0], PIL.Image.Image):\n            rotated = [img.rotate(angle) for img in clip]\n        else:\n            raise TypeError('Expected numpy.ndarray or PIL.Image' +\n                            'but got list of {0}'.format(type(clip[0])))\n\n        return rotated\n\n\nclass CenterCrop(object):\n    
\"\"\"Extract center crop at the same location for a list of images\n    Args:\n    size (sequence or int): Desired output size for the\n    crop in format (h, w)\n    \"\"\"\n\n    def __init__(self, size):\n        if isinstance(size, numbers.Number):\n            size = (size, size)\n\n        self.size = size\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n        img (PIL.Image or numpy.ndarray): List of images to be cropped\n        in format (h, w, c) in numpy.ndarray\n        Returns:\n        PIL.Image or numpy.ndarray: Cropped list of images\n        \"\"\"\n        h, w = self.size\n        if isinstance(clip[0], np.ndarray):\n            im_h, im_w, im_c = clip[0].shape\n        elif isinstance(clip[0], PIL.Image.Image):\n            im_w, im_h = clip[0].size\n        else:\n            raise TypeError('Expected numpy.ndarray or PIL.Image' +\n                            'but got list of {0}'.format(type(clip[0])))\n        if w > im_w or h > im_h:\n            error_msg = (\n                'Initial image size should be larger then '\n                'cropped size but got cropped sizes : ({w}, {h}) while '\n                'initial image is ({im_w}, {im_h})'.format(\n                    im_w=im_w, im_h=im_h, w=w, h=h))\n            raise ValueError(error_msg)\n\n        x1 = int(round((im_w - w) / 2.))\n        y1 = int(round((im_h - h) / 2.))\n        cropped = FF.crop_clip(clip, y1, x1, h, w)\n\n        return cropped\n\n\nclass ColorJitter(object):\n    \"\"\"Randomly change the brightness, contrast and saturation and hue of the clip\n    Args:\n    brightness (float): How much to jitter brightness. brightness_factor\n    is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].\n    contrast (float): How much to jitter contrast. contrast_factor\n    is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].\n    saturation (float): How much to jitter saturation. 
saturation_factor\n    is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].\n    hue(float): How much to jitter hue. hue_factor is chosen uniformly from\n    [-hue, hue]. Should be >=0 and <= 0.5.\n    \"\"\"\n\n    def __init__(self, brightness=0, contrast=0, saturation=0, hue=0):\n        self.brightness = brightness\n        self.contrast = contrast\n        self.saturation = saturation\n        self.hue = hue\n\n    def get_params(self, brightness, contrast, saturation, hue):\n        if brightness > 0:\n            brightness_factor = random.uniform(\n                max(0, 1 - brightness), 1 + brightness)\n        else:\n            brightness_factor = None\n\n        if contrast > 0:\n            contrast_factor = random.uniform(\n                max(0, 1 - contrast), 1 + contrast)\n        else:\n            contrast_factor = None\n\n        if saturation > 0:\n            saturation_factor = random.uniform(\n                max(0, 1 - saturation), 1 + saturation)\n        else:\n            saturation_factor = None\n\n        if hue > 0:\n            hue_factor = random.uniform(-hue, hue)\n        else:\n            hue_factor = None\n        return brightness_factor, contrast_factor, saturation_factor, hue_factor\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n        clip (list): list of PIL.Image\n        Returns:\n        list PIL.Image : list of transformed PIL.Image\n        \"\"\"\n        if isinstance(clip[0], np.ndarray):\n            raise TypeError(\n                'Color jitter not yet implemented for numpy arrays')\n        elif isinstance(clip[0], PIL.Image.Image):\n            brightness, contrast, saturation, hue = self.get_params(\n                self.brightness, self.contrast, self.saturation, self.hue)\n\n            # Create img transform function sequence\n            img_transforms = []\n            if brightness is not None:\n                img_transforms.append(lambda img: 
torchvision.transforms.functional.adjust_brightness(img, brightness))\n            if saturation is not None:\n                img_transforms.append(lambda img: torchvision.transforms.functional.adjust_saturation(img, saturation))\n            if hue is not None:\n                img_transforms.append(lambda img: torchvision.transforms.functional.adjust_hue(img, hue))\n            if contrast is not None:\n                img_transforms.append(lambda img: torchvision.transforms.functional.adjust_contrast(img, contrast))\n            random.shuffle(img_transforms)\n\n            # Apply to all images\n            jittered_clip = []\n            for img in clip:\n                # Chain the transforms so each one sees the previous result\n                jittered_img = img\n                for func in img_transforms:\n                    jittered_img = func(jittered_img)\n                jittered_clip.append(jittered_img)\n\n        else:\n            raise TypeError('Expected numpy.ndarray or PIL.Image ' +\n                            'but got list of {0}'.format(type(clip[0])))\n        return jittered_clip\n\n\nclass Normalize(object):\n    \"\"\"Normalize a clip with mean and standard deviation.\n    Given mean: ``(M1,...,Mn)`` and std: ``(S1,..,Sn)`` for ``n`` channels, this transform\n    will normalize each channel of the input ``torch.*Tensor`` i.e.\n    ``input[channel] = (input[channel] - mean[channel]) / std[channel]``\n    .. 
note::\n        This transform acts out of place, i.e., it does not mutate the input tensor.\n    Args:\n        mean (sequence): Sequence of means for each channel.\n        std (sequence): Sequence of standard deviations for each channel.\n    \"\"\"\n\n    def __init__(self, mean, std):\n        self.mean = mean\n        self.std = std\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (Tensor): Tensor clip of size (C, T, H, W) to be normalized.\n        Returns:\n            Tensor: Normalized Tensor clip.\n        \"\"\"\n        return FF.normalize(clip, self.mean, self.std)\n\n    def __repr__(self):\n        return self.__class__.__name__ + '(mean={0}, std={1})'.format(self.mean, self.std)\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/datasets/volume_transforms.py",
    "content": "import numpy as np\nfrom PIL import Image\nimport torch\n\n\ndef convert_img(img):\n    \"\"\"Converts (H, W, C) numpy.ndarray to (C, W, H) format\n    \"\"\"\n    if len(img.shape) == 3:\n        img = img.transpose(2, 0, 1)\n    if len(img.shape) == 2:\n        img = np.expand_dims(img, 0)\n    return img\n\n\nclass ClipToTensor(object):\n    \"\"\"Convert a list of m (H x W x C) numpy.ndarrays in the range [0, 255]\n    to a torch.FloatTensor of shape (C x m x H x W) in the range [0, 1.0]\n    \"\"\"\n\n    def __init__(self, channel_nb=3, div_255=True, numpy=False):\n        self.channel_nb = channel_nb\n        self.div_255 = div_255\n        self.numpy = numpy\n\n    def __call__(self, clip):\n        \"\"\"\n        Args: clip (list of numpy.ndarray): clip (list of images)\n        to be converted to tensor.\n        \"\"\"\n        # Retrieve shape\n        if isinstance(clip[0], np.ndarray):\n            h, w, ch = clip[0].shape\n            assert ch == self.channel_nb, 'Got {0} instead of 3 channels'.format(\n                ch)\n        elif isinstance(clip[0], Image.Image):\n            w, h = clip[0].size\n        else:\n            raise TypeError('Expected numpy.ndarray or PIL.Image\\\n            but got list of {0}'.format(type(clip[0])))\n\n        np_clip = np.zeros([self.channel_nb, len(clip), int(h), int(w)])\n\n        # Convert\n        for img_idx, img in enumerate(clip):\n            if isinstance(img, np.ndarray):\n                pass\n            elif isinstance(img, Image.Image):\n                img = np.array(img, copy=False)\n            else:\n                raise TypeError('Expected numpy.ndarray or PIL.Image\\\n                but got list of {0}'.format(type(clip[0])))\n            img = convert_img(img)\n            np_clip[:, img_idx, :, :] = img\n        if self.numpy:\n            if self.div_255:\n                np_clip = np_clip / 255.0\n            return np_clip\n\n        else:\n            tensor_clip 
= torch.from_numpy(np_clip)\n\n            if not isinstance(tensor_clip, torch.FloatTensor):\n                tensor_clip = tensor_clip.float()\n            if self.div_255:\n                tensor_clip = torch.div(tensor_clip, 255)\n            return tensor_clip\n\n\n# Note: this normalizes data to the range [-1, 1]\nclass ClipToTensor_K(object):\n    \"\"\"Convert a list of m (H x W x C) numpy.ndarrays in the range [0, 255]\n    to a torch.FloatTensor of shape (C x m x H x W) in the range [-1.0, 1.0]\n    \"\"\"\n\n    def __init__(self, channel_nb=3, div_255=True, numpy=False):\n        self.channel_nb = channel_nb\n        self.div_255 = div_255\n        self.numpy = numpy\n\n    def __call__(self, clip):\n        \"\"\"\n        Args: clip (list of numpy.ndarray): clip (list of images)\n        to be converted to tensor.\n        \"\"\"\n        # Retrieve shape\n        if isinstance(clip[0], np.ndarray):\n            h, w, ch = clip[0].shape\n            assert ch == self.channel_nb, 'Got {0} instead of 3 channels'.format(\n                ch)\n        elif isinstance(clip[0], Image.Image):\n            w, h = clip[0].size\n        else:\n            raise TypeError('Expected numpy.ndarray or PIL.Image\\\n            but got list of {0}'.format(type(clip[0])))\n\n        np_clip = np.zeros([self.channel_nb, len(clip), int(h), int(w)])\n\n        # Convert\n        for img_idx, img in enumerate(clip):\n            if isinstance(img, np.ndarray):\n                pass\n            elif isinstance(img, Image.Image):\n                # np.asarray avoids the copy=False error raised by NumPy >= 2.0\n                img = np.asarray(img)\n            else:\n                raise TypeError('Expected numpy.ndarray or PIL.Image\\\n                but got list of {0}'.format(type(clip[0])))\n            img = convert_img(img)\n            np_clip[:, img_idx, :, :] = img\n        if self.numpy:\n            if self.div_255:\n                np_clip = (np_clip - 127.5) / 127.5\n            return np_clip\n\n        else:\n            tensor_clip = 
torch.from_numpy(np_clip)\n\n            if not isinstance(tensor_clip, torch.FloatTensor):\n                tensor_clip = tensor_clip.float()\n            if self.div_255:\n                tensor_clip = torch.div(torch.sub(tensor_clip, 127.5), 127.5)\n            return tensor_clip\n\n\nclass ToTensor(object):\n    \"\"\"Converts numpy array to tensor\n    \"\"\"\n\n    def __call__(self, array):\n        tensor = torch.from_numpy(array)\n        return tensor\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/functional.py",
    "content": "import numbers\nimport cv2\nimport numpy as np\nimport PIL\nimport torch\n\n\ndef _is_tensor_clip(clip):\n    return torch.is_tensor(clip) and clip.ndimension() == 4\n\n\ndef crop_clip(clip, min_h, min_w, h, w):\n    if isinstance(clip[0], np.ndarray):\n        cropped = [img[min_h:min_h + h, min_w:min_w + w, :] for img in clip]\n\n    elif isinstance(clip[0], PIL.Image.Image):\n        cropped = [\n            img.crop((min_w, min_h, min_w + w, min_h + h)) for img in clip\n        ]\n    else:\n        raise TypeError('Expected numpy.ndarray or PIL.Image' +\n                        'but got list of {0}'.format(type(clip[0])))\n    return cropped\n\n\ndef resize_clip(clip, size, interpolation='bilinear'):\n    if isinstance(clip[0], np.ndarray):\n        if isinstance(size, numbers.Number):\n            im_h, im_w, im_c = clip[0].shape\n            # Min spatial dim already matches minimal size\n            if (im_w <= im_h and im_w == size) or (im_h <= im_w\n                                                   and im_h == size):\n                return clip\n            new_h, new_w = get_resize_sizes(im_h, im_w, size)\n            size = (new_w, new_h)\n        else:\n            size = size[0], size[1]\n        if interpolation == 'bilinear':\n            np_inter = cv2.INTER_LINEAR\n        else:\n            np_inter = cv2.INTER_NEAREST\n        scaled = [\n            cv2.resize(img, size, interpolation=np_inter) for img in clip\n        ]\n    elif isinstance(clip[0], PIL.Image.Image):\n        if isinstance(size, numbers.Number):\n            im_w, im_h = clip[0].size\n            # Min spatial dim already matches minimal size\n            if (im_w <= im_h and im_w == size) or (im_h <= im_w\n                                                   and im_h == size):\n                return clip\n            new_h, new_w = get_resize_sizes(im_h, im_w, size)\n            size = (new_w, new_h)\n        else:\n            size = size[1], size[0]\n       
 if interpolation == 'bilinear':\n            pil_inter = PIL.Image.BILINEAR\n        else:\n            pil_inter = PIL.Image.NEAREST\n        scaled = [img.resize(size, pil_inter) for img in clip]\n    else:\n        raise TypeError('Expected numpy.ndarray or PIL.Image' +\n                        'but got list of {0}'.format(type(clip[0])))\n    return scaled\n\n\ndef get_resize_sizes(im_h, im_w, size):\n    if im_w < im_h:\n        ow = size\n        oh = int(size * im_h / im_w)\n    else:\n        oh = size\n        ow = int(size * im_w / im_h)\n    return oh, ow\n\n\ndef normalize(clip, mean, std, inplace=False):\n    if not _is_tensor_clip(clip):\n        raise TypeError('tensor is not a torch clip.')\n\n    if not inplace:\n        clip = clip.clone()\n\n    dtype = clip.dtype\n    mean = torch.as_tensor(mean, dtype=dtype, device=clip.device)\n    std = torch.as_tensor(std, dtype=dtype, device=clip.device)\n    clip.sub_(mean[:, None, None, None]).div_(std[:, None, None, None])\n\n    return clip\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/models/__init__.py",
    "content": "from .clip import clip_b16, clip_l14, clip_l14_336\n# from .modeling_finetune import vit_base_patch16_224, vit_base_patch16_384, vit_large_patch16_224, vit_large_patch16_384\nfrom .modeling_finetune import vit_large_patch16_224\nfrom .modeling_pretrain_umt import pretrain_umt_base_patch16_224, pretrain_umt_large_patch16_224 \nfrom .modeling_pretrain import pretrain_videomae_base_patch16_224, pretrain_videomae_large_patch16_224, pretrain_videomae_huge_patch16_224 \n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/models/clip.py",
    "content": "#!/usr/bin/env python\nimport os\nfrom collections import OrderedDict\n\nimport torch\nfrom torch import nn\n\n\nMODEL_PATH = 'your_model_path/clip_visual_encoder'\n_MODELS = {\n    # extracted from OpenAI, see extract_clip\n    \"ViT-B/16\": os.path.join(MODEL_PATH, \"vit_b16.pth\"),\n    \"ViT-L/14\": os.path.join(MODEL_PATH, \"vit_l14.pth\"),\n    \"ViT-L/14_336\": os.path.join(MODEL_PATH, \"vit_l14_336.pth\"),\n}\n\n\nclass LayerNorm(nn.LayerNorm):\n    \"\"\"Subclass torch's LayerNorm to handle fp16.\"\"\"\n\n    def forward(self, x):\n        orig_type = x.dtype\n        ret = super().forward(x.type(torch.float32))\n        return ret.type(orig_type)\n\n\nclass QuickGELU(nn.Module):\n    def forward(self, x):\n        return x * torch.sigmoid(1.702 * x)\n\n\nclass ResidualAttentionBlock(nn.Module):\n    def __init__(self, d_model, n_head, attn_mask=None):\n        super().__init__()\n\n        self.attn = nn.MultiheadAttention(d_model, n_head)\n        self.ln_1 = LayerNorm(d_model)\n        self.mlp = nn.Sequential(OrderedDict([\n            (\"c_fc\", nn.Linear(d_model, d_model * 4)),\n            (\"gelu\", QuickGELU()),\n            (\"c_proj\", nn.Linear(d_model * 4, d_model))\n        ]))\n        self.ln_2 = LayerNorm(d_model)\n        self.attn_mask = attn_mask\n\n    def attention(self, x, return_attn=False):\n        self.attn_mask = self.attn_mask.to(dtype=x.dtype, device=x.device) if self.attn_mask is not None else None\n        if return_attn:\n            return self.attn(x, x, x, need_weights=True, attn_mask=self.attn_mask)\n        else:\n            return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]\n\n    def forward(self, x, return_attn=False):\n        if return_attn:\n            x_, attn = self.attention(self.ln_1(x), return_attn=True)\n            x = x + x_\n            x = x + self.mlp(self.ln_2(x))\n            return x, attn\n        else:\n            x = x + self.attention(self.ln_1(x))\n    
        x = x + self.mlp(self.ln_2(x))\n            return x\n\n\nclass Transformer(nn.Module):\n    def __init__(\n            self, width, layers, heads, return_attn=False, \n            clip_return_layer=1, clip_return_interval=1,\n        ):\n        super().__init__()\n        self.layers = layers\n        self.return_attn = return_attn\n        self.resblocks = nn.ModuleList()\n        for _ in range(layers):\n            self.resblocks.append(\n                ResidualAttentionBlock(\n                    width, heads,\n                )\n            )\n        self.return_index = []\n        for i in range(clip_return_layer):\n            self.return_index.append(layers - int(i * clip_return_interval) - 1)\n        print(f'Teacher return index: {self.return_index}')\n\n    def forward(self, x):\n        attn = None\n        z = []\n        for idx, blk in enumerate(self.resblocks):\n            if idx == self.layers - 1 and self.return_attn:\n                x, attn = blk(x, return_attn=True)\n            else:\n                x = blk(x)\n            if idx in self.return_index:\n                z.append(x)\n        x = torch.stack(z)\n        return x, attn\n\n\nclass VisionTransformer(nn.Module):\n    def __init__(\n        self, input_resolution, patch_size, width, layers, heads, output_dim, \n        clip_norm_type='l2', kernel_size=1,\n        return_attn=False, clip_return_layer=1, clip_return_interval=1,\n    ):\n        super().__init__()\n        self.clip_norm_type = clip_norm_type\n        self.return_attn = return_attn\n        print(f'Normalization Type: {clip_norm_type}')\n        print(f'Return Attention: {return_attn}')\n        print(f'Return Layer: {clip_return_layer}')\n        print(f'Return Interval: {clip_return_interval}')\n\n        self.output_dim = output_dim\n        self.conv1 = nn.Conv3d(\n            3, width, \n            (kernel_size, patch_size, patch_size), \n            (kernel_size, patch_size, patch_size), \n            
(0, 0, 0), bias=False\n        )\n\n        scale = width ** -0.5\n        self.class_embedding = nn.Parameter(scale * torch.randn(width))\n        self.positional_embedding = nn.Parameter(scale * torch.randn((input_resolution // patch_size) ** 2 + 1, width))\n        self.ln_pre = LayerNorm(width)\n        \n        self.transformer = Transformer(\n            width, layers, heads, return_attn=return_attn, \n            clip_return_layer=clip_return_layer,\n            clip_return_interval=clip_return_interval,\n        )\n\n        self.ln_post = LayerNorm(width)\n        self.proj = nn.Parameter(scale * torch.randn(width, output_dim))\n\n    def forward(self, x, mask=None):\n        x = self.conv1(x)  # shape = [*, width, grid, grid]\n        N, C, T, H, W = x.shape\n        x = x.permute(0, 2, 3, 4, 1).reshape(N * T, H * W, C)\n\n        x = torch.cat([self.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device), x], dim=1)  # shape = [*, grid ** 2 + 1, width]\n        x = x + self.positional_embedding.to(x.dtype)\n        x = self.ln_pre(x)\n\n        if mask is not None:\n            cls_tokens = x[:, :1, :]\n            x = x[:, 1:]\n            x = x.reshape(N, T * H * W, C)\n            x = x[~mask].view(N * T, -1, C)\n            HW = x.shape[1]\n            x = torch.cat([cls_tokens, x], dim=1)\n        else:\n            HW = H * W\n\n        x = x.permute(1, 0, 2)  # NLD -> LND\n        x, attn = self.transformer(x)\n\n        K = x.shape[0]\n        x = self.ln_post(x[:, 1:, :, :])  # [HW, NT, C]\n        x = x.view(K, HW, N, T, C).permute(0, 2, 3, 1, 4).reshape(K, N, T * HW, C)  # [K, N, THW, C]\n        x = x @ self.proj\n        \n        if self.clip_norm_type == 'l2':\n            x = x / x.norm(dim=-1, keepdim=True)\n        elif self.clip_norm_type == 'none':\n            pass\n        else:\n            raise NotImplementedError\n\n        if self.return_attn:\n            return x, attn[:, 0, 
1:]\n        else:\n            return x\n\n\ndef inflate_weight(weight_2d, time_dim, center=True):\n    print(f'Init center: {center}')\n    if center:\n        weight_3d = torch.zeros(*weight_2d.shape)\n        weight_3d = weight_3d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1)\n        middle_idx = time_dim // 2\n        weight_3d[:, :, middle_idx, :, :] = weight_2d\n    else:\n        weight_3d = weight_2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1)\n        weight_3d = weight_3d / time_dim\n    return weight_3d\n\n\ndef load_state_dict(model, state_dict, input_resolution=224, patch_size=16, center=True):\n    state_dict_3d = model.state_dict()\n    for k in state_dict.keys():\n        if k in state_dict_3d.keys() and state_dict[k].shape != state_dict_3d[k].shape:\n            if len(state_dict_3d[k].shape) <= 2:\n                print(f'Ignore: {k}')\n                continue\n            print(f'Inflate: {k}, {state_dict[k].shape} => {state_dict_3d[k].shape}')\n            time_dim = state_dict_3d[k].shape[2]\n            state_dict[k] = inflate_weight(state_dict[k], time_dim, center=center)\n\n    pos_embed_checkpoint = state_dict['positional_embedding']\n    embedding_size = pos_embed_checkpoint.shape[-1]\n    num_patches = (input_resolution // patch_size) ** 2\n    orig_size = int((pos_embed_checkpoint.shape[-2] - 1) ** 0.5)\n    new_size = int(num_patches ** 0.5)\n    if orig_size != new_size:\n        print(f'Pos_emb from {orig_size} to {new_size}')\n        extra_tokens = pos_embed_checkpoint[:1]\n        pos_tokens = pos_embed_checkpoint[1:]\n        pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2)\n        pos_tokens = torch.nn.functional.interpolate(\n            pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)\n        pos_tokens = pos_tokens.permute(0, 2, 3, 1).flatten(0, 2)\n        new_pos_embed = torch.cat((extra_tokens, pos_tokens), dim=0)\n        
state_dict['positional_embedding'] = new_pos_embed\n    \n    model.load_state_dict(state_dict, strict=True)\n\n\ndef clip_b16(\n    pretrained=True, \n    clip_norm_type='l2', input_resolution=224, kernel_size=1,\n    return_attn=False, center=True, clip_return_layer=1,\n    clip_return_interval=1\n):\n    model = VisionTransformer(\n        input_resolution=input_resolution, patch_size=16, \n        width=768, layers=12, heads=12, output_dim=512,\n        clip_norm_type=clip_norm_type,\n        kernel_size=kernel_size, return_attn=return_attn,\n        clip_return_layer=clip_return_layer, \n        clip_return_interval=clip_return_interval\n    )\n    if pretrained:\n        print('load pretrained weights')\n        state_dict = torch.load(_MODELS[\"ViT-B/16\"], map_location='cpu')\n        load_state_dict(model, state_dict, input_resolution=input_resolution, patch_size=16, center=center)\n    return model.eval()\n\n\ndef clip_l14(\n    pretrained=True, \n    clip_norm_type='l2', input_resolution=224, kernel_size=1,\n    return_attn=False, center=True, clip_return_layer=1,\n    clip_return_interval=1\n):\n    model = VisionTransformer(\n        input_resolution=input_resolution, patch_size=14,\n        width=1024, layers=24, heads=16, output_dim=768,\n        clip_norm_type=clip_norm_type,\n        kernel_size=kernel_size, return_attn=return_attn,\n        clip_return_layer=clip_return_layer,\n        clip_return_interval=clip_return_interval\n    )\n    if pretrained:\n        print('load pretrained weights')\n        state_dict = torch.load(_MODELS[\"ViT-L/14\"], map_location='cpu')\n        load_state_dict(model, state_dict, input_resolution=input_resolution, patch_size=14, center=center)\n    return model.eval()\n\n\ndef clip_l14_336(\n    pretrained=True, \n    clip_norm_type='l2', input_resolution=336, kernel_size=1,\n    return_attn=False, center=True, clip_return_layer=1,\n    clip_return_interval=1\n):\n    model = VisionTransformer(\n        
input_resolution=input_resolution, patch_size=14, \n        width=1024, layers=24, heads=16, output_dim=768,\n        clip_norm_type=clip_norm_type,\n        kernel_size=kernel_size, return_attn=return_attn,\n        clip_return_layer=clip_return_layer,\n        clip_return_interval=clip_return_interval,\n    )\n    if pretrained:\n        print('load pretrained weights')\n        state_dict = torch.load(_MODELS[\"ViT-L/14_336\"], map_location='cpu')\n        load_state_dict(model, state_dict, input_resolution=input_resolution, patch_size=14, center=center)\n    return model.eval()\n\n\nif __name__ == '__main__':\n    import time\n    from fvcore.nn import FlopCountAnalysis\n    from fvcore.nn import flop_count_table\n    import numpy as np\n\n    seed = 4217\n    np.random.seed(seed)\n    torch.manual_seed(seed)\n    torch.cuda.manual_seed(seed)\n    torch.cuda.manual_seed_all(seed)\n    num_frames = 8\n\n    # clip_b16 is the ViT-B/16 factory defined above\n    model = clip_b16(pretrained=True, kernel_size=1, return_attn=False, clip_return_layer=1)\n    # print(model)\n\n    # flops = FlopCountAnalysis(model, torch.rand(1, 3, num_frames, 224, 224))\n    # s = time.time()\n    # print(flop_count_table(flops, max_depth=1))\n    # print(time.time()-s)\n    print(model(torch.rand(1, 3, num_frames, 224, 224)).shape)"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/models/modeling_finetune.py",
    "content": "from functools import partial\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom timm.models.layers import drop_path, to_2tuple, trunc_normal_\nfrom timm.models.registry import register_model\nimport torch.utils.checkpoint as checkpoint\n\n\ndef _cfg(url='', **kwargs):\n    return {\n        'url': url,\n        'num_classes': 400, 'input_size': (3, 224, 224), 'pool_size': None,\n        'crop_pct': .9, 'interpolation': 'bicubic',\n        'mean': (0.5, 0.5, 0.5), 'std': (0.5, 0.5, 0.5),\n        **kwargs\n    }\n\n\nclass DropPath(nn.Module):\n    \"\"\"Drop paths (Stochastic Depth) per sample  (when applied in main path of residual blocks).\n    \"\"\"\n    def __init__(self, drop_prob=None):\n        super(DropPath, self).__init__()\n        self.drop_prob = drop_prob\n\n    def forward(self, x):\n        return drop_path(x, self.drop_prob, self.training)\n    \n    def extra_repr(self) -> str:\n        return 'p={}'.format(self.drop_prob)\n\n\nclass Mlp(nn.Module):\n    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.fc1 = nn.Linear(in_features, hidden_features)\n        self.act = act_layer()\n        self.fc2 = nn.Linear(hidden_features, out_features)\n        self.drop = nn.Dropout(drop)\n\n    def forward(self, x):\n        x = self.fc1(x)\n        x = self.act(x)\n        # x = self.drop(x)\n        # commit this for the orignal BERT implement \n        x = self.fc2(x)\n        x = self.drop(x)\n        return x\n\n\nclass Attention(nn.Module):\n    def __init__(\n            self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.,\n            proj_drop=0., attn_head_dim=None):\n        super().__init__()\n        self.num_heads = num_heads\n        head_dim = dim // num_heads\n    
    if attn_head_dim is not None:\n            head_dim = attn_head_dim\n        all_head_dim = head_dim * self.num_heads\n        self.scale = qk_scale or head_dim ** -0.5\n\n        self.qkv = nn.Linear(dim, all_head_dim * 3, bias=False)\n        if qkv_bias:\n            self.q_bias = nn.Parameter(torch.zeros(all_head_dim))\n            self.v_bias = nn.Parameter(torch.zeros(all_head_dim))\n        else:\n            self.q_bias = None\n            self.v_bias = None\n\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(all_head_dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n    def forward(self, x):\n        B, N, C = x.shape\n        qkv_bias = None\n        if self.q_bias is not None:\n            qkv_bias = torch.cat((self.q_bias, torch.zeros_like(self.v_bias, requires_grad=False), self.v_bias))\n        # qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)\n        qkv = F.linear(input=x, weight=self.qkv.weight, bias=qkv_bias)\n        qkv = qkv.reshape(B, N, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)\n        q, k, v = qkv[0], qkv[1], qkv[2]   # make torchscript happy (cannot use tensor as tuple)\n\n        q = q * self.scale\n        attn = (q @ k.transpose(-2, -1))\n        \n        attn = attn.softmax(dim=-1)\n        attn = self.attn_drop(attn)\n\n        x = (attn @ v).transpose(1, 2).reshape(B, N, -1)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass Block(nn.Module):\n    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,\n                 drop_path=0., init_values=None, act_layer=nn.GELU, norm_layer=nn.LayerNorm,\n                 attn_head_dim=None):\n        super().__init__()\n        self.norm1 = norm_layer(dim)\n        self.attn = Attention(\n            dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale,\n            attn_drop=attn_drop, proj_drop=drop, 
attn_head_dim=attn_head_dim)\n        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here\n        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()\n        self.norm2 = norm_layer(dim)\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)\n\n        # init_values defaults to None, so guard before comparing\n        if init_values is not None and init_values > 0:\n            self.gamma_1 = nn.Parameter(init_values * torch.ones((dim)),requires_grad=True)\n            self.gamma_2 = nn.Parameter(init_values * torch.ones((dim)),requires_grad=True)\n        else:\n            self.gamma_1, self.gamma_2 = None, None\n\n    def forward(self, x):\n        if self.gamma_1 is None:\n            x = x + self.drop_path(self.attn(self.norm1(x)))\n            x = x + self.drop_path(self.mlp(self.norm2(x)))\n        else:\n            x = x + self.drop_path(self.gamma_1 * self.attn(self.norm1(x)))\n            x = x + self.drop_path(self.gamma_2 * self.mlp(self.norm2(x)))\n        return x\n\n\nclass PatchEmbed(nn.Module):\n    \"\"\" Image to Patch Embedding\n    \"\"\"\n    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768, num_frames=16, tubelet_size=2):\n        super().__init__()\n        img_size = to_2tuple(img_size)\n        patch_size = to_2tuple(patch_size)\n        self.tubelet_size = int(tubelet_size)\n        num_patches = (img_size[1] // patch_size[1]) * (img_size[0] // patch_size[0]) * (num_frames // self.tubelet_size)\n        self.img_size = img_size\n        self.patch_size = patch_size\n        self.num_patches = num_patches\n        self.proj = nn.Conv3d(in_channels=in_chans, out_channels=embed_dim, \n                            kernel_size=(self.tubelet_size, patch_size[0], patch_size[1]), \n                            stride=(self.tubelet_size, patch_size[0], patch_size[1]))\n\n    def forward(self, x, **kwargs):\n        B, C, T, H, W = x.shape\n        # FIXME look 
at relaxing size constraints\n        assert H == self.img_size[0] and W == self.img_size[1], \\\n            f\"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]}).\"\n        x = self.proj(x).flatten(2).transpose(1, 2)\n        return x\n    \n# sin-cos position encoding\n# https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/master/transformer/Models.py#L31\ndef get_sinusoid_encoding_table(n_position, d_hid, cur_frame=-1, pre_n_position=1568): \n    ''' Sinusoid position encoding table ''' \n    # TODO: make it with torch instead of numpy \n    def get_position_angle_vec(position): \n        return [position / np.power(10000, 2 * (hid_j // 2) / d_hid) for hid_j in range(d_hid)] \n    \n    # generate checkpoint position embedding\n    sinusoid_table = np.array([get_position_angle_vec(pos_i) for pos_i in range(pre_n_position)]) \n    sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i \n    sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1 \n    sinusoid_table = torch.tensor(sinusoid_table, dtype=torch.float, requires_grad=False).unsqueeze(0)\n    print(f\"n_position: {n_position}\")\n    print(f\"pre_n_position: {pre_n_position}\")\n    if n_position // cur_frame * 8 != pre_n_position and cur_frame != -1:\n        T = 8 # checkpoint frame\n        P = 14 # checkpoint size\n        C = d_hid\n        new_P = int((n_position // cur_frame) ** 0.5) # testing size\n        print(f'Pretraining uses 14x14, but current version is {new_P}x{new_P}')\n        print(f'Interpolate the position embedding')\n        sinusoid_table = sinusoid_table.reshape(-1, T, P, P, C)\n        sinusoid_table = sinusoid_table.reshape(-1, P, P, C).permute(0, 3, 1, 2)\n        sinusoid_table = torch.nn.functional.interpolate(\n            sinusoid_table, size=(new_P, new_P), mode='bicubic', align_corners=False)\n        # BT, C, H, W -> BT, H, W, C ->  B, T, H, W, C\n        sinusoid_table = 
sinusoid_table.permute(0, 2, 3, 1).reshape(-1, T, new_P, new_P, C)\n        sinusoid_table = sinusoid_table.flatten(1, 3)  # B, THW, C\n    if cur_frame != -1 and cur_frame != 8:\n        print(f'Pretraining uses 8 frames, but current frame is {cur_frame}')\n        print(f'Interpolate the position embedding')\n        T = 8 # checkpoint frame\n        new_T = cur_frame # testing frame\n        # interpolate\n        P = int((n_position // cur_frame) ** 0.5) # testing size\n        C = d_hid\n        sinusoid_table = sinusoid_table.reshape(-1, T, P, P, C)\n        sinusoid_table = sinusoid_table.permute(0, 2, 3, 4, 1).reshape(-1, C, T)  # BHW, C, T\n        sinusoid_table = torch.nn.functional.interpolate(sinusoid_table, size=new_T, mode='linear')\n        sinusoid_table = sinusoid_table.reshape(1, P, P, C, new_T).permute(0, 4, 1, 2, 3) # B, T, H, W, C\n        sinusoid_table = sinusoid_table.flatten(1, 3)  # B, THW, C\n    if n_position == pre_n_position:\n        return sinusoid_table\n    else:\n        print(\"Use learnable position embedding\")\n        return nn.Parameter(sinusoid_table, requires_grad=True)\n\n\nclass VisionTransformer(nn.Module):\n    \"\"\" Vision Transformer with support for patch or hybrid CNN input stage\n    \"\"\"\n    def __init__(self, \n                 img_size=224, \n                 patch_size=16, \n                 in_chans=3, \n                 num_classes=1000, \n                 embed_dim=768, \n                 depth=12,\n                 num_heads=12, \n                 mlp_ratio=4., \n                 qkv_bias=False, \n                 qk_scale=None, \n                 fc_drop_rate=0., \n                 drop_rate=0., \n                 attn_drop_rate=0.,\n                 drop_path_rate=0., \n                 norm_layer=nn.LayerNorm, \n                 init_values=0.,\n                 use_learnable_pos_emb=False, \n                 init_scale=0.,\n                 all_frames=16,\n                 tubelet_size=2,\n        
         use_checkpoint=False,\n                 checkpoint_num=0,\n                 use_mean_pooling=True):\n        super().__init__()\n        self.num_classes = num_classes\n        self.num_features = self.embed_dim = embed_dim  # num_features for consistency with other models\n        self.tubelet_size = tubelet_size\n        self.patch_embed = PatchEmbed(\n            img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim, num_frames=all_frames, tubelet_size=self.tubelet_size)\n        num_patches = self.patch_embed.num_patches\n        self.use_checkpoint = use_checkpoint\n        self.checkpoint_num = checkpoint_num\n        print(f'Use checkpoint: {use_checkpoint}')\n        print(f'Checkpoint number: {checkpoint_num}')\n\n        if use_learnable_pos_emb:\n            self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))\n        else:\n            # sine-cosine positional embeddings is on the way\n            if patch_size == 14:\n                pre_n_position = 2048\n            else:\n                pre_n_position = 1568\n            self.pos_embed = get_sinusoid_encoding_table(\n                num_patches, embed_dim, all_frames // tubelet_size,\n                pre_n_position=pre_n_position\n            )\n\n        self.pos_drop = nn.Dropout(p=drop_rate)\n\n        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)]  # stochastic depth decay rule\n        self.blocks = nn.ModuleList([\n            Block(\n                dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,\n                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer,\n                init_values=init_values)\n            for i in range(depth)])\n        self.norm = nn.Identity() if use_mean_pooling else norm_layer(embed_dim)\n        self.fc_norm = norm_layer(embed_dim) if use_mean_pooling else None\n        self.fc_dropout = 
nn.Dropout(p=fc_drop_rate) if fc_drop_rate > 0 else nn.Identity()\n        self.head = nn.Linear(embed_dim, num_classes) if num_classes > 0 else nn.Identity()\n\n        if use_learnable_pos_emb:\n            trunc_normal_(self.pos_embed, std=.02)\n\n        trunc_normal_(self.head.weight, std=.02)\n        self.apply(self._init_weights)\n\n        self.head.weight.data.mul_(init_scale)\n        self.head.bias.data.mul_(init_scale)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            trunc_normal_(m.weight, std=.02)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n\n    def get_num_layers(self):\n        return len(self.blocks)\n\n    @torch.jit.ignore\n    def no_weight_decay(self):\n        return {'pos_embed', 'cls_token'}\n\n    def get_classifier(self):\n        return self.head\n\n    def reset_classifier(self, num_classes, global_pool=''):\n        self.num_classes = num_classes\n        self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity()\n\n    def forward_features(self, x):\n        x = self.patch_embed(x)\n        B, _, _ = x.size()\n\n        if self.pos_embed is not None:\n            x = x + self.pos_embed.expand(B, -1, -1).type_as(x).to(x.device).clone().detach()\n        x = self.pos_drop(x)\n\n        for idx, blk in enumerate(self.blocks):\n            if self.use_checkpoint and idx < self.checkpoint_num:\n                x = checkpoint.checkpoint(blk, x)\n            else:\n                x = blk(x)\n\n        x = self.norm(x)\n        if self.fc_norm is not None:\n            return self.fc_norm(x.mean(1))\n        else:\n            return x[:, 0]\n\n    def forward(self, x):\n        x = self.forward_features(x)\n        x = self.head(self.fc_dropout(x))\n        return x\n\n\n# 
@register_model\n# def vit_base_patch16_224(pretrained=False, **kwargs):\n#     model = VisionTransformer(\n#         patch_size=16, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4, qkv_bias=True,\n#         norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)\n#     model.default_cfg = _cfg()\n#     return model\n# \n# \n# # @register_model\n# def vit_base_patch16_384(pretrained=False, **kwargs):\n#     model = VisionTransformer(\n#         img_size=384, patch_size=16, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4, qkv_bias=True,\n#         norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)\n#     model.default_cfg = _cfg()\n#     return model\n\n\n@register_model\ndef vit_large_patch16_224(pretrained=False, **kwargs):\n    kwargs.pop('pretrained_cfg', None) # added by Ziqi to accommodate timm=0.9.12\n    kwargs.pop('pretrained_cfg_overlay', None) # added by Ziqi to accommodate timm=0.9.12\n    model = VisionTransformer(\n        patch_size=16, embed_dim=1024, depth=24, num_heads=16, mlp_ratio=4, qkv_bias=True,\n        norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)\n    model.default_cfg = _cfg()\n    return model\n\n\n# @register_model\n# def vit_large_patch16_384(pretrained=False, **kwargs):\n#     model = VisionTransformer(\n#         img_size=384, patch_size=16, embed_dim=1024, depth=24, num_heads=16, mlp_ratio=4, qkv_bias=True,\n#         norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)\n#     model.default_cfg = _cfg()\n#     return model\n\n\nif __name__ == '__main__':\n    import time\n    from fvcore.nn import FlopCountAnalysis\n    from fvcore.nn import flop_count_table\n    import numpy as np\n\n    seed = 4217\n    np.random.seed(seed)\n    torch.manual_seed(seed)\n    torch.cuda.manual_seed(seed)\n    torch.cuda.manual_seed_all(seed)\n    num_frames = 8\n\n    # model = vit_base_patch16_384(all_frames=num_frames, tubelet_size=1)\n    # model = vit_large_patch16_384(all_frames=num_frames, tubelet_size=1)\n    # print(model)\n\n 
   model = vit_large_patch16_224(img_size=384, all_frames=num_frames, tubelet_size=1)\n    flops = FlopCountAnalysis(model, torch.rand(1, 3, num_frames, 384, 384))\n    s = time.time()\n    print(flop_count_table(flops, max_depth=1))\n    print(time.time()-s)\n    # print(model(torch.rand(1, 3, num_frames, 384, 384)).shape)\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/models/modeling_pretrain.py",
    "content": "import math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.utils.checkpoint as checkpoint\nfrom functools import partial\n\nfrom .modeling_finetune import Block, _cfg, PatchEmbed, get_sinusoid_encoding_table\nfrom timm.models.registry import register_model\nfrom timm.models.layers import trunc_normal_ as __call_trunc_normal_\n\n\ndef trunc_normal_(tensor, mean=0., std=1.):\n    __call_trunc_normal_(tensor, mean=mean, std=std, a=-std, b=std)\n\n\nclass PretrainVisionTransformerEncoder(nn.Module):\n    \"\"\" Vision Transformer with support for patch or hybrid CNN input stage\n    \"\"\"\n    def __init__(self, img_size=224, patch_size=16, in_chans=3, num_classes=0, embed_dim=768, depth=12,\n                 num_heads=12, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop_rate=0., attn_drop_rate=0.,\n                 drop_path_rate=0., norm_layer=nn.LayerNorm, init_values=None, \n                 num_frames=16, tubelet_size=2, use_checkpoint=False,\n                 use_learnable_pos_emb=False):\n        super().__init__()\n        self.num_classes = num_classes\n        self.num_features = self.embed_dim = embed_dim  # num_features for consistency with other models\n        self.patch_embed = PatchEmbed(\n            img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim,\n            num_frames=num_frames, tubelet_size=tubelet_size\n        )\n        num_patches = self.patch_embed.num_patches\n        self.use_checkpoint = use_checkpoint\n\n        # TODO: Add the cls token\n        if use_learnable_pos_emb:\n            self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))\n        else:\n            # sine-cosine positional embeddings \n            self.pos_embed = get_sinusoid_encoding_table(num_patches, embed_dim)\n\n        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)]  # stochastic depth decay rule\n        self.blocks = nn.ModuleList([\n    
        Block(\n                dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,\n                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer,\n                init_values=init_values)\n            for i in range(depth)])\n        self.norm =  norm_layer(embed_dim)\n        self.head = nn.Linear(embed_dim, num_classes) if num_classes > 0 else nn.Identity()\n\n        if use_learnable_pos_emb:\n            trunc_normal_(self.pos_embed, std=.02)\n\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            nn.init.xavier_uniform_(m.weight)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n\n    def get_num_layers(self):\n        return len(self.blocks)\n\n    @torch.jit.ignore\n    def no_weight_decay(self):\n        return {'pos_embed', 'cls_token'}\n\n    def get_classifier(self):\n        return self.head\n\n    def reset_classifier(self, num_classes, global_pool=''):\n        self.num_classes = num_classes\n        self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity()\n\n    def forward_features(self, x, mask):\n        _, _, T, _, _ = x.shape\n        x = self.patch_embed(x)\n        \n        x = x + self.pos_embed.type_as(x).to(x.device).clone().detach()\n\n        B, _, C = x.shape\n        x_vis = x[~mask].reshape(B, -1, C) # ~mask means visible\n\n        if self.use_checkpoint:\n            for blk in self.blocks:\n                x_vis = checkpoint.checkpoint(blk, x_vis)\n        else:   \n            for blk in self.blocks:\n                x_vis = blk(x_vis)\n\n        x_vis = self.norm(x_vis)\n        return x_vis\n\n    def forward(self, x, mask):\n        x = 
self.forward_features(x, mask)\n        x = self.head(x)\n        return x\n\n\nclass PretrainVisionTransformerDecoder(nn.Module):\n    \"\"\" Vision Transformer with support for patch or hybrid CNN input stage\n    \"\"\"\n    def __init__(self, patch_size=16, num_classes=768, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4.,\n                 qkv_bias=False, qk_scale=None, drop_rate=0., attn_drop_rate=0., drop_path_rate=0.,\n                 norm_layer=nn.LayerNorm, init_values=None, num_patches=196, tubelet_size=2, use_checkpoint=False\n                 ):\n        super().__init__()\n        self.num_classes = num_classes\n        assert num_classes == 3 * tubelet_size * patch_size ** 2 \n        self.num_features = self.embed_dim = embed_dim  # num_features for consistency with other models\n        self.patch_size = patch_size\n        self.use_checkpoint = use_checkpoint\n\n        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)]  # stochastic depth decay rule\n        self.blocks = nn.ModuleList([\n            Block(\n                dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,\n                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer,\n                init_values=init_values)\n            for i in range(depth)])\n        self.norm =  norm_layer(embed_dim)\n        self.head = nn.Linear(embed_dim, num_classes) if num_classes > 0 else nn.Identity()\n\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            nn.init.xavier_uniform_(m.weight)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n\n    def get_num_layers(self):\n        return len(self.blocks)\n\n    @torch.jit.ignore\n    def 
no_weight_decay(self):\n        return {'pos_embed', 'cls_token'}\n\n    def get_classifier(self):\n        return self.head\n\n    def reset_classifier(self, num_classes, global_pool=''):\n        self.num_classes = num_classes\n        self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity()\n\n    def forward(self, x, return_token_num):\n        if self.use_checkpoint:\n            for blk in self.blocks:\n                x = checkpoint.checkpoint(blk, x)\n        else:   \n            for blk in self.blocks:\n                x = blk(x)\n\n        if return_token_num > 0:\n            x = self.head(self.norm(x[:, -return_token_num:])) # only return the mask tokens predict pixels\n        else:\n            x = self.head(self.norm(x))\n\n        return x\n\n\nclass PretrainVisionTransformer(nn.Module):\n    \"\"\" Vision Transformer with support for patch or hybrid CNN input stage\n    \"\"\"\n    def __init__(self,\n                 img_size=224, \n                 patch_size=16, \n                 encoder_in_chans=3, \n                 encoder_num_classes=0, \n                 encoder_embed_dim=768, \n                 encoder_depth=12,\n                 encoder_num_heads=12, \n                 decoder_num_classes=1536, #  decoder_num_classes=768, \n                 decoder_embed_dim=512, \n                 decoder_depth=8,\n                 decoder_num_heads=8, \n                 mlp_ratio=4., \n                 qkv_bias=False, \n                 qk_scale=None, \n                 drop_rate=0., \n                 attn_drop_rate=0.,\n                 drop_path_rate=0., \n                 norm_layer=nn.LayerNorm, \n                 init_values=0.,\n                 use_learnable_pos_emb=False,\n                 use_checkpoint=False,\n                 num_frames=16,\n                 tubelet_size=2,\n                 num_classes=0, # avoid the error from create_fn in timm\n                 in_chans=0, # avoid the error from create_fn 
in timm\n                 ):\n        super().__init__()\n        self.encoder = PretrainVisionTransformerEncoder(\n            img_size=img_size, \n            patch_size=patch_size, \n            in_chans=encoder_in_chans, \n            num_classes=encoder_num_classes, \n            embed_dim=encoder_embed_dim, \n            depth=encoder_depth,\n            num_heads=encoder_num_heads, \n            mlp_ratio=mlp_ratio, \n            qkv_bias=qkv_bias, \n            qk_scale=qk_scale, \n            drop_rate=drop_rate, \n            attn_drop_rate=attn_drop_rate,\n            drop_path_rate=drop_path_rate, \n            norm_layer=norm_layer, \n            init_values=init_values,\n            num_frames=num_frames,\n            tubelet_size=tubelet_size,\n            use_checkpoint=use_checkpoint,\n            use_learnable_pos_emb=use_learnable_pos_emb)\n\n        self.decoder = PretrainVisionTransformerDecoder(\n            patch_size=patch_size, \n            num_patches=self.encoder.patch_embed.num_patches,\n            num_classes=decoder_num_classes, \n            embed_dim=decoder_embed_dim, \n            depth=decoder_depth,\n            num_heads=decoder_num_heads, \n            mlp_ratio=mlp_ratio, \n            qkv_bias=qkv_bias, \n            qk_scale=qk_scale, \n            drop_rate=drop_rate, \n            attn_drop_rate=attn_drop_rate,\n            drop_path_rate=drop_path_rate, \n            norm_layer=norm_layer, \n            init_values=init_values,\n            tubelet_size=tubelet_size,\n            use_checkpoint=use_checkpoint)\n\n        self.encoder_to_decoder = nn.Linear(encoder_embed_dim, decoder_embed_dim, bias=False)\n\n        self.mask_token = nn.Parameter(torch.zeros(1, 1, decoder_embed_dim))\n\n        self.pos_embed = get_sinusoid_encoding_table(self.encoder.patch_embed.num_patches, decoder_embed_dim)\n\n        trunc_normal_(self.mask_token, std=.02)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n   
         nn.init.xavier_uniform_(m.weight)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n\n    def get_num_layers(self):\n        # The transformer blocks live on the encoder; this wrapper has no `blocks` attribute.\n        return len(self.encoder.blocks)\n\n    @torch.jit.ignore\n    def no_weight_decay(self):\n        return {'pos_embed', 'cls_token', 'mask_token'}\n\n    def forward(self, x, mask):\n        _, _, T, _, _ = x.shape\n        x_vis = self.encoder(x, mask) # [B, N_vis, C_e]\n        x_vis = self.encoder_to_decoder(x_vis) # [B, N_vis, C_d]\n        B, N, C = x_vis.shape\n        # We don't unshuffle the visible tokens back to their original order,\n        # but shuffle the position embedding accordingly.\n        expand_pos_embed = self.pos_embed.expand(B, -1, -1).type_as(x).to(x.device).clone().detach()\n        pos_emd_vis = expand_pos_embed[~mask].reshape(B, -1, C)\n        pos_emd_mask = expand_pos_embed[mask].reshape(B, -1, C)\n        x_full = torch.cat([x_vis + pos_emd_vis, self.mask_token + pos_emd_mask], dim=1) # [B, N, C_d]\n        x = self.decoder(x_full, pos_emd_mask.shape[1]) # [B, N_mask, 3 * tubelet_size * patch_size ** 2]\n\n        return x\n\n\n@register_model\ndef pretrain_videomae_base_patch16_224(pretrained=False, **kwargs):\n    model = PretrainVisionTransformer(\n        img_size=224,\n        patch_size=16, \n        encoder_embed_dim=768, \n        encoder_depth=12, \n        encoder_num_heads=12,\n        encoder_num_classes=0,\n        decoder_num_classes=1536,\n        decoder_embed_dim=384,\n        decoder_num_heads=6,\n        mlp_ratio=4, \n        qkv_bias=True,\n        norm_layer=partial(nn.LayerNorm, eps=1e-6), \n        **kwargs)\n    model.default_cfg = _cfg()\n    if pretrained:\n        checkpoint = torch.load(\n            kwargs[\"init_ckpt\"], map_location=\"cpu\"\n        )\n        model.load_state_dict(checkpoint[\"model\"])\n    return model\n 
\n\n@register_model\ndef pretrain_videomae_large_patch16_224(pretrained=False, **kwargs):\n    model = PretrainVisionTransformer(\n        img_size=224,\n        patch_size=16, \n        encoder_embed_dim=1024, \n        encoder_depth=24, \n        encoder_num_heads=16,\n        encoder_num_classes=0,\n        decoder_num_classes=1536, \n        decoder_embed_dim=512,\n        decoder_num_heads=8,\n        mlp_ratio=4, \n        qkv_bias=True,\n        norm_layer=partial(nn.LayerNorm, eps=1e-6), \n        **kwargs)\n    model.default_cfg = _cfg()\n    if pretrained:\n        checkpoint = torch.load(\n            kwargs[\"init_ckpt\"], map_location=\"cpu\"\n        )\n        model.load_state_dict(checkpoint[\"model\"])\n    return model\n\n\n@register_model\ndef pretrain_videomae_huge_patch16_224(pretrained=False, **kwargs):\n    model = PretrainVisionTransformer(\n        img_size=224,\n        patch_size=16, \n        encoder_embed_dim=1280, \n        encoder_depth=32, \n        encoder_num_heads=16,\n        encoder_num_classes=0,\n        decoder_num_classes=1536, \n        decoder_embed_dim=640,\n        decoder_num_heads=8,\n        mlp_ratio=4, \n        qkv_bias=True,\n        norm_layer=partial(nn.LayerNorm, eps=1e-6), \n        **kwargs)\n    model.default_cfg = _cfg()\n    if pretrained:\n        checkpoint = torch.load(\n            kwargs[\"init_ckpt\"], map_location=\"cpu\"\n        )\n        model.load_state_dict(checkpoint[\"model\"])\n    return model\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench/third_pary/umt/models/modeling_pretrain_umt.py",
    "content": "import math\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.utils.checkpoint as checkpoint\nfrom functools import partial\n\nfrom .modeling_finetune import Block, DropPath, Mlp, _cfg, PatchEmbed\nfrom timm.models.registry import register_model\nfrom timm.models.layers import trunc_normal_ as __call_trunc_normal_\n\n\ndef trunc_normal_(tensor, mean=0., std=1.):\n    __call_trunc_normal_(tensor, mean=mean, std=std, a=-std, b=std)\n\n\n# sin-cos position encoding\n# https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/master/transformer/Models.py#L31\ndef get_sinusoid_encoding_table(n_position, d_hid): \n    ''' Sinusoid position encoding table ''' \n    # TODO: make it with torch instead of numpy \n    def get_position_angle_vec(position): \n        return [position / np.power(10000, 2 * (hid_j // 2) / d_hid) for hid_j in range(d_hid)] \n\n    sinusoid_table = np.array([get_position_angle_vec(pos_i) for pos_i in range(n_position)]) \n    sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i \n    sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1 \n\n    return  torch.tensor(sinusoid_table, dtype=torch.float, requires_grad=False).unsqueeze(0) \n\n\nclass PretrainVisionTransformerEncoder(nn.Module):\n    \"\"\" Vision Transformer with support for patch or hybrid CNN input stage\n    \"\"\"\n    def __init__(self, img_size=224, patch_size=16, in_chans=3, num_classes=0, embed_dim=768, depth=12,\n                 num_heads=12, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop_rate=0., attn_drop_rate=0.,\n                 drop_path_rate=0., norm_layer=nn.LayerNorm, init_values=None, num_frames=16, tubelet_size=2,\n                 use_checkpoint=False, checkpoint_num=0, use_learnable_pos_emb=False, clip_return_layer=1,\n                 clip_student_return_interval=1):\n        super().__init__()\n        self.num_classes = num_classes\n        
self.num_features = self.embed_dim = embed_dim  # num_features for consistency with other models\n        self.patch_embed = PatchEmbed(\n            img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim, \n            num_frames=num_frames, tubelet_size=tubelet_size\n        )\n        num_patches = self.patch_embed.num_patches\n        self.use_checkpoint = use_checkpoint\n        self.checkpoint_num = checkpoint_num\n        print(f'Use checkpoint: {use_checkpoint}')\n        print(f'Checkpoint number: {checkpoint_num}')\n        self.return_index = []\n        for i in range(clip_return_layer):\n            self.return_index.append(depth - int(i * clip_student_return_interval) - 1)\n        print(f'Student return index: {self.return_index}')\n        \n        self.use_learnable_pos_emb = use_learnable_pos_emb\n        if use_learnable_pos_emb:\n            print('Use learnable position embedding')\n            self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))\n        else:\n            # sine-cosine positional embeddings \n            self.pos_embed = get_sinusoid_encoding_table(num_patches, embed_dim)\n\n        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)]  # stochastic depth decay rule\n        self.blocks = nn.ModuleList([\n            Block(\n                dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,\n                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer,\n                init_values=init_values)\n            for i in range(depth)])\n        self.norm =  norm_layer(embed_dim)\n        self.head = nn.Linear(embed_dim, num_classes) if num_classes > 0 else nn.Identity()\n\n        if use_learnable_pos_emb:\n            trunc_normal_(self.pos_embed, std=.02)\n\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            
nn.init.xavier_uniform_(m.weight)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n\n    def get_num_layers(self):\n        return len(self.blocks)\n\n    @torch.jit.ignore\n    def no_weight_decay(self):\n        return {'pos_embed', 'cls_token'}\n\n    def get_classifier(self):\n        return self.head\n\n    def reset_classifier(self, num_classes, global_pool=''):\n        self.num_classes = num_classes\n        self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity()\n\n    def forward_features(self, x, mask):\n        x = self.patch_embed(x)\n        \n        if self.use_learnable_pos_emb:\n            x = x + self.pos_embed.type_as(x).to(x.device)\n        else:\n            x = x + self.pos_embed.type_as(x).to(x.device).clone().detach()\n\n        B, _, C = x.shape\n        x_vis = x[~mask].reshape(B, -1, C) # ~mask means visible\n        x_clip_vis = []\n\n        for idx, blk in enumerate(self.blocks):\n            if self.use_checkpoint and idx < self.checkpoint_num:\n                x_vis = checkpoint.checkpoint(blk, x_vis)\n            else:\n                x_vis = blk(x_vis)\n            if idx in self.return_index:\n                x_clip_vis.append(x_vis)\n\n        x_vis = self.norm(x_vis)\n        x_clip_vis = self.norm(torch.stack(x_clip_vis))\n        return x_vis, x_clip_vis\n\n    def forward(self, x, mask):\n        x, x_clip_vis = self.forward_features(x, mask)\n        x = self.head(x)\n        x_clip_vis = self.head(x_clip_vis)\n        return x_clip_vis\n\n\nclass Linear_Decoder(nn.Module):\n    def __init__(self, num_classes=768, embed_dim=768, \n                 norm_layer=nn.LayerNorm, clip_norm_type='l2'):\n        super().__init__()\n        self.clip_norm_type = clip_norm_type\n        print(f'Normalization 
Type: {clip_norm_type}')\n\n        self.head = nn.Linear(embed_dim, num_classes)\n        self.norm =  norm_layer(num_classes)\n\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            nn.init.xavier_uniform_(m.weight)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n\n    def forward(self, x):\n        x = self.norm(self.head(x))\n\n        if self.clip_norm_type == 'l2':\n            x = x / x.norm(dim=-1, keepdim=True)\n        elif self.clip_norm_type == 'none':\n            pass\n        else:\n            raise NotImplementedError\n\n        return x\n\n\nclass PretrainVisionTransformer(nn.Module):\n    \"\"\" Vision Transformer with support for patch or hybrid CNN input stage\n    \"\"\"\n    def __init__(self,\n                 img_size=224, \n                 patch_size=16, \n                 encoder_in_chans=3, \n                 encoder_num_classes=0, \n                 encoder_embed_dim=768, \n                 encoder_depth=12,\n                 encoder_num_heads=12, \n                 mlp_ratio=4., \n                 qkv_bias=False, \n                 qk_scale=None, \n                 drop_rate=0., \n                 attn_drop_rate=0.,\n                 drop_path_rate=0., \n                 norm_layer=nn.LayerNorm, \n                 init_values=0.,\n                 use_learnable_pos_emb=False,\n                 use_checkpoint=False,\n                 checkpoint_num=0,\n                 num_frames=16,\n                 tubelet_size=2,\n                 # clip,\n                 clip_decoder_embed_dim=768,\n                 clip_output_dim=512,\n                 clip_norm_type='l2',\n                 clip_return_layer=1,\n                 clip_student_return_interval=1,\n                ):\n  
      super().__init__()\n\n        self.encoder = PretrainVisionTransformerEncoder(\n            img_size=img_size, \n            patch_size=patch_size, \n            in_chans=encoder_in_chans, \n            num_classes=encoder_num_classes, \n            embed_dim=encoder_embed_dim, \n            depth=encoder_depth,\n            num_heads=encoder_num_heads, \n            mlp_ratio=mlp_ratio, \n            qkv_bias=qkv_bias, \n            qk_scale=qk_scale, \n            drop_rate=drop_rate, \n            attn_drop_rate=attn_drop_rate,\n            drop_path_rate=drop_path_rate, \n            norm_layer=norm_layer, \n            init_values=init_values,\n            num_frames=num_frames,\n            tubelet_size=tubelet_size,\n            use_checkpoint=use_checkpoint,\n            checkpoint_num=checkpoint_num,\n            use_learnable_pos_emb=use_learnable_pos_emb,\n            clip_return_layer=clip_return_layer,\n            clip_student_return_interval=clip_student_return_interval\n        )\n\n        # CLIP decoder\n        self.clip_decoder = nn.ModuleList([\n            Linear_Decoder(\n                num_classes=clip_output_dim, \n                embed_dim=clip_decoder_embed_dim, \n                norm_layer=norm_layer, \n                clip_norm_type=clip_norm_type\n            ) for _ in range(clip_return_layer)\n        ])\n\n        self.clip_pos_embed = get_sinusoid_encoding_table(self.encoder.patch_embed.num_patches, clip_decoder_embed_dim)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            nn.init.xavier_uniform_(m.weight)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n\n    def get_num_layers(self):\n        # The transformer blocks live on the encoder; this wrapper has no `blocks` attribute.\n        return len(self.encoder.blocks)\n\n    @torch.jit.ignore\n    def no_weight_decay(self):\n        return 
{'pos_embed', 'cls_token', 'mask_token', 'clip_mask_token', 'clip_pos_embed'}\n\n    def forward(self, x, mask):\n        x_clip_vis = self.encoder(x, mask) # [B, N_vis, C_e]\n        \n        # align CLIP\n        K, B, _, C_CLIP = x_clip_vis.shape\n        expand_clip_pos_embed = self.clip_pos_embed.repeat(B, 1, 1).type_as(x).to(x.device).clone().detach()\n        clip_pos_emd_vis = expand_clip_pos_embed[~mask].view(B, -1, C_CLIP).unsqueeze(0).repeat(K, 1, 1, 1)\n        x_clip_full = x_clip_vis + clip_pos_emd_vis # [K, B, N, C_d_clip]\n\n        x_clip = []\n        for idx, clip_decoder in enumerate(self.clip_decoder):\n            x_clip.append(clip_decoder(x_clip_full[idx]))\n        x_clip = torch.stack(x_clip) # align and normalize\n        \n        return x_clip\n    \n\n@register_model\ndef pretrain_umt_base_patch16_224(pretrained=False, **kwargs):\n    model = PretrainVisionTransformer(\n        img_size=224,\n        patch_size=16, \n        encoder_embed_dim=768, \n        encoder_depth=12, \n        encoder_num_heads=12,\n        encoder_num_classes=0,\n        mlp_ratio=4, \n        qkv_bias=True,\n        norm_layer=partial(nn.LayerNorm, eps=1e-6), \n        **kwargs)\n    model.default_cfg = _cfg()\n    if pretrained:\n        checkpoint = torch.load(\n            kwargs[\"init_ckpt\"], map_location=\"cpu\"\n        )\n        model.load_state_dict(checkpoint[\"model\"])\n    return model\n \n\n@register_model\ndef pretrain_umt_large_patch16_224(pretrained=False, **kwargs):\n    model = PretrainVisionTransformer(\n        img_size=224,\n        patch_size=16, \n        encoder_embed_dim=1024, \n        encoder_depth=24, \n        encoder_num_heads=16,\n        encoder_num_classes=0,\n        mlp_ratio=4, \n        qkv_bias=True,\n        norm_layer=partial(nn.LayerNorm, eps=1e-6), \n        **kwargs)\n    model.default_cfg = _cfg()\n    if pretrained:\n        checkpoint = torch.load(\n            kwargs[\"init_ckpt\"], map_location=\"cpu\"\n     
   )\n        model.load_state_dict(checkpoint[\"model\"])\n    return model\n\n\nif __name__ == '__main__':\n    import time\n    from fvcore.nn import FlopCountAnalysis\n    from fvcore.nn import flop_count_table\n    import numpy as np\n\n    seed = 4217\n    np.random.seed(seed)\n    torch.manual_seed(seed)\n    torch.cuda.manual_seed(seed)\n    torch.cuda.manual_seed_all(seed)\n\n    model = pretrain_umt_base_patch16_224()\n\n    # flops = FlopCountAnalysis(model, torch.rand(1, 3, 16, 224, 224))\n    # s = time.time()\n    # print(flop_count_table(flops, max_depth=1))\n    # print(time.time()-s)\n    mask = torch.cat([\n        torch.ones(1, 8 * int(14 * 14 * 0.75)),\n        torch.zeros(1, 8 * int(14 * 14 * 0.25)),\n    ], dim=-1).to(torch.bool)\n    # The model returns one stacked tensor of shape [K, B, N_vis, clip_output_dim];\n    # with the default K = 1, indexing [1] would be out of range.\n    print(model(torch.rand(1, 3, 16, 224, 224), mask).shape)"
  },
  {
    "path": "Open-Sora/build/lib/vbench/utils.py",
    "content": "import os\nimport json\nimport numpy as np\nimport logging\nimport subprocess\nimport torch\nimport re\nfrom pathlib import Path\nfrom PIL import Image, ImageSequence\nfrom decord import VideoReader, cpu\nfrom torchvision import transforms\nfrom torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize, ToPILImage\ntry:\n    from torchvision.transforms import InterpolationMode\n    BICUBIC = InterpolationMode.BICUBIC\n    BILINEAR = InterpolationMode.BILINEAR\nexcept ImportError:\n    BICUBIC = Image.BICUBIC\n    BILINEAR = Image.BILINEAR\n\nCACHE_DIR = os.environ.get('VBENCH_CACHE_DIR')\nif CACHE_DIR is None:\n    CACHE_DIR = os.path.join(os.path.expanduser('~'), '.cache', 'vbench')\n\nlogging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(__name__)\n\ndef clip_transform(n_px):\n    return Compose([\n        Resize(n_px, interpolation=BICUBIC),\n        CenterCrop(n_px),\n        transforms.Lambda(lambda x: x.float().div(255.0)),\n        Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),\n    ])\n\ndef clip_transform_Image(n_px):\n    return Compose([\n        Resize(n_px, interpolation=BICUBIC),\n        CenterCrop(n_px),\n        ToTensor(),\n        Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),\n    ])\n\ndef dino_transform(n_px):\n    return Compose([\n        Resize(size=n_px),\n        transforms.Lambda(lambda x: x.float().div(255.0)),\n        Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))\n    ])\n\ndef dino_transform_Image(n_px):\n    return Compose([\n        Resize(size=n_px),\n        ToTensor(),\n        Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))\n    ])\n\ndef tag2text_transform(n_px):\n    normalize = Normalize(mean=[0.485, 0.456, 0.406],\n                                        std=[0.229, 0.224, 0.225])\n    return 
Compose([ToPILImage(),Resize((n_px, n_px)),ToTensor(),normalize])\n\ndef get_frame_indices(num_frames, vlen, sample='rand', fix_start=None, input_fps=1, max_num_frames=-1):\n    import random  # local import: `random` is not imported at module level in this file\n    if sample in [\"rand\", \"middle\"]: # uniform sampling\n        acc_samples = min(num_frames, vlen)\n        # split the video into `acc_samples` intervals, and sample from each interval.\n        intervals = np.linspace(start=0, stop=vlen, num=acc_samples + 1).astype(int)\n        ranges = []\n        for idx, interv in enumerate(intervals[:-1]):\n            ranges.append((interv, intervals[idx + 1] - 1))\n        if sample == 'rand':\n            try:\n                frame_indices = [random.choice(range(x[0], x[1])) for x in ranges]\n            except Exception:  # e.g. IndexError on an empty interval; fall back to a random subset\n                frame_indices = np.random.permutation(vlen)[:acc_samples]\n                frame_indices.sort()\n                frame_indices = list(frame_indices)\n        elif fix_start is not None:\n            frame_indices = [x[0] + fix_start for x in ranges]\n        elif sample == 'middle':\n            frame_indices = [(x[0] + x[1]) // 2 for x in ranges]\n        else:\n            raise NotImplementedError\n\n        if len(frame_indices) < num_frames:  # padded with last frame\n            padded_frame_indices = [frame_indices[-1]] * num_frames\n            padded_frame_indices[:len(frame_indices)] = frame_indices\n            frame_indices = padded_frame_indices\n    elif \"fps\" in sample:  # fps0.5, sequentially sample frames at 0.5 fps\n        output_fps = float(sample[3:])\n        duration = float(vlen) / input_fps\n        delta = 1 / output_fps  # gap between frames, this is also the clip length each frame represents\n        frame_seconds = np.arange(0 + delta / 2, duration + delta / 2, delta)\n        frame_indices = np.around(frame_seconds * input_fps).astype(int)\n        frame_indices = [e for e in frame_indices if e < vlen]\n        if max_num_frames > 0 and len(frame_indices) > max_num_frames:\n            
frame_indices = frame_indices[:max_num_frames]\n            # frame_indices = np.linspace(0 + delta / 2, duration + delta / 2, endpoint=False, num=max_num_frames)\n    else:\n        raise ValueError\n    return frame_indices\n\ndef load_video(video_path, data_transform=None, num_frames=None, return_tensor=True, width=None, height=None):\n    \"\"\"\n    Load a video from a given path and apply optional data transformations.\n\n    The function supports loading video in GIF (.gif), PNG (.png), and MP4 (.mp4) formats.\n    Depending on the format, it processes and extracts frames accordingly.\n    \n    Parameters:\n    - video_path (str): The file path to the video or image to be loaded.\n    - data_transform (callable, optional): A function that applies transformations to the video data.\n    \n    Returns:\n    - frames (torch.Tensor): A tensor containing the video frames with shape (T, C, H, W),\n      where T is the number of frames, C is the number of channels, H is the height, and W is the width.\n    \n    Raises:\n    - NotImplementedError: If the video format is not supported.\n    \n    The function first determines the format of the video file by its extension.\n    For GIFs, it iterates over each frame and converts them to RGB.\n    For PNGs, it reads the single frame, converts it to RGB.\n    For MP4s, it reads the frames using the VideoReader class and converts them to NumPy arrays.\n    If a data_transform is provided, it is applied to the buffer before converting it to a tensor.\n    Finally, the tensor is permuted to match the expected (T, C, H, W) format.\n    \"\"\"\n    if video_path.endswith('.gif'):\n        frame_ls = []\n        img = Image.open(video_path)\n        for frame in ImageSequence.Iterator(img):\n            frame = frame.convert('RGB')\n            frame = np.array(frame).astype(np.uint8)\n            frame_ls.append(frame)\n        buffer = np.array(frame_ls).astype(np.uint8)\n    elif video_path.endswith('.png'):\n        
frame = Image.open(video_path)\n        frame = frame.convert('RGB')\n        frame = np.array(frame).astype(np.uint8)\n        frame_ls = [frame]\n        buffer = np.array(frame_ls)\n    elif video_path.endswith('.mp4'):\n        import decord\n        decord.bridge.set_bridge('native')\n        if width:\n            video_reader = VideoReader(video_path, width=width, height=height, num_threads=1)\n        else:\n            video_reader = VideoReader(video_path, num_threads=1)\n        frames = video_reader.get_batch(range(len(video_reader)))  # (T, H, W, C), torch.uint8\n\n        buffer = frames.asnumpy().astype(np.uint8)\n    else:\n        raise NotImplementedError\n    \n    frames = buffer\n    if num_frames:\n        frame_indices = get_frame_indices(\n        num_frames, len(frames), sample=\"middle\"\n        )\n        frames = frames[frame_indices]\n    \n    if data_transform:\n        frames = data_transform(frames)\n    elif return_tensor:\n        frames = torch.Tensor(frames)\n        frames = frames.permute(0, 3, 1, 2)  # (T, C, H, W), torch.uint8\n\n    return frames\n\ndef read_frames_decord_by_fps(\n        video_path, sample_fps=2, sample='rand', fix_start=None, \n        max_num_frames=-1,  trimmed30=False, num_frames=8\n    ):\n    import decord\n    decord.bridge.set_bridge(\"torch\")\n    video_reader = VideoReader(video_path, num_threads=1)\n    vlen = len(video_reader)\n    fps = video_reader.get_avg_fps()\n    duration = vlen / float(fps)\n\n    if trimmed30 and duration > 30:\n        duration = 30\n        vlen = int(30 * float(fps))\n\n    frame_indices = get_frame_indices(\n        num_frames, vlen, sample=sample, fix_start=fix_start,\n        input_fps=fps, max_num_frames=max_num_frames\n    )\n    frames = video_reader.get_batch(frame_indices)  # (T, H, W, C), torch.uint8\n    frames = frames.permute(0, 3, 1, 2)  # (T, C, H, W), torch.uint8\n    return frames\n    \ndef load_dimension_info(json_dir, dimension, lang):\n    
\"\"\"\n    Load video list and prompt information based on a specified dimension and language from a JSON file.\n    \n    Parameters:\n    - json_dir (str): The directory path where the JSON file is located.\n    - dimension (str): The dimension for evaluation to filter the video prompts.\n    - lang (str): The language key used to retrieve the appropriate prompt text.\n    \n    Returns:\n    - video_list (list): A list of video file paths that match the specified dimension.\n    - prompt_dict_ls (list): A list of dictionaries, each containing a prompt and its corresponding video list.\n    \n    The function reads the JSON file to extract video information. It filters the prompts based on the specified\n    dimension and compiles a list of video paths and associated prompts in the specified language.\n    \n    Notes:\n    - The JSON file is expected to contain a list of dictionaries with keys 'dimension', 'video_list', and language-based prompts.\n    - The function assumes that the 'video_list' key in the JSON can either be a list or a single string value.\n    \"\"\"\n    video_list = []\n    prompt_dict_ls = []\n    full_prompt_list = load_json(json_dir)\n    for prompt_dict in full_prompt_list:\n        if dimension in prompt_dict['dimension'] and 'video_list' in prompt_dict:\n            prompt = prompt_dict[f'prompt_{lang}']\n            cur_video_list = prompt_dict['video_list'] if isinstance(prompt_dict['video_list'], list) else [prompt_dict['video_list']]\n            video_list += cur_video_list\n            if 'auxiliary_info' in prompt_dict and dimension in prompt_dict['auxiliary_info']:\n                prompt_dict_ls += [{'prompt': prompt, 'video_list': cur_video_list, 'auxiliary_info': prompt_dict['auxiliary_info'][dimension]}]\n            else:\n                prompt_dict_ls += [{'prompt': prompt, 'video_list': cur_video_list}]\n    return video_list, prompt_dict_ls\n\ndef init_submodules(dimension_list, local=False, read_frame=False):\n    
submodules_dict = {}\n    if local:\n        logger.info(\"\\x1b[32m[Local Mode]\\x1b[0m Working in local mode, please make sure that the pre-trained model has been fully downloaded.\")\n    for dimension in dimension_list:\n        os.makedirs(CACHE_DIR, exist_ok=True)\n        if dimension == 'background_consistency':\n            # read_frame = False\n            if local:\n                vit_b_path = f'{CACHE_DIR}/clip_model/ViT-B-32.pt'\n                if not os.path.isfile(vit_b_path):\n                    wget_command = ['wget', 'https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt', '-P', os.path.dirname(vit_b_path)]\n                    subprocess.run(wget_command, check=True)\n            else:\n                vit_b_path = 'ViT-B/32'\n\n            submodules_dict[dimension] = [vit_b_path, read_frame]\n        elif dimension == 'human_action':\n            umt_path = f'{CACHE_DIR}/umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth'\n            if not os.path.isfile(umt_path):\n                wget_command = ['wget', 'https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/umt/single_modality/l16_ptk710_ftk710_ftk400_f16_res224.pth', '-P', os.path.dirname(umt_path)]\n                subprocess.run(wget_command, check=True)\n            submodules_dict[dimension] = [umt_path,]\n        elif dimension == 'temporal_flickering':\n            submodules_dict[dimension] = []\n        elif dimension == 'motion_smoothness':\n            CUR_DIR = os.path.dirname(os.path.abspath(__file__))\n            submodules_dict[dimension] = {\n                    'config': f'{CUR_DIR}/third_party/amt/cfgs/AMT-S.yaml',\n                    'ckpt': f'{CACHE_DIR}/amt_model/amt-s.pth'\n                }\n            details = submodules_dict[dimension]\n            # Check if the file exists, if not, download it with wget\n            if not os.path.isfile(details['ckpt']):\n                print(f\"File 
{details['ckpt']} does not exist. Downloading...\")\n                wget_command = ['wget', '-P', os.path.dirname(details['ckpt']),\n                                'https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth']\n                subprocess.run(wget_command, check=True)\n\n        elif dimension == 'dynamic_degree':\n            submodules_dict[dimension] = {\n                'model': f'{CACHE_DIR}/raft_model/models/raft-things.pth'\n            }\n            details = submodules_dict[dimension]\n            if not os.path.isfile(details['model']):\n                # raise NotImplementedError\n                print(f\"File {details['model']} does not exist. Downloading...\")\n                wget_command = ['wget', '-P', f'{CACHE_DIR}/raft_model/', 'https://dl.dropboxusercontent.com/s/4j4z58wuv8o0mfz/models.zip']\n                unzip_command = ['unzip', '-d', f'{CACHE_DIR}/raft_model/', f'{CACHE_DIR}/raft_model/models.zip']\n                remove_command = ['rm', '-r', f'{CACHE_DIR}/raft_model/models.zip']\n                try:\n                    subprocess.run(wget_command, check=True)\n                    subprocess.run(unzip_command, check=True)\n                    subprocess.run(remove_command, check=True)\n                except subprocess.CalledProcessError as err:\n                    print(f\"Error during downloading RAFT model: {err}\")\n        # Assign the DINO model path for subject consistency dimension\n        elif dimension == 'subject_consistency':\n            if local:\n                submodules_dict[dimension] = {\n                    'repo_or_dir': f'{CACHE_DIR}/dino_model/facebookresearch_dino_main/',\n                    'path': f'{CACHE_DIR}/dino_model/dino_vitbase16_pretrain.pth', \n                    'model': 'dino_vitb16',\n                    'source': 'local',\n                    'read_frame': read_frame\n                    }\n                details = submodules_dict[dimension]\n                # Check if the 
file exists, if not, download it with wget\n                if not os.path.isdir(details['repo_or_dir']):\n                    print(f\"Directory {details['repo_or_dir']} does not exist. Cloning repository...\")\n                    subprocess.run(['git', 'clone', 'https://github.com/facebookresearch/dino', details['repo_or_dir']], check=True)\n\n                if not os.path.isfile(details['path']):\n                    print(f\"File {details['path']} does not exist. Downloading...\")\n                    wget_command = ['wget', '-P', os.path.dirname(details['path']),\n                                    'https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth']\n                    subprocess.run(wget_command, check=True)\n            else:\n                submodules_dict[dimension] = {\n                    'repo_or_dir':'facebookresearch/dino:main',\n                    'source':'github',\n                    'model': 'dino_vitb16',\n                    'read_frame': read_frame\n                    }\n        elif dimension == 'aesthetic_quality':\n            aes_path = f'{CACHE_DIR}/aesthetic_model/emb_reader'\n            if local:\n                vit_l_path = f'{CACHE_DIR}/clip_model/ViT-L-14.pt'\n                if not os.path.isfile(vit_l_path):\n                    wget_command = ['wget' ,'https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt', '-P', os.path.dirname(vit_l_path)]\n                    subprocess.run(wget_command, check=True)\n            else:\n                vit_l_path = 'ViT-L/14'\n            submodules_dict[dimension] = [vit_l_path, aes_path]\n        elif dimension == 'imaging_quality':\n            musiq_spaq_path = f'{CACHE_DIR}/pyiqa_model/musiq_spaq_ckpt-358bb6af.pth'\n            if not os.path.isfile(musiq_spaq_path):\n                wget_command = ['wget', 
'https://github.com/chaofengc/IQA-PyTorch/releases/download/v0.1-weights/musiq_spaq_ckpt-358bb6af.pth', '-P', os.path.dirname(musiq_spaq_path)]\n                subprocess.run(wget_command, check=True)\n            submodules_dict[dimension] = {'model_path': musiq_spaq_path}\n        elif dimension in [\"object_class\", \"multiple_objects\", \"color\", \"spatial_relationship\" ]:\n            submodules_dict[dimension] = {\n                \"model_weight\": f'{CACHE_DIR}/grit_model/grit_b_densecap_objectdet.pth'\n            }\n            if not os.path.exists(submodules_dict[dimension]['model_weight']):\n                wget_command = ['wget', 'https://datarelease.blob.core.windows.net/grit/models/grit_b_densecap_objectdet.pth', '-P', os.path.dirname(submodules_dict[dimension][\"model_weight\"])]\n                subprocess.run(wget_command, check=True)\n        elif dimension == 'scene':\n            submodules_dict[dimension] = {\n                \"pretrained\": f'{CACHE_DIR}/caption_model/tag2text_swin_14m.pth',\n                \"image_size\":384, \n                \"vit\":\"swin_b\"\n            }\n            if not os.path.exists(submodules_dict[dimension]['pretrained']):\n                wget_command = ['wget', 'https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth', '-P', os.path.dirname(submodules_dict[dimension][\"pretrained\"])]\n                subprocess.run(wget_command, check=True)\n        elif dimension == 'appearance_style':\n            if local:\n                submodules_dict[dimension] = {\"name\": f'{CACHE_DIR}/clip_model/ViT-B-32.pt'}\n                if not os.path.isfile(submodules_dict[dimension][\"name\"]):\n                    wget_command = ['wget', 'https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt', '-P', os.path.dirname(submodules_dict[dimension][\"name\"])]\n                    subprocess.run(wget_command, 
check=True)\n            else:\n                submodules_dict[dimension] = {\"name\": 'ViT-B/32'}\n        elif dimension in [\"temporal_style\", \"overall_consistency\"]:\n            submodules_dict[dimension] = {\n                \"pretrain\": f'{CACHE_DIR}/ViCLIP/ViClip-InternVid-10M-FLT.pth',\n            }\n            if not os.path.exists(submodules_dict[dimension]['pretrain']):\n                wget_command = ['wget', 'https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/viclip/ViClip-InternVid-10M-FLT.pth', '-P', os.path.dirname(submodules_dict[dimension][\"pretrain\"])]\n                subprocess.run(wget_command, check=True)\n    return submodules_dict\n\ndef get_prompt_from_filename(path: str):\n    \"\"\"\n    1. prompt-0.suffix -> prompt\n    2. prompt.suffix -> prompt\n    \"\"\"\n    prompt = Path(path).stem\n    number_ending = r'-\\d+$' # checks ending with -<number>\n    if re.search(number_ending, prompt):\n        return re.sub(number_ending, '', prompt)\n    return prompt\n\ndef save_json(data, path, indent=4):\n    with open(path, 'w', encoding='utf-8') as f:\n        json.dump(data, f, indent=indent)\n\ndef load_json(path):\n    \"\"\"\n    Load a JSON file from the given file path.\n    \n    Parameters:\n    - path (str): The path to the JSON file.\n    \n    Returns:\n    - data (dict or list): The data loaded from the JSON file, which could be a dictionary or a list.\n    \"\"\"\n    with open(path, 'r', encoding='utf-8') as f:\n        return json.load(f)\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench2_beta_i2v/__init__.py",
    "content": "import os\n\nfrom vbench2_beta_i2v.utils import init_submodules, save_json, load_json\nfrom vbench import VBench\nimport importlib\n\n\nclass VBenchI2V(VBench):\n    def build_full_dimension_list(self, ):\n        return [\"subject_consistency\", \"background_consistency\", \"aesthetic_quality\", \"imaging_quality\", \"object_class\", \"multiple_objects\", \"color\", \"spatial_relationship\", \"scene\", \"temporal_style\", 'overall_consistency', \"human_action\", \"temporal_flickering\", \"motion_smoothness\", \"dynamic_degree\", \"appearance_style\", \"i2v_subject\", \"i2v_background\", \"camera_motion\"]     \n\n    def evaluate(self, videos_path, name, dimension_list=None, local=False, read_frame=False, custom_prompt=False, resolution=\"1-1\"):\n        results_dict = {}\n        if dimension_list is None:\n            dimension_list = self.build_full_dimension_list()\n        submodules_dict = init_submodules(dimension_list, local=local, read_frame=read_frame, resolution=resolution)\n        # print('BEFORE BUILDING')\n        cur_full_info_path = self.build_full_info_json(videos_path, name, dimension_list, custom_prompt=custom_prompt)\n        # print('AFTER BUILDING')\n        for dimension in dimension_list:\n            try:\n                dimension_module = importlib.import_module(f'vbench2_beta_i2v.{dimension}')\n                evaluate_func = getattr(dimension_module, f'compute_{dimension}')\n            except Exception as e:\n                raise NotImplementedError(f'UnImplemented dimension {dimension}!, {e}')\n            submodules_list = submodules_dict[dimension]\n            print(f'cur_full_info_path: {cur_full_info_path}') # TODO: to delete\n            results = evaluate_func(cur_full_info_path, self.device, submodules_list)\n            results_dict[dimension] = results\n        output_name = os.path.join(self.output_path, name+'_eval_results.json')\n        save_json(results_dict, output_name)\n        print(f'Evaluation 
results saved to {output_name}')\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench2_beta_i2v/camera_motion.py",
    "content": "import torch\nimport os\nimport numpy as np\nfrom tqdm import tqdm\n\nfrom vbench2_beta_i2v.third_party.cotracker.utils.visualizer import Visualizer\nfrom vbench2_beta_i2v.utils import load_video, load_dimension_info\n\n\ndef transform(vector):\n    x = np.mean([item[0] for item in vector])\n    y = np.mean([item[1] for item in vector])\n    return [x, y]\n\n\ndef transform_class(vector, min_reso, factor=0.005): # 768*0.05\n    scale = min_reso * factor\n    x, y = vector\n    direction = []\n\n    if x > scale:\n        direction.append(\"right\")\n    elif x < -scale:\n        direction.append(\"left\")\n    \n    if y > scale:\n        direction.append(\"down\")\n    elif y < -scale:\n        direction.append(\"up\")\n\n    return direction if direction else [\"static\"]\n\n\n\nclass CameraPredict:\n    def __init__(self, device, submodules_list):\n        self.device = device\n        self.grid_size = 10\n        try:\n            self.model = torch.hub.load(submodules_list[\"repo\"], submodules_list[\"model\"]).to(self.device)\n        except:\n            # workaround for CERTIFICATE_VERIFY_FAILED (see: https://github.com/pytorch/pytorch/issues/33288#issuecomment-954160699)\n            import ssl\n            ssl._create_default_https_context = ssl._create_unverified_context\n            self.model = torch.hub.load(submodules_list[\"repo\"], submodules_list[\"model\"]).to(self.device)\n\n    def infer(self, video_path, save_video=False, save_dir=\"./saved_videos\"):\n        # load video\n        video = load_video(video_path, return_tensor=False)\n        # set scale\n        height, width = video.shape[1], video.shape[2]\n        self.scale = min(height, width)\n        video = torch.from_numpy(video).permute(0, 3, 1, 2)[None].float().to(self.device) # B T C H W\n        pred_tracks, pred_visibility = self.model(video, grid_size=self.grid_size) # B T N 2,  B T N 1\n        \n        if save_video:\n            video_name = 
os.path.basename(video_path)[:-4]\n            vis = Visualizer(save_dir=save_dir, pad_value=120, linewidth=3)\n            vis.visualize(video, pred_tracks, pred_visibility, filename=video_name)\n\n        return pred_tracks[0].long().detach().cpu().numpy()\n    \n\n    def get_edge_point(self, track):\n        middle = self.grid_size // 2\n        top = [list(track[0, i, :]) for i in range(middle-2, middle+2)]\n        down = [list(track[self.grid_size-1, i, :]) for i in range(middle-2, middle+2)]\n        left = [list(track[i, 0, :]) for i in range(middle-2, middle+2)]\n        right = [list(track[i, self.grid_size-1, :]) for i in range(middle-2, middle+2)]\n        \n        return top, down, left, right\n    \n\n    def get_edge_direction(self, track1, track2):\n        edge_points1 = self.get_edge_point(track1)\n        edge_points2 = self.get_edge_point(track2)\n\n        vector_results = []\n        for points1, points2 in zip(edge_points1, edge_points2):\n            vectors = [[end[0]-start[0], end[1]-start[1]] for start, end in zip(points1, points2)]\n            vector_results.append(vectors)\n        vector_results = list(map(transform, vector_results)) \n        class_results = [transform_class(vector, min_reso=self.scale) for vector in vector_results]\n\n        return class_results\n\n\n    def classify_top_down(self, top, down):\n        results = []\n        classes = [f\"{item_t}_{item_d}\" for item_t in top for item_d in down]\n\n        results_mapping = {\n            \"left_left\": \"pan_right\",\n            \"right_right\": \"pan_left\",\n            \"down_down\": \"tilt_up\",\n            \"up_up\": \"tilt_down\",\n            \"up_down\": \"zoom_in\",\n            \"down_up\": \"zoom_out\",\n            \"static_static\": \"static\"\n        }\n        results = [results_mapping.get(cls) for cls in classes if cls in results_mapping]\n        return results if results else [\"None\"]\n\n\n    def classify_left_right(self, left, right):\n  
      results = []\n        classes = [f\"{item_l}_{item_r}\" for item_l in left for item_r in right]\n\n        results_mapping = {\n            \"left_left\": \"pan_right\",\n            \"right_right\": \"pan_left\",\n            \"down_down\": \"tilt_up\",\n            \"up_up\": \"tilt_down\",\n            \"left_right\": \"zoom_in\",\n            \"right_left\": \"zoom_out\",\n            \"static_static\": \"static\"\n        }\n        results = [results_mapping.get(cls) for cls in classes if cls in results_mapping]\n        return results if results else [\"None\"]\n\n\n    def camera_classify(self, track1, track2):\n        top, down, left, right = self.get_edge_direction(track1, track2)\n\n        top_results = self.classify_top_down(top, down)\n        left_results = self.classify_left_right(left, right)\n\n        results = list(set(top_results+left_results))\n        if \"static\" in results and len(results)>1:\n            results.remove(\"static\")\n        if \"None\" in results and len(results)>1:\n            results.remove(\"None\")  \n\n        return results\n\n\n    def predict(self, video_path):\n        pred_track = self.infer(video_path)\n        track1 = pred_track[0].reshape((self.grid_size, self.grid_size, 2))\n        track2 = pred_track[-1].reshape((self.grid_size, self.grid_size, 2))\n        results = self.camera_classify(track1, track2)\n\n        return results\n\n\ndef get_type(video_name):\n    camera_mapping = {\n        \"camera pans left\": \"pan_left\",\n        \"camera pans right\": \"pan_right\",\n        \"camera tilts up\": \"tilt_up\",\n        \"camera tilts down\": \"tilt_down\",\n        \"camera zooms in\": \"zoom_in\",\n        \"camera zooms out\": \"zoom_out\",\n        \"camera static\": \"static\"\n    }\n\n    for item, value in camera_mapping.items():\n        if item in video_name:\n            return value\n        \n    raise ValueError(\"Not a recognized video name\")\n\n\n\ndef camera_motion(camera, 
video_list):\n    sim = []\n    video_results = []\n    diff_type_results = {\n        \"pan_left\":[],\n        \"pan_right\":[],\n        \"tilt_up\":[],\n        \"tilt_down\":[],\n        \"zoom_in\":[],\n        \"zoom_out\":[],\n        \"static\":[],\n    }\n    for video_path in tqdm(video_list):\n        target_type = get_type(os.path.basename(video_path))\n        predict_results = camera.predict(video_path)\n\n        video_score = 1.0 if target_type in predict_results else 0.0\n        diff_type_results[target_type].append(video_score)\n        video_results.append({'video_path': video_path, 'video_results': video_score, 'prompt_type':target_type, 'predict_type': predict_results})\n        sim.append(video_score)\n    \n    avg_score = np.mean(sim)\n\n    for key, value in diff_type_results.items():\n        diff_type_results[key] = np.mean(value)\n\n    return avg_score, diff_type_results, video_results\n\n\ndef compute_camera_motion(json_dir, device, submodules_list):\n    camera = CameraPredict(device, submodules_list)\n    video_list, _ = load_dimension_info(json_dir, dimension='camera_motion', lang='en')\n    all_results, diff_type_results, video_results = camera_motion(camera, video_list)\n    return all_results, diff_type_results, video_results\n\n\n\n\n\n\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench2_beta_i2v/crop_to_diff_ratio.py",
    "content": "import os\nfrom PIL import Image\nimport json\nimport os.path as osp\nimport random\nimport argparse\nfrom tqdm import tqdm\n\nimport logging\nlogging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(__name__)\n\n\n\ndef save_json(data, save_file):\n    json.dump(data, open(save_file, \"w\"))\n\n\ndef crop(img_path, bbox, save_root):\n    os.makedirs(save_root, exist_ok=True)\n    img = Image.open(img_path)\n    x, y, width, height = map(int, bbox)\n    crop_img = img.crop((x, y, x+width, y+height))\n    crop_img.save(osp.join(save_root, osp.basename(img_path)))\n    \n    \ndef get_other_ratio_crop(second_crop_info, ratio=\"8-5\"):\n    random.seed(123)\n    ratio_w, ratio_h = map(int, ratio.split('-'))\n    assert 1.0 <= ratio_w/ratio_h < 1.7778, \"The ratio does not meet the requirements, it needs to be between 1:1 and 16:9.\"\n    width, height = second_crop_info['width'], second_crop_info['height']\n    x, y, crop_w, crop_h = second_crop_info['second_bbox']\n    \n    if width == height:\n        target_w = int(width/ratio_w) * ratio_w\n        target_h = int(width/ratio_w) * ratio_h\n        assert target_h >= crop_h\n        target_x = 0\n        y_min = max(y - (target_h - crop_h), 0)\n        y_max = min(y + target_h, height) - target_h\n        assert y_max >= y_min\n        target_y = random.randint(y_min, y_max)\n    else:\n        target_w = int(height/ratio_h) * ratio_w\n        target_h = int(height/ratio_h) * ratio_h\n        assert target_w >= crop_w\n        target_y = 0\n        x_min = max(x - (target_w - crop_w), 0)\n        x_max = min(x + target_w, width) - target_w\n        assert x_max >= x_min\n        target_x = random.randint(x_min, x_max)\n        \n    return [target_x, target_y, target_w, target_h]\n\n\ndef transfer_bbox_to_origin_img(first_crop_info, old_bbox):\n    x, y, _, _ = first_crop_info[\"first_bbox\"]\n    old_x, old_y, width, height = 
old_bbox\n    return [x + old_x, y + old_y, width, height]\n\n\n\ndef get_target_crop(args):\n\n    # use a context manager so the file handle is closed after reading\n    with open(args.crop_info_path, \"r\") as f:\n        data = json.load(f)\n    target_results = []\n    os.makedirs(args.result_path, exist_ok=True)\n    \n    ####### get target crop info ########\n    for item in tqdm(data):\n        second_crop_info = item[\"second_crop\"]\n        first_crop_info = item[\"first_crop\"]\n        target_crop = transfer_bbox_to_origin_img(first_crop_info, get_other_ratio_crop(second_crop_info, args.target_ratio))\n        item[\"target_crop\"] = {\n            \"target_ratio\":args.target_ratio,\n            \"target_bbox\":target_crop\n        }\n        target_results.append(item)\n\n    target_file = os.path.join(args.result_path, f\"target_crop_info_{args.target_ratio}.json\")\n    save_json(target_results, target_file)\n    logger.info(f\"Target crop info is saved in the '{target_file}' file\")\n    \n    ####### crop images #########\n    ori_path = args.ori_image_path\n    target_path = f\"{args.result_path}/{args.target_ratio}\"\n\n    for sample in tqdm(target_results):\n        img_path = osp.join(ori_path, sample[\"file_name\"])\n        target_bbox = sample[\"target_crop\"][\"target_bbox\"]\n        crop(img_path, target_bbox, target_path)\n    \n    logger.info(f\"Cropped images are saved in the '{target_path}' path\")\n\n\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--crop_info_path', type=str, default=\"vbench2_beta_i2v/data/i2v-bench-info.json\", help=\"image suite meta info\")\n    # required=True makes any default unreachable, so none is given here\n    parser.add_argument('--target_ratio', type=str, required=True, help=\"the required crop ratio, written as W-H (e.g. 5-4)\")\n    parser.add_argument('--ori_image_path', type=str, default=\"vbench2_beta_i2v/data/origin\", help='the file path of the original image data')\n    parser.add_argument('--result_path', type=str, default=\"vbench2_beta_i2v/data/target_crop\", help='result save path')\n    args = parser.parse_args()\n   
 get_target_crop(args)"
  },
  {
    "path": "Open-Sora/build/lib/vbench2_beta_i2v/i2v_background.py",
    "content": "import io\nimport os\nimport cv2\nimport json\nimport numpy as np\nfrom PIL import Image\nfrom tqdm import tqdm\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torchvision.transforms as transforms\n\nfrom vbench2_beta_i2v.utils import load_video, load_i2v_dimension_info, dino_transform, dino_transform_Image\nimport logging\nlogging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(__name__)\n\ndef i2v_background(model, video_pair_list, device):\n    video_results = []\n    sim_list = []\n\n    max_weight = 0.5\n    mean_weight = 0.5\n    min_weight = 0.0\n\n    image_transform = dino_transform_Image(224)\n    frames_transform = dino_transform(224)\n\n    for image_path, video_path in tqdm(video_pair_list):\n        # input image preprocess & extract feature\n        input_image = image_transform(Image.open(image_path))\n        input_image = input_image.unsqueeze(0)\n        input_image = input_image.to(device)\n        input_image_features = model(input_image)\n        input_image_features = F.normalize(input_image_features, dim=-1, p=2)\n\n        # get frames from video\n        images = load_video(video_path)\n        images = frames_transform(images)\n\n        # calculate sim between input image and frames in generated video\n        conformity_scores = []\n        consec_scores = []\n        for i in range(len(images)):\n            with torch.no_grad():\n                image = images[i].unsqueeze(0)\n                image = image.to(device)\n                image_features = model(image)\n                image_features = F.normalize(image_features, dim=-1, p=2)\n                if i != 0:\n                    sim_consec = max(0.0, F.cosine_similarity(former_image_features, image_features).item())\n                    consec_scores.append(sim_consec)\n                sim_to_input = max(0.0, F.cosine_similarity(input_image_features, 
image_features).item())\n                conformity_scores.append(sim_to_input)\n                former_image_features = image_features\n\n        video_score = max_weight * np.max(conformity_scores) + \\\n            mean_weight * np.mean(consec_scores) + \\\n            min_weight * np.min(consec_scores)\n\n        sim_list.append(video_score)\n        video_results.append({'image_path': image_path, 'video_path': video_path, 'video_results': video_score})\n    return np.mean(sim_list), video_results\n\n\ndef compute_i2v_background(json_dir, device, submodules_list):\n    dino_model = torch.hub.load(**submodules_list).to(device)\n    resolution = submodules_list['resolution']\n    logger.info(\"Initialize DINO success\")\n    video_pair_list, _ = load_i2v_dimension_info(json_dir, dimension='i2v_background', lang='en', resolution=resolution)\n    all_results, video_results = i2v_background(dino_model, video_pair_list, device)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench2_beta_i2v/i2v_subject.py",
    "content": "import io\nimport os\nimport cv2\nimport json\nimport numpy as np\nfrom PIL import Image\nfrom tqdm import tqdm\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torchvision.transforms as transforms\n\nfrom vbench2_beta_i2v.utils import load_video, load_i2v_dimension_info, dino_transform, dino_transform_Image\nimport logging\nlogging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(__name__)\n\ndef i2v_subject(model, video_pair_list, device):\n    video_results = []\n    sim_list = []\n\n    max_weight = 0.5\n    mean_weight = 0.5\n    min_weight = 0.0\n\n    image_transform = dino_transform_Image(224)\n    frames_transform = dino_transform(224)\n\n    for image_path, video_path in tqdm(video_pair_list):\n        # input image preprocess & extract feature\n        input_image = image_transform(Image.open(image_path))\n        input_image = input_image.unsqueeze(0)\n        input_image = input_image.to(device)\n        input_image_features = model(input_image)\n        input_image_features = F.normalize(input_image_features, dim=-1, p=2)\n\n        # get frames from video\n        images = load_video(video_path)\n        images = frames_transform(images)\n\n        # calculate sim between input image and frames in generated video\n        conformity_scores = []\n        consec_scores = []\n        for i in range(len(images)):\n            with torch.no_grad():\n                image = images[i].unsqueeze(0)\n                image = image.to(device)\n                image_features = model(image)\n                image_features = F.normalize(image_features, dim=-1, p=2)\n                if i != 0:\n                    sim_consec = max(0.0, F.cosine_similarity(former_image_features, image_features).item())\n                    consec_scores.append(sim_consec)\n                sim_to_input = max(0.0, F.cosine_similarity(input_image_features, 
image_features).item())\n                conformity_scores.append(sim_to_input)\n                former_image_features = image_features\n\n        video_score = max_weight * np.max(conformity_scores) + \\\n            mean_weight * np.mean(consec_scores) + \\\n            min_weight * np.min(consec_scores)\n\n        sim_list.append(video_score)\n        video_results.append({'image_path': image_path, 'video_path': video_path, 'video_results': video_score})\n    return np.mean(sim_list), video_results\n\n\ndef compute_i2v_subject(json_dir, device, submodules_list):\n    dino_model = torch.hub.load(**submodules_list).to(device)\n    resolution = submodules_list['resolution']\n    logger.info(\"Initialize DINO success\")\n    video_pair_list, _ = load_i2v_dimension_info(json_dir, dimension='i2v_subject', lang='en', resolution=resolution)\n    all_results, video_results = i2v_subject(dino_model, video_pair_list, device)\n    return all_results, video_results\n"
  },
  {
    "path": "Open-Sora/build/lib/vbench2_beta_i2v/utils.py",
    "content": "import os\nimport json\nimport numpy as np\nimport logging\nimport subprocess\nimport torch\nfrom PIL import Image, ImageSequence\nfrom decord import VideoReader, cpu\nfrom torchvision import transforms\nfrom torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize, ToPILImage\ntry:\n    from torchvision.transforms import InterpolationMode\n    BICUBIC = InterpolationMode.BICUBIC\n    BILINEAR = InterpolationMode.BILINEAR\nexcept ImportError:\n    BICUBIC = Image.BICUBIC\n    BILINEAR = Image.BILINEAR\n\nCACHE_DIR = os.environ.get('VBENCH_CACHE_DIR')\nif CACHE_DIR is None:\n    CACHE_DIR = os.path.join(os.path.expanduser('~'), '.cache', 'vbench')\n\nlogging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')\nlogger = logging.getLogger(__name__)\n\ndef clip_transform(n_px):\n    return Compose([\n        Resize(n_px, interpolation=BICUBIC),\n        CenterCrop(n_px),\n        transforms.Lambda(lambda x: x.float().div(255.0)),\n        Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),\n    ])\n\ndef clip_transform_Image(n_px):\n    return Compose([\n        Resize(n_px, interpolation=BICUBIC),\n        CenterCrop(n_px),\n        ToTensor(),\n        Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),\n    ])\n\ndef dino_transform(n_px):\n    return Compose([\n        Resize(size=n_px),\n        transforms.Lambda(lambda x: x.float().div(255.0)),\n        Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))\n    ])\n\ndef dino_transform_Image(n_px):\n    return Compose([\n        Resize(size=n_px),\n        ToTensor(),\n        Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))\n    ])\n\ndef tag2text_transform(n_px):\n    normalize = Normalize(mean=[0.485, 0.456, 0.406],\n                                        std=[0.229, 0.224, 0.225])\n    return Compose([ToPILImage(),Resize((n_px, 
n_px)),ToTensor(),normalize])\n\ndef get_frame_indices(num_frames, vlen, sample='rand', fix_start=None, input_fps=1, max_num_frames=-1):\n    import random  # not imported at module level; without it the 'rand' branch always hit the fallback\n    if sample in [\"rand\", \"middle\"]: # uniform sampling\n        acc_samples = min(num_frames, vlen)\n        # split the video into `acc_samples` intervals, and sample from each interval.\n        intervals = np.linspace(start=0, stop=vlen, num=acc_samples + 1).astype(int)\n        ranges = []\n        for idx, interv in enumerate(intervals[:-1]):\n            ranges.append((interv, intervals[idx + 1] - 1))\n        if sample == 'rand':\n            try:\n                frame_indices = [random.choice(range(x[0], x[1])) for x in ranges]\n            except IndexError:  # an interval was empty, fall back to a random subset of all frames\n                frame_indices = np.random.permutation(vlen)[:acc_samples]\n                frame_indices.sort()\n                frame_indices = list(frame_indices)\n        elif fix_start is not None:\n            frame_indices = [x[0] + fix_start for x in ranges]\n        elif sample == 'middle':\n            frame_indices = [(x[0] + x[1]) // 2 for x in ranges]\n        else:\n            raise NotImplementedError\n\n        if len(frame_indices) < num_frames:  # padded with last frame\n            padded_frame_indices = [frame_indices[-1]] * num_frames\n            padded_frame_indices[:len(frame_indices)] = frame_indices\n            frame_indices = padded_frame_indices\n    elif \"fps\" in sample:  # fps0.5, sequentially sample frames at 0.5 fps\n        output_fps = float(sample[3:])\n        duration = float(vlen) / input_fps\n        delta = 1 / output_fps  # gap between frames, this is also the clip length each frame represents\n        frame_seconds = np.arange(0 + delta / 2, duration + delta / 2, delta)\n        frame_indices = np.around(frame_seconds * input_fps).astype(int)\n        frame_indices = [e for e in frame_indices if e < vlen]\n        if max_num_frames > 0 and len(frame_indices) > max_num_frames:\n            frame_indices = 
frame_indices[:max_num_frames]\n            # frame_indices = np.linspace(0 + delta / 2, duration + delta / 2, endpoint=False, num=max_num_frames)\n    else:\n        raise ValueError\n    return frame_indices\n\ndef load_video(video_path, data_transform=None, num_frames=None, return_tensor=True, width=None, height=None):\n    \"\"\"\n    Load a video from a given path and apply optional data transformations.\n\n    The function supports loading video in GIF (.gif), PNG (.png), and MP4 (.mp4) formats.\n    Depending on the format, it processes and extracts frames accordingly.\n    \n    Parameters:\n    - video_path (str): The file path to the video or image to be loaded.\n    - data_transform (callable, optional): A function that applies transformations to the video data.\n    \n    Returns:\n    - frames (torch.Tensor): A tensor containing the video frames with shape (T, C, H, W),\n      where T is the number of frames, C is the number of channels, H is the height, and W is the width.\n    \n    Raises:\n    - NotImplementedError: If the video format is not supported.\n    \n    The function first determines the format of the video file by its extension.\n    For GIFs, it iterates over each frame and converts them to RGB.\n    For PNGs, it reads the single frame, converts it to RGB.\n    For MP4s, it reads the frames using the VideoReader class and converts them to NumPy arrays.\n    If a data_transform is provided, it is applied to the buffer before converting it to a tensor.\n    Finally, the tensor is permuted to match the expected (T, C, H, W) format.\n    \"\"\"\n    if video_path.endswith('.gif'):\n        frame_ls = []\n        img = Image.open(video_path)\n        for frame in ImageSequence.Iterator(img):\n            frame = frame.convert('RGB')\n            frame = np.array(frame).astype(np.uint8)\n            frame_ls.append(frame)\n        buffer = np.array(frame_ls).astype(np.uint8)\n    elif video_path.endswith('.png'):\n        frame = 
Image.open(video_path)\n        frame = frame.convert('RGB')\n        frame = np.array(frame).astype(np.uint8)\n        frame_ls = [frame]\n        buffer = np.array(frame_ls)\n    elif video_path.endswith('.mp4'):\n        import decord\n        decord.bridge.set_bridge('native')\n        if width:\n            video_reader = VideoReader(video_path, width=width, height=height, num_threads=1)\n        else:\n            video_reader = VideoReader(video_path, num_threads=1)\n        frames = video_reader.get_batch(range(len(video_reader)))  # (T, H, W, C), torch.uint8\n\n        buffer = frames.asnumpy().astype(np.uint8)\n    else:\n        raise NotImplementedError\n    \n    frames = buffer\n    if num_frames:\n        frame_indices = get_frame_indices(\n        num_frames, len(frames), sample=\"middle\"\n        )\n        frames = frames[frame_indices]\n    \n    if data_transform:\n        frames = data_transform(frames)\n    elif return_tensor:\n        frames = torch.Tensor(frames)\n        frames = frames.permute(0, 3, 1, 2)  # (T, C, H, W), torch.uint8\n\n    return frames\n\ndef read_frames_decord_by_fps(\n        video_path, sample_fps=2, sample='rand', fix_start=None, \n        max_num_frames=-1,  trimmed30=False, num_frames=8\n    ):\n    import decord\n    decord.bridge.set_bridge(\"torch\")\n    video_reader = VideoReader(video_path, num_threads=1)\n    vlen = len(video_reader)\n    fps = video_reader.get_avg_fps()\n    duration = vlen / float(fps)\n\n    if trimmed30 and duration > 30:\n        duration = 30\n        vlen = int(30 * float(fps))\n\n    frame_indices = get_frame_indices(\n        num_frames, vlen, sample=sample, fix_start=fix_start,\n        input_fps=fps, max_num_frames=max_num_frames\n    )\n    frames = video_reader.get_batch(frame_indices)  # (T, H, W, C), torch.uint8\n    frames = frames.permute(0, 3, 1, 2)  # (T, C, H, W), torch.uint8\n    return frames\n    \ndef load_dimension_info(json_dir, dimension, lang):\n    \"\"\"\n    
Load video list and prompt information based on a specified dimension and language from a JSON file.\n    \n    Parameters:\n    - json_dir (str): The directory path where the JSON file is located.\n    - dimension (str): The dimension for evaluation to filter the video prompts.\n    - lang (str): The language key used to retrieve the appropriate prompt text.\n    \n    Returns:\n    - video_list (list): A list of video file paths that match the specified dimension.\n    - prompt_dict_ls (list): A list of dictionaries, each containing a prompt and its corresponding video list.\n    \n    The function reads the JSON file to extract video information. It filters the prompts based on the specified\n    dimension and compiles a list of video paths and associated prompts in the specified language.\n    \n    Notes:\n    - The JSON file is expected to contain a list of dictionaries with keys 'dimension', 'video_list', and language-based prompts.\n    - The function assumes that the 'video_list' key in the JSON can either be a list or a single string value.\n    \"\"\"\n    video_list = []\n    prompt_dict_ls = []\n    full_prompt_list = load_json(json_dir)\n    for prompt_dict in full_prompt_list:\n        if dimension in prompt_dict['dimension'] and 'video_list' in prompt_dict:\n            prompt = prompt_dict[f'prompt_{lang}']\n            cur_video_list = prompt_dict['video_list'] if isinstance(prompt_dict['video_list'], list) else [prompt_dict['video_list']]\n            video_list += cur_video_list\n            if 'auxiliary_info' in prompt_dict and dimension in prompt_dict['auxiliary_info']:\n                prompt_dict_ls += [{'prompt': prompt, 'video_list': cur_video_list, 'auxiliary_info': prompt_dict['auxiliary_info'][dimension]}]\n            else:\n                prompt_dict_ls += [{'prompt': prompt, 'video_list': cur_video_list}]\n    return video_list, prompt_dict_ls\n\n\ndef load_i2v_dimension_info(json_dir, dimension, lang, resolution):\n    \"\"\"\n    
Load image-video pairs and prompt information based on a specified dimension and language from a JSON file.\n    \n    Parameters:\n    - json_dir (str): The path to the JSON file.\n    - dimension (str): The dimension for evaluation to filter the video prompts.\n    - lang (str): The language key used to retrieve the appropriate prompt text.\n    - resolution (str): The resolution of the input images to use.\n    \n    Returns:\n    - video_pair_list (list): A list of (image_path, video_path) tuples that match the specified dimension.\n    - prompt_dict_ls (list): A list of dictionaries, each containing a prompt and its corresponding video list.\n    \n    The function reads the JSON file to extract video information. It filters the prompts based on the specified\n    dimension and pairs each video path with its input image, alongside the prompts in the specified language.\n    \n    Notes:\n    - The JSON file is expected to contain a list of dictionaries with keys 'dimension', 'video_list', 'image_name', and language-based prompts.\n    - The function assumes that the 'video_list' key in the JSON can either be a list or a single string value.\n    \"\"\"\n    video_pair_list = []\n    prompt_dict_ls = []\n    full_prompt_list = load_json(json_dir)\n    image_root = f'vbench2_beta_i2v/data/crop/{resolution}'\n    image_root = '/root/autodl-tmp/video_samples/samples_sora-original_model.safetensors_vbench'\n    for prompt_dict in full_prompt_list:\n        if dimension in prompt_dict['dimension'] and 'video_list' in prompt_dict:\n            prompt = prompt_dict[f'prompt_{lang}']\n            cur_video_list = prompt_dict['video_list'] if isinstance(prompt_dict['video_list'], list) else [prompt_dict['video_list']]\n            # create image-video pair\n            image_path = os.path.join(image_root, prompt_dict[\"image_name\"])\n            cur_video_pair = [(image_path, video) for video in cur_video_list]\n            video_pair_list += cur_video_pair\n            if 'auxiliary_info' in 
prompt_dict and dimension in prompt_dict['auxiliary_info']:\n                prompt_dict_ls += [{'prompt': prompt, 'video_list': cur_video_list, 'auxiliary_info': prompt_dict['auxiliary_info'][dimension]}]\n            else:\n                prompt_dict_ls += [{'prompt': prompt, 'video_list': cur_video_list}]\n    return video_pair_list, prompt_dict_ls\n\n\ndef init_submodules(dimension_list, local=False, read_frame=False, resolution=\"1-1\"):\n    submodules_dict = {}\n    if local:\n        logger.info(\"\\x1b[32m[Local Mode]\\x1b[0m Working in local mode, please make sure that the pre-trained model has been fully downloaded.\")\n    for dimension in dimension_list:\n        os.makedirs(CACHE_DIR, exist_ok=True)\n        if dimension == 'i2v_subject' or dimension == 'i2v_background':\n            if local:\n                submodules_dict[dimension] = {\n                    'repo_or_dir': f'{CACHE_DIR}/dino_model/facebookresearch_dino_main/',\n                    'path': f'{CACHE_DIR}/dino_model/dino_vitbase16_pretrain.pth', \n                    'model': 'dino_vitb16',\n                    'source': 'local',\n                    'resolution': resolution\n                    }\n                details = submodules_dict[dimension]\n                # Check if the file exists, if not, download it with wget\n                if not os.path.isdir(details['repo_or_dir']):\n                    print(f\"Directory {details['repo_or_dir']} does not exist. Cloning repository...\")\n                    subprocess.run(['git', 'clone', 'https://github.com/facebookresearch/dino', details['repo_or_dir']], check=True)\n\n                if not os.path.isfile(details['path']):\n                    print(f\"File {details['path']} does not exist. 
Downloading...\")\n                    wget_command = ['wget', '-P', os.path.dirname(details['path']),\n                                    'https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth']\n                    subprocess.run(wget_command, check=True)\n            else:\n                submodules_dict[dimension] = {\n                    'repo_or_dir':'facebookresearch/dino:main',\n                    'source':'github',\n                    'model': 'dino_vitb16',\n                    'resolution': resolution\n                    }\n        elif dimension == 'camera_motion':\n            submodules_dict[dimension] = {\n                \"repo\":\"facebookresearch/co-tracker\",\n                \"model\":\"cotracker2\"\n            }\n    return submodules_dict\n\n\n\ndef save_json(data, path, indent=4):\n    with open(path, 'w', encoding='utf-8') as f:\n        json.dump(data, f, indent=indent)\n\ndef load_json(path):\n    \"\"\"\n    Load a JSON file from the given file path.\n    \n    Parameters:\n    - path (str): The path to the JSON file.\n    \n    Returns:\n    - data (dict or list): The data loaded from the JSON file, which could be a dictionary or a list.\n    \"\"\"\n    with open(path, 'r', encoding='utf-8') as f:\n        return json.load(f)\n"
  },
  {
    "path": "Open-Sora/configs/dit/inference/16x256x256.py",
    "content": "num_frames = 16\nfps = 8\nimage_size = (256, 256)\n\n# Define model\nmodel = dict(\n    type=\"DiT-XL/2\",\n    condition=\"text\",\n    from_pretrained=\"PRETRAINED_MODEL\",\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"clip\",\n    from_pretrained=\"openai/clip-vit-base-patch32\",\n    model_max_length=77,\n)\nscheduler = dict(\n    type=\"dpm-solver\",\n    num_sampling_steps=20,\n    cfg_scale=4.0,\n)\ndtype = \"bf16\"\n\n# Others\nbatch_size = 2\nseed = 42\nprompt_path = \"./assets/texts/ucf101_labels.txt\"\nsave_dir = \"./samples/samples/\"\n"
  },
  {
    "path": "Open-Sora/configs/dit/inference/1x256x256-class.py",
    "content": "num_frames = 1\nfps = 1\nimage_size = (256, 256)\n\n# Define model\nmodel = dict(\n    type=\"DiT-XL/2\",\n    no_temporal_pos_emb=True,\n    condition=\"label_1000\",\n    from_pretrained=\"DiT-XL-2-256x256.pt\",\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"classes\",\n    num_classes=1000,\n)\nscheduler = dict(\n    type=\"dpm-solver\",\n    num_sampling_steps=20,\n    cfg_scale=4.0,\n)\ndtype = \"bf16\"\n\n# Others\nbatch_size = 2\nseed = 42\nprompt_path = \"./assets/texts/imagenet_id.txt\"\nsave_dir = \"./samples/samples/\"\n"
  },
  {
    "path": "Open-Sora/configs/dit/inference/1x256x256.py",
    "content": "num_frames = 1\nfps = 1\nimage_size = (256, 256)\n\n# Define model\nmodel = dict(\n    type=\"DiT-XL/2\",\n    no_temporal_pos_emb=True,\n    condition=\"text\",\n    from_pretrained=\"PRETRAINED_MODEL\",\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"clip\",\n    from_pretrained=\"openai/clip-vit-base-patch32\",\n    model_max_length=77,\n)\nscheduler = dict(\n    type=\"dpm-solver\",\n    num_sampling_steps=20,\n    cfg_scale=4.0,\n)\ndtype = \"bf16\"\n\n# Others\nbatch_size = 2\nseed = 42\nprompt_path = \"./assets/texts/imagenet_labels.txt\"\nsave_dir = \"./samples/samples/\"\n"
  },
  {
    "path": "Open-Sora/configs/dit/train/16x256x256.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VideoTextDataset\",\n    data_path=None,\n    num_frames=16,\n    frame_interval=3,\n    image_size=(256, 256),\n)\n\n# Define acceleration\nnum_workers = 4\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"DiT-XL/2\",\n    from_pretrained=\"DiT-XL-2-256x256.pt\",\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"clip\",\n    from_pretrained=\"openai/clip-vit-base-patch32\",\n    model_max_length=77,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 1000\nload = None\n\nbatch_size = 8\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/dit/train/1x256x256.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VideoTextDataset\",\n    data_path=None,\n    num_frames=1,\n    frame_interval=1,\n    image_size=(256, 256),\n    transform_name=\"center\",\n)\n\n# Define acceleration\nnum_workers = 4\ndtype = \"bf16\"\ngrad_checkpoint = False\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"DiT-XL/2\",\n    no_temporal_pos_emb=True,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"clip\",\n    from_pretrained=\"openai/clip-vit-base-patch32\",\n    model_max_length=77,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 1000\nload = None\n\nbatch_size = 128\nlr = 1e-4  # according to DiT repo\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/latte/inference/16x256x256-class.py",
    "content": "num_frames = 16\nfps = 8\nimage_size = (256, 256)\n\n# Define model\nmodel = dict(\n    type=\"Latte-XL/2\",\n    condition=\"label_101\",\n    from_pretrained=\"Latte-XL-2-256x256-ucf101.pt\",\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"classes\",\n    num_classes=101,\n)\nscheduler = dict(\n    type=\"dpm-solver\",\n    num_sampling_steps=20,\n    cfg_scale=4.0,\n)\ndtype = \"bf16\"\n\n# Others\nbatch_size = 2\nseed = 42\nprompt_path = \"./assets/texts/ucf101_id.txt\"\nsave_dir = \"./samples/samples/\"\n"
  },
  {
    "path": "Open-Sora/configs/latte/inference/16x256x256.py",
    "content": "num_frames = 16\nfps = 8\nimage_size = (256, 256)\n\n# Define model\nmodel = dict(\n    type=\"Latte-XL/2\",\n    condition=\"text\",\n    from_pretrained=\"PRETRAINED_MODEL\",\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"clip\",\n    from_pretrained=\"openai/clip-vit-base-patch32\",\n    model_max_length=77,\n)\nscheduler = dict(\n    type=\"dpm-solver\",\n    num_sampling_steps=20,\n    cfg_scale=4.0,\n)\ndtype = \"bf16\"\n\n# Others\nbatch_size = 2\nseed = 42\nprompt_path = \"./assets/texts/ucf101_labels.txt\"\nsave_dir = \"./samples/samples/\"\n"
  },
  {
    "path": "Open-Sora/configs/latte/train/16x256x256.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VideoTextDataset\",\n    data_path=None,\n    num_frames=16,\n    frame_interval=3,\n    image_size=(256, 256),\n)\n\n# Define acceleration\nnum_workers = 4\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"Latte-XL/2\",\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"clip\",\n    from_pretrained=\"openai/clip-vit-base-patch32\",\n    model_max_length=77,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 1000\nload = None\n\nbatch_size = 8\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora/inference/16x256x256.py",
    "content": "num_frames = 16\nfps = 24 // 3\nimage_size = (256, 256)\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=0.5,\n    time_scale=1.0,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n    from_pretrained=\"PRETRAINED_MODEL\",\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=4,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    num_sampling_steps=100,\n    cfg_scale=7.0,\n    cfg_channel=3,  # or None\n)\ndtype = \"bf16\"\n\n# Condition\nprompt_path = \"./assets/texts/t2v_samples.txt\"\nprompt = None  # prompt has higher priority than prompt_path\n\n# Others\nbatch_size = 1\nseed = 42\nsave_dir = \"./samples/samples/\"\n"
  },
  {
    "path": "Open-Sora/configs/opensora/inference/16x512x512-rflow.py",
    "content": "num_frames = 16\nfps = 24 // 3\nimage_size = (512, 512)\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=1.0,\n    time_scale=1.0,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n    from_pretrained=\"PRETRAINED_MODEL\",\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=2,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n)\nscheduler = dict(\n    type=\"rflow\",\n    num_sampling_steps=10,\n    cfg_scale=7.0,\n)\ndtype = \"bf16\"\n\n# Others\nbatch_size = 2\nseed = 42\nprompt_path = \"./assets/texts/t2v_samples.txt\"\nsave_dir = \"./outputs/samples/\"\n"
  },
  {
    "path": "Open-Sora/configs/opensora/inference/16x512x512.py",
    "content": "num_frames = 16\nfps = 24 // 3\nimage_size = (512, 512)\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=1.0,\n    time_scale=1.0,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n    from_pretrained=\"PRETRAINED_MODEL\",\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=2,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    num_sampling_steps=100,\n    cfg_scale=7.0,\n)\ndtype = \"bf16\"\n\n# Others\nbatch_size = 2\nseed = 42\nprompt_path = \"./assets/texts/t2v_samples.txt\"\nsave_dir = \"./samples/samples/\"\n"
  },
  {
    "path": "Open-Sora/configs/opensora/inference/64x512x512.py",
    "content": "num_frames = 64\nfps = 24 // 2\nimage_size = (512, 512)\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=1.0,\n    time_scale=2 / 3,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n    from_pretrained=\"PRETRAINED_MODEL\",\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=128,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    num_sampling_steps=100,\n    cfg_scale=7.0,\n)\ndtype = \"bf16\"\n\n# Others\nbatch_size = 1\nseed = 42\nprompt_path = \"./assets/texts/t2v_samples.txt\"\nsave_dir = \"./samples/samples/\"\n"
  },
  {
    "path": "Open-Sora/configs/opensora/train/16x256x256-mask.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VideoTextDataset\",\n    data_path=None,\n    num_frames=16,\n    frame_interval=3,\n    image_size=(256, 256),\n)\n\n# Define acceleration\nnum_workers = 4\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=0.5,\n    time_scale=1.0,\n    from_pretrained=\"PixArt-XL-2-512x512.pth\",\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nmask_ratios = {\n    \"identity\": 0.7,\n    \"random\": 0.15,\n    \"mask_head\": 0.05,\n    \"mask_tail\": 0.05,\n    \"mask_head_tail\": 0.05,\n}\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n    shardformer=True,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 1000\nload = None\n\nbatch_size = 8\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora/train/16x256x256-spee-rflow.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VideoTextDataset\",\n    data_path=None,\n    num_frames=16,\n    frame_interval=3,\n    image_size=(256, 256),\n)\n\n# Define acceleration\nnum_workers = 4\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=0.5,\n    time_scale=1.0,\n    # from_pretrained=\"PixArt-XL-2-512x512.pth\",\n    # from_pretrained = \"/home/zhaowangbo/wangbo/PixArt-alpha/pretrained_models/OpenSora-v1-HQ-16x512x512.pth\",\n    # from_pretrained = \"OpenSora-v1-HQ-16x512x512.pth\",\n    from_pretrained=\"PRETRAINED_MODEL\",\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\n# mask_ratios = [0.5, 0.29, 0.07, 0.07, 0.07]\n# mask_ratios = {\n#     \"identity\": 0.9,\n#     \"random\": 0.06,\n#     \"mask_head\": 0.01,\n#     \"mask_tail\": 0.01,\n#     \"mask_head_tail\": 0.02,\n# }\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n    shardformer=True,\n)\nscheduler = dict(\n    type=\"rflow\",\n    # timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = True\n\nepochs = 1\nlog_every = 10\nckpt_every = 1000\nload = None\n\nbatch_size = 16\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora/train/16x256x256-spee.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VideoTextDataset\",\n    data_path=None,\n    num_frames=16,\n    frame_interval=3,\n    image_size=(256, 256),\n)\n\n# Define acceleration\nnum_workers = 4\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=0.5,\n    time_scale=1.0,\n    from_pretrained=\"PixArt-XL-2-512x512.pth\",\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nmask_ratios = {\n    \"identity\": 0.5,\n    \"random\": 0.29,\n    \"mask_head\": 0.07,\n    \"mask_tail\": 0.07,\n    \"mask_head_tail\": 0.07,\n}\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n    shardformer=True,\n)\nscheduler = dict(\n    type=\"iddpm-speed\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 1000\nload = None\n\nbatch_size = 8\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora/train/16x256x256.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VideoTextDataset\",\n    data_path=None,\n    num_frames=16,\n    frame_interval=3,\n    image_size=(256, 256),\n)\n\n# Define acceleration\nnum_workers = 0\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=0.5,\n    time_scale=1.0,\n    from_pretrained=\"PixArt-XL-2-512x512.pth\",\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n    shardformer=True,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 1000\nload = None\n\nbatch_size = 8\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora/train/16x512x512.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VideoTextDataset\",\n    data_path=None,\n    num_frames=16,\n    frame_interval=3,\n    image_size=(512, 512),\n)\n\n# Define acceleration\nnum_workers = 4\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=1.0,\n    time_scale=1.0,\n    from_pretrained=None,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=128,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n    shardformer=True,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 500\nload = None\n\nbatch_size = 8\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora/train/360x512x512.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VideoTextDataset\",\n    data_path=None,\n    num_frames=360,\n    frame_interval=3,\n    image_size=(512, 512),\n)\n\n# Define acceleration\nnum_workers = 4\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define acceleration\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2-seq\"\nsp_size = 2\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=1.0,\n    time_scale=2 / 3,\n    from_pretrained=None,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n    enable_sequence_parallelism=True,  # enable sq here\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=128,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n    shardformer=True,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 250\nload = None\n\nbatch_size = 1\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora/train/64x512x512-sp.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VideoTextDataset\",\n    data_path=None,\n    num_frames=16,\n    frame_interval=3,\n    image_size=(512, 512),\n)\n\n# Define acceleration\nnum_workers = 4\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 2\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=1.0,\n    time_scale=2 / 3,\n    from_pretrained=None,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n    enable_sequence_parallelism=True,  # enable sq here\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n    shardformer=True,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 1000\nload = None\n\nbatch_size = 1\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora/train/64x512x512.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VideoTextDataset\",\n    data_path=None,\n    num_frames=64,\n    frame_interval=3,\n    image_size=(512, 512),\n)\n\n# Define acceleration\nnum_workers = 4\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=1.0,\n    time_scale=2 / 3,\n    from_pretrained=None,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=64,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n    shardformer=True,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 250\nload = None\n\nbatch_size = 4\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora-v1-1/inference/sample-ref.py",
    "content": "num_frames = 16\nframe_interval = 3\nfps = 24\nimage_size = (240, 426)\nmulti_resolution = \"STDiT2\"\n\n# Condition\nprompt_path = None\nprompt = [\n    'Drone view of waves crashing against the rugged cliffs along Big Sur\\'s garay point beach. {\"reference_path\": \"assets/images/condition/cliff.png\", \"mask_strategy\": \"0\"}',\n    'A breathtaking sunrise scene.{\"reference_path\": \"assets/images/condition/sunset1.png\",\"mask_strategy\": \"0\"}',\n    'A car driving on the ocean.{\"reference_path\": \"https://cdn.openai.com/tmp/s/interp/d0.mp4\",\"mask_strategy\": \"0,0,-8,0,8\"}',\n    'A snowy forest.{\"reference_path\": \"https://cdn.pixabay.com/video/2021/04/25/72171-542991404_large.mp4\",\"mask_strategy\": \"0,0,0,0,15,0.8\"}',\n    'A breathtaking sunrise scene.{\"reference_path\": \"assets/images/condition/sunset1.png;assets/images/condition/sunset2.png\",\"mask_strategy\": \"0;0,1,0,-1,1\"}',\n    '|0|a white jeep equipped with a roof rack driving on a dirt road in a coniferous forest.|2|a white jeep equipped with a roof rack driving on a dirt road in the desert.|4|a white jeep equipped with a roof rack driving on a dirt road in a mountain.|6|A white jeep equipped with a roof rack driving on a dirt road in a city.|8|a white jeep equipped with a roof rack driving on a dirt road on the surface of a river.|10|a white jeep equipped with a roof rack driving on a dirt road under the lake.|12|a white jeep equipped with a roof rack flying into the sky.|14|a white jeep equipped with a roof rack driving in the universe. 
Earth is the background.{\"reference_path\": \"https://cdn.openai.com/tmp/s/interp/d0.mp4\", \"mask_strategy\": \"0,0,0,0,15\"}',\n]\n\nloop = 2\ncondition_frame_length = 4\n# (\n#   loop id, [the loop index of the condition image or video]\n#   reference id, [the index of the condition image or video in the reference_path]\n#   reference start, [the start frame of the condition image or video]\n#   target start, [the location to insert]\n#   length, [the number of frames to insert]\n#   edit_ratio [the edit rate of the condition image or video]\n# )\n# See https://github.com/hpcaitech/Open-Sora/blob/main/docs/config.md#advanced-inference-config for more details\n# See https://github.com/hpcaitech/Open-Sora/blob/main/docs/commands.md#inference-with-open-sora-11 for more examples\n\n# Define model\nmodel = dict(\n    type=\"STDiT2-XL/2\",\n    from_pretrained=\"hpcai-tech/OpenSora-STDiT-v2-stage3\",\n    input_sq_size=512,\n    qk_norm=True,\n    qk_norm_legacy=True,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    cache_dir=None,  # \"/mnt/hdd/cached_models\",\n    micro_batch_size=4,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    cache_dir=None,  # \"/mnt/hdd/cached_models\",\n    model_max_length=200,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    num_sampling_steps=100,\n    cfg_scale=7.0,\n    cfg_channel=3,  # or None\n)\ndtype = \"bf16\"\n\n# Others\nbatch_size = 1\nseed = 42\nsave_dir = \"./samples/samples/\"\n"
  },
  {
    "path": "Open-Sora/configs/opensora-v1-1/inference/sample.py",
    "content": "num_frames = 16\nframe_interval = 3\nfps = 24\nimage_size = (240, 426)\nmulti_resolution = \"STDiT2\"\n\n# Define model\nmodel = dict(\n    type=\"STDiT2-XL/2\",\n    from_pretrained=\"hpcai-tech/OpenSora-STDiT-v2-stage3\",\n    input_sq_size=512,\n    qk_norm=True,\n    qk_norm_legacy=True,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    cache_dir=None,  # \"/mnt/hdd/cached_models\",\n    micro_batch_size=4,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    cache_dir=None,  # \"/mnt/hdd/cached_models\",\n    model_max_length=200,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    num_sampling_steps=100,\n    cfg_scale=7.0,\n    cfg_channel=3,  # or None\n)\ndtype = \"bf16\"\n\n# Condition\nprompt_path = \"./assets/texts/t2v_samples.txt\"\nprompt = None  # prompt has higher priority than prompt_path\n\n# Others\nbatch_size = 1\nseed = 42\nsave_dir = \"./samples/samples/\"\n"
  },
  {
    "path": "Open-Sora/configs/opensora-v1-1/train/benchmark.py",
    "content": "# this file is only for batch size search and is not used for training\n\n# Define dataset\ndataset = dict(\n    type=\"VariableVideoTextDataset\",\n    data_path=None,\n    num_frames=None,\n    frame_interval=3,\n    image_size=(None, None),\n    transform_name=\"resize_crop\",\n)\n\n# bucket config format:\n# 1. { resolution: {num_frames: (prob, batch_size)} }, in this case batch_size is ignored when searching\n# 2. { resolution: {num_frames: (prob, (max_batch_size, ))} }, batch_size is searched in the range [batch_size_start, max_batch_size), batch_size_start is configured via CLI\n# 3. { resolution: {num_frames: (prob, (min_batch_size, max_batch_size))} }, batch_size is searched in the range [min_batch_size, max_batch_size)\n# 4. { resolution: {num_frames: (prob, (min_batch_size, max_batch_size, step_size))} }, batch_size is searched in the range [min_batch_size, max_batch_size) with step_size (grid search)\n# 5. { resolution: {num_frames: (0.0, None)} }, this bucket will not be used\n\nbucket_config = {\n    # == manual search ==\n    # \"240p\": {128: (1.0, 2)}, # 4.28s/it\n    # \"240p\": {64: (1.0, 4)},\n    # \"240p\": {32: (1.0, 8)},  # 4.6s/it\n    # \"240p\": {16: (1.0, 16)},  # 4.6s/it\n    # \"480p\": {16: (1.0, 4)},  # 4.6s/it\n    # \"720p\": {16: (1.0, 2)},  # 5.89s/it\n    # \"256\": {1: (1.0, 256)},  # 4.5s/it\n    # \"512\": {1: (1.0, 96)}, # 4.7s/it\n    # \"512\": {1: (1.0, 128)}, # 6.3s/it\n    # \"480p\": {1: (1.0, 50)},  # 4.0s/it\n    # \"1024\": {1: (1.0, 32)},  # 6.8s/it\n    # \"1024\": {1: (1.0, 20)}, # 4.3s/it\n    # \"1080p\": {1: (1.0, 16)}, # 8.6s/it\n    # \"1080p\": {1: (1.0, 8)},  # 4.4s/it\n    # == stage 2 ==\n    # \"240p\": {\n    #     16: (1.0, (2, 32)),\n    #     32: (1.0, (2, 16)),\n    #     64: (1.0, (2, 8)),\n    #     128: (1.0, (2, 6)),\n    # },\n    # \"256\": {1: (1.0, (128, 300))},\n    # \"512\": {1: (0.5, (64, 128))},\n    # \"480p\": {1: (0.4, (32, 128)), 16: (0.4, (2, 32)), 32: (0.0, 
None)},\n    # \"720p\": {16: (0.1, (2, 16)), 32: (0.0, None)},  # No examples now\n    # \"1024\": {1: (0.3, (8, 64))},\n    # \"1080p\": {1: (0.3, (2, 32))},\n    # == stage 3 ==\n    \"720p\": {1: (20, 40), 32: (0.5, (2, 4)), 64: (0.5, (1, 1))},\n}\n\n\n# Define acceleration\nnum_workers = 4\nnum_bucket_build_workers = 16\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"STDiT2-XL/2\",\n    from_pretrained=None,\n    input_sq_size=512,  # pretrained model is trained on 512x512\n    qk_norm=True,\n    qk_norm_legacy=True,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=4,\n    local_files_only=True,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=200,\n    shardformer=True,\n    local_files_only=True,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 1000\nload = None\n\nbatch_size = None\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora-v1-1/train/image.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VariableVideoTextDataset\",\n    data_path=None,\n    num_frames=None,\n    frame_interval=3,\n    image_size=(None, None),\n    transform_name=\"resize_crop\",\n)\nbucket_config = {  # 6s/it\n    \"256\": {1: (1.0, 256)},\n    \"512\": {1: (1.0, 80)},\n    \"480p\": {1: (1.0, 52)},\n    \"1024\": {1: (1.0, 20)},\n    \"1080p\": {1: (1.0, 8)},\n}\n\n# Define acceleration\nnum_workers = 4\nnum_bucket_build_workers = 16\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"STDiT2-XL/2\",\n    from_pretrained=None,\n    input_sq_size=512,  # pretrained model is trained on 512x512\n    qk_norm=True,\n    qk_norm_legacy=True,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=4,\n    local_files_only=True,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=200,\n    shardformer=True,\n    local_files_only=True,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 500\nload = None\n\nbatch_size = 10  # only for logging\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora-v1-1/train/image_rflow.py",
    "content": "# Define dataset\n# dataset = dict(\n#     type=\"VariableVideoTextDataset\",\n#     data_path=None,\n#     num_frames=None,\n#     frame_interval=3,\n#     image_size=(None, None),\n#     transform_name=\"resize_crop\",\n# )\ndataset = dict(\n    type=\"VideoTextDataset\",\n    data_path=None,\n    num_frames=1,\n    frame_interval=1,\n    image_size=(256, 256),\n    transform_name=\"center\",\n)\nbucket_config = {  # 6s/it\n    \"256\": {1: (1.0, 256)},\n    \"512\": {1: (1.0, 80)},\n    \"480p\": {1: (1.0, 52)},\n    \"1024\": {1: (1.0, 20)},\n    \"1080p\": {1: (1.0, 8)},\n}\n\n# Define acceleration\nnum_workers = 16\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\n# model = dict(\n#     type=\"DiT-XL/2\",\n#     from_pretrained=\"/home/zhaowangbo/wangbo/PixArt-alpha/pretrained_models/PixArt-XL-2-512x512.pth\",\n#     # input_sq_size=512,  # pretrained model is trained on 512x512\n#     enable_flash_attn=True,\n#     enable_layernorm_kernel=True,\n# )\nmodel = dict(\n    type=\"PixArt-XL/2\",\n    space_scale=1.0,\n    time_scale=1.0,\n    no_temporal_pos_emb=True,\n    from_pretrained=\"PixArt-XL-2-512x512.pth\",\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\n# model = dict(\n#     type=\"DiT-XL/2\",\n#     # space_scale=1.0,\n#     # time_scale=1.0,\n#     no_temporal_pos_emb=True,\n#     # from_pretrained=\"PixArt-XL-2-512x512.pth\",\n#     from_pretrained=\"/home/zhaowangbo/wangbo/PixArt-alpha/pretrained_models/PixArt-XL-2-512x512.pth\",\n#     enable_flash_attn=True,\n#     enable_layernorm_kernel=True,\n# )\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=4,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=200,\n    shardformer=True,\n)\nscheduler = dict(\n    type=\"rflow\",\n    # 
timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 10\nlog_every = 10\nckpt_every = 500\nload = None\n\nbatch_size = 100  # only for logging\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora-v1-1/train/stage1.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VariableVideoTextDataset\",\n    data_path=None,\n    num_frames=None,\n    frame_interval=3,\n    image_size=(None, None),\n    transform_name=\"resize_crop\",\n)\n# IMG: 1024 (20%) 512 (30%) 256 (50%) drop (50%)\nbucket_config = {  # 1s/it\n    \"144p\": {1: (0.5, 48), 16: (1.0, 6), 32: (1.0, 3), 96: (1.0, 1)},\n    \"256\": {1: (0.5, 24), 16: (0.5, 3), 48: (0.5, 1), 64: (0.0, None)},\n    \"240p\": {16: (0.3, 2), 32: (0.3, 1), 64: (0.0, None)},\n    \"512\": {1: (0.4, 12)},\n    \"1024\": {1: (0.3, 3)},\n}\nmask_ratios = {\n    \"identity\": 0.75,\n    \"quarter_random\": 0.025,\n    \"quarter_head\": 0.025,\n    \"quarter_tail\": 0.025,\n    \"quarter_head_tail\": 0.05,\n    \"image_random\": 0.025,\n    \"image_head\": 0.025,\n    \"image_tail\": 0.025,\n    \"image_head_tail\": 0.05,\n}\n\n# Define acceleration\nnum_workers = 8\nnum_bucket_build_workers = 16\ndtype = \"bf16\"\ngrad_checkpoint = False\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"STDiT2-XL/2\",\n    from_pretrained=None,\n    input_sq_size=512,  # pretrained model is trained on 512x512\n    qk_norm=True,\n    qk_norm_legacy=True,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=4,\n    local_files_only=True,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=200,\n    shardformer=True,\n    local_files_only=True,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 500\nload = None\n\nbatch_size = None\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora-v1-1/train/stage2.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VariableVideoTextDataset\",\n    data_path=None,\n    num_frames=None,\n    frame_interval=3,\n    image_size=(None, None),\n    transform_name=\"resize_crop\",\n)\nbucket_config = {  # 7s/it\n    \"144p\": {1: (1.0, 48), 16: (1.0, 17), 32: (1.0, 9), 64: (1.0, 4), 128: (1.0, 1)},\n    \"256\": {1: (0.8, 254), 16: (0.5, 17), 32: (0.5, 9), 64: (0.5, 4), 128: (0.5, 1)},\n    \"240p\": {1: (0.1, 20), 16: (0.9, 17), 32: (0.8, 9), 64: (0.8, 4), 128: (0.8, 2)},\n    \"512\": {1: (0.5, 86), 16: (0.2, 4), 32: (0.2, 2), 64: (0.2, 1), 128: (0.0, None)},\n    \"480p\": {1: (0.4, 54), 16: (0.4, 4), 32: (0.0, None)},\n    \"720p\": {1: (0.1, 20), 16: (0.1, 2), 32: (0.0, None)},\n    \"1024\": {1: (0.3, 20)},\n    \"1080p\": {1: (0.4, 8)},\n}\nmask_ratios = {\n    \"identity\": 0.75,\n    \"quarter_random\": 0.025,\n    \"quarter_head\": 0.025,\n    \"quarter_tail\": 0.025,\n    \"quarter_head_tail\": 0.05,\n    \"image_random\": 0.025,\n    \"image_head\": 0.025,\n    \"image_tail\": 0.025,\n    \"image_head_tail\": 0.05,\n}\n\n# Define acceleration\nnum_workers = 8\nnum_bucket_build_workers = 16\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"STDiT2-XL/2\",\n    from_pretrained=None,\n    input_sq_size=512,  # pretrained model is trained on 512x512\n    qk_norm=True,\n    qk_norm_legacy=True,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=4,\n    local_files_only=True,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=200,\n    shardformer=True,\n    local_files_only=True,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = 
False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 500\nload = None\n\nbatch_size = None\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora-v1-1/train/stage3.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VariableVideoTextDataset\",\n    data_path=None,\n    num_frames=None,\n    frame_interval=3,\n    image_size=(None, None),\n    transform_name=\"resize_crop\",\n)\nbucket_config = {  # 13s/it\n    \"144p\": {1: (1.0, 200), 16: (1.0, 36), 32: (1.0, 18), 64: (1.0, 9), 128: (1.0, 4)},\n    \"256\": {1: (0.8, 200), 16: (0.5, 22), 32: (0.5, 11), 64: (0.5, 6), 128: (0.8, 4)},\n    \"240p\": {1: (0.8, 200), 16: (0.5, 22), 32: (0.5, 10), 64: (0.5, 6), 128: (0.5, 3)},\n    \"360p\": {1: (0.5, 120), 16: (0.5, 9), 32: (0.5, 4), 64: (0.5, 2), 128: (0.5, 1)},\n    \"512\": {1: (0.5, 120), 16: (0.5, 9), 32: (0.5, 4), 64: (0.5, 2), 128: (0.8, 1)},\n    \"480p\": {1: (0.4, 80), 16: (0.6, 6), 32: (0.6, 3), 64: (0.6, 1), 128: (0.0, None)},\n    \"720p\": {1: (0.4, 40), 16: (0.6, 3), 32: (0.6, 1), 96: (0.0, None)},\n    \"1024\": {1: (0.3, 40)},\n}\nmask_ratios = {\n    \"identity\": 0.75,\n    \"quarter_random\": 0.025,\n    \"quarter_head\": 0.025,\n    \"quarter_tail\": 0.025,\n    \"quarter_head_tail\": 0.05,\n    \"image_random\": 0.025,\n    \"image_head\": 0.025,\n    \"image_tail\": 0.025,\n    \"image_head_tail\": 0.05,\n}\n\n# Define acceleration\nnum_workers = 8\nnum_bucket_build_workers = 16\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"STDiT2-XL/2\",\n    from_pretrained=None,\n    input_sq_size=512,  # pretrained model is trained on 512x512\n    qk_norm=True,\n    qk_norm_legacy=True,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=4,\n    local_files_only=True,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=200,\n    shardformer=True,\n    local_files_only=True,\n)\nscheduler = dict(\n    
type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 500\nload = None\n\nbatch_size = None\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora-v1-1/train/video.py",
    "content": "# Define dataset\ndataset = dict(\n    type=\"VariableVideoTextDataset\",\n    data_path=None,\n    num_frames=None,\n    frame_interval=3,\n    image_size=(None, None),\n    transform_name=\"resize_crop\",\n)\nbucket_config = {  # 6s/it\n    \"240p\": {16: (1.0, 16), 32: (1.0, 8), 64: (1.0, 4), 128: (1.0, 2)},\n    \"256\": {1: (1.0, 256)},\n    \"512\": {1: (0.5, 80)},\n    \"480p\": {1: (0.4, 52), 16: (0.4, 4), 32: (0.0, None)},\n    \"720p\": {16: (0.1, 2), 32: (0.0, None)},  # No examples now\n    \"1024\": {1: (0.3, 20)},\n    \"1080p\": {1: (0.3, 8)},\n}\n\n# Define acceleration\nnum_workers = 4\nnum_bucket_build_workers = 16\ndtype = \"bf16\"\ngrad_checkpoint = True\nplugin = \"zero2\"\nsp_size = 1\n\n# Define model\nmodel = dict(\n    type=\"STDiT2-XL/2\",\n    from_pretrained=None,\n    input_sq_size=512,  # pretrained model is trained on 512x512\n    qk_norm=True,\n    qk_norm_legacy=True,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=4,\n    local_files_only=True,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=200,\n    shardformer=True,\n    local_files_only=True,\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"\nwandb = False\n\nepochs = 1000\nlog_every = 10\nckpt_every = 500\nload = None\n\nbatch_size = 10  # only for logging\nlr = 2e-5\ngrad_clip = 1.0\n"
  },
  {
    "path": "Open-Sora/configs/opensora-v1-2/inference/sample.py",
    "content": "resolution = \"240p\"\naspect_ratio = \"9:16\"\nnum_frames = 51\nfps = 24\nframe_interval = 1\nsave_fps = 24\n\n#save_dir = \"./samples/samples/\"\nsave_dir = \"/root/autodl-tmp/video_samples/\"\nseed = 42\nbatch_size = 1\nmulti_resolution = \"STDiT2\"\ndtype = \"bf16\"\ncondition_frame_length = 5\nalign = 5\n\nmodel = dict(\n    type=\"STDiT3-XL/2\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/hpcai-tech/OpenSora-STDiT-v3\",\n    qk_norm=True,\n    enable_flash_attn=True,#True\n    enable_layernorm_kernel=True,#True\n)\nvae = dict(\n    type=\"OpenSoraVAE_V1_2\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/hpcai-tech/OpenSora-VAE-v1.2\",\n    micro_frame_size=17,\n    micro_batch_size=4,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=300,\n)\nscheduler = dict(\n    type=\"rflow\",\n    use_timestep_transform=True,\n    num_sampling_steps=30,\n    cfg_scale=7.0,\n)\n\naes = 6.5\nflow = None\n#num_sample = 1\n"
  },
  {
    "path": "Open-Sora/docs/acceleration.md",
    "content": "# Acceleration\n\n>This document corresponds to our v1.1 release\n\nOpen-Sora aims to provide a high-speed training framework for diffusion models. We can achieve **55%** training speed acceleration when training on **64 frames 512x512 videos**. Our framework support training **1min 1080p videos**.\n\n## Accelerated Transformer\n\nOpen-Sora boosts the training speed by:\n\n- Kernel optimization including [flash attention](https://github.com/Dao-AILab/flash-attention), fused layernorm kernel, and the ones compiled by colossalAI.\n- Hybrid parallelism including ZeRO.\n- Gradient checkpointing for larger batch size.\n\nOur training speed on images is comparable to [OpenDiT](https://github.com/NUS-HPC-AI-Lab/OpenDiT), a project to accelerate DiT training. The training speed is measured on 8 H800 GPUs with batch size 128, image size 256x256.\n\n| Model    | Throughput (img/s/GPU) | Throughput (tokens/s/GPU) |\n| -------- | ---------------------- | ------------------------- |\n| DiT      | 100                    | 26k                       |\n| OpenDiT  | 175                    | 45k                       |\n| OpenSora | 175                    | 45k                       |\n\n## Efficient STDiT\n\nOur STDiT adopts spatial-temporal attention to model the video data. Compared with directly applying full attention on DiT, our STDiT is more efficient as the number of frames increases. Our current framework only supports sequence parallelism for very long sequence.\n\nThe training speed is measured on 8 H800 GPUs with acceleration techniques applied, GC means gradient checkpointing. 
Both models use T5 conditioning, as in PixArt.\n\n| Model            | Setting        | Throughput (sample/s/GPU) | Throughput (tokens/s/GPU) |\n| ---------------- | -------------- | ------------------------- | ------------------------- |\n| DiT              | 16x256  (4k)   | 7.20                      | 29k                       |\n| STDiT            | 16x256  (4k)   | 7.00                      | 28k                       |\n| DiT              | 16x512  (16k)  | 0.85                      | 14k                       |\n| STDiT            | 16x512  (16k)  | 1.45                      | 23k                       |\n| DiT (GC)         | 64x512  (65k)  | 0.08                      | 5k                        |\n| STDiT (GC)       | 64x512  (65k)  | 0.40                      | 25k                       |\n| STDiT (GC, sp=2) | 360x512 (370k) | 0.10                      | 18k                       |\n\nWith 4x downsampling in the temporal dimension by the Video-VAE, a 24fps video has 450 frames. The gap between the speed of STDiT (28k tokens/s) and DiT on images (up to 45k tokens/s) mainly comes from T5 and VAE encoding and from temporal attention.\n\n## Accelerated Encoder (T5, VAE)\n\nDuring training, texts are encoded by T5, and videos are encoded by the VAE. There are typically two ways to accelerate training:\n\n1. Preprocess text and video data in advance and save them to disk.\n2. Encode text and video data during training, and accelerate the encoding process.\n\nFor option 1, the 120 tokens of one sample require 1 MB of disk space, and a 64x64x64 latent requires 4 MB. For a training dataset with 10M video clips, the total disk space required is 50TB. Our storage system is not ready for this scale of data at this time.\n\nFor option 2, we speed up T5 and reduce its memory requirement. Following [OpenDiT](https://github.com/NUS-HPC-AI-Lab/OpenDiT), we find that the VAE consumes a large amount of GPU memory, so we split each batch into smaller micro-batches for VAE encoding. 
With both techniques, we can greatly accelerate the training speed.\n\nThe training speed is measured on 8 H800 GPUs with STDiT.\n\n| Acceleration | Setting       | Throughput (img/s/GPU) | Throughput (tokens/s/GPU) |\n| ------------ | ------------- | ---------------------- | ------------------------- |\n| Baseline     | 16x256  (4k)  | 6.16                   | 25k                       |\n| w. faster T5 | 16x256  (4k)  | 7.00                   | 29k                       |\n| Baseline     | 64x512  (65k) | 0.94                   | 15k                       |\n| w. both      | 64x512  (65k) | 1.45                   | 23k                       |\n"
  },
  {
    "path": "Open-Sora/docs/commands.md",
    "content": "# Commands\n\n- [Config](#Config)\n- [Inference](#inference)\n  - [Inference with Open-Sora 1.2](#inference-with-open-sora-12)\n  - [Inference with Open-Sora 1.1](#inference-with-open-sora-11)\n  - [Inference with DiT pretrained on ImageNet](#inference-with-dit-pretrained-on-imagenet)\n  - [Inference with Latte pretrained on UCF101](#inference-with-latte-pretrained-on-ucf101)\n  - [Inference with PixArt-α pretrained weights](#inference-with-pixart-α-pretrained-weights)\n  - [Inference with checkpoints saved during training](#inference-with-checkpoints-saved-during-training)\n  - [Inference Hyperparameters](#inference-hyperparameters)\n- [Training](#training)\n  - [Training Hyperparameters](#training-hyperparameters)\n- [Search batch size for buckets](#search-batch-size-for-buckets)\n\n## Config\nNote that currently our model loading for vae and diffusion model supports two types:\n\n* load from local file path\n* load from huggingface\n\nOur config supports loading from huggingface online image by default.\nIf you wish to load from a local path downloaded from huggingface image, you need to set `force_huggingface=True`, for instance:\n\n```python\n# for vae\nvae = dict(\n    type=\"OpenSoraVAE_V1_2\",\n    from_pretrained=\"/root/commonData/OpenSora-VAE-v1.2\",\n    micro_frame_size=17,\n    micro_batch_size=4,\n    force_huggingface=True, # NOTE: set here\n)\n# for diffusion model\nmodel = dict(\n    type=\"STDiT3-XL/2\",\n    from_pretrained=\"/root/commonData/OpenSora-STDiT-v3\",\n    qk_norm=True,\n    enable_flash_attn=True,\n    enable_layernorm_kernel=True,\n    force_huggingface=True, # NOTE: set here\n)\n```\nHowever, if you want to load a self-trained model, do not set `force_huggingface=True` since your image won't be in huggingface format.\n\n## Inference\n\nYou can modify corresponding config files to change the inference settings. 
See more details [here](/docs/structure.md#inference-config-demos).\n\n### Inference with Open-Sora 1.2\n\nThe inference API is compatible with Open-Sora 1.1. To make it easier to use, we add the `--resolution` and `--aspect-ratio` options, which are a more user-friendly way to specify the image size.\n\n```bash\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n    --resolution 480p --aspect-ratio 9:16\n# equivalent to\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n    --image-size 480 853\n```\n\nIn this version, we have merged all functions of the previous `inference-long.py` into `inference.py`. The command line arguments are the same as before (note only that the frame index and length are calculated after 4x compression).\n\n### Inference with Open-Sora 1.1\n\nSince Open-Sora 1.1 supports inference with dynamic input size, you can pass the input size as an argument.\n\n```bash\n# image sampling with prompt path\npython scripts/inference.py configs/opensora-v1-1/inference/sample.py \\\n    --ckpt-path CKPT_PATH --prompt-path assets/texts/t2i_samples.txt --num-frames 1 --image-size 1024 1024\n\n# image sampling with prompt\npython scripts/inference.py configs/opensora-v1-1/inference/sample.py \\\n    --ckpt-path CKPT_PATH --prompt \"A beautiful sunset over the city\" --num-frames 1 --image-size 1024 1024\n\n# video sampling\npython scripts/inference.py configs/opensora-v1-1/inference/sample.py \\\n    --ckpt-path CKPT_PATH --prompt \"A beautiful sunset over the city\" --num-frames 16 --image-size 480 854\n```\n\nYou can adjust `--num-frames` and `--image-size` to generate different results. We recommend using the same image size as the training resolution, which is defined in [aspect.py](/opensora/datasets/aspect.py). 
Some examples are shown below.\n\n- 240p\n  - 16:9 240x426\n  - 3:4 276x368\n  - 1:1 320x320\n- 480p\n  - 16:9 480x854\n  - 3:4 554x738\n  - 1:1 640x640\n- 720p\n  - 16:9 720x1280\n  - 3:4 832x1110\n  - 1:1 960x960\n\n`inference-long.py` is compatible with `inference.py` and supports advanced features.\n\n```bash\n# image condition\npython scripts/inference-long.py configs/opensora-v1-1/inference/sample.py --ckpt-path CKPT_PATH \\\n  --num-frames 32 --image-size 240 426 --sample-name image-cond \\\n  --prompt 'A breathtaking sunrise scene.{\"reference_path\": \"assets/images/condition/wave.png\",\"mask_strategy\": \"0\"}'\n\n# video extending\npython scripts/inference-long.py configs/opensora-v1-1/inference/sample.py --ckpt-path CKPT_PATH \\\n  --num-frames 32 --image-size 240 426 --sample-name image-cond \\\n  --prompt 'A car driving on the ocean.{\"reference_path\": \"https://cdn.openai.com/tmp/s/interp/d0.mp4\",\"mask_strategy\": \"0,0,0,-8,8\"}'\n\n# long video generation\npython scripts/inference-long.py configs/opensora-v1-1/inference/sample.py --ckpt-path CKPT_PATH \\\n  --num-frames 32 --image-size 240 426 --loop 16 --condition-frame-length 8 --sample-name long \\\n  --prompt '|0|a white jeep equipped with a roof rack driving on a dirt road in a coniferous forest.|2|a white jeep equipped with a roof rack driving on a dirt road in the desert.|4|a white jeep equipped with a roof rack driving on a dirt road in a mountain.|6|A white jeep equipped with a roof rack driving on a dirt road in a city.|8|a white jeep equipped with a roof rack driving on a dirt road on the surface of a river.|10|a white jeep equipped with a roof rack driving on a dirt road under the lake.|12|a white jeep equipped with a roof rack flying into the sky.|14|a white jeep equipped with a roof rack driving in the universe. 
Earth is the background.{\"reference_path\": \"https://cdn.openai.com/tmp/s/interp/d0.mp4\", \"mask_strategy\": \"0,0,0,0,16\"}'\n\n# video connecting\npython scripts/inference-long.py configs/opensora-v1-1/inference/sample.py --ckpt-path CKPT_PATH \\\n  --num-frames 32 --image-size 240 426 --sample-name connect \\\n  --prompt 'A breathtaking sunrise scene.{\"reference_path\": \"assets/images/condition/sunset1.png;assets/images/condition/sunset2.png\",\"mask_strategy\": \"0;0,1,0,-1,1\"}'\n\n# video editing\npython scripts/inference-long.py configs/opensora-v1-1/inference/sample.py --ckpt-path CKPT_PATH \\\n  --num-frames 32 --image-size 480 853 --sample-name edit \\\n  --prompt 'A cyberpunk-style city at night.{\"reference_path\": \"https://cdn.pixabay.com/video/2021/10/12/91744-636709154_large.mp4\",\"mask_strategy\": \"0,0,0,0,32,0.4\"}'\n```\n\n### Inference with DiT pretrained on ImageNet\n\nThe following command automatically downloads the pretrained weights on ImageNet and runs inference.\n\n```bash\npython scripts/inference.py configs/dit/inference/1x256x256-class.py --ckpt-path DiT-XL-2-256x256.pt\n```\n\n### Inference with Latte pretrained on UCF101\n\nThe following command automatically downloads the pretrained weights on UCF101 and runs inference.\n\n```bash\npython scripts/inference.py configs/latte/inference/16x256x256-class.py --ckpt-path Latte-XL-2-256x256-ucf101.pt\n```\n\n### Inference with PixArt-α pretrained weights\n\nDownload T5 into `./pretrained_models` and run the following command.\n\n```bash\n# 256x256\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inference/1x256x256.py --ckpt-path PixArt-XL-2-256x256.pth\n\n# 512x512\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inference/1x512x512.py --ckpt-path PixArt-XL-2-512x512.pth\n\n# 1024 multi-scale\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inference/1x1024MS.py --ckpt-path 
PixArt-XL-2-1024MS.pth\n```\n\n### Inference with checkpoints saved during training\n\nDuring training, an experiment logging folder is created in the `outputs` directory. Under each checkpoint folder, e.g. `epoch12-global_step2000`, there are an `ema.pt` file and the shared `model` folder. Run the following commands to perform inference.\n\n```bash\n# inference with ema model\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path outputs/001-STDiT-XL-2/epoch12-global_step2000/ema.pt\n\n# inference with model\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path outputs/001-STDiT-XL-2/epoch12-global_step2000\n\n# inference with sequence parallelism\n# sequence parallelism is enabled automatically when nproc_per_node is larger than 1\ntorchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path outputs/001-STDiT-XL-2/epoch12-global_step2000\n```\n\nThe second command will automatically generate a `model_ckpt.pt` file in the checkpoint folder.\n\n### Inference Hyperparameters\n\n1. DPM-solver is good at fast inference for images. However, its video results are not satisfactory. You can use it for fast demos.\n\n```python\ntype=\"dpm-solver\"\nnum_sampling_steps=20\n```\n\n2. You can use [SVD](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)'s finetuned VAE decoder on videos for inference (consumes more memory). However, we do not see significant improvement in the video result. 
To use it, download [the pretrained weights](https://huggingface.co/maxin-cn/Latte/tree/main/t2v_required_models/vae_temporal_decoder) into `./pretrained_models/vae_temporal_decoder` and modify the config file as follows.\n\n```python\nvae = dict(\n    type=\"VideoAutoencoderKLTemporalDecoder\",\n    from_pretrained=\"pretrained_models/vae_temporal_decoder\",\n)\n```\n\n## Training\n\nTo resume training, run the following command. `--load` differs from `--ckpt-path` in that it also loads the optimizer and dataloader states.\n\n```bash\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --load YOUR_PRETRAINED_CKPT\n```\n\nTo enable wandb logging, add `--wandb True` to the command.\n\n```bash\nWANDB_API_KEY=YOUR_WANDB_API_KEY torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --wandb True\n```\n\nYou can modify the corresponding config files to change the training settings. See more details [here](/docs/structure.md#training-config-demos).\n\n### Training Hyperparameters\n\n1. `dtype` is the data type for training. Only `fp16` and `bf16` are supported. ColossalAI automatically enables mixed precision training for `fp16` and `bf16`. 
During training, we find `bf16` more stable.\n\n## Search batch size for buckets\n\nTo search the batch size for buckets, run the following command.\n\n```bash\ntorchrun --standalone --nproc_per_node 1 scripts/misc/search_bs.py configs/opensora-v1-2/misc/bs.py --data-path /mnt/nfs-207/sora_data/meta/searchbs.csv\n```\n\nHere, the dataset should be a small one, used only for the search.\n\nTo control the batch size search range, you should specify `bucket_config` in the config file, where the value tuple is `(guess_value, range)` and the search will be performed in `guess_value±range`.\n\nHere is an example of the bucket config:\n\n```python\nbucket_config = {\n    \"240p\": {\n        1: (100, 100),\n        51: (24, 10),\n        102: (12, 10),\n        204: (4, 8),\n        408: (2, 8),\n    },\n    \"480p\": {\n        1: (50, 50),\n        51: (6, 6),\n        102: (3, 3),\n        204: (1, 2),\n    },\n}\n```\n\nYou can also restrict the search to a single resolution, so that different resolutions can be searched in parallel.\n\n```bash\ntorchrun --standalone --nproc_per_node 1 scripts/misc/search_bs.py configs/opensora-v1-2/misc/bs.py --data-path /mnt/nfs-207/sora_data/meta/searchbs.csv --resolution 240p\n```\n\nThe search goal should be specified in the config file as well. There are two ways:\n\n1. Specify a `base_step_time` in the config file. The search goal is to find the batch size that can achieve the `base_step_time` for each bucket.\n2. If `base_step_time` is not specified, it will be determined by `base` which is a tuple of `(batch_size, step_time)`. The step time is the maximum batch size allowed for the bucket.\n\nThe script will print the best batch size (and corresponding step time) for each bucket and save the output config file. Note that we assume a larger batch size is better, so the script uses binary search to find the best batch size.\n"
  },
  {
    "path": "Open-Sora/docs/config.md",
    "content": "# Config Guide\n\n- [Inference Config](#inference-config)\n- [Advanced Inference config](#advanced-inference-config)\n- [Inference Args](#inference-args)\n- [Training Config](#training-config)\n- [Training Args](#training-args)\n- [Training Bucket Configs](#training-bucket-configs)\n\nOur config files follows [MMEgine](https://github.com/open-mmlab/mmengine). MMEngine will reads the config file (a `.py` file) and parse it into a dictionary-like object. We expose some fields in the config file to the command line arguments (defined in [opensora/utils/config_util.py](/opensora/utils/config_utils.py)). To change the inference settings, you can directly modify the corresponding config file. Or you can pass arguments to overwrite the config file.\n\n## Inference Config\n\nThe explanation of each field is provided below.\n\n```python\n# Define sampling size\nnum_frames = 64               # number of frames, 1 means image\nfps = 24                      # frames per second (condition for generation)\nframe_interval = 3            # output video will have fps/frame_interval frames per second\nimage_size = (240, 426)       # image size (height, width)\n\n# Define model\nmodel = dict(\n    type=\"STDiT2-XL/2\",       # Select model type (STDiT-XL/2, DiT-XL/2, etc.)\n    from_pretrained=\"PRETRAINED_MODEL\",  # (Optional) Load from pretrained model\n    input_sq_size=512,        # Base spatial position embedding size\n    qk_norm=True,             # Normalize query and key in attention\n    enable_flash_attn=True,    # (Optional) Speed up training and inference with flash attention\n    # Turn enable_flash_attn to False if you skip flashattn installation\n    enable_layernorm_kernel=True, # (Optional) Speed up training and inference with fused kernel\n    # Turn enable_layernorm_kernel to False if you skip apex installation\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\", # Select VAE type\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\", # Load from 
pretrained VAE\n    micro_batch_size=4,        # VAE with micro batch size to save memory\n)\ntext_encoder = dict(\n    type=\"t5\",                 # Select text encoder type (t5, clip)\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\", # Load from pretrained text encoder\n    model_max_length=200,      # Maximum length of input text\n)\nscheduler = dict(\n    type=\"iddpm\",              # Select scheduler type (iddpm, dpm-solver)\n    num_sampling_steps=100,    # Number of sampling steps\n    cfg_scale=7.0,             # hyper-parameter for classifier-free diffusion\n    cfg_channel=3,             # how many channels to use for classifier-free diffusion, if None, use all channels\n)\ndtype = \"bf16\"                 # Computation type (fp16, fp32, bf16)\n\n# Condition\nprompt_path = \"./assets/texts/t2v_samples.txt\" # path to prompt file\nprompt = None                  # prompt has higher priority than prompt_path\n\n# Other settings\nbatch_size = 1                 # batch size\nseed = 42                      # random seed\nsave_dir = \"./samples\"         # path to save samples\n```\n\n## Advanced Inference config\n\nThe [`inference-long.py`](/scripts/inference-long.py) script is used to generate long videos, and it also provides all functions of the [`inference.py`](/scripts/inference.py) script. 
The following arguments are specific to the `inference-long.py` script.\n\n```python\nloop = 10\ncondition_frame_length = 4\nreference_path = [\n    \"https://cdn.openai.com/tmp/s/interp/d0.mp4\",\n    None,\n    \"assets/images/condition/wave.png\",\n]\nmask_strategy = [\n    \"0,0,0,0,8,0.3\",\n    None,\n    \"0,0,0,0,1;0,0,0,-1,1\",\n]\n```\n\nThe following figure provides an illustration of the `mask_strategy`:\n\n![mask_strategy](/assets/readme/report_mask_config.png)\n\nTo generate a video of arbitrary length, our strategy is to generate a video with a fixed length first, and then use its last `condition_frame_length` frames as the condition for the next video generation. This repeats `loop` times. Thus, the total length of the video is `loop * (num_frames - condition_frame_length) + condition_frame_length`.\n\nTo condition the generation on images or videos, we introduce the `mask_strategy`. It consists of six-number tuples separated by `;`. Each tuple indicates an insertion of the condition image or video into the target generation. The meaning of each number is:\n\n- **First number**: the loop index of the condition image or video. (0 means the first loop, 1 means the second loop, etc.)\n- **Second number**: the index of the condition image or video in the `reference_path`.\n- **Third number**: the start frame of the condition image or video. (0 means the first frame, and images only have one frame)\n- **Fourth number**: the location to insert. (0 means insert at the beginning; judging by the examples in this guide, negative values count from the end, so -1 means insert at the end of the video)\n- **Fifth number**: the number of frames to insert. (1 means insert one frame, and images only have one frame)\n- **Sixth number**: the edit rate of the condition image or video. (0 means no edit, 1 means full edit)\n\nTo facilitate usage, we also accept passing the reference path and mask strategy as a JSON object appended to the prompt. 
For example,\n\n```plaintext\n'Drone view of waves crashing against the rugged cliffs along Big Sur\\'s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff\\'s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff\\'s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.{\"reference_path\": \"assets/images/condition/cliff.png\", \"mask_strategy\": \"0\"}'\n```\n\n## Inference Args\n\nYou can use `python scripts/inference.py --help` to see the following arguments:\n\n- `--seed`: random seed\n- `--ckpt-path`: path to the checkpoint (`model[\"from_pretrained\"]`)\n- `--batch-size`: batch size\n- `--save-dir`: path to save samples\n- `--sample-name`: if None, the sample will be named `sample_{index}.mp4/png`; otherwise, the sample will be named `{sample_name}_{index}.mp4/png`\n- `--start-index`: start index of the sample\n- `--end-index`: end index of the sample\n- `--num-sample`: number of samples to generate for each prompt. 
The sample will be suffixed by `-0`, `-1`, `-2`, etc.\n- `--prompt-as-path`: if True, use the prompt as the name for saving samples\n- `--prompt-path`: path to the prompt file\n- `--prompt`: prompt string list\n- `--num-frames`: number of frames\n- `--fps`: frames per second\n- `--image-size`: image size\n- `--num-sampling-steps`: number of sampling steps (`scheduler[\"num_sampling_steps\"]`)\n- `--cfg-scale`: hyper-parameter for classifier-free diffusion (`scheduler[\"cfg_scale\"]`)\n- `--loop`: loop for long video generation\n- `--condition-frame-length`: condition frame length for long video generation\n- `--reference-path`: reference path for long video generation\n- `--mask-strategy`: mask strategy for long video generation\n\nExample commands for inference can be found in [commands.md](/docs/commands.md).\n\n## Training Config\n\n```python\n# Define dataset\ndataset = dict(\n    type=\"VariableVideoTextDataset\",   # Select dataset type\n    # VideoTextDataset for OpenSora 1.0, VariableVideoTextDataset for OpenSora 1.1 and 1.2\n    data_path=None,                    # Path to the dataset\n    num_frames=None,                   # Number of frames, set None since we support dynamic training\n    frame_interval=3,                  # Frame interval\n    image_size=(None, None),           # Image size, set None since we support dynamic training\n    transform_name=\"resize_crop\",      # Transform name\n)\n# bucket config usage see next section\nbucket_config = {\n    \"144p\": {1: (1.0, 48), 16: (1.0, 17), 32: (1.0, 9), 64: (1.0, 4), 128: (1.0, 1)},\n    \"256\": {1: (0.8, 254), 16: (0.5, 17), 32: (0.5, 9), 64: (0.5, 4), 128: (0.5, 1)},\n    \"240p\": {1: (0.1, 20), 16: (0.9, 17), 32: (0.8, 9), 64: (0.8, 4), 128: (0.8, 2)},\n    \"512\": {1: (0.5, 86), 16: (0.2, 4), 32: (0.2, 2), 64: (0.2, 1), 128: (0.0, None)},\n    \"480p\": {1: (0.4, 54), 16: (0.4, 4), 32: (0.0, None)},\n    \"720p\": {1: (0.1, 20), 16: (0.1, 2), 32: (0.0, None)},\n    \"1024\": {1: (0.3, 
20)},\n    \"1080p\": {1: (0.4, 8)},\n}\n# mask ratio in training\nmask_ratios = {\n    \"identity\": 0.75,                   # 75% no mask\n    \"quarter_random\": 0.025,      # 2.5% random mask with 1 frame to 1/4 #frames\n    \"quarter_head\": 0.025,        # 2.5% mask at the beginning with 1 frame to 1/4 #frames\n    \"quarter_tail\": 0.025,        # 2.5% mask at the end with 1 frame to 1/4 #frames\n    \"quarter_head_tail\": 0.05,    # 5% mask at the beginning and end with 1 frame to 1/4 #frames\n    \"image_random\": 0.025,        # 2.5% random mask with 1 image to 1/4 #images\n    \"image_head\": 0.025,          # 2.5% mask at the beginning with 1 image to 1/4 #images\n    \"image_tail\": 0.025,          # 2.5% mask at the end with 1 image to 1/4 #images\n    \"image_head_tail\": 0.05,      # 5% mask at the beginning and end with 1 image to 1/4 #images\n}\n\n# Define acceleration\nnum_workers = 8                        # Number of workers for dataloader\nnum_bucket_build_workers = 16          # Number of workers for bucket building\ndtype = \"bf16\"                         # Computation type (fp16, fp32, bf16)\ngrad_checkpoint = True                 # Use gradient checkpointing\nplugin = \"zero2\"                       # Plugin for training\nsp_size = 1                            # Sequence parallel size\n\n# Define model\nmodel = dict(\n    type=\"STDiT2-XL/2\",                # Select model type (STDiT-XL/2, DiT-XL/2, etc.)\n    from_pretrained=None,              # Load from pretrained model\n    input_sq_size=512,                 # Base spatial position embedding size\n    qk_norm=True,                      # Normalize query and key in attention\n    enable_flash_attn=True,             # (Optional) Speed up training and inference with flash attention\n    enable_layernorm_kernel=True,      # (Optional) Speed up training and inference with fused kernel\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",         # Select VAE type\n    
from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=4,                # VAE with micro batch size to save memory\n    local_files_only=True,             # Load from local files only (first time should be false)\n)\ntext_encoder = dict(\n    type=\"t5\",                         # Select text encoder type (t5, clip)\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=200,              # Maximum length of input text\n    shardformer=True,                  # Use shardformer\n    local_files_only=True,             # Load from local files only (first time should be false)\n)\nscheduler = dict(\n    type=\"iddpm\",                      # Select scheduler type (iddpm, iddpm-speed)\n    timestep_respacing=\"\",\n)\n\n# Others\nseed = 42                              # random seed\noutputs = \"outputs\"                    # path to save outputs\nwandb = False                          # Use wandb or not\n\nepochs = 1000                          # Number of epochs (set a large number and kill the process when you want to stop)\nlog_every = 10\nckpt_every = 500\nload = None\n\nbatch_size = None\nlr = 2e-5\ngrad_clip = 1.0\n```\n\n## Training Args\n\n- `--seed`: random seed\n- `--ckpt-path`: path to the checkpoint (`model[\"from_pretrained\"]`)\n- `--batch-size`: batch size\n- `--wandb`: use wandb or not\n- `--load`: path to the checkpoint to load\n- `--data-path`: path to the dataset (`dataset[\"data_path\"]`)\n\nSee [commands.md](/docs/commands.md) for example commands.\n\n## Training Bucket Configs\n\nWe support multi-resolution/aspect-ratio/num_frames training with buckets. To enable dynamic training (for STDiT2), use the `VariableVideoTextDataset` dataset type, and set the `bucket_config` in the config. 
An example is:\n\n```python\nbucket_config = {\n    \"240p\": {16: (1.0, 16), 32: (1.0, 8), 64: (1.0, 4), 128: (1.0, 2)},\n    \"256\": {1: (1.0, 256)},\n    \"512\": {1: (1.0, 80)},\n    \"480p\": {1: (1.0, 52), 16: (0.5, 4), 32: (0.0, None)},\n    \"720p\": {16: (1.0, 2), 32: (0.0, None)},\n    \"1024\": {1: (1.0, 20)},\n    \"1080p\": {1: (1.0, 8)},\n}\n```\n\nThis may look a bit difficult to understand at first glance. Let's go through this config step by step.\n\n### Three-level bucket\n\n![bucket](/assets/readme/report_bucket.png)\n\nWe design a three-level bucket: `(resolution, num_frames, aspect_ratios)`. The resolutions and aspect ratios are predefined in [aspect.py](/opensora/datasets/aspect.py). Commonly used resolutions (e.g., 240p, 1080p) are supported, and the name represents the number of pixels (e.g., 240p is nominally 240x426, but we define 240p to represent any size with HxW approximately equal to 240x426=102240 pixels). The aspect ratios are defined for each resolution. You do not need to define the aspect ratios in the `bucket_config`.\n\nThe `num_frames` is the number of frames in each sample, with `num_frames=1` reserved for images. If `frame_interval` is not 1, a bucket with `num_frames=k` will contain videos with `k*frame_interval` frames, except for images. Only a video with more than `num_frames` frames and more than `resolution` pixels can be put into the bucket.\n\nThe two numbers defined in the bucket config are `(keep_prob, batch_size)`. Since the memory usage and speed of samples from different buckets may differ, we use `batch_size` to balance the processing speed. Since our computation is limited, we cannot process videos at their original resolution as described in OpenAI's Sora report. Thus, we use `keep_prob` to control the number of samples in each bucket. The `keep_prob` is the probability of keeping a sample in the bucket. 
Let's take the following config as an example:\n\n```python\nbucket_config = {\n    \"480p\": {16: (1.0, 8)},\n    \"720p\": {16: (0.5, 4)},\n    \"1080p\": {16: (0.2, 2)},\n    \"4K\": {16: (0.1, 1)},\n}\n```\n\nGiven a 2K video with more than 16 frames, the program will first try to put it into the \"1080p\" bucket, since its resolution is larger than 1080p but smaller than 4K. Since the `keep_prob` for 1080p is 20%, a random number is generated, and if it is less than 0.2, the video will be put into the bucket. If the video is not put into the bucket, the program will try to put it into the \"720p\" bucket. Since the `keep_prob` for 720p is 50%, the video has a 50% chance to be put into the bucket. If the video is still not placed, the program will put it into the \"480p\" bucket directly, as it is the smallest resolution.\n\n### Examples\n\nLet's see some simple examples to understand the bucket config. First, the aspect ratio bucket is compulsory; if you want to modify it, you need to add your own resolution definition in [aspect.py](/opensora/datasets/aspect.py). 
Then, to keep only the 256x256 resolution and 16 frames as in OpenSora 1.0, you can use the following config:\n\n```python\nbucket_config = {\n    \"256\": {16: (1.0, 8)},\n}\n```\n\nIf you want to train a model supporting different resolutions of images, you can use the following config (example [image.py](/configs/opensora-v1-1/train/image.py)):\n\n```python\nbucket_config = {\n    \"256\": {1: (1.0, 256)},\n    \"512\": {1: (1.0, 80)},\n    \"480p\": {1: (1.0, 52)},\n    \"1024\": {1: (1.0, 20)},\n    \"1080p\": {1: (1.0, 8)},\n}\n```\n\nOr, if you find that the number of high-resolution images is too large, you can lower the `keep_prob` to reduce the number of samples in the bucket:\n\n```python\nbucket_config = {\n    \"256\": {1: (1.0, 256)},\n    \"512\": {1: (0.8, 80)},\n    \"480p\": {1: (0.5, 52)},\n    \"1024\": {1: (0.5, 20)},\n    \"1080p\": {1: (0.2, 8)},\n}\n```\n\nAnd similarly for videos (example [video.py](/configs/opensora-v1-1/train/video.py)):\n\n```python\nbucket_config = {\n    \"240p\": {16: (1.0, 16), 32: (1.0, 8), 64: (1.0, 4), 128: (1.0, 2)},\n    \"480p\": {16: (1.0, 4)},\n    \"720p\": {16: (0.5, 2)},\n}\n```\n\nNote that in the above case, videos with 480p resolution and more than 16 frames will all go into bucket `(\"480p\", 16)`, since they all satisfy this bucket's requirement. But training long videos at 480p resolution may be slow, so you can modify the config as follows to force videos with more than 32 frames to go into the 240p bucket.\n\n```python\nbucket_config = {\n    \"240p\": {16: (1.0, 16), 32: (1.0, 8), 64: (1.0, 4), 128: (1.0, 2)},\n    \"480p\": {16: (1.0, 4), 32: (0.0, None)},\n    \"720p\": {16: (0.5, 2)},\n}\n```\n\nCombining the above examples, you should be able to understand the bucket config provided at the beginning of this section and in the config files.\n"
  },
  {
    "path": "Open-Sora/docs/data_processing.md",
    "content": "# Data Processing\n>Open-Sora v1.2 uses Data Propcessing Pipeline v1.1.\n\nWe establish a complete pipeline for video/image data processing. The pipeline is shown below.\n\n![pipeline](/assets/readme/report_data_pipeline.png)\n\nFirst, raw videos,\neither from the  Internet or public datasets, are split into shorter clips based on scene detection.\nThen, we evaluate these videos by predicting multiple scores using existing models. We first predict the aesthetic score\nand the optical flow score for a video. We also conduct OCR to detect texts in the video. Only videos with satisfactory\nevaluation results are sent to the next step for captioning. After captioning, the matching score is also calculated as\nan assessment of video-text alignment. Finally, we filter samples based on the matching score and\nconduct camera motion detection for the remaining samples.\nIn summary, our pipeline produces video-text pairs which have high aesthetic quality, large video motion and strong\nsemantic consistency.\n\nBelow is an example workflow to process videos.\n\n```bash\nROOT_VIDEO=\"/path/to/video/folder\"\nROOT_CLIPS=\"/path/to/video/clips/folder\"\nROOT_META=\"/path/to/meta/folder\"\n\n# 1.1 Create a meta file from a video folder. This should output ${ROOT_META}/meta.csv\npython -m tools.datasets.convert video ${ROOT_VIDEO} --output ${ROOT_META}/meta.csv\n\n# 1.2 Get video information and remove broken videos. This should output ${ROOT_META}/meta_info_fmin1.csv\npython -m tools.datasets.datautil ${ROOT_META}/meta.csv --info --fmin 1\n\n# 2.1 Detect scenes. This should output ${ROOT_META}/meta_info_fmin1_timestamp.csv\npython -m tools.scene_cut.scene_detect ${ROOT_META}/meta_info_fmin1.csv\n\n# 2.2 Cut video into clips based on scenes. This should produce video clips under ${ROOT_CLIPS}\npython -m tools.scene_cut.cut ${ROOT_META}/meta_info_fmin1_timestamp.csv --save_dir ${ROOT_CLIPS}\n\n# 2.3 Create a meta file for video clips. 
This should output ${ROOT_META}/meta_clips.csv\npython -m tools.datasets.convert video ${ROOT_CLIPS} --output ${ROOT_META}/meta_clips.csv\n\n# 2.4 Get clips information and remove broken ones. This should output ${ROOT_META}/meta_clips_info_fmin1.csv\npython -m tools.datasets.datautil ${ROOT_META}/meta_clips.csv --info --fmin 1\n\n# 3.1 Predict aesthetic scores. This should output ${ROOT_META}/meta_clips_info_fmin1_aes.csv\ntorchrun --nproc_per_node 8 -m tools.scoring.aesthetic.inference \\\n  ${ROOT_META}/meta_clips_info_fmin1.csv \\\n  --bs 1024 \\\n  --num_workers 16\n\n# 3.2 Filter by aesthetic scores. This should output ${ROOT_META}/meta_clips_info_fmin1_aes_aesmin5.csv\npython -m tools.datasets.datautil ${ROOT_META}/meta_clips_info_fmin1_aes.csv --aesmin 5\n\n# 4.1 Generate caption. This should output ${ROOT_META}/meta_clips_info_fmin1_aes_aesmin5_caption_part*.csv\ntorchrun --nproc_per_node 8 --standalone -m tools.caption.caption_llava \\\n  ${ROOT_META}/meta_clips_info_fmin1_aes_aesmin5.csv \\\n  --dp-size 8 \\\n  --tp-size 1 \\\n  --model-path /path/to/llava-v1.6-mistral-7b \\\n  --prompt video\n\n# 4.2 Merge caption results. This should output ${ROOT_META}/meta_clips_caption.csv\npython -m tools.datasets.datautil ${ROOT_META}/meta_clips_info_fmin1_aes_aesmin5_caption_part*.csv --output ${ROOT_META}/meta_clips_caption.csv\n\n# 4.3 Clean caption. This should output ${ROOT_META}/meta_clips_caption_cleaned.csv\npython -m tools.datasets.datautil \\\n  ${ROOT_META}/meta_clips_caption.csv \\\n  --clean-caption \\\n  --refine-llm-caption \\\n  --remove-empty-caption \\\n  --output ${ROOT_META}/meta_clips_caption_cleaned.csv\n\n# 4.4 Optionally generate tags (e.g., objects) based on the captions. 
This should output your_output_prefix_{key}.csv\ntorchrun --nproc_per_node 8 --standalone -m tools.caption.caption_llama3 ${ROOT_META}/meta_clips_caption_cleaned.csv --key objects --output_prefix your_output_prefix\n\n```\n\n\nFor more information, please refer to:\n- [Dataset Management](../tools/datasets/README.md)\n- [Scene Detection and Video Splitting](../tools/scene_cut/README.md)\n- [Scoring and Filtering](../tools/scoring/README.md)\n- [Captioning](../tools/caption/README.md)\n"
  },
  {
    "path": "Open-Sora/docs/datasets.md",
    "content": "# Datasets\n\nFor Open-Sora 1.2, we conduct mixed training with both images and videos. The main datasets we use are listed below.\nPlease refer to [README](/README.md#data-processing) for data processing.\n\n## Video\n\n### Webvid-10M\n\n[Webvid-10M](https://github.com/m-bain/webvid) contains 10 million video-text pairs scraped from the stock footage sites.\nWe first train the model on this dataset (40k hours) for 30k steps (2 epochs).\n\n### Panda-70M\n\n[Panda-70M](https://github.com/snap-research/Panda-70M) is a large-scale dataset with 70M video-caption pairs.\nWe use the [training-10M subset](https://github.com/snap-research/Panda-70M/tree/main/dataset_dataloading) for training,\nwhich contains ~10M videos of better quality.\n\n### Mixkit\n\n[Mixkit](https://mixkit.co/) is a video website where we obtained 9k videos.\n\n### Pixabay\n\n[Pixabay](https://pixabay.com/videos/) is video website where we obtained 60.5k videos.\n\n### Pexels\n\n[Pexels](https://www.pexels.com/) is a popular online platform that provides high-quality stock photos, videos, and music for free.\nMost videos from this website are of high quality. Thus, we use them for both pre-training and HQ fine-tuning.\nWe really appreciate the great platform and the contributors!\n\n### Inter4K\n\n[Inter4K](https://github.com/alexandrosstergiou/Inter4K) is a dataset containing 1K video clips with 4K resolution.\nThe dataset is proposed for super-resolution tasks. We use the dataset for HQ fine-tuning.\n\n### HD-VG-130M\n\n[HD-VG-130M](https://github.com/daooshee/HD-VG-130M?tab=readme-ov-file) comprises 130M text-video pairs.\nThe caption is generated by BLIP-2.\nWe find the scene and the text quality are relatively poor. 
For Open-Sora 1.0, we only used ~350K samples from this dataset.\n\n### MiraData\n\n[MiraData](https://github.com/mira-space/MiraData): a high-quality dataset with 77k long videos, mainly from games and city/scenic exploration.\n\n### Vript\n\n[Vript](https://github.com/mutonix/Vript/tree/main): a densely annotated dataset of 400k videos.\n\n## Image\n\n### Midjourney-v5-1.7M\n\n[Midjourney-v5-1.7M](https://huggingface.co/datasets/wanng/midjourney-v5-202304-clean) includes 1.7M image-text pairs.\nIt is divided into two subsets: original and upscale.\nThis dataset is proposed for exploring the relationship between prompts and high-quality images.\n\n### Midjourney-kaggle-clean\n\n[Midjourney-kaggle-clean](https://huggingface.co/datasets/wanng/midjourney-kaggle-clean) is a reconstructed version of [Midjourney User Prompts & Generated Images (250k)](https://www.kaggle.com/datasets/succinctlyai/midjourney-texttoimage?select=general-01_2022_06_20.json%5D), which is cleaned by rules.\nThis dataset is also divided into two subsets: original and upscale.\nIt is proposed for enabling research on text-to-image model prompting.\n\n### Unsplash-lite\n\nThe [Unsplash-lite](https://github.com/unsplash/datasets) dataset comprises 25k nature-themed Unsplash photos, 25k keywords, and 1M searches.\nThis dataset covers a vast range of uses and contexts. Its extensive scope in intent and semantics opens new avenues for research and learning.\n\n### LAION-AESTHETICS 6.5+\n\nThe LAION aesthetic 6.5+ dataset is a subset of the LAION dataset, which contains 625K high-quality images with aesthetic scores > 6.5. However, as LAION is currently not publicly available, we use this 168k [subset](https://huggingface.co/datasets/bhargavsdesai/laion_improved_aesthetics_6.5plus_with_images).\n"
  },
  {
    "path": "Open-Sora/docs/installation.md",
    "content": "# Installation\n\nRequirements are listed in `requirements` folder.\nNote that besides these packages, some packages needs to be mannually installed, and are detailed in the following sections.\n\n## Training & Inference\n\nYou need to install `opensora` for training and inference. You can follow the steps below for installation. We also provide guideline for different CUDA versions for compatiblity.\n\nPlease note that the default installation is for training and inference only. Other optional dependencies are detailed in the sections [Data Processing](#data-processing), [Evaluation](#evaluation), and [VAE](#vae) respectively.\n\n### Step 1: Install PyTorch and xformers\n\nFirst of all, make sure you have the latest build toolkit for Python.\n\n```bash\n# update build libs\npip install -U pip setuptools wheel\n```\n\nIf you are using **CUDA 12.1**,  you can execute the command below to directly install PyTorch, torchvision and xformers.\n\n```bash\n# install pytorch, torchvision, and xformers\npip install -r requirements/requirements-cu121.txt\n```\n\nIf you are using different CUDA versions, you need to manually install `torch`, `torchvision` and `xformers`. You can find the compatible distributions according to the links below.\n\n- PyTorch: choose install commands from [PyTorch installation page](https://pytorch.org/get-started/locally/) based on your own CUDA version.\n- xformers: choose install commands from [xformers repo](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) based on your own CUDA version.\n\n### Step 2: Install Open-Sora\n\nThen, you can install the project for training and inference with the following commands:\n\n```bash\n# install this project\ngit clone https://github.com/hpcaitech/Open-Sora\ncd Open-Sora\n\n# the default installation is for inference only\npip install -v . 
# NOTE: for development mode, run `pip install -v -e .`\n```\n\n### Step 3: Install Acceleration Tools (Optional)\n\nThis step is optional but recommended for faster speed, especially for training. To enable `layernorm_kernel` and `flash_attn`, you need to install `apex` and `flash-attn` with the following commands.\n\n```bash\n# install flash attention\n# set enable_flash_attn=False in config to disable flash attention\npip install packaging ninja\npip install flash-attn --no-build-isolation\n\n# install apex, the compilation will take a long time\n# set enable_layernorm_kernel=False in config to disable apex\npip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext\" --config-settings \"--build-option=--cuda_ext\" git+https://github.com/NVIDIA/apex.git\n```\n\n## Data Processing\n\n### Step 1: Install Requirements\n\nFirst, run the following command to install requirements:\n\n```bash\npip install -v .[data]\n# For development: `pip install -v -e .[data]`\n```\n\nNext, you need to manually install the packages listed in the following sections, depending on your data processing needs.\n\n### Step 2: Install OpenCV\n\nTo get image and video information, we use [opencv-python](https://github.com/opencv/opencv-python). You can install it with pip:\n\n```bash\npip install opencv-python\n```\n\nHowever, if your videos use the av1 codec instead of h264, you need to install ffmpeg (already in our [requirement script](../requirements/requirements-data.txt)), then run the following to install OpenCV from conda-forge for av1 support:\n\n```bash\npip uninstall opencv-python\nconda install -c conda-forge opencv\n```\n\n### Step 3: Install Task-specific Dependencies\n\nWe have a variety of data processing pipelines, each of which requires its own dependencies. 
You can refer to the sections below to install dependencies according to your own needs.\n\n#### LLaVA Captioning\n\nYou need to manually install LLaVA with the following command:\n\n```bash\npip install --no-deps llava@git+https://github.com/haotian-liu/LLaVA.git@v1.2.2.post1\n```\n\n#### PLLaVA Captioning\n\nYou need to manually install PLLaVA with the following commands:\n\n```bash\ncd tools/caption/pllava_dir # assumes you are in the Open-Sora-dev root directory\ngit clone https://github.com/magic-research/PLLaVA.git\ncd PLLaVA\ngit checkout fd9194a # since there is no version tag, we use this commit\npython python_scripts/hf.py # download the PLLaVA weights\n\n# IMPORTANT: create a new environment for reliable PLLaVA performance:\nconda create -n pllava python=3.10\n# You need to manually install `torch`, `torchvision` and `xformers` for different CUDA versions; the following works for CUDA 12.1:\nconda activate pllava\npip install -r ../../../../requirements/requirements-cu121.txt\npip install packaging ninja\npip install flash-attn --no-build-isolation\n# You may manually remove any lines in requirements.txt that contain `cu11`, then run `pip install -r requirements.txt`\n# Alternatively, use our prepared pllava environment:\npip install -r ../../../../requirements/requirements-pllava.txt\n```\n\n#### Scene Detection\n\nWe use [`PySceneDetect`](https://github.com/Breakthrough/PySceneDetect) for this job. 
You need to manually run the following:\n\n```bash\npip install scenedetect[opencv] --upgrade\n```\n\n#### OCR\n\nYou need to go into `path_to_your_env/lib/python3.10/site-packages/mmdet/__init__.py`\nand change the assertion `mmcv_version < digit_version(mmcv_maximum_version)` to `mmcv_version <= digit_version(mmcv_maximum_version)`.\n\nIf you are unsure of the path to your mmdet init file, simply run our [OCR command](../tools/scoring/README.md) and wait for the mmdet assertion error on mmcv versions.\nThe error will contain the exact path to the mmdet init file.\n\n\n## Evaluation\n\n### Step 1: Install Requirements\n\nTo conduct evaluation, run the following command to install requirements:\n\n```bash\npip install -v .[eval]\n# For development: `pip install -v -e .[eval]`\n```\n\n### Step 2: Install VBench\n\n<!-- You need to manually install [VBench](https://github.com/Vchitect/VBench):\n\n```bash\npip install --no-deps vbench==0.1.1\n# If the installation shows a warning about the installed vbench not being in PATH, you need to add it by:\nexport PATH=\"/path/to/vbench:$PATH\"\n``` -->\n\nYou need to install VBench manually:\n```bash\n# first clone their repo\ncd .. 
# assume you are in the Open-Sora root folder; you may install at another location, but make sure the soft link paths later are correct\ngit clone https://github.com/Vchitect/VBench.git\ncd VBench\ngit checkout v0.1.2\n\n# next, fix their hard-coded path issue\nvim vbench2_beta_i2v/utils.py\n# find `image_root` in the `load_i2v_dimension_info` function, change it to point to your appropriate image folder\n\n# last, create softlinks\ncd ../Open-Sora # or `cd ../Open-Sora-dev` for development\nln -s ../VBench/vbench vbench # you may need to change ../VBench/vbench to your corresponding path\nln -s ../VBench/vbench2_beta_i2v vbench2_beta_i2v # you may need to change ../VBench/vbench2_beta_i2v to your corresponding path\n# later, make sure to run evaluation from your Open-Sora folder, otherwise vbench and vbench2_beta_i2v cannot be found\n```\n\n\n### Step 3: Install `cupy` for Potential VAE Errors\n\nYou need to manually install [cupy](https://docs.cupy.dev/en/stable/install.html).\n\n- For CUDA v11.2~11.8 (x86_64 / aarch64), `pip install cupy-cuda11x`\n- For CUDA v12.x (x86_64 / aarch64), `pip install cupy-cuda12x`\n\nNote that for VAE evaluation, you may run into the error `ModuleNotFoundError: No module named 'torchvision.transforms.functional_tensor'`. In this case, go to the file reporting this error (`.../pytorchvideo/transforms/augmentations.py`) and change it as follows:\n\n```python\n# find the original line:\nimport torchvision.transforms.functional_tensor as F_t\n# change to:\nimport torchvision.transforms._functional_tensor as F_t\n```\n\n## VAE\n\n### Step 1: Install Requirements\n\nTo train and evaluate your own VAE, run the following command to install requirements:\n\n```bash\npip install -v .[vae]\n# For development: `pip install -v -e .[vae]`\n```\n\n### Step 2: VAE Evaluation (`cupy` and Potential VAE Errors)\n\nRefer to the [Evaluation's VAE section](#step-3-install-cupy-for-potential-vae-errors) above.\n"
  },
  {
    "path": "Open-Sora/docs/report_01.md",
    "content": "# Open-Sora 1.0 Report\n\nOpenAI's Sora is amazing at generating one minutes high quality videos. However, it reveals almost no information about its details. To make AI more \"open\", we are dedicated to build an open-source version of Sora. This report describes our first attempt to train a transformer-based video diffusion model.\n\n## Efficiency in choosing the architecture\n\nTo lower the computational cost, we want to utilize existing VAE models. Sora uses spatial-temporal VAE to reduce the temporal dimensions. However, we found that there is no open-source high-quality spatial-temporal VAE model. [MAGVIT](https://github.com/google-research/magvit)'s 4x4x4 VAE is not open-sourced, while [VideoGPT](https://wilson1yan.github.io/videogpt/index.html)'s 2x4x4 VAE has a low quality in our experiments. Thus, we decided to use a 2D VAE (from [Stability-AI](https://huggingface.co/stabilityai/sd-vae-ft-mse-original)) in our first version.\n\nThe video training involves a large amount of tokens. Considering 24fps 1min videos, we have 1440 frames. With VAE downsampling 4x and patch size downsampling 2x, we have 1440x1024≈1.5M tokens. Full attention on 1.5M tokens leads to a huge computational cost. Thus, we use spatial-temporal attention to reduce the cost following [Latte](https://github.com/Vchitect/Latte).\n\nAs shown in the figure, we insert a temporal attention right after each spatial attention in STDiT (ST stands for spatial-temporal). This is similar to variant 3 in Latte's paper. However, we do not control a similar number of parameters for these variants. While Latte's paper claims their variant is better than variant 3, our experiments on 16x256x256 videos show that with same number of iterations, the performance ranks as: DiT (full) > STDiT (Sequential) > STDiT (Parallel) ≈ Latte. Thus, we choose STDiT (Sequential) out of efficiency. 
A speed benchmark is provided [here](/docs/acceleration.md#efficient-stdit).\n\n![Architecture Comparison](/assets/readme/report_arch_comp.png)\n\nTo focus on video generation, we want to build on a powerful image generation model. [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) is an efficiently trained, high-quality image generation model with a T5-conditioned DiT structure. We initialize our model with PixArt-α and initialize the projection layer of the inserted temporal attention with zeros. This initialization preserves the model's image generation ability at the beginning of training, which Latte's architecture cannot do. The inserted attention increases the number of parameters from 580M to 724M.\n\n![Architecture](/assets/readme/report_arch.jpg)\n\nDrawing from the success of PixArt-α and Stable Video Diffusion, we also adopt a progressive training strategy: 16x256x256 on the 366K pretraining dataset, and then 16x256x256, 16x512x512, and 64x512x512 on the 20K dataset. With scaled position embedding, this strategy greatly reduces the computational cost.\n\nWe also tried using a 3D patch embedder in DiT. However, with 2x downsampling on the temporal dimension, the generated videos have low quality. Thus, we leave the downsampling to a temporal VAE in our next version. For now, we sample every 3 frames for 16-frame training and every 2 frames for 64-frame training.\n\n## Data is the key to high quality\n\nWe find that the amount and quality of data have a great impact on the quality of generated videos, even larger than that of the model architecture and training strategy. At this time, we have only prepared the first split (366K video clips) from [HD-VG-130M](https://github.com/daooshee/HD-VG-130M). The quality of these videos varies greatly, and the captions are not that accurate. Thus, we further collect 20k relatively high-quality videos from [Pexels](https://www.pexels.com/), which provides free-license videos. 
We label each video with LLaVA, an image captioning model, using three frames and a designed prompt. With the designed prompt, LLaVA can generate good-quality captions.\n\n![Caption](/assets/readme/report_caption.png)\n\nAs we place more emphasis on data quality, we plan to collect more data and build a video preprocessing pipeline in our next version.\n\n## Training Details\n\nWith a limited training budget, we made only a few explorations. We find that a learning rate of 1e-4 is too large and scale it down to 2e-5. When training with a large batch size, we find `fp16` less stable than `bf16`, and it may lead to generation failure. Thus, we switch to `bf16` for training on 64x512x512. For other hyper-parameters, we follow previous works.\n\n## Loss curves\n\n16x256x256 Pretraining Loss Curve\n\n![16x256x256 Pretraining Loss Curve](/assets/readme/report_loss_curve_1.png)\n\n16x256x256 HQ Training Loss Curve\n\n![16x256x256 HQ Training Loss Curve](/assets/readme/report_loss_curve_2.png)\n\n16x512x512 HQ Training Loss Curve\n\n![16x512x512 HQ Training Loss Curve](/assets/readme/report_loss_curve_3.png)\n\n> Core Contributor: Zangwei Zheng*, Xiangyu Peng*, Shenggui Li, Hongxing Liu, Yang You\n"
  },
  {
    "path": "Open-Sora/docs/report_02.md",
    "content": "# Open-Sora 1.1 Report\n\n- [Model Architecture Modification](#model-architecture-modification)\n- [Support for Multi-time/resolution/aspect ratio/fps Training](#support-for-multi-timeresolutionaspect-ratiofps-training)\n- [Masked DiT as Image/Video-to-Video Model](#masked-dit-as-imagevideo-to-video-model)\n- [Data Collection \\& Pipeline](#data-collection--pipeline)\n- [Training Details](#training-details)\n- [Limitation and Future Work](#limitation-and-future-work)\n\nIn Open-Sora 1.1 release, we train a 700M models on 10M data (Open-Sora 1.0 trained on 400K data) with a better STDiT architecture. We implement the following features mentioned in [sora's report](https://openai.com/research/video-generation-models-as-world-simulators):\n\n- Variable durations, resolutions, aspect ratios (Sampling flexibility, Improved framing and composition)\n- Prompting with images and videos (Animating images, Extending generated videos, Video-to-video editing, Connecting videos)\n- Image generation capabilities\n\nTo achieve this goal, we use multi-task learning in the pretraining stage. For diffusion models, training with different sampled timestep is already a multi-task learning. We further extend this idea to multi-resolution, aspect ratio, frame length, fps, and different mask strategies for image and video conditioned generation. We train the model on **0s~15s, 144p to 720p, various aspect ratios** videos. 
Although the time consistency is not that high due to limited training FLOPs, we can still see the potential of the model.\n\n## Model Architecture Modification\n\nWe made the following modifications to the original ST-DiT for better training stability and performance (ST-DiT-2):\n\n- **[Rope embedding](https://arxiv.org/abs/2104.09864) for temporal attention**: Following LLM best practice, we change the sinusoidal positional encoding to rope embedding for temporal attention since it is also a sequence prediction task.\n- **AdaIN and Layernorm for temporal attention**: we wrap the temporal attention with AdaIN and layernorm, as is done for the spatial attention, to stabilize the training.\n- **[QK-normalization](https://arxiv.org/abs/2302.05442) with [RMSNorm](https://arxiv.org/abs/1910.07467)**: Following [SD3](https://arxiv.org/pdf/2403.03206.pdf), we apply QK-normalization to all attention layers for better training stability in half-precision.\n- **Dynamic input size support and video information condition**: To support multi-resolution, aspect ratio, and fps training, we make ST-DiT-2 accept any input size and automatically scale positional embeddings. Extending [PixArt-alpha](https://github.com/PixArt-alpha/PixArt-alpha)'s idea, we condition on the video's height, width, aspect ratio, frame length, and fps.\n- **Extending T5 tokens from 120 to 200**: our captions are usually shorter than 200 tokens, and we find the model can handle longer text well.\n\n## Support for Multi-time/resolution/aspect ratio/fps Training\n\nAs mentioned in [Sora's report](https://openai.com/research/video-generation-models-as-world-simulators), training with the original video's resolution, aspect ratio, and length increases sampling flexibility and improves framing and composition. We found three ways to achieve this goal:\n\n- [NaViT](https://arxiv.org/abs/2307.06304): support dynamic size within the same batch by masking, with little efficiency loss. 
However, the system is a bit complex to implement, and may not benefit from optimized kernels such as flash attention.\n- Padding ([FiT](https://arxiv.org/abs/2402.12376), [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan)): support dynamic size within the same batch by padding. However, padding different resolutions to the same size is not efficient.\n- Bucket ([SDXL](https://arxiv.org/abs/2307.01952), [PixArt](https://arxiv.org/abs/2310.00426)): support dynamic size in different batches by bucketing, but the size must be the same within the same batch, and only a fixed number of sizes can be applied. With the same size in a batch, we do not need to implement complex masking or padding.\n\nFor simplicity of implementation, we choose the bucket method. We pre-define some fixed resolutions and allocate samples to different buckets. The concerns about bucketing are listed below, but we find that they are not a big issue in our case.\n\n<details>\n<summary>View the concerns</summary>\n\n- The bucket size is limited to a fixed number: First, in real-world applications, only a few aspect ratios (9:16, 3:4) and resolutions (240p, 1080p) are commonly used. Second, we find trained models can generalize well to unseen resolutions.\n- The size in each batch is the same, which breaks the i.i.d. assumption: Since we are using multiple GPUs, the local batches on different GPUs have different sizes. We did not see a significant performance drop due to this issue.\n- There may not be enough samples to fill each bucket and the distribution may be biased: First, our dataset is large enough to fill each bucket when the local batch size is not too large. Second, we should analyze the data's distribution over sizes and define the bucket sizes accordingly. 
Third, an unbalanced distribution did not affect the training process significantly.\n- Different resolutions and frame lengths may have different processing speeds: Unlike PixArt, which only deals with aspect ratios of similar resolutions (similar token numbers), we need to consider the processing speed of different resolutions and frame lengths. We can use the `bucket_config` to define the batch size for each bucket to ensure the processing speed is similar.\n\n</details>\n\n![bucket](/assets/readme/report_bucket.png)\n\nAs shown in the figure, a bucket is a triplet of `(resolution, num_frame, aspect_ratio)`. We provide pre-defined aspect ratios for each resolution that cover most of the common video aspect ratios. Before each epoch, we shuffle the dataset and allocate the samples to different buckets as shown in the figure. We put a sample into the bucket with the largest resolution and frame length that are smaller than the video's.\n\nConsidering that our computational resources are limited, we further introduce two attributes, `keep_prob` and `batch_size`, for each `(resolution, num_frame)` to reduce the computational cost and enable multi-stage training. Specifically, a high-resolution video will be downsampled to a lower resolution with probability `1-keep_prob`, and the batch size for each bucket is `batch_size`. In this way, we can control the number of samples in different buckets and balance the GPU load by searching for a good batch size for each bucket.\n\nA detailed explanation of the bucket usage in training is available in [docs/config.md](/docs/config.md#training-bucket-configs).\n\n## Masked DiT as Image/Video-to-Video Model\n\nTransformers can be easily extended to support image-to-image and video-to-video tasks. We propose a mask strategy to support image and video conditioning. 
The mask strategy is shown in the figure below.\n\n![mask strategy](/assets/readme/report_mask.png)\n\nTypically, we unmask the frames to be conditioned on for image/video-to-video conditioning. During the ST-DiT forward pass, unmasked frames have timestep 0, while the others keep the same timestep (t). We find that directly applying the strategy to a trained model yields poor results, as the diffusion model did not learn to handle different timesteps in one sample during training.\n\nInspired by [UL2](https://arxiv.org/abs/2205.05131), we introduce a random mask strategy during training. Specifically, we randomly unmask frames during training, including unmasking the first frame, the first k frames, the last frame, the last k frames, the first and last k frames, random frames, etc. Based on Open-Sora 1.0, with a 50% probability of applying masking, we see the model learn to handle image conditioning within 10k steps (while 30% yields worse ability), with a small drop in text-to-video performance. Thus, for Open-Sora 1.1, we pretrain the model from scratch with the masking strategy.\n\nAn illustration of the masking strategy config used in inference is given below. A five-number tuple provides great flexibility in defining the mask strategy. By conditioning on generated frames, we can autoregressively generate an unbounded number of frames (although errors propagate).\n\n![mask strategy config](/assets/readme/report_mask_config.png)\n\nA detailed explanation of the mask strategy usage is available in [docs/config.md](/docs/config.md#advanced-inference-config).\n\n## Data Collection & Pipeline\n\nAs we found in Open-Sora 1.0, the amount and quality of data are crucial for training a good model, so we work hard on scaling the dataset. First, we create an automatic pipeline following [SVD](https://arxiv.org/abs/2311.15127), including scene cutting, captioning, various scoring and filtering, and dataset management scripts and conventions. 
More information can be found in [docs/data_processing.md](/docs/data_processing.md).\n\n![pipeline](/assets/readme/report_data_pipeline.png)\n\nWe planned to use [panda-70M](https://snap-research.github.io/Panda-70M/) and other data to train the model, which is approximately 30M+ samples. However, we find disk IO to be a bottleneck when training and data processing run at the same time. Thus, we could only prepare a 10M dataset and did not go through the full processing pipeline that we built. Finally, we use a dataset with 9.7M videos + 2.6M images for pre-training, and 560k videos + 1.6M images for fine-tuning. The pretraining dataset statistics are shown below. More information about the dataset can be found in [docs/datasets.md](/docs/datasets.md).\n\nImage text tokens (by T5 tokenizer):\n\n![image text tokens](/assets/readme/report_image_textlen.png)\n\nVideo text tokens (by T5 tokenizer). We directly use Panda-70M's short captions for training, and caption the other datasets ourselves. The generated captions are usually shorter than 200 tokens.\n\n![video text tokens](/assets/readme/report_video_textlen.png)\n\nVideo duration:\n\n![video duration](/assets/readme/report_video_duration.png)\n\n## Training Details\n\nWith limited computational resources, we have no compute budget for ablation studies, so we have to carefully monitor the training process and change the training strategy whenever we suspect the model is not learning well. Thus, Open-Sora 1.1's training includes multiple changes, and as a result, EMA is not applied.\n\n1. First, we fine-tune for **6k** steps with images of different resolutions from the `Pixart-alpha-1024` checkpoints. We find the model easily adapts to generate images with different resolutions. We use [SpeeDiT](https://github.com/1zeryu/SpeeDiT) (iddpm-speed) to accelerate the diffusion training.\n2. **[Stage 1]** Then, we pretrain the model with gradient-checkpointing for **24k** steps, which takes **4 days** on 64 H800 GPUs. 
Although the number of samples seen by the model is the same, we find the model learns slowly compared to training with a smaller batch size. We speculate that at an early stage, the number of steps is more important for training. Most videos are in **240p** resolution, and the config is similar to [stage2.py](/configs/opensora-v1-1/train/stage2.py). The videos look good, but the model has not learned much temporal knowledge. We use a mask ratio of 10%.\n3. **[Stage 1]** To increase the number of steps, we switch to a smaller batch size without gradient-checkpointing. We also add fps conditioning at this point. We trained **40k** steps for **2 days**. Most videos are in **144p** resolution, and the config file is [stage1.py](/configs/opensora-v1-1/train/stage1.py). We use a lower resolution as we found in Open-Sora 1.0 that the model can learn temporal knowledge at relatively low resolution.\n4. **[Stage 1]** We find the model cannot learn well for long videos, and we observe noisy generation results, which we suspect are caused by the half-precision problem found in Open-Sora 1.0 training. Thus, we adopt QK-normalization to stabilize the training. Similar to SD3, we find the model quickly adapts to QK-normalization. We also switch from iddpm-speed to iddpm, and increase the mask ratio to 25% as we find image conditioning is not learned well. We trained for **17k** steps for **14 hours**. Most videos are in **144p** resolution, and the config file is [stage1.py](/configs/opensora-v1-1/train/stage1.py). The stage 1 training lasts for approximately one week, with **81k** total steps.\n5. **[Stage 2]** We switch to a higher resolution, where most videos are in **240p and 480p** resolution ([stage2.py](/configs/opensora-v1-1/train/stage2.py)). We trained **22k** steps for **one day** on all pre-training data.\n6. **[Stage 3]** We switch to a higher resolution, where most videos are in **480p and 720p** resolution ([stage3.py](/configs/opensora-v1-1/train/stage3.py)). 
We trained **4k** steps for **one day** on high-quality data. We find that loading the previous stage's optimizer state helps the model learn faster.\n\nTo summarize, the training of Open-Sora 1.1 requires approximately **9 days** on 64 H800 GPUs.\n\n## Limitation and Future Work\n\nAs we get one step closer to the replication of Sora, we find many limitations in the current model, and these limitations point to future work.\n\n- **Generation Failure**: we find that in many cases (especially when the total token number is large or the content is complex), our model fails to generate the scene. There may be a collapse in the temporal attention, and we have identified a potential bug in our code. We are working hard to fix it. Besides, we will increase our model size and training data to improve the generation quality in the next version.\n- **Noisy generation and lack of fluency**: we find the generated video is sometimes noisy and not fluent, especially for long videos. We think the problem is due to not using a temporal VAE. As [Pixart-Sigma](https://arxiv.org/abs/2403.04692) finds that adapting to a new VAE is simple, we plan to develop a temporal VAE for the model in the next version.\n- **Lack of time consistency**: we find the model cannot generate videos with high time consistency. We think the problem is due to the lack of training FLOPs. We plan to collect more data and continue training the model to improve the time consistency.\n- **Bad human generation**: we find the model cannot generate high-quality human videos. We think the problem is due to the lack of human data. We plan to collect more human data and continue training the model to improve the human generation.\n- **Low aesthetic score**: we find the model's aesthetic score is not high. The problem is due to the lack of aesthetic score filtering, which was not conducted because of the I/O bottleneck. 
We plan to filter the data by aesthetic score and fine-tune the model to improve the aesthetic score.\n- **Worse quality for longer video generation**: we find that, with the same prompt, a longer video has worse quality. This means the image quality is not equally adapted to different sequence lengths.\n\n> - **Algorithm & Acceleration**: Zangwei Zheng, Xiangyu Peng, Shenggui Li, Hongxing Liu, Yukun Zhou, Tianyi Li\n> - **Data Collection & Pipeline**: Xiangyu Peng, Zangwei Zheng, Chenhui Shen, Tom Young, Junjie Wang, Chenfeng Yu\n"
  },
  {
    "path": "Open-Sora/docs/report_03.md",
    "content": "# Open-Sora 1.2 Report\n\n- [Video compression network](#video-compression-network)\n- [Rectified flow and model adaptation](#rectified-flow-and-model-adaptation)\n- [More data and better multi-stage training](#more-data-and-better-multi-stage-training)\n- [Easy and effective model conditioning](#easy-and-effective-model-conditioning)\n- [Evaluation](#evaluation)\n- [Sequence parallelism](#sequence-parallelism)\n\nIn Open-Sora 1.2 release, we train a 1.1B models on >30M data (\\~80k hours), with training cost 35k H100 GPU hours, supporting 0s\\~16s, 144p to 720p, various aspect ratios video generation. Our configurations is listed below. Following our 1.1 version, Open-Sora 1.2 can also do image-to-video generation and video extension.\n\n|      | image | 2s  | 4s  | 8s  | 16s |\n| ---- | ----- | --- | --- | --- | --- |\n| 240p | ✅     | ✅   | ✅   | ✅   | ✅   |\n| 360p | ✅     | ✅   | ✅   | ✅   | ✅   |\n| 480p | ✅     | ✅   | ✅   | ✅   | 🆗   |\n| 720p | ✅     | ✅   | ✅   | 🆗   | 🆗   |\n\nHere ✅ means that the data is seen during training, and 🆗 means although not trained, the model can inference at that config. Inference for 🆗 requires more than one 80G memory GPU and sequence parallelism.\n\nBesides features introduced in Open-Sora 1.1, Open-Sora 1.2 highlights:\n\n- Video compression network\n- Rectifie-flow training\n- More data and better multi-stage training\n- Easy and effective model conditioning\n- Better evaluation metrics\n\nAll implementations (both training and inference) of the above improvements are available in the Open-Sora 1.2 release. The following sections will introduce the details of the improvements. 
We also refine our codebase and documentation to make it easier to use and develop, and add an LLM to [refine input prompts](/README.md#gpt-4o-prompt-refinement) and support more languages.\n\n## Video compression network\n\nFor Open-Sora 1.0 & 1.1, we used stability-ai's 83M 2D VAE, which compresses the video only in the spatial dimension, by 8x8. To reduce the temporal dimension, we extracted one frame in every three frames. However, this method led to low fluency in the generated video, as the generated fps is sacrificed. Thus, in this release, we introduce a video compression network as OpenAI's Sora does. With a 4x compression in the temporal dimension, we do not need to extract frames and can generate videos at the original fps.\n\nConsidering the high computational cost of training a 3D VAE, we hope to re-use the knowledge learnt in the 2D VAE. We notice that after the 2D VAE's compression, the features adjacent in the temporal dimension are still highly correlated. Thus, we propose a simple video compression network, which first compresses the video in the spatial dimension by 8x8, then compresses the video in the temporal dimension by 4x. The network is shown below:\n\n![video_compression_network](/assets/readme/report_3d_vae.png)\n\nWe initialize the 2D VAE with [SDXL's VAE](https://huggingface.co/stabilityai/sdxl-vae), which is better than our previously used one. For the 3D VAE, we adopt the structure of the VAE in [Magvit-v2](https://magvit.cs.cmu.edu/v2/), which contains 300M parameters. Along with the 83M 2D VAE, the video compression network has 384M parameters in total. We train the 3D VAE for 1.2M steps with local batch size 1. The training data is videos from Pexels and Pixabay, and the training video size is mainly 17 frames at 256x256 resolution. Causal convolutions are used in the 3D VAE to make the image reconstruction more accurate.\n\nOur training involves three stages:\n\n1. 
For the first 380k steps, we train on 8 GPUs and freeze the 2D VAE. The training objective includes the reconstruction of the compressed features from the 2D VAE (the pink one in the figure), plus a loss that makes features from the 3D VAE similar to the features from the 2D VAE (the pink one and the green one, called identity loss). We find the latter loss quickly brings the whole VAE to good performance on images and makes the next stage converge much faster.\n2. For the next 260k steps, we remove the identity loss and just learn the 3D VAE.\n3. For the last 540k steps, since we find that reconstructing only the 2D VAE's features cannot lead to further improvement, we remove that loss and train the whole VAE to reconstruct the original videos. This stage is trained on 24 GPUs.\n\nFor both stage 1 and stage 2 training, we adopt 20% images and 80% videos. Following [Magvit-v2](https://magvit.cs.cmu.edu/v2/), we train on videos using 17 frames, while zero-padding the first 16 frames for images. However, we find that this setting leads to blurring of videos with a length different from 17 frames. Thus, in stage 3, we use a random number of frames within 34 for mixed video length training (i.e., zero-pad the first `34-n` frames if we want to train on an `n`-frame video), to make our VAE more robust to different video lengths. Our [training](/scripts/train_vae.py) and [inference](/scripts/inference_vae.py) code is available in the Open-Sora 1.2 release.\n\nWhen using the VAE with the diffusion model, our stacked VAE requires little memory, as the 3D VAE's input is already compressed. We also split the input videos into several 17-frame clips to make inference more efficient.  
The performance of our VAE is on par with another open-sourced 3D VAE in [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/docs/Report-v1.1.0.md).\n\n| Model              | SSIM↑ | PSNR↑  |\n| ------------------ | ----- | ------ |\n| Open-Sora-Plan 1.1 | 0.882 | 29.890 |\n| Open-Sora 1.2      | 0.880 | 30.590 |\n\n## Rectified flow and model adaptation\n\nThe latest diffusion models, such as Stable Diffusion 3, adopt [rectified flow](https://github.com/gnobitab/RectifiedFlow) instead of DDPM for better performance. Unfortunately, SD3's rectified flow training code is not open-sourced. However, Open-Sora 1.2 provides training code that follows SD3's paper, including:\n\n- Basic rectified flow training ([original rectified flow paper](https://arxiv.org/abs/2209.03003))\n- Logit-norm sampling for training acceleration ([SD3 paper](https://arxiv.org/pdf/2403.03206) Section 3.1, intuitively it is more likely to sample timesteps at middle noise level)\n- Resolution and video length aware timestep sampling ([SD3 paper](https://arxiv.org/pdf/2403.03206) Section 5.3.2, intuitively it is more likely to sample timesteps with more noise for larger resolution, and we extend it to longer video)\n\nFor the resolution-aware timestep sampling, we should use more noise for images with larger resolution. We extend this idea to video generation and use more noise for longer videos.\n\nOpen-Sora 1.2 starts from the [PixArt-Σ 2K](https://github.com/PixArt-alpha/PixArt-sigma) checkpoint. Note that this model is trained with DDPM and the SDXL VAE, and at a much higher resolution. We find that fine-tuning on a small dataset can easily adapt the model to our video generation setting. The adaptation process is as follows; all training is done on 8 GPUs (the adaptation for the diffusion model is quite fast and straightforward):\n\n1. Multi-resolution image generation ability: we train the model to generate images at resolutions ranging from 144p to 2K for 20k steps.\n2. 
QK-norm: we add the QK-norm to the model and train for 18k steps.\n3. Rectified flow: we transform from discrete-time DDPM to continuous-time rectified flow and train for 10k steps.\n4. Rectified flow with logit-norm sampling and resolution-aware timestep sampling: we train for 33k steps.\n5. Smaller AdamW epsilon: following SD3, with QK-norm, we can use a smaller epsilon (1e-15) for AdamW; we train for 8k steps.\n6. New VAE and fps conditioning: we replace the original VAE with ours and add fps conditioning to the timestep conditioning; we train for 25k steps. Note that normalizing each channel is important for rectified flow training.\n7. Temporal attention blocks: we add temporal attention blocks with zero-initialized projection layers. We train on images for 3k steps.\n8. Temporal blocks only for video with mask strategy: we train the temporal attention blocks only on videos for 38k steps.\n\nAfter the above adaptation, we are ready to train the model on videos. The adaptation above maintains the original model's ability to generate high-quality images, and brings multiple benefits for video generation:\n\n- With rectified flow, we can accelerate the training and reduce the number of sampling steps for video from 100 to 30, which greatly reduces the waiting time for inference.\n- With QK-norm, the training is more stable and an aggressive optimizer can be used.\n- With the new VAE, the temporal dimension is compressed by 4 times, which makes the training more efficient.\n- With multi-resolution image generation ability, the model can generate videos with different resolutions.\n\n## More data and better multi-stage training\n\nDue to a limited computational budget, we carefully arrange the training data from low to high quality and split our training into three stages. 
Our training involves 12x8 GPUs, and the total training time is about 2 weeks for about 70k steps.\n\n### First stage\n\nWe first train the model on the Webvid-10M dataset (40k hours) for 30k steps (2 epochs). Since the videos are all below 360p resolution and contain watermarks, we train on this dataset first. The training mainly happens on 240p and 360p, with video length 2s~16s. We use the original captions in the dataset for training. The training config is located in [stage1.py](/configs/opensora-v1-2/train/stage1.py).\n\n### Second stage\n\nThen we train the model on the Panda-70M dataset. This dataset is large but the quality varies. We use the official 30M subset, whose clips are more diverse, and filter out videos with an aesthetic score lower than 4.5. This leads to a 20M subset with 41k hours. The captions in the dataset are directly used for our training. The training config is located in [stage2.py](/configs/opensora-v1-2/train/stage2.py).\n\nThe training mainly happens on 360p and 480p. We train the model for 23k steps, which is 0.5 epoch. We stopped this stage before it was fully done so that we could release the new model sooner.\n\n### Third stage\n\nIn this stage, we collect ~2M video clips with a total length of 5K hours from all kinds of sources, including:\n\n- Free-license videos, sourced from Pexels, Pixabay, Mixkit, etc.\n- [MiraData](https://github.com/mira-space/MiraData): a high-quality dataset with long videos, mainly from games and city/scenic exploration.\n- [Vript](https://github.com/mutonix/Vript/tree/main): a densely annotated dataset.\n- And some other datasets.\n\nWhile MiraData and Vript have captions from GPT, we use [PLLaVA](https://github.com/magic-research/PLLaVA) to caption the rest. Compared with LLaVA, which is only capable of single frame/image captioning, PLLaVA is specially designed and trained for video captioning. The [accelerated PLLaVA](/tools/caption/README.md#pllava-captioning) is released in our `tools/`. 
In practice, we use the pretrained PLLaVA 13B model and select 4 frames from each video for captioning with a spatial pooling shape of 2*2.\n\nSome statistics of the video data used in this stage are shown below. We present basic statistics of duration and resolution, as well as aesthetic score and optical flow score distribution.\nWe also extract tags for objects and actions from video captions and count their frequencies.\n![stats](/assets/readme/report-03_video_stats.png)\n![object_count](/assets/readme/report-03_objects_count.png)\n![object_count](/assets/readme/report-03_actions_count.png)\n\nWe mainly train on 720p and 1080p videos in this stage, aiming to extend the model's ability to larger resolutions. We use a mask ratio of 25% during training. The training config is located in [stage3.py](/configs/opensora-v1-2/train/stage3.py). We train the model for 15k steps, which is approximately 2 epochs.\n\n## Easy and effective model conditioning\n\nFor stage 3, we calculate the aesthetic score and motion score for each video clip. However, since the number of video clips is small, we prefer not to filter out clips with low scores, which would lead to an even smaller dataset. Instead, we append the scores to the captions and use them as conditioning. We find this method makes the model aware of the scores, so that it follows them to generate videos with better quality.\n\nFor example, for a video with aesthetic score 5.5, motion score 10, and a detected camera motion of pan left, the caption will be:\n\n```plaintext\n[Original Caption] aesthetic score: 5.5, motion score: 10, camera motion: pan left.\n```\n\nDuring inference, we can also use the scores to condition the model. For camera motion, we only label 13k clips with high confidence, and the camera motion detection module is released in our tools.\n\n## Evaluation\n\nPreviously, we monitored the training process only by human evaluation, as the DDPM training loss is not well correlated with the quality of generated videos. 
However, for rectified flow, we find the training loss is well correlated with the quality of generated videos, as stated in SD3. Thus, we keep track of the rectified flow evaluation loss on 100 images and 1k videos.\n\nWe sampled 1k videos from Pixabay as the validation dataset. We calculate the evaluation loss for images and for videos of different lengths (2s, 4s, 8s, 16s) at different resolutions (144p, 240p, 360p, 480p, 720p). For each setting, we equidistantly sample 10 timesteps. Then all the losses are averaged. We also provide a [video](https://streamable.com/oqkkf1) showing the sampled videos with a fixed prompt for different steps.\n\n![Evaluation Loss](/assets/readme/report_val_loss.png)\n![Video Evaluation Loss](/assets/readme/report_vid_val_loss.png)\n\nIn addition, we also keep track of [VBench](https://vchitect.github.io/VBench-project/) scores during training. VBench is an automatic video evaluation benchmark for short video generation. We calculate the VBench score with 240p 2s videos. The two metrics verify that our model continues to improve during training.\n\n![VBench](/assets/readme/report_vbench_score.png)\n\nAll the evaluation code is released in the `eval` folder. Check the [README](/eval/README.md) for more details.\n\n| Model          | Total Score | Quality Score | Semantic Score |\n| -------------- | ----------- | ------------- | -------------- |\n| Open-Sora V1.0 | 75.91%      | 78.81%        | 64.28%         |\n| Open-Sora V1.2 | 79.23%      | 80.71%        | 73.30%         |\n\n## Sequence parallelism\n\nWe use sequence parallelism to support long-sequence training and inference. Our implementation is based on Ulysses and the workflow is shown below. 
When sequence parallelism is enabled, we only need to apply the `all-to-all` communication to the spatial block in STDiT as only spatial computation is dependent on the sequence dimension.\n\n![SP](../assets/readme/sequence_parallelism.jpeg)\n\nCurrently, we have not used sequence parallelism for training as the data resolution is small; we plan to do so in the next release. As for inference, we can use sequence parallelism in case your GPU runs out of memory. A simple benchmark shows that sequence parallelism can achieve a speedup:\n\n| Resolution | Seconds | Number of GPUs | Enable SP | Time taken/s | Speedup per GPU |\n| ---------- | ------- | -------------- | --------- | ------------ | --------------- |\n| 720p       | 16s     | 1              | No        | 547.97       | -               |\n| 720p       | 16s     | 2              | Yes       | 244.38       | 12%             |\n"
  },
  {
    "path": "Open-Sora/docs/structure.md",
    "content": "# Repo Structure\n\n```plaintext\nOpen-Sora\n├── README.md\n├── assets\n│   ├── images                     -> images used for image-conditioned generation\n│   ├── demo                       -> images used for demo\n│   ├── texts                      -> prompts used for text-conditioned generation\n│   └── readme                     -> images used in README\n├── configs                        -> Configs for training & inference\n├── docker                         -> dockerfile for Open-Sora\n├── docs\n│   ├── acceleration.md            -> Report on acceleration & speed benchmark\n│   ├── commands.md                -> Commands for training & inference\n│   ├── datasets.md                -> Datasets used in this project\n|   ├── data_processing.md         -> Data pipeline documents\n|   ├── installation.md            -> Data pipeline documents\n│   ├── structure.md               -> This file\n│   ├── config.md                  -> Configs for training and inference\n│   ├── report_01.md               -> Report for Open-Sora 1.0\n│   ├── report_02.md               -> Report for Open-Sora 1.1\n│   ├── report_03.md               -> Report for Open-Sora 1.2\n│   ├── vae.md                     -> our VAE report\n│   └── zh_CN                      -> Chinese version of the above\n├── eval                           -> Evaluation scripts\n│   ├── README.md                  -> Evaluation documentation\n|   ├── human_eval                 -> for human eval\n|   ├── launch.sh                  -> script for launching 8 cards sampling\n|   ├── loss                       -> eval loss\n|   ├── sample.sh                  -> script for quickly launching inference on predefined prompts\n|   ├── vae                        -> for vae eval\n|   ├── vbench                     -> for VBench evaluation\n│   └── vbench_i2v                 -> for VBench i2v evaluation\n├── gradio                         -> Gradio demo related code\n├── notebooks                      -> Jupyter 
notebooks for generating commands to run\n├── scripts\n│   ├── train.py                   -> diffusion training script\n│   ├── train_vae.py               -> vae training script\n│   ├── inference.py               -> diffusion inference script\n│   ├── inference_vae.py           -> vae inference script\n│   └── misc                       -> misc scripts, including batch size search\n├── opensora\n│   ├── __init__.py\n│   ├── registry.py                -> Registry helper\n│   ├── acceleration               -> Acceleration related code\n│   ├── datasets                   -> Dataset related code\n│   ├── models\n│   │   ├── dit                    -> DiT\n│   │   ├── layers                 -> Common layers\n│   │   ├── vae                    -> VAE as image encoder\n│   │   ├── text_encoder           -> Text encoder\n│   │   │   ├── classes.py         -> Class id encoder (inference only)\n│   │   │   ├── clip.py            -> CLIP encoder\n│   │   │   └── t5.py              -> T5 encoder\n│   │   ├── dit\n│   │   ├── latte\n│   │   ├── pixart\n│   │   └── stdit                  -> Our STDiT related code\n│   ├── schedulers                 -> Diffusion schedulers\n│   │   ├── iddpm                  -> IDDPM for training and inference\n│   │   └── dpms                   -> DPM-Solver for fast inference\n│   └── utils\n├── tests                          -> Tests for the project\n└── tools                          -> Tools for data processing and more\n```\n\n## Configs\n\nOur config files follow [MMEngine](https://github.com/open-mmlab/mmengine). 
MMEngine reads the config file (a `.py` file) and parses it into a dictionary-like object.\n\n```plaintext\nOpen-Sora\n└── configs                        -> Configs for training & inference\n    ├── opensora-v1-1              -> STDiT2 related configs\n    │   ├── inference\n    │   │   ├── sample.py          -> Sample videos and images\n    │   │   └── sample-ref.py      -> Sample videos with image/video condition\n    │   └── train\n    │       ├── stage1.py          -> Stage 1 training config\n    │       ├── stage2.py          -> Stage 2 training config\n    │       ├── stage3.py          -> Stage 3 training config\n    │       ├── image.py           -> Illustration of image training config\n    │       ├── video.py           -> Illustration of video training config\n    │       └── benchmark.py       -> For batch size searching\n    ├── opensora                   -> STDiT related configs\n    │   ├── inference\n    │   │   ├── 16x256x256.py      -> Sample videos 16 frames 256x256\n    │   │   ├── 16x512x512.py      -> Sample videos 16 frames 512x512\n    │   │   └── 64x512x512.py      -> Sample videos 64 frames 512x512\n    │   └── train\n    │       ├── 16x256x256.py      -> Train on videos 16 frames 256x256\n    │       ├── 16x512x512.py      -> Train on videos 16 frames 512x512\n    │       └── 64x512x512.py      -> Train on videos 64 frames 512x512\n    ├── dit                        -> DiT related configs\n    │   ├── inference\n    │   │   ├── 1x256x256-class.py -> Sample images with ckpts from DiT\n    │   │   ├── 1x256x256.py       -> Sample images with clip condition\n    │   │   └── 16x256x256.py      -> Sample videos\n    │   └── train\n    │       ├── 1x256x256.py       -> Train on images with clip condition\n    │       └── 16x256x256.py      -> Train on videos\n    ├── latte                      -> Latte related configs\n    └── pixart                     -> PixArt related configs\n```\n\n## Tools\n\n```plaintext\nOpen-Sora\n└── tools\n    
├── datasets                   -> dataset management related code\n    ├── scene_cut                  -> scene cut related code\n    ├── caption                    -> caption related code\n    ├── scoring                    -> scoring related code\n    │   ├── aesthetic              -> aesthetic scoring related code\n    │   ├── matching               -> matching scoring related code\n    │   ├── ocr                    -> ocr scoring related code\n    │   └── optical_flow           -> optical flow scoring related code\n    └── frame_interpolation        -> frame interpolation related code\n```\n"
  },
  {
    "path": "Open-Sora/docs/vae.md",
    "content": "# VAE Report\n\nAs [Pixart-Sigma](https://arxiv.org/abs/2403.04692) finds that adapting to a new VAE is simple, we develop an additional temporal VAE.\nSpecifically, our VAE consists of a pipeline of a [spatial VAE](https://huggingface.co/PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers) followed by a temporal VAE.\nFor the temporal VAE, we follow the implementation of [MAGVIT-v2](https://arxiv.org/abs/2310.05737), with the following modifications:\n\n* We remove the architecture specific to the codebook.\n* We do not use the discriminator, and use the VAE reconstruction loss, kl loss, and perceptual loss for training.\n* In the last linear layer of the encoder, we scale down to a diagonal Gaussian Distribution of 4 channels, following our previously trained STDiT that takes in 4 channels input.\n* Our decoder is symmetric to the encoder architecture.\n\n## Training\n\nWe train the model in different stages.\n\nWe first train the temporal VAE only by freezing the spatial VAE for 380k steps on a single machine (8 GPUs).\nWe use an additional identity loss to make features from the 3D VAE similar to the features from the 2D VAE.\nWe train the VAE using 20% images and 80% videos with 17 frames.\n\n```bash\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage1.py --data-path YOUR_CSV_PATH\n```\n\nNext, we remove the identity loss and train the 3D VAE pipeline to reconstructe the 2D-compressed videos for 260k steps.\n\n```bash\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage2.py --data-path YOUR_CSV_PATH\n```\n\nFinally, we remove the reconstruction loss for the 2D-compressed videos and train the VAE pipeline to construct the 3D videos for 540k steps.\nWe train our VAE with a random number within 34 frames to make it more robust to different video lengths.\nThis stage is trained on 24 GPUs.\n\n```bash\ntorchrun --nnodes=3 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage3.py 
--data-path YOUR_CSV_PATH\n```\n\nNote that you need to adjust `epochs` in the config file according to the size of your own CSV data.\n\n## Inference\n\nTo visually check the performance of the VAE, you may run the following inference.\nIt saves the original video to your specified video directory with the `_ori` postfix (i.e. `\"YOUR_VIDEO_DIR\"_ori`), the reconstructed video from the full pipeline with the `_rec` postfix (i.e. `\"YOUR_VIDEO_DIR\"_rec`), and the reconstructed video from the 2D compression and decompression with the `_spatial` postfix (i.e. `\"YOUR_VIDEO_DIR\"_spatial`).\n\n```bash\ntorchrun --standalone --nnodes=1 --nproc_per_node=1 scripts/inference_vae.py configs/vae/inference/video.py --ckpt-path YOUR_VAE_CKPT_PATH --data-path YOUR_CSV_PATH --save-dir YOUR_VIDEO_DIR\n```\n\n## Evaluation\n\nWe can then score the VAE's performance on the SSIM, PSNR, LPIPS, and FloLPIPS metrics.\n\n* SSIM: structural similarity index measure, the higher the better\n* PSNR: peak signal-to-noise ratio, the higher the better\n* LPIPS: learned perceptual image patch similarity, the lower the better\n* [FloLPIPS](https://arxiv.org/pdf/2207.08119): LPIPS with video interpolation, the lower the better.\n\n```bash\npython eval/vae/eval_common_metric.py --batch_size 2 --real_video_dir YOUR_VIDEO_DIR_ori --generated_video_dir YOUR_VIDEO_DIR_rec --device cuda --sample_fps 24 --crop_size 256 --resolution 256 --num_frames 17 --sample_rate 1 --metric ssim psnr lpips flolpips\n```\n\n## Acknowledgement\n\nWe are grateful for the following work:\n\n* [MAGVIT-v2](https://arxiv.org/abs/2310.05737): Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation\n* [Taming Transformers](https://github.com/CompVis/taming-transformers): Taming Transformers for High-Resolution Image Synthesis\n* [3D blur pooling](https://github.com/adobe/antialiased-cnns/pull/39/commits/3d6f02b6943c58b68c19c07bc26fad57492ff3bc)\n* 
[Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan)\n"
  },
  {
    "path": "Open-Sora/docs/zh_CN/README.md",
    "content": "<p align=\"center\">\n    <img src=\"../../assets/readme/icon.png\" width=\"250\"/>\n</p>\n<div align=\"center\">\n    <a href=\"https://github.com/hpcaitech/Open-Sora/stargazers\"><img src=\"https://img.shields.io/github/stars/hpcaitech/Open-Sora?style=social\"></a>\n    <a href=\"https://hpcaitech.github.io/Open-Sora/\"><img src=\"https://img.shields.io/badge/Gallery-View-orange?logo=&amp\"></a>\n    <a href=\"https://discord.gg/kZakZzrSUT\"><img src=\"https://img.shields.io/badge/Discord-join-blueviolet?logo=discord&amp\"></a>\n    <a href=\"https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-247ipg9fk-KRRYmUl~u2ll2637WRURVA\"><img src=\"https://img.shields.io/badge/Slack-ColossalAI-blueviolet?logo=slack&amp\"></a>\n    <a href=\"https://twitter.com/yangyou1991/status/1769411544083996787?s=61&t=jT0Dsx2d-MS5vS9rNM5e5g\"><img src=\"https://img.shields.io/badge/Twitter-Discuss-blue?logo=twitter&amp\"></a>\n    <a href=\"https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png\"><img src=\"https://img.shields.io/badge/微信-小助手加群-green?logo=wechat&amp\"></a>\n    <a href=\"https://hpc-ai.com/blog/open-sora-v1.0\"><img src=\"https://img.shields.io/badge/Open_Sora-Blog-blue\"></a>\n    <a href=\"https://huggingface.co/spaces/hpcai-tech/open-sora\"><img src=\"https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Gradio Demo-blue\"></a>\n</div>\n\n## Open-Sora: 让所有人都能轻松制作高效视频\n\n我们设计并实施了**Open-Sora**，这是一项致力于高效制作高质量视频的计划。我们希望让所有人都能使用模型、工具和所有细节。通过采用开源原则，Open-Sora 不仅使高级视频生成技术的使用变得民主化，而且还提供了一个简化且用户友好的平台，简化了视频生成的复杂性。借助 Open-Sora，我们的目标是在内容创作领域促进创新、创造力和包容性。\n\n[[中文文档](/docs/zh_CN/README.md)] [[潞晨云](https://cloud.luchentech.com/)|[OpenSora镜像](https://cloud.luchentech.com/doc/docs/image/open-sora/)|[视频教程](https://www.bilibili.com/video/BV1ow4m1e7PX/?vd_source=c6b752764cd36ff0e535a768e35d98d2)]\n\n## 📰 资讯\n\n* **[2024.06.22]** 
🔥 We released the Open-Sora 1.2 image on [Luchen Cloud](https://cloud.luchentech.com/) and uploaded a detailed [tutorial](https://www.bilibili.com/video/BV1ow4m1e7PX/) to Bilibili.\n* **[2024.06.17]** 🔥 We released **Open-Sora 1.2**, which includes a **3D-VAE**, **rectified flow**, and **score conditioning**. The video quality is greatly improved. [[Model weights]](#模型权重) [[Technical report]](report_v3.md) [[WeChat article]](https://mp.weixin.qq.com/s/QHq2eItZS9e00BVZnivdjg)\n* **[2024.04.25]** 🤗 We released the [Gradio demo for Open-Sora](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face Spaces.\n* **[2024.04.25]** We released **Open-Sora 1.1**, which supports **2s~15s, 144p to 720p, any aspect ratio text-to-image, text-to-video, image-to-video, video-to-video, and infinite-time generation**. In addition, a full video processing pipeline is released. [[Model weights]](#模型权重) [[Technical report]](report_v2.md) [[WeChat article]](https://mp.weixin.qq.com/s/nkPSTep2se__tzp5OfiRQQ)\n* **[2024.03.18]** We released **Open-Sora 1.0**, a fully open-source project for video generation. Open-Sora 1.0 supports a full pipeline of video data preprocessing, training with\n  <a href=\"https://github.com/hpcaitech/ColossalAI\"><img src=\"/assets/readme/colossal_ai.png\" width=\"8%\" ></a>\nacceleration, inference, and more. Our model can produce 2s 512x512 videos with only 3 days of training. [[Model weights]](#模型权重)\n  [[WeChat article]](https://mp.weixin.qq.com/s/H52GW8i4z1Dco3Sg--tCGw) [[Technical report]](report_v1.md)\n* **[2024.03.04]** Open-Sora provides training with a 46% cost reduction.\n  [[WeChat article]](https://mp.weixin.qq.com/s/OjRUdrM55SufDHjwCCAvXg)\n\n## 🎥 Latest Demo\n\n🔥 You can try Open-Sora in the [🤗 Gradio application](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face. 
More samples are available in our [gallery](https://hpcaitech.github.io/Open-Sora/).\n\n| **4s 720×1280**                                                                                                                                      | **4s 720×1280**                                                                                                                                      | **4s 720×1280**                                                                                                                                      |\n| ---------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"/assets/demo/v1.2/sample_0013.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/7895aab6-ed23-488c-8486-091480c26327) | [<img src=\"/assets/demo/v1.2/sample_1718.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/20f07c7b-182b-4562-bbee-f1df74c86c9a) | [<img src=\"/assets/demo/v1.2/sample_0087.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/3d897e0d-dc21-453a-b911-b3bda838acc2) |\n| [<img src=\"/assets/demo/v1.2/sample_0052.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/644bf938-96ce-44aa-b797-b3c0b513d64c) | [<img src=\"/assets/demo/v1.2/sample_1719.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/272d88ac-4b4a-484d-a665-8d07431671d0) | [<img src=\"/assets/demo/v1.2/sample_0002.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/ebbac621-c34e-4bb4-9543-1c34f8989764) |\n| [<img src=\"/assets/demo/v1.2/sample_0011.gif\" 
width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/a1e3a1a3-4abd-45f5-8df2-6cced69da4ca) | [<img src=\"/assets/demo/v1.2/sample_0004.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/d6ce9c13-28e1-4dff-9644-cc01f5f11926) | [<img src=\"/assets/demo/v1.2/sample_0061.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/561978f8-f1b0-4f4d-ae7b-45bec9001b4a) |\n\n<details>\n<summary>OpenSora 1.1 演示</summary>\n\n| **2秒 240×426**                                                                                                                                              | **2秒 240×426**                                                                                                                                             |\n| ----------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"/assets/demo/sample_16x240x426_9.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c31ebc52-de39-4a4e-9b1e-9211d45e05b2) | [<img src=\"/assets/demo/sora_16x240x426_26.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c31ebc52-de39-4a4e-9b1e-9211d45e05b2) |\n| [<img src=\"/assets/demo/sora_16x240x426_27.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/f7ce4aaa-528f-40a8-be7a-72e61eaacbbd)  | [<img src=\"/assets/demo/sora_16x240x426_40.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/5d58d71e-1fda-4d90-9ad3-5f2f7b75c6a9) |\n\n| **2秒 426×240**                                                                                                                                             | **4秒 480×854**                                                             
                                                                                 |\n| ---------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"/assets/demo/sora_16x426x240_24.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/34ecb4a0-4eef-4286-ad4c-8e3a87e5a9fd) | [<img src=\"/assets/demo/sample_32x480x854_9.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c1619333-25d7-42ba-a91c-18dbc1870b18) |\n\n| **16秒 320×320**                                                                                                                                        | **16秒 224×448**                                                                                                                                        | **2秒 426×240**                                                                                                                                            |\n| ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"/assets/demo/sample_16s_320x320.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/3cab536e-9b43-4b33-8da8-a0f9cf842ff2) | [<img src=\"/assets/demo/sample_16s_224x448.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/9fb0b9e0-c6f4-4935-b29e-4cac10b373c4) | [<img 
src=\"/assets/demo/sora_16x426x240_3.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/3e892ad2-9543-4049-b005-643a4c1bf3bf) |\n\n\n</details>\n\n<details>\n<summary>OpenSora 1.0 Demo</summary>\n\n| **2秒 512×512**                                                                                                                                                                 | **2秒 512×512**                                                                                                                                                              | **2秒 512×512**                                                                                                                                    |\n| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"/assets/readme/sample_0.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/de1963d3-b43b-4e68-a670-bb821ebb6f80)                                 | [<img src=\"/assets/readme/sample_1.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/13f8338f-3d42-4b71-8142-d234fbd746cc)                              | [<img src=\"/assets/readme/sample_2.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/fa6a65a6-e32a-4d64-9a9e-eabb0ebb8c16)    |\n|森林地区宁静的夜景。 [...] 该视频是一段延时摄影，捕捉了白天到夜晚的转变，湖泊和森林始终作为背景。 | 无人机拍摄的镜头捕捉到了海岸悬崖的壮丽美景，[...] 海水轻轻地拍打着岩石底部和紧贴悬崖顶部的绿色植物。| 瀑布从悬崖上倾泻而下，流入宁静的湖泊，气势磅礴。[...] 
摄像机角度提供了瀑布的鸟瞰图。 |\n| [<img src=\"/assets/readme/sample_3.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/64232f84-1b36-4750-a6c0-3e610fa9aa94)                                 | [<img src=\"/assets/readme/sample_4.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/983a1965-a374-41a7-a76b-c07941a6c1e9)                              | [<img src=\"/assets/readme/sample_5.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/ec10c879-9767-4c31-865f-2e8d6cf11e65)    |\n| 夜晚繁华的城市街道，充满了汽车前灯的光芒和路灯的氛围光。 [...]                                                           | 向日葵田的生机勃勃，美不胜收。向日葵整齐排列，给人一种秩序感和对称感。 [...]                                            |宁静的水下场景，一只海龟在珊瑚礁中游动。这只海龟的壳呈绿褐色 [...]                   |\n\n视频经过降采样以.gif用于显示。单击查看原始视频。提示经过修剪以用于显示，请参阅[此处](/assets/texts/t2v_samples.txt)查看完整提示。\n\n</details>\n\n## 🔆 新功能/更新\n\n* 📍 **Open-Sora 1.2** 发布。模型权重可在[此处](#model-weights)查看。有关更多详细信息，请参阅我们的**[技术报告 v1.2](docs/report_03.md)** 。\n* ✅ 支持整流流调度。\n* ✅ 训练我们的 3D-VAE 进行时间维度压缩。\n* 📍 **Open-Sora 1.1**发布。模型权重可在[此处](#model-weights)获得。它针对**0s~15s、144p 到 720p、各种宽高比**的视频进行训练。有关更多讨论，请参阅我们的**[技术报告 v1.1](/docs/report_02.md)** 。\n* 🔧 **数据处理流程** v1.1发布，提供从原始视频到（文本，视频片段）对的自动处理流程，包括场景剪切$\\rightarrow$过滤（美学、光流、OCR 等）$\\rightarrow$字幕$\\rightarrow$管理。使用此工具，您可以轻松构建视频数据集。\n* ✅ 改进的 ST-DiT 架构包括 rope 位置编码、qk 范数、更长的文本长度等。\n* ✅ 支持任意分辨率、纵横比和时长（包括图像）的训练。\n* ✅ 支持图像和视频调节以及视频编辑，从而支持动画图像，连接视频等。\n* 📍 **Open-Sora 1.0**发布。模型权重可在[此处](#model-weights)获得。仅使用 400K 视频片段和 200 个 H800 天（相比稳定视频扩散中的 152M 样本），我们就能生成 2s 512×512 视频。有关更多讨论，请参阅我们的**[技术报告 v1.0](docs/report_01.md)**。\n* ✅从图像扩散模型到视频扩散模型的三阶段训练。我们为每个阶段提供权重。\n* ✅ 支持训练加速，包括加速 Transformer、更快的 T5 和 VAE 以及序列并行。Open-Sora 在 64x512x512 视频上训练时可将训练速度提高**55%**。详细信息位于[训练加速.md](docs/acceleration.md)。\n* 🔧 **数据预处理流程 v1.0**,包括 [下载](tools/datasets/README.md), [视频剪辑](tools/scene_cut/README.md), 和 [字幕](tools/caption/README.md) 工具. 
我们的数据收集计划可在 [数据集.md](docs/datasets.md)中找到。\n\n<details>\n<summary>查看更多</summary>\n\n* ✅ 我们发现[VideoGPT](https://wilson1yan.github.io/videogpt/index.html)的 VQ-VAE 质量较低，因此采用了[Stability-AI](https://huggingface.co/stabilityai/sd-vae-ft-mse-original)提供的更好的 VAE。我们还发现在时间维度上做分块（patching）会降低质量。有关更多讨论，请参阅我们的**[技术报告v1.0](docs/report_01.md)**。\n* ✅ 我们研究了不同的架构，包括 DiT、Latte 和我们提出的 **STDiT**。我们的 STDiT 在质量和速度之间实现了更好的平衡。请参阅我们的 **[技术报告v1.0](docs/report_01.md)**以了解更多讨论。\n* ✅ 支持 CLIP 和 T5 文本调节。\n* ✅ 通过将图像视为单帧视频，我们的项目支持在图像和视频上训练 DiT（例如 ImageNet 和 UCF101）。有关更多说明，请参阅[commands.md](docs/commands.md)。\n* ✅ 支持使用[DiT](https://github.com/facebookresearch/DiT)、[Latte](https://github.com/Vchitect/Latte)\n  和 [PixArt](https://pixart-alpha.github.io/) 的官方权重进行推理。\n* ✅ 重构代码库。查看[structure.md](docs/structure.md)以了解项目结构以及如何使用配置文件。\n\n</details>\n\n### 按优先级排序的 TODO 列表\n\n<details>\n<summary>查看更多</summary>\n\n* [x] 训练视频 VAE 并使我们的模型适应新的 VAE\n* [x] 缩放模型参数和数据集大小\n* [x] 纳入更好的调度器（整流流）\n* [x] 评估流程\n* [x] 完成数据处理流程（包括密集光流、美学评分、文本-图像相似度等）。有关更多信息，请参阅[数据集](/docs/datasets.md)\n* [x] 支持图像和视频调节\n* [x] 支持可变的纵横比、分辨率和持续时间\n\n</details>\n\n## 目录\n\n* [安装](#安装)\n* [模型权重](#模型权重)\n* [Gradio演示](#gradio演示)\n* [推理](#推理)\n* [数据处理](#数据处理)\n* [训练](#训练)\n* [评估](#评估)\n* [贡献](#贡献)\n* [引用](#引用)\n* [致谢](#致谢)\n\n下面列出了其他有用的文档和链接。\n\n* 报告: [技术报告 v1.2](docs/report_v3.md), [技术报告 v1.1](/docs/report_v2.md), [技术报告 v1.0](/docs/report_v1.md), [训练加速.md](docs/acceleration.md)\n* Repo 结构: [结构.md](docs/structure.md)\n* 配置文件说明: [config.md](docs/config.md)\n* 常用命令: [commands.md](docs/commands.md)\n* 数据处理流程和数据集: [datasets.md](docs/datasets.md)\n* 每个数据处理工具的 README: [dataset conventions and management](/tools/datasets/README.md), [scene cutting](/tools/scene_cut/README.md), [scoring](/tools/scoring/README.md), [caption](/tools/caption/README.md)\n* 评估: [eval](/eval/README.md)\n* 画廊: [gallery](https://hpcaitech.github.io/Open-Sora/)\n\n## 安装\n\n### 从源码安装\n\n对于 CUDA 12.1，您可以使用以下命令安装依赖项。否则，请参阅[安装文档](/docs/installation.md)，了解针对不同 CUDA 版本的说明以及数据预处理所需的其他依赖项。\n\n```bash\n# create 
a virtual env and activate (conda as an example)\nconda create -n opensora python=3.9\nconda activate opensora\n\n# download the repo first: the requirements file below lives inside it\ngit clone https://github.com/hpcaitech/Open-Sora\ncd Open-Sora\n\n# install torch, torchvision and xformers\npip install -r requirements/requirements-cu121.txt\n\n# the default installation is for inference only\npip install -v . # for development mode, `pip install -v -e .`\n```\n\n(Optional, recommended for fast speed, especially for training) To enable `layernorm_kernel` and `flash_attn`, you need to install `apex` and `flash-attn` with the following commands.\n\n```bash\n# install flash attention\n# set enable_flash_attn=False in config to disable flash attention\npip install packaging ninja\npip install flash-attn --no-build-isolation\n\n# install apex\n# set enable_layernorm_kernel=False in config to disable apex\npip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext\" --config-settings \"--build-option=--cuda_ext\" git+https://github.com/NVIDIA/apex.git\n```\n\n### 使用Docker\n\n运行以下命令，使用提供的 Dockerfile 构建 Docker 镜像。\n\n```bash\ndocker build -t opensora .\n```\n\n运行以下命令以交互模式启动 Docker 容器。\n\n```bash\ndocker run -ti --gpus all -v .:/workspace/Open-Sora opensora\n```\n\n## 模型权重\n\n### Open-Sora 1.2 模型权重\n\n| 模型 | 模型大小 | 数据 | 迭代次数 | 批次大小 | 网址 |\n| ---------- | ---------- | ---- | ----------- | ---------- | --- |\n| Diffusion | 1.1B       | 30M  | 70k         | 动态大小    | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v3) |\n| VAE       | 384M       | 3M   | 1M          | 8          | [:link:](https://huggingface.co/hpcai-tech/OpenSora-VAE-v1.2) |\n\n请参阅我们的**[技术报告 v1.2](docs/report_v3.md)**以了解更多信息。\n\n### Open-Sora 1.1 模型权重\n\n<details>\n<summary>查看更多</summary>\n\n| 分辨率             | 模型大小   | 数据                       | 迭代次数    | 批次大小                                          | 网址                                                                 |\n| 
------------------ | ---------- | -------------------------- | ----------- | ------------------------------------------------- | -------------------------------------------------------------------- |\n| 主要为 144p 和 240p | 700M       | 10M 视频 + 2M 图像     | 100k        | [dynamic](/configs/opensora-v1-1/train/stage2.py) | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v2-stage2) |\n| 144p 到 720p       | 700M       | 500K HQ 视频 + 1M 图像 | 4k          | [dynamic](/configs/opensora-v1-1/train/stage3.py) | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v2-stage3) |\n\n请参阅我们的 **[报告 1.1](docs/report_02.md)** 以了解更多信息。\n\n:warning: **局限性**: 此版本包含已知问题，我们将在下一版本中修复（我们把计算资源留给了下一版本）。此外，受这些问题影响，长时长视频的生成可能失败，高分辨率的结果会带有噪点。\n\n</details>\n\n### Open-Sora 1.0 模型权重\n\n<details>\n<summary>查看更多</summary>\n\n| 分辨率 | 模型大小 | 数据   | 迭代次数 | 批量大小 | GPU 天数 (H800) | 网址 |\n| ---------- | ---------- | ------ | ----------- | ---------- | --------------- | --- |\n| 16×512×512 | 700M       | 20K HQ | 20k         | 2×64       | 35              | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth) |\n| 16×256×256 | 700M       | 20K HQ | 24k         | 8×64       | 45              | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth) |\n| 16×256×256 | 700M       | 366K   | 80k         | 8×64       | 117             | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth)    |\n\n训练流程: 16x256x256 $\\rightarrow$ 16x256x256 HQ $\\rightarrow$ 16x512x512 HQ。\n\n我们的模型权重部分由 [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) 初始化，参数数量为 724M。更多信息请参阅 **[技术报告v1.0](docs/report_v1.md)**。数据集相关信息请参阅[数据集文件](docs/datasets.md)。 
HQ 表示高质量。\n\n:warning: **局限性**: 我们的模型是在有限的预算下训练的。质量和文本对齐相对较差。特别是在生成人物时，模型表现不佳，且无法遵循详细的指令。我们正在努力提高质量和文本对齐。\n\n</details>\n\n## Gradio演示\n\n🔥 您可以在Hugging Face 上的[🤗 Gradio 应用程序](https://huggingface.co/spaces/hpcai-tech/open-sora)上在线体验Open-Sora。【由于GPU资源不足，已失效】\n\n### 本地部署\n\n如果您想在本地部署 gradio，我们还在这个存储库中提供了一个[Gradio 应用程序](./gradio) ，您可以使用以下命令启动一个交互式 Web 应用程序来体验使用 Open-Sora 生成视频。\n\n```bash\npip install gradio spaces\npython gradio/app.py\n```\n\n这将在您的本地主机上启动 Gradio 应用程序。如果您想了解有关 Gradio 应用程序的更多信息，可以参考[Gradio README](./gradio/README.md)。\n\n要启用提示增强和其他语言输入（例如中文输入），您需要在环境中设置 `OPENAI_API_KEY`。查看[OpenAI的文档](https://platform.openai.com/docs/quickstart)以获取您的 API 密钥。\n\n```bash\nexport OPENAI_API_KEY=YOUR_API_KEY\n```\n\n### 入门\n\n在 Gradio 应用程序中，基本选项如下：\n\n![Gradio Demo](/assets/readme/gradio_basic.png)\n\n生成视频最简单的方式是输入文本提示，然后点击“**生成视频**”按钮（如果找不到，请向下滚动）。生成的视频将显示在右侧面板中。勾选“**使用 GPT4o 增强提示**”将使用 GPT-4o 来细化提示，而“**随机提示**”按钮将由 GPT-4o 为您生成随机提示。由于 OpenAI 的 API 限制，提示细化结果具有一定的随机性。\n\n然后，您可以选择生成视频的**分辨率**、**时长**、**长宽比**。不同的分辨率和视频长度会影响视频生成速度。在 80G H100 GPU 上，生成速度和峰值内存使用量为：\n\n|   分辨率   | 图像   | 2秒       | 4秒        | 8秒        | 16秒       |\n| ---- | ------- | -------- | --------- | --------- | --------- |\n| 360p | 3s, 24G | 18s, 27G | 31s, 27G  | 62s, 28G  | 121s, 33G |\n| 480p | 2s, 24G | 29s, 31G | 55s, 30G  | 108s, 32G | 219s, 36G |\n| 720p | 6s, 27G | 68s, 41G | 130s, 39G | 260s, 45G | 547s, 67G |\n\n注意，除了文本转视频，您还可以使用图片转视频。您可以上传图片，然后点击“**生成视频**”按钮，生成以图片为第一帧的视频。或者，您可以填写文本提示，点击“**生成图片**”按钮，根据文本提示生成图片，再点击“**生成视频**”按钮，基于同一模型生成的图片生成视频。\n\n![Gradio Demo](/assets/readme/gradio_option.png)\n\n然后您可以指定更多选项，包括“**运动强度**”、“**美学**”和“**相机运动**”。如果未选中“启用”或选择“无”，则不会将信息传递给模型。否则，模型将生成具有指定运动强度、美学分数和相机运动的视频。\n\n对于**美学分数**，我们建议使用高于 6 的值。对于**运动强度**，较小的值将导致更平滑但动态性较差的视频，而较大的值将导致更动态但可能更模糊的视频。因此，您可以先尝试不使用它，然后根据生成的视频进行调整。对于**相机运动**，有时模型无法很好地遵循指令，我们正在努力改进它。\n\n您还可以调整“**采样步数**”，这是去噪的次数，与生成速度直接相关。小于 30 的数字通常会导致较差的生成结果，而大于 100 的数字通常不会有明显的改善。“**种子**”用于可重复性，您可以将其设置为固定数字以生成相同的视频。“**CFG 
比例**”控制模型遵循文本提示的程度，较小的值会导致视频更随机，而较大的值会导致视频更遵循文本（建议为 7）。\n\n对于更高级的用法，您可以参考[Gradio README](./gradio/README.md#advanced-usage)。\n\n## 推理\n\n### Open-Sora 1.2 命令行推理\n\n基础的命令行推理：\n\n```bash\n# text to video\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p --aspect-ratio 9:16 \\\n  --prompt \"a beautiful waterfall\"\n```\n\n您可以向命令行添加更多选项来定制生成。\n\n```bash\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p --aspect-ratio 9:16 \\\n  --num-sampling-steps 30 --flow 5 --aes 6.5 \\\n  --prompt \"a beautiful waterfall\"\n```\n\n对于图像到视频生成和其他功能，API 与 Open-Sora 1.1 兼容。请参阅[此处](commands.md)了解更多说明。\n\n如果您的安装不包含 `apex` 和 `flash-attn`，则需要在配置文件中或通过以下命令禁用它们。\n\n```bash\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p \\\n  --layernorm-kernel False --flash-attn False \\\n  --prompt \"a beautiful waterfall\"\n```\n\n### 序列并行推理\n\n要启用序列并行，您需要使用 `torchrun` 来运行推理脚本。以下命令将使用 2 个 GPU 运行推理。\n\n```bash\n# text to video\nCUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p --aspect-ratio 9:16 \\\n  --prompt \"a beautiful waterfall\"\n```\n\n:warning: **注意**: gradio 部署不支持序列并行。目前，只有当维度可以被 GPU 数量整除时才支持序列并行，因此在某些情况下可能会失败。我们已在 4 个 GPU（720p）和 2 个 GPU（480p）上完成测试。\n\n\n### GPT-4o 提示细化\n\n我们发现 GPT-4o 可以细化提示并提高生成视频的质量。利用此功能，您还可以使用其他语言（例如中文）作为提示。要启用此功能，您需要在环境中准备您的 OpenAI API 密钥：\n\n```bash\nexport OPENAI_API_KEY=YOUR_API_KEY\n```\n\n然后在推理时加上 `--llm-refine True`，即可启用 GPT-4o 提示细化。\n\n### Open-Sora 1.1 命令行推理\n\n<details>\n<summary>查看更多</summary>\n\n由于 Open-Sora 1.1 支持动态输入大小的推理，因此您可以将输入大小作为参数传递。\n\n```bash\n# text to video\npython scripts/inference.py configs/opensora-v1-1/inference/sample.py --prompt \"A beautiful sunset over the city\" --num-frames 32 --image-size 480 854\n```\n\n如果您的安装不包含 `apex` 和 
`flash-attn`，则需要在配置文件中或通过以下命令禁用它们。\n\n```bash\npython scripts/inference.py configs/opensora-v1-1/inference/sample.py --prompt \"A beautiful sunset over the city\" --num-frames 32 --image-size 480 854 --layernorm-kernel False --flash-attn False\n```\n\n请参阅[此处](docs/commands.md#inference-with-open-sora-11)了解更多说明，包括文本转图像、图像转视频、视频转视频和无限时间生成。\n\n</details>\n\n### Open-Sora 1.0 命令行推理\n\n<details>\n<summary>查看更多</summary>\n\n我们还提供了离线推理脚本。运行以下命令生成样本，所需的模型权重将自动下载。要更改采样提示，请修改传递给 `--prompt-path` 的 txt 文件。请参阅[此处](docs/structure.md#inference-config-demos)以自定义配置。\n\n```bash\n# Sample 16x512x512 (20s/sample, 100 time steps, 24 GB memory)\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path OpenSora-v1-HQ-16x512x512.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n# Sample 16x256x256 (5s/sample, 100 time steps, 22 GB memory)\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path OpenSora-v1-HQ-16x256x256.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n# Sample 64x512x512 (40s/sample, 100 time steps)\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n# Sample 64x512x512 with sequence parallelism (30s/sample, 100 time steps)\n# sequence parallelism is enabled automatically when nproc_per_node is larger than 1\ntorchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth --prompt-path ./assets/texts/t2v_samples.txt\n```\n\n速度是在 H800 GPU 上测试的。有关使用其他模型进行推理，请参阅[此处](docs/commands.md)了解更多说明。要降低内存使用量，请在配置中将 `vae.micro_batch_size` 设置为较小的值（采样速度会略有下降）。\n\n</details>\n\n## 数据处理\n\n高质量的数据对于训练良好的生成模型至关重要。为此，我们建立了完整的数据处理流程，可以将原始视频无缝转换为高质量的视频-文本对。流程如下所示。有关详细信息，请参阅[数据处理](docs/data_processing.md)。另请查看我们使用的[数据集](docs/datasets.md)。\n\n![Data Processing 
Pipeline](/assets/readme/report_data_pipeline.png)\n\n## 训练\n\n### Open-Sora 1.2 训练\n\n训练过程与 Open-Sora 1.1 相同。\n\n```bash\n# one node\ntorchrun --standalone --nproc_per_node 8 scripts/train.py \\\n    configs/opensora-v1-2/train/stage1.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n# multiple nodes\ncolossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py \\\n    configs/opensora-v1-2/train/stage1.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\n### Open-Sora 1.1 训练\n\n<details>\n<summary>查看更多</summary>\n\n在 `csv` 文件中准备好数据后，运行以下命令在单个节点或多个节点上启动训练。\n\n```bash\n# one node\ntorchrun --standalone --nproc_per_node 8 scripts/train.py \\\n    configs/opensora-v1-1/train/stage1.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n# multiple nodes\ncolossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py \\\n    configs/opensora-v1-1/train/stage1.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\n</details>\n\n### Open-Sora 1.0 训练\n\n<details>\n<summary>查看更多</summary>\n\n在 `csv` 文件中准备好数据后，运行以下命令在单个节点上启动训练。\n\n```bash\n# 1 GPU, 16x256x256\ntorchrun --nnodes=1 --nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH\n# 8 GPUs, 64x512x512\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\n要在多个节点上启动训练，请根据[ColossalAI](https://colossalai.org/docs/basics/launch_colossalai/#launch-with-colossal-ai-cli)准备一个主机文件，并运行以下命令。\n\n```bash\ncolossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\n有关训练其他模型和高级用法，请参阅[此处](docs/commands.md)获取更多说明。\n\n</details>\n\n## 评估\n\n我们支持基于以下方面的评估：\n\n- 验证损失\n- [VBench](https://github.com/Vchitect/VBench/tree/master) 分数\n- VBench-i2v 分数\n- 批量生成以供人工评估\n\n所有评估代码均发布在 `eval` 文件夹中。查看[README](/eval/README.md)了解更多详细信息。我们的 
[技术报告](report_v3.md#评估)还提供了有关训练期间评估的更多信息。下表显示 Open-Sora 1.2 在各项得分上均高于 Open-Sora 1.0。\n\n| 模型          | 总得分 | 质量得分 | 语义得分 |\n| -------------- | ----------- | ------------- | -------------- |\n| Open-Sora V1.0 | 75.91%      | 78.81%        | 64.28%         |\n| Open-Sora V1.2 | 79.23%      | 80.71%        | 73.30%         |\n\n## VAE 训练与评估\n\n我们训练一个由空间 VAE 和时间 VAE 组成的 VAE 管道。有关更多详细信息，请参阅[VAE 文档](vae.md)。在运行以下命令之前，请按照我们的[安装文档](installation.md)安装 VAE 和评估所需的依赖项。\n\n如果您想训练自己的 VAE，您需要按照[数据处理](#数据处理)流程在 csv 中准备数据，然后运行以下命令。请注意，您需要根据自己的 csv 数据大小相应地调整配置文件中的训练 `epochs` 数量。\n\n\n```bash\n# stage 1 training, 380k steps, 8 GPUs\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage1.py --data-path YOUR_CSV_PATH\n# stage 2 training, 260k steps, 8 GPUs\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage2.py --data-path YOUR_CSV_PATH\n# stage 3 training, 540k steps, 24 GPUs\ntorchrun --nnodes=3 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage3.py --data-path YOUR_CSV_PATH\n```\n\n为了评估 VAE 的性能，您需要首先运行 VAE 推理来生成视频，然后计算生成的视频的分数：\n\n```bash\n# video generation\ntorchrun --standalone --nnodes=1 --nproc_per_node=1 scripts/inference_vae.py configs/vae/inference/video.py --ckpt-path YOUR_VAE_CKPT_PATH --data-path YOUR_CSV_PATH --save-dir YOUR_VIDEO_DIR\n# the original videos will be saved to `YOUR_VIDEO_DIR_ori`\n# the reconstructed videos through the pipeline will be saved to `YOUR_VIDEO_DIR_rec`\n# the reconstructed videos through the spatial VAE only will be saved to `YOUR_VIDEO_DIR_spatial`\n\n# score calculation\npython eval/vae/eval_common_metric.py --batch_size 2 --real_video_dir YOUR_VIDEO_DIR_ori --generated_video_dir YOUR_VIDEO_DIR_rec --device cuda --sample_fps 24 --crop_size 256 --resolution 256 --num_frames 17 --sample_rate 1 --metric ssim psnr lpips flolpips\n```\n\n\n## 贡献\n\n感谢以下出色的贡献者：\n\n<a href=\"https://github.com/hpcaitech/Open-Sora/graphs/contributors\">\n  <img 
src=\"https://contrib.rocks/image?repo=hpcaitech/Open-Sora\" />\n</a>\n\n如果您希望为该项目做出贡献，请参阅[Contribution Guideline](./CONTRIBUTING.md)。\n\n## 致谢\n\n这里我们仅列出了部分项目，其他研究成果及数据集请参考我们的报告。\n\n* [ColossalAI](https://github.com/hpcaitech/ColossalAI): 强大的大型模型并行加速与优化系统。\n* [DiT](https://github.com/facebookresearch/DiT): 带有 Transformer 的可扩展扩散模型。\n* [OpenDiT](https://github.com/NUS-HPC-AI-Lab/OpenDiT): DiT 训练的加速器。我们采用了 OpenDiT 中有价值的训练加速策略。\n* [PixArt](https://github.com/PixArt-alpha/PixArt-alpha): 一个基于 DiT 的开源文本转图像模型。\n* [Latte](https://github.com/Vchitect/Latte): 一次高效训练视频 DiT 的尝试。\n* [StabilityAI VAE](https://huggingface.co/stabilityai/sd-vae-ft-mse-original): 一个强大的图像 VAE 模型。\n* [CLIP](https://github.com/openai/CLIP): 一个强大的文本图像嵌入模型。\n* [T5](https://github.com/google-research/text-to-text-transfer-transformer): 强大的文本编码器。\n* [LLaVA](https://github.com/haotian-liu/LLaVA): 基于 [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) 和 [Yi-34B](https://huggingface.co/01-ai/Yi-34B) 的强大图像字幕模型。\n* [PLLaVA](https://github.com/magic-research/PLLaVA): 一个强大的视频字幕模型。\n* [MiraData](https://github.com/mira-space/MiraData): 具有长持续时间和结构化字幕的大规模视频数据集。\n\n我们感谢他们的出色工作和对开源的慷慨贡献。\n\n## 引用\n\n```bibtex\n@software{opensora,\n  author = {Zangwei Zheng and Xiangyu Peng and Tianji Yang and Chenhui Shen and Shenggui Li and Hongxin Liu and Yukun Zhou and Tianyi Li and Yang You},\n  title = {Open-Sora: Democratizing Efficient Video Production for All},\n  month = {March},\n  year = {2024},\n  url = {https://github.com/hpcaitech/Open-Sora}\n}\n```\n\n## Star增长\n\n[![Star History Chart](https://api.star-history.com/svg?repos=hpcaitech/Open-Sora&type=Date)](https://star-history.com/#hpcaitech/Open-Sora&Date)\n"
  },
  {
    "path": "Open-Sora/docs/zh_CN/READMEv1.1.md",
    "content": "<p align=\"center\">\n    <img src=\"../../assets/readme/icon.png\" width=\"250\"/>\n<p>\n\n<div align=\"center\">\n    <a href=\"https://github.com/hpcaitech/Open-Sora/stargazers\"><img src=\"https://img.shields.io/github/stars/hpcaitech/Open-Sora?style=social\"></a>\n    <a href=\"https://hpcaitech.github.io/Open-Sora/\"><img src=\"https://img.shields.io/badge/Gallery-View-orange?logo=&amp\"></a>\n    <a href=\"https://discord.gg/shpbperhGs\"><img src=\"https://img.shields.io/badge/Discord-join-blueviolet?logo=discord&amp\"></a>\n    <a href=\"https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-247ipg9fk-KRRYmUl~u2ll2637WRURVA\"><img src=\"https://img.shields.io/badge/Slack-ColossalAI-blueviolet?logo=slack&amp\"></a>\n    <a href=\"https://twitter.com/yangyou1991/status/1769411544083996787?s=61&t=jT0Dsx2d-MS5vS9rNM5e5g\"><img src=\"https://img.shields.io/badge/Twitter-Discuss-blue?logo=twitter&amp\"></a>\n    <a href=\"https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png\"><img src=\"https://img.shields.io/badge/微信-小助手加群-green?logo=wechat&amp\"></a>\n    <a href=\"https://hpc-ai.com/blog/open-sora-v1.0\"><img src=\"https://img.shields.io/badge/Open_Sora-Blog-blue\"></a>\n</div>\n\n## Open-Sora： 完全开源的高效复现类Sora视频生成方案\n**Open-Sora**项目是一项致力于**高效**制作高质量视频，并使所有人都能使用其模型、工具和内容的计划。\n通过采用**开源**原则，Open-Sora 不仅实现了先进视频生成技术的低成本普及，还提供了一个精简且用户友好的方案，简化了视频制作的复杂性。\n通过 Open-Sora，我们希望更多开发者一起探索内容创作领域的创新、创造和包容。\n\n[[English Document]](/README.md)\n\n <h4>Open-Sora 项目目前处在早期阶段，并将持续更新。</h4>\n\n## 📰 资讯\n> 由于文档需要进行翻译，最新资讯请看[英文文档](/README.md#-news)\n* **[2024.04.25]** 🤗 我们在Hugging Face Spaces上发布了Open-Sora的[Gradio demo](https://huggingface.co/spaces/hpcai-tech/open-sora)。\n* **[2024.04.25]** 🔥 我们发布了支持**2秒至15秒、144p至720p、任意宽高比**的文本到图像、文本到视频、图像到视频、视频到视频、无限时间生成的**Open-Sora 1.1**版本。此外，还发布了一个完整的视频处理流程。 [[checkpoints]]() [[report]](/docs/report_02.md)\n* **[2024.03.18]** 🔥 我们发布了**Open-Sora 1.0**，这是一个完全开源的视频生成项目。\n* Open-Sora 1.0 
支持视频数据预处理、加速训练、推理等全套流程。\n* 我们提供的[模型权重](#模型权重)只需 3 天的训练就能生成 2 秒的 512x512 视频。\n* **[2024.03.04]** Open-Sora：开源Sora复现方案，成本降低46%，序列扩充至近百万。[[英文博客]](https://hpc-ai.com/blog/open-sora)\n\n## 🎥 最新视频\n\n| **2s 512×512**                                                                                                                                                                 | **2s 512×512**                                                                                                                                                              | **2s 512×512**                                                                                                                                    |\n| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"/assets/readme/sample_0.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/de1963d3-b43b-4e68-a670-bb821ebb6f80)                                 | [<img src=\"/assets/readme/sample_1.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/13f8338f-3d42-4b71-8142-d234fbd746cc)                              | [<img src=\"/assets/readme/sample_2.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/fa6a65a6-e32a-4d64-9a9e-eabb0ebb8c16)    |\n| A serene night scene in a forested area. [...] The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. | A soaring drone footage captures the majestic beauty of a coastal cliff, [...] 
The water gently laps at the rock base and the greenery that clings to the top of the cliff. | The majestic beauty of a waterfall cascading down a cliff into a serene lake. [...] The camera angle provides a bird's eye view of the waterfall. |\n| [<img src=\"/assets/readme/sample_3.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/64232f84-1b36-4750-a6c0-3e610fa9aa94) | [<img src=\"/assets/readme/sample_4.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/983a1965-a374-41a7-a76b-c07941a6c1e9) | [<img src=\"/assets/readme/sample_5.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/ec10c879-9767-4c31-865f-2e8d6cf11e65) |\n| A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlights. [...]                                                           | The vibrant beauty of a sunflower field. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. [...]                                            | A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell [...]                   
|\n\n视频经过降采样处理为`.gif`格式，以便显示。点击查看原始视频。为便于显示，文字经过修剪，全文请参见 [此处](/assets/texts/t2v_samples.txt)。在我们的[图片库](https://hpcaitech.github.io/Open-Sora/)中查看更多样本。\n\n## 🔆 新功能\n> 由于文档需要进行翻译，最新资讯请看[英文文档](/README.md#-new-featuresupdates)\n\n* 📍Open-Sora-v1 已发布。[这里](#模型权重)提供了模型权重。只需 400K 视频片段和 200 个 H800 GPU 天（相比之下，Stable Video Diffusion 使用了 152M 样本），我们就能生成 2 秒的 512×512 视频。\n* ✅ 从图像扩散模型到视频扩散模型的三阶段训练。我们提供每个阶段的权重。\n* ✅ 支持训练加速，包括Transformer加速、更快的 T5 和 VAE 以及序列并行。在对 64x512x512 视频进行训练时，Open-Sora 可将训练速度提高**55%**。详细信息请参见[训练加速](acceleration.md)。\n* 🔧 我们提供用于数据预处理的视频切割和字幕工具。有关说明请点击[此处](tools/data/README.md)，我们的数据收集计划请点击 [数据集](datasets.md)。\n* ✅ 我们发现来自[VideoGPT](https://wilson1yan.github.io/videogpt/index.html)的 VQ-VAE 质量较低，因此采用了来自[Stability-AI](https://huggingface.co/stabilityai/sd-vae-ft-mse-original) 的高质量 VAE。我们还发现在时间维度上做分块（patching）会导致生成质量降低。更多讨论，请参阅我们的 **[报告](docs/report_v1.md)**。\n* ✅ 我们研究了不同的架构，包括 DiT、Latte 和我们提出的 **STDiT**。我们的 STDiT 在质量和速度之间实现了更好的权衡。更多讨论，请参阅我们的 **[报告](report_v1.md)**。\n* ✅ 支持 CLIP 和 T5 文本调节。\n* ✅ 通过将图像视为单帧视频，我们的项目支持在图像和视频（如 ImageNet 和 UCF101）上训练 DiT。更多说明请参见 [指令解析](command.md)。\n* ✅ 支持使用[DiT](https://github.com/facebookresearch/DiT)、[Latte](https://github.com/Vchitect/Latte) 和 [PixArt](https://pixart-alpha.github.io/) 的官方权重进行推理。\n\n<details>\n<summary>查看更多</summary>\n\n* ✅ 重构代码库。请参阅[结构](structure.md)，了解项目结构以及如何使用配置文件。\n\n</details>\n\n### 下一步计划【按优先级排序】\n\n* [ ] 训练视频-VAE并让模型适应新的VAE **[项目进行中]**\n* [ ] 缩放模型参数和数据集大小 **[项目进行中]**\n* [ ] 采用更好的调度器，例如 SD3 中的整流流（rectified flow）。 **[项目进行中]**\n\n<details>\n<summary>查看更多</summary>\n\n* [x] 评估流程。\n* [x] 完成数据处理流程（包括密集光流、美学评分、文本图像相似性、重复数据删除等）。更多信息请参见[数据集](datasets.md)\n* [x] 支持图像和视频调节。\n* [x] 支持可变长宽比、分辨率和持续时间。\n\n</details>\n\n## 目录\n\n* [安装](#安装)\n* [模型权重](#模型权重)\n* [推理](#推理)\n* [数据处理](#数据处理)\n* [训练](#训练)\n* [评估](#评估)\n* [贡献](#贡献)\n* [声明](#声明)\n* [引用](#引用)\n\n## 安装\n\n### 从源码安装\n\n```bash\n# create a virtual env\nconda create -n opensora python=3.10\n\n# install torch\n# the command below is for CUDA 12.1, choose install commands from\n# 
https://pytorch.org/get-started/locally/ based on your own CUDA version\npip3 install torch torchvision\n\n# install flash attention (optional)\npip install packaging ninja\npip install flash-attn --no-build-isolation\n\n# install apex (optional)\npip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext\" --config-settings \"--build-option=--cuda_ext\" git+https://github.com/NVIDIA/apex.git\n\n# install xformers\npip3 install -U xformers --index-url https://download.pytorch.org/whl/cu121\n\n# install this project\ngit clone https://github.com/hpcaitech/Open-Sora\ncd Open-Sora\npip install -v .\n```\n\n### 使用Docker镜像\n\n运行如下指令使用提供的Dockerfile构建镜像：\n\n```bash\ndocker build -t opensora ./docker\n```\n\n运行以下命令以启动交互模式下的 Docker 容器：\n\n```bash\ndocker run -ti --gpus all -v {MOUNT_DIR}:/data opensora\n```\n\n安装完成后，建议阅读[结构](structure.md)，了解项目结构以及如何使用配置文件。\n\n## 模型权重\n\n| 分辨率  | 数据   | 迭代次数 | 批量大小 | GPU 天数 (H800) | 网址       |\n| ---------- | ------ | ----------- | ---------- | --------------- | ---------- |\n| 16×256×256 | 366K   | 80k         | 8×64       | 117             | [:link:]() |\n| 16×256×256 | 20K HQ | 24k         | 8×64       | 45              | [:link:]() |\n| 16×512×512 | 20K HQ | 20k         | 2×64       | 35              | [:link:]() |\n\n我们模型的权重部分由[PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) 初始化。参数数量为 724M。有关训练的更多信息，请参阅我们的 **[报告](report_v1.md)**。有关数据集的更多信息，请参阅[数据](datasets.md)。HQ 表示高质量。\n:warning: **局限性**：我们的模型是在有限的预算内训练出来的。质量和文本对齐度相对较差。特别是在生成人类时，模型表现很差，无法遵循详细的指令。我们正在努力改进质量和文本对齐。\n\n## 推理\n\n要使用我们提供的权重进行推理，首先要将[T5](https://huggingface.co/DeepFloyd/t5-v1_1-xxl/tree/main)权重下载到pretrained_models/t5_ckpts/t5-v1_1-xxl 中。然后下载模型权重。运行以下命令生成样本。请参阅[此处](structure.md#推理配置演示)自定义配置。\n\n```bash\n# Sample 16x512x512 (20s/sample, 100 time steps, 24 GB memory)\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path 
OpenSora-v1-HQ-16x512x512.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n# Sample 16x256x256 (5s/sample, 100 time steps, 22 GB memory)\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path OpenSora-v1-HQ-16x256x256.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n# Sample 64x512x512 (40s/sample, 100 time steps)\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n# Sample 64x512x512 with sequence parallelism (30s/sample, 100 time steps)\n# sequence parallelism is enabled automatically when nproc_per_node is larger than 1\ntorchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n```\n\n我们在 H800 GPU 上进行了速度测试。如需使用其他模型进行推理，请参阅[此处](commands.md)获取更多说明。减小`vae.micro_batch_size`来降低显存使用（但取样速度会略微减慢）。\n\n## 数据处理\n\n高质量数据是高质量模型的关键。[这里](datasets.md)有我们使用过的数据集和数据收集计划。我们提供处理视频数据的工具。目前，我们的数据处理流程包括以下步骤：\n\n1. 下载数据集。[[文件](/tools/datasets/README.md)]\n2. 将视频分割成片段。 [[文件](/tools/scene_cut/README.md)]\n3. 
生成视频字幕。 [[文件](/tools/caption/README.md)]\n\n## 训练\n\n### Open-Sora 1.0 训练\n<details>\n<summary>查看更多</summary>\n\n要启动训练，首先要将[T5](https://huggingface.co/DeepFloyd/t5-v1_1-xxl/tree/main)权重下载到 `pretrained_models/t5_ckpts/t5-v1_1-xxl` 中。然后运行以下命令在单个节点上启动训练。\n\n```bash\n# 1 GPU, 16x256x256\ntorchrun --nnodes=1 --nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH\n# 8 GPUs, 64x512x512\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\n要在多个节点上启动训练，请根据[ColossalAI](https://colossalai.org/docs/basics/launch_colossalai/#launch-with-colossal-ai-cli) 准备一个主机文件，并运行以下命令。\n\n```bash\ncolossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\n有关其他模型的训练和高级使用方法，请参阅[此处](commands.md)获取更多说明。\n\n</details>\n\n## 评估\n\n点击[这里](https://github.com/hpcaitech/Open-Sora/blob/main/eval/README.md)查看评估。\n\n## 贡献\n\n本中文翻译还有许多不足，如果您希望为该项目做出贡献，可以参考[贡献指南](/CONTRIBUTING.md)。\n\n目前需要翻译或更新的文件：\n* [ ] 更新[资讯](#-资讯)\n* [ ] 更新[最新视频](#-最新视频)\n* [ ] 更新[新功能](#-新功能)\n* [ ] 翻译[评估](https://github.com/hpcaitech/Open-Sora/blob/main/eval/README.md)文件\n* [ ] 更新Open-Sora 1.1[训练](#训练)\n\n## 声明\n\n* [ColossalAI](https://github.com/hpcaitech/ColossalAI): A powerful large model parallel acceleration and optimization system.\n* [DiT](https://github.com/facebookresearch/DiT): Scalable Diffusion Models with Transformers.\n* [OpenDiT](https://github.com/NUS-HPC-AI-Lab/OpenDiT): An acceleration for DiT training. 
We adopt valuable acceleration strategies for training progress from OpenDiT.\n* [PixArt](https://github.com/PixArt-alpha/PixArt-alpha): An open-source DiT-based text-to-image model.\n* [Latte](https://github.com/Vchitect/Latte): An attempt to efficiently train DiT for video.\n* [StabilityAI VAE](https://huggingface.co/stabilityai/sd-vae-ft-mse-original): A powerful image VAE model.\n* [CLIP](https://github.com/openai/CLIP): A powerful text-image embedding model.\n* [T5](https://github.com/google-research/text-to-text-transfer-transformer): A powerful text encoder.\n* [LLaVA](https://github.com/haotian-liu/LLaVA): A powerful image captioning model based on [Yi-34B](https://huggingface.co/01-ai/Yi-34B).\n\n我们对他们的出色工作和对开源的慷慨贡献表示感谢。\n\n## 引用\n\n```bibtex\n@software{opensora,\n  author = {Zangwei Zheng and Xiangyu Peng and Yang You},\n  title = {Open-Sora: Democratizing Efficient Video Production for All},\n  month = {March},\n  year = {2024},\n  url = {https://github.com/hpcaitech/Open-Sora}\n}\n```\n\n[Zangwei Zheng](https://github.com/zhengzangw) and [Xiangyu Peng](https://github.com/xyupeng) equally contributed to this work during their internship at [HPC-AI Tech](https://hpc-ai.com/).\n\n## Star 走势\n\n[![Star History Chart](https://api.star-history.com/svg?repos=hpcaitech/Open-Sora&type=Date)](https://star-history.com/#hpcaitech/Open-Sora&Date)\n"
  },
  {
    "path": "Open-Sora/docs/zh_CN/acceleration.md",
    "content": "# 加速\n\n>本文档对应于Open-Sora v1.1版本。\n\nOpen-Sora 旨在为扩散模型提供一个高速训练框架。在 64 帧 512x512 视频上训练时，我们可以实现 **55%** 的训练加速。我们的框架支持训练 **1分钟1080p视频**。\n\n## 加速的 Transformer\n\nOpen-Sora 通过以下方式提高训练速度：\n\n- 内核优化，包括 [flash attention](https://github.com/Dao-AILab/flash-attention)、融合 layernorm 内核以及由 ColossalAI 编译的内核。\n- 混合并行，包括 ZeRO。\n- 梯度检查点，用于支持更大的批量。\n\n我们在图像上的训练速度可与 [OpenDiT](https://github.com/NUS-HPC-AI-Lab/OpenDiT) 相媲美，后者是一个加速 DiT 训练的项目。训练速度在 8 个 H800 GPU 上测量，批大小为 128，图像大小为 256x256。\n\n| 模型       | 吞吐量 (img/s/GPU) | 吞吐量 (tokens/s/GPU) |\n|----------|-----------------|--------------------|\n| DiT      | 100             | 26k                |\n| OpenDiT  | 175             | 45k                |\n| OpenSora | 175             | 45k                |\n\n## 高效的 STDiT\n\n我们的 STDiT 采用时空注意力对视频数据进行建模。与直接在 DiT 上使用全注意力（full attention）相比，我们的 STDiT 随着帧数的增加而更有效率。我们当前的框架仅对超长序列支持序列并行。\n\n训练速度是在 8 个 H800 GPU 上测量的，并应用了加速技术，GC 表示梯度检查点。两种模型都与 PixArt 一样采用 T5 条件输入。\n\n| 模型               | 设置             | 吞吐量 (sample/s/GPU) | 吞吐量 (tokens/s/GPU) |\n|------------------|----------------|--------------------|--------------------|\n| DiT              | 16x256  (4k)   | 7.20               | 29k                |\n| STDiT            | 16x256  (4k)   | 7.00               | 28k                |\n| DiT              | 16x512  (16k)  | 0.85               | 14k                |\n| STDiT            | 16x512  (16k)  | 1.45               | 23k                |\n| DiT (GC)         | 64x512  (65k)  | 0.08               | 5k                 |\n| STDiT (GC)       | 64x512  (65k)  | 0.40               | 25k                |\n| STDiT (GC, sp=2) | 360x512 (370k) | 0.10               | 18k                |\n\n使用 Video-VAE 在时间维度上进行 4 倍下采样时，24fps 视频有 450 帧。STDiT (28k tokens/s) 与 DiT 在图像上的速度 (高达 45k tokens/s) 之间的差距主要来自 T5 和 VAE 编码，以及时间注意力。\n\n## 加速的编码器 (T5, VAE)\n\n在训练过程中，文本由 T5 编码，视频由 VAE 编码。通常有两种方法可以加速训练：\n\n1. 提前预处理文本和视频数据并保存到磁盘。\n2. 
在训练过程中对文本和视频数据进行编码，并加快编码过程。\n\n对于选项 1，一个样本的 120 个 token 需要 1M 磁盘空间，而一个 64x64x64 的潜变量（latent）需要 4M。对于一个包含 10M 视频片段的训练数据集，所需的总磁盘空间为 50TB。我们的存储系统目前还无法支撑这种规模的数据。\n\n对于选项 2，我们优化了 T5 的速度和内存占用。参照 [OpenDiT](https://github.com/NUS-HPC-AI-Lab/OpenDiT)，我们发现 VAE 消耗了大量的 GPU 内存。因此，我们在进行 VAE 编码时将一个批次拆分为若干较小的批次。结合这两种技术，我们可以大大加快训练速度。\n\n训练速度是在 8 个 H800 GPU 上使用 STDiT 测量的。\n\n| 加速模式         | 设置            | 吞吐量 (img/s/GPU) | 吞吐量 (tokens/s/GPU) |\n|--------------|---------------|-----------------|--------------------|\n| Baseline     | 16x256  (4k)  | 6.16            | 25k                |\n| w. faster T5 | 16x256  (4k)  | 7.00            | 29k                |\n| Baseline     | 64x512  (65k) | 0.94            | 15k                |\n| w. both      | 64x512  (65k) | 1.45            | 23k                |\n"
  },
  {
    "path": "Open-Sora/docs/zh_CN/commands.md",
    "content": "# 命令\n\n## 推理\n\n您可以修改相应的配置文件来更改推理设置。在 [此处](/docs/structure.md#inference-config-demos) 查看更多详细信息。\n\n### 在 ImageNet 上使用 DiT 预训练进行推理\n\n以下命令会自动在 ImageNet 上下载预训练权重并运行推理。\n\n```bash\npython scripts/inference.py configs/dit/inference/1x256x256-class.py --ckpt-path DiT-XL-2-256x256.pt\n```\n\n### 在 UCF101 上使用 Latte 预训练进行推理\n\n以下命令会自动下载 UCF101 上的预训练权重并运行推理。\n\n```bash\npython scripts/inference.py configs/latte/inference/16x256x256-class.py --ckpt-path Latte-XL-2-256x256-ucf101.pt\n```\n\n### 使用 PixArt-α 预训练权重进行推理\n\n将 T5 下载到 `./pretrained_models` 并运行以下命令。\n\n```bash\n# 256x256\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inference/1x256x256.py --ckpt-path PixArt-XL-2-256x256.pth\n\n# 512x512\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inference/1x512x512.py --ckpt-path PixArt-XL-2-512x512.pth\n\n# 1024 multi-scale\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inference/1x1024MS.py --ckpt-path PixArt-XL-2-1024MS.pth\n```\n\n### 使用训练期间保存的 checkpoints 进行推理\n\n在训练期间，会在 `outputs` 目录中创建一个实验日志记录文件夹。在每个 checkpoint 文件夹下（例如 `epoch12-global_step2000`），有一个 `ema.pt` 文件和共享的 `model` 文件夹。执行以下命令进行推理。\n\n```bash\n# 使用 ema 模型进行推理\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path outputs/001-STDiT-XL-2/epoch12-global_step2000/ema.pt\n\n# 使用模型进行推理\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path outputs/001-STDiT-XL-2/epoch12-global_step2000\n\n# 使用序列并行进行推理\n# 当 nproc_per_node 大于 1 时，将自动启用序列并行\ntorchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path outputs/001-STDiT-XL-2/epoch12-global_step2000\n```\n\n第二个命令将在 checkpoint 文件夹中自动生成一个 `model_ckpt.pt` 文件。\n\n### 推理超参数\n\n1. 
DPM 求解器擅长对图像进行快速推理，但它在视频推理上的效果并不令人满意。如果只是为了快速演示，您可以使用这个求解器。\n\n```python\ntype=\"dpm-solver\"\nnum_sampling_steps=20\n```\n\n2. 您可以在视频推理上使用 [SVD](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) 微调的 VAE 解码器（消耗更多内存）。但是，我们没有看到视频推理效果有明显改善。要使用它，请将 [预训练权重](https://huggingface.co/maxin-cn/Latte/tree/main/t2v_required_models/vae_temporal_decoder) 下载到 `./pretrained_models/vae_temporal_decoder` 中，并修改配置文件，如下所示。\n\n```python\nvae = dict(\n    type=\"VideoAutoencoderKLTemporalDecoder\",\n    from_pretrained=\"pretrained_models/vae_temporal_decoder\",\n)\n```\n\n## 训练\n\n如果您要继续训练，请运行以下命令。参数 ``--load`` 与 ``--ckpt-path`` 的不同之处在于，``--load`` 还会加载优化器和数据加载器的状态。\n\n```bash\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --load YOUR_PRETRAINED_CKPT\n```\n\n如果要启用 wandb 日志，请在命令中添加 `--wandb` 参数。\n\n```bash\nWANDB_API_KEY=YOUR_WANDB_API_KEY torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --wandb True\n```\n\n您可以修改相应的配置文件来更改训练设置。在 [此处](/docs/structure.md#training-config-demos) 查看更多详细信息。\n\n### 训练超参数\n\n1. `dtype` 是用于训练的数据类型。仅支持 `fp16` 和 `bf16`。ColossalAI 会自动为 `fp16` 和 `bf16` 启用混合精度训练。在训练过程中，我们发现 `bf16` 更稳定。\n"
  },
  {
    "path": "Open-Sora/docs/zh_CN/datasets.md",
    "content": "# 数据集\n\n## 正在使用的数据集\n\n### HD-VG-130M\n\n[HD-VG-130M](https://github.com/daooshee/HD-VG-130M?tab=readme-ov-file) 包含 130M 个文本-视频对，其字幕由 BLIP-2 生成。我们发现其场景切分和文本质量相对较差。该数据集包含 20 个拆分（split）。对于 OpenSora 1.0，我们使用第一个拆分。我们计划使用整个数据集并对其进行重新处理。\n\n### Inter4k\n\n[Inter4k](https://github.com/alexandrosstergiou/Inter4K) 是一个包含 1k 个 4K 分辨率视频片段的数据集。该数据集最初是为超分辨率任务提出的。我们使用该数据集进行 HQ 训练。处理过的视频可以在[这里](README.md#数据处理)找到。\n\n### Pexels.com\n\n[Pexels.com](https://www.pexels.com/) 是一个提供免费库存照片和视频的网站。我们从该网站收集了 19K 个视频片段，用于高质量训练。处理过的视频可以在[这里](README.md#数据处理)找到。\n\n## 关注中的数据集\n\n我们也在关注以下数据集，并考虑在未来使用它们，这取决于我们的存储空间以及数据集的质量。\n\n| 名称                | 大小           | 描述                            |\n|-------------------|--------------|-------------------------------|\n| Panda-70M         | 70M videos   | High quality video-text pairs |\n| WebVid-10M        | 10M videos   | Low quality                   |\n| InternVid-10M-FLT | 10M videos   |                               |\n| EGO4D             | 3670 hours   |                               |\n| OpenDV-YouTube    | 1700 hours   |                               |\n| VidProM           | 6.69M videos |                               |\n"
  },
  {
    "path": "Open-Sora/docs/zh_CN/report_v1.md",
    "content": "# Open-Sora v1 技术报告\n\nOpenAI的Sora在生成一分钟高质量视频方面非常出色。然而，它几乎没有透露任何关于其细节的信息。为了使人工智能更加“开放”，我们致力于构建一个开源版本的Sora。这份报告描述了我们第一次尝试训练一个基于Transformer的视频扩散模型。\n\n## 选择高效的架构\n\n为了降低计算成本，我们希望利用现有的VAE模型。Sora使用时空VAE来减少时间维度。然而，我们发现没有开源的高质量时空VAE模型。[MAGVIT](https://github.com/google-research/magvit)的4x4x4 VAE并未开源，而[VideoGPT](https://wilson1yan.github.io/videogpt/index.html)的2x4x4 VAE在我们的实验中质量较低。因此，我们决定在我们第一个版本中使用2D VAE（来自[Stability-AI](https://huggingface.co/stabilityai/sd-vae-ft-mse-original)）。\n\n视频训练涉及大量的token。考虑到24fps的1分钟视频，我们有1440帧。通过VAE下采样4倍和patch大小下采样2倍，我们得到了1440x1024≈150万个token。在150万个token上进行全注意力计算将带来巨大的计算成本。因此，我们使用时空注意力来降低成本，这是遵循[Latte](https://github.com/Vchitect/Latte)的方法。\n\n如图中所示，在STDiT（ST代表时空）中，我们在每个空间注意力之后立即插入一个时间注意力。这类似于Latte论文中的变种3。然而，我们并没有控制这些变体的相似数量的参数。虽然Latte的论文声称他们的变体比变种3更好，但我们在16x256x256视频上的实验表明，相同数量的迭代次数下，性能排名为：DiT（完整）> STDiT（顺序）> STDiT（并行）≈ Latte。因此，我们出于效率考虑选择了STDiT（顺序）。[这里](/docs/acceleration.md#efficient-stdit)提供了速度基准测试。\n\n\n![Architecture Comparison](/assets/readme/report_arch_comp.png)\n\n为了专注于视频生成，我们希望基于一个强大的图像生成模型来训练我们的模型。PixArt-α是一个经过高效训练的高质量图像生成模型，具有T5条件化的DiT结构。我们使用[PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha)初始化我们的模型，并将插入的时间注意力的投影层初始化为零。这种初始化在开始时保留了模型的图像生成能力，而Latte的架构则不能。插入的注意力将参数数量从5.8亿增加到7.24亿。\n\n![Architecture](/assets/readme/report_arch.jpg)\n\n借鉴PixArt-α和Stable Video Diffusion的成功，我们还采用了渐进式训练策略：在366K预训练数据集上进行16x256x256的训练，然后在20K数据集上进行16x256x256、16x512x512和64x512x512的训练。通过扩展位置嵌入，这一策略极大地降低了计算成本。\n\n我们还尝试在DiT中使用3D patch嵌入器。然而，在时间维度上2倍下采样后，生成的视频质量较低。因此，我们将在下一版本中将下采样留给时间VAE。目前，我们在每3帧采样一次进行16帧训练，以及在每2帧采样一次进行64帧训练。\n\n\n## 数据是训练高质量模型的核心\n\n我们发现数据的数量和质量对生成视频的质量有很大的影响，甚至比模型架构和训练策略的影响还要大。目前，我们只从[HD-VG-130M](https://github.com/daooshee/HD-VG-130M)准备了第一批分割（366K个视频片段）。这些视频的质量参差不齐，而且字幕也不够准确。因此，我们进一步从提供免费许可视频的[Pexels](https://www.pexels.com/)收集了20k相对高质量的视频。我们使用LLaVA，一个图像字幕模型，通过三个帧和一个设计好的提示来标记视频。有了设计好的提示，LLaVA能够生成高质量的字幕。\n\n![Caption](/assets/readme/report_caption.png)\n\n由于我们更加注重数据质量，我们准备收集更多数据，并在下一版本中构建一个视频预处理流程。\n\n## 
训练细节\n\n在有限的训练预算下，我们只进行了一些探索。我们发现学习率1e-4过大，因此将其降低到2e-5。在进行大批量训练时，我们发现`fp16`不如`bf16`稳定，可能会导致生成失败。因此，我们在64x512x512的训练中切换到`bf16`。对于其他超参数，我们遵循了之前的研究工作。\n\n## 损失曲线\n\n16x256x256 预训练损失曲线\n\n![16x256x256 Pretraining Loss Curve](/assets/readme/report_loss_curve_1.png)\n\n16x256x256 高质量训练损失曲线\n\n![16x256x256 HQ Training Loss Curve](/assets/readme/report_loss_curve_2.png)\n\n16x512x512 高质量训练损失曲线\n\n![16x512x512 HQ Training Loss Curve](/assets/readme/report_loss_curve_3.png)\n"
  },
  {
    "path": "Open-Sora/docs/zh_CN/report_v2.md",
    "content": "# Open-Sora 1.1 技术报告\n\n- [模型架构修改](#模型架构修改)\n- [支持不同视频长度/分辨率/宽高比/帧率（fps）训练](#支持不同视频长度分辨率宽高比帧率fps训练)\n- [使用Masked DiT作为图生视频/视频生视频模型](#使用masked-dit作为图生视频视频生视频模型)\n- [数据收集和流程](#数据收集和流程)\n- [训练详情](#训练详情)\n- [结果和评价](#结果和评价)\n- [不足和下一步计划](#不足和下一步计划)\n\n在Open-Sora1.1版本中，我们使用了10M数据来训练经过结构调优后的STDiT的700M模型（Open-Sora1.0版本仅用400K数据）。我们实现了[Sora报告](https://openai.com/research/video-generation-models-as-world-simulators)中提到的以下功能：\n\n- 可变的视频时长、分辨率、宽高比（包括采样灵活性、改进的取景范围和构图）\n- 提示词增加图片和视频选项（使图像动起来、生成式增长视频、视频到视频编辑、连接不同视频）\n- 图像生成功能\n\n为了实现这一目标，我们在预训练阶段使用了多任务学习。对于扩散模型来说，用不同的采样时间步长进行训练已经是一种多任务学习。我们将这一思想在图像和视频的条件生成模型上，进一步扩展到多分辨率、宽高比、帧长、fps以及不同的掩码策略。我们在**0~15s、144p到720p、各种宽高比的视频**上训练模型。虽然由于训练FLOPs不足的限制，生成的视频在时间一致性上的表现没有那么高，但我们仍然可以看到这个模型的巨大潜力。\n\n## 模型架构修改\n\n我们对原始ST-DiT模型进行了以下修改，以获得更好的训练稳定性和模型性能（ST-DiT-2）：\n\n- **在时间注意力模块中添加[旋转位置编码](https://arxiv.org/abs/2104.09864)**：遵循目前LLM的最佳实践，我们将时间注意力模块中的正弦位置编码更改为旋转位置编码，因为它也算一项序列预测任务。\n- **在时间注意力模块中添加AdaIN和Layernormal**：我们将时间注意力与AdaIN和Layer范数作为空间注意力包裹起来，以稳定训练。\n- **[QK归一化](https://arxiv.org/abs/2302.05442)与[RMSNorm](https://arxiv.org/abs/1910.07467)**：和[SD3](https://arxiv.org/pdf/2403.03206.pdf)类似地，我们应用QK归一化来提高半精度训练的稳定性。\n- **支持动态输入大小和视频条件限定**：为了支持多分辨率、宽高比和fps训练，我们ST-DiT-2来接受任何输入大小。延申[PixArt-alpha](https://github.com/PixArt-alpha/PixArt-alpha)的想法，我们支持限定视频的高度、宽度、宽高比、帧长和fps。\n- **将T5token数量从120扩展到200**：我们使用的视频描述通常少于200个token，我们发现模型也可以很好地处理更长的文本。\n\n## 支持不同视频长度/分辨率/宽高比/帧率（fps）训练\n\n正如[Sora报告](https://openai.com/research/video-generation-models-as-world-simulators)中提到的，使用原始无损视频的分辨率、宽高比和视频长度进行训练可以增加采样灵活性，改善取景和构图。我们找到了三种实现这一目标的方法：\n- [NaViT](https://arxiv.org/abs/2307.06304)：通过不同掩码策略支持在同一训练批次内使用不同大小的数据，并且训练效率下降很少。然而，该系统实现起来有点复杂，并且可能无法兼容kernal优化技术（如flashattention）。\n- 填充（[FiT](https://arxiv.org/abs/2402.12376)，[Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan)）：通过填充支持同一批次内的不同大小的数据。然而，将不同的分辨率填充到相同的大小会导致效率降低。\n- 
分桶训练（[SDXL](https://arxiv.org/abs/2307.01952)、[PixArt](https://arxiv.org/abs/2310.00426)）：支持通过分桶的方式在不同批次中动态调整大小，但在同一批次内数据大小必须相同，只能应用固定数量的数据大小。在一个批次中，我们不需要实现复杂的掩码或填充。\n\n为了更便捷的实现，我们选择分桶训练的方式。我们预先定义了一些固定的分辨率，并将不同的样本分配到不同的桶中。下面列出了分桶方案中值得注意的点。但我们可以看到，这些在我们的实验中并不是一个大问题。\n\n<details>\n<summary>查看注意事项</summary>\n\n- 桶大小被限制为固定数量：首先，在实际应用中，通常只使用少数宽高比（9:16、3:4）和分辨率（240p、1080p）。其次，我们发现经过训练的模型可以很好地推广到未见过的解决方案。\n- 每批的大小相同，打破了独立同分布（i.i.d.）假设：由于我们使用多个 GPU，因此不同 GPU 上的本地批次具有不同的大小。我们没有发现此问题导致性能显着下降。\n- 可能没有足够的样本来填充每个桶，并且分布可能有偏差：首先，当本地批量大小不太大时，我们的数据集足够大以填充每个桶。其次，我们应该分析数据大小的分布并相应地定义桶大小。第三，分配不平衡并没有显着影响训练过程。\n- 不同的分辨率和帧长可能有不同的处理速度：与PixArt只处理相似分辨率（相似token数）的宽高比不同，我们需要考虑不同分辨率和帧长的处理速度。我们可以使用“bucket_config”来定义每个桶的批量大小，以确保处理速度相似。\n\n</details>\n\n![bucket](/assets/readme/report_bucket.png)\n\n如图所示，桶是（分辨率，帧数量，宽高比）的三元组。我们为不同的分辨率提供预定义的宽高比，涵盖了大多数常见的视频宽高比。在每个epoch之前，我们打乱数据集并将样本分配到不同的桶中，如图所示。我们将样本放入最大分辨率和帧长度小于视频的桶中。\n\n考虑到我们的计算资源有限，我们进一步为每个（分辨率，num_frame）二元组引入keep_prob和batch_size两个属性，以降低计算成本并实现多阶段训练。具体来说，高清视频将以概率1-keep_prob下采样到较低分辨率的桶中，并且每个桶的样本数量是由batch_size属性决定的。这样，我们可以控制不同桶中的样本数量，并通过为每个桶搜索合适的数据量来平衡GPU负载。\n\n有关训练中桶使用的详细说明，请参阅[配置文件](/docs/config.md#training-bucket-configs).\n\n## 使用Masked DiT作为图生视频/视频生视频模型\n\nTransformer可以很容易地扩展到支持图生图和视频生视频的任务。我们提出了一种蒙版策略来支持图像和视频的调节。蒙版策略如下图所示。\n\n![mask strategy](/assets/readme/report_mask.png)\n\n在将图像或视频转换成另一个视频的过程中，我们通常会选择出需要作为条件的帧并取消其掩码（unmask）。在使用ST-DiT模型进行前向传播时，被选择取消掩码（unmask）的帧将被赋予时间步长0，而其他帧则保持它们原有的时间步长t。我们发现，如果直接将这种策略应用到训练好的模型上，会得到较差的结果，因为扩散模型在训练过程中并未学会如何处理一个样本中具有不同时间步长的帧。\n\n受[UL2](https://arxiv.org/abs/2205.05131)的启发，我们在训练期间引入了随机掩码策略。具体来说，我们在训练期间随机取消掩码帧，包括取消掩码第一帧，前k帧，最后k帧，最后k帧，第一和最后k帧，随机帧等。基于Open-Sora 1.0模型，以50%的概率应用掩码策略，我们发现模型能够在10,000步的训练中学会处理图像条件（而30%的概率会导致处理能力变差），同时文本到视频的性能略有下降。因此，在Open-Sora 1.1版本中，我们从头开始预训练模型，并采用了掩码策略。\n\n下图给出了用于推理的掩码策略配置的说明。五数字元组在定义掩码策略方面提供了极大的灵活性。\n\n![mask strategy config](/assets/readme/report_mask_config.png)\n\n掩码策略用法的详细说明可在[配置文件](/docs/config.md#advanced-inference-config)中查看.\n\n\n## 
数据收集和流程\n\n正如我们在Sora1.0版本中看见的那样，数据数量和质量对于训练一个好的模型至关重要，因此，我们努力扩展数据集。首先，我们创建了一个遵循[SVD](https://arxiv.org/abs/2311.15127)的自动流水线，包括场景切割、字幕、各种评分和过滤以及数据集管理脚本和通用惯例。\n\n![pipeline](/assets/readme/report_data_pipeline.png)\n\n我们计划使用[panda-70M](https://snap-research.github.io/Panda-70M/)和其他数据来训练模型，大约包含3000万条数据。然而，我们发现磁盘输入输出（disk IO）在同时进行训练和数据处理时成为了一个瓶颈。因此，我们只能准备一个包含1000万条数据的数据集，并且没有完成我们构建的所有处理流程。最终，我们使用了包含970万视频和260万图像的数据集进行预训练，以及560,000视频和160万图像的数据集进行微调。预训练数据集的统计信息如下所示。\n\n图像文本标记 (使用T5分词器)：\n![image text tokens](/assets/readme/report_image_textlen.png)\n\n视频文本标记 (使用T5分词器)。我们直接使用Panda的短视频描述进行训练，并自己给其他数据集加视频描述。生成的字幕通常少于200个token。\n![video text tokens](/assets/readme/report_video_textlen.png)\n\n视频时长：\n![video duration](/assets/readme/report_video_duration.png)\n\n## 训练详情\n\n由于计算资源有限，我们必须仔细监控训练过程，并在推测模型学习不佳时更改训练策略，因为没有消融研究的计算。因此，Open-Sora1.1版本的训练包括多个更改，所以，指数移动平均（EMA）未被应用。\n\n1. 首先，我们从`Pixart-alpha-1024`的模型checkpoint开始，使用不同分辨率的图像进行了6000步的微调。我们发现模型能够很容易地适应并生成不同分辨率的图像。为了加快扩散过程的训练，我们使用了[SpeeDiT](https://github.com/1zeryu/SpeeDiT)（iddpm-speed）技术。\n2. **[阶段一]** 然后，我们使用梯度检查点（gradient-checkpointing）技术对模型进行了**24,000**步的预训练，这个过程在64个H800 GPU上运行了**4天**。尽管模型看到的数据样本数量相同，我们发现与使用较小批量大小相比，模型的学习速度较慢。我们推测，在训练的早期阶段，步数的数量对于训练更为重要。大多数视频的分辨率是**240p**，预训练时使用的配置与[stage2.py](/configs/opensora-v1-1/train/stage2.py)相似。\n3. **[阶段一]** 为了增加训练步数，我们改用了更小的批量大小，并且没有使用梯度检查点技术。在这个阶段，我们还引入了帧率（fps）条件。模型训练了**40,000**步，持续了**2天**。训练中使用的视频大多数是**144p**分辨率，使用的配置文件是[stage1.py](/configs/opensora-v1-1/train/stage1.py)。我们使用较低的分辨率，因为我们在Open-Sora 1.0版本中发现模型可以以相对较低的分辨率学习时间知识。\n4. **[阶段一]** 我们发现模型不能很好地学习长视频，并在Open-Sora1.0训练中发现了一个噪声生成结果，推测是半精度问题。因此，我们采用QK-归一化来稳定训练。我们还将iddpm-speed切换成iddpm。我们训练了**17k**步**14小时**。大多数视频的分辨率是144p，预训练时使用的配置是[stage1.py](/configs/opensora-v1-1/train/stage1.py)。阶段1训练持续约一周，总步长**81k**。\n5. **[阶段二]** 我们切换到更高的分辨率，其中大多数视频是**240p和480p**分辨率（[stage2.py](/configs/opensora-v1-1/train/stage2.py)）。我们在所有预训练数据上训练了**22000**步，持续**一天**。\n6. 
**[阶段三]** 我们切换到更高的分辨率，大多数视频的分辨率是**480p和720p**（[stage3.py](/configs/opensora-v1-1/train/stage3.py)）。我们在高质量数据上训练了**4000**步，用时**一天**。\n\n## 结果和评价\n\n## 不足和下一步计划\n\n随着我们离Sora的复现又近了一步，我们发现当前模型存在许多不足，我们将在下一阶段的工作中加以改进。\n\n- **生成结果有噪点**：我们发现生成的视频（尤其是长视频）有时噪点较多，不够流畅。我们认为问题在于没有使用时间VAE。由于[Pixart-Sigma](https://arxiv.org/abs/2403.04692)发现适应新VAE很容易，我们计划在下一个版本中为模型开发时间VAE。\n- **缺乏时间一致性**：我们发现模型无法生成具有高时间一致性的视频。我们认为问题在于训练FLOPs不足，我们计划收集更多数据并继续训练模型以提高时间一致性。\n- **人像生成质量低**：我们发现模型无法生成高质量的人物视频。我们认为问题在于缺乏人物数据，我们计划收集更多人物数据并继续训练模型，以提高人物生成质量。\n- **美学得分低**：我们发现模型的美学得分不高。问题在于缺少美学得分过滤：由于IO瓶颈，我们没有进行这一步骤。我们计划按美学得分过滤数据并微调模型，以提高美学得分。\n- **长视频生成质量低**：我们发现，使用同样的提示词，视频越长，质量越差。这说明图像质量未能同等地适应不同长度的序列。\n\n> - **算法与加速实现**：Zangwei Zheng, Xiangyu Peng, Shenggui Li, Hongxing Liu, Yukun Zhou\n> - **数据收集与处理**：Xiangyu Peng, Zangwei Zheng, Chenhui Shen, Tom Young, Junjie Wang, Chenfeng Yu\n"
  },
  {
    "path": "Open-Sora/docs/zh_CN/report_v3.md",
    "content": "# Open-Sora 1.2 报告\n\n- [视频压缩网络](#视频压缩网络)\n- [整流流和模型适应](#整流流和模型适应)\n- [更多数据和更好的多阶段训练](#更多数据和更好的多阶段训练)\n- [简单有效的模型调节](#简单有效的模型调节)\n- [评估](#评估)\n\n在 Open-Sora 1.2 版本中，我们在 >30M 数据上训练了 一个1.1B 的模型，支持 0s~16s、144p 到 720p、各种宽高比的视频生成。我们的配置如下所列。继 1.1 版本之后，Open-Sora 1.2 还可以进行图像到视频的生成和视频扩展。\n\n|      | 图像 | 2秒  | 4秒  | 8秒  | 16秒 |\n| ---- | ----- | --- | --- | --- | --- |\n| 240p | ✅     | ✅   | ✅   | ✅   | ✅   |\n| 360p | ✅     | ✅   | ✅   | ✅   | ✅   |\n| 480p | ✅     | ✅   | ✅   | ✅   | 🆗   |\n| 720p | ✅     | ✅   | ✅   | 🆗   | 🆗   |\n\n这里✅表示在训练期间可以看到数据，🆗表示虽然没有经过训练，但模型可以在该配置下进行推理。🆗的推理需要多个80G内存的GPU和序列并行。\n\n除了 Open-Sora 1.1 中引入的功能外，Open-Sora 1.2 还有以下重磅更新：\n\n- 视频压缩网络\n- 整流流训练\n- 更多数据和更好的多阶段训练\n- 简单有效的模型调节\n- 更好的评估指标\n\n上述改进的所有实现（包括训练和推理）均可在 Open-Sora 1.2 版本中使用。以下部分将介绍改进的细节。我们还改进了代码库和文档，使其更易于使用。\n\n## 视频压缩网络\n\n对于 Open-Sora 1.0 & 1.1，我们使用了 stable-ai 的 83M 2D VAE，它仅在空间维度上压缩，将视频压缩 8x8 倍。为了减少时间维度，我们每三帧提取一帧。然而，这种方法导致生成的视频流畅度较低，因为牺牲了生成的帧率（fps）。因此，在这个版本中，我们引入了像 OpenAI 的 Sora 一样的视频压缩网络。该网络在时域上将视频大小压缩至四分之一，因此，我们不必再额外抽帧，而可以使用原有帧率生成模型。\n\n考虑到训练 3D VAE 的计算成本很高，我们希望重新利用在 2D VAE 中学到的知识。我们注意到，经过 2D VAE 压缩后，时间维度上相邻的特征仍然高度相关。因此，我们提出了一个简单的视频压缩网络，首先将视频在空间维度上压缩 8x8 倍，然后将视频在时间维度上压缩 4 倍。网络如下所示：\n\n![video_compression_network](/assets/readme/report_3d_vae.png)\n\n我们用[SDXL 的 VAE](https://huggingface.co/stabilityai/sdxl-vae)初始化 2D VAE ，它比我们以前使用的更好。对于 3D VAE，我们采用[Magvit-v2](https://magvit.cs.cmu.edu/v2/)中的 VAE 结构，它包含 300M 个参数。加上 83M 的 2D VAE，视频压缩网络的总参数为 384M。我们设定batch size 为 1， 对 3D VAE 进行了 1.2M 步的训练。训练数据是来自 pixels 和 pixabay 的视频，训练视频大小主要是 17 帧，256x256 分辨率。3D VAE 中使用causal convolotions使图像重建更加准确。\n\n我们的训练包括三个阶段：\n\n1. 对于前 380k 步，我们冻结 2D VAE并在 8 个 GPU 上进行训练。训练目标包括重建 2D VAE 的压缩特征（图中粉红色），并添加损失以使 3D VAE 的特征与 2D VAE 的特征相似（粉红色和绿色，称为identity loss）。我们发现后者的损失可以快速使整个 VAE 在图像上取得良好的性能，并在下一阶段更快地收敛。\n2. 对于接下来的 260k 步，我们消除identity loss并仅学习 3D VAE。\n3. 
对于最后 540k 步，由于我们发现仅重建 2D VAE 的特征无法带来进一步的改进，因此我们移除了loss并训练整个 VAE 来重建原始视频。此阶段在 24 个 GPU 上进行训练。\n\n对于训练的前半部分，我们采用 20% 的图像和 80% 的视频。按照[Magvit-v2](https://magvit.cs.cmu.edu/v2/)，我们使用 17 帧训练视频，同时对图像的前 16 帧进行零填充。然而，我们发现这种设置会导致长度不同于 17 帧的视频变得模糊。因此，在第 3 阶段，我们使用不超过34帧长度的任意帧长度视频进行混合视频长度训练,以使我们的 VAE 对不同视频长度更具鲁棒性（也就是说，如果我们希望训练含有n帧的视频，我们就把原视频中`34-n`帧用0进行填充）。我们的 [训练](/scripts/train_vae.py)和[推理](/scripts/inference_vae.py)代码可在 Open-Sora 1.2 版本中找到。\n\n当使用 VAE 进行扩散模型时，我们的堆叠 VAE 所需的内存较少，因为我们的 VAE 的输入已经经过压缩。我们还将输入视频拆分为几个 17 帧剪辑，以提高推理效率。我们的 VAE 与[Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/docs/Report-v1.1.0.md)中的另一个开源 3D VAE 性能相当。\n\n| 模型          | 结构相似性↑ | 峰值信噪比↑  |\n| ------------------ | ----- | ------ |\n| Open-Sora-Plan 1.1 | 0.882 | 29.890 |\n| Open-Sora 1.2      | 0.880 | 30.590 |\n\n## 整流流和模型适应\n\n最新的扩散模型 Stable Diffusion 3 为了获得更好的性能，采用了[rectified flow](https://github.com/gnobitab/RectifiedFlow)替代了 DDPM。可惜 SD3 的 rectified flow 训练代码没有开源。不过 Open-Sora 1.2 提供了遵循 SD3 论文的训练代码，包括：\n\n- 基本整流流训练\n- 用于训练加速的 Logit-norm 采样\n- 分辨率和视频长度感知时间步长采样\n\n对于分辨率感知的时间步长采样，我们应该对分辨率较大的图像使用更多的噪声。我们将这个想法扩展到视频生成，对长度较长的视频使用更多的噪声。\n\nOpen-Sora 1.2 从[PixArt-Σ 2K](https://github.com/PixArt-alpha/PixArt-sigma) 模型checkpoint开始。请注意，此模型使用 DDPM 和 SDXL VAE 进行训练，分辨率也高得多。我们发现在小数据集上进行微调可以轻松地使模型适应我们的视频生成设置。适应过程如下，所有训练都在 8 个 GPU 上完成：\n\n1. 多分辨率图像生成能力：我们训练模型以 20k 步生成从 144p 到 2K 的不同分辨率。\n2. QK-norm：我们将 QK-norm 添加到模型中并训练 18k 步。\n3. 整流流：我们从离散时间 DDPM 转变为连续时间整流流并训练 10k 步。\n4. 使用 logit-norm 采样和分辨率感知时间步采样的整流流：我们训练 33k 步。\n5. 较小的 AdamW epsilon：按照 SD3，使用 QK-norm，我们可以对 AdamW 使用较小的 epsilon（1e-15），我们训练 8k 步。\n6. 新的 VAE 和 fps 调节：我们用自己的 VAE 替换原来的 VAE，并将 fps 调节添加到时间步调节中，我们训练 25k 步。请注意，对每个通道进行规范化对于整流流训练非常重要。\n7. 时间注意力模块：我们添加时间注意力模块，其中没有初始化投影层。我们在图像上进行 3k 步训练。\n8. 
仅针对具有掩码策略的视频的时间块：我们仅在视频上训练时间注意力块，步长为 38k。\n\n经过上述调整后，我们就可以开始在视频上训练模型了。上述调整保留了原始模型生成高质量图像的能力，并未后续的视频生成提供了许多助力：\n\n- 通过整流，我们可以加速训练，将视频的采样步数从100步减少到30步，大大减少了推理的等待时间。\n- 使用 qk-norm，训练更加稳定，并且可以使用积极的优化器。\n- 采用新的VAE，时间维度压缩了4倍，使得训练更加高效。\n- 该模型具有多分辨率图像生成能力，可以生成不同分辨率的视频。\n\n## 更多数据和更好的多阶段训练\n\n由于计算预算有限，我们精心安排了训练数据的质量从低到高，并将训练分为三个阶段。我们的训练涉及 12x8 GPU，总训练时间约为 2 周， 约70k步。\n\n### 第一阶段\n\n我们首先在 Webvid-10M 数据集（40k 小时）上训练模型，共 30k 步（2 个 epoch）。由于视频分辨率均低于 360p 且包含水印，因此我们首先在此数据集上进行训练。训练主要在 240p 和 360p 上进行，视频长度为 2s~16s。我们使用数据集中的原始字幕进行训练。训练配置位于[stage1.py](/configs/opensora-v1-2/train/stage1.py)中。\n\n### 第二阶段\n\n然后我们在 Panda-70M 数据集上训练模型。这个数据集很大，但质量参差不齐。我们使用官方的 30M 子集，其中的片段更加多样化，并过滤掉美学评分低于 4.5 的视频。这产生了一个 20M 子集，包含 41k 小时。数据集中的字幕直接用于我们的训练。训练配置位于[stage2.py](/configs/opensora-v1-2/train/stage2.py)中。\n\n训练主要在 360p 和 480p 上进行。我们训练模型 23k 步，即 0.5 个 epoch。训练尚未完成，因为我们希望我们的新模型能早日与大家见面。\n\n### 第三阶段\n\n在此阶段，我们从各种来源收集了 200 万个视频片段，总时长 5000 小时，其中包括：\n\n- 来自 Pexels、Pixabay、Mixkit 等的免费授权视频。\n- [MiraData](https://github.com/mira-space/MiraData)：一个包含长视频的高质量数据集，主要来自游戏和城市/风景探索。\n- [Vript](https://github.com/mutonix/Vript/tree/main)：一个密集注释的数据集。\n- 还有一些其他数据集。\n\nMiraData 和 Vript 有来自 GPT 的字幕，而我们使用[PLLaVA](https://github.com/magic-research/PLLaVA)为其余字幕添加字幕。与只能进行单帧/图像字幕的 LLaVA 相比，PLLaVA 是专门为视频字幕设计和训练的。[加速版PLLaVA](/tools/caption/README.md#pllava-captioning)已在我们的`tools/`中发布。在实践中，我们使用预训练的 PLLaVA 13B 模型，并从每个视频中选择 4 帧生成字幕，空间池化形状为 2*2。\n\n下面显示了此阶段使用的视频数据的一些统计数据。我们提供了持续时间和分辨率的基本统计数据，以及美学分数和光流分数分布。我们还从视频字幕中提取了对象和动作的标签并计算了它们的频率。\n![stats](/assets/readme/report-03_video_stats.png)\n![object_count](/assets/readme/report-03_objects_count.png)\n![object_count](/assets/readme/report-03_actions_count.png)\n\n此阶段我们主要在 720p 和 1080p 上进行训练，以提高模型在高清视频上的表现力。在训练中，我们使用的掩码率为25%。训练配置位于[stage3.py](/configs/opensora-v1-2/train/stage3.py)中。我们对模型进行 15k 步训练，大约为 2 个 epoch。\n\n## 简单有效的模型调节\n\n对于第 3 阶段，我们计算每个视频片段的美学分数和运动分数。但是，由于视频片段数量较少，我们不愿意过滤掉得分较低的片段，这会导致数据集较小。相反，我们将分数附加到字幕中并将其用作条件。我们发现这种方法可以让模型了解分数并遵循分数来生成质量更好的视频。\n\n例如，一段美学评分为 5.5、运动评分为 10 
且检测到镜头运动为向左平移的视频，其字幕将为：\n\n```plaintext\n[Original Caption] aesthetic score: 5.5, motion score: 10, camera motion: pan left.\n```\n\n在推理过程中，我们还可以使用分数来调节模型。对于镜头运动，我们仅标记了 13k 个具有高置信度的片段，并且镜头运动检测模块已在我们的工具中发布。\n\n## 评估\n\n之前，我们仅通过人工评估来监控训练过程，因为 DDPM 训练损失与生成的视频质量没有很好的相关性。但是，对于整流流，如 SD3 中所述，我们发现训练损失与生成的视频质量有很好的相关性。因此，我们跟踪了 100 张图像和 1k 个视频的整流流评估损失。\n\n我们从 Pixabay 中抽样了 1k 个视频作为验证数据集。我们计算了不同分辨率（144p、240p、360p、480p、720p）下图像和不同长度的视频（2s、4s、8s、16s）的评估损失。对于每个设置，我们等距采样 10 个时间步，然后对所有损失取平均值。\n\n![Evaluation Loss](/assets/readme/report_val_loss.png)\n![Video Evaluation Loss](/assets/readme/report_vid_val_loss.png)\n\n此外，我们还会在训练过程中跟踪[VBench](https://vchitect.github.io/VBench-project/)得分。VBench 是用于短视频生成的自动视频评估基准。我们用 240p 2s 视频计算 VBench 得分。这两个指标验证了我们的模型在训练过程中持续改进。\n\n![VBench](/assets/readme/report_vbench_score.png)\n\n所有评估代码均发布在`eval`文件夹中。查看[评估指南](/eval/README.md)了解更多详细信息。\n\n|模型        | 总得分 | 质量得分 | 语义得分 |\n| -------------- | ----------- | ------------- | -------------- |\n| Open-Sora V1.0 | 75.91%      | 78.81%        | 64.28%         |\n| Open-Sora V1.2 | 79.23%      | 80.71%        | 73.30%         |\n\n## 序列并行\n\n我们使用序列并行来支持长序列训练和推理。我们的实现基于Ulysses，工作流程如下所示。启用序列并行后，我们只需要将 `all-to-all` 通信应用于STDiT中的空间模块（spatial block），因为在序列维度上，只有对空间信息的计算是相互依赖的。\n\n![SP](/assets/readme/sequence_parallelism.jpeg)\n\n目前，由于训练数据分辨率较小，我们尚未使用序列并行进行训练，我们计划在下一个版本中使用。至于推理，当您的 GPU 内存不足时，可以使用序列并行。下表显示，序列并行可以实现加速：\n\n| 分辨率 | 时长 | GPU数量 | 是否启用序列并行 |用时（秒） | 加速效果/GPU |\n| ---------- | ------- | -------------- | --------- | ------------ | --------------- |\n| 720p       | 16秒     | 1              | 否        | 547.97       | -               |\n| 720p       | 16秒     | 2              | 是        | 244.38       | 12%             |\n\n"
  },
  {
    "path": "Open-Sora/docs/zh_CN/structure.md",
    "content": "# 代码仓库和配置文件结构\n\n## 代码仓库结构\n\n```plaintext\nOpen-Sora\n├── README.md\n├── docs\n│   ├── acceleration.md            -> Acceleration & Speed benchmark\n│   ├── commands.md                -> Commands for training & inference\n│   ├── datasets.md                -> Datasets used in this project\n│   ├── structure.md               -> This file\n│   └── report_v1.md               -> Report for Open-Sora v1\n├── scripts\n│   ├── train.py                   -> diffusion training script\n│   └── inference.py               -> diffusion inference script\n├── configs                        -> Configs for training & inference\n├── opensora\n│   ├── __init__.py\n│   ├── registry.py                -> Registry helper\n│   ├── acceleration               -> Acceleration related code\n│   ├── dataset                    -> Dataset related code\n│   ├── models\n│   │   ├── layers                 -> Common layers\n│   │   ├── vae                    -> VAE as image encoder\n│   │   ├── text_encoder           -> Text encoder\n│   │   │   ├── classes.py         -> Class id encoder (inference only)\n│   │   │   ├── clip.py            -> CLIP encoder\n│   │   │   └── t5.py              -> T5 encoder\n│   │   ├── dit\n│   │   ├── latte\n│   │   ├── pixart\n│   │   └── stdit                  -> Our STDiT related code\n│   ├── schedulers                 -> Diffusion schedulers\n│   │   ├── iddpm                  -> IDDPM for training and inference\n│   │   └── dpms                   -> DPM-Solver for fast inference\n│   └── utils\n└── tools                          -> Tools for data processing and more\n```\n\n## 配置文件结构\n\n\n我们的配置文件遵循 [MMEngine](https://github.com/open-mmlab/mmengine) 的格式。MMEngine 将读取配置文件（`.py` 文件）并将其解析为类似字典的对象。\n\n```plaintext\nOpen-Sora\n└── configs                        -> Configs for training & inference\n    ├── opensora                   -> STDiT related configs\n    │   ├── inference\n    │   │   ├── 16x256x256.py      -> Sample videos 16 frames 256x256\n    │   │  
 ├── 16x512x512.py      -> Sample videos 16 frames 512x512\n    │   │   └── 64x512x512.py      -> Sample videos 64 frames 512x512\n    │   └── train\n    │       ├── 16x256x256.py      -> Train on videos 16 frames 256x256\n    │       ├── 16x512x512.py      -> Train on videos 16 frames 512x512\n    │       └── 64x512x512.py      -> Train on videos 64 frames 512x512\n    ├── dit                        -> DiT related configs\n    │   ├── inference\n    │   │   ├── 1x256x256-class.py -> Sample images with ckpts from DiT\n    │   │   ├── 1x256x256.py       -> Sample images with clip condition\n    │   │   └── 16x256x256.py      -> Sample videos\n    │   └── train\n    │       ├── 1x256x256.py       -> Train on images with clip condition\n    │       └── 16x256x256.py      -> Train on videos\n    ├── latte                      -> Latte related configs\n    └── pixart                     -> PixArt related configs\n```\n\n## Inference config demos\n\nTo change the inference settings, you can modify the corresponding config file directly. Alternatively, you can pass command-line arguments to override the config file ([config_utils.py](/opensora/utils/config_utils.py)). To change the sampling prompts, modify the `.txt` file passed to the `--prompt-path` argument.\n\n```plaintext\n--prompt-path ./assets/texts/t2v_samples.txt  -> prompt_path\n--ckpt-path ./path/to/your/ckpt.pth           -> model[\"from_pretrained\"]\n```\n\nEach field is explained below.\n\n```python\n# Define sampling size\nnum_frames = 64               # number of frames\nfps = 24 // 2                 # frames per second (divided by 2 for frame_interval=2)\nimage_size = (512, 512)       # image size (height, width)\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",        # Select model type (STDiT-XL/2, DiT-XL/2, etc.)\n    space_scale=1.0,          # (Optional) Space positional encoding scale (new height / old height)\n    time_scale=2 / 3,         # (Optional) Time positional encoding scale (new frame_interval / old frame_interval)\n    enable_flash_attn=True,    # (Optional) Speed up training and inference with flash attention\n    enable_layernorm_kernel=True, # (Optional) Speed up training and inference 
with fused kernel\n    from_pretrained=\"PRETRAINED_MODEL\",  # (Optional) Load from pretrained model\n    no_temporal_pos_emb=True,  # (Optional) Disable temporal positional encoding (for image)\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\", # Select VAE type\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\", # Load from pretrained VAE\n    micro_batch_size=128,      # VAE with micro batch size to save memory\n)\ntext_encoder = dict(\n    type=\"t5\",                 # Select text encoder type (t5, clip)\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\", # Load from pretrained text encoder\n    model_max_length=120,      # Maximum length of input text\n)\nscheduler = dict(\n    type=\"iddpm\",              # Select scheduler type (iddpm, dpm-solver)\n    num_sampling_steps=100,    # Number of sampling steps\n    cfg_scale=7.0,             # classifier-free guidance scale\n)\ndtype = \"fp16\"                 # Computation type (fp16, fp32, bf16)\n\n# Other settings\nbatch_size = 1                 # batch size\nseed = 42                      # random seed\nprompt_path = \"./assets/texts/t2v_samples.txt\"  # path to prompt file\nsave_dir = \"./samples\"         # path to save samples\n```\n\n## Training config demos\n\n```python\n# Define sampling size\nnum_frames = 64\nframe_interval = 2             # sample every 2 frames\nimage_size = (512, 512)\n\n# Define dataset\nroot = None                    # root path to the dataset\ndata_path = \"CSV_PATH\"         # path to the csv file\nuse_image_transform = False    # True if training on images\nnum_workers = 4                # number of workers for dataloader\n\n# Define acceleration\ndtype = \"bf16\"                 # Computation type (fp16, bf16)\ngrad_checkpoint = True         # Use gradient checkpointing\nplugin = \"zero2\"               # Plugin for distributed training (zero2, zero2-seq)\nsp_size = 1                    # Sequence parallelism size (1 for no sequence 
parallelism)\n\n# Define model\nmodel = dict(\n    type=\"STDiT-XL/2\",\n    space_scale=1.0,\n    time_scale=2 / 3,\n    from_pretrained=\"YOUR_PRETRAINED_MODEL\",\n    enable_flash_attn=True,        # Enable flash attention\n    enable_layernorm_kernel=True, # Enable layernorm kernel\n)\nvae = dict(\n    type=\"VideoAutoencoderKL\",\n    from_pretrained=\"stabilityai/sd-vae-ft-ema\",\n    micro_batch_size=128,\n)\ntext_encoder = dict(\n    type=\"t5\",\n    from_pretrained=\"/root/autodl-tmp/pretrained_models/DeepFloyd/t5-v1_1-xxl\",\n    model_max_length=120,\n    shardformer=True,           # Enable shardformer for T5 acceleration\n)\nscheduler = dict(\n    type=\"iddpm\",\n    timestep_respacing=\"\",      # Default 1000 timesteps\n)\n\n# Others\nseed = 42\noutputs = \"outputs\"             # path to save checkpoints\nwandb = False                   # Use wandb for logging\n\nepochs = 1000                   # number of epochs (just large enough, kill when satisfied)\nlog_every = 10\nckpt_every = 250\nload = None                     # path to resume training\n\nbatch_size = 4\nlr = 2e-5\ngrad_clip = 1.0                 # gradient clipping\n```\n"
  },
  {
    "path": "Open-Sora/docs/zh_CN/vae.md",
    "content": "# VAE 技术报告\n\n由于 [Pixart-Sigma](https://arxiv.org/abs/2403.04692) 论文中指出适应新的VAE很简单，因此我们开发了一个额外的时间VAE。\n具体而言, 我们的VAE由一个[空间 VAE](https://huggingface.co/PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers)和一个时间VA相接的形式组成.\n对于时间VAE，我们遵循 [MAGVIT-v2](https://arxiv.org/abs/2310.05737)的实现, 并做了以下修改:\n\n* 我们删除了码本特有的架构。\n* 我们不使用鉴别​​器（discriminator），而是使用VAE重建损失、kl损失和感知损失进行训练。\n* 在编码器的最后一个线性层中，我们缩小到 4 通道的对角高斯分布，遵循我们之前训练的接受 4 通道输入的 STDiT。\n* 我们的解码器与编码器架构对称。\n\n## 训练\n我们分不同阶段训练模型。\n\n我们首先通过在单台机器（8 个 GPU）上冻结空间 VAE 380k 步来训练时间 VAE。我们使用额外的身份损失使 3D VAE 的特征与 2D VAE 的特征相似。我们使用 20% 的图像和 80% 的视频（17 帧）来训练 VAE。\n\n```bash\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage1.py --data-path YOUR_CSV_PATH\n```\n\n接下来，我们移除身份损失并训练 3D VAE 管道以重建 260k 步的 2D 压缩视频。\n\n```bash\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage2.py --data-path YOUR_CSV_PATH\n```\n\n最后，我们移除了 2D 压缩视频的重建损失，并训练 VAE 管道以构建 540k 步的 3D 视频。我们在 34 帧内使用随机数训练 VAE，使其对不同长度的视频更具鲁棒性。此阶段在 24 个 GPU 上进行训练。\n\n```bash\ntorchrun --nnodes=3 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage3.py --data-path YOUR_CSV_PATH\n```\n\n请注意，您需要根据自己的 csv 数据大小相应地调整配置文件中的 `epochs` 。\n\n## 推理\n\n为了直观地检查 VAE 的性能，您可以运行以下推理。它使用 `_ori` 后缀（即 `\"YOUR_VIDEO_DIR\"_ori`）将原始视频保存到您指定的视频目录中，使用`_rec`后缀（即`\"YOUR_VIDEO_DIR\"_rec`）将来自完整管道的重建视频保存到指定的视频目录中，并使用 `_spatial`后缀（即`\"YOUR_VIDEO_DIR\"_spatial`）将来自 2D 压缩和解压缩的重建视频保存到指定的视频目录中。\n\n```bash\ntorchrun --standalone --nnodes=1 --nproc_per_node=1 scripts/inference_vae.py configs/vae/inference/video.py --ckpt-path YOUR_VAE_CKPT_PATH --data-path YOUR_CSV_PATH --save-dir YOUR_VIDEO_DIR\n```\n## 评估\n然后，我们可以计算 VAE 在 SSIM、PSNR、LPIPS 和 FLOLPIPS 指标上的表现得分。\n\n* SSIM: 结构相似性指数度量，越高越好\n* PSNR: 峰值信噪比，越高越好\n* LPIPS: 学习感知图像质量下降，越低越好\n* [FloLPIPS](https://arxiv.org/pdf/2207.08119): 带有视频插值的LPIPS，越低越好。\n\n```bash\npython eval/vae/eval_common_metric.py --batch_size 2 --real_video_dir YOUR_VIDEO_DIR_ori --generated_video_dir YOUR_VIDEO_DIR_rec --device 
cuda --sample_fps 24 --crop_size 256 --resolution 256 --num_frames 17 --sample_rate 1 --metric ssim psnr lpips flolpips\n```\n\n## Acknowledgement\nWe are grateful for the following work:\n* [MAGVIT-v2](https://arxiv.org/abs/2310.05737): Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation\n* [Taming Transformers](https://github.com/CompVis/taming-transformers): Taming Transformers for High-Resolution Image Synthesis\n* [3D blur pooling](https://github.com/adobe/antialiased-cnns/pull/39/commits/3d6f02b6943c58b68c19c07bc26fad57492ff3bc)\n* [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan)\n"
  },
  {
    "path": "Open-Sora/environment-opensora.yml",
    "content": "name: opensora\nchannels:\n  - defaults\ndependencies:\n  - _libgcc_mutex=0.1=main\n  - _openmp_mutex=5.1=1_gnu\n  - ca-certificates=2024.7.2=h06a4308_0\n  - ld_impl_linux-64=2.38=h1181459_1\n  - libffi=3.4.4=h6a678d5_1\n  - libgcc-ng=11.2.0=h1234567_1\n  - libgomp=11.2.0=h1234567_1\n  - libstdcxx-ng=11.2.0=h1234567_1\n  - ncurses=6.4=h6a678d5_0\n  - openssl=3.0.15=h5eee18b_0\n  - pip=24.2=py39h06a4308_0\n  - python=3.9.19=h955ad1f_1\n  - readline=8.2=h5eee18b_0\n  - setuptools=72.1.0=py39h06a4308_0\n  - sqlite=3.45.3=h5eee18b_0\n  - tk=8.6.14=h39e8969_0\n  - wheel=0.43.0=py39h06a4308_0\n  - xz=5.4.6=h5eee18b_1\n  - zlib=1.2.13=h5eee18b_1\n  - pip:\n      - absl-py==2.1.0\n      - accelerate==0.29.2\n      - addict==2.4.0\n      - aiofiles==23.2.1\n      - aiosignal==1.3.1\n      - altair==5.4.1\n      - annotated-types==0.7.0\n      - antlr4-python3-runtime==4.9.3\n      - anyio==4.4.0\n      - apex==0.1\n      - asttokens==2.4.1\n      - attrs==24.2.0\n      - av==13.0.0\n      - bcrypt==4.2.0\n      - beartype==0.18.5\n      - beautifulsoup4==4.12.3\n      - bitsandbytes==0.43.3\n      - black==24.8.0\n      - boto3==1.35.20\n      - botocore==1.35.20\n      - calflops==0.3.2\n      - certifi==2024.8.30\n      - cffi==1.17.1\n      - cfgv==3.4.0\n      - charset-normalizer==3.3.2\n      - click==8.1.7\n      - cloudpickle==3.0.0\n      - colossalai==0.4.0\n      - comm==0.2.2\n      - contexttimer==0.3.3\n      - contourpy==1.3.0\n      - cryptography==43.0.1\n      - cycler==0.12.1\n      - cython==3.0.11\n      - debugpy==1.8.5\n      - decorator==5.1.1\n      - decord==0.6.0\n      - deprecated==1.2.14\n      - detectron2==0.6\n      - diffusers==0.27.2\n      - dill==0.3.8\n      - distlib==0.3.8\n      - distro==1.9.0\n      - docker-pycreds==0.4.0\n      - easydict==1.13\n      - einops==0.8.0\n      - exceptiongroup==1.2.2\n      - executing==2.1.0\n      - fabric==3.2.2\n      - facexlib==0.3.0\n      - fairscale==0.4.13\n      - 
fastapi==0.114.0\n      - ffmpy==0.4.0\n      - filelock==3.16.0\n      - filterpy==1.4.5\n      - flash-attn==2.6.3\n      - fonttools==4.53.1\n      - frozenlist==1.4.1\n      - fsspec==2024.9.0\n      - ftfy==6.2.3\n      - future==1.0.0\n      - fvcore==0.1.5.post20221221\n      - galore-torch==1.0\n      - gitdb==4.0.11\n      - gitpython==3.1.43\n      - google==3.0.0\n      - gradio==4.26.0\n      - gradio-client==0.15.1\n      - grpcio==1.66.1\n      - h11==0.14.0\n      - httpcore==1.0.5\n      - httpx==0.27.2\n      - huggingface-hub==0.24.6\n      - hydra-core==1.3.2\n      - identify==2.6.0\n      - idna==3.8\n      - imageio==2.35.1\n      - imgaug==0.4.0\n      - importlib-metadata==8.4.0\n      - importlib-resources==6.4.5\n      - invoke==2.2.0\n      - iopath==0.1.9\n      - ipykernel==6.29.5\n      - ipython==8.18.1\n      - ipywidgets==8.1.5\n      - jedi==0.19.1\n      - jinja2==3.1.4\n      - jiter==0.5.0\n      - jmespath==1.0.1\n      - joblib==1.4.2\n      - jsonschema==4.23.0\n      - jsonschema-specifications==2023.12.1\n      - jupyter-client==8.6.2\n      - jupyter-core==5.7.2\n      - jupyterlab-widgets==3.0.13\n      - kiwisolver==1.4.7\n      - lazy-loader==0.4\n      - llvmlite==0.43.0\n      - lmdb==1.5.1\n      - lpips==0.1.4\n      - lvis==0.5.3\n      - markdown==3.7\n      - markdown-it-py==3.0.0\n      - markupsafe==2.1.5\n      - matplotlib==3.9.2\n      - matplotlib-inline==0.1.7\n      - mdurl==0.1.2\n      - mmengine==0.10.4\n      - mpmath==1.3.0\n      - msgpack==1.1.0\n      - mypy-extensions==1.0.0\n      - narwhals==1.8.1\n      - nest-asyncio==1.6.0\n      - networkx==3.2.1\n      - ninja==1.11.1.1\n      - nodeenv==1.9.1\n      - numba==0.60.0\n      - numpy==1.26.4\n      - nvidia-cublas-cu12==12.1.3.1\n      - nvidia-cuda-cupti-cu12==12.1.105\n      - nvidia-cuda-nvrtc-cu12==12.1.105\n      - nvidia-cuda-runtime-cu12==12.1.105\n      - nvidia-cudnn-cu12==8.9.2.26\n      - nvidia-cufft-cu12==11.0.2.54\n      - 
nvidia-curand-cu12==10.3.2.106\n      - nvidia-cusolver-cu12==11.4.5.107\n      - nvidia-cusparse-cu12==12.1.0.106\n      - nvidia-nccl-cu12==2.19.3\n      - nvidia-nvjitlink-cu12==12.6.68\n      - nvidia-nvtx-cu12==12.1.105\n      - omegaconf==2.3.0\n      - openai==1.44.1\n      - openai-clip==1.0.1\n      - opencv-python==4.10.0.84\n      - opensora==1.2.0\n      - orjson==3.10.7\n      - packaging==24.1\n      - pandarallel==1.6.5\n      - pandas==2.2.2\n      - parameterized==0.9.0\n      - paramiko==3.4.1\n      - parso==0.8.4\n      - pathspec==0.12.1\n      - peft==0.12.0\n      - pexpect==4.9.0\n      - pillow==10.4.0\n      - platformdirs==4.3.2\n      - plumbum==1.8.3\n      - portalocker==2.10.1\n      - pre-commit==3.8.0\n      - prompt-toolkit==3.0.47\n      - protobuf==5.28.0\n      - psutil==5.9.8\n      - ptyprocess==0.7.0\n      - pure-eval==0.2.3\n      - pyarrow==17.0.0\n      - pycocotools==2.0.8\n      - pycparser==2.22\n      - pydantic==2.9.1\n      - pydantic-core==2.23.3\n      - pydub==0.25.1\n      - pygments==2.18.0\n      - pyiqa==0.1.10\n      - pynacl==1.5.0\n      - pyparsing==3.1.4\n      - python-dateutil==2.9.0.post0\n      - python-multipart==0.0.9\n      - pytorchvideo==0.1.5\n      - pytz==2024.1\n      - pyyaml==6.0.2\n      - pyzmq==26.2.0\n      - ray==2.35.0\n      - referencing==0.35.1\n      - regex==2024.7.24\n      - requests==2.32.3\n      - rich==13.8.1\n      - rotary-embedding-torch==0.5.3\n      - rpds-py==0.20.0\n      - rpyc==6.0.0\n      - ruff==0.6.4\n      - s3transfer==0.10.2\n      - safetensors==0.4.5\n      - scikit-image==0.24.0\n      - scikit-learn==1.5.2\n      - scipy==1.13.1\n      - semantic-version==2.10.0\n      - sentencepiece==0.2.0\n      - sentry-sdk==2.14.0\n      - setproctitle==1.3.3\n      - shapely==2.0.6\n      - shellingham==1.5.4\n      - six==1.16.0\n      - smmap==5.0.1\n      - sniffio==1.3.1\n      - soupsieve==2.6\n      - spaces==0.30.2\n      - stack-data==0.6.3\n      - 
starlette==0.38.5\n      - sympy==1.13.2\n      - tabulate==0.9.0\n      - tensorboard==2.17.1\n      - tensorboard-data-server==0.7.2\n      - termcolor==2.4.0\n      - threadpoolctl==3.5.0\n      - tifffile==2024.8.30\n      - timm==0.9.16\n      - tokenizers==0.15.2\n      - tomli==2.0.1\n      - tomlkit==0.12.0\n      - torch==2.2.2\n      - torchvision==0.17.2\n      - tornado==6.4.1\n      - tqdm==4.66.5\n      - traitlets==5.14.3\n      - transformers==4.39.3\n      - triton==2.2.0\n      - typer==0.12.5\n      - typing-extensions==4.12.2\n      - tzdata==2024.1\n      - urllib3==1.26.20\n      - uvicorn==0.29.0\n      - virtualenv==20.26.4\n      - wandb==0.17.9\n      - wcwidth==0.2.13\n      - websockets==11.0.3\n      - werkzeug==3.0.4\n      - widgetsnbextension==4.0.13\n      - wrapt==1.16.0\n      - xformers==0.0.25.post1\n      - yacs==0.1.8\n      - yapf==0.40.2\n      - zipp==3.20.1\nprefix: /root/miniconda3/envs/opensora\n"
  },
  {
    "path": "Open-Sora/eval/README.md",
    "content": "# Evalution\n\n## Human evaluation\n\nTo conduct human evaluation, we need to generate various samples. We provide many prompts in `assets/texts`, and defined some test setting covering different resolution, duration and aspect ratio in `eval/sample.sh`. To facilitate the usage of multiple GPUs, we split sampling tasks into several parts.\n\n```bash\n# image (1)\nbash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -1\n# video (2a 2b 2c ...)\nbash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -2a\n# launch 8 jobs at once (you must read the script to understand the details)\nbash eval/human_eval/launch.sh /path/to/ckpt num_frames model_name_for_log\n```\n\n## Rectified Flow Loss\n\nEvaluate the rectified flow loss with the following commands.\n\n```bash\n# image\ntorchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/img.csv --ckpt-path /path/to/ckpt\n\n# video\ntorchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/vid.csv --ckpt-path /path/to/ckpt\n\n# select resolution\ntorchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/vid.csv --ckpt-path /path/to/ckpt --resolution 720p\n```\n\nTo launch multiple jobs at once, use the following script.\n\n```bash\nbash eval/loss/launch.sh /path/to/ckpt model_name\n```\n\nTo obtain an organized list of scores:\n```bash\npython eval/loss/tabulate_rl_loss.py --log_dir path/to/log/dir\n```\n\n## VBench\n\n[VBench](https://github.com/Vchitect/VBench) is a benchmark for short text to video generation. 
We provide a script for easily generating the samples required by VBench.\n\nFirst, generate the relevant videos with the following commands:\n\n```bash\n# vbench task; to evaluate all prompts, set start_index to 0 and end_index to 2000\nbash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -4 start_index end_index\n\n# Alternatively, launch 8 jobs at once (you must read the script to understand the details)\nbash eval/vbench/launch.sh /path/to/ckpt num_frames model_name\n\n# in addition, you can specify resolution, aspect ratio, sampling steps, flow, and llm-refine\nbash eval/vbench/launch.sh /path/to/ckpt num_frames model_name res_value aspect_ratio_value steps_value flow_value llm_refine_value\n# for example\n# bash eval/vbench/launch.sh /mnt/jfs-hdd/sora/checkpoints/outputs/042-STDiT3-XL-2/epoch1-global_step16200_llm_refine/ema.pt 51 042-STDiT3-XL-2 240p 9:16 30 2 True\n```\n\nAfter generation, install the VBench package by following the \"Evaluation Dependencies\" section of our [installation guide](../docs/installation.md). 
Then, run the following commands to evaluate the generated samples.\n\n<!-- ```bash\nbash eval/vbench/vbench.sh /path/to/video_folder /path/to/model/ckpt\n``` -->\n\n```bash\npython eval/vbench/calc_vbench.py /path/to/video_folder /path/to/model/ckpt\n```\n\nFinally, obtain the scaled scores for the model with:\n```bash\npython eval/vbench/tabulate_vbench_scores.py --score_dir path/to/score/dir\n```\n\n## VBench-i2v\n\n[VBench-i2v](https://github.com/Vchitect/VBench/tree/master/vbench2_beta_i2v) is a benchmark for short image-to-video generation (beta version).\nAs before, install the VBench package by following the \"Evaluation Dependencies\" section of our [installation guide](../docs/installation.md).\n\n```bash\n# Step 1: generate the relevant videos\n# vbench i2v tasks; to evaluate all prompts, set start_index to 0 and end_index to 2000\nbash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -5 start_index end_index\n# Alternatively, launch 8 jobs at once\nbash eval/vbench_i2v/launch.sh /path/to/ckpt num_frames model_name\n\n# Step 2: run vbench to evaluate the generated samples\npython eval/vbench_i2v/vbench_i2v.py /path/to/video_folder /path/to/model/ckpt\n# Note that you need to go to `VBench/vbench2_beta_i2v/utils.py` and change the hard-coded variable `image_root` in the `load_i2v_dimension_info` function to your corresponding image folder.\n\n# Step 3: obtain the scaled scores\npython eval/vbench_i2v/tabulate_vbench_i2v_scores.py path/to/videos/folder path/to/your/model/ckpt\n# this will store the results under `eval/vbench_i2v` in the path/to/your/model/ckpt\n\n```\n\nAs with VBench, you can specify resolution, aspect ratio, sampling steps, flow, and llm-refine:\n\n```bash\nbash eval/vbench_i2v/launch.sh /path/to/ckpt num_frames model_name_for_log res_value aspect_ratio_value steps_value flow_value llm_refine_value\n# for example\n# bash eval/vbench_i2v/launch.sh /mnt/jfs-hdd/sora/checkpoints/outputs/042-STDiT3-XL-2/epoch1-global_step16200_llm_refine/ema.pt 
51 042-STDiT3-XL-2 240p 9:16 30 2 True\n# if no flow control, use \"None\" instead\n```\n\n## VAE\n\nInstall the required dependencies by following the \"Evaluation Dependencies\" section of our [installation guide](../docs/installation.md). Then, run the following evaluation command:\n\n```bash\n# metric can be any one of, or a list of: ssim, psnr, lpips, flolpips\npython eval/vae/eval_common_metric.py --batch_size 2 --real_video_dir path/to/original/videos --generated_video_dir path/to/generated/videos --device cuda --sample_fps 24 --crop_size 256 --resolution 256 --num_frames 17 --sample_rate 1 --metric ssim psnr lpips flolpips\n```\n"
  },
  {
    "path": "Open-Sora/eval/human_eval/generate.sh",
    "content": "#!/bin/bash\n\nset -x\nset -e\n\nTEXT_PATH=/home/data/sora_data/pixart-sigma-generated/text.txt\nOUTPUT_PATH=/home/data/sora_data/pixart-sigma-generated/raw\nCMD=\"python scripts/inference.py configs/pixart/inference/1x2048MS.py\"\n# LOG_BASE=logs/sample/generate\nLOG_BASE=$(dirname $CKPT)/eval/generate\nmkdir -p ${LOG_BASE}\nNUM_PER_GPU=10000\nN_LAUNCH=2\nNUM_START=$(($N_LAUNCH * $NUM_PER_GPU * 8))\n\nCUDA_VISIBLE_DEVICES=0 $CMD --prompt-path $TEXT_PATH --save-dir $OUTPUT_PATH --start-index $(($NUM_START + $NUM_PER_GPU * 0)) --end-index $(($NUM_START + $NUM_PER_GPU * 1)) --image-size 2048 2048 --verbose 1 --batch-size 2 >${LOG_BASE}/${N_LAUNCH}_1.log 2>&1 &\nCUDA_VISIBLE_DEVICES=1 $CMD --prompt-path $TEXT_PATH --save-dir $OUTPUT_PATH --start-index $(($NUM_START + $NUM_PER_GPU * 1)) --end-index $(($NUM_START + $NUM_PER_GPU * 2)) --image-size 1408 2816 --verbose 1 --batch-size 2 >${LOG_BASE}/${N_LAUNCH}_2.log 2>&1 &\nCUDA_VISIBLE_DEVICES=2 $CMD --prompt-path $TEXT_PATH --save-dir $OUTPUT_PATH --start-index $(($NUM_START + $NUM_PER_GPU * 2)) --end-index $(($NUM_START + $NUM_PER_GPU * 3)) --image-size 2816 1408 --verbose 1 --batch-size 2 >${LOG_BASE}/${N_LAUNCH}_3.log 2>&1 &\nCUDA_VISIBLE_DEVICES=3 $CMD --prompt-path $TEXT_PATH --save-dir $OUTPUT_PATH --start-index $(($NUM_START + $NUM_PER_GPU * 3)) --end-index $(($NUM_START + $NUM_PER_GPU * 4)) --image-size 1664 2304 --verbose 1 --batch-size 2 >${LOG_BASE}/${N_LAUNCH}_4.log 2>&1 &\nCUDA_VISIBLE_DEVICES=4 $CMD --prompt-path $TEXT_PATH --save-dir $OUTPUT_PATH --start-index $(($NUM_START + $NUM_PER_GPU * 4)) --end-index $(($NUM_START + $NUM_PER_GPU * 5)) --image-size 2304 1664 --verbose 1 --batch-size 2 >${LOG_BASE}/${N_LAUNCH}_5.log 2>&1 &\nCUDA_VISIBLE_DEVICES=5 $CMD --prompt-path $TEXT_PATH --save-dir $OUTPUT_PATH --start-index $(($NUM_START + $NUM_PER_GPU * 5)) --end-index $(($NUM_START + $NUM_PER_GPU * 6)) --image-size 1536 2560 --verbose 1 --batch-size 2 >${LOG_BASE}/${N_LAUNCH}_6.log 2>&1 
&\nCUDA_VISIBLE_DEVICES=6 $CMD --prompt-path $TEXT_PATH --save-dir $OUTPUT_PATH --start-index $(($NUM_START + $NUM_PER_GPU * 6)) --end-index $(($NUM_START + $NUM_PER_GPU * 7)) --image-size 2560 1536 --verbose 1 --batch-size 2 >${LOG_BASE}/${N_LAUNCH}_7.log 2>&1 &\nCUDA_VISIBLE_DEVICES=7 $CMD --prompt-path $TEXT_PATH --save-dir $OUTPUT_PATH --start-index $(($NUM_START + $NUM_PER_GPU * 7)) --end-index $(($NUM_START + $NUM_PER_GPU * 8)) --image-size 2048 2048 --verbose 1 --batch-size 2 >${LOG_BASE}/${N_LAUNCH}_8.log 2>&1 &\n"
  },
  {
    "path": "Open-Sora/eval/human_eval/launch.sh",
    "content": "#!/bin/bash\n\nCKPT=$1\nNUM_FRAMES=$2\nMODEL_NAME=$3\n\nif [[ $CKPT == *\"ema\"* ]]; then\n    parentdir=$(dirname $CKPT)\n    CKPT_BASE=$(basename $parentdir)_ema\nelse\n    CKPT_BASE=$(basename $CKPT)\nfi\nLOG_BASE=$(dirname $CKPT)/eval\nmkdir -p ${LOG_BASE}\necho \"Logging to $LOG_BASE\"\n\nGPUS=(0 1 2 3 4 5 6 7)\n# TASK_ID_LIST=(1 2a 2b 2c 2d 2e 2f 2g) # move image to video task\nTASK_ID_LIST=(2a 2b 2c 2d 2e 2f 2g 2h)\n# FRAME_LIST=(1 $NUM_FRAMES $NUM_FRAMES $NUM_FRAMES $NUM_FRAMES $NUM_FRAMES $NUM_FRAMES $NUM_FRAMES)\n\nfor i in \"${!GPUS[@]}\"; do\n    CUDA_VISIBLE_DEVICES=${GPUS[i]} bash eval/sample.sh $CKPT $NUM_FRAMES $MODEL_NAME -${TASK_ID_LIST[i]} >${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\ndone\n\n# kill all by: pkill -f \"inference\"\n"
  },
  {
    "path": "Open-Sora/eval/loss/eval_loss.py",
    "content": "from pprint import pformat\n\nimport colossalai\nimport torch\nimport torch.distributed as dist\nfrom colossalai.cluster import DistCoordinator\nfrom mmengine.runner import set_random_seed\nfrom tqdm import tqdm\n\nfrom opensora.acceleration.parallel_states import get_data_parallel_group, set_data_parallel_group\nfrom opensora.datasets.dataloader import prepare_dataloader\nfrom opensora.registry import DATASETS, MODELS, SCHEDULERS, build_module\nfrom opensora.utils.config_utils import parse_configs\nfrom opensora.utils.misc import create_logger, to_torch_dtype\nfrom opensora.utils.train_utils import MaskGenerator\n\n\ndef main():\n    torch.set_grad_enabled(False)\n    # ======================================================\n    # configs & runtime variables\n    # ======================================================\n    # == parse configs ==\n    cfg = parse_configs(training=False)\n\n    # == device and dtype ==\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n    cfg_dtype = cfg.get(\"dtype\", \"fp32\")\n    assert cfg_dtype in [\"fp16\", \"bf16\", \"fp32\"], f\"Unknown mixed precision {cfg_dtype}\"\n    dtype = to_torch_dtype(cfg.get(\"dtype\", \"bf16\"))\n    torch.backends.cuda.matmul.allow_tf32 = True\n    torch.backends.cudnn.allow_tf32 = True\n\n    # == init distributed env ==\n    colossalai.launch_from_torch({})\n    DistCoordinator()\n    set_random_seed(seed=cfg.get(\"seed\", 1024))\n    set_data_parallel_group(dist.group.WORLD)\n\n    # == init logger ==\n    logger = create_logger()\n    logger.info(\"Eval loss configuration:\\n %s\", pformat(cfg.to_dict()))\n\n    # ======================================================\n    # build model & load weights\n    # ======================================================\n    logger.info(\"Building models...\")\n    # == build text-encoder and vae ==\n    text_encoder = build_module(cfg.text_encoder, MODELS, device=device)\n    vae = build_module(cfg.vae, 
MODELS).to(device, dtype).eval()\n\n    # == build diffusion model ==\n    input_size = (None, None, None)\n    latent_size = vae.get_latent_size(input_size)\n    model = (\n        build_module(\n            cfg.model,\n            MODELS,\n            input_size=latent_size,\n            in_channels=vae.out_channels,\n            caption_channels=text_encoder.output_dim,\n            model_max_length=text_encoder.model_max_length,\n        )\n        .to(device, dtype)\n        .eval()\n    )\n    text_encoder.y_embedder = model.y_embedder  # HACK: for classifier-free guidance\n\n    # == build scheduler ==\n    scheduler = build_module(cfg.scheduler, SCHEDULERS)\n\n    if cfg.get(\"mask_ratios\", None) is not None:\n        mask_generator = MaskGenerator(cfg.mask_ratios)\n\n    # ======================================================\n    # inference\n    # ======================================================\n    # start evaluation, prepare a dataset everytime in the loop\n    bucket_config = cfg.bucket_config\n    if cfg.get(\"resolution\", None) is not None:\n        bucket_config = {cfg.resolution: bucket_config[cfg.resolution]}\n    assert bucket_config is not None, \"bucket_config is required for evaluation\"\n    logger.info(\"Evaluating bucket_config: %s\", bucket_config)\n\n    def build_dataset(resolution, num_frames, batch_size):\n        bucket_config = {resolution: {num_frames: (1.0, batch_size)}}\n        dataset = build_module(cfg.dataset, DATASETS)\n        dataloader_args = dict(\n            dataset=dataset,\n            batch_size=None,\n            num_workers=cfg.num_workers,\n            shuffle=False,\n            drop_last=False,\n            pin_memory=True,\n            process_group=get_data_parallel_group(),\n        )\n        dataloader, sampler = prepare_dataloader(bucket_config=bucket_config, **dataloader_args)\n        num_batch = sampler.get_num_batch()\n        num_steps_per_epoch = num_batch // dist.get_world_size()\n        
return dataloader, num_steps_per_epoch, num_batch\n\n    evaluation_losses = {}\n    start = cfg.start_index if \"start_index\" in cfg else 0\n    end = cfg.end_index if \"end_index\" in cfg else len(bucket_config)\n    for i, res in enumerate(bucket_config):\n        if i < start or i >= end:  # skip task\n            continue\n\n        t_bucket = bucket_config[res]\n        for num_frames, (_, batch_size) in t_bucket.items():\n            if batch_size is None:\n                continue\n            logger.info(\"Evaluating resolution: %s, num_frames: %s\", res, num_frames)\n            dataloader, num_steps_per_epoch, num_batch = build_dataset(res, num_frames, batch_size)\n            if num_batch == 0:\n                logger.warning(\"No data for resolution: %s, num_frames: %s\", res, num_frames)\n                continue\n\n            evaluation_t_losses = []\n            for t in torch.linspace(0, scheduler.num_timesteps, cfg.get(\"num_eval_timesteps\", 10) + 2)[1:-1]:\n                loss_t = 0.0\n                num_samples = 0\n                dataloader_iter = iter(dataloader)\n                for _ in tqdm(range(num_steps_per_epoch), desc=f\"res: {res}, num_frames: {num_frames}, t: {t:.2f}\"):\n                    batch = next(dataloader_iter)\n                    x = batch.pop(\"video\").to(device, dtype)\n                    y = batch.pop(\"text\")\n                    x = vae.encode(x)\n                    model_args = text_encoder.encode(y)\n\n                    # == mask ==\n                    mask = None\n                    if cfg.get(\"mask_ratios\", None) is not None:\n                        mask = mask_generator.get_masks(x)\n                        model_args[\"x_mask\"] = mask\n\n                    # == video meta info ==\n                    for k, v in batch.items():\n                        model_args[k] = v.to(device, dtype)\n\n                    # == diffusion loss computation ==\n                    timestep = torch.tensor([t] 
* x.shape[0], device=device, dtype=dtype)\n                    loss_dict = scheduler.training_losses(model, x, model_args, mask=mask, t=timestep)\n                    losses = loss_dict[\"loss\"]  # (batch_size)\n                    num_samples += x.shape[0]\n                    loss_t += losses.sum().item()\n                loss_t /= num_samples\n                evaluation_t_losses.append(loss_t)\n                logger.info(\"resolution: %s, num_frames: %s, timestep: %.2f, loss: %.4f\", res, num_frames, t, loss_t)\n\n            evaluation_losses[(res, num_frames)] = sum(evaluation_t_losses) / len(evaluation_t_losses)\n            logger.info(\n                \"Evaluation losses for resolution: %s, num_frames: %s, loss: %s\\n %s\",\n                res,\n                num_frames,\n                evaluation_losses[(res, num_frames)],\n                evaluation_t_losses,\n            )\n    logger.info(\"Evaluation losses: %s\", evaluation_losses)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/eval/loss/launch.sh",
    "content": "#!/bin/bash\n\nCMD=\"torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py\"\nCKPT_PATH=$1\nMODEL_NAME=$2\nIMG_PATH=$3\nVID_PATH=$4\n\nif [ -z $IMG_PATH ]; then\n    IMG_PATH=\"/mnt/jfs-hdd/sora/meta/validation/img_1k.csv\"\nfi\n\nif [ -z $VID_PATH ]; then\n    VID_PATH=\"/mnt/jfs-hdd/sora/meta/validation/vid_100.csv\"\nfi\n\nif [[ $CKPT_PATH == *\"ema\"* ]]; then\n    parentdir=$(dirname $CKPT_PATH)\n    CKPT_BASE=$(basename $parentdir)_ema\nelse\n    CKPT_BASE=$(basename $CKPT_PATH)\nfi\nLOG_BASE=$(dirname $CKPT_PATH)/eval\nmkdir -p $LOG_BASE\necho \"Logging to $LOG_BASE\"\n\n\nGPUS=(3 4 5 6 7)\nRESOLUTION=(144p 240p 360p 480p 720p)\n\nCUDA_VISIBLE_DEVICES=0 $CMD --data-path $IMG_PATH --ckpt-path $CKPT_PATH --start-index 0 --end-index 5 >${LOG_BASE}/img_0.log 2>&1 &\nCUDA_VISIBLE_DEVICES=1 $CMD --data-path $IMG_PATH --ckpt-path $CKPT_PATH --start-index 5 --end-index 6 >${LOG_BASE}/img_1.log 2>&1 &\nCUDA_VISIBLE_DEVICES=2 $CMD --data-path $IMG_PATH --ckpt-path $CKPT_PATH --start-index 6 >${LOG_BASE}/img_2.log 2>&1 &\n\n\nfor i in \"${!GPUS[@]}\"; do\n    CUDA_VISIBLE_DEVICES=${GPUS[i]} $CMD --data-path $VID_PATH --ckpt-path $CKPT_PATH --resolution ${RESOLUTION[i]} >${LOG_BASE}/${RESOLUTION[i]}_vid.log 2>&1 &\ndone\n"
  },
  {
    "path": "Open-Sora/eval/loss/tabulate_rl_loss.py",
    "content": "\"\"\"\nusage:\n    python tabulate_rl_loss.py --log_dir /home/zhengzangwei/projs/Open-Sora-dev/logs/loss --ckpt_name epoch0-global_step9000\n\nsave the processed json to:\n    Open-Sora-dev/evaluation_results/rectified_flow/<ckpt_name>_loss.json\n\"\"\"\n\nimport argparse\nimport json\nimport os\nfrom ast import literal_eval\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"--log_dir\", type=str)\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n\n    files = os.listdir(args.log_dir)\n    files = [\n        \"img_0.log\",\n        \"img_1.log\",\n        \"img_2.log\",\n        \"144p_vid.log\",\n        \"240p_vid.log\",\n        \"360p_vid.log\",\n        \"480p_vid.log\",\n        \"720p_vid.log\",\n    ]\n\n    loss_info = {}\n\n    for fname in files:\n        path = os.path.join(args.log_dir, fname)\n        with open(path, \"r\", encoding=\"utf-8\") as f:\n            content = f.readlines()\n        eval_line = content[-1].split(\"losses:\")[-1].strip()\n        loss_dict = literal_eval(eval_line)\n        for key, loss in loss_dict.items():\n            resolution, frame = key\n            if resolution not in loss_info:\n                loss_info[resolution] = {}\n            loss_info[resolution][frame] = format(loss, \".4f\")\n\n    # Convert and write JSON object to file\n    output_file_path = os.path.join(args.log_dir, \"loss.json\")\n    with open(output_file_path, \"w\") as outfile:\n        json.dump(loss_info, outfile, indent=4, sort_keys=True)\n    print(f\"results saved to: {output_file_path}\")\n"
  },
  {
    "path": "Open-Sora/eval/sample.sh",
    "content": "# !/bin/bash\n\nCKPT=$1\nNUM_FRAMES=$2\nMODEL_NAME=$3\nTASK_TYPE=$4\nVBENCH_START_INDEX=$5\nVBENCH_END_INDEX=$6\nVBENCH_RES=$7\nVBENCH_ASP_RATIO=$8\n\nNUM_SAMPLING_STEPS=$9\nFLOW=${10}\nLLM_REFINE=${11}\n\nBASE_ASPECT_RATIO=360p\nASPECT_RATIOS=(144p 240p 360p 480p 720p 1080p)\n# Loop through the list of aspect ratios\ni=0\nfor r in \"${ASPECT_RATIOS[@]}\"; do\n  if [[ \"$r\" == \"$BASE_ASPECT_RATIO\" ]]; then\n    # get aspect ratio 1 level up\n    if [[ $((i+1)) -lt ${#ASPECT_RATIOS[@]} ]]; then\n      ASPECT_RATIO_INCR_1=${ASPECT_RATIOS[$((i+1))]}\n    else\n      # If this is the highest ratio, return the highest ratio\n      ASPECT_RATIO_INCR_1=${ASPECT_RATIOS[-1]}\n    fi\n    # get aspect ratio 2 levels up\n    if [[ $((i+2)) -lt ${#ASPECT_RATIOS[@]} ]]; then\n      ASPECT_RATIO_INCR_2=${ASPECT_RATIOS[$((i+2))]}\n    else\n      # If this is the highest ratio, return the highest ratio\n      ASPECT_RATIO_INCR_2=${ASPECT_RATIOS[-1]}\n    fi\n  fi\n  i=$((i+1))\ndone\necho \"base aspect ratio: ${BASE_ASPECT_RATIO}\"\necho \"aspect ratio 1 level up: ${ASPECT_RATIO_INCR_1}\"\necho \"aspect ratio 2 levels up: ${ASPECT_RATIO_INCR_2}\"\necho \"Note that this aspect ratio level setting is used for videos only, not images\"\n\necho \"NUM_FRAMES=${NUM_FRAMES}\"\n\nif [ -z \"${NUM_FRAMES}\" ]; then\n  echo \"you need to pass NUM_FRAMES\"\nelse\n  let DOUBLE_FRAMES=$2*2\n  let QUAD_FRAMES=$2*4\n  let OCT_FRAMES=$2*8\nfi\n\necho \"DOUBLE_FRAMES=${DOUBLE_FRAMES}\"\necho \"QUAD_FRAMES=${QUAD_FRAMES}\"\necho \"OCT_FRAMES=${OCT_FRAMES}\"\n\nCMD=\"python scripts/inference.py configs/opensora-v1-2/inference/sample.py\"\nif [[ $CKPT == *\"ema\"* ]]; then\n  parentdir=$(dirname $CKPT)\n  CKPT_BASE=$(basename $parentdir)_ema\nelse\n  CKPT_BASE=$(basename $CKPT)\nfi\nOUTPUT=\"/root/autodl-tmp/video_samples/samples_${MODEL_NAME}_${CKPT_BASE}\"\nstart=$(date +%s)\nDEFAULT_BS=1\n\n### Functions\n\n# called inside run_video_b\nfunction run_image() { # 14min\n  # 1.1 
1024x1024\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2i_samples.txt --save-dir $OUTPUT --num-frames 1 --resolution 1024 --aspect-ratio 1:1 --sample-name image_1024_1_1 --batch-size $DEFAULT_BS\n\n  # 1.2 240x426\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2i_samples.txt --save-dir $OUTPUT --num-frames 1 --resolution 240p --aspect-ratio 9:16 --sample-name image_240p_9_16 --end-index 3 --batch-size $DEFAULT_BS\n\n  # 1.3 512x512\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2i_samples.txt --save-dir $OUTPUT --num-frames 1 --resolution 512 --aspect-ratio 1:1 --sample-name image_t2i_512_1_1 --end-index 3 --batch-size $DEFAULT_BS\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_samples.txt --save-dir $OUTPUT --num-frames 1 --resolution 512 --aspect-ratio 1:1 --sample-name image_t2v_512_1_1 --end-index 3 --batch-size $DEFAULT_BS\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_short.txt --save-dir $OUTPUT --num-frames 1 --resolution 512 --aspect-ratio 1:1 --sample-name image_short_512_1_1 --end-index 3 --batch-size $DEFAULT_BS\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_sora.txt --save-dir $OUTPUT --num-frames 1 --resolution 512 --aspect-ratio 1:1 --sample-name image_sora_512_1_1 --end-index 3 --batch-size $DEFAULT_BS\n\n  # 1.4 720p multi-resolution\n  # 1:1\n  PROMPT=\"Bright scene, aerial view,ancient city, fantasy, gorgeous light, mirror reflection, high detail, wide angle lens.\"\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 1 --resolution 720p --aspect-ratio 1:1 --sample-name image_720p_1_1\n  # 9:16\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 1 --resolution 720p --aspect-ratio 9:16 --sample-name image_720p_9_16\n  # 16:9\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 1 --resolution 720p --aspect-ratio 16:9 --sample-name image_720p_16_9\n  # 4:3\n  eval 
$CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 1 --resolution 720p --aspect-ratio 4:3 --sample-name image_720p_4_3\n  # 3:4\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 1 --resolution 720p --aspect-ratio 3:4 --sample-name image_720p_3_4\n  # 1:2\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 1 --resolution 720p --aspect-ratio 1:2 --sample-name image_720p_1_2\n  # 2:1\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 1 --resolution 720p --aspect-ratio 2:1 --sample-name image_720p_2_1\n}\n\n# for (sample, short, sora)\n#   for ( (4s, 720p), (8s, 480p), (16s, 360p) )\n\nfunction run_video_a() { # ~ 30min ?\n  ### previous cmds  # 42min, sample & multi-resolution\n  # # sample, 144p, 9:16, 2s\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_samples.txt --save-dir $OUTPUT --num-frames 2s --resolution 144p --aspect-ratio 9:16 --sample-name sample_2s_144p_9_16 --batch-size $DEFAULT_BS\n  # # sample, 240p, 9:16, 2s\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_samples.txt --save-dir $OUTPUT --num-frames 2s --resolution 240p --aspect-ratio 9:16 --sample-name sample_2s_240p_9_16 --batch-size $DEFAULT_BS\n  # # sample, 240p, 9:16, 4s\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_samples.txt --save-dir $OUTPUT --num-frames 4s --resolution 240p --aspect-ratio 9:16 --sample-name sample_4s_240p_9_16 --batch-size $DEFAULT_BS\n  # # sample, 240p, 9:16, 8s\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_samples.txt --save-dir $OUTPUT --num-frames 8s --resolution 240p --aspect-ratio 9:16 --sample-name sample_8s_240p_9_16 --batch-size $DEFAULT_BS\n  # # sample, 480p, 9:16, 2s\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_samples.txt --save-dir $OUTPUT --num-frames 2s --resolution 480p --aspect-ratio 9:16 --sample-name sample_2s_480p_9_16 
--batch-size $DEFAULT_BS\n  # # sample, 480p, 9:16, 4s\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_samples.txt --save-dir $OUTPUT --num-frames 4s --resolution 480p --aspect-ratio 9:16 --sample-name sample_4s_480p_9_16 --batch-size $DEFAULT_BS\n  # # sample, 720p, 9:16, 2s\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_samples.txt --save-dir $OUTPUT --num-frames 2s --resolution 720p --aspect-ratio 9:16 --sample-name sample_2s_720p_9_16 --batch-size $DEFAULT_BS\n\n  # sample, 720p, 9:16, 4s\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_samples.txt --save-dir $OUTPUT --num-frames 4s --resolution ${ASPECT_RATIO_INCR_2} --aspect-ratio 9:16 --sample-name sample_4s_${ASPECT_RATIO_INCR_2} --batch-size $DEFAULT_BS\n\n  # sample, 480p, 9:16, 8s\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_samples.txt --save-dir $OUTPUT --num-frames 8s --resolution ${ASPECT_RATIO_INCR_1} --aspect-ratio 9:16 --sample-name sample_8s_${ASPECT_RATIO_INCR_1} --batch-size $DEFAULT_BS\n\n  # sample, 360p, 9:16, 16s\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_samples.txt --save-dir $OUTPUT --num-frames 16s --resolution ${BASE_ASPECT_RATIO} --aspect-ratio 9:16 --sample-name sample_16s_${BASE_ASPECT_RATIO} --batch-size $DEFAULT_BS\n}\n\nfunction run_video_b() { # 18min + 14min = 32min, short 16x240p & 64x240p\n  # run image, 14min\n  echo \"Inside run_video_b, running image samples...\"\n  run_image\n\n  echo \"Inside run_video_b, running video samples...\"\n\n  ### previous cmds, 18min\n  # # short, 240p, 9:16, 4s\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_short.txt --save-dir $OUTPUT --num-frames 4s --resolution 240p --aspect-ratio 9:16 --sample-name short_4s_240p_9_16 --batch-size $DEFAULT_BS\n  # # short, 240p, 9:16, 8s\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_short.txt --save-dir $OUTPUT --num-frames 8s --resolution 240p --aspect-ratio 9:16 --sample-name 
short_8s_240p_9_16 --batch-size $DEFAULT_BS\n\n  # short, 480p, 9:16, 8s: ~24min\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_short.txt --save-dir $OUTPUT --num-frames 8s --resolution ${ASPECT_RATIO_INCR_1} --aspect-ratio 9:16 --sample-name short_8s_${ASPECT_RATIO_INCR_1} --batch-size $DEFAULT_BS\n\n  # short, 360p, 9:16, 16s: ~24min\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_short.txt --save-dir $OUTPUT --num-frames 16s --resolution ${BASE_ASPECT_RATIO} --aspect-ratio 9:16 --sample-name short_16s_${BASE_ASPECT_RATIO} --batch-size $DEFAULT_BS\n\n}\n\nfunction run_video_c() {\n  ### previous cmds, 60min\n  # # sora, 240p, 16:9, 2s\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_sora.txt --save-dir $OUTPUT --num-frames 2s --resolution 240p --aspect-ratio 16:9 --sample-name sora_2s_240p_16_9 --batch-size $DEFAULT_BS\n  # # sora, 240p, 9:16, 2s\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_sora.txt --save-dir $OUTPUT --num-frames 2s --resolution 240p --aspect-ratio 9:16 --sample-name sora_2s_240p_9_16 --batch-size $DEFAULT_BS\n  # # sora, 240p, 9:16, 16s\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_sora.txt --save-dir $OUTPUT --num-frames 16s --resolution 240p --aspect-ratio 9:16 --sample-name sora_16s_240p_9_16 --batch-size $DEFAULT_BS\n\n  # short, 720p, 9:16, 4s: ~9min\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_short.txt --save-dir $OUTPUT --num-frames 4s --resolution ${ASPECT_RATIO_INCR_2} --aspect-ratio 9:16 --sample-name short_4s_${ASPECT_RATIO_INCR_2} --batch-size $DEFAULT_BS\n\n  # sora, 360p, 9:16, 16s: ~40min\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_sora.txt --save-dir $OUTPUT --num-frames 16s --resolution ${BASE_ASPECT_RATIO} --aspect-ratio 9:16 --sample-name sora_16s_${BASE_ASPECT_RATIO} --batch-size $DEFAULT_BS\n}\n\nfunction run_video_d() {\n  ### previous cmds, 21min + 30min = 51min\n  # # short, 480p, 9:16, 4s: 21min\n  # eval 
$CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_short.txt --save-dir $OUTPUT --num-frames 4s --resolution 480p --aspect-ratio 9:16 --sample-name short_4s_480p_9_16 --batch-size $DEFAULT_BS\n  # # sora, 480p, 9:16, 8s, 1/3 # moved from run_video_e, 30min\n  # eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_sora.txt --save-dir $OUTPUT --num-frames 8s --resolution 480p --aspect-ratio 9:16 --sample-name sora_8s_480p_9_16 --batch-size $DEFAULT_BS --start-index 0 --end-index 16\n\n  # sora, 480p, 9:16, 8s, 1/3 # moved from run_video_e, 30min\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_sora.txt --save-dir $OUTPUT --num-frames 8s --resolution ${ASPECT_RATIO_INCR_1} --aspect-ratio 9:16 --sample-name sora_8s_${ASPECT_RATIO_INCR_1} --batch-size $DEFAULT_BS --start-index 0 --end-index 16\n}\n\nfunction run_video_e() { # 90min * 2/3 = 60min\n  # sora, 480p, 9:16, 8s, 2/3\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_sora.txt --save-dir $OUTPUT --num-frames 8s --resolution ${ASPECT_RATIO_INCR_1} --aspect-ratio 9:16 --sample-name sora_8s_${ASPECT_RATIO_INCR_1} --batch-size $DEFAULT_BS --start-index 16 --end-index 100\n}\n\nfunction run_video_f() { # 60min\n  # sora, 720p, 9:16, 4s\n  eval $CMD --ckpt-path $CKPT --prompt-path assets/texts/t2v_sora.txt --save-dir $OUTPUT --num-frames 4s --resolution ${ASPECT_RATIO_INCR_2} --aspect-ratio 9:16 --sample-name sora_4s_${ASPECT_RATIO_INCR_2} --batch-size $DEFAULT_BS\n}\n\n# --resolution 720p --aspect-ratio [16:9, 9:16, ...]\n\nfunction run_video_g() { # 15min\n  # 720p, 2s multi-resolution\n  # 1:1\n  PROMPT=\"A soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color and against the vibrant turquoise of the sea. Seabirds can be seen taking flight around the cliff's precipices. 
As the drone slowly moves from different angles, the changing sunlight casts shifting shadows that highlight the rugged textures of the cliff and the surrounding calm sea. The water gently laps at the rock base and the greenery that clings to the top of the cliff, and the scene gives a sense of peaceful isolation at the fringes of the ocean. The video captures the essence of pristine natural beauty untouched by human structures.\"\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 2s --resolution ${ASPECT_RATIO_INCR_2} --aspect-ratio 1:1 --sample-name drone_cliff_prompt_${ASPECT_RATIO_INCR_2}_2s_1_1\n  # 16:9\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 2s --resolution ${ASPECT_RATIO_INCR_2} --aspect-ratio 16:9 --sample-name drone_cliff_prompt_${ASPECT_RATIO_INCR_2}_2s_16_9\n  # 9:16\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 2s --resolution ${ASPECT_RATIO_INCR_2} --aspect-ratio 9:16 --sample-name drone_cliff_prompt_${ASPECT_RATIO_INCR_2}_2s_9_16\n  # 4:3\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 2s --resolution ${ASPECT_RATIO_INCR_2} --aspect-ratio 4:3 --sample-name drone_cliff_prompt_${ASPECT_RATIO_INCR_2}_2s_4_3\n  # 3:4\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 2s --resolution ${ASPECT_RATIO_INCR_2} --aspect-ratio 3:4 --sample-name drone_cliff_prompt_${ASPECT_RATIO_INCR_2}_2s_3_4\n  # 1:2\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 2s --resolution ${ASPECT_RATIO_INCR_2} --aspect-ratio 1:2 --sample-name drone_cliff_prompt_${ASPECT_RATIO_INCR_2}_2s_1_2\n  # 2:1\n  eval $CMD --ckpt-path $CKPT --prompt \\\"$PROMPT\\\" --save-dir $OUTPUT --num-frames 2s --resolution ${ASPECT_RATIO_INCR_2} --aspect-ratio 2:1 --sample-name drone_cliff_prompt_${ASPECT_RATIO_INCR_2}_2s_2_1\n\n  # add motion score\n  eval $CMD 
--ckpt-path $CKPT --save-dir $OUTPUT --num-frames 2s --resolution ${ASPECT_RATIO_INCR_2} --sample-name motion_2s_${ASPECT_RATIO_INCR_2} --prompt \\\n    \\\"A stylish woman walking in the street of Tokyo.\\\" \\\"A stylish woman walking in the street of Tokyo. motion score: 0.0\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. motion score: 2.0\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. motion score: 4.0\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. motion score: 6.0\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. motion score: 10.0\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. motion score: 25.0\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. motion score: 50.0\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. motion score: 100.0\\\"\n\n  # add aes score\n  eval $CMD --ckpt-path $CKPT --save-dir $OUTPUT --num-frames 2s --resolution ${ASPECT_RATIO_INCR_2} --sample-name aes_2s_${ASPECT_RATIO_INCR_2} --prompt \\\n    \\\"A stylish woman walking in the street of Tokyo.\\\" \\\"A stylish woman walking in the street of Tokyo. aesthetic score: 4.0\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. aesthetic score: 4.5\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. aesthetic score: 5.0\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. aesthetic score: 5.5\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. aesthetic score: 6.0\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. aesthetic score: 6.5\\\" \\\n    \\\"A stylish woman walking in the street of Tokyo. 
aesthetic score: 7.0\\\"\n}\n\n# resolution -> 480p\n\nfunction run_video_h() { # 61min\n  # 3.1 image-conditioned long video generation\n  eval $CMD --ckpt-path $CKPT --save-dir $OUTPUT --sample-name ref_L5C5_2s_${BASE_ASPECT_RATIO}_9_16 \\\n    --prompt-path assets/texts/t2v_ref.txt --start-index 0 --end-index 3 \\\n    --num-frames 2s --resolution ${BASE_ASPECT_RATIO} --aspect-ratio 9:16 \\\n    --loop 5 --condition-frame-length 5 \\\n    --reference-path assets/images/condition/cliff.png assets/images/condition/wave.png assets/images/condition/ship.png \\\n    --mask-strategy \"0\" \"0\" \"0\" --batch-size $DEFAULT_BS\n\n  eval $CMD --ckpt-path $CKPT --save-dir $OUTPUT --sample-name ref_L5C10_16s_${BASE_ASPECT_RATIO}_9_16 \\\n    --prompt-path assets/texts/t2v_ref.txt --start-index 0 --end-index 3 \\\n    --num-frames 16s --resolution ${BASE_ASPECT_RATIO} --aspect-ratio 9:16 \\\n    --loop 5 --condition-frame-length 10 \\\n    --reference-path assets/images/condition/cliff.png assets/images/condition/wave.png assets/images/condition/ship.png \\\n    --mask-strategy \"0\" \"0\" \"0\" --batch-size $DEFAULT_BS\n\n  # 3.2\n  eval $CMD --ckpt-path $CKPT --save-dir $OUTPUT --sample-name ref_L1_16s_${BASE_ASPECT_RATIO}_9_16 \\\n    --prompt-path assets/texts/t2v_ref.txt --start-index 3 --end-index 6 \\\n    --num-frames 16s --resolution ${BASE_ASPECT_RATIO} --aspect-ratio 9:16 \\\n    --loop 1 \\\n    --reference-path assets/images/condition/cliff.png \"assets/images/condition/cactus-sad.png\\;assets/images/condition/cactus-happy.png\" https://cdn.openai.com/tmp/s/interp/d0.mp4 \\\n    --mask-strategy \"0\" \"0\\;0,1,0,-1,1\" \"0,0,0,0,${QUAD_FRAMES},0.5\" --batch-size $DEFAULT_BS\n}\n\n# vbench has 950 samples\n\nVBENCH_BS=1 # 80GB\nVBENCH_H=240\nVBENCH_W=426\nVBENCH_NUM_SAMPLE=5\n\nfunction run_vbench() {\n  if [ -z ${VBENCH_RES} ] || [ -z ${VBENCH_ASP_RATIO} ]; then\n    eval $CMD --ckpt-path $CKPT --save-dir ${OUTPUT}_vbench --prompt-as-path --num-sample 
$VBENCH_NUM_SAMPLE \\\n      --prompt-path assets/texts/VBench/all_dimension.txt \\\n      --image-size $VBENCH_H $VBENCH_W \\\n      --batch-size $VBENCH_BS --num-frames $NUM_FRAMES --start-index $1 --end-index $2\n  else\n    if [ -z ${NUM_SAMPLING_STEPS} ]; then\n        eval $CMD --ckpt-path $CKPT --save-dir ${OUTPUT}_vbench --prompt-as-path --num-sample $VBENCH_NUM_SAMPLE \\\n        --prompt-path assets/texts/VBench/all_dimension.txt \\\n        --resolution $VBENCH_RES --aspect-ratio $VBENCH_ASP_RATIO \\\n        --batch-size $VBENCH_BS --num-frames $NUM_FRAMES --start-index $1 --end-index $2\n    else\n      if [ -z ${FLOW} ]; then\n        eval $CMD --ckpt-path $CKPT --save-dir ${OUTPUT}_vbench --prompt-as-path --num-sample $VBENCH_NUM_SAMPLE \\\n        --prompt-path assets/texts/VBench/all_dimension.txt \\\n        --resolution $VBENCH_RES --aspect-ratio $VBENCH_ASP_RATIO --num-sampling-steps ${NUM_SAMPLING_STEPS} \\\n        --batch-size $VBENCH_BS --num-frames $NUM_FRAMES --start-index $1 --end-index $2\n      else\n        if [ -z ${LLM_REFINE} ]; then\n          eval $CMD --ckpt-path $CKPT --save-dir ${OUTPUT}_vbench --prompt-as-path --num-sample $VBENCH_NUM_SAMPLE \\\n          --prompt-path assets/texts/VBench/all_dimension.txt \\\n          --resolution $VBENCH_RES --aspect-ratio $VBENCH_ASP_RATIO --num-sampling-steps ${NUM_SAMPLING_STEPS} --flow ${FLOW} \\\n          --batch-size $VBENCH_BS --num-frames $NUM_FRAMES --start-index $1 --end-index $2\n        else\n          if [ \"${FLOW}\" = \"None\" ]; then\n            eval $CMD --ckpt-path $CKPT --save-dir ${OUTPUT}_vbench --prompt-as-path --num-sample $VBENCH_NUM_SAMPLE \\\n            --prompt-path assets/texts/VBench/all_dimension.txt \\\n            --resolution $VBENCH_RES --aspect-ratio $VBENCH_ASP_RATIO --num-sampling-steps ${NUM_SAMPLING_STEPS} --llm-refine ${LLM_REFINE} \\\n            --batch-size $VBENCH_BS --num-frames $NUM_FRAMES --start-index $1 --end-index $2\n          else\n    
        eval $CMD --ckpt-path $CKPT --save-dir ${OUTPUT}_vbench --prompt-as-path --num-sample $VBENCH_NUM_SAMPLE \\\n            --prompt-path assets/texts/VBench/all_dimension.txt \\\n            --resolution $VBENCH_RES --aspect-ratio $VBENCH_ASP_RATIO --num-sampling-steps ${NUM_SAMPLING_STEPS} --flow ${FLOW} --llm-refine ${LLM_REFINE} \\\n            --batch-size $VBENCH_BS --num-frames $NUM_FRAMES --start-index $1 --end-index $2\n          fi\n        fi\n      fi\n    fi\n  fi\n}\n\n# vbench-i2v has 1120 samples\n\nVBENCH_I2V_H=256\nVBENCH_I2V_W=256\n\nfunction run_vbench_i2v() {\n  if [ -z ${VBENCH_RES} ] || [ -z ${VBENCH_ASP_RATIO} ]; then\n    eval $CMD --ckpt-path $CKPT --save-dir ${OUTPUT}_vbench_i2v --prompt-as-path --num-sample 5 \\\n      --prompt-path assets/texts/VBench/all_i2v.txt \\\n      --image-size $VBENCH_I2V_H $VBENCH_I2V_W \\\n      --batch-size $VBENCH_BS --num-frames $NUM_FRAMES --start-index $1 --end-index $2\n  else\n    if [ -z ${NUM_SAMPLING_STEPS} ]; then\n        eval $CMD --ckpt-path $CKPT --save-dir ${OUTPUT}_vbench_i2v --prompt-as-path --num-sample 5 \\\n        --prompt-path assets/texts/VBench/all_i2v.txt \\\n        --resolution $VBENCH_RES --aspect-ratio $VBENCH_ASP_RATIO \\\n        --batch-size $VBENCH_BS --num-frames $NUM_FRAMES --start-index $1 --end-index $2\n    else\n      if [ -z ${FLOW} ]; then\n        eval $CMD --ckpt-path $CKPT --save-dir ${OUTPUT}_vbench_i2v --prompt-as-path --num-sample 5 \\\n        --prompt-path assets/texts/VBench/all_i2v.txt \\\n        --resolution $VBENCH_RES --aspect-ratio $VBENCH_ASP_RATIO --num-sampling-steps ${NUM_SAMPLING_STEPS} \\\n        --batch-size $VBENCH_BS --num-frames $NUM_FRAMES --start-index $1 --end-index $2\n      else\n        if [ -z ${LLM_REFINE} ]; then\n          eval $CMD --ckpt-path $CKPT --save-dir ${OUTPUT}_vbench_i2v --prompt-as-path --num-sample 5 \\\n          --prompt-path assets/texts/VBench/all_i2v.txt \\\n          --resolution $VBENCH_RES --aspect-ratio 
$VBENCH_ASP_RATIO --num-sampling-steps ${NUM_SAMPLING_STEPS} --flow ${FLOW} \\\n          --batch-size $VBENCH_BS --num-frames $NUM_FRAMES --start-index $1 --end-index $2\n        else\n          if [ \"${FLOW}\" = \"None\" ]; then\n            eval $CMD --ckpt-path $CKPT --save-dir ${OUTPUT}_vbench_i2v --prompt-as-path --num-sample 5 \\\n            --prompt-path assets/texts/VBench/all_i2v.txt \\\n            --resolution $VBENCH_RES --aspect-ratio $VBENCH_ASP_RATIO --num-sampling-steps ${NUM_SAMPLING_STEPS} --llm-refine ${LLM_REFINE} \\\n            --batch-size $VBENCH_BS --num-frames $NUM_FRAMES --start-index $1 --end-index $2\n          else\n            eval $CMD --ckpt-path $CKPT --save-dir ${OUTPUT}_vbench_i2v --prompt-as-path --num-sample 5 \\\n            --prompt-path assets/texts/VBench/all_i2v.txt \\\n            --resolution $VBENCH_RES --aspect-ratio $VBENCH_ASP_RATIO --num-sampling-steps ${NUM_SAMPLING_STEPS} --flow ${FLOW} --llm-refine ${LLM_REFINE} \\\n            --batch-size $VBENCH_BS --num-frames $NUM_FRAMES --start-index $1 --end-index $2\n          fi\n        fi\n      fi\n    fi\n  fi\n}\n\n### Main\n\nfor arg in \"$@\"; do\n  # image\n  if [[ \"$arg\" = -1 ]] || [[ \"$arg\" = --image ]]; then\n    echo \"Running image samples...\"\n    run_image\n  fi\n  if [[ \"$arg\" = -2a ]] || [[ \"$arg\" = --video ]]; then\n    echo \"Running video samples a...\"\n    run_video_a\n  fi\n  if [[ \"$arg\" = -2b ]] || [[ \"$arg\" = --video ]]; then\n    echo \"Running video samples b...\"\n    run_video_b\n  fi\n  if [[ \"$arg\" = -2c ]] || [[ \"$arg\" = --video ]]; then\n    echo \"Running video samples c...\"\n    run_video_c\n  fi\n  if [[ \"$arg\" = -2d ]] || [[ \"$arg\" = --video ]]; then\n    echo \"Running video samples d...\"\n    run_video_d\n  fi\n  if [[ \"$arg\" = -2e ]] || [[ \"$arg\" = --video ]]; then\n    echo \"Running video samples e...\"\n    run_video_e\n  fi\n  if [[ \"$arg\" = -2f ]] || [[ \"$arg\" = --video ]]; then\n    echo 
\"Running video samples f...\"\n    run_video_f\n  fi\n  if [[ \"$arg\" = -2g ]] || [[ \"$arg\" = --video ]]; then\n    echo \"Running video samples g...\"\n    run_video_g\n  fi\n  if [[ \"$arg\" = -2h ]] || [[ \"$arg\" = --video ]]; then\n    echo \"Running video samples h...\"\n    run_video_h\n  fi\n  # vbench\n  if [[ \"$arg\" = -4 ]] || [[ \"$arg\" = --vbench ]]; then\n    echo \"Running vbench samples ...\"\n    if [ -z ${VBENCH_START_INDEX} ] || [ -z ${VBENCH_END_INDEX} ]; then\n      echo \"need to set start_index and end_index\"\n    else\n      run_vbench $VBENCH_START_INDEX $VBENCH_END_INDEX\n    fi\n  fi\n  # vbench-i2v\n  if [[ \"$arg\" = -5 ]] || [[ \"$arg\" = --vbench-i2v ]]; then\n    echo \"Running vbench-i2v samples ...\"\n    if [ -z ${VBENCH_START_INDEX} ] || [ -z ${VBENCH_END_INDEX} ]; then\n      echo \"need to set start_index and end_index\"\n    else\n      run_vbench_i2v $VBENCH_START_INDEX $VBENCH_END_INDEX\n    fi\n  fi\ndone\n\n### End\n\nend=$(date +%s)\n\nruntime=$((end - start))\n\necho \"Runtime: $runtime seconds\"\n"
  },
  {
    "path": "Open-Sora/eval/vae/cal_flolpips.py",
    "content": "import sys\n\nimport numpy as np\nimport torch\nfrom tqdm import tqdm\n\nsys.path.append(\".\")\n\nfrom flolpips.flolpips import FloLPIPS\nfrom flolpips.pwcnet import Network as PWCNet\n\nloss_fn = FloLPIPS(net=\"alex\", version=\"0.1\").eval().requires_grad_(False)\nflownet = PWCNet().eval().requires_grad_(False)\n\n\ndef trans(x):\n    return x\n\n\ndef calculate_flolpips(videos1, videos2, device):\n    global loss_fn, flownet\n\n    print(\"calculate_flowlpips...\")\n    loss_fn = loss_fn.to(device)\n    flownet = flownet.to(device)\n\n    if videos1.shape != videos2.shape:\n        print(\"Warning: the shape of videos are not equal.\")\n        min_frames = min(videos1.shape[1], videos2.shape[1])\n        videos1 = videos1[:, :min_frames]\n        videos2 = videos2[:, :min_frames]\n\n    videos1 = trans(videos1)\n    videos2 = trans(videos2)\n\n    flolpips_results = []\n    for video_num in tqdm(range(videos1.shape[0])):\n        video1 = videos1[video_num].to(device)\n        video2 = videos2[video_num].to(device)\n        frames_rec = video1[:-1]\n        frames_rec_next = video1[1:]\n        frames_gt = video2[:-1]\n        frames_gt_next = video2[1:]\n        t, c, h, w = frames_gt.shape\n        flow_gt = flownet(frames_gt, frames_gt_next)\n        flow_dis = flownet(frames_rec, frames_rec_next)\n        flow_diff = flow_gt - flow_dis\n        flolpips = loss_fn.forward(frames_gt, frames_rec, flow_diff, normalize=True)\n        flolpips_results.append(flolpips.cpu().numpy().tolist())\n\n    flolpips_results = np.array(flolpips_results)  # [batch_size, num_frames]\n    flolpips = {}\n    flolpips_std = {}\n\n    for clip_timestamp in range(flolpips_results.shape[1]):\n        flolpips[clip_timestamp] = np.mean(flolpips_results[:, clip_timestamp], axis=-1)\n        flolpips_std[clip_timestamp] = np.std(flolpips_results[:, clip_timestamp], axis=-1)\n\n    result = {\n        \"value\": flolpips,\n        \"value_std\": flolpips_std,\n        
\"video_setting\": video1.shape,\n        \"video_setting_name\": \"time, channel, heigth, width\",\n        \"result\": flolpips_results,\n        \"details\": flolpips_results.tolist(),\n    }\n\n    return result\n\n\n# test code / using example\n\n\ndef main():\n    NUMBER_OF_VIDEOS = 8\n    VIDEO_LENGTH = 50\n    CHANNEL = 3\n    SIZE = 64\n    videos1 = torch.zeros(NUMBER_OF_VIDEOS, VIDEO_LENGTH, CHANNEL, SIZE, SIZE, requires_grad=False)\n    videos2 = torch.zeros(NUMBER_OF_VIDEOS, VIDEO_LENGTH, CHANNEL, SIZE, SIZE, requires_grad=False)\n\n    import json\n\n    result = calculate_flolpips(videos1, videos2, \"cuda:0\")\n    print(json.dumps(result, indent=4))\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/eval/vae/cal_lpips.py",
    "content": "import lpips\nimport numpy as np\nimport torch\nfrom tqdm import tqdm\n\nspatial = True  # Return a spatial map of perceptual distance.\n\n# Linearly calibrated models (LPIPS)\nloss_fn = lpips.LPIPS(net=\"alex\", spatial=spatial)  # Can also set net = 'squeeze' or 'vgg'\n# loss_fn = lpips.LPIPS(net='alex', spatial=spatial, lpips=False) # Can also set net = 'squeeze' or 'vgg'\n\n\ndef trans(x):\n    # if greyscale images add channel\n    if x.shape[-3] == 1:\n        x = x.repeat(1, 1, 3, 1, 1)\n\n    # value range [0, 1] -> [-1, 1]\n    x = x * 2 - 1\n\n    return x\n\n\ndef calculate_lpips(videos1, videos2, device):\n    # image should be RGB, IMPORTANT: normalized to [-1,1]\n    print(\"calculate_lpips...\")\n\n    assert videos1.shape == videos2.shape\n\n    # videos [batch_size, timestamps, channel, h, w]\n\n    # support grayscale input, if grayscale -> channel*3\n    # value range [0, 1] -> [-1, 1]\n    videos1 = trans(videos1)\n    videos2 = trans(videos2)\n\n    lpips_results = []\n\n    for video_num in tqdm(range(videos1.shape[0])):\n        # get a video\n        # video [timestamps, channel, h, w]\n        video1 = videos1[video_num]\n        video2 = videos2[video_num]\n\n        lpips_results_of_a_video = []\n        for clip_timestamp in range(len(video1)):\n            # get a img\n            # img [timestamps[x], channel, h, w]\n            # img [channel, h, w] tensor\n\n            img1 = video1[clip_timestamp].unsqueeze(0).to(device)\n            img2 = video2[clip_timestamp].unsqueeze(0).to(device)\n\n            loss_fn.to(device)\n\n            # calculate lpips of a video\n            lpips_results_of_a_video.append(loss_fn.forward(img1, img2).mean().detach().cpu().tolist())\n        lpips_results.append(lpips_results_of_a_video)\n\n    lpips_results = np.array(lpips_results)\n\n    lpips = {}\n    lpips_std = {}\n\n    for clip_timestamp in range(len(video1)):\n        lpips[clip_timestamp] = np.mean(lpips_results[:, 
clip_timestamp])\n        lpips_std[clip_timestamp] = np.std(lpips_results[:, clip_timestamp])\n\n    result = {\n        \"value\": lpips,\n        \"value_std\": lpips_std,\n        \"video_setting\": video1.shape,\n        \"video_setting_name\": \"time, channel, height, width\",\n    }\n\n    return result\n\n\n# test code / using example\n\n\ndef main():\n    NUMBER_OF_VIDEOS = 8\n    VIDEO_LENGTH = 50\n    CHANNEL = 3\n    SIZE = 64\n    videos1 = torch.zeros(NUMBER_OF_VIDEOS, VIDEO_LENGTH, CHANNEL, SIZE, SIZE, requires_grad=False)\n    videos2 = torch.ones(NUMBER_OF_VIDEOS, VIDEO_LENGTH, CHANNEL, SIZE, SIZE, requires_grad=False)\n    device = torch.device(\"cuda\")\n    # device = torch.device(\"cpu\")\n\n    import json\n\n    result = calculate_lpips(videos1, videos2, device)\n    print(json.dumps(result, indent=4))\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/eval/vae/cal_psnr.py",
    "content": "import math\n\nimport numpy as np\nimport torch\nfrom tqdm import tqdm\n\n\ndef img_psnr(img1, img2):\n    # [0,1]\n    # compute mse\n    # mse = np.mean((img1-img2)**2)\n    mse = np.mean((img1 / 1.0 - img2 / 1.0) ** 2)\n    # compute psnr\n    if mse < 1e-10:\n        return 100\n    psnr = 20 * math.log10(1 / math.sqrt(mse))\n    return psnr\n\n\ndef trans(x):\n    return x\n\n\ndef calculate_psnr(videos1, videos2):\n    print(\"calculate_psnr...\")\n\n    # videos [batch_size, timestamps, channel, h, w]\n\n    assert videos1.shape == videos2.shape\n\n    videos1 = trans(videos1)\n    videos2 = trans(videos2)\n\n    psnr_results = []\n\n    for video_num in tqdm(range(videos1.shape[0])):\n        # get a video\n        # video [timestamps, channel, h, w]\n        video1 = videos1[video_num]\n        video2 = videos2[video_num]\n\n        psnr_results_of_a_video = []\n        for clip_timestamp in range(len(video1)):\n            # get a img\n            # img [timestamps[x], channel, h, w]\n            # img [channel, h, w] numpy\n\n            img1 = video1[clip_timestamp].numpy()\n            img2 = video2[clip_timestamp].numpy()\n\n            # calculate psnr of a video\n            psnr_results_of_a_video.append(img_psnr(img1, img2))\n\n        psnr_results.append(psnr_results_of_a_video)\n\n    psnr_results = np.array(psnr_results)  # [batch_size, num_frames]\n    psnr = {}\n    psnr_std = {}\n\n    for clip_timestamp in range(len(video1)):\n        psnr[clip_timestamp] = np.mean(psnr_results[:, clip_timestamp])\n        psnr_std[clip_timestamp] = np.std(psnr_results[:, clip_timestamp])\n\n    result = {\n        \"value\": psnr,\n        \"value_std\": psnr_std,\n        \"video_setting\": video1.shape,\n        \"video_setting_name\": \"time, channel, heigth, width\",\n    }\n\n    return result\n\n\n# test code / using example\n\n\ndef main():\n    NUMBER_OF_VIDEOS = 8\n    VIDEO_LENGTH = 50\n    CHANNEL = 3\n    SIZE = 64\n    videos1 
= torch.zeros(NUMBER_OF_VIDEOS, VIDEO_LENGTH, CHANNEL, SIZE, SIZE, requires_grad=False)\n    videos2 = torch.zeros(NUMBER_OF_VIDEOS, VIDEO_LENGTH, CHANNEL, SIZE, SIZE, requires_grad=False)\n\n    import json\n\n    result = calculate_psnr(videos1, videos2)\n    print(json.dumps(result, indent=4))\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/eval/vae/cal_ssim.py",
    "content": "import cv2\nimport numpy as np\nimport torch\nfrom tqdm import tqdm\n\n\ndef ssim(img1, img2):\n    C1 = 0.01**2\n    C2 = 0.03**2\n    img1 = img1.astype(np.float64)\n    img2 = img2.astype(np.float64)\n    kernel = cv2.getGaussianKernel(11, 1.5)\n    window = np.outer(kernel, kernel.transpose())\n    mu1 = cv2.filter2D(img1, -1, window)[5:-5, 5:-5]  # valid\n    mu2 = cv2.filter2D(img2, -1, window)[5:-5, 5:-5]\n    mu1_sq = mu1**2\n    mu2_sq = mu2**2\n    mu1_mu2 = mu1 * mu2\n    sigma1_sq = cv2.filter2D(img1**2, -1, window)[5:-5, 5:-5] - mu1_sq\n    sigma2_sq = cv2.filter2D(img2**2, -1, window)[5:-5, 5:-5] - mu2_sq\n    sigma12 = cv2.filter2D(img1 * img2, -1, window)[5:-5, 5:-5] - mu1_mu2\n    ssim_map = ((2 * mu1_mu2 + C1) * (2 * sigma12 + C2)) / ((mu1_sq + mu2_sq + C1) * (sigma1_sq + sigma2_sq + C2))\n    return ssim_map.mean()\n\n\ndef calculate_ssim_function(img1, img2):\n    # [0,1]\n    # ssim is the only metric extremely sensitive to gray being compared to b/w\n    if not img1.shape == img2.shape:\n        raise ValueError(\"Input images must have the same dimensions.\")\n    if img1.ndim == 2:\n        return ssim(img1, img2)\n    elif img1.ndim == 3:\n        if img1.shape[0] == 3:\n            ssims = []\n            for i in range(3):\n                ssims.append(ssim(img1[i], img2[i]))\n            return np.array(ssims).mean()\n        elif img1.shape[0] == 1:\n            return ssim(np.squeeze(img1), np.squeeze(img2))\n    else:\n        raise ValueError(\"Wrong input image dimensions.\")\n\n\ndef trans(x):\n    return x\n\n\ndef calculate_ssim(videos1, videos2):\n    print(\"calculate_ssim...\")\n\n    # videos [batch_size, timestamps, channel, h, w]\n\n    assert videos1.shape == videos2.shape\n\n    videos1 = trans(videos1)\n    videos2 = trans(videos2)\n\n    ssim_results = []\n\n    for video_num in tqdm(range(videos1.shape[0])):\n        # get a video\n        # video [timestamps, channel, h, w]\n        video1 = 
videos1[video_num]\n        video2 = videos2[video_num]\n\n        ssim_results_of_a_video = []\n        for clip_timestamp in range(len(video1)):\n            # get a img\n            # img [timestamps[x], channel, h, w]\n            # img [channel, h, w] numpy\n\n            img1 = video1[clip_timestamp].numpy()\n            img2 = video2[clip_timestamp].numpy()\n\n            # calculate ssim of a video\n            ssim_results_of_a_video.append(calculate_ssim_function(img1, img2))\n\n        ssim_results.append(ssim_results_of_a_video)\n\n    ssim_results = np.array(ssim_results)\n\n    ssim = {}\n    ssim_std = {}\n\n    for clip_timestamp in range(len(video1)):\n        ssim[clip_timestamp] = np.mean(ssim_results[:, clip_timestamp])\n        ssim_std[clip_timestamp] = np.std(ssim_results[:, clip_timestamp])\n\n    result = {\n        \"value\": ssim,\n        \"value_std\": ssim_std,\n        \"video_setting\": video1.shape,\n        \"video_setting_name\": \"time, channel, height, width\",\n    }\n\n    return result\n\n\n# test code / usage example\n\n\ndef main():\n    NUMBER_OF_VIDEOS = 8\n    VIDEO_LENGTH = 50\n    CHANNEL = 3\n    SIZE = 64\n    videos1 = torch.zeros(NUMBER_OF_VIDEOS, VIDEO_LENGTH, CHANNEL, SIZE, SIZE, requires_grad=False)\n    videos2 = torch.zeros(NUMBER_OF_VIDEOS, VIDEO_LENGTH, CHANNEL, SIZE, SIZE, requires_grad=False)\n\n    import json\n\n    result = calculate_ssim(videos1, videos2)\n    print(json.dumps(result, indent=4))\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/eval/vae/eval_common_metric.py",
    "content": "\"\"\"Calculates the CLIP Scores\n\nThe CLIP model is a contrasitively learned language-image model. There is\nan image encoder and a text encoder. It is believed that the CLIP model could\nmeasure the similarity of cross modalities. Please find more information from\nhttps://github.com/openai/CLIP.\n\nThe CLIP Score measures the Cosine Similarity between two embedded features.\nThis repository utilizes the pretrained CLIP Model to calculate\nthe mean average of cosine similarities.\n\nSee --help to see further details.\n\nCode apapted from https://github.com/mseitzer/pytorch-fid and https://github.com/openai/CLIP.\n\nCopyright 2023 The Hong Kong Polytechnic University\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n   http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n\"\"\"\n\nimport os\nimport os.path as osp\nimport sys\nfrom argparse import ArgumentDefaultsHelpFormatter, ArgumentParser\n\nimport numpy as np\nimport torch\nfrom decord import VideoReader, cpu\nfrom pytorchvideo.transforms import ShortSideScale\nfrom torch.utils.data import DataLoader, Dataset, Subset\nfrom torchvision.transforms import Compose, Lambda\nfrom torchvision.transforms._transforms_video import CenterCropVideo\n\nsys.path.append(\".\")\nfrom cal_flolpips import calculate_flolpips\nfrom cal_lpips import calculate_lpips\nfrom cal_psnr import calculate_psnr\nfrom cal_ssim import calculate_ssim\n\ntry:\n    from tqdm import tqdm\nexcept ImportError:\n    # If tqdm is not available, provide a mock version of it\n    def tqdm(x):\n        return 
x\n\n\nclass VideoDataset(Dataset):\n    def __init__(\n        self,\n        real_video_dir,\n        generated_video_dir,\n        num_frames,\n        sample_rate=1,\n        crop_size=None,\n        resolution=128,\n    ) -> None:\n        super().__init__()\n        self.real_video_files = self._combine_without_prefix(real_video_dir)\n        self.generated_video_files = self._combine_without_prefix(generated_video_dir)\n        self.num_frames = num_frames\n        self.sample_rate = sample_rate\n        self.crop_size = crop_size\n        self.short_size = resolution\n\n    def __len__(self):\n        return len(self.real_video_files)\n\n    def __getitem__(self, index):\n        if index >= len(self):\n            raise IndexError\n        real_video_file = self.real_video_files[index]\n        generated_video_file = self.generated_video_files[index]\n        print(real_video_file, generated_video_file)\n        real_video_tensor = self._load_video(real_video_file)\n        generated_video_tensor = self._load_video(generated_video_file)\n        return {\"real\": real_video_tensor, \"generated\": generated_video_tensor}\n\n    def _load_video(self, video_path):\n        num_frames = self.num_frames\n        sample_rate = self.sample_rate\n        decord_vr = VideoReader(video_path, ctx=cpu(0))\n        total_frames = len(decord_vr)\n        sample_frames_len = sample_rate * num_frames\n\n        if total_frames >= sample_frames_len:\n            s = 0\n            e = s + sample_frames_len\n            num_frames = num_frames\n        else:\n            s = 0\n            e = total_frames\n            num_frames = int(total_frames / sample_frames_len * num_frames)\n            print(\n                f\"sample_frames_len {sample_frames_len}, only can sample {num_frames * sample_rate}\",\n                video_path,\n                total_frames,\n            )\n\n        frame_id_list = np.linspace(s, e - 1, num_frames, dtype=int)\n        video_data = 
decord_vr.get_batch(frame_id_list).asnumpy()\n        video_data = torch.from_numpy(video_data)\n        video_data = video_data.permute(0, 3, 1, 2)  # (T, H, W, C) -> (T, C, H, W)\n        return _preprocess(video_data, short_size=self.short_size, crop_size=self.crop_size)\n\n    def _combine_without_prefix(self, folder_path, prefix=\".\"):\n        folder = []\n        os.makedirs(folder_path, exist_ok=True)\n        for name in os.listdir(folder_path):\n            if name[0] == prefix:\n                continue\n            if osp.isfile(osp.join(folder_path, name)):\n                folder.append(osp.join(folder_path, name))\n        folder.sort()\n        return folder\n\n\ndef _preprocess(video_data, short_size=128, crop_size=None):\n    transforms = [\n        Lambda(lambda x: x / 255.0),\n        ShortSideScale(size=short_size),\n    ]\n    # CenterCropVideo cannot handle crop_size=None, so only crop when a size is given.\n    if crop_size is not None:\n        transforms.append(CenterCropVideo(crop_size=crop_size))\n    transform = Compose(transforms)\n    video_outputs = transform(video_data)\n    # video_outputs = torch.unsqueeze(video_outputs, 0) # (bz,c,t,h,w)\n    return video_outputs\n\n\ndef calculate_common_metric(args, dataloader, device):\n    metric_dict = {}\n    if isinstance(args.metric, str):\n        args.metric = [m.strip() for m in args.metric.split(\",\")]\n    print(args.metric)\n    for metric in args.metric:\n        score_list = []\n        for batch_data in tqdm(dataloader):  # {'real': real_video_tensor, 'generated':generated_video_tensor }\n            real_videos = batch_data[\"real\"]\n            generated_videos = batch_data[\"generated\"]\n            assert real_videos.shape[2] == generated_videos.shape[2]\n            if metric == \"ssim\":\n                tmp_list = list(calculate_ssim(real_videos, generated_videos)[\"value\"].values())\n            elif metric == \"psnr\":\n                tmp_list = list(calculate_psnr(real_videos, generated_videos)[\"value\"].values())\n            elif metric == \"flolpips\":\n                result = 
calculate_flolpips(real_videos, generated_videos, device)\n                tmp_list = list(result[\"value\"].values())\n            elif metric == \"lpips\":\n                tmp_list = list(calculate_lpips(real_videos, generated_videos, device)[\"value\"].values())\n            else:\n                print(f\"metric {metric} is not in accepted list, not calculated\")\n                continue\n            score_list += tmp_list\n        metric_dict[metric] = np.mean(score_list)\n\n    return metric_dict\n\n\ndef main():\n    parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter)\n    parser.add_argument(\"--batch_size\", type=int, default=2, help=\"Batch size to use\")\n    parser.add_argument(\"--real_video_dir\", type=str, help=\"Path to the real videos\")\n    parser.add_argument(\"--generated_video_dir\", type=str, help=\"Path to the generated videos\")\n    parser.add_argument(\"--device\", type=str, default=None, help=\"Device to use. Like cuda, cuda:0 or cpu\")\n    parser.add_argument(\n        \"--num_workers\",\n        type=int,\n        default=8,\n        help=(\"Number of processes to use for data loading. 
\" \"Defaults to `min(8, num_cpus)`\"),\n    )\n    parser.add_argument(\"--sample_fps\", type=int, default=30)\n    parser.add_argument(\"--resolution\", type=int, default=336)\n    parser.add_argument(\"--crop_size\", type=int, default=None)\n    parser.add_argument(\"--num_frames\", type=int, default=100)\n    parser.add_argument(\"--sample_rate\", type=int, default=1)\n    parser.add_argument(\"--subset_size\", type=int, default=None)\n    # parser.add_argument(\"--metric\", type=str, default=\"fvd\",choices=['fvd','psnr','ssim','lpips', 'flolpips'])\n    parser.add_argument(\"--metric\", nargs=\"+\", default=[])\n    parser.add_argument(\"--fvd_method\", type=str, default=\"styleganv\", choices=[\"styleganv\", \"videogpt\"])\n\n    args = parser.parse_args()\n\n    if args.device is None:\n        device = torch.device(\"cuda\" if (torch.cuda.is_available()) else \"cpu\")\n    else:\n        device = torch.device(args.device)\n\n    if args.num_workers is None:\n        try:\n            num_cpus = len(os.sched_getaffinity(0))\n        except AttributeError:\n            # os.sched_getaffinity is not available under Windows, use\n            # os.cpu_count instead (which may not return the *available* number\n            # of CPUs).\n            num_cpus = os.cpu_count()\n\n        num_workers = min(num_cpus, 8) if num_cpus is not None else 0\n    else:\n        num_workers = args.num_workers\n\n    dataset = VideoDataset(\n        args.real_video_dir,\n        args.generated_video_dir,\n        num_frames=args.num_frames,\n        sample_rate=args.sample_rate,\n        crop_size=args.crop_size,\n        resolution=args.resolution,\n    )\n\n    if args.subset_size:\n        indices = range(args.subset_size)\n        dataset = Subset(dataset, indices=indices)\n\n    dataloader = DataLoader(dataset, args.batch_size, num_workers=num_workers, pin_memory=True)\n\n    metric_score = calculate_common_metric(args, dataloader, device)\n    print(\"metric: \", 
args.metric, \" \", metric_score)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/eval/vae/flolpips/correlation/correlation.py",
    "content": "#!/usr/bin/env python\n\nimport re\n\nimport cupy\nimport torch\n\nkernel_Correlation_rearrange = \"\"\"\n\textern \"C\" __global__ void kernel_Correlation_rearrange(\n\t\tconst int n,\n\t\tconst float* input,\n\t\tfloat* output\n\t) {\n\t  int intIndex = (blockIdx.x * blockDim.x) + threadIdx.x;\n\n\t  if (intIndex >= n) {\n\t    return;\n\t  }\n\n\t  int intSample = blockIdx.z;\n\t  int intChannel = blockIdx.y;\n\n\t  float fltValue = input[(((intSample * SIZE_1(input)) + intChannel) * SIZE_2(input) * SIZE_3(input)) + intIndex];\n\n\t  __syncthreads();\n\n\t  int intPaddedY = (intIndex / SIZE_3(input)) + 4;\n\t  int intPaddedX = (intIndex % SIZE_3(input)) + 4;\n\t  int intRearrange = ((SIZE_3(input) + 8) * intPaddedY) + intPaddedX;\n\n\t  output[(((intSample * SIZE_1(output) * SIZE_2(output)) + intRearrange) * SIZE_1(input)) + intChannel] = fltValue;\n\t}\n\"\"\"\n\nkernel_Correlation_updateOutput = \"\"\"\n\textern \"C\" __global__ void kernel_Correlation_updateOutput(\n\t  const int n,\n\t  const float* rbot0,\n\t  const float* rbot1,\n\t  float* top\n\t) {\n\t  extern __shared__ char patch_data_char[];\n\n\t  float *patch_data = (float *)patch_data_char;\n\n\t  // First (upper left) position of kernel upper-left corner in current center position of neighborhood in image 1\n\t  int x1 = blockIdx.x + 4;\n\t  int y1 = blockIdx.y + 4;\n\t  int item = blockIdx.z;\n\t  int ch_off = threadIdx.x;\n\n\t  // Load 3D patch into shared shared memory\n\t  for (int j = 0; j < 1; j++) { // HEIGHT\n\t    for (int i = 0; i < 1; i++) { // WIDTH\n\t      int ji_off = (j + i) * SIZE_3(rbot0);\n\t      for (int ch = ch_off; ch < SIZE_3(rbot0); ch += 32) { // CHANNELS\n\t        int idx1 = ((item * SIZE_1(rbot0) + y1+j) * SIZE_2(rbot0) + x1+i) * SIZE_3(rbot0) + ch;\n\t        int idxPatchData = ji_off + ch;\n\t        patch_data[idxPatchData] = rbot0[idx1];\n\t      }\n\t    }\n\t  }\n\n\t  __syncthreads();\n\n\t  __shared__ float sum[32];\n\n\t  // Compute 
correlation\n\t  for (int top_channel = 0; top_channel < SIZE_1(top); top_channel++) {\n\t    sum[ch_off] = 0;\n\n\t    int s2o = top_channel % 9 - 4;\n\t    int s2p = top_channel / 9 - 4;\n\n\t    for (int j = 0; j < 1; j++) { // HEIGHT\n\t      for (int i = 0; i < 1; i++) { // WIDTH\n\t        int ji_off = (j + i) * SIZE_3(rbot0);\n\t        for (int ch = ch_off; ch < SIZE_3(rbot0); ch += 32) { // CHANNELS\n\t          int x2 = x1 + s2o;\n\t          int y2 = y1 + s2p;\n\n\t          int idxPatchData = ji_off + ch;\n\t          int idx2 = ((item * SIZE_1(rbot0) + y2+j) * SIZE_2(rbot0) + x2+i) * SIZE_3(rbot0) + ch;\n\n\t          sum[ch_off] += patch_data[idxPatchData] * rbot1[idx2];\n\t        }\n\t      }\n\t    }\n\n\t    __syncthreads();\n\n\t    if (ch_off == 0) {\n\t      float total_sum = 0;\n\t      for (int idx = 0; idx < 32; idx++) {\n\t        total_sum += sum[idx];\n\t      }\n\t      const int sumelems = SIZE_3(rbot0);\n\t      const int index = ((top_channel*SIZE_2(top) + blockIdx.y)*SIZE_3(top))+blockIdx.x;\n\t      top[index + item*SIZE_1(top)*SIZE_2(top)*SIZE_3(top)] = total_sum / (float)sumelems;\n\t    }\n\t  }\n\t}\n\"\"\"\n\nkernel_Correlation_updateGradFirst = \"\"\"\n\t#define ROUND_OFF 50000\n\n\textern \"C\" __global__ void kernel_Correlation_updateGradFirst(\n\t  const int n,\n\t  const int intSample,\n\t  const float* rbot0,\n\t  const float* rbot1,\n\t  const float* gradOutput,\n\t  float* gradFirst,\n\t  float* gradSecond\n\t) { for (int intIndex = (blockIdx.x * blockDim.x) + threadIdx.x; intIndex < n; intIndex += blockDim.x * gridDim.x) {\n\t  int n = intIndex % SIZE_1(gradFirst); // channels\n\t  int l = (intIndex / SIZE_1(gradFirst)) % SIZE_3(gradFirst) + 4; // w-pos\n\t  int m = (intIndex / SIZE_1(gradFirst) / SIZE_3(gradFirst)) % SIZE_2(gradFirst) + 4; // h-pos\n\n\t  // round_off is a trick to enable integer division with ceil, even for negative numbers\n\t  // We use a large offset, for the inner part not to become negative.\n\t 
 const int round_off = ROUND_OFF;\n\t  const int round_off_s1 = round_off;\n\n\t  // We add round_off before_s1 the int division and subtract round_off after it, to ensure the formula matches ceil behavior:\n\t  int xmin = (l - 4 + round_off_s1 - 1) + 1 - round_off; // ceil (l - 4)\n\t  int ymin = (m - 4 + round_off_s1 - 1) + 1 - round_off; // ceil (l - 4)\n\n\t  // Same here:\n\t  int xmax = (l - 4 + round_off_s1) - round_off; // floor (l - 4)\n\t  int ymax = (m - 4 + round_off_s1) - round_off; // floor (m - 4)\n\n\t  float sum = 0;\n\t  if (xmax>=0 && ymax>=0 && (xmin<=SIZE_3(gradOutput)-1) && (ymin<=SIZE_2(gradOutput)-1)) {\n\t    xmin = max(0,xmin);\n\t    xmax = min(SIZE_3(gradOutput)-1,xmax);\n\n\t    ymin = max(0,ymin);\n\t    ymax = min(SIZE_2(gradOutput)-1,ymax);\n\n\t    for (int p = -4; p <= 4; p++) {\n\t      for (int o = -4; o <= 4; o++) {\n\t        // Get rbot1 data:\n\t        int s2o = o;\n\t        int s2p = p;\n\t        int idxbot1 = ((intSample * SIZE_1(rbot0) + (m+s2p)) * SIZE_2(rbot0) + (l+s2o)) * SIZE_3(rbot0) + n;\n\t        float bot1tmp = rbot1[idxbot1]; // rbot1[l+s2o,m+s2p,n]\n\n\t        // Index offset for gradOutput in following loops:\n\t        int op = (p+4) * 9 + (o+4); // index[o,p]\n\t        int idxopoffset = (intSample * SIZE_1(gradOutput) + op);\n\n\t        for (int y = ymin; y <= ymax; y++) {\n\t          for (int x = xmin; x <= xmax; x++) {\n\t            int idxgradOutput = (idxopoffset * SIZE_2(gradOutput) + y) * SIZE_3(gradOutput) + x; // gradOutput[x,y,o,p]\n\t            sum += gradOutput[idxgradOutput] * bot1tmp;\n\t          }\n\t        }\n\t      }\n\t    }\n\t  }\n\t  const int sumelems = SIZE_1(gradFirst);\n\t  const int bot0index = ((n * SIZE_2(gradFirst)) + (m-4)) * SIZE_3(gradFirst) + (l-4);\n\t  gradFirst[bot0index + intSample*SIZE_1(gradFirst)*SIZE_2(gradFirst)*SIZE_3(gradFirst)] = sum / (float)sumelems;\n\t} }\n\"\"\"\n\nkernel_Correlation_updateGradSecond = \"\"\"\n\t#define ROUND_OFF 50000\n\n\textern 
\"C\" __global__ void kernel_Correlation_updateGradSecond(\n\t  const int n,\n\t  const int intSample,\n\t  const float* rbot0,\n\t  const float* rbot1,\n\t  const float* gradOutput,\n\t  float* gradFirst,\n\t  float* gradSecond\n\t) { for (int intIndex = (blockIdx.x * blockDim.x) + threadIdx.x; intIndex < n; intIndex += blockDim.x * gridDim.x) {\n\t  int n = intIndex % SIZE_1(gradSecond); // channels\n\t  int l = (intIndex / SIZE_1(gradSecond)) % SIZE_3(gradSecond) + 4; // w-pos\n\t  int m = (intIndex / SIZE_1(gradSecond) / SIZE_3(gradSecond)) % SIZE_2(gradSecond) + 4; // h-pos\n\n\t  // round_off is a trick to enable integer division with ceil, even for negative numbers\n\t  // We use a large offset, for the inner part not to become negative.\n\t  const int round_off = ROUND_OFF;\n\t  const int round_off_s1 = round_off;\n\n\t  float sum = 0;\n\t  for (int p = -4; p <= 4; p++) {\n\t    for (int o = -4; o <= 4; o++) {\n\t      int s2o = o;\n\t      int s2p = p;\n\n\t      //Get X,Y ranges and clamp\n\t      // We add round_off before_s1 the int division and subtract round_off after it, to ensure the formula matches ceil behavior:\n\t      int xmin = (l - 4 - s2o + round_off_s1 - 1) + 1 - round_off; // ceil (l - 4 - s2o)\n\t      int ymin = (m - 4 - s2p + round_off_s1 - 1) + 1 - round_off; // ceil (l - 4 - s2o)\n\n\t      // Same here:\n\t      int xmax = (l - 4 - s2o + round_off_s1) - round_off; // floor (l - 4 - s2o)\n\t      int ymax = (m - 4 - s2p + round_off_s1) - round_off; // floor (m - 4 - s2p)\n\n\t      if (xmax>=0 && ymax>=0 && (xmin<=SIZE_3(gradOutput)-1) && (ymin<=SIZE_2(gradOutput)-1)) {\n\t        xmin = max(0,xmin);\n\t        xmax = min(SIZE_3(gradOutput)-1,xmax);\n\n\t        ymin = max(0,ymin);\n\t        ymax = min(SIZE_2(gradOutput)-1,ymax);\n\n\t        // Get rbot0 data:\n\t        int idxbot0 = ((intSample * SIZE_1(rbot0) + (m-s2p)) * SIZE_2(rbot0) + (l-s2o)) * SIZE_3(rbot0) + n;\n\t        float bot0tmp = rbot0[idxbot0]; // 
rbot1[l+s2o,m+s2p,n]\n\n\t        // Index offset for gradOutput in following loops:\n\t        int op = (p+4) * 9 + (o+4); // index[o,p]\n\t        int idxopoffset = (intSample * SIZE_1(gradOutput) + op);\n\n\t        for (int y = ymin; y <= ymax; y++) {\n\t          for (int x = xmin; x <= xmax; x++) {\n\t            int idxgradOutput = (idxopoffset * SIZE_2(gradOutput) + y) * SIZE_3(gradOutput) + x; // gradOutput[x,y,o,p]\n\t            sum += gradOutput[idxgradOutput] * bot0tmp;\n\t          }\n\t        }\n\t      }\n\t    }\n\t  }\n\t  const int sumelems = SIZE_1(gradSecond);\n\t  const int bot1index = ((n * SIZE_2(gradSecond)) + (m-4)) * SIZE_3(gradSecond) + (l-4);\n\t  gradSecond[bot1index + intSample*SIZE_1(gradSecond)*SIZE_2(gradSecond)*SIZE_3(gradSecond)] = sum / (float)sumelems;\n\t} }\n\"\"\"\n\n\ndef cupy_kernel(strFunction, objVariables):\n    strKernel = globals()[strFunction]\n\n    while True:\n        objMatch = re.search(\"(SIZE_)([0-4])(\\()([^\\)]*)(\\))\", strKernel)\n\n        if objMatch is None:\n            break\n        # end\n\n        intArg = int(objMatch.group(2))\n\n        strTensor = objMatch.group(4)\n        intSizes = objVariables[strTensor].size()\n\n        strKernel = strKernel.replace(objMatch.group(), str(intSizes[intArg]))\n    # end\n\n    while True:\n        objMatch = re.search(\"(VALUE_)([0-4])(\\()([^\\)]+)(\\))\", strKernel)\n\n        if objMatch is None:\n            break\n        # end\n\n        intArgs = int(objMatch.group(2))\n        strArgs = objMatch.group(4).split(\",\")\n\n        strTensor = strArgs[0]\n        intStrides = objVariables[strTensor].stride()\n        strIndex = [\n            \"((\"\n            + strArgs[intArg + 1].replace(\"{\", \"(\").replace(\"}\", \")\").strip()\n            + \")*\"\n            + str(intStrides[intArg])\n            + \")\"\n            for intArg in range(intArgs)\n        ]\n\n        strKernel = strKernel.replace(objMatch.group(0), strTensor + \"[\" + 
str.join(\"+\", strIndex) + \"]\")\n    # end\n\n    return strKernel\n\n\n# end\n\n\n@cupy.memoize(for_each_device=True)\ndef cupy_launch(strFunction, strKernel):\n    return cupy.RawKernel(strKernel, strFunction)\n\n\n# end\n\n\nclass _FunctionCorrelation(torch.autograd.Function):\n    @staticmethod\n    def forward(self, first, second):\n        rbot0 = first.new_zeros([first.shape[0], first.shape[2] + 8, first.shape[3] + 8, first.shape[1]])\n        rbot1 = first.new_zeros([first.shape[0], first.shape[2] + 8, first.shape[3] + 8, first.shape[1]])\n\n        self.save_for_backward(first, second, rbot0, rbot1)\n\n        first = first.contiguous()\n        assert first.is_cuda == True\n        second = second.contiguous()\n        assert second.is_cuda == True\n\n        output = first.new_zeros([first.shape[0], 81, first.shape[2], first.shape[3]])\n\n        if first.is_cuda == True:\n            n = first.shape[2] * first.shape[3]\n            cupy_launch(\n                \"kernel_Correlation_rearrange\",\n                cupy_kernel(\"kernel_Correlation_rearrange\", {\"input\": first, \"output\": rbot0}),\n            )(\n                grid=tuple([int((n + 16 - 1) / 16), first.shape[1], first.shape[0]]),\n                block=tuple([16, 1, 1]),\n                args=[n, first.data_ptr(), rbot0.data_ptr()],\n            )\n\n            n = second.shape[2] * second.shape[3]\n            cupy_launch(\n                \"kernel_Correlation_rearrange\",\n                cupy_kernel(\"kernel_Correlation_rearrange\", {\"input\": second, \"output\": rbot1}),\n            )(\n                grid=tuple([int((n + 16 - 1) / 16), second.shape[1], second.shape[0]]),\n                block=tuple([16, 1, 1]),\n                args=[n, second.data_ptr(), rbot1.data_ptr()],\n            )\n\n            n = output.shape[1] * output.shape[2] * output.shape[3]\n            cupy_launch(\n                \"kernel_Correlation_updateOutput\",\n                
cupy_kernel(\"kernel_Correlation_updateOutput\", {\"rbot0\": rbot0, \"rbot1\": rbot1, \"top\": output}),\n            )(\n                grid=tuple([output.shape[3], output.shape[2], output.shape[0]]),\n                block=tuple([32, 1, 1]),\n                shared_mem=first.shape[1] * 4,\n                args=[n, rbot0.data_ptr(), rbot1.data_ptr(), output.data_ptr()],\n            )\n\n        elif first.is_cuda == False:\n            raise NotImplementedError()\n\n        # end\n\n        return output\n\n    # end\n\n    @staticmethod\n    def backward(self, gradOutput):\n        first, second, rbot0, rbot1 = self.saved_tensors\n\n        gradOutput = gradOutput.contiguous()\n        assert gradOutput.is_cuda == True\n\n        gradFirst = (\n            first.new_zeros([first.shape[0], first.shape[1], first.shape[2], first.shape[3]])\n            if self.needs_input_grad[0] == True\n            else None\n        )\n        gradSecond = (\n            first.new_zeros([first.shape[0], first.shape[1], first.shape[2], first.shape[3]])\n            if self.needs_input_grad[1] == True\n            else None\n        )\n\n        if first.is_cuda == True:\n            if gradFirst is not None:\n                for intSample in range(first.shape[0]):\n                    n = first.shape[1] * first.shape[2] * first.shape[3]\n                    cupy_launch(\n                        \"kernel_Correlation_updateGradFirst\",\n                        cupy_kernel(\n                            \"kernel_Correlation_updateGradFirst\",\n                            {\n                                \"rbot0\": rbot0,\n                                \"rbot1\": rbot1,\n                                \"gradOutput\": gradOutput,\n                                \"gradFirst\": gradFirst,\n                                \"gradSecond\": None,\n                            },\n                        ),\n                    )(\n                        grid=tuple([int((n + 512 - 1) / 
512), 1, 1]),\n                        block=tuple([512, 1, 1]),\n                        args=[\n                            n,\n                            intSample,\n                            rbot0.data_ptr(),\n                            rbot1.data_ptr(),\n                            gradOutput.data_ptr(),\n                            gradFirst.data_ptr(),\n                            None,\n                        ],\n                    )\n                # end\n            # end\n\n            if gradSecond is not None:\n                for intSample in range(first.shape[0]):\n                    n = first.shape[1] * first.shape[2] * first.shape[3]\n                    cupy_launch(\n                        \"kernel_Correlation_updateGradSecond\",\n                        cupy_kernel(\n                            \"kernel_Correlation_updateGradSecond\",\n                            {\n                                \"rbot0\": rbot0,\n                                \"rbot1\": rbot1,\n                                \"gradOutput\": gradOutput,\n                                \"gradFirst\": None,\n                                \"gradSecond\": gradSecond,\n                            },\n                        ),\n                    )(\n                        grid=tuple([int((n + 512 - 1) / 512), 1, 1]),\n                        block=tuple([512, 1, 1]),\n                        args=[\n                            n,\n                            intSample,\n                            rbot0.data_ptr(),\n                            rbot1.data_ptr(),\n                            gradOutput.data_ptr(),\n                            None,\n                            gradSecond.data_ptr(),\n                        ],\n                    )\n                # end\n            # end\n\n        elif first.is_cuda == False:\n            raise NotImplementedError()\n\n        # end\n\n        return gradFirst, gradSecond\n\n    # end\n\n\n# end\n\n\ndef 
FunctionCorrelation(tenFirst, tenSecond):\n    return _FunctionCorrelation.apply(tenFirst, tenSecond)\n\n\n# end\n\n\nclass ModuleCorrelation(torch.nn.Module):\n    def __init__(self):\n        super(ModuleCorrelation, self).__init__()\n\n    # end\n\n    def forward(self, tenFirst, tenSecond):\n        return _FunctionCorrelation.apply(tenFirst, tenSecond)\n\n    # end\n\n\n# end\n"
  },
  {
    "path": "Open-Sora/eval/vae/flolpips/flolpips.py",
    "content": "from __future__ import absolute_import\n\nimport hashlib\nimport os\n\nimport requests\nimport torch\nimport torch.nn\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch.autograd import Variable\nfrom tqdm import tqdm\n\nfrom .pretrained_networks import alexnet, squeezenet, vgg16\nfrom .pwcnet import Network as PWCNet\nfrom .utils import *\n\nURL_MAP = {\"alex\": \"https://raw.githubusercontent.com/danier97/flolpips/main/weights/v0.1/alex.pth\"}\n\nCKPT_MAP = {\"alex\": \"alex.pth\"}\n\nMD5_MAP = {\"alex\": \"9642209e2b57a85d20f86d812320f9e6\"}\n\n\ndef spatial_average(in_tens, keepdim=True):\n    return in_tens.mean([2, 3], keepdim=keepdim)\n\n\ndef mw_spatial_average(in_tens, flow, keepdim=True):\n    _, _, h, w = in_tens.shape\n    flow = F.interpolate(flow, (h, w), align_corners=False, mode=\"bilinear\")\n    flow_mag = torch.sqrt(flow[:, 0:1] ** 2 + flow[:, 1:2] ** 2)\n    flow_mag = flow_mag / torch.sum(flow_mag, dim=[1, 2, 3], keepdim=True)\n    return torch.sum(in_tens * flow_mag, dim=[2, 3], keepdim=keepdim)\n\n\ndef mtw_spatial_average(in_tens, flow, texture, keepdim=True):\n    _, _, h, w = in_tens.shape\n    flow = F.interpolate(flow, (h, w), align_corners=False, mode=\"bilinear\")\n    texture = F.interpolate(texture, (h, w), align_corners=False, mode=\"bilinear\")\n    flow_mag = torch.sqrt(flow[:, 0:1] ** 2 + flow[:, 1:2] ** 2)\n    flow_mag = (flow_mag - flow_mag.min()) / (flow_mag.max() - flow_mag.min()) + 1e-6\n    texture = (texture - texture.min()) / (texture.max() - texture.min()) + 1e-6\n    weight = flow_mag / texture\n    weight /= torch.sum(weight)\n    return torch.sum(in_tens * weight, dim=[2, 3], keepdim=keepdim)\n\n\ndef m2w_spatial_average(in_tens, flow, keepdim=True):\n    _, _, h, w = in_tens.shape\n    flow = F.interpolate(flow, (h, w), align_corners=False, mode=\"bilinear\")\n    flow_mag = flow[:, 0:1] ** 2 + flow[:, 1:2] ** 2  # B,1,H,W\n    flow_mag = flow_mag / torch.sum(flow_mag)\n    return 
torch.sum(in_tens * flow_mag, dim=[2, 3], keepdim=keepdim)\n\n\ndef upsample(in_tens, out_HW=(64, 64)):  # assumes scale factor is same for H and W\n    in_H, in_W = in_tens.shape[2], in_tens.shape[3]\n    return nn.Upsample(size=out_HW, mode=\"bilinear\", align_corners=False)(in_tens)\n\n\ndef md5_hash(path):\n    with open(path, \"rb\") as f:\n        content = f.read()\n    return hashlib.md5(content).hexdigest()\n\n\ndef download(url, local_path, chunk_size=1024):\n    os.makedirs(os.path.split(local_path)[0], exist_ok=True)\n    with requests.get(url, stream=True) as r:\n        total_size = int(r.headers.get(\"content-length\", 0))\n        with tqdm(total=total_size, unit=\"B\", unit_scale=True) as pbar:\n            with open(local_path, \"wb\") as f:\n                for data in r.iter_content(chunk_size=chunk_size):\n                    if data:\n                        f.write(data)\n                        pbar.update(len(data))\n\n\ndef get_ckpt_path(name, root, check=False):\n    assert name in URL_MAP\n    path = os.path.join(root, CKPT_MAP[name])\n    if not os.path.exists(path) or (check and not md5_hash(path) == MD5_MAP[name]):\n        print(\"Downloading {} model from {} to {}\".format(name, URL_MAP[name], path))\n        download(URL_MAP[name], path)\n        md5 = md5_hash(path)\n        assert md5 == MD5_MAP[name], md5\n    return path\n\n\n# Learned perceptual metric\nclass LPIPS(nn.Module):\n    def __init__(\n        self,\n        pretrained=True,\n        net=\"alex\",\n        version=\"0.1\",\n        lpips=True,\n        spatial=False,\n        pnet_rand=False,\n        pnet_tune=False,\n        use_dropout=True,\n        model_path=None,\n        eval_mode=True,\n        verbose=False,\n    ):\n        # lpips - [True] means with linear calibration on top of base network\n        # pretrained - [True] means load linear weights\n\n        super(LPIPS, self).__init__()\n        if verbose:\n            print(\n                
\"Setting up [%s] perceptual loss: trunk [%s], v[%s], spatial [%s]\"\n                % (\"LPIPS\" if lpips else \"baseline\", net, version, \"on\" if spatial else \"off\")\n            )\n\n        self.pnet_type = net\n        self.pnet_tune = pnet_tune\n        self.pnet_rand = pnet_rand\n        self.spatial = spatial\n        self.lpips = lpips  # false means baseline of just averaging all layers\n        self.version = version\n        self.scaling_layer = ScalingLayer()\n\n        if self.pnet_type in [\"vgg\", \"vgg16\"]:\n            net_type = vgg16\n            self.chns = [64, 128, 256, 512, 512]\n        elif self.pnet_type == \"alex\":\n            net_type = alexnet\n            self.chns = [64, 192, 384, 256, 256]\n        elif self.pnet_type == \"squeeze\":\n            net_type = squeezenet\n            self.chns = [64, 128, 256, 384, 384, 512, 512]\n        self.L = len(self.chns)\n\n        self.net = net_type(pretrained=not self.pnet_rand, requires_grad=self.pnet_tune)\n\n        if lpips:\n            self.lin0 = NetLinLayer(self.chns[0], use_dropout=use_dropout)\n            self.lin1 = NetLinLayer(self.chns[1], use_dropout=use_dropout)\n            self.lin2 = NetLinLayer(self.chns[2], use_dropout=use_dropout)\n            self.lin3 = NetLinLayer(self.chns[3], use_dropout=use_dropout)\n            self.lin4 = NetLinLayer(self.chns[4], use_dropout=use_dropout)\n            self.lins = [self.lin0, self.lin1, self.lin2, self.lin3, self.lin4]\n            if self.pnet_type == \"squeeze\":  # 7 layers for squeezenet\n                self.lin5 = NetLinLayer(self.chns[5], use_dropout=use_dropout)\n                self.lin6 = NetLinLayer(self.chns[6], use_dropout=use_dropout)\n                self.lins += [self.lin5, self.lin6]\n            self.lins = nn.ModuleList(self.lins)\n\n            if pretrained:\n                self.load_from_pretrained(version, net)\n                if verbose:\n                    print(\"Loaded model from: %s\" % 
model_path)\n\n        if eval_mode:\n            self.eval()\n\n    def load_from_pretrained(self, version, net):\n        ckpt = get_ckpt_path(net, \"pretrained_models/flolpips/weights/v%s\" % (version))\n        self.load_state_dict(torch.load(ckpt, map_location=\"cpu\"), strict=False)\n\n    def forward(self, in0, in1, retPerLayer=False, normalize=False):\n        if normalize:  # turn on this flag if input is [0,1] so it can be adjusted to [-1, +1]\n            in0 = 2 * in0 - 1\n            in1 = 2 * in1 - 1\n\n        # v0.0 - original release had a bug, where input was not scaled\n        in0_input, in1_input = (\n            (self.scaling_layer(in0), self.scaling_layer(in1)) if self.version == \"0.1\" else (in0, in1)\n        )\n        outs0, outs1 = self.net.forward(in0_input), self.net.forward(in1_input)\n        feats0, feats1, diffs = {}, {}, {}\n\n        for kk in range(self.L):\n            feats0[kk], feats1[kk] = normalize_tensor(outs0[kk]), normalize_tensor(outs1[kk])\n            diffs[kk] = (feats0[kk] - feats1[kk]) ** 2\n\n        if self.lpips:\n            if self.spatial:\n                res = [upsample(self.lins[kk](diffs[kk]), out_HW=in0.shape[2:]) for kk in range(self.L)]\n            else:\n                res = [spatial_average(self.lins[kk](diffs[kk]), keepdim=True) for kk in range(self.L)]\n        else:\n            if self.spatial:\n                res = [upsample(diffs[kk].sum(dim=1, keepdim=True), out_HW=in0.shape[2:]) for kk in range(self.L)]\n            else:\n                res = [spatial_average(diffs[kk].sum(dim=1, keepdim=True), keepdim=True) for kk in range(self.L)]\n\n        # val = res[0]\n        # for l in range(1,self.L):\n        #     val += res[l]\n        #     print(val)\n\n        # a = spatial_average(self.lins[kk](diffs[kk]), keepdim=True)\n        # b = torch.max(self.lins[kk](feats0[kk]**2))\n        # for kk in range(self.L):\n        #     a += spatial_average(self.lins[kk](diffs[kk]), keepdim=True)\n 
       #     b = torch.max(b,torch.max(self.lins[kk](feats0[kk]**2)))\n        # a = a/self.L\n        # from IPython import embed\n        # embed()\n        # return 10*torch.log10(b/a)\n\n        # if(retPerLayer):\n        #     return (val, res)\n        # else:\n        return torch.sum(torch.cat(res, 1), dim=(1, 2, 3), keepdims=False)\n\n\nclass ScalingLayer(nn.Module):\n    def __init__(self):\n        super(ScalingLayer, self).__init__()\n        self.register_buffer(\"shift\", torch.Tensor([-0.030, -0.088, -0.188])[None, :, None, None])\n        self.register_buffer(\"scale\", torch.Tensor([0.458, 0.448, 0.450])[None, :, None, None])\n\n    def forward(self, inp):\n        return (inp - self.shift) / self.scale\n\n\nclass NetLinLayer(nn.Module):\n    \"\"\"A single linear layer which does a 1x1 conv\"\"\"\n\n    def __init__(self, chn_in, chn_out=1, use_dropout=False):\n        super(NetLinLayer, self).__init__()\n\n        layers = (\n            [\n                nn.Dropout(),\n            ]\n            if (use_dropout)\n            else []\n        )\n        layers += [\n            nn.Conv2d(chn_in, chn_out, 1, stride=1, padding=0, bias=False),\n        ]\n        self.model = nn.Sequential(*layers)\n\n    def forward(self, x):\n        return self.model(x)\n\n\nclass Dist2LogitLayer(nn.Module):\n    \"\"\"takes 2 distances, puts through fc layers, spits out value between [0,1] (if use_sigmoid is True)\"\"\"\n\n    def __init__(self, chn_mid=32, use_sigmoid=True):\n        super(Dist2LogitLayer, self).__init__()\n\n        layers = [\n            nn.Conv2d(5, chn_mid, 1, stride=1, padding=0, bias=True),\n        ]\n        layers += [\n            nn.LeakyReLU(0.2, True),\n        ]\n        layers += [\n            nn.Conv2d(chn_mid, chn_mid, 1, stride=1, padding=0, bias=True),\n        ]\n        layers += [\n            nn.LeakyReLU(0.2, True),\n        ]\n        layers += [\n            nn.Conv2d(chn_mid, 1, 1, stride=1, padding=0, 
bias=True),\n        ]\n        if use_sigmoid:\n            layers += [\n                nn.Sigmoid(),\n            ]\n        self.model = nn.Sequential(*layers)\n\n    def forward(self, d0, d1, eps=0.1):\n        return self.model.forward(torch.cat((d0, d1, d0 - d1, d0 / (d1 + eps), d1 / (d0 + eps)), dim=1))\n\n\nclass BCERankingLoss(nn.Module):\n    def __init__(self, chn_mid=32):\n        super(BCERankingLoss, self).__init__()\n        self.net = Dist2LogitLayer(chn_mid=chn_mid)\n        # self.parameters = list(self.net.parameters())\n        self.loss = torch.nn.BCELoss()\n\n    def forward(self, d0, d1, judge):\n        per = (judge + 1.0) / 2.0\n        self.logit = self.net.forward(d0, d1)\n        return self.loss(self.logit, per)\n\n\n# L2, DSSIM metrics\nclass FakeNet(nn.Module):\n    def __init__(self, use_gpu=True, colorspace=\"Lab\"):\n        super(FakeNet, self).__init__()\n        self.use_gpu = use_gpu\n        self.colorspace = colorspace\n\n\nclass L2(FakeNet):\n    def forward(self, in0, in1, retPerLayer=None):\n        assert in0.size()[0] == 1  # currently only supports batchSize 1\n\n        if self.colorspace == \"RGB\":\n            (N, C, X, Y) = in0.size()\n            value = torch.mean(\n                torch.mean(torch.mean((in0 - in1) ** 2, dim=1).view(N, 1, X, Y), dim=2).view(N, 1, 1, Y), dim=3\n            ).view(N)\n            return value\n        elif self.colorspace == \"Lab\":\n            value = l2(\n                tensor2np(tensor2tensorlab(in0.data, to_norm=False)),\n                tensor2np(tensor2tensorlab(in1.data, to_norm=False)),\n                range=100.0,\n            ).astype(\"float\")\n            ret_var = Variable(torch.Tensor((value,)))\n            if self.use_gpu:\n                ret_var = ret_var.cuda()\n            return ret_var\n\n\nclass DSSIM(FakeNet):\n    def forward(self, in0, in1, retPerLayer=None):\n        assert in0.size()[0] == 1  # currently only supports batchSize 1\n\n        if 
self.colorspace == \"RGB\":\n            value = dssim(1.0 * tensor2im(in0.data), 1.0 * tensor2im(in1.data), range=255.0).astype(\"float\")\n        elif self.colorspace == \"Lab\":\n            value = dssim(\n                tensor2np(tensor2tensorlab(in0.data, to_norm=False)),\n                tensor2np(tensor2tensorlab(in1.data, to_norm=False)),\n                range=100.0,\n            ).astype(\"float\")\n        ret_var = Variable(torch.Tensor((value,)))\n        if self.use_gpu:\n            ret_var = ret_var.cuda()\n        return ret_var\n\n\ndef print_network(net):\n    num_params = 0\n    for param in net.parameters():\n        num_params += param.numel()\n    print(\"Network\", net)\n    print(\"Total number of parameters: %d\" % num_params)\n\n\nclass FloLPIPS(LPIPS):\n    def __init__(\n        self,\n        pretrained=True,\n        net=\"alex\",\n        version=\"0.1\",\n        lpips=True,\n        spatial=False,\n        pnet_rand=False,\n        pnet_tune=False,\n        use_dropout=True,\n        model_path=None,\n        eval_mode=True,\n        verbose=False,\n    ):\n        super(FloLPIPS, self).__init__(\n            pretrained, net, version, lpips, spatial, pnet_rand, pnet_tune, use_dropout, model_path, eval_mode, verbose\n        )\n\n    def forward(self, in0, in1, flow, retPerLayer=False, normalize=False):\n        if normalize:  # turn on this flag if input is [0,1] so it can be adjusted to [-1, +1]\n            in0 = 2 * in0 - 1\n            in1 = 2 * in1 - 1\n\n        in0_input, in1_input = (\n            (self.scaling_layer(in0), self.scaling_layer(in1)) if self.version == \"0.1\" else (in0, in1)\n        )\n        outs0, outs1 = self.net.forward(in0_input), self.net.forward(in1_input)\n        feats0, feats1, diffs = {}, {}, {}\n\n        for kk in range(self.L):\n            feats0[kk], feats1[kk] = normalize_tensor(outs0[kk]), normalize_tensor(outs1[kk])\n            diffs[kk] = (feats0[kk] - feats1[kk]) ** 2\n\n        res 
= [mw_spatial_average(self.lins[kk](diffs[kk]), flow, keepdim=True) for kk in range(self.L)]\n\n        return torch.sum(torch.cat(res, 1), dim=(1, 2, 3), keepdims=False)\n\n\nclass Flolpips(nn.Module):\n    def __init__(self):\n        super(Flolpips, self).__init__()\n        self.loss_fn = FloLPIPS(net=\"alex\", version=\"0.1\")\n        self.flownet = PWCNet()\n\n    @torch.no_grad()\n    def forward(self, I0, I1, frame_dis, frame_ref):\n        \"\"\"\n        args:\n            I0: first frame of the triplet, shape: [B, C, H, W]\n            I1: third frame of the triplet, shape: [B, C, H, W]\n            frame_dis: prediction of the intermediate frame, shape: [B, C, H, W]\n            frame_ref: ground-truth of the intermediate frame, shape: [B, C, H, W]\n        \"\"\"\n        assert (\n            I0.size() == I1.size() == frame_dis.size() == frame_ref.size()\n        ), \"the 4 input tensors should have same size\"\n\n        flow_ref = self.flownet(frame_ref, I0)\n        flow_dis = self.flownet(frame_dis, I0)\n        flow_diff = flow_ref - flow_dis\n        flolpips_wrt_I0 = self.loss_fn.forward(frame_ref, frame_dis, flow_diff, normalize=True)\n\n        flow_ref = self.flownet(frame_ref, I1)\n        flow_dis = self.flownet(frame_dis, I1)\n        flow_diff = flow_ref - flow_dis\n        flolpips_wrt_I1 = self.loss_fn.forward(frame_ref, frame_dis, flow_diff, normalize=True)\n\n        flolpips = (flolpips_wrt_I0 + flolpips_wrt_I1) / 2\n        return flolpips\n"
  },
  {
    "path": "Open-Sora/eval/vae/flolpips/pretrained_networks.py",
    "content": "from collections import namedtuple\n\nimport torch\nfrom torchvision import models as tv\n\n\nclass squeezenet(torch.nn.Module):\n    def __init__(self, requires_grad=False, pretrained=True):\n        super(squeezenet, self).__init__()\n        pretrained_features = tv.squeezenet1_1(pretrained=pretrained).features\n        self.slice1 = torch.nn.Sequential()\n        self.slice2 = torch.nn.Sequential()\n        self.slice3 = torch.nn.Sequential()\n        self.slice4 = torch.nn.Sequential()\n        self.slice5 = torch.nn.Sequential()\n        self.slice6 = torch.nn.Sequential()\n        self.slice7 = torch.nn.Sequential()\n        self.N_slices = 7\n        for x in range(2):\n            self.slice1.add_module(str(x), pretrained_features[x])\n        for x in range(2, 5):\n            self.slice2.add_module(str(x), pretrained_features[x])\n        for x in range(5, 8):\n            self.slice3.add_module(str(x), pretrained_features[x])\n        for x in range(8, 10):\n            self.slice4.add_module(str(x), pretrained_features[x])\n        for x in range(10, 11):\n            self.slice5.add_module(str(x), pretrained_features[x])\n        for x in range(11, 12):\n            self.slice6.add_module(str(x), pretrained_features[x])\n        for x in range(12, 13):\n            self.slice7.add_module(str(x), pretrained_features[x])\n        if not requires_grad:\n            for param in self.parameters():\n                param.requires_grad = False\n\n    def forward(self, X):\n        h = self.slice1(X)\n        h_relu1 = h\n        h = self.slice2(h)\n        h_relu2 = h\n        h = self.slice3(h)\n        h_relu3 = h\n        h = self.slice4(h)\n        h_relu4 = h\n        h = self.slice5(h)\n        h_relu5 = h\n        h = self.slice6(h)\n        h_relu6 = h\n        h = self.slice7(h)\n        h_relu7 = h\n        vgg_outputs = namedtuple(\"SqueezeOutputs\", [\"relu1\", \"relu2\", \"relu3\", \"relu4\", \"relu5\", \"relu6\", \"relu7\"])\n  
      out = vgg_outputs(h_relu1, h_relu2, h_relu3, h_relu4, h_relu5, h_relu6, h_relu7)\n\n        return out\n\n\nclass alexnet(torch.nn.Module):\n    def __init__(self, requires_grad=False, pretrained=True):\n        super(alexnet, self).__init__()\n        alexnet_pretrained_features = tv.alexnet(pretrained=pretrained).features\n        self.slice1 = torch.nn.Sequential()\n        self.slice2 = torch.nn.Sequential()\n        self.slice3 = torch.nn.Sequential()\n        self.slice4 = torch.nn.Sequential()\n        self.slice5 = torch.nn.Sequential()\n        self.N_slices = 5\n        for x in range(2):\n            self.slice1.add_module(str(x), alexnet_pretrained_features[x])\n        for x in range(2, 5):\n            self.slice2.add_module(str(x), alexnet_pretrained_features[x])\n        for x in range(5, 8):\n            self.slice3.add_module(str(x), alexnet_pretrained_features[x])\n        for x in range(8, 10):\n            self.slice4.add_module(str(x), alexnet_pretrained_features[x])\n        for x in range(10, 12):\n            self.slice5.add_module(str(x), alexnet_pretrained_features[x])\n        if not requires_grad:\n            for param in self.parameters():\n                param.requires_grad = False\n\n    def forward(self, X):\n        h = self.slice1(X)\n        h_relu1 = h\n        h = self.slice2(h)\n        h_relu2 = h\n        h = self.slice3(h)\n        h_relu3 = h\n        h = self.slice4(h)\n        h_relu4 = h\n        h = self.slice5(h)\n        h_relu5 = h\n        alexnet_outputs = namedtuple(\"AlexnetOutputs\", [\"relu1\", \"relu2\", \"relu3\", \"relu4\", \"relu5\"])\n        out = alexnet_outputs(h_relu1, h_relu2, h_relu3, h_relu4, h_relu5)\n\n        return out\n\n\nclass vgg16(torch.nn.Module):\n    def __init__(self, requires_grad=False, pretrained=True):\n        super(vgg16, self).__init__()\n        vgg_pretrained_features = tv.vgg16(pretrained=pretrained).features\n        self.slice1 = torch.nn.Sequential()\n        
self.slice2 = torch.nn.Sequential()\n        self.slice3 = torch.nn.Sequential()\n        self.slice4 = torch.nn.Sequential()\n        self.slice5 = torch.nn.Sequential()\n        self.N_slices = 5\n        for x in range(4):\n            self.slice1.add_module(str(x), vgg_pretrained_features[x])\n        for x in range(4, 9):\n            self.slice2.add_module(str(x), vgg_pretrained_features[x])\n        for x in range(9, 16):\n            self.slice3.add_module(str(x), vgg_pretrained_features[x])\n        for x in range(16, 23):\n            self.slice4.add_module(str(x), vgg_pretrained_features[x])\n        for x in range(23, 30):\n            self.slice5.add_module(str(x), vgg_pretrained_features[x])\n        if not requires_grad:\n            for param in self.parameters():\n                param.requires_grad = False\n\n    def forward(self, X):\n        h = self.slice1(X)\n        h_relu1_2 = h\n        h = self.slice2(h)\n        h_relu2_2 = h\n        h = self.slice3(h)\n        h_relu3_3 = h\n        h = self.slice4(h)\n        h_relu4_3 = h\n        h = self.slice5(h)\n        h_relu5_3 = h\n        vgg_outputs = namedtuple(\"VggOutputs\", [\"relu1_2\", \"relu2_2\", \"relu3_3\", \"relu4_3\", \"relu5_3\"])\n        out = vgg_outputs(h_relu1_2, h_relu2_2, h_relu3_3, h_relu4_3, h_relu5_3)\n\n        return out\n\n\nclass resnet(torch.nn.Module):\n    def __init__(self, requires_grad=False, pretrained=True, num=18):\n        super(resnet, self).__init__()\n        if num == 18:\n            self.net = tv.resnet18(pretrained=pretrained)\n        elif num == 34:\n            self.net = tv.resnet34(pretrained=pretrained)\n        elif num == 50:\n            self.net = tv.resnet50(pretrained=pretrained)\n        elif num == 101:\n            self.net = tv.resnet101(pretrained=pretrained)\n        elif num == 152:\n            self.net = tv.resnet152(pretrained=pretrained)\n        self.N_slices = 5\n\n        self.conv1 = self.net.conv1\n        self.bn1 = 
self.net.bn1\n        self.relu = self.net.relu\n        self.maxpool = self.net.maxpool\n        self.layer1 = self.net.layer1\n        self.layer2 = self.net.layer2\n        self.layer3 = self.net.layer3\n        self.layer4 = self.net.layer4\n\n    def forward(self, X):\n        h = self.conv1(X)\n        h = self.bn1(h)\n        h = self.relu(h)\n        h_relu1 = h\n        h = self.maxpool(h)\n        h = self.layer1(h)\n        h_conv2 = h\n        h = self.layer2(h)\n        h_conv3 = h\n        h = self.layer3(h)\n        h_conv4 = h\n        h = self.layer4(h)\n        h_conv5 = h\n\n        outputs = namedtuple(\"Outputs\", [\"relu1\", \"conv2\", \"conv3\", \"conv4\", \"conv5\"])\n        out = outputs(h_relu1, h_conv2, h_conv3, h_conv4, h_conv5)\n\n        return out\n"
  },
  {
    "path": "Open-Sora/eval/vae/flolpips/pwcnet.py",
    "content": "#!/usr/bin/env python\n\nimport math\n\nimport torch\n\n# try:\nfrom .correlation import correlation  # the custom cost volume layer\n\n# except:\n# \tsys.path.insert(0, './correlation'); import correlation # you should consider upgrading python\n# end\n\n##########################################################\n\n# assert(int(str('').join(torch.__version__.split('.')[0:2])) >= 13) # requires at least pytorch version 1.3.0\n\n# torch.set_grad_enabled(False) # make sure to not compute gradients for computational performance\n\n# torch.backends.cudnn.enabled = True # make sure to use cudnn for computational performance\n\n# ##########################################################\n\n# arguments_strModel = 'default' # 'default', or 'chairs-things'\n# arguments_strFirst = './images/first.png'\n# arguments_strSecond = './images/second.png'\n# arguments_strOut = './out.flo'\n\n# for strOption, strArgument in getopt.getopt(sys.argv[1:], '', [ strParameter[2:] + '=' for strParameter in sys.argv[1::2] ])[0]:\n# \tif strOption == '--model' and strArgument != '': arguments_strModel = strArgument # which model to use\n# \tif strOption == '--first' and strArgument != '': arguments_strFirst = strArgument # path to the first frame\n# \tif strOption == '--second' and strArgument != '': arguments_strSecond = strArgument # path to the second frame\n# \tif strOption == '--out' and strArgument != '': arguments_strOut = strArgument # path to where the output should be stored\n# end\n\n##########################################################\n\n\ndef backwarp(tenInput, tenFlow):\n    backwarp_tenGrid = {}\n    backwarp_tenPartial = {}\n    if str(tenFlow.shape) not in backwarp_tenGrid:\n        tenHor = (\n            torch.linspace(-1.0 + (1.0 / tenFlow.shape[3]), 1.0 - (1.0 / tenFlow.shape[3]), tenFlow.shape[3])\n            .view(1, 1, 1, -1)\n            .expand(-1, -1, tenFlow.shape[2], -1)\n        )\n        tenVer = (\n            torch.linspace(-1.0 + (1.0 
/ tenFlow.shape[2]), 1.0 - (1.0 / tenFlow.shape[2]), tenFlow.shape[2])\n            .view(1, 1, -1, 1)\n            .expand(-1, -1, -1, tenFlow.shape[3])\n        )\n\n        backwarp_tenGrid[str(tenFlow.shape)] = torch.cat([tenHor, tenVer], 1).cuda()\n    # end\n\n    if str(tenFlow.shape) not in backwarp_tenPartial:\n        backwarp_tenPartial[str(tenFlow.shape)] = tenFlow.new_ones(\n            [tenFlow.shape[0], 1, tenFlow.shape[2], tenFlow.shape[3]]\n        )\n    # end\n\n    tenFlow = torch.cat(\n        [\n            tenFlow[:, 0:1, :, :] / ((tenInput.shape[3] - 1.0) / 2.0),\n            tenFlow[:, 1:2, :, :] / ((tenInput.shape[2] - 1.0) / 2.0),\n        ],\n        1,\n    )\n    tenInput = torch.cat([tenInput, backwarp_tenPartial[str(tenFlow.shape)]], 1)\n\n    tenOutput = torch.nn.functional.grid_sample(\n        input=tenInput,\n        grid=(backwarp_tenGrid[str(tenFlow.shape)] + tenFlow).permute(0, 2, 3, 1),\n        mode=\"bilinear\",\n        padding_mode=\"zeros\",\n        align_corners=False,\n    )\n\n    tenMask = tenOutput[:, -1:, :, :]\n    tenMask[tenMask > 0.999] = 1.0\n    tenMask[tenMask < 1.0] = 0.0\n\n    return tenOutput[:, :-1, :, :] * tenMask\n\n\n# end\n\n##########################################################\n\n\nclass Network(torch.nn.Module):\n    def __init__(self):\n        super(Network, self).__init__()\n\n        class Extractor(torch.nn.Module):\n            def __init__(self):\n                super(Extractor, self).__init__()\n\n                self.netOne = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=2, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=16, out_channels=16, 
kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                )\n\n                self.netTwo = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=2, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                )\n\n                self.netThr = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=2, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                )\n\n                self.netFou = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=64, out_channels=96, kernel_size=3, stride=2, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=96, out_channels=96, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=96, out_channels=96, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                
)\n\n                self.netFiv = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=96, out_channels=128, kernel_size=3, stride=2, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                )\n\n                self.netSix = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=128, out_channels=196, kernel_size=3, stride=2, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=196, out_channels=196, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=196, out_channels=196, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                )\n\n            # end\n\n            def forward(self, tenInput):\n                tenOne = self.netOne(tenInput)\n                tenTwo = self.netTwo(tenOne)\n                tenThr = self.netThr(tenTwo)\n                tenFou = self.netFou(tenThr)\n                tenFiv = self.netFiv(tenFou)\n                tenSix = self.netSix(tenFiv)\n\n                return [tenOne, tenTwo, tenThr, tenFou, tenFiv, tenSix]\n\n            # end\n\n        # end\n\n        class Decoder(torch.nn.Module):\n            def __init__(self, intLevel):\n                super(Decoder, self).__init__()\n\n                intPrevious = [\n                    None,\n                    None,\n                    81 + 32 + 2 + 2,\n                    81 
+ 64 + 2 + 2,\n                    81 + 96 + 2 + 2,\n                    81 + 128 + 2 + 2,\n                    81,\n                    None,\n                ][intLevel + 1]\n                intCurrent = [\n                    None,\n                    None,\n                    81 + 32 + 2 + 2,\n                    81 + 64 + 2 + 2,\n                    81 + 96 + 2 + 2,\n                    81 + 128 + 2 + 2,\n                    81,\n                    None,\n                ][intLevel + 0]\n\n                if intLevel < 6:\n                    self.netUpflow = torch.nn.ConvTranspose2d(\n                        in_channels=2, out_channels=2, kernel_size=4, stride=2, padding=1\n                    )\n                if intLevel < 6:\n                    self.netUpfeat = torch.nn.ConvTranspose2d(\n                        in_channels=intPrevious + 128 + 128 + 96 + 64 + 32,\n                        out_channels=2,\n                        kernel_size=4,\n                        stride=2,\n                        padding=1,\n                    )\n                if intLevel < 6:\n                    self.fltBackwarp = [None, None, None, 5.0, 2.5, 1.25, 0.625, None][intLevel + 1]\n\n                self.netOne = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=intCurrent, out_channels=128, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                )\n\n                self.netTwo = torch.nn.Sequential(\n                    torch.nn.Conv2d(in_channels=intCurrent + 128, out_channels=128, kernel_size=3, stride=1, padding=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                )\n\n                self.netThr = torch.nn.Sequential(\n                    torch.nn.Conv2d(\n                        in_channels=intCurrent + 128 + 128, out_channels=96, kernel_size=3, stride=1, padding=1\n                    ),\n                    
torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                )\n\n                self.netFou = torch.nn.Sequential(\n                    torch.nn.Conv2d(\n                        in_channels=intCurrent + 128 + 128 + 96, out_channels=64, kernel_size=3, stride=1, padding=1\n                    ),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                )\n\n                self.netFiv = torch.nn.Sequential(\n                    torch.nn.Conv2d(\n                        in_channels=intCurrent + 128 + 128 + 96 + 64,\n                        out_channels=32,\n                        kernel_size=3,\n                        stride=1,\n                        padding=1,\n                    ),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                )\n\n                self.netSix = torch.nn.Sequential(\n                    torch.nn.Conv2d(\n                        in_channels=intCurrent + 128 + 128 + 96 + 64 + 32,\n                        out_channels=2,\n                        kernel_size=3,\n                        stride=1,\n                        padding=1,\n                    )\n                )\n\n            # end\n\n            def forward(self, tenFirst, tenSecond, objPrevious):\n                tenFlow = None\n                tenFeat = None\n\n                if objPrevious is None:\n                    tenFlow = None\n                    tenFeat = None\n\n                    tenVolume = torch.nn.functional.leaky_relu(\n                        input=correlation.FunctionCorrelation(tenFirst=tenFirst, tenSecond=tenSecond),\n                        negative_slope=0.1,\n                        inplace=False,\n                    )\n\n                    tenFeat = torch.cat([tenVolume], 1)\n\n                elif objPrevious is not None:\n                    tenFlow = self.netUpflow(objPrevious[\"tenFlow\"])\n                    tenFeat = 
self.netUpfeat(objPrevious[\"tenFeat\"])\n\n                    tenVolume = torch.nn.functional.leaky_relu(\n                        input=correlation.FunctionCorrelation(\n                            tenFirst=tenFirst,\n                            tenSecond=backwarp(tenInput=tenSecond, tenFlow=tenFlow * self.fltBackwarp),\n                        ),\n                        negative_slope=0.1,\n                        inplace=False,\n                    )\n\n                    tenFeat = torch.cat([tenVolume, tenFirst, tenFlow, tenFeat], 1)\n\n                # end\n\n                tenFeat = torch.cat([self.netOne(tenFeat), tenFeat], 1)\n                tenFeat = torch.cat([self.netTwo(tenFeat), tenFeat], 1)\n                tenFeat = torch.cat([self.netThr(tenFeat), tenFeat], 1)\n                tenFeat = torch.cat([self.netFou(tenFeat), tenFeat], 1)\n                tenFeat = torch.cat([self.netFiv(tenFeat), tenFeat], 1)\n\n                tenFlow = self.netSix(tenFeat)\n\n                return {\"tenFlow\": tenFlow, \"tenFeat\": tenFeat}\n\n            # end\n\n        # end\n\n        class Refiner(torch.nn.Module):\n            def __init__(self):\n                super(Refiner, self).__init__()\n\n                self.netMain = torch.nn.Sequential(\n                    torch.nn.Conv2d(\n                        in_channels=81 + 32 + 2 + 2 + 128 + 128 + 96 + 64 + 32,\n                        out_channels=128,\n                        kernel_size=3,\n                        stride=1,\n                        padding=1,\n                        dilation=1,\n                    ),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=2, dilation=2),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=4, 
dilation=4),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=128, out_channels=96, kernel_size=3, stride=1, padding=8, dilation=8),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=96, out_channels=64, kernel_size=3, stride=1, padding=16, dilation=16),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3, stride=1, padding=1, dilation=1),\n                    torch.nn.LeakyReLU(inplace=False, negative_slope=0.1),\n                    torch.nn.Conv2d(in_channels=32, out_channels=2, kernel_size=3, stride=1, padding=1, dilation=1),\n                )\n\n            # end\n\n            def forward(self, tenInput):\n                return self.netMain(tenInput)\n\n            # end\n\n        # end\n\n        self.netExtractor = Extractor()\n\n        self.netTwo = Decoder(2)\n        self.netThr = Decoder(3)\n        self.netFou = Decoder(4)\n        self.netFiv = Decoder(5)\n        self.netSix = Decoder(6)\n\n        self.netRefiner = Refiner()\n\n        self.load_state_dict(\n            {\n                strKey.replace(\"module\", \"net\"): tenWeight\n                for strKey, tenWeight in torch.hub.load_state_dict_from_url(\n                    url=\"http://content.sniklaus.com/github/pytorch-pwc/network-\" + \"default\" + \".pytorch\"\n                ).items()\n            }\n        )\n\n    # end\n\n    def forward(self, tenFirst, tenSecond):\n        intWidth = tenFirst.shape[3]\n        intHeight = tenFirst.shape[2]\n\n        intPreprocessedWidth = int(math.floor(math.ceil(intWidth / 64.0) * 64.0))\n        intPreprocessedHeight = int(math.floor(math.ceil(intHeight / 64.0) * 64.0))\n\n        tenPreprocessedFirst = torch.nn.functional.interpolate(\n            input=tenFirst, 
size=(intPreprocessedHeight, intPreprocessedWidth), mode=\"bilinear\", align_corners=False\n        )\n        tenPreprocessedSecond = torch.nn.functional.interpolate(\n            input=tenSecond, size=(intPreprocessedHeight, intPreprocessedWidth), mode=\"bilinear\", align_corners=False\n        )\n\n        tenFirst = self.netExtractor(tenPreprocessedFirst)\n        tenSecond = self.netExtractor(tenPreprocessedSecond)\n\n        objEstimate = self.netSix(tenFirst[-1], tenSecond[-1], None)\n        objEstimate = self.netFiv(tenFirst[-2], tenSecond[-2], objEstimate)\n        objEstimate = self.netFou(tenFirst[-3], tenSecond[-3], objEstimate)\n        objEstimate = self.netThr(tenFirst[-4], tenSecond[-4], objEstimate)\n        objEstimate = self.netTwo(tenFirst[-5], tenSecond[-5], objEstimate)\n\n        tenFlow = objEstimate[\"tenFlow\"] + self.netRefiner(objEstimate[\"tenFeat\"])\n        tenFlow = 20.0 * torch.nn.functional.interpolate(\n            input=tenFlow, size=(intHeight, intWidth), mode=\"bilinear\", align_corners=False\n        )\n        tenFlow[:, 0, :, :] *= float(intWidth) / float(intPreprocessedWidth)\n        tenFlow[:, 1, :, :] *= float(intHeight) / float(intPreprocessedHeight)\n\n        return tenFlow\n\n    # end\n\n\n# end\n\nnetNetwork = None\n\n##########################################################\n\n\ndef estimate(tenFirst, tenSecond):\n    global netNetwork\n\n    if netNetwork is None:\n        netNetwork = Network().cuda().eval()\n    # end\n\n    assert tenFirst.shape[1] == tenSecond.shape[1]\n    assert tenFirst.shape[2] == tenSecond.shape[2]\n\n    intWidth = tenFirst.shape[2]\n    intHeight = tenFirst.shape[1]\n\n    assert (\n        intWidth == 1024\n    )  # remember that there is no guarantee for correctness, comment this line out if you acknowledge this and want to continue\n    assert (\n        intHeight == 436\n    )  # remember that there is no guarantee for correctness, comment this line out if you acknowledge this 
and want to continue\n\n    tenPreprocessedFirst = tenFirst.cuda().view(1, 3, intHeight, intWidth)\n    tenPreprocessedSecond = tenSecond.cuda().view(1, 3, intHeight, intWidth)\n\n    intPreprocessedWidth = int(math.floor(math.ceil(intWidth / 64.0) * 64.0))\n    intPreprocessedHeight = int(math.floor(math.ceil(intHeight / 64.0) * 64.0))\n\n    tenPreprocessedFirst = torch.nn.functional.interpolate(\n        input=tenPreprocessedFirst,\n        size=(intPreprocessedHeight, intPreprocessedWidth),\n        mode=\"bilinear\",\n        align_corners=False,\n    )\n    tenPreprocessedSecond = torch.nn.functional.interpolate(\n        input=tenPreprocessedSecond,\n        size=(intPreprocessedHeight, intPreprocessedWidth),\n        mode=\"bilinear\",\n        align_corners=False,\n    )\n\n    tenFlow = 20.0 * torch.nn.functional.interpolate(\n        input=netNetwork(tenPreprocessedFirst, tenPreprocessedSecond),\n        size=(intHeight, intWidth),\n        mode=\"bilinear\",\n        align_corners=False,\n    )\n\n    tenFlow[:, 0, :, :] *= float(intWidth) / float(intPreprocessedWidth)\n    tenFlow[:, 1, :, :] *= float(intHeight) / float(intPreprocessedHeight)\n\n    return tenFlow[0, :, :, :].cpu()\n\n\n# end\n\n##########################################################\n\n# if __name__ == '__main__':\n# \ttenFirst = torch.FloatTensor(numpy.ascontiguousarray(numpy.array(PIL.Image.open(arguments_strFirst))[:, :, ::-1].transpose(2, 0, 1).astype(numpy.float32) * (1.0 / 255.0)))\n# \ttenSecond = torch.FloatTensor(numpy.ascontiguousarray(numpy.array(PIL.Image.open(arguments_strSecond))[:, :, ::-1].transpose(2, 0, 1).astype(numpy.float32) * (1.0 / 255.0)))\n\n# \ttenOutput = estimate(tenFirst, tenSecond)\n\n# \tobjOutput = open(arguments_strOut, 'wb')\n\n# \tnumpy.array([ 80, 73, 69, 72 ], numpy.uint8).tofile(objOutput)\n# \tnumpy.array([ tenOutput.shape[2], tenOutput.shape[1] ], numpy.int32).tofile(objOutput)\n# \tnumpy.array(tenOutput.numpy().transpose(1, 2, 0), 
numpy.float32).tofile(objOutput)\n\n# \tobjOutput.close()\n# end\n"
  },
  {
    "path": "Open-Sora/eval/vae/flolpips/utils.py",
    "content": "import cv2\nimport numpy as np\nimport torch\n\n\ndef normalize_tensor(in_feat, eps=1e-10):\n    norm_factor = torch.sqrt(torch.sum(in_feat**2, dim=1, keepdim=True))\n    return in_feat / (norm_factor + eps)\n\n\ndef l2(p0, p1, range=255.0):\n    return 0.5 * np.mean((p0 / range - p1 / range) ** 2)\n\n\ndef dssim(p0, p1, range=255.0):\n    from skimage.measure import compare_ssim\n\n    return (1 - compare_ssim(p0, p1, data_range=range, multichannel=True)) / 2.0\n\n\ndef tensor2im(image_tensor, imtype=np.uint8, cent=1.0, factor=255.0 / 2.0):\n    image_numpy = image_tensor[0].cpu().float().numpy()\n    image_numpy = (np.transpose(image_numpy, (1, 2, 0)) + cent) * factor\n    return image_numpy.astype(imtype)\n\n\ndef tensor2np(tensor_obj):\n    # change dimension of a tensor object into a numpy array\n    return tensor_obj[0].cpu().float().numpy().transpose((1, 2, 0))\n\n\ndef np2tensor(np_obj):\n    # change dimenion of np array into tensor array\n    return torch.Tensor(np_obj[:, :, :, np.newaxis].transpose((3, 2, 0, 1)))\n\n\ndef tensor2tensorlab(image_tensor, to_norm=True, mc_only=False):\n    # image tensor to lab tensor\n    from skimage import color\n\n    img = tensor2im(image_tensor)\n    img_lab = color.rgb2lab(img)\n    if mc_only:\n        img_lab[:, :, 0] = img_lab[:, :, 0] - 50\n    if to_norm and not mc_only:\n        img_lab[:, :, 0] = img_lab[:, :, 0] - 50\n        img_lab = img_lab / 100.0\n\n    return np2tensor(img_lab)\n\n\ndef read_frame_yuv2rgb(stream, width, height, iFrame, bit_depth, pix_fmt=\"420\"):\n    if pix_fmt == \"420\":\n        multiplier = 1\n        uv_factor = 2\n    elif pix_fmt == \"444\":\n        multiplier = 2\n        uv_factor = 1\n    else:\n        print(\"Pixel format {} is not supported\".format(pix_fmt))\n        return\n\n    if bit_depth == 8:\n        datatype = np.uint8\n        stream.seek(iFrame * 1.5 * width * height * multiplier)\n        Y = np.fromfile(stream, dtype=datatype, count=width * 
height).reshape((height, width))\n\n        # read chroma samples and upsample since original is 4:2:0 sampling\n        U = np.fromfile(stream, dtype=datatype, count=(width // uv_factor) * (height // uv_factor)).reshape(\n            (height // uv_factor, width // uv_factor)\n        )\n        V = np.fromfile(stream, dtype=datatype, count=(width // uv_factor) * (height // uv_factor)).reshape(\n            (height // uv_factor, width // uv_factor)\n        )\n\n    else:\n        datatype = np.uint16\n        stream.seek(iFrame * 3 * width * height * multiplier)\n        Y = np.fromfile(stream, dtype=datatype, count=width * height).reshape((height, width))\n\n        U = np.fromfile(stream, dtype=datatype, count=(width // uv_factor) * (height // uv_factor)).reshape(\n            (height // uv_factor, width // uv_factor)\n        )\n        V = np.fromfile(stream, dtype=datatype, count=(width // uv_factor) * (height // uv_factor)).reshape(\n            (height // uv_factor, width // uv_factor)\n        )\n\n    if pix_fmt == \"420\":\n        yuv = np.empty((height * 3 // 2, width), dtype=datatype)\n        yuv[0:height, :] = Y\n\n        yuv[height : height + height // 4, :] = U.reshape(-1, width)\n        yuv[height + height // 4 :, :] = V.reshape(-1, width)\n\n        if bit_depth != 8:\n            yuv = (yuv / (2**bit_depth - 1) * 255).astype(np.uint8)\n\n        # convert to rgb\n        rgb = cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB_I420)\n\n    else:\n        yvu = np.stack([Y, V, U], axis=2)\n        if bit_depth != 8:\n            yvu = (yvu / (2**bit_depth - 1) * 255).astype(np.uint8)\n        rgb = cv2.cvtColor(yvu, cv2.COLOR_YCrCb2RGB)\n\n    return rgb\n"
  },
  {
    "path": "Open-Sora/eval/vae/script/eval.sh",
    "content": "python eval/eval_common_metric.py \\\n    --batch_size 2 \\\n    --real_video_dir ../test_eval/release/origin \\\n    --generated_video_dir ../test_eval/release \\\n    --device cuda \\\n    --sample_fps 10 \\\n    --crop_size 256 \\\n    --resolution 256 \\\n    --num_frames 17 \\\n    --sample_rate 1 \\\n    --subset_size 100 \\\n    --metric ssim psnr lpips flolpips\n"
  },
  {
    "path": "Open-Sora/eval/vbench/VBench_full_info.json",
    "content": "[\n    {\n        \"prompt_en\": \"In a still frame, a stop sign\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a toilet, frozen in time\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a laptop, frozen in time\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of alley\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of bar\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of barn\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of bathroom\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of bedroom\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of cliff\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, courtyard\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, gas station\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of house\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"indoor gymnasium, frozen in time\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of indoor library\",\n     
   \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of kitchen\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of palace\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, parking lot\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, phone booth\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of restaurant\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of tower\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a bowl\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of an apple\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a bench\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a bed\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a chair\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a cup\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a dining table\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    
},\n    {\n        \"prompt_en\": \"In a still frame, a pear\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a bunch of grapes\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a bowl on the kitchen counter\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a beautiful, handcrafted ceramic bowl\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of an antique bowl\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of an exquisite mahogany dining table\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a wooden bench in the park\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a beautiful wrought-iron bench surrounded by blooming flowers\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, a park bench with a view of the lake\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a vintage rocking chair was placed on the porch\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of the jail cell was small and dimly lit, with cold, steel bars\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of the phone booth was tucked 
away in a quiet alley\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a dilapidated phone booth stood as a relic of a bygone era on the sidewalk, frozen in time\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of the old red barn stood weathered and iconic against the backdrop of the countryside\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a picturesque barn was painted a warm shade of red and nestled in a picturesque meadow\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, within the desolate desert, an oasis unfolded, characterized by the stoic presence of palm trees and a motionless, glassy pool of water\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, the Parthenon's majestic Doric columns stand in serene solitude atop the Acropolis, framed by the tranquil Athenian landscape\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, the Temple of Hephaestus, with its timeless Doric grace, stands stoically against the backdrop of a quiet Athens\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, the ornate Victorian streetlamp stands solemnly, adorned with intricate ironwork and stained glass panels\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of the Stonehenge presented itself as an enigmatic puzzle, each colossal stone meticulously placed against the backdrop of tranquility\",\n        
\"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, in the vast desert, an oasis nestled among dunes, featuring tall palm trees and an air of serenity\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of an ornate Victorian streetlamp standing on a cobblestone street corner, illuminating the empty night\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a tranquil lakeside cabin nestled among tall pines, its reflection mirrored perfectly in the calm water\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, a vintage gas lantern, adorned with intricate details, gracing a historic cobblestone square\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, a tranquil Japanese tea ceremony room, with tatami mats, a delicate tea set, and a bonsai tree in the corner\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of the Parthenon stands resolute in its classical elegance, a timeless symbol of Athens' cultural legacy\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of in the heart of Plaka, the neoclassical architecture of the old city harmonizes with the ancient ruins\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A 
tranquil tableau of in the desolate beauty of the American Southwest, Chaco Canyon's ancient ruins whispered tales of an enigmatic civilization that once thrived amidst the arid landscapes\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of at the edge of the Arabian Desert, the ancient city of Petra beckoned with its enigmatic rock-carved fa\\u00e7ades\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, amidst the cobblestone streets, an Art Nouveau lamppost stood tall\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of in the quaint village square, a traditional wrought-iron streetlamp featured delicate filigree patterns and amber-hued glass panels\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of the lampposts were adorned with Art Deco motifs, their geometric shapes and frosted glass creating a sense of vintage glamour\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, in the picturesque square, a Gothic-style lamppost adorned with intricate stone carvings added a touch of medieval charm to the setting\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, in the heart of the old city, a row of ornate lantern-style streetlamps bathed the narrow alleyway in a warm, welcoming light\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of in the heart of the Utah desert, a massive sandstone arch spanned the horizon\",\n        \"dimension\": [\n            
\"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of in the Arizona desert, a massive stone bridge arched across a rugged canyon\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of in the corner of the minimalist tea room, a bonsai tree added a touch of nature's beauty to the otherwise simple and elegant space\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, amidst the hushed ambiance of the traditional tea room, a meticulously arranged tea set awaited, with porcelain cups, a bamboo whisk\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, nestled in the Zen garden, a rustic teahouse featured tatami seating and a traditional charcoal brazier\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a country estate's library featured elegant wooden shelves\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of beneath the shade of a solitary oak tree, an old wooden park bench sat patiently\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of beside a tranquil pond, a weeping willow tree draped its branches gracefully over the water's surface, creating a serene tableau of reflection and calm\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of in the Zen garden, a perfectly raked gravel path led to a serene rock garden\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": 
\"In a still frame, a tranquil pond was fringed by weeping cherry trees, their blossoms drifting lazily onto the glassy surface\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"In a still frame, within the historic library's reading room, rows of antique leather chairs and mahogany tables offered a serene haven for literary contemplation\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of a peaceful orchid garden showcased a variety of delicate blooms\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tranquil tableau of in the serene courtyard, a centuries-old stone well stood as a symbol of a bygone era, its mossy stones bearing witness to the passage of time\",\n        \"dimension\": [\n            \"temporal_flickering\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bird and a cat\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"bird and cat\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cat and a dog\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"cat and dog\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a dog and a horse\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"dog and horse\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a horse and a sheep\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n   
             \"object\": \"horse and sheep\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sheep and a cow\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"sheep and cow\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cow and an elephant\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"cow and elephant\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an elephant and a bear\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"elephant and bear\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bear and a zebra\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"bear and zebra\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a zebra and a giraffe\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"zebra and giraffe\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a giraffe and a bird\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"giraffe and bird\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a chair and a couch\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"chair and 
couch\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a couch and a potted plant\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"couch and potted plant\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a potted plant and a tv\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"potted plant and tv\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a tv and a laptop\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"tv and laptop\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a laptop and a remote\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"laptop and remote\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a remote and a keyboard\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"remote and keyboard\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a keyboard and a cell phone\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"keyboard and cell phone\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cell phone and a book\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"cell 
phone and book\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a book and a clock\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"book and clock\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a clock and a backpack\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"clock and backpack\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a backpack and an umbrella\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"backpack and umbrella\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an umbrella and a handbag\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"umbrella and handbag\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a handbag and a tie\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"handbag and tie\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a tie and a suitcase\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"tie and suitcase\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a suitcase and a vase\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"suitcase and 
vase\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a vase and scissors\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"vase and scissors\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"scissors and a teddy bear\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"scissors and teddy bear\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a teddy bear and a frisbee\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"teddy bear and frisbee\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a frisbee and skis\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"frisbee and skis\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"skis and a snowboard\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"skis and snowboard\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a snowboard and a sports ball\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"snowboard and sports ball\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sports ball and a kite\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": 
\"sports ball and kite\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a kite and a baseball bat\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"kite and baseball bat\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a baseball bat and a baseball glove\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"baseball bat and baseball glove\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a baseball glove and a skateboard\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"baseball glove and skateboard\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a skateboard and a surfboard\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"skateboard and surfboard\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a surfboard and a tennis racket\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"surfboard and tennis racket\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a tennis racket and a bottle\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"tennis racket and bottle\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bottle and a chair\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n     
   \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"bottle and chair\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an airplane and a train\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"airplane and train\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a train and a boat\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"train and boat\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a boat and an airplane\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"boat and airplane\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bicycle and a car\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"bicycle and car\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a car and a motorcycle\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"car and motorcycle\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a motorcycle and a bus\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"motorcycle and bus\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bus and a traffic light\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        
\"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"bus and traffic light\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a traffic light and a fire hydrant\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"traffic light and fire hydrant\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a fire hydrant and a stop sign\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"fire hydrant and stop sign\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a stop sign and a parking meter\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"stop sign and parking meter\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a parking meter and a truck\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"parking meter and truck\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a truck and a bicycle\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"truck and bicycle\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a toilet and a hair drier\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"toilet and hair drier\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a hair drier and a toothbrush\",\n   
     \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"hair drier and toothbrush\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a toothbrush and a sink\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"toothbrush and sink\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sink and a toilet\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"sink and toilet\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a wine glass and a chair\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"wine glass and chair\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cup and a couch\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"cup and couch\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a fork and a potted plant\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"fork and potted plant\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a knife and a tv\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"knife and tv\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a spoon and a laptop\",\n        
\"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"spoon and laptop\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bowl and a remote\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"bowl and remote\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a banana and a keyboard\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"banana and keyboard\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an apple and a cell phone\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"apple and cell phone\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sandwich and a book\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"sandwich and book\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange and a clock\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"orange and clock\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"broccoli and a backpack\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"broccoli and backpack\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a carrot and an umbrella\",\n        
\"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"carrot and umbrella\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a hot dog and a handbag\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"hot dog and handbag\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a pizza and a tie\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"pizza and tie\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a donut and a suitcase\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"donut and suitcase\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cake and a vase\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"cake and vase\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an oven and scissors\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"oven and scissors\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a toaster and a teddy bear\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"toaster and teddy bear\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a microwave and a frisbee\",\n        
\"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"microwave and frisbee\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a refrigerator and skis\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"refrigerator and skis\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bicycle and an airplane\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"bicycle and airplane\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a car and a train\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"car and train\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a motorcycle and a boat\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"motorcycle and boat\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a person and a toilet\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"person and toilet\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a person and a hair drier\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"person and hair drier\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a person and a toothbrush\",\n  
      \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"person and toothbrush\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a person and a sink\",\n        \"dimension\": [\n            \"multiple_objects\"\n        ],\n        \"auxiliary_info\": {\n            \"multiple_objects\": {\n                \"object\": \"person and sink\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A person is riding a bike\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is marching\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is roller skating\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is tasting beer\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is clapping\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is drawing\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is petting animal (not cat)\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is eating watermelon\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is playing harp\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is wrestling\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is riding scooter\",\n        \"dimension\": [\n            \"human_action\"\n        
]\n    },\n    {\n        \"prompt_en\": \"A person is sweeping floor\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is skateboarding\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is dunking basketball\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is playing flute\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is stretching leg\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is tying tie\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is skydiving\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is shooting goal (soccer)\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is playing piano\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is finger snapping\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is canoeing or kayaking\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is laughing\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is digging\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is clay pottery making\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is shooting 
basketball\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is bending back\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is shaking hands\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is bandaging\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is push up\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is catching or throwing frisbee\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is playing trumpet\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is flying kite\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is filling eyebrows\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is shuffling cards\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is folding clothes\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is smoking\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is tai chi\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is squat\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is playing controller\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    
},\n    {\n        \"prompt_en\": \"A person is throwing axe\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is giving or receiving award\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is air drumming\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is taking a shower\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is planting trees\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is sharpening knives\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is robot dancing\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is rock climbing\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is hula hooping\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is writing\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is bungee jumping\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is pushing cart\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is cleaning windows\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is cutting watermelon\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is 
cheerleading\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is washing hands\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is ironing\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is cutting nails\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is hugging\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is trimming or shaving beard\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is jogging\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is making bed\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is washing dishes\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is grooming dog\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is doing laundry\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is knitting\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is reading book\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is baby waking up\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is massaging legs\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n   
 {\n        \"prompt_en\": \"A person is brushing teeth\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is crawling baby\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is motorcycling\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is driving car\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is sticking tongue out\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is shaking head\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is sword fighting\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is doing aerobics\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is strumming guitar\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is riding or walking with horse\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is archery\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is catching or throwing baseball\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is playing chess\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is rock scissors paper\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is 
using computer\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is arranging flowers\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is bending metal\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is ice skating\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is climbing a rope\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is crying\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is dancing ballet\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is getting a haircut\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is running on treadmill\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is kissing\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is counting money\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is barbequing\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is peeling apples\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is milking cow\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is shining shoes\",\n        \"dimension\": [\n            \"human_action\"\n       
 ]\n    },\n    {\n        \"prompt_en\": \"A person is making snowman\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A person is sailing\",\n        \"dimension\": [\n            \"human_action\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a person swimming in ocean\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a person giving a presentation to a room full of colleagues\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a person washing the dishes\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a person eating a burger\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a person walking in the snowstorm\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a person drinking coffee in a cafe\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a person playing guitar\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bicycle leaning against a tree\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n           
 \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bicycle gliding through a snowy field\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bicycle slowing down to stop\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bicycle accelerating to gain speed\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a car stuck in traffic during rush hour\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a car turning a corner\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a car slowing down to stop\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a car accelerating to gain speed\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a motorcycle cruising along a coastal highway\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a motorcycle turning a corner\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            
\"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a motorcycle slowing down to stop\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a motorcycle gliding through a snowy field\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a motorcycle accelerating to gain speed\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"an airplane soaring through a clear blue sky\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"an airplane taking off\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"an airplane landing smoothly on a runway\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"an airplane accelerating to gain speed\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bus turning a corner\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bus stuck in traffic during rush hour\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n 
           \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bus accelerating to gain speed\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a train speeding down the tracks\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a train crossing over a tall bridge\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a train accelerating to gain speed\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a truck turning a corner\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a truck anchored in a tranquil bay\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a truck stuck in traffic during rush hour\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a truck slowing down to stop\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a truck accelerating to gain speed\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            
\"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a boat sailing smoothly on a calm lake\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a boat slowing down to stop\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a boat accelerating to gain speed\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bird soaring gracefully in the sky\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bird building a nest from twigs and leaves\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bird flying over a snowy forest\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a cat grooming itself meticulously with its tongue\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a cat playing in park\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a cat drinking water\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            
\"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a cat running happily\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a dog enjoying a peaceful walk\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a dog playing in park\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a dog drinking water\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a dog running happily\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a horse bending down to drink water from a river\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a horse galloping across an open field\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a horse taking a peaceful walk\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a horse running to join a herd of its kind\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n   
 },\n    {\n        \"prompt_en\": \"a sheep bending down to drink water from a river\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a sheep taking a peaceful walk\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a sheep running to join a herd of its kind\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a cow bending down to drink water from a river\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a cow chewing cud while resting in a tranquil barn\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a cow running to join a herd of its kind\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"an elephant spraying itself with water using its trunk to cool down\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"an elephant taking a peaceful walk\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"an elephant running to join a herd of its kind\",\n        \"dimension\": [\n            
\"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bear catching a salmon in its powerful jaws\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bear sniffing the air for scents of food\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bear climbing a tree\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a bear hunting for prey\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a zebra bending down to drink water from a river\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a zebra running to join a herd of its kind\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a zebra taking a peaceful walk\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a giraffe bending down to drink water from a river\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a giraffe taking a peaceful walk\",\n        
\"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a giraffe running to join a herd of its kind\",\n        \"dimension\": [\n            \"subject_consistency\",\n            \"dynamic_degree\",\n            \"motion_smoothness\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a person\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"person\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bicycle\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"bicycle\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a car\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"car\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a motorcycle\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"motorcycle\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an airplane\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"airplane\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bus\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"bus\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a train\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n    
    \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"train\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a truck\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"truck\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a boat\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"boat\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a traffic light\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"traffic light\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a fire hydrant\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"fire hydrant\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a stop sign\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"stop sign\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a parking meter\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"parking meter\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bench\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"bench\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bird\",\n        \"dimension\": [\n   
         \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"bird\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cat\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"cat\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a dog\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"dog\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a horse\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"horse\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sheep\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"sheep\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cow\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"cow\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an elephant\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"elephant\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bear\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"bear\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a zebra\",\n        \"dimension\": [\n            
\"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"zebra\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a giraffe\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"giraffe\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a backpack\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"backpack\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an umbrella\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"umbrella\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a handbag\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"handbag\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a tie\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"tie\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a suitcase\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"suitcase\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a frisbee\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"frisbee\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"skis\",\n        
\"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"skis\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a snowboard\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"snowboard\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sports ball\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"sports ball\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a kite\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"kite\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a baseball bat\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"baseball bat\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a baseball glove\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"baseball glove\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a skateboard\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"skateboard\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a surfboard\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"surfboard\"\n            }\n        }\n 
   },\n    {\n        \"prompt_en\": \"a tennis racket\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"tennis racket\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bottle\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"bottle\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a wine glass\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"wine glass\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cup\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"cup\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a fork\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"fork\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a knife\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"knife\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a spoon\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"spoon\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bowl\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"bowl\"\n            
}\n        }\n    },\n    {\n        \"prompt_en\": \"a banana\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"banana\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an apple\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"apple\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sandwich\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"sandwich\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"orange\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"broccoli\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"broccoli\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a carrot\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"carrot\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a hot dog\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"hot dog\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a pizza\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": 
\"pizza\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a donut\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"donut\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cake\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"cake\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a chair\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"chair\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a couch\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"couch\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a potted plant\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"potted plant\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bed\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"bed\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a dining table\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"dining table\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a toilet\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n             
   \"object\": \"toilet\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a tv\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"tv\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a laptop\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"laptop\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a remote\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"remote\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a keyboard\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"keyboard\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cell phone\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"cell phone\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a microwave\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"microwave\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an oven\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"oven\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a toaster\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": 
{\n                \"object\": \"toaster\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sink\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"sink\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a refrigerator\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"refrigerator\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a book\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"book\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a clock\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"clock\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a vase\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"vase\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"scissors\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"scissors\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a teddy bear\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"teddy bear\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a hair drier\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n         
   \"object_class\": {\n                \"object\": \"hair drier\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a toothbrush\",\n        \"dimension\": [\n            \"object_class\"\n        ],\n        \"auxiliary_info\": {\n            \"object_class\": {\n                \"object\": \"toothbrush\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a red bicycle\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"red\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a green bicycle\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"green\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a blue bicycle\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"blue\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a yellow bicycle\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"yellow\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange bicycle\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"orange\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a purple bicycle\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"purple\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a pink bicycle\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                
\"color\": \"pink\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a black bicycle\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"black\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a white bicycle\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a red car\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"red\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a green car\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"green\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a blue car\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"blue\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a yellow car\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"yellow\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange car\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"orange\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a purple car\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"purple\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": 
\"a pink car\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"pink\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a black car\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"black\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a white car\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a red bird\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"red\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a green bird\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"green\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a blue bird\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"blue\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a yellow bird\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"yellow\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange bird\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"orange\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a purple bird\",\n        \"dimension\": [\n            \"color\"\n        ],\n        
\"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"purple\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a pink bird\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"pink\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a black bird\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"black\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a white bird\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a black cat\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"black\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a white cat\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange cat\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"orange\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a yellow cat\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"yellow\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a red umbrella\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"red\"\n 
           }\n        }\n    },\n    {\n        \"prompt_en\": \"a green umbrella\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"green\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a blue umbrella\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"blue\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a yellow umbrella\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"yellow\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange umbrella\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"orange\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a purple umbrella\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"purple\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a pink umbrella\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"pink\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a black umbrella\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"black\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a white umbrella\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"white\"\n            }\n        }\n    },\n    {\n        
\"prompt_en\": \"a red suitcase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"red\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a green suitcase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"green\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a blue suitcase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"blue\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a yellow suitcase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"yellow\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange suitcase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"orange\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a purple suitcase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"purple\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a pink suitcase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"pink\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a black suitcase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"black\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a white suitcase\",\n        
\"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a red bowl\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"red\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a green bowl\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"green\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a blue bowl\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"blue\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a yellow bowl\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"yellow\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange bowl\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"orange\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a purple bowl\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"purple\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a pink bowl\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"pink\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a black bowl\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n     
       \"color\": {\n                \"color\": \"black\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a white bowl\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a red chair\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"red\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a green chair\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"green\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a blue chair\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"blue\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a yellow chair\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"yellow\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange chair\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"orange\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a purple chair\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"purple\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a pink chair\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"pink\"\n            }\n        
}\n    },\n    {\n        \"prompt_en\": \"a black chair\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"black\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a white chair\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a red clock\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"red\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a green clock\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"green\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a blue clock\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"blue\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a yellow clock\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"yellow\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange clock\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"orange\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a purple clock\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"purple\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a pink clock\",\n        
\"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"pink\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a black clock\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"black\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a white clock\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a red vase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"red\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a green vase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"green\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a blue vase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"blue\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a yellow vase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"yellow\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange vase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"orange\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a purple vase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n   
         \"color\": {\n                \"color\": \"purple\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a pink vase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"pink\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a black vase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"black\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a white vase\",\n        \"dimension\": [\n            \"color\"\n        ],\n        \"auxiliary_info\": {\n            \"color\": {\n                \"color\": \"white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, Van Gogh style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"Van Gogh style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, oil painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"oil painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand by Hokusai, in the style of Ukiyo\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"by Hokusai, in the style of Ukiyo\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, black and white\",\n        
\"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"black and white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, pixel art\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"pixel art\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, in cyberpunk style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"in cyberpunk style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, animated style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"animated style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, watercolor painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"watercolor painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, surrealism style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"surrealism style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"The bund 
Shanghai, Van Gogh style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"Van Gogh style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, oil painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"oil painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai by Hokusai, in the style of Ukiyo\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"by Hokusai, in the style of Ukiyo\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, black and white\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"black and white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, pixel art\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"pixel art\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, in cyberpunk style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"in cyberpunk style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, animated style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            
\"appearance_style\": {\n                \"appearance_style\": \"animated style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, watercolor painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"watercolor painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, surrealism style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"surrealism style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, Van Gogh style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"Van Gogh style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, oil painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"oil painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean by Hokusai, in the style of Ukiyo\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"by Hokusai, in the style of Ukiyo\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, black and white\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"black and 
white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, pixel art\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"pixel art\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, in cyberpunk style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"in cyberpunk style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, animated style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"animated style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, watercolor painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"watercolor painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, surrealism style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"surrealism style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, Van Gogh style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"Van Gogh style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": 
\"A panda drinking coffee in a cafe in Paris, oil painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"oil painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris by Hokusai, in the style of Ukiyo\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"by Hokusai, in the style of Ukiyo\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, black and white\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"black and white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, pixel art\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"pixel art\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, in cyberpunk style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"in cyberpunk style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, animated style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"animated style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A 
panda drinking coffee in a cafe in Paris, watercolor painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"watercolor painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, surrealism style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"surrealism style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, Van Gogh style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"Van Gogh style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, oil painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"oil painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset by Hokusai, in the style of Ukiyo\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"by Hokusai, in the style of Ukiyo\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, black and white\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"black and white\"\n            }\n        }\n    },\n    {\n        
\"prompt_en\": \"A cute happy Corgi playing in park, sunset, pixel art\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"pixel art\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, in cyberpunk style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"in cyberpunk style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, animated style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"animated style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, watercolor painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"watercolor painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, surrealism style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"surrealism style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, Van Gogh style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"Van Gogh style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a 
book, oil painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"oil painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book by Hokusai, in the style of Ukiyo\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"by Hokusai, in the style of Ukiyo\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, black and white\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"black and white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, pixel art\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"pixel art\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, in cyberpunk style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"in cyberpunk style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, animated style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"animated style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, watercolor painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n     
   \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"watercolor painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, surrealism style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"surrealism style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, Van Gogh style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"Van Gogh style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, oil painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"oil painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background by Hokusai, in the style of Ukiyo\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"by Hokusai, in the style of Ukiyo\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, black and white\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"black and white\"\n            }\n        }\n    },\n    {\n        
\"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, pixel art\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"pixel art\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, in cyberpunk style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"in cyberpunk style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, animated style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"animated style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, watercolor painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"watercolor painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, surrealism style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"surrealism style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, Van Gogh style\",\n        \"dimension\": [\n 
           \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"Van Gogh style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, oil painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"oil painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas by Hokusai, in the style of Ukiyo\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"by Hokusai, in the style of Ukiyo\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, black and white\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"black and white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, pixel art\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"pixel art\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, in cyberpunk style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            
\"appearance_style\": {\n                \"appearance_style\": \"in cyberpunk style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, animated style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"animated style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, watercolor painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"watercolor painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, surrealism style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"surrealism style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, Van Gogh style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"Van Gogh style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, oil painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"oil painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space by Hokusai, in the style of Ukiyo\",\n        \"dimension\": 
[\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"by Hokusai, in the style of Ukiyo\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, black and white\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"black and white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, pixel art\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"pixel art\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, in cyberpunk style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"in cyberpunk style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, animated style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"animated style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, watercolor painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"watercolor painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, surrealism style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n           
 \"appearance_style\": {\n                \"appearance_style\": \"surrealism style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, Van Gogh style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"Van Gogh style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, oil painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"oil painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks by Hokusai, in the style of Ukiyo\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"by Hokusai, in the style of Ukiyo\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. 
the canyons twist and bend through the high elevated mountain peaks, black and white\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"black and white\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, pixel art\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"pixel art\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, in cyberpunk style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"in cyberpunk style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, animated style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"animated style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. 
the canyons twist and bend through the high elevated mountain peaks, watercolor painting\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"watercolor painting\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, surrealism style\",\n        \"dimension\": [\n            \"appearance_style\"\n        ],\n        \"auxiliary_info\": {\n            \"appearance_style\": {\n                \"appearance_style\": \"surrealism style\"\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, in super slow motion\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, zoom in\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, zoom out\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, pan left\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, pan right\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, tilt up\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, tilt down\",\n     
   \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, with an intense shaking effect\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, featuring a steady and smooth perspective\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand, racking focus\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, in super slow motion\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, zoom in\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, zoom out\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, pan left\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, pan right\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, tilt up\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, tilt down\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, with an intense shaking effect\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, featuring a steady and smooth perspective\",\n        \"dimension\": [\n            \"temporal_style\"\n     
   ]\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, racking focus\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, in super slow motion\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, zoom in\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, zoom out\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, pan left\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, pan right\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, tilt up\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, tilt down\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, with an intense shaking effect\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, featuring a steady and smooth perspective\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean, racking focus\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, in super slow motion\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": 
\"A panda drinking coffee in a cafe in Paris, zoom in\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, zoom out\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, pan left\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, pan right\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, tilt up\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, tilt down\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, with an intense shaking effect\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, featuring a steady and smooth perspective\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris, racking focus\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, in super slow motion\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, zoom in\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, zoom out\",\n        \"dimension\": [\n            
\"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, pan left\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, pan right\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, tilt up\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, tilt down\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, with an intense shaking effect\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, featuring a steady and smooth perspective\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset, racking focus\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, in super slow motion\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, zoom in\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, zoom out\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, pan left\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, pan right\",\n        \"dimension\": [\n            
\"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, tilt up\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, tilt down\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, with an intense shaking effect\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, featuring a steady and smooth perspective\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book, racking focus\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, in super slow motion\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, zoom in\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, zoom out\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, pan left\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, pan right\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in 
background, tilt up\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, tilt down\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, with an intense shaking effect\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, featuring a steady and smooth perspective\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background, racking focus\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, in super slow motion\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, zoom in\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, zoom out\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, pan left\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, pan right\",\n        
\"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, tilt up\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, tilt down\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, with an intense shaking effect\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, featuring a steady and smooth perspective\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A couple in formal evening wear going home get caught in a heavy downpour with umbrellas, racking focus\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, in super slow motion\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, zoom in\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, zoom out\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, pan left\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, pan right\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An 
astronaut flying in space, tilt up\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, tilt down\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, with an intense shaking effect\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, featuring a steady and smooth perspective\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space, racking focus\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, in super slow motion\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, zoom in\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, zoom out\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. 
the canyons twist and bend through the high elevated mountain peaks, pan left\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, pan right\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, tilt up\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, tilt down\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, with an intense shaking effect\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks, featuring a steady and smooth perspective\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. snow blanketed rocky mountains surround and shadow deep canyons. 
the canyons twist and bend through the high elevated mountain peaks, racking focus\",\n        \"dimension\": [\n            \"temporal_style\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Close up of grapes on a rotating table.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Turtle swimming in ocean.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A storm trooper vacuuming the beach.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda standing on a surfboard in the ocean in sunset.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An astronaut feeding ducks on a sunny afternoon, reflection from the water.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Two pandas discussing an academic paper.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Sunset time lapse at the beach with moving clouds and colors in the sky.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A fat rabbit wearing a purple robe walking through a fantasy landscape.\",\n        \"dimension\": [\n            
\"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A koala bear playing piano in the forest.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An astronaut flying in space.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Fireworks.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An animated painting of fluffy white clouds moving in sky.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Flying through fantasy landscapes.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A bigfoot walking in the snowstorm.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A squirrel eating a burger.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cat wearing sunglasses and working as a lifeguard at a pool.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Snow rocky mountains peaks canyon. 
snow blanketed rocky mountains surround and shadow deep canyons. the canyons twist and bend through the high elevated mountain peaks.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Splash of turquoise water in extreme slow motion, alpha channel included.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"an ice cream is melting on the table.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a drone flying over a snowy forest.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a shark is swimming in the ocean.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Aerial panoramic video from a drone of a fantasy land.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a teddy bear is swimming in the ocean.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"time lapse of sunrise on mars.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"golden fish swimming in the ocean.\",\n        
\"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An artist brush painting on a canvas close up.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A drone view of celebration with Christmas tree and fireworks, starry sky - background.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"happy dog wearing a yellow turtleneck, studio, portrait, facing camera, dark background\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Origami dancers in white paper, 3D render, on white background, studio shot, dancing modern dance.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Campfire at night in a snowy forest with starry sky in the background.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"a fantasy landscape\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A 3D model of a 1800s victorian house.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"this is how I do makeup in the morning.\",\n        
\"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A raccoon that looks like a turtle, digital art.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Robot dancing in Times Square.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Busy freeway at night.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Balloon full of water exploding in extreme slow motion.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An astronaut is riding a horse in the space in a photorealistic style.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Macro slo-mo. 
Slow motion cropped closeup of roasted coffee beans falling into an empty bowl.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Sewing machine, old sewing machine working.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Motion colour drop in water, ink swirling in water, colourful ink in water, abstraction fancy dream cloud of ink.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Few big purple plums rotating on the turntable. water drops appear on the skin during rotation. isolated on the white background. close-up. macro.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Vampire makeup face of beautiful girl, red contact lenses.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Ashtray full of butts on table, smoke flowing on black background, close-up\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Pacific coast, carmel by the sea ocean and waves.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A teddy bear is playing drum kit in NYC Times Square.\",\n        \"dimension\": [\n            
\"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A corgi is playing drum kit.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An Iron man is playing the electronic guitar, high electronic guitar.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A raccoon is playing the electronic guitar.\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background by Vincent van Gogh\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A corgi's head depicted as an explosion of a nebula\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A fantasy landscape\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A future where humans have achieved teleportation technology\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A jellyfish floating through the ocean, with bioluminescent tentacles\",\n        \"dimension\": [\n            \"overall_consistency\",\n            
\"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A Mars rover moving on Mars\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda drinking coffee in a cafe in Paris\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A space shuttle launching into orbit, with flames and smoke billowing out from the engines\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A steam train moving on a mountainside\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A super cool giant robot in Cyberpunk Beijing\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A tropical beach at sunrise, with palm trees and crystal-clear water in the foreground\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Cinematic shot of Van Gogh's selfie, Van Gogh style\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Gwen Stacy reading a book\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n 
       \"prompt_en\": \"Iron Man flying in the sky\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, oil painting\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Yoda playing guitar on the stage\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand by Hokusai, in the style of Ukiyo\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A beautiful coastal beach in spring, waves lapping on sand by Vincent van Gogh\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A boat sailing leisurely along the Seine River with the Eiffel Tower in background\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A car moving slowly on an empty street, rainy evening\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cat eating food out of a bowl\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cat wearing sunglasses at a 
pool\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A confused panda in calculus class\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute fluffy panda eating Chinese food in a restaurant\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute happy Corgi playing in park, sunset\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A cute raccoon playing guitar in a boat on the ocean\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A happy fuzzy panda playing guitar nearby a campfire, snow mountain in the background\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A lightning striking atop of eiffel tower, dark clouds in the sky\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A modern art museum, with colorful paintings\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda cooking in the kitchen\",\n        \"dimension\": [\n            \"overall_consistency\",\n        
    \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A panda playing on a swing set\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A polar bear is playing guitar\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A raccoon dressed in suit playing the trumpet, stage background\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A robot DJ is playing the turntable, in heavy raining futuristic tokyo rooftop cyberpunk night, sci-fi, fantasy\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A shark swimming in clear Caribbean ocean\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A super robot protecting city\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"A teddy bear washing the dishes\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"An epic tornado attacking above a glowing city at night, the tornado is made of smoke\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n       
 ]\n    },\n    {\n        \"prompt_en\": \"An oil painting of a couple in formal evening wear going home get caught in a heavy downpour with umbrellas\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Clown fish swimming through the coral reef\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Hyper-realistic spaceship landing on Mars\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"The bund Shanghai, vibrant color\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Vincent van Gogh is painting in the room\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"Yellow flowers swing in the wind\",\n        \"dimension\": [\n            \"overall_consistency\",\n            \"aesthetic_quality\",\n            \"imaging_quality\"\n        ]\n    },\n    {\n        \"prompt_en\": \"alley\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"alley\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"amusement park\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                
\"scene\": {\n                    \"scene\": \"amusement park\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"aquarium\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"aquarium\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"arch\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"arch\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"art gallery\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"art gallery\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"bathroom\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"bathroom\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"bakery shop\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"bakery shop\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"ballroom\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n              
  \"scene\": {\n                    \"scene\": \"ballroom\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"bar\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"bar\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"barn\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"barn\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"basement\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"basement\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"beach\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"beach\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"bedroom\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"bedroom\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"bridge\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    
\"scene\": \"bridge\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"botanical garden\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"botanical garden\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"cafeteria\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"cafeteria\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"campsite\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"campsite\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"campus\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"campus\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"carrousel\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"carrousel\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"castle\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                
    \"scene\": \"castle\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"cemetery\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"cemetery\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"classroom\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"classroom\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"cliff\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"cliff\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"crosswalk\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"crosswalk\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"construction site\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"construction site\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"corridor\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n          
          \"scene\": \"corridor\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"courtyard\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"courtyard\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"desert\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"desert\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"downtown\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"downtown\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"driveway\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"driveway\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"farm\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"farm\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"food court\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    
\"scene\": \"food court\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"football field\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"football field\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"forest road\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"forest road\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"fountain\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"fountain\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"gas station\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"gas station\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"glacier\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"glacier\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"golf course\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n 
                   \"scene\": \"golf course\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"indoor gymnasium\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"indoor gymnasium\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"harbor\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"harbor\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"highway\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"highway\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"hospital\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"hospital\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"house\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"house\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"iceberg\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n   
                 \"scene\": \"iceberg\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"industrial area\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"industrial area\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"jail cell\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"jail cell\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"junkyard\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"junkyard\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"kitchen\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"kitchen\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"indoor library\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"indoor library\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"lighthouse\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n              
  \"scene\": {\n                    \"scene\": \"lighthouse\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"laboratory\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"laboratory\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"mansion\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"mansion\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"marsh\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"marsh\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"mountain\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"mountain\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"indoor movie theater\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"indoor movie theater\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"indoor museum\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            
\"scene\": {\n                \"scene\": {\n                    \"scene\": \"indoor museum\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"music studio\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"music studio\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"nursery\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"nursery\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"ocean\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"ocean\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"office\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"office\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"palace\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"palace\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"parking lot\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            
\"scene\": {\n                \"scene\": {\n                    \"scene\": \"parking lot\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"pharmacy\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"pharmacy\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"phone booth\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"phone booth\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"raceway\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"raceway\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"restaurant\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"restaurant\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"river\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"river\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"science museum\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n         
   \"scene\": {\n                \"scene\": {\n                    \"scene\": \"science museum\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"shower\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"shower\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"ski slope\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"ski slope\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"sky\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"sky\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"skyscraper\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"skyscraper\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"baseball stadium\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"baseball stadium\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"staircase\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n  
          \"scene\": {\n                \"scene\": {\n                    \"scene\": \"staircase\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"street\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"street\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"supermarket\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"supermarket\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"indoor swimming pool\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"indoor swimming pool\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"tower\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"tower\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"outdoor track\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"outdoor track\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"train railway\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n     
   \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"train railway\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"train station platform\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"train station platform\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"underwater coral reef\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"underwater coral reef\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"valley\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"valley\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"volcano\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"volcano\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"waterfall\",\n        \"dimension\": [\n            \"scene\",\n            \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"waterfall\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"windmill\",\n        \"dimension\": [\n            \"scene\",\n          
  \"background_consistency\"\n        ],\n        \"auxiliary_info\": {\n            \"scene\": {\n                \"scene\": {\n                    \"scene\": \"windmill\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bicycle on the left of a car, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"bicycle\",\n                    \"object_b\": \"car\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a car on the right of a motorcycle, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"car\",\n                    \"object_b\": \"motorcycle\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a motorcycle on the left of a bus, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"motorcycle\",\n                    \"object_b\": \"bus\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bus on the right of a traffic light, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"bus\",\n           
         \"object_b\": \"traffic light\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a traffic light on the left of a fire hydrant, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"traffic light\",\n                    \"object_b\": \"fire hydrant\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a fire hydrant on the right of a stop sign, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"fire hydrant\",\n                    \"object_b\": \"stop sign\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a stop sign on the left of a parking meter, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"stop sign\",\n                    \"object_b\": \"parking meter\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a parking meter on the right of a bench, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"parking meter\",\n       
             \"object_b\": \"bench\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bench on the left of a truck, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"bench\",\n                    \"object_b\": \"truck\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a truck on the right of a bicycle, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"truck\",\n                    \"object_b\": \"bicycle\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bird on the left of a cat, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"bird\",\n                    \"object_b\": \"cat\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cat on the right of a dog, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"cat\",\n                    \"object_b\": \"dog\",\n                    \"relationship\": \"on the right of\"\n        
        }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a dog on the left of a horse, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"dog\",\n                    \"object_b\": \"horse\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a horse on the right of a sheep, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"horse\",\n                    \"object_b\": \"sheep\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sheep on the left of a cow, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"sheep\",\n                    \"object_b\": \"cow\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cow on the right of an elephant, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"cow\",\n                    \"object_b\": \"elephant\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an elephant on the left 
of a bear, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"elephant\",\n                    \"object_b\": \"bear\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bear on the right of a zebra, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"bear\",\n                    \"object_b\": \"zebra\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a zebra on the left of a giraffe, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"zebra\",\n                    \"object_b\": \"giraffe\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a giraffe on the right of a bird, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"giraffe\",\n                    \"object_b\": \"bird\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bottle on the left of a wine glass, front view\",\n        \"dimension\": [\n            
\"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"bottle\",\n                    \"object_b\": \"wine glass\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a wine glass on the right of a cup, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"wine glass\",\n                    \"object_b\": \"cup\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a cup on the left of a fork, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"cup\",\n                    \"object_b\": \"fork\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a fork on the right of a knife, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"fork\",\n                    \"object_b\": \"knife\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a knife on the left of a spoon, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            
\"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"knife\",\n                    \"object_b\": \"spoon\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a spoon on the right of a bowl, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"spoon\",\n                    \"object_b\": \"bowl\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bowl on the left of a bottle, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"bowl\",\n                    \"object_b\": \"bottle\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a potted plant on the left of a remote, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"potted plant\",\n                    \"object_b\": \"remote\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a remote on the right of a clock, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n        
            \"object_a\": \"remote\",\n                    \"object_b\": \"clock\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a clock on the left of a vase, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"clock\",\n                    \"object_b\": \"vase\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a vase on the right of scissors, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"vase\",\n                    \"object_b\": \"scissors\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"scissors on the left of a teddy bear, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"scissors\",\n                    \"object_b\": \"teddy bear\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a teddy bear on the right of a potted plant, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"teddy bear\",\n                    
\"object_b\": \"potted plant\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a frisbee on the left of a sports ball, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"frisbee\",\n                    \"object_b\": \"sports ball\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sports ball on the right of a baseball bat, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"sports ball\",\n                    \"object_b\": \"baseball bat\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a baseball bat on the left of a baseball glove, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"baseball bat\",\n                    \"object_b\": \"baseball glove\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a baseball glove on the right of a tennis racket, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"baseball glove\",\n         
           \"object_b\": \"tennis racket\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a tennis racket on the left of a frisbee, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"tennis racket\",\n                    \"object_b\": \"frisbee\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a toilet on the left of a hair drier, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"toilet\",\n                    \"object_b\": \"hair drier\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a hair drier on the right of a toothbrush, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"hair drier\",\n                    \"object_b\": \"toothbrush\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a toothbrush on the left of a sink, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"toothbrush\",\n                    \"object_b\": 
\"sink\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sink on the right of a toilet, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"sink\",\n                    \"object_b\": \"toilet\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a chair on the left of a couch, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"chair\",\n                    \"object_b\": \"couch\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a couch on the right of a bed, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"couch\",\n                    \"object_b\": \"bed\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a bed on the left of a tv, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"bed\",\n                    \"object_b\": \"tv\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n       
 }\n    },\n    {\n        \"prompt_en\": \"a tv on the right of a dining table, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"tv\",\n                    \"object_b\": \"dining table\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a dining table on the left of a chair, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"dining table\",\n                    \"object_b\": \"chair\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an airplane on the left of a train, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"airplane\",\n                    \"object_b\": \"train\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a train on the right of a boat, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"train\",\n                    \"object_b\": \"boat\",\n                    \"relationship\": \"on the right of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a boat on the left of an 
airplane, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"boat\",\n                    \"object_b\": \"airplane\",\n                    \"relationship\": \"on the left of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an oven on the top of a toaster, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"oven\",\n                    \"object_b\": \"toaster\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an oven on the bottom of a toaster, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"oven\",\n                    \"object_b\": \"toaster\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a toaster on the top of a microwave, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"toaster\",\n                    \"object_b\": \"microwave\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a toaster on the bottom of a microwave, front view\",\n        \"dimension\": [\n            
\"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"toaster\",\n                    \"object_b\": \"microwave\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a microwave on the top of an oven, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"microwave\",\n                    \"object_b\": \"oven\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a microwave on the bottom of an oven, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"microwave\",\n                    \"object_b\": \"oven\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a banana on the top of an apple, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"banana\",\n                    \"object_b\": \"apple\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a banana on the bottom of an apple, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n  
          \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"banana\",\n                    \"object_b\": \"apple\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an apple on the top of a sandwich, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"apple\",\n                    \"object_b\": \"sandwich\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an apple on the bottom of a sandwich, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"apple\",\n                    \"object_b\": \"sandwich\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sandwich on the top of an orange, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"sandwich\",\n                    \"object_b\": \"orange\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a sandwich on the bottom of an orange, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                
\"spatial_relationship\": {\n                    \"object_a\": \"sandwich\",\n                    \"object_b\": \"orange\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange on the top of a carrot, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"orange\",\n                    \"object_b\": \"carrot\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"an orange on the bottom of a carrot, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"orange\",\n                    \"object_b\": \"carrot\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a carrot on the top of a hot dog, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"carrot\",\n                    \"object_b\": \"hot dog\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a carrot on the bottom of a hot dog, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": 
\"carrot\",\n                    \"object_b\": \"hot dog\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a hot dog on the top of a pizza, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"hot dog\",\n                    \"object_b\": \"pizza\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a hot dog on the bottom of a pizza, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"hot dog\",\n                    \"object_b\": \"pizza\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a pizza on the top of a donut, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"pizza\",\n                    \"object_b\": \"donut\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a pizza on the bottom of a donut, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"pizza\",\n                    \"object_b\": \"donut\",\n                    
\"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a donut on the top of broccoli, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"donut\",\n                    \"object_b\": \"broccoli\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a donut on the bottom of broccoli, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"donut\",\n                    \"object_b\": \"broccoli\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"broccoli on the top of a banana, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"broccoli\",\n                    \"object_b\": \"banana\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"broccoli on the bottom of a banana, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"broccoli\",\n                    \"object_b\": \"banana\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n  
      }\n    },\n    {\n        \"prompt_en\": \"skis on the top of a snowboard, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"skis\",\n                    \"object_b\": \"snowboard\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"skis on the bottom of a snowboard, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"skis\",\n                    \"object_b\": \"snowboard\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a snowboard on the top of a kite, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"snowboard\",\n                    \"object_b\": \"kite\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a snowboard on the bottom of a kite, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"snowboard\",\n                    \"object_b\": \"kite\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a kite on the top of a 
skateboard, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"kite\",\n                    \"object_b\": \"skateboard\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a kite on the bottom of a skateboard, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"kite\",\n                    \"object_b\": \"skateboard\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a skateboard on the top of a surfboard, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"skateboard\",\n                    \"object_b\": \"surfboard\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a skateboard on the bottom of a surfboard, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"skateboard\",\n                    \"object_b\": \"surfboard\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a surfboard on the top of skis, front view\",\n        
\"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"surfboard\",\n                    \"object_b\": \"skis\",\n                    \"relationship\": \"on the top of\"\n                }\n            }\n        }\n    },\n    {\n        \"prompt_en\": \"a surfboard on the bottom of skis, front view\",\n        \"dimension\": [\n            \"spatial_relationship\"\n        ],\n        \"auxiliary_info\": {\n            \"spatial_relationship\": {\n                \"spatial_relationship\": {\n                    \"object_a\": \"surfboard\",\n                    \"object_b\": \"skis\",\n                    \"relationship\": \"on the bottom of\"\n                }\n            }\n        }\n    }\n]\n"
  },
  {
    "path": "Open-Sora/eval/vbench/calc_vbench.py",
    "content": "import argparse\nimport os\nimport time\n\nimport torch\nfrom vbench import VBench\n\nfull_info_path = \"eval/vbench/VBench_full_info.json\"\ndimensions = [\n    # a: 10min\n    \"subject_consistency\",  # 4min\n    \"imaging_quality\",  # 6min\n    # b: 12min\n    \"background_consistency\",  # 2min\n    \"motion_smoothness\",  # 5min\n    \"overall_consistency\",  # 2min\n    \"human_action\",  # 3min\n    # c: 14min\n    \"multiple_objects\",  # 14min\n    # d: 14min\n    \"spatial_relationship\",  # 14min\n    # e: 12min\n    \"object_class\",  # 12min\n    # f: 12min\n    \"color\",  # 12min\n    # g: 10.5min\n    \"aesthetic_quality\",  # 2.5min\n    \"appearance_style\",  # 6min\n    \"temporal_flickering\",  # 2min\n    # h: 9min\n    \"scene\",  # 3min\n    \"temporal_style\",  # 2min\n    \"dynamic_degree\",  # 4min\n]\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"video_folder\", type=str)  # samples/samples..._vbench/eval\n    parser.add_argument(\"model_ckpt\", type=str)\n    parser.add_argument(\"--start\", type=int, default=0)  # start index of dimension to be evaluated\n    parser.add_argument(\"--end\", type=int, default=-1)  # start index of dimension to be evaluated\n\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n    output_dir = os.path.join(args.model_ckpt, \"vbench\")\n    os.makedirs(output_dir, exist_ok=True)\n    video_path = args.video_folder\n\n    kwargs = {}\n    kwargs[\"imaging_quality_preprocessing_mode\"] = \"longer\"  # use VBench/evaluate.py default\n\n    start_time = time.time()\n\n    # NOTE: important to use torch.device(\"cuda\"), else will have issue with object_class third_party module\n    my_VBench = VBench(torch.device(\"cuda\"), full_info_path, output_dir)\n    if args.end == -1:  # adjust end accordingly\n        args.end = len(dimensions)\n    for dim in dimensions[args.start : args.end]:\n        
my_VBench.evaluate(\n            videos_path=video_path,\n            name=dim,\n            local=False,\n            read_frame=False,\n            dimension_list=[dim],\n            mode=\"vbench_standard\",\n            **kwargs,\n        )\n\n    print(\"Runtime: %s seconds \" % (time.time() - start_time))\n"
  },
  {
    "path": "Open-Sora/eval/vbench/launch.sh",
    "content": "# !/bin/bash\n\nCKPT=$1\nNUM_FRAMES=$2\nMODEL_NAME=$3\nRES=$4\nASP_RATIO=$5\n\nNUM_SAMPLING_STEPS=$6\nFLOW=$7\nLLM_REFINE=$8\n\nif [[ $CKPT == *\"ema\"* ]]; then\n    parentdir=$(dirname $CKPT)\n    CKPT_BASE=$(basename $parentdir)_ema\nelse\n    CKPT_BASE=$(basename $CKPT)\nfi\nLOG_BASE=$(dirname $CKPT)/eval\necho \"Logging to $LOG_BASE\"\n\n# 确保 eval 目录存在\nmkdir -p $LOG_BASE\n\n#GPUS=(0 1 2 3 4 5 6 7)\n#TASK_ID_LIST=(4a 4b 4c 4d 4e 4f 4g 4h) # for log records only\n#START_INDEX_LIST=(0 120 240 360 480 600 720 840)\n#END_INDEX_LIST=(120 240 360 480 600 720 840 2000)\n\n# 使用 6 张 GPU\nGPUS=(0 1 2 3 4 5)\nTASK_ID_LIST=(4a 4b 4c 4d 4e 4f)\n# 将 950 个 prompts 划分为 6 个区间\nSTART_INDEX_LIST=(0 158 316 474 632 790)\nEND_INDEX_LIST=(158 316 474 632 790 2000)\n\n# 使用 5 张 GPU\n#GPUS=(0 1 2 3 4)\n#TASK_ID_LIST=(4a 4b 4c 4d 4e)\n## 将 950 个 prompts 划分为 5 个区间\n#START_INDEX_LIST=(0 190 380 570 760)\n#END_INDEX_LIST=(190 380 570 760 2000)\n\n## Modify the following to run on multiple machines for faster results\n## 720p will take quite long on a single machine\n# START_INDEX_LIST=(60 180 300 420 540 660 780 900)\n# END_INDEX_LIST=(120 240 360 480 600 720 840 2000)\n# LOG_BASE=$(dirname $CKPT)/eval/last_60\n# mkdir -p ${LOG_BASE}\n# echo \"Logging to $LOG_BASE\"\n\n\n\nfor i in \"${!GPUS[@]}\"; do\n    if [ -z ${RES} ] || [ -z ${ASP_RATIO} ]  ;\n        then\n            CUDA_VISIBLE_DEVICES=${GPUS[i]} bash eval/sample.sh $CKPT ${NUM_FRAMES} ${MODEL_NAME} -4 ${START_INDEX_LIST[i]} ${END_INDEX_LIST[i]} > ${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\n        else\n            if [ -z ${NUM_SAMPLING_STEPS} ];\n                then\n                    CUDA_VISIBLE_DEVICES=${GPUS[i]} bash eval/sample.sh $CKPT ${NUM_FRAMES} ${MODEL_NAME} -4 ${START_INDEX_LIST[i]} ${END_INDEX_LIST[i]} ${RES} ${ASP_RATIO} > ${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\n                else\n                    if [ -z ${FLOW} ];\n                    then\n                        
CUDA_VISIBLE_DEVICES=${GPUS[i]} bash eval/sample.sh $CKPT ${NUM_FRAMES} ${MODEL_NAME} -4 ${START_INDEX_LIST[i]} ${END_INDEX_LIST[i]} ${RES} ${ASP_RATIO} ${NUM_SAMPLING_STEPS} > ${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\n                    else\n                        if [ -z ${LLM_REFINE} ];\n                            then\n                                CUDA_VISIBLE_DEVICES=${GPUS[i]} bash eval/sample.sh $CKPT ${NUM_FRAMES} ${MODEL_NAME} -4 ${START_INDEX_LIST[i]} ${END_INDEX_LIST[i]} ${RES} ${ASP_RATIO} ${NUM_SAMPLING_STEPS} ${FLOW} > ${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\n                            else\n                                CUDA_VISIBLE_DEVICES=${GPUS[i]} bash eval/sample.sh $CKPT ${NUM_FRAMES} ${MODEL_NAME} -4 ${START_INDEX_LIST[i]} ${END_INDEX_LIST[i]} ${RES} ${ASP_RATIO} ${NUM_SAMPLING_STEPS} ${FLOW} ${LLM_REFINE} > ${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\n                        fi\n                    fi\n            fi\n    fi\ndone\n"
  },
  {
    "path": "Open-Sora/eval/vbench/launch_calc.sh",
    "content": "# !/bin/bash\n\nVIDEO_DIR=$1\nCKPT_DIR=$2\nLOG_BASE=$CKPT_DIR\nmkdir -p $LOG_BASE\necho \"Logging to $LOG_BASE\"\n\nGPUS=(0 1 2 3 4 5 6 7)\nSTART_INDEX_LIST=(0 2 6 7 8 9 10 13)\nEND_INDEX_LIST=(2 6 7 8 9 10 13 16)\nTASK_ID_LIST=(calc_vbench_a calc_vbench_b calc_vbench_c calc_vbench_d calc_vbench_e calc_vbench_f calc_vbench_g calc_vbench_h) # for log records only\n\nfor i in \"${!GPUS[@]}\"; do\n    CUDA_VISIBLE_DEVICES=${GPUS[i]} python eval/vbench/calc_vbench.py $VIDEO_DIR $CKPT_DIR --start ${START_INDEX_LIST[i]} --end ${END_INDEX_LIST[i]} > ${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\ndone\n"
  },
  {
    "path": "Open-Sora/eval/vbench/tabulate_vbench_scores.py",
    "content": "import argparse\nimport json\nimport os\n\nSEMANTIC_WEIGHT = 1\nQUALITY_WEIGHT = 4\n\nQUALITY_LIST = [\n    \"subject consistency\",\n    \"background consistency\",\n    \"temporal flickering\",\n    \"motion smoothness\",\n    \"aesthetic quality\",\n    \"imaging quality\",\n    \"dynamic degree\",\n]\n\nSEMANTIC_LIST = [\n    \"object class\",\n    \"multiple objects\",\n    \"human action\",\n    \"color\",\n    \"spatial relationship\",\n    \"scene\",\n    \"appearance style\",\n    \"temporal style\",\n    \"overall consistency\",\n]\n\nNORMALIZE_DIC = {\n    \"subject consistency\": {\"Min\": 0.1462, \"Max\": 1.0},\n    \"background consistency\": {\"Min\": 0.2615, \"Max\": 1.0},\n    \"temporal flickering\": {\"Min\": 0.6293, \"Max\": 1.0},\n    \"motion smoothness\": {\"Min\": 0.706, \"Max\": 0.9975},\n    \"dynamic degree\": {\"Min\": 0.0, \"Max\": 1.0},\n    \"aesthetic quality\": {\"Min\": 0.0, \"Max\": 1.0},\n    \"imaging quality\": {\"Min\": 0.0, \"Max\": 1.0},\n    \"object class\": {\"Min\": 0.0, \"Max\": 1.0},\n    \"multiple objects\": {\"Min\": 0.0, \"Max\": 1.0},\n    \"human action\": {\"Min\": 0.0, \"Max\": 1.0},\n    \"color\": {\"Min\": 0.0, \"Max\": 1.0},\n    \"spatial relationship\": {\"Min\": 0.0, \"Max\": 1.0},\n    \"scene\": {\"Min\": 0.0, \"Max\": 0.8222},\n    \"appearance style\": {\"Min\": 0.0009, \"Max\": 0.2855},\n    \"temporal style\": {\"Min\": 0.0, \"Max\": 0.364},\n    \"overall consistency\": {\"Min\": 0.0, \"Max\": 0.364},\n}\n\nDIM_WEIGHT = {\n    \"subject consistency\": 1,\n    \"background consistency\": 1,\n    \"temporal flickering\": 1,\n    \"motion smoothness\": 1,\n    \"aesthetic quality\": 1,\n    \"imaging quality\": 1,\n    \"dynamic degree\": 0.5,\n    \"object class\": 1,\n    \"multiple objects\": 1,\n    \"human action\": 1,\n    \"color\": 1,\n    \"spatial relationship\": 1,\n    \"scene\": 1,\n    \"appearance style\": 1,\n    \"temporal style\": 1,\n    \"overall consistency\": 
1,\n}\n\nordered_scaled_res = [\n    \"total score\",\n    \"quality score\",\n    \"semantic score\",\n    \"subject consistency\",\n    \"background consistency\",\n    \"temporal flickering\",\n    \"motion smoothness\",\n    \"dynamic degree\",\n    \"aesthetic quality\",\n    \"imaging quality\",\n    \"object class\",\n    \"multiple objects\",\n    \"human action\",\n    \"color\",\n    \"spatial relationship\",\n    \"scene\",\n    \"appearance style\",\n    \"temporal style\",\n    \"overall consistency\",\n]\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"--score_dir\", type=str)  # ckpt_dir/eval/vbench\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n    res_postfix = \"_eval_results.json\"\n    info_postfix = \"_full_info.json\"\n    files = os.listdir(args.score_dir)\n    res_files = [x for x in files if res_postfix in x]\n    info_files = [x for x in files if info_postfix in x]\n    assert len(res_files) == len(info_files), f\"got {len(res_files)} res files, but {len(info_files)} info files\"\n\n    full_results = {}\n    for res_file in res_files:\n        # first check if results is normal\n        info_file = res_file.split(res_postfix)[0] + info_postfix\n        with open(os.path.join(args.score_dir, info_file), \"r\", encoding=\"utf-8\") as f:\n            info = json.load(f)\n            assert len(info[0][\"video_list\"]) > 0, f\"Error: {info_file} has 0 video list\"\n        # read results\n        with open(os.path.join(args.score_dir, res_file), \"r\", encoding=\"utf-8\") as f:\n            data = json.load(f)\n            for key, val in data.items():\n                full_results[key] = format(val[0], \".4f\")\n\n    scaled_results = {}\n    dims = set()\n    for key, val in full_results.items():\n        dim = key.replace(\"_\", \" \") if \"_\" in key else key\n        scaled_score = (float(val) - NORMALIZE_DIC[dim][\"Min\"]) / (\n        
    NORMALIZE_DIC[dim][\"Max\"] - NORMALIZE_DIC[dim][\"Min\"]\n        )\n        scaled_score *= DIM_WEIGHT[dim]\n        scaled_results[dim] = scaled_score\n        dims.add(dim)\n\n    assert len(dims) == len(NORMALIZE_DIC), f\"{set(NORMALIZE_DIC.keys())-dims} not calculated yet\"\n\n    quality_score = sum([scaled_results[i] for i in QUALITY_LIST]) / sum([DIM_WEIGHT[i] for i in QUALITY_LIST])\n    semantic_score = sum([scaled_results[i] for i in SEMANTIC_LIST]) / sum([DIM_WEIGHT[i] for i in SEMANTIC_LIST])\n    scaled_results[\"quality score\"] = quality_score\n    scaled_results[\"semantic score\"] = semantic_score\n    scaled_results[\"total score\"] = (quality_score * QUALITY_WEIGHT + semantic_score * SEMANTIC_WEIGHT) / (\n        QUALITY_WEIGHT + SEMANTIC_WEIGHT\n    )\n\n    formated_scaled_results = {\"items\": []}\n    for key in ordered_scaled_res:\n        # formated_scaled_results[key] = format(val * 100, \".2f\") + \"%\"\n        formated_score = format(scaled_results[key] * 100, \".2f\") + \"%\"\n        formated_scaled_results[\"items\"].append({key: formated_score})\n\n    output_file_path = os.path.join(args.score_dir, \"all_results.json\")\n    with open(output_file_path, \"w\") as outfile:\n        json.dump(full_results, outfile, indent=4, sort_keys=True)\n    print(f\"results saved to: {output_file_path}\")\n\n    scaled_file_path = os.path.join(args.score_dir, \"scaled_results.json\")\n    with open(scaled_file_path, \"w\") as outfile:\n        json.dump(formated_scaled_results, outfile, indent=4, sort_keys=True)\n    print(f\"results saved to: {scaled_file_path}\")\n"
  },
  {
    "path": "Open-Sora/eval/vbench_i2v/calc_vbench_i2v.py",
    "content": "import argparse\nimport os\nimport time\n\nimport torch\nfrom vbench import VBench\nfrom vbench2_beta_i2v import VBenchI2V\n\nfull_info_path = \"eval/vbench_i2v/vbench2_i2v_full_info.json\"\nvideo_quality_dimensions = [\n    \"subject_consistency\",\n    \"background_consistency\",\n    \"motion_smoothness\",\n    \"dynamic_degree\",\n    \"aesthetic_quality\",\n    \"imaging_quality\",\n    \"temporal_flickering\",\n]\ni2v_dimensions = [\"i2v_subject\", \"i2v_background\", \"camera_motion\"]\n\n\ndef str2bool(v):\n    if isinstance(v, bool):\n        return v\n    if v.lower() in (\"yes\", \"true\", \"t\", \"y\", \"1\"):\n        return True\n    elif v.lower() in (\"no\", \"false\", \"f\", \"n\", \"0\"):\n        return False\n    else:\n        raise argparse.ArgumentTypeError(\"Boolean value expected.\")\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"video_folder\", type=str)  # samples/samples..._vbench_i2v/\n    parser.add_argument(\"model_ckpt\", type=str)\n    parser.add_argument(\"--start\", type=int, default=0)  # start index of dimension to be evaluated\n    parser.add_argument(\"--end\", type=int, default=-1)  # start index of dimension to be evaluated\n    parser.add_argument(\"--calc_i2v\", type=str2bool, default=True)\n    parser.add_argument(\"--calc_quality\", type=str2bool, default=True)\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n    output_dir = os.path.join(args.model_ckpt, \"vbench_i2v\")\n    os.makedirs(output_dir, exist_ok=True)\n    video_path = args.video_folder\n\n    start_time = time.time()\n\n    if args.calc_i2v:\n        my_VBench_I2V = VBenchI2V(torch.device(\"cuda\"), full_info_path, output_dir)\n        end = args.end if args.end != -1 else len(i2v_dimensions)\n        for i2v_dim in i2v_dimensions[args.start : end]:\n            my_VBench_I2V.evaluate(videos_path=video_path, name=i2v_dim, 
dimension_list=[i2v_dim], resolution=\"1-1\")\n\n    kwargs = {}\n    kwargs[\"imaging_quality_preprocessing_mode\"] = \"longer\"  # use VBench/evaluate.py default\n\n    if args.calc_quality:\n        my_VBench = VBench(torch.device(\"cuda\"), full_info_path, output_dir)\n        end = args.end if args.end != -1 else len(video_quality_dimensions)\n        for quality_dim in video_quality_dimensions[args.start : end]:\n            my_VBench.evaluate(\n                videos_path=video_path, name=quality_dim, dimension_list=[quality_dim], mode=\"vbench_standard\", **kwargs\n            )\n\n    print(\"Runtime: %s seconds \" % (time.time() - start_time))\n"
  },
  {
    "path": "Open-Sora/eval/vbench_i2v/json_to_txt.py",
    "content": "import json\nimport os\n\nRESOLUTIONS = [\"1-1\", \"16-9\", \"7-4\", \"8-5\"]\n\ncache_root = \"/mnt/jfs-hdd/sora/data/vbench-i2v/crop\"\nresolution = RESOLUTIONS[0]\njson_file = \"vbench2_i2v_full_info.json\"\nsave_path = \"all_i2v.txt\"\n\ndata = json.load(open(json_file))\ntxt = [\n    f'{x[\"prompt_en\"]}{{\"reference_path\": \"{os.path.join(cache_root, resolution, x[\"image_name\"])}\", \"mask_strategy\": \"0\"}}'\n    for x in data\n]\nwith open(save_path, \"w\") as f:\n    f.write(\"\\n\".join(txt))\n"
  },
  {
    "path": "Open-Sora/eval/vbench_i2v/launch.sh",
    "content": "#!/bin/bash\n\nCKPT=$1\nNUM_FRAMES=$2\nMODEL_NAME=$3\nRES=$4\nASP_RATIO=$5\n\nNUM_SAMPLING_STEPS=$6\nFLOW=$7\nLLM_REFINE=$8\n\nif [[ $CKPT == *\"ema\"* ]]; then\n    parentdir=$(dirname $CKPT)\n    CKPT_BASE=$(basename $parentdir)_ema\nelse\n    CKPT_BASE=$(basename $CKPT)\nfi\nLOG_BASE=$(dirname $CKPT)/eval\necho \"Logging to $LOG_BASE\"\n\nGPUS=(0 1 2 3 4 5 6 7)\nTASK_ID_LIST=(5a 5b 5c 5d 5e 5f 5g 5h) # for log records only\nSTART_INDEX_LIST=(0 140 280 420 560 700 840 980)\nEND_INDEX_LIST=(140 280 420 560 700 840 980 2000)\n\n\nfor i in \"${!GPUS[@]}\"; do\n    if [ -z ${RES} ] || [ -z ${ASP_RATIO} ]  ;\n        then\n            CUDA_VISIBLE_DEVICES=${GPUS[i]} bash eval/sample.sh $CKPT $NUM_FRAMES $MODEL_NAME -5 ${START_INDEX_LIST[i]} ${END_INDEX_LIST[i]} > ${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\n        else\n            if [ -z ${NUM_SAMPLING_STEPS} ];\n                then\n                    CUDA_VISIBLE_DEVICES=${GPUS[i]} bash eval/sample.sh $CKPT ${NUM_FRAMES} ${MODEL_NAME} -5 ${START_INDEX_LIST[i]} ${END_INDEX_LIST[i]} ${RES} ${ASP_RATIO} > ${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\n                else\n                    if [ -z ${FLOW} ];\n                    then\n                        CUDA_VISIBLE_DEVICES=${GPUS[i]} bash eval/sample.sh $CKPT ${NUM_FRAMES} ${MODEL_NAME} -5 ${START_INDEX_LIST[i]} ${END_INDEX_LIST[i]} ${RES} ${ASP_RATIO} ${NUM_SAMPLING_STEPS} > ${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\n                    else\n                        if [ -z ${LLM_REFINE} ];\n                            then\n                                CUDA_VISIBLE_DEVICES=${GPUS[i]} bash eval/sample.sh $CKPT ${NUM_FRAMES} ${MODEL_NAME} -5 ${START_INDEX_LIST[i]} ${END_INDEX_LIST[i]} ${RES} ${ASP_RATIO} ${NUM_SAMPLING_STEPS} ${FLOW} > ${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\n                            else\n                                CUDA_VISIBLE_DEVICES=${GPUS[i]} bash eval/sample.sh $CKPT ${NUM_FRAMES} ${MODEL_NAME} -5 
${START_INDEX_LIST[i]} ${END_INDEX_LIST[i]} ${RES} ${ASP_RATIO} ${NUM_SAMPLING_STEPS} ${FLOW} ${LLM_REFINE} > ${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\n                        fi\n                    fi\n            fi\n    fi\ndone\n"
  },
  {
    "path": "Open-Sora/eval/vbench_i2v/launch_calc.sh",
    "content": "# !/bin/bash\n\nVIDEO_DIR=$1\nCKPT_DIR=$2\nLOG_BASE=$CKPT_DIR\nmkdir -p $LOG_BASE\necho \"Logging to $LOG_BASE\"\n\nGPUS=(0 1 2 3 4 5 6 7)\nCALC_I2V_LIST=(True True False False False False False False)\nCALC_QUALITY_LIST=(False False True True True True True True)\nSTART_INDEX_LIST=(0 2 0 2 3 4 5 6)\nEND_INDEX_LIST=(2 -1 2 3 4 5 6 -1)\nTASK_ID_LIST=(calc_vbench_i2v_a calc_vbench_i2v_b calc_vbench_i2v_c calc_vbench_i2v_d calc_vbench_i2v_e calc_vbench_i2v_f calc_vbench_i2v_g calc_vbench_i2v_h) # for log records only\n\n\nfor i in \"${!GPUS[@]}\"; do\n    CUDA_VISIBLE_DEVICES=${GPUS[i]} python eval/vbench_i2v/calc_vbench_i2v.py $VIDEO_DIR $CKPT_DIR --calc_i2v ${CALC_I2V_LIST[i]} --calc_quality ${CALC_QUALITY_LIST[i]} --start ${START_INDEX_LIST[i]} --end ${END_INDEX_LIST[i]} > ${LOG_BASE}/${TASK_ID_LIST[i]}.log 2>&1 &\ndone\n"
  },
  {
    "path": "Open-Sora/gradio/README.md",
    "content": "---\ntitle: Open Sora\nemoji: 🎥\ncolorFrom: red\ncolorTo: purple\nsdk: gradio\nsdk_version: 4.25.0\napp_file: app.py\npinned: false\nlicense: apache-2.0\npreload_from_hub:\n    - hpcai-tech/OpenSora-STDiT-v3\n    - hpcai-tech/OpenSora-VAE-v1.2\n    - DeepFloyd/t5-v1_1-xxl\n---\n\n\n# 🕹 Gradio Demo\n\nWe have provided a Gradio demo app for you to generate videos via a web interface. You can choose to run it locally or deploy it to Hugging Face by following the instructions given below.\n\n## 🚀 Run Gradio Locally (Outdated)\n\nWe assume that you have already installed `opensora` based on the instructions given in the [main README](../README.md). Follow the steps below to run this app on your local machine.\n\n1. First of all, you need to install `gradio` and `spaces`.\n\n```bash\npip install gradio spaces\n```\n\n2. Afterwards, you can use the following command to launch the application. Remember to launch the command in the project root directory instead of the `gradio` folder.\n\n```bash\n# start the gradio app\npython gradio/app.py\n\n# run with a different port\npython gradio/app.py --port 8000\n\n# run with acceleration such as flash attention and fused norm\npython gradio/app.py --enable-optimization\n\n# run with a sharable Gradio link\npython gradio/app.py --share\n```\n\n3. You should then be able to access this demo via the link which appears in your terminal.\n\n## 📦 Deploy Gradio to Hugging Face Space (Outdated)\n\nWe have also tested this Gradio app on Hugging Face Spaces. You can follow the steps below.\n\n1. Create a Space on Hugging Face, remember to choose `Gradio SDK` and GPU space hardware.\n\n2. Clone the Space repository in your local machine.\n\n3. Copy the `configs` folder and `gradio/app.py` and `gradio/requirements.txt` to the repository you just cloned. The file structure will look like:\n\n```text\n- configs\n    - ...\n- app.py\n- requirements.txt\n- README.md\n- LICENSE\n- ...\n```\n\n4. 
Push the files to your remote Hugging Face Spaces repository. The application will be built and run automatically.\n\n## Advanced Usage\n\n![Gradio Demo](../assets/readme/gradio_advanced.png)\n\nThe \"**FPS**\" option does not affect the output video's length, because the output video is always saved at 24 FPS. With a smaller FPS, the generated content is meant to span a longer duration but is played back accelerated at 24 FPS, so the video looks faster but less smooth. With a larger FPS, the video looks smoother but slower.\n\nThe \"**Number of Loops**\" option affects the output video's length and the generation time. For example, if you set the number of loops to 2, the output video will be roughly twice as long as a single generation. This is achieved by conditioning each new generation on 1/4 of the previously generated frames and then concatenating all the frames together.\n\nTo give different text prompts to different parts of the video, separate the prompts with the `|x|` symbol, where x is the index of the generation (loop) at which the next text prompt starts. This format requires a `|0|` at the start of the prompt. For example, to generate a video with the text prompt \"A cat\" for the first 2 generations and \"A dog\" for the remaining generations, use the text prompt \"|0|A cat|2|A dog\". You can still check \"**Enhance prompt with GPT4o**\" to refine the prompt for each part separately.\n"
  },
  {
    "path": "Open-Sora/gradio/app.py",
    "content": "#!/usr/bin/env python\n\"\"\"\nThis script runs a Gradio App for the Open-Sora model.\n\nUsage:\n    python demo.py <config-path>\n\"\"\"\n\nimport argparse\nimport datetime\nimport importlib\nimport os\nimport subprocess\nimport sys\nfrom tempfile import NamedTemporaryFile\n\nimport spaces\nimport torch\n\nimport gradio as gr\n\nMODEL_TYPES = [\"v1.2-stage3\"]\nWATERMARK_PATH = \"./assets/images/watermark/watermark.png\"\nCONFIG_MAP = {\n    \"v1.2-stage3\": \"configs/opensora-v1-2/inference/sample.py\",\n}\nHF_STDIT_MAP = {\"v1.2-stage3\": \"hpcai-tech/OpenSora-STDiT-v3\"}\n\n\n# ============================\n# Prepare Runtime Environment\n# ============================\ndef install_dependencies(enable_optimization=False):\n    \"\"\"\n    Install the required dependencies for the demo if they are not already installed.\n    \"\"\"\n\n    def _is_package_available(name) -> bool:\n        try:\n            importlib.import_module(name)\n            return True\n        except (ImportError, ModuleNotFoundError):\n            return False\n\n    if enable_optimization:\n        # install flash attention\n        if not _is_package_available(\"flash_attn\"):\n            subprocess.run(\n                f\"{sys.executable} -m pip install flash-attn --no-build-isolation\",\n                env={\"FLASH_ATTENTION_SKIP_CUDA_BUILD\": \"TRUE\"},\n                shell=True,\n            )\n\n        # install apex for fused layernorm\n        if not _is_package_available(\"apex\"):\n            subprocess.run(\n                f'{sys.executable} -m pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext\" --config-settings \"--build-option=--cuda_ext\" git+https://github.com/NVIDIA/apex.git',\n                shell=True,\n            )\n\n        # install ninja\n        if not _is_package_available(\"ninja\"):\n            subprocess.run(f\"{sys.executable} -m pip install ninja\", 
shell=True)\n\n        # install xformers\n        if not _is_package_available(\"xformers\"):\n            subprocess.run(\n                f\"{sys.executable} -m pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers\",\n                shell=True,\n            )\n\n\n# ============================\n# Model-related\n# ============================\ndef read_config(config_path):\n    \"\"\"\n    Read the configuration file.\n    \"\"\"\n    from mmengine.config import Config\n\n    return Config.fromfile(config_path)\n\n\ndef build_models(model_type, config, enable_optimization=False):\n    \"\"\"\n    Build the models for the given model type and configuration.\n    \"\"\"\n    # build vae\n    from opensora.registry import MODELS, build_module\n\n    vae = build_module(config.vae, MODELS).cuda()\n\n    # build text encoder\n    text_encoder = build_module(config.text_encoder, MODELS)  # T5 must be fp32\n    text_encoder.t5.model = text_encoder.t5.model.cuda()\n\n    # build stdit\n    # we load model from HuggingFace directly so that we don't need to\n    # handle model download logic in HuggingFace Space\n    from opensora.models.stdit.stdit3 import STDiT3\n\n    model_kwargs = {k: v for k, v in config.model.items() if k not in (\"type\", \"from_pretrained\", \"force_huggingface\")}\n    stdit = STDiT3.from_pretrained(HF_STDIT_MAP[model_type], **model_kwargs)\n    stdit = stdit.cuda()\n\n    # build scheduler\n    from opensora.registry import SCHEDULERS\n\n    scheduler = build_module(config.scheduler, SCHEDULERS)\n\n    # hack for classifier-free guidance\n    text_encoder.y_embedder = stdit.y_embedder\n\n    # cast models to bfloat16 and switch to eval mode\n    vae = vae.to(torch.bfloat16).eval()\n    text_encoder.t5.model = text_encoder.t5.model.eval()  # t5 must be in fp32\n    stdit = stdit.to(torch.bfloat16).eval()\n\n    # clear the cuda cache\n    torch.cuda.empty_cache()\n    return vae, text_encoder, stdit, scheduler\n\n\ndef parse_args():\n    parser 
= argparse.ArgumentParser()\n    parser.add_argument(\n        \"--model-type\",\n        default=\"v1.2-stage3\",\n        choices=MODEL_TYPES,\n        help=f\"The type of model to run for the Gradio App, can only be {MODEL_TYPES}\",\n    )\n    parser.add_argument(\"--output\", default=\"./outputs\", type=str, help=\"The path to the output folder\")\n    parser.add_argument(\"--port\", default=None, type=int, help=\"The port to run the Gradio App on.\")\n    parser.add_argument(\"--host\", default=\"0.0.0.0\", type=str, help=\"The host to run the Gradio App on.\")\n    parser.add_argument(\"--share\", action=\"store_true\", help=\"Whether to share this gradio demo.\")\n    parser.add_argument(\n        \"--enable-optimization\",\n        action=\"store_true\",\n        help=\"Whether to enable optimization such as flash attention and fused layernorm\",\n    )\n    return parser.parse_args()\n\n\n# ============================\n# Main Gradio Script\n# ============================\n# as `run_inference` needs to be wrapped by `spaces.GPU` and the input can only be the prompt text\n# so we can't pass the models to `run_inference` as arguments.\n# instead, we need to define them globally so that we can access these models inside `run_inference`\n\n# read config\nargs = parse_args()\nconfig = read_config(CONFIG_MAP[args.model_type])\ntorch.backends.cuda.matmul.allow_tf32 = True\ntorch.backends.cudnn.allow_tf32 = True\n\n# make outputs dir\nos.makedirs(args.output, exist_ok=True)\n\n# disable torch jit as it can cause failure in gradio SDK\n# gradio sdk uses torch with cuda 11.3\ntorch.jit._state.disable()\n\n# set up\ninstall_dependencies(enable_optimization=args.enable_optimization)\n\n# import after installation\nfrom opensora.datasets import IMG_FPS, save_sample\nfrom opensora.datasets.aspect import get_image_size, get_num_frames\nfrom opensora.models.text_encoder.t5 import text_preprocessing\nfrom opensora.utils.inference_utils import (\n    add_watermark,\n    
append_generated,\n    append_score_to_prompts,\n    apply_mask_strategy,\n    collect_references_batch,\n    dframe_to_frame,\n    extract_json_from_prompts,\n    extract_prompts_loop,\n    get_random_prompt_by_openai,\n    has_openai_key,\n    merge_prompt,\n    prepare_multi_resolution_info,\n    refine_prompts_by_openai,\n    split_prompt,\n)\nfrom opensora.utils.misc import to_torch_dtype\n\n# some global variables\ndtype = to_torch_dtype(config.dtype)\ndevice = torch.device(\"cuda\")\n\n# build model\nvae, text_encoder, stdit, scheduler = build_models(\n    args.model_type, config, enable_optimization=args.enable_optimization\n)\n\n\ndef run_inference(\n    mode,\n    prompt_text,\n    resolution,\n    aspect_ratio,\n    length,\n    motion_strength,\n    aesthetic_score,\n    use_motion_strength,\n    use_aesthetic_score,\n    camera_motion,\n    reference_image,\n    refine_prompt,\n    fps,\n    num_loop,\n    seed,\n    sampling_steps,\n    cfg_scale,\n):\n    if prompt_text is None or prompt_text == \"\":\n        gr.Warning(\"Your prompt is empty, please enter a valid prompt\")\n        return None\n\n    torch.manual_seed(seed)\n    with torch.inference_mode():\n        # ======================\n        # 1. 
Prepare arguments\n        # ======================\n        # parse the inputs\n        # frame_interval must be 1, so we ignore it here\n        image_size = get_image_size(resolution, aspect_ratio)\n\n        # compute generation parameters\n        if mode == \"Text2Image\":\n            num_frames = 1\n            fps = IMG_FPS\n        else:\n            # the number of frames is derived from the requested video length\n            num_frames = get_num_frames(length)\n\n        condition_frame_length = int(num_frames / 17 * 5 / 3)\n        condition_frame_edit = 0.0\n\n        input_size = (num_frames, *image_size)\n        latent_size = vae.get_latent_size(input_size)\n        multi_resolution = \"OpenSora\"\n        align = 5\n\n        # == prepare mask strategy ==\n        if mode == \"Text2Image\":\n            mask_strategy = [None]\n        elif mode == \"Text2Video\":\n            if reference_image is not None:\n                mask_strategy = [\"0\"]\n            else:\n                mask_strategy = [None]\n        else:\n            raise ValueError(f\"Invalid mode: {mode}\")\n\n        # == prepare reference ==\n        if mode == \"Text2Image\":\n            refs = [\"\"]\n        elif mode == \"Text2Video\":\n            if reference_image is not None:\n                # save image to disk\n                from PIL import Image\n\n                im = Image.fromarray(reference_image)\n                temp_file = NamedTemporaryFile(suffix=\".png\")\n                im.save(temp_file.name)\n                refs = [temp_file.name]\n            else:\n                refs = [\"\"]\n        else:\n            raise ValueError(f\"Invalid mode: {mode}\")\n\n        # == get json from prompts ==\n        batch_prompts = [prompt_text]\n        batch_prompts, refs, mask_strategy = extract_json_from_prompts(batch_prompts, refs, mask_strategy)\n\n        # == get reference for condition ==\n        refs = collect_references_batch(refs, vae, image_size)\n\n        # == multi-resolution 
info ==\n        model_args = prepare_multi_resolution_info(\n            multi_resolution, len(batch_prompts), image_size, num_frames, fps, device, dtype\n        )\n\n        # == process prompts step by step ==\n        # 0. split prompt\n        # each element in the list is [prompt_segment_list, loop_idx_list]\n        batched_prompt_segment_list = []\n        batched_loop_idx_list = []\n        for prompt in batch_prompts:\n            prompt_segment_list, loop_idx_list = split_prompt(prompt)\n            batched_prompt_segment_list.append(prompt_segment_list)\n            batched_loop_idx_list.append(loop_idx_list)\n\n        # 1. refine prompt by openai\n        if refine_prompt:\n            # check if openai key is provided\n            if not has_openai_key():\n                gr.Warning(\"OpenAI API key is not provided, the prompt will not be enhanced.\")\n            else:\n                for idx, prompt_segment_list in enumerate(batched_prompt_segment_list):\n                    batched_prompt_segment_list[idx] = refine_prompts_by_openai(prompt_segment_list)\n\n        # process scores\n        aesthetic_score = aesthetic_score if use_aesthetic_score else None\n        motion_strength = motion_strength if use_motion_strength and mode != \"Text2Image\" else None\n        camera_motion = None if camera_motion == \"none\" or mode == \"Text2Image\" else camera_motion\n        # 2. append score\n        for idx, prompt_segment_list in enumerate(batched_prompt_segment_list):\n            batched_prompt_segment_list[idx] = append_score_to_prompts(\n                prompt_segment_list,\n                aes=aesthetic_score,\n                flow=motion_strength,\n                camera_motion=camera_motion,\n            )\n\n        # 3. 
clean prompt with T5\n        for idx, prompt_segment_list in enumerate(batched_prompt_segment_list):\n            batched_prompt_segment_list[idx] = [text_preprocessing(prompt) for prompt in prompt_segment_list]\n\n        # 4. merge to obtain the final prompt\n        batch_prompts = []\n        for prompt_segment_list, loop_idx_list in zip(batched_prompt_segment_list, batched_loop_idx_list):\n            batch_prompts.append(merge_prompt(prompt_segment_list, loop_idx_list))\n\n        # =========================\n        # Generate image/video\n        # =========================\n        video_clips = []\n\n        for loop_i in range(num_loop):\n            # 4.4 sample in hidden space\n            batch_prompts_loop = extract_prompts_loop(batch_prompts, loop_i)\n\n            # == loop ==\n            if loop_i > 0:\n                refs, mask_strategy = append_generated(\n                    vae, video_clips[-1], refs, mask_strategy, loop_i, condition_frame_length, condition_frame_edit\n                )\n\n            # == sampling ==\n            z = torch.randn(len(batch_prompts), vae.out_channels, *latent_size, device=device, dtype=dtype)\n            masks = apply_mask_strategy(z, refs, mask_strategy, loop_i, align=align)\n\n            # 4.6. 
diffusion sampling\n            # hack to update num_sampling_steps and cfg_scale\n            scheduler_kwargs = config.scheduler.copy()\n            scheduler_kwargs.pop(\"type\")\n            scheduler_kwargs[\"num_sampling_steps\"] = sampling_steps\n            scheduler_kwargs[\"cfg_scale\"] = cfg_scale\n\n            scheduler.__init__(**scheduler_kwargs)\n            samples = scheduler.sample(\n                stdit,\n                text_encoder,\n                z=z,\n                prompts=batch_prompts_loop,\n                device=device,\n                additional_args=model_args,\n                progress=True,\n                mask=masks,\n            )\n            samples = vae.decode(samples.to(dtype), num_frames=num_frames)\n            video_clips.append(samples)\n\n        # =========================\n        # Save output\n        # =========================\n        video_clips = [val[0] for val in video_clips]\n        for i in range(1, num_loop):\n            video_clips[i] = video_clips[i][:, dframe_to_frame(condition_frame_length) :]\n        video = torch.cat(video_clips, dim=1)\n        current_datetime = datetime.datetime.now()\n        timestamp = current_datetime.timestamp()\n        save_path = os.path.join(args.output, f\"output_{timestamp}\")\n        saved_path = save_sample(video, save_path=save_path, fps=24)\n        torch.cuda.empty_cache()\n\n        # add watermark\n        # all watermarked videos should have a _watermarked suffix\n        if mode != \"Text2Image\" and os.path.exists(WATERMARK_PATH):\n            watermarked_path = saved_path.replace(\".mp4\", \"_watermarked.mp4\")\n            success = add_watermark(saved_path, WATERMARK_PATH, watermarked_path)\n            if success:\n                return watermarked_path\n            else:\n                return saved_path\n        else:\n            return saved_path\n\n\n@spaces.GPU(duration=200)\ndef run_image_inference(\n    prompt_text,\n    resolution,\n    
aspect_ratio,\n    length,\n    motion_strength,\n    aesthetic_score,\n    use_motion_strength,\n    use_aesthetic_score,\n    camera_motion,\n    reference_image,\n    refine_prompt,\n    fps,\n    num_loop,\n    seed,\n    sampling_steps,\n    cfg_scale,\n):\n    return run_inference(\n        \"Text2Image\",\n        prompt_text,\n        resolution,\n        aspect_ratio,\n        length,\n        motion_strength,\n        aesthetic_score,\n        use_motion_strength,\n        use_aesthetic_score,\n        camera_motion,\n        reference_image,\n        refine_prompt,\n        fps,\n        num_loop,\n        seed,\n        sampling_steps,\n        cfg_scale,\n    )\n\n\n@spaces.GPU(duration=200)\ndef run_video_inference(\n    prompt_text,\n    resolution,\n    aspect_ratio,\n    length,\n    motion_strength,\n    aesthetic_score,\n    use_motion_strength,\n    use_aesthetic_score,\n    camera_motion,\n    reference_image,\n    refine_prompt,\n    fps,\n    num_loop,\n    seed,\n    sampling_steps,\n    cfg_scale,\n):\n    # if (resolution == \"480p\" and length == \"16s\") or \\\n    #     (resolution == \"720p\" and length in [\"8s\", \"16s\"]):\n    #     gr.Warning(\"Generation is interrupted as the combination of 480p and 16s will lead to CUDA out of memory\")\n    # else:\n    return run_inference(\n        \"Text2Video\",\n        prompt_text,\n        resolution,\n        aspect_ratio,\n        length,\n        motion_strength,\n        aesthetic_score,\n        use_motion_strength,\n        use_aesthetic_score,\n        camera_motion,\n        reference_image,\n        refine_prompt,\n        fps,\n        num_loop,\n        seed,\n        sampling_steps,\n        cfg_scale,\n    )\n\n\ndef generate_random_prompt():\n    if \"OPENAI_API_KEY\" not in os.environ:\n        gr.Warning(\"Your prompt is empty and the OpenAI API key is not provided, please enter a valid prompt\")\n        return None\n    else:\n        prompt_text = 
get_random_prompt_by_openai()\n        return prompt_text\n\n\ndef main():\n    # create demo\n    with gr.Blocks() as demo:\n        with gr.Row():\n            with gr.Column():\n                gr.HTML(\n                    \"\"\"\n                <div style='text-align: center;'>\n                    <p align=\"center\">\n                        <img src=\"https://github.com/hpcaitech/Open-Sora/raw/main/assets/readme/icon.png\" width=\"250\"/>\n                    </p>\n                    <div style=\"display: flex; gap: 10px; justify-content: center;\">\n                        <a href=\"https://github.com/hpcaitech/Open-Sora/stargazers\"><img src=\"https://img.shields.io/github/stars/hpcaitech/Open-Sora?style=social\"></a>\n                        <a href=\"https://hpcaitech.github.io/Open-Sora/\"><img src=\"https://img.shields.io/badge/Gallery-View-orange?logo=&amp\"></a>\n                        <a href=\"https://discord.gg/kZakZzrSUT\"><img src=\"https://img.shields.io/badge/Discord-join-blueviolet?logo=discord&amp\"></a>\n                        <a href=\"https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-247ipg9fk-KRRYmUl~u2ll2637WRURVA\"><img src=\"https://img.shields.io/badge/Slack-ColossalAI-blueviolet?logo=slack&amp\"></a>\n                        <a href=\"https://twitter.com/yangyou1991/status/1769411544083996787?s=61&t=jT0Dsx2d-MS5vS9rNM5e5g\"><img src=\"https://img.shields.io/badge/Twitter-Discuss-blue?logo=twitter&amp\"></a>\n                        <a href=\"https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png\"><img src=\"https://img.shields.io/badge/微信-小助手加群-green?logo=wechat&amp\"></a>\n                        <a href=\"https://hpc-ai.com/blog/open-sora-v1.0\"><img src=\"https://img.shields.io/badge/Open_Sora-Blog-blue\"></a>\n                    </div>\n                    <h1 style='margin-top: 5px;'>Open-Sora: Democratizing Efficient Video Production for All</h1>\n                </div>\n    
            \"\"\"\n                )\n\n        with gr.Row():\n            with gr.Column():\n                prompt_text = gr.Textbox(label=\"Prompt\", placeholder=\"Describe your video here\", lines=4)\n                refine_prompt = gr.Checkbox(\n                    value=has_openai_key(), label=\"Refine prompt with GPT4o\", interactive=has_openai_key()\n                )\n                random_prompt_btn = gr.Button(\"Random Prompt By GPT4o\", interactive=has_openai_key())\n\n                gr.Markdown(\"## Basic Settings\")\n                resolution = gr.Radio(\n                    choices=[\"144p\", \"240p\", \"360p\", \"480p\", \"720p\"],\n                    value=\"480p\",\n                    label=\"Resolution\",\n                )\n                aspect_ratio = gr.Radio(\n                    choices=[\"9:16\", \"16:9\", \"3:4\", \"4:3\", \"1:1\"],\n                    value=\"9:16\",\n                    label=\"Aspect Ratio (H:W)\",\n                )\n                length = gr.Radio(\n                    choices=[\"2s\", \"4s\", \"8s\", \"16s\"],\n                    value=\"2s\",\n                    label=\"Video Length\",\n                    info=\"only effective for video generation, 8s may fail as Hugging Face ZeroGPU has the limitation of max 200 seconds inference time.\",\n                )\n\n                with gr.Row():\n                    seed = gr.Slider(value=1024, minimum=1, maximum=2048, step=1, label=\"Seed\")\n\n                    sampling_steps = gr.Slider(value=30, minimum=1, maximum=200, step=1, label=\"Sampling steps\")\n                    cfg_scale = gr.Slider(value=7.0, minimum=0.0, maximum=10.0, step=0.1, label=\"CFG Scale\")\n\n                with gr.Row():\n                    with gr.Column():\n                        motion_strength = gr.Slider(\n                            value=5,\n                            minimum=0,\n                            maximum=100,\n                            step=1,\n        
                    label=\"Motion Strength\",\n                            info=\"only effective for video generation\",\n                        )\n                        use_motion_strength = gr.Checkbox(value=False, label=\"Enable\")\n\n                    with gr.Column():\n                        aesthetic_score = gr.Slider(\n                            value=6.5,\n                            minimum=4,\n                            maximum=7,\n                            step=0.1,\n                            label=\"Aesthetic\",\n                            info=\"effective for text & video generation\",\n                        )\n                        use_aesthetic_score = gr.Checkbox(value=True, label=\"Enable\")\n\n                camera_motion = gr.Radio(\n                    value=\"none\",\n                    label=\"Camera Motion\",\n                    choices=[\"none\", \"pan right\", \"pan left\", \"tilt up\", \"tilt down\", \"zoom in\", \"zoom out\", \"static\"],\n                    interactive=True,\n                )\n\n                gr.Markdown(\"## Advanced Settings\")\n                with gr.Row():\n                    fps = gr.Slider(\n                        value=24,\n                        minimum=1,\n                        maximum=60,\n                        step=1,\n                        label=\"FPS\",\n                        info=\"Frames per second for video generation; keep it at 24 if you are not sure\",\n                    )\n                    num_loop = gr.Slider(\n                        value=1,\n                        minimum=1,\n                        maximum=20,\n                        step=1,\n                        label=\"Number of Loops\",\n                        info=\"This changes the length of the generated video; keep it at 1 if you are not sure\",\n                    )\n\n                gr.Markdown(\"## Reference Image\")\n                reference_image = 
gr.Image(label=\"Image (optional)\", show_download_button=True)\n\n            with gr.Column():\n                output_video = gr.Video(label=\"Output Video\", height=\"100%\")\n\n        with gr.Row():\n            image_gen_button = gr.Button(\"Generate image\")\n            video_gen_button = gr.Button(\"Generate video\")\n\n        image_gen_button.click(\n            fn=run_image_inference,\n            inputs=[\n                prompt_text,\n                resolution,\n                aspect_ratio,\n                length,\n                motion_strength,\n                aesthetic_score,\n                use_motion_strength,\n                use_aesthetic_score,\n                camera_motion,\n                reference_image,\n                refine_prompt,\n                fps,\n                num_loop,\n                seed,\n                sampling_steps,\n                cfg_scale,\n            ],\n            outputs=reference_image,\n        )\n        video_gen_button.click(\n            fn=run_video_inference,\n            inputs=[\n                prompt_text,\n                resolution,\n                aspect_ratio,\n                length,\n                motion_strength,\n                aesthetic_score,\n                use_motion_strength,\n                use_aesthetic_score,\n                camera_motion,\n                reference_image,\n                refine_prompt,\n                fps,\n                num_loop,\n                seed,\n                sampling_steps,\n                cfg_scale,\n            ],\n            outputs=output_video,\n        )\n        random_prompt_btn.click(fn=generate_random_prompt, outputs=prompt_text)\n\n    # launch\n    demo.queue(max_size=5, default_concurrency_limit=1)\n    demo.launch(server_port=args.port, server_name=args.host, share=args.share, max_threads=1)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/gradio/requirements.txt",
    "content": "xformers\ntransformers\ngit+https://github.com/hpcaitech/Open-Sora.git\n"
  },
  {
    "path": "Open-Sora/notebooks/inference.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Inference for OpenSora\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Define global variables. You should change the following variables according to your setting.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# global variables\\n\",\n    \"ROOT = \\\"..\\\"\\n\",\n    \"cfg_path = f\\\"{ROOT}/configs/opensora-v1-2/inference/sample.py\\\"\\n\",\n    \"ckpt_path = \\\"/home/lishenggui/projects/sora/Open-Sora-dev/outputs/207-STDiT3-XL-2/epoch0-global_step9000/\\\"\\n\",\n    \"vae_path = f\\\"{ROOT}/pretrained_models/vae-pipeline\\\"\\n\",\n    \"save_dir = f\\\"{ROOT}/samples/samples_notebook/\\\"\\n\",\n    \"device = \\\"cuda:0\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Import necessary libraries and load the models.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"from pprint import pformat\\n\",\n    \"\\n\",\n    \"import colossalai\\n\",\n    \"import torch\\n\",\n    \"import torch.distributed as dist\\n\",\n    \"from colossalai.cluster import DistCoordinator\\n\",\n    \"from mmengine.runner import set_random_seed\\n\",\n    \"from tqdm.notebook import tqdm\\n\",\n    \"\\n\",\n    \"from opensora.acceleration.parallel_states import set_sequence_parallel_group\\n\",\n    \"from opensora.datasets import save_sample, is_img\\n\",\n    \"from opensora.datasets.aspect import get_image_size, get_num_frames\\n\",\n    \"from opensora.models.text_encoder.t5 import text_preprocessing\\n\",\n    \"from opensora.registry import MODELS, SCHEDULERS, build_module\\n\",\n    \"from opensora.utils.config_utils import 
read_config\\n\",\n    \"from opensora.utils.inference_utils import (\\n\",\n    \"    append_generated,\\n\",\n    \"    apply_mask_strategy,\\n\",\n    \"    collect_references_batch,\\n\",\n    \"    extract_json_from_prompts,\\n\",\n    \"    extract_prompts_loop,\\n\",\n    \"    get_save_path_name,\\n\",\n    \"    load_prompts,\\n\",\n    \"    prepare_multi_resolution_info,\\n\",\n    \")\\n\",\n    \"from opensora.utils.misc import all_exists, create_logger, is_distributed, is_main_process, to_torch_dtype\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"torch.set_grad_enabled(False)\\n\",\n    \"\\n\",\n    \"# == parse configs ==\\n\",\n    \"cfg = read_config(cfg_path)\\n\",\n    \"cfg.model.from_pretrained = ckpt_path\\n\",\n    \"cfg.vae.from_pretrained = vae_path\\n\",\n    \"\\n\",\n    \"# == device and dtype ==\\n\",\n    \"cfg_dtype = cfg.get(\\\"dtype\\\", \\\"fp32\\\")\\n\",\n    \"assert cfg_dtype in [\\\"fp16\\\", \\\"bf16\\\", \\\"fp32\\\"], f\\\"Unknown mixed precision {cfg_dtype}\\\"\\n\",\n    \"dtype = to_torch_dtype(cfg.get(\\\"dtype\\\", \\\"bf16\\\"))\\n\",\n    \"torch.backends.cuda.matmul.allow_tf32 = True\\n\",\n    \"torch.backends.cudnn.allow_tf32 = True\\n\",\n    \"\\n\",\n    \"set_random_seed(seed=cfg.get(\\\"seed\\\", 1024))\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# == build text-encoder and vae ==\\n\",\n    \"text_encoder = build_module(cfg.text_encoder, MODELS, device=device)\\n\",\n    \"vae = build_module(cfg.vae, MODELS).to(device, dtype).eval()\\n\",\n    \"\\n\",\n    \"# == build diffusion model ==\\n\",\n    \"input_size = (None, None, None)\\n\",\n    \"latent_size = vae.get_latent_size(input_size)\\n\",\n    \"model = (\\n\",\n    \"    build_module(\\n\",\n    \"        cfg.model,\\n\",\n    \"        
MODELS,\\n\",\n    \"        input_size=latent_size,\\n\",\n    \"        in_channels=vae.out_channels,\\n\",\n    \"        caption_channels=text_encoder.output_dim,\\n\",\n    \"        model_max_length=text_encoder.model_max_length,\\n\",\n    \"    )\\n\",\n    \"    .to(device, dtype)\\n\",\n    \"    .eval()\\n\",\n    \")\\n\",\n    \"text_encoder.y_embedder = model.y_embedder  # HACK: for classifier-free guidance\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Define inference function.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"start_idx = 0\\n\",\n    \"multi_resolution = cfg.get(\\\"multi_resolution\\\", None)\\n\",\n    \"batch_size = cfg.get(\\\"batch_size\\\", 1)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def inference(\\n\",\n    \"    prompts=cfg.get(\\\"prompt\\\", None),\\n\",\n    \"    image_size=None,\\n\",\n    \"    num_frames=None,\\n\",\n    \"    resolution=None,\\n\",\n    \"    aspect_ratio=None,\\n\",\n    \"    mask_strategy=None,\\n\",\n    \"    reference_path=None,\\n\",\n    \"    num_sampling_steps=None,\\n\",\n    \"    cfg_scale=None,\\n\",\n    \"    seed=None,\\n\",\n    \"    fps=cfg.fps,\\n\",\n    \"    num_sample=cfg.get(\\\"num_sample\\\", 1),\\n\",\n    \"    loop=cfg.get(\\\"loop\\\", 1),\\n\",\n    \"    condition_frame_length=cfg.get(\\\"condition_frame_length\\\", 5),\\n\",\n    \"    align=cfg.get(\\\"align\\\", None),\\n\",\n    \"    sample_name=cfg.get(\\\"sample_name\\\", None),\\n\",\n    \"    prompt_as_path=cfg.get(\\\"prompt_as_path\\\", False),\\n\",\n    \"    disable_progress=False,\\n\",\n    \"):\\n\",\n    \"    global start_idx\\n\",\n    \"    os.makedirs(save_dir, exist_ok=True)\\n\",\n    \"    if seed is not None:\\n\",\n    \"        set_random_seed(seed=seed)\\n\",\n    \"    if not isinstance(prompts, list):\\n\",\n    \"        prompts = 
[prompts]\\n\",\n    \"    if mask_strategy is None:\\n\",\n    \"        mask_strategy = [\\\"\\\"] * len(prompts)\\n\",\n    \"    if reference_path is None:\\n\",\n    \"        reference_path = [\\\"\\\"] * len(prompts)\\n\",\n    \"    save_fps = cfg.fps // cfg.get(\\\"frame_interval\\\", 1)\\n\",\n    \"    if num_sampling_steps is not None:\\n\",\n    \"        cfg.scheduler[\\\"num_sampling_steps\\\"] = num_sampling_steps\\n\",\n    \"    if cfg_scale is not None:\\n\",\n    \"        cfg.scheduler[\\\"cfg_scale\\\"] = cfg_scale\\n\",\n    \"    scheduler = build_module(cfg.scheduler, SCHEDULERS)\\n\",\n    \"    ret_path = []\\n\",\n    \"\\n\",\n    \"    # == prepare video size ==\\n\",\n    \"    if image_size is None:\\n\",\n    \"        assert (\\n\",\n    \"            resolution is not None and aspect_ratio is not None\\n\",\n    \"        ), \\\"resolution and aspect_ratio must be provided if image_size is not provided\\\"\\n\",\n    \"        image_size = get_image_size(resolution, aspect_ratio)\\n\",\n    \"    num_frames = get_num_frames(num_frames)\\n\",\n    \"    input_size = (num_frames, *image_size)\\n\",\n    \"    latent_size = vae.get_latent_size(input_size)\\n\",\n    \"\\n\",\n    \"    # == Iter over all samples ==\\n\",\n    \"    for i in tqdm(range(0, len(prompts), batch_size), disable=disable_progress):\\n\",\n    \"        # == prepare batch prompts ==\\n\",\n    \"        batch_prompts = prompts[i : i + batch_size]\\n\",\n    \"        ms = mask_strategy[i : i + batch_size]\\n\",\n    \"        refs = reference_path[i : i + batch_size]\\n\",\n    \"\\n\",\n    \"        batch_prompts, refs, ms = extract_json_from_prompts(batch_prompts, refs, ms)\\n\",\n    \"        refs = collect_references_batch(refs, vae, image_size)\\n\",\n    \"\\n\",\n    \"        # == multi-resolution info ==\\n\",\n    \"        model_args = prepare_multi_resolution_info(\\n\",\n    \"            multi_resolution, len(batch_prompts), image_size, 
num_frames, fps, device, dtype\\n\",\n    \"        )\\n\",\n    \"\\n\",\n    \"        # == Iter over number of sampling for one prompt ==\\n\",\n    \"        for k in range(num_sample):\\n\",\n    \"            # == prepare save paths ==\\n\",\n    \"            save_paths = [\\n\",\n    \"                get_save_path_name(\\n\",\n    \"                    save_dir,\\n\",\n    \"                    sample_name=sample_name,\\n\",\n    \"                    sample_idx=start_idx + idx,\\n\",\n    \"                    prompt=batch_prompts[idx],\\n\",\n    \"                    prompt_as_path=prompt_as_path,\\n\",\n    \"                    num_sample=num_sample,\\n\",\n    \"                    k=k,\\n\",\n    \"                )\\n\",\n    \"                for idx in range(len(batch_prompts))\\n\",\n    \"            ]\\n\",\n    \"\\n\",\n    \"            # NOTE: Skip if the sample already exists\\n\",\n    \"            # This is useful for resuming sampling VBench\\n\",\n    \"            if prompt_as_path and all_exists(save_paths):\\n\",\n    \"                continue\\n\",\n    \"\\n\",\n    \"            # == Iter over loop generation ==\\n\",\n    \"            video_clips = []\\n\",\n    \"            for loop_i in range(loop):\\n\",\n    \"                batch_prompts_loop = extract_prompts_loop(batch_prompts, loop_i)\\n\",\n    \"                batch_prompts_cleaned = [text_preprocessing(prompt) for prompt in batch_prompts_loop]\\n\",\n    \"\\n\",\n    \"                # == loop ==\\n\",\n    \"                if loop_i > 0:\\n\",\n    \"                    refs, ms = append_generated(vae, video_clips[-1], refs, ms, loop_i, condition_frame_length)\\n\",\n    \"\\n\",\n    \"                # == sampling ==\\n\",\n    \"                z = torch.randn(len(batch_prompts), vae.out_channels, *latent_size, device=device, dtype=dtype)\\n\",\n    \"                masks = apply_mask_strategy(z, refs, ms, loop_i, align=align)\\n\",\n    \"              
  samples = scheduler.sample(\\n\",\n    \"                    model,\\n\",\n    \"                    text_encoder,\\n\",\n    \"                    z=z,\\n\",\n    \"                    prompts=batch_prompts_cleaned,\\n\",\n    \"                    device=device,\\n\",\n    \"                    additional_args=model_args,\\n\",\n    \"                    progress=False,\\n\",\n    \"                    mask=masks,\\n\",\n    \"                )\\n\",\n    \"                samples = vae.decode(samples.to(dtype), num_frames=num_frames)\\n\",\n    \"                video_clips.append(samples)\\n\",\n    \"\\n\",\n    \"            # == save samples ==\\n\",\n    \"            if is_main_process():\\n\",\n    \"                for idx, batch_prompt in enumerate(batch_prompts):\\n\",\n    \"                    save_path = save_paths[idx]\\n\",\n    \"                    video = [video_clips[i][idx] for i in range(loop)]\\n\",\n    \"                    for i in range(1, loop):\\n\",\n    \"                        video[i] = video[i][:, condition_frame_length:]\\n\",\n    \"                    video = torch.cat(video, dim=1)\\n\",\n    \"                    path = save_sample(\\n\",\n    \"                        video,\\n\",\n    \"                        fps=save_fps,\\n\",\n    \"                        save_path=save_path,\\n\",\n    \"                        verbose=False,\\n\",\n    \"                    )\\n\",\n    \"                    ret_path.append(path)\\n\",\n    \"        start_idx += len(batch_prompts)\\n\",\n    \"    return ret_path\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from IPython.display import Video, Image, display\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def display_results(paths):\\n\",\n    \"    for path in paths:\\n\",\n    \"        if is_img(path):\\n\",\n    \"            display(Image(path))\\n\",\n    \"        else:\\n\",\n    
\"            display(Video(path, embed=True))\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def reset_start_idx():\\n\",\n    \"    global start_idx\\n\",\n    \"    start_idx = 0\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"ALL_ASPECT_RATIO = [\\\"1:1\\\", \\\"16:9\\\", \\\"9:16\\\", \\\"3:4\\\", \\\"4:3\\\", \\\"1:2\\\", \\\"2:1\\\"]\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def inference_all_aspects(prompts, resolution, num_frames, *args, **kwargs):\\n\",\n    \"    paths = []\\n\",\n    \"    for aspect_ratio in tqdm(ALL_ASPECT_RATIO):\\n\",\n    \"        paths.extend(\\n\",\n    \"            inference(\\n\",\n    \"                prompts,\\n\",\n    \"                resolution=resolution,\\n\",\n    \"                num_frames=num_frames,\\n\",\n    \"                aspect_ratio=aspect_ratio,\\n\",\n    \"                disable_progress=True,\\n\",\n    \"                *args,\\n\",\n    \"                **kwargs\\n\",\n    \"            )\\n\",\n    \"        )\\n\",\n    \"    return paths\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": []\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Inference for OpenSora\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Sample code for inference for OpenSora.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"paths = inference(\\n\",\n    \"    [\\\"a man.\\\", \\\"a woman\\\"],\\n\",\n    \"    resolution=\\\"240p\\\",\\n\",\n    \"    aspect_ratio=\\\"1:1\\\",\\n\",\n    \"    num_frames=\\\"1x\\\",\\n\",\n    \"    num_sampling_steps=30,\\n\",\n    \"    cfg_scale=7.0,\\n\",\n    \")\\n\",\n    \"display_results(paths)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Sample all aspect 
ratios.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"PROMPT = \\\"a boy.\\\"\\n\",\n    \"paths = inference_all_aspects(\\n\",\n    \"    PROMPT,\\n\",\n    \"    resolution=\\\"240p\\\",\\n\",\n    \"    num_frames=\\\"1x\\\",\\n\",\n    \"    num_sampling_steps=30,\\n\",\n    \"    cfg_scale=7.0,\\n\",\n    \")\\n\",\n    \"display_results(paths)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Sample all resolution and length.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"PROMPT = \\\"a boy.\\\"\\n\",\n    \"sample_cfg = {\\n\",\n    \"    \\\"144p\\\": [1, \\\"1x\\\", \\\"2x\\\", \\\"4x\\\", \\\"8x\\\"],\\n\",\n    \"    \\\"240p\\\": [1, \\\"1x\\\", \\\"2x\\\", \\\"4x\\\", \\\"8x\\\"],\\n\",\n    \"    \\\"360p\\\": [1, \\\"1x\\\", \\\"2x\\\", \\\"4x\\\"],\\n\",\n    \"    \\\"480p\\\": [1, \\\"1x\\\", \\\"2x\\\", \\\"4x\\\"],\\n\",\n    \"    \\\"720p\\\": [1, \\\"1x\\\", \\\"2x\\\"],\\n\",\n    \"}\\n\",\n    \"all_paths = []\\n\",\n    \"for resolution, num_frames in sample_cfg.items():\\n\",\n    \"    for num_frame in num_frames:\\n\",\n    \"        print(f\\\"Resolution: {resolution}, Num Frames: {num_frame}\\\")\\n\",\n    \"        paths = inference(\\n\",\n    \"            PROMPT,\\n\",\n    \"            resolution=resolution,\\n\",\n    \"            num_frames=num_frame,\\n\",\n    \"            aspect_ratio=\\\"9:16\\\",\\n\",\n    \"            num_sampling_steps=30,\\n\",\n    \"            cfg_scale=7.0,\\n\",\n    \"            disable_progress=True,\\n\",\n    \"        )\\n\",\n    \"        display_results(paths)\\n\",\n    \"        all_paths.extend(paths)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Sample all resolution, length, and 
aspect ratios.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"PROMPT = \\\"a boy.\\\"\\n\",\n    \"sample_cfg = {\\n\",\n    \"    \\\"144p\\\": [1, \\\"1x\\\", \\\"2x\\\", \\\"4x\\\", \\\"8x\\\"],\\n\",\n    \"    \\\"240p\\\": [1, \\\"1x\\\", \\\"2x\\\", \\\"4x\\\", \\\"8x\\\"],\\n\",\n    \"    \\\"360p\\\": [1, \\\"1x\\\", \\\"2x\\\", \\\"4x\\\"],\\n\",\n    \"    \\\"480p\\\": [1, \\\"1x\\\", \\\"2x\\\", \\\"4x\\\"],\\n\",\n    \"    \\\"720p\\\": [1, \\\"1x\\\", \\\"2x\\\"],\\n\",\n    \"}\\n\",\n    \"all_paths = []\\n\",\n    \"for resolution, num_frames in sample_cfg.items():\\n\",\n    \"    for num_frame in num_frames:\\n\",\n    \"        paths = inference_all_aspects(\\n\",\n    \"            PROMPT,\\n\",\n    \"            resolution=resolution,\\n\",\n    \"            num_frames=num_frame,\\n\",\n    \"            num_sampling_steps=30,\\n\",\n    \"            cfg_scale=7.0,\\n\",\n    \"        )\\n\",\n    \"        display_results(paths)\\n\",\n    \"        all_paths.extend(paths)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"opensora\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.14\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 2\n}\n"
  },
  {
    "path": "Open-Sora/notebooks/launch.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Data Process Pipeline\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Data Process Commands\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"\\n\",\n    \"# TODO: change to your own project path!!!\\n\",\n    \"OPEN_SORA_HOME = \\\"/path/to/Open-Sora/\\\"\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def convert_dataset_cmd(input_dir, output_file, datatype=\\\"video\\\"):\\n\",\n    \"    commands = []\\n\",\n    \"    commands.append(f'echo \\\"Converting {input_dir} to {output_file}\\\"')\\n\",\n    \"    output_dir = os.path.dirname(output_file)\\n\",\n    \"\\n\",\n    \"    commands.append(f\\\"mkdir -p {output_dir}\\\")\\n\",\n    \"    commands.append(f\\\"cd {OPEN_SORA_HOME}\\\")\\n\",\n    \"    commands.append(f\\\"python -m tools.datasets.convert {datatype} {input_dir} --output {output_file}\\\")\\n\",\n    \"    return \\\" && \\\".join(commands), output_file\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_video_info(input_file):\\n\",\n    \"    commands = []\\n\",\n    \"    base, ext = os.path.splitext(input_file)\\n\",\n    \"    output_file = f\\\"{base}_info{ext}\\\"\\n\",\n    \"    output_format = ext[1:]\\n\",\n    \"\\n\",\n    \"    commands.append(f'echo \\\"Getting info of {input_file} to {output_file}\\\"')\\n\",\n    \"    commands.append(f\\\"cd {OPEN_SORA_HOME}\\\")\\n\",\n    \"    commands.append(\\n\",\n    \"        f\\\"python -m tools.datasets.datautil {input_file} --output {output_file} --format {output_format} --info --fmin 1\\\"\\n\",\n    \"    )\\n\",\n    \"    return \\\" && \\\".join(commands), output_file\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_video_info_torchvision(input_file):\\n\",\n    \"    commands = []\\n\",\n    \"    
base, ext = os.path.splitext(input_file)\\n\",\n    \"    output_file = f\\\"{base}_info{ext}\\\"\\n\",\n    \"    output_format = ext[1:]\\n\",\n    \"\\n\",\n    \"    commands.append(f'echo \\\"Getting info of {input_file} to {output_file}\\\"')\\n\",\n    \"    commands.append(f\\\"cd {OPEN_SORA_HOME}\\\")\\n\",\n    \"    commands.append(\\n\",\n    \"        f\\\"python -m tools.datasets.datautil {input_file} --output {output_file} --format {output_format} --video-info --fmin 1\\\"\\n\",\n    \"    )\\n\",\n    \"    return \\\" && \\\".join(commands), output_file\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_caption_llava7b_video(input_file):\\n\",\n    \"    commands = []\\n\",\n    \"    base, ext = os.path.splitext(input_file)\\n\",\n    \"    output_file = f\\\"{base}_caption{ext}\\\"\\n\",\n    \"    output_format = ext[1:]\\n\",\n    \"\\n\",\n    \"    commands.append(f'echo \\\"Getting caption of {input_file} to {output_file}\\\"')\\n\",\n    \"    commands.append(f\\\"cd {OPEN_SORA_HOME}\\\")\\n\",\n    \"    commands.append(f\\\"conda activate llava2\\\")\\n\",\n    \"    commands.append(\\n\",\n    \"        f\\\"torchrun --nproc_per_node 8 --standalone -m tools.caption.caption_llava {input_file} --dp-size 8 --tp-size 1 --model-path liuhaotian/llava-v1.6-mistral-7b --prompt video\\\"\\n\",\n    \"    )\\n\",\n    \"    commands.append(f\\\"conda activate opensora\\\")\\n\",\n    \"    commands.append(\\n\",\n    \"        f\\\"python -m tools.datasets.datautil {base}_caption_part*{ext} --output {output_file} --format {output_format} --intersection {input_file} --clean-caption --refine-llm-caption --remove-empty-caption\\\"\\n\",\n    \"    )\\n\",\n    \"    return \\\" && \\\".join(commands), output_file\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_caption_load(input_file):\\n\",\n    \"    commands = []\\n\",\n    \"    base, ext = os.path.splitext(input_file)\\n\",\n    \"    output_file = f\\\"{base}_caption{ext}\\\"\\n\",\n    \"    
output_format = ext[1:]\\n\",\n    \"\\n\",\n    \"    commands.append(f'echo \\\"Getting caption of {input_file} to {output_file}\\\"')\\n\",\n    \"    commands.append(f\\\"cd {OPEN_SORA_HOME}\\\")\\n\",\n    \"    commands.append(\\n\",\n    \"        f\\\"python -m tools.datasets.datautil {input_file} --output {output_file} --format {output_format} --load-caption json --remove-empty-caption --clean-caption\\\"\\n\",\n    \"    )\\n\",\n    \"    return \\\" && \\\".join(commands), output_file\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_aesthetic_score(input_file):\\n\",\n    \"    commands = []\\n\",\n    \"    base, ext = os.path.splitext(input_file)\\n\",\n    \"    output_file = f\\\"{base}_aes{ext}\\\"\\n\",\n    \"    output_format = ext[1:]\\n\",\n    \"\\n\",\n    \"    commands.append(f'echo \\\"Getting aesthetic score of {input_file} to {output_file}\\\"')\\n\",\n    \"    commands.append(f\\\"cd {OPEN_SORA_HOME}\\\")\\n\",\n    \"    commands.append(f\\\"torchrun --standalone --nproc_per_node 8 -m tools.scoring.aesthetic.inference {input_file}\\\")\\n\",\n    \"    commands.append(\\n\",\n    \"        f\\\"python -m tools.datasets.datautil {base}_aes_part*{ext} --output {output_file} --format {output_format} --sort aes\\\"\\n\",\n    \"    )\\n\",\n    \"    return \\\" && \\\".join(commands), output_file\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_flow_score(input_file):\\n\",\n    \"    commands = []\\n\",\n    \"    base, ext = os.path.splitext(input_file)\\n\",\n    \"    output_file = f\\\"{base}_flow{ext}\\\"\\n\",\n    \"\\n\",\n    \"    commands.append(f'echo \\\"Getting flow score of {input_file} to {output_file}\\\"')\\n\",\n    \"    commands.append(f\\\"cd {OPEN_SORA_HOME}\\\")\\n\",\n    \"    commands.append(f\\\"torchrun --standalone --nproc_per_node 8 -m tools.scoring.optical_flow.inference {input_file}\\\")\\n\",\n    \"    return \\\" && \\\".join(commands), output_file\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def 
get_ocr(input_file):\\n\",\n    \"    commands = []\\n\",\n    \"    base, ext = os.path.splitext(input_file)\\n\",\n    \"    output_file = f\\\"{base}_match{ext}\\\"\\n\",\n    \"\\n\",\n    \"    commands.append(f'echo \\\"Getting match score of {input_file} to {output_file}\\\"')\\n\",\n    \"    commands.append(f\\\"cd {OPEN_SORA_HOME}\\\")\\n\",\n    \"    commands.append(f\\\"torchrun --standalone --nproc_per_node 8 -m tools.scoring.ocr.inference {input_file}\\\")\\n\",\n    \"    return \\\" && \\\".join(commands), output_file\\n\",\n    \"\\n\",\n    \"    \\n\",\n    \"def get_match_score(input_file):\\n\",\n    \"    commands = []\\n\",\n    \"    base, ext = os.path.splitext(input_file)\\n\",\n    \"    output_file = f\\\"{base}_match{ext}\\\"\\n\",\n    \"\\n\",\n    \"    commands.append(f'echo \\\"Getting match score of {input_file} to {output_file}\\\"')\\n\",\n    \"    commands.append(f\\\"cd {OPEN_SORA_HOME}\\\")\\n\",\n    \"    commands.append(f\\\"torchrun --standalone --nproc_per_node 8 -m tools.scoring.matching.inference {input_file}\\\")\\n\",\n    \"    return \\\" && \\\".join(commands), output_file\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_cmotion_score(input_file):\\n\",\n    \"    commands = []\\n\",\n    \"    base, ext = os.path.splitext(input_file)\\n\",\n    \"    output_file = f\\\"{base}_cmotion{ext}\\\"\\n\",\n    \"\\n\",\n    \"    commands.append(f'echo \\\"Getting cmotion score of {input_file} to {output_file}\\\"')\\n\",\n    \"    commands.append(f\\\"cd {OPEN_SORA_HOME}\\\")\\n\",\n    \"    commands.append(f\\\"python -m tools.caption.camera_motion_detect {input_file}\\\")\\n\",\n    \"    return \\\" && \\\".join(commands), output_file\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_commands(job_list):\\n\",\n    \"    commands = []\\n\",\n    \"    output_file = None\\n\",\n    \"    for job in job_list:\\n\",\n    \"        cmd = job.pop(\\\"cmd\\\")\\n\",\n    \"        if output_file is None:\\n\",\n    \"   
         command, output_file = cmd(**job)\\n\",\n    \"            commands.append(command)\\n\",\n    \"        else:\\n\",\n    \"            job[\\\"input_file\\\"] = output_file\\n\",\n    \"            command, output_file = cmd(**job)\\n\",\n    \"            commands.append(command)\\n\",\n    \"    commands.append(f'echo \\\"All Done!\\\"')\\n\",\n    \"    return \\\" && \\\".join(commands), output_file\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Remote Launch via Paramiko\\n\",\n    \"\\n\",\n    \"First, add hosts to `~/.ssh/config`\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import paramiko\\n\",\n    \"\\n\",\n    \"HOSTS = [\\\"host-0\\\", \\\"host-1\\\", \\\"host-2\\\", \\\"host-3\\\", \\\"host-4\\\", \\\"host-5\\\", \\\"host-6\\\", \\\"host-7\\\"]\\n\",\n    \"\\n\",\n    \"# load from ~/.ssh/config\\n\",\n    \"ssh_config = paramiko.SSHConfig()\\n\",\n    \"user_config_file = os.path.expanduser(\\\"~/.ssh/config\\\")\\n\",\n    \"if os.path.exists(user_config_file):\\n\",\n    \"    with open(user_config_file) as f:\\n\",\n    \"        ssh_config.parse(f)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_ssh_config(hostname):\\n\",\n    \"    # get the configuration for the host\\n\",\n    \"    user_config = ssh_config.lookup(hostname)\\n\",\n    \"    cfg = {\\n\",\n    \"        \\\"hostname\\\": user_config[\\\"hostname\\\"],\\n\",\n    \"        \\\"username\\\": user_config[\\\"user\\\"],\\n\",\n    \"        \\\"port\\\": int(user_config[\\\"port\\\"]),\\n\",\n    \"        \\\"key_filename\\\": user_config[\\\"identityfile\\\"],\\n\",\n    \"    }\\n\",\n    \"    return cfg\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def connect(hostname):\\n\",\n    \"    cfg = get_ssh_config(hostname)\\n\",\n    \"    # connect\\n\",\n    \"    client = paramiko.SSHClient()\\n\",\n    \"    
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())\\n\",\n    \"    client.connect(**cfg)\\n\",\n    \"    return client\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def run_command(command, hostname, nohup=False, log_file=None, sleep=None):\\n\",\n    \"    client = connect(hostname)\\n\",\n    \"    print(\\\"HOST:\\\", hostname)\\n\",\n    \"    if sleep:\\n\",\n    \"        command = f\\\"sleep {sleep}; {command}\\\"\\n\",\n    \"    command = f\\\"bash -ic '{command}'\\\"\\n\",\n    \"    if log_file:\\n\",\n    \"        command = f\\\"{command} >> {log_file} 2>&1\\\"\\n\",\n    \"    if nohup:\\n\",\n    \"        command = f\\\"nohup {command} &\\\"\\n\",\n    \"    print(\\\"COMMAND:\\\", command)\\n\",\n    \"    stdin, stdout, stderr = client.exec_command(command, get_pty=False)\\n\",\n    \"\\n\",\n    \"    stdout_str = stdout.read().decode()\\n\",\n    \"    stderr_str = stderr.read().decode()\\n\",\n    \"    if stdout_str:\\n\",\n    \"        print(\\\"==== STDOUT ====\\\\n\\\", stdout_str)\\n\",\n    \"    if stderr_str:\\n\",\n    \"        print(\\\"==== STDERR ====\\\\n\\\", stderr_str)\\n\",\n    \"\\n\",\n    \"    client.close()\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def run_command_all_hosts(command, hosts=HOSTS):\\n\",\n    \"    for hostname in hosts:\\n\",\n    \"        run_command(command, hostname)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Here are tools to examine the machines' status.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def nvidia_smi(host=None):\\n\",\n    \"    if host:\\n\",\n    \"        run_command(\\\"nvidia-smi\\\", host)\\n\",\n    \"    else:\\n\",\n    \"        run_command_all_hosts(\\\"nvidia-smi\\\")\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def nvitop(host=None):\\n\",\n    \"    if host:\\n\",\n    \"        run_command(f\\\"/home/user/.local/bin/nvitop 
-1\\\", host)\\n\",\n    \"    else:\\n\",\n    \"        run_command_all_hosts(\\\"/home/user/.local/bin/nvitop -1\\\")\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def ps(host=None, interest=\\\"python|sleep|torchrun|colossal\\\", all=True):\\n\",\n    \"    cmd = \\\"ps aux\\\" if all else \\\"ps ux\\\"\\n\",\n    \"    if host:\\n\",\n    \"        if interest is None:\\n\",\n    \"            run_command(f\\\"{cmd} | cat\\\", host)\\n\",\n    \"        else:\\n\",\n    \"            run_command(f'{cmd} | cat | grep --color=never -E \\\"{interest}\\\"', host)\\n\",\n    \"    else:\\n\",\n    \"        if interest is None:\\n\",\n    \"            run_command_all_hosts(f\\\"{cmd} | cat\\\")\\n\",\n    \"        else:\\n\",\n    \"            run_command_all_hosts(f'{cmd} | cat | grep --color=never -E \\\"{interest}\\\"')\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def kill(pid, host):\\n\",\n    \"    run_command(f\\\"kill -KILL {pid}\\\", host)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def pkill(interest, host):\\n\",\n    \"    run_command(f'pkill -9 -f \\\"{interest}\\\"', host)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Example\\n\",\n    \"\\n\",\n    \"Remote launch via paramiko.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"sleep = None\\n\",\n    \"run_command(cmd, host, log_file=log_file, nohup=True, sleep=sleep)\\n\",\n    \"ps(host)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Using following commands to monitor the status of the jobs.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"ps()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"nvitop(host)\"\n   
]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"kill(, host)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Training\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def colossal_run(data_path, load_path=None):\\n\",\n    \"    commands = []\\n\",\n    \"    commands.append(f\\\"cd {OPEN_SORA_HOME}\\\")\\n\",\n    \"    command = f\\\"colossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py configs/opensora-v1-1/train/video.py --wandb True --data-path {data_path}\\\"\\n\",\n    \"    if load_path:\\n\",\n    \"        command = f\\\"{command} --load-path {load_path}\\\"\\n\",\n    \"    commands.append(command)\\n\",\n    \"    cmd = \\\" && \\\".join(commands)\\n\",\n    \"    return cmd\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def kill_all():\\n\",\n    \"    commands = []\\n\",\n    \"    commands.append(f\\\"cd {OPEN_SORA_HOME}\\\")\\n\",\n    \"    commands.append('cat hostfile  | xargs -I \\\"{}\\\" ssh \\\"{}\\\" pkill -9 python')\\n\",\n    \"    cmd = \\\" && \\\".join(commands)\\n\",\n    \"    return cmd\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Examples\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"host = \\\"host-0\\\"\\n\",\n    \"log_file = os.path.join(OPEN_SORA_HOME, \\\"logs/train.log\\\")\\n\",\n    \"data_path = \\\"/path/to/meta.csv\\\"\\n\",\n    \"cmd = colossal_run(data_path)\\n\",\n    \"print(cmd)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"run_command(cmd, host, log_file=log_file, nohup=True)\"\n   ]\n  },\n  {\n   
\"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"cmd = kill_all()\\n\",\n    \"run_command(cmd, host)\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3 (ipykernel)\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.9.18\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "Open-Sora/opensora/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/opensora/acceleration/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/opensora/acceleration/checkpoint.py",
    "content": "from collections.abc import Iterable\n\nimport torch.nn as nn\nfrom torch.utils.checkpoint import checkpoint, checkpoint_sequential\n\n\ndef set_grad_checkpoint(model, use_fp32_attention=False, gc_step=1):\n    assert isinstance(model, nn.Module)\n\n    def set_attr(module):\n        module.grad_checkpointing = True\n        module.fp32_attention = use_fp32_attention\n        module.grad_checkpointing_step = gc_step\n\n    model.apply(set_attr)\n\n\ndef auto_grad_checkpoint(module, *args, **kwargs):\n    if getattr(module, \"grad_checkpointing\", False):\n        if not isinstance(module, Iterable):\n            return checkpoint(module, *args, use_reentrant=False, **kwargs)\n        gc_step = module[0].grad_checkpointing_step\n        return checkpoint_sequential(module, gc_step, *args, use_reentrant=False, **kwargs)\n    return module(*args, **kwargs)\n"
  },
  {
    "path": "Open-Sora/opensora/acceleration/communications.py",
    "content": "import torch\nimport torch.distributed as dist\n\n\n# ====================\n# All-To-All\n# ====================\ndef _all_to_all(\n    input_: torch.Tensor,\n    world_size: int,\n    group: dist.ProcessGroup,\n    scatter_dim: int,\n    gather_dim: int,\n):\n    input_list = [t.contiguous() for t in torch.tensor_split(input_, world_size, scatter_dim)]\n    output_list = [torch.empty_like(input_list[0]) for _ in range(world_size)]\n    dist.all_to_all(output_list, input_list, group=group)\n    return torch.cat(output_list, dim=gather_dim).contiguous()\n\n\nclass _AllToAll(torch.autograd.Function):\n    \"\"\"All-to-all communication.\n\n    Args:\n        input_: input matrix\n        process_group: communication group\n        scatter_dim: scatter dimension\n        gather_dim: gather dimension\n    \"\"\"\n\n    @staticmethod\n    def forward(ctx, input_, process_group, scatter_dim, gather_dim):\n        ctx.process_group = process_group\n        ctx.scatter_dim = scatter_dim\n        ctx.gather_dim = gather_dim\n        ctx.world_size = dist.get_world_size(process_group)\n        output = _all_to_all(input_, ctx.world_size, process_group, scatter_dim, gather_dim)\n        return output\n\n    @staticmethod\n    def backward(ctx, grad_output):\n        grad_output = _all_to_all(\n            grad_output,\n            ctx.world_size,\n            ctx.process_group,\n            ctx.gather_dim,\n            ctx.scatter_dim,\n        )\n        return (\n            grad_output,\n            None,\n            None,\n            None,\n        )\n\n\ndef all_to_all(\n    input_: torch.Tensor,\n    process_group: dist.ProcessGroup,\n    scatter_dim: int = 2,\n    gather_dim: int = 1,\n):\n    return _AllToAll.apply(input_, process_group, scatter_dim, gather_dim)\n\n\ndef _gather(\n    input_: torch.Tensor,\n    world_size: int,\n    group: dist.ProcessGroup,\n    gather_dim: int,\n):\n    if gather_list is None:\n        gather_list = 
[torch.empty_like(input_) for _ in range(world_size)]\n    dist.gather(input_, gather_list, group=group, gather_dim=gather_dim)\n    return gather_list\n\n\n# ====================\n# Gather-Split\n# ====================\n\n\ndef _split(input_, pg: dist.ProcessGroup, dim=-1):\n    # skip if only one rank involved\n    world_size = dist.get_world_size(pg)\n    rank = dist.get_rank(pg)\n    if world_size == 1:\n        return input_\n\n    # Split along last dimension.\n    dim_size = input_.size(dim)\n    assert dim_size % world_size == 0, (\n        f\"The dimension to split ({dim_size}) is not a multiple of world size ({world_size}), \"\n        f\"cannot split tensor evenly\"\n    )\n\n    tensor_list = torch.split(input_, dim_size // world_size, dim=dim)\n    output = tensor_list[rank].contiguous()\n\n    return output\n\n\ndef _gather(input_, pg: dist.ProcessGroup, dim=-1):\n    # skip if only one rank involved\n    input_ = input_.contiguous()\n    world_size = dist.get_world_size(pg)\n    dist.get_rank(pg)\n\n    if world_size == 1:\n        return input_\n\n    # all gather\n    tensor_list = [torch.empty_like(input_) for _ in range(world_size)]\n    assert input_.device.type == \"cuda\"\n    torch.distributed.all_gather(tensor_list, input_, group=pg)\n\n    # concat\n    output = torch.cat(tensor_list, dim=dim).contiguous()\n\n    return output\n\n\nclass _GatherForwardSplitBackward(torch.autograd.Function):\n    \"\"\"Gather the input from model parallel region and concatenate.\n\n    Args:\n        input_: input matrix.\n        process_group: parallel mode.\n        dim: dimension\n    \"\"\"\n\n    @staticmethod\n    def symbolic(graph, input_):\n        return _gather(input_)\n\n    @staticmethod\n    def forward(ctx, input_, process_group, dim, grad_scale):\n        ctx.mode = process_group\n        ctx.dim = dim\n        ctx.grad_scale = grad_scale\n        return _gather(input_, process_group, dim)\n\n    @staticmethod\n    def backward(ctx, 
grad_output):\n        if ctx.grad_scale == \"up\":\n            grad_output = grad_output * dist.get_world_size(ctx.mode)\n        elif ctx.grad_scale == \"down\":\n            grad_output = grad_output / dist.get_world_size(ctx.mode)\n\n        return _split(grad_output, ctx.mode, ctx.dim), None, None, None\n\n\nclass _SplitForwardGatherBackward(torch.autograd.Function):\n    \"\"\"\n    Split the input and keep only the chunk corresponding to the rank.\n\n    Args:\n        input_: input matrix.\n        process_group: parallel mode.\n        dim: dimension\n    \"\"\"\n\n    @staticmethod\n    def symbolic(graph, input_):\n        return _split(input_)\n\n    @staticmethod\n    def forward(ctx, input_, process_group, dim, grad_scale):\n        ctx.mode = process_group\n        ctx.dim = dim\n        ctx.grad_scale = grad_scale\n        return _split(input_, process_group, dim)\n\n    @staticmethod\n    def backward(ctx, grad_output):\n        if ctx.grad_scale == \"up\":\n            grad_output = grad_output * dist.get_world_size(ctx.mode)\n        elif ctx.grad_scale == \"down\":\n            grad_output = grad_output / dist.get_world_size(ctx.mode)\n        return _gather(grad_output, ctx.mode, ctx.dim), None, None, None\n\n\ndef split_forward_gather_backward(input_, process_group, dim, grad_scale=None):\n    return _SplitForwardGatherBackward.apply(input_, process_group, dim, grad_scale)\n\n\ndef gather_forward_split_backward(input_, process_group, dim, grad_scale=None):\n    return _GatherForwardSplitBackward.apply(input_, process_group, dim, grad_scale)\n"
  },
  {
    "path": "Open-Sora/opensora/acceleration/parallel_states.py",
    "content": "import torch.distributed as dist\n\n_GLOBAL_PARALLEL_GROUPS = dict()\n\n\ndef set_data_parallel_group(group: dist.ProcessGroup):\n    _GLOBAL_PARALLEL_GROUPS[\"data\"] = group\n\n\ndef get_data_parallel_group():\n    return _GLOBAL_PARALLEL_GROUPS.get(\"data\", dist.group.WORLD)\n\n\ndef set_sequence_parallel_group(group: dist.ProcessGroup):\n    _GLOBAL_PARALLEL_GROUPS[\"sequence\"] = group\n\n\ndef get_sequence_parallel_group():\n    return _GLOBAL_PARALLEL_GROUPS.get(\"sequence\", None)\n"
  },
  {
    "path": "Open-Sora/opensora/acceleration/plugin.py",
    "content": "import random\nfrom typing import Optional\n\nimport numpy as np\nimport torch\nfrom colossalai.booster.plugin import LowLevelZeroPlugin\nfrom colossalai.cluster import ProcessGroupMesh\nfrom torch.utils.data import DataLoader\nfrom torch.utils.data.distributed import DistributedSampler\n\nDP_AXIS, SP_AXIS = 0, 1\n\n\nclass ZeroSeqParallelPlugin(LowLevelZeroPlugin):\n    def __init__(\n        self,\n        sp_size: int = 1,\n        stage: int = 2,\n        precision: str = \"fp16\",\n        initial_scale: float = 2**32,\n        min_scale: float = 1,\n        growth_factor: float = 2,\n        backoff_factor: float = 0.5,\n        growth_interval: int = 1000,\n        hysteresis: int = 2,\n        max_scale: float = 2**32,\n        max_norm: float = 0.0,\n        norm_type: float = 2.0,\n        reduce_bucket_size_in_m: int = 12,\n        communication_dtype: Optional[torch.dtype] = None,\n        overlap_communication: bool = True,\n        cpu_offload: bool = False,\n        master_weights: bool = True,\n        verbose: bool = False,\n    ) -> None:\n        super().__init__(\n            stage=stage,\n            precision=precision,\n            initial_scale=initial_scale,\n            min_scale=min_scale,\n            growth_factor=growth_factor,\n            backoff_factor=backoff_factor,\n            growth_interval=growth_interval,\n            hysteresis=hysteresis,\n            max_scale=max_scale,\n            max_norm=max_norm,\n            norm_type=norm_type,\n            reduce_bucket_size_in_m=reduce_bucket_size_in_m,\n            communication_dtype=communication_dtype,\n            overlap_communication=overlap_communication,\n            cpu_offload=cpu_offload,\n            master_weights=master_weights,\n            verbose=verbose,\n        )\n        self.sp_size = sp_size\n        assert self.world_size % sp_size == 0, \"world_size must be divisible by sp_size\"\n        self.dp_size = self.world_size // sp_size\n       
 self.pg_mesh = ProcessGroupMesh(self.dp_size, self.sp_size)\n        self.dp_group = self.pg_mesh.get_group_along_axis(DP_AXIS)\n        self.sp_group = self.pg_mesh.get_group_along_axis(SP_AXIS)\n        self.dp_rank = self.pg_mesh.coordinate(DP_AXIS)\n        self.sp_rank = self.pg_mesh.coordinate(SP_AXIS)\n\n    def __del__(self):\n        \"\"\"Destroy the process groups in ProcessGroupMesh\"\"\"\n        self.pg_mesh.destroy_mesh_process_groups()\n\n    def prepare_dataloader(\n        self,\n        dataset,\n        batch_size,\n        shuffle=False,\n        seed=1024,\n        drop_last=False,\n        pin_memory=False,\n        num_workers=0,\n        distributed_sampler_cls=None,\n        **kwargs,\n    ):\n        _kwargs = kwargs.copy()\n        distributed_sampler_cls = distributed_sampler_cls or DistributedSampler\n        sampler = distributed_sampler_cls(dataset, num_replicas=self.dp_size, rank=self.dp_rank, shuffle=shuffle)\n\n        # Deterministic dataloader\n        def seed_worker(worker_id):\n            worker_seed = seed\n            np.random.seed(worker_seed)\n            torch.manual_seed(worker_seed)\n            random.seed(worker_seed)\n\n        return DataLoader(\n            dataset,\n            batch_size=batch_size,\n            sampler=sampler,\n            worker_init_fn=seed_worker,\n            drop_last=drop_last,\n            pin_memory=pin_memory,\n            num_workers=num_workers,\n            **_kwargs,\n        )\n"
  },
  {
    "path": "Open-Sora/opensora/acceleration/shardformer/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/opensora/acceleration/shardformer/modeling/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/opensora/acceleration/shardformer/modeling/t5.py",
    "content": "import torch\nimport torch.nn as nn\n\n\nclass T5LayerNorm(nn.Module):\n    def __init__(self, hidden_size, eps=1e-6):\n        \"\"\"\n        Construct a layernorm module in the T5 style. No bias and no subtraction of mean.\n        \"\"\"\n        super().__init__()\n        self.weight = nn.Parameter(torch.ones(hidden_size))\n        self.variance_epsilon = eps\n\n    def forward(self, hidden_states):\n        # T5 uses a layer_norm which only scales and doesn't shift, which is also known as Root Mean\n        # Square Layer Normalization https://arxiv.org/abs/1910.07467 thus varience is calculated\n        # w/o mean and there is no bias. Additionally we want to make sure that the accumulation for\n        # half-precision inputs is done in fp32\n\n        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)\n        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)\n\n        # convert into half-precision if necessary\n        if self.weight.dtype in [torch.float16, torch.bfloat16]:\n            hidden_states = hidden_states.to(self.weight.dtype)\n\n        return self.weight * hidden_states\n\n    @staticmethod\n    def from_native_module(module, *args, **kwargs):\n        assert module.__class__.__name__ == \"FusedRMSNorm\", (\n            \"Recovering T5LayerNorm requires the original layer to be apex's Fused RMS Norm.\"\n            \"Apex's fused norm is automatically used by Hugging Face Transformers https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L265C5-L265C48\"\n        )\n\n        layer_norm = T5LayerNorm(module.normalized_shape, eps=module.eps)\n        layer_norm.weight.data.copy_(module.weight.data)\n        layer_norm = layer_norm.to(module.weight.device)\n        return layer_norm\n"
  },
  {
    "path": "Open-Sora/opensora/acceleration/shardformer/policy/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/opensora/acceleration/shardformer/policy/t5_encoder.py",
    "content": "from colossalai.shardformer.modeling.jit import get_jit_fused_dropout_add_func\nfrom colossalai.shardformer.modeling.t5 import get_jit_fused_T5_layer_ff_forward, get_T5_layer_self_attention_forward\nfrom colossalai.shardformer.policies.base_policy import Policy, SubModuleReplacementDescription\n\n\nclass T5EncoderPolicy(Policy):\n    def config_sanity_check(self):\n        assert not self.shard_config.enable_tensor_parallelism\n        assert not self.shard_config.enable_flash_attention\n\n    def preprocess(self):\n        return self.model\n\n    def module_policy(self):\n        from transformers.models.t5.modeling_t5 import T5LayerFF, T5LayerSelfAttention, T5Stack\n\n        policy = {}\n\n        # check whether apex is installed\n        try:\n            from opensora.acceleration.shardformer.modeling.t5 import T5LayerNorm\n\n            # recover hf from fused rms norm to T5 norm which is faster\n            self.append_or_create_submodule_replacement(\n                description=SubModuleReplacementDescription(\n                    suffix=\"layer_norm\",\n                    target_module=T5LayerNorm,\n                ),\n                policy=policy,\n                target_key=T5LayerFF,\n            )\n            self.append_or_create_submodule_replacement(\n                description=SubModuleReplacementDescription(suffix=\"layer_norm\", target_module=T5LayerNorm),\n                policy=policy,\n                target_key=T5LayerSelfAttention,\n            )\n            self.append_or_create_submodule_replacement(\n                description=SubModuleReplacementDescription(suffix=\"final_layer_norm\", target_module=T5LayerNorm),\n                policy=policy,\n                target_key=T5Stack,\n            )\n        except (ImportError, ModuleNotFoundError):\n            pass\n\n        # use jit operator\n        if self.shard_config.enable_jit_fused:\n            self.append_or_create_method_replacement(\n                
description={\n                    \"forward\": get_jit_fused_T5_layer_ff_forward(),\n                    \"dropout_add\": get_jit_fused_dropout_add_func(),\n                },\n                policy=policy,\n                target_key=T5LayerFF,\n            )\n            self.append_or_create_method_replacement(\n                description={\n                    \"forward\": get_T5_layer_self_attention_forward(),\n                    \"dropout_add\": get_jit_fused_dropout_add_func(),\n                },\n                policy=policy,\n                target_key=T5LayerSelfAttention,\n            )\n\n        return policy\n\n    def postprocess(self):\n        return self.model\n"
  },
  {
    "path": "Open-Sora/opensora/datasets/__init__.py",
    "content": "from .datasets import IMG_FPS, BatchFeatureDataset, VariableVideoTextDataset, VideoTextDataset\nfrom .utils import get_transforms_image, get_transforms_video, is_img, is_vid, save_sample\n"
  },
  {
    "path": "Open-Sora/opensora/datasets/aspect.py",
    "content": "import math\n\n\n# computation\ndef get_h_w(a, ts, eps=1e-4):\n    h = (ts * a) ** 0.5\n    h = h + eps\n    h = math.ceil(h) if math.ceil(h) % 2 == 0 else math.floor(h)\n    w = h / a\n    w = w + eps\n    w = math.ceil(w) if math.ceil(w) % 2 == 0 else math.floor(w)\n    return h, w\n\n\ndef get_aspect_ratios_dict(ars, ts=360 * 640):\n    est = {f\"{a:.2f}\": get_h_w(a, ts) for a in ars}\n    return est\n\n\ndef get_ar(ratio):\n    h, w = ratio.split(\":\")\n    return int(h) / int(w)\n\n\n# H:W\nASPECT_RATIO_MAP = {\n    \"3:8\": \"0.38\",\n    \"9:21\": \"0.43\",\n    \"12:25\": \"0.48\",\n    \"1:2\": \"0.50\",\n    \"9:17\": \"0.53\",\n    \"27:50\": \"0.54\",\n    \"9:16\": \"0.56\",\n    \"5:8\": \"0.62\",\n    \"2:3\": \"0.67\",\n    \"3:4\": \"0.75\",\n    \"1:1\": \"1.00\",\n    \"4:3\": \"1.33\",\n    \"3:2\": \"1.50\",\n    \"16:9\": \"1.78\",\n    \"17:9\": \"1.89\",\n    \"2:1\": \"2.00\",\n    \"50:27\": \"2.08\",\n}\n\n\nAR = [get_ar(ratio) for ratio in ASPECT_RATIO_MAP.keys()]\n\n# computed from above code\n# S = 8294400\nASPECT_RATIO_4K = {\n    \"0.38\": (1764, 4704),\n    \"0.43\": (1886, 4400),\n    \"0.48\": (1996, 4158),\n    \"0.50\": (2036, 4072),\n    \"0.53\": (2096, 3960),\n    \"0.54\": (2118, 3918),\n    \"0.62\": (2276, 3642),\n    \"0.56\": (2160, 3840),  # base\n    \"0.67\": (2352, 3528),\n    \"0.75\": (2494, 3326),\n    \"1.00\": (2880, 2880),\n    \"1.33\": (3326, 2494),\n    \"1.50\": (3528, 2352),\n    \"1.78\": (3840, 2160),\n    \"1.89\": (3958, 2096),\n    \"2.00\": (4072, 2036),\n    \"2.08\": (4156, 1994),\n}\n\n# S = 3686400\nASPECT_RATIO_2K = {\n    \"0.38\": (1176, 3136),\n    \"0.43\": (1256, 2930),\n    \"0.48\": (1330, 2770),\n    \"0.50\": (1358, 2716),\n    \"0.53\": (1398, 2640),\n    \"0.54\": (1412, 2612),\n    \"0.56\": (1440, 2560),  # base\n    \"0.62\": (1518, 2428),\n    \"0.67\": (1568, 2352),\n    \"0.75\": (1662, 2216),\n    \"1.00\": (1920, 1920),\n    \"1.33\": (2218, 1664),\n    
\"1.50\": (2352, 1568),\n    \"1.78\": (2560, 1440),\n    \"1.89\": (2638, 1396),\n    \"2.00\": (2716, 1358),\n    \"2.08\": (2772, 1330),\n}\n\n# S = 2073600\nASPECT_RATIO_1080P = {\n    \"0.38\": (882, 2352),\n    \"0.43\": (942, 2198),\n    \"0.48\": (998, 2080),\n    \"0.50\": (1018, 2036),\n    \"0.53\": (1048, 1980),\n    \"0.54\": (1058, 1958),\n    \"0.56\": (1080, 1920),  # base\n    \"0.62\": (1138, 1820),\n    \"0.67\": (1176, 1764),\n    \"0.75\": (1248, 1664),\n    \"1.00\": (1440, 1440),\n    \"1.33\": (1662, 1246),\n    \"1.50\": (1764, 1176),\n    \"1.78\": (1920, 1080),\n    \"1.89\": (1980, 1048),\n    \"2.00\": (2036, 1018),\n    \"2.08\": (2078, 998),\n}\n\n# S = 921600\nASPECT_RATIO_720P = {\n    \"0.38\": (588, 1568),\n    \"0.43\": (628, 1466),\n    \"0.48\": (666, 1388),\n    \"0.50\": (678, 1356),\n    \"0.53\": (698, 1318),\n    \"0.54\": (706, 1306),\n    \"0.56\": (720, 1280),  # base\n    \"0.62\": (758, 1212),\n    \"0.67\": (784, 1176),\n    \"0.75\": (832, 1110),\n    \"1.00\": (960, 960),\n    \"1.33\": (1108, 832),\n    \"1.50\": (1176, 784),\n    \"1.78\": (1280, 720),\n    \"1.89\": (1320, 698),\n    \"2.00\": (1358, 680),\n    \"2.08\": (1386, 666),\n}\n\n# S = 409920\nASPECT_RATIO_480P = {\n    \"0.38\": (392, 1046),\n    \"0.43\": (420, 980),\n    \"0.48\": (444, 925),\n    \"0.50\": (452, 904),\n    \"0.53\": (466, 880),\n    \"0.54\": (470, 870),\n    \"0.56\": (480, 854),  # base\n    \"0.62\": (506, 810),\n    \"0.67\": (522, 784),\n    \"0.75\": (554, 738),\n    \"1.00\": (640, 640),\n    \"1.33\": (740, 555),\n    \"1.50\": (784, 522),\n    \"1.78\": (854, 480),\n    \"1.89\": (880, 466),\n    \"2.00\": (906, 454),\n    \"2.08\": (924, 444),\n}\n\n# S = 230400\nASPECT_RATIO_360P = {\n    \"0.38\": (294, 784),\n    \"0.43\": (314, 732),\n    \"0.48\": (332, 692),\n    \"0.50\": (340, 680),\n    \"0.53\": (350, 662),\n    \"0.54\": (352, 652),\n    \"0.56\": (360, 640),  # base\n    \"0.62\": (380, 608),\n    \"0.67\": 
(392, 588),\n    \"0.75\": (416, 554),\n    \"1.00\": (480, 480),\n    \"1.33\": (554, 416),\n    \"1.50\": (588, 392),\n    \"1.78\": (640, 360),\n    \"1.89\": (660, 350),\n    \"2.00\": (678, 340),\n    \"2.08\": (692, 332),\n}\n\n# S = 102240\nASPECT_RATIO_240P = {\n    \"0.38\": (196, 522),\n    \"0.43\": (210, 490),\n    \"0.48\": (222, 462),\n    \"0.50\": (226, 452),\n    \"0.53\": (232, 438),\n    \"0.54\": (236, 436),\n    \"0.56\": (240, 426),  # base\n    \"0.62\": (252, 404),\n    \"0.67\": (262, 393),\n    \"0.75\": (276, 368),\n    \"1.00\": (320, 320),\n    \"1.33\": (370, 278),\n    \"1.50\": (392, 262),\n    \"1.78\": (426, 240),\n    \"1.89\": (440, 232),\n    \"2.00\": (452, 226),\n    \"2.08\": (462, 222),\n}\n\n# S = 36864\nASPECT_RATIO_144P = {\n    \"0.38\": (117, 312),\n    \"0.43\": (125, 291),\n    \"0.48\": (133, 277),\n    \"0.50\": (135, 270),\n    \"0.53\": (139, 262),\n    \"0.54\": (141, 260),\n    \"0.56\": (144, 256),  # base\n    \"0.62\": (151, 241),\n    \"0.67\": (156, 234),\n    \"0.75\": (166, 221),\n    \"1.00\": (192, 192),\n    \"1.33\": (221, 165),\n    \"1.50\": (235, 156),\n    \"1.78\": (256, 144),\n    \"1.89\": (263, 139),\n    \"2.00\": (271, 135),\n    \"2.08\": (277, 132),\n}\n\n# from PixArt\n# S = 8294400\nASPECT_RATIO_2880 = {\n    \"0.25\": (1408, 5760),\n    \"0.26\": (1408, 5568),\n    \"0.27\": (1408, 5376),\n    \"0.28\": (1408, 5184),\n    \"0.32\": (1600, 4992),\n    \"0.33\": (1600, 4800),\n    \"0.34\": (1600, 4672),\n    \"0.40\": (1792, 4480),\n    \"0.42\": (1792, 4288),\n    \"0.47\": (1920, 4096),\n    \"0.49\": (1920, 3904),\n    \"0.51\": (1920, 3776),\n    \"0.55\": (2112, 3840),\n    \"0.59\": (2112, 3584),\n    \"0.68\": (2304, 3392),\n    \"0.72\": (2304, 3200),\n    \"0.78\": (2496, 3200),\n    \"0.83\": (2496, 3008),\n    \"0.89\": (2688, 3008),\n    \"0.93\": (2688, 2880),\n    \"1.00\": (2880, 2880),\n    \"1.07\": (2880, 2688),\n    \"1.12\": (3008, 2688),\n    \"1.21\": (3008, 
2496),\n    \"1.28\": (3200, 2496),\n    \"1.39\": (3200, 2304),\n    \"1.47\": (3392, 2304),\n    \"1.70\": (3584, 2112),\n    \"1.82\": (3840, 2112),\n    \"2.03\": (3904, 1920),\n    \"2.13\": (4096, 1920),\n    \"2.39\": (4288, 1792),\n    \"2.50\": (4480, 1792),\n    \"2.92\": (4672, 1600),\n    \"3.00\": (4800, 1600),\n    \"3.12\": (4992, 1600),\n    \"3.68\": (5184, 1408),\n    \"3.82\": (5376, 1408),\n    \"3.95\": (5568, 1408),\n    \"4.00\": (5760, 1408),\n}\n\n# S = 4194304\nASPECT_RATIO_2048 = {\n    \"0.25\": (1024, 4096),\n    \"0.26\": (1024, 3968),\n    \"0.27\": (1024, 3840),\n    \"0.28\": (1024, 3712),\n    \"0.32\": (1152, 3584),\n    \"0.33\": (1152, 3456),\n    \"0.35\": (1152, 3328),\n    \"0.40\": (1280, 3200),\n    \"0.42\": (1280, 3072),\n    \"0.48\": (1408, 2944),\n    \"0.50\": (1408, 2816),\n    \"0.52\": (1408, 2688),\n    \"0.57\": (1536, 2688),\n    \"0.60\": (1536, 2560),\n    \"0.68\": (1664, 2432),\n    \"0.72\": (1664, 2304),\n    \"0.78\": (1792, 2304),\n    \"0.82\": (1792, 2176),\n    \"0.88\": (1920, 2176),\n    \"0.94\": (1920, 2048),\n    \"1.00\": (2048, 2048),\n    \"1.07\": (2048, 1920),\n    \"1.13\": (2176, 1920),\n    \"1.21\": (2176, 1792),\n    \"1.29\": (2304, 1792),\n    \"1.38\": (2304, 1664),\n    \"1.46\": (2432, 1664),\n    \"1.67\": (2560, 1536),\n    \"1.75\": (2688, 1536),\n    \"2.00\": (2816, 1408),\n    \"2.09\": (2944, 1408),\n    \"2.40\": (3072, 1280),\n    \"2.50\": (3200, 1280),\n    \"2.89\": (3328, 1152),\n    \"3.00\": (3456, 1152),\n    \"3.11\": (3584, 1152),\n    \"3.62\": (3712, 1024),\n    \"3.75\": (3840, 1024),\n    \"3.88\": (3968, 1024),\n    \"4.00\": (4096, 1024),\n}\n\n# S = 1048576\nASPECT_RATIO_1024 = {\n    \"0.25\": (512, 2048),\n    \"0.26\": (512, 1984),\n    \"0.27\": (512, 1920),\n    \"0.28\": (512, 1856),\n    \"0.32\": (576, 1792),\n    \"0.33\": (576, 1728),\n    \"0.35\": (576, 1664),\n    \"0.40\": (640, 1600),\n    \"0.42\": (640, 1536),\n    \"0.48\": (704, 1472),\n  
  \"0.50\": (704, 1408),\n    \"0.52\": (704, 1344),\n    \"0.57\": (768, 1344),\n    \"0.60\": (768, 1280),\n    \"0.68\": (832, 1216),\n    \"0.72\": (832, 1152),\n    \"0.78\": (896, 1152),\n    \"0.82\": (896, 1088),\n    \"0.88\": (960, 1088),\n    \"0.94\": (960, 1024),\n    \"1.00\": (1024, 1024),\n    \"1.07\": (1024, 960),\n    \"1.13\": (1088, 960),\n    \"1.21\": (1088, 896),\n    \"1.29\": (1152, 896),\n    \"1.38\": (1152, 832),\n    \"1.46\": (1216, 832),\n    \"1.67\": (1280, 768),\n    \"1.75\": (1344, 768),\n    \"2.00\": (1408, 704),\n    \"2.09\": (1472, 704),\n    \"2.40\": (1536, 640),\n    \"2.50\": (1600, 640),\n    \"2.89\": (1664, 576),\n    \"3.00\": (1728, 576),\n    \"3.11\": (1792, 576),\n    \"3.62\": (1856, 512),\n    \"3.75\": (1920, 512),\n    \"3.88\": (1984, 512),\n    \"4.00\": (2048, 512),\n}\n\n# S = 262144\nASPECT_RATIO_512 = {\n    \"0.25\": (256, 1024),\n    \"0.26\": (256, 992),\n    \"0.27\": (256, 960),\n    \"0.28\": (256, 928),\n    \"0.32\": (288, 896),\n    \"0.33\": (288, 864),\n    \"0.35\": (288, 832),\n    \"0.40\": (320, 800),\n    \"0.42\": (320, 768),\n    \"0.48\": (352, 736),\n    \"0.50\": (352, 704),\n    \"0.52\": (352, 672),\n    \"0.57\": (384, 672),\n    \"0.60\": (384, 640),\n    \"0.68\": (416, 608),\n    \"0.72\": (416, 576),\n    \"0.78\": (448, 576),\n    \"0.82\": (448, 544),\n    \"0.88\": (480, 544),\n    \"0.94\": (480, 512),\n    \"1.00\": (512, 512),\n    \"1.07\": (512, 480),\n    \"1.13\": (544, 480),\n    \"1.21\": (544, 448),\n    \"1.29\": (576, 448),\n    \"1.38\": (576, 416),\n    \"1.46\": (608, 416),\n    \"1.67\": (640, 384),\n    \"1.75\": (672, 384),\n    \"2.00\": (704, 352),\n    \"2.09\": (736, 352),\n    \"2.40\": (768, 320),\n    \"2.50\": (800, 320),\n    \"2.89\": (832, 288),\n    \"3.00\": (864, 288),\n    \"3.11\": (896, 288),\n    \"3.62\": (928, 256),\n    \"3.75\": (960, 256),\n    \"3.88\": (992, 256),\n    \"4.00\": (1024, 256),\n}\n\n# S = 65536\nASPECT_RATIO_256 = 
{\n    \"0.25\": (128, 512),\n    \"0.26\": (128, 496),\n    \"0.27\": (128, 480),\n    \"0.28\": (128, 464),\n    \"0.32\": (144, 448),\n    \"0.33\": (144, 432),\n    \"0.35\": (144, 416),\n    \"0.40\": (160, 400),\n    \"0.42\": (160, 384),\n    \"0.48\": (176, 368),\n    \"0.50\": (176, 352),\n    \"0.52\": (176, 336),\n    \"0.57\": (192, 336),\n    \"0.60\": (192, 320),\n    \"0.68\": (208, 304),\n    \"0.72\": (208, 288),\n    \"0.78\": (224, 288),\n    \"0.82\": (224, 272),\n    \"0.88\": (240, 272),\n    \"0.94\": (240, 256),\n    \"1.00\": (256, 256),\n    \"1.07\": (256, 240),\n    \"1.13\": (272, 240),\n    \"1.21\": (272, 224),\n    \"1.29\": (288, 224),\n    \"1.38\": (288, 208),\n    \"1.46\": (304, 208),\n    \"1.67\": (320, 192),\n    \"1.75\": (336, 192),\n    \"2.00\": (352, 176),\n    \"2.09\": (368, 176),\n    \"2.40\": (384, 160),\n    \"2.50\": (400, 160),\n    \"2.89\": (416, 144),\n    \"3.00\": (432, 144),\n    \"3.11\": (448, 144),\n    \"3.62\": (464, 128),\n    \"3.75\": (480, 128),\n    \"3.88\": (496, 128),\n    \"4.00\": (512, 128),\n}\n\n\ndef get_closest_ratio(height: float, width: float, ratios: dict):\n    aspect_ratio = height / width\n    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - aspect_ratio))\n    return closest_ratio\n\n\nASPECT_RATIOS = {\n    \"144p\": (36864, ASPECT_RATIO_144P),\n    \"256\": (65536, ASPECT_RATIO_256),\n    \"240p\": (102240, ASPECT_RATIO_240P),\n    \"360p\": (230400, ASPECT_RATIO_360P),\n    \"512\": (262144, ASPECT_RATIO_512),\n    \"480p\": (409920, ASPECT_RATIO_480P),\n    \"720p\": (921600, ASPECT_RATIO_720P),\n    \"1024\": (1048576, ASPECT_RATIO_1024),\n    \"1080p\": (2073600, ASPECT_RATIO_1080P),\n    \"2k\": (3686400, ASPECT_RATIO_2K),\n    \"2048\": (4194304, ASPECT_RATIO_2048),\n    \"2880\": (8294400, ASPECT_RATIO_2880),\n    \"4k\": (8294400, ASPECT_RATIO_4K),\n}\n\n\ndef get_num_pixels(name):\n    return ASPECT_RATIOS[name][0]\n\n\ndef 
get_image_size(resolution, ar_ratio):\n    if ar_ratio in ASPECT_RATIO_MAP:\n        ar_key = ASPECT_RATIO_MAP[ar_ratio]\n    else:\n        ar_key = ar_ratio\n    rs_dict = ASPECT_RATIOS[resolution][1]\n    assert ar_key in rs_dict, f\"Aspect ratio {ar_ratio} not found for resolution {resolution}\"\n    return rs_dict[ar_key]\n\n\nNUM_FRAMES_MAP = {\n    \"1x\": 51,\n    \"2x\": 102,\n    \"4x\": 204,\n    \"8x\": 408,\n    \"16x\": 816,\n    \"2s\": 51,\n    \"4s\": 102,\n    \"8s\": 204,\n    \"16s\": 408,\n    \"32s\": 816,\n}\n\n\ndef get_num_frames(num_frames):\n    if num_frames in NUM_FRAMES_MAP:\n        return NUM_FRAMES_MAP[num_frames]\n    else:\n        return int(num_frames)\n"
  },
  {
    "path": "Open-Sora/opensora/datasets/bucket.py",
    "content": "from collections import OrderedDict\n\nimport numpy as np\n\nfrom opensora.utils.misc import get_logger\n\nfrom .aspect import ASPECT_RATIOS, get_closest_ratio\n\n\ndef find_approximate_hw(hw, hw_dict, approx=0.8):\n    for k, v in hw_dict.items():\n        if hw >= v * approx:\n            return k\n    return None\n\n\ndef find_closet_smaller_bucket(t, t_dict, frame_interval):\n    # process image\n    if t == 1:\n        if 1 in t_dict:\n            return 1\n        else:\n            return None\n    # process video\n    for k, v in t_dict.items():\n        if t >= v * frame_interval and v != 1:\n            return k\n    return None\n\n\nclass Bucket:\n    def __init__(self, bucket_config):\n        for key in bucket_config:\n            assert key in ASPECT_RATIOS, f\"Aspect ratio {key} not found.\"\n        # wrap config with OrderedDict\n        bucket_probs = OrderedDict()\n        bucket_bs = OrderedDict()\n        bucket_names = sorted(bucket_config.keys(), key=lambda x: ASPECT_RATIOS[x][0], reverse=True)\n        for key in bucket_names:\n            bucket_time_names = sorted(bucket_config[key].keys(), key=lambda x: x, reverse=True)\n            bucket_probs[key] = OrderedDict({k: bucket_config[key][k][0] for k in bucket_time_names})\n            bucket_bs[key] = OrderedDict({k: bucket_config[key][k][1] for k in bucket_time_names})\n\n        # first level: HW\n        num_bucket = 0\n        hw_criteria = dict()\n        t_criteria = dict()\n        ar_criteria = dict()\n        bucket_id = OrderedDict()\n        bucket_id_cnt = 0\n        for k1, v1 in bucket_probs.items():\n            hw_criteria[k1] = ASPECT_RATIOS[k1][0]\n            t_criteria[k1] = dict()\n            ar_criteria[k1] = dict()\n            bucket_id[k1] = dict()\n            for k2, _ in v1.items():\n                t_criteria[k1][k2] = k2\n                bucket_id[k1][k2] = bucket_id_cnt\n                bucket_id_cnt += 1\n                ar_criteria[k1][k2] 
= dict()\n                for k3, v3 in ASPECT_RATIOS[k1][1].items():\n                    ar_criteria[k1][k2][k3] = v3\n                    num_bucket += 1\n\n        self.bucket_probs = bucket_probs\n        self.bucket_bs = bucket_bs\n        self.bucket_id = bucket_id\n        self.hw_criteria = hw_criteria\n        self.t_criteria = t_criteria\n        self.ar_criteria = ar_criteria\n        self.num_bucket = num_bucket\n        get_logger().info(\"Number of buckets: %s\", num_bucket)\n\n    def get_bucket_id(self, T, H, W, frame_interval=1, seed=None):\n        resolution = H * W\n        approx = 0.8\n\n        fail = True\n        for hw_id, t_criteria in self.bucket_probs.items():\n            if resolution < self.hw_criteria[hw_id] * approx:\n                continue\n\n            # if sample is an image\n            if T == 1:\n                if 1 in t_criteria:\n                    rng = np.random.default_rng(seed + self.bucket_id[hw_id][1])\n                    if rng.random() < t_criteria[1]:\n                        fail = False\n                        t_id = 1\n                        break\n                else:\n                    continue\n\n            # otherwise, find suitable t_id for video\n            t_fail = True\n            for t_id, prob in t_criteria.items():\n                rng = np.random.default_rng(seed + self.bucket_id[hw_id][t_id])\n                if isinstance(prob, tuple):\n                    prob_t = prob[1]\n                    if rng.random() > prob_t:\n                        continue\n                if T > t_id * frame_interval and t_id != 1:\n                    t_fail = False\n                    break\n            if t_fail:\n                continue\n\n            # leave the loop if prob is high enough\n            if isinstance(prob, tuple):\n                prob = prob[0]\n            if prob >= 1 or rng.random() < prob:\n                fail = False\n                break\n        if fail:\n            
return None\n\n        # get aspect ratio id\n        ar_criteria = self.ar_criteria[hw_id][t_id]\n        ar_id = get_closest_ratio(H, W, ar_criteria)\n        return hw_id, t_id, ar_id\n\n    def get_thw(self, bucket_id):\n        assert len(bucket_id) == 3\n        T = self.t_criteria[bucket_id[0]][bucket_id[1]]\n        H, W = self.ar_criteria[bucket_id[0]][bucket_id[1]][bucket_id[2]]\n        return T, H, W\n\n    def get_prob(self, bucket_id):\n        return self.bucket_probs[bucket_id[0]][bucket_id[1]]\n\n    def get_batch_size(self, bucket_id):\n        return self.bucket_bs[bucket_id[0]][bucket_id[1]]\n\n    def __len__(self):\n        return self.num_bucket\n\n\ndef closet_smaller_bucket(value, bucket):\n    for i in range(1, len(bucket)):\n        if value < bucket[i]:\n            return bucket[i - 1]\n    return bucket[-1]\n"
  },
  {
    "path": "Open-Sora/opensora/datasets/dataloader.py",
    "content": "import collections\nimport random\nfrom typing import Optional\n\nimport numpy as np\nimport torch\nfrom torch.distributed import ProcessGroup\nfrom torch.distributed.distributed_c10d import _get_default_group\nfrom torch.utils.data import DataLoader\n\nfrom .datasets import BatchFeatureDataset, VariableVideoTextDataset, VideoTextDataset\nfrom .sampler import BatchDistributedSampler, StatefulDistributedSampler, VariableVideoBatchSampler\n\n\n# Deterministic dataloader\ndef get_seed_worker(seed):\n    def seed_worker(worker_id):\n        worker_seed = seed\n        np.random.seed(worker_seed)\n        torch.manual_seed(worker_seed)\n        random.seed(worker_seed)\n\n    return seed_worker\n\n\ndef prepare_dataloader(\n    dataset,\n    batch_size=None,\n    shuffle=False,\n    seed=1024,\n    drop_last=False,\n    pin_memory=False,\n    num_workers=0,\n    process_group: Optional[ProcessGroup] = None,\n    bucket_config=None,\n    num_bucket_build_workers=1,\n    prefetch_factor=None,\n    **kwargs,\n):\n    _kwargs = kwargs.copy()\n    if isinstance(dataset, VariableVideoTextDataset):\n        batch_sampler = VariableVideoBatchSampler(\n            dataset,\n            bucket_config,\n            num_replicas=process_group.size(),\n            rank=process_group.rank(),\n            shuffle=shuffle,\n            seed=seed,\n            drop_last=drop_last,\n            verbose=True,\n            num_bucket_build_workers=num_bucket_build_workers,\n        )\n        return (\n            DataLoader(\n                dataset,\n                batch_sampler=batch_sampler,\n                worker_init_fn=get_seed_worker(seed),\n                pin_memory=pin_memory,\n                num_workers=num_workers,\n                collate_fn=collate_fn_default,\n                prefetch_factor=prefetch_factor,\n                **_kwargs,\n            ),\n            batch_sampler,\n        )\n    elif isinstance(dataset, VideoTextDataset):\n        
process_group = process_group or _get_default_group()\n        sampler = StatefulDistributedSampler(\n            dataset,\n            num_replicas=process_group.size(),\n            rank=process_group.rank(),\n            shuffle=shuffle,\n        )\n        return (\n            DataLoader(\n                dataset,\n                batch_size=batch_size,\n                sampler=sampler,\n                worker_init_fn=get_seed_worker(seed),\n                drop_last=drop_last,\n                pin_memory=pin_memory,\n                num_workers=num_workers,\n                collate_fn=collate_fn_default,\n                prefetch_factor=prefetch_factor,\n                **_kwargs,\n            ),\n            sampler,\n        )\n    elif isinstance(dataset, BatchFeatureDataset):\n        sampler = BatchDistributedSampler(\n            dataset,\n            num_replicas=process_group.size(),\n            rank=process_group.rank(),\n        )\n        return (\n            DataLoader(\n                dataset,\n                batch_size=1,\n                sampler=sampler,\n                worker_init_fn=get_seed_worker(seed),\n                pin_memory=pin_memory,\n                num_workers=num_workers,\n                collate_fn=collate_fn_batch,\n                prefetch_factor=prefetch_factor,\n                **_kwargs,\n            ),\n            sampler,\n        )\n    else:\n        raise ValueError(f\"Unsupported dataset type: {type(dataset)}\")\n\n\ndef collate_fn_default(batch):\n    # filter out None\n    batch = [x for x in batch if x is not None]\n\n    # HACK: for loading text features\n    use_mask = False\n    if \"mask\" in batch[0] and isinstance(batch[0][\"mask\"], int):\n        masks = [x.pop(\"mask\") for x in batch]\n\n        texts = [x.pop(\"text\") for x in batch]\n        texts = torch.cat(texts, dim=1)\n        use_mask = True\n\n    ret = torch.utils.data.default_collate(batch)\n\n    if use_mask:\n        ret[\"mask\"] = 
masks\n        ret[\"text\"] = texts\n    return ret\n\n\ndef collate_fn_batch(batch):\n    \"\"\"\n    Used only with BatchDistributedSampler\n    \"\"\"\n    # filter out None\n    batch = [x for x in batch if x is not None]\n\n    res = torch.utils.data.default_collate(batch)\n\n    # squeeze the first dimension, which is due to torch.stack() in default_collate()\n    if isinstance(res, collections.abc.Mapping):\n        for k, v in res.items():\n            if isinstance(v, torch.Tensor):\n                res[k] = v.squeeze(0)\n    elif isinstance(res, collections.abc.Sequence):\n        res = [x.squeeze(0) if isinstance(x, torch.Tensor) else x for x in res]\n    elif isinstance(res, torch.Tensor):\n        res = res.squeeze(0)\n    else:\n        raise TypeError(f\"Unsupported collated batch type: {type(res)}\")\n\n    return res\n"
  },
  {
    "path": "Open-Sora/opensora/datasets/datasets.py",
    "content": "import os\nfrom glob import glob\n\nimport numpy as np\nimport torch\nfrom PIL import ImageFile\nfrom torchvision.datasets.folder import IMG_EXTENSIONS, pil_loader\n\nfrom opensora.registry import DATASETS\n\nfrom .read_video import read_video\nfrom .utils import VID_EXTENSIONS, get_transforms_image, get_transforms_video, read_file, temporal_random_crop\n\nImageFile.LOAD_TRUNCATED_IMAGES = True\nIMG_FPS = 120\n\n\n@DATASETS.register_module()\nclass VideoTextDataset(torch.utils.data.Dataset):\n    \"\"\"load video according to the csv file.\n\n    Args:\n        target_video_len (int): the number of video frames will be load.\n        align_transform (callable): Align different videos in a specified size.\n        temporal_sample (callable): Sample the target length of a video.\n    \"\"\"\n\n    def __init__(\n        self,\n        data_path=None,\n        num_frames=16,\n        frame_interval=1,\n        image_size=(256, 256),\n        transform_name=\"center\",\n    ):\n        self.data_path = data_path\n        self.data = read_file(data_path)\n        self.get_text = \"text\" in self.data.columns\n        self.num_frames = num_frames\n        self.frame_interval = frame_interval\n        self.image_size = image_size\n        self.transforms = {\n            \"image\": get_transforms_image(transform_name, image_size),\n            \"video\": get_transforms_video(transform_name, image_size),\n        }\n\n    def _print_data_number(self):\n        num_videos = 0\n        num_images = 0\n        for path in self.data[\"path\"]:\n            if self.get_type(path) == \"video\":\n                num_videos += 1\n            else:\n                num_images += 1\n        print(f\"Dataset contains {num_videos} videos and {num_images} images.\")\n\n    def get_type(self, path):\n        ext = os.path.splitext(path)[-1].lower()\n        if ext.lower() in VID_EXTENSIONS:\n            return \"video\"\n        else:\n            assert ext.lower() in 
IMG_EXTENSIONS, f\"Unsupported file format: {ext}\"\n            return \"image\"\n\n    def getitem(self, index):\n        sample = self.data.iloc[index]\n        path = sample[\"path\"]\n        file_type = self.get_type(path)\n\n        if file_type == \"video\":\n            # loading\n            vframes, vinfo = read_video(path, backend=\"av\")\n            video_fps = vinfo[\"video_fps\"] if \"video_fps\" in vinfo else 24\n\n            # Sampling video frames\n            video = temporal_random_crop(vframes, self.num_frames, self.frame_interval)\n\n            # transform\n            transform = self.transforms[\"video\"]\n            video = transform(video)  # T C H W\n        else:\n            # loading\n            image = pil_loader(path)\n            video_fps = IMG_FPS\n\n            # transform\n            transform = self.transforms[\"image\"]\n            image = transform(image)\n\n            # repeat\n            video = image.unsqueeze(0).repeat(self.num_frames, 1, 1, 1)\n\n        # TCHW -> CTHW\n        video = video.permute(1, 0, 2, 3)\n\n        ret = {\"video\": video, \"fps\": video_fps}\n        if self.get_text:\n            ret[\"text\"] = sample[\"text\"]\n        return ret\n\n    def __getitem__(self, index):\n        for _ in range(10):\n            try:\n                return self.getitem(index)\n            except Exception as e:\n                path = self.data.iloc[index][\"path\"]\n                print(f\"data {path}: {e}\")\n                index = np.random.randint(len(self))\n        raise RuntimeError(\"Too many bad data.\")\n\n    def __len__(self):\n        return len(self.data)\n\n\n@DATASETS.register_module()\nclass VariableVideoTextDataset(VideoTextDataset):\n    def __init__(\n        self,\n        data_path=None,\n        num_frames=None,\n        frame_interval=1,\n        image_size=(None, None),\n        transform_name=None,\n        dummy_text_feature=False,\n    ):\n        super().__init__(data_path, 
num_frames, frame_interval, image_size, transform_name=None)\n        self.transform_name = transform_name\n        self.data[\"id\"] = np.arange(len(self.data))\n        self.dummy_text_feature = dummy_text_feature\n\n    def get_data_info(self, index):\n        T = self.data.iloc[index][\"num_frames\"]\n        H = self.data.iloc[index][\"height\"]\n        W = self.data.iloc[index][\"width\"]\n        return T, H, W\n\n    def getitem(self, index):\n        # a hack to pass in the (time, height, width) info from sampler\n        index, num_frames, height, width = [int(val) for val in index.split(\"-\")]\n\n        sample = self.data.iloc[index]\n        path = sample[\"path\"]\n        file_type = self.get_type(path)\n        ar = height / width\n\n        video_fps = 24  # default fps\n        if file_type == \"video\":\n            # loading\n            vframes, vinfo = read_video(path, backend=\"av\")\n            video_fps = vinfo[\"video_fps\"] if \"video_fps\" in vinfo else 24\n\n            # Sampling video frames\n            video = temporal_random_crop(vframes, num_frames, self.frame_interval)\n            video = video.clone()\n            del vframes\n\n            video_fps = video_fps // self.frame_interval\n\n            # transform\n            transform = get_transforms_video(self.transform_name, (height, width))\n            video = transform(video)  # T C H W\n        else:\n            # loading\n            image = pil_loader(path)\n            video_fps = IMG_FPS\n\n            # transform\n            transform = get_transforms_image(self.transform_name, (height, width))\n            image = transform(image)\n\n            # repeat\n            video = image.unsqueeze(0)\n\n        # TCHW -> CTHW\n        video = video.permute(1, 0, 2, 3)\n        ret = {\n            \"video\": video,\n            \"num_frames\": num_frames,\n            \"height\": height,\n            \"width\": width,\n            \"ar\": ar,\n            \"fps\": 
video_fps,\n        }\n        if self.get_text:\n            ret[\"text\"] = sample[\"text\"]\n        if self.dummy_text_feature:\n            text_len = 50\n            ret[\"text\"] = torch.zeros((1, text_len, 1152))\n            ret[\"mask\"] = text_len\n        return ret\n\n    def __getitem__(self, index):\n        try:\n            return self.getitem(index)\n        except Exception:\n            # bad samples are returned as None and filtered out in the collate function\n            return None\n\n\n@DATASETS.register_module()\nclass BatchFeatureDataset(torch.utils.data.Dataset):\n    \"\"\"\n    The dataset is composed of multiple .bin files.\n    Each .bin file is a list of batch data (like a buffer). All .bin files have the same length.\n    In each training iteration, one batch is fetched from the current buffer.\n    Once a buffer is consumed, load another one.\n    Avoid loading the same .bin on two different GPUs, i.e., one .bin is assigned to one GPU only.\n    \"\"\"\n\n    def __init__(self, data_path=None):\n        self.path_list = sorted(glob(data_path + \"/**/*.bin\"))\n\n        self._len_buffer = len(torch.load(self.path_list[0]))\n        self._num_buffers = len(self.path_list)\n        self.num_samples = self.len_buffer * len(self.path_list)\n\n        self.cur_file_idx = -1\n        self.cur_buffer = None\n\n    @property\n    def num_buffers(self):\n        return self._num_buffers\n\n    @property\n    def len_buffer(self):\n        return self._len_buffer\n\n    def _load_buffer(self, idx):\n        file_idx = idx // self.len_buffer\n        if file_idx != self.cur_file_idx:\n            self.cur_file_idx = file_idx\n            self.cur_buffer = torch.load(self.path_list[file_idx])\n\n    def __len__(self):\n        return self.num_samples\n\n    def __getitem__(self, idx):\n        self._load_buffer(idx)\n\n        batch = self.cur_buffer[idx % self.len_buffer]  # dict; keys are {'x', 'fps'} and text related\n\n        ret = {\n            \"video\": batch[\"x\"],\n            \"text\": batch[\"y\"],\n            \"mask\": 
batch[\"mask\"],\n            \"fps\": batch[\"fps\"],\n            \"height\": batch[\"height\"],\n            \"width\": batch[\"width\"],\n            \"num_frames\": batch[\"num_frames\"],\n        }\n        return ret\n"
  },
  {
    "path": "Open-Sora/opensora/datasets/read_video.py",
    "content": "import gc\nimport math\nimport os\nimport re\nimport warnings\nfrom fractions import Fraction\nfrom typing import Any, Dict, List, Optional, Tuple, Union\n\nimport av\nimport cv2\nimport numpy as np\nimport torch\nfrom torchvision import get_video_backend\nfrom torchvision.io.video import _check_av_available\n\nMAX_NUM_FRAMES = 2500\n\n\ndef read_video_av(\n    filename: str,\n    start_pts: Union[float, Fraction] = 0,\n    end_pts: Optional[Union[float, Fraction]] = None,\n    pts_unit: str = \"pts\",\n    output_format: str = \"THWC\",\n) -> Tuple[torch.Tensor, torch.Tensor, Dict[str, Any]]:\n    \"\"\"\n    Reads a video from a file, returning both the video frames and the audio frames\n\n    This method is modified from torchvision.io.video.read_video, with the following changes:\n\n    1. will not extract audio frames and return empty for aframes\n    2. remove checks and only support pyav\n    3. add container.close() and gc.collect() to avoid thread leakage\n    4. try our best to avoid memory leak\n\n    Args:\n        filename (str): path to the video file\n        start_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional):\n            The start presentation time of the video\n        end_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional):\n            The end presentation time\n        pts_unit (str, optional): unit in which start_pts and end_pts values will be interpreted,\n            either 'pts' or 'sec'. Defaults to 'pts'.\n        output_format (str, optional): The format of the output video tensors. Can be either \"THWC\" (default) or \"TCHW\".\n\n    Returns:\n        vframes (Tensor[T, H, W, C] or Tensor[T, C, H, W]): the `T` video frames\n        aframes (Tensor[K, L]): the audio frames, where `K` is the number of channels and `L` is the number of points\n        info (Dict): metadata for the video and audio. 
Can contain the fields video_fps (float) and audio_fps (int)\n    \"\"\"\n    # format\n    output_format = output_format.upper()\n    if output_format not in (\"THWC\", \"TCHW\"):\n        raise ValueError(f\"output_format should be either 'THWC' or 'TCHW', got {output_format}.\")\n    # file existence\n    if not os.path.exists(filename):\n        raise RuntimeError(f\"File not found: {filename}\")\n    # backend check\n    assert get_video_backend() == \"pyav\", \"pyav backend is required for read_video_av\"\n    _check_av_available()\n    # end_pts check\n    if end_pts is None:\n        end_pts = float(\"inf\")\n    if end_pts < start_pts:\n        raise ValueError(f\"end_pts should be larger than start_pts, got start_pts={start_pts} and end_pts={end_pts}\")\n\n    # == get video info ==\n    info = {}\n    # TODO: creating an container leads to memory leak (1G for 8 workers 1 GPU)\n    container = av.open(filename, metadata_errors=\"ignore\")\n    # fps\n    video_fps = container.streams.video[0].average_rate\n    # guard against potentially corrupted files\n    if video_fps is not None:\n        info[\"video_fps\"] = float(video_fps)\n    iter_video = container.decode(**{\"video\": 0})\n    frame = next(iter_video).to_rgb().to_ndarray()\n    height, width = frame.shape[:2]\n    total_frames = container.streams.video[0].frames\n    if total_frames == 0:\n        total_frames = MAX_NUM_FRAMES\n        warnings.warn(f\"total_frames is 0, using {MAX_NUM_FRAMES} as a fallback\")\n    container.close()\n    del container\n\n    # HACK: must create before iterating stream\n    # use np.zeros will not actually allocate memory\n    # use np.ones will lead to a little memory leak\n    video_frames = np.zeros((total_frames, height, width, 3), dtype=np.uint8)\n\n    # == read ==\n    try:\n        # TODO: The reading has memory leak (4G for 8 workers 1 GPU)\n        container = av.open(filename, metadata_errors=\"ignore\")\n        assert container.streams.video is not 
None\n        video_frames = _read_from_stream(\n            video_frames,\n            container,\n            start_pts,\n            end_pts,\n            pts_unit,\n            container.streams.video[0],\n            {\"video\": 0},\n            filename=filename,\n        )\n    except av.AVError as e:\n        print(f\"[Warning] Error while reading video {filename}: {e}\")\n\n    vframes = torch.from_numpy(video_frames).clone()\n    del video_frames\n    if output_format == \"TCHW\":\n        # [T,H,W,C] --> [T,C,H,W]\n        vframes = vframes.permute(0, 3, 1, 2)\n\n    aframes = torch.empty((1, 0), dtype=torch.float32)\n    return vframes, aframes, info\n\n\ndef _read_from_stream(\n    video_frames,\n    container: \"av.container.Container\",\n    start_offset: float,\n    end_offset: float,\n    pts_unit: str,\n    stream: \"av.stream.Stream\",\n    stream_name: Dict[str, Optional[Union[int, Tuple[int, ...], List[int]]]],\n    filename: Optional[str] = None,\n) -> List[\"av.frame.Frame\"]:\n    if pts_unit == \"sec\":\n        # TODO: we should change all of this from ground up to simply take\n        # sec and convert to MS in C++\n        start_offset = int(math.floor(start_offset * (1 / stream.time_base)))\n        if end_offset != float(\"inf\"):\n            end_offset = int(math.ceil(end_offset * (1 / stream.time_base)))\n    else:\n        warnings.warn(\"The pts_unit 'pts' gives wrong results. 
Please use pts_unit 'sec'.\")\n\n    should_buffer = True\n    max_buffer_size = 5\n    if stream.type == \"video\":\n        # DivX-style packed B-frames can have out-of-order pts (2 frames in a single pkt)\n        # so need to buffer some extra frames to sort everything\n        # properly\n        extradata = stream.codec_context.extradata\n        # overly complicated way of finding if `divx_packed` is set, following\n        # https://github.com/FFmpeg/FFmpeg/commit/d5a21172283572af587b3d939eba0091484d3263\n        if extradata and b\"DivX\" in extradata:\n            # can't use regex directly because of some weird characters sometimes...\n            pos = extradata.find(b\"DivX\")\n            d = extradata[pos:]\n            o = re.search(rb\"DivX(\\d+)Build(\\d+)(\\w)\", d)\n            if o is None:\n                o = re.search(rb\"DivX(\\d+)b(\\d+)(\\w)\", d)\n            if o is not None:\n                should_buffer = o.group(3) == b\"p\"\n    seek_offset = start_offset\n    # some files don't seek to the right location, so better be safe here\n    seek_offset = max(seek_offset - 1, 0)\n    if should_buffer:\n        # FIXME this is kind of a hack, but we will jump to the previous keyframe\n        # so this will be safe\n        seek_offset = max(seek_offset - max_buffer_size, 0)\n    try:\n        # TODO check if stream needs to always be the video stream here or not\n        container.seek(seek_offset, any_frame=False, backward=True, stream=stream)\n    except av.AVError as e:\n        print(f\"[Warning] Error while seeking video {filename}: {e}\")\n        return []\n\n    # == main ==\n    buffer_count = 0\n    frames_pts = []\n    cnt = 0\n    try:\n        for _idx, frame in enumerate(container.decode(**stream_name)):\n            frames_pts.append(frame.pts)\n            video_frames[cnt] = frame.to_rgb().to_ndarray()\n            cnt += 1\n            if cnt >= len(video_frames):\n                break\n            if frame.pts >= 
end_offset:\n                if should_buffer and buffer_count < max_buffer_size:\n                    buffer_count += 1\n                    continue\n                break\n    except av.AVError as e:\n        print(f\"[Warning] Error while reading video {filename}: {e}\")\n\n    # garbage collection for thread leakage\n    container.close()\n    del container\n    # NOTE: manually garbage collect to close pyav threads\n    gc.collect()\n\n    # ensure that the results are sorted wrt the pts\n    # NOTE: here we assert frames_pts is sorted\n    start_ptr = 0\n    end_ptr = cnt\n    while start_ptr < end_ptr and frames_pts[start_ptr] < start_offset:\n        start_ptr += 1\n    while start_ptr < end_ptr and frames_pts[end_ptr - 1] > end_offset:\n        end_ptr -= 1\n    if start_offset > 0 and start_offset not in frames_pts[start_ptr:end_ptr]:\n        # if there is no frame that exactly matches the pts of start_offset\n        # add the last frame smaller than start_offset, to guarantee that\n        # we will have all the necessary data. 
This is most useful for audio\n        if start_ptr > 0:\n            start_ptr -= 1\n    result = video_frames[start_ptr:end_ptr].copy()\n    return result\n\n\ndef read_video_cv2(video_path):\n    cap = cv2.VideoCapture(video_path)\n\n    if not cap.isOpened():\n        raise ValueError(f\"Unable to open video: {video_path}\")\n    else:\n        fps = cap.get(cv2.CAP_PROP_FPS)\n        vinfo = {\n            \"video_fps\": fps,\n        }\n\n        frames = []\n        while True:\n            # Read a frame from the video\n            ret, frame = cap.read()\n\n            # If frame is not read correctly, break the loop\n            if not ret:\n                break\n\n            frames.append(frame[:, :, ::-1])  # BGR to RGB\n\n        # Release the video capture object\n        cap.release()\n\n        frames = np.stack(frames)\n        frames = torch.from_numpy(frames)  # [T, H, W, C=3]\n        frames = frames.permute(0, 3, 1, 2)  # [T, H, W, C] -> [T, C, H, W]\n        return frames, vinfo\n\n\ndef read_video(video_path, backend=\"av\"):\n    if backend == \"cv2\":\n        vframes, vinfo = read_video_cv2(video_path)\n    elif backend == \"av\":\n        vframes, _, vinfo = read_video_av(filename=video_path, pts_unit=\"sec\", output_format=\"TCHW\")\n    else:\n        raise ValueError(f\"Unsupported backend: {backend}\")\n\n    return vframes, vinfo\n"
  },
  {
    "path": "Open-Sora/opensora/datasets/sampler.py",
    "content": "from collections import OrderedDict, defaultdict\nfrom pprint import pformat\nfrom typing import Iterator, List, Optional\n\nimport numpy as np\nimport torch\nimport torch.distributed as dist\nfrom torch.utils.data import Dataset, DistributedSampler\n\nfrom opensora.utils.misc import format_numel_str, get_logger\n\nfrom .aspect import get_num_pixels\nfrom .bucket import Bucket\nfrom .datasets import VariableVideoTextDataset\n\n\n# use pandarallel to accelerate bucket processing\n# NOTE: pandarallel should only access local variables\ndef apply(data, method=None, frame_interval=None, seed=None, num_bucket=None):\n    return method(\n        data[\"num_frames\"],\n        data[\"height\"],\n        data[\"width\"],\n        frame_interval,\n        seed + data[\"id\"] * num_bucket,\n    )\n\n\nclass StatefulDistributedSampler(DistributedSampler):\n    def __init__(\n        self,\n        dataset: Dataset,\n        num_replicas: Optional[int] = None,\n        rank: Optional[int] = None,\n        shuffle: bool = True,\n        seed: int = 0,\n        drop_last: bool = False,\n    ) -> None:\n        super().__init__(dataset, num_replicas, rank, shuffle, seed, drop_last)\n        self.start_index: int = 0\n\n    def __iter__(self) -> Iterator:\n        iterator = super().__iter__()\n        indices = list(iterator)\n        indices = indices[self.start_index :]\n        return iter(indices)\n\n    def __len__(self) -> int:\n        return self.num_samples - self.start_index\n\n    def reset(self) -> None:\n        self.start_index = 0\n\n    def state_dict(self, step) -> dict:\n        return {\"start_index\": step}\n\n    def load_state_dict(self, state_dict: dict) -> None:\n        self.__dict__.update(state_dict)\n\n\nclass VariableVideoBatchSampler(DistributedSampler):\n    def __init__(\n        self,\n        dataset: VariableVideoTextDataset,\n        bucket_config: dict,\n        num_replicas: Optional[int] = None,\n        rank: Optional[int] = 
None,\n        shuffle: bool = True,\n        seed: int = 0,\n        drop_last: bool = False,\n        verbose: bool = False,\n        num_bucket_build_workers: int = 1,\n    ) -> None:\n        super().__init__(\n            dataset=dataset, num_replicas=num_replicas, rank=rank, shuffle=shuffle, seed=seed, drop_last=drop_last\n        )\n        self.dataset = dataset\n        self.bucket = Bucket(bucket_config)\n        self.verbose = verbose\n        self.last_micro_batch_access_index = 0\n        self.approximate_num_batch = None\n\n        self._get_num_batch_cached_bucket_sample_dict = None\n        self.num_bucket_build_workers = num_bucket_build_workers\n\n    def __iter__(self) -> Iterator[List[int]]:\n        if self._get_num_batch_cached_bucket_sample_dict is not None:\n            bucket_sample_dict = self._get_num_batch_cached_bucket_sample_dict\n            self._get_num_batch_cached_bucket_sample_dict = None\n        else:\n            bucket_sample_dict = self.group_by_bucket()\n            if self.verbose:\n                self._print_bucket_info(bucket_sample_dict)\n\n        g = torch.Generator()\n        g.manual_seed(self.seed + self.epoch)\n        bucket_micro_batch_count = OrderedDict()\n        bucket_last_consumed = OrderedDict()\n\n        # process the samples\n        for bucket_id, data_list in bucket_sample_dict.items():\n            # handle droplast\n            bs_per_gpu = self.bucket.get_batch_size(bucket_id)\n            remainder = len(data_list) % bs_per_gpu\n\n            if remainder > 0:\n                if not self.drop_last:\n                    # if there is remainder, we pad to make it divisible\n                    data_list += data_list[: bs_per_gpu - remainder]\n                else:\n                    # we just drop the remainder to make it divisible\n                    data_list = data_list[:-remainder]\n            bucket_sample_dict[bucket_id] = data_list\n\n            # handle shuffle\n            if 
self.shuffle:\n                data_indices = torch.randperm(len(data_list), generator=g).tolist()\n                data_list = [data_list[i] for i in data_indices]\n                bucket_sample_dict[bucket_id] = data_list\n\n            # compute how many micro-batches each bucket has\n            num_micro_batches = len(data_list) // bs_per_gpu\n            bucket_micro_batch_count[bucket_id] = num_micro_batches\n\n        # compute the bucket access order\n        # each bucket may have more than one batch of data\n        # thus bucket_id may appear more than 1 time\n        bucket_id_access_order = []\n        for bucket_id, num_micro_batch in bucket_micro_batch_count.items():\n            bucket_id_access_order.extend([bucket_id] * num_micro_batch)\n\n        # randomize the access order\n        if self.shuffle:\n            bucket_id_access_order_indices = torch.randperm(len(bucket_id_access_order), generator=g).tolist()\n            bucket_id_access_order = [bucket_id_access_order[i] for i in bucket_id_access_order_indices]\n\n        # make the number of bucket accesses divisible by dp size\n        remainder = len(bucket_id_access_order) % self.num_replicas\n        if remainder > 0:\n            if self.drop_last:\n                bucket_id_access_order = bucket_id_access_order[: len(bucket_id_access_order) - remainder]\n            else:\n                bucket_id_access_order += bucket_id_access_order[: self.num_replicas - remainder]\n\n        # prepare each batch from its bucket\n        # according to the predefined bucket access order\n        num_iters = len(bucket_id_access_order) // self.num_replicas\n        start_iter_idx = self.last_micro_batch_access_index // self.num_replicas\n\n        # re-compute the micro-batch consumption\n        # this is useful when resuming from a state dict with a different number of GPUs\n        self.last_micro_batch_access_index = start_iter_idx * self.num_replicas\n        for i in 
range(self.last_micro_batch_access_index):\n            bucket_id = bucket_id_access_order[i]\n            bucket_bs = self.bucket.get_batch_size(bucket_id)\n            if bucket_id in bucket_last_consumed:\n                bucket_last_consumed[bucket_id] += bucket_bs\n            else:\n                bucket_last_consumed[bucket_id] = bucket_bs\n\n        for i in range(start_iter_idx, num_iters):\n            bucket_access_list = bucket_id_access_order[i * self.num_replicas : (i + 1) * self.num_replicas]\n            self.last_micro_batch_access_index += self.num_replicas\n\n            # compute the data samples consumed by each access\n            bucket_access_boundaries = []\n            for bucket_id in bucket_access_list:\n                bucket_bs = self.bucket.get_batch_size(bucket_id)\n                last_consumed_index = bucket_last_consumed.get(bucket_id, 0)\n                bucket_access_boundaries.append([last_consumed_index, last_consumed_index + bucket_bs])\n\n                # update consumption\n                if bucket_id in bucket_last_consumed:\n                    bucket_last_consumed[bucket_id] += bucket_bs\n                else:\n                    bucket_last_consumed[bucket_id] = bucket_bs\n\n            # compute the range of data accessed by each GPU\n            bucket_id = bucket_access_list[self.rank]\n            boundary = bucket_access_boundaries[self.rank]\n            cur_micro_batch = bucket_sample_dict[bucket_id][boundary[0] : boundary[1]]\n\n            # encode t, h, w into the sample index\n            real_t, real_h, real_w = self.bucket.get_thw(bucket_id)\n            cur_micro_batch = [f\"{idx}-{real_t}-{real_h}-{real_w}\" for idx in cur_micro_batch]\n            yield cur_micro_batch\n\n        self.reset()\n\n    def __len__(self) -> int:\n        return self.get_num_batch() // dist.get_world_size()\n\n    def group_by_bucket(self) -> dict:\n        bucket_sample_dict = OrderedDict()\n\n        from pandarallel 
import pandarallel\n\n        pandarallel.initialize(nb_workers=self.num_bucket_build_workers, progress_bar=False)\n        get_logger().info(\"Building buckets...\")\n        bucket_ids = self.dataset.data.parallel_apply(\n            apply,\n            axis=1,\n            method=self.bucket.get_bucket_id,\n            frame_interval=self.dataset.frame_interval,\n            seed=self.seed + self.epoch,\n            num_bucket=self.bucket.num_bucket,\n        )\n\n        # group by bucket\n        # each data sample is put into a bucket with a similar image/video size\n        for i in range(len(self.dataset)):\n            bucket_id = bucket_ids[i]\n            if bucket_id is None:\n                continue\n            if bucket_id not in bucket_sample_dict:\n                bucket_sample_dict[bucket_id] = []\n            bucket_sample_dict[bucket_id].append(i)\n        return bucket_sample_dict\n\n    def get_num_batch(self) -> int:\n        bucket_sample_dict = self.group_by_bucket()\n        self._get_num_batch_cached_bucket_sample_dict = bucket_sample_dict\n\n        # calculate the number of batches\n        if self.verbose:\n            self._print_bucket_info(bucket_sample_dict)\n        return self.approximate_num_batch\n\n    def _print_bucket_info(self, bucket_sample_dict: dict) -> None:\n        # collect statistics\n        total_samples = 0\n        total_batch = 0\n        num_aspect_dict = defaultdict(lambda: [0, 0])\n        num_hwt_dict = defaultdict(lambda: [0, 0])\n        for k, v in bucket_sample_dict.items():\n            size = len(v)\n            num_batch = size // self.bucket.get_batch_size(k[:-1])\n\n            total_samples += size\n            total_batch += num_batch\n\n            num_aspect_dict[k[-1]][0] += size\n            num_aspect_dict[k[-1]][1] += num_batch\n            num_hwt_dict[k[:-1]][0] += size\n            num_hwt_dict[k[:-1]][1] += num_batch\n\n        # sort\n        num_aspect_dict = 
dict(sorted(num_aspect_dict.items(), key=lambda x: x[0]))\n        num_hwt_dict = dict(\n            sorted(num_hwt_dict.items(), key=lambda x: (get_num_pixels(x[0][0]), x[0][1]), reverse=True)\n        )\n        num_hwt_img_dict = {k: v for k, v in num_hwt_dict.items() if k[1] == 1}\n        num_hwt_vid_dict = {k: v for k, v in num_hwt_dict.items() if k[1] > 1}\n\n        # log\n        if dist.get_rank() == 0 and self.verbose:\n            get_logger().info(\"Bucket Info:\")\n            get_logger().info(\n                \"Bucket [#sample, #batch] by aspect ratio:\\n%s\", pformat(num_aspect_dict, sort_dicts=False)\n            )\n            get_logger().info(\n                \"Image Bucket [#sample, #batch] by HxWxT:\\n%s\", pformat(num_hwt_img_dict, sort_dicts=False)\n            )\n            get_logger().info(\n                \"Video Bucket [#sample, #batch] by HxWxT:\\n%s\", pformat(num_hwt_vid_dict, sort_dicts=False)\n            )\n            get_logger().info(\n                \"#training batch: %s, #training sample: %s, #non empty bucket: %s\",\n                format_numel_str(total_batch),\n                format_numel_str(total_samples),\n                len(bucket_sample_dict),\n            )\n        self.approximate_num_batch = total_batch\n\n    def reset(self):\n        self.last_micro_batch_access_index = 0\n\n    def state_dict(self, num_steps: int) -> dict:\n        # the last_micro_batch_access_index in the __iter__ is often\n        # not accurate during multi-workers and data prefetching\n        # thus, we need the user to pass the actual steps which have been executed\n        # to calculate the correct last_micro_batch_access_index\n        return {\"seed\": self.seed, \"epoch\": self.epoch, \"last_micro_batch_access_index\": num_steps * self.num_replicas}\n\n    def load_state_dict(self, state_dict: dict) -> None:\n        self.__dict__.update(state_dict)\n\n\nclass BatchDistributedSampler(DistributedSampler):\n    \"\"\"\n    
Used with BatchDataset;\n    Suppose len_buffer == 5, num_buffers == 6, #GPUs == 3, then\n           | buffer {i}          | buffer {i+1}\n    ------ | ------------------- | -------------------\n    rank 0 |  0,  1,  2,  3,  4, |  5,  6,  7,  8,  9\n    rank 1 | 10, 11, 12, 13, 14, | 15, 16, 17, 18, 19\n    rank 2 | 20, 21, 22, 23, 24, | 25, 26, 27, 28, 29\n    \"\"\"\n\n    def __init__(self, dataset: Dataset, **kwargs):\n        super().__init__(dataset, **kwargs)\n        self.start_index = 0\n\n    def __iter__(self):\n        num_buffers = self.dataset.num_buffers\n        len_buffer = self.dataset.len_buffer\n        num_buffers_i = num_buffers // self.num_replicas\n        num_samples_i = len_buffer * num_buffers_i\n\n        indices_i = np.arange(self.start_index, num_samples_i) + self.rank * num_samples_i\n        indices_i = indices_i.tolist()\n\n        return iter(indices_i)\n\n    def reset(self):\n        self.start_index = 0\n\n    def state_dict(self, step) -> dict:\n        return {\"start_index\": step}\n\n    def load_state_dict(self, state_dict: dict):\n        self.start_index = state_dict[\"start_index\"] + 1\n"
  },
  {
    "path": "Open-Sora/opensora/datasets/utils.py",
    "content": "import os\nimport re\n\nimport numpy as np\nimport pandas as pd\nimport requests\nimport torch\nimport torchvision\nimport torchvision.transforms as transforms\nfrom PIL import Image\nfrom torchvision.datasets.folder import IMG_EXTENSIONS, pil_loader\nfrom torchvision.io import write_video\nfrom torchvision.utils import save_image\n\nfrom . import video_transforms\n\nVID_EXTENSIONS = (\".mp4\", \".avi\", \".mov\", \".mkv\")\n\nregex = re.compile(\n    r\"^(?:http|ftp)s?://\"  # http:// or https://\n    r\"(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\\.)+(?:[A-Z]{2,6}\\.?|[A-Z0-9-]{2,}\\.?)|\"  # domain...\n    r\"localhost|\"  # localhost...\n    r\"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\"  # ...or ip\n    r\"(?::\\d+)?\"  # optional port\n    r\"(?:/?|[/?]\\S+)$\",\n    re.IGNORECASE,\n)\n\n\ndef is_img(path):\n    ext = os.path.splitext(path)[-1].lower()\n    return ext in IMG_EXTENSIONS\n\n\ndef is_vid(path):\n    ext = os.path.splitext(path)[-1].lower()\n    return ext in VID_EXTENSIONS\n\n\ndef is_url(url):\n    return re.match(regex, url) is not None\n\n\ndef read_file(input_path):\n    if input_path.endswith(\".csv\"):\n        return pd.read_csv(input_path)\n    elif input_path.endswith(\".parquet\"):\n        return pd.read_parquet(input_path)\n    else:\n        raise NotImplementedError(f\"Unsupported file format: {input_path}\")\n\n\ndef download_url(input_path):\n    output_dir = \"cache\"\n    os.makedirs(output_dir, exist_ok=True)\n    base_name = os.path.basename(input_path)\n    output_path = os.path.join(output_dir, base_name)\n    img_data = requests.get(input_path).content\n    with open(output_path, \"wb\") as handler:\n        handler.write(img_data)\n    print(f\"URL {input_path} downloaded to {output_path}\")\n    return output_path\n\n\ndef temporal_random_crop(vframes, num_frames, frame_interval):\n    temporal_sample = video_transforms.TemporalRandomCrop(num_frames * frame_interval)\n    total_frames = len(vframes)\n    
start_frame_ind, end_frame_ind = temporal_sample(total_frames)\n    assert (\n        end_frame_ind - start_frame_ind >= num_frames\n    ), f\"Not enough frames to sample, {end_frame_ind} - {start_frame_ind} < {num_frames}\"\n    frame_indice = np.linspace(start_frame_ind, end_frame_ind - 1, num_frames, dtype=int)\n    video = vframes[frame_indice]\n    return video\n\n\ndef get_transforms_video(name=\"center\", image_size=(256, 256)):\n    if name is None:\n        return None\n    elif name == \"center\":\n        assert image_size[0] == image_size[1], \"image_size must be square for center crop\"\n        transform_video = transforms.Compose(\n            [\n                video_transforms.ToTensorVideo(),  # TCHW\n                # video_transforms.RandomHorizontalFlipVideo(),\n                video_transforms.UCFCenterCropVideo(image_size[0]),\n                transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], inplace=True),\n            ]\n        )\n    elif name == \"resize_crop\":\n        transform_video = transforms.Compose(\n            [\n                video_transforms.ToTensorVideo(),  # TCHW\n                video_transforms.ResizeCrop(image_size),\n                transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], inplace=True),\n            ]\n        )\n    else:\n        raise NotImplementedError(f\"Transform {name} not implemented\")\n    return transform_video\n\n\ndef get_transforms_image(name=\"center\", image_size=(256, 256)):\n    if name is None:\n        return None\n    elif name == \"center\":\n        assert image_size[0] == image_size[1], \"Image size must be square for center crop\"\n        transform = transforms.Compose(\n            [\n                transforms.Lambda(lambda pil_image: center_crop_arr(pil_image, image_size[0])),\n                # transforms.RandomHorizontalFlip(),\n                transforms.ToTensor(),\n                transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], 
inplace=True),\n            ]\n        )\n    elif name == \"resize_crop\":\n        transform = transforms.Compose(\n            [\n                transforms.Lambda(lambda pil_image: resize_crop_to_fill(pil_image, image_size)),\n                transforms.ToTensor(),\n                transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], inplace=True),\n            ]\n        )\n    else:\n        raise NotImplementedError(f\"Transform {name} not implemented\")\n    return transform\n\n\ndef read_image_from_path(path, transform=None, transform_name=\"center\", num_frames=1, image_size=(256, 256)):\n    image = pil_loader(path)\n    if transform is None:\n        transform = get_transforms_image(image_size=image_size, name=transform_name)\n    image = transform(image)\n    video = image.unsqueeze(0).repeat(num_frames, 1, 1, 1)\n    video = video.permute(1, 0, 2, 3)\n    return video\n\n\ndef read_video_from_path(path, transform=None, transform_name=\"center\", image_size=(256, 256)):\n    vframes, aframes, info = torchvision.io.read_video(filename=path, pts_unit=\"sec\", output_format=\"TCHW\")\n    if transform is None:\n        transform = get_transforms_video(image_size=image_size, name=transform_name)\n    video = transform(vframes)  # T C H W\n    video = video.permute(1, 0, 2, 3)\n    return video\n\n\ndef read_from_path(path, image_size, transform_name=\"center\"):\n    if is_url(path):\n        path = download_url(path)\n    ext = os.path.splitext(path)[-1].lower()\n    if ext.lower() in VID_EXTENSIONS:\n        return read_video_from_path(path, image_size=image_size, transform_name=transform_name)\n    else:\n        assert ext.lower() in IMG_EXTENSIONS, f\"Unsupported file format: {ext}\"\n        return read_image_from_path(path, image_size=image_size, transform_name=transform_name)\n\n\ndef save_sample(x, save_path=None, fps=8, normalize=True, value_range=(-1, 1), force_video=False, verbose=True):\n    \"\"\"\n    Args:\n        x (Tensor): 
shape [C, T, H, W]\n    \"\"\"\n    assert x.ndim == 4\n\n    if not force_video and x.shape[1] == 1:  # T = 1: save as image\n        save_path += \".png\"\n        x = x.squeeze(1)\n        save_image([x], save_path, normalize=normalize, value_range=value_range)\n    else:\n        save_path += \".mp4\"\n        if normalize:\n            low, high = value_range\n            x.clamp_(min=low, max=high)\n            x.sub_(low).div_(max(high - low, 1e-5))\n\n        x = x.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 3, 0).to(\"cpu\", torch.uint8)\n        write_video(save_path, x, fps=fps, video_codec=\"h264\")\n    if verbose:\n        print(f\"Saved to {save_path}\")\n    return save_path\n\n\ndef center_crop_arr(pil_image, image_size):\n    \"\"\"\n    Center cropping implementation from ADM.\n    https://github.com/openai/guided-diffusion/blob/8fb3ad9197f16bbc40620447b2742e13458d2831/guided_diffusion/image_datasets.py#L126\n    \"\"\"\n    while min(*pil_image.size) >= 2 * image_size:\n        pil_image = pil_image.resize(tuple(x // 2 for x in pil_image.size), resample=Image.BOX)\n\n    scale = image_size / min(*pil_image.size)\n    pil_image = pil_image.resize(tuple(round(x * scale) for x in pil_image.size), resample=Image.BICUBIC)\n\n    arr = np.array(pil_image)\n    crop_y = (arr.shape[0] - image_size) // 2\n    crop_x = (arr.shape[1] - image_size) // 2\n    return Image.fromarray(arr[crop_y : crop_y + image_size, crop_x : crop_x + image_size])\n\n\ndef resize_crop_to_fill(pil_image, image_size):\n    w, h = pil_image.size  # PIL is (W, H)\n    th, tw = image_size\n    rh, rw = th / h, tw / w\n    if rh > rw:\n        sh, sw = th, round(w * rh)\n        image = pil_image.resize((sw, sh), Image.BICUBIC)\n        i = 0\n        j = int(round((sw - tw) / 2.0))\n    else:\n        sh, sw = round(h * rw), tw\n        image = pil_image.resize((sw, sh), Image.BICUBIC)\n        i = int(round((sh - th) / 2.0))\n        j = 0\n    arr = np.array(image)\n    
assert i + th <= arr.shape[0] and j + tw <= arr.shape[1]\n    return Image.fromarray(arr[i : i + th, j : j + tw])\n"
  },
  {
    "path": "Open-Sora/opensora/datasets/video_transforms.py",
    "content": "# Copyright 2024 Vchitect/Latte\n\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n\n#     http://www.apache.org/licenses/LICENSE-2.0\n\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.# Modified from Latte\n\n# - This file is adapted from https://github.com/Vchitect/Latte/blob/main/datasets/video_transforms.py\n\n\nimport numbers\nimport random\n\nimport numpy as np\nimport torch\n\n\ndef _is_tensor_video_clip(clip):\n    if not torch.is_tensor(clip):\n        raise TypeError(\"clip should be Tensor. Got %s\" % type(clip))\n\n    if not clip.ndimension() == 4:\n        raise ValueError(\"clip should be 4D. Got %dD\" % clip.dim())\n\n    return True\n\n\ndef crop(clip, i, j, h, w):\n    \"\"\"\n    Args:\n        clip (torch.tensor): Video clip to be cropped. 
Size is (T, C, H, W)\n    \"\"\"\n    if len(clip.size()) != 4:\n        raise ValueError(\"clip should be a 4D tensor\")\n    return clip[..., i : i + h, j : j + w]\n\n\ndef resize(clip, target_size, interpolation_mode):\n    if len(target_size) != 2:\n        raise ValueError(f\"target size should be tuple (height, width), instead got {target_size}\")\n    return torch.nn.functional.interpolate(clip, size=target_size, mode=interpolation_mode, align_corners=False)\n\n\ndef resize_scale(clip, target_size, interpolation_mode):\n    if len(target_size) != 2:\n        raise ValueError(f\"target size should be tuple (height, width), instead got {target_size}\")\n    H, W = clip.size(-2), clip.size(-1)\n    scale_ = target_size[0] / min(H, W)\n    return torch.nn.functional.interpolate(clip, scale_factor=scale_, mode=interpolation_mode, align_corners=False)\n\n\ndef resized_crop(clip, i, j, h, w, size, interpolation_mode=\"bilinear\"):\n    \"\"\"\n    Do spatial cropping and resizing to the video clip\n    Args:\n        clip (torch.tensor): Video clip to be cropped. Size is (T, C, H, W)\n        i (int): i in (i,j) i.e coordinates of the upper left corner.\n        j (int): j in (i,j) i.e coordinates of the upper left corner.\n        h (int): Height of the cropped region.\n        w (int): Width of the cropped region.\n        size (tuple(int, int)): height and width of resized clip\n    Returns:\n        clip (torch.tensor): Resized and cropped clip. 
Size is (T, C, H, W)\n    \"\"\"\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    clip = crop(clip, i, j, h, w)\n    clip = resize(clip, size, interpolation_mode)\n    return clip\n\n\ndef center_crop(clip, crop_size):\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    h, w = clip.size(-2), clip.size(-1)\n    th, tw = crop_size\n    if h < th or w < tw:\n        raise ValueError(\"height and width must be no smaller than crop_size\")\n\n    i = int(round((h - th) / 2.0))\n    j = int(round((w - tw) / 2.0))\n    return crop(clip, i, j, th, tw)\n\n\ndef center_crop_using_short_edge(clip):\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    h, w = clip.size(-2), clip.size(-1)\n    if h < w:\n        th, tw = h, h\n        i = 0\n        j = int(round((w - tw) / 2.0))\n    else:\n        th, tw = w, w\n        i = int(round((h - th) / 2.0))\n        j = 0\n    return crop(clip, i, j, th, tw)\n\n\ndef resize_crop_to_fill(clip, target_size):\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    h, w = clip.size(-2), clip.size(-1)\n    th, tw = target_size[0], target_size[1]\n    rh, rw = th / h, tw / w\n    if rh > rw:\n        sh, sw = th, round(w * rh)\n        clip = resize(clip, (sh, sw), \"bilinear\")\n        i = 0\n        # halve the excess width first, then round (matches center_crop)\n        j = int(round((sw - tw) / 2.0))\n    else:\n        sh, sw = round(h * rw), tw\n        clip = resize(clip, (sh, sw), \"bilinear\")\n        # halve the excess height first, then round (matches center_crop)\n        i = int(round((sh - th) / 2.0))\n        j = 0\n    assert i + th <= clip.size(-2) and j + tw <= clip.size(-1)\n    return crop(clip, i, j, th, tw)\n\n\ndef random_shift_crop(clip):\n    \"\"\"\n    Slide along the long edge, with the short edge as crop size\n    \"\"\"\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    h, 
w = clip.size(-2), clip.size(-1)\n\n    if h <= w:\n        short_edge = h\n    else:\n        short_edge = w\n\n    th, tw = short_edge, short_edge\n\n    i = torch.randint(0, h - th + 1, size=(1,)).item()\n    j = torch.randint(0, w - tw + 1, size=(1,)).item()\n    return crop(clip, i, j, th, tw)\n\n\ndef to_tensor(clip):\n    \"\"\"\n    Convert tensor data type from uint8 to float and divide values by 255.0.\n    The dimension order is left unchanged.\n    Args:\n        clip (torch.tensor, dtype=torch.uint8): Size is (T, C, H, W)\n    Return:\n        clip (torch.tensor, dtype=torch.float): Size is (T, C, H, W)\n    \"\"\"\n    _is_tensor_video_clip(clip)\n    if not clip.dtype == torch.uint8:\n        raise TypeError(\"clip tensor should have data type uint8. Got %s\" % str(clip.dtype))\n    # return clip.float().permute(3, 0, 1, 2) / 255.0\n    return clip.float() / 255.0\n\n\ndef normalize(clip, mean, std, inplace=False):\n    \"\"\"\n    Args:\n        clip (torch.tensor): Video clip to be normalized. Size is (C, T, H, W);\n            mean and std are broadcast along the first (channel) dimension\n        mean (tuple): pixel RGB mean. Size is (3)\n        std (tuple): pixel standard deviation. Size is (3)\n    Returns:\n        normalized clip (torch.tensor): Size is (C, T, H, W)\n    \"\"\"\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    if not inplace:\n        clip = clip.clone()\n    mean = torch.as_tensor(mean, dtype=clip.dtype, device=clip.device)\n    # print(mean)\n    std = torch.as_tensor(std, dtype=clip.dtype, device=clip.device)\n    clip.sub_(mean[:, None, None, None]).div_(std[:, None, None, None])\n    return clip\n\n\ndef hflip(clip):\n    \"\"\"\n    Args:\n        clip (torch.tensor): Video clip to be flipped. 
Size is (T, C, H, W)\n    Returns:\n        flipped clip (torch.tensor): Size is (T, C, H, W)\n    \"\"\"\n    if not _is_tensor_video_clip(clip):\n        raise ValueError(\"clip should be a 4D torch.tensor\")\n    return clip.flip(-1)\n\n\nclass ResizeCrop:\n    def __init__(self, size):\n        if isinstance(size, numbers.Number):\n            self.size = (int(size), int(size))\n        else:\n            self.size = size\n\n    def __call__(self, clip):\n        clip = resize_crop_to_fill(clip, self.size)\n        return clip\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(size={self.size})\"\n\n\nclass RandomCropVideo:\n    def __init__(self, size):\n        if isinstance(size, numbers.Number):\n            self.size = (int(size), int(size))\n        else:\n            self.size = size\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor): Video clip to be cropped. Size is (T, C, H, W)\n        Returns:\n            torch.tensor: randomly cropped video clip.\n                size is (T, C, OH, OW)\n        \"\"\"\n        i, j, h, w = self.get_params(clip)\n        return crop(clip, i, j, h, w)\n\n    def get_params(self, clip):\n        h, w = clip.shape[-2:]\n        th, tw = self.size\n\n        if h < th or w < tw:\n            raise ValueError(f\"Required crop size {(th, tw)} is larger than input image size {(h, w)}\")\n\n        if w == tw and h == th:\n            return 0, 0, h, w\n\n        i = torch.randint(0, h - th + 1, size=(1,)).item()\n        j = torch.randint(0, w - tw + 1, size=(1,)).item()\n\n        return i, j, th, tw\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(size={self.size})\"\n\n\nclass CenterCropResizeVideo:\n    \"\"\"\n    First use the short side for cropping length,\n    center crop video, then resize to the specified size\n    \"\"\"\n\n    def __init__(\n        self,\n        size,\n        
interpolation_mode=\"bilinear\",\n    ):\n        if isinstance(size, tuple):\n            if len(size) != 2:\n                raise ValueError(f\"size should be tuple (height, width), instead got {size}\")\n            self.size = size\n        else:\n            self.size = (size, size)\n\n        self.interpolation_mode = interpolation_mode\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor): Video clip to be cropped. Size is (T, C, H, W)\n        Returns:\n            torch.tensor: scale resized / center cropped video clip.\n                size is (T, C, crop_size, crop_size)\n        \"\"\"\n        clip_center_crop = center_crop_using_short_edge(clip)\n        clip_center_crop_resize = resize(\n            clip_center_crop, target_size=self.size, interpolation_mode=self.interpolation_mode\n        )\n        return clip_center_crop_resize\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(size={self.size}, interpolation_mode={self.interpolation_mode})\"\n\n\nclass UCFCenterCropVideo:\n    \"\"\"\n    First resize so that the short edge matches the specified size,\n    preserving the aspect ratio, then center crop\n    \"\"\"\n\n    def __init__(\n        self,\n        size,\n        interpolation_mode=\"bilinear\",\n    ):\n        if isinstance(size, tuple):\n            if len(size) != 2:\n                raise ValueError(f\"size should be tuple (height, width), instead got {size}\")\n            self.size = size\n        else:\n            self.size = (size, size)\n\n        self.interpolation_mode = interpolation_mode\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor): Video clip to be cropped. 
Size is (T, C, H, W)\n        Returns:\n            torch.tensor: scale resized / center cropped video clip.\n                size is (T, C, crop_size, crop_size)\n        \"\"\"\n        clip_resize = resize_scale(clip=clip, target_size=self.size, interpolation_mode=self.interpolation_mode)\n        clip_center_crop = center_crop(clip_resize, self.size)\n        return clip_center_crop\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(size={self.size}, interpolation_mode={self.interpolation_mode})\"\n\n\nclass KineticsRandomCropResizeVideo:\n    \"\"\"\n    Slide along the long edge, with the short edge as crop size, then resize to the desired size.\n    \"\"\"\n\n    def __init__(\n        self,\n        size,\n        interpolation_mode=\"bilinear\",\n    ):\n        if isinstance(size, tuple):\n            if len(size) != 2:\n                raise ValueError(f\"size should be tuple (height, width), instead got {size}\")\n            self.size = size\n        else:\n            self.size = (size, size)\n\n        self.interpolation_mode = interpolation_mode\n\n    def __call__(self, clip):\n        clip_random_crop = random_shift_crop(clip)\n        clip_resize = resize(clip_random_crop, self.size, self.interpolation_mode)\n        return clip_resize\n\n\nclass CenterCropVideo:\n    def __init__(\n        self,\n        size,\n        interpolation_mode=\"bilinear\",\n    ):\n        if isinstance(size, tuple):\n            if len(size) != 2:\n                raise ValueError(f\"size should be tuple (height, width), instead got {size}\")\n            self.size = size\n        else:\n            self.size = (size, size)\n\n        self.interpolation_mode = interpolation_mode\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor): Video clip to be cropped. 
Size is (T, C, H, W)\n        Returns:\n            torch.tensor: center cropped video clip.\n                size is (T, C, crop_size, crop_size)\n        \"\"\"\n        clip_center_crop = center_crop(clip, self.size)\n        return clip_center_crop\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(size={self.size}, interpolation_mode={self.interpolation_mode})\"\n\n\nclass NormalizeVideo:\n    \"\"\"\n    Normalize the video clip by mean subtraction and division by standard deviation\n    Args:\n        mean (3-tuple): pixel RGB mean\n        std (3-tuple): pixel RGB standard deviation\n        inplace (boolean): whether to normalize in place\n    \"\"\"\n\n    def __init__(self, mean, std, inplace=False):\n        self.mean = mean\n        self.std = std\n        self.inplace = inplace\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor): video clip to be normalized. Size is (C, T, H, W)\n        \"\"\"\n        return normalize(clip, self.mean, self.std, self.inplace)\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(mean={self.mean}, std={self.std}, inplace={self.inplace})\"\n\n\nclass ToTensorVideo:\n    \"\"\"\n    Convert tensor data type from uint8 to float, divide value by 255.0 and\n    permute the dimensions of clip tensor\n    \"\"\"\n\n    def __init__(self):\n        pass\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor, dtype=torch.uint8): Size is (T, C, H, W)\n        Return:\n            clip (torch.tensor, dtype=torch.float): Size is (T, C, H, W)\n        \"\"\"\n        return to_tensor(clip)\n\n    def __repr__(self) -> str:\n        return self.__class__.__name__\n\n\nclass RandomHorizontalFlipVideo:\n    \"\"\"\n    Flip the video clip along the horizontal direction with a given probability\n    Args:\n        p (float): probability of the clip being flipped. 
Default value is 0.5\n    \"\"\"\n\n    def __init__(self, p=0.5):\n        self.p = p\n\n    def __call__(self, clip):\n        \"\"\"\n        Args:\n            clip (torch.tensor): Size is (T, C, H, W)\n        Return:\n            clip (torch.tensor): Size is (T, C, H, W)\n        \"\"\"\n        if random.random() < self.p:\n            clip = hflip(clip)\n        return clip\n\n    def __repr__(self) -> str:\n        return f\"{self.__class__.__name__}(p={self.p})\"\n\n\n#  ------------------------------------------------------------\n#  ---------------------  Sampling  ---------------------------\n#  ------------------------------------------------------------\nclass TemporalRandomCrop(object):\n    \"\"\"Temporally crop the given frame indices at a random location.\n\n    Args:\n            size (int): Desired length of frames will be seen in the model.\n    \"\"\"\n\n    def __init__(self, size):\n        self.size = size\n\n    def __call__(self, total_frames):\n        rand_end = max(0, total_frames - self.size - 1)\n        begin_index = random.randint(0, rand_end)\n        end_index = min(begin_index + self.size, total_frames)\n        return begin_index, end_index\n\n\nif __name__ == \"__main__\":\n    import os\n\n    import numpy as np\n    import torchvision.io as io\n    from torchvision import transforms\n    from torchvision.utils import save_image\n\n    vframes, aframes, info = io.read_video(filename=\"./v_Archery_g01_c03.avi\", pts_unit=\"sec\", output_format=\"TCHW\")\n\n    trans = transforms.Compose(\n        [\n            ToTensorVideo(),\n            RandomHorizontalFlipVideo(),\n            UCFCenterCropVideo(512),\n            # NormalizeVideo(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], inplace=True),\n            transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], inplace=True),\n        ]\n    )\n\n    target_video_len = 32\n    frame_interval = 1\n    total_frames = len(vframes)\n    print(total_frames)\n\n    
temporal_sample = TemporalRandomCrop(target_video_len * frame_interval)\n\n    # Sampling video frames\n    start_frame_ind, end_frame_ind = temporal_sample(total_frames)\n    # print(start_frame_ind)\n    # print(end_frame_ind)\n    assert end_frame_ind - start_frame_ind >= target_video_len\n    frame_indice = np.linspace(start_frame_ind, end_frame_ind - 1, target_video_len, dtype=int)\n    print(frame_indice)\n\n    select_vframes = vframes[frame_indice]\n    print(select_vframes.shape)\n    print(select_vframes.dtype)\n\n    select_vframes_trans = trans(select_vframes)\n    print(select_vframes_trans.shape)\n    print(select_vframes_trans.dtype)\n\n    select_vframes_trans_int = ((select_vframes_trans * 0.5 + 0.5) * 255).to(dtype=torch.uint8)\n    print(select_vframes_trans_int.dtype)\n    print(select_vframes_trans_int.permute(0, 2, 3, 1).shape)\n\n    io.write_video(\"./test.avi\", select_vframes_trans_int.permute(0, 2, 3, 1), fps=8)\n\n    for i in range(target_video_len):\n        save_image(\n            select_vframes_trans[i], os.path.join(\"./test000\", \"%04d.png\" % i), normalize=True, value_range=(-1, 1)\n        )\n"
  },
  {
    "path": "Open-Sora/opensora/models/__init__.py",
    "content": "from .dit import *\nfrom .latte import *\nfrom .pixart import *\nfrom .stdit import *\nfrom .text_encoder import *\nfrom .vae import *\n"
  },
  {
    "path": "Open-Sora/opensora/models/cache_functions/__init__.py",
    "content": "from .cache_cutfresh import cache_cutfresh\nfrom .fresh_ratio_scheduler import fresh_ratio_scheduler\nfrom .score_evaluate import score_evaluate\nfrom .global_force_fresh import global_force_fresh\nfrom .cache_cutfresh import cache_cutfresh\nfrom .update_cache import update_cache\nfrom .force_init import force_init\nfrom .attention import cached_attention_forward\nfrom .cache_init import cache_init"
  },
  {
    "path": "Open-Sora/opensora/models/cache_functions/attention.py",
    "content": "# Besides, re-arrange the attention module\nfrom torch.jit import Final\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom typing import Optional, Union\nfrom xformers.ops.fmha.attn_bias import BlockDiagonalMask\ndef cached_attention_forward(\n    query: torch.Tensor,\n    key: torch.Tensor,\n    value: torch.Tensor,\n    attn_bias: Optional[Union[torch.Tensor, BlockDiagonalMask]] = None,\n    p: float = 0.0,\n    scale: Optional[float] = None\n) -> torch.Tensor:\n    scale = 1.0 / query.shape[-1] ** 0.5\n    query = query * scale\n    query = query.transpose(1, 2)\n    key = key.transpose(1, 2)\n    value = value.transpose(1, 2)\n    #attn = query @ key.transpose(-2, -1)\n    attn = torch.matmul(query, key.transpose(-2, -1))\n    if attn_bias is not None:\n        attn_bias = attn_bias.materialize(shape= attn.shape, dtype= attn.dtype, device= attn.device)\n        attn = attn + attn_bias\n    #out_map = attn\n    attn_map = attn.softmax(-1)\n    attn = F.dropout(attn_map, p)\n    attn = torch.matmul(attn, value)\n    #attn = attn @ value\n\n    return attn.transpose(1, 2).contiguous(), attn_map.mean(dim=1)"
  },
  {
    "path": "Open-Sora/opensora/models/cache_functions/cache_cutfresh.py",
    "content": "from .fresh_ratio_scheduler import fresh_ratio_scheduler\nfrom .score_evaluate import score_evaluate\n#from .token_merge import token_merge\nimport torch\ndef cache_cutfresh(cache_dic, tokens, current):\n    '''\n    Cut fresh tokens from the input tokens and update the cache counter.\n    \n    cache_dic: dict, the cache dictionary containing cache(main extra memory cost), indices and some other information.\n    tokens: torch.Tensor, the input tokens to be cut.\n    current: dict, the current step, layer, and module information. Particularly convenient for debugging.\n    '''\n    step = current['step']\n    layer = current['layer']\n    module = current['module']\n\n    fresh_ratio = fresh_ratio_scheduler(cache_dic, current)\n\n    fresh_ratio = torch.clamp(torch.tensor(fresh_ratio, device = tokens.device), min=0, max=1)\n    # Generate the index tensor for fresh tokens\n    score = score_evaluate(cache_dic, tokens, current) # s1, s2, s3 mentioned in the paper\n    #score = local_selection_with_space_time_bonus(cache_dic, score, 0.3, 2, time_mean=False) # s4 mentioned in the paper.\n    indices = score.argsort(dim=-1, descending=True)\n    topk = int(fresh_ratio * score.shape[1])\n    fresh_indices = indices[:, :topk]\n    stale_indices = indices[:, topk:]\n    # (B, fresh_ratio *N)\n\n    # Updating the Cache Frequency Score s3 counter mentioned in the paper\n    # stale tokens index + 1 in each ***module***, fresh tokens index = 0\n    cache_dic['cache_index'][current['flag']][layer][module] += 1\n    cache_dic['cache_index'][current['flag']][layer][module].scatter_(dim=1, index=fresh_indices, \n                                                                    src = torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))\n    cache_dic['cache_index']['layer_index'][module] += 1\n    cache_dic['cache_index']['layer_index'][module].scatter_(dim=1, index=fresh_indices, \n                                                     
               src = torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))\n    # select the fresh tokens out\n    fresh_indices_expand = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])\n\n    if module in ['mlp', 'attn', 'cross-attn']:\n         \n        fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices_expand)\n\n        return fresh_indices, fresh_tokens\n    else:\n        raise ValueError(\"Unrecognized module?\", module)\n    \nimport torch\nfrom einops import rearrange\n\ndef local_selection_with_space_time_bonus(cache_dic, score, bonus_ratio, grid_size=2, time_mean = False):\n    # Get the shape of the tensor from cache_dic\n    B, T, H, W = cache_dic['dynamic_size']\n    \n    # Reshape the score to [B, T, H, W]\n    score = rearrange(score, \"B (T H W) -> B T H W\", T=T, H=H, W=W)\n    \n    # Calculate the padding size to make H and W divisible by grid_size\n    pad_h = (grid_size - H % grid_size) % grid_size  # Number of zeros to pad in H dimension\n    pad_w = (grid_size - W % grid_size) % grid_size  # Number of zeros to pad in W dimension\n    \n    # Pad the H and W dimensions with zeros\n    if pad_h > 0 or pad_w > 0:\n        score = torch.nn.functional.pad(score, (0, pad_w, 0, pad_h))  # (pad width left/right, pad height top/bottom)\n\n    # Update H and W after padding\n    H_padded, W_padded = score.shape[2], score.shape[3]\n    \n    # Step 1: Normalize along the H*W dimension so that information from different time steps has equal weight\n    score = score.view(B, T, -1)  # Merge H and W into one dimension [B, T, H*W]\n    score = torch.nn.functional.softmax(score, dim=-1)  # Normalize along H*W dimension\n    score = score.view(B, T, H_padded, W_padded)  # Restore to [B, T, H_padded, W_padded] shape\n\n    # Step 2: Perform block-wise operation on each spatial slice (each T time step)\n    block_size = grid_size * grid_size\n    assert (H_padded * W_padded) % block_size == 0, 
f\"H_padded * W_padded must be divisible by block size, shape: {B},{T},{H_padded},{W_padded}; block:{grid_size}*{grid_size};\" \n\n    # Reshape the score into block-wise grouped shape\n    score_reshaped = score.view(B, T, H_padded // grid_size, grid_size, W_padded // grid_size, grid_size)\n    score_reshaped = score_reshaped.permute(0, 1, 2, 4, 3, 5).contiguous()  # [B, T, H//grid_size, W//grid_size, grid_size, grid_size]\n    score_reshaped = score_reshaped.view(B, T, -1, block_size)  # [B, T, num_blocks, block_size]\n\n    # Step 3: Find the maximum score in each block\n    max_scores, max_indices = score_reshaped.max(dim=-1, keepdim=True)  # [B, T, num_blocks, 1]\n    \n    # Step 4: Create a mask to identify the token with the maximum score\n    mask = torch.zeros_like(score_reshaped)\n    mask.scatter_(-1, max_indices, 1)  # Set the mask to 1 at the index of the maximum score\n    \n    # Step 5: Apply the bonus only to the token with the maximum score\n    score_reshaped = score_reshaped + (mask * max_scores * bonus_ratio)  # Apply bonus only to the maximum score\n    \n    # Step 6: Restore the score to its original shape\n    score_modified = score_reshaped.view(B, T, H_padded // grid_size, W_padded // grid_size, grid_size, grid_size)\n    score_modified = score_modified.permute(0, 1, 2, 4, 3, 5).contiguous()\n    score_modified = score_modified.view(B, T, H_padded, W_padded)\n\n    # Step 7: Remove the padded zeros\n    if pad_h > 0 or pad_w > 0:\n        score_modified = score_modified[:, :, :H, :W]  # Remove the padded zeros\n\n    if time_mean:\n        score_modified = score_modified.mean(dim = 1)\n        score_modified = score_modified.unsqueeze(1).expand(B, T, H, W)\n        \n    # Finally, reshape the score back to the original shape [B, (T H W)]\n    score_modified = rearrange(score_modified, \"B T H W -> B (T H W)\")\n    \n    return score_modified\n"
  },
  {
    "path": "Open-Sora/opensora/models/cache_functions/cache_init.py",
    "content": "def cache_init(model_kwargs, num_steps):   \n    '''\n    Initialize for cache.\n    '''\n    cache_dic = {}\n    cache = {}\n    indices_cache = {}\n    cache_index = {}\n    cache[-1]={}\n    cache[0]={}\n    indices_cache[-1]={}\n    indices_cache[0]={}\n    cache_index[-1]={}\n    cache_index[0]={}\n    cache_index['layer_index']={}\n    cache_dic['attn_map'] = {}\n    cache_dic['attn_map'][-1] = {}\n    cache_dic['attn_map'][0] = {}\n    cache_dic['cross_attn_map'] = {}\n    cache_dic['cross_attn_map'][-1] = {}\n    cache_dic['cross_attn_map'][0] = {}\n\n    for j in range(28):\n        cache[-1][j] = {}\n        indices_cache[-1] = {}\n        cache_index[-1][j] = {}\n        cache_dic['attn_map'][-1][j] = {}\n        cache_dic['cross_attn_map'][-1][j] = {}\n\n        cache[0][j] = {}\n        indices_cache[0] = {}\n        cache_index[0][j] = {}\n        cache_dic['attn_map'][0][j] = {}\n        cache_dic['cross_attn_map'][0][j] = {}\n\n    cache_dic['cache_type'] = model_kwargs['cache_type']\n    cache_dic['cache_index'] = cache_index\n    cache_dic['cache'] = cache\n    cache_dic['indices_cache'] = indices_cache\n    cache_dic['fresh_ratio_schedule'] = model_kwargs['ratio_scheduler']\n    cache_dic['fresh_ratio'] = model_kwargs['fresh_ratio']\n    cache_dic['fresh_threshold'] = model_kwargs['fresh_threshold']\n    cache_dic['force_fresh'] = model_kwargs['force_fresh']\n    cache_dic['soft_fresh_weight'] = model_kwargs['soft_fresh_weight']\n    #cache_dic['extra_flops'] = 0.0\n    #cache_dic['merge_weight'] = merge_weight\n    current = {}\n    current['num_steps'] = num_steps\n    return cache_dic, current\n    "
  },
  {
    "path": "Open-Sora/opensora/models/cache_functions/force_init.py",
    "content": "import torch\nfrom .force_scheduler import force_scheduler\ndef force_init(cache_dic, current, tokens):\n    cache_dic['cache_index'][current['flag']][current['layer']][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)\n    force_scheduler(cache_dic, current)\n    if current['layer'] == 0:\n        cache_dic['cache_index']['layer_index'][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)"
  },
  {
    "path": "Open-Sora/opensora/models/cache_functions/force_scheduler.py",
    "content": "import torch\ndef force_scheduler(cache_dic, current):\n    thresholds = {}\n    if cache_dic['fresh_ratio'] == 0:\n        # FORA\n        linear_step_weight = 0.0\n    else: \n        # TokenCache\n        linear_step_weight = 0.0 #N=6 0.2 #N=4 0.3\n    step_factor = torch.tensor(1 - linear_step_weight + 2 * linear_step_weight * current['step'] / current['num_steps'])\n    threshold = torch.round(cache_dic['fresh_threshold'] / step_factor)\n\n    # Here we set force activation cycles for different modules separately.\n    thresholds = {\n        'spat-attn' : 3,\n        'temp-attn' : 3,\n       'cross-attn' : 6,\n              'mlp' : 3   }\n    \n    #thresholds = {\n    #    'spat-attn' : 2,\n    #    'temp-attn' : 2,\n    #   'cross-attn' : 2,\n    #          'mlp' : 2   }\n\n    cache_dic['cal_threshold'] = thresholds\n    #return threshold"
  },
  {
    "path": "Open-Sora/opensora/models/cache_functions/fresh_ratio_scheduler.py",
    "content": "import torch\ndef fresh_ratio_scheduler(cache_dic, current):\n    '''\n    Return the fresh ratio for the current step.\n    '''\n    fresh_ratio = cache_dic['fresh_ratio']\n    fresh_ratio_schedule = cache_dic['fresh_ratio_schedule']\n    step = current['step']\n    num_steps = current['num_steps']\n    threshold = cache_dic['fresh_threshold']\n    weight = 0.9\n    if fresh_ratio_schedule == 'constant':\n        return fresh_ratio\n    elif fresh_ratio_schedule == 'linear':\n        return fresh_ratio * (1 + weight - 2 * weight * step / num_steps)\n    elif fresh_ratio_schedule == 'exp':\n        #return 0.5 * (0.052 ** (step/num_steps))\n        return fresh_ratio * (weight ** (step / num_steps))\n    elif fresh_ratio_schedule == 'linear-mode':\n        mode = (step % threshold)/threshold - 0.5\n        mode_weight = 0.1\n        return fresh_ratio * (1 + weight - 2 * weight * step / num_steps + mode_weight * mode)\n    elif fresh_ratio_schedule == 'layerwise':\n        return fresh_ratio * (1 + weight - 2 * weight * current['layer'] / 27)\n    \n    elif fresh_ratio_schedule == 'ToCa':\n        '''\n        Video cost too much to tune the parameters\n        However, simply set these parameters have good enough performances and fast speed mentioned in our paper.\n        We will search a better parameter setting for better in future.\n        '''\n        step_weight = 0.0\n        step_factor = 1 + step_weight - 2 * step_weight * step / num_steps\n\n        layer_weight = 0.0\n        layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27\n\n        module_weight = 1.5\n        module_time_weight = 0.33\n        module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='cross-attn' else (1 + module_time_weight * module_weight)\n        \n        # set for temporal and spatial branch\n        type_weight = 0.0\n        type_factor = 1 + type_weight if current['flag'] == -1 else 1 - type_weight\n\n     
   return fresh_ratio * layer_factor * step_factor * module_factor * type_factor\n\n    else:\n        raise ValueError(\"unrecognized fresh ratio schedule\", fresh_ratio_schedule)\n"
  },
  {
    "path": "Open-Sora/opensora/models/cache_functions/global_force_fresh.py",
    "content": "from .force_scheduler import force_scheduler\ndef global_force_fresh(cache_dic, current):\n    '''\n    Return whether to force fresh tokens globally.\n    '''\n    is_force_fresh = {}\n    fresh_thresholds = {}\n    first_step = (current['step'] == 0)\n    first_3steps = (current['step'] <= 2) # Note the fact that for OpenSora series models, the first 3 steps is with great importance!!!\n    last_step = current['step'] == current['num_steps'] - 1\n    force_fresh = cache_dic['force_fresh']\n    if not first_step:\n        fresh_thresholds['spat-attn']  = cache_dic['cal_threshold']['spat-attn']\n        fresh_thresholds['temp-attn']  = cache_dic['cal_threshold']['temp-attn']\n        fresh_thresholds['cross-attn'] = cache_dic['cal_threshold']['cross-attn']\n        fresh_thresholds['mlp']        = cache_dic['cal_threshold']['mlp']\n    else:\n        fresh_thresholds['spat-attn']  = cache_dic['fresh_threshold']\n        fresh_thresholds['temp-attn']  = cache_dic['fresh_threshold']\n        fresh_thresholds['cross-attn'] = cache_dic['fresh_threshold']\n        fresh_thresholds['mlp']        = cache_dic['fresh_threshold']\n\n    if force_fresh == 'global':\n        if current['flag'] == -1:\n            is_force_fresh['attn'] =   (first_3steps or (current['step']% fresh_thresholds['temp-attn'] == 0))\n        else:\n            is_force_fresh['attn'] =   (first_3steps or (current['step']% fresh_thresholds['spat-attn'] == 0))\n\n        is_force_fresh['cross-attn'] = (first_3steps or (current['step']% fresh_thresholds['cross-attn'] == 0))\n        is_force_fresh['mlp'] =        (first_3steps or (current['step']% fresh_thresholds['mlp'] == 0))\n\n        return is_force_fresh\n    elif force_fresh == 'local':\n        return first_step\n    elif force_fresh == 'none':\n        return first_step\n    else:\n        raise ValueError(\"unrecognized force fresh strategy\", force_fresh)"
  },
  {
    "path": "Open-Sora/opensora/models/cache_functions/score_evaluate.py",
    "content": "import torch\nimport torch.nn as nn\nfrom .scores import attn_score, similarity_score, norm_score\ndef score_evaluate(cache_dic, tokens, current) -> torch.Tensor:\n    '''\n    Return the score tensor (B, N) for the given tokens.\n    '''\n\n    #if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')):\n    ## abandoned branch, if you want to explore the local force fresh strategy, this may help.\n    #    force_fresh_mask = torch.as_tensor((cache_dic['cache_index'][-1][current['layer']][current['module']] >= 2 * cache_dic['fresh_threshold']), dtype = int) # 2 because the threshold is for step, not module\n    #    force_len = force_fresh_mask.sum(dim=1)\n    #    force_indices = force_fresh_mask.argsort(dim = -1, descending = True)[:, :force_len.min()]\n    #\n    #    force_indices = force_indices[:, torch.randperm(force_indices.shape[1])]\n\n    if cache_dic['cache_type'] == 'random':\n        score = torch.rand(int(tokens.shape[0]*0.5), tokens.shape[1], device=tokens.device)\n        score = torch.cat([score, score], dim=0).to(tokens.device)\n\n    elif cache_dic['cache_type'] == 'straight':\n        score = torch.ones(tokens.shape[0], tokens.shape[1]).to(tokens.device)\n    \n    elif cache_dic['cache_type'] == 'attention':\n        score = attn_score(cache_dic, current)\n    \n    elif cache_dic['cache_type'] == 'similarity':\n        score = similarity_score(cache_dic, current, tokens)\n\n    elif cache_dic['cache_type'] == 'norm':\n        score = norm_score(cache_dic, current, tokens)\n\n    elif cache_dic['cache_type'] == 'compress':\n        score1 = torch.rand(int(tokens.shape[0]*0.5), tokens.shape[1])\n        score1 = torch.cat([score1, score1], dim=0).to(tokens.device)\n        score2 = cache_dic['attn_map'][current['flag']][current['layer']].sum(dim=1)#.mean(dim=0) # (B, N)\n        # normalize\n        score2 = score2 / score2.max(dim=1, keepdim=True)[0]\n        score = 0.5 * score1 + 0.5 * score2\n\n    # 
abandon the branch, if you want to explore the local force fresh strategy, this may help.\n    #if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')): # current['is_force_fresh'] is False, cause when it is True, no cut and fresh are needed\n    #        #print(torch.ones_like(force_indices, dtype=float, device=force_indices.device).dtype)\n    #    score.scatter_(dim=1, index=force_indices, src=torch.ones_like(force_indices, dtype=torch.float32, \n    #                                                                       device=force_indices.device))\n    \n    if (True and (cache_dic['force_fresh'] == 'global')):\n        soft_step_score = cache_dic['cache_index'][current['flag']][current['layer']][current['module']].float() / (cache_dic['fresh_threshold'])\n        #soft_layer_score = cache_dic['cache_index']['layer_index'][current['module']].float() / (27)\n        score = score + cache_dic['soft_fresh_weight'] * soft_step_score #+ 0.1 *soft_layer_score\n    \n    return score.to(tokens.device)"
  },
  {
    "path": "Open-Sora/opensora/models/cache_functions/scores.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\ndef attn_score(cache_dic, current):\n    #self_attn_score = 1- cache_dic['attn_map'][current['flag']][current['layer']].diagonal(dim1=1, dim2=2)\n    #self_attn_score = F.normalize(self_attn_score, dim=1, p=2)\n    #attention_score = F.normalize(cache_dic['attn_map'][current['flag']][current['layer']].sum(dim=1), dim=1, p=2)\n    #cross_attn_map = F.threshold(cache_dic['cross_attn_map'][current['flag']][current['layer']],threshold=0.0, value=0.0)\n    #cross_attention_score = F.normalize(cross_attn_map.sum(dim=-1), dim=-1, p=2)\n    \n    cond_cmap, uncond_cmap = torch.split(cache_dic['cross_attn_map'][current['flag']][current['layer']], len(cache_dic['cross_attn_map'][current['flag']][current['layer']]) // 2, dim=0)\n    cond_weight = 0.5\n    cmap = cond_weight * cond_cmap + (1 - cond_weight) * uncond_cmap\n    cross_attention_entropy = -torch.sum(cmap * torch.log(cmap + 1e-7), dim=-1)\n    cross_attention_score   = F.normalize(1 + cross_attention_entropy, dim=1, p=2)\n    #score = self_attn_score\n    #score = attention_score\n    score = cross_attention_score.repeat(2, 1)\n    #cross_weight = 0.0\n    #score =  (1-cross_weight) * attention_score + cross_weight * cross_attention_score\n    return score\n\ndef similarity_score(cache_dic, current, tokens):\n    cosine_sim = F.cosine_similarity(tokens, cache_dic['cache'][current['flag']][current['layer']][current['module']], dim=-1)\n\n    return F.normalize(1- cosine_sim, dim=-1, p=2)\n\ndef norm_score(cache_dic, current, tokens):\n    norm = tokens.norm(dim=-1, p=2)\n    return F.normalize(norm, dim=-1, p=2)\n"
  },
  {
    "path": "Open-Sora/opensora/models/cache_functions/token_merge.py",
    "content": "import torch\ndef token_merge(cache_dic, tokens, current, fresh_indices, stale_indices):\n    '''\n    An abandoned branch in exploring if token merge helps. The answer is no, at least no for training-free strategy.\n    '''\n    if (current['layer'] % 1 == 0):\n        fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))\n        stale_tokens = torch.gather(input = tokens, dim = 1, index = stale_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))\n        method = 'similarity'\n        if method == 'distance':\n            descending = False\n            distance = torch.cdist(stale_tokens, fresh_tokens, p=1)\n            stale_fresh_dist, stale_fresh_indices_allstale = torch.min(distance, dim=2)\n        elif method == 'similarity':\n            descending = True\n            fresh_tokens = torch.nn.functional.normalize(fresh_tokens, p=2, dim=-1)\n            stale_tokens = torch.nn.functional.normalize(stale_tokens, p=2, dim=-1)\n            similarity = stale_tokens @ fresh_tokens.transpose(1, 2)\n            stale_fresh_dist, stale_fresh_indices_allstale = torch.max(similarity, dim=2)\n        \n\n        saved_topk_stale = int((stale_fresh_dist > 0.995).sum(dim=1).min())\n        merged_stale_sequence = torch.sort(stale_fresh_dist, dim=1, descending=descending)[1][:,:saved_topk_stale]\n        stale_fresh_indices = stale_fresh_indices_allstale.gather(1, merged_stale_sequence)\n        merged_stale_sequence = stale_indices.gather(1, merged_stale_sequence)\n        merged_stale_fresh_indices = fresh_indices.gather(1, stale_fresh_indices)\n        cache_dic['merged_stale_fresh_indices'] = merged_stale_fresh_indices \n        cache_dic['merged_stale_sequence'] = merged_stale_sequence\n"
  },
  {
    "path": "Open-Sora/opensora/models/cache_functions/update_cache.py",
    "content": "import torch\ndef update_cache(fresh_indices, fresh_tokens, cache_dic, current, fresh_attn_map=None):\n    '''\n    Update the cache with the fresh tokens.\n    '''\n    step = current['step']\n    layer = current['layer']\n    module = current['module']\n    # Update the cached tokens at the positions\n    if module == 'attn':\n        indices = fresh_indices#.sort(dim=1, descending=False)[0]\n        cache_dic['attn_map'][current['flag']][layer].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_attn_map.shape[-1]), src=fresh_attn_map)\n    elif module == 'cross-attn':\n        indices = fresh_indices#.sort(dim=1, descending=False)[0]\n        cache_dic['cross_attn_map'][current['flag']][layer].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_attn_map.shape[-1]), src=fresh_attn_map)\n    elif module == 'mlp':\n        indices = fresh_indices\n\n    cache_dic['cache'][current['flag']][layer][module].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_tokens.shape[-1]), src=fresh_tokens)\n\n\n        \n        "
  },
  {
    "path": "Open-Sora/opensora/models/dit/__init__.py",
    "content": "from .dit import DiT, DiT_XL_2, DiT_XL_2x2\n"
  },
  {
    "path": "Open-Sora/opensora/models/dit/dit.py",
    "content": "# Modified from Meta DiT\n\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# DiT:   https://github.com/facebookresearch/DiT/tree/main\n# GLIDE: https://github.com/openai/glide-text2im\n# MAE:   https://github.com/facebookresearch/mae/blob/main/models_mae.py\n# --------------------------------------------------------\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.utils.checkpoint\nfrom einops import rearrange\nfrom timm.models.vision_transformer import Mlp\n\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.models.layers.blocks import (\n    Attention,\n    CaptionEmbedder,\n    FinalLayer,\n    LabelEmbedder,\n    PatchEmbed3D,\n    TimestepEmbedder,\n    approx_gelu,\n    get_1d_sincos_pos_embed,\n    get_2d_sincos_pos_embed,\n    get_layernorm,\n    modulate,\n)\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\n\nclass DiTBlock(nn.Module):\n    \"\"\"\n    A DiT block with adaptive layer norm zero (adaLN-Zero) conditioning.\n    \"\"\"\n\n    def __init__(\n        self,\n        hidden_size,\n        num_heads,\n        mlp_ratio=4.0,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n    ):\n        super().__init__()\n        self.hidden_size = hidden_size\n        self.num_heads = num_heads\n        self.enable_flash_attn = enable_flash_attn\n        mlp_hidden_dim = int(hidden_size * mlp_ratio)\n\n        self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.attn = Attention(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            enable_flash_attn=enable_flash_attn,\n        )\n        self.norm2 = get_layernorm(hidden_size, eps=1e-6, affine=False, 
use_kernel=enable_layernorm_kernel)\n        self.mlp = Mlp(in_features=hidden_size, hidden_features=mlp_hidden_dim, act_layer=approx_gelu, drop=0)\n        self.adaLN_modulation = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 6 * hidden_size, bias=True))\n\n    def forward(self, x, c):\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = self.adaLN_modulation(c).chunk(6, dim=1)\n        x = x + gate_msa.unsqueeze(1) * self.attn(modulate(self.norm1, x, shift_msa, scale_msa))\n        x = x + gate_mlp.unsqueeze(1) * self.mlp(modulate(self.norm2, x, shift_mlp, scale_mlp))\n        return x\n\n\n@MODELS.register_module()\nclass DiT(nn.Module):\n    \"\"\"\n    Diffusion model with a Transformer backbone.\n    \"\"\"\n\n    def __init__(\n        self,\n        input_size=(16, 32, 32),\n        in_channels=4,\n        patch_size=(1, 2, 2),\n        hidden_size=1152,\n        depth=28,\n        num_heads=16,\n        mlp_ratio=4.0,\n        class_dropout_prob=0.1,\n        learn_sigma=True,\n        condition=\"text\",\n        no_temporal_pos_emb=False,\n        caption_channels=512,\n        model_max_length=77,\n        dtype=torch.float32,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n    ):\n        super().__init__()\n        self.learn_sigma = learn_sigma\n        self.in_channels = in_channels\n        self.out_channels = in_channels * 2 if learn_sigma else in_channels\n        self.hidden_size = hidden_size\n        self.patch_size = patch_size\n        self.input_size = input_size\n        num_patches = np.prod([input_size[i] // patch_size[i] for i in range(3)])\n        self.num_patches = num_patches\n        self.num_temporal = input_size[0] // patch_size[0]\n        self.num_spatial = num_patches // self.num_temporal\n        self.num_heads = num_heads\n        self.dtype = dtype\n        self.use_text_encoder = not condition.startswith(\"label\")\n        if 
enable_flash_attn:\n            assert dtype in [\n                torch.float16,\n                torch.bfloat16,\n            ], f\"Flash attention only supports float16 and bfloat16, but got {self.dtype}\"\n        self.no_temporal_pos_emb = no_temporal_pos_emb\n        self.mlp_ratio = mlp_ratio\n        self.depth = depth\n        assert enable_sequence_parallelism is False, \"Sequence parallelism is not supported in DiT\"\n\n        self.register_buffer(\"pos_embed_spatial\", self.get_spatial_pos_embed())\n        self.register_buffer(\"pos_embed_temporal\", self.get_temporal_pos_embed())\n\n        self.x_embedder = PatchEmbed3D(patch_size, in_channels, embed_dim=hidden_size)\n        if not self.use_text_encoder:\n            num_classes = int(condition.split(\"_\")[-1])\n            self.y_embedder = LabelEmbedder(num_classes, hidden_size, class_dropout_prob)\n        else:\n            self.y_embedder = CaptionEmbedder(\n                in_channels=caption_channels,\n                hidden_size=hidden_size,\n                uncond_prob=class_dropout_prob,\n                act_layer=approx_gelu,\n                token_num=1,  # pooled token\n            )\n        self.t_embedder = TimestepEmbedder(hidden_size)\n        self.blocks = nn.ModuleList(\n            [\n                DiTBlock(\n                    hidden_size,\n                    num_heads,\n                    mlp_ratio=mlp_ratio,\n                    enable_flash_attn=enable_flash_attn,\n                    enable_layernorm_kernel=enable_layernorm_kernel,\n                )\n                for _ in range(depth)\n            ]\n        )\n        self.final_layer = FinalLayer(hidden_size, np.prod(self.patch_size), self.out_channels)\n\n        self.initialize_weights()\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_layernorm_kernel = enable_layernorm_kernel\n\n    def get_spatial_pos_embed(self):\n        pos_embed = get_2d_sincos_pos_embed(\n            
self.hidden_size,\n            self.input_size[1] // self.patch_size[1],\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def get_temporal_pos_embed(self):\n        pos_embed = get_1d_sincos_pos_embed(\n            self.hidden_size,\n            self.input_size[0] // self.patch_size[0],\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def unpatchify(self, x):\n        c = self.out_channels\n        t, h, w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        pt, ph, pw = self.patch_size\n\n        x = x.reshape(shape=(x.shape[0], t, h, w, pt, ph, pw, c))\n        x = rearrange(x, \"n t h w r p q c -> n c t r h p w q\")\n        imgs = x.reshape(shape=(x.shape[0], c, t * pt, h * ph, w * pw))\n        return imgs\n\n    def forward(self, x, t, y):\n        \"\"\"\n        Forward pass of DiT.\n        x: (B, C, T, H, W) tensor of inputs\n        t: (B,) tensor of diffusion timesteps\n        y: conditioning tensor (text-encoder features, or class labels for label conditioning)\n        \"\"\"\n        # original inputs should be float32, cast to specified dtype\n        x = x.to(self.dtype)\n        if self.use_text_encoder:\n            y = y.to(self.dtype)\n\n        # embedding\n        x = self.x_embedder(x)  # (B, N, D)\n        x = rearrange(x, \"b (t s) d -> b t s d\", t=self.num_temporal, s=self.num_spatial)\n        x = x + self.pos_embed_spatial\n        if not self.no_temporal_pos_emb:\n            x = rearrange(x, \"b t s d -> b s t d\")\n            x = x + self.pos_embed_temporal\n            x = rearrange(x, \"b s t d -> b (t s) d\")\n        else:\n            x = rearrange(x, \"b t s d -> b (t s) d\")\n\n        t = self.t_embedder(t, dtype=x.dtype)  # (N, D)\n        y = self.y_embedder(y, self.training)  # (N, D)\n        if self.use_text_encoder:\n            y = y.squeeze(1).squeeze(1)\n        condition = t + y\n\n        # 
blocks\n        for _, block in enumerate(self.blocks):\n            c = condition\n            x = auto_grad_checkpoint(block, x, c)  # (B, N, D)\n\n        # final process\n        x = self.final_layer(x, condition)  # (B, N, prod(patch_size) * out_channels)\n        x = self.unpatchify(x)  # (B, out_channels, T, H, W)\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                if module.weight.requires_grad:\n                    torch.nn.init.xavier_uniform_(module.weight)\n                    if module.bias is not None:\n                        nn.init.constant_(module.bias, 0)\n\n        self.apply(_basic_init)\n\n        # Initialize patch_embed like nn.Linear (instead of nn.Conv3d):\n        w = self.x_embedder.proj.weight.data\n        nn.init.xavier_uniform_(w.view([w.shape[0], -1]))\n        nn.init.constant_(self.x_embedder.proj.bias, 0)\n\n        # Initialize timestep embedding MLP:\n        nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)\n\n        # Zero-out adaLN modulation layers in DiT blocks:\n        for block in self.blocks:\n            nn.init.constant_(block.adaLN_modulation[-1].weight, 0)\n            nn.init.constant_(block.adaLN_modulation[-1].bias, 0)\n\n        # Zero-out output layers:\n        nn.init.constant_(self.final_layer.adaLN_modulation[-1].weight, 0)\n        nn.init.constant_(self.final_layer.adaLN_modulation[-1].bias, 0)\n        nn.init.constant_(self.final_layer.linear.weight, 0)\n        nn.init.constant_(self.final_layer.linear.bias, 0)\n\n        # Initialize text embedding layers (normal, std=0.02):\n        if self.use_text_encoder:\n            nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)\n            
nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)\n\n\n@MODELS.register_module(\"DiT-XL/2\")\ndef DiT_XL_2(from_pretrained=None, **kwargs):\n    model = DiT(\n        depth=28,\n        hidden_size=1152,\n        patch_size=(1, 2, 2),\n        num_heads=16,\n        **kwargs,\n    )\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n\n\n@MODELS.register_module(\"DiT-XL/2x2\")\ndef DiT_XL_2x2(from_pretrained=None, **kwargs):\n    model = DiT(\n        depth=28,\n        hidden_size=1152,\n        patch_size=(2, 2, 2),\n        num_heads=16,\n        **kwargs,\n    )\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/opensora/models/latte/__init__.py",
    "content": "from .latte import Latte, Latte_XL_2, Latte_XL_2x2\n"
  },
  {
    "path": "Open-Sora/opensora/models/latte/latte.py",
    "content": "# Copyright 2024 Vchitect/Latte\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.# Modified from Latte\n#\n#\n# This file is mofied from https://github.com/Vchitect/Latte/blob/main/models/latte.py\n#\n# With references to:\n# Latte:  https://github.com/Vchitect/Latte\n# DiT:    https://github.com/facebookresearch/DiT/tree/main\n\n\nimport torch\nfrom einops import rearrange, repeat\n\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.models.dit import DiT\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\n\n@MODELS.register_module()\nclass Latte(DiT):\n    def forward(self, x, t, y):\n        \"\"\"\n        Forward pass of DiT.\n        x: (B, C, T, H, W) tensor of inputs\n        t: (B,) tensor of diffusion timesteps\n        y: list of text\n        \"\"\"\n        # origin inputs should be float32, cast to specified dtype\n        x = x.to(self.dtype)\n\n        # embedding\n        x = self.x_embedder(x)  # (B, N, D)\n        x = rearrange(x, \"b (t s) d -> b t s d\", t=self.num_temporal, s=self.num_spatial)\n        x = x + self.pos_embed_spatial\n        x = rearrange(x, \"b t s d -> b (t s) d\")\n\n        t = self.t_embedder(t, dtype=x.dtype)  # (N, D)\n        y = self.y_embedder(y, self.training)  # (N, D)\n        if self.use_text_encoder:\n            y = y.squeeze(1).squeeze(1)\n        condition = t + y\n        condition_spatial = repeat(condition, \"b d -> 
(b t) d\", t=self.num_temporal)\n        condition_temporal = repeat(condition, \"b d -> (b s) d\", s=self.num_spatial)\n\n        # blocks\n        for i, block in enumerate(self.blocks):\n            if i % 2 == 0:\n                # spatial\n                x = rearrange(x, \"b (t s) d -> (b t) s d\", t=self.num_temporal, s=self.num_spatial)\n                c = condition_spatial\n            else:\n                # temporal\n                x = rearrange(x, \"b (t s) d -> (b s) t d\", t=self.num_temporal, s=self.num_spatial)\n                c = condition_temporal\n                if i == 1:\n                    x = x + self.pos_embed_temporal\n\n            x = auto_grad_checkpoint(block, x, c)  # (B, N, D)\n\n            if i % 2 == 0:\n                x = rearrange(x, \"(b t) s d -> b (t s) d\", t=self.num_temporal, s=self.num_spatial)\n            else:\n                x = rearrange(x, \"(b s) t d -> b (t s) d\", t=self.num_temporal, s=self.num_spatial)\n\n        # final process\n        x = self.final_layer(x, condition)  # (B, N, num_patches * out_channels)\n        x = self.unpatchify(x)  # (B, out_channels, T, H, W)\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n\n@MODELS.register_module(\"Latte-XL/2\")\ndef Latte_XL_2(from_pretrained=None, **kwargs):\n    model = Latte(\n        depth=28,\n        hidden_size=1152,\n        patch_size=(1, 2, 2),\n        num_heads=16,\n        **kwargs,\n    )\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n\n\n@MODELS.register_module(\"Latte-XL/2x2\")\ndef Latte_XL_2x2(from_pretrained=None, **kwargs):\n    model = Latte(\n        depth=28,\n        hidden_size=1152,\n        patch_size=(2, 2, 2),\n        num_heads=16,\n        **kwargs,\n    )\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/opensora/models/layers/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/opensora/models/layers/blocks.py",
    "content": "# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# PixArt: https://github.com/PixArt-alpha/PixArt-alpha\n# Latte:  https://github.com/Vchitect/Latte\n# DiT:    https://github.com/facebookresearch/DiT/tree/main\n# GLIDE:  https://github.com/openai/glide-text2im\n# MAE:    https://github.com/facebookresearch/mae/blob/main/models_mae.py\n# --------------------------------------------------------\n\nimport functools\nimport math\nfrom typing import Optional\n\nimport numpy as np\nimport torch\nimport torch.distributed as dist\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.utils.checkpoint\nimport xformers.ops\nfrom einops import rearrange\nfrom timm.models.vision_transformer import Mlp\n\nfrom opensora.acceleration.communications import all_to_all, split_forward_gather_backward\nfrom opensora.acceleration.parallel_states import get_sequence_parallel_group\n\nfrom ..cache_functions.attention import cached_attention_forward\n\napprox_gelu = lambda: nn.GELU(approximate=\"tanh\")\n\n\nclass LlamaRMSNorm(nn.Module):\n    def __init__(self, hidden_size, eps=1e-6):\n        \"\"\"\n        LlamaRMSNorm is equivalent to T5LayerNorm\n        \"\"\"\n        super().__init__()\n        self.weight = nn.Parameter(torch.ones(hidden_size))\n        self.variance_epsilon = eps\n\n    def forward(self, hidden_states):\n        input_dtype = hidden_states.dtype\n        hidden_states = hidden_states.to(torch.float32)\n        variance = hidden_states.pow(2).mean(-1, keepdim=True)\n        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)\n        return self.weight * hidden_states.to(input_dtype)\n\n\ndef get_layernorm(hidden_size: torch.Tensor, eps: float, affine: bool, use_kernel: bool):\n    if use_kernel:\n        try:\n            from apex.normalization import 
FusedLayerNorm\n\n            return FusedLayerNorm(hidden_size, elementwise_affine=affine, eps=eps)\n        except ImportError:\n            raise RuntimeError(\"FusedLayerNorm not available. Please install apex.\")\n    else:\n        return nn.LayerNorm(hidden_size, eps, elementwise_affine=affine)\n\n\ndef modulate(norm_func, x, shift, scale):\n    # Suppose x is (B, N, D), shift is (B, D), scale is (B, D)\n    dtype = x.dtype\n    x = norm_func(x.to(torch.float32)).to(dtype)\n    x = x * (scale.unsqueeze(1) + 1) + shift.unsqueeze(1)\n    x = x.to(dtype)\n    return x\n\n\ndef t2i_modulate(x, shift, scale):\n    return x * (1 + scale) + shift\n\n\n# ===============================================\n# General-purpose Layers\n# ===============================================\n\n\nclass PatchEmbed3D(nn.Module):\n    \"\"\"Video to Patch Embedding.\n\n    Args:\n        patch_size (int): Patch token size. Default: (2,4,4).\n        in_chans (int): Number of input video channels. Default: 3.\n        embed_dim (int): Number of linear projection output channels. Default: 96.\n        norm_layer (nn.Module, optional): Normalization layer. 
Default: None\n    \"\"\"\n\n    def __init__(\n        self,\n        patch_size=(2, 4, 4),\n        in_chans=3,\n        embed_dim=96,\n        norm_layer=None,\n        flatten=True,\n    ):\n        super().__init__()\n        self.patch_size = patch_size\n        self.flatten = flatten\n\n        self.in_chans = in_chans\n        self.embed_dim = embed_dim\n\n        self.proj = nn.Conv3d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)\n        if norm_layer is not None:\n            self.norm = norm_layer(embed_dim)\n        else:\n            self.norm = None\n\n    def forward(self, x):\n        \"\"\"Forward function.\"\"\"\n        # padding\n        _, _, D, H, W = x.size()\n        if W % self.patch_size[2] != 0:\n            x = F.pad(x, (0, self.patch_size[2] - W % self.patch_size[2]))\n        if H % self.patch_size[1] != 0:\n            x = F.pad(x, (0, 0, 0, self.patch_size[1] - H % self.patch_size[1]))\n        if D % self.patch_size[0] != 0:\n            x = F.pad(x, (0, 0, 0, 0, 0, self.patch_size[0] - D % self.patch_size[0]))\n\n        x = self.proj(x)  # (B C T H W)\n        if self.norm is not None:\n            D, Wh, Ww = x.size(2), x.size(3), x.size(4)\n            x = x.flatten(2).transpose(1, 2)\n            x = self.norm(x)\n            x = x.transpose(1, 2).view(-1, self.embed_dim, D, Wh, Ww)\n        if self.flatten:\n            x = x.flatten(2).transpose(1, 2)  # BCTHW -> BNC\n        return x\n\n\nclass Attention(nn.Module):\n    def __init__(\n        self,\n        dim: int,\n        num_heads: int = 8,\n        qkv_bias: bool = False,\n        qk_norm: bool = False,\n        attn_drop: float = 0.0,\n        proj_drop: float = 0.0,\n        norm_layer: nn.Module = LlamaRMSNorm,\n        enable_flash_attn: bool = False,\n        rope=None,\n        qk_norm_legacy: bool = False,\n    ) -> None:\n        super().__init__()\n        assert dim % num_heads == 0, \"dim should be divisible by num_heads\"\n        
self.dim = dim\n        self.num_heads = num_heads\n        self.head_dim = dim // num_heads\n        self.scale = self.head_dim**-0.5\n        self.enable_flash_attn = enable_flash_attn\n\n        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)\n        self.q_norm = norm_layer(self.head_dim) if qk_norm else nn.Identity()\n        self.k_norm = norm_layer(self.head_dim) if qk_norm else nn.Identity()\n        self.qk_norm_legacy = qk_norm_legacy\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n        self.rope = False\n        if rope is not None:\n            self.rope = True\n            self.rotary_emb = rope\n        \n        self.is_causal = False\n    \n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        B, N, C = x.shape\n        # flash attn is not memory efficient for small sequences, this is empirical\n        enable_flash_attn = self.enable_flash_attn and (N > B)\n        qkv = self.qkv(x)\n        qkv_shape = (B, N, 3, self.num_heads, self.head_dim)\n\n        qkv = qkv.view(qkv_shape).permute(2, 0, 3, 1, 4)\n        q, k, v = qkv.unbind(0)\n        if self.qk_norm_legacy:\n            # WARNING: this may be a bug\n            if self.rope:\n                q = self.rotary_emb(q)\n                k = self.rotary_emb(k)\n            q, k = self.q_norm(q), self.k_norm(k)\n        else:\n            q, k = self.q_norm(q), self.k_norm(k)\n            if self.rope:\n                q = self.rotary_emb(q)\n                k = self.rotary_emb(k)\n\n        if enable_flash_attn:\n            from flash_attn import flash_attn_func\n\n            # (B, #heads, N, #dim) -> (B, N, #heads, #dim)\n            q = q.permute(0, 2, 1, 3)\n            k = k.permute(0, 2, 1, 3)\n            v = v.permute(0, 2, 1, 3)\n            x = flash_attn_func(\n                q,\n                k,\n                v,\n                dropout_p=self.attn_drop.p if 
self.training else 0.0,\n                softmax_scale=self.scale,\n                causal=self.is_causal,\n            )\n        else:\n            dtype = q.dtype\n            q = q * self.scale\n            #attn = q @ k.transpose(-2, -1)  # translate attn to float32\n            attn = torch.matmul(q,k.transpose(-2, -1))\n            attn = attn.to(torch.float32)\n            if self.is_causal:\n                causal_mask = torch.tril(torch.ones_like(attn), diagonal=0)\n                causal_mask = torch.where(causal_mask.bool(), 0, float('-inf'))\n                attn += causal_mask\n            attn = attn.softmax(dim=-1)\n            attn = attn.to(dtype)  # cast back attn to original dtype\n            attn = self.attn_drop(attn)\n            #x = attn @ v\n            x = torch.matmul(attn,v)\n\n        x_output_shape = (B, N, C)\n        if not enable_flash_attn:\n            x = x.transpose(1, 2)\n        x = x.reshape(x_output_shape)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass KVCompressAttention(nn.Module):\n    def __init__(\n        self,\n        dim: int,\n        num_heads: int = 8,\n        qkv_bias: bool = False,\n        qk_norm: bool = False,\n        attn_drop: float = 0.0,\n        proj_drop: float = 0.0,\n        norm_layer: nn.Module = LlamaRMSNorm,\n        enable_flash_attn: bool = False,\n        sampling=\"conv\",\n        sr_ratio=1,\n        mem_eff_attention=False,\n        attn_half=False,\n    ) -> None:\n        super().__init__()\n        assert dim % num_heads == 0, \"dim should be divisible by num_heads\"\n        self.dim = dim\n        self.num_heads = num_heads\n        self.head_dim = dim // num_heads\n        self.scale = self.head_dim**-0.5\n        self.enable_flash_attn = enable_flash_attn\n\n        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)\n\n        self.sr_ratio = sr_ratio\n        self.sampling = sampling\n        if sr_ratio > 1 and sampling == \"conv\":\n      
      # Avg Conv Init.\n            self.sr = nn.Conv2d(dim, dim, groups=dim, kernel_size=sr_ratio, stride=sr_ratio)\n            self.sr.weight.data.fill_(1 / sr_ratio**2)\n            self.sr.bias.data.zero_()\n            self.norm = nn.LayerNorm(dim)\n\n        self.q_norm = norm_layer(self.head_dim) if qk_norm else nn.Identity()\n        self.k_norm = norm_layer(self.head_dim) if qk_norm else nn.Identity()\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n        self.mem_eff_attention = mem_eff_attention\n        self.attn_half = attn_half\n\n    def downsample_2d(self, tensor, H, W, scale_factor, sampling=None):\n        if sampling is None or scale_factor == 1:\n            return tensor\n        B, N, C = tensor.shape\n\n        if sampling == \"uniform_every\":\n            return tensor[:, ::scale_factor], int(N // scale_factor)\n\n        tensor = tensor.reshape(B, H, W, C).permute(0, 3, 1, 2)\n        new_H, new_W = int(H / scale_factor), int(W / scale_factor)\n        new_N = new_H * new_W\n\n        if sampling == \"ave\":\n            tensor = F.interpolate(tensor, scale_factor=1 / scale_factor, mode=\"nearest\").permute(0, 2, 3, 1)\n        elif sampling == \"uniform\":\n            tensor = tensor[:, :, ::scale_factor, ::scale_factor].permute(0, 2, 3, 1)\n        elif sampling == \"conv\":\n            tensor = self.sr(tensor).reshape(B, C, -1).permute(0, 2, 1)\n            tensor = self.norm(tensor)\n        else:\n            raise ValueError\n\n        return tensor.reshape(B, new_N, C).contiguous(), new_N\n\n    def forward(self, x: torch.Tensor, mask=None, HW=None, block_id=None, **kwargs) -> torch.Tensor:\n        B, N, C = x.shape\n        new_N = N\n        H, W = HW\n        # flash attn is not memory efficient for small sequences, this is empirical\n        enable_flash_attn = self.enable_flash_attn and (N > B)\n\n        qkv = 
self.qkv(x).reshape(B, N, 3, C)\n        q, k, v = qkv.unbind(2)\n        dtype = q.dtype\n        # KV compression\n        if self.sr_ratio > 1:\n            k, new_N = self.downsample_2d(k, H, W, self.sr_ratio, sampling=self.sampling)\n            v, new_N = self.downsample_2d(v, H, W, self.sr_ratio, sampling=self.sampling)\n\n        q = q.reshape(B, N, self.num_heads, C // self.num_heads).to(dtype)\n        k = k.reshape(B, new_N, self.num_heads, C // self.num_heads).to(dtype)\n        v = v.reshape(B, new_N, self.num_heads, C // self.num_heads).to(dtype)\n\n        q, k = self.q_norm(q), self.k_norm(k)\n\n        if enable_flash_attn:\n            from flash_attn import flash_attn_func\n\n            x = flash_attn_func(\n                q,\n                k,\n                v,\n                dropout_p=self.attn_drop.p if self.training else 0.0,\n                softmax_scale=self.scale,\n            )\n\n        elif self.mem_eff_attention:\n            attn_bias = None\n            if mask is not None:\n                attn_bias = torch.zeros([B * self.num_heads, q.shape[1], k.shape[1]], dtype=q.dtype, device=q.device)\n                attn_bias.masked_fill_(mask.squeeze(1).repeat(self.num_heads, 1, 1) == 0, float(\"-inf\"))\n            x = xformers.ops.memory_efficient_attention(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n        else:\n            # (B, N, #heads, #dim) -> (B, #heads, N, #dim)\n            q = q.permute(0, 2, 1, 3)\n            k = k.permute(0, 2, 1, 3)\n            v = v.permute(0, 2, 1, 3)\n            dtype = q.dtype\n            q = q * self.scale\n            attn = q @ k.transpose(-2, -1)  # translate attn to float32\n            if not self.attn_half:\n                attn = attn.to(torch.float32)\n            attn = attn.softmax(dim=-1)\n            attn = attn.to(dtype)  # cast back attn to original dtype\n            attn = self.attn_drop(attn)\n            x = attn @ v\n\n        x_output_shape = (B, N, C)\n        
if not enable_flash_attn:\n            x = x.transpose(1, 2)\n        x = x.reshape(x_output_shape)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass SeqParallelAttention(Attention):\n    def __init__(\n        self,\n        dim: int,\n        num_heads: int = 8,\n        qkv_bias: bool = False,\n        qk_norm: bool = False,\n        attn_drop: float = 0.0,\n        proj_drop: float = 0.0,\n        norm_layer: nn.Module = LlamaRMSNorm,\n        enable_flash_attn: bool = False,\n        rope=None,\n    ) -> None:\n        assert rope is None, \"Rope is not supported in SeqParallelAttention\"\n        super().__init__(\n            dim=dim,\n            num_heads=num_heads,\n            qkv_bias=qkv_bias,\n            qk_norm=qk_norm,\n            attn_drop=attn_drop,\n            proj_drop=proj_drop,\n            norm_layer=norm_layer,\n            enable_flash_attn=enable_flash_attn,\n        )\n\n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        B, N, C = x.shape  # for sequence parallel here, the N is a local sequence length\n        qkv = self.qkv(x)\n        qkv_shape = (B, N, 3, self.num_heads, self.head_dim)\n        qkv = qkv.view(qkv_shape)\n\n        sp_group = get_sequence_parallel_group()\n\n        # apply all_to_all to gather sequence and split attention heads\n        # [B, SUB_N, 3, NUM_HEAD, HEAD_DIM] -> [B, N, 3, NUM_HEAD_PER_DEVICE, HEAD_DIM]\n        qkv = all_to_all(qkv, sp_group, scatter_dim=3, gather_dim=1)\n\n        if self.enable_flash_attn:\n            qkv_permute_shape = (\n                2,\n                0,\n                1,\n                3,\n                4,\n            )  # [3, B, N, NUM_HEAD_PER_DEVICE, HEAD_DIM]\n        else:\n            qkv_permute_shape = (\n                2,\n                0,\n                3,\n                1,\n                4,\n            )  # [3, B, NUM_HEAD_PER_DEVICE, N, HEAD_DIM]\n        qkv = qkv.permute(qkv_permute_shape)\n\n   
     # ERROR: Should qk_norm first\n        q, k, v = qkv.unbind(0)\n        q, k = self.q_norm(q), self.k_norm(k)\n        if self.enable_flash_attn:\n            from flash_attn import flash_attn_func\n\n            x = flash_attn_func(\n                q,\n                k,\n                v,\n                dropout_p=self.attn_drop.p if self.training else 0.0,\n                softmax_scale=self.scale,\n            )\n        else:\n            dtype = q.dtype\n            q = q * self.scale\n            attn = q @ k.transpose(-2, -1)  # translate attn to float32\n            attn = attn.to(torch.float32)\n            attn = attn.softmax(dim=-1)\n            attn = attn.to(dtype)  # cast back attn to original dtype\n            attn = self.attn_drop(attn)\n            x = attn @ v\n\n        if not self.enable_flash_attn:\n            x = x.transpose(1, 2)\n\n        # apply all to all to gather back attention heads and split sequence\n        # [B, N, NUM_HEAD_PER_DEVICE, HEAD_DIM]  -> [B, SUB_N, NUM_HEAD, HEAD_DIM]\n        x = all_to_all(x, sp_group, scatter_dim=1, gather_dim=2)\n\n        # reshape outputs back to [B, N, C]\n        x_output_shape = (B, N, C)\n        x = x.reshape(x_output_shape)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass MultiHeadCrossAttention(nn.Module):\n    def __init__(self, d_model, num_heads, attn_drop=0.0, proj_drop=0.0):\n        super(MultiHeadCrossAttention, self).__init__()\n        assert d_model % num_heads == 0, \"d_model must be divisible by num_heads\"\n\n        self.d_model = d_model\n        self.num_heads = num_heads\n        self.head_dim = d_model // num_heads\n\n        self.q_linear = nn.Linear(d_model, d_model)\n        self.kv_linear = nn.Linear(d_model, d_model * 2)\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(d_model, d_model)\n        self.proj_drop = nn.Dropout(proj_drop)\n    \n    def forward(self, x, cond, mask=None):\n        
#start = torch.cuda.Event(enable_timing=True)\n        #end = torch.cuda.Event(enable_timing=True)\n        # query: img tokens; key/value: condition; mask: per-sample condition lengths (passed as kv_seqlen)\n        B, N, C = x.shape\n        #start.record()\n        q = self.q_linear(x).view(1, -1, self.num_heads, self.head_dim)\n        kv = self.kv_linear(cond).view(1, -1, 2, self.num_heads, self.head_dim)\n        k, v = kv.unbind(2)\n\n        attn_bias = None\n        if mask is not None:\n            attn_bias = xformers.ops.fmha.BlockDiagonalMask.from_seqlens([N] * B, mask)\n        #x = xformers.ops.memory_efficient_attention(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n\n        x, cross_attn_map = cached_attention_forward(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n        x = x.view(B, -1, C)\n        cross_attn_map = cross_attn_map.view(B, -1, cross_attn_map.shape[-1])\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        #end.record()\n        #torch.cuda.synchronize()\n        #print(start.elapsed_time(end))\n        return x, cross_attn_map\n\n\nclass SeqParallelMultiHeadCrossAttention(MultiHeadCrossAttention):\n    def __init__(\n        self,\n        d_model,\n        num_heads,\n        attn_drop=0.0,\n        proj_drop=0.0,\n    ):\n        super().__init__(\n            d_model=d_model,\n            num_heads=num_heads,\n            attn_drop=attn_drop,\n            proj_drop=proj_drop,\n        )\n\n    def forward(self, x, cond, mask=None):\n        # query: img tokens; key/value: condition; mask: per-sample condition lengths (passed as kv_seqlen)\n        sp_group = get_sequence_parallel_group()\n        sp_size = dist.get_world_size(sp_group)\n        B, SUB_N, C = x.shape  # [B, TS/p, C]\n        N = SUB_N * sp_size\n\n        # shape:\n        # q, k, v: [B, SUB_N, NUM_HEADS, HEAD_DIM]\n        q = self.q_linear(x).view(B, -1, self.num_heads, self.head_dim)\n        kv = self.kv_linear(cond).view(1, -1, 2, self.num_heads, self.head_dim)\n        kv = 
split_forward_gather_backward(kv, get_sequence_parallel_group(), dim=3, grad_scale=\"down\")\n        k, v = kv.unbind(2)\n\n        # apply all_to_all to gather sequence and split attention heads\n        q = all_to_all(q, sp_group, scatter_dim=2, gather_dim=1)\n\n        q = q.view(1, -1, self.num_heads // sp_size, self.head_dim)\n        k = k.view(1, -1, self.num_heads // sp_size, self.head_dim)\n        v = v.view(1, -1, self.num_heads // sp_size, self.head_dim)\n\n        # compute attention\n        attn_bias = None\n        if mask is not None:\n            attn_bias = xformers.ops.fmha.BlockDiagonalMask.from_seqlens([N] * B, mask)\n        x = xformers.ops.memory_efficient_attention(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n\n        # apply all to all to gather back attention heads and scatter sequence\n        x = x.view(B, -1, self.num_heads // sp_size, self.head_dim)\n        x = all_to_all(x, sp_group, scatter_dim=1, gather_dim=2)\n\n        # apply output projection\n        x = x.view(B, -1, C)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass FinalLayer(nn.Module):\n    \"\"\"\n    The final layer of DiT.\n    \"\"\"\n\n    def __init__(self, hidden_size, num_patch, out_channels):\n        super().__init__()\n        self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.linear = nn.Linear(hidden_size, num_patch * out_channels, bias=True)\n        self.adaLN_modulation = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 2 * hidden_size, bias=True))\n\n    def forward(self, x, c):\n        shift, scale = self.adaLN_modulation(c).chunk(2, dim=1)\n        x = modulate(self.norm_final, x, shift, scale)\n        x = self.linear(x)\n        return x\n\n\nclass T2IFinalLayer(nn.Module):\n    \"\"\"\n    The final layer of PixArt.\n    \"\"\"\n\n    def __init__(self, hidden_size, num_patch, out_channels, d_t=None, d_s=None):\n        super().__init__()\n        
self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.linear = nn.Linear(hidden_size, num_patch * out_channels, bias=True)\n        self.scale_shift_table = nn.Parameter(torch.randn(2, hidden_size) / hidden_size**0.5)\n        self.out_channels = out_channels\n        self.d_t = d_t\n        self.d_s = d_s\n\n    def t_mask_select(self, x_mask, x, masked_x, T, S):\n        # x: [B, (T, S), C]\n        # masked_x: [B, (T, S), C]\n        # x_mask: [B, T]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n        masked_x = rearrange(masked_x, \"B (T S) C -> B T S C\", T=T, S=S)\n        x = torch.where(x_mask[:, :, None, None], x, masked_x)\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n        return x\n\n    def forward(self, x, t, x_mask=None, t0=None, T=None, S=None):\n        if T is None:\n            T = self.d_t\n        if S is None:\n            S = self.d_s\n        shift, scale = (self.scale_shift_table[None] + t[:, None]).chunk(2, dim=1)\n        x = t2i_modulate(self.norm_final(x), shift, scale)\n        if x_mask is not None:\n            shift_zero, scale_zero = (self.scale_shift_table[None] + t0[:, None]).chunk(2, dim=1)\n            x_zero = t2i_modulate(self.norm_final(x), shift_zero, scale_zero)\n            x = self.t_mask_select(x_mask, x, x_zero, T, S)\n        x = self.linear(x)\n        return x\n\n\n# ===============================================\n# Embedding Layers for Timesteps and Class Labels\n# ===============================================\n\n\nclass TimestepEmbedder(nn.Module):\n    \"\"\"\n    Embeds scalar timesteps into vector representations.\n    \"\"\"\n\n    def __init__(self, hidden_size, frequency_embedding_size=256):\n        super().__init__()\n        self.mlp = nn.Sequential(\n            nn.Linear(frequency_embedding_size, hidden_size, bias=True),\n            nn.SiLU(),\n            nn.Linear(hidden_size, hidden_size, bias=True),\n        )\n        
self.frequency_embedding_size = frequency_embedding_size\n\n    @staticmethod\n    def timestep_embedding(t, dim, max_period=10000):\n        \"\"\"\n        Create sinusoidal timestep embeddings.\n        :param t: a 1-D Tensor of N indices, one per batch element.\n                          These may be fractional.\n        :param dim: the dimension of the output.\n        :param max_period: controls the minimum frequency of the embeddings.\n        :return: an (N, D) Tensor of positional embeddings.\n        \"\"\"\n        # https://github.com/openai/glide-text2im/blob/main/glide_text2im/nn.py\n        half = dim // 2\n        freqs = torch.exp(-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half)\n        freqs = freqs.to(device=t.device)\n        args = t[:, None].float() * freqs[None]\n        embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)\n        if dim % 2:\n            embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)\n        return embedding\n\n    def forward(self, t, dtype):\n        t_freq = self.timestep_embedding(t, self.frequency_embedding_size)\n        if t_freq.dtype != dtype:\n            t_freq = t_freq.to(dtype)\n        t_emb = self.mlp(t_freq)\n        return t_emb\n\n\nclass LabelEmbedder(nn.Module):\n    \"\"\"\n    Embeds class labels into vector representations. 
Also handles label dropout for classifier-free guidance.\n    \"\"\"\n\n    def __init__(self, num_classes, hidden_size, dropout_prob):\n        super().__init__()\n        use_cfg_embedding = dropout_prob > 0\n        self.embedding_table = nn.Embedding(num_classes + use_cfg_embedding, hidden_size)\n        self.num_classes = num_classes\n        self.dropout_prob = dropout_prob\n\n    def token_drop(self, labels, force_drop_ids=None):\n        \"\"\"\n        Drops labels to enable classifier-free guidance.\n        \"\"\"\n        if force_drop_ids is None:\n            drop_ids = torch.rand(labels.shape[0]).cuda() < self.dropout_prob\n        else:\n            drop_ids = force_drop_ids == 1\n        labels = torch.where(drop_ids, self.num_classes, labels)\n        return labels\n\n    def forward(self, labels, train, force_drop_ids=None):\n        use_dropout = self.dropout_prob > 0\n        if (train and use_dropout) or (force_drop_ids is not None):\n            labels = self.token_drop(labels, force_drop_ids)\n        return self.embedding_table(labels)\n\n\nclass SizeEmbedder(TimestepEmbedder):\n    \"\"\"\n    Embeds scalar timesteps into vector representations.\n    \"\"\"\n\n    def __init__(self, hidden_size, frequency_embedding_size=256):\n        super().__init__(hidden_size=hidden_size, frequency_embedding_size=frequency_embedding_size)\n        self.mlp = nn.Sequential(\n            nn.Linear(frequency_embedding_size, hidden_size, bias=True),\n            nn.SiLU(),\n            nn.Linear(hidden_size, hidden_size, bias=True),\n        )\n        self.frequency_embedding_size = frequency_embedding_size\n        self.outdim = hidden_size\n\n    def forward(self, s, bs):\n        if s.ndim == 1:\n            s = s[:, None]\n        assert s.ndim == 2\n        if s.shape[0] != bs:\n            s = s.repeat(bs // s.shape[0], 1)\n            assert s.shape[0] == bs\n        b, dims = s.shape[0], s.shape[1]\n        s = rearrange(s, \"b d -> (b d)\")\n     
   s_freq = self.timestep_embedding(s, self.frequency_embedding_size).to(self.dtype)\n        s_emb = self.mlp(s_freq)\n        s_emb = rearrange(s_emb, \"(b d) d2 -> b (d d2)\", b=b, d=dims, d2=self.outdim)\n        return s_emb\n\n    @property\n    def dtype(self):\n        return next(self.parameters()).dtype\n\n\nclass CaptionEmbedder(nn.Module):\n    \"\"\"\n    Embeds class labels into vector representations. Also handles label dropout for classifier-free guidance.\n    \"\"\"\n\n    def __init__(\n        self,\n        in_channels,\n        hidden_size,\n        uncond_prob,\n        act_layer=nn.GELU(approximate=\"tanh\"),\n        token_num=120,\n    ):\n        super().__init__()\n        self.y_proj = Mlp(\n            in_features=in_channels,\n            hidden_features=hidden_size,\n            out_features=hidden_size,\n            act_layer=act_layer,\n            drop=0,\n        )\n        self.register_buffer(\n            \"y_embedding\",\n            torch.randn(token_num, in_channels) / in_channels**0.5,\n        )\n        self.uncond_prob = uncond_prob\n\n    def token_drop(self, caption, force_drop_ids=None):\n        \"\"\"\n        Drops labels to enable classifier-free guidance.\n        \"\"\"\n        if force_drop_ids is None:\n            drop_ids = torch.rand(caption.shape[0]).cuda() < self.uncond_prob\n        else:\n            drop_ids = force_drop_ids == 1\n        caption = torch.where(drop_ids[:, None, None, None], self.y_embedding, caption)\n        return caption\n\n    def forward(self, caption, train, force_drop_ids=None):\n        if train:\n            assert caption.shape[2:] == self.y_embedding.shape\n        use_dropout = self.uncond_prob > 0\n        if (train and use_dropout) or (force_drop_ids is not None):\n            caption = self.token_drop(caption, force_drop_ids)\n        caption = self.y_proj(caption)\n        return caption\n\n\nclass PositionEmbedding2D(nn.Module):\n    def __init__(self, dim: int) -> 
None:\n        super().__init__()\n        self.dim = dim\n        assert dim % 4 == 0, \"dim must be divisible by 4\"\n        half_dim = dim // 2\n        inv_freq = 1.0 / (10000 ** (torch.arange(0, half_dim, 2).float() / half_dim))\n        self.register_buffer(\"inv_freq\", inv_freq, persistent=False)\n\n    def _get_sin_cos_emb(self, t: torch.Tensor):\n        out = torch.einsum(\"i,d->id\", t, self.inv_freq)\n        emb_cos = torch.cos(out)\n        emb_sin = torch.sin(out)\n        return torch.cat((emb_sin, emb_cos), dim=-1)\n\n    @functools.lru_cache(maxsize=512)\n    def _get_cached_emb(\n        self,\n        device: torch.device,\n        dtype: torch.dtype,\n        h: int,\n        w: int,\n        scale: float = 1.0,\n        base_size: Optional[int] = None,\n    ):\n        grid_h = torch.arange(h, device=device) / scale\n        grid_w = torch.arange(w, device=device) / scale\n        if base_size is not None:\n            grid_h *= base_size / h\n            grid_w *= base_size / w\n        grid_h, grid_w = torch.meshgrid(\n            grid_w,\n            grid_h,\n            indexing=\"ij\",\n        )  # here w goes first\n        grid_h = grid_h.t().reshape(-1)\n        grid_w = grid_w.t().reshape(-1)\n        emb_h = self._get_sin_cos_emb(grid_h)\n        emb_w = self._get_sin_cos_emb(grid_w)\n        return torch.concat([emb_h, emb_w], dim=-1).unsqueeze(0).to(dtype)\n\n    def forward(\n        self,\n        x: torch.Tensor,\n        h: int,\n        w: int,\n        scale: Optional[float] = 1.0,\n        base_size: Optional[int] = None,\n    ) -> torch.Tensor:\n        return self._get_cached_emb(x.device, x.dtype, h, w, scale, base_size)\n\n\n# ===============================================\n# Sine/Cosine Positional Embedding Functions\n# ===============================================\n# https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py\n\n\ndef get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False, 
extra_tokens=0, scale=1.0, base_size=None):\n    \"\"\"\n    grid_size: int of the grid height and width\n    return:\n    pos_embed: [grid_size*grid_size, embed_dim] or [1+grid_size*grid_size, embed_dim] (w/ or w/o cls_token)\n    \"\"\"\n    if not isinstance(grid_size, tuple):\n        grid_size = (grid_size, grid_size)\n\n    grid_h = np.arange(grid_size[0], dtype=np.float32) / scale\n    grid_w = np.arange(grid_size[1], dtype=np.float32) / scale\n    if base_size is not None:\n        grid_h *= base_size / grid_size[0]\n        grid_w *= base_size / grid_size[1]\n    grid = np.meshgrid(grid_w, grid_h)  # here w goes first\n    grid = np.stack(grid, axis=0)\n\n    grid = grid.reshape([2, 1, grid_size[1], grid_size[0]])\n    pos_embed = get_2d_sincos_pos_embed_from_grid(embed_dim, grid)\n    if cls_token and extra_tokens > 0:\n        pos_embed = np.concatenate([np.zeros([extra_tokens, embed_dim]), pos_embed], axis=0)\n    return pos_embed\n\n\ndef get_2d_sincos_pos_embed_from_grid(embed_dim, grid):\n    assert embed_dim % 2 == 0\n\n    # use half of dimensions to encode grid_h\n    emb_h = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[0])  # (H*W, D/2)\n    emb_w = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[1])  # (H*W, D/2)\n\n    emb = np.concatenate([emb_h, emb_w], axis=1)  # (H*W, D)\n    return emb\n\n\ndef get_1d_sincos_pos_embed(embed_dim, length, scale=1.0):\n    pos = np.arange(0, length)[..., None] / scale\n    return get_1d_sincos_pos_embed_from_grid(embed_dim, pos)\n\n\ndef get_1d_sincos_pos_embed_from_grid(embed_dim, pos):\n    \"\"\"\n    embed_dim: output dimension for each position\n    pos: a list of positions to be encoded: size (M,)\n    out: (M, D)\n    \"\"\"\n    assert embed_dim % 2 == 0\n    omega = np.arange(embed_dim // 2, dtype=np.float64)\n    omega /= embed_dim / 2.0\n    omega = 1.0 / 10000**omega  # (D/2,)\n\n    pos = pos.reshape(-1)  # (M,)\n    out = np.einsum(\"m,d->md\", pos, omega)  # (M, D/2), outer 
product\n\n    emb_sin = np.sin(out)  # (M, D/2)\n    emb_cos = np.cos(out)  # (M, D/2)\n\n    emb = np.concatenate([emb_sin, emb_cos], axis=1)  # (M, D)\n    return emb\n"
  },
  {
    "path": "Open-Sora/opensora/models/pixart/pixart.py",
    "content": "# Adapted from PixArt\n#\n# Copyright (C) 2023  PixArt-alpha/PixArt-alpha\n#\n# This program is free software: you can redistribute it and/or modify\n# it under the terms of the GNU Affero General Public License as published\n# by the Free Software Foundation, either version 3 of the License, or\n# (at your option) any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU Affero General Public License for more details.\n#\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# PixArt: https://github.com/PixArt-alpha/PixArt-alpha\n# DiT:    https://github.com/facebookresearch/DiT/tree/main\n# --------------------------------------------------------\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom einops import rearrange\nfrom timm.models.layers import DropPath\nfrom timm.models.vision_transformer import Mlp\n\n# from .builder import MODELS\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.models.layers.blocks import (\n    Attention,\n    CaptionEmbedder,\n    MultiHeadCrossAttention,\n    PatchEmbed3D,\n    SeqParallelAttention,\n    SeqParallelMultiHeadCrossAttention,\n    SizeEmbedder,\n    T2IFinalLayer,\n    TimestepEmbedder,\n    approx_gelu,\n    get_1d_sincos_pos_embed,\n    get_2d_sincos_pos_embed,\n    get_layernorm,\n    t2i_modulate,\n)\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\n\nclass PixArtBlock(nn.Module):\n    \"\"\"\n    A PixArt block with adaptive layer norm (adaLN-single) conditioning.\n    \"\"\"\n\n    def __init__(\n        self,\n        hidden_size,\n        num_heads,\n        mlp_ratio=4.0,\n        
drop_path=0.0,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n    ):\n        super().__init__()\n        self.hidden_size = hidden_size\n        self.enable_flash_attn = enable_flash_attn\n        self._enable_sequence_parallelism = enable_sequence_parallelism\n\n        if enable_sequence_parallelism:\n            self.attn_cls = SeqParallelAttention\n            self.mha_cls = SeqParallelMultiHeadCrossAttention\n        else:\n            self.attn_cls = Attention\n            self.mha_cls = MultiHeadCrossAttention\n\n        self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.attn = self.attn_cls(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            enable_flash_attn=enable_flash_attn,\n        )\n        self.cross_attn = self.mha_cls(hidden_size, num_heads)\n        self.norm2 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.mlp = Mlp(\n            in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0\n        )\n        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()\n        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size**0.5)\n\n    def forward(self, x, y, t, mask=None):\n        B, N, C = x.shape\n\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (\n            self.scale_shift_table[None] + t.reshape(B, 6, -1)\n        ).chunk(6, dim=1)\n        x = x + self.drop_path(gate_msa * self.attn(t2i_modulate(self.norm1(x), shift_msa, scale_msa)).reshape(B, N, C))\n        x = x + self.cross_attn(x, y, mask)\n        x = x + self.drop_path(gate_mlp * self.mlp(t2i_modulate(self.norm2(x), shift_mlp, scale_mlp)))\n\n        return x\n\n\n@MODELS.register_module()\nclass PixArt(nn.Module):\n    \"\"\"\n   
 Diffusion model with a Transformer backbone.\n    \"\"\"\n\n    def __init__(\n        self,\n        input_size=(1, 32, 32),\n        in_channels=4,\n        patch_size=(1, 2, 2),\n        hidden_size=1152,\n        depth=28,\n        num_heads=16,\n        mlp_ratio=4.0,\n        class_dropout_prob=0.1,\n        pred_sigma=True,\n        drop_path: float = 0.0,\n        no_temporal_pos_emb=False,\n        caption_channels=4096,\n        model_max_length=120,\n        dtype=torch.float32,\n        freeze=None,\n        space_scale=1.0,\n        time_scale=1.0,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n        base_size=None,\n    ):\n        super().__init__()\n        assert enable_sequence_parallelism is False, \"Sequence parallelism is not supported in this version.\"\n        self.pred_sigma = pred_sigma\n        self.in_channels = in_channels\n        self.out_channels = in_channels * 2 if pred_sigma else in_channels\n        self.hidden_size = hidden_size\n        self.patch_size = patch_size\n        self.input_size = input_size\n        num_patches = np.prod([input_size[i] // patch_size[i] for i in range(3)])\n        self.num_patches = num_patches\n        self.num_temporal = input_size[0] // patch_size[0]\n        self.num_spatial = num_patches // self.num_temporal\n        if base_size is None:\n            self.base_size = int(np.sqrt(self.num_spatial))\n        else:\n            self.base_size = base_size // patch_size[1]\n        self.num_heads = num_heads\n        self.dtype = dtype\n        self.no_temporal_pos_emb = no_temporal_pos_emb\n        self.depth = depth\n        self.mlp_ratio = mlp_ratio\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_layernorm_kernel = enable_layernorm_kernel\n        self.space_scale = space_scale\n        self.time_scale = time_scale\n\n        self.x_embedder = PatchEmbed3D(patch_size, in_channels, hidden_size)\n  
      self.t_embedder = TimestepEmbedder(hidden_size)\n        self.t_block = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 6 * hidden_size, bias=True))\n        self.y_embedder = CaptionEmbedder(\n            in_channels=caption_channels,\n            hidden_size=hidden_size,\n            uncond_prob=class_dropout_prob,\n            act_layer=approx_gelu,\n            token_num=model_max_length,\n        )\n\n        self.register_buffer(\"pos_embed\", self.get_spatial_pos_embed())\n        self.register_buffer(\"pos_embed_temporal\", self.get_temporal_pos_embed())\n\n        drop_path = [x.item() for x in torch.linspace(0, drop_path, depth)]  # stochastic depth decay rule\n        self.blocks = nn.ModuleList(\n            [\n                PixArtBlock(\n                    hidden_size,\n                    num_heads,\n                    mlp_ratio=mlp_ratio,\n                    drop_path=drop_path[i],\n                    enable_flash_attn=enable_flash_attn,\n                    enable_layernorm_kernel=enable_layernorm_kernel,\n                )\n                for i in range(depth)\n            ]\n        )\n        self.final_layer = T2IFinalLayer(hidden_size, np.prod(self.patch_size), self.out_channels)\n\n        self.initialize_weights()\n        if freeze is not None:\n            assert freeze in [\"text\"]\n            if freeze == \"text\":\n                self.freeze_text()\n\n    def forward(self, x, timestep, y, mask=None, **kwargs):\n        \"\"\"\n        Forward pass of PixArt.\n        x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)\n        t: (N,) tensor of diffusion timesteps\n        y: (N, 1, 120, C) tensor of class labels\n        \"\"\"\n        dtype = self.x_embedder.proj.weight.dtype\n        B = x.size(0)\n        x = x.to(dtype)\n        timestep = timestep.to(dtype)\n        y = y.to(dtype)\n\n        # embedding\n        x = self.x_embedder(x)  # (B, N, D)\n        x = rearrange(x, \"b (t 
s) d -> b t s d\", t=self.num_temporal, s=self.num_spatial)\n        x = x + self.pos_embed\n        if not self.no_temporal_pos_emb:\n            x = rearrange(x, \"b t s d -> b s t d\")\n            x = x + self.pos_embed_temporal\n            x = rearrange(x, \"b s t d -> b (t s) d\")\n        else:\n            x = rearrange(x, \"b t s d -> b (t s) d\")\n\n        t = self.t_embedder(timestep, dtype=x.dtype)  # (N, D)\n        t0 = self.t_block(t)\n        y = self.y_embedder(y, self.training)  # (N, 1, L, D)\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, x.shape[-1])\n\n        # blocks\n        for block in self.blocks:\n            x = auto_grad_checkpoint(block, x, y, t0, y_lens)\n\n        # final process\n        x = self.final_layer(x, t)  # (N, T, patch_size ** 2 * out_channels)\n        x = self.unpatchify(x)  # (N, out_channels, H, W)\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n    def unpatchify(self, x):\n        c = self.out_channels\n        t, h, w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        pt, ph, pw = self.patch_size\n\n        x = x.reshape(shape=(x.shape[0], t, h, w, pt, ph, pw, c))\n        x = rearrange(x, \"n t h w r p q c -> n c t r h p w q\")\n        imgs = x.reshape(shape=(x.shape[0], c, t * pt, h * ph, w * pw))\n        return imgs\n\n    def get_spatial_pos_embed(self, grid_size=None):\n        if grid_size is None:\n            grid_size = self.input_size[1:]\n        pos_embed = get_2d_sincos_pos_embed(\n            self.hidden_size,\n            (grid_size[0] // 
self.patch_size[1], grid_size[1] // self.patch_size[2]),\n            scale=self.space_scale,\n            base_size=self.base_size,\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def get_temporal_pos_embed(self):\n        pos_embed = get_1d_sincos_pos_embed(\n            self.hidden_size,\n            self.input_size[0] // self.patch_size[0],\n            scale=self.time_scale,\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def freeze_text(self):\n        for n, p in self.named_parameters():\n            if \"cross_attn\" in n:\n                p.requires_grad = False\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                torch.nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.constant_(module.bias, 0)\n\n        self.apply(_basic_init)\n\n        # Initialize patch_embed like nn.Linear (instead of nn.Conv2d):\n        w = self.x_embedder.proj.weight.data\n        nn.init.xavier_uniform_(w.view([w.shape[0], -1]))\n\n        # Initialize timestep embedding MLP:\n        nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)\n        nn.init.normal_(self.t_block[1].weight, std=0.02)\n\n        # Initialize caption embedding MLP:\n        nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)\n        nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)\n\n        # Zero-out cross-attention output projections in PixArt blocks:\n        for block in self.blocks:\n            nn.init.constant_(block.cross_attn.proj.weight, 0)\n            nn.init.constant_(block.cross_attn.proj.bias, 0)\n\n        # Zero-out output layers:\n        
nn.init.constant_(self.final_layer.linear.weight, 0)\n        nn.init.constant_(self.final_layer.linear.bias, 0)\n\n\n@MODELS.register_module()\nclass PixArtMS(PixArt):\n    def __init__(self, *args, **kwargs):\n        super().__init__(*args, **kwargs)\n\n        assert self.hidden_size % 3 == 0, \"hidden_size must be divisible by 3\"\n        self.csize_embedder = SizeEmbedder(self.hidden_size // 3)\n        self.ar_embedder = SizeEmbedder(self.hidden_size // 3)\n\n    def forward(self, x, timestep, y, mask=None, data_info=None):\n        \"\"\"\n        Forward pass of PixArt.\n        x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)\n        t: (N,) tensor of diffusion timesteps\n        y: (N, 1, 120, C) tensor of class labels\n        \"\"\"\n        x = x.to(self.dtype)\n        timestep = timestep.to(self.dtype)\n        y = y.to(self.dtype)\n\n        c_size = data_info[\"hw\"]\n        ar = data_info[\"ar\"]\n        pos_embed = self.get_spatial_pos_embed((x.shape[-2], x.shape[-1])).to(x.dtype)\n\n        # embedding\n        x = self.x_embedder(x)  # (B, N, D)\n        x = rearrange(x, \"b (t s) d -> b t s d\", t=self.num_temporal, s=self.num_spatial)\n        x = x + pos_embed.to(x.device)\n        if not self.no_temporal_pos_emb:\n            x = rearrange(x, \"b t s d -> b s t d\")\n            x = x + self.pos_embed_temporal\n            x = rearrange(x, \"b s t d -> b (t s) d\")\n        else:\n            x = rearrange(x, \"b t s d -> b (t s) d\")\n\n        t = self.t_embedder(timestep, dtype=x.dtype)  # (N, D)\n        B = x.shape[0]\n        csize = self.csize_embedder(c_size, B)\n        ar = self.ar_embedder(ar, B)\n        t = t + torch.cat([csize, ar], dim=1)\n\n        t0 = self.t_block(t)\n        y = self.y_embedder(y, self.training)  # (N, 1, L, D)\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            
mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, x.shape[-1])\n\n        # blocks\n        for block in self.blocks:\n            x = block(x, y, t0, y_lens)\n\n        # final process\n        x = self.final_layer(x, t)  # (N, T, patch_size ** 2 * out_channels)\n        x = self.unpatchify(x)  # (N, out_channels, H, W)\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n\n@MODELS.register_module(\"PixArt-XL/2\")\ndef PixArt_XL_2(from_pretrained=None, **kwargs):\n    model = PixArt(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n\n\n@MODELS.register_module(\"PixArt-1B/2\")\ndef PixArt_1B_2(from_pretrained=None, **kwargs):\n    model = PixArt(depth=28, hidden_size=1872, patch_size=(1, 2, 2), num_heads=26, **kwargs)\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n\n\n@MODELS.register_module(\"PixArtMS-XL/2\")\ndef PixArtMS_XL_2(from_pretrained=None, **kwargs):\n    model = PixArtMS(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/opensora/models/stdit/__init__.py",
    "content": "from .stdit import STDiT\nfrom .stdit2 import STDiT2\nfrom .stdit3 import STDiT3\n"
  },
  {
    "path": "Open-Sora/opensora/models/stdit/stdit.py",
    "content": "import numpy as np\nimport torch\nimport torch.distributed as dist\nimport torch.nn as nn\nfrom einops import rearrange\nfrom timm.models.layers import DropPath\nfrom timm.models.vision_transformer import Mlp\n\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.acceleration.communications import gather_forward_split_backward, split_forward_gather_backward\nfrom opensora.acceleration.parallel_states import get_sequence_parallel_group\nfrom opensora.models.layers.blocks import (\n    Attention,\n    CaptionEmbedder,\n    MultiHeadCrossAttention,\n    PatchEmbed3D,\n    SeqParallelAttention,\n    SeqParallelMultiHeadCrossAttention,\n    T2IFinalLayer,\n    TimestepEmbedder,\n    approx_gelu,\n    get_1d_sincos_pos_embed,\n    get_2d_sincos_pos_embed,\n    get_layernorm,\n    t2i_modulate,\n)\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\n\nclass STDiTBlock(nn.Module):\n    def __init__(\n        self,\n        hidden_size,\n        num_heads,\n        d_s=None,\n        d_t=None,\n        mlp_ratio=4.0,\n        drop_path=0.0,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n    ):\n        super().__init__()\n        self.hidden_size = hidden_size\n        self.enable_flash_attn = enable_flash_attn\n        self._enable_sequence_parallelism = enable_sequence_parallelism\n\n        if enable_sequence_parallelism:\n            self.attn_cls = SeqParallelAttention\n            self.mha_cls = SeqParallelMultiHeadCrossAttention\n        else:\n            self.attn_cls = Attention\n            self.mha_cls = MultiHeadCrossAttention\n\n        self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.attn = self.attn_cls(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            
enable_flash_attn=enable_flash_attn,\n        )\n        self.cross_attn = self.mha_cls(hidden_size, num_heads)\n        self.norm2 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.mlp = Mlp(\n            in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0\n        )\n        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()\n        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size**0.5)\n\n        # temporal attention\n        self.d_s = d_s\n        self.d_t = d_t\n\n        if self._enable_sequence_parallelism:\n            sp_size = dist.get_world_size(get_sequence_parallel_group())\n            # make sure d_t is divisible by sp_size\n            assert d_t % sp_size == 0\n            self.d_t = d_t // sp_size\n\n        self.attn_temp = self.attn_cls(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            enable_flash_attn=self.enable_flash_attn,\n        )\n\n    def t_mask_select(self, x, masked_x, x_mask):\n        # x: [B, (T, S), C]\n        # masked_x: [B, (T, S), C]\n        # x_mask: [B, T]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=self.d_t, S=self.d_s)\n        masked_x = rearrange(masked_x, \"B (T S) C -> B T S C\", T=self.d_t, S=self.d_s)\n        x = torch.where(x_mask[:, :, None, None], x, masked_x)\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n        return x\n\n    def forward(self, x, y, t, mask=None, tpe=None, x_mask=None, t0=None):\n        B, N, C = x.shape\n\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (\n            self.scale_shift_table[None] + t.reshape(B, 6, -1)\n        ).chunk(6, dim=1)\n        x_m = t2i_modulate(self.norm1(x), shift_msa, scale_msa)\n        if x_mask is not None:\n            shift_msa_zero, scale_msa_zero, gate_msa_zero, shift_mlp_zero, scale_mlp_zero, 
gate_mlp_zero = (\n                self.scale_shift_table[None] + t0.reshape(B, 6, -1)\n            ).chunk(6, dim=1)\n            x_m_zero = t2i_modulate(self.norm1(x), shift_msa_zero, scale_msa_zero)\n            x_m = self.t_mask_select(x_m, x_m_zero, x_mask)\n\n        # spatial branch\n        x_s = rearrange(x_m, \"B (T S) C -> (B T) S C\", T=self.d_t, S=self.d_s)\n        x_s = self.attn(x_s)\n        x_s = rearrange(x_s, \"(B T) S C -> B (T S) C\", T=self.d_t, S=self.d_s)\n\n        if x_mask is not None:\n            x_s_zero = gate_msa_zero * x_s\n            x_s = gate_msa * x_s\n            x_s = self.t_mask_select(x_s, x_s_zero, x_mask)\n        else:\n            x_s = gate_msa * x_s\n\n        x = x + self.drop_path(x_s)\n\n        # temporal branch\n        x_t = rearrange(x, \"B (T S) C -> (B S) T C\", T=self.d_t, S=self.d_s)\n        if tpe is not None:\n            x_t = x_t + tpe\n        x_t = self.attn_temp(x_t)\n        x_t = rearrange(x_t, \"(B S) T C -> B (T S) C\", T=self.d_t, S=self.d_s)\n        x = x + self.drop_path(gate_msa * x_t)\n\n        # cross attn\n        x = x + self.cross_attn(x, y, mask)\n\n        # mlp\n        x_m = t2i_modulate(self.norm2(x), shift_mlp, scale_mlp)\n        if x_mask is not None:\n            x_m_zero = t2i_modulate(self.norm2(x), shift_mlp_zero, scale_mlp_zero)\n            x_m = self.t_mask_select(x_m, x_m_zero, x_mask)\n\n        x_mlp = self.mlp(x_m)\n        if x_mask is not None:\n            x_mlp_zero = gate_mlp_zero * x_mlp\n            x_mlp = gate_mlp * x_mlp\n            x_mlp = self.t_mask_select(x_mlp, x_mlp_zero, x_mask)\n        else:\n            x_mlp = gate_mlp * x_mlp\n\n        x = x + self.drop_path(x_mlp)\n\n        return x\n\n\n@MODELS.register_module()\nclass STDiT(nn.Module):\n    def __init__(\n        self,\n        input_size=(1, 32, 32),\n        in_channels=4,\n        patch_size=(1, 2, 2),\n        hidden_size=1152,\n        depth=28,\n        num_heads=16,\n        
mlp_ratio=4.0,\n        class_dropout_prob=0.1,\n        pred_sigma=True,\n        drop_path=0.0,\n        no_temporal_pos_emb=False,\n        caption_channels=4096,\n        model_max_length=120,\n        dtype=torch.float32,\n        space_scale=1.0,\n        time_scale=1.0,\n        freeze=None,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n    ):\n        super().__init__()\n        self.pred_sigma = pred_sigma\n        self.in_channels = in_channels\n        self.out_channels = in_channels * 2 if pred_sigma else in_channels\n        self.hidden_size = hidden_size\n        self.patch_size = patch_size\n        self.input_size = input_size\n        num_patches = np.prod([input_size[i] // patch_size[i] for i in range(3)])\n        self.num_patches = num_patches\n        self.num_temporal = input_size[0] // patch_size[0]\n        self.num_spatial = num_patches // self.num_temporal\n        self.num_heads = num_heads\n        self.dtype = dtype\n        self.no_temporal_pos_emb = no_temporal_pos_emb\n        self.depth = depth\n        self.mlp_ratio = mlp_ratio\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_layernorm_kernel = enable_layernorm_kernel\n        self.space_scale = space_scale\n        self.time_scale = time_scale\n\n        self.register_buffer(\"pos_embed\", self.get_spatial_pos_embed())\n        self.register_buffer(\"pos_embed_temporal\", self.get_temporal_pos_embed())\n\n        self.x_embedder = PatchEmbed3D(patch_size, in_channels, hidden_size)\n        self.t_embedder = TimestepEmbedder(hidden_size)\n        self.t_block = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 6 * hidden_size, bias=True))\n        self.y_embedder = CaptionEmbedder(\n            in_channels=caption_channels,\n            hidden_size=hidden_size,\n            uncond_prob=class_dropout_prob,\n            act_layer=approx_gelu,\n            token_num=model_max_length,\n      
  )\n\n        drop_path = [x.item() for x in torch.linspace(0, drop_path, depth)]\n        self.blocks = nn.ModuleList(\n            [\n                STDiTBlock(\n                    self.hidden_size,\n                    self.num_heads,\n                    mlp_ratio=self.mlp_ratio,\n                    drop_path=drop_path[i],\n                    enable_flash_attn=self.enable_flash_attn,\n                    enable_layernorm_kernel=self.enable_layernorm_kernel,\n                    enable_sequence_parallelism=enable_sequence_parallelism,\n                    d_t=self.num_temporal,\n                    d_s=self.num_spatial,\n                )\n                for i in range(self.depth)\n            ]\n        )\n        self.final_layer = T2IFinalLayer(\n            hidden_size,\n            np.prod(self.patch_size),\n            self.out_channels,\n            d_t=self.num_temporal,\n            d_s=self.num_spatial,\n        )\n\n        # init model\n        self.initialize_weights()\n        self.initialize_temporal()\n        if freeze is not None:\n            assert freeze in [\"not_temporal\", \"text\"]\n            if freeze == \"not_temporal\":\n                self.freeze_not_temporal()\n            elif freeze == \"text\":\n                self.freeze_text()\n\n        # sequence parallel related configs\n        self.enable_sequence_parallelism = enable_sequence_parallelism\n        if enable_sequence_parallelism:\n            self.sp_rank = dist.get_rank(get_sequence_parallel_group())\n        else:\n            self.sp_rank = None\n\n    def forward(self, x, timestep, y, mask=None, x_mask=None, **kwargs):\n        \"\"\"\n        Forward pass of STDiT.\n        Args:\n            x (torch.Tensor): latent representation of video; of shape [B, C, T, H, W]\n            timestep (torch.Tensor): diffusion time steps; of shape [B]\n            y (torch.Tensor): representation of prompts; of shape [B, 1, N_token, C]\n            mask (torch.Tensor): 
mask for selecting prompt tokens; of shape [B, N_token]\n\n        Returns:\n            x (torch.Tensor): output latent representation; of shape [B, C, T, H, W]\n        \"\"\"\n        dtype = self.x_embedder.proj.weight.dtype\n        x = x.to(dtype)\n        timestep = timestep.to(dtype)\n        y = y.to(dtype)\n\n        # embedding\n        x = self.x_embedder(x)  # [B, N, C]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=self.num_temporal, S=self.num_spatial)\n        x = x + self.pos_embed\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n\n        # shard over the sequence dim if sp is enabled\n        if self.enable_sequence_parallelism:\n            x = split_forward_gather_backward(x, get_sequence_parallel_group(), dim=1, grad_scale=\"down\")\n\n        t = self.t_embedder(timestep, dtype=x.dtype)  # [B, C]\n        t_mlp = self.t_block(t)  # [B, C]\n        if x_mask is not None:\n            t0_timestep = torch.zeros_like(timestep)\n            t0 = self.t_embedder(t0_timestep, dtype=x.dtype)\n            t0_mlp = self.t_block(t0)\n        else:\n            t0 = None\n            t0_mlp = None\n        y = self.y_embedder(y, self.training)  # [B, 1, N_token, C]\n\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, x.shape[-1])\n\n        # blocks\n        for i, block in enumerate(self.blocks):\n            if i == 0:\n                if self.enable_sequence_parallelism:\n                    tpe = torch.chunk(\n                        self.pos_embed_temporal, dist.get_world_size(get_sequence_parallel_group()), dim=1\n                    
)[self.sp_rank].contiguous()\n                else:\n                    tpe = self.pos_embed_temporal\n            else:\n                tpe = None\n            x = auto_grad_checkpoint(block, x, y, t_mlp, y_lens, tpe, x_mask, t0_mlp)\n\n        if self.enable_sequence_parallelism:\n            x = gather_forward_split_backward(x, get_sequence_parallel_group(), dim=1, grad_scale=\"up\")\n        # x.shape: [B, N, C]\n\n        # final process\n        x = self.final_layer(x, t, x_mask, t0)  # [B, N, C=T_p * H_p * W_p * C_out]\n        x = self.unpatchify(x)  # [B, C_out, T, H, W]\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n    def unpatchify(self, x):\n        \"\"\"\n        Args:\n            x (torch.Tensor): of shape [B, N, C]\n\n        Return:\n            x (torch.Tensor): of shape [B, C_out, T, H, W]\n        \"\"\"\n\n        N_t, N_h, N_w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        T_p, H_p, W_p = self.patch_size\n        x = rearrange(\n            x,\n            \"B (N_t N_h N_w) (T_p H_p W_p C_out) -> B C_out (N_t T_p) (N_h H_p) (N_w W_p)\",\n            N_t=N_t,\n            N_h=N_h,\n            N_w=N_w,\n            T_p=T_p,\n            H_p=H_p,\n            W_p=W_p,\n            C_out=self.out_channels,\n        )\n        return x\n\n    def unpatchify_old(self, x):\n        c = self.out_channels\n        t, h, w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        pt, ph, pw = self.patch_size\n\n        x = x.reshape(shape=(x.shape[0], t, h, w, pt, ph, pw, c))\n        x = rearrange(x, \"n t h w r p q c -> n c t r h p w q\")\n        imgs = x.reshape(shape=(x.shape[0], c, t * pt, h * ph, w * pw))\n        return imgs\n\n    def get_spatial_pos_embed(self, grid_size=None):\n        if grid_size is None:\n            grid_size = self.input_size[1:]\n        pos_embed = get_2d_sincos_pos_embed(\n            self.hidden_size,\n            
(grid_size[0] // self.patch_size[1], grid_size[1] // self.patch_size[2]),\n            scale=self.space_scale,\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def get_temporal_pos_embed(self):\n        pos_embed = get_1d_sincos_pos_embed(\n            self.hidden_size,\n            self.input_size[0] // self.patch_size[0],\n            scale=self.time_scale,\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def freeze_not_temporal(self):\n        for n, p in self.named_parameters():\n            if \"attn_temp\" not in n:\n                p.requires_grad = False\n\n    def freeze_text(self):\n        for n, p in self.named_parameters():\n            if \"cross_attn\" in n:\n                p.requires_grad = False\n\n    def initialize_temporal(self):\n        for block in self.blocks:\n            nn.init.constant_(block.attn_temp.proj.weight, 0)\n            nn.init.constant_(block.attn_temp.proj.bias, 0)\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                torch.nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.constant_(module.bias, 0)\n\n        self.apply(_basic_init)\n\n        # Initialize patch_embed like nn.Linear (instead of nn.Conv2d):\n        w = self.x_embedder.proj.weight.data\n        nn.init.xavier_uniform_(w.view([w.shape[0], -1]))\n\n        # Initialize timestep embedding MLP:\n        nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)\n        nn.init.normal_(self.t_block[1].weight, std=0.02)\n\n        # Initialize caption embedding MLP:\n        nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)\n        
nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)\n\n        # Zero-out the cross-attention output projection in each block:\n        for block in self.blocks:\n            nn.init.constant_(block.cross_attn.proj.weight, 0)\n            nn.init.constant_(block.cross_attn.proj.bias, 0)\n\n        # Zero-out output layers:\n        nn.init.constant_(self.final_layer.linear.weight, 0)\n        nn.init.constant_(self.final_layer.linear.bias, 0)\n\n\n@MODELS.register_module(\"STDiT-XL/2\")\ndef STDiT_XL_2(from_pretrained=None, **kwargs):\n    model = STDiT(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/opensora/models/stdit/stdit2.py",
    "content": "import os\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom einops import rearrange\nfrom rotary_embedding_torch import RotaryEmbedding\nfrom timm.models.layers import DropPath\nfrom timm.models.vision_transformer import Mlp\nfrom transformers import PretrainedConfig, PreTrainedModel\n\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.models.layers.blocks import (\n    Attention,\n    CaptionEmbedder,\n    MultiHeadCrossAttention,\n    PatchEmbed3D,\n    PositionEmbedding2D,\n    SizeEmbedder,\n    T2IFinalLayer,\n    TimestepEmbedder,\n    approx_gelu,\n    get_2d_sincos_pos_embed,\n    get_layernorm,\n    t2i_modulate,\n)\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\n\nclass STDiT2Block(nn.Module):\n    def __init__(\n        self,\n        hidden_size,\n        num_heads,\n        mlp_ratio=4.0,\n        drop_path=0.0,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n        rope=None,\n        qk_norm=False,\n        qk_norm_legacy=False,\n    ):\n        super().__init__()\n        self.hidden_size = hidden_size\n        self.enable_flash_attn = enable_flash_attn\n        self._enable_sequence_parallelism = enable_sequence_parallelism\n\n        # spatial branch\n        self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.attn = Attention(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            enable_flash_attn=enable_flash_attn,\n            qk_norm=qk_norm,\n            qk_norm_legacy=qk_norm_legacy,\n        )\n        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size**0.5)\n\n        # cross attn\n        self.cross_attn = MultiHeadCrossAttention(hidden_size, num_heads)\n\n        # mlp branch\n        self.norm2 = get_layernorm(hidden_size, 
eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.mlp = Mlp(\n            in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0\n        )\n        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()\n\n        # temporal branch\n        self.norm_temp = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)  # new\n        self.attn_temp = Attention(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            enable_flash_attn=self.enable_flash_attn,\n            rope=rope,\n            qk_norm=qk_norm,\n            qk_norm_legacy=qk_norm_legacy,\n        )\n        self.scale_shift_table_temporal = nn.Parameter(torch.randn(3, hidden_size) / hidden_size**0.5)  # new\n\n    def t_mask_select(self, x_mask, x, masked_x, T, S):\n        # x: [B, (T, S), C]\n        # mased_x: [B, (T, S), C]\n        # x_mask: [B, T]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n        masked_x = rearrange(masked_x, \"B (T S) C -> B T S C\", T=T, S=S)\n        x = torch.where(x_mask[:, :, None, None], x, masked_x)\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n        return x\n\n    def forward(self, x, y, t, t_tmp, mask=None, x_mask=None, t0=None, t0_tmp=None, T=None, S=None):\n        B, N, C = x.shape\n\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (\n            self.scale_shift_table[None] + t.reshape(B, 6, -1)\n        ).chunk(6, dim=1)\n        shift_tmp, scale_tmp, gate_tmp = (self.scale_shift_table_temporal[None] + t_tmp.reshape(B, 3, -1)).chunk(\n            3, dim=1\n        )\n        if x_mask is not None:\n            shift_msa_zero, scale_msa_zero, gate_msa_zero, shift_mlp_zero, scale_mlp_zero, gate_mlp_zero = (\n                self.scale_shift_table[None] + t0.reshape(B, 6, -1)\n            ).chunk(6, dim=1)\n            shift_tmp_zero, scale_tmp_zero, 
gate_tmp_zero = (\n                self.scale_shift_table_temporal[None] + t0_tmp.reshape(B, 3, -1)\n            ).chunk(3, dim=1)\n\n        # modulate\n        x_m = t2i_modulate(self.norm1(x), shift_msa, scale_msa)\n        if x_mask is not None:\n            x_m_zero = t2i_modulate(self.norm1(x), shift_msa_zero, scale_msa_zero)\n            x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n\n        # spatial branch\n        x_s = rearrange(x_m, \"B (T S) C -> (B T) S C\", T=T, S=S)\n        x_s = self.attn(x_s)\n        x_s = rearrange(x_s, \"(B T) S C -> B (T S) C\", T=T, S=S)\n        if x_mask is not None:\n            x_s_zero = gate_msa_zero * x_s\n            x_s = gate_msa * x_s\n            x_s = self.t_mask_select(x_mask, x_s, x_s_zero, T, S)\n        else:\n            x_s = gate_msa * x_s\n        x = x + self.drop_path(x_s)\n\n        # modulate\n        x_m = t2i_modulate(self.norm_temp(x), shift_tmp, scale_tmp)\n        if x_mask is not None:\n            x_m_zero = t2i_modulate(self.norm_temp(x), shift_tmp_zero, scale_tmp_zero)\n            x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n\n        # temporal branch\n        x_t = rearrange(x_m, \"B (T S) C -> (B S) T C\", T=T, S=S)\n        x_t = self.attn_temp(x_t)\n        x_t = rearrange(x_t, \"(B S) T C -> B (T S) C\", T=T, S=S)\n        if x_mask is not None:\n            x_t_zero = gate_tmp_zero * x_t\n            x_t = gate_tmp * x_t\n            x_t = self.t_mask_select(x_mask, x_t, x_t_zero, T, S)\n        else:\n            x_t = gate_tmp * x_t\n        x = x + self.drop_path(x_t)\n\n        # cross attn\n        x = x + self.cross_attn(x, y, mask)\n\n        # modulate\n        x_m = t2i_modulate(self.norm2(x), shift_mlp, scale_mlp)\n        if x_mask is not None:\n            x_m_zero = t2i_modulate(self.norm2(x), shift_mlp_zero, scale_mlp_zero)\n            x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n\n        # mlp\n        x_mlp = self.mlp(x_m)\n        if 
x_mask is not None:\n            x_mlp_zero = gate_mlp_zero * x_mlp\n            x_mlp = gate_mlp * x_mlp\n            x_mlp = self.t_mask_select(x_mask, x_mlp, x_mlp_zero, T, S)\n        else:\n            x_mlp = gate_mlp * x_mlp\n        x = x + self.drop_path(x_mlp)\n\n        return x\n\n\nclass STDiT2Config(PretrainedConfig):\n    model_type = \"STDiT2\"\n\n    def __init__(\n        self,\n        input_size=(None, None, None),\n        input_sq_size=32,\n        in_channels=4,\n        patch_size=(1, 2, 2),\n        hidden_size=1152,\n        depth=28,\n        num_heads=16,\n        mlp_ratio=4.0,\n        class_dropout_prob=0.1,\n        pred_sigma=True,\n        drop_path=0.0,\n        no_temporal_pos_emb=False,\n        caption_channels=4096,\n        model_max_length=120,\n        freeze=None,\n        qk_norm=False,\n        qk_norm_legacy=False,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        **kwargs,\n    ):\n        self.input_size = input_size\n        self.input_sq_size = input_sq_size\n        self.in_channels = in_channels\n        self.patch_size = patch_size\n        self.hidden_size = hidden_size\n        self.depth = depth\n        self.num_heads = num_heads\n        self.mlp_ratio = mlp_ratio\n        self.class_dropout_prob = class_dropout_prob\n        self.pred_sigma = pred_sigma\n        self.drop_path = drop_path\n        self.no_temporal_pos_emb = no_temporal_pos_emb\n        self.caption_channels = caption_channels\n        self.model_max_length = model_max_length\n        self.freeze = freeze\n        self.qk_norm = qk_norm\n        self.qk_norm_legacy = qk_norm_legacy\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_layernorm_kernel = enable_layernorm_kernel\n        super().__init__(**kwargs)\n\n\n@MODELS.register_module()\nclass STDiT2(PreTrainedModel):\n    config_class = STDiT2Config\n\n    def __init__(self, config):\n        super().__init__(config)\n        
self.pred_sigma = config.pred_sigma\n        self.in_channels = config.in_channels\n        self.out_channels = config.in_channels * 2 if config.pred_sigma else config.in_channels\n        self.hidden_size = config.hidden_size\n        self.num_heads = config.num_heads\n        self.no_temporal_pos_emb = config.no_temporal_pos_emb\n        self.depth = config.depth\n        self.mlp_ratio = config.mlp_ratio\n        self.enable_flash_attn = config.enable_flash_attn\n        self.enable_layernorm_kernel = config.enable_layernorm_kernel\n\n        # support dynamic input\n        self.patch_size = config.patch_size\n        self.input_size = config.input_size\n        self.input_sq_size = config.input_sq_size\n        self.pos_embed = PositionEmbedding2D(config.hidden_size)\n\n        self.x_embedder = PatchEmbed3D(config.patch_size, config.in_channels, config.hidden_size)\n        self.t_embedder = TimestepEmbedder(config.hidden_size)\n        self.t_block = nn.Sequential(nn.SiLU(), nn.Linear(config.hidden_size, 6 * config.hidden_size, bias=True))\n        self.t_block_temp = nn.Sequential(\n            nn.SiLU(), nn.Linear(config.hidden_size, 3 * config.hidden_size, bias=True)\n        )  # new\n        self.y_embedder = CaptionEmbedder(\n            in_channels=config.caption_channels,\n            hidden_size=config.hidden_size,\n            uncond_prob=config.class_dropout_prob,\n            act_layer=approx_gelu,\n            token_num=config.model_max_length,\n        )\n\n        drop_path = [x.item() for x in torch.linspace(0, config.drop_path, config.depth)]\n        self.rope = RotaryEmbedding(dim=self.hidden_size // self.num_heads)  # new\n        self.blocks = nn.ModuleList(\n            [\n                STDiT2Block(\n                    self.hidden_size,\n                    self.num_heads,\n                    mlp_ratio=self.mlp_ratio,\n                    drop_path=drop_path[i],\n                    enable_flash_attn=self.enable_flash_attn,\n        
            enable_layernorm_kernel=self.enable_layernorm_kernel,\n                    rope=self.rope.rotate_queries_or_keys,\n                    qk_norm=config.qk_norm,\n                    qk_norm_legacy=config.qk_norm_legacy,\n                )\n                for i in range(self.depth)\n            ]\n        )\n        self.final_layer = T2IFinalLayer(config.hidden_size, np.prod(self.patch_size), self.out_channels)\n\n        # multi_res\n        assert self.hidden_size % 3 == 0, \"hidden_size must be divisible by 3\"\n        self.csize_embedder = SizeEmbedder(self.hidden_size // 3)\n        self.ar_embedder = SizeEmbedder(self.hidden_size // 3)\n        self.fl_embedder = SizeEmbedder(self.hidden_size)  # new\n        self.fps_embedder = SizeEmbedder(self.hidden_size)  # new\n\n        # init model\n        self.initialize_weights()\n        self.initialize_temporal()\n        if config.freeze is not None:\n            assert config.freeze in [\"not_temporal\", \"text\"]\n            if config.freeze == \"not_temporal\":\n                self.freeze_not_temporal()\n            elif config.freeze == \"text\":\n                self.freeze_text()\n\n    def get_dynamic_size(self, x):\n        _, _, T, H, W = x.size()\n        if T % self.patch_size[0] != 0:\n            T += self.patch_size[0] - T % self.patch_size[0]\n        if H % self.patch_size[1] != 0:\n            H += self.patch_size[1] - H % self.patch_size[1]\n        if W % self.patch_size[2] != 0:\n            W += self.patch_size[2] - W % self.patch_size[2]\n        T = T // self.patch_size[0]\n        H = H // self.patch_size[1]\n        W = W // self.patch_size[2]\n        return (T, H, W)\n\n    def forward(\n        self, x, timestep, y, mask=None, x_mask=None, num_frames=None, height=None, width=None, ar=None, fps=None\n    ):\n        \"\"\"\n        Forward pass of STDiT.\n        Args:\n            x (torch.Tensor): latent representation of video; of shape [B, C, T, H, W]\n            
timestep (torch.Tensor): diffusion time steps; of shape [B]\n            y (torch.Tensor): representation of prompts; of shape [B, 1, N_token, C]\n            mask (torch.Tensor): mask for selecting prompt tokens; of shape [B, N_token]\n\n        Returns:\n            x (torch.Tensor): output latent representation; of shape [B, C, T, H, W]\n        \"\"\"\n        B = x.shape[0]\n        dtype = self.x_embedder.proj.weight.dtype\n        x = x.to(dtype)\n        timestep = timestep.to(dtype)\n        y = y.to(dtype)\n\n        # === process data info ===\n        # 1. get dynamic size\n        hw = torch.cat([height[:, None], width[:, None]], dim=1)\n        rs = (height[0].item() * width[0].item()) ** 0.5\n        csize = self.csize_embedder(hw, B)\n\n        # 2. get aspect ratio\n        ar = ar.unsqueeze(1)\n        ar = self.ar_embedder(ar, B)\n        data_info = torch.cat([csize, ar], dim=1)\n\n        # 3. get number of frames\n        fl = num_frames.unsqueeze(1)\n        fps = fps.unsqueeze(1)\n        fl = self.fl_embedder(fl, B)\n        fl = fl + self.fps_embedder(fps, B)\n\n        # === get dynamic shape size ===\n        _, _, Tx, Hx, Wx = x.size()\n        T, H, W = self.get_dynamic_size(x)\n        S = H * W\n        scale = rs / self.input_sq_size\n        base_size = round(S**0.5)\n        pos_emb = self.pos_embed(x, H, W, scale=scale, base_size=base_size)\n\n        # embedding\n        x = self.x_embedder(x)  # [B, N, C]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n        x = x + pos_emb\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n\n        # prepare adaIN\n        t = self.t_embedder(timestep, dtype=x.dtype)  # [B, C]\n        t_spc = t + data_info  # [B, C]\n        t_tmp = t + fl  # [B, C]\n        t_spc_mlp = self.t_block(t_spc)  # [B, 6*C]\n        t_tmp_mlp = self.t_block_temp(t_tmp)  # [B, 3*C]\n        if x_mask is not None:\n            t0_timestep = torch.zeros_like(timestep)\n            t0 = 
self.t_embedder(t0_timestep, dtype=x.dtype)\n            t0_spc = t0 + data_info\n            t0_tmp = t0 + fl\n            t0_spc_mlp = self.t_block(t0_spc)\n            t0_tmp_mlp = self.t_block_temp(t0_tmp)\n        else:\n            t0_spc = None\n            t0_tmp = None\n            t0_spc_mlp = None\n            t0_tmp_mlp = None\n\n        # prepare y\n        y = self.y_embedder(y, self.training)  # [B, 1, N_token, C]\n\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, x.shape[-1])\n\n        # blocks\n        for _, block in enumerate(self.blocks):\n            x = auto_grad_checkpoint(\n                block,\n                x,\n                y,\n                t_spc_mlp,\n                t_tmp_mlp,\n                y_lens,\n                x_mask,\n                t0_spc_mlp,\n                t0_tmp_mlp,\n                T,\n                S,\n            )\n            # x.shape: [B, N, C]\n\n        # final process\n        x = self.final_layer(x, t, x_mask, t0_spc, T, S)  # [B, N, C=T_p * H_p * W_p * C_out]\n        x = self.unpatchify(x, T, H, W, Tx, Hx, Wx)  # [B, C_out, T, H, W]\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n    def unpatchify(self, x, N_t, N_h, N_w, R_t, R_h, R_w):\n        \"\"\"\n        Args:\n            x (torch.Tensor): of shape [B, N, C]\n\n        Return:\n            x (torch.Tensor): of shape [B, C_out, T, H, W]\n        \"\"\"\n\n        # N_t, N_h, N_w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        T_p, H_p, W_p = self.patch_size\n        x = 
rearrange(\n            x,\n            \"B (N_t N_h N_w) (T_p H_p W_p C_out) -> B C_out (N_t T_p) (N_h H_p) (N_w W_p)\",\n            N_t=N_t,\n            N_h=N_h,\n            N_w=N_w,\n            T_p=T_p,\n            H_p=H_p,\n            W_p=W_p,\n            C_out=self.out_channels,\n        )\n        # unpad\n        x = x[:, :, :R_t, :R_h, :R_w]\n        return x\n\n    def unpatchify_old(self, x):\n        c = self.out_channels\n        t, h, w = [self.input_size[i] // self.patch_size[i] for i in range(3)]\n        pt, ph, pw = self.patch_size\n\n        x = x.reshape(shape=(x.shape[0], t, h, w, pt, ph, pw, c))\n        x = rearrange(x, \"n t h w r p q c -> n c t r h p w q\")\n        imgs = x.reshape(shape=(x.shape[0], c, t * pt, h * ph, w * pw))\n        return imgs\n\n    def get_spatial_pos_embed(self, H, W, scale=1.0, base_size=None):\n        pos_embed = get_2d_sincos_pos_embed(\n            self.hidden_size,\n            (H, W),\n            scale=scale,\n            base_size=base_size,\n        )\n        pos_embed = torch.from_numpy(pos_embed).float().unsqueeze(0).requires_grad_(False)\n        return pos_embed\n\n    def freeze_not_temporal(self):\n        for n, p in self.named_parameters():\n            if \"attn_temp\" not in n:\n                p.requires_grad = False\n\n    def freeze_text(self):\n        for n, p in self.named_parameters():\n            if \"cross_attn\" in n:\n                p.requires_grad = False\n\n    def initialize_temporal(self):\n        for block in self.blocks:\n            nn.init.constant_(block.attn_temp.proj.weight, 0)\n            nn.init.constant_(block.attn_temp.proj.bias, 0)\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                torch.nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.constant_(module.bias, 0)\n\n        
self.apply(_basic_init)\n\n        # Initialize patch_embed like nn.Linear (instead of nn.Conv2d):\n        w = self.x_embedder.proj.weight.data\n        nn.init.xavier_uniform_(w.view([w.shape[0], -1]))\n\n        # Initialize timestep embedding MLP:\n        nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)\n        nn.init.normal_(self.t_block[1].weight, std=0.02)\n        nn.init.normal_(self.t_block_temp[1].weight, std=0.02)\n\n        # Initialize caption embedding MLP:\n        nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)\n        nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)\n\n        # Zero-out the cross-attention output projection in each block:\n        for block in self.blocks:\n            nn.init.constant_(block.cross_attn.proj.weight, 0)\n            nn.init.constant_(block.cross_attn.proj.bias, 0)\n\n        # Zero-out output layers:\n        nn.init.constant_(self.final_layer.linear.weight, 0)\n        nn.init.constant_(self.final_layer.linear.bias, 0)\n\n\n@MODELS.register_module(\"STDiT2-XL/2\")\ndef STDiT2_XL_2(from_pretrained=None, **kwargs):\n    if from_pretrained is not None:\n        if os.path.isdir(from_pretrained) or os.path.isfile(from_pretrained):\n            # if it is a directory or a file, we load the checkpoint manually\n            config = STDiT2Config(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n            model = STDiT2(config)\n            load_checkpoint(model, from_pretrained)\n            return model\n        else:\n            # otherwise, we load the model from the Hugging Face Hub\n            return STDiT2.from_pretrained(from_pretrained)\n    else:\n        # create a new model\n        config = STDiT2Config(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n        model = STDiT2(config)\n    return model\n"
  },
  {
    "path": "Open-Sora/opensora/models/stdit/stdit3.py",
    "content": "import os\n\nimport numpy as np\nimport torch\nimport torch.distributed as dist\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom einops import rearrange\nfrom rotary_embedding_torch import RotaryEmbedding\nfrom timm.models.layers import DropPath\nfrom timm.models.vision_transformer import Mlp\nfrom transformers import PretrainedConfig, PreTrainedModel\n\nfrom opensora.acceleration.checkpoint import auto_grad_checkpoint\nfrom opensora.acceleration.communications import gather_forward_split_backward, split_forward_gather_backward\nfrom opensora.acceleration.parallel_states import get_sequence_parallel_group\nfrom opensora.models.layers.blocks import (\n    Attention,\n    CaptionEmbedder,\n    MultiHeadCrossAttention,\n    PatchEmbed3D,\n    PositionEmbedding2D,\n    SeqParallelAttention,\n    SeqParallelMultiHeadCrossAttention,\n    SizeEmbedder,\n    T2IFinalLayer,\n    TimestepEmbedder,\n    approx_gelu,\n    get_layernorm,\n    t2i_modulate,\n)\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\nfrom ...models.cache_functions import global_force_fresh, cache_cutfresh, update_cache, force_init, score_evaluate\n\nclass STDiT3Block(nn.Module):\n    def __init__(\n        self,\n        hidden_size,\n        num_heads,\n        mlp_ratio=4.0,\n        drop_path=0.0,\n        rope=None,\n        qk_norm=False,\n        temporal=False,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n    ):\n        super().__init__()\n        self.temporal = temporal\n        self.hidden_size = hidden_size\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_sequence_parallelism = enable_sequence_parallelism\n\n        if self.enable_sequence_parallelism and not temporal:\n            attn_cls = SeqParallelAttention\n            mha_cls = SeqParallelMultiHeadCrossAttention\n        else:\n            attn_cls = Attention\n   
         mha_cls = MultiHeadCrossAttention\n\n        self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.attn = attn_cls(\n            hidden_size,\n            num_heads=num_heads,\n            qkv_bias=True,\n            qk_norm=qk_norm,\n            rope=rope,\n            enable_flash_attn=enable_flash_attn,\n        )\n        self.cross_attn = mha_cls(hidden_size, num_heads)\n        self.norm2 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)\n        self.mlp = Mlp(\n            in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0\n        )\n        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()\n        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size**0.5)\n\n    def t_mask_select(self, x_mask, x, masked_x, T, S):\n        # x: [B, (T, S), C]\n        # masked_x: [B, (T, S), C]\n        # x_mask: [B, T]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n        masked_x = rearrange(masked_x, \"B (T S) C -> B T S C\", T=T, S=S)\n        x = torch.where(x_mask[:, :, None, None], x, masked_x)\n        x = rearrange(x, \"B T S C -> B (T S) C\")\n        return x\n\n    def forward(\n        self,\n        x,\n        y,\n        t,\n        current,\n        cache_dic,\n        mask=None,  # text mask\n        x_mask=None,  # temporal mask\n        t0=None,  # t with timestep=0\n        T=None,  # number of frames\n        S=None,  # number of pixel patches\n    ):\n        '''\n        Forward for video models.\n        Note that the Force Activation Cycle is slightly different from DiT-ToCa and PixArt-alpha-ToCa.\n        This follows from an observation: in the OpenSora model, different modules can use different Force Activation Cycles.\n        (This causes a decrease in performance in DiT and PixArt.) 
\n        '''\n\n\n        # prepare modulate parameters\n        B, N, C = x.shape\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (\n            self.scale_shift_table[None] + t.reshape(B, 6, -1)\n        ).chunk(6, dim=1)\n        if x_mask is not None:\n            shift_msa_zero, scale_msa_zero, gate_msa_zero, shift_mlp_zero, scale_mlp_zero, gate_mlp_zero = (\n                self.scale_shift_table[None] + t0.reshape(B, 6, -1)\n            ).chunk(6, dim=1)\n\n        if self.temporal:\n            current['flag'] = -1\n        else:\n            current['flag'] = 0\n        is_force_fresh = global_force_fresh(cache_dic, current)\n        current['is_force_fresh'] = is_force_fresh\n        \n        # modulate (attention)\n        current['module'] = 'attn'\n\n        if is_force_fresh[current['module']]:\n            x_m = t2i_modulate(self.norm1(x), shift_msa, scale_msa)\n            if x_mask is not None:\n                x_m_zero = t2i_modulate(self.norm1(x), shift_msa_zero, scale_msa_zero)\n                x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n\n            # attention\n            if self.temporal:\n                x_m = rearrange(x_m, \"B (T S) C -> (B S) T C\", T=T, S=S)\n                x_m = self.attn(x_m)\n                x_m = rearrange(x_m, \"(B S) T C -> B (T S) C\", T=T, S=S)\n            else:\n                x_m = rearrange(x_m, \"B (T S) C -> (B T) S C\", T=T, S=S)\n                x_m = self.attn(x_m)\n                x_m = rearrange(x_m, \"(B T) S C -> B (T S) C\", T=T, S=S)\n\n            cache_dic['cache'][current['flag']][current['layer']][current['module']] = x_m\n            force_init(cache_dic, current, x)\n        else:            \n            x_m = cache_dic['cache'][current['flag']][current['layer']][current['module']]\n            \n        # modulate (attention)\n        x_m_s = gate_msa * x_m\n        if x_mask is not None:\n            x_m_s_zero = gate_msa_zero * x_m\n            
x_m_s = self.t_mask_select(x_mask, x_m_s, x_m_s_zero, T, S)\n        # residual\n        x = x + self.drop_path(x_m_s)\n\n        # cross attention\n        current['module'] = 'cross-attn'\n\n        if is_force_fresh[current['module']]:\n            cache_dic['cache'][current['flag']][current['layer']][current['module']], cache_dic['cross_attn_map'][current['flag']][current['layer']] = self.cross_attn(x, y, mask)\n            force_init(cache_dic, current, x)\n\n        else:\n            fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x, current)\n            fresh_tokens, fresh_cross_attn_map = self.cross_attn(fresh_tokens, y, mask)\n            update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current, fresh_attn_map=fresh_cross_attn_map)\n\n        x = x + cache_dic['cache'][current['flag']][current['layer']][current['module']]\n\n        # modulate (MLP)\n        current['module'] = 'mlp'\n\n        #mlp_tick.record()\n        x_m = t2i_modulate(self.norm2(x), shift_mlp, scale_mlp)\n        if x_mask is not None:\n            x_m_zero = t2i_modulate(self.norm2(x), shift_mlp_zero, scale_mlp_zero)\n            x_m = self.t_mask_select(x_mask, x_m, x_m_zero, T, S)\n        \n        # MLP\n        if is_force_fresh[current['module']]:\n            x_m = self.mlp(x_m)\n            cache_dic['cache'][current['flag']][current['layer']][current['module']] = x_m\n            force_init(cache_dic, current, x)\n        \n        else:\n            fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x_m, current)\n            fresh_tokens = self.mlp(fresh_tokens)\n            update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current)\n\n        # modulate (MLP)\n        x_m_s = gate_mlp * cache_dic['cache'][current['flag']][current['layer']][current['module']]\n\n        if x_mask is not None:\n            x_m_s_zero = gate_mlp_zero * x_m\n            x_m_s = self.t_mask_select(x_mask, 
x_m_s, x_m_s_zero, T, S)\n\n            # residual    \n        x = x + self.drop_path(x_m_s)\n\n        return x\n\n\nclass STDiT3Config(PretrainedConfig):\n    model_type = \"STDiT3\"\n\n    def __init__(\n        self,\n        input_size=(None, None, None),\n        input_sq_size=512,\n        in_channels=4,\n        patch_size=(1, 2, 2),\n        hidden_size=1152,\n        depth=28,\n        num_heads=16,\n        mlp_ratio=4.0,\n        class_dropout_prob=0.1,\n        pred_sigma=True,\n        drop_path=0.0,\n        caption_channels=4096,\n        model_max_length=300,\n        qk_norm=True,\n        enable_flash_attn=False,\n        enable_layernorm_kernel=False,\n        enable_sequence_parallelism=False,\n        only_train_temporal=False,\n        freeze_y_embedder=False,\n        skip_y_embedder=False,\n        **kwargs,\n    ):\n        self.input_size = input_size\n        self.input_sq_size = input_sq_size\n        self.in_channels = in_channels\n        self.patch_size = patch_size\n        self.hidden_size = hidden_size\n        self.depth = depth\n        self.num_heads = num_heads\n        self.mlp_ratio = mlp_ratio\n        self.class_dropout_prob = class_dropout_prob\n        self.pred_sigma = pred_sigma\n        self.drop_path = drop_path\n        self.caption_channels = caption_channels\n        self.model_max_length = model_max_length\n        self.qk_norm = qk_norm\n        self.enable_flash_attn = enable_flash_attn\n        self.enable_layernorm_kernel = enable_layernorm_kernel\n        self.enable_sequence_parallelism = enable_sequence_parallelism\n        self.only_train_temporal = only_train_temporal\n        self.freeze_y_embedder = freeze_y_embedder\n        self.skip_y_embedder = skip_y_embedder\n        super().__init__(**kwargs)\n\n\nclass STDiT3(PreTrainedModel):\n    config_class = STDiT3Config\n\n    def __init__(self, config):\n        super().__init__(config)\n        self.pred_sigma = config.pred_sigma\n        
self.in_channels = config.in_channels\n        self.out_channels = config.in_channels * 2 if config.pred_sigma else config.in_channels\n\n        # model size related\n        self.depth = config.depth\n        self.mlp_ratio = config.mlp_ratio\n        self.hidden_size = config.hidden_size\n        self.num_heads = config.num_heads\n\n        # computation related\n        self.drop_path = config.drop_path\n        self.enable_flash_attn = config.enable_flash_attn\n        self.enable_layernorm_kernel = config.enable_layernorm_kernel\n        self.enable_sequence_parallelism = config.enable_sequence_parallelism\n\n        # input size related\n        self.patch_size = config.patch_size\n        self.input_sq_size = config.input_sq_size\n        self.pos_embed = PositionEmbedding2D(config.hidden_size)\n        self.rope = RotaryEmbedding(dim=self.hidden_size // self.num_heads)\n\n        # embedding\n        self.x_embedder = PatchEmbed3D(config.patch_size, config.in_channels, config.hidden_size)\n        self.t_embedder = TimestepEmbedder(config.hidden_size)\n        self.fps_embedder = SizeEmbedder(self.hidden_size)\n        self.t_block = nn.Sequential(\n            nn.SiLU(),\n            nn.Linear(config.hidden_size, 6 * config.hidden_size, bias=True),\n        )\n        self.y_embedder = CaptionEmbedder(\n            in_channels=config.caption_channels,\n            hidden_size=config.hidden_size,\n            uncond_prob=config.class_dropout_prob,\n            act_layer=approx_gelu,\n            token_num=config.model_max_length,\n        )\n\n        # spatial blocks\n        drop_path = [x.item() for x in torch.linspace(0, self.drop_path, config.depth)]\n        self.spatial_blocks = nn.ModuleList(\n            [\n                STDiT3Block(\n                    hidden_size=config.hidden_size,\n                    num_heads=config.num_heads,\n                    mlp_ratio=config.mlp_ratio,\n                    drop_path=drop_path[i],\n                   
 qk_norm=config.qk_norm,\n                    enable_flash_attn=config.enable_flash_attn,\n                    enable_layernorm_kernel=config.enable_layernorm_kernel,\n                    enable_sequence_parallelism=config.enable_sequence_parallelism,\n                )\n                for i in range(config.depth)\n            ]\n        )\n\n        # temporal blocks\n        drop_path = [x.item() for x in torch.linspace(0, self.drop_path, config.depth)]\n        self.temporal_blocks = nn.ModuleList(\n            [\n                STDiT3Block(\n                    hidden_size=config.hidden_size,\n                    num_heads=config.num_heads,\n                    mlp_ratio=config.mlp_ratio,\n                    drop_path=drop_path[i],\n                    qk_norm=config.qk_norm,\n                    enable_flash_attn=config.enable_flash_attn,\n                    enable_layernorm_kernel=config.enable_layernorm_kernel,\n                    enable_sequence_parallelism=config.enable_sequence_parallelism,\n                    # temporal\n                    temporal=True,\n                    rope=self.rope.rotate_queries_or_keys,\n                )\n                for i in range(config.depth)\n            ]\n        )\n\n        # final layer\n        self.final_layer = T2IFinalLayer(config.hidden_size, np.prod(self.patch_size), self.out_channels)\n\n        self.initialize_weights()\n        if config.only_train_temporal:\n            for param in self.parameters():\n                param.requires_grad = False\n            for block in self.temporal_blocks:\n                for param in block.parameters():\n                    param.requires_grad = True\n\n        if config.freeze_y_embedder:\n            for param in self.y_embedder.parameters():\n                param.requires_grad = False\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n             
   torch.nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.constant_(module.bias, 0)\n\n        self.apply(_basic_init)\n\n        # Initialize fps_embedder\n        nn.init.normal_(self.fps_embedder.mlp[0].weight, std=0.02)\n        nn.init.constant_(self.fps_embedder.mlp[0].bias, 0)\n        nn.init.constant_(self.fps_embedder.mlp[2].weight, 0)\n        nn.init.constant_(self.fps_embedder.mlp[2].bias, 0)\n\n        # Initialize timporal blocks\n        for block in self.temporal_blocks:\n            nn.init.constant_(block.attn.proj.weight, 0)\n            nn.init.constant_(block.cross_attn.proj.weight, 0)\n            nn.init.constant_(block.mlp.fc2.weight, 0)\n\n    def get_dynamic_size(self, x):\n        _, _, T, H, W = x.size()\n        if T % self.patch_size[0] != 0:\n            T += self.patch_size[0] - T % self.patch_size[0]\n        if H % self.patch_size[1] != 0:\n            H += self.patch_size[1] - H % self.patch_size[1]\n        if W % self.patch_size[2] != 0:\n            W += self.patch_size[2] - W % self.patch_size[2]\n        T = T // self.patch_size[0]\n        H = H // self.patch_size[1]\n        W = W // self.patch_size[2]\n        return (T, H, W)\n\n    def encode_text(self, y, mask=None):\n        y = self.y_embedder(y, self.training)  # [B, 1, N_token, C]\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, self.hidden_size)\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, self.hidden_size)\n        return y, y_lens\n\n    def forward(self, x, timestep, y, mask=None, x_mask=None, fps=None, height=None, width=None, cache_dic=None, current=None, **kwargs):\n       
 dtype = self.x_embedder.proj.weight.dtype\n        B = x.size(0)\n        x = x.to(dtype)\n        timestep = timestep.to(dtype)\n        y = y.to(dtype)\n\n        # === get pos embed ===\n        _, _, Tx, Hx, Wx = x.size()\n        T, H, W = self.get_dynamic_size(x)\n        cache_dic['dynamic_size'] = (B,T,H,W)\n        # adjust for sequence parallelism\n        # we need to ensure H * W is divisible by sequence parallel size\n        # for simplicity, we can adjust the height to make it divisible\n        if self.enable_sequence_parallelism:\n            sp_size = dist.get_world_size(get_sequence_parallel_group())\n            if H % sp_size != 0:\n                h_pad_size = sp_size - H % sp_size\n            else:\n                h_pad_size = 0\n\n            if h_pad_size > 0:\n                hx_pad_size = h_pad_size * self.patch_size[1]\n\n                # pad x along the H dimension\n                H += h_pad_size\n                x = F.pad(x, (0, 0, 0, hx_pad_size))\n\n        S = H * W\n        base_size = round(S**0.5)\n        resolution_sq = (height[0].item() * width[0].item()) ** 0.5\n        scale = resolution_sq / self.input_sq_size\n        pos_emb = self.pos_embed(x, H, W, scale=scale, base_size=base_size)\n\n        # === get timestep embed ===\n        t = self.t_embedder(timestep, dtype=x.dtype)  # [B, C]\n        fps = self.fps_embedder(fps.unsqueeze(1), B)\n        t = t + fps\n        t_mlp = self.t_block(t)\n        t0 = t0_mlp = None\n        if x_mask is not None:\n            t0_timestep = torch.zeros_like(timestep)\n            t0 = self.t_embedder(t0_timestep, dtype=x.dtype)\n            t0 = t0 + fps\n            t0_mlp = self.t_block(t0)\n\n        # === get y embed ===\n        if self.config.skip_y_embedder:\n            y_lens = mask\n            if isinstance(y_lens, torch.Tensor):\n                y_lens = y_lens.long().tolist()\n        else:\n            y, y_lens = self.encode_text(y, mask)\n\n        # === get x 
embed ===\n        x = self.x_embedder(x)  # [B, N, C]\n        x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n        x = x + pos_emb\n\n        # shard over the sequence dim if sp is enabled\n        if self.enable_sequence_parallelism:\n            x = split_forward_gather_backward(x, get_sequence_parallel_group(), dim=2, grad_scale=\"down\")\n            S = S // dist.get_world_size(get_sequence_parallel_group())\n\n        x = rearrange(x, \"B T S C -> B (T S) C\", T=T, S=S)\n\n        # === blocks ===\n        for i, (spatial_block, temporal_block) in enumerate(zip(self.spatial_blocks, self.temporal_blocks)):\n            current['layer'] = i\n            #x = auto_grad_checkpoint(spatial_block,  x, y, t_mlp, current, cache_dic, y_lens, x_mask, t0_mlp, T, S)\n            #x = auto_grad_checkpoint(temporal_block, x, y, t_mlp, current, cache_dic, y_lens, x_mask, t0_mlp, T, S)\n            x = spatial_block(x, y, t_mlp, current, cache_dic, y_lens, x_mask, t0_mlp, T, S)\n            x = temporal_block(x, y, t_mlp, current, cache_dic, y_lens, x_mask, t0_mlp, T, S)\n\n        if self.enable_sequence_parallelism:\n            x = rearrange(x, \"B (T S) C -> B T S C\", T=T, S=S)\n            x = gather_forward_split_backward(x, get_sequence_parallel_group(), dim=2, grad_scale=\"up\")\n            S = S * dist.get_world_size(get_sequence_parallel_group())\n            x = rearrange(x, \"B T S C -> B (T S) C\", T=T, S=S)\n\n        # === final layer ===\n        x = self.final_layer(x, t, x_mask, t0, T, S)\n        x = self.unpatchify(x, T, H, W, Tx, Hx, Wx)\n\n        # cast to float32 for better accuracy\n        x = x.to(torch.float32)\n        return x\n\n    def unpatchify(self, x, N_t, N_h, N_w, R_t, R_h, R_w):\n        \"\"\"\n        Args:\n            x (torch.Tensor): of shape [B, N, C]\n\n        Return:\n            x (torch.Tensor): of shape [B, C_out, T, H, W]\n        \"\"\"\n\n        # N_t, N_h, N_w = [self.input_size[i] // self.patch_size[i] 
for i in range(3)]\n        T_p, H_p, W_p = self.patch_size\n        x = rearrange(\n            x,\n            \"B (N_t N_h N_w) (T_p H_p W_p C_out) -> B C_out (N_t T_p) (N_h H_p) (N_w W_p)\",\n            N_t=N_t,\n            N_h=N_h,\n            N_w=N_w,\n            T_p=T_p,\n            H_p=H_p,\n            W_p=W_p,\n            C_out=self.out_channels,\n        )\n        # unpad\n        x = x[:, :, :R_t, :R_h, :R_w]\n        return x\n\n\n@MODELS.register_module(\"STDiT3-XL/2\")\ndef STDiT3_XL_2(from_pretrained=None, **kwargs):\n    force_huggingface = kwargs.pop(\"force_huggingface\", False)\n    if force_huggingface or from_pretrained is not None and not os.path.exists(from_pretrained):\n        model = STDiT3.from_pretrained(from_pretrained, **kwargs)\n    else:\n        config = STDiT3Config(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)\n        model = STDiT3(config)\n        if from_pretrained is not None:\n            load_checkpoint(model, from_pretrained)\n    return model\n\n\n@MODELS.register_module(\"STDiT3-3B/2\")\ndef STDiT3_3B_2(from_pretrained=None, **kwargs):\n    force_huggingface = kwargs.pop(\"force_huggingface\", False)\n    if force_huggingface or from_pretrained is not None and not os.path.exists(from_pretrained):\n        model = STDiT3.from_pretrained(from_pretrained, **kwargs)\n    else:\n        config = STDiT3Config(depth=28, hidden_size=1872, patch_size=(1, 2, 2), num_heads=26, **kwargs)\n        model = STDiT3(config)\n        if from_pretrained is not None:\n            load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/opensora/models/text_encoder/__init__.py",
    "content": "from .classes import ClassEncoder\nfrom .clip import ClipEncoder\nfrom .t5 import T5Encoder\n"
  },
  {
    "path": "Open-Sora/opensora/models/text_encoder/classes.py",
    "content": "import torch\n\nfrom opensora.registry import MODELS\n\n\n@MODELS.register_module(\"classes\")\nclass ClassEncoder:\n    def __init__(self, num_classes, model_max_length=None, device=\"cuda\", dtype=torch.float):\n        self.num_classes = num_classes\n        self.y_embedder = None\n\n        self.model_max_length = model_max_length\n        self.output_dim = None\n        self.device = device\n\n    def encode(self, text):\n        return dict(y=torch.tensor([int(t) for t in text]).to(self.device))\n\n    def null(self, n):\n        return torch.tensor([self.num_classes] * n).to(self.device)\n"
  },
  {
    "path": "Open-Sora/opensora/models/text_encoder/clip.py",
    "content": "# Copyright 2024 Vchitect/Latte\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.# Modified from Latte\n#\n# This file is adapted from the Latte project.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# Latte: https://github.com/Vchitect/Latte\n# DiT:   https://github.com/facebookresearch/DiT/tree/main\n# --------------------------------------------------------\n\n\nimport torch\nimport torch.nn as nn\nimport transformers\nfrom transformers import CLIPTextModel, CLIPTokenizer\n\nfrom opensora.registry import MODELS\n\ntransformers.logging.set_verbosity_error()\n\n\nclass AbstractEncoder(nn.Module):\n    def __init__(self):\n        super().__init__()\n\n    def encode(self, *args, **kwargs):\n        raise NotImplementedError\n\n\nclass FrozenCLIPEmbedder(AbstractEncoder):\n    \"\"\"Uses the CLIP transformer encoder for text (from Hugging Face)\"\"\"\n\n    def __init__(self, path=\"openai/clip-vit-huge-patch14\", device=\"cuda\", max_length=77):\n        super().__init__()\n        self.tokenizer = CLIPTokenizer.from_pretrained(path)\n        self.transformer = CLIPTextModel.from_pretrained(path)\n        self.device = device\n        self.max_length = max_length\n        self._freeze()\n\n    def _freeze(self):\n        self.transformer = self.transformer.eval()\n        for param in 
self.parameters():\n            param.requires_grad = False\n\n    def forward(self, text):\n        batch_encoding = self.tokenizer(\n            text,\n            truncation=True,\n            max_length=self.max_length,\n            return_length=True,\n            return_overflowing_tokens=False,\n            padding=\"max_length\",\n            return_tensors=\"pt\",\n        )\n        tokens = batch_encoding[\"input_ids\"].to(self.device)\n        outputs = self.transformer(input_ids=tokens)\n\n        z = outputs.last_hidden_state\n        pooled_z = outputs.pooler_output\n        return z, pooled_z\n\n    def encode(self, text):\n        return self(text)\n\n\n@MODELS.register_module(\"clip\")\nclass ClipEncoder:\n    \"\"\"\n    Embeds text prompt into vector representations. Also handles text dropout for classifier-free guidance.\n    \"\"\"\n\n    def __init__(\n        self,\n        from_pretrained,\n        model_max_length=77,\n        device=\"cuda\",\n        dtype=torch.float,\n    ):\n        super().__init__()\n        assert from_pretrained is not None, \"Please specify the path to the CLIP model\"\n\n        self.text_encoder = FrozenCLIPEmbedder(path=from_pretrained, max_length=model_max_length).to(device, dtype)\n        self.y_embedder = None\n\n        self.model_max_length = model_max_length\n        self.output_dim = self.text_encoder.transformer.config.hidden_size\n\n    def encode(self, text):\n        _, pooled_embeddings = self.text_encoder.encode(text)\n        y = pooled_embeddings.unsqueeze(1).unsqueeze(1)\n        return dict(y=y)\n\n    def null(self, n):\n        null_y = self.y_embedder.y_embedding[None].repeat(n, 1, 1)[:, None]\n        return null_y\n\n    def to(self, dtype):\n        self.text_encoder = self.text_encoder.to(dtype)\n        return self\n"
  },
  {
    "path": "Open-Sora/opensora/models/text_encoder/t5.py",
    "content": "# Adapted from PixArt\n#\n# Copyright (C) 2023  PixArt-alpha/PixArt-alpha\n#\n# This program is free software: you can redistribute it and/or modify\n# it under the terms of the GNU Affero General Public License as published\n# by the Free Software Foundation, either version 3 of the License, or\n# (at your option) any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU Affero General Public License for more details.\n#\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# PixArt: https://github.com/PixArt-alpha/PixArt-alpha\n# T5:     https://github.com/google-research/text-to-text-transfer-transformer\n# --------------------------------------------------------\n\nimport html\nimport re\n\nimport ftfy\nimport torch\nfrom transformers import AutoTokenizer, T5EncoderModel\n\nfrom opensora.registry import MODELS\n\n\nclass T5Embedder:\n    def __init__(\n        self,\n        device,\n        from_pretrained=None,\n        *,\n        cache_dir=None,\n        hf_token=None,\n        use_text_preprocessing=True,\n        t5_model_kwargs=None,\n        torch_dtype=None,\n        use_offload_folder=None,\n        model_max_length=120,\n        local_files_only=False,\n    ):\n        self.device = torch.device(device)\n        self.torch_dtype = torch_dtype or torch.bfloat16\n        self.cache_dir = cache_dir\n\n        if t5_model_kwargs is None:\n            t5_model_kwargs = {\n                \"low_cpu_mem_usage\": True,\n                \"torch_dtype\": self.torch_dtype,\n            }\n\n            if use_offload_folder is not None:\n                t5_model_kwargs[\"offload_folder\"] = use_offload_folder\n                
t5_model_kwargs[\"device_map\"] = {\n                    \"shared\": self.device,\n                    \"encoder.embed_tokens\": self.device,\n                    \"encoder.block.0\": self.device,\n                    \"encoder.block.1\": self.device,\n                    \"encoder.block.2\": self.device,\n                    \"encoder.block.3\": self.device,\n                    \"encoder.block.4\": self.device,\n                    \"encoder.block.5\": self.device,\n                    \"encoder.block.6\": self.device,\n                    \"encoder.block.7\": self.device,\n                    \"encoder.block.8\": self.device,\n                    \"encoder.block.9\": self.device,\n                    \"encoder.block.10\": self.device,\n                    \"encoder.block.11\": self.device,\n                    \"encoder.block.12\": \"disk\",\n                    \"encoder.block.13\": \"disk\",\n                    \"encoder.block.14\": \"disk\",\n                    \"encoder.block.15\": \"disk\",\n                    \"encoder.block.16\": \"disk\",\n                    \"encoder.block.17\": \"disk\",\n                    \"encoder.block.18\": \"disk\",\n                    \"encoder.block.19\": \"disk\",\n                    \"encoder.block.20\": \"disk\",\n                    \"encoder.block.21\": \"disk\",\n                    \"encoder.block.22\": \"disk\",\n                    \"encoder.block.23\": \"disk\",\n                    \"encoder.final_layer_norm\": \"disk\",\n                    \"encoder.dropout\": \"disk\",\n                }\n            else:\n                t5_model_kwargs[\"device_map\"] = {\n                    \"shared\": self.device,\n                    \"encoder\": self.device,\n                }\n\n        self.use_text_preprocessing = use_text_preprocessing\n        self.hf_token = hf_token\n\n        self.tokenizer = AutoTokenizer.from_pretrained(\n            from_pretrained,\n            cache_dir=cache_dir,\n            
local_files_only=local_files_only,\n        )\n        self.model = T5EncoderModel.from_pretrained(\n            from_pretrained,\n            cache_dir=cache_dir,\n            local_files_only=local_files_only,\n            **t5_model_kwargs,\n        ).eval()\n        self.model_max_length = model_max_length\n\n    def get_text_embeddings(self, texts):\n        text_tokens_and_mask = self.tokenizer(\n            texts,\n            max_length=self.model_max_length,\n            padding=\"max_length\",\n            truncation=True,\n            return_attention_mask=True,\n            add_special_tokens=True,\n            return_tensors=\"pt\",\n        )\n\n        input_ids = text_tokens_and_mask[\"input_ids\"].to(self.device)\n        attention_mask = text_tokens_and_mask[\"attention_mask\"].to(self.device)\n        with torch.no_grad():\n            text_encoder_embs = self.model(\n                input_ids=input_ids,\n                attention_mask=attention_mask,\n            )[\"last_hidden_state\"].detach()\n        return text_encoder_embs, attention_mask\n\n\n@MODELS.register_module(\"t5\")\nclass T5Encoder:\n    def __init__(\n        self,\n        from_pretrained=None,\n        model_max_length=120,\n        device=\"cuda\",\n        dtype=torch.float,\n        cache_dir=None,\n        shardformer=False,\n        local_files_only=False,\n    ):\n        assert from_pretrained is not None, \"Please specify the path to the T5 model\"\n\n        self.t5 = T5Embedder(\n            device=device,\n            torch_dtype=dtype,\n            from_pretrained=from_pretrained,\n            cache_dir=cache_dir,\n            model_max_length=model_max_length,\n            local_files_only=local_files_only,\n        )\n        self.t5.model.to(dtype=dtype)\n        self.y_embedder = None\n\n        self.model_max_length = model_max_length\n        self.output_dim = self.t5.model.config.d_model\n        self.dtype = dtype\n\n        if shardformer:\n            
self.shardformer_t5()\n\n    def shardformer_t5(self):\n        from colossalai.shardformer import ShardConfig, ShardFormer\n\n        from opensora.acceleration.shardformer.policy.t5_encoder import T5EncoderPolicy\n        from opensora.utils.misc import requires_grad\n\n        shard_config = ShardConfig(\n            tensor_parallel_process_group=None,\n            pipeline_stage_manager=None,\n            enable_tensor_parallelism=False,\n            enable_fused_normalization=False,\n            enable_flash_attention=False,\n            enable_jit_fused=True,\n            enable_sequence_parallelism=False,\n            enable_sequence_overlap=False,\n        )\n        shard_former = ShardFormer(shard_config=shard_config)\n        optim_model, _ = shard_former.optimize(self.t5.model, policy=T5EncoderPolicy())\n        self.t5.model = optim_model.to(self.dtype)\n\n        # ensure the weights are frozen\n        requires_grad(self.t5.model, False)\n\n    def encode(self, text):\n        caption_embs, emb_masks = self.t5.get_text_embeddings(text)\n        caption_embs = caption_embs[:, None]\n        return dict(y=caption_embs, mask=emb_masks)\n\n    def null(self, n):\n        null_y = self.y_embedder.y_embedding[None].repeat(n, 1, 1)[:, None]\n        return null_y\n\n\ndef basic_clean(text):\n    text = ftfy.fix_text(text)\n    text = html.unescape(html.unescape(text))\n    return text.strip()\n\n\nBAD_PUNCT_REGEX = re.compile(\n    r\"[\" + \"#®•©™&@·º½¾¿¡§~\" + \"\\)\" + \"\\(\" + \"\\]\" + \"\\[\" + \"\\}\" + \"\\{\" + \"\\|\" + \"\\\\\" + \"\\/\" + \"\\*\" + r\"]{1,}\"\n)  # noqa\n\n\ndef clean_caption(caption):\n    import urllib.parse as ul\n\n    from bs4 import BeautifulSoup\n\n    caption = str(caption)\n    caption = ul.unquote_plus(caption)\n    caption = caption.strip().lower()\n    caption = re.sub(\"<person>\", \"person\", caption)\n    # urls:\n    caption = re.sub(\n        
r\"\\b((?:https?:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",  # noqa\n        \"\",\n        caption,\n    )  # regex for urls\n    caption = re.sub(\n        r\"\\b((?:www:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",  # noqa\n        \"\",\n        caption,\n    )  # regex for urls\n    # html:\n    caption = BeautifulSoup(caption, features=\"html.parser\").text\n\n    # @<nickname>\n    caption = re.sub(r\"@[\\w\\d]+\\b\", \"\", caption)\n\n    # 31C0—31EF CJK Strokes\n    # 31F0—31FF Katakana Phonetic Extensions\n    # 3200—32FF Enclosed CJK Letters and Months\n    # 3300—33FF CJK Compatibility\n    # 3400—4DBF CJK Unified Ideographs Extension A\n    # 4DC0—4DFF Yijing Hexagram Symbols\n    # 4E00—9FFF CJK Unified Ideographs\n    caption = re.sub(r\"[\\u31c0-\\u31ef]+\", \"\", caption)\n    caption = re.sub(r\"[\\u31f0-\\u31ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3200-\\u32ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3300-\\u33ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3400-\\u4dbf]+\", \"\", caption)\n    caption = re.sub(r\"[\\u4dc0-\\u4dff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u4e00-\\u9fff]+\", \"\", caption)\n    #######################################################\n\n    # все виды тире / all types of dash --> \"-\"\n    caption = re.sub(\n        r\"[\\u002D\\u058A\\u05BE\\u1400\\u1806\\u2010-\\u2015\\u2E17\\u2E1A\\u2E3A\\u2E3B\\u2E40\\u301C\\u3030\\u30A0\\uFE31\\uFE32\\uFE58\\uFE63\\uFF0D]+\",  # noqa\n        \"-\",\n        caption,\n    )\n\n    # кавычки к одному стандарту\n    caption = re.sub(r\"[`´«»“”¨]\", '\"', caption)\n    caption = re.sub(r\"[‘’]\", \"'\", caption)\n\n    # &quot;\n    caption = re.sub(r\"&quot;?\", \"\", caption)\n    # &amp\n    caption = re.sub(r\"&amp\", \"\", caption)\n\n    # ip adresses:\n    caption = re.sub(r\"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\", \" \", 
caption)\n\n    # article ids:\n    caption = re.sub(r\"\\d:\\d\\d\\s+$\", \"\", caption)\n\n    # \\n\n    caption = re.sub(r\"\\\\n\", \" \", caption)\n\n    # \"#123\"\n    caption = re.sub(r\"#\\d{1,3}\\b\", \"\", caption)\n    # \"#12345..\"\n    caption = re.sub(r\"#\\d{5,}\\b\", \"\", caption)\n    # \"123456..\"\n    caption = re.sub(r\"\\b\\d{6,}\\b\", \"\", caption)\n    # filenames:\n    caption = re.sub(r\"[\\S]+\\.(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)\", \"\", caption)\n\n    #\n    caption = re.sub(r\"[\\\"\\']{2,}\", r'\"', caption)  # \"\"\"AUSVERKAUFT\"\"\"\n    caption = re.sub(r\"[\\.]{2,}\", r\" \", caption)  # \"\"\"AUSVERKAUFT\"\"\"\n\n    caption = re.sub(BAD_PUNCT_REGEX, r\" \", caption)  # ***AUSVERKAUFT***, #AUSVERKAUFT\n    caption = re.sub(r\"\\s+\\.\\s+\", r\" \", caption)  # \" . \"\n\n    # this-is-my-cute-cat / this_is_my_cute_cat\n    regex2 = re.compile(r\"(?:\\-|\\_)\")\n    if len(re.findall(regex2, caption)) > 3:\n        caption = re.sub(regex2, \" \", caption)\n\n    caption = basic_clean(caption)\n\n    caption = re.sub(r\"\\b[a-zA-Z]{1,3}\\d{3,15}\\b\", \"\", caption)  # jc6640\n    caption = re.sub(r\"\\b[a-zA-Z]+\\d+[a-zA-Z]+\\b\", \"\", caption)  # jc6640vc\n    caption = re.sub(r\"\\b\\d+[a-zA-Z]+\\d+\\b\", \"\", caption)  # 6640vc231\n\n    caption = re.sub(r\"(worldwide\\s+)?(free\\s+)?shipping\", \"\", caption)\n    caption = re.sub(r\"(free\\s)?download(\\sfree)?\", \"\", caption)\n    caption = re.sub(r\"\\bclick\\b\\s(?:for|on)\\s\\w+\", \"\", caption)\n    caption = re.sub(r\"\\b(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)(\\simage[s]?)?\", \"\", caption)\n    caption = re.sub(r\"\\bpage\\s+\\d+\\b\", \"\", caption)\n\n    caption = re.sub(r\"\\b\\d*[a-zA-Z]+\\d+[a-zA-Z]+\\d+[a-zA-Z\\d]*\\b\", r\" \", caption)  # j2d1a2a...\n\n    caption = re.sub(r\"\\b\\d+\\.?\\d*[xх×]\\d+\\.?\\d*\\b\", \"\", caption)\n\n    caption = re.sub(r\"\\b\\s+\\:\\s+\", r\": \", caption)\n    caption = re.sub(r\"(\\D[,\\./])\\b\", 
r\"\\1 \", caption)\n    caption = re.sub(r\"\\s+\", \" \", caption)\n\n    caption.strip()\n\n    caption = re.sub(r\"^[\\\"\\']([\\w\\W]+)[\\\"\\']$\", r\"\\1\", caption)\n    caption = re.sub(r\"^[\\'\\_,\\-\\:;]\", r\"\", caption)\n    caption = re.sub(r\"[\\'\\_,\\-\\:\\-\\+]$\", r\"\", caption)\n    caption = re.sub(r\"^\\.\\S+$\", \"\", caption)\n\n    return caption.strip()\n\n\ndef text_preprocessing(text, use_text_preprocessing: bool = True):\n    if use_text_preprocessing:\n        # The exact text cleaning as was in the training stage:\n        text = clean_caption(text)\n        text = clean_caption(text)\n        return text\n    else:\n        return text.lower().strip()\n"
  },
  {
    "path": "Open-Sora/opensora/models/vae/__init__.py",
    "content": "from .discriminator import DISCRIMINATOR_3D\nfrom .vae import VideoAutoencoderKL, VideoAutoencoderKLTemporalDecoder\nfrom .vae_temporal import VAE_Temporal\n"
  },
  {
    "path": "Open-Sora/opensora/models/vae/discriminator.py",
    "content": "import functools\nimport math\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import find_model, load_checkpoint\n\n\ndef cast_tuple(t, length=1):\n    return t if isinstance(t, tuple) else ((t,) * length)\n\n\ndef xavier_uniform_weight_init(m):\n    if isinstance(m, nn.Conv3d) or isinstance(m, nn.Linear):\n        nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain(\"relu\"))\n        if m.bias is not None:\n            nn.init.zeros_(m.bias)\n        # print(\"initialized module to xavier_uniform:\", m)\n\n\n# SCH: taken from Open Sora Plan\ndef n_layer_disc_weights_init(m):\n    classname = m.__class__.__name__\n    if classname.find(\"Conv\") != -1:\n        nn.init.normal_(m.weight.data, 0.0, 0.02)\n    elif classname.find(\"BatchNorm\") != -1:\n        nn.init.normal_(m.weight.data, 1.0, 0.02)\n        nn.init.constant_(m.bias.data, 0)\n\n\n# SCH: own implementation modified on top of: discriminator with anti-aliased downsampling (blurpool Zhang et al.)\nclass BlurPool3D(nn.Module):\n    def __init__(\n        self,\n        channels,\n        pad_type=\"reflect\",\n        filt_size=3,\n        stride=2,\n        pad_off=0,\n        device=\"cpu\",\n        dtype=torch.bfloat16,\n    ):\n        super(BlurPool3D, self).__init__()\n        self.filt_size = filt_size\n        self.pad_off = pad_off\n        self.pad_sizes = [\n            int(1.0 * (filt_size - 1) / 2),\n            int(np.ceil(1.0 * (filt_size - 1) / 2)),\n            int(1.0 * (filt_size - 1) / 2),\n            int(np.ceil(1.0 * (filt_size - 1) / 2)),\n            int(1.0 * (filt_size - 1) / 2),\n            int(np.ceil(1.0 * (filt_size - 1) / 2)),\n        ]\n        self.pad_sizes = [pad_size + pad_off for pad_size in self.pad_sizes]\n        self.stride = stride\n        self.off = int((self.stride - 1) / 2.0)\n        self.channels = channels\n\n   
     if self.filt_size == 1:\n            a = np.array(\n                [\n                    1.0,\n                ]\n            )\n        elif self.filt_size == 2:\n            a = np.array([1.0, 1.0])\n        elif self.filt_size == 3:\n            a = np.array([1.0, 2.0, 1.0])\n        elif self.filt_size == 4:\n            a = np.array([1.0, 3.0, 3.0, 1.0])\n        elif self.filt_size == 5:\n            a = np.array([1.0, 4.0, 6.0, 4.0, 1.0])\n        elif self.filt_size == 6:\n            a = np.array([1.0, 5.0, 10.0, 10.0, 5.0, 1.0])\n        elif self.filt_size == 7:\n            a = np.array([1.0, 6.0, 15.0, 20.0, 15.0, 6.0, 1.0])\n\n        filt_2d = a[:, None] * a[None, :]\n        filt_3d = torch.Tensor(a[:, None, None] * filt_2d[None, :, :]).to(device, dtype)\n\n        filt = filt_3d / torch.sum(filt_3d)  # SCH: modified to make it 3D\n        self.register_buffer(\"filt\", filt[None, None, :, :, :].repeat((self.channels, 1, 1, 1, 1)))\n\n        self.pad = get_pad_layer(pad_type)(self.pad_sizes)\n\n    def forward(self, inp):\n        if self.filt_size == 1:\n            # no blur: subsample all three dims (T, H, W) to match the strided conv3d branch below\n            if self.pad_off == 0:\n                return inp[:, :, :: self.stride, :: self.stride, :: self.stride]\n            else:\n                return self.pad(inp)[:, :, :: self.stride, :: self.stride, :: self.stride]\n        else:\n            return F.conv3d(self.pad(inp), self.filt, stride=self.stride, groups=inp.shape[1])\n\n\nclass ResBlockDown(nn.Module):\n    \"\"\"3D StyleGAN ResBlock for D.\"\"\"\n\n    def __init__(\n        self,\n        in_channels,\n        filters,\n        activation_fn,\n        num_groups=32,\n        device=\"cpu\",\n        dtype=torch.bfloat16,\n    ):\n        super().__init__()\n\n        self.filters = filters\n        self.activation_fn = activation_fn\n\n        # SCH: NOTE: although paper says conv (X->Y, Y->Y), original code implementation is (X->X, X->Y), we follow code\n        self.conv1 = nn.Conv3d(\n            in_channels, in_channels, (3, 3, 3), padding=1, 
device=device, dtype=dtype\n        )  # NOTE: init to xavier_uniform\n        self.norm1 = nn.GroupNorm(num_groups, in_channels, device=device, dtype=dtype)\n\n        self.blur = BlurPool3D(in_channels, device=device, dtype=dtype)\n\n        self.conv2 = nn.Conv3d(\n            in_channels, self.filters, (1, 1, 1), bias=False, device=device, dtype=dtype\n        )  # NOTE: init to xavier_uniform\n        self.conv3 = nn.Conv3d(\n            in_channels, self.filters, (3, 3, 3), padding=1, device=device, dtype=dtype\n        )  # NOTE: init to xavier_uniform\n        self.norm2 = nn.GroupNorm(num_groups, self.filters, device=device, dtype=dtype)\n\n        # self.apply(xavier_uniform_weight_init)\n\n    def forward(self, x):\n        residual = x\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.activation_fn(x)\n\n        residual = self.blur(residual)\n        residual = self.conv2(residual)\n\n        x = self.blur(x)\n        x = self.conv3(x)\n        x = self.norm2(x)\n        x = self.activation_fn(x)\n        out = (residual + x) / math.sqrt(2)\n        return out\n\n\n@MODELS.register_module()\nclass NLayerDiscriminator(nn.Module):\n    \"\"\"Defines a PatchGAN discriminator as in Pix2Pix\n    --> see https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/master/models/networks.py\n    \"\"\"\n\n    def __init__(self, input_nc=3, ndf=64, n_layers=3, use_actnorm=False, from_pretrained=None):\n        \"\"\"Construct a PatchGAN discriminator\n        Parameters:\n            input_nc (int)  -- the number of channels in input images\n            ndf (int)       -- the number of filters in the last conv layer\n            n_layers (int)  -- the number of conv layers in the discriminator\n            norm_layer      -- normalization layer\n        \"\"\"\n        super(NLayerDiscriminator, self).__init__()\n\n        norm_layer = nn.BatchNorm2d\n\n        if type(norm_layer) == functools.partial:  # no need to use bias as 
BatchNorm2d has affine parameters\n            use_bias = norm_layer.func != nn.BatchNorm2d\n        else:\n            use_bias = norm_layer != nn.BatchNorm2d\n\n        kw = 4\n        padw = 1\n        sequence = [nn.Conv2d(input_nc, ndf, kernel_size=kw, stride=2, padding=padw), nn.LeakyReLU(0.2, True)]\n        nf_mult = 1\n        nf_mult_prev = 1\n        for n in range(1, n_layers):  # gradually increase the number of filters\n            nf_mult_prev = nf_mult\n            nf_mult = min(2**n, 8)\n            sequence += [\n                nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=2, padding=padw, bias=use_bias),\n                norm_layer(ndf * nf_mult),\n                nn.LeakyReLU(0.2, True),\n            ]\n\n        nf_mult_prev = nf_mult\n        nf_mult = min(2**n_layers, 8)\n        sequence += [\n            nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=1, padding=padw, bias=use_bias),\n            norm_layer(ndf * nf_mult),\n            nn.LeakyReLU(0.2, True),\n        ]\n\n        sequence += [\n            nn.Conv2d(ndf * nf_mult, 1, kernel_size=kw, stride=1, padding=padw)\n        ]  # output 1 channel prediction map\n        self.main = nn.Sequential(*sequence)\n\n        if from_pretrained is not None:\n            load_checkpoint(self, from_pretrained)\n\n    def forward(self, input):\n        \"\"\"Standard forward.\"\"\"\n        return self.main(input)\n\n\nclass NLayerDiscriminator3D(nn.Module):\n    \"\"\"Defines a 3D PatchGAN discriminator as in Pix2Pix but for 3D inputs.\"\"\"\n\n    def __init__(self, input_nc=1, ndf=64, n_layers=3, use_actnorm=False):\n        \"\"\"\n        Construct a 3D PatchGAN discriminator\n\n        Parameters:\n            input_nc (int)  -- the number of channels in input volumes\n            ndf (int)       -- the number of filters in the last conv layer\n            n_layers (int)  -- the number of conv layers in the discriminator\n            use_actnorm 
(bool) -- flag to use actnorm instead of batchnorm\n        \"\"\"\n        super(NLayerDiscriminator3D, self).__init__()\n        if not use_actnorm:\n            norm_layer = nn.BatchNorm3d\n        else:\n            raise NotImplementedError(\"Not implemented.\")\n        if type(norm_layer) == functools.partial:\n            use_bias = norm_layer.func != nn.BatchNorm3d\n        else:\n            use_bias = norm_layer != nn.BatchNorm3d\n\n        kw = 4\n        padw = 1\n        sequence = [nn.Conv3d(input_nc, ndf, kernel_size=kw, stride=2, padding=padw), nn.LeakyReLU(0.2, True)]\n        nf_mult = 1\n        nf_mult_prev = 1\n        for n in range(1, n_layers):  # gradually increase the number of filters\n            nf_mult_prev = nf_mult\n            nf_mult = min(2**n, 8)\n            sequence += [\n                nn.Conv3d(\n                    ndf * nf_mult_prev,\n                    ndf * nf_mult,\n                    kernel_size=(kw, kw, kw),\n                    stride=(1, 2, 2),\n                    padding=padw,\n                    bias=use_bias,\n                ),\n                norm_layer(ndf * nf_mult),\n                nn.LeakyReLU(0.2, True),\n            ]\n\n        nf_mult_prev = nf_mult\n        nf_mult = min(2**n_layers, 8)\n        sequence += [\n            nn.Conv3d(\n                ndf * nf_mult_prev, ndf * nf_mult, kernel_size=(kw, kw, kw), stride=1, padding=padw, bias=use_bias\n            ),\n            norm_layer(ndf * nf_mult),\n            nn.LeakyReLU(0.2, True),\n        ]\n\n        sequence += [\n            nn.Conv3d(ndf * nf_mult, 1, kernel_size=kw, stride=1, padding=padw)\n        ]  # output 1 channel prediction map\n        self.main = nn.Sequential(*sequence)\n\n    def forward(self, input):\n        \"\"\"Standard forward.\"\"\"\n        return self.main(input)\n\n\nclass StyleGANDiscriminatorBlur(nn.Module):\n    \"\"\"StyleGAN Discriminator.\n\n    SCH: NOTE:\n        this discriminator requires the 
num_frames to be fixed during training;\n        in case we pre-train with image then train on video, this discriminator's Linear layer would have to be re-trained!\n    \"\"\"\n\n    def __init__(\n        self,\n        image_size=(128, 128),\n        num_frames=17,\n        in_channels=3,\n        filters=128,\n        channel_multipliers=(2, 4, 4, 4, 4),\n        num_groups=32,\n        dtype=torch.bfloat16,\n        device=\"cpu\",\n    ):\n        super().__init__()\n\n        self.dtype = dtype\n        self.input_size = cast_tuple(image_size, 2)\n        self.filters = filters\n        self.activation_fn = nn.LeakyReLU(negative_slope=0.2)\n        self.channel_multipliers = channel_multipliers\n\n        self.conv1 = nn.Conv3d(\n            in_channels, self.filters, (3, 3, 3), padding=1, device=device, dtype=dtype\n        )  # NOTE: init to xavier_uniform\n\n        prev_filters = self.filters  # record in_channels\n        self.num_blocks = len(self.channel_multipliers)\n        self.res_block_list = nn.ModuleList([])\n        for i in range(self.num_blocks):\n            filters = self.filters * self.channel_multipliers[i]\n            self.res_block_list.append(\n                ResBlockDown(prev_filters, filters, self.activation_fn, device=device, dtype=dtype).apply(\n                    xavier_uniform_weight_init\n                )\n            )\n            prev_filters = filters  # update in_channels\n\n        self.conv2 = nn.Conv3d(\n            prev_filters, prev_filters, (3, 3, 3), padding=1, device=device, dtype=dtype\n        )  # NOTE: init to xavier_uniform\n        # torch.nn.init.xavier_uniform_(self.conv2.weight)\n\n        self.norm1 = nn.GroupNorm(num_groups, prev_filters, dtype=dtype, device=device)\n\n        scale_factor = 2**self.num_blocks\n        if num_frames % scale_factor != 0:  # SCH: NOTE: has first frame which would be padded before usage\n            time_scaled = num_frames // scale_factor + 1\n        else:\n            
time_scaled = num_frames / scale_factor\n\n        assert (\n            self.input_size[0] % scale_factor == 0\n        ), f\"image width {self.input_size[0]} is not divisible by scale factor {scale_factor}\"\n        assert (\n            self.input_size[1] % scale_factor == 0\n        ), f\"image height {self.input_size[1]} is not divisible by scale factor {scale_factor}\"\n        w_scaled, h_scaled = self.input_size[0] / scale_factor, self.input_size[1] / scale_factor\n        in_features = int(prev_filters * time_scaled * w_scaled * h_scaled)  # (C*T*W*H)\n        self.linear1 = nn.Linear(in_features, prev_filters, device=device, dtype=dtype)  # NOTE: init to xavier_uniform\n        self.linear2 = nn.Linear(prev_filters, 1, device=device, dtype=dtype)  # NOTE: init to xavier_uniform\n\n        # self.apply(xavier_uniform_weight_init)\n\n    def forward(self, x):\n        x = self.conv1(x)\n        # print(\"discriminator aft conv:\", x.size())\n        x = self.activation_fn(x)\n\n        for i in range(self.num_blocks):\n            x = self.res_block_list[i](x)\n            # print(\"discriminator resblock down:\", x.size())\n\n        x = self.conv2(x)\n        # print(\"discriminator aft conv2:\", x.size())\n        x = self.norm1(x)\n        x = self.activation_fn(x)\n        x = x.reshape((x.shape[0], -1))  # SCH: [B, (C * T * W * H)] ?\n\n        # print(\"discriminator reshape:\", x.size())\n        x = self.linear1(x)\n        # print(\"discriminator aft linear1:\", x.size())\n\n        x = self.activation_fn(x)\n        x = self.linear2(x)\n        # print(\"discriminator aft linear2:\", x.size())\n        return x\n\n\ndef load_checkpoint_with_inflation(model, ckpt_path):\n    \"\"\"\n    pre-train using image, then inflate to 3D videos\n    \"\"\"\n    if ckpt_path.endswith(\".pt\") or ckpt_path.endswith(\".pth\"):\n        state_dict = find_model(ckpt_path)\n        with torch.no_grad():\n            for key in state_dict:\n                if key 
in model.state_dict():\n                    target = model.state_dict()[key]\n                    # central inflation: place the 2D kernel at the temporal centre of a zeroed 3D kernel\n                    if target.ndim == 5 and state_dict[key].size() == target[:, :, 0, :, :].size():\n                        val = torch.zeros_like(target)\n                        centre = target.size(2) // 2\n                        val[:, :, centre, :, :] = state_dict[key]\n                        state_dict[key] = val  # load the inflated weight instead of the 2D one\n        missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False)\n        print(f\"Missing keys: {missing_keys}\")\n        print(f\"Unexpected keys: {unexpected_keys}\")\n    else:\n        load_checkpoint(model, ckpt_path)  # use the default function\n\n\n@MODELS.register_module(\"DISCRIMINATOR_3D\")\ndef DISCRIMINATOR_3D(from_pretrained=None, inflate_from_2d=False, use_pretrained=True, **kwargs):\n    model = StyleGANDiscriminatorBlur(**kwargs).apply(xavier_uniform_weight_init)\n    if from_pretrained is not None:\n        if use_pretrained:\n            if inflate_from_2d:\n                load_checkpoint_with_inflation(model, from_pretrained)\n            else:\n                load_checkpoint(model, from_pretrained, model_name=\"discriminator\")\n                print(\"loaded discriminator\")\n        else:\n            print(f\"discriminator use_pretrained={use_pretrained}, initializing new discriminator\")\n\n    return model\n\n\n@MODELS.register_module(\"N_Layer_DISCRIMINATOR_3D\")\ndef DISCRIMINATOR_3D_N_Layer(from_pretrained=None, inflate_from_2d=False, use_pretrained=True, **kwargs):\n    model = NLayerDiscriminator3D(\n        input_nc=3,\n        n_layers=3,\n    ).apply(n_layer_disc_weights_init)\n    if from_pretrained is not None:\n        if use_pretrained:\n            if inflate_from_2d:\n                load_checkpoint_with_inflation(model, from_pretrained)\n            else:\n                load_checkpoint(model, from_pretrained, model_name=\"discriminator\")\n                print(\"loaded discriminator\")\n        else:\n            
print(f\"discriminator use_pretrained={use_pretrained}, initializing new discriminator\")\n\n    return model\n"
  },
  {
    "path": "Open-Sora/opensora/models/vae/losses.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom einops import rearrange, repeat\n\nfrom .lpips import LPIPS\n\n\ndef hinge_d_loss(logits_real, logits_fake):\n    loss_real = torch.mean(F.relu(1.0 - logits_real))\n    loss_fake = torch.mean(F.relu(1.0 + logits_fake))\n    d_loss = 0.5 * (loss_real + loss_fake)\n    return d_loss\n\n\ndef vanilla_d_loss(logits_real, logits_fake):\n    d_loss = 0.5 * (\n        torch.mean(torch.nn.functional.softplus(-logits_real)) + torch.mean(torch.nn.functional.softplus(logits_fake))\n    )\n    return d_loss\n\n\n# from MAGVIT, used in place hof hinge_d_loss\ndef sigmoid_cross_entropy_with_logits(labels, logits):\n    # The final formulation is: max(x, 0) - x * z + log(1 + exp(-abs(x)))\n    zeros = torch.zeros_like(logits, dtype=logits.dtype)\n    condition = logits >= zeros\n    relu_logits = torch.where(condition, logits, zeros)\n    neg_abs_logits = torch.where(condition, -logits, logits)\n    return relu_logits - logits * labels + torch.log1p(torch.exp(neg_abs_logits))\n\n\ndef lecam_reg(real_pred, fake_pred, ema_real_pred, ema_fake_pred):\n    assert real_pred.ndim == 0 and ema_fake_pred.ndim == 0\n    lecam_loss = torch.mean(torch.pow(nn.ReLU()(real_pred - ema_fake_pred), 2))\n    lecam_loss += torch.mean(torch.pow(nn.ReLU()(ema_real_pred - fake_pred), 2))\n    return lecam_loss\n\n\ndef gradient_penalty_fn(images, output):\n    gradients = torch.autograd.grad(\n        outputs=output,\n        inputs=images,\n        grad_outputs=torch.ones(output.size(), device=images.device),\n        create_graph=True,\n        retain_graph=True,\n        only_inputs=True,\n    )[0]\n\n    gradients = rearrange(gradients, \"b ... 
-> b (...)\")\n    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()\n\n\nclass VAELoss(nn.Module):\n    def __init__(\n        self,\n        logvar_init=0.0,\n        perceptual_loss_weight=0.1,\n        kl_loss_weight=0.000001,\n        device=\"cpu\",\n        dtype=\"bf16\",\n    ):\n        super().__init__()\n\n        if type(dtype) == str:\n            if dtype == \"bf16\":\n                dtype = torch.bfloat16\n            elif dtype == \"fp16\":\n                dtype = torch.float16\n            else:\n                raise NotImplementedError(f\"dtype: {dtype}\")\n\n        # KL Loss\n        self.kl_loss_weight = kl_loss_weight\n        # Perceptual Loss\n        self.perceptual_loss_fn = LPIPS().eval().to(device, dtype)\n        self.perceptual_loss_weight = perceptual_loss_weight\n        self.logvar = nn.Parameter(torch.ones(size=()) * logvar_init)\n\n    def forward(\n        self,\n        video,\n        recon_video,\n        posterior,\n        nll_weights=None,\n        no_perceptual=False,\n    ):\n        video = rearrange(video, \"b c t h w -> (b t) c h w\").contiguous()\n        recon_video = rearrange(recon_video, \"b c t h w -> (b t) c h w\").contiguous()\n\n        # reconstruction loss\n        recon_loss = torch.abs(video - recon_video)\n\n        # perceptual loss\n        if self.perceptual_loss_weight is not None and self.perceptual_loss_weight > 0.0 and not no_perceptual:\n            # handle channels\n            channels = video.shape[1]\n            assert channels in {1, 3}\n            if channels == 1:\n                input_vgg_input = repeat(video, \"b 1 h w -> b c h w\", c=3)\n                recon_vgg_input = repeat(recon_video, \"b 1 h w -> b c h w\", c=3)\n            else:\n                input_vgg_input = video\n                recon_vgg_input = recon_video\n\n            perceptual_loss = self.perceptual_loss_fn(input_vgg_input, recon_vgg_input)\n            recon_loss = recon_loss + 
self.perceptual_loss_weight * perceptual_loss\n\n        nll_loss = recon_loss / torch.exp(self.logvar) + self.logvar\n\n        weighted_nll_loss = nll_loss\n        if nll_weights is not None:\n            weighted_nll_loss = nll_weights * nll_loss\n        weighted_nll_loss = torch.sum(weighted_nll_loss) / weighted_nll_loss.shape[0]\n        nll_loss = torch.sum(nll_loss) / nll_loss.shape[0]\n\n        # KL Loss\n        weighted_kl_loss = 0\n        if self.kl_loss_weight is not None and self.kl_loss_weight > 0.0:\n            kl_loss = posterior.kl()\n            kl_loss = torch.sum(kl_loss) / kl_loss.shape[0]\n            weighted_kl_loss = kl_loss * self.kl_loss_weight\n\n        return nll_loss, weighted_nll_loss, weighted_kl_loss\n\n\ndef adopt_weight(weight, global_step, threshold=0, value=0.0):\n    if global_step < threshold:\n        weight = value\n    return weight\n\n\nclass AdversarialLoss(nn.Module):\n    def __init__(\n        self,\n        discriminator_factor=1.0,\n        discriminator_start=50001,\n        generator_factor=0.5,\n        generator_loss_type=\"non-saturating\",\n    ):\n        super().__init__()\n        self.discriminator_factor = discriminator_factor\n        self.discriminator_start = discriminator_start\n        self.generator_factor = generator_factor\n        self.generator_loss_type = generator_loss_type\n\n    def calculate_adaptive_weight(self, nll_loss, g_loss, last_layer):\n        nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]\n        g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]\n        d_weight = torch.norm(nll_grads) / (torch.norm(g_grads) + 1e-4)\n        d_weight = torch.clamp(d_weight, 0.0, 1e4).detach()\n        d_weight = d_weight * self.generator_factor\n        return d_weight\n\n    def forward(\n        self,\n        fake_logits,\n        nll_loss,\n        last_layer,\n        global_step,\n        is_training=True,\n    ):\n        # NOTE: 
following MAGVIT to allow non_saturating\n        assert self.generator_loss_type in [\"hinge\", \"vanilla\", \"non-saturating\"]\n\n        if self.generator_loss_type == \"hinge\":\n            gen_loss = -torch.mean(fake_logits)\n        elif self.generator_loss_type == \"non-saturating\":\n            gen_loss = torch.mean(\n                sigmoid_cross_entropy_with_logits(labels=torch.ones_like(fake_logits), logits=fake_logits)\n            )\n        else:\n            raise ValueError(\"Generator loss {} not supported\".format(self.generator_loss_type))\n\n        if self.discriminator_factor is not None and self.discriminator_factor > 0.0:\n            try:\n                d_weight = self.calculate_adaptive_weight(nll_loss, gen_loss, last_layer)\n            except RuntimeError:\n                assert not is_training\n                d_weight = torch.tensor(0.0)\n        else:\n            d_weight = torch.tensor(0.0)\n\n        disc_factor = adopt_weight(self.discriminator_factor, global_step, threshold=self.discriminator_start)\n        weighted_gen_loss = d_weight * disc_factor * gen_loss\n\n        return weighted_gen_loss\n\n\nclass LeCamEMA:\n    def __init__(self, ema_real=0.0, ema_fake=0.0, decay=0.999, dtype=torch.bfloat16, device=\"cpu\"):\n        self.decay = decay\n        self.ema_real = torch.tensor(ema_real).to(device, dtype)\n        self.ema_fake = torch.tensor(ema_fake).to(device, dtype)\n\n    def update(self, ema_real, ema_fake):\n        self.ema_real = self.ema_real * self.decay + ema_real * (1 - self.decay)\n        self.ema_fake = self.ema_fake * self.decay + ema_fake * (1 - self.decay)\n\n    def get(self):\n        return self.ema_real, self.ema_fake\n\n\nclass DiscriminatorLoss(nn.Module):\n    def __init__(\n        self,\n        discriminator_factor=1.0,\n        discriminator_start=50001,\n        discriminator_loss_type=\"non-saturating\",\n        lecam_loss_weight=None,\n        gradient_penalty_loss_weight=None,  # 
SCH: following MAGVIT config.vqgan.grad_penalty_cost\n    ):\n        super().__init__()\n\n        assert discriminator_loss_type in [\"hinge\", \"vanilla\", \"non-saturating\"]\n        self.discriminator_factor = discriminator_factor\n        self.discriminator_start = discriminator_start\n        self.lecam_loss_weight = lecam_loss_weight\n        self.gradient_penalty_loss_weight = gradient_penalty_loss_weight\n        self.discriminator_loss_type = discriminator_loss_type\n\n    def forward(\n        self,\n        real_logits,\n        fake_logits,\n        global_step,\n        lecam_ema_real=None,\n        lecam_ema_fake=None,\n        real_video=None,\n        split=\"train\",\n    ):\n        if self.discriminator_factor is not None and self.discriminator_factor > 0.0:\n            disc_factor = adopt_weight(self.discriminator_factor, global_step, threshold=self.discriminator_start)\n\n            if self.discriminator_loss_type == \"hinge\":\n                disc_loss = hinge_d_loss(real_logits, fake_logits)\n            elif self.discriminator_loss_type == \"non-saturating\":\n                if real_logits is not None:\n                    real_loss = sigmoid_cross_entropy_with_logits(\n                        labels=torch.ones_like(real_logits), logits=real_logits\n                    )\n                else:\n                    real_loss = 0.0\n                if fake_logits is not None:\n                    fake_loss = sigmoid_cross_entropy_with_logits(\n                        labels=torch.zeros_like(fake_logits), logits=fake_logits\n                    )\n                else:\n                    fake_loss = 0.0\n                disc_loss = 0.5 * (torch.mean(real_loss) + torch.mean(fake_loss))\n            elif self.discriminator_loss_type == \"vanilla\":\n                disc_loss = vanilla_d_loss(real_logits, fake_logits)\n            else:\n                raise ValueError(f\"Unknown GAN loss '{self.discriminator_loss_type}'.\")\n\n          
  weighted_d_adversarial_loss = disc_factor * disc_loss\n\n        else:\n            weighted_d_adversarial_loss = 0\n\n        lecam_loss = torch.tensor(0.0)\n        if self.lecam_loss_weight is not None and self.lecam_loss_weight > 0.0:\n            real_pred = torch.mean(real_logits)\n            fake_pred = torch.mean(fake_logits)\n            lecam_loss = lecam_reg(real_pred, fake_pred, lecam_ema_real, lecam_ema_fake)\n            lecam_loss = lecam_loss * self.lecam_loss_weight\n\n        gradient_penalty = torch.tensor(0.0)\n        if self.gradient_penalty_loss_weight is not None and self.gradient_penalty_loss_weight > 0.0:\n            assert real_video is not None\n            gradient_penalty = gradient_penalty_fn(real_video, real_logits)\n            gradient_penalty *= self.gradient_penalty_loss_weight\n\n        return (weighted_d_adversarial_loss, lecam_loss, gradient_penalty)\n"
  },
  {
    "path": "Open-Sora/opensora/models/vae/lpips.py",
    "content": "import hashlib\nimport os\nfrom collections import namedtuple\n\nimport requests\nimport torch\nimport torch.nn as nn\nfrom torchvision import models\nfrom tqdm import tqdm\n\nURL_MAP = {\"vgg_lpips\": \"https://heibox.uni-heidelberg.de/f/607503859c864bc1b30b/?dl=1\"}\n\nCKPT_MAP = {\"vgg_lpips\": \"vgg.pth\"}\n\nMD5_MAP = {\"vgg_lpips\": \"d507d7349b931f0638a25a48a722f98a\"}\n\n\ndef md5_hash(path):\n    with open(path, \"rb\") as f:\n        content = f.read()\n    return hashlib.md5(content).hexdigest()\n\n\ndef download(url, local_path, chunk_size=1024):\n    os.makedirs(os.path.split(local_path)[0], exist_ok=True)\n    with requests.get(url, stream=True) as r:\n        total_size = int(r.headers.get(\"content-length\", 0))\n        with tqdm(total=total_size, unit=\"B\", unit_scale=True) as pbar:\n            with open(local_path, \"wb\") as f:\n                for data in r.iter_content(chunk_size=chunk_size):\n                    if data:\n                        f.write(data)\n                        pbar.update(chunk_size)\n\n\ndef get_ckpt_path(name, root, check=False):\n    assert name in URL_MAP\n    path = os.path.join(root, CKPT_MAP[name])\n    if not os.path.exists(path) or (check and not md5_hash(path) == MD5_MAP[name]):\n        print(\"Downloading {} model from {} to {}\".format(name, URL_MAP[name], path))\n        download(URL_MAP[name], path)\n        md5 = md5_hash(path)\n        assert md5 == MD5_MAP[name], md5\n    return path\n\n\nclass LPIPS(nn.Module):\n    # Learned perceptual metric\n    def __init__(self, use_dropout=True):\n        super().__init__()\n        self.scaling_layer = ScalingLayer()\n        self.chns = [64, 128, 256, 512, 512]  # vg16 features\n        self.net = vgg16(pretrained=True, requires_grad=False)\n        self.lin0 = NetLinLayer(self.chns[0], use_dropout=use_dropout)\n        self.lin1 = NetLinLayer(self.chns[1], use_dropout=use_dropout)\n        self.lin2 = NetLinLayer(self.chns[2], 
use_dropout=use_dropout)\n        self.lin3 = NetLinLayer(self.chns[3], use_dropout=use_dropout)\n        self.lin4 = NetLinLayer(self.chns[4], use_dropout=use_dropout)\n        self.load_from_pretrained()\n        for param in self.parameters():\n            param.requires_grad = False\n\n    def load_from_pretrained(self, name=\"vgg_lpips\"):\n        ckpt = get_ckpt_path(name, \"pretrained_models/taming/modules/autoencoder/lpips\")\n        self.load_state_dict(torch.load(ckpt, map_location=torch.device(\"cpu\")), strict=False)\n        # print(\"loaded pretrained LPIPS loss from {}\".format(ckpt))\n\n    @classmethod\n    def from_pretrained(cls, name=\"vgg_lpips\"):\n        if name != \"vgg_lpips\":\n            raise NotImplementedError\n        model = cls()\n        # get_ckpt_path requires a root directory; reuse the one from load_from_pretrained\n        ckpt = get_ckpt_path(name, \"pretrained_models/taming/modules/autoencoder/lpips\")\n        model.load_state_dict(torch.load(ckpt, map_location=torch.device(\"cpu\")), strict=False)\n        return model\n\n    def forward(self, input, target):\n        in0_input, in1_input = (self.scaling_layer(input), self.scaling_layer(target))\n        outs0, outs1 = self.net(in0_input), self.net(in1_input)\n        feats0, feats1, diffs = {}, {}, {}\n        lins = [self.lin0, self.lin1, self.lin2, self.lin3, self.lin4]\n        for kk in range(len(self.chns)):\n            feats0[kk], feats1[kk] = normalize_tensor(outs0[kk]), normalize_tensor(outs1[kk])\n            diffs[kk] = (feats0[kk] - feats1[kk]) ** 2\n\n        res = [spatial_average(lins[kk].model(diffs[kk]), keepdim=True) for kk in range(len(self.chns))]\n        val = res[0]\n        for l in range(1, len(self.chns)):\n            val += res[l]\n        return val\n\n\nclass ScalingLayer(nn.Module):\n    def __init__(self):\n        super(ScalingLayer, self).__init__()\n        self.register_buffer(\"shift\", torch.Tensor([-0.030, -0.088, -0.188])[None, :, None, None])\n        self.register_buffer(\"scale\", torch.Tensor([0.458, 0.448, 0.450])[None, :, None, None])\n\n    def forward(self, inp):\n  
      return (inp - self.shift) / self.scale\n\n\nclass NetLinLayer(nn.Module):\n    \"\"\"A single linear layer which does a 1x1 conv\"\"\"\n\n    def __init__(self, chn_in, chn_out=1, use_dropout=False):\n        super(NetLinLayer, self).__init__()\n        layers = (\n            [\n                nn.Dropout(),\n            ]\n            if (use_dropout)\n            else []\n        )\n        layers += [\n            nn.Conv2d(chn_in, chn_out, 1, stride=1, padding=0, bias=False),\n        ]\n        self.model = nn.Sequential(*layers)\n\n\nclass vgg16(torch.nn.Module):\n    def __init__(self, requires_grad=False, pretrained=True):\n        super(vgg16, self).__init__()\n        vgg_pretrained_features = models.vgg16(pretrained=pretrained).features\n        self.slice1 = torch.nn.Sequential()\n        self.slice2 = torch.nn.Sequential()\n        self.slice3 = torch.nn.Sequential()\n        self.slice4 = torch.nn.Sequential()\n        self.slice5 = torch.nn.Sequential()\n        self.N_slices = 5\n        for x in range(4):\n            self.slice1.add_module(str(x), vgg_pretrained_features[x])\n        for x in range(4, 9):\n            self.slice2.add_module(str(x), vgg_pretrained_features[x])\n        for x in range(9, 16):\n            self.slice3.add_module(str(x), vgg_pretrained_features[x])\n        for x in range(16, 23):\n            self.slice4.add_module(str(x), vgg_pretrained_features[x])\n        for x in range(23, 30):\n            self.slice5.add_module(str(x), vgg_pretrained_features[x])\n        if not requires_grad:\n            for param in self.parameters():\n                param.requires_grad = False\n\n    def forward(self, X):\n        h = self.slice1(X)\n        h_relu1_2 = h\n        h = self.slice2(h)\n        h_relu2_2 = h\n        h = self.slice3(h)\n        h_relu3_3 = h\n        h = self.slice4(h)\n        h_relu4_3 = h\n        h = self.slice5(h)\n        h_relu5_3 = h\n        vgg_outputs = namedtuple(\"VggOutputs\", 
[\"relu1_2\", \"relu2_2\", \"relu3_3\", \"relu4_3\", \"relu5_3\"])\n        out = vgg_outputs(h_relu1_2, h_relu2_2, h_relu3_3, h_relu4_3, h_relu5_3)\n        return out\n\n\ndef normalize_tensor(x, eps=1e-10):\n    norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True))\n    return x / (norm_factor + eps)\n\n\ndef spatial_average(x, keepdim=True):\n    return x.mean([2, 3], keepdim=keepdim)\n"
  },
  {
    "path": "Open-Sora/opensora/models/vae/utils.py",
    "content": "import numpy as np\nimport torch\n\n\"\"\"Stripped version of https://github.com/richzhang/PerceptualSimilarity/tree/master/models\"\"\"\n\n\nclass DiagonalGaussianDistribution(object):\n    def __init__(\n        self,\n        parameters,\n        deterministic=False,\n    ):\n        self.parameters = parameters\n        self.mean, self.logvar = torch.chunk(parameters, 2, dim=1)\n        self.logvar = torch.clamp(self.logvar, -30.0, 20.0)\n        self.deterministic = deterministic\n        self.std = torch.exp(0.5 * self.logvar)\n        self.var = torch.exp(self.logvar)\n        if self.deterministic:\n            self.var = self.std = torch.zeros_like(self.mean).to(device=self.parameters.device, dtype=self.mean.dtype)\n\n    def sample(self):\n        # torch.randn: standard normal distribution\n        x = self.mean + self.std * torch.randn(self.mean.shape).to(device=self.parameters.device, dtype=self.mean.dtype)\n        return x\n\n    def kl(self, other=None):\n        if self.deterministic:\n            return torch.Tensor([0.0])\n        else:\n            if other is None:  # SCH: assumes other is a standard normal distribution\n                return 0.5 * torch.sum(torch.pow(self.mean, 2) + self.var - 1.0 - self.logvar, dim=[1, 2, 3, 4])\n            else:\n                return 0.5 * torch.sum(\n                    torch.pow(self.mean - other.mean, 2) / other.var\n                    + self.var / other.var\n                    - 1.0\n                    - self.logvar\n                    + other.logvar,\n                    dim=[1, 2, 3, 4],\n                )\n\n    def nll(self, sample, dims=[1, 2, 3, 4]):\n        if self.deterministic:\n            return torch.Tensor([0.0])\n        logtwopi = np.log(2.0 * np.pi)\n        return 0.5 * torch.sum(logtwopi + self.logvar + torch.pow(sample - self.mean, 2) / self.var, dim=dims)\n\n    def mode(self):\n        return self.mean\n"
  },
  {
    "path": "Open-Sora/opensora/models/vae/vae.py",
    "content": "import os\n\nimport torch\nimport torch.nn as nn\nfrom diffusers.models import AutoencoderKL, AutoencoderKLTemporalDecoder\nfrom einops import rearrange\nfrom transformers import PretrainedConfig, PreTrainedModel\n\nfrom opensora.registry import MODELS, build_module\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\n\n@MODELS.register_module()\nclass VideoAutoencoderKL(nn.Module):\n    def __init__(\n        self,\n        from_pretrained=None,\n        micro_batch_size=None,\n        cache_dir=None,\n        local_files_only=False,\n        subfolder=None,\n        scaling_factor=0.18215,\n    ):\n        super().__init__()\n        self.module = AutoencoderKL.from_pretrained(\n            from_pretrained,\n            cache_dir=cache_dir,\n            local_files_only=local_files_only,\n            subfolder=subfolder,\n        )\n        self.out_channels = self.module.config.latent_channels\n        self.patch_size = (1, 8, 8)\n        self.micro_batch_size = micro_batch_size\n        self.scaling_factor = scaling_factor\n\n    def encode(self, x):\n        # x: (B, C, T, H, W)\n        B = x.shape[0]\n        x = rearrange(x, \"B C T H W -> (B T) C H W\")\n\n        if self.micro_batch_size is None:\n            x = self.module.encode(x).latent_dist.sample().mul_(self.scaling_factor)\n        else:\n            # NOTE: cannot be used for training\n            bs = self.micro_batch_size\n            x_out = []\n            for i in range(0, x.shape[0], bs):\n                x_bs = x[i : i + bs]\n                x_bs = self.module.encode(x_bs).latent_dist.sample().mul_(self.scaling_factor)\n                x_out.append(x_bs)\n            x = torch.cat(x_out, dim=0)\n        x = rearrange(x, \"(B T) C H W -> B C T H W\", B=B)\n        return x\n\n    def decode(self, x, **kwargs):\n        # x: (B, C, T, H, W)\n        B = x.shape[0]\n        x = rearrange(x, \"B C T H W -> (B T) C H W\")\n        if self.micro_batch_size is None:\n         
   x = self.module.decode(x / self.scaling_factor).sample\n        else:\n            # NOTE: cannot be used for training\n            bs = self.micro_batch_size\n            x_out = []\n            for i in range(0, x.shape[0], bs):\n                x_bs = x[i : i + bs]\n                x_bs = self.module.decode(x_bs / self.scaling_factor).sample\n                x_out.append(x_bs)\n            x = torch.cat(x_out, dim=0)\n        x = rearrange(x, \"(B T) C H W -> B C T H W\", B=B)\n        return x\n\n    def get_latent_size(self, input_size):\n        latent_size = []\n        for i in range(3):\n            # assert (\n            #     input_size[i] is None or input_size[i] % self.patch_size[i] == 0\n            # ), \"Input size must be divisible by patch size\"\n            latent_size.append(input_size[i] // self.patch_size[i] if input_size[i] is not None else None)\n        return latent_size\n\n    @property\n    def device(self):\n        return next(self.parameters()).device\n\n    @property\n    def dtype(self):\n        return next(self.parameters()).dtype\n\n\n@MODELS.register_module()\nclass VideoAutoencoderKLTemporalDecoder(nn.Module):\n    def __init__(self, from_pretrained=None, cache_dir=None, local_files_only=False):\n        super().__init__()\n        self.module = AutoencoderKLTemporalDecoder.from_pretrained(\n            from_pretrained, cache_dir=cache_dir, local_files_only=local_files_only\n        )\n        self.out_channels = self.module.config.latent_channels\n        self.patch_size = (1, 8, 8)\n\n    def encode(self, x):\n        raise NotImplementedError\n\n    def decode(self, x, **kwargs):\n        B, _, T = x.shape[:3]\n        x = rearrange(x, \"B C T H W -> (B T) C H W\")\n        x = self.module.decode(x / 0.18215, num_frames=T).sample\n        x = rearrange(x, \"(B T) C H W -> B C T H W\", B=B)\n        return x\n\n    def get_latent_size(self, input_size):\n        latent_size = []\n        for i in range(3):\n            # 
assert (\n            #     input_size[i] is None or input_size[i] % self.patch_size[i] == 0\n            # ), \"Input size must be divisible by patch size\"\n            latent_size.append(input_size[i] // self.patch_size[i] if input_size[i] is not None else None)\n        return latent_size\n\n    @property\n    def device(self):\n        return next(self.parameters()).device\n\n    @property\n    def dtype(self):\n        return next(self.parameters()).dtype\n\n\nclass VideoAutoencoderPipelineConfig(PretrainedConfig):\n    model_type = \"VideoAutoencoderPipeline\"\n\n    def __init__(\n        self,\n        vae_2d=None,\n        vae_temporal=None,\n        from_pretrained=None,\n        freeze_vae_2d=False,\n        cal_loss=False,\n        micro_frame_size=None,\n        shift=0.0,\n        scale=1.0,\n        **kwargs,\n    ):\n        self.vae_2d = vae_2d\n        self.vae_temporal = vae_temporal\n        self.from_pretrained = from_pretrained\n        self.freeze_vae_2d = freeze_vae_2d\n        self.cal_loss = cal_loss\n        self.micro_frame_size = micro_frame_size\n        self.shift = shift\n        self.scale = scale\n        super().__init__(**kwargs)\n\n\nclass VideoAutoencoderPipeline(PreTrainedModel):\n    config_class = VideoAutoencoderPipelineConfig\n\n    def __init__(self, config: VideoAutoencoderPipelineConfig):\n        super().__init__(config=config)\n        self.spatial_vae = build_module(config.vae_2d, MODELS)\n        self.temporal_vae = build_module(config.vae_temporal, MODELS)\n        self.cal_loss = config.cal_loss\n        self.micro_frame_size = config.micro_frame_size\n        self.micro_z_frame_size = self.temporal_vae.get_latent_size([config.micro_frame_size, None, None])[0]\n\n        if config.freeze_vae_2d:\n            for param in self.spatial_vae.parameters():\n                param.requires_grad = False\n\n        self.out_channels = self.temporal_vae.out_channels\n\n        # normalization parameters\n        scale = 
torch.tensor(config.scale)\n        shift = torch.tensor(config.shift)\n        if len(scale.shape) > 0:\n            scale = scale[None, :, None, None, None]\n        if len(shift.shape) > 0:\n            shift = shift[None, :, None, None, None]\n        self.register_buffer(\"scale\", scale)\n        self.register_buffer(\"shift\", shift)\n\n    def encode(self, x):\n        x_z = self.spatial_vae.encode(x)\n\n        if self.micro_frame_size is None:\n            posterior = self.temporal_vae.encode(x_z)\n            z = posterior.sample()\n        else:\n            z_list = []\n            for i in range(0, x_z.shape[2], self.micro_frame_size):\n                x_z_bs = x_z[:, :, i : i + self.micro_frame_size]\n                posterior = self.temporal_vae.encode(x_z_bs)\n                z_list.append(posterior.sample())\n            z = torch.cat(z_list, dim=2)\n\n        if self.cal_loss:\n            return z, posterior, x_z\n        else:\n            return (z - self.shift) / self.scale\n\n    def decode(self, z, num_frames=None):\n        if not self.cal_loss:\n            z = z * self.scale.to(z.dtype) + self.shift.to(z.dtype)\n\n        if self.micro_frame_size is None:\n            x_z = self.temporal_vae.decode(z, num_frames=num_frames)\n            x = self.spatial_vae.decode(x_z)\n        else:\n            x_z_list = []\n            for i in range(0, z.size(2), self.micro_z_frame_size):\n                z_bs = z[:, :, i : i + self.micro_z_frame_size]\n                x_z_bs = self.temporal_vae.decode(z_bs, num_frames=min(self.micro_frame_size, num_frames))\n                x_z_list.append(x_z_bs)\n                num_frames -= self.micro_frame_size\n            x_z = torch.cat(x_z_list, dim=2)\n            x = self.spatial_vae.decode(x_z)\n\n        if self.cal_loss:\n            return x, x_z\n        else:\n            return x\n\n    def forward(self, x):\n        assert self.cal_loss, \"This method is only available when cal_loss is True\"\n   
     z, posterior, x_z = self.encode(x)\n        x_rec, x_z_rec = self.decode(z, num_frames=x_z.shape[2])\n        return x_rec, x_z_rec, z, posterior, x_z\n\n    def get_latent_size(self, input_size):\n        if self.micro_frame_size is None or input_size[0] is None:\n            return self.temporal_vae.get_latent_size(self.spatial_vae.get_latent_size(input_size))\n        else:\n            sub_input_size = [self.micro_frame_size, input_size[1], input_size[2]]\n            sub_latent_size = self.temporal_vae.get_latent_size(self.spatial_vae.get_latent_size(sub_input_size))\n            sub_latent_size[0] = sub_latent_size[0] * (input_size[0] // self.micro_frame_size)\n            remain_temporal_size = [input_size[0] % self.micro_frame_size, None, None]\n            if remain_temporal_size[0] > 0:\n                remain_size = self.temporal_vae.get_latent_size(remain_temporal_size)\n                sub_latent_size[0] += remain_size[0]\n            return sub_latent_size\n\n    def get_temporal_last_layer(self):\n        return self.temporal_vae.decoder.conv_out.conv.weight\n\n    @property\n    def device(self):\n        return next(self.parameters()).device\n\n    @property\n    def dtype(self):\n        return next(self.parameters()).dtype\n\n\n@MODELS.register_module()\ndef OpenSoraVAE_V1_2(\n    micro_batch_size=4,\n    micro_frame_size=17,\n    from_pretrained=None,\n    local_files_only=False,\n    freeze_vae_2d=False,\n    cal_loss=False,\n    force_huggingface=False,\n):\n    vae_2d = dict(\n        type=\"VideoAutoencoderKL\",\n        from_pretrained=\"/root/autodl-tmp/pretrained_models/PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers\",\n        subfolder=\"vae\",\n        micro_batch_size=micro_batch_size,\n        local_files_only=local_files_only,\n    )\n    vae_temporal = dict(\n        type=\"VAE_Temporal_SD\",\n        from_pretrained=None,\n    )\n    shift = (-0.10, 0.34, 0.27, 0.98)\n    scale = (3.85, 2.32, 2.33, 3.06)\n    kwargs = dict(\n 
       vae_2d=vae_2d,\n        vae_temporal=vae_temporal,\n        freeze_vae_2d=freeze_vae_2d,\n        cal_loss=cal_loss,\n        micro_frame_size=micro_frame_size,\n        shift=shift,\n        scale=scale,\n    )\n\n    if force_huggingface or (from_pretrained is not None and not os.path.exists(from_pretrained)):\n        model = VideoAutoencoderPipeline.from_pretrained(from_pretrained, **kwargs)\n    else:\n        config = VideoAutoencoderPipelineConfig(**kwargs)\n        model = VideoAutoencoderPipeline(config)\n\n        if from_pretrained:\n            load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/opensora/models/vae/vae_temporal.py",
    "content": "from typing import Tuple, Union\n\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom einops import rearrange\n\nfrom opensora.registry import MODELS\nfrom opensora.utils.ckpt_utils import load_checkpoint\n\nfrom .utils import DiagonalGaussianDistribution\n\n\ndef cast_tuple(t, length=1):\n    return t if isinstance(t, tuple) else ((t,) * length)\n\n\ndef divisible_by(num, den):\n    return (num % den) == 0\n\n\ndef is_odd(n):\n    return not divisible_by(n, 2)\n\n\ndef pad_at_dim(t, pad, dim=-1):\n    dims_from_right = (-dim - 1) if dim < 0 else (t.ndim - dim - 1)\n    zeros = (0, 0) * dims_from_right\n    return F.pad(t, (*zeros, *pad), mode=\"constant\")\n\n\ndef exists(v):\n    return v is not None\n\n\nclass CausalConv3d(nn.Module):\n    def __init__(\n        self,\n        chan_in,\n        chan_out,\n        kernel_size: Union[int, Tuple[int, int, int]],\n        pad_mode=\"constant\",\n        strides=None,  # allow custom stride\n        **kwargs,\n    ):\n        super().__init__()\n        kernel_size = cast_tuple(kernel_size, 3)\n\n        time_kernel_size, height_kernel_size, width_kernel_size = kernel_size\n\n        assert is_odd(height_kernel_size) and is_odd(width_kernel_size)\n\n        dilation = kwargs.pop(\"dilation\", 1)\n        stride = strides[0] if strides is not None else kwargs.pop(\"stride\", 1)\n\n        self.pad_mode = pad_mode\n        time_pad = dilation * (time_kernel_size - 1) + (1 - stride)\n        height_pad = height_kernel_size // 2\n        width_pad = width_kernel_size // 2\n\n        self.time_pad = time_pad\n        self.time_causal_padding = (width_pad, width_pad, height_pad, height_pad, time_pad, 0)\n\n        stride = strides if strides is not None else (stride, 1, 1)\n        dilation = (dilation, 1, 1)\n        self.conv = nn.Conv3d(chan_in, chan_out, kernel_size, stride=stride, dilation=dilation, **kwargs)\n\n    def forward(self, x):\n        x = F.pad(x, self.time_causal_padding, 
mode=self.pad_mode)\n        x = self.conv(x)\n        return x\n\n\nclass ResBlock(nn.Module):\n    def __init__(\n        self,\n        in_channels,  # SCH: added\n        filters,\n        conv_fn,\n        activation_fn=nn.SiLU,\n        use_conv_shortcut=False,\n        num_groups=32,\n    ):\n        super().__init__()\n        self.in_channels = in_channels\n        self.filters = filters\n        self.activate = activation_fn()\n        self.use_conv_shortcut = use_conv_shortcut\n\n        # SCH: MAGVIT uses GroupNorm by default\n        self.norm1 = nn.GroupNorm(num_groups, in_channels)\n        self.conv1 = conv_fn(in_channels, self.filters, kernel_size=(3, 3, 3), bias=False)\n        self.norm2 = nn.GroupNorm(num_groups, self.filters)\n        self.conv2 = conv_fn(self.filters, self.filters, kernel_size=(3, 3, 3), bias=False)\n        if in_channels != filters:\n            if self.use_conv_shortcut:\n                self.conv3 = conv_fn(in_channels, self.filters, kernel_size=(3, 3, 3), bias=False)\n            else:\n                self.conv3 = conv_fn(in_channels, self.filters, kernel_size=(1, 1, 1), bias=False)\n\n    def forward(self, x):\n        residual = x\n        x = self.norm1(x)\n        x = self.activate(x)\n        x = self.conv1(x)\n        x = self.norm2(x)\n        x = self.activate(x)\n        x = self.conv2(x)\n        if self.in_channels != self.filters:  # SCH: ResBlock X->Y\n            residual = self.conv3(residual)\n        return x + residual\n\n\ndef get_activation_fn(activation):\n    if activation == \"relu\":\n        activation_fn = nn.ReLU\n    elif activation == \"swish\":\n        activation_fn = nn.SiLU\n    else:\n        raise NotImplementedError\n    return activation_fn\n\n\nclass Encoder(nn.Module):\n    \"\"\"Encoder Blocks.\"\"\"\n\n    def __init__(\n        self,\n        in_out_channels=4,\n        latent_embed_dim=512,  # num channels for latent vector\n        filters=128,\n        num_res_blocks=4,\n      
  channel_multipliers=(1, 2, 2, 4),\n        temporal_downsample=(False, True, True),\n        num_groups=32,  # for nn.GroupNorm\n        activation_fn=\"swish\",\n    ):\n        super().__init__()\n        self.filters = filters\n        self.num_res_blocks = num_res_blocks\n        self.num_blocks = len(channel_multipliers)\n        self.channel_multipliers = channel_multipliers\n        self.temporal_downsample = temporal_downsample\n        self.num_groups = num_groups\n        self.embedding_dim = latent_embed_dim\n\n        self.activation_fn = get_activation_fn(activation_fn)\n        self.activate = self.activation_fn()\n        self.conv_fn = CausalConv3d\n        self.block_args = dict(\n            conv_fn=self.conv_fn,\n            activation_fn=self.activation_fn,\n            use_conv_shortcut=False,\n            num_groups=self.num_groups,\n        )\n\n        # first layer conv\n        self.conv_in = self.conv_fn(\n            in_out_channels,\n            filters,\n            kernel_size=(3, 3, 3),\n            bias=False,\n        )\n\n        # ResBlocks and conv downsample\n        self.block_res_blocks = nn.ModuleList([])\n        self.conv_blocks = nn.ModuleList([])\n\n        filters = self.filters\n        prev_filters = filters  # record for in_channels\n        for i in range(self.num_blocks):\n            filters = self.filters * self.channel_multipliers[i]\n            block_items = nn.ModuleList([])\n            for _ in range(self.num_res_blocks):\n                block_items.append(ResBlock(prev_filters, filters, **self.block_args))\n                prev_filters = filters  # update in_channels\n            self.block_res_blocks.append(block_items)\n\n            if i < self.num_blocks - 1:\n                if self.temporal_downsample[i]:\n                    t_stride = 2 if self.temporal_downsample[i] else 1\n                    s_stride = 1\n                    self.conv_blocks.append(\n                        self.conv_fn(\n    
                        prev_filters, filters, kernel_size=(3, 3, 3), strides=(t_stride, s_stride, s_stride)\n                        )\n                    )\n                    prev_filters = filters  # update in_channels\n                else:\n                    # if no t downsample, don't add since this does nothing for pipeline models\n                    self.conv_blocks.append(nn.Identity(prev_filters))  # Identity\n                    prev_filters = filters  # update in_channels\n\n        # last layer res block\n        self.res_blocks = nn.ModuleList([])\n        for _ in range(self.num_res_blocks):\n            self.res_blocks.append(ResBlock(prev_filters, filters, **self.block_args))\n            prev_filters = filters  # update in_channels\n\n        # MAGVIT uses Group Normalization\n        self.norm1 = nn.GroupNorm(self.num_groups, prev_filters)\n\n        self.conv2 = self.conv_fn(prev_filters, self.embedding_dim, kernel_size=(1, 1, 1), padding=\"same\")\n\n    def forward(self, x):\n        x = self.conv_in(x)\n\n        for i in range(self.num_blocks):\n            for j in range(self.num_res_blocks):\n                x = self.block_res_blocks[i][j](x)\n            if i < self.num_blocks - 1:\n                x = self.conv_blocks[i](x)\n        for i in range(self.num_res_blocks):\n            x = self.res_blocks[i](x)\n\n        x = self.norm1(x)\n        x = self.activate(x)\n        x = self.conv2(x)\n        return x\n\n\nclass Decoder(nn.Module):\n    \"\"\"Decoder Blocks.\"\"\"\n\n    def __init__(\n        self,\n        in_out_channels=4,\n        latent_embed_dim=512,\n        filters=128,\n        num_res_blocks=4,\n        channel_multipliers=(1, 2, 2, 4),\n        temporal_downsample=(False, True, True),\n        num_groups=32,  # for nn.GroupNorm\n        activation_fn=\"swish\",\n    ):\n        super().__init__()\n        self.filters = filters\n        self.num_res_blocks = num_res_blocks\n        self.num_blocks = 
len(channel_multipliers)\n        self.channel_multipliers = channel_multipliers\n        self.temporal_downsample = temporal_downsample\n        self.num_groups = num_groups\n        self.embedding_dim = latent_embed_dim\n        self.s_stride = 1\n\n        self.activation_fn = get_activation_fn(activation_fn)\n        self.activate = self.activation_fn()\n        self.conv_fn = CausalConv3d\n        self.block_args = dict(\n            conv_fn=self.conv_fn,\n            activation_fn=self.activation_fn,\n            use_conv_shortcut=False,\n            num_groups=self.num_groups,\n        )\n\n        filters = self.filters * self.channel_multipliers[-1]\n        prev_filters = filters\n\n        # last conv\n        self.conv1 = self.conv_fn(self.embedding_dim, filters, kernel_size=(3, 3, 3), bias=True)\n\n        # last layer res block\n        self.res_blocks = nn.ModuleList([])\n        for _ in range(self.num_res_blocks):\n            self.res_blocks.append(ResBlock(filters, filters, **self.block_args))\n\n        # ResBlocks and conv upsample\n        self.block_res_blocks = nn.ModuleList([])\n        self.num_blocks = len(self.channel_multipliers)\n        self.conv_blocks = nn.ModuleList([])\n        # reverse to keep track of the in_channels, but append also in a reverse direction\n        for i in reversed(range(self.num_blocks)):\n            filters = self.filters * self.channel_multipliers[i]\n            # resblock handling\n            block_items = nn.ModuleList([])\n            for _ in range(self.num_res_blocks):\n                block_items.append(ResBlock(prev_filters, filters, **self.block_args))\n                prev_filters = filters  # SCH: update in_channels\n            self.block_res_blocks.insert(0, block_items)  # SCH: append in front\n\n            # conv blocks with upsampling\n            if i > 0:\n                if self.temporal_downsample[i - 1]:\n                    t_stride = 2 if self.temporal_downsample[i - 1] else 1\n    
                # SCH: T-Causal Conv 3x3x3, f -> (t_stride * 2 * 2) * f, depth to space t_stride x 2 x 2\n                    self.conv_blocks.insert(\n                        0,\n                        self.conv_fn(\n                            prev_filters, prev_filters * t_stride * self.s_stride * self.s_stride, kernel_size=(3, 3, 3)\n                        ),\n                    )\n                else:\n                    self.conv_blocks.insert(\n                        0,\n                        nn.Identity(prev_filters),\n                    )\n\n        self.norm1 = nn.GroupNorm(self.num_groups, prev_filters)\n\n        self.conv_out = self.conv_fn(filters, in_out_channels, 3)\n\n    def forward(self, x):\n        x = self.conv1(x)\n        for i in range(self.num_res_blocks):\n            x = self.res_blocks[i](x)\n        for i in reversed(range(self.num_blocks)):\n            for j in range(self.num_res_blocks):\n                x = self.block_res_blocks[i][j](x)\n            if i > 0:\n                t_stride = 2 if self.temporal_downsample[i - 1] else 1\n                x = self.conv_blocks[i - 1](x)\n                x = rearrange(\n                    x,\n                    \"B (C ts hs ws) T H W -> B C (T ts) (H hs) (W ws)\",\n                    ts=t_stride,\n                    hs=self.s_stride,\n                    ws=self.s_stride,\n                )\n\n        x = self.norm1(x)\n        x = self.activate(x)\n        x = self.conv_out(x)\n        return x\n\n\n@MODELS.register_module()\nclass VAE_Temporal(nn.Module):\n    def __init__(\n        self,\n        in_out_channels=4,\n        latent_embed_dim=4,\n        embed_dim=4,\n        filters=128,\n        num_res_blocks=4,\n        channel_multipliers=(1, 2, 2, 4),\n        temporal_downsample=(True, True, False),\n        num_groups=32,  # for nn.GroupNorm\n        activation_fn=\"swish\",\n    ):\n        super().__init__()\n\n        self.time_downsample_factor = 2 ** 
sum(temporal_downsample)\n        # self.time_padding = self.time_downsample_factor - 1\n        self.patch_size = (self.time_downsample_factor, 1, 1)\n        self.out_channels = in_out_channels\n\n        # NOTE: following MAGVIT, conv in bias=False in encoder first conv\n        self.encoder = Encoder(\n            in_out_channels=in_out_channels,\n            latent_embed_dim=latent_embed_dim * 2,\n            filters=filters,\n            num_res_blocks=num_res_blocks,\n            channel_multipliers=channel_multipliers,\n            temporal_downsample=temporal_downsample,\n            num_groups=num_groups,  # for nn.GroupNorm\n            activation_fn=activation_fn,\n        )\n        self.quant_conv = CausalConv3d(2 * latent_embed_dim, 2 * embed_dim, 1)\n\n        self.post_quant_conv = CausalConv3d(embed_dim, latent_embed_dim, 1)\n        self.decoder = Decoder(\n            in_out_channels=in_out_channels,\n            latent_embed_dim=latent_embed_dim,\n            filters=filters,\n            num_res_blocks=num_res_blocks,\n            channel_multipliers=channel_multipliers,\n            temporal_downsample=temporal_downsample,\n            num_groups=num_groups,  # for nn.GroupNorm\n            activation_fn=activation_fn,\n        )\n\n    def get_latent_size(self, input_size):\n        latent_size = []\n        for i in range(3):\n            if input_size[i] is None:\n                lsize = None\n            elif i == 0:\n                time_padding = (\n                    0\n                    if (input_size[i] % self.time_downsample_factor == 0)\n                    else self.time_downsample_factor - input_size[i] % self.time_downsample_factor\n                )\n                lsize = (input_size[i] + time_padding) // self.patch_size[i]\n            else:\n                lsize = input_size[i] // self.patch_size[i]\n            latent_size.append(lsize)\n        return latent_size\n\n    def encode(self, x):\n        time_padding = (\n 
           0\n            if (x.shape[2] % self.time_downsample_factor == 0)\n            else self.time_downsample_factor - x.shape[2] % self.time_downsample_factor\n        )\n        x = pad_at_dim(x, (time_padding, 0), dim=2)\n        encoded_feature = self.encoder(x)\n        moments = self.quant_conv(encoded_feature).to(x.dtype)\n        posterior = DiagonalGaussianDistribution(moments)\n        return posterior\n\n    def decode(self, z, num_frames=None):\n        time_padding = (\n            0\n            if (num_frames % self.time_downsample_factor == 0)\n            else self.time_downsample_factor - num_frames % self.time_downsample_factor\n        )\n        z = self.post_quant_conv(z)\n        x = self.decoder(z)\n        x = x[:, :, time_padding:]\n        return x\n\n    def forward(self, x, sample_posterior=True):\n        posterior = self.encode(x)\n        if sample_posterior:\n            z = posterior.sample()\n        else:\n            z = posterior.mode()\n        recon_video = self.decode(z, num_frames=x.shape[2])\n        return recon_video, posterior, z\n\n\n@MODELS.register_module(\"VAE_Temporal_SD\")\ndef VAE_Temporal_SD(from_pretrained=None, **kwargs):\n    model = VAE_Temporal(\n        in_out_channels=4,\n        latent_embed_dim=4,\n        embed_dim=4,\n        filters=128,\n        num_res_blocks=4,\n        channel_multipliers=(1, 2, 2, 4),\n        temporal_downsample=(False, True, True),\n        **kwargs,\n    )\n    if from_pretrained is not None:\n        load_checkpoint(model, from_pretrained)\n    return model\n"
  },
  {
    "path": "Open-Sora/opensora/models/vae/video_sdxl/blocks.py",
    "content": "\"\"\"\nAdapted from SDXL VAE (https://huggingface.co/stabilityai/sdxl-vae/blob/main/config.json)\nAll default values of kwargs are the same as SDXL\n\"\"\"\n\nfrom typing import Optional, Tuple, Union\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom diffusers.models.attention_processor import Attention\nfrom einops import rearrange\n\n\ndef video_to_image(func):\n    def wrapper(self, x, *args, **kwargs):\n        # 4D inputs [B, C, H, W] are already image batches: apply func directly\n        if x.ndim != 5:\n            return func(self, x, *args, **kwargs)\n\n        B = x.shape[0]\n        x = rearrange(x, 'B C T H W -> (B T) C H W')\n\n        # micro_batch_size may be absent or None; both mean: process all frames at once\n        bs = getattr(self, 'micro_batch_size', None)\n        if bs is None:\n            x = func(self, x, *args, **kwargs)\n        else:\n            x_out = []\n            for i in range(0, x.shape[0], bs):\n                x_i = func(self, x[i:i + bs], *args, **kwargs)\n                x_out.append(x_i)\n            x = torch.cat(x_out, dim=0)\n\n        x = rearrange(x, '(B T) C H W -> B C T H W', B=B)\n        return x\n    return wrapper\n\n\nclass VideoConv2d(nn.Conv2d):\n    def __init__(self, *args, micro_batch_size=None, **kwargs):\n        super().__init__(*args, **kwargs)\n        self.micro_batch_size = micro_batch_size\n\n    @video_to_image\n    def forward(self, x):\n        return super().forward(x)\n\n\nclass ResnetBlock2D(nn.Module):\n    \"\"\"\n        Use nn.Conv2d\n        Default activation is nn.SiLU()\n        Make sure input tensor is of shape [B, C, T, H, W] or [B, C, H, W]\n        Support micro_batch_size\n    \"\"\"\n    def __init__(\n        self,\n        in_channels: int,\n        out_channels: Optional[int] = None,\n        norm_groups: int = 32,\n        norm_eps: float = 1e-6,\n        micro_batch_size=None,\n    ):\n        super().__init__()\n        self.in_channels = in_channels\n        out_channels = in_channels if out_channels is None else out_channels\n        self.out_channels 
= out_channels\n        self.micro_batch_size = micro_batch_size\n\n        conv_cls = nn.Conv2d\n        self.norm1 = torch.nn.GroupNorm(num_groups=norm_groups, num_channels=in_channels, eps=norm_eps, affine=True)\n        self.conv1 = conv_cls(in_channels, out_channels, kernel_size=3, stride=1, padding=1)\n\n        self.norm2 = torch.nn.GroupNorm(num_groups=norm_groups, num_channels=out_channels, eps=norm_eps, affine=True)\n        self.conv2 = conv_cls(out_channels, out_channels, kernel_size=3, stride=1, padding=1)\n\n        self.act = nn.SiLU()\n\n        self.use_in_shortcut = self.in_channels != out_channels\n\n        self.conv_shortcut = None\n        if self.use_in_shortcut:\n            self.conv_shortcut = conv_cls(\n                in_channels,\n                out_channels,\n                kernel_size=1,\n                stride=1,\n                padding=0,\n            )\n\n    @video_to_image\n    def forward(self, x):\n        res = self.norm1(x)\n        res = self.act(res)\n        res = self.conv1(res)\n\n        res = self.norm2(res)\n        res = self.act(res)\n        res = self.conv2(res)\n\n        if self.conv_shortcut is not None:\n            x = self.conv_shortcut(x)\n\n        out = x + res\n        return out\n\n\nclass ResnetBlock3D(nn.Module):\n    \"\"\"\n        Use nn.Conv3d\n        Default activation is nn.SiLU()\n        Make sure input tensor is of shape [B, C, T, H, W]\n    \"\"\"\n    def __init__(\n        self,\n        in_channels: int,\n        out_channels: Optional[int] = None,\n        norm_groups: int = 32,\n        norm_eps: float = 1e-6,\n    ):\n        super().__init__()\n        self.in_channels = in_channels\n        out_channels = in_channels if out_channels is None else out_channels\n        self.out_channels = out_channels\n\n        conv_cls = nn.Conv3d\n        self.norm1 = torch.nn.GroupNorm(num_groups=norm_groups, num_channels=in_channels, eps=norm_eps, affine=True)\n        self.conv1 = 
conv_cls(in_channels, out_channels, kernel_size=3, stride=1, padding=1)\n\n        self.norm2 = torch.nn.GroupNorm(num_groups=norm_groups, num_channels=out_channels, eps=norm_eps, affine=True)\n        self.conv2 = conv_cls(out_channels, out_channels, kernel_size=3, stride=1, padding=1)\n\n        self.act = nn.SiLU()\n\n        self.use_in_shortcut = self.in_channels != out_channels\n\n        self.conv_shortcut = None\n        if self.use_in_shortcut:\n            self.conv_shortcut = conv_cls(\n                in_channels,\n                out_channels,\n                kernel_size=1,\n                stride=1,\n                padding=0,\n            )\n        \n    def forward(self, x):\n        res = self.norm1(x)\n        res = self.act(res)\n        res = self.conv1(res)\n\n        res = self.norm2(res)\n        res = self.act(res)\n        res = self.conv2(res)\n\n        if self.conv_shortcut is not None:\n            x = self.conv_shortcut(x)\n\n        out = x + res\n        return out\n\n\nclass SpatialDownsample2x(nn.Module):\n    \"\"\"\n        Default downsample is Conv2d(stride=2)\n        Make sure input tensor is of shape [B, C, T, H, W]\n        Support micro_batch_size\n    \"\"\"\n    def __init__(\n        self,\n        channels: int,\n        use_conv: bool = True,\n        micro_batch_size=None,\n    ):\n        super().__init__()\n        self.channels = channels\n        self.use_conv = use_conv\n        self.micro_batch_size = micro_batch_size\n\n        if use_conv:\n            self.downsample = nn.Conv2d(\n                self.channels, self.channels, kernel_size=3, stride=2, padding=0,\n            )\n        else:\n            self.downsample = nn.AvgPool2d(kernel_size=2, stride=2)\n\n    @video_to_image\n    def forward(self, x):\n        # implementation from SDXL\n        pad = (0, 1, 0, 1)\n        x = F.pad(x, pad, mode=\"constant\", value=0)\n\n        x = self.downsample(x)\n        return x\n\n\nclass 
SpatialUpsample2x(nn.Module):\n    \"\"\"\n        Default upsample is F.interpolate(scale_factor=2) + Conv2d(stride=1)\n        Make sure input tensor is of shape [B, C, T, H, W]\n        Support micro_batch_size\n    \"\"\"\n    def __init__(\n        self,\n        channels: int,\n        use_interpolate=True,\n        micro_batch_size=None,\n    ):\n        super().__init__()\n        self.channels = channels\n        self.use_interpolate = use_interpolate\n        self.micro_batch_size = micro_batch_size\n\n        if use_interpolate:\n            self.conv = nn.Conv2d(self.channels, self.channels, kernel_size=3, padding=1)\n        else:\n            raise NotImplementedError\n            self.upsample = nn.ConvTranspose2d(channels, self.channels, kernel_size=4, stride=2, padding=1)\n    \n    def forward(self, x):\n        B = x.shape[0]\n        x = rearrange(x, 'B C T H W -> (B T) C H W')\n\n        if self.micro_batch_size is None:\n            x = self.forward_BCHW(x)\n        else:\n            bs = self.micro_batch_size\n            x_out = []\n            for i in range(0, x.shape[0], bs):\n                x_i = self.forward_BCHW(x[i:i + bs])\n                x_out.append(x_i)\n            x = torch.cat(x_out, dim=0)\n\n        x = rearrange(x, '(B T) C H W -> B C T H W', B=B)\n        return x\n\n    def forward_BCHW(self, x):\n        if self.use_interpolate:\n            # upsample_nearest_nhwc fails with large batch sizes. see https://github.com/huggingface/diffusers/issues/984\n            if x.shape[0] >= 64:\n                x = x.contiguous()\n\n            # interpolate tensor of bfloat16 is fixed in pytorch 2.1. 
see https://github.com/pytorch/pytorch/issues/86679\n            x = F.interpolate(x, scale_factor=2.0, mode=\"nearest\")\n            x = self.conv(x)\n        else:\n            x = self.upsample(x)\n\n        return x\n\n\nclass TemporalDownsample2x(nn.Module):\n    \"\"\"\n        Default downsample is Conv3d(stride=(2, 1, 1))\n        Make sure input tensor is of shape [B, C, T, H, W]\n    \"\"\"\n    def __init__(\n        self,\n        channels: int,\n        use_conv: bool = True,\n    ):\n        super().__init__()\n        self.channels = channels\n        self.use_conv = use_conv\n\n        if use_conv:\n            self.downsample = nn.Conv3d(\n                self.channels, self.channels, kernel_size=(3, 3, 3), stride=(2, 1, 1), padding=(1, 1, 1),\n           )\n        else:\n            self.downsample = nn.AvgPool3d(kernel_size=(3, 1, 1), stride=(2, 1, 1))\n\n    def forward(self, x):\n        x = self.downsample(x)\n        return x\n\n\nclass TemporalUpsample2x(nn.Module):\n    \"\"\"\n        Default upsample is F.interpolate(scale_factor=(2, 1, 1)) + Conv3d(stride=1)\n        Make sure input tensor is of shape [B, C, T, H, W]\n        Support micro_batch_size\n    \"\"\"\n    def __init__(\n        self,\n        channels,\n    ):\n        super().__init__()\n        self.channels = channels\n        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)\n\n    def forward(self, x):\n        if x.shape[0] >= 64:\n            x = x.contiguous()\n        x = F.interpolate(x, scale_factor=(2, 1, 1), mode=\"trilinear\")\n        x = self.conv(x)\n        return x\n\n\nclass UNetMidBlock2D(nn.Module):\n    \"\"\"\n        default is ResnetBlock2D + Spatial Attention + ResnetBlock2D\n        Make sure input tensor is of shape [B, C, T, H, W] or [B, C, H, W]\n    \"\"\"\n    def __init__(\n        self,\n        in_channels: int,\n        num_layers: int = 1,\n        norm_groups: int = 32,\n        norm_eps: float = 1e-6,\n        
attn_groups: Optional[int] = None,\n        add_attention: bool = True,\n        attention_head_dim: int = 512,\n    ):\n        super().__init__()\n        self.add_attention = add_attention\n\n        if attn_groups is None:\n            attn_groups = norm_groups\n\n        if attention_head_dim is None:\n            attention_head_dim = in_channels\n\n        res_blocks = [\n            ResnetBlock2D(\n                in_channels=in_channels,\n                out_channels=in_channels,\n                norm_eps=norm_eps,\n                norm_groups=norm_groups,\n            )\n        ]\n        attn_blocks = []\n\n        for _ in range(num_layers):\n            if self.add_attention:\n                attn_blocks.append(\n                    Attention(\n                        in_channels,\n                        heads=in_channels // attention_head_dim,\n                        dim_head=attention_head_dim,\n                        # rescale_output_factor=output_scale_factor,\n                        rescale_output_factor=1.0,\n                        eps=norm_eps,\n                        norm_num_groups=attn_groups,\n                        # spatial_norm_dim=temb_channels if resnet_time_scale_shift == \"spatial\" else None,\n                        spatial_norm_dim=None,\n                        residual_connection=True,\n                        bias=True,\n                        upcast_softmax=True,\n                        _from_deprecated_attn_block=True,\n                    )\n                )\n\n            res_blocks.append(\n                ResnetBlock2D(\n                    in_channels=in_channels,\n                    out_channels=in_channels,\n                    norm_eps=norm_eps,\n                    norm_groups=norm_groups,\n                )\n            )\n\n        self.attn_blocks = nn.ModuleList(attn_blocks)\n        self.res_blocks = nn.ModuleList(res_blocks)\n\n    def forward(self, x):\n        has_T = x.ndim == 5\n        if 
has_T:\n            B = x.shape[0]\n            x = rearrange(x, 'B C T H W -> (B T) C H W')\n\n        x = self.res_blocks[0](x)\n        for attn, res_block in zip(self.attn_blocks, self.res_blocks[1:]):\n            if attn is not None:\n                x = attn(x)\n            x = res_block(x)\n\n        if has_T:\n            x = rearrange(x, '(B T) C H W -> B C T H W', B=B)\n        return x\n\n\nclass Encoder(nn.Module):\n    \"\"\"\n        default arch is conv_in + blocks + mid_block + out_block\n        Make sure input tensor is of shape [B, C, T, H, W]\n    \"\"\"\n    def __init__(\n        self,\n        in_channels=3,\n        out_channels=4,\n        norm_groups=32,\n        norm_eps=1e-6,\n        double_z=True,\n        micro_batch_size=None,\n    ):\n        super().__init__()\n        in_channels_encoder = in_channels\n        out_channels_encoder = out_channels\n        block_out_channels = [128, 256, 512, 512]\n\n        # conv_in\n        self.conv_in = VideoConv2d(\n            in_channels_encoder,\n            block_out_channels[0],\n            kernel_size=3,\n            stride=1,\n            padding=1,\n            micro_batch_size=micro_batch_size,\n        )\n\n        # blocks\n        blocks = []\n\n        # the first block: ResnetBlock2D\n        in_channels = block_out_channels[0]\n        out_channels = block_out_channels[0]\n        blocks.append(\n            nn.Sequential(\n                ResnetBlock2D(\n                    in_channels=in_channels,\n                    out_channels=out_channels,\n                    norm_groups=norm_groups,\n                    norm_eps=norm_eps,\n                    micro_batch_size=micro_batch_size,\n                ),\n                ResnetBlock2D(\n                    in_channels=out_channels,\n                    out_channels=out_channels,\n                    norm_groups=norm_groups,\n                    norm_eps=norm_eps,\n                    micro_batch_size=micro_batch_size,\n       
         ),\n                SpatialDownsample2x(\n                    channels=out_channels,\n                    use_conv=True,\n                    micro_batch_size=micro_batch_size, \n                ),\n            )\n        )\n\n        # the second block: ResnetBlock2D\n        in_channels = block_out_channels[0]\n        out_channels = block_out_channels[1]\n        blocks.append(\n            nn.Sequential(\n                ResnetBlock2D(\n                    in_channels=in_channels,\n                    out_channels=out_channels,\n                    norm_groups=norm_groups,\n                    norm_eps=norm_eps,\n                    micro_batch_size=micro_batch_size,\n                ),\n                ResnetBlock2D(\n                    in_channels=out_channels,\n                    out_channels=out_channels,\n                    norm_groups=norm_groups,\n                    norm_eps=norm_eps,\n                    micro_batch_size=micro_batch_size,\n                ),\n                SpatialDownsample2x(\n                    channels=out_channels,\n                    use_conv=True,\n                    micro_batch_size=micro_batch_size, \n                ),\n                TemporalDownsample2x(\n                    channels=out_channels,\n                    use_conv=True,\n                )\n            )\n        )\n\n        # the third block: ResnetBlock3D\n        in_channels = block_out_channels[1]\n        out_channels = block_out_channels[2]\n        blocks.append(\n            nn.Sequential(\n                ResnetBlock3D(\n                    in_channels=in_channels,\n                    out_channels=out_channels,\n                    norm_groups=norm_groups,\n                    norm_eps=norm_eps,\n                ),\n                ResnetBlock3D(\n                    in_channels=out_channels,\n                    out_channels=out_channels,\n                    norm_groups=norm_groups,\n                    norm_eps=norm_eps,\n          
      ),\n                SpatialDownsample2x(\n                    channels=out_channels,\n                    use_conv=True,\n                ),\n                TemporalDownsample2x(\n                    channels=out_channels,\n                    use_conv=True,\n                )\n            )\n        )\n\n        # the fourth block: ResnetBlock3D\n        in_channels = block_out_channels[2]\n        out_channels = block_out_channels[3]\n        blocks.append(\n            nn.Sequential(\n                ResnetBlock3D(\n                    in_channels=in_channels,\n                    out_channels=out_channels,\n                    norm_groups=norm_groups,\n                    norm_eps=norm_eps,\n                ),\n                ResnetBlock3D(\n                    in_channels=out_channels,\n                    out_channels=out_channels,\n                    norm_groups=norm_groups,\n                    norm_eps=norm_eps,\n                ),\n            )\n        )\n\n        self.blocks = nn.ModuleList(blocks)\n\n\n        # mid_block\n        in_channels = block_out_channels[-1]\n        self.mid_block = UNetMidBlock2D(\n            in_channels=in_channels,\n            num_layers=1,\n            norm_groups=norm_groups,\n            norm_eps=norm_eps,\n            add_attention=True,\n            attention_head_dim=in_channels,\n        )\n\n        # out_block\n        in_channels = block_out_channels[-1]\n        out_channels = 2 * out_channels_encoder if double_z else out_channels_encoder\n        self.out_block = nn.Sequential(\n            nn.GroupNorm(num_channels=in_channels, num_groups=norm_groups, eps=norm_eps),\n            nn.SiLU(),\n            nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1),\n        )\n    \n    def forward(self, x):\n        x = self.conv_in(x)\n\n        for block in self.blocks:\n            x = block(x)\n\n        x = self.mid_block(x)\n\n        x = self.out_block(x)\n        return x\n\n\nclass 
Decoder(nn.Module):\n    \"\"\"\n        default arch is conv_in + mid_block + blocks + out_block\n        Make sure input tensor is of shape [B, C, T, H, W]\n    \"\"\"\n    def __init__(\n        self,\n        in_channels=4,\n        out_channels=3,\n        norm_groups=32,\n        norm_eps=1e-6,\n    ):\n        super().__init__()\n        in_channels_decoder = in_channels\n        out_channels_decoder = out_channels\n        block_out_channels = [512, 512, 256, 128]\n\n        # conv_in\n        self.conv_in = nn.Conv3d(\n            in_channels_decoder,\n            block_out_channels[0],\n            kernel_size=3,\n            stride=1,\n            padding=1,\n        )\n\n        # mid_block\n        in_channels = block_out_channels[0]\n        self.mid_block = UNetMidBlock2D(\n            in_channels=in_channels,\n            num_layers=1,\n            norm_groups=norm_groups,\n            norm_eps=norm_eps,\n            add_attention=True,\n            attention_head_dim=in_channels,\n        )\n\n        # blocks\n        blocks = []\n        layer_per_block = 3\n\n        # the first up block: ResnetBlock3D\n        in_channels = block_out_channels[0]\n        out_channels = block_out_channels[0]\n        seq = [\n            ResnetBlock3D(\n                in_channels=in_channels if idx ==0 else out_channels,\n                out_channels=out_channels,\n                norm_groups=norm_groups,\n                norm_eps=norm_eps,\n            )\n            for idx in range(layer_per_block)\n        ] + [\n            SpatialUpsample2x(\n                channels=out_channels,\n                use_interpolate=True,\n            ),\n            TemporalUpsample2x(\n                channels=out_channels,\n            ),\n        ]\n        blocks.append(nn.Sequential(*seq))\n\n        # the second up block: ResnetBlock3D\n        in_channels = block_out_channels[0]\n        out_channels = block_out_channels[1]\n        seq = [\n            
ResnetBlock3D(\n                in_channels=in_channels if idx ==0 else out_channels,\n                out_channels=out_channels,\n                norm_groups=norm_groups,\n                norm_eps=norm_eps,\n            )\n            for idx in range(layer_per_block)\n        ] + [\n            SpatialUpsample2x(\n                channels=out_channels,\n                use_interpolate=True,\n            ),\n            TemporalUpsample2x(\n                channels=out_channels,\n            ),\n        ]\n        blocks.append(nn.Sequential(*seq))\n\n        # the third up block: ResnetBlock3D\n        in_channels = block_out_channels[1]\n        out_channels = block_out_channels[2]\n        seq = [\n            ResnetBlock3D(\n                in_channels=in_channels if idx ==0 else out_channels,\n                out_channels=out_channels,\n                norm_groups=norm_groups,\n                norm_eps=norm_eps,\n            )\n            for idx in range(layer_per_block)\n        ] + [\n            SpatialUpsample2x(\n                channels=out_channels,\n                use_interpolate=True,\n            ),\n        ]\n        blocks.append(nn.Sequential(*seq))\n\n        # the fourth up block: ResnetBlock2D\n        in_channels = block_out_channels[2]\n        out_channels = block_out_channels[3]\n        seq = [\n            ResnetBlock2D(\n                in_channels=in_channels if idx ==0 else out_channels,\n                out_channels=out_channels,\n                norm_groups=norm_groups,\n                norm_eps=norm_eps,\n            )\n            for idx in range(layer_per_block)\n        ]\n        blocks.append(nn.Sequential(*seq))\n\n        self.blocks = nn.ModuleList(blocks)\n\n        # out_block\n        in_channels = block_out_channels[-1]\n        out_channels = out_channels_decoder\n        self.out_block = nn.Sequential(\n            nn.GroupNorm(num_channels=in_channels, num_groups=norm_groups, eps=norm_eps),\n            
nn.SiLU(),\n            # VideoConv2d folds T into the batch dim; plain nn.Conv2d rejects 5D [B, C, T, H, W] input\n            VideoConv2d(in_channels, out_channels, kernel_size=3, padding=1),\n        )\n\n    def forward(self, x):\n        x = self.conv_in(x)\n\n        x = self.mid_block(x)\n\n        for block in self.blocks:\n            x = block(x)\n\n        x = self.out_block(x)\n        return x\n\nif __name__ == '__main__':\n    from opensora.utils.misc import count_params\n    device = 'cuda'\n    dtype = torch.bfloat16\n\n    encoder = Encoder(\n        in_channels=3,\n        out_channels=4,\n        double_z=False,\n        micro_batch_size=4,\n    ).to(device, dtype).eval()\n\n    decoder = Decoder(\n        in_channels=4,\n        out_channels=3,\n    ).to(device, dtype).eval()\n    num_params_enc = count_params(encoder)\n    num_params_dec = count_params(decoder)\n    print(f'Encoder #params: {num_params_enc}')\n    print(f'Decoder #params: {num_params_dec}')\n\n    # inference\n    x = torch.rand(1, 3, 51, 720, 1080).to(device, dtype)\n    with torch.inference_mode():\n        x_enc = encoder(x)\n        x_dec = decoder(x_enc)\n    # GPU memory still allocated after the round trip, in GiB\n    print(torch.cuda.memory_allocated() / 1024 ** 3)\n"
  },
  {
    "path": "Open-Sora/opensora/registry.py",
    "content": "from copy import deepcopy\n\nimport torch.nn as nn\nfrom mmengine.registry import Registry\n\n\ndef build_module(module, builder, **kwargs):\n    \"\"\"Build module from config or return the module itself.\n\n    Args:\n        module (Union[dict, nn.Module, None]): The module to build.\n        builder (Registry): The registry to build module.\n        **kwargs: Overrides merged into the config dict before building.\n\n    Returns:\n        Any: The built module, or None if `module` is None.\n    \"\"\"\n    if module is None:\n        return None\n    if isinstance(module, dict):\n        cfg = deepcopy(module)\n        for k, v in kwargs.items():\n            cfg[k] = v\n        return builder.build(cfg)\n    elif isinstance(module, nn.Module):\n        return module\n    else:\n        raise TypeError(f\"Only support dict and nn.Module, but got {type(module)}.\")\n\n\nMODELS = Registry(\n    \"model\",\n    locations=[\"opensora.models\"],\n)\n\nSCHEDULERS = Registry(\n    \"scheduler\",\n    locations=[\"opensora.schedulers\"],\n)\n\nDATASETS = Registry(\n    \"dataset\",\n    locations=[\"opensora.datasets\"],\n)\n"
  },
  {
    "path": "Open-Sora/opensora/schedulers/__init__.py",
    "content": "from .dpms import DPMS\nfrom .iddpm import IDDPM\nfrom .rf import RFLOW\n"
  },
  {
    "path": "Open-Sora/opensora/schedulers/dpms/__init__.py",
    "content": "from functools import partial\n\nimport torch\n\nfrom opensora.registry import SCHEDULERS\n\nfrom .dpm_solver import DPMS\n\n\n@SCHEDULERS.register_module(\"dpm-solver\")\nclass DPM_SOLVER:\n    def __init__(self, num_sampling_steps=None, cfg_scale=4.0):\n        self.num_sampling_steps = num_sampling_steps\n        self.cfg_scale = cfg_scale\n\n    def sample(\n        self,\n        model,\n        text_encoder,\n        z,\n        prompts,\n        device,\n        additional_args=None,\n        mask=None,\n        progress=True,\n    ):\n        if mask is not None:\n            print(\"[WARNING] mask is not supported in dpm-solver, it will be ignored\")\n        n = len(prompts)\n        model_args = text_encoder.encode(prompts)\n        y = model_args.pop(\"y\")\n        null_y = text_encoder.null(n)\n        if additional_args is not None:\n            model_args.update(additional_args)\n\n        dpms = DPMS(\n            partial(forward_with_dpmsolver, model),\n            condition=y,\n            uncondition=null_y,\n            cfg_scale=self.cfg_scale,\n            model_kwargs=model_args,\n        )\n        samples = dpms.sample(\n            z,\n            steps=self.num_sampling_steps,\n            order=2,\n            skip_type=\"time_uniform\",\n            method=\"multistep\",\n            progress=progress,\n        )\n        return samples\n\n\ndef forward_with_dpmsolver(self, x, timestep, y, **kwargs):\n    \"\"\"\n    DPM-Solver does not need the variance prediction, so only the first half\n    of the output channels is returned.\n    \"\"\"\n    # https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb\n    model_out = self.forward(x, timestep, y, **kwargs)\n    return model_out.chunk(2, dim=1)[0]\n"
  },
  {
    "path": "Open-Sora/opensora/schedulers/dpms/dpm_solver.py",
    "content": "# MIT License\n#\n# Copyright (c) 2022 Cheng Lu\n#\n# Permission is hereby granted, free of charge, to any person obtaining a copy\n# of this software and associated documentation files (the \"Software\"), to deal\n# in the Software without restriction, including without limitation the rights\n# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n# copies of the Software, and to permit persons to whom the Software is\n# furnished to do so, subject to the following conditions:\n#\n#\n# This file is adapted from the dpm-solver project\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# PixArt:       https://github.com/PixArt-alpha/PixArt-alpha\n# dpm-solver:   https://github.com/LuChengTHU/dpm-solver\n# --------------------------------------------------------\n\nimport math\n\nimport numpy as np\nimport torch\nfrom tqdm import tqdm\n\n\ndef _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, warmup_frac):\n    betas = beta_end * np.ones(num_diffusion_timesteps, dtype=np.float64)\n    warmup_time = int(num_diffusion_timesteps * warmup_frac)\n    betas[:warmup_time] = np.linspace(beta_start, beta_end, warmup_time, dtype=np.float64)\n    return betas\n\n\ndef get_beta_schedule(beta_schedule, *, beta_start, beta_end, num_diffusion_timesteps):\n    \"\"\"\n    This is the deprecated API for creating beta schedules.\n    See get_named_beta_schedule() for the new library of schedules.\n    \"\"\"\n    if beta_schedule == \"quad\":\n        betas = (\n            np.linspace(\n                beta_start**0.5,\n                beta_end**0.5,\n                num_diffusion_timesteps,\n                dtype=np.float64,\n            )\n            ** 2\n        )\n    elif beta_schedule == \"linear\":\n        betas = np.linspace(beta_start, beta_end, num_diffusion_timesteps, 
dtype=np.float64)\n    elif beta_schedule == \"warmup10\":\n        betas = _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, 0.1)\n    elif beta_schedule == \"warmup50\":\n        betas = _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, 0.5)\n    elif beta_schedule == \"const\":\n        betas = beta_end * np.ones(num_diffusion_timesteps, dtype=np.float64)\n    elif beta_schedule == \"jsd\":  # 1/T, 1/(T-1), 1/(T-2), ..., 1\n        betas = 1.0 / np.linspace(num_diffusion_timesteps, 1, num_diffusion_timesteps, dtype=np.float64)\n    else:\n        raise NotImplementedError(beta_schedule)\n    assert betas.shape == (num_diffusion_timesteps,)\n    return betas\n\n\ndef get_named_beta_schedule(schedule_name, num_diffusion_timesteps):\n    \"\"\"\n    Get a pre-defined beta schedule for the given name.\n    The beta schedule library consists of beta schedules which remain similar\n    in the limit of num_diffusion_timesteps.\n    Beta schedules may be added, but should not be removed or changed once\n    they are committed to maintain backwards compatibility.\n    \"\"\"\n    if schedule_name == \"linear\":\n        # Linear schedule from Ho et al, extended to work for any number of\n        # diffusion steps.\n        scale = 1000 / num_diffusion_timesteps\n        return get_beta_schedule(\n            \"linear\",\n            beta_start=scale * 0.0001,\n            beta_end=scale * 0.02,\n            num_diffusion_timesteps=num_diffusion_timesteps,\n        )\n    elif schedule_name == \"squaredcos_cap_v2\":\n        return betas_for_alpha_bar(\n            num_diffusion_timesteps,\n            lambda t: math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2,\n        )\n    else:\n        raise NotImplementedError(f\"unknown beta schedule: {schedule_name}\")\n\n\ndef betas_for_alpha_bar(num_diffusion_timesteps, alpha_bar, max_beta=0.999):\n    \"\"\"\n    Create a beta schedule that discretizes the given alpha_t_bar function,\n    which defines the 
cumulative product of (1-beta) over time from t = [0,1].\n    :param num_diffusion_timesteps: the number of betas to produce.\n    :param alpha_bar: a lambda that takes an argument t from 0 to 1 and\n                      produces the cumulative product of (1-beta) up to that\n                      part of the diffusion process.\n    :param max_beta: the maximum beta to use; use values lower than 1 to\n                     prevent singularities.\n    \"\"\"\n    betas = []\n    for i in range(num_diffusion_timesteps):\n        t1 = i / num_diffusion_timesteps\n        t2 = (i + 1) / num_diffusion_timesteps\n        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))\n    return np.array(betas)\n\n\nclass NoiseScheduleVP:\n    def __init__(\n        self,\n        schedule=\"discrete\",\n        betas=None,\n        alphas_cumprod=None,\n        continuous_beta_0=0.1,\n        continuous_beta_1=20.0,\n        dtype=torch.float32,\n    ):\n        \"\"\"Create a wrapper class for the forward SDE (VP type).\n\n        ***\n        Update: We support discrete-time diffusion models by implementing a piecewise linear interpolation for log_alpha_t.\n                We recommend using schedule='discrete' for discrete-time diffusion models, especially for high-resolution images.\n        ***\n\n        The forward SDE ensures that the conditional distribution q_{t|0}(x_t | x_0) = N ( alpha_t * x_0, sigma_t^2 * I ).\n        We further define lambda_t = log(alpha_t) - log(sigma_t), which is the half-logSNR (described in the DPM-Solver paper).\n        Therefore, we implement the functions for computing alpha_t, sigma_t and lambda_t. 
For t in [0, T], we have:\n\n            log_alpha_t = self.marginal_log_mean_coeff(t)\n            sigma_t = self.marginal_std(t)\n            lambda_t = self.marginal_lambda(t)\n\n        Moreover, as lambda(t) is an invertible function, we also support its inverse function:\n\n            t = self.inverse_lambda(lambda_t)\n\n        ===============================================================\n\n        We support both discrete-time DPMs (trained on n = 0, 1, ..., N-1) and continuous-time DPMs (trained on t in [t_0, T]).\n\n        1. For discrete-time DPMs:\n\n            For discrete-time DPMs trained on n = 0, 1, ..., N-1, we convert the discrete steps to continuous time steps by:\n                t_i = (i + 1) / N\n            e.g. for N = 1000, we have t_0 = 1e-3 and T = t_{N-1} = 1.\n            We solve the corresponding diffusion ODE from time T = 1 to time t_0 = 1e-3.\n\n            Args:\n                betas: A `torch.Tensor`. The beta array for the discrete-time DPM. (See the original DDPM paper for details)\n                alphas_cumprod: A `torch.Tensor`. The cumprod alphas for the discrete-time DPM. (See the original DDPM paper for details)\n\n            Note that we always have alphas_cumprod = cumprod(1 - betas). Therefore, we only need to set one of `betas` and `alphas_cumprod`.\n\n            **Important**:  Please pay special attention for the args for `alphas_cumprod`:\n                The `alphas_cumprod` is the \\hat{alpha_n} arrays in the notations of DDPM. Specifically, DDPMs assume that\n                    q_{t_n | 0}(x_{t_n} | x_0) = N ( \\sqrt{\\hat{alpha_n}} * x_0, (1 - \\hat{alpha_n}) * I ).\n                Therefore, the notation \\hat{alpha_n} is different from the notation alpha_t in DPM-Solver. In fact, we have\n                    alpha_{t_n} = \\sqrt{\\hat{alpha_n}},\n                and\n                    log(alpha_{t_n}) = 0.5 * log(\\hat{alpha_n}).\n\n\n        2. 
For continuous-time DPMs:\n\n            We support the linear VPSDE for the continuous time setting. The hyperparameters for the noise\n            schedule are the default settings in Yang Song's ScoreSDE:\n\n            Args:\n                continuous_beta_0: A `float` number. The smallest beta for the linear schedule.\n                continuous_beta_1: A `float` number. The largest beta for the linear schedule.\n\n            The ending time of the forward process is fixed to T = 1.\n\n        ===============================================================\n\n        Args:\n            schedule: A `str`. The noise schedule of the forward SDE. 'discrete' for discrete-time DPMs,\n                    'linear' for continuous-time DPMs.\n        Returns:\n            A wrapper object of the forward SDE (VP type).\n\n        ===============================================================\n\n        Example:\n\n        # For discrete-time DPMs, given betas (the beta array for n = 0, 1, ..., N - 1):\n        >>> ns = NoiseScheduleVP('discrete', betas=betas)\n\n        # For discrete-time DPMs, given alphas_cumprod (the \\hat{alpha_n} array for n = 0, 1, ..., N - 1):\n        >>> ns = NoiseScheduleVP('discrete', alphas_cumprod=alphas_cumprod)\n\n        # For continuous-time DPMs (VPSDE), linear schedule:\n        >>> ns = NoiseScheduleVP('linear', continuous_beta_0=0.1, continuous_beta_1=20.)\n\n        \"\"\"\n\n        if schedule not in [\"discrete\", \"linear\"]:\n            raise ValueError(f\"Unsupported noise schedule {schedule}. 
The schedule needs to be 'discrete' or 'linear'\")\n\n        self.schedule = schedule\n        if schedule == \"discrete\":\n            if betas is not None:\n                log_alphas = 0.5 * torch.log(1 - betas).cumsum(dim=0)\n            else:\n                assert alphas_cumprod is not None\n                log_alphas = 0.5 * torch.log(alphas_cumprod)\n            self.T = 1.0\n            self.log_alpha_array = (\n                self.numerical_clip_alpha(log_alphas)\n                .reshape(\n                    (\n                        1,\n                        -1,\n                    )\n                )\n                .to(dtype=dtype)\n            )\n            self.total_N = self.log_alpha_array.shape[1]\n            self.t_array = torch.linspace(0.0, 1.0, self.total_N + 1)[1:].reshape((1, -1)).to(dtype=dtype)\n        else:\n            self.T = 1.0\n            self.total_N = 1000\n            self.beta_0 = continuous_beta_0\n            self.beta_1 = continuous_beta_1\n\n    def numerical_clip_alpha(self, log_alphas, clipped_lambda=-5.1):\n        \"\"\"\n        For some beta schedules such as the cosine schedule, the log-SNR has numerical issues.\n        We clip the log-SNR near t=T within -5.1 to ensure stability.\n        This trick is very useful for diffusion models with the cosine schedule, such as i-DDPM, guided-diffusion and GLIDE.\n        \"\"\"\n        log_sigmas = 0.5 * torch.log(1.0 - torch.exp(2.0 * log_alphas))\n        lambs = log_alphas - log_sigmas\n        idx = torch.searchsorted(torch.flip(lambs, [0]), clipped_lambda)\n        if idx > 0:\n            log_alphas = log_alphas[:-idx]\n        return log_alphas\n\n    def marginal_log_mean_coeff(self, t):\n        \"\"\"\n        Compute log(alpha_t) of a given continuous-time label t in [0, T].\n        \"\"\"\n        if self.schedule == \"discrete\":\n            return interpolate_fn(\n                t.reshape((-1, 1)), self.t_array.to(t.device), 
self.log_alpha_array.to(t.device)\n            ).reshape((-1))\n        elif self.schedule == \"linear\":\n            return -0.25 * t**2 * (self.beta_1 - self.beta_0) - 0.5 * t * self.beta_0\n\n    def marginal_alpha(self, t):\n        \"\"\"\n        Compute alpha_t of a given continuous-time label t in [0, T].\n        \"\"\"\n        return torch.exp(self.marginal_log_mean_coeff(t))\n\n    def marginal_std(self, t):\n        \"\"\"\n        Compute sigma_t of a given continuous-time label t in [0, T].\n        \"\"\"\n        return torch.sqrt(1.0 - torch.exp(2.0 * self.marginal_log_mean_coeff(t)))\n\n    def marginal_lambda(self, t):\n        \"\"\"\n        Compute lambda_t = log(alpha_t) - log(sigma_t) of a given continuous-time label t in [0, T].\n        \"\"\"\n        log_mean_coeff = self.marginal_log_mean_coeff(t)\n        log_std = 0.5 * torch.log(1.0 - torch.exp(2.0 * log_mean_coeff))\n        return log_mean_coeff - log_std\n\n    def inverse_lambda(self, lamb):\n        \"\"\"\n        Compute the continuous-time label t in [0, T] of a given half-logSNR lambda_t.\n        \"\"\"\n        if self.schedule == \"linear\":\n            tmp = 2.0 * (self.beta_1 - self.beta_0) * torch.logaddexp(-2.0 * lamb, torch.zeros((1,)).to(lamb))\n            Delta = self.beta_0**2 + tmp\n            return tmp / (torch.sqrt(Delta) + self.beta_0) / (self.beta_1 - self.beta_0)\n        elif self.schedule == \"discrete\":\n            log_alpha = -0.5 * torch.logaddexp(torch.zeros((1,)).to(lamb.device), -2.0 * lamb)\n            t = interpolate_fn(\n                log_alpha.reshape((-1, 1)),\n                torch.flip(self.log_alpha_array.to(lamb.device), [1]),\n                torch.flip(self.t_array.to(lamb.device), [1]),\n            )\n            return t.reshape((-1,))\n\n\ndef model_wrapper(\n    model,\n    noise_schedule,\n    model_type=\"noise\",\n    model_kwargs={},\n    guidance_type=\"uncond\",\n    condition=None,\n    
unconditional_condition=None,\n    guidance_scale=1.0,\n    classifier_fn=None,\n    classifier_kwargs={},\n):\n    \"\"\"Create a wrapper function for the noise prediction model.\n\n    DPM-Solver needs to solve the continuous-time diffusion ODEs. For DPMs trained on discrete-time labels, we first need to\n    wrap the model function into a noise prediction model that accepts the continuous time as the input.\n\n    We support four types of diffusion models by setting `model_type`:\n\n        1. \"noise\": noise prediction model. (Trained by predicting noise).\n\n        2. \"x_start\": data prediction model. (Trained by predicting the data x_0 at time 0).\n\n        3. \"v\": velocity prediction model. (Trained by predicting the velocity).\n            The derivation of the \"v\" prediction is detailed in Appendix D of [1], and it is used in Imagen-Video [2].\n\n            [1] Salimans, Tim, and Jonathan Ho. \"Progressive distillation for fast sampling of diffusion models.\"\n                arXiv preprint arXiv:2202.00512 (2022).\n            [2] Ho, Jonathan, et al. \"Imagen Video: High Definition Video Generation with Diffusion Models.\"\n                arXiv preprint arXiv:2210.02303 (2022).\n\n        4. \"score\": marginal score function. (Trained by denoising score matching).\n            Note that the score function and the noise prediction model follow a simple relationship:\n            ```\n                noise(x_t, t) = -sigma_t * score(x_t, t)\n            ```\n\n    We support three types of guided sampling by DPMs by setting `guidance_type`:\n        1. \"uncond\": unconditional sampling by DPMs.\n            The input `model` has the following format:\n            ``\n                model(x, t_input, **model_kwargs) -> noise | x_start | v | score\n            ``\n\n        2. 
\"classifier\": classifier guidance sampling [3] by DPMs and another classifier.\n            The input `model` has the following format:\n            ``\n                model(x, t_input, **model_kwargs) -> noise | x_start | v | score\n            ``\n\n            The input `classifier_fn` has the following format:\n            ``\n                classifier_fn(x, t_input, cond, **classifier_kwargs) -> logits(x, t_input, cond)\n            ``\n\n            [3] P. Dhariwal and A. Q. Nichol, \"Diffusion models beat GANs on image synthesis,\"\n                in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 8780-8794.\n\n        3. \"classifier-free\": classifier-free guidance sampling by conditional DPMs.\n            The input `model` has the following format:\n            ``\n                model(x, t_input, cond, **model_kwargs) -> noise | x_start | v | score\n            ``\n            And if cond == `unconditional_condition`, the model output is the unconditional DPM output.\n\n            [4] Ho, Jonathan, and Tim Salimans. \"Classifier-free diffusion guidance.\"\n                arXiv preprint arXiv:2207.12598 (2022).\n\n\n    The `t_input` is the time label of the model, which may be discrete-time labels (i.e. 0 to 999)\n    or continuous-time labels (i.e. epsilon to T).\n\n    We wrap the model function to accept only `x` and `t_continuous` as inputs, and outputs the predicted noise:\n    ``\n        def model_fn(x, t_continuous) -> noise:\n            t_input = get_model_input_time(t_continuous)\n            return noise_pred(model, x, t_input, **model_kwargs)\n    ``\n    where `t_continuous` is the continuous time labels (i.e. epsilon to T). 
And we use `model_fn` for DPM-Solver.\n\n    ===============================================================\n\n    Args:\n        model: A diffusion model with the corresponding format described above.\n        noise_schedule: A noise schedule object, such as NoiseScheduleVP.\n        model_type: A `str`. The parameterization type of the diffusion model.\n                    \"noise\" or \"x_start\" or \"v\" or \"score\".\n        model_kwargs: A `dict`. A dict for the other inputs of the model function.\n        guidance_type: A `str`. The type of the guidance for sampling.\n                    \"uncond\" or \"classifier\" or \"classifier-free\".\n        condition: A pytorch tensor. The condition for the guided sampling.\n                    Only used for \"classifier\" or \"classifier-free\" guidance type.\n        unconditional_condition: A pytorch tensor. The condition for the unconditional sampling.\n                    Only used for \"classifier-free\" guidance type.\n        guidance_scale: A `float`. The scale for the guided sampling.\n        classifier_fn: A classifier function. Only used for the classifier guidance.\n        classifier_kwargs: A `dict`. 
A dict for the other inputs of the classifier function.\n    Returns:\n        A noise prediction model that accepts the noised data and the continuous time as the inputs.\n    \"\"\"\n\n    def get_model_input_time(t_continuous):\n        \"\"\"\n        Convert the continuous-time `t_continuous` (in [epsilon, T]) to the model input time.\n        For discrete-time DPMs, we convert `t_continuous` in [1 / N, 1] to `t_input` in [0, 1000 * (N - 1) / N].\n        For continuous-time DPMs, we just use `t_continuous`.\n        \"\"\"\n        if noise_schedule.schedule == \"discrete\":\n            return (t_continuous - 1.0 / noise_schedule.total_N) * 1000.0\n        else:\n            return t_continuous\n\n    def noise_pred_fn(x, t_continuous, cond=None):\n        t_input = get_model_input_time(t_continuous)\n        if cond is None:\n            output = model(x, t_input, **model_kwargs)\n        else:\n            output = model(x, t_input, cond, **model_kwargs)\n        if model_type == \"noise\":\n            return output\n        elif model_type == \"x_start\":\n            alpha_t, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)\n            return (x - expand_dims(alpha_t, x.dim()) * output) / expand_dims(sigma_t, x.dim())\n        elif model_type == \"v\":\n            alpha_t, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)\n            return expand_dims(alpha_t, x.dim()) * output + expand_dims(sigma_t, x.dim()) * x\n        elif model_type == \"score\":\n            sigma_t = noise_schedule.marginal_std(t_continuous)\n            return -expand_dims(sigma_t, x.dim()) * output\n\n    def cond_grad_fn(x, t_input):\n        \"\"\"\n        Compute the gradient of the classifier, i.e. 
nabla_{x} log p_t(cond | x_t).\n        \"\"\"\n        with torch.enable_grad():\n            x_in = x.detach().requires_grad_(True)\n            log_prob = classifier_fn(x_in, t_input, condition, **classifier_kwargs)\n            return torch.autograd.grad(log_prob.sum(), x_in)[0]\n\n    def model_fn(x, t_continuous):\n        \"\"\"\n        The noise prediction model function that is used for DPM-Solver.\n        \"\"\"\n        if guidance_type == \"uncond\":\n            return noise_pred_fn(x, t_continuous)\n        elif guidance_type == \"classifier\":\n            assert classifier_fn is not None\n            t_input = get_model_input_time(t_continuous)\n            cond_grad = cond_grad_fn(x, t_input)\n            sigma_t = noise_schedule.marginal_std(t_continuous)\n            noise = noise_pred_fn(x, t_continuous)\n            return noise - guidance_scale * expand_dims(sigma_t, x.dim()) * cond_grad\n        elif guidance_type == \"classifier-free\":\n            if guidance_scale == 1.0 or unconditional_condition is None:\n                return noise_pred_fn(x, t_continuous, cond=condition)\n            x_in = torch.cat([x] * 2)\n            t_in = torch.cat([t_continuous] * 2)\n            c_in = torch.cat([unconditional_condition, condition])\n            noise_uncond, noise = noise_pred_fn(x_in, t_in, cond=c_in).chunk(2)\n            return noise_uncond + guidance_scale * (noise - noise_uncond)\n\n    assert model_type in [\"noise\", \"x_start\", \"v\", \"score\"]\n    assert guidance_type in [\"uncond\", \"classifier\", \"classifier-free\"]\n    return model_fn\n\n\nclass DPM_Solver:\n    def __init__(\n        self,\n        model_fn,\n        noise_schedule,\n        algorithm_type=\"dpmsolver++\",\n        correcting_x0_fn=None,\n        correcting_xt_fn=None,\n        thresholding_max_val=1.0,\n        dynamic_thresholding_ratio=0.995,\n    ):\n        \"\"\"Construct a DPM-Solver.\n\n        We support both DPM-Solver 
(`algorithm_type=\"dpmsolver\"`) and DPM-Solver++ (`algorithm_type=\"dpmsolver++\"`).\n\n        We also support the \"dynamic thresholding\" method in Imagen[1]. For pixel-space diffusion models, you\n        can set both `algorithm_type=\"dpmsolver++\"` and `correcting_x0_fn=\"dynamic_thresholding\"` to use the\n        dynamic thresholding. The \"dynamic thresholding\" can greatly improve the sample quality for pixel-space\n        DPMs with large guidance scales. Note that the thresholding method is **unsuitable** for latent-space\n        DPMs (such as stable-diffusion).\n\n        To support advanced algorithms in image-to-image applications, we also support corrector functions for\n        both x0 and xt.\n\n        Args:\n            model_fn: A noise prediction model function which accepts the continuous-time input (t in [epsilon, T]):\n                ``\n                def model_fn(x, t_continuous):\n                    return noise\n                ``\n                The shape of `x` is `(batch_size, **shape)`, and the shape of `t_continuous` is `(batch_size,)`.\n            noise_schedule: A noise schedule object, such as NoiseScheduleVP.\n            algorithm_type: A `str`. Either \"dpmsolver\" or \"dpmsolver++\".\n            correcting_x0_fn: A `str` or a function with the following format:\n                ```\n                def correcting_x0_fn(x0, t):\n                    x0_new = ...\n                    return x0_new\n                ```\n                This function is to correct the outputs of the data prediction model at each sampling step. 
e.g.,\n                ```\n                x0_pred = data_pred_model(xt, t)\n                if correcting_x0_fn is not None:\n                    x0_pred = correcting_x0_fn(x0_pred, t)\n                xt_1 = update(x0_pred, xt, t)\n                ```\n                If `correcting_x0_fn=\"dynamic_thresholding\"`, we use the dynamic thresholding proposed in Imagen[1].\n            correcting_xt_fn: A function with the following format:\n                ```\n                def correcting_xt_fn(xt, t, step):\n                    x_new = ...\n                    return x_new\n                ```\n                This function is to correct the intermediate samples xt at each sampling step. e.g.,\n                ```\n                xt = ...\n                xt = correcting_xt_fn(xt, t, step)\n                ```\n            thresholding_max_val: A `float`. The max value for thresholding.\n                Valid only when use `dpmsolver++` and `correcting_x0_fn=\"dynamic_thresholding\"`.\n            dynamic_thresholding_ratio: A `float`. The ratio for dynamic thresholding (see Imagen[1] for details).\n                Valid only when use `dpmsolver++` and `correcting_x0_fn=\"dynamic_thresholding\"`.\n\n        [1] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour,\n            Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al. Photorealistic text-to-image diffusion models\n            with deep language understanding. 
arXiv preprint arXiv:2205.11487, 2022b.\n        \"\"\"\n        self.model = lambda x, t: model_fn(x, t.expand((x.shape[0])))\n        self.noise_schedule = noise_schedule\n        assert algorithm_type in [\"dpmsolver\", \"dpmsolver++\"]\n        self.algorithm_type = algorithm_type\n        if correcting_x0_fn == \"dynamic_thresholding\":\n            self.correcting_x0_fn = self.dynamic_thresholding_fn\n        else:\n            self.correcting_x0_fn = correcting_x0_fn\n        self.correcting_xt_fn = correcting_xt_fn\n        self.dynamic_thresholding_ratio = dynamic_thresholding_ratio\n        self.thresholding_max_val = thresholding_max_val\n\n    def dynamic_thresholding_fn(self, x0, t):\n        \"\"\"\n        The dynamic thresholding method.\n        \"\"\"\n        dims = x0.dim()\n        p = self.dynamic_thresholding_ratio\n        s = torch.quantile(torch.abs(x0).reshape((x0.shape[0], -1)), p, dim=1)\n        s = expand_dims(torch.maximum(s, self.thresholding_max_val * torch.ones_like(s).to(s.device)), dims)\n        x0 = torch.clamp(x0, -s, s) / s\n        return x0\n\n    def noise_prediction_fn(self, x, t):\n        \"\"\"\n        Return the noise prediction model.\n        \"\"\"\n        return self.model(x, t)\n\n    def data_prediction_fn(self, x, t):\n        \"\"\"\n        Return the data prediction model (with corrector).\n        \"\"\"\n        noise = self.noise_prediction_fn(x, t)\n        alpha_t, sigma_t = self.noise_schedule.marginal_alpha(t), self.noise_schedule.marginal_std(t)\n        x0 = (x - sigma_t * noise) / alpha_t\n        if self.correcting_x0_fn is not None:\n            x0 = self.correcting_x0_fn(x0, t)\n        return x0\n\n    def model_fn(self, x, t):\n        \"\"\"\n        Convert the model to the noise prediction model or the data prediction model.\n        \"\"\"\n        if self.algorithm_type == \"dpmsolver++\":\n            return self.data_prediction_fn(x, t)\n        else:\n            return 
self.noise_prediction_fn(x, t)\n\n    def get_time_steps(self, skip_type, t_T, t_0, N, device):\n        \"\"\"Compute the intermediate time steps for sampling.\n\n        Args:\n            skip_type: A `str`. The type for the spacing of the time steps. We support three types:\n                - 'logSNR': uniform logSNR for the time steps.\n                - 'time_uniform': uniform time for the time steps. (**Recommended for high-resolution data**.)\n                - 'time_quadratic': quadratic time for the time steps. (Used in DDIM for low-resolution data.)\n            t_T: A `float`. The starting time of the sampling (default is T).\n            t_0: A `float`. The ending time of the sampling (default is epsilon).\n            N: An `int`. The number of intervals between consecutive time steps.\n            device: A torch device.\n        Returns:\n            A pytorch tensor of the time steps, with the shape (N + 1,).\n        \"\"\"\n        if skip_type == \"logSNR\":\n            lambda_T = self.noise_schedule.marginal_lambda(torch.tensor(t_T).to(device))\n            lambda_0 = self.noise_schedule.marginal_lambda(torch.tensor(t_0).to(device))\n            logSNR_steps = torch.linspace(lambda_T.cpu().item(), lambda_0.cpu().item(), N + 1).to(device)\n            return self.noise_schedule.inverse_lambda(logSNR_steps)\n        elif skip_type == \"time_uniform\":\n            return torch.linspace(t_T, t_0, N + 1).to(device)\n        elif skip_type == \"time_quadratic\":\n            t_order = 2\n            return torch.linspace(t_T ** (1.0 / t_order), t_0 ** (1.0 / t_order), N + 1).pow(t_order).to(device)\n        else:\n            raise ValueError(\n                f\"Unsupported skip_type {skip_type}, need to be 'logSNR' or 'time_uniform' or 'time_quadratic'\"\n            )\n\n    def get_orders_and_timesteps_for_singlestep_solver(self, steps, order, skip_type, t_T, t_0, device):\n        \"\"\"\n        Get the order of each step for sampling by the 
singlestep DPM-Solver.\n\n        We combine DPM-Solver-1, 2 and 3 to use all the function evaluations, which is named \"DPM-Solver-fast\".\n        Given a fixed number of function evaluations by `steps`, the sampling procedure by DPM-Solver-fast is:\n            - If order == 1:\n                We take `steps` of DPM-Solver-1 (i.e. DDIM).\n            - If order == 2:\n                - Denote K = (steps // 2). We take K or (K + 1) intermediate time steps for sampling.\n                - If steps % 2 == 0, we use K steps of DPM-Solver-2.\n                - If steps % 2 == 1, we use K steps of DPM-Solver-2 and 1 step of DPM-Solver-1.\n            - If order == 3:\n                - Denote K = (steps // 3 + 1). We take K intermediate time steps for sampling.\n                - If steps % 3 == 0, we use (K - 2) steps of DPM-Solver-3, and 1 step of DPM-Solver-2 and 1 step of DPM-Solver-1.\n                - If steps % 3 == 1, we use (K - 1) steps of DPM-Solver-3 and 1 step of DPM-Solver-1.\n                - If steps % 3 == 2, we use (K - 1) steps of DPM-Solver-3 and 1 step of DPM-Solver-2.\n\n        ============================================\n        Args:\n            order: An `int`. The max order for the solver (1, 2 or 3).\n            steps: An `int`. The total number of function evaluations (NFE).\n            skip_type: A `str`. The type for the spacing of the time steps. We support three types:\n                - 'logSNR': uniform logSNR for the time steps.\n                - 'time_uniform': uniform time for the time steps. (**Recommended for high-resolution data**.)\n                - 'time_quadratic': quadratic time for the time steps. (Used in DDIM for low-resolution data.)\n            t_T: A `float`. The starting time of the sampling (default is T).\n            t_0: A `float`. 
The ending time of the sampling (default is epsilon).\n            device: A torch device.\n        Returns:\n            orders: A list of the solver order of each step.\n        \"\"\"\n        if order == 3:\n            K = steps // 3 + 1\n            if steps % 3 == 0:\n                orders = [\n                    3,\n                ] * (\n                    K - 2\n                ) + [2, 1]\n            elif steps % 3 == 1:\n                orders = [\n                    3,\n                ] * (\n                    K - 1\n                ) + [1]\n            else:\n                orders = [\n                    3,\n                ] * (\n                    K - 1\n                ) + [2]\n        elif order == 2:\n            if steps % 2 == 0:\n                K = steps // 2\n                orders = [\n                    2,\n                ] * K\n            else:\n                K = steps // 2 + 1\n                orders = [\n                    2,\n                ] * (\n                    K - 1\n                ) + [1]\n        elif order == 1:\n            K = 1\n            orders = [\n                1,\n            ] * steps\n        else:\n            raise ValueError(\"'order' must be '1' or '2' or '3'.\")\n        if skip_type == \"logSNR\":\n            # To reproduce the results in DPM-Solver paper\n            timesteps_outer = self.get_time_steps(skip_type, t_T, t_0, K, device)\n        else:\n            timesteps_outer = self.get_time_steps(skip_type, t_T, t_0, steps, device)[\n                torch.cumsum(\n                    torch.tensor(\n                        [\n                            0,\n                        ]\n                        + orders\n                    ),\n                    0,\n                ).to(device)\n            ]\n        return timesteps_outer, orders\n\n    def denoise_to_zero_fn(self, x, s):\n        \"\"\"\n        Denoise at the final step, which is equivalent to solve the ODE from 
lambda_s to infty by first-order discretization.\n        \"\"\"\n        return self.data_prediction_fn(x, s)\n\n    def dpm_solver_first_update(self, x, s, t, model_s=None, return_intermediate=False):\n        \"\"\"\n        DPM-Solver-1 (equivalent to DDIM) from time `s` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            s: A pytorch tensor. The starting time, with the shape (1,).\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            model_s: A pytorch tensor. The model function evaluated at time `s`.\n                If `model_s` is None, we evaluate the model by `x` and `s`; otherwise we directly use it.\n            return_intermediate: A `bool`. If true, also return the model value at time `s`.\n        Returns:\n            x_t: A pytorch tensor. The approximated solution at time `t`.\n        \"\"\"\n        ns = self.noise_schedule\n        lambda_s, lambda_t = ns.marginal_lambda(s), ns.marginal_lambda(t)\n        h = lambda_t - lambda_s\n        log_alpha_s, log_alpha_t = ns.marginal_log_mean_coeff(s), ns.marginal_log_mean_coeff(t)\n        sigma_s, sigma_t = ns.marginal_std(s), ns.marginal_std(t)\n        alpha_t = torch.exp(log_alpha_t)\n\n        if self.algorithm_type == \"dpmsolver++\":\n            phi_1 = torch.expm1(-h)\n            if model_s is None:\n                model_s = self.model_fn(x, s)\n            x_t = sigma_t / sigma_s * x - alpha_t * phi_1 * model_s\n        else:\n            phi_1 = torch.expm1(h)\n            if model_s is None:\n                model_s = self.model_fn(x, s)\n            x_t = torch.exp(log_alpha_t - log_alpha_s) * x - (sigma_t * phi_1) * model_s\n        return (x_t, {\"model_s\": model_s}) if return_intermediate else x_t\n\n    def singlestep_dpm_solver_second_update(\n        self, x, s, t, r1=0.5, model_s=None, return_intermediate=False, solver_type=\"dpmsolver\"\n    ):\n        \"\"\"\n        
Singlestep solver DPM-Solver-2 from time `s` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            s: A pytorch tensor. The starting time, with the shape (1,).\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            r1: A `float`. The hyperparameter of the second-order solver.\n            model_s: A pytorch tensor. The model function evaluated at time `s`.\n                If `model_s` is None, we evaluate the model by `x` and `s`; otherwise we directly use it.\n            return_intermediate: A `bool`. If true, also return the model value at time `s` and `s1` (the intermediate time).\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. We recommend to use 'dpmsolver' type.\n        Returns:\n            x_t: A pytorch tensor. The approximated solution at time `t`.\n        \"\"\"\n        if solver_type not in [\"dpmsolver\", \"taylor\"]:\n            raise ValueError(f\"'solver_type' must be either 'dpmsolver' or 'taylor', got {solver_type}\")\n        if r1 is None:\n            r1 = 0.5\n        ns = self.noise_schedule\n        lambda_s, lambda_t = ns.marginal_lambda(s), ns.marginal_lambda(t)\n        h = lambda_t - lambda_s\n        lambda_s1 = lambda_s + r1 * h\n        s1 = ns.inverse_lambda(lambda_s1)\n        log_alpha_s, log_alpha_s1, log_alpha_t = (\n            ns.marginal_log_mean_coeff(s),\n            ns.marginal_log_mean_coeff(s1),\n            ns.marginal_log_mean_coeff(t),\n        )\n        sigma_s, sigma_s1, sigma_t = ns.marginal_std(s), ns.marginal_std(s1), ns.marginal_std(t)\n        alpha_s1, alpha_t = torch.exp(log_alpha_s1), torch.exp(log_alpha_t)\n\n        if self.algorithm_type == \"dpmsolver++\":\n            phi_11 = torch.expm1(-r1 * h)\n            phi_1 = torch.expm1(-h)\n\n            if model_s is None:\n                model_s = 
self.model_fn(x, s)\n            x_s1 = (sigma_s1 / sigma_s) * x - (alpha_s1 * phi_11) * model_s\n            model_s1 = self.model_fn(x_s1, s1)\n            if solver_type == \"dpmsolver\":\n                x_t = (\n                    (sigma_t / sigma_s) * x\n                    - (alpha_t * phi_1) * model_s\n                    - (0.5 / r1) * (alpha_t * phi_1) * (model_s1 - model_s)\n                )\n            elif solver_type == \"taylor\":\n                x_t = (\n                    (sigma_t / sigma_s) * x\n                    - (alpha_t * phi_1) * model_s\n                    + (1.0 / r1) * (alpha_t * (phi_1 / h + 1.0)) * (model_s1 - model_s)\n                )\n        else:\n            phi_11 = torch.expm1(r1 * h)\n            phi_1 = torch.expm1(h)\n\n            if model_s is None:\n                model_s = self.model_fn(x, s)\n            x_s1 = torch.exp(log_alpha_s1 - log_alpha_s) * x - (sigma_s1 * phi_11) * model_s\n            model_s1 = self.model_fn(x_s1, s1)\n            if solver_type == \"dpmsolver\":\n                x_t = (\n                    torch.exp(log_alpha_t - log_alpha_s) * x\n                    - (sigma_t * phi_1) * model_s\n                    - (0.5 / r1) * (sigma_t * phi_1) * (model_s1 - model_s)\n                )\n            elif solver_type == \"taylor\":\n                x_t = (\n                    torch.exp(log_alpha_t - log_alpha_s) * x\n                    - (sigma_t * phi_1) * model_s\n                    - (1.0 / r1) * (sigma_t * (phi_1 / h - 1.0)) * (model_s1 - model_s)\n                )\n        if return_intermediate:\n            return x_t, {\"model_s\": model_s, \"model_s1\": model_s1}\n        else:\n            return x_t\n\n    def singlestep_dpm_solver_third_update(\n        self,\n        x,\n        s,\n        t,\n        r1=1.0 / 3.0,\n        r2=2.0 / 3.0,\n        model_s=None,\n        model_s1=None,\n        return_intermediate=False,\n        solver_type=\"dpmsolver\",\n    ):\n        
\"\"\"\n        Singlestep solver DPM-Solver-3 from time `s` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            s: A pytorch tensor. The starting time, with the shape (1,).\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            r1: A `float`. The hyperparameter of the third-order solver.\n            r2: A `float`. The hyperparameter of the third-order solver.\n            model_s: A pytorch tensor. The model function evaluated at time `s`.\n                If `model_s` is None, we evaluate the model by `x` and `s`; otherwise we directly use it.\n            model_s1: A pytorch tensor. The model function evaluated at time `s1` (the intermediate time given by `r1`).\n                If `model_s1` is None, we evaluate the model at `s1`; otherwise we directly use it.\n            return_intermediate: A `bool`. If true, also return the model value at time `s`, `s1` and `s2` (the intermediate times).\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. We recommend to use 'dpmsolver' type.\n        Returns:\n            x_t: A pytorch tensor. 
The approximated solution at time `t`.\n        \"\"\"\n        if solver_type not in [\"dpmsolver\", \"taylor\"]:\n            raise ValueError(f\"'solver_type' must be either 'dpmsolver' or 'taylor', got {solver_type}\")\n        if r1 is None:\n            r1 = 1.0 / 3.0\n        if r2 is None:\n            r2 = 2.0 / 3.0\n        ns = self.noise_schedule\n        lambda_s, lambda_t = ns.marginal_lambda(s), ns.marginal_lambda(t)\n        h = lambda_t - lambda_s\n        lambda_s1 = lambda_s + r1 * h\n        lambda_s2 = lambda_s + r2 * h\n        s1 = ns.inverse_lambda(lambda_s1)\n        s2 = ns.inverse_lambda(lambda_s2)\n        log_alpha_s, log_alpha_s1, log_alpha_s2, log_alpha_t = (\n            ns.marginal_log_mean_coeff(s),\n            ns.marginal_log_mean_coeff(s1),\n            ns.marginal_log_mean_coeff(s2),\n            ns.marginal_log_mean_coeff(t),\n        )\n        sigma_s, sigma_s1, sigma_s2, sigma_t = (\n            ns.marginal_std(s),\n            ns.marginal_std(s1),\n            ns.marginal_std(s2),\n            ns.marginal_std(t),\n        )\n        alpha_s1, alpha_s2, alpha_t = torch.exp(log_alpha_s1), torch.exp(log_alpha_s2), torch.exp(log_alpha_t)\n\n        if self.algorithm_type == \"dpmsolver++\":\n            phi_11 = torch.expm1(-r1 * h)\n            phi_12 = torch.expm1(-r2 * h)\n            phi_1 = torch.expm1(-h)\n            phi_22 = torch.expm1(-r2 * h) / (r2 * h) + 1.0\n            phi_2 = phi_1 / h + 1.0\n            phi_3 = phi_2 / h - 0.5\n\n            if model_s is None:\n                model_s = self.model_fn(x, s)\n            if model_s1 is None:\n                x_s1 = (sigma_s1 / sigma_s) * x - (alpha_s1 * phi_11) * model_s\n                model_s1 = self.model_fn(x_s1, s1)\n            x_s2 = (\n                (sigma_s2 / sigma_s) * x\n                - (alpha_s2 * phi_12) * model_s\n                + r2 / r1 * (alpha_s2 * phi_22) * (model_s1 - model_s)\n            )\n            model_s2 = self.model_fn(x_s2, 
s2)\n            if solver_type == \"dpmsolver\":\n                x_t = (\n                    (sigma_t / sigma_s) * x\n                    - (alpha_t * phi_1) * model_s\n                    + (1.0 / r2) * (alpha_t * phi_2) * (model_s2 - model_s)\n                )\n            elif solver_type == \"taylor\":\n                D1_0 = (1.0 / r1) * (model_s1 - model_s)\n                D1_1 = (1.0 / r2) * (model_s2 - model_s)\n                D1 = (r2 * D1_0 - r1 * D1_1) / (r2 - r1)\n                D2 = 2.0 * (D1_1 - D1_0) / (r2 - r1)\n                x_t = (\n                    (sigma_t / sigma_s) * x\n                    - (alpha_t * phi_1) * model_s\n                    + (alpha_t * phi_2) * D1\n                    - (alpha_t * phi_3) * D2\n                )\n        else:\n            phi_11 = torch.expm1(r1 * h)\n            phi_12 = torch.expm1(r2 * h)\n            phi_1 = torch.expm1(h)\n            phi_22 = torch.expm1(r2 * h) / (r2 * h) - 1.0\n            phi_2 = phi_1 / h - 1.0\n            phi_3 = phi_2 / h - 0.5\n\n            if model_s is None:\n                model_s = self.model_fn(x, s)\n            if model_s1 is None:\n                x_s1 = (torch.exp(log_alpha_s1 - log_alpha_s)) * x - (sigma_s1 * phi_11) * model_s\n                model_s1 = self.model_fn(x_s1, s1)\n            x_s2 = (\n                (torch.exp(log_alpha_s2 - log_alpha_s)) * x\n                - (sigma_s2 * phi_12) * model_s\n                - r2 / r1 * (sigma_s2 * phi_22) * (model_s1 - model_s)\n            )\n            model_s2 = self.model_fn(x_s2, s2)\n            if solver_type == \"dpmsolver\":\n                x_t = (\n                    (torch.exp(log_alpha_t - log_alpha_s)) * x\n                    - (sigma_t * phi_1) * model_s\n                    - (1.0 / r2) * (sigma_t * phi_2) * (model_s2 - model_s)\n                )\n            elif solver_type == \"taylor\":\n                D1_0 = (1.0 / r1) * (model_s1 - model_s)\n                D1_1 = (1.0 / r2) * 
(model_s2 - model_s)\n                D1 = (r2 * D1_0 - r1 * D1_1) / (r2 - r1)\n                D2 = 2.0 * (D1_1 - D1_0) / (r2 - r1)\n                x_t = (\n                    (torch.exp(log_alpha_t - log_alpha_s)) * x\n                    - (sigma_t * phi_1) * model_s\n                    - (sigma_t * phi_2) * D1\n                    - (sigma_t * phi_3) * D2\n                )\n\n        if return_intermediate:\n            return x_t, {\"model_s\": model_s, \"model_s1\": model_s1, \"model_s2\": model_s2}\n        else:\n            return x_t\n\n    def multistep_dpm_solver_second_update(self, x, model_prev_list, t_prev_list, t, solver_type=\"dpmsolver\"):\n        \"\"\"\n        Multistep solver DPM-Solver-2 from time `t_prev_list[-1]` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            model_prev_list: A list of pytorch tensor. The previous computed model values.\n            t_prev_list: A list of pytorch tensor. The previous times, each time has the shape (1,)\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. We recommend to use 'dpmsolver' type.\n        Returns:\n            x_t: A pytorch tensor. 
The approximated solution at time `t`.\n        \"\"\"\n        if solver_type not in [\"dpmsolver\", \"taylor\"]:\n            raise ValueError(f\"'solver_type' must be either 'dpmsolver' or 'taylor', got {solver_type}\")\n        ns = self.noise_schedule\n        model_prev_1, model_prev_0 = model_prev_list[-2], model_prev_list[-1]\n        t_prev_1, t_prev_0 = t_prev_list[-2], t_prev_list[-1]\n        lambda_prev_1, lambda_prev_0, lambda_t = (\n            ns.marginal_lambda(t_prev_1),\n            ns.marginal_lambda(t_prev_0),\n            ns.marginal_lambda(t),\n        )\n        log_alpha_prev_0, log_alpha_t = ns.marginal_log_mean_coeff(t_prev_0), ns.marginal_log_mean_coeff(t)\n        sigma_prev_0, sigma_t = ns.marginal_std(t_prev_0), ns.marginal_std(t)\n        alpha_t = torch.exp(log_alpha_t)\n\n        h_0 = lambda_prev_0 - lambda_prev_1\n        h = lambda_t - lambda_prev_0\n        r0 = h_0 / h\n        D1_0 = (1.0 / r0) * (model_prev_0 - model_prev_1)\n        if self.algorithm_type == \"dpmsolver++\":\n            phi_1 = torch.expm1(-h)\n            if solver_type == \"dpmsolver\":\n                x_t = (sigma_t / sigma_prev_0) * x - (alpha_t * phi_1) * model_prev_0 - 0.5 * (alpha_t * phi_1) * D1_0\n            elif solver_type == \"taylor\":\n                x_t = (\n                    (sigma_t / sigma_prev_0) * x\n                    - (alpha_t * phi_1) * model_prev_0\n                    + (alpha_t * (phi_1 / h + 1.0)) * D1_0\n                )\n        else:\n            phi_1 = torch.expm1(h)\n            if solver_type == \"dpmsolver\":\n                x_t = (\n                    (torch.exp(log_alpha_t - log_alpha_prev_0)) * x\n                    - (sigma_t * phi_1) * model_prev_0\n                    - 0.5 * (sigma_t * phi_1) * D1_0\n                )\n            elif solver_type == \"taylor\":\n                x_t = (\n                    (torch.exp(log_alpha_t - log_alpha_prev_0)) * x\n                    - (sigma_t * phi_1) * 
model_prev_0\n                    - (sigma_t * (phi_1 / h - 1.0)) * D1_0\n                )\n        return x_t\n\n    def multistep_dpm_solver_third_update(self, x, model_prev_list, t_prev_list, t, solver_type=\"dpmsolver\"):\n        \"\"\"\n        Multistep solver DPM-Solver-3 from time `t_prev_list[-1]` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            model_prev_list: A list of pytorch tensor. The previous computed model values.\n            t_prev_list: A list of pytorch tensor. The previous times, each time has the shape (1,)\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. We recommend to use 'dpmsolver' type.\n        Returns:\n            x_t: A pytorch tensor. The approximated solution at time `t`.\n        \"\"\"\n        ns = self.noise_schedule\n        model_prev_2, model_prev_1, model_prev_0 = model_prev_list\n        t_prev_2, t_prev_1, t_prev_0 = t_prev_list\n        lambda_prev_2, lambda_prev_1, lambda_prev_0, lambda_t = (\n            ns.marginal_lambda(t_prev_2),\n            ns.marginal_lambda(t_prev_1),\n            ns.marginal_lambda(t_prev_0),\n            ns.marginal_lambda(t),\n        )\n        log_alpha_prev_0, log_alpha_t = ns.marginal_log_mean_coeff(t_prev_0), ns.marginal_log_mean_coeff(t)\n        sigma_prev_0, sigma_t = ns.marginal_std(t_prev_0), ns.marginal_std(t)\n        alpha_t = torch.exp(log_alpha_t)\n\n        h_1 = lambda_prev_1 - lambda_prev_2\n        h_0 = lambda_prev_0 - lambda_prev_1\n        h = lambda_t - lambda_prev_0\n        r0, r1 = h_0 / h, h_1 / h\n        D1_0 = (1.0 / r0) * (model_prev_0 - model_prev_1)\n        D1_1 = (1.0 / r1) * (model_prev_1 - model_prev_2)\n        D1 = D1_0 + (r0 / (r0 + r1)) * (D1_0 - D1_1)\n        D2 = (1.0 / (r0 + r1)) * (D1_0 - D1_1)\n        if 
self.algorithm_type == \"dpmsolver++\":\n            phi_1 = torch.expm1(-h)\n            phi_2 = phi_1 / h + 1.0\n            phi_3 = phi_2 / h - 0.5\n            return (\n                (sigma_t / sigma_prev_0) * x\n                - (alpha_t * phi_1) * model_prev_0\n                + (alpha_t * phi_2) * D1\n                - (alpha_t * phi_3) * D2\n            )\n        else:\n            phi_1 = torch.expm1(h)\n            phi_2 = phi_1 / h - 1.0\n            phi_3 = phi_2 / h - 0.5\n            return (\n                (torch.exp(log_alpha_t - log_alpha_prev_0)) * x\n                - (sigma_t * phi_1) * model_prev_0\n                - (sigma_t * phi_2) * D1\n                - (sigma_t * phi_3) * D2\n            )\n\n    def singlestep_dpm_solver_update(\n        self, x, s, t, order, return_intermediate=False, solver_type=\"dpmsolver\", r1=None, r2=None\n    ):\n        \"\"\"\n        Singlestep DPM-Solver with the order `order` from time `s` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            s: A pytorch tensor. The starting time, with the shape (1,).\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            order: A `int`. The order of DPM-Solver. We only support order == 1 or 2 or 3.\n            return_intermediate: A `bool`. If true, also return the model value at time `s`, `s1` and `s2` (the intermediate times).\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. We recommend to use 'dpmsolver' type.\n            r1: A `float`. The hyperparameter of the second-order or third-order solver.\n            r2: A `float`. The hyperparameter of the third-order solver.\n        Returns:\n            x_t: A pytorch tensor. 
The approximated solution at time `t`.\n        \"\"\"\n        if order == 1:\n            return self.dpm_solver_first_update(x, s, t, return_intermediate=return_intermediate)\n        elif order == 2:\n            return self.singlestep_dpm_solver_second_update(\n                x, s, t, return_intermediate=return_intermediate, solver_type=solver_type, r1=r1\n            )\n        elif order == 3:\n            return self.singlestep_dpm_solver_third_update(\n                x, s, t, return_intermediate=return_intermediate, solver_type=solver_type, r1=r1, r2=r2\n            )\n        else:\n            raise ValueError(f\"Solver order must be 1 or 2 or 3, got {order}\")\n\n    def multistep_dpm_solver_update(self, x, model_prev_list, t_prev_list, t, order, solver_type=\"dpmsolver\"):\n        \"\"\"\n        Multistep DPM-Solver with the order `order` from time `t_prev_list[-1]` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            model_prev_list: A list of pytorch tensor. The previous computed model values.\n            t_prev_list: A list of pytorch tensor. The previous times, each time has the shape (1,)\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            order: A `int`. The order of DPM-Solver. We only support order == 1 or 2 or 3.\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. We recommend to use 'dpmsolver' type.\n        Returns:\n            x_t: A pytorch tensor. 
The approximated solution at time `t`.\n        \"\"\"\n        if order == 1:\n            return self.dpm_solver_first_update(x, t_prev_list[-1], t, model_s=model_prev_list[-1])\n        elif order == 2:\n            return self.multistep_dpm_solver_second_update(x, model_prev_list, t_prev_list, t, solver_type=solver_type)\n        elif order == 3:\n            return self.multistep_dpm_solver_third_update(x, model_prev_list, t_prev_list, t, solver_type=solver_type)\n        else:\n            raise ValueError(f\"Solver order must be 1 or 2 or 3, got {order}\")\n\n    def dpm_solver_adaptive(\n        self, x, order, t_T, t_0, h_init=0.05, atol=0.0078, rtol=0.05, theta=0.9, t_err=1e-5, solver_type=\"dpmsolver\"\n    ):\n        \"\"\"\n        The adaptive step size solver based on singlestep DPM-Solver.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `t_T`.\n            order: A `int`. The (higher) order of the solver. We only support order == 2 or 3.\n            t_T: A `float`. The starting time of the sampling (default is T).\n            t_0: A `float`. The ending time of the sampling (default is epsilon).\n            h_init: A `float`. The initial step size (for logSNR).\n            atol: A `float`. The absolute tolerance of the solver. For image data, the default setting is 0.0078, followed [1].\n            rtol: A `float`. The relative tolerance of the solver. The default setting is 0.05.\n            theta: A `float`. The safety hyperparameter for adapting the step size. The default setting is 0.9, followed [1].\n            t_err: A `float`. The tolerance for the time. We solve the diffusion ODE until the absolute error between the\n                current time and `t_0` is less than `t_err`. The default setting is 1e-5.\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. 
We recommend to use 'dpmsolver' type.\n        Returns:\n            x_0: A pytorch tensor. The approximated solution at time `t_0`.\n\n        [1] A. Jolicoeur-Martineau, K. Li, R. Piché-Taillefer, T. Kachman, and I. Mitliagkas, \"Gotta go fast when generating data with score-based models,\" arXiv preprint arXiv:2105.14080, 2021.\n        \"\"\"\n        ns = self.noise_schedule\n        s = t_T * torch.ones((1,)).to(x)\n        lambda_s = ns.marginal_lambda(s)\n        lambda_0 = ns.marginal_lambda(t_0 * torch.ones_like(s).to(x))\n        h = h_init * torch.ones_like(s).to(x)\n        x_prev = x\n        nfe = 0\n        if order == 2:\n            r1 = 0.5\n            lower_update = lambda x, s, t: self.dpm_solver_first_update(x, s, t, return_intermediate=True)\n            higher_update = lambda x, s, t, **kwargs: self.singlestep_dpm_solver_second_update(\n                x, s, t, r1=r1, solver_type=solver_type, **kwargs\n            )\n        elif order == 3:\n            r1, r2 = 1.0 / 3.0, 2.0 / 3.0\n            lower_update = lambda x, s, t: self.singlestep_dpm_solver_second_update(\n                x, s, t, r1=r1, return_intermediate=True, solver_type=solver_type\n            )\n            higher_update = lambda x, s, t, **kwargs: self.singlestep_dpm_solver_third_update(\n                x, s, t, r1=r1, r2=r2, solver_type=solver_type, **kwargs\n            )\n        else:\n            raise ValueError(f\"For adaptive step size solver, order must be 2 or 3, got {order}\")\n        while torch.abs((s - t_0)).mean() > t_err:\n            t = ns.inverse_lambda(lambda_s + h)\n            x_lower, lower_noise_kwargs = lower_update(x, s, t)\n            x_higher = higher_update(x, s, t, **lower_noise_kwargs)\n            delta = torch.max(torch.ones_like(x).to(x) * atol, rtol * torch.max(torch.abs(x_lower), torch.abs(x_prev)))\n            norm_fn = lambda v: torch.sqrt(torch.square(v.reshape((v.shape[0], -1))).mean(dim=-1, keepdim=True))\n            E = 
norm_fn((x_higher - x_lower) / delta).max()\n            if torch.all(E <= 1.0):\n                x = x_higher\n                s = t\n                x_prev = x_lower\n                lambda_s = ns.marginal_lambda(s)\n            h = torch.min(theta * h * torch.float_power(E, -1.0 / order).float(), lambda_0 - lambda_s)\n            nfe += order\n        print(\"adaptive solver nfe\", nfe)\n        return x\n\n    def add_noise(self, x, t, noise=None):\n        \"\"\"\n        Compute the noised input xt = alpha_t * x + sigma_t * noise.\n\n        Args:\n            x: A `torch.Tensor` with shape `(batch_size, *shape)`.\n            t: A `torch.Tensor` with shape `(t_size,)`.\n        Returns:\n            xt with shape `(t_size, batch_size, *shape)`.\n        \"\"\"\n        alpha_t, sigma_t = self.noise_schedule.marginal_alpha(t), self.noise_schedule.marginal_std(t)\n        if noise is None:\n            noise = torch.randn((t.shape[0], *x.shape), device=x.device)\n        x = x.reshape((-1, *x.shape))\n        xt = expand_dims(alpha_t, x.dim()) * x + expand_dims(sigma_t, x.dim()) * noise\n        return xt.squeeze(0) if t.shape[0] == 1 else xt\n\n    def inverse(\n        self,\n        x,\n        steps=20,\n        t_start=None,\n        t_end=None,\n        order=2,\n        skip_type=\"time_uniform\",\n        method=\"multistep\",\n        lower_order_final=True,\n        denoise_to_zero=False,\n        solver_type=\"dpmsolver\",\n        atol=0.0078,\n        rtol=0.05,\n        return_intermediate=False,\n    ):\n        \"\"\"\n        Inverse the sample `x` from time `t_start` to `t_end` by DPM-Solver.\n        For discrete-time DPMs, we use `t_start=1/N`, where `N` is the total time steps during training.\n        \"\"\"\n        t_0 = 1.0 / self.noise_schedule.total_N if t_start is None else t_start\n        t_T = self.noise_schedule.T if t_end is None else t_end\n        assert (\n            t_0 > 0 and t_T > 0\n        ), \"Time range needs to be 
greater than 0. For discrete-time DPMs, it needs to be in [1 / N, 1], where N is the length of betas array\"\n        return self.sample(\n            x,\n            steps=steps,\n            t_start=t_0,\n            t_end=t_T,\n            order=order,\n            skip_type=skip_type,\n            method=method,\n            lower_order_final=lower_order_final,\n            denoise_to_zero=denoise_to_zero,\n            solver_type=solver_type,\n            atol=atol,\n            rtol=rtol,\n            return_intermediate=return_intermediate,\n        )\n\n    def sample(\n        self,\n        x,\n        steps=20,\n        t_start=None,\n        t_end=None,\n        order=2,\n        skip_type=\"time_uniform\",\n        method=\"multistep\",\n        lower_order_final=True,\n        denoise_to_zero=False,\n        solver_type=\"dpmsolver\",\n        atol=0.0078,\n        rtol=0.05,\n        return_intermediate=False,\n        progress=True,\n    ):\n        \"\"\"\n        Compute the sample at time `t_end` by DPM-Solver, given the initial `x` at time `t_start`.\n\n        =====================================================\n\n        We support the following algorithms for both noise prediction model and data prediction model:\n            - 'singlestep':\n                Singlestep DPM-Solver (i.e. \"DPM-Solver-fast\" in the paper), which combines different orders of singlestep DPM-Solver.\n                We combine all the singlestep solvers with order <= `order` to use up all the function evaluations (steps).\n                The total number of function evaluations (NFE) == `steps`.\n                Given a fixed NFE == `steps`, the sampling procedure is:\n                    - If `order` == 1:\n                        - Denote K = steps. We use K steps of DPM-Solver-1 (i.e. DDIM).\n                    - If `order` == 2:\n                        - Denote K = (steps // 2) + (steps % 2). 
We take K intermediate time steps for sampling.\n                        - If steps % 2 == 0, we use K steps of singlestep DPM-Solver-2.\n                        - If steps % 2 == 1, we use (K - 1) steps of singlestep DPM-Solver-2 and 1 step of DPM-Solver-1.\n                    - If `order` == 3:\n                        - Denote K = (steps // 3 + 1). We take K intermediate time steps for sampling.\n                        - If steps % 3 == 0, we use (K - 2) steps of singlestep DPM-Solver-3, 1 step of singlestep DPM-Solver-2, and 1 step of DPM-Solver-1.\n                        - If steps % 3 == 1, we use (K - 1) steps of singlestep DPM-Solver-3 and 1 step of DPM-Solver-1.\n                        - If steps % 3 == 2, we use (K - 1) steps of singlestep DPM-Solver-3 and 1 step of singlestep DPM-Solver-2.\n            - 'multistep':\n                Multistep DPM-Solver of order `order`. The total number of function evaluations (NFE) == `steps`.\n                We initialize the first `order` values by lower order multistep solvers.\n                Given a fixed NFE == `steps`, the sampling procedure is:\n                    Denote K = steps.\n                    - If `order` == 1:\n                        - We use K steps of DPM-Solver-1 (i.e. DDIM).\n                    - If `order` == 2:\n                        - We first use 1 step of DPM-Solver-1, then (K - 1) steps of multistep DPM-Solver-2.\n                    - If `order` == 3:\n                        - We first use 1 step of DPM-Solver-1, then 1 step of multistep DPM-Solver-2, then (K - 2) steps of multistep DPM-Solver-3.\n            - 'singlestep_fixed':\n                Fixed order singlestep DPM-Solver (i.e. DPM-Solver-1 or singlestep DPM-Solver-2 or singlestep DPM-Solver-3).\n                We use singlestep DPM-Solver-`order` for `order`=1 or 2 or 3, with total [`steps` // `order`] * `order` NFE.\n            - 'adaptive':\n                Adaptive step size DPM-Solver (i.e. 
\"DPM-Solver-12\" and \"DPM-Solver-23\" in the paper).\n                We ignore `steps` and use adaptive step size DPM-Solver with a higher order of `order`.\n                You can adjust the absolute tolerance `atol` and the relative tolerance `rtol` to balance the computational cost\n                (NFE) and the sample quality.\n                    - If `order` == 2, we use DPM-Solver-12 which combines DPM-Solver-1 and singlestep DPM-Solver-2.\n                    - If `order` == 3, we use DPM-Solver-23 which combines singlestep DPM-Solver-2 and singlestep DPM-Solver-3.\n\n        =====================================================\n\n        Some advice on choosing the algorithm:\n            - For **unconditional sampling** or **guided sampling with small guidance scale** by DPMs:\n                Use singlestep DPM-Solver or DPM-Solver++ (\"DPM-Solver-fast\" in the paper) with `order = 3`.\n                e.g., DPM-Solver:\n                    >>> dpm_solver = DPM_Solver(model_fn, noise_schedule, algorithm_type=\"dpmsolver\")\n                    >>> x_sample = dpm_solver.sample(x, steps=steps, t_start=t_start, t_end=t_end, order=3,\n                            skip_type='time_uniform', method='singlestep')\n                e.g., DPM-Solver++:\n                    >>> dpm_solver = DPM_Solver(model_fn, noise_schedule, algorithm_type=\"dpmsolver++\")\n                    >>> x_sample = dpm_solver.sample(x, steps=steps, t_start=t_start, t_end=t_end, order=3,\n                            skip_type='time_uniform', method='singlestep')\n            - For **guided sampling with large guidance scale** by DPMs:\n                Use multistep DPM-Solver with `algorithm_type=\"dpmsolver++\"` and `order = 2`.\n                e.g.\n                    >>> dpm_solver = DPM_Solver(model_fn, noise_schedule, algorithm_type=\"dpmsolver++\")\n                    >>> x_sample = dpm_solver.sample(x, steps=steps, t_start=t_start, t_end=t_end, order=2,\n                  
          skip_type='time_uniform', method='multistep')\n\n        We support three types of `skip_type`:\n            - 'logSNR': uniform logSNR for the time steps. **Recommended for low-resolution images**.\n            - 'time_uniform': uniform time for the time steps. **Recommended for high-resolution images**.\n            - 'time_quadratic': quadratic time for the time steps.\n\n        =====================================================\n        Args:\n            x: A pytorch tensor. The initial value at time `t_start`.\n                e.g. if `t_start` == T, then `x` is a sample from the standard normal distribution.\n            steps: A `int`. The total number of function evaluations (NFE).\n            t_start: A `float`. The starting time of the sampling.\n                If `t_start` is None, we use self.noise_schedule.T (default is 1.0).\n            t_end: A `float`. The ending time of the sampling.\n                If `t_end` is None, we use 1. / self.noise_schedule.total_N.\n                e.g. if total_N == 1000, we have `t_end` == 1e-3.\n                For discrete-time DPMs:\n                    - We recommend `t_end` == 1. / self.noise_schedule.total_N.\n                For continuous-time DPMs:\n                    - We recommend `t_end` == 1e-3 when `steps` <= 15; and `t_end` == 1e-4 when `steps` > 15.\n            order: A `int`. The order of DPM-Solver.\n            skip_type: A `str`. The type for the spacing of the time steps. 'time_uniform' or 'logSNR' or 'time_quadratic'.\n            method: A `str`. The method for sampling. 'singlestep' or 'multistep' or 'singlestep_fixed' or 'adaptive'.\n            denoise_to_zero: A `bool`. Whether to denoise to time 0 at the final step.\n                Default is `False`. If `denoise_to_zero` is `True`, the total NFE is (`steps` + 1).\n\n                This trick was first proposed by DDPM (https://arxiv.org/abs/2006.11239) and\n                score_sde (https://arxiv.org/abs/2011.13456). 
This trick can improve the FID\n                when sampling from diffusion models with diffusion SDEs on low-resolution images\n                (such as CIFAR-10). However, we observed that this trick does not matter for\n                high-resolution images. As it needs an additional NFE, we do not recommend\n                it for high-resolution images.\n            lower_order_final: A `bool`. Whether to use lower order solvers at the final steps.\n                Only used for `method='multistep'`, where it applies for any number of `steps`. We empirically find that\n                this trick is key to stabilizing the sampling by DPM-Solver with very few steps\n                (especially for steps <= 10), so we recommend setting it to `True`.\n            solver_type: A `str`. The Taylor expansion type for the solver. `dpmsolver` or `taylor`. We recommend `dpmsolver`.\n            atol: A `float`. The absolute tolerance of the adaptive step size solver. Valid when `method` == 'adaptive'.\n            rtol: A `float`. The relative tolerance of the adaptive step size solver. Valid when `method` == 'adaptive'.\n            return_intermediate: A `bool`. Whether to save the xt at each step.\n                When set to `True`, the method returns a tuple (x0, intermediates); when set to `False`, it returns only x0.\n            progress: A `bool`. Whether to show a tqdm progress bar. Only used when `method` == 'multistep'.\n        Returns:\n            x_end: A pytorch tensor. The approximated solution at time `t_end`.\n\n        \"\"\"\n        t_0 = 1.0 / self.noise_schedule.total_N if t_end is None else t_end\n        t_T = self.noise_schedule.T if t_start is None else t_start\n        assert (\n            t_0 > 0 and t_T > 0\n        ), \"Time range needs to be greater than 0. 
For discrete-time DPMs, it needs to be in [1 / N, 1], where N is the length of betas array\"\n        if return_intermediate:\n            assert method in [\n                \"multistep\",\n                \"singlestep\",\n                \"singlestep_fixed\",\n            ], \"Cannot use adaptive solver when saving intermediate values\"\n        if self.correcting_xt_fn is not None:\n            assert method in [\n                \"multistep\",\n                \"singlestep\",\n                \"singlestep_fixed\",\n            ], \"Cannot use adaptive solver when correcting_xt_fn is not None\"\n        device = x.device\n        intermediates = []\n        with torch.no_grad():\n            if method == \"adaptive\":\n                x = self.dpm_solver_adaptive(\n                    x, order=order, t_T=t_T, t_0=t_0, atol=atol, rtol=rtol, solver_type=solver_type\n                )\n            elif method == \"multistep\":\n                assert steps >= order\n                timesteps = self.get_time_steps(skip_type=skip_type, t_T=t_T, t_0=t_0, N=steps, device=device)\n                assert timesteps.shape[0] - 1 == steps\n                # Init the initial values.\n                step = 0\n                t = timesteps[step]\n                t_prev_list = [t]\n                model_prev_list = [self.model_fn(x, t)]\n                if self.correcting_xt_fn is not None:\n                    x = self.correcting_xt_fn(x, t, step)\n                if return_intermediate:\n                    intermediates.append(x)\n                # Init the first `order` values by lower order multistep DPM-Solver.\n                for step in range(1, order):\n                    t = timesteps[step]\n                    x = self.multistep_dpm_solver_update(\n                        x, model_prev_list, t_prev_list, t, step, solver_type=solver_type\n                    )\n                    if self.correcting_xt_fn is not None:\n                        x = 
self.correcting_xt_fn(x, t, step)\n                    if return_intermediate:\n                        intermediates.append(x)\n                    t_prev_list.append(t)\n                    model_prev_list.append(self.model_fn(x, t))\n                # Compute the remaining values by `order`-th order multistep DPM-Solver.\n                progress_fn = tqdm if progress else lambda x: x\n                for step in progress_fn(range(order, steps + 1)):\n                    t = timesteps[step]\n                    # Use lower-order updates for the final steps (applied for any `steps`).\n                    if lower_order_final:  # recommended by Shuchen Xue\n                        step_order = min(order, steps + 1 - step)\n                    else:\n                        step_order = order\n                    x = self.multistep_dpm_solver_update(\n                        x, model_prev_list, t_prev_list, t, step_order, solver_type=solver_type\n                    )\n                    if self.correcting_xt_fn is not None:\n                        x = self.correcting_xt_fn(x, t, step)\n                    if return_intermediate:\n                        intermediates.append(x)\n                    for i in range(order - 1):\n                        t_prev_list[i] = t_prev_list[i + 1]\n                        model_prev_list[i] = model_prev_list[i + 1]\n                    t_prev_list[-1] = t\n                    # We do not need to evaluate the final model value.\n                    if step < steps:\n                        model_prev_list[-1] = self.model_fn(x, t)\n            elif method in [\"singlestep\", \"singlestep_fixed\"]:\n                if method == \"singlestep\":\n                    timesteps_outer, orders = self.get_orders_and_timesteps_for_singlestep_solver(\n                        steps=steps, order=order, skip_type=skip_type, t_T=t_T, t_0=t_0, device=device\n                    )\n                elif method == \"singlestep_fixed\":\n                    K = steps // 
order\n                    orders = [\n                        order,\n                    ] * K\n                    timesteps_outer = self.get_time_steps(skip_type=skip_type, t_T=t_T, t_0=t_0, N=K, device=device)\n                for step, order in enumerate(orders):\n                    s, t = timesteps_outer[step], timesteps_outer[step + 1]\n                    timesteps_inner = self.get_time_steps(\n                        skip_type=skip_type, t_T=s.item(), t_0=t.item(), N=order, device=device\n                    )\n                    lambda_inner = self.noise_schedule.marginal_lambda(timesteps_inner)\n                    h = lambda_inner[-1] - lambda_inner[0]\n                    r1 = None if order <= 1 else (lambda_inner[1] - lambda_inner[0]) / h\n                    r2 = None if order <= 2 else (lambda_inner[2] - lambda_inner[0]) / h\n                    x = self.singlestep_dpm_solver_update(x, s, t, order, solver_type=solver_type, r1=r1, r2=r2)\n                    if self.correcting_xt_fn is not None:\n                        x = self.correcting_xt_fn(x, t, step)\n                    if return_intermediate:\n                        intermediates.append(x)\n            else:\n                raise ValueError(f\"Got wrong method {method}\")\n            if denoise_to_zero:\n                t = torch.ones((1,)).to(device) * t_0\n                x = self.denoise_to_zero_fn(x, t)\n                if self.correcting_xt_fn is not None:\n                    x = self.correcting_xt_fn(x, t, step + 1)\n                if return_intermediate:\n                    intermediates.append(x)\n        return (x, intermediates) if return_intermediate else x\n\n\n#############################################################\n# other utility functions\n#############################################################\n\n\ndef interpolate_fn(x, xp, yp):\n    \"\"\"\n    A piecewise linear function y = f(x), using xp and yp as keypoints.\n    We implement f(x) in a differentiable 
way (i.e. applicable for autograd).\n    The function f(x) is well-defined for all x-axis. (For x beyond the bounds of xp, we use the outmost points of xp to define the linear function.)\n\n    Args:\n        x: PyTorch tensor with shape [N, C], where N is the batch size, C is the number of channels (we use C = 1 for DPM-Solver).\n        xp: PyTorch tensor with shape [C, K], where K is the number of keypoints.\n        yp: PyTorch tensor with shape [C, K].\n    Returns:\n        The function values f(x), with shape [N, C].\n    \"\"\"\n    N, K = x.shape[0], xp.shape[1]\n    all_x = torch.cat([x.unsqueeze(2), xp.unsqueeze(0).repeat((N, 1, 1))], dim=2)\n    sorted_all_x, x_indices = torch.sort(all_x, dim=2)\n    x_idx = torch.argmin(x_indices, dim=2)\n    cand_start_idx = x_idx - 1\n    start_idx = torch.where(\n        torch.eq(x_idx, 0),\n        torch.tensor(1, device=x.device),\n        torch.where(\n            torch.eq(x_idx, K),\n            torch.tensor(K - 2, device=x.device),\n            cand_start_idx,\n        ),\n    )\n    end_idx = torch.where(torch.eq(start_idx, cand_start_idx), start_idx + 2, start_idx + 1)\n    start_x = torch.gather(sorted_all_x, dim=2, index=start_idx.unsqueeze(2)).squeeze(2)\n    end_x = torch.gather(sorted_all_x, dim=2, index=end_idx.unsqueeze(2)).squeeze(2)\n    start_idx2 = torch.where(\n        torch.eq(x_idx, 0),\n        torch.tensor(0, device=x.device),\n        torch.where(\n            torch.eq(x_idx, K),\n            torch.tensor(K - 2, device=x.device),\n            cand_start_idx,\n        ),\n    )\n    y_positions_expanded = yp.unsqueeze(0).expand(N, -1, -1)\n    start_y = torch.gather(y_positions_expanded, dim=2, index=start_idx2.unsqueeze(2)).squeeze(2)\n    end_y = torch.gather(y_positions_expanded, dim=2, index=(start_idx2 + 1).unsqueeze(2)).squeeze(2)\n    return start_y + (x - start_x) * (end_y - start_y) / (end_x - start_x)\n\n\ndef expand_dims(v, dims):\n    \"\"\"\n    Expand the tensor `v` to the dim 
`dims`.\n\n    Args:\n        `v`: a PyTorch tensor with shape [N].\n        `dims`: an `int`.\n    Returns:\n        a PyTorch tensor with shape [N, 1, 1, ..., 1] and the total dimension is `dims`.\n    \"\"\"\n    return v[(...,) + (None,) * (dims - 1)]\n\n\ndef DPMS(\n    model,\n    condition,\n    uncondition,\n    cfg_scale,\n    model_type=\"noise\",\n    noise_schedule=\"linear\",\n    guidance_type=\"classifier-free\",\n    model_kwargs=None,\n    diffusion_steps=1000,\n):\n    if model_kwargs is None:\n        model_kwargs = {}\n    betas = torch.tensor(get_named_beta_schedule(noise_schedule, diffusion_steps))\n\n    ## 1. Define the noise schedule.\n    noise_schedule = NoiseScheduleVP(schedule=\"discrete\", betas=betas)\n\n    ## 2. Convert your discrete-time `model` to the continuous-time\n    ## noise prediction model. Here is an example for a diffusion model\n    ## `model` with the noise prediction type (\"noise\").\n    model_fn = model_wrapper(\n        model,\n        noise_schedule,\n        model_type=model_type,\n        model_kwargs=model_kwargs,\n        guidance_type=guidance_type,\n        condition=condition,\n        unconditional_condition=uncondition,\n        guidance_scale=cfg_scale,\n    )\n    ## 3. Define dpm-solver and sample by multistep DPM-Solver.\n    return DPM_Solver(model_fn, noise_schedule, algorithm_type=\"dpmsolver++\")\n"
  },
  {
    "path": "Open-Sora/opensora/schedulers/iddpm/__init__.py",
    "content": "from functools import partial\n\nimport torch\n\nfrom opensora.registry import SCHEDULERS\n\nfrom . import gaussian_diffusion as gd\nfrom .respace import SpacedDiffusion, space_timesteps\nfrom .speed import SpeeDiffusion\n\n\n@SCHEDULERS.register_module(\"iddpm\")\nclass IDDPM(SpacedDiffusion):\n    def __init__(\n        self,\n        num_sampling_steps=None,\n        timestep_respacing=None,\n        noise_schedule=\"linear\",\n        use_kl=False,\n        sigma_small=False,\n        predict_xstart=False,\n        learn_sigma=True,\n        rescale_learned_sigmas=False,\n        diffusion_steps=1000,\n        cfg_scale=4.0,\n        cfg_channel=None,\n    ):\n        betas = gd.get_named_beta_schedule(noise_schedule, diffusion_steps)\n        if use_kl:\n            loss_type = gd.LossType.RESCALED_KL\n        elif rescale_learned_sigmas:\n            loss_type = gd.LossType.RESCALED_MSE\n        else:\n            loss_type = gd.LossType.MSE\n        if num_sampling_steps is not None:\n            assert timestep_respacing is None\n            timestep_respacing = str(num_sampling_steps)\n        if timestep_respacing is None or timestep_respacing == \"\":\n            timestep_respacing = [diffusion_steps]\n        super().__init__(\n            use_timesteps=space_timesteps(diffusion_steps, timestep_respacing),\n            betas=betas,\n            model_mean_type=(gd.ModelMeanType.EPSILON if not predict_xstart else gd.ModelMeanType.START_X),\n            model_var_type=(\n                (gd.ModelVarType.FIXED_LARGE if not sigma_small else gd.ModelVarType.FIXED_SMALL)\n                if not learn_sigma\n                else gd.ModelVarType.LEARNED_RANGE\n            ),\n            loss_type=loss_type,\n        )\n\n        self.cfg_scale = cfg_scale\n        self.cfg_channel = cfg_channel\n\n    def sample(\n        self,\n        model,\n        text_encoder,\n        z,\n        prompts,\n        device,\n        
additional_args=None,\n        mask=None,\n        progress=True,\n    ):\n        n = len(prompts)\n        z = torch.cat([z, z], 0)\n        model_args = text_encoder.encode(prompts)\n        y_null = text_encoder.null(n)\n        model_args[\"y\"] = torch.cat([model_args[\"y\"], y_null], 0)\n        if additional_args is not None:\n            model_args.update(additional_args)\n        forward = partial(forward_with_cfg, model, cfg_scale=self.cfg_scale, cfg_channel=self.cfg_channel)\n        samples = self.p_sample_loop(\n            forward,\n            z.shape,\n            z,\n            clip_denoised=False,\n            model_kwargs=model_args,\n            progress=progress,\n            device=device,\n            mask=mask,\n        )\n        samples, _ = samples.chunk(2, dim=0)\n        return samples\n\n\ndef forward_with_cfg(model, x, timestep, y, cfg_scale, cfg_channel=None, **kwargs):\n    # https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb\n    half = x[: len(x) // 2]\n    combined = torch.cat([half, half], dim=0)\n    if \"x_mask\" in kwargs and kwargs[\"x_mask\"] is not None:\n        if len(kwargs[\"x_mask\"]) != len(x):\n            kwargs[\"x_mask\"] = torch.cat([kwargs[\"x_mask\"], kwargs[\"x_mask\"]], dim=0)\n    model_out = model.forward(combined, timestep, y, **kwargs)\n    model_out = model_out[\"x\"] if isinstance(model_out, dict) else model_out\n    if cfg_channel is None:\n        cfg_channel = model_out.shape[1] // 2\n    eps, rest = model_out[:, :cfg_channel], model_out[:, cfg_channel:]\n    cond_eps, uncond_eps = torch.split(eps, len(eps) // 2, dim=0)\n    half_eps = uncond_eps + cfg_scale * (cond_eps - uncond_eps)\n    eps = torch.cat([half_eps, half_eps], dim=0)\n    return torch.cat([eps, rest], dim=1)\n"
  },
  {
    "path": "Open-Sora/opensora/schedulers/iddpm/diffusion_utils.py",
    "content": "# Adapted from DiT\n\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# DiT:   https://github.com/facebookresearch/DiT/tree/main\n# GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\n# ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\n# IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\n# --------------------------------------------------------\n\n\nimport numpy as np\nimport torch\n\n\ndef normal_kl(mean1, logvar1, mean2, logvar2):\n    \"\"\"\n    Compute the KL divergence between two gaussians.\n    Shapes are automatically broadcasted, so batches can be compared to\n    scalars, among other use cases.\n    \"\"\"\n    tensor = None\n    for obj in (mean1, logvar1, mean2, logvar2):\n        if isinstance(obj, torch.Tensor):\n            tensor = obj\n            break\n    assert tensor is not None, \"at least one argument must be a Tensor\"\n\n    # Force variances to be Tensors. 
Broadcasting helps convert scalars to\n    # Tensors, but it does not work for torch.exp().\n    logvar1, logvar2 = [x if isinstance(x, torch.Tensor) else torch.tensor(x).to(tensor) for x in (logvar1, logvar2)]\n\n    return 0.5 * (\n        -1.0 + logvar2 - logvar1 + torch.exp(logvar1 - logvar2) + ((mean1 - mean2) ** 2) * torch.exp(-logvar2)\n    )\n\n\ndef approx_standard_normal_cdf(x):\n    \"\"\"\n    A fast approximation of the cumulative distribution function of the\n    standard normal.\n    \"\"\"\n    return 0.5 * (1.0 + torch.tanh(np.sqrt(2.0 / torch.pi) * (x + 0.044715 * torch.pow(x, 3))))\n\n\ndef continuous_gaussian_log_likelihood(x, *, means, log_scales):\n    \"\"\"\n    Compute the log-likelihood of a continuous Gaussian distribution.\n    :param x: the targets\n    :param means: the Gaussian mean Tensor.\n    :param log_scales: the Gaussian log stddev Tensor.\n    :return: a tensor like x of log probabilities (in nats).\n    \"\"\"\n    centered_x = x - means\n    inv_stdv = torch.exp(-log_scales)\n    normalized_x = centered_x * inv_stdv\n    log_probs = torch.distributions.Normal(torch.zeros_like(x), torch.ones_like(x)).log_prob(normalized_x)\n    return log_probs\n\n\ndef discretized_gaussian_log_likelihood(x, *, means, log_scales):\n    \"\"\"\n    Compute the log-likelihood of a Gaussian distribution discretizing to a\n    given image.\n    :param x: the target images. 
It is assumed that this was uint8 values,\n              rescaled to the range [-1, 1].\n    :param means: the Gaussian mean Tensor.\n    :param log_scales: the Gaussian log stddev Tensor.\n    :return: a tensor like x of log probabilities (in nats).\n    \"\"\"\n    assert x.shape == means.shape == log_scales.shape\n    centered_x = x - means\n    inv_stdv = torch.exp(-log_scales)\n    plus_in = inv_stdv * (centered_x + 1.0 / 255.0)\n    cdf_plus = approx_standard_normal_cdf(plus_in)\n    min_in = inv_stdv * (centered_x - 1.0 / 255.0)\n    cdf_min = approx_standard_normal_cdf(min_in)\n    log_cdf_plus = torch.log(cdf_plus.clamp(min=1e-12))\n    log_one_minus_cdf_min = torch.log((1.0 - cdf_min).clamp(min=1e-12))\n    cdf_delta = cdf_plus - cdf_min\n    log_probs = torch.where(\n        x < -0.999,\n        log_cdf_plus,\n        torch.where(x > 0.999, log_one_minus_cdf_min, torch.log(cdf_delta.clamp(min=1e-12))),\n    )\n    assert log_probs.shape == x.shape\n    return log_probs\n"
  },
  {
    "path": "Open-Sora/opensora/schedulers/iddpm/gaussian_diffusion.py",
    "content": "# Adapted from DiT\n\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# DiT:   https://github.com/facebookresearch/DiT/tree/main\n# GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\n# ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\n# IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\n# --------------------------------------------------------\n\nimport enum\nfrom typing import Callable, List\n\nimport numpy as np\nimport torch\nfrom einops import rearrange\n\nfrom .diffusion_utils import discretized_gaussian_log_likelihood, normal_kl\n\n\ndef mean_flat(tensor: torch.Tensor, mask=None):\n    \"\"\"\n    Take the mean over all non-batch dimensions.\n    \"\"\"\n    if mask is None:\n        return tensor.mean(dim=list(range(1, len(tensor.shape))))\n    else:\n        assert tensor.dim() == 5\n        assert tensor.shape[2] == mask.shape[1]\n        tensor = rearrange(tensor, \"b c t h w -> b t (c h w)\")\n        denom = mask.sum(dim=1) * tensor.shape[-1]\n        loss = (tensor * mask.unsqueeze(2)).sum(dim=1).sum(dim=1) / denom\n        return loss\n\n\nclass ModelMeanType(enum.Enum):\n    \"\"\"\n    Which type of output the model predicts.\n    \"\"\"\n\n    PREVIOUS_X = enum.auto()  # the model predicts x_{t-1}\n    START_X = enum.auto()  # the model predicts x_0\n    EPSILON = enum.auto()  # the model predicts epsilon\n\n\nclass ModelVarType(enum.Enum):\n    \"\"\"\n    What is used as the model's output variance.\n    The LEARNED_RANGE option has been added to allow the model to predict\n    values between FIXED_SMALL and FIXED_LARGE, making its job easier.\n    \"\"\"\n\n    LEARNED = enum.auto()\n    FIXED_SMALL = enum.auto()\n    FIXED_LARGE = enum.auto()\n    
LEARNED_RANGE = enum.auto()\n\n\nclass LossType(enum.Enum):\n    MSE = enum.auto()  # use raw MSE loss (and KL when learning variances)\n    RESCALED_MSE = enum.auto()  # use raw MSE loss (with RESCALED_KL when learning variances)\n    KL = enum.auto()  # use the variational lower-bound\n    RESCALED_KL = enum.auto()  # like KL, but rescale to estimate the full VLB\n\n    def is_vb(self):\n        return self == LossType.KL or self == LossType.RESCALED_KL\n\n\ndef _warmup_beta(beta_start: float, beta_end: float, num_diffusion_timesteps: int, warmup_frac: float) -> torch.Tensor:\n    betas = beta_end * torch.ones(num_diffusion_timesteps, dtype=torch.float64)\n    warmup_time = int(num_diffusion_timesteps * warmup_frac)\n    betas[:warmup_time] = torch.linspace(beta_start, beta_end, warmup_time, dtype=torch.float64)\n    return betas\n\n\ndef get_beta_schedule(\n    beta_schedule: str, *, beta_start: float, beta_end: float, num_diffusion_timesteps: int\n) -> torch.Tensor:\n    \"\"\"\n    This is the deprecated API for creating beta schedules.\n    See get_named_beta_schedule() for the new library of schedules.\n    \"\"\"\n    if beta_schedule == \"quad\":\n        betas = (\n            torch.linspace(\n                beta_start**0.5,\n                beta_end**0.5,\n                num_diffusion_timesteps,\n                dtype=torch.float64,\n            )\n            ** 2\n        )\n    elif beta_schedule == \"linear\":\n        betas = torch.linspace(beta_start, beta_end, num_diffusion_timesteps, dtype=torch.float64)\n    elif beta_schedule == \"warmup10\":\n        betas = _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, 0.1)\n    elif beta_schedule == \"warmup50\":\n        betas = _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, 0.5)\n    elif beta_schedule == \"const\":\n        betas = beta_end * torch.ones(num_diffusion_timesteps, dtype=torch.float64)\n    elif beta_schedule == \"jsd\":  # 1/T, 1/(T-1), 1/(T-2), ..., 1\n        
betas = 1.0 / torch.linspace(num_diffusion_timesteps, 1, num_diffusion_timesteps, dtype=torch.float64)\n    else:\n        raise NotImplementedError(beta_schedule)\n    assert betas.shape == (num_diffusion_timesteps,)\n    return betas\n\n\ndef betas_for_alpha_bar(num_diffusion_timesteps: int, alpha_bar: Callable, max_beta: float = 0.999):\n    \"\"\"\n    Create a beta schedule that discretizes the given alpha_t_bar function,\n    which defines the cumulative product of (1-beta) over time from t = [0,1].\n    :param num_diffusion_timesteps: the number of betas to produce.\n    :param alpha_bar: a lambda that takes an argument t from 0 to 1 and\n                      produces the cumulative product of (1-beta) up to that\n                      part of the diffusion process.\n    :param max_beta: the maximum beta to use; use values lower than 1 to\n                     prevent singularities.\n    \"\"\"\n    betas = []\n    for i in range(num_diffusion_timesteps):\n        t1 = i / num_diffusion_timesteps\n        t2 = (i + 1) / num_diffusion_timesteps\n        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))\n    return torch.DoubleTensor(betas)\n\n\ndef get_named_beta_schedule(schedule_name, num_diffusion_timesteps):\n    \"\"\"\n    Get a pre-defined beta schedule for the given name.\n    The beta schedule library consists of beta schedules which remain similar\n    in the limit of num_diffusion_timesteps.\n    Beta schedules may be added, but should not be removed or changed once\n    they are committed to maintain backwards compatibility.\n    \"\"\"\n    if schedule_name == \"linear\":\n        # Linear schedule from Ho et al, extended to work for any number of\n        # diffusion steps.\n        scale = 1000 / num_diffusion_timesteps\n        return get_beta_schedule(\n            \"linear\",\n            beta_start=scale * 0.0001,\n            beta_end=scale * 0.02,\n            num_diffusion_timesteps=num_diffusion_timesteps,\n        )\n    
elif schedule_name == \"squaredcos_cap_v2\":\n        return betas_for_alpha_bar(\n            num_diffusion_timesteps,\n            lambda t: np.cos((t + 0.008) / 1.008 * np.pi / 2) ** 2,\n        )\n    else:\n        raise NotImplementedError(f\"unknown beta schedule: {schedule_name}\")\n\n\nclass GaussianDiffusion:\n    \"\"\"\n    Utilities for training and sampling diffusion models.\n    Originally ported from this codebase:\n    https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/diffusion_utils_2.py#L42\n    :param betas: a 1-D torch tensor of betas for each diffusion timestep,\n                  starting at T and going to 1.\n    \"\"\"\n\n    def __init__(\n        self,\n        *,\n        betas: torch.Tensor,\n        model_mean_type: ModelMeanType,\n        model_var_type: ModelVarType,\n        loss_type: LossType,\n        device: str = \"cuda\",\n    ):\n        if device == \"cuda\":\n            device = torch.device(f\"cuda:{torch.cuda.current_device()}\")\n        elif device == \"cpu\":\n            device = torch.device(\"cpu\")\n        else:\n            raise ValueError(f\"Unknown device: {device}\")\n        self.device = device\n        self.model_mean_type = model_mean_type\n        self.model_var_type = model_var_type\n        self.loss_type = loss_type\n\n        # Use float64 for accuracy.\n        self.betas = betas.to(self.device)\n        assert len(self.betas.shape) == 1, \"betas must be 1-D\"\n        assert (self.betas > 0).all() and (self.betas <= 1).all()\n\n        self.num_timesteps = int(betas.shape[0])\n\n        alphas = 1.0 - self.betas\n        self.alphas_cumprod = torch.cumprod(alphas, axis=0)\n        self.alphas_cumprod_prev = torch.cat([torch.tensor([1.0], device=self.device), self.alphas_cumprod[:-1]])\n        self.alphas_cumprod_next = torch.cat([self.alphas_cumprod[1:], torch.tensor([0.0], device=self.device)])\n        assert self.alphas_cumprod_prev.shape == 
(self.num_timesteps,)\n\n        # calculations for diffusion q(x_t | x_{t-1}) and others\n        self.sqrt_alphas_cumprod = torch.sqrt(self.alphas_cumprod)\n        self.sqrt_one_minus_alphas_cumprod = torch.sqrt(1.0 - self.alphas_cumprod)\n        self.log_one_minus_alphas_cumprod = torch.log(1.0 - self.alphas_cumprod)\n        self.sqrt_recip_alphas_cumprod = torch.sqrt(1.0 / self.alphas_cumprod)\n        self.sqrt_recipm1_alphas_cumprod = torch.sqrt(1.0 / self.alphas_cumprod - 1)\n\n        # calculations for posterior q(x_{t-1} | x_t, x_0)\n        self.posterior_variance = self.betas * (1.0 - self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)\n        # below: log calculation clipped because the posterior variance is 0 at the beginning of the diffusion chain\n        self.posterior_log_variance_clipped = (\n            torch.log(torch.cat([self.posterior_variance[1].unsqueeze(0), self.posterior_variance[1:]]))\n            if len(self.posterior_variance) > 1\n            else torch.DoubleTensor([])\n        )\n\n        self.posterior_mean_coef1 = self.betas * torch.sqrt(self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)\n        self.posterior_mean_coef2 = (1.0 - self.alphas_cumprod_prev) * torch.sqrt(alphas) / (1.0 - self.alphas_cumprod)\n\n    def q_mean_variance(self, x_start, t):\n        \"\"\"\n        Get the distribution q(x_t | x_0).\n        :param x_start: the [N x C x ...] tensor of noiseless inputs.\n        :param t: the number of diffusion steps (minus 1). 
Here, 0 means one step.\n        :return: A tuple (mean, variance, log_variance), all of x_start's shape.\n        \"\"\"\n        mean = _extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start\n        variance = _extract_into_tensor(1.0 - self.alphas_cumprod, t, x_start.shape)\n        log_variance = _extract_into_tensor(self.log_one_minus_alphas_cumprod, t, x_start.shape)\n        return mean, variance, log_variance\n\n    def q_sample(self, x_start, t, noise=None):\n        \"\"\"\n        Diffuse the data for a given number of diffusion steps.\n        In other words, sample from q(x_t | x_0).\n        :param x_start: the initial data batch.\n        :param t: the number of diffusion steps (minus 1). Here, 0 means one step.\n        :param noise: if specified, the split-out normal noise.\n        :return: A noisy version of x_start.\n        \"\"\"\n        if noise is None:\n            noise = torch.randn_like(x_start)\n        assert noise.shape == x_start.shape\n        return (\n            _extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start\n            + _extract_into_tensor(self.sqrt_one_minus_alphas_cumprod, t, x_start.shape) * noise\n        )\n\n    def q_posterior_mean_variance(self, x_start, x_t, t):\n        \"\"\"\n        Compute the mean and variance of the diffusion posterior:\n            q(x_{t-1} | x_t, x_0)\n        \"\"\"\n        assert x_start.shape == x_t.shape\n        posterior_mean = (\n            _extract_into_tensor(self.posterior_mean_coef1, t, x_t.shape) * x_start\n            + _extract_into_tensor(self.posterior_mean_coef2, t, x_t.shape) * x_t\n        )\n        posterior_variance = _extract_into_tensor(self.posterior_variance, t, x_t.shape)\n        posterior_log_variance_clipped = _extract_into_tensor(self.posterior_log_variance_clipped, t, x_t.shape)\n        assert (\n            posterior_mean.shape[0]\n            == posterior_variance.shape[0]\n            == 
posterior_log_variance_clipped.shape[0]\n            == x_start.shape[0]\n        )\n        return posterior_mean, posterior_variance, posterior_log_variance_clipped\n\n    def p_mean_variance(self, model, x, t, clip_denoised=True, denoised_fn=None, model_kwargs=None):\n        \"\"\"\n        Apply the model to get p(x_{t-1} | x_t), as well as a prediction of\n        the initial x, x_0.\n        :param model: the model, which takes a signal and a batch of timesteps\n                      as input.\n        :param x: the [N x C x ...] tensor at time t.\n        :param t: a 1-D Tensor of timesteps.\n        :param clip_denoised: if True, clip the denoised signal into [-1, 1].\n        :param denoised_fn: if not None, a function which applies to the\n            x_start prediction before it is used to sample. Applies before\n            clip_denoised.\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\n            pass to the model. This can be used for conditioning.\n        :return: a dict with the following keys:\n                 - 'mean': the model mean output.\n                 - 'variance': the model variance output.\n                 - 'log_variance': the log of 'variance'.\n                 - 'pred_xstart': the prediction for x_0.\n        \"\"\"\n        if model_kwargs is None:\n            model_kwargs = {}\n\n        B, C = x.shape[:2]\n        assert t.shape == (B,)\n        model_output = model(x, t, **model_kwargs)\n        if isinstance(model_output, tuple):\n            model_output, extra = model_output\n        else:\n            extra = None\n\n        if self.model_var_type in [ModelVarType.LEARNED, ModelVarType.LEARNED_RANGE]:\n            assert model_output.shape == (B, C * 2, *x.shape[2:])\n            model_output, model_var_values = torch.split(model_output, C, dim=1)\n            min_log = _extract_into_tensor(self.posterior_log_variance_clipped, t, x.shape)\n            max_log = 
_extract_into_tensor(torch.log(self.betas), t, x.shape)\n            # The model_var_values is [-1, 1] for [min_var, max_var].\n            frac = (model_var_values + 1) / 2\n            model_log_variance = frac * max_log + (1 - frac) * min_log\n            model_variance = torch.exp(model_log_variance)\n        else:\n            model_variance, model_log_variance = {\n                # for fixedlarge, we set the initial (log-)variance like so\n                # to get a better decoder log likelihood.\n                # torch.cat expects a sequence of tensors as its first argument.\n                ModelVarType.FIXED_LARGE: (\n                    torch.cat([self.posterior_variance[1].unsqueeze(0), self.betas[1:]]),\n                    torch.log(torch.cat([self.posterior_variance[1].unsqueeze(0), self.betas[1:]])),\n                ),\n                ModelVarType.FIXED_SMALL: (\n                    self.posterior_variance,\n                    self.posterior_log_variance_clipped,\n                ),\n            }[self.model_var_type]\n            model_variance = _extract_into_tensor(model_variance, t, x.shape)\n            model_log_variance = _extract_into_tensor(model_log_variance, t, x.shape)\n\n        def process_xstart(x):\n            if denoised_fn is not None:\n                x = denoised_fn(x)\n            if clip_denoised:\n                return x.clamp(-1, 1)\n            return x\n\n        if self.model_mean_type == ModelMeanType.START_X:\n            pred_xstart = process_xstart(model_output)\n        else:\n            pred_xstart = process_xstart(self._predict_xstart_from_eps(x_t=x, t=t, eps=model_output))\n        model_mean, _, _ = self.q_posterior_mean_variance(x_start=pred_xstart, x_t=x, t=t)\n\n        assert model_mean.shape == model_log_variance.shape == pred_xstart.shape == x.shape\n        return {\n            \"mean\": model_mean,\n            \"variance\": model_variance,\n            \"log_variance\": model_log_variance,\n            \"pred_xstart\": pred_xstart,\n            \"extra\": extra,\n        }\n\n    
def _predict_xstart_from_eps(self, x_t, t, eps):\n        assert x_t.shape == eps.shape\n        return (\n            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t\n            - _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape) * eps\n        )\n\n    def _predict_eps_from_xstart(self, x_t, t, pred_xstart):\n        return (\n            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t - pred_xstart\n        ) / _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape)\n\n    def condition_mean(self, cond_fn, p_mean_var, x, t, model_kwargs=None):\n        \"\"\"\n        Compute the mean for the previous step, given a function cond_fn that\n        computes the gradient of a conditional log probability with respect to\n        x. In particular, cond_fn computes grad(log(p(y|x))), and we want to\n        condition on y.\n        This uses the conditioning strategy from Sohl-Dickstein et al. (2015).\n        \"\"\"\n        gradient = cond_fn(x, t, **model_kwargs)\n        new_mean = p_mean_var[\"mean\"].float() + p_mean_var[\"variance\"] * gradient.float()\n        return new_mean\n\n    def condition_score(self, cond_fn, p_mean_var, x, t, model_kwargs=None):\n        \"\"\"\n        Compute what the p_mean_variance output would have been, should the\n        model's score function be conditioned by cond_fn.\n        See condition_mean() for details on cond_fn.\n        Unlike condition_mean(), this instead uses the conditioning strategy\n        from Song et al (2020).\n        \"\"\"\n        alpha_bar = _extract_into_tensor(self.alphas_cumprod, t, x.shape)\n\n        eps = self._predict_eps_from_xstart(x, t, p_mean_var[\"pred_xstart\"])\n        eps = eps - (1 - alpha_bar).sqrt() * cond_fn(x, t, **model_kwargs)\n\n        out = p_mean_var.copy()\n        out[\"pred_xstart\"] = self._predict_xstart_from_eps(x, t, eps)\n        out[\"mean\"], _, _ = 
self.q_posterior_mean_variance(x_start=out[\"pred_xstart\"], x_t=x, t=t)\n        return out\n\n    def p_sample(\n        self,\n        model,\n        x,\n        t,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        mask=None,\n    ):\n        \"\"\"\n        Sample x_{t-1} from the model at the given timestep.\n        :param model: the model to sample from.\n        :param x: the current tensor x_t.\n        :param t: the value of t, starting at 0 for the first diffusion step.\n        :param clip_denoised: if True, clip the x_start prediction to [-1, 1].\n        :param denoised_fn: if not None, a function which applies to the\n            x_start prediction before it is used to sample.\n        :param cond_fn: if not None, this is a gradient function that acts\n                        similarly to the model.\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\n            pass to the model. 
This can be used for conditioning.\n        :return: a dict containing the following keys:\n                 - 'sample': a random sample from the model.\n                 - 'pred_xstart': a prediction of x_0.\n        \"\"\"\n        if mask is not None:\n            if mask.shape[0] != x.shape[0]:\n                mask = mask.repeat(2, 1)  # HACK\n            mask_t = (mask * len(self.betas)).to(torch.int)\n\n            # x0: copy unchanged x values\n            # x_noise: add noise to x values\n            x0 = x.clone()\n            x_noise = x0 * _extract_into_tensor(self.sqrt_alphas_cumprod, t, x.shape) + torch.randn_like(\n                x\n            ) * _extract_into_tensor(self.sqrt_one_minus_alphas_cumprod, t, x.shape)\n\n            # active noise addition\n            # WARNING: this is a hacky implementation\n            mask_t_equall = (mask_t == t.unsqueeze(1))[:, None, :, None, None]\n            x = torch.where(mask_t_equall, x_noise, x0)\n\n            # create x_mask\n            mask_t_upper = (mask_t > t.unsqueeze(1))[:, None, :, None, None]\n            batch_size = x.shape[0]\n            model_kwargs[\"x_mask\"] = mask_t_upper.reshape(batch_size, -1).to(torch.bool)\n\n        out = self.p_mean_variance(\n            model,\n            x,\n            t,\n            clip_denoised=clip_denoised,\n            denoised_fn=denoised_fn,\n            model_kwargs=model_kwargs,\n        )\n        noise = torch.randn_like(x)\n        nonzero_mask = (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))  # no noise when t == 0\n        if cond_fn is not None:\n            out[\"mean\"] = self.condition_mean(cond_fn, out, x, t, model_kwargs=model_kwargs)\n        sample = out[\"mean\"] + nonzero_mask * torch.exp(0.5 * out[\"log_variance\"]) * noise\n\n        if mask is not None:\n            mask_t_lower = (mask_t < t.unsqueeze(1))[:, None, :, None, None]\n            sample = torch.where(mask_t_lower, x0, sample)\n\n        return {\"sample\": 
sample, \"pred_xstart\": out[\"pred_xstart\"]}\n\n    def p_sample_loop(\n        self,\n        model,\n        shape,\n        noise=None,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        device=None,\n        progress=False,\n        mask=None,\n    ):\n        \"\"\"\n        Generate samples from the model.\n        :param model: the model module.\n        :param shape: the shape of the samples, (N, C, H, W).\n        :param noise: if specified, the noise from the encoder to sample.\n                      Should be of the same shape as `shape`.\n        :param clip_denoised: if True, clip x_start predictions to [-1, 1].\n        :param denoised_fn: if not None, a function which applies to the\n            x_start prediction before it is used to sample.\n        :param cond_fn: if not None, this is a gradient function that acts\n                        similarly to the model.\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\n            pass to the model. 
This can be used for conditioning.\n        :param device: if specified, the device to create the samples on.\n                       If not specified, use a model parameter's device.\n        :param progress: if True, show a tqdm progress bar.\n        :return: a non-differentiable batch of samples.\n        \"\"\"\n        final = None\n        for sample in self.p_sample_loop_progressive(\n            model,\n            shape,\n            noise=noise,\n            clip_denoised=clip_denoised,\n            denoised_fn=denoised_fn,\n            cond_fn=cond_fn,\n            model_kwargs=model_kwargs,\n            device=device,\n            progress=progress,\n            mask=mask,\n        ):\n            final = sample\n        return final[\"sample\"]\n\n    def p_sample_loop_progressive(\n        self,\n        model,\n        shape,\n        noise=None,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        device=None,\n        progress=False,\n        mask=None,\n    ):\n        \"\"\"\n        Generate samples from the model and yield intermediate samples from\n        each timestep of diffusion.\n        Arguments are the same as p_sample_loop().\n        Returns a generator over dicts, where each dict is the return value of\n        p_sample().\n        \"\"\"\n        if device is None:\n            device = next(model.parameters()).device\n        assert isinstance(shape, (tuple, list))\n        if noise is not None:\n            img = noise\n        else:\n            img = torch.randn(*shape, device=device)\n        indices = list(range(self.num_timesteps))[::-1]\n\n        if progress:\n            # Lazy import so that we don't depend on tqdm.\n            from tqdm.auto import tqdm\n\n            indices = tqdm(indices)\n\n        for i in indices:\n            t = torch.tensor([i] * shape[0], device=device)\n            with torch.no_grad():\n                out = self.p_sample(\n    
                model,\n                    img,\n                    t,\n                    clip_denoised=clip_denoised,\n                    denoised_fn=denoised_fn,\n                    cond_fn=cond_fn,\n                    model_kwargs=model_kwargs,\n                    mask=mask,\n                )\n                yield out\n                img = out[\"sample\"]\n\n    def ddim_sample(\n        self,\n        model,\n        x,\n        t,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        eta=0.0,\n    ):\n        \"\"\"\n        Sample x_{t-1} from the model using DDIM.\n        Same usage as p_sample().\n        \"\"\"\n        out = self.p_mean_variance(\n            model,\n            x,\n            t,\n            clip_denoised=clip_denoised,\n            denoised_fn=denoised_fn,\n            model_kwargs=model_kwargs,\n        )\n        if cond_fn is not None:\n            out = self.condition_score(cond_fn, out, x, t, model_kwargs=model_kwargs)\n\n        # Usually our model outputs epsilon, but we re-derive it\n        # in case we used x_start or x_prev prediction.\n        eps = self._predict_eps_from_xstart(x, t, out[\"pred_xstart\"])\n\n        alpha_bar = _extract_into_tensor(self.alphas_cumprod, t, x.shape)\n        alpha_bar_prev = _extract_into_tensor(self.alphas_cumprod_prev, t, x.shape)\n        sigma = eta * torch.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar)) * torch.sqrt(1 - alpha_bar / alpha_bar_prev)\n        # Equation 12.\n        noise = torch.randn_like(x)\n        mean_pred = out[\"pred_xstart\"] * torch.sqrt(alpha_bar_prev) + torch.sqrt(1 - alpha_bar_prev - sigma**2) * eps\n        nonzero_mask = (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))  # no noise when t == 0\n        sample = mean_pred + nonzero_mask * sigma * noise\n        return {\"sample\": sample, \"pred_xstart\": out[\"pred_xstart\"]}\n\n    def ddim_reverse_sample(\n        self,\n        
model,\n        x,\n        t,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        eta=0.0,\n    ):\n        \"\"\"\n        Sample x_{t+1} from the model using DDIM reverse ODE.\n        \"\"\"\n        assert eta == 0.0, \"Reverse ODE only for deterministic path\"\n        out = self.p_mean_variance(\n            model,\n            x,\n            t,\n            clip_denoised=clip_denoised,\n            denoised_fn=denoised_fn,\n            model_kwargs=model_kwargs,\n        )\n        if cond_fn is not None:\n            out = self.condition_score(cond_fn, out, x, t, model_kwargs=model_kwargs)\n        # Usually our model outputs epsilon, but we re-derive it\n        # in case we used x_start or x_prev prediction.\n        eps = (\n            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x.shape) * x - out[\"pred_xstart\"]\n        ) / _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x.shape)\n        alpha_bar_next = _extract_into_tensor(self.alphas_cumprod_next, t, x.shape)\n\n        # Equation 12. 
reversed\n        mean_pred = out[\"pred_xstart\"] * torch.sqrt(alpha_bar_next) + torch.sqrt(1 - alpha_bar_next) * eps\n\n        return {\"sample\": mean_pred, \"pred_xstart\": out[\"pred_xstart\"]}\n\n    def ddim_sample_loop(\n        self,\n        model,\n        shape,\n        noise=None,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        device=None,\n        progress=False,\n        eta=0.0,\n    ):\n        \"\"\"\n        Generate samples from the model using DDIM.\n        Same usage as p_sample_loop().\n        \"\"\"\n        final = None\n        for sample in self.ddim_sample_loop_progressive(\n            model,\n            shape,\n            noise=noise,\n            clip_denoised=clip_denoised,\n            denoised_fn=denoised_fn,\n            cond_fn=cond_fn,\n            model_kwargs=model_kwargs,\n            device=device,\n            progress=progress,\n            eta=eta,\n        ):\n            final = sample\n        return final[\"sample\"]\n\n    def ddim_sample_loop_progressive(\n        self,\n        model,\n        shape,\n        noise=None,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        device=None,\n        progress=False,\n        eta=0.0,\n    ):\n        \"\"\"\n        Use DDIM to sample from the model and yield intermediate samples from\n        each timestep of DDIM.\n        Same usage as p_sample_loop_progressive().\n        \"\"\"\n        if device is None:\n            device = next(model.parameters()).device\n        assert isinstance(shape, (tuple, list))\n        if noise is not None:\n            img = noise\n        else:\n            img = torch.randn(*shape, device=device)\n        indices = list(range(self.num_timesteps))[::-1]\n\n        if progress:\n            # Lazy import so that we don't depend on tqdm.\n            from tqdm.auto import tqdm\n\n            indices = 
tqdm(indices)\n\n        for i in indices:\n            t = torch.tensor([i] * shape[0], device=device)\n            with torch.no_grad():\n                out = self.ddim_sample(\n                    model,\n                    img,\n                    t,\n                    clip_denoised=clip_denoised,\n                    denoised_fn=denoised_fn,\n                    cond_fn=cond_fn,\n                    model_kwargs=model_kwargs,\n                    eta=eta,\n                )\n                yield out\n                img = out[\"sample\"]\n\n    def _vb_terms_bpd(self, model, x_start, x_t, t, clip_denoised=True, model_kwargs=None, mask=None):\n        \"\"\"\n        Get a term for the variational lower-bound.\n        The resulting units are bits (rather than nats, as one might expect).\n        This allows for comparison to other papers.\n        :return: a dict with the following keys:\n                 - 'output': a shape [N] tensor of NLLs or KLs.\n                 - 'pred_xstart': the x_0 predictions.\n        \"\"\"\n        true_mean, _, true_log_variance_clipped = self.q_posterior_mean_variance(x_start=x_start, x_t=x_t, t=t)\n        out = self.p_mean_variance(model, x_t, t, clip_denoised=clip_denoised, model_kwargs=model_kwargs)\n        kl = normal_kl(true_mean, true_log_variance_clipped, out[\"mean\"], out[\"log_variance\"])\n        kl = mean_flat(kl, mask=mask) / np.log(2.0)\n\n        decoder_nll = -discretized_gaussian_log_likelihood(\n            x_start, means=out[\"mean\"], log_scales=0.5 * out[\"log_variance\"]\n        )\n        assert decoder_nll.shape == x_start.shape\n        decoder_nll = mean_flat(decoder_nll, mask=mask) / np.log(2.0)\n\n        # At the first timestep return the decoder NLL,\n        # otherwise return KL(q(x_{t-1}|x_t,x_0) || p(x_{t-1}|x_t))\n        output = torch.where((t == 0), decoder_nll, kl)\n        return {\"output\": output, \"pred_xstart\": out[\"pred_xstart\"]}\n\n    def training_losses(self, 
model, x_start, model_kwargs=None, noise=None, mask=None, weights=None, t=None):\n        \"\"\"\n        Compute training losses for a single timestep.\n        :param model: the model to evaluate loss on.\n        :param x_start: the [N x C x ...] tensor of inputs.\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\n            pass to the model. This can be used for conditioning.\n        :param noise: if specified, the specific Gaussian noise to try to remove.\n        :param mask: if specified, a boolean mask; unmasked positions are noised at t=0.\n        :param weights: if specified, a 1-D tensor of per-timestep MSE loss weights.\n        :param t: if specified, a tensor of N timestep indices; otherwise sampled uniformly.\n        :return: a dict with the key \"loss\" containing a tensor of shape [N].\n                 Some mean or variance settings may also have other keys.\n        \"\"\"\n        # sample timesteps uniformly unless the caller supplied them\n        if t is None:\n            t = torch.randint(0, self.num_timesteps, (x_start.shape[0],), device=x_start.device)\n\n        if model_kwargs is None:\n            model_kwargs = {}\n        if noise is None:\n            noise = torch.randn_like(x_start)\n        x_t = self.q_sample(x_start, t, noise=noise)\n        if mask is not None:\n            t0 = torch.zeros_like(t)\n            x_t0 = self.q_sample(x_start, t0, noise=noise)\n            x_t = torch.where(mask[:, None, :, None, None], x_t, x_t0)\n\n        terms = {}\n\n        if self.loss_type == LossType.KL or self.loss_type == LossType.RESCALED_KL:\n            assert mask is None, \"mask not supported for KL loss\"\n            terms[\"loss\"] = self._vb_terms_bpd(\n                model=model,\n                x_start=x_start,\n                x_t=x_t,\n                t=t,\n                clip_denoised=False,\n                model_kwargs=model_kwargs,\n            )[\"output\"]\n            if self.loss_type == LossType.RESCALED_KL:\n                terms[\"loss\"] *= self.num_timesteps\n        elif self.loss_type == LossType.MSE or self.loss_type == LossType.RESCALED_MSE:\n            model_output = model(x_t, t, **model_kwargs)\n\n            if self.model_var_type in [\n                ModelVarType.LEARNED,\n                
ModelVarType.LEARNED_RANGE,\n            ]:\n                B, C = x_t.shape[:2]\n                assert model_output.shape == (B, C * 2, *x_t.shape[2:])\n                model_output, model_var_values = torch.split(model_output, C, dim=1)\n                # Learn the variance using the variational bound, but don't let\n                # it affect our mean prediction.\n                frozen_out = torch.cat([model_output.detach(), model_var_values], dim=1)\n                terms[\"vb\"] = self._vb_terms_bpd(\n                    model=lambda *args, r=frozen_out: r,\n                    x_start=x_start,\n                    x_t=x_t,\n                    t=t,\n                    clip_denoised=False,\n                    mask=mask,\n                )[\"output\"]\n                if self.loss_type == LossType.RESCALED_MSE:\n                    # Divide by 1000 for equivalence with initial implementation.\n                    # Without a factor of 1/1000, the VB term hurts the MSE term.\n                    terms[\"vb\"] *= self.num_timesteps / 1000.0\n\n            target = {\n                ModelMeanType.PREVIOUS_X: self.q_posterior_mean_variance(x_start=x_start, x_t=x_t, t=t)[0],\n                ModelMeanType.START_X: x_start,\n                ModelMeanType.EPSILON: noise,\n            }[self.model_mean_type]\n            assert model_output.shape == target.shape == x_start.shape\n            if weights is None:\n                terms[\"mse\"] = mean_flat((target - model_output) ** 2, mask=mask)\n            else:\n                weight = _extract_into_tensor(weights, t, target.shape)\n                terms[\"mse\"] = mean_flat(weight * (target - model_output) ** 2, mask=mask)\n            if \"vb\" in terms:\n                terms[\"loss\"] = terms[\"mse\"] + terms[\"vb\"]\n            else:\n                terms[\"loss\"] = terms[\"mse\"]\n        else:\n            raise NotImplementedError(self.loss_type)\n\n        return terms\n\n    def _prior_bpd(self, 
x_start):\n        \"\"\"\n        Get the prior KL term for the variational lower-bound, measured in\n        bits-per-dim.\n        This term can't be optimized, as it only depends on the encoder.\n        :param x_start: the [N x C x ...] tensor of inputs.\n        :return: a batch of [N] KL values (in bits), one per batch element.\n        \"\"\"\n        batch_size = x_start.shape[0]\n        t = torch.tensor([self.num_timesteps - 1] * batch_size, device=x_start.device)\n        qt_mean, _, qt_log_variance = self.q_mean_variance(x_start, t)\n        kl_prior = normal_kl(mean1=qt_mean, logvar1=qt_log_variance, mean2=0.0, logvar2=0.0)\n        return mean_flat(kl_prior) / np.log(2.0)\n\n    def calc_bpd_loop(self, model, x_start, clip_denoised=True, model_kwargs=None):\n        \"\"\"\n        Compute the entire variational lower-bound, measured in bits-per-dim,\n        as well as other related quantities.\n        :param model: the model to evaluate loss on.\n        :param x_start: the [N x C x ...] tensor of inputs.\n        :param clip_denoised: if True, clip denoised samples.\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\n            pass to the model. 
This can be used for conditioning.\n        :return: a dict containing the following keys:\n                 - total_bpd: the total variational lower-bound, per batch element.\n                 - prior_bpd: the prior term in the lower-bound.\n                 - vb: an [N x T] tensor of terms in the lower-bound.\n                 - xstart_mse: an [N x T] tensor of x_0 MSEs for each timestep.\n                 - mse: an [N x T] tensor of epsilon MSEs for each timestep.\n        \"\"\"\n        device = x_start.device\n        batch_size = x_start.shape[0]\n\n        vb = []\n        xstart_mse = []\n        mse = []\n        for t in list(range(self.num_timesteps))[::-1]:\n            t_batch = torch.tensor([t] * batch_size, device=device)\n            noise = torch.randn_like(x_start)\n            x_t = self.q_sample(x_start=x_start, t=t_batch, noise=noise)\n            # Calculate VLB term at the current timestep\n            with torch.no_grad():\n                out = self._vb_terms_bpd(\n                    model,\n                    x_start=x_start,\n                    x_t=x_t,\n                    t=t_batch,\n                    clip_denoised=clip_denoised,\n                    model_kwargs=model_kwargs,\n                )\n            vb.append(out[\"output\"])\n            xstart_mse.append(mean_flat((out[\"pred_xstart\"] - x_start) ** 2))\n            eps = self._predict_eps_from_xstart(x_t, t_batch, out[\"pred_xstart\"])\n            mse.append(mean_flat((eps - noise) ** 2))\n\n        vb = torch.stack(vb, dim=1)\n        xstart_mse = torch.stack(xstart_mse, dim=1)\n        mse = torch.stack(mse, dim=1)\n\n        prior_bpd = self._prior_bpd(x_start)\n        total_bpd = vb.sum(dim=1) + prior_bpd\n        return {\n            \"total_bpd\": total_bpd,\n            \"prior_bpd\": prior_bpd,\n            \"vb\": vb,\n            \"xstart_mse\": xstart_mse,\n            \"mse\": mse,\n        }\n\n\ndef _extract_into_tensor(arr: torch.Tensor, timesteps: 
torch.Tensor, broadcast_shape: List[int]):\n    \"\"\"\n    Extract values from a 1-D tensor for a batch of indices.\n    :param arr: the 1-D tensor.\n    :param timesteps: a tensor of indices into the array to extract.\n    :param broadcast_shape: a larger shape of K dimensions with the batch\n                            dimension equal to the length of timesteps.\n    :return: a float tensor of shape broadcast_shape, with each extracted value\n             repeated across the non-batch dimensions.\n    \"\"\"\n    res = arr.to(timesteps.device)[timesteps].float()\n    while len(res.shape) < len(broadcast_shape):\n        res = res[..., None]\n    return res + torch.zeros(broadcast_shape, device=timesteps.device)\n"
  },
  {
    "path": "Open-Sora/opensora/schedulers/iddpm/respace.py",
    "content": "# Adapted from DiT\n\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# DiT:   https://github.com/facebookresearch/DiT/tree/main\n# GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\n# ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\n# IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\n# --------------------------------------------------------\n\n\nimport torch\nfrom colossalai.utils import get_current_device\n\nfrom .gaussian_diffusion import GaussianDiffusion\n\n\ndef space_timesteps(num_timesteps, section_counts):\n    \"\"\"\n    Create a list of timesteps to use from an original diffusion process,\n    given the number of timesteps we want to take from equally-sized portions\n    of the original process.\n    For example, if there's 300 timesteps and the section counts are [10,15,20]\n    then the first 100 timesteps are strided to be 10 timesteps, the second 100\n    are strided to be 15 timesteps, and the final 100 are strided to be 20.\n    If the stride is a string starting with \"ddim\", then the fixed striding\n    from the DDIM paper is used, and only one section is allowed.\n    :param num_timesteps: the number of diffusion steps in the original\n                          process to divide up.\n    :param section_counts: either a list of numbers, or a string containing\n                           comma-separated numbers, indicating the step count\n                           per section. 
As a special case, use \"ddimN\" where N\n                           is a number of steps to use the striding from the\n                           DDIM paper.\n    :return: a set of diffusion steps from the original process to use.\n    \"\"\"\n    if isinstance(section_counts, str):\n        if section_counts.startswith(\"ddim\"):\n            desired_count = int(section_counts[len(\"ddim\") :])\n            for i in range(1, num_timesteps):\n                if len(range(0, num_timesteps, i)) == desired_count:\n                    return set(range(0, num_timesteps, i))\n            raise ValueError(f\"cannot create exactly {desired_count} steps with an integer stride\")\n        section_counts = [int(x) for x in section_counts.split(\",\")]\n    size_per = num_timesteps // len(section_counts)\n    extra = num_timesteps % len(section_counts)\n    start_idx = 0\n    all_steps = []\n    for i, section_count in enumerate(section_counts):\n        size = size_per + (1 if i < extra else 0)\n        if size < section_count:\n            raise ValueError(f\"cannot divide section of {size} steps into {section_count}\")\n        if section_count <= 1:\n            frac_stride = 1\n        else:\n            frac_stride = (size - 1) / (section_count - 1)\n        cur_idx = 0.0\n        taken_steps = []\n        for _ in range(section_count):\n            taken_steps.append(start_idx + round(cur_idx))\n            cur_idx += frac_stride\n        all_steps += taken_steps\n        start_idx += size\n    return set(all_steps)\n\n\nclass SpacedDiffusion(GaussianDiffusion):\n    \"\"\"\n    A diffusion process which can skip steps in a base diffusion process.\n    :param use_timesteps: a collection (sequence or set) of timesteps from the\n                          original diffusion process to retain.\n    :param kwargs: the kwargs to create the base diffusion process.\n    \"\"\"\n\n    def __init__(self, use_timesteps, **kwargs):\n        self.use_timesteps = 
set(use_timesteps)\n        self.timestep_map = []\n        self.original_num_steps = len(kwargs[\"betas\"])\n\n        base_diffusion = GaussianDiffusion(**kwargs)  # pylint: disable=missing-kwoa\n        last_alpha_cumprod = 1.0\n        new_betas = []\n        for i, alpha_cumprod in enumerate(base_diffusion.alphas_cumprod):\n            if i in self.use_timesteps:\n                new_betas.append(1 - alpha_cumprod / last_alpha_cumprod)\n                last_alpha_cumprod = alpha_cumprod\n                self.timestep_map.append(i)\n        kwargs[\"betas\"] = torch.FloatTensor(new_betas)\n        super().__init__(**kwargs)\n        self.map_tensor = torch.tensor(self.timestep_map, device=get_current_device())\n\n    def p_mean_variance(self, model, *args, **kwargs):  # pylint: disable=signature-differs\n        return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)\n\n    def training_losses(self, model, *args, **kwargs):  # pylint: disable=signature-differs\n        return super().training_losses(self._wrap_model(model), *args, **kwargs)\n\n    def condition_mean(self, cond_fn, *args, **kwargs):\n        return super().condition_mean(self._wrap_model(cond_fn), *args, **kwargs)\n\n    def condition_score(self, cond_fn, *args, **kwargs):\n        return super().condition_score(self._wrap_model(cond_fn), *args, **kwargs)\n\n    def _wrap_model(self, model):\n        if isinstance(model, _WrappedModel):\n            return model\n        return _WrappedModel(model, self.map_tensor, self.original_num_steps)\n\n    def _scale_timesteps(self, t):\n        # Scaling is done by the wrapped model.\n        return t\n\n\nclass _WrappedModel:\n    def __init__(self, model, map_tensor, original_num_steps):\n        self.model = model\n        self.map_tensor = map_tensor\n        # self.rescale_timesteps = rescale_timesteps\n        self.original_num_steps = original_num_steps\n\n    def __call__(self, x, ts, **kwargs):\n        new_ts = 
self.map_tensor[ts].to(device=ts.device, dtype=ts.dtype)\n        # if self.rescale_timesteps:\n        #     new_ts = new_ts.float() * (1000.0 / self.original_num_steps)\n        return self.model(x, new_ts, **kwargs)\n"
  },
  {
    "path": "Open-Sora/opensora/schedulers/iddpm/speed.py",
    "content": "import numpy as np\nimport torch\nimport torch.nn.functional as F\n\nfrom opensora.registry import SCHEDULERS\n\nfrom . import gaussian_diffusion as gd\nfrom .respace import SpacedDiffusion, space_timesteps\n\n\n@SCHEDULERS.register_module(\"iddpm-speed\")\nclass SpeeDiffusion(SpacedDiffusion):\n    def __init__(\n        self,\n        num_sampling_steps=None,\n        timestep_respacing=None,\n        noise_schedule=\"linear\",\n        use_kl=False,\n        sigma_small=False,\n        predict_xstart=False,\n        learn_sigma=True,\n        rescale_learned_sigmas=False,\n        diffusion_steps=1000,\n        cfg_scale=4.0,\n    ):\n        betas = gd.get_named_beta_schedule(noise_schedule, diffusion_steps)\n        if use_kl:\n            loss_type = gd.LossType.RESCALED_KL\n        elif rescale_learned_sigmas:\n            loss_type = gd.LossType.RESCALED_MSE\n        else:\n            loss_type = gd.LossType.MSE\n        if num_sampling_steps is not None:\n            assert timestep_respacing is None\n            timestep_respacing = str(num_sampling_steps)\n        if timestep_respacing is None or timestep_respacing == \"\":\n            timestep_respacing = [diffusion_steps]\n        super().__init__(\n            use_timesteps=space_timesteps(diffusion_steps, timestep_respacing),\n            betas=betas,\n            model_mean_type=(gd.ModelMeanType.EPSILON if not predict_xstart else gd.ModelMeanType.START_X),\n            model_var_type=(\n                (gd.ModelVarType.FIXED_LARGE if not sigma_small else gd.ModelVarType.FIXED_SMALL)\n                if not learn_sigma\n                else gd.ModelVarType.LEARNED_RANGE\n            ),\n            loss_type=loss_type,\n        )\n\n        self.cfg_scale = cfg_scale\n        # we fallback to numpy here as argmax_cuda is not implemented for Bool\n        grad = np.gradient(self.sqrt_one_minus_alphas_cumprod.cpu())\n        self.meaningful_steps = np.argmax(grad < 5e-5) + 1\n\n      
  # p2 weighting from: Perception Prioritized Training of Diffusion Models\n        self.p2_gamma = 1\n        self.p2_k = 1\n        self.snr = 1.0 / (1 - self.alphas_cumprod) - 1\n        sqrt_one_minus_alphas_bar = self.sqrt_one_minus_alphas_cumprod\n        p = torch.tanh(1e6 * (torch.gradient(sqrt_one_minus_alphas_bar)[0] - 1e-4)) + 1.5\n        self.p = F.normalize(p, p=1, dim=0)\n        self.weights = 1 / (self.p2_k + self.snr) ** self.p2_gamma\n\n    def t_sample(self, n, device):\n        t = torch.multinomial(self.p, n // 2 + 1, replacement=True).to(device)\n        dual_t = torch.where(t < self.meaningful_steps, self.meaningful_steps - t, t - self.meaningful_steps)\n        t = torch.cat([t, dual_t], dim=0)[:n]\n        return t\n\n    def training_losses(self, model, x, *args, **kwargs):  # pylint: disable=signature-differs\n        t = self.t_sample(x.shape[0], x.device)\n        # pass t by keyword so it is not bound to the positional model_kwargs parameter\n        return super().training_losses(model, x, *args, t=t, weights=self.weights, **kwargs)\n\n    def sample(self, *args, **kwargs):\n        raise NotImplementedError(\"SpeeDiffusion is only for training\")\n"
  },
  {
    "path": "Open-Sora/opensora/schedulers/iddpm/timestep_sampler.py",
    "content": "# Adapted from DiT\n\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# DiT:   https://github.com/facebookresearch/DiT/tree/main\n# GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\n# ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\n# IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\n# --------------------------------------------------------\n\nfrom abc import ABC, abstractmethod\n\nimport numpy as np\nimport torch as th\nimport torch.distributed as dist\n\n\ndef create_named_schedule_sampler(name, diffusion):\n    \"\"\"\n    Create a ScheduleSampler from a library of pre-defined samplers.\n    :param name: the name of the sampler.\n    :param diffusion: the diffusion object to sample for.\n    \"\"\"\n    if name == \"uniform\":\n        return UniformSampler(diffusion)\n    elif name == \"loss-second-moment\":\n        return LossSecondMomentResampler(diffusion)\n    else:\n        raise NotImplementedError(f\"unknown schedule sampler: {name}\")\n\n\nclass ScheduleSampler(ABC):\n    \"\"\"\n    A distribution over timesteps in the diffusion process, intended to reduce\n    variance of the objective.\n    By default, samplers perform unbiased importance sampling, in which the\n    objective's mean is unchanged.\n    However, subclasses may override sample() to change how the resampled\n    terms are reweighted, allowing for actual changes in the objective.\n    \"\"\"\n\n    @abstractmethod\n    def weights(self):\n        \"\"\"\n        Get a numpy array of weights, one per diffusion step.\n        The weights needn't be normalized, but must be positive.\n        \"\"\"\n\n    def sample(self, batch_size, device):\n        \"\"\"\n        Importance-sample timesteps for a 
batch.\n        :param batch_size: the number of timesteps.\n        :param device: the torch device to save to.\n        :return: a tuple (timesteps, weights):\n                 - timesteps: a tensor of timestep indices.\n                 - weights: a tensor of weights to scale the resulting losses.\n        \"\"\"\n        w = self.weights()\n        p = w / np.sum(w)\n        indices_np = np.random.choice(len(p), size=(batch_size,), p=p)\n        indices = th.from_numpy(indices_np).long().to(device)\n        weights_np = 1 / (len(p) * p[indices_np])\n        weights = th.from_numpy(weights_np).float().to(device)\n        return indices, weights\n\n\nclass UniformSampler(ScheduleSampler):\n    def __init__(self, diffusion):\n        self.diffusion = diffusion\n        self._weights = np.ones([diffusion.num_timesteps])\n\n    def weights(self):\n        return self._weights\n\n\nclass LossAwareSampler(ScheduleSampler):\n    def update_with_local_losses(self, local_ts, local_losses):\n        \"\"\"\n        Update the reweighting using losses from a model.\n        Call this method from each rank with a batch of timesteps and the\n        corresponding losses for each of those timesteps.\n        This method will perform synchronization to make sure all of the ranks\n        maintain the exact same reweighting.\n        :param local_ts: an integer Tensor of timesteps.\n        :param local_losses: a 1D Tensor of losses.\n        \"\"\"\n        batch_sizes = [th.tensor([0], dtype=th.int32, device=local_ts.device) for _ in range(dist.get_world_size())]\n        dist.all_gather(\n            batch_sizes,\n            th.tensor([len(local_ts)], dtype=th.int32, device=local_ts.device),\n        )\n\n        # Pad all_gather batches to be the maximum batch size.\n        batch_sizes = [x.item() for x in batch_sizes]\n        max_bs = max(batch_sizes)\n\n        timestep_batches = [th.zeros(max_bs).to(local_ts) for bs in batch_sizes]\n        loss_batches = 
[th.zeros(max_bs).to(local_losses) for bs in batch_sizes]\n        dist.all_gather(timestep_batches, local_ts)\n        dist.all_gather(loss_batches, local_losses)\n        timesteps = [x.item() for y, bs in zip(timestep_batches, batch_sizes) for x in y[:bs]]\n        losses = [x.item() for y, bs in zip(loss_batches, batch_sizes) for x in y[:bs]]\n        self.update_with_all_losses(timesteps, losses)\n\n    @abstractmethod\n    def update_with_all_losses(self, ts, losses):\n        \"\"\"\n        Update the reweighting using losses from a model.\n        Sub-classes should override this method to update the reweighting\n        using losses from the model.\n        This method directly updates the reweighting without synchronizing\n        between workers. It is called by update_with_local_losses from all\n        ranks with identical arguments. Thus, it should have deterministic\n        behavior to maintain state across workers.\n        :param ts: a list of int timesteps.\n        :param losses: a list of float losses, one per timestep.\n        \"\"\"\n\n\nclass LossSecondMomentResampler(LossAwareSampler):\n    def __init__(self, diffusion, history_per_term=10, uniform_prob=0.001):\n        self.diffusion = diffusion\n        self.history_per_term = history_per_term\n        self.uniform_prob = uniform_prob\n        self._loss_history = np.zeros([diffusion.num_timesteps, history_per_term], dtype=np.float64)\n        # np.int was removed in NumPy 1.24; use an explicit integer dtype\n        self._loss_counts = np.zeros([diffusion.num_timesteps], dtype=np.int64)\n\n    def weights(self):\n        if not self._warmed_up():\n            return np.ones([self.diffusion.num_timesteps], dtype=np.float64)\n        weights = np.sqrt(np.mean(self._loss_history**2, axis=-1))\n        weights /= np.sum(weights)\n        weights *= 1 - self.uniform_prob\n        weights += self.uniform_prob / len(weights)\n        return weights\n\n    def update_with_all_losses(self, ts, losses):\n        for t, loss in zip(ts, losses):\n            if 
self._loss_counts[t] == self.history_per_term:\n                # Shift out the oldest loss term.\n                self._loss_history[t, :-1] = self._loss_history[t, 1:]\n                self._loss_history[t, -1] = loss\n            else:\n                self._loss_history[t, self._loss_counts[t]] = loss\n                self._loss_counts[t] += 1\n\n    def _warmed_up(self):\n        return (self._loss_counts == self.history_per_term).all()\n"
  },
  {
    "path": "Open-Sora/opensora/schedulers/rf/__init__.py",
    "content": "import torch\nfrom tqdm import tqdm\n\nfrom opensora.registry import SCHEDULERS\n\nfrom .rectified_flow import RFlowScheduler, timestep_transform\nfrom ...models.cache_functions import cache_init\nimport re\n\n@SCHEDULERS.register_module(\"rflow\")\nclass RFLOW:\n    def __init__(\n        self,\n        num_sampling_steps=10,\n        num_timesteps=1000,\n        cfg_scale=4.0,\n        use_discrete_timesteps=False,\n        use_timestep_transform=False,\n        **kwargs,\n    ):\n        self.num_sampling_steps = num_sampling_steps\n        self.num_timesteps = num_timesteps\n        self.cfg_scale = cfg_scale\n        self.use_discrete_timesteps = use_discrete_timesteps\n        self.use_timestep_transform = use_timestep_transform\n        \n        self.scheduler = RFlowScheduler(\n            num_timesteps=num_timesteps,\n            num_sampling_steps=num_sampling_steps,\n            use_discrete_timesteps=use_discrete_timesteps,\n            use_timestep_transform=use_timestep_transform,\n            **kwargs,\n        )\n\n    def sample(\n        self,\n        model,\n        text_encoder,\n        z,\n        prompts,\n        device,\n        additional_args=None,\n        mask=None,\n        guidance_scale=None,\n        progress=True,\n        #flops_cal=True,\n    ):  \n        # if no specific guidance scale is provided, use the default scale when initializing the scheduler\n        if guidance_scale is None:\n            guidance_scale = self.cfg_scale\n\n        n = len(prompts)\n        # text encoding\n        model_args = text_encoder.encode(prompts)\n        y_null = text_encoder.null(n)\n        model_args[\"y\"] = torch.cat([model_args[\"y\"], y_null], 0)\n        if additional_args is not None:\n            model_args.update(additional_args)\n        # prepare timesteps\n        timesteps = [(1.0 - i / self.num_sampling_steps) * self.num_timesteps for i in range(self.num_sampling_steps)]\n        if 
self.use_discrete_timesteps:\n            timesteps = [int(round(t)) for t in timesteps]\n        timesteps = [torch.tensor([t] * z.shape[0], device=device) for t in timesteps]\n        if self.use_timestep_transform:\n            timesteps = [timestep_transform(t, additional_args, num_timesteps=self.num_timesteps) for t in timesteps]\n\n        if mask is not None:\n            noise_added = torch.zeros_like(mask, dtype=torch.bool)\n            noise_added = noise_added | (mask == 1)\n        \n        cache_dic_cal_flops, current_cal_flops = cache_init(model_kwargs=model_args, num_steps=self.num_sampling_steps)\n        cache_dic, current = cache_init(model_kwargs=model_args, num_steps=self.num_sampling_steps)\n        flops_sum = 0\n        cal_flops = False\n        if cal_flops:\n            from calflops import calculate_flops\n        progress_wrap = tqdm if progress else (lambda x: x)\n        for i, t in progress_wrap(enumerate(timesteps)):\n            current['step'] = i\n            current_cal_flops['step'] = i\n            # mask for adding noise\n            if mask is not None:\n                mask_t = mask * self.num_timesteps\n                x0 = z.clone()\n                x_noise = self.scheduler.add_noise(x0, torch.randn_like(x0), t)\n\n                mask_t_upper = mask_t >= t.unsqueeze(1)\n                model_args[\"x_mask\"] = mask_t_upper.repeat(2, 1)\n                mask_add_noise = mask_t_upper & ~noise_added\n\n                z = torch.where(mask_add_noise[:, None, :, None, None], x_noise, x0)\n                noise_added = mask_t_upper\n\n            # classifier-free guidance\n            z_in = torch.cat([z, z], 0)\n            t = torch.cat([t, t], 0)\n            if cal_flops:\n                flop_kwargs = model_args.copy()\n                flop_kwargs['x'] = z_in.clone()\n                flop_kwargs['timestep'] = t.clone()\n                flop_kwargs['cache_dic'] = cache_dic_cal_flops\n                flop_kwargs['current'] 
= current_cal_flops\n                flops, macs, params = calculate_flops(model=model,\n                                          kwargs = flop_kwargs,\n                                          print_results=False)\n                # Convert the string to a float\n                #flops = float(re.findall(r\"[-+]?\\d*\\.\\d+|\\d+\", flops)[0])\n                match = re.findall(r\"([-+]?\\d*\\.\\d+|\\d+)\\s*([GMTP]?)FLOPS\", flops)\n                flops_value = float(match[0][0])  # extract the numeric part\n                unit = match[0][1]  # extract the magnitude prefix, e.g. G or T\n                if unit == 'G':\n                    flops = flops_value * 0.001\n                else:\n                    flops = flops_value\n                flops_sum += flops\n                \n            else:\n                pred = model(z_in, t, cache_dic=cache_dic, current=current, **model_args).chunk(2, dim=1)[0]\n                pred_cond, pred_uncond = pred.chunk(2, dim=0)\n                v_pred = pred_uncond + guidance_scale * (pred_cond - pred_uncond)\n\n                # update z\n                dt = timesteps[i] - timesteps[i + 1] if i < len(timesteps) - 1 else timesteps[i]\n                dt = dt / self.num_timesteps\n                z = z + v_pred * dt[:, None, None, None, None]\n\n                if mask is not None:\n                    z = torch.where(mask_t_upper[:, None, :, None, None], z, x0)\n        if cal_flops:\n            print(\"FLOPs:\", flops_sum, \"TFLOPs\")\n        return z\n\n    def training_losses(self, model, x_start, model_kwargs=None, noise=None, mask=None, weights=None, t=None):\n        return self.scheduler.training_losses(model, x_start, model_kwargs, noise, mask, weights, t)\n"
  },
  {
    "path": "Open-Sora/opensora/schedulers/rf/rectified_flow.py",
    "content": "import torch\nfrom torch.distributions import LogisticNormal\n\nfrom ..iddpm.gaussian_diffusion import _extract_into_tensor, mean_flat\n\n# some code are inspired by https://github.com/magic-research/piecewise-rectified-flow/blob/main/scripts/train_perflow.py\n# and https://github.com/magic-research/piecewise-rectified-flow/blob/main/src/scheduler_perflow.py\n\n\ndef timestep_transform(\n    t,\n    model_kwargs,\n    base_resolution=512 * 512,\n    base_num_frames=1,\n    scale=1.0,\n    num_timesteps=1,\n):\n    # Force fp16 input to fp32 to avoid nan output\n    for key in [\"height\", \"width\", \"num_frames\"]:\n        if model_kwargs[key].dtype == torch.float16:\n            model_kwargs[key] = model_kwargs[key].float()\n\n    t = t / num_timesteps\n    resolution = model_kwargs[\"height\"] * model_kwargs[\"width\"]\n    ratio_space = (resolution / base_resolution).sqrt()\n    # NOTE: currently, we do not take fps into account\n    # NOTE: temporal_reduction is hardcoded, this should be equal to the temporal reduction factor of the vae\n    if model_kwargs[\"num_frames\"][0] == 1:\n        num_frames = torch.ones_like(model_kwargs[\"num_frames\"])\n    else:\n        num_frames = model_kwargs[\"num_frames\"] // 17 * 5\n    ratio_time = (num_frames / base_num_frames).sqrt()\n\n    ratio = ratio_space * ratio_time * scale\n    new_t = ratio * t / (1 + (ratio - 1) * t)\n\n    new_t = new_t * num_timesteps\n    return new_t\n\n\nclass RFlowScheduler:\n    def __init__(\n        self,\n        num_timesteps=1000,\n        num_sampling_steps=10,\n        use_discrete_timesteps=False,\n        sample_method=\"uniform\",\n        loc=0.0,\n        scale=1.0,\n        use_timestep_transform=False,\n        transform_scale=1.0,\n    ):\n        self.num_timesteps = num_timesteps\n        self.num_sampling_steps = num_sampling_steps\n        self.use_discrete_timesteps = use_discrete_timesteps\n\n        # sample method\n        assert sample_method in 
[\"uniform\", \"logit-normal\"]\n        assert (\n            sample_method == \"uniform\" or not use_discrete_timesteps\n        ), \"Only uniform sampling is supported for discrete timesteps\"\n        self.sample_method = sample_method\n        if sample_method == \"logit-normal\":\n            self.distribution = LogisticNormal(torch.tensor([loc]), torch.tensor([scale]))\n            self.sample_t = lambda x: self.distribution.sample((x.shape[0],))[:, 0].to(x.device)\n\n        # timestep transform\n        self.use_timestep_transform = use_timestep_transform\n        self.transform_scale = transform_scale\n\n    def training_losses(self, model, x_start, model_kwargs=None, noise=None, mask=None, weights=None, t=None):\n        \"\"\"\n        Compute training losses for a single timestep.\n        Arguments format copied from opensora/schedulers/iddpm/gaussian_diffusion.py/training_losses\n        Note: t is int tensor and should be rescaled from [0, num_timesteps-1] to [1,0]\n        \"\"\"\n        if t is None:\n            if self.use_discrete_timesteps:\n                t = torch.randint(0, self.num_timesteps, (x_start.shape[0],), device=x_start.device)\n            elif self.sample_method == \"uniform\":\n                t = torch.rand((x_start.shape[0],), device=x_start.device) * self.num_timesteps\n            elif self.sample_method == \"logit-normal\":\n                t = self.sample_t(x_start) * self.num_timesteps\n\n            if self.use_timestep_transform:\n                t = timestep_transform(t, model_kwargs, scale=self.transform_scale, num_timesteps=self.num_timesteps)\n\n        if model_kwargs is None:\n            model_kwargs = {}\n        if noise is None:\n            noise = torch.randn_like(x_start)\n        assert noise.shape == x_start.shape\n\n        x_t = self.add_noise(x_start, noise, t)\n        if mask is not None:\n            t0 = torch.zeros_like(t)\n            x_t0 = self.add_noise(x_start, noise, t0)\n            x_t = 
torch.where(mask[:, None, :, None, None], x_t, x_t0)\n\n        terms = {}\n        model_output = model(x_t, t, **model_kwargs)\n        velocity_pred = model_output.chunk(2, dim=1)[0]\n        if weights is None:\n            loss = mean_flat((velocity_pred - (x_start - noise)).pow(2), mask=mask)\n        else:\n            weight = _extract_into_tensor(weights, t, x_start.shape)\n            loss = mean_flat(weight * (velocity_pred - (x_start - noise)).pow(2), mask=mask)\n        terms[\"loss\"] = loss\n\n        return terms\n\n    def add_noise(\n        self,\n        original_samples: torch.FloatTensor,\n        noise: torch.FloatTensor,\n        timesteps: torch.IntTensor,\n    ) -> torch.FloatTensor:\n        \"\"\"\n        compatible with diffusers add_noise()\n        \"\"\"\n        timepoints = timesteps.float() / self.num_timesteps\n        timepoints = 1 - timepoints  # [1,1/1000]\n\n        # timepoint  (bsz) noise: (bsz, 4, frame, w ,h)\n        # expand timepoint to noise shape\n        timepoints = timepoints.unsqueeze(1).unsqueeze(1).unsqueeze(1).unsqueeze(1)\n        timepoints = timepoints.repeat(1, noise.shape[1], noise.shape[2], noise.shape[3], noise.shape[4])\n\n        return timepoints * original_samples + (1 - timepoints) * noise\n"
  },
  {
    "path": "Open-Sora/opensora/utils/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/opensora/utils/ckpt_utils.py",
    "content": "import functools\nimport json\nimport operator\nimport os\nfrom typing import Tuple\n\nimport torch\nimport torch.distributed as dist\nimport torch.nn as nn\nfrom colossalai.booster import Booster\nfrom colossalai.checkpoint_io import GeneralCheckpointIO\nfrom torch.optim import Optimizer\nfrom torch.optim.lr_scheduler import _LRScheduler\nfrom torchvision.datasets.utils import download_url\n\nfrom .misc import get_logger\n\nhf_endpoint = os.environ.get(\"HF_ENDPOINT\")\nif hf_endpoint is None:\n    hf_endpoint = \"https://huggingface.co\"\n\npretrained_models = {\n    \"DiT-XL-2-512x512.pt\": \"https://dl.fbaipublicfiles.com/DiT/models/DiT-XL-2-512x512.pt\",\n    \"DiT-XL-2-256x256.pt\": \"https://dl.fbaipublicfiles.com/DiT/models/DiT-XL-2-256x256.pt\",\n    \"Latte-XL-2-256x256-ucf101.pt\": hf_endpoint + \"/maxin-cn/Latte/resolve/main/ucf101.pt\",\n    \"PixArt-XL-2-256x256.pth\": hf_endpoint + \"/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-256x256.pth\",\n    \"PixArt-XL-2-SAM-256x256.pth\": hf_endpoint + \"/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-SAM-256x256.pth\",\n    \"PixArt-XL-2-512x512.pth\": hf_endpoint + \"/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-512x512.pth\",\n    \"PixArt-XL-2-1024-MS.pth\": hf_endpoint + \"/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-1024-MS.pth\",\n    \"OpenSora-v1-16x256x256.pth\": hf_endpoint + \"/hpcai-tech/Open-Sora/resolve/main/OpenSora-v1-16x256x256.pth\",\n    \"OpenSora-v1-HQ-16x256x256.pth\": hf_endpoint + \"/hpcai-tech/Open-Sora/resolve/main/OpenSora-v1-HQ-16x256x256.pth\",\n    \"OpenSora-v1-HQ-16x512x512.pth\": hf_endpoint + \"/hpcai-tech/Open-Sora/resolve/main/OpenSora-v1-HQ-16x512x512.pth\",\n    \"PixArt-Sigma-XL-2-256x256.pth\": hf_endpoint\n    + \"/PixArt-alpha/PixArt-Sigma/resolve/main/PixArt-Sigma-XL-2-256x256.pth\",\n    \"PixArt-Sigma-XL-2-512-MS.pth\": hf_endpoint\n    + \"/PixArt-alpha/PixArt-Sigma/resolve/main/PixArt-Sigma-XL-2-512-MS.pth\",\n    
\"PixArt-Sigma-XL-2-1024-MS.pth\": hf_endpoint\n    + \"/PixArt-alpha/PixArt-Sigma/resolve/main/PixArt-Sigma-XL-2-1024-MS.pth\",\n    \"PixArt-Sigma-XL-2-2K-MS.pth\": hf_endpoint + \"/PixArt-alpha/PixArt-Sigma/resolve/main/PixArt-Sigma-XL-2-2K-MS.pth\",\n}\n\n\ndef reparameter(ckpt, name=None, model=None):\n    model_name = name\n    name = os.path.basename(name)\n    if not dist.is_initialized() or dist.get_rank() == 0:\n        get_logger().info(\"loading pretrained model: %s\", model_name)\n    if name in [\"DiT-XL-2-512x512.pt\", \"DiT-XL-2-256x256.pt\"]:\n        ckpt[\"x_embedder.proj.weight\"] = ckpt[\"x_embedder.proj.weight\"].unsqueeze(2)\n        del ckpt[\"pos_embed\"]\n    if name in [\"Latte-XL-2-256x256-ucf101.pt\"]:\n        ckpt = ckpt[\"ema\"]\n        ckpt[\"x_embedder.proj.weight\"] = ckpt[\"x_embedder.proj.weight\"].unsqueeze(2)\n        del ckpt[\"pos_embed\"]\n        del ckpt[\"temp_embed\"]\n    if name in [\n        \"PixArt-XL-2-256x256.pth\",\n        \"PixArt-XL-2-SAM-256x256.pth\",\n        \"PixArt-XL-2-512x512.pth\",\n        \"PixArt-XL-2-1024-MS.pth\",\n        \"PixArt-Sigma-XL-2-256x256.pth\",\n        \"PixArt-Sigma-XL-2-512-MS.pth\",\n        \"PixArt-Sigma-XL-2-1024-MS.pth\",\n        \"PixArt-Sigma-XL-2-2K-MS.pth\",\n    ]:\n        ckpt = ckpt[\"state_dict\"]\n        ckpt[\"x_embedder.proj.weight\"] = ckpt[\"x_embedder.proj.weight\"].unsqueeze(2)\n        if \"pos_embed\" in ckpt:\n            del ckpt[\"pos_embed\"]\n\n    if name in [\n        \"PixArt-1B-2.pth\",\n    ]:\n        ckpt = ckpt[\"state_dict\"]\n        if \"pos_embed\" in ckpt:\n            del ckpt[\"pos_embed\"]\n\n    # no need pos_embed\n    if \"pos_embed_temporal\" in ckpt:\n        del ckpt[\"pos_embed_temporal\"]\n    if \"pos_embed\" in ckpt:\n        del ckpt[\"pos_embed\"]\n    # different text length\n    if \"y_embedder.y_embedding\" in ckpt:\n        if ckpt[\"y_embedder.y_embedding\"].shape[0] < model.y_embedder.y_embedding.shape[0]:\n         
   get_logger().info(\n                \"Extend y_embedding from %s to %s\",\n                ckpt[\"y_embedder.y_embedding\"].shape[0],\n                model.y_embedder.y_embedding.shape[0],\n            )\n            additional_length = model.y_embedder.y_embedding.shape[0] - ckpt[\"y_embedder.y_embedding\"].shape[0]\n            new_y_embedding = torch.zeros(additional_length, model.y_embedder.y_embedding.shape[1])\n            new_y_embedding[:] = ckpt[\"y_embedder.y_embedding\"][-1]\n            ckpt[\"y_embedder.y_embedding\"] = torch.cat([ckpt[\"y_embedder.y_embedding\"], new_y_embedding], dim=0)\n        elif ckpt[\"y_embedder.y_embedding\"].shape[0] > model.y_embedder.y_embedding.shape[0]:\n            get_logger().info(\n                \"Shrink y_embedding from %s to %s\",\n                ckpt[\"y_embedder.y_embedding\"].shape[0],\n                model.y_embedder.y_embedding.shape[0],\n            )\n            ckpt[\"y_embedder.y_embedding\"] = ckpt[\"y_embedder.y_embedding\"][: model.y_embedder.y_embedding.shape[0]]\n    # stdit3 special case\n    if type(model).__name__ == \"STDiT3\" and \"PixArt-Sigma\" in name:\n        ckpt_keys = list(ckpt.keys())\n        for key in ckpt_keys:\n            if \"blocks.\" in key:\n                ckpt[key.replace(\"blocks.\", \"spatial_blocks.\")] = ckpt[key]\n                del ckpt[key]\n\n    return ckpt\n\n\ndef find_model(model_name, model=None):\n    \"\"\"\n    Finds a pre-trained DiT model, downloading it if necessary. 
Alternatively, loads a model from a local path.\n    \"\"\"\n    if model_name in pretrained_models:  # Find/download our pre-trained DiT checkpoints\n        model_ckpt = download_model(model_name)\n        model_ckpt = reparameter(model_ckpt, model_name, model=model)\n    else:  # Load a custom DiT checkpoint:\n        assert os.path.isfile(model_name), f\"Could not find DiT checkpoint at {model_name}\"\n        model_ckpt = torch.load(model_name, map_location=lambda storage, loc: storage)\n        model_ckpt = reparameter(model_ckpt, model_name, model=model)\n    return model_ckpt\n\n\ndef download_model(model_name=None, local_path=None, url=None):\n    \"\"\"\n    Downloads a pre-trained DiT model from the web.\n    \"\"\"\n    if model_name is not None:\n        assert model_name in pretrained_models\n        local_path = f\"pretrained_models/{model_name}\"\n        web_path = pretrained_models[model_name]\n    else:\n        assert local_path is not None\n        assert url is not None\n        web_path = url\n    if not os.path.isfile(local_path):\n        os.makedirs(\"pretrained_models\", exist_ok=True)\n        dir_name = os.path.dirname(local_path)\n        file_name = os.path.basename(local_path)\n        download_url(web_path, dir_name, file_name)\n    model = torch.load(local_path, map_location=lambda storage, loc: storage)\n    return model\n\n\ndef load_from_sharded_state_dict(model, ckpt_path, model_name=\"model.safetensors\", strict=False):\n    ckpt_io = GeneralCheckpointIO()\n    ckpt_io.load_model(model, os.path.join(ckpt_path, model_name), strict=strict)\n\n\ndef model_sharding(model: torch.nn.Module):\n    global_rank = dist.get_rank()\n    world_size = dist.get_world_size()\n    for _, param in model.named_parameters():\n        padding_size = (world_size - param.numel() % world_size) % world_size\n        if padding_size > 0:\n            padding_param = torch.nn.functional.pad(param.data.view(-1), [0, padding_size])\n        else:\n        
    padding_param = param.data.view(-1)\n        splited_params = padding_param.split(padding_param.numel() // world_size)\n        splited_params = splited_params[global_rank]\n        param.data = splited_params\n\n\ndef model_gathering(model: torch.nn.Module, model_shape_dict: dict):\n    global_rank = dist.get_rank()\n    global_size = dist.get_world_size()\n    for name, param in model.named_parameters():\n        all_params = [torch.empty_like(param.data) for _ in range(global_size)]\n        dist.all_gather(all_params, param.data, group=dist.group.WORLD)\n        if int(global_rank) == 0:\n            all_params = torch.cat(all_params)\n            param.data = remove_padding(all_params, model_shape_dict[name]).view(model_shape_dict[name])\n    dist.barrier()\n\n\ndef remove_padding(tensor: torch.Tensor, original_shape: Tuple) -> torch.Tensor:\n    return tensor[: functools.reduce(operator.mul, original_shape)]\n\n\ndef record_model_param_shape(model: torch.nn.Module) -> dict:\n    param_shape = {}\n    for name, param in model.named_parameters():\n        param_shape[name] = param.shape\n    return param_shape\n\n\ndef load_checkpoint(model, ckpt_path, save_as_pt=False, model_name=\"model.safetensors\", strict=False):\n    if ckpt_path.endswith(\".pt\") or ckpt_path.endswith(\".pth\"):\n        state_dict = find_model(ckpt_path, model=model)\n        missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=strict)\n        get_logger().info(\"Missing keys: %s\", missing_keys)\n        get_logger().info(\"Unexpected keys: %s\", unexpected_keys)\n    elif ckpt_path.endswith(\".safetensors\"):\n        from safetensors.torch import load_file\n        state_dict = load_file(ckpt_path)\n        missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False)\n        print(f\"Missing keys: {missing_keys}\")\n        print(f\"Unexpected keys: {unexpected_keys}\")\n    elif os.path.isdir(ckpt_path):\n        
load_from_sharded_state_dict(model, ckpt_path, model_name, strict=strict)\n        get_logger().info(\"Model checkpoint loaded from %s\", ckpt_path)\n        if save_as_pt:\n            save_path = os.path.join(ckpt_path, model_name + \"_ckpt.pt\")\n            torch.save(model.state_dict(), save_path)\n            get_logger().info(\"Model checkpoint saved to %s\", save_path)\n    else:\n        raise ValueError(f\"Invalid checkpoint path: {ckpt_path}\")\n\n\ndef load_json(file_path: str):\n    with open(file_path, \"r\") as f:\n        return json.load(f)\n\n\ndef save_json(data, file_path: str):\n    with open(file_path, \"w\") as f:\n        json.dump(data, f, indent=4)\n\n\n# save and load for training\n\n\ndef save(\n    booster: Booster,\n    save_dir: str,\n    model: nn.Module = None,\n    ema: nn.Module = None,\n    optimizer: Optimizer = None,\n    lr_scheduler: _LRScheduler = None,\n    sampler=None,\n    epoch: int = None,\n    step: int = None,\n    global_step: int = None,\n    batch_size: int = None,\n):\n    save_dir = os.path.join(save_dir, f\"epoch{epoch}-global_step{global_step}\")\n    os.makedirs(os.path.join(save_dir, \"model\"), exist_ok=True)\n\n    if model is not None:\n        booster.save_model(model, os.path.join(save_dir, \"model\"), shard=True)\n    if optimizer is not None:\n        booster.save_optimizer(optimizer, os.path.join(save_dir, \"optimizer\"), shard=True, size_per_shard=4096)\n    if lr_scheduler is not None:\n        booster.save_lr_scheduler(lr_scheduler, os.path.join(save_dir, \"lr_scheduler\"))\n    if dist.get_rank() == 0:\n        running_states = {\n            \"epoch\": epoch,\n            \"step\": step,\n            \"global_step\": global_step,\n            \"batch_size\": batch_size,\n        }\n        save_json(running_states, os.path.join(save_dir, \"running_states.json\"))\n\n        if ema is not None:\n            torch.save(ema.state_dict(), os.path.join(save_dir, \"ema.pt\"))\n\n        if sampler is 
not None:\n            # only for VariableVideoBatchSampler\n            torch.save(sampler.state_dict(step), os.path.join(save_dir, \"sampler\"))\n    dist.barrier()\n    return save_dir\n\n\ndef load(\n    booster: Booster,\n    load_dir: str,\n    model: nn.Module = None,\n    ema: nn.Module = None,\n    optimizer: Optimizer = None,\n    lr_scheduler: _LRScheduler = None,\n    sampler=None,\n) -> Tuple[int, int]:\n    assert os.path.exists(load_dir), f\"Checkpoint directory {load_dir} does not exist\"\n    assert os.path.exists(os.path.join(load_dir, \"running_states.json\")), \"running_states.json does not exist\"\n    running_states = load_json(os.path.join(load_dir, \"running_states.json\"))\n    if model is not None:\n        booster.load_model(model, os.path.join(load_dir, \"model\"))\n    if ema is not None:\n        # ema is not boosted, so we don't use booster.load_model\n        ema.load_state_dict(\n            torch.load(os.path.join(load_dir, \"ema.pt\"), map_location=torch.device(\"cpu\")),\n            strict=False,\n        )\n    if optimizer is not None:\n        booster.load_optimizer(optimizer, os.path.join(load_dir, \"optimizer\"))\n    if lr_scheduler is not None:\n        booster.load_lr_scheduler(lr_scheduler, os.path.join(load_dir, \"lr_scheduler\"))\n    if sampler is not None:\n        sampler.load_state_dict(torch.load(os.path.join(load_dir, \"sampler\")))\n    dist.barrier()\n\n    return (\n        running_states[\"epoch\"],\n        running_states[\"step\"],\n    )\n"
  },
  {
    "path": "Open-Sora/opensora/utils/config_utils.py",
    "content": "import argparse\nimport json\nimport os\nfrom glob import glob\n\nfrom mmengine.config import Config\n\n\ndef parse_args(training=False):\n    parser = argparse.ArgumentParser()\n\n    # model config\n    parser.add_argument(\"config\", help=\"model config file path\")\n\n    # ======================================================\n    # General\n    # ======================================================\n    parser.add_argument(\"--seed\", default=None, type=int, help=\"seed for reproducibility\")\n    parser.add_argument(\n        \"--ckpt-path\",\n        default=None,\n        type=str,\n        help=\"path to model ckpt; will overwrite cfg.model.from_pretrained if specified\",\n    )\n    parser.add_argument(\"--batch-size\", default=None, type=int, help=\"batch size\")\n    parser.add_argument(\"--outputs\", default=None, type=str, help=\"the dir to save model weights\")\n    parser.add_argument(\"--flash-attn\", default=None, type=str2bool, help=\"enable flash attention\")\n    parser.add_argument(\"--layernorm-kernel\", default=None, type=str2bool, help=\"enable layernorm kernel\")\n    parser.add_argument(\"--resolution\", default=None, type=str, help=\"multi resolution\")\n    parser.add_argument(\"--data-path\", default=None, type=str, help=\"path to data csv\")\n    parser.add_argument(\"--dtype\", default=None, type=str, help=\"data type\")\n\n    # ======================================================\n    # Inference\n    # ======================================================\n    if not training:\n        # output\n        parser.add_argument(\"--save-dir\", default=None, type=str, help=\"path to save generated samples\")\n        parser.add_argument(\"--sample-name\", default=None, type=str, help=\"sample name, default is sample_idx\")\n        parser.add_argument(\"--start-index\", default=None, type=int, help=\"start index for sample name\")\n        parser.add_argument(\"--end-index\", default=None, type=int, help=\"end 
index for sample name\")\n        parser.add_argument(\"--num-sample\", default=None, type=int, help=\"number of samples to generate for one prompt\")\n        parser.add_argument(\"--prompt-as-path\", action=\"store_true\", help=\"use prompt as path to save samples\")\n        parser.add_argument(\"--verbose\", default=None, type=int, help=\"verbose level\")\n\n        # prompt\n        parser.add_argument(\"--prompt-path\", default=None, type=str, help=\"path to prompt txt file\")\n        parser.add_argument(\"--prompt\", default=None, type=str, nargs=\"+\", help=\"prompt list\")\n        parser.add_argument(\"--llm-refine\", default=None, type=str2bool, help=\"enable LLM refine\")\n        parser.add_argument(\"--prompt-generator\", default=None, type=str, help=\"prompt generator\")\n\n        # image/video\n        parser.add_argument(\"--num-frames\", default=None, type=str, help=\"number of frames\")\n        parser.add_argument(\"--fps\", default=None, type=int, help=\"fps\")\n        parser.add_argument(\"--save-fps\", default=None, type=int, help=\"save fps\")\n        parser.add_argument(\"--image-size\", default=None, type=int, nargs=2, help=\"image size\")\n        parser.add_argument(\"--frame-interval\", default=None, type=int, help=\"frame interval\")\n        parser.add_argument(\"--aspect-ratio\", default=None, type=str, help=\"aspect ratio (h:w)\")\n        parser.add_argument(\"--watermark\", default=None, type=str2bool, help=\"watermark video\")\n\n        # hyperparameters\n        parser.add_argument(\"--num-sampling-steps\", default=None, type=int, help=\"sampling steps\")\n        parser.add_argument(\"--cfg-scale\", default=None, type=float, help=\"balance between cond & uncond\")\n\n        # reference\n        parser.add_argument(\"--loop\", default=None, type=int, help=\"loop\")\n        parser.add_argument(\"--condition-frame-length\", default=None, type=int, help=\"condition frame length\")\n        
parser.add_argument(\"--reference-path\", default=None, type=str, nargs=\"+\", help=\"reference path\")\n        parser.add_argument(\"--mask-strategy\", default=None, type=str, nargs=\"+\", help=\"mask strategy\")\n        parser.add_argument(\"--aes\", default=None, type=float, help=\"aesthetic score\")\n        parser.add_argument(\"--flow\", default=None, type=float, help=\"flow score\")\n        parser.add_argument(\"--camera-motion\", default=None, type=str, help=\"camera motion\")\n    # ======================================================\n    # Training\n    # ======================================================\n    else:\n        parser.add_argument(\"--lr\", default=None, type=float, help=\"learning rate\")\n        parser.add_argument(\"--wandb\", default=None, type=str2bool, help=\"enable wandb\")\n        parser.add_argument(\"--load\", default=None, type=str, help=\"path to continue training\")\n        parser.add_argument(\"--start-from-scratch\", action=\"store_true\", help=\"start training from scratch\")\n        parser.add_argument(\"--warmup-steps\", default=None, type=int, help=\"warmup steps\")\n        parser.add_argument(\"--record-time\", default=False, action=\"store_true\", help=\"record time of each part\")\n\n    return parser.parse_args()\n\n\ndef merge_args(cfg, args, training=False):\n    if args.ckpt_path is not None:\n        cfg.model[\"from_pretrained\"] = args.ckpt_path\n        if cfg.get(\"discriminator\") is not None:\n            cfg.discriminator[\"from_pretrained\"] = args.ckpt_path\n        args.ckpt_path = None\n    if args.flash_attn is not None:\n        cfg.model[\"enable_flash_attn\"] = args.flash_attn\n        args.enable_flash_attn = None\n    if args.layernorm_kernel is not None:\n        cfg.model[\"enable_layernorm_kernel\"] = args.layernorm_kernel\n        args.enable_layernorm_kernel = None\n    if args.data_path is not None:\n        cfg.dataset[\"data_path\"] = args.data_path\n        args.data_path = 
None\n    # NOTE: for vae inference (reconstruction)\n    if not training and \"dataset\" in cfg:\n        if args.image_size is not None:\n            cfg.dataset[\"image_size\"] = args.image_size\n        if args.num_frames is not None:\n            cfg.dataset[\"num_frames\"] = args.num_frames\n    if not training:\n        if args.cfg_scale is not None:\n            cfg.scheduler[\"cfg_scale\"] = args.cfg_scale\n            args.cfg_scale = None\n        if args.num_sampling_steps is not None:\n            cfg.scheduler[\"num_sampling_steps\"] = args.num_sampling_steps\n            args.num_sampling_steps = None\n\n    for k, v in vars(args).items():\n        if v is not None:\n            cfg[k] = v\n\n    return cfg\n\n\ndef read_config(config_path):\n    cfg = Config.fromfile(config_path)\n    return cfg\n\n\ndef parse_configs(training=False):\n    args = parse_args(training)\n    cfg = read_config(args.config)\n    cfg = merge_args(cfg, args, training)\n    return cfg\n\n\ndef define_experiment_workspace(cfg, get_last_workspace=False):\n    \"\"\"\n    This function creates a folder for experiment tracking.\n\n    Args:\n        args: The parsed arguments.\n\n    Returns:\n        exp_dir: The path to the experiment folder.\n    \"\"\"\n    # Make outputs folder (holds all experiment subfolders)\n    os.makedirs(cfg.outputs, exist_ok=True)\n    experiment_index = len(glob(f\"{cfg.outputs}/*\"))\n    if get_last_workspace:\n        experiment_index -= 1\n\n    # Create an experiment folder\n    model_name = cfg.model[\"type\"].replace(\"/\", \"-\")\n    exp_name = f\"{experiment_index:03d}-{model_name}\"\n    exp_dir = f\"{cfg.outputs}/{exp_name}\"\n    return exp_name, exp_dir\n\n\ndef save_training_config(cfg, experiment_dir):\n    with open(f\"{experiment_dir}/config.txt\", \"w\") as f:\n        json.dump(cfg, f, indent=4)\n\n\ndef str2bool(v):\n    if isinstance(v, bool):\n        return v\n    if v.lower() in (\"yes\", \"true\", \"t\", \"y\", \"1\"):\n  
      return True\n    elif v.lower() in (\"no\", \"false\", \"f\", \"n\", \"0\"):\n        return False\n    else:\n        raise argparse.ArgumentTypeError(\"Boolean value expected.\")\n"
  },
  {
    "path": "Open-Sora/opensora/utils/inference_utils.py",
    "content": "import json\nimport os\nimport re\n\nimport torch\n\nfrom opensora.datasets import IMG_FPS\nfrom opensora.datasets.utils import read_from_path\n\n\ndef prepare_multi_resolution_info(info_type, batch_size, image_size, num_frames, fps, device, dtype):\n    if info_type is None:\n        return dict()\n    elif info_type == \"PixArtMS\":\n        hw = torch.tensor([image_size], device=device, dtype=dtype).repeat(batch_size, 1)\n        ar = torch.tensor([[image_size[0] / image_size[1]]], device=device, dtype=dtype).repeat(batch_size, 1)\n        return dict(ar=ar, hw=hw)\n    elif info_type in [\"STDiT2\", \"OpenSora\"]:\n        fps = fps if num_frames > 1 else IMG_FPS\n        fps = torch.tensor([fps], device=device, dtype=dtype).repeat(batch_size)\n        height = torch.tensor([image_size[0]], device=device, dtype=dtype).repeat(batch_size)\n        width = torch.tensor([image_size[1]], device=device, dtype=dtype).repeat(batch_size)\n        num_frames = torch.tensor([num_frames], device=device, dtype=dtype).repeat(batch_size)\n        ar = torch.tensor([image_size[0] / image_size[1]], device=device, dtype=dtype).repeat(batch_size)\n        return dict(height=height, width=width, num_frames=num_frames, ar=ar, fps=fps)\n    else:\n        raise NotImplementedError\n\n\ndef load_prompts(prompt_path, start_idx=None, end_idx=None):\n    with open(prompt_path, \"r\") as f:\n        prompts = [line.strip() for line in f.readlines()]\n    prompts = prompts[start_idx:end_idx]\n    return prompts\n\n\ndef get_save_path_name(\n    save_dir,\n    sample_name=None,  # prefix\n    sample_idx=None,  # sample index\n    prompt=None,  # used prompt\n    prompt_as_path=False,  # use prompt as path\n    num_sample=1,  # number of samples to generate for one prompt\n    k=None,  # kth sample\n):\n    if sample_name is None:\n        sample_name = \"\" if prompt_as_path else \"sample\"\n    sample_name_suffix = prompt if prompt_as_path else f\"_{sample_idx:04d}\"\n    
save_path = os.path.join(save_dir, f\"{sample_name}{sample_name_suffix}\")\n    if num_sample != 1:\n        save_path = f\"{save_path}-{k}\"\n    return save_path\n\n\ndef append_score_to_prompts(prompts, aes=None, flow=None, camera_motion=None):\n    new_prompts = []\n    for prompt in prompts:\n        new_prompt = prompt\n        if aes is not None and \"aesthetic score:\" not in prompt:\n            new_prompt = f\"{new_prompt} aesthetic score: {aes:.1f}.\"\n        if flow is not None and \"motion score:\" not in prompt:\n            new_prompt = f\"{new_prompt} motion score: {flow:.1f}.\"\n        if camera_motion is not None and \"camera motion:\" not in prompt:\n            new_prompt = f\"{new_prompt} camera motion: {camera_motion}.\"\n        new_prompts.append(new_prompt)\n    return new_prompts\n\n\ndef extract_json_from_prompts(prompts, reference, mask_strategy):\n    ret_prompts = []\n    for i, prompt in enumerate(prompts):\n        parts = re.split(r\"(?=[{])\", prompt)\n        assert len(parts) <= 2, f\"Invalid prompt: {prompt}\"\n        ret_prompts.append(parts[0])\n        if len(parts) > 1:\n            additional_info = json.loads(parts[1])\n            for key in additional_info:\n                assert key in [\"reference_path\", \"mask_strategy\"], f\"Invalid key: {key}\"\n                if key == \"reference_path\":\n                    reference[i] = additional_info[key]\n                elif key == \"mask_strategy\":\n                    mask_strategy[i] = additional_info[key]\n    return ret_prompts, reference, mask_strategy\n\n\ndef collect_references_batch(reference_paths, vae, image_size):\n    refs_x = []  # refs_x: [batch, ref_num, C, T, H, W]\n    for reference_path in reference_paths:\n        if reference_path == \"\":\n            refs_x.append([])\n            continue\n        ref_path = reference_path.split(\";\")\n        ref = []\n        for r_path in ref_path:\n            r = read_from_path(r_path, image_size, 
transform_name=\"resize_crop\")\n            r_x = vae.encode(r.unsqueeze(0).to(vae.device, vae.dtype))\n            r_x = r_x.squeeze(0)\n            ref.append(r_x)\n        refs_x.append(ref)\n    return refs_x\n\n\ndef extract_prompts_loop(prompts, num_loop):\n    ret_prompts = []\n    for prompt in prompts:\n        if prompt.startswith(\"|0|\"):\n            prompt_list = prompt.split(\"|\")[1:]\n            text_list = []\n            for i in range(0, len(prompt_list), 2):\n                start_loop = int(prompt_list[i])\n                text = prompt_list[i + 1]\n                end_loop = int(prompt_list[i + 2]) if i + 2 < len(prompt_list) else num_loop + 1\n                text_list.extend([text] * (end_loop - start_loop))\n            prompt = text_list[num_loop]\n        ret_prompts.append(prompt)\n    return ret_prompts\n\n\ndef split_prompt(prompt_text):\n    if prompt_text.startswith(\"|0|\"):\n        # this is for prompts which look like\n        # |0| a beautiful day |1| a sunny day |2| a rainy day\n        # we want to parse it into a list of prompts with the loop index\n        prompt_list = prompt_text.split(\"|\")[1:]\n        text_list = []\n        loop_idx = []\n        for i in range(0, len(prompt_list), 2):\n            start_loop = int(prompt_list[i])\n            text = prompt_list[i + 1].strip()\n            text_list.append(text)\n            loop_idx.append(start_loop)\n        return text_list, loop_idx\n    else:\n        return [prompt_text], None\n\n\ndef merge_prompt(text_list, loop_idx_list=None):\n    if loop_idx_list is None:\n        return text_list[0]\n    else:\n        prompt = \"\"\n        for i, text in enumerate(text_list):\n            prompt += f\"|{loop_idx_list[i]}|{text}\"\n        return prompt\n\n\nMASK_DEFAULT = [\"0\", \"0\", \"0\", \"0\", \"1\", \"0\"]\n\n\ndef parse_mask_strategy(mask_strategy):\n    mask_batch = []\n    if mask_strategy == \"\" or mask_strategy is None:\n        return mask_batch\n\n    
mask_strategy = mask_strategy.split(\";\")\n    for mask in mask_strategy:\n        mask_group = mask.split(\",\")\n        num_group = len(mask_group)\n        assert num_group >= 1 and num_group <= 6, f\"Invalid mask strategy: {mask}\"\n        mask_group.extend(MASK_DEFAULT[num_group:])\n        for i in range(5):\n            mask_group[i] = int(mask_group[i])\n        mask_group[5] = float(mask_group[5])\n        mask_batch.append(mask_group)\n    return mask_batch\n\n\ndef find_nearest_point(value, point, max_value):\n    t = value // point\n    if value % point > point / 2 and t < max_value // point - 1:\n        t += 1\n    return t * point\n\n\ndef apply_mask_strategy(z, refs_x, mask_strategys, loop_i, align=None):\n    masks = []\n    no_mask = True\n    for i, mask_strategy in enumerate(mask_strategys):\n        no_mask = False\n        mask = torch.ones(z.shape[2], dtype=torch.float, device=z.device)\n        mask_strategy = parse_mask_strategy(mask_strategy)\n        for mst in mask_strategy:\n            loop_id, m_id, m_ref_start, m_target_start, m_length, edit_ratio = mst\n            if loop_id != loop_i:\n                continue\n            ref = refs_x[i][m_id]\n\n            if m_ref_start < 0:\n                # ref: [C, T, H, W]\n                m_ref_start = ref.shape[1] + m_ref_start\n            if m_target_start < 0:\n                # z: [B, C, T, H, W]\n                m_target_start = z.shape[2] + m_target_start\n            if align is not None:\n                m_ref_start = find_nearest_point(m_ref_start, align, ref.shape[1])\n                m_target_start = find_nearest_point(m_target_start, align, z.shape[2])\n            m_length = min(m_length, z.shape[2] - m_target_start, ref.shape[1] - m_ref_start)\n            z[i, :, m_target_start : m_target_start + m_length] = ref[:, m_ref_start : m_ref_start + m_length]\n            mask[m_target_start : m_target_start + m_length] = edit_ratio\n        masks.append(mask)\n    if 
no_mask:\n        return None\n    masks = torch.stack(masks)\n    return masks\n\n\ndef append_generated(vae, generated_video, refs_x, mask_strategy, loop_i, condition_frame_length, condition_frame_edit):\n    ref_x = vae.encode(generated_video)\n    for j, refs in enumerate(refs_x):\n        if refs is None:\n            refs_x[j] = [ref_x[j]]\n        else:\n            refs.append(ref_x[j])\n        if mask_strategy[j] is None or mask_strategy[j] == \"\":\n            mask_strategy[j] = \"\"\n        else:\n            mask_strategy[j] += \";\"\n        mask_strategy[\n            j\n        ] += f\"{loop_i},{len(refs)-1},-{condition_frame_length},0,{condition_frame_length},{condition_frame_edit}\"\n    return refs_x, mask_strategy\n\n\ndef dframe_to_frame(num):\n    assert num % 5 == 0, f\"Invalid num: {num}\"\n    return num // 5 * 17\n\n\nOPENAI_CLIENT = None\nREFINE_PROMPTS = None\nREFINE_PROMPTS_PATH = \"assets/texts/t2v_pllava.txt\"\nREFINE_PROMPTS_TEMPLATE = \"\"\"\nYou need to refine user's input prompt. The user's input prompt is used for video generation task. You need to refine the user's prompt to make it more suitable for the task. Here are some examples of refined prompts:\n{}\n\nThe refined prompt should pay attention to all objects in the video. The description should be useful for AI to re-generate the video. The description should be no more than six sentences. The refined prompt should be in English.\n\"\"\"\nRANDOM_PROMPTS = None\nRANDOM_PROMPTS_TEMPLATE = \"\"\"\nYou need to generate one input prompt for video generation task. The prompt should be suitable for the task. Here are some examples of refined prompts:\n{}\n\nThe prompt should pay attention to all objects in the video. The description should be useful for AI to re-generate the video. The description should be no more than six sentences. 
The prompt should be in English.\n\"\"\"\n\n\ndef get_openai_response(sys_prompt, usr_prompt, model=\"gpt-4o\"):\n    global OPENAI_CLIENT\n    if OPENAI_CLIENT is None:\n        from openai import OpenAI\n\n        OPENAI_CLIENT = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\"))\n\n    completion = OPENAI_CLIENT.chat.completions.create(\n        model=model,\n        messages=[\n            {\n                \"role\": \"system\",\n                \"content\": sys_prompt,\n            },  # <-- This is the system message that provides context to the model\n            {\n                \"role\": \"user\",\n                \"content\": usr_prompt,\n            },  # <-- This is the user message for which the model will generate a response\n        ],\n    )\n\n    return completion.choices[0].message.content\n\n\ndef get_random_prompt_by_openai():\n    global RANDOM_PROMPTS\n    if RANDOM_PROMPTS is None:\n        examples = load_prompts(REFINE_PROMPTS_PATH)\n        RANDOM_PROMPTS = RANDOM_PROMPTS_TEMPLATE.format(\"\\n\".join(examples))\n\n    response = get_openai_response(RANDOM_PROMPTS, \"Generate one example.\")\n    return response\n\n\ndef refine_prompt_by_openai(prompt):\n    global REFINE_PROMPTS\n    if REFINE_PROMPTS is None:\n        examples = load_prompts(REFINE_PROMPTS_PATH)\n        REFINE_PROMPTS = REFINE_PROMPTS_TEMPLATE.format(\"\\n\".join(examples))\n\n    response = get_openai_response(REFINE_PROMPTS, prompt)\n    return response\n\n\ndef has_openai_key():\n    return \"OPENAI_API_KEY\" in os.environ\n\n\ndef refine_prompts_by_openai(prompts):\n    new_prompts = []\n    for prompt in prompts:\n        try:\n            if prompt.strip() == \"\":\n                new_prompt = get_random_prompt_by_openai()\n                print(f\"[Info] Empty prompt detected, generate random prompt: {new_prompt}\")\n            else:\n                new_prompt = refine_prompt_by_openai(prompt)\n                print(f\"[Info] Refine prompt: {prompt} -> 
{new_prompt}\")\n            new_prompts.append(new_prompt)\n        except Exception as e:\n            print(f\"[Warning] Failed to refine prompt: {prompt} due to {e}\")\n            new_prompts.append(prompt)\n    return new_prompts\n\n\ndef add_watermark(\n    input_video_path, watermark_image_path=\"./assets/images/watermark/watermark.png\", output_video_path=None\n):\n    # execute this command in terminal with subprocess\n    # return if the process is successful\n    if output_video_path is None:\n        output_video_path = input_video_path.replace(\".mp4\", \"_watermark.mp4\")\n    cmd = f'ffmpeg -y -i {input_video_path} -i {watermark_image_path} -filter_complex \"[1][0]scale2ref=oh*mdar:ih*0.1[logo][video];[video][logo]overlay\" {output_video_path}'\n    exit_code = os.system(cmd)\n    is_success = exit_code == 0\n    return is_success\n"
  },
  {
    "path": "Open-Sora/opensora/utils/lr_scheduler.py",
    "content": "from torch.optim.lr_scheduler import _LRScheduler\n\n\nclass LinearWarmupLR(_LRScheduler):\n    \"\"\"Linearly warmup learning rate and then linearly decay.\n\n    Args:\n        optimizer (:class:`torch.optim.Optimizer`): Wrapped optimizer.\n        warmup_steps (int, optional): Number of warmup steps, defaults to 0\n        last_step (int, optional): The index of last step, defaults to -1. When last_step=-1,\n            the schedule is started from the beginning or When last_step=-1, sets initial lr as lr.\n    \"\"\"\n\n    def __init__(self, optimizer, warmup_steps: int = 0, last_epoch: int = -1):\n        self.warmup_steps = warmup_steps\n        super().__init__(optimizer, last_epoch=last_epoch)\n\n    def get_lr(self):\n        if self.last_epoch < self.warmup_steps:\n            return [(self.last_epoch + 1) / (self.warmup_steps + 1) * lr for lr in self.base_lrs]\n        else:\n            return self.base_lrs\n"
  },
  {
    "path": "Open-Sora/opensora/utils/misc.py",
    "content": "import collections\nimport importlib\nimport logging\nimport os\nimport time\nfrom collections import OrderedDict\nfrom collections.abc import Sequence\nfrom itertools import repeat\nfrom typing import Optional, Tuple\n\nimport numpy as np\nimport torch\nimport torch.distributed as dist\nfrom colossalai.cluster.dist_coordinator import DistCoordinator\n\n# ======================================================\n# Logging\n# ======================================================\n\n\ndef is_distributed():\n    return os.environ.get(\"WORLD_SIZE\", None) is not None\n\n\ndef is_main_process():\n    return not is_distributed() or dist.get_rank() == 0\n\n\ndef get_world_size():\n    if is_distributed():\n        return dist.get_world_size()\n    else:\n        return 1\n\n\ndef create_logger(logging_dir=None):\n    \"\"\"\n    Create a logger that writes to a log file and stdout.\n    \"\"\"\n    if is_main_process():  # real logger\n        additional_args = dict()\n        if logging_dir is not None:\n            additional_args[\"handlers\"] = [\n                logging.StreamHandler(),\n                logging.FileHandler(f\"{logging_dir}/log.txt\"),\n            ]\n        logging.basicConfig(\n            level=logging.INFO,\n            format=\"[\\033[34m%(asctime)s\\033[0m] %(message)s\",\n            datefmt=\"%Y-%m-%d %H:%M:%S\",\n            **additional_args,\n        )\n        logger = logging.getLogger(__name__)\n    else:  # dummy logger (does nothing)\n        logger = logging.getLogger(__name__)\n        logger.addHandler(logging.NullHandler())\n    return logger\n\n\ndef get_logger():\n    return logging.getLogger(__name__)\n\n\ndef print_rank(var_name, var_value, rank=0):\n    if dist.get_rank() == rank:\n        print(f\"[Rank {rank}] {var_name}: {var_value}\")\n\n\ndef print_0(*args, **kwargs):\n    if dist.get_rank() == 0:\n        print(*args, **kwargs)\n\n\ndef create_tensorboard_writer(exp_dir):\n    from 
torch.utils.tensorboard import SummaryWriter\n\n    tensorboard_dir = f\"{exp_dir}/tensorboard\"\n    os.makedirs(tensorboard_dir, exist_ok=True)\n    writer = SummaryWriter(tensorboard_dir)\n    return writer\n\n\n# ======================================================\n# String\n# ======================================================\n\n\ndef format_numel_str(numel: int) -> str:\n    B = 1024**3\n    M = 1024**2\n    K = 1024\n    if numel >= B:\n        return f\"{numel / B:.2f} B\"\n    elif numel >= M:\n        return f\"{numel / M:.2f} M\"\n    elif numel >= K:\n        return f\"{numel / K:.2f} K\"\n    else:\n        return f\"{numel}\"\n\n\ndef get_timestamp():\n    timestamp = time.strftime(\"%Y%m%d-%H%M%S\", time.localtime(time.time()))\n    return timestamp\n\n\ndef format_time(seconds):\n    days = int(seconds / 3600 / 24)\n    seconds = seconds - days * 3600 * 24\n    hours = int(seconds / 3600)\n    seconds = seconds - hours * 3600\n    minutes = int(seconds / 60)\n    seconds = seconds - minutes * 60\n    secondsf = int(seconds)\n    seconds = seconds - secondsf\n    millis = int(seconds * 1000)\n\n    f = \"\"\n    i = 1\n    if days > 0:\n        f += str(days) + \"D\"\n        i += 1\n    if hours > 0 and i <= 2:\n        f += str(hours) + \"h\"\n        i += 1\n    if minutes > 0 and i <= 2:\n        f += str(minutes) + \"m\"\n        i += 1\n    if secondsf > 0 and i <= 2:\n        f += str(secondsf) + \"s\"\n        i += 1\n    if millis > 0 and i <= 2:\n        f += str(millis) + \"ms\"\n        i += 1\n    if f == \"\":\n        f = \"0ms\"\n    return f\n\n\nclass BColors:\n    HEADER = \"\\033[95m\"\n    OKBLUE = \"\\033[94m\"\n    OKCYAN = \"\\033[96m\"\n    OKGREEN = \"\\033[92m\"\n    WARNING = \"\\033[93m\"\n    FAIL = \"\\033[91m\"\n    ENDC = \"\\033[0m\"\n    BOLD = \"\\033[1m\"\n    UNDERLINE = \"\\033[4m\"\n\n\n# ======================================================\n# PyTorch\n# 
======================================================\n\n\ndef requires_grad(model: torch.nn.Module, flag: bool = True) -> None:\n    \"\"\"\n    Set requires_grad flag for all parameters in a model.\n    \"\"\"\n    for p in model.parameters():\n        p.requires_grad = flag\n\n\ndef all_reduce_mean(tensor: torch.Tensor) -> torch.Tensor:\n    dist.all_reduce(tensor=tensor, op=dist.ReduceOp.SUM)\n    tensor.div_(dist.get_world_size())\n    return tensor\n\n\ndef get_model_numel(model: torch.nn.Module) -> Tuple[int, int]:\n    num_params = 0\n    num_params_trainable = 0\n    for p in model.parameters():\n        num_params += p.numel()\n        if p.requires_grad:\n            num_params_trainable += p.numel()\n    return num_params, num_params_trainable\n\n\ndef count_params(model):\n    return sum(p.numel() for p in model.parameters() if p.requires_grad)\n\n\ndef to_tensor(data):\n    \"\"\"Convert objects of various python types to :obj:`torch.Tensor`.\n\n    Supported types are: :class:`numpy.ndarray`, :class:`torch.Tensor`,\n    :class:`Sequence`, :class:`int` and :class:`float`.\n\n    Args:\n        data (torch.Tensor | numpy.ndarray | Sequence | int | float): Data to\n            be converted.\n    \"\"\"\n\n    if isinstance(data, torch.Tensor):\n        return data\n    elif isinstance(data, np.ndarray):\n        return torch.from_numpy(data)\n    elif isinstance(data, Sequence) and not isinstance(data, str):\n        return torch.tensor(data)\n    elif isinstance(data, int):\n        return torch.LongTensor([data])\n    elif isinstance(data, float):\n        return torch.FloatTensor([data])\n    else:\n        raise TypeError(f\"type {type(data)} cannot be converted to tensor.\")\n\n\ndef to_ndarray(data):\n    if isinstance(data, torch.Tensor):\n        return data.numpy()\n    elif isinstance(data, np.ndarray):\n        return data\n    elif isinstance(data, Sequence):\n        return np.array(data)\n    elif isinstance(data, int):\n        return 
np.array([data], dtype=int)\n    elif isinstance(data, float):\n        return np.array([data], dtype=float)\n    else:\n        raise TypeError(f\"type {type(data)} cannot be converted to ndarray.\")\n\n\ndef to_torch_dtype(dtype):\n    if isinstance(dtype, torch.dtype):\n        return dtype\n    elif isinstance(dtype, str):\n        dtype_mapping = {\n            \"float64\": torch.float64,\n            \"float32\": torch.float32,\n            \"float16\": torch.float16,\n            \"fp32\": torch.float32,\n            \"fp16\": torch.float16,\n            \"half\": torch.float16,\n            \"bf16\": torch.bfloat16,\n        }\n        if dtype not in dtype_mapping:\n            raise ValueError\n        dtype = dtype_mapping[dtype]\n        return dtype\n    else:\n        raise ValueError\n\n\ndef _ntuple(n):\n    def parse(x):\n        if isinstance(x, collections.abc.Iterable) and not isinstance(x, str):\n            return x\n        return tuple(repeat(x, n))\n\n    return parse\n\n\nto_1tuple = _ntuple(1)\nto_2tuple = _ntuple(2)\nto_3tuple = _ntuple(3)\nto_4tuple = _ntuple(4)\nto_ntuple = _ntuple\n\n\ndef convert_SyncBN_to_BN2d(model_cfg):\n    for k in model_cfg:\n        v = model_cfg[k]\n        if k == \"norm_cfg\" and v[\"type\"] == \"SyncBN\":\n            v[\"type\"] = \"BN2d\"\n        elif isinstance(v, dict):\n            convert_SyncBN_to_BN2d(v)\n\n\ndef get_topk(x, dim=4, k=5):\n    x = to_tensor(x)\n    inds = x[..., dim].topk(k)[1]\n    return x[inds]\n\n\ndef param_sigmoid(x, alpha):\n    ret = 1 / (1 + (-alpha * x).exp())\n    return ret\n\n\ndef inverse_param_sigmoid(x, alpha, eps=1e-5):\n    x = x.clamp(min=0, max=1)\n    x1 = x.clamp(min=eps)\n    x2 = (1 - x).clamp(min=eps)\n    return torch.log(x1 / x2) / alpha\n\n\ndef inverse_sigmoid(x, eps=1e-5):\n    \"\"\"Inverse function of sigmoid.\n\n    Args:\n        x (Tensor): The tensor to do the\n            inverse.\n        eps (float): EPS avoid numerical\n            
overflow. Defaults 1e-5.\n    Returns:\n        Tensor: The x has passed the inverse\n            function of sigmoid, has same\n            shape with input.\n    \"\"\"\n    x = x.clamp(min=0, max=1)\n    x1 = x.clamp(min=eps)\n    x2 = (1 - x).clamp(min=eps)\n    return torch.log(x1 / x2)\n\n\n# ======================================================\n# Python\n# ======================================================\n\n\ndef count_columns(df, columns):\n    cnt_dict = OrderedDict()\n    num_samples = len(df)\n\n    for col in columns:\n        d_i = df[col].value_counts().to_dict()\n        for k in d_i:\n            d_i[k] = (d_i[k], d_i[k] / num_samples)\n        cnt_dict[col] = d_i\n\n    return cnt_dict\n\n\ndef try_import(name):\n    \"\"\"Try to import a module.\n\n    Args:\n        name (str): Specifies what module to import in absolute or relative\n            terms (e.g. either pkg.mod or ..mod).\n    Returns:\n        ModuleType or None: If importing successfully, returns the imported\n        module, otherwise returns None.\n    \"\"\"\n    try:\n        return importlib.import_module(name)\n    except ImportError:\n        return None\n\n\ndef transpose(x):\n    \"\"\"\n    transpose a list of list\n    Args:\n        x (list[list]):\n    \"\"\"\n    ret = list(map(list, zip(*x)))\n    return ret\n\n\ndef all_exists(paths):\n    return all(os.path.exists(path) for path in paths)\n\n\n# ======================================================\n# Profile\n# ======================================================\n\n\nclass Timer:\n    def __init__(self, name, log=False, coordinator: Optional[DistCoordinator] = None):\n        self.name = name\n        self.start_time = None\n        self.end_time = None\n        self.log = log\n        self.coordinator = coordinator\n\n    @property\n    def elapsed_time(self):\n        return self.end_time - self.start_time\n\n    def __enter__(self):\n        torch.cuda.synchronize()\n        self.start_time = 
time.time()\n        return self\n\n    def __exit__(self, exc_type, exc_val, exc_tb):\n        if self.coordinator is not None:\n            self.coordinator.block_all()\n        torch.cuda.synchronize()\n        self.end_time = time.time()\n        if self.log:\n            print(f\"Elapsed time for {self.name}: {self.elapsed_time:.2f} s\")\n\n\ndef get_tensor_memory(tensor, human_readable=True):\n    size = tensor.element_size() * tensor.nelement()\n    if human_readable:\n        size = format_numel_str(size)\n    return size\n\n\nclass FeatureSaver:\n    def __init__(self, save_dir, bin_size=10, start_bin=0):\n        self.save_dir = save_dir\n        self.bin_size = bin_size\n        self.bin_cnt = start_bin\n\n        self.data_list = []\n        self.cnt = 0\n\n    def update(self, data):\n        self.data_list.append(data)\n        self.cnt += 1\n\n        if self.cnt % self.bin_size == 0:\n            self.save()\n\n    def save(self):\n        save_path = os.path.join(self.save_dir, f\"{self.bin_cnt:08}.bin\")\n        torch.save(self.data_list, save_path)\n        get_logger().info(\"Saved to %s\", save_path)\n        self.data_list = []\n        self.bin_cnt += 1\n"
  },
  {
    "path": "Open-Sora/opensora/utils/train_utils.py",
    "content": "import math\nimport random\nfrom collections import OrderedDict\n\nimport torch\nimport torch.distributed as dist\nfrom colossalai.booster.plugin import LowLevelZeroPlugin\n\nfrom opensora.acceleration.parallel_states import set_data_parallel_group, set_sequence_parallel_group\nfrom opensora.acceleration.plugin import ZeroSeqParallelPlugin\n\nfrom .misc import get_logger\n\n\ndef create_colossalai_plugin(plugin, dtype, grad_clip, sp_size, reduce_bucket_size_in_m: int = 20):\n    if plugin == \"zero2\":\n        assert sp_size == 1, \"Zero2 plugin does not support sequence parallelism\"\n        plugin = LowLevelZeroPlugin(\n            stage=2,\n            precision=dtype,\n            initial_scale=2**16,\n            max_norm=grad_clip,\n            reduce_bucket_size_in_m=reduce_bucket_size_in_m,\n        )\n        set_data_parallel_group(dist.group.WORLD)\n    elif plugin == \"zero2-seq\":\n        assert sp_size > 1, \"Zero2-seq plugin requires sequence parallelism\"\n        plugin = ZeroSeqParallelPlugin(\n            sp_size=sp_size,\n            stage=2,\n            precision=dtype,\n            initial_scale=2**16,\n            max_norm=grad_clip,\n            reduce_bucket_size_in_m=reduce_bucket_size_in_m,\n        )\n        set_sequence_parallel_group(plugin.sp_group)\n        set_data_parallel_group(plugin.dp_group)\n    else:\n        raise ValueError(f\"Unknown plugin {plugin}\")\n    return plugin\n\n\n@torch.no_grad()\ndef update_ema(\n    ema_model: torch.nn.Module, model: torch.nn.Module, optimizer=None, decay: float = 0.9999, sharded: bool = True\n) -> None:\n    \"\"\"\n    Step the EMA model towards the current model.\n    \"\"\"\n    ema_params = OrderedDict(ema_model.named_parameters())\n    model_params = OrderedDict(model.named_parameters())\n\n    for name, param in model_params.items():\n        if name == \"pos_embed\":\n            continue\n        if not param.requires_grad:\n            continue\n        if not 
sharded:\n            param_data = param.data\n            ema_params[name].mul_(decay).add_(param_data, alpha=1 - decay)\n        else:\n            if param.data.dtype != torch.float32:\n                param_id = id(param)\n                master_param = optimizer._param_store.working_to_master_param[param_id]\n                param_data = master_param.data\n            else:\n                param_data = param.data\n            ema_params[name].mul_(decay).add_(param_data, alpha=1 - decay)\n\n\nclass MaskGenerator:\n    def __init__(self, mask_ratios):\n        valid_mask_names = [\n            \"identity\",\n            \"quarter_random\",\n            \"quarter_head\",\n            \"quarter_tail\",\n            \"quarter_head_tail\",\n            \"image_random\",\n            \"image_head\",\n            \"image_tail\",\n            \"image_head_tail\",\n            \"random\",\n            \"intepolate\",\n        ]\n        assert all(\n            mask_name in valid_mask_names for mask_name in mask_ratios.keys()\n        ), f\"mask_name should be one of {valid_mask_names}, got {mask_ratios.keys()}\"\n        assert all(\n            mask_ratio >= 0 for mask_ratio in mask_ratios.values()\n        ), f\"mask_ratio should be greater than or equal to 0, got {mask_ratios.values()}\"\n        assert all(\n            mask_ratio <= 1 for mask_ratio in mask_ratios.values()\n        ), f\"mask_ratio should be less than or equal to 1, got {mask_ratios.values()}\"\n        # sum of mask_ratios should be 1\n        if \"identity\" not in mask_ratios:\n            mask_ratios[\"identity\"] = 1.0 - sum(mask_ratios.values())\n        assert math.isclose(\n            sum(mask_ratios.values()), 1.0, abs_tol=1e-6\n        ), f\"sum of mask_ratios should be 1, got {sum(mask_ratios.values())}\"\n        get_logger().info(\"mask ratios: %s\", mask_ratios)\n        self.mask_ratios = mask_ratios\n\n    def get_mask(self, x):\n        mask_type = random.random()\n        
mask_name = None\n        prob_acc = 0.0\n        for mask, mask_ratio in self.mask_ratios.items():\n            prob_acc += mask_ratio\n            if mask_type < prob_acc:\n                mask_name = mask\n                break\n\n        num_frames = x.shape[2]\n        # Hardcoded condition_frames\n        condition_frames_max = num_frames // 4\n\n        mask = torch.ones(num_frames, dtype=torch.bool, device=x.device)\n        if num_frames <= 1:\n            return mask\n\n        if mask_name == \"quarter_random\":\n            random_size = random.randint(1, condition_frames_max)\n            random_pos = random.randint(0, x.shape[2] - random_size)\n            mask[random_pos : random_pos + random_size] = 0\n        elif mask_name == \"image_random\":\n            random_size = 1\n            random_pos = random.randint(0, x.shape[2] - random_size)\n            mask[random_pos : random_pos + random_size] = 0\n        elif mask_name == \"quarter_head\":\n            random_size = random.randint(1, condition_frames_max)\n            mask[:random_size] = 0\n        elif mask_name == \"image_head\":\n            random_size = 1\n            mask[:random_size] = 0\n        elif mask_name == \"quarter_tail\":\n            random_size = random.randint(1, condition_frames_max)\n            mask[-random_size:] = 0\n        elif mask_name == \"image_tail\":\n            random_size = 1\n            mask[-random_size:] = 0\n        elif mask_name == \"quarter_head_tail\":\n            random_size = random.randint(1, condition_frames_max)\n            mask[:random_size] = 0\n            mask[-random_size:] = 0\n        elif mask_name == \"image_head_tail\":\n            random_size = 1\n            mask[:random_size] = 0\n            mask[-random_size:] = 0\n        elif mask_name == \"intepolate\":\n            random_start = random.randint(0, 1)\n            mask[random_start::2] = 0\n        elif mask_name == \"random\":\n            mask_ratio = 
random.uniform(0.1, 0.9)\n            mask = torch.rand(num_frames, device=x.device) > mask_ratio\n            # if mask is all False, set the last frame to True\n            if not mask.any():\n                mask[-1] = 1\n\n        return mask\n\n    def get_masks(self, x):\n        masks = []\n        for _ in range(len(x)):\n            mask = self.get_mask(x)\n            masks.append(mask)\n        masks = torch.stack(masks, dim=0)\n        return masks\n"
  },
  {
    "path": "Open-Sora/opensora.egg-info/PKG-INFO",
    "content": "Metadata-Version: 2.1\nName: opensora\nVersion: 1.2.0\nSummary: Democratizing Efficient Video Production for All\nHome-page: https://github.com/hpcaitech/Open-Sora\nLicense: Apache Software License 2.0\nProject-URL: Bug Tracker, https://github.com/hpcaitech/Open-Sora/issues\nProject-URL: Examples, https://hpcaitech.github.io/Open-Sora/\nProject-URL: Documentation, https://github.com/hpcaitech/Open-Sora?tab=readme-ov-file\nProject-URL: Github, https://github.com/hpcaitech/Open-Sora\nClassifier: Programming Language :: Python :: 3\nClassifier: License :: OSI Approved :: Apache Software License\nClassifier: Environment :: GPU :: NVIDIA CUDA\nClassifier: Topic :: Scientific/Engineering :: Artificial Intelligence\nClassifier: Topic :: System :: Distributed Computing\nRequires-Python: >=3.6\nDescription-Content-Type: text/markdown\nLicense-File: LICENSE\nRequires-Dist: colossalai>=0.4.0\nRequires-Dist: mmengine>=0.10.3\nRequires-Dist: pandas>=2.0.3\nRequires-Dist: timm==0.9.16\nRequires-Dist: rotary_embedding_torch==0.5.3\nRequires-Dist: ftfy>=6.2.0\nRequires-Dist: diffusers==0.27.2\nRequires-Dist: accelerate==0.29.2\nRequires-Dist: av>=12.0.0\nRequires-Dist: numpy<2.0.0\nRequires-Dist: gradio>=4.26.0\nRequires-Dist: spaces>=0.28.3\nRequires-Dist: ipykernel>=6.29.4\nRequires-Dist: ipywidgets>=8.1.2\nRequires-Dist: wandb>=0.17.0\nRequires-Dist: tensorboard>=2.14.0\nRequires-Dist: pandarallel>=1.6.5\nRequires-Dist: pyarrow>=16.1.0\nRequires-Dist: pre-commit>=3.5.0\nRequires-Dist: openai\nProvides-Extra: data\nRequires-Dist: gdown>=5.2.0; extra == \"data\"\nRequires-Dist: ninja>=1.11.1.1; extra == \"data\"\nRequires-Dist: shortuuid>=1.0.13; extra == \"data\"\nRequires-Dist: markdown2[all]; extra == \"data\"\nRequires-Dist: scikit-learn>=1.4.2; extra == \"data\"\nRequires-Dist: einops-exts>=0.0.4; extra == \"data\"\nRequires-Dist: decord==0.6.0; extra == \"data\"\nRequires-Dist: ptvsd==4.3.2; extra == \"data\"\nRequires-Dist: imageio-ffmpeg>=0.4.9; extra == 
\"data\"\nRequires-Dist: ffmpeg-python==0.2.0; extra == \"data\"\nRequires-Dist: lingua-language-detector==2.0.2; extra == \"data\"\nRequires-Dist: imageio>=2.34.1; extra == \"data\"\nRequires-Dist: setuptools==68.2.2; extra == \"data\"\nRequires-Dist: clip@ git+https://github.com/openai/CLIP.git ; extra == \"data\"\nRequires-Dist: mmcv==2.1.0; extra == \"data\"\nRequires-Dist: mmdet==3.1.0; extra == \"data\"\nRequires-Dist: mmocr==1.0.1; extra == \"data\"\nRequires-Dist: detectron2@ git+https://github.com/facebookresearch/detectron2.git@ff53992 ; extra == \"data\"\nProvides-Extra: eval\nRequires-Dist: detectron2@ git+https://github.com/facebookresearch/detectron2.git@ff53992 ; extra == \"eval\"\nRequires-Dist: imageio>=2.34.1; extra == \"eval\"\nRequires-Dist: pyiqa==0.1.10; extra == \"eval\"\nRequires-Dist: scikit-learn>=1.4.2; extra == \"eval\"\nRequires-Dist: scikit-image>=0.20.0; extra == \"eval\"\nRequires-Dist: lvis==0.5.3; extra == \"eval\"\nRequires-Dist: boto3>=1.34.113; extra == \"eval\"\nRequires-Dist: easydict>=1.9; extra == \"eval\"\nRequires-Dist: fairscale>=0.4.13; extra == \"eval\"\nRequires-Dist: decord==0.6.0; extra == \"eval\"\nRequires-Dist: pytorchvideo==0.1.5; extra == \"eval\"\nRequires-Dist: lpips==0.1.4; extra == \"eval\"\nProvides-Extra: vae\nRequires-Dist: beartype==0.18.5; extra == \"vae\"\nRequires-Dist: einops==0.8.0; extra == \"vae\"\nRequires-Dist: einops-exts==0.0.4; extra == \"vae\"\nRequires-Dist: opencv-python==4.9.0.80; extra == \"vae\"\nRequires-Dist: pillow==10.3.0; extra == \"vae\"\nProvides-Extra: full\nRequires-Dist: gdown>=5.2.0; extra == \"full\"\nRequires-Dist: ninja>=1.11.1.1; extra == \"full\"\nRequires-Dist: shortuuid>=1.0.13; extra == \"full\"\nRequires-Dist: markdown2[all]; extra == \"full\"\nRequires-Dist: scikit-learn>=1.4.2; extra == \"full\"\nRequires-Dist: einops-exts>=0.0.4; extra == \"full\"\nRequires-Dist: decord==0.6.0; extra == \"full\"\nRequires-Dist: ptvsd==4.3.2; extra == \"full\"\nRequires-Dist: 
imageio-ffmpeg>=0.4.9; extra == \"full\"\nRequires-Dist: ffmpeg-python==0.2.0; extra == \"full\"\nRequires-Dist: lingua-language-detector==2.0.2; extra == \"full\"\nRequires-Dist: imageio>=2.34.1; extra == \"full\"\nRequires-Dist: setuptools==68.2.2; extra == \"full\"\nRequires-Dist: clip@ git+https://github.com/openai/CLIP.git ; extra == \"full\"\nRequires-Dist: mmcv==2.1.0; extra == \"full\"\nRequires-Dist: mmdet==3.1.0; extra == \"full\"\nRequires-Dist: mmocr==1.0.1; extra == \"full\"\nRequires-Dist: detectron2@ git+https://github.com/facebookresearch/detectron2.git@ff53992 ; extra == \"full\"\nRequires-Dist: detectron2@ git+https://github.com/facebookresearch/detectron2.git@ff53992 ; extra == \"full\"\nRequires-Dist: imageio>=2.34.1; extra == \"full\"\nRequires-Dist: pyiqa==0.1.10; extra == \"full\"\nRequires-Dist: scikit-learn>=1.4.2; extra == \"full\"\nRequires-Dist: scikit-image>=0.20.0; extra == \"full\"\nRequires-Dist: lvis==0.5.3; extra == \"full\"\nRequires-Dist: boto3>=1.34.113; extra == \"full\"\nRequires-Dist: easydict>=1.9; extra == \"full\"\nRequires-Dist: fairscale>=0.4.13; extra == \"full\"\nRequires-Dist: decord==0.6.0; extra == \"full\"\nRequires-Dist: pytorchvideo==0.1.5; extra == \"full\"\nRequires-Dist: lpips==0.1.4; extra == \"full\"\n\n<p align=\"center\">\n    <img src=\"./assets/readme/icon.png\" width=\"250\"/>\n</p>\n<div align=\"center\">\n    <a href=\"https://github.com/hpcaitech/Open-Sora/stargazers\"><img src=\"https://img.shields.io/github/stars/hpcaitech/Open-Sora?style=social\"></a>\n    <a href=\"https://hpcaitech.github.io/Open-Sora/\"><img src=\"https://img.shields.io/badge/Gallery-View-orange?logo=&amp\"></a>\n    <a href=\"https://discord.gg/kZakZzrSUT\"><img src=\"https://img.shields.io/badge/Discord-join-blueviolet?logo=discord&amp\"></a>\n    <a href=\"https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-247ipg9fk-KRRYmUl~u2ll2637WRURVA\"><img 
src=\"https://img.shields.io/badge/Slack-ColossalAI-blueviolet?logo=slack&amp\"></a>\n    <a href=\"https://twitter.com/yangyou1991/status/1769411544083996787?s=61&t=jT0Dsx2d-MS5vS9rNM5e5g\"><img src=\"https://img.shields.io/badge/Twitter-Discuss-blue?logo=twitter&amp\"></a>\n    <a href=\"https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png\"><img src=\"https://img.shields.io/badge/微信-小助手加群-green?logo=wechat&amp\"></a>\n    <a href=\"https://hpc-ai.com/blog/open-sora-v1.0\"><img src=\"https://img.shields.io/badge/Open_Sora-Blog-blue\"></a>\n    <a href=\"https://huggingface.co/spaces/hpcai-tech/open-sora\"><img src=\"https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Gradio Demo-blue\"></a>\n</div>\n\n## Open-Sora: Democratizing Efficient Video Production for All\n\nWe design and implement **Open-Sora**, an initiative dedicated to **efficiently** producing high-quality video. We hope to make the model,\ntools and all details accessible to all. By embracing **open-source** principles,\nOpen-Sora not only democratizes access to advanced video generation techniques, but also offers a\nstreamlined and user-friendly platform that simplifies the complexities of video generation.\nWith Open-Sora, our goal is to foster innovation, creativity, and inclusivity within the field of content creation.\n\n[[中文文档](/docs/zh_CN/README.md)] [[潞晨云](https://cloud.luchentech.com/)|[OpenSora镜像](https://cloud.luchentech.com/doc/docs/image/open-sora/)|[视频教程](https://www.bilibili.com/video/BV1ow4m1e7PX/?vd_source=c6b752764cd36ff0e535a768e35d98d2)]\n\n## 📰 News\n\n- **[2024.06.17]** 🔥 We released **Open-Sora 1.2**, which includes **3D-VAE**, **rectified flow**, and **score condition**. The video quality is greatly improved. 
[[checkpoints]](#open-sora-10-model-weights) [[report]](/docs/report_03.md)   [[blog]](https://hpc-ai.com/blog/open-sora-from-hpc-ai-tech-team-continues-open-source-generate-any-16-second-720p-hd-video-with-one-click-model-weights-ready-to-use)\n- **[2024.04.25]** 🤗 We released the [Gradio demo for Open-Sora](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face Spaces.\n- **[2024.04.25]** We released **Open-Sora 1.1**, which supports **2s~15s, 144p to 720p, any aspect ratio** text-to-image, **text-to-video, image-to-video, video-to-video, infinite time** generation. In addition, a full video processing pipeline is released. [[checkpoints]]() [[report]](/docs/report_02.md)\n- **[2024.03.18]** We released **Open-Sora 1.0**, a fully open-source project for video generation.\n  Open-Sora 1.0 supports a full pipeline of video data preprocessing, training with\n  <a href=\"https://github.com/hpcaitech/ColossalAI\"><img src=\"assets/readme/colossal_ai.png\" width=\"8%\" ></a>\n  acceleration,\n  inference, and more. Our model can produce 2s 512x512 videos with only 3 days training. [[checkpoints]](#open-sora-10-model-weights)\n  [[blog]](https://hpc-ai.com/blog/open-sora-v1.0) [[report]](/docs/report_01.md)\n- **[2024.03.04]** Open-Sora provides training with 46% cost reduction.\n  [[blog]](https://hpc-ai.com/blog/open-sora)\n\n## 🎥 Latest Demo\n\n🔥 You can experience Open-Sora on our [🤗 Gradio application on Hugging Face](https://huggingface.co/spaces/hpcai-tech/open-sora). 
More samples and corresponding prompts are available in our [Gallery](https://hpcaitech.github.io/Open-Sora/).\n\n| **4s 720×1280**                                                                                                                                      | **4s 720×1280**                                                                                                                                      | **4s 720×1280**                                                                                                                                      |\n| ---------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"assets/demo/v1.2/sample_0013.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/7895aab6-ed23-488c-8486-091480c26327) | [<img src=\"assets/demo/v1.2/sample_1718.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/20f07c7b-182b-4562-bbee-f1df74c86c9a) | [<img src=\"assets/demo/v1.2/sample_0087.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/3d897e0d-dc21-453a-b911-b3bda838acc2) |\n| [<img src=\"assets/demo/v1.2/sample_0052.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/644bf938-96ce-44aa-b797-b3c0b513d64c) | [<img src=\"assets/demo/v1.2/sample_1719.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/272d88ac-4b4a-484d-a665-8d07431671d0) | [<img src=\"assets/demo/v1.2/sample_0002.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/ebbac621-c34e-4bb4-9543-1c34f8989764) |\n| [<img 
src=\"assets/demo/v1.2/sample_0011.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/a1e3a1a3-4abd-45f5-8df2-6cced69da4ca) | [<img src=\"assets/demo/v1.2/sample_0004.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/d6ce9c13-28e1-4dff-9644-cc01f5f11926) | [<img src=\"assets/demo/v1.2/sample_0061.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/561978f8-f1b0-4f4d-ae7b-45bec9001b4a) |\n\n<details>\n<summary>OpenSora 1.1 Demo</summary>\n\n| **2s 240×426**                                                                                                                                              | **2s 240×426**                                                                                                                                             |\n| ----------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"assets/demo/sample_16x240x426_9.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c31ebc52-de39-4a4e-9b1e-9211d45e05b2) | [<img src=\"assets/demo/sora_16x240x426_26.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c31ebc52-de39-4a4e-9b1e-9211d45e05b2) |\n| [<img src=\"assets/demo/sora_16x240x426_27.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/f7ce4aaa-528f-40a8-be7a-72e61eaacbbd)  | [<img src=\"assets/demo/sora_16x240x426_40.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/5d58d71e-1fda-4d90-9ad3-5f2f7b75c6a9) |\n\n| **2s 426×240**                                                                                                                                             | **4s 480×854**                        
                                                                                                                      |\n| ---------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"assets/demo/sora_16x426x240_24.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/34ecb4a0-4eef-4286-ad4c-8e3a87e5a9fd) | [<img src=\"assets/demo/sample_32x480x854_9.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c1619333-25d7-42ba-a91c-18dbc1870b18) |\n\n| **16s 320×320**                                                                                                                                        | **16s 224×448**                                                                                                                                        | **2s 426×240**                                                                                                                                            |\n| ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"assets/demo/sample_16s_320x320.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/3cab536e-9b43-4b33-8da8-a0f9cf842ff2) | [<img src=\"assets/demo/sample_16s_224x448.gif\" 
width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/9fb0b9e0-c6f4-4935-b29e-4cac10b373c4) | [<img src=\"assets/demo/sora_16x426x240_3.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/3e892ad2-9543-4049-b005-643a4c1bf3bf) |\n\n</details>\n\n<details>\n<summary>OpenSora 1.0 Demo</summary>\n\n| **2s 512×512**                                                                                                                                                                 | **2s 512×512**                                                                                                                                                              | **2s 512×512**                                                                                                                                    |\n| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [<img src=\"assets/readme/sample_0.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/de1963d3-b43b-4e68-a670-bb821ebb6f80)                                 | [<img src=\"assets/readme/sample_1.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/13f8338f-3d42-4b71-8142-d234fbd746cc)                              | [<img src=\"assets/readme/sample_2.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/fa6a65a6-e32a-4d64-9a9e-eabb0ebb8c16)    |\n| A serene night scene in a forested area. [...] 
The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. | A soaring drone footage captures the majestic beauty of a coastal cliff, [...] The water gently laps at the rock base and the greenery that clings to the top of the cliff. | The majestic beauty of a waterfall cascading down a cliff into a serene lake. [...] The camera angle provides a bird's eye view of the waterfall. |\n| [<img src=\"assets/readme/sample_3.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/64232f84-1b36-4750-a6c0-3e610fa9aa94)                                 | [<img src=\"assets/readme/sample_4.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/983a1965-a374-41a7-a76b-c07941a6c1e9)                              | [<img src=\"assets/readme/sample_5.gif\" width=\"\">](https://github.com/hpcaitech/Open-Sora/assets/99191637/ec10c879-9767-4c31-865f-2e8d6cf11e65)    |\n| A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlights. [...]                                                           | The vibrant beauty of a sunflower field. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. [...]                                            | A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell [...]                   |\n\nVideos are downsampled to `.gif` for display. Click for original videos. Prompts are trimmed for display,\nsee [here](/assets/texts/t2v_samples.txt) for full prompts.\n\n</details>\n\n## 🔆 New Features/Updates\n\n- 📍 **Open-Sora 1.2** released. Model weights are available [here](#model-weights). 
See our **[report 1.2](/docs/report_03.md)** for more details.\n- ✅ Support rectified flow scheduling.\n- ✅ Support more conditioning including fps, aesthetic score, motion strength and camera motion.\n- ✅ Trained our 3D-VAE for temporal dimension compression.\n- 📍 **Open-Sora 1.1** released. Model weights are available [here](#model-weights). It is trained on **0s~15s, 144p to 720p, various aspect ratios** videos. See our **[report 1.1](/docs/report_02.md)** for more discussions.\n- 🔧 **Data processing pipeline v1.1** is released. An automatic [processing pipeline](#data-processing) from raw videos to (text, video clip) pairs is provided, including scene cutting $\\rightarrow$ filtering(aesthetic, optical flow, OCR, etc.) $\\rightarrow$ captioning $\\rightarrow$ managing. With this tool, you can easily build your video dataset.\n\n<details>\n<summary>View more</summary>\n\n- ✅ Improved ST-DiT architecture includes rope positional encoding, qk norm, longer text length, etc.\n- ✅ Support training with any resolution, aspect ratio, and duration (including images).\n- ✅ Support image and video conditioning and video editing, and thus support animating images, connecting videos, etc.\n- 📍 **Open-Sora 1.0** released. Model weights are available [here](#model-weights). With only 400K video clips and 200 H800\n  days (compared with 152M samples in Stable Video Diffusion), we are able to generate 2s 512×512 videos. See our **[report 1.0](docs/report_01.md)** for more discussions.\n- ✅ Three-stage training from an image diffusion model to a video diffusion model. We provide the weights for each\n  stage.\n- ✅ Support training acceleration including accelerated transformer, faster T5 and VAE, and sequence parallelism.\n  Open-Sora improves **55%** training speed when training on 64x512x512 videos. 
Details are\n  at [acceleration.md](docs/acceleration.md).\n- 🔧 **Data preprocessing pipeline v1.0**,\n  including [downloading](tools/datasets/README.md), [video cutting](tools/scene_cut/README.md),\n  and [captioning](tools/caption/README.md) tools. Our data collection plan can be found\n  at [datasets.md](docs/datasets.md).\n- ✅ We find VQ-VAE from [VideoGPT](https://wilson1yan.github.io/videogpt/index.html) has low quality and thus adopt a\n  better VAE from [Stability-AI](https://huggingface.co/stabilityai/sd-vae-ft-mse-original). We also find patching in\n  the time dimension deteriorates the quality. See our **[report](docs/report_01.md)** for more discussions.\n- ✅ We investigate different architectures including DiT, Latte, and our proposed STDiT. Our **STDiT** achieves a better\n  trade-off between quality and speed. See our **[report](docs/report_01.md)** for more discussions.\n- ✅ Support CLIP and T5 text conditioning.\n- ✅ By viewing images as one-frame videos, our project supports training DiT on both images and videos (e.g., ImageNet &\n  UCF101). See [commands.md](docs/commands.md) for more instructions.\n- ✅ Support inference with official weights\n  from [DiT](https://github.com/facebookresearch/DiT), [Latte](https://github.com/Vchitect/Latte),\n  and [PixArt](https://pixart-alpha.github.io/).\n- ✅ Refactor the codebase. See [structure.md](docs/structure.md) to learn the project structure and how to use the\n  config files.\n\n</details>\n\n### TODO list sorted by priority\n\n<details>\n<summary>View more</summary>\n\n- [x] Training Video-VAE and adapt our model to new VAE.\n- [x] Scaling model parameters and dataset size.\n- [x] Incorporate a better scheduler (rectified flow).\n- [x] Evaluation pipeline.\n- [x] Complete the data processing pipeline (including dense optical flow, aesthetics scores, text-image similarity, etc.). 
See [the dataset](/docs/datasets.md) for more information.\n- [x] Support image and video conditioning.\n- [x] Support variable aspect ratios, resolutions, durations.\n\n</details>\n\n## Contents\n\n- [Installation](#installation)\n- [Model Weights](#model-weights)\n- [Gradio Demo](#gradio-demo)\n- [Inference](#inference)\n- [Data Processing](#data-processing)\n- [Training](#training)\n- [Evaluation](#evaluation)\n- [VAE Training & Evaluation](#vae-training--evaluation)\n- [Contribution](#contribution)\n- [Citation](#citation)\n- [Acknowledgement](#acknowledgement)\n\nOther useful documents and links are listed below.\n\n- Report: each version is trained from an image base separately (not continuously trained), while a newer version will incorporate the techniques from the previous version.\n  - [report 1.2](docs/report_03.md): rectified flow, 3d-VAE, score condition, evaluation, etc.\n  - [report 1.1](docs/report_02.md): multi-resolution/length/aspect-ratio, image/video conditioning/editing, data preprocessing, etc.\n  - [report 1.0](docs/report_01.md): architecture, captioning, etc.\n  - [acceleration.md](docs/acceleration.md)\n- Repo structure: [structure.md](docs/structure.md)\n- Config file explanation: [config.md](docs/config.md)\n- Useful commands: [commands.md](docs/commands.md)\n- Data processing pipeline and dataset: [datasets.md](docs/datasets.md)\n- Each data processing tool's README: [dataset conventions and management](/tools/datasets/README.md), [scene cutting](/tools/scene_cut/README.md), [scoring](/tools/scoring/README.md), [caption](/tools/caption/README.md)\n- Evaluation: [eval/README.md](/eval/README.md)\n- Gallery: [gallery](https://hpcaitech.github.io/Open-Sora/)\n\n## Installation\n\n### Install from Source\n\nFor CUDA 12.1, you can install the dependencies with the following commands. 
Otherwise, please refer to the [Installation Documentation](docs/installation.md) for instructions on other CUDA versions and on the additional dependencies for data preprocessing, VAE, and model evaluation.\n\n```bash\n# create a virtual env and activate (conda as an example)\nconda create -n opensora python=3.9\nconda activate opensora\n\n# download the repo\ngit clone https://github.com/hpcaitech/Open-Sora\ncd Open-Sora\n\n# install torch, torchvision and xformers\npip install -r requirements/requirements-cu121.txt\n\n# the default installation is for inference only\npip install -v . # for development mode, `pip install -v -e .`\n```\n\n(Optional, recommended for fast speed, especially for training) To enable `layernorm_kernel` and `flash_attn`, you need to install `apex` and `flash-attn` with the following commands.\n\n```bash\n# install flash attention\n# set enable_flash_attn=False in config to disable flash attention\npip install packaging ninja\npip install flash-attn --no-build-isolation\n\n# install apex\n# set enable_layernorm_kernel=False in config to disable apex\npip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext\" --config-settings \"--build-option=--cuda_ext\" git+https://github.com/NVIDIA/apex.git\n```\n\n### Use Docker\n\nRun the following command to build a docker image from the provided Dockerfile.\n\n```bash\ndocker build -t opensora .\n```\n\nRun the following command to start the docker container in interactive mode.\n\n```bash\ndocker run -ti --gpus all -v .:/workspace/Open-Sora opensora\n```\n\n## Model Weights\n\n### Open-Sora 1.2 Model Weights\n\n| Model     | Model Size | Data | #iterations | Batch Size | URL                                                           |\n| --------- | ---------- | ---- | ----------- | ---------- | ------------------------------------------------------------- |\n| Diffusion | 1.1B       | 30M  | 70k         | Dynamic    | 
[:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v3) |\n| VAE       | 384M       | 3M   | 1M          | 8          | [:link:](https://huggingface.co/hpcai-tech/OpenSora-VAE-v1.2) |\n\nSee our **[report 1.2](docs/report_03.md)** for more information. Weights are downloaded automatically when you run the inference script.\n\n> For users from mainland China, try `export HF_ENDPOINT=https://hf-mirror.com` to successfully download the weights.\n\n### Open-Sora 1.1 Model Weights\n\n<details>\n<summary>View more</summary>\n\n| Resolution         | Model Size | Data                       | #iterations | Batch Size                                        | URL                                                                  |\n| ------------------ | ---------- | -------------------------- | ----------- | ------------------------------------------------- | -------------------------------------------------------------------- |\n| mainly 144p & 240p | 700M       | 10M videos + 2M images     | 100k        | [dynamic](/configs/opensora-v1-1/train/stage2.py) | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v2-stage2) |\n| 144p to 720p       | 700M       | 500K HQ videos + 1M images | 4k          | [dynamic](/configs/opensora-v1-1/train/stage3.py) | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v2-stage3) |\n\nSee our **[report 1.1](docs/report_02.md)** for more information.\n\n:warning: **LIMITATION**: This version contains known issues which we are going to fix in the next version (as we save computation resources for the next release). 
In addition, the video generation may fail for long duration, and high resolution will have noisy results due to this problem.\n\n</details>\n\n### Open-Sora 1.0 Model Weights\n\n<details>\n<summary>View more</summary>\n\n| Resolution | Model Size | Data   | #iterations | Batch Size | GPU days (H800) | URL                                                                                           |\n| ---------- | ---------- | ------ | ----------- | ---------- | --------------- | --------------------------------------------------------------------------------------------- |\n| 16×512×512 | 700M       | 20K HQ | 20k         | 2×64       | 35              | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth) |\n| 16×256×256 | 700M       | 20K HQ | 24k         | 8×64       | 45              | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth) |\n| 16×256×256 | 700M       | 366K   | 80k         | 8×64       | 117             | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth)    |\n\nTraining orders: 16x256x256 $\\rightarrow$ 16x256x256 HQ $\\rightarrow$ 16x512x512 HQ.\n\nOur model's weight is partially initialized from [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha). The number of\nparameters is 724M. More information about training can be found in our **[report](/docs/report_01.md)**. More about\nthe dataset can be found in [datasets.md](/docs/datasets.md). HQ means high quality.\n\n:warning: **LIMITATION**: Our model is trained on a limited budget. The quality and text alignment is relatively poor.\nThe model performs badly, especially on generating human beings and cannot follow detailed instructions. 
We are working\non improving the quality and text alignment.\n\n</details>\n\n## Gradio Demo\n\n🔥 You can experience Open-Sora on our [🤗 Gradio application](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face online.\n\n### Local Deployment\n\nTo deploy Gradio locally, use the [Gradio application](./gradio) provided in this repository. The following commands start an interactive web application for video generation with Open-Sora.\n\n```bash\npip install gradio spaces\npython gradio/app.py\n```\n\nThis will launch a Gradio application on your localhost. If you want to know more about the Gradio application, you can refer to the [Gradio README](./gradio/README.md).\n\nTo enable prompt enhancement and other language input (e.g., 中文输入), you need to set the `OPENAI_API_KEY` in the environment. Check [OpenAI's documentation](https://platform.openai.com/docs/quickstart) to get your API key.\n\n```bash\nexport OPENAI_API_KEY=YOUR_API_KEY\n```\n\n### Getting Started\n\nIn the Gradio application, the basic options are as follows:\n\n![Gradio Demo](assets/readme/gradio_basic.png)\n\nThe easiest way to generate a video is to input a text prompt and click the \"**Generate video**\" button (scroll down if you cannot find it). The generated video will be displayed in the right panel. Checking \"**Enhance prompt with GPT4o**\" will use GPT-4o to refine the prompt, while the \"**Random Prompt**\" button will have GPT-4o generate a random prompt for you. Due to OpenAI's API limit, the prompt refinement result has some randomness.\n\nThen, you can choose the **resolution**, **duration**, and **aspect ratio** of the generated video. The resolution and video length affect the video generation speed. 
On an 80G H100 GPU, the generation speed (with `num_sampling_step=30`) and peak memory usage are:\n\n|      | Image   | 2s       | 4s        | 8s        | 16s       |\n| ---- | ------- | -------- | --------- | --------- | --------- |\n| 360p | 3s, 24G | 18s, 27G | 31s, 27G  | 62s, 28G  | 121s, 33G |\n| 480p | 2s, 24G | 29s, 31G | 55s, 30G  | 108s, 32G | 219s, 36G |\n| 720p | 6s, 27G | 68s, 41G | 130s, 39G | 260s, 45G | 547s, 67G |\n\nNote that besides text to video, you can also use **image to video generation**. You can upload an image and then click the \"**Generate video**\" button to generate a video with the image as the first frame. Or you can fill in the text prompt and click the \"**Generate image**\" button to generate an image with the text prompt, and then click the \"**Generate video**\" button to generate a video with the image generated with the same model.\n\n![Gradio Demo](assets/readme/gradio_option.png)\n\nThen you can specify more options, including \"**Motion Strength**\", \"**Aesthetic**\" and \"**Camera Motion**\". If \"Enable\" is not checked or the choice is \"none\", the information is not passed to the model. Otherwise, the model will generate videos with the specified motion strength, aesthetic score, and camera motion.\n\nFor the **aesthetic score**, we recommend using values higher than 6. For **motion strength**, a smaller value will lead to a smoother but less dynamic video, while a larger value will lead to a more dynamic but likely more blurry video. Thus, you can try without it and then adjust it according to the generated video. For the **camera motion**, sometimes the model cannot follow the instruction well, and we are working on improving it.\n\nYou can also adjust the \"**Sampling steps**\", which directly affects the generation speed because it is the number of denoising steps. A number smaller than 30 usually leads to poor generation results, while a number larger than 100 usually brings no significant improvement. 
The \"**Seed**\" is used for reproducibility; you can set it to a fixed number to generate the same video. The \"**CFG Scale**\" controls how closely the model follows the text prompt: a smaller value will lead to a more random video, while a larger value will lead to a more text-following video (7 is recommended).\n\nFor more advanced usage, you can refer to [Gradio README](./gradio/README.md#advanced-usage).\n\n## Inference\n\n### Open-Sora 1.2 Command Line Inference\n\nThe basic command line inference is as follows:\n\n```bash\n# text to video\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p --aspect-ratio 9:16 \\\n  --prompt \"a beautiful waterfall\"\n```\n\nYou can add more options to the command line to customize the generation.\n\n```bash\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p --aspect-ratio 9:16 \\\n  --num-sampling-steps 30 --flow 5 --aes 6.5 \\\n  --prompt \"a beautiful waterfall\"\n```\n\nFor image to video generation and other functionalities, the API is compatible with Open-Sora 1.1. See [here](docs/commands.md) for more instructions.\n\nIf your installation does not contain `apex` and `flash-attn`, you need to disable them in the config file, or via the following command.\n\n```bash\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p \\\n  --layernorm-kernel False --flash-attn False \\\n  --prompt \"a beautiful waterfall\"\n```\n\n### Sequence Parallelism Inference\n\nTo enable sequence parallelism, you need to use `torchrun` to run the inference script. 
The following command will run the inference with 2 GPUs.\n\n```bash\n# text to video\nCUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p --aspect-ratio 9:16 \\\n  --prompt \"a beautiful waterfall\"\n```\n\n:warning: **LIMITATION**: Sequence parallelism is not supported for Gradio deployment. For now, sequence parallelism is only supported when the dimension is divisible by the number of GPUs, so it may fail in some cases. We tested 4 GPUs for 720p and 2 GPUs for 480p.\n\n### GPT-4o Prompt Refinement\n\nWe find that GPT-4o can refine the prompt and improve the quality of the generated video. With this feature, you can also use other languages (e.g., Chinese) for the prompt. To enable this feature, you need to set your OpenAI API key in the environment:\n\n```bash\nexport OPENAI_API_KEY=YOUR_API_KEY\n```\n\nThen you can run inference with `--llm-refine True` to enable the GPT-4o prompt refinement, or leave the prompt empty to get a random prompt generated by GPT-4o.\n\n```bash\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py \\\n  --num-frames 4s --resolution 720p --llm-refine True\n```\n\n### Open-Sora 1.1 Command Line Inference\n\n<details>\n<summary>View more</summary>\n\nSince Open-Sora 1.1 supports inference with dynamic input size, you can pass the input size as an argument.\n\n```bash\n# text to video\npython scripts/inference.py configs/opensora-v1-1/inference/sample.py --prompt \"A beautiful sunset over the city\" --num-frames 32 --image-size 480 854\n```\n\nIf your installation does not contain `apex` and `flash-attn`, you need to disable them in the config file, or via the following command.\n\n```bash\npython scripts/inference.py configs/opensora-v1-1/inference/sample.py --prompt \"A beautiful sunset over the city\" --num-frames 32 --image-size 480 854 --layernorm-kernel False --flash-attn False\n```\n\nSee 
[here](docs/commands.md#inference-with-open-sora-11) for more instructions including text-to-image, image-to-video, video-to-video, and infinite time generation.\n\n</details>\n\n### Open-Sora 1.0 Command Line Inference\n\n<details>\n<summary>View more</summary>\n\nWe have also provided an offline inference script. Run the following commands to generate samples, the required model weights will be automatically downloaded. To change sampling prompts, modify the txt file passed to `--prompt-path`. See [here](docs/structure.md#inference-config-demos) to customize the configuration.\n\n```bash\n# Sample 16x512x512 (20s/sample, 100 time steps, 24 GB memory)\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path OpenSora-v1-HQ-16x512x512.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n# Sample 16x256x256 (5s/sample, 100 time steps, 22 GB memory)\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path OpenSora-v1-HQ-16x256x256.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n# Sample 64x512x512 (40s/sample, 100 time steps)\ntorchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth --prompt-path ./assets/texts/t2v_samples.txt\n\n# Sample 64x512x512 with sequence parallelism (30s/sample, 100 time steps)\n# sequence parallelism is enabled automatically when nproc_per_node is larger than 1\ntorchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth --prompt-path ./assets/texts/t2v_samples.txt\n```\n\nThe speed is tested on H800 GPUs. 
For inference with other models, see [here](docs/commands.md) for more instructions.\nTo lower the memory usage, set a smaller `vae.micro_batch_size` in the config (at the cost of slightly lower sampling speed).\n\n</details>\n\n## Data Processing\n\nHigh-quality data is crucial for training good generation models.\nTo this end, we establish a complete pipeline for data processing, which can seamlessly convert raw videos to high-quality video-text pairs.\nThe pipeline is shown below. For detailed information, please refer to [data processing](docs/data_processing.md).\nAlso check out the [datasets](docs/datasets.md) we use.\n\n![Data Processing Pipeline](assets/readme/report_data_pipeline.png)\n\n## Training\n\n### Open-Sora 1.2 Training\n\nThe training process is the same as for Open-Sora 1.1.\n\n```bash\n# one node\ntorchrun --standalone --nproc_per_node 8 scripts/train.py \\\n    configs/opensora-v1-2/train/stage1.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n# multiple nodes\ncolossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py \\\n    configs/opensora-v1-2/train/stage1.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\n### Open-Sora 1.1 Training\n\n<details>\n<summary>View more</summary>\n\nOnce you prepare the data in a `csv` file, run the following commands to launch training on a single node or on multiple nodes.\n\n```bash\n# one node\ntorchrun --standalone --nproc_per_node 8 scripts/train.py \\\n    configs/opensora-v1-1/train/stage1.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n# multiple nodes\ncolossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py \\\n    configs/opensora-v1-1/train/stage1.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\n</details>\n\n### Open-Sora 1.0 Training\n\n<details>\n<summary>View more</summary>\n\nOnce you prepare the data in a `csv` file, run the following commands to launch training on a single node.\n\n```bash\n# 1 GPU, 16x256x256\ntorchrun --nnodes=1 
--nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH\n# 8 GPUs, 64x512x512\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\nTo launch training on multiple nodes, prepare a hostfile according\nto [ColossalAI](https://colossalai.org/docs/basics/launch_colossalai/#launch-with-colossal-ai-cli), and run the\nfollowing commands.\n\n```bash\ncolossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT\n```\n\nFor training other models and advanced usage, see [here](docs/commands.md) for more instructions.\n\n</details>\n\n## Evaluation\n\nWe support evaluation based on:\n\n- Validation loss\n- [VBench](https://github.com/Vchitect/VBench/tree/master) score\n- VBench-i2v score\n- Batch generation for human evaluation\n\nAll the evaluation code is released in `eval` folder. Check the [README](/eval/README.md) for more details. Our [report](/docs/report_03.md#evaluation) also provides more information about the evaluation during training. 
The following table shows that Open-Sora 1.2 greatly improves on Open-Sora 1.0.\n\n| Model          | Total Score | Quality Score | Semantic Score |\n| -------------- | ----------- | ------------- | -------------- |\n| Open-Sora V1.0 | 75.91%      | 78.81%        | 64.28%         |\n| Open-Sora V1.2 | 79.23%      | 80.71%        | 73.30%         |\n\n## VAE Training & Evaluation\n\nWe train a VAE pipeline that consists of a spatial VAE followed by a temporal VAE.\nFor more details, refer to [VAE Documentation](docs/vae.md).\nBefore you run the following commands, follow our [Installation Documentation](docs/installation.md) to install the required dependencies for VAE and Evaluation.\n\nIf you want to train your own VAE, you need to prepare data in a csv file following the [data processing](#data-processing) pipeline, then run the following commands.\nNote that you need to adjust the number of training epochs (`epochs`) in the config file according to the size of your own csv data.\n\n```bash\n# stage 1 training, 380k steps, 8 GPUs\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage1.py --data-path YOUR_CSV_PATH\n# stage 2 training, 260k steps, 8 GPUs\ntorchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage2.py --data-path YOUR_CSV_PATH\n# stage 3 training, 540k steps, 24 GPUs\ntorchrun --nnodes=3 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage3.py --data-path YOUR_CSV_PATH\n```\n\nTo evaluate the VAE performance, you need to run VAE inference first to generate the videos, then calculate scores on the generated videos:\n\n```bash\n# video generation\ntorchrun --standalone --nnodes=1 --nproc_per_node=1 scripts/inference_vae.py configs/vae/inference/video.py --ckpt-path YOUR_VAE_CKPT_PATH --data-path YOUR_CSV_PATH --save-dir YOUR_VIDEO_DIR\n# the original videos will be saved to `YOUR_VIDEO_DIR_ori`\n# the reconstructed videos through the pipeline will be saved to `YOUR_VIDEO_DIR_rec`\n# the 
reconstructed videos through the spatial VAE only will be saved to `YOUR_VIDEO_DIR_spatial`\n\n# score calculation\npython eval/vae/eval_common_metric.py --batch_size 2 --real_video_dir YOUR_VIDEO_DIR_ori --generated_video_dir YOUR_VIDEO_DIR_rec --device cuda --sample_fps 24 --crop_size 256 --resolution 256 --num_frames 17 --sample_rate 1 --metric ssim psnr lpips flolpips\n```\n\n## Contribution\n\nThanks goes to these wonderful contributors:\n\n<a href=\"https://github.com/hpcaitech/Open-Sora/graphs/contributors\">\n  <img src=\"https://contrib.rocks/image?repo=hpcaitech/Open-Sora\" />\n</a>\n\nIf you wish to contribute to this project, please refer to the [Contribution Guideline](./CONTRIBUTING.md).\n\n## Acknowledgement\n\nHere we only list a few of the projects. For other works and datasets, please refer to our report.\n\n- [ColossalAI](https://github.com/hpcaitech/ColossalAI): A powerful large model parallel acceleration and optimization\n  system.\n- [DiT](https://github.com/facebookresearch/DiT): Scalable Diffusion Models with Transformers.\n- [OpenDiT](https://github.com/NUS-HPC-AI-Lab/OpenDiT): An acceleration for DiT training. 
We adopt valuable acceleration\n  strategies for training progress from OpenDiT.\n- [PixArt](https://github.com/PixArt-alpha/PixArt-alpha): An open-source DiT-based text-to-image model.\n- [Latte](https://github.com/Vchitect/Latte): An attempt to efficiently train DiT for video.\n- [StabilityAI VAE](https://huggingface.co/stabilityai/sd-vae-ft-mse-original): A powerful image VAE model.\n- [CLIP](https://github.com/openai/CLIP): A powerful text-image embedding model.\n- [T5](https://github.com/google-research/text-to-text-transfer-transformer): A powerful text encoder.\n- [LLaVA](https://github.com/haotian-liu/LLaVA): A powerful image captioning model based on [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) and [Yi-34B](https://huggingface.co/01-ai/Yi-34B).\n- [PLLaVA](https://github.com/magic-research/PLLaVA): A powerful video captioning model.\n- [MiraData](https://github.com/mira-space/MiraData): A large-scale video dataset with long durations and structured caption.\n\nWe are grateful for their exceptional work and generous contribution to open source. Special thanks go to the authors of [MiraData](https://github.com/mira-space/MiraData) and [Rectified Flow](https://github.com/gnobitab/RectifiedFlow) for their valuable advice and help. We wish to express gratitude towards AK for sharing this project on social media and Hugging Face for providing free GPU resources for our online Gradio demo.\n\n## Citation\n\n```bibtex\n@software{opensora,\n  author = {Zangwei Zheng and Xiangyu Peng and Tianji Yang and Chenhui Shen and Shenggui Li and Hongxin Liu and Yukun Zhou and Tianyi Li and Yang You},\n  title = {Open-Sora: Democratizing Efficient Video Production for All},\n  month = {March},\n  year = {2024},\n  url = {https://github.com/hpcaitech/Open-Sora}\n}\n```\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=hpcaitech/Open-Sora&type=Date)](https://star-history.com/#hpcaitech/Open-Sora&Date)\n"
  },
  {
    "path": "Open-Sora/opensora.egg-info/SOURCES.txt",
    "content": "LICENSE\nREADME.md\npyproject.toml\nsetup.py\nopensora/__init__.py\nopensora/registry.py\nopensora.egg-info/PKG-INFO\nopensora.egg-info/SOURCES.txt\nopensora.egg-info/dependency_links.txt\nopensora.egg-info/requires.txt\nopensora.egg-info/top_level.txt\nopensora/acceleration/__init__.py\nopensora/acceleration/checkpoint.py\nopensora/acceleration/communications.py\nopensora/acceleration/parallel_states.py\nopensora/acceleration/plugin.py\nopensora/acceleration/shardformer/__init__.py\nopensora/acceleration/shardformer/modeling/__init__.py\nopensora/acceleration/shardformer/modeling/t5.py\nopensora/acceleration/shardformer/policy/__init__.py\nopensora/acceleration/shardformer/policy/t5_encoder.py\nopensora/datasets/__init__.py\nopensora/datasets/aspect.py\nopensora/datasets/bucket.py\nopensora/datasets/dataloader.py\nopensora/datasets/datasets.py\nopensora/datasets/read_video.py\nopensora/datasets/sampler.py\nopensora/datasets/utils.py\nopensora/datasets/video_transforms.py\nopensora/models/__init__.py\nopensora/models/cache_functions/__init__.py\nopensora/models/cache_functions/attention.py\nopensora/models/cache_functions/cache_cutfresh.py\nopensora/models/cache_functions/cache_init.py\nopensora/models/cache_functions/force_init.py\nopensora/models/cache_functions/force_scheduler.py\nopensora/models/cache_functions/fresh_ratio_scheduler.py\nopensora/models/cache_functions/global_force_fresh.py\nopensora/models/cache_functions/score_evaluate.py\nopensora/models/cache_functions/scores.py\nopensora/models/cache_functions/token_merge.py\nopensora/models/cache_functions/update_cache.py\nopensora/models/dit/__init__.py\nopensora/models/dit/dit.py\nopensora/models/latte/__init__.py\nopensora/models/latte/latte.py\nopensora/models/layers/__init__.py\nopensora/models/layers/blocks.py\nopensora/models/pixart/__init__.py\nopensora/models/pixart/pixart.py\nopensora/models/pixart/pixart_sigma.py\nopensora/models/stdit/__init__.py\nopensora/models/stdit/stdit.py\n
opensora/models/stdit/stdit2.py\nopensora/models/stdit/stdit3 copy.py\nopensora/models/stdit/stdit3.py\nopensora/models/text_encoder/__init__.py\nopensora/models/text_encoder/classes.py\nopensora/models/text_encoder/clip.py\nopensora/models/text_encoder/t5.py\nopensora/models/vae/__init__.py\nopensora/models/vae/discriminator.py\nopensora/models/vae/losses.py\nopensora/models/vae/lpips.py\nopensora/models/vae/utils.py\nopensora/models/vae/vae.py\nopensora/models/vae/vae_temporal.py\nopensora/schedulers/__init__.py\nopensora/schedulers/dpms/__init__.py\nopensora/schedulers/dpms/dpm_solver.py\nopensora/schedulers/iddpm/__init__.py\nopensora/schedulers/iddpm/diffusion_utils.py\nopensora/schedulers/iddpm/gaussian_diffusion.py\nopensora/schedulers/iddpm/respace.py\nopensora/schedulers/iddpm/speed.py\nopensora/schedulers/iddpm/timestep_sampler.py\nopensora/schedulers/rf/__init__.py\nopensora/schedulers/rf/rectified_flow.py\nopensora/utils/__init__.py\nopensora/utils/ckpt_utils.py\nopensora/utils/config_utils.py\nopensora/utils/inference_utils.py\nopensora/utils/lr_scheduler.py\nopensora/utils/misc.py\nopensora/utils/train_utils.py\ntests/test_attn.py\ntests/test_lr_scheduler.py\ntests/test_np_torch.py\ntests/test_pos_emb.py\ntests/test_seq_parallel_attention.py\ntests/test_stdit3_sequence_parallelism.py\ntests/test_t5_shardformer.py\ntools/caption/__init__.py\ntools/caption/camera_motion_detect.py\ntools/caption/caption_gpt4.py\ntools/caption/caption_llama3.py\ntools/caption/caption_llava.py\ntools/caption/utils.py\ntools/caption/acceleration/__init__.py\ntools/caption/acceleration/llava/__init__.py\ntools/caption/acceleration/llava/policies/__init__.py\ntools/caption/acceleration/llava/policies/llama.py\ntools/caption/acceleration/llava/policies/mistral.py\ntools/caption/camera_motion/__init__.py\ntools/caption/camera_motion/camera_motion.py\ntools/caption/camera_motion/detect.py\ntools/caption/camera_motion/utils.py\ntools/caption/camera_motion/visualizer.py\ntools/data
sets/__init__.py\ntools/datasets/analyze.py\ntools/datasets/convert.py\ntools/datasets/datautil.py\ntools/datasets/filter_panda10m.py\ntools/datasets/split.py\ntools/datasets/transform.py\ntools/datasets/utils.py\ntools/frame_interpolation/__init__.py\ntools/frame_interpolation/interpolation.py\ntools/frame_interpolation/networks/__init__.py\ntools/frame_interpolation/networks/amt_g.py\ntools/frame_interpolation/networks/blocks/__init__.py\ntools/frame_interpolation/networks/blocks/feat_enc.py\ntools/frame_interpolation/networks/blocks/ifrnet.py\ntools/frame_interpolation/networks/blocks/multi_flow.py\ntools/frame_interpolation/networks/blocks/raft.py\ntools/frame_interpolation/utils/__init__.py\ntools/frame_interpolation/utils/dist_utils.py\ntools/frame_interpolation/utils/flow_utils.py\ntools/frame_interpolation/utils/utils.py\ntools/scene_cut/__init__.py\ntools/scene_cut/convert_id_to_path.py\ntools/scene_cut/cut.py\ntools/scene_cut/scene_detect.py\ntools/scoring/__init__.py\ntools/scoring/aesthetic/__init__.py\ntools/scoring/aesthetic/inference.py\ntools/scoring/matching/__init__.py\ntools/scoring/matching/inference.py\ntools/scoring/ocr/__init__.py\ntools/scoring/ocr/dbnetpp.py\ntools/scoring/ocr/inference.py\ntools/scoring/optical_flow/__init__.py\ntools/scoring/optical_flow/inference.py\ntools/scoring/optical_flow/unimatch/__init__.py\ntools/scoring/optical_flow/unimatch/attention.py\ntools/scoring/optical_flow/unimatch/backbone.py\ntools/scoring/optical_flow/unimatch/geometry.py\ntools/scoring/optical_flow/unimatch/matching.py\ntools/scoring/optical_flow/unimatch/position.py\ntools/scoring/optical_flow/unimatch/reg_refine.py\ntools/scoring/optical_flow/unimatch/transformer.py\ntools/scoring/optical_flow/unimatch/trident_conv.py\ntools/scoring/optical_flow/unimatch/unimatch.py\ntools/scoring/optical_flow/unimatch/utils.py"
  },
  {
    "path": "Open-Sora/opensora.egg-info/dependency_links.txt",
    "content": "\n"
  },
  {
    "path": "Open-Sora/opensora.egg-info/requires.txt",
    "content": "colossalai>=0.4.0\nmmengine>=0.10.3\npandas>=2.0.3\ntimm==0.9.16\nrotary_embedding_torch==0.5.3\nftfy>=6.2.0\ndiffusers==0.27.2\naccelerate==0.29.2\nav>=12.0.0\nnumpy<2.0.0\ngradio>=4.26.0\nspaces>=0.28.3\nipykernel>=6.29.4\nipywidgets>=8.1.2\nwandb>=0.17.0\ntensorboard>=2.14.0\npandarallel>=1.6.5\npyarrow>=16.1.0\npre-commit>=3.5.0\nopenai\n\n[data]\ngdown>=5.2.0\nninja>=1.11.1.1\nshortuuid>=1.0.13\nmarkdown2[all]\nscikit-learn>=1.4.2\neinops-exts>=0.0.4\ndecord==0.6.0\nptvsd==4.3.2\nimageio-ffmpeg>=0.4.9\nffmpeg-python==0.2.0\nlingua-language-detector==2.0.2\nimageio>=2.34.1\nsetuptools==68.2.2\nclip@ git+https://github.com/openai/CLIP.git\nmmcv==2.1.0\nmmdet==3.1.0\nmmocr==1.0.1\ndetectron2@ git+https://github.com/facebookresearch/detectron2.git@ff53992\n\n[eval]\ndetectron2@ git+https://github.com/facebookresearch/detectron2.git@ff53992\nimageio>=2.34.1\npyiqa==0.1.10\nscikit-learn>=1.4.2\nscikit-image>=0.20.0\nlvis==0.5.3\nboto3>=1.34.113\neasydict>=1.9\nfairscale>=0.4.13\ndecord==0.6.0\npytorchvideo==0.1.5\nlpips==0.1.4\n\n[full]\ngdown>=5.2.0\nninja>=1.11.1.1\nshortuuid>=1.0.13\nmarkdown2[all]\nscikit-learn>=1.4.2\neinops-exts>=0.0.4\ndecord==0.6.0\nptvsd==4.3.2\nimageio-ffmpeg>=0.4.9\nffmpeg-python==0.2.0\nlingua-language-detector==2.0.2\nimageio>=2.34.1\nsetuptools==68.2.2\nclip@ git+https://github.com/openai/CLIP.git\nmmcv==2.1.0\nmmdet==3.1.0\nmmocr==1.0.1\ndetectron2@ git+https://github.com/facebookresearch/detectron2.git@ff53992\npyiqa==0.1.10\nscikit-image>=0.20.0\nlvis==0.5.3\nboto3>=1.34.113\neasydict>=1.9\nfairscale>=0.4.13\npytorchvideo==0.1.5\nlpips==0.1.4\n\n[vae]\nbeartype==0.18.5\neinops==0.8.0\neinops-exts==0.0.4\nopencv-python==4.9.0.80\npillow==10.3.0\n"
  },
  {
    "path": "Open-Sora/opensora.egg-info/top_level.txt",
    "content": "opensora\ntools\n"
  },
  {
    "path": "Open-Sora/pyproject.toml",
    "content": "[tool.autoflake]\nremove-unused-variables = true\nremove-all-unused-imports = true\nignore-init-module-imports = true\n\n[tool.isort]\nline_length = 120\nmulti_line_output = 3\ninclude_trailing_comma = true\nignore_comments = true\nprofile = \"black\"\nhonor_noqa = true\n\n[tool.black]\nline-length = 120\ntarget-version = [\"py37\", \"py38\", \"py39\", \"py310\"]\n"
  },
  {
    "path": "Open-Sora/requirements/requirements-cu121.txt",
    "content": "torch==2.2.2 --index-url https://download.pytorch.org/whl/cu121\ntorchvision==0.17.2 --index-url https://download.pytorch.org/whl/cu121\nxformers==0.0.25.post1 --index-url https://download.pytorch.org/whl/cu121\n"
  },
  {
    "path": "Open-Sora/requirements/requirements-data.txt",
    "content": "gdown>=5.2.0\n\n# [caption llava]\nninja>=1.11.1.1\nshortuuid>=1.0.13\nmarkdown2[all]\nscikit-learn>=1.4.2\neinops-exts>=0.0.4\n\n# [camera_motion]\ndecord==0.6.0\nptvsd==4.3.2\nimageio-ffmpeg>=0.4.9\n\n# [datasets]\nffmpeg-python==0.2.0\nlingua-language-detector==2.0.2\n\n# [frame interpolation]\nimageio>=2.34.1\n\n# [aesthetic]\nsetuptools==68.2.2\nclip @ git+https://github.com/openai/CLIP.git\n\n# [ocr]\nmmcv==2.1.0\nmmdet==3.1.0\nmmocr==1.0.1\ndetectron2 @ git+https://github.com/facebookresearch/detectron2.git@ff53992\n"
  },
  {
    "path": "Open-Sora/requirements/requirements-eval.txt",
    "content": "# [vbench]\ndetectron2 @ git+https://github.com/facebookresearch/detectron2.git@ff53992\nimageio>=2.34.1\npyiqa==0.1.10\nscikit-learn>=1.4.2\nscikit-image>=0.20.0\nlvis==0.5.3\nboto3>=1.34.113\neasydict>=1.9\nfairscale>=0.4.13\n\n# [vae]\ndecord==0.6.0\npytorchvideo==0.1.5\nlpips==0.1.4\n"
  },
  {
    "path": "Open-Sora/requirements/requirements-pllava.txt",
    "content": "absl-py==2.1.0\naccelerate==0.26.1\naddict==2.4.0\naiofiles==23.2.1\naliyun-python-sdk-core==2.15.0\naliyun-python-sdk-kms==2.16.2\naltair==5.2.0\nannotated-types==0.6.0\nantlr4-python3-runtime==4.9.3\nanyio==4.3.0\nanykeystore==0.2\napex==0.9.10.dev0\nappdirs==1.4.4\nargcomplete==3.2.3\nattrs==23.2.0\nav==10.0.0\nbeautifulsoup4==4.12.3\nblessed==1.20.0\nblessings==1.7\nboto3==1.34.63\nbotocore==1.34.63\nBrotli==1.1.0\ncachetools==5.3.3\ncertifi==2024.2.2\ncffi==1.16.0\ncharset-normalizer==3.3.2\nclick==8.1.7\ncolorama==0.4.6\ncontourpy==1.2.0\ncrcmod==1.7\ncryptacular==1.6.2\ncryptography==42.0.5\ncycler==0.12.1\ndacite==1.7.0\ndecorator==4.4.2\ndecord==0.6.0\ndeepspeed==0.14.0\ndefusedxml==0.7.1\nDeprecated==1.2.14\ndill==0.3.8\ndistro==1.9.0\ndnspython==2.6.1\ndocker-pycreds==0.4.0\neinops==0.6.1\nexceptiongroup==1.2.0\nfastapi==0.110.0\nffmpeg==1.4\nffmpy==0.3.2\nfiftyone==0.23.6\nfiftyone-brain==0.16.1\nfiftyone_db==1.1.2\nfilelock==3.9.0\nfonttools==4.49.0\nfsspec==2024.2.0\nftfy==6.1.3\nfuture==1.0.0\nfvcore==0.1.5.post20221221\ngdown==5.1.0\ngitdb==4.0.11\nGitPython==3.1.42\nglob2==0.7\ngoogle-auth==2.28.2\ngoogle-auth-oauthlib==1.2.0\ngpustat==1.1.1\ngradio==4.21.0\ngradio_client==0.12.0\ngraphql-core==3.2.3\ngreenlet==3.0.3\ngrpcio==1.62.1\nh11==0.14.0\nh2==4.1.0\nhjson==3.1.0\nhpack==4.0.0\nhttpcore==1.0.4\nhttpx==0.27.0\nhuggingface-hub==0.21.4\nhumanize==4.9.0\nhupper==1.12.1\nHypercorn==0.16.0\nhyperframe==6.0.1\nidna==3.6\nidscheck==2.3.0\nimageio==2.27.0\nimageio-ffmpeg==0.4.9\nimportlib_metadata==7.0.2\nimportlib_resources==6.3.0\ninflate64==1.0.0\niopath==0.1.10\nJinja2==3.1.2\njmespath==0.10.0\njoblib==1.3.2\njsonlines==4.0.0\njsonschema==4.21.1\njsonschema-specifications==2023.12.1\nkaleido==0.2.1\nkiwisolver==1.4.5\nlazy_loader==0.3\nMarkdown==3.6\nmarkdown-it-py==3.0.0\nMarkupSafe==2.1.3\nmatplotlib==3.8.3\nmdurl==0.1.2\nmmcv-full==1.7.2\nmodel-index==0.1.11\nmongoengine==0.24.2\nmotor==3.3.2\nmoviepy==1.0.3\nmpmath==1.3.0\nmult
ivolumefile==0.2.3\nnetworkx==3.2.1\nninja==1.11.1.1\nnumpy==1.23.5\nnvidia-ml-py==12.535.133\nnvidia-ml-py3==7.352.0\noauthlib==3.2.2\nomegaconf==2.3.0\nopenai==1.14.0\nopencv-python==4.9.0.80\nopencv-python-headless==4.9.0.80\nopendatalab==0.0.10\nopenmim==0.3.9\nopenxlab==0.0.36\nordered-set==4.1.0\norjson==3.9.15\noss2==2.17.0\npackaging==24.0\npandas==1.5.3\nPasteDeploy==3.1.0\npathtools==0.1.2\npbkdf2==1.3\npeft==0.10.0\npillow==10.2.0\nplaster==1.1.2\nplaster-pastedeploy==1.0.1\nplatformdirs==4.2.0\nplotly==5.20.0\nportalocker==2.8.2\npprintpp==0.4.0\npriority==2.0.0\nproglog==0.1.10\nprotobuf==4.23.4\npsutil==5.9.4\npy-cpuinfo==9.0.0\npy7zr==0.21.0\npyasn1==0.5.1\npyasn1-modules==0.3.0\npybcj==1.0.2\npycparser==2.21\npycryptodome==3.20.0\npycryptodomex==3.20.0\npydantic==2.6.4\npydantic_core==2.16.3\npydub==0.25.1\nPygments==2.17.2\npymongo==4.6.2\npynvml==11.5.0\npyparsing==3.1.2\npyppmd==1.1.0\npyramid==2.0.2\npyramid-mailer==0.15.1\nPySocks==1.7.1\npython-dateutil==2.9.0.post0\npython-multipart==0.0.9\npython3-openid==3.2.0\npytz==2023.4\nPyYAML==6.0\npyzstd==0.15.9\nrarfile==4.1\nreferencing==0.33.0\nregex==2023.12.25\nrepoze.sendmail==4.4.1\nrequests==2.28.2\nrequests-oauthlib==1.4.0\nretrying==1.3.4\nrich==13.4.2\nrpds-py==0.18.0\nrsa==4.9\nruff==0.3.2\ns3transfer==0.10.1\nsafetensors==0.4.2\nscikit-image==0.22.0\nscikit-learn==1.4.1.post1\nscipy==1.10.1\nsemantic-version==2.10.0\nsentencepiece==0.2.0\nsentry-sdk==1.42.0\nsetproctitle==1.3.3\nshellingham==1.5.4\nsix==1.16.0\nsmmap==5.0.1\nsniffio==1.3.1\nsortedcontainers==2.4.0\nsoupsieve==2.5\nSQLAlchemy==2.0.28\nsse-starlette==0.10.3\nsseclient-py==1.8.0\nstarlette==0.36.3\nstrawberry-graphql==0.138.1\nsympy==1.12\ntabulate==0.9.0\ntaskgroup==0.0.0a4\ntenacity==8.2.3\ntensorboard==2.15.1\ntensorboard-data-server==0.7.2\ntensorboardX==2.6.2.2\ntermcolor==2.3.0\ntexttable==1.7.0\nthreadpoolctl==3.3.0\ntifffile==2024.2.12\ntimm==0.6.12\ntokenizers==0.15.2\ntomli==2.0.1\ntomlkit==0.12.0\ntoolz==0.12.1\nt
orch==2.2.2\ntorchaudio\ntorchvision==0.17.2\ntqdm==4.65.2\ntransaction==4.0\ntransformers==4.37.1\ntranslationstring==1.4\ntriton==2.2.0\ntyper==0.9.0\ntyping_extensions==4.8.0\ntzdata==2024.1\ntzlocal==5.2\nuniversal-analytics-python3==1.1.1\nurllib3==1.26.18\nuvicorn==0.28.0\nvelruse==1.1.1\nvenusian==3.1.0\nvoxel51-eta==0.12.6\nwandb==0.14.0\nwcwidth==0.2.13\nWebOb==1.8.7\nwebsockets==11.0.3\nWerkzeug==3.0.1\nwrapt==1.16.0\nwsproto==1.2.0\nWTForms==3.1.2\nwtforms-recaptcha==0.3.2\nxmltodict==0.13.0\nyacs==0.1.8\nyapf==0.40.2\nzipp==3.18.1\nzope.deprecation==5.0\nzope.interface==6.2\nzope.sqlalchemy==3.1\n"
  },
  {
    "path": "Open-Sora/requirements/requirements-vae.txt",
    "content": "beartype==0.18.5\neinops==0.8.0\neinops-exts==0.0.4\nopencv-python==4.9.0.80\npillow==10.3.0\n"
  },
  {
    "path": "Open-Sora/requirements/requirements.txt",
    "content": "colossalai>=0.4.0\nmmengine>=0.10.3\npandas>=2.0.3\ntimm==0.9.16\nrotary_embedding_torch==0.5.3\nftfy>=6.2.0 # for t5\ndiffusers==0.27.2 # for vae\naccelerate==0.29.2 # for t5\nav>=12.0.0 # for video loading\nnumpy<2.0.0\n\n# [gradio]\ngradio>=4.26.0\nspaces>=0.28.3\n\n# [notebook]\nipykernel>=6.29.4\nipywidgets>=8.1.2\n\n# [training]\nwandb>=0.17.0\ntensorboard>=2.14.0\npandarallel>=1.6.5\npyarrow>=16.1.0 # for parquet\n\n# [dev]\npre-commit>=3.5.0\nopenai\n"
  },
  {
    "path": "Open-Sora/scripts/inference.py",
    "content": "import os\nimport time\nfrom pprint import pformat\n\nimport colossalai\nimport torch\nimport torch.distributed as dist\nfrom colossalai.cluster import DistCoordinator\nfrom mmengine.runner import set_random_seed\nfrom tqdm import tqdm\n\nfrom opensora.acceleration.parallel_states import set_sequence_parallel_group\nfrom opensora.datasets import save_sample\nfrom opensora.datasets.aspect import get_image_size, get_num_frames\nfrom opensora.models.text_encoder.t5 import text_preprocessing\nfrom opensora.registry import MODELS, SCHEDULERS, build_module\nfrom opensora.utils.config_utils import parse_configs\nfrom opensora.utils.inference_utils import (\n    add_watermark,\n    append_generated,\n    append_score_to_prompts,\n    apply_mask_strategy,\n    collect_references_batch,\n    dframe_to_frame,\n    extract_json_from_prompts,\n    extract_prompts_loop,\n    get_save_path_name,\n    load_prompts,\n    merge_prompt,\n    prepare_multi_resolution_info,\n    refine_prompts_by_openai,\n    split_prompt,\n)\nfrom opensora.utils.misc import all_exists, create_logger, is_distributed, is_main_process, to_torch_dtype\n\n\ndef main():\n    torch.set_grad_enabled(False)\n    # ======================================================\n    # configs & runtime variables\n    # ======================================================\n    # == parse configs ==\n    cfg = parse_configs(training=False)\n\n    # == device and dtype ==\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n    cfg_dtype = cfg.get(\"dtype\", \"fp32\")\n    assert cfg_dtype in [\"fp16\", \"bf16\", \"fp32\"], f\"Unknown mixed precision {cfg_dtype}\"\n    dtype = to_torch_dtype(cfg.get(\"dtype\", \"bf16\"))\n    torch.backends.cuda.matmul.allow_tf32 = True\n    torch.backends.cudnn.allow_tf32 = True\n\n    # == init distributed env ==\n    if is_distributed():\n        colossalai.launch_from_torch({})\n        coordinator = DistCoordinator()\n        enable_sequence_parallelism 
= coordinator.world_size > 1\n        if enable_sequence_parallelism:\n            set_sequence_parallel_group(dist.group.WORLD)\n    else:\n        coordinator = None\n        enable_sequence_parallelism = False\n    set_random_seed(seed=cfg.get(\"seed\", 1024))\n\n    # == init logger ==\n    logger = create_logger()\n    logger.info(\"Inference configuration:\\n %s\", pformat(cfg.to_dict()))\n    verbose = cfg.get(\"verbose\", 1)\n    progress_wrap = tqdm if verbose == 1 else (lambda x: x)\n\n    # ======================================================\n    # build model & load weights\n    # ======================================================\n    logger.info(\"Building models...\")\n    # == build text-encoder and vae ==\n    text_encoder = build_module(cfg.text_encoder, MODELS, device=device)\n    vae = build_module(cfg.vae, MODELS).to(device, dtype).eval()\n\n    # == prepare video size ==\n    image_size = cfg.get(\"image_size\", None)\n    if image_size is None:\n        resolution = cfg.get(\"resolution\", None)\n        aspect_ratio = cfg.get(\"aspect_ratio\", None)\n        assert (\n            resolution is not None and aspect_ratio is not None\n        ), \"resolution and aspect_ratio must be provided if image_size is not provided\"\n        image_size = get_image_size(resolution, aspect_ratio)\n    num_frames = get_num_frames(cfg.num_frames)\n\n    # == build diffusion model ==\n    input_size = (num_frames, *image_size)\n    latent_size = vae.get_latent_size(input_size)\n    model = (\n        build_module(\n            cfg.model,\n            MODELS,\n            input_size=latent_size,\n            in_channels=vae.out_channels,\n            caption_channels=text_encoder.output_dim,\n            model_max_length=text_encoder.model_max_length,\n            enable_sequence_parallelism=enable_sequence_parallelism,\n        )\n        .to(device, dtype)\n        .eval()\n    )\n    text_encoder.y_embedder = model.y_embedder  # HACK: for 
classifier-free guidance\n\n    # == build scheduler ==\n    scheduler = build_module(cfg.scheduler, SCHEDULERS)\n\n    # ======================================================\n    # inference\n    # ======================================================\n    # == load prompts ==\n    prompts = cfg.get(\"prompt\", None)\n    start_idx = cfg.get(\"start_index\", 0)\n    if prompts is None:\n        if cfg.get(\"prompt_path\", None) is not None:\n            prompts = load_prompts(cfg.prompt_path, start_idx, cfg.get(\"end_index\", None))\n        else:\n            prompts = [cfg.get(\"prompt_generator\", \"\")] * 1_000_000  # endless loop\n    #print(start_idx, cfg.get(\"end_index\", None))\n    # == prepare reference ==\n    reference_path = cfg.get(\"reference_path\", [\"\"] * len(prompts))\n    mask_strategy = cfg.get(\"mask_strategy\", [\"\"] * len(prompts))\n    assert len(reference_path) == len(prompts), \"Length of reference must be the same as prompts\"\n    assert len(mask_strategy) == len(prompts), \"Length of mask_strategy must be the same as prompts\"\n\n    # == prepare arguments ==\n    fps = cfg.fps\n    save_fps = cfg.get(\"save_fps\", fps // cfg.get(\"frame_interval\", 1))\n    multi_resolution = cfg.get(\"multi_resolution\", None)\n    batch_size = cfg.get(\"batch_size\", 1)\n    num_sample = cfg.get(\"num_sample\", 1)\n    loop = cfg.get(\"loop\", 1)\n    condition_frame_length = cfg.get(\"condition_frame_length\", 5)\n    condition_frame_edit = cfg.get(\"condition_frame_edit\", 0.0)\n    align = cfg.get(\"align\", None)\n\n    save_dir = cfg.save_dir\n    os.makedirs(save_dir, exist_ok=True)\n    sample_name = cfg.get(\"sample_name\", None)\n    prompt_as_path = cfg.get(\"prompt_as_path\", False)\n\n    # == Iter over all samples ==\n    for i in progress_wrap(range(0, len(prompts), batch_size)):\n        # == prepare batch prompts ==\n        batch_prompts = prompts[i : i + batch_size]\n        ms = mask_strategy[i : i + batch_size]\n        
refs = reference_path[i : i + batch_size]\n\n        # == get json from prompts ==\n        batch_prompts, refs, ms = extract_json_from_prompts(batch_prompts, refs, ms)\n        original_batch_prompts = batch_prompts\n\n        # == get reference for condition ==\n        refs = collect_references_batch(refs, vae, image_size)\n\n        # == multi-resolution info ==\n        model_args = prepare_multi_resolution_info(\n            multi_resolution, len(batch_prompts), image_size, num_frames, fps, device, dtype\n        )\n\n        # == feature-cache settings ==\n        model_args['cache_type'] = 'attention'\n        model_args['ratio_scheduler'] = 'ToCa'\n        model_args['fresh_ratio'] = 0.1\n        # NOTE: fresh_threshold does not determine the forced activation cycles;\n        # see Open-Sora/opensora/models/cache_functions/force_scheduler.py for details\n        model_args['fresh_threshold'] = 3\n        model_args['force_fresh'] = 'global'\n        model_args['soft_fresh_weight'] = 0.25\n\n        # == Iter over number of sampling for one prompt ==\n        for k in range(num_sample):\n            # == prepare save paths ==\n            save_paths = [\n                get_save_path_name(\n                    save_dir,\n                    sample_name=sample_name,\n                    sample_idx=start_idx + idx,\n                    prompt=original_batch_prompts[idx],\n                    prompt_as_path=prompt_as_path,\n                    num_sample=num_sample,\n                    k=k,\n                )\n                for idx in range(len(batch_prompts))\n            ]\n\n            # NOTE: Skip if the sample already exists\n            # This is useful for resuming sampling VBench\n            if prompt_as_path and all_exists(save_paths):\n                continue\n\n            # == process prompts step by step ==\n            # 0. 
split prompt\n            # each element in the list is [prompt_segment_list, loop_idx_list]\n            batched_prompt_segment_list = []\n            batched_loop_idx_list = []\n            for prompt in batch_prompts:\n                prompt_segment_list, loop_idx_list = split_prompt(prompt)\n                batched_prompt_segment_list.append(prompt_segment_list)\n                batched_loop_idx_list.append(loop_idx_list)\n\n            # 1. refine prompt by openai\n            if cfg.get(\"llm_refine\", False):\n                # only call openai API when\n                # 1. seq parallel is not enabled\n                # 2. seq parallel is enabled and the process is rank 0\n                if not enable_sequence_parallelism or (enable_sequence_parallelism and is_main_process()):\n                    for idx, prompt_segment_list in enumerate(batched_prompt_segment_list):\n                        batched_prompt_segment_list[idx] = refine_prompts_by_openai(prompt_segment_list)\n\n                # sync the prompt if using seq parallel\n                if enable_sequence_parallelism:\n                    coordinator.block_all()\n                    prompt_segment_length = [\n                        len(prompt_segment_list) for prompt_segment_list in batched_prompt_segment_list\n                    ]\n\n                    # flatten the prompt segment list\n                    batched_prompt_segment_list = [\n                        prompt_segment\n                        for prompt_segment_list in batched_prompt_segment_list\n                        for prompt_segment in prompt_segment_list\n                    ]\n\n                    # create a list of size equal to world size\n                    broadcast_obj_list = [batched_prompt_segment_list] * coordinator.world_size\n                    dist.broadcast_object_list(broadcast_obj_list, 0)\n\n                    # recover the prompt list\n                    batched_prompt_segment_list = []\n                 
   segment_start_idx = 0\n                    all_prompts = broadcast_obj_list[0]\n                    for num_segment in prompt_segment_length:\n                        batched_prompt_segment_list.append(\n                            all_prompts[segment_start_idx : segment_start_idx + num_segment]\n                        )\n                        segment_start_idx += num_segment\n\n            # 2. append score\n            for idx, prompt_segment_list in enumerate(batched_prompt_segment_list):\n                batched_prompt_segment_list[idx] = append_score_to_prompts(\n                    prompt_segment_list,\n                    aes=cfg.get(\"aes\", None),\n                    flow=cfg.get(\"flow\", None),\n                    camera_motion=cfg.get(\"camera_motion\", None),\n                )\n\n            # 3. clean prompt with T5\n            for idx, prompt_segment_list in enumerate(batched_prompt_segment_list):\n                batched_prompt_segment_list[idx] = [text_preprocessing(prompt) for prompt in prompt_segment_list]\n\n            # 4. 
merge to obtain the final prompt\n            batch_prompts = []\n            for prompt_segment_list, loop_idx_list in zip(batched_prompt_segment_list, batched_loop_idx_list):\n                batch_prompts.append(merge_prompt(prompt_segment_list, loop_idx_list))\n\n            # == Iter over loop generation ==\n            video_clips = []\n            for loop_i in range(loop):\n                # == get prompt for loop i ==\n                batch_prompts_loop = extract_prompts_loop(batch_prompts, loop_i)\n\n                # == add condition frames for loop ==\n                if loop_i > 0:\n                    refs, ms = append_generated(\n                        vae, video_clips[-1], refs, ms, loop_i, condition_frame_length, condition_frame_edit\n                    )\n\n                # == sampling ==\n                torch.manual_seed(1024 + k)  # use a different seed for each sample index k\n                z = torch.randn(len(batch_prompts), vae.out_channels, *latent_size, device=device, dtype=dtype)\n                masks = apply_mask_strategy(z, refs, ms, loop_i, align=align)\n                samples = scheduler.sample(\n                    model,\n                    text_encoder,\n                    z=z,\n                    prompts=batch_prompts_loop,\n                    device=device,\n                    additional_args=model_args,\n                    progress=verbose >= 2,\n                    mask=masks,\n                )\n                samples = vae.decode(samples.to(dtype), num_frames=num_frames)\n                video_clips.append(samples)\n\n            # == save samples ==\n            if is_main_process():\n                for idx, batch_prompt in enumerate(batch_prompts):\n                    if verbose >= 2:\n                        logger.info(\"Prompt: %s\", batch_prompt)\n                    save_path = save_paths[idx]\n                    video = [video_clips[i][idx] for i in range(loop)]\n                    for i in range(1, 
loop):\n                        video[i] = video[i][:, dframe_to_frame(condition_frame_length) :]\n                    video = torch.cat(video, dim=1)\n                    save_path = save_sample(\n                        video,\n                        fps=save_fps,\n                        save_path=save_path,\n                        verbose=verbose >= 2,\n                    )\n                    if save_path.endswith(\".mp4\") and cfg.get(\"watermark\", False):\n                        time.sleep(1)  # prevent loading previous generated video\n                        add_watermark(save_path)\n        start_idx += len(batch_prompts)\n    logger.info(\"Inference finished.\")\n    logger.info(\"Saved %s samples to %s\", start_idx, save_dir)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/scripts/inference_vae.py",
    "content": "import os\nfrom pprint import pformat\n\nimport colossalai\nimport torch\nfrom mmengine.runner import set_random_seed\nfrom tqdm import tqdm\n\nfrom opensora.acceleration.parallel_states import get_data_parallel_group\nfrom opensora.datasets import save_sample\nfrom opensora.datasets.dataloader import prepare_dataloader\nfrom opensora.models.vae.losses import VAELoss\nfrom opensora.registry import DATASETS, MODELS, build_module\nfrom opensora.utils.config_utils import parse_configs\nfrom opensora.utils.misc import create_logger, get_world_size, is_distributed, is_main_process, to_torch_dtype\n\n\ndef main():\n    torch.set_grad_enabled(False)\n    # ======================================================\n    # configs & runtime variables\n    # ======================================================\n    # == parse configs ==\n    cfg = parse_configs(training=False)\n\n    # == device and dtype ==\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n    cfg_dtype = cfg.get(\"dtype\", \"fp32\")\n    assert cfg_dtype in [\"fp16\", \"bf16\", \"fp32\"], f\"Unknown mixed precision {cfg_dtype}\"\n    dtype = to_torch_dtype(cfg.get(\"dtype\", \"bf16\"))\n    torch.backends.cuda.matmul.allow_tf32 = True\n    torch.backends.cudnn.allow_tf32 = True\n\n    # == init distributed env ==\n    if is_distributed():\n        colossalai.launch_from_torch({})\n    set_random_seed(seed=cfg.get(\"seed\", 1024))\n\n    # == init logger ==\n    logger = create_logger()\n    logger.info(\"Inference configuration:\\n %s\", pformat(cfg.to_dict()))\n    verbose = cfg.get(\"verbose\", 1)\n\n    # ======================================================\n    # build dataset and dataloader\n    # ======================================================\n    logger.info(\"Building reconstruction dataset...\")\n    dataset = build_module(cfg.dataset, DATASETS)\n    batch_size = cfg.get(\"batch_size\", 1)\n    dataloader, _ = prepare_dataloader(\n        dataset,\n        
batch_size=batch_size,\n        num_workers=cfg.get(\"num_workers\", 4),\n        shuffle=False,\n        drop_last=False,\n        pin_memory=True,\n        process_group=get_data_parallel_group(),\n    )\n    logger.info(\"Dataset %s contains %s videos.\", cfg.dataset.data_path, len(dataset))\n    total_batch_size = batch_size * get_world_size()\n    logger.info(\"Total batch size: %s\", total_batch_size)\n\n    total_steps = len(dataloader)\n    if cfg.get(\"num_samples\", None) is not None:\n        total_steps = min(int(cfg.num_samples // cfg.batch_size), total_steps)\n        logger.info(\"limiting test dataset to %s\", int(cfg.num_samples // cfg.batch_size) * cfg.batch_size)\n    dataiter = iter(dataloader)\n\n    # ======================================================\n    # build model & loss\n    # ======================================================\n    logger.info(\"Building models...\")\n    model = build_module(cfg.model, MODELS).to(device, dtype).eval()\n    vae_loss_fn = VAELoss(\n        logvar_init=cfg.get(\"logvar_init\", 0.0),\n        perceptual_loss_weight=cfg.get(\"perceptual_loss_weight\", 0.1),\n        kl_loss_weight=cfg.get(\"kl_loss_weight\", 1e-6),\n        device=device,\n        dtype=dtype,\n    )\n\n    # ======================================================\n    # inference\n    # ======================================================\n    # == global variables ==\n    running_loss = running_nll = running_nll_z = 0.0\n    loss_steps = 0\n    cal_stats = cfg.get(\"cal_stats\", False)\n    if cal_stats:\n        num_samples = 0\n        running_sum = running_var = 0.0\n        running_sum_c = torch.zeros(model.out_channels, dtype=torch.float, device=device)\n        running_var_c = torch.zeros(model.out_channels, dtype=torch.float, device=device)\n\n    # prepare arguments\n    save_fps = cfg.get(\"fps\", 24) // cfg.get(\"frame_interval\", 1)\n\n    # Iter over the dataset\n    with tqdm(\n        range(total_steps),\n        
disable=not is_main_process() or verbose < 1,\n        total=total_steps,\n        initial=0,\n    ) as pbar:\n        for step in pbar:\n            batch = next(dataiter)\n            x = batch[\"video\"].to(device, dtype)  # [B, C, T, H, W]\n\n            # == vae encoding & decoding ===\n            z, posterior, x_z = model.encode(x)\n            x_rec, x_z_rec = model.decode(z, num_frames=x.size(2))\n            x_ref = model.spatial_vae.decode(x_z)\n\n            # == check z shape ==\n            input_size = x.shape[2:]\n            latent_size = model.get_latent_size(input_size)\n            assert list(z.shape[2:]) == latent_size, f\"z shape: {z.shape}, latent_size: {latent_size}\"\n\n            # == calculate stats ==\n            if cal_stats:\n                num_samples += 1\n                running_sum += z.mean().item()\n                running_var += (z - running_sum / num_samples).pow(2).mean().item()\n                running_sum_c += z.mean(dim=(0, 2, 3, 4)).float()\n                running_var_c += (\n                    (z - running_sum_c[None, :, None, None, None] / num_samples).pow(2).mean(dim=(0, 2, 3, 4)).float()\n                )\n                if verbose >= 1:\n                    pbar.set_postfix(\n                        {\n                            \"mean\": running_sum / num_samples,\n                            \"std\": (running_var / num_samples) ** 0.5,\n                        }\n                    )\n                if num_samples % cfg.get(\"log_stats_every\", 100) == 0:\n                    # the second value is the square root of the running variance, i.e. the std\n                    logger.info(\n                        \"VAE feature per channel stats: mean %s, std %s\",\n                        (running_sum_c / num_samples).cpu().tolist(),\n                        (running_var_c / num_samples).sqrt().cpu().tolist(),\n                    )\n\n            # == loss calculation ==\n            nll_loss, weighted_nll_loss, weighted_kl_loss = vae_loss_fn(x, x_rec, posterior)\n            nll_loss_z, _, _ = 
vae_loss_fn(x_z, x_z_rec, posterior, no_perceptual=True)\n            vae_loss = weighted_nll_loss + weighted_kl_loss\n            loss_steps += 1\n            running_loss = vae_loss.item() / loss_steps + running_loss * ((loss_steps - 1) / loss_steps)\n            running_nll = nll_loss.item() / loss_steps + running_nll * ((loss_steps - 1) / loss_steps)\n            running_nll_z = nll_loss_z.item() / loss_steps + running_nll_z * ((loss_steps - 1) / loss_steps)\n\n            # == save samples ==\n            save_dir = cfg.get(\"save_dir\", None)\n            if is_main_process() and save_dir is not None:\n                ori_dir = f\"{save_dir}_ori\"\n                rec_dir = f\"{save_dir}_rec\"\n                ref_dir = f\"{save_dir}_spatial\"\n                os.makedirs(ori_dir, exist_ok=True)\n                os.makedirs(rec_dir, exist_ok=True)\n                os.makedirs(ref_dir, exist_ok=True)\n                for idx, vid in enumerate(x):\n                    pos = step * cfg.batch_size + idx\n                    save_sample(vid, fps=save_fps, save_path=f\"{ori_dir}/{pos:03d}\", verbose=verbose >= 2)\n                    save_sample(x_rec[idx], fps=save_fps, save_path=f\"{rec_dir}/{pos:03d}\", verbose=verbose >= 2)\n                    save_sample(x_ref[idx], fps=save_fps, save_path=f\"{ref_dir}/{pos:03d}\", verbose=verbose >= 2)\n\n    logger.info(\"VAE loss: %s\", running_loss)\n    logger.info(\"VAE nll loss: %s\", running_nll)\n    logger.info(\"VAE nll_z loss: %s\", running_nll_z)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/scripts/misc/extract_feat.py",
    "content": "import os\nfrom pprint import pformat\n\nimport colossalai\nimport torch\nimport torch.distributed as dist\nfrom tqdm import tqdm\n\nfrom opensora.acceleration.parallel_states import get_data_parallel_group, set_data_parallel_group\nfrom opensora.datasets.dataloader import prepare_dataloader\nfrom opensora.registry import DATASETS, MODELS, build_module\nfrom opensora.utils.config_utils import parse_configs, save_training_config\nfrom opensora.utils.misc import FeatureSaver, Timer, create_logger, format_numel_str, get_model_numel, to_torch_dtype\n\n\ndef main():\n    torch.set_grad_enabled(False)\n    # ======================================================\n    # 1. configs & runtime variables\n    # ======================================================\n    # == parse configs ==\n    cfg = parse_configs(training=False)\n\n    # == device and dtype ==\n    assert torch.cuda.is_available(), \"Training currently requires at least one GPU.\"\n    cfg_dtype = cfg.get(\"dtype\", \"bf16\")\n    assert cfg_dtype in [\"fp16\", \"bf16\"], f\"Unknown mixed precision {cfg_dtype}\"\n    dtype = to_torch_dtype(cfg.get(\"dtype\", \"bf16\"))\n\n    # == colossalai init distributed training ==\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n    cfg_dtype = cfg.get(\"dtype\", \"fp32\")\n    assert cfg_dtype in [\"fp16\", \"bf16\", \"fp32\"], f\"Unknown mixed precision {cfg_dtype}\"\n    dtype = to_torch_dtype(cfg.get(\"dtype\", \"bf16\"))\n    torch.backends.cuda.matmul.allow_tf32 = True\n    torch.backends.cudnn.allow_tf32 = True\n\n    colossalai.launch_from_torch({})\n    set_data_parallel_group(dist.group.WORLD)\n\n    # == init logger, tensorboard & wandb ==\n    logger = create_logger()\n    logger.info(\"Configuration:\\n %s\", pformat(cfg.to_dict()))\n\n    # ======================================================\n    # 2. 
build dataset and dataloader\n    # ======================================================\n    logger.info(\"Building dataset...\")\n    # == build dataset ==\n    dataset = build_module(cfg.dataset, DATASETS)\n    logger.info(\"Dataset contains %s samples.\", len(dataset))\n\n    # == build dataloader ==\n    dataloader_args = dict(\n        dataset=dataset,\n        batch_size=cfg.get(\"batch_size\", None),\n        num_workers=cfg.get(\"num_workers\", 4),\n        seed=cfg.get(\"seed\", 1024),\n        shuffle=True,\n        drop_last=True,\n        pin_memory=True,\n        process_group=get_data_parallel_group(),\n    )\n    dataloader, _ = prepare_dataloader(\n        bucket_config=cfg.get(\"bucket_config\", None),\n        num_bucket_build_workers=cfg.get(\"num_bucket_build_workers\", 1),\n        **dataloader_args,\n    )\n    num_steps_per_epoch = len(dataloader)\n\n    # ======================================================\n    # 3. build model\n    # ======================================================\n    logger.info(\"Building models...\")\n    # == build text-encoder and vae ==\n    text_encoder = build_module(cfg.text_encoder, MODELS, device=device, dtype=dtype)\n    vae = build_module(cfg.vae, MODELS).to(device, dtype).eval()\n\n    # == build diffusion model ==\n    input_size = (dataset.num_frames, *dataset.image_size)\n    latent_size = vae.get_latent_size(input_size)\n    model = (\n        build_module(\n            cfg.model,\n            MODELS,\n            input_size=latent_size,\n            in_channels=vae.out_channels,\n            caption_channels=text_encoder.output_dim,\n            model_max_length=text_encoder.model_max_length,\n        )\n        .to(device, dtype)\n        .train()\n    )\n    model_numel, model_numel_trainable = get_model_numel(model)\n    logger.info(\n        \"[Diffusion] Trainable model params: %s, Total model params: %s\",\n        format_numel_str(model_numel_trainable),\n        
format_numel_str(model_numel),\n    )\n\n    # =======================================================\n    # 5. training loop\n    # =======================================================\n    # == global variables ==\n    bin_size = cfg.bin_size\n    save_text_features = cfg.get(\"save_text_features\", False)\n    save_compressed_text_features = cfg.get(\"save_compressed_text_features\", False)\n\n    # == number of bins ==\n    num_bin = num_steps_per_epoch // bin_size\n    logger.info(\"Number of batches: %s\", num_steps_per_epoch)\n    logger.info(\"Bin size: %s\", bin_size)\n    logger.info(\"Number of bins: %s\", num_bin)\n\n    # resume from a specific batch index\n    start_index = cfg.get(\"start_index\", 0)\n    end_index = cfg.get(\"end_index\", num_bin)\n    dataloader.batch_sampler.load_state_dict({\"last_micro_batch_access_index\": start_index})\n    num_bin_to_process = min(num_bin, end_index) - start_index\n    logger.info(\"Start index: %s\", start_index)\n    logger.info(\"End index: %s\", end_index)\n    logger.info(\"Number of batches to process: %s\", num_bin_to_process)\n\n    # create save directory\n    assert cfg.get(\"save_dir\", None) is not None, \"Please specify the save_dir in the config file.\"\n    save_dir = os.path.join(cfg.save_dir, f\"s{start_index}_e{end_index}\")\n    os.makedirs(save_dir, exist_ok=True)\n    save_training_config(cfg.to_dict(), save_dir)\n    logger.info(\"Saving features to %s\", save_dir)\n\n    saver = FeatureSaver(save_dir, bin_size, start_bin=start_index)\n\n    # == training loop in an epoch ==\n    dataloader_iter = iter(dataloader)\n    log_time = cfg.get(\"log_time\", False)\n    for i in tqdm(range(0, num_bin_to_process * bin_size)):\n        with Timer(\"step\", log=log_time):\n            with Timer(\"data loading\", log=log_time):\n                batch = next(dataloader_iter)\n                x = batch.pop(\"video\").to(device, dtype)  # [B, C, T, H, W]\n                y = 
batch.pop(\"text\")\n\n            with Timer(\"vae\", log=log_time):\n                x = vae.encode(x)\n            with Timer(\"feature to cpu\", log=log_time):\n                x = x.cpu()\n\n            batch_dict = {\n                \"x\": x,\n                \"text\": y,\n                \"fps\": batch[\"fps\"].to(dtype),\n                \"height\": batch[\"height\"].to(dtype),\n                \"width\": batch[\"width\"].to(dtype),\n                \"num_frames\": batch[\"num_frames\"].to(dtype),\n            }\n\n            if save_text_features:\n                with Timer(\"text\", log=log_time):\n                    text_infos = text_encoder.encode(y)\n                    y_feat = text_infos[\"y\"]\n                    y_mask = text_infos[\"mask\"]\n                    if save_compressed_text_features:\n                        y_feat, y_mask = model.encode_text(y_feat, y_mask)\n                        y_mask = torch.tensor(y_mask)\n                with Timer(\"feature to cpu\", log=log_time):\n                    y_feat = y_feat.cpu()\n                    y_mask = y_mask.cpu()\n                batch_dict.update({\"y\": y_feat, \"mask\": y_mask})\n\n            saver.update(batch_dict)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/scripts/misc/launch_extract_feat.sh",
    "content": "#!/bin/bash\n\nset -x\nset -e\n\nSTART_SPLIT=0\nNUM_SPLIT=10\n\nDATA_PATH=$1\nSAVE_PATH=$2\nDATA_ARG=\"--data-path $DATA_PATH\"\nSAVE_ARG=\"--save-dir $SAVE_PATH\"\n\nCMD=\"torchrun --standalone --nproc_per_node 1 scripts/misc/extract_feat.py configs/opensora-v1-2/misc/extract.py $DATA_ARG $SAVE_ARG\"\ndeclare -a GPUS=(0 1 2 3 4 5 6 7)\n\nmkdir -p logs/extract_feat\n\nfor i in \"${GPUS[@]}\"; do\n    CUDA_VISIBLE_DEVICES=$i $CMD --start-index $(($START_SPLIT + i * $NUM_SPLIT)) --end-index $(($START_SPLIT + (i + 1) * $NUM_SPLIT)) >logs/extract_feat/$i.log 2>&1 &\ndone\n"
  },
  {
    "path": "Open-Sora/setup.py",
    "content": "from typing import List\n\nfrom setuptools import find_packages, setup\n\n\ndef fetch_requirements(paths) -> List[str]:\n    \"\"\"\n    This function reads the requirements file.\n\n    Args:\n        path (str): the path to the requirements file.\n\n    Returns:\n        The lines in the requirements file.\n    \"\"\"\n    if not isinstance(paths, list):\n        paths = [paths]\n    requirements = []\n    for path in paths:\n        with open(path, \"r\") as fd:\n            requirements += [r.strip() for r in fd.readlines()]\n    return requirements\n\n\ndef fetch_readme() -> str:\n    \"\"\"\n    This function reads the README.md file in the current directory.\n\n    Returns:\n        The lines in the README file.\n    \"\"\"\n    with open(\"README.md\", encoding=\"utf-8\") as f:\n        return f.read()\n\n\nsetup(\n    name=\"opensora\",\n    version=\"1.2.0\",\n    packages=find_packages(\n        exclude=(\n            \"assets\",\n            \"cache\",\n            \"configs\",\n            \"docs\",\n            \"eval\",\n            \"evaluation_results\",\n            \"gradio\",\n            \"logs\",\n            \"notebooks\",\n            \"outputs\",\n            \"pretrained_models\",\n            \"samples\",\n            \"scripts\",\n            \"tests\",\n            \"tools\",\n            \"*.egg-info\",\n        )\n    ),\n    description=\"Democratizing Efficient Video Production for All\",\n    long_description=fetch_readme(),\n    long_description_content_type=\"text/markdown\",\n    license=\"Apache Software License 2.0\",\n    url=\"https://github.com/hpcaitech/Open-Sora\",\n    project_urls={\n        \"Bug Tracker\": \"https://github.com/hpcaitech/Open-Sora/issues\",\n        \"Examples\": \"https://hpcaitech.github.io/Open-Sora/\",\n        \"Documentation\": \"https://github.com/hpcaitech/Open-Sora?tab=readme-ov-file\",\n        \"Github\": \"https://github.com/hpcaitech/Open-Sora\",\n    },\n    
install_requires=fetch_requirements(\"requirements/requirements.txt\"),\n    python_requires=\">=3.6\",\n    classifiers=[\n        \"Programming Language :: Python :: 3\",\n        \"License :: OSI Approved :: Apache Software License\",\n        \"Environment :: GPU :: NVIDIA CUDA\",\n        \"Topic :: Scientific/Engineering :: Artificial Intelligence\",\n        \"Topic :: System :: Distributed Computing\",\n    ],\n    extras_require={\n        \"data\": fetch_requirements(\"requirements/requirements-data.txt\"),\n        \"eval\": fetch_requirements(\"requirements/requirements-eval.txt\"),\n        \"vae\": fetch_requirements(\"requirements/requirements-vae.txt\"),\n        \"full\": fetch_requirements(\n            [\n                \"requirements/requirements-data.txt\",\n                \"requirements/requirements-eval.txt\",\n            ]\n        ),\n    },\n)\n"
  },
  {
    "path": "Open-Sora/tests/test_attn.py",
    "content": "import torch\nfrom colossalai.accelerator import get_accelerator\nfrom colossalai.utils import get_current_device\nfrom rotary_embedding_torch import RotaryEmbedding\n\nfrom opensora.models.layers.blocks import Attention\n\n# B, S, H = 7488, 1, 1152\n# B, S, H = 32, 234, 1152\nB, S, H = 128, 32, 1152\nN, D = 16, 72\n\n\ndef run_attn(enable_flash_attn: bool):\n    get_accelerator().reset_peak_memory_stats()\n    rope = RotaryEmbedding(D).to(device=get_current_device(), dtype=torch.bfloat16)\n    attn = Attention(\n        H,\n        N,\n        qkv_bias=True,\n        rope=rope.rotate_queries_or_keys,\n        enable_flash_attn=enable_flash_attn,\n    ).to(device=get_current_device(), dtype=torch.bfloat16)\n    x = torch.randn(B, S, H, device=get_current_device(), dtype=torch.bfloat16).requires_grad_()\n    y = attn(x)\n    y.mean().backward()\n    print(f\"Peak memory: {get_accelerator().max_memory_allocated() / 1024**2:.2f} MB\")\n\n\nif __name__ == \"__main__\":\n    print(\"Use flashattn\")\n    run_attn(True)\n    print(\"No flashattn\")\n    run_attn(False)\n"
  },
  {
    "path": "Open-Sora/tests/test_lr_scheduler.py",
    "content": "import torch\nfrom torch.optim import Adam\nfrom torchvision.models import resnet50\nfrom tqdm import tqdm\n\nfrom opensora.utils.lr_scheduler import LinearWarmupLR\n\n\ndef test_lr_scheduler():\n    warmup_steps = 200\n    model = resnet50().cuda()\n    optimizer = Adam(model.parameters(), lr=0.01)\n    scheduler = LinearWarmupLR(optimizer, warmup_steps=warmup_steps)\n    current_lr = scheduler.get_lr()[0]\n    data = torch.rand(1, 3, 224, 224).cuda()\n\n    for i in tqdm(range(warmup_steps * 2)):\n        out = model(data)\n        out.mean().backward()\n        optimizer.step()\n        scheduler.step()\n\n        if i >= warmup_steps:\n            assert scheduler.get_lr()[0] == 0.01\n        else:\n            assert scheduler.get_lr()[0] > current_lr, f\"{scheduler.get_lr()[0]} <= {current_lr}\"\n            current_lr = scheduler.get_lr()[0]\n\n\nif __name__ == \"__main__\":\n    test_lr_scheduler()\n"
  },
  {
    "path": "Open-Sora/tools/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/caption/README.md",
    "content": "# Video Captioning\n\nHuman labeling of videos is expensive and time-consuming. We adopt powerful image captioning models to generate captions for videos. Although GPT-4V achieves a better performance, its 20s/sample speed is too slow for us. As for our v1.2 model, we captioned our training videos with the [PLLaVA](https://github.com/magic-research/PLLaVA) model. PLLaVA performs highly competitively on multiple video-based text generation benchmarks including [MVbench](https://paperswithcode.com/sota/video-question-answering-on-mvbench?p=pllava-parameter-free-llava-extension-from-1).\n\n## PLLaVA Captioning\n\nTo balance captioning speed and performance, we chose the 13B version of PLLaVA configured with 2*2 spatial pooling. We feed it with 4 frames evenly extracted from the video. We accelerate its inference via (1) batching and (2) offload frame extraction to a separate process such that the GPU computations and frame extraction happen in parallel.\n\n### Installation\nInstall the required dependancies by following our [installation instructions](../../docs/installation.md)'s \"Data Dependencies\" and \"PLLaVA Captioning\" sections.\n\n\n<!-- ### Download the PLLaVA repo\n\nFirst, make sure you are under the directory of tools/caption/pllava_dir. Then,\n\n```bash\ngit clone https://github.com/magic-research/PLLaVA.git\ncd PLLaVA\ngit checkout fd9194a\n\n\n```\n\n### Environment\n\n```bash\nconda create -n pllava python=3.10\n\nconda activate pllava\n\npip install -r requirements.txt # change to your own torch version if neccessary; torch==2.2.2, torchaudio==2.2.2, torchvision==0.17.2 worked for H100 for Tom.\n\n```\n\n\n### Download weights\n\n```bash\npython python_scripts/hf.py # download the weights\n``` -->\n### Usage\n\nSince PLLaVA is not fashioned as a package, we will use PYTHONPATH to use it.\n\n\n```bash\ncd .. 
# step back to pllava_dir\n\nCUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \\\nPYTHONPATH=\"$PYTHONPATH:OPEN_SORA_HOME/tools/caption/pllava_dir/PLLaVA\" \\\nnohup python caption_pllava.py \\\n  --pretrained_model_name_or_path PLLaVA/MODELS/pllava-13b \\\n  --use_lora \\\n  --lora_alpha 4 \\\n  --num_frames 4 \\\n  --weight_dir PLLaVA/MODELS/pllava-13b \\\n  --csv_path meta.csv \\\n  --pooling_shape 4-12-12 \\\n  > pllava_caption.out 2>&1 &\n```\n\n### PLLaVA vs. LLaVA\nIn our previous releases, we used [LLaVA](#llava-captioning) for video captioning.\nQualitatively speaking, we observe that PLLaVA is somewhat more likely than LLaVA to capture the details in a video accurately. See below for a comparison on a sample video.\n<!-- <img src=\"../../assets/readme/llava_vs_pllava_sample.gif\" width=\"300\" height=\"200\" alt=\"LLaVA vs PLLaVA\"> -->\n\n<figure>\n    <img src=\"../../assets/readme/llava_vs_pllava_sample.gif\" width=\"300\" height=\"200\" alt=\"LLaVA vs PLLaVA\">\n</figure>\n\n\n\n| LLaVA | PLLaVA |\n|----------|----------|\n| The video is a close-up shot of two gold wedding rings. The rings are placed on a (black surface)✅, casting a soft shadow beneath them. The rings are positioned in such a way that (they are facing each other)❌, creating a mirror image effect. The rings are (identical in size and design)✅, suggesting they are a pair. The lighting in the video is soft and diffused, highlighting the gold color of the rings and creating a warm and inviting atmosphere. The overall style of the video is minimalist and elegant, focusing solely on the rings and their reflection.\t| The video shows a pair of gold wedding rings on a (reflective surface)✅. The rings are placed one on top of the other, (with the top ring slightly tilted to the left)✅. The rings have a (shiny, metallic finish)✅ and are the main focus of the image. The background is a gradient of dark to light gray, providing a neutral backdrop that highlights the rings. 
There are no texts or other objects in the image. The style of the video is a simple product display with a focus on the rings, likely intended for promotional or sales purposes. The lighting and shadows suggest a soft, even light source, (possibly a studio light)✅, which creates a reflective surface beneath the rings.|\n<!-- |Row2Cell1|Row2Cell2| -->\n\n\n\n\n## LLaVA Captioning\n\nWe extract three frames from the video for captioning. With batch inference, we can achieve a 10x speedup. With approximately 720p resolution and 1 frame, the speed is 2~3 videos/s on 8 GPUs. If we resize the smaller side to 336, the speed can reach 8 videos/s. In Open-Sora v1.1, to lower the cost, we use the 7B model.\n\n### Installation\n\nInstall the required dependencies by following the \"Data Dependencies\" and \"LLaVA Captioning\" sections of our [installation instructions](../../docs/installation.md).\n\n<!-- ### Requirement\n\n```bash\n# create conda env\nconda create -n llava python=3.10 -y\nconda activate llava\n\n# install torch\npip install torch torchvision\n\n# clone llava\ngit clone https://github.com/haotian-liu/LLaVA.git\ncd LLaVA\n# CAUTION: This line is to remove torch dependency in pyproject.toml, which is:\n# \"torch==2.1.2\", \"torchvision==0.16.2\",\n# It is better manually remove it in your local pyproject.toml\nsed -i '16d' pyproject.toml\n\n# install llava\npip install --upgrade pip  # enable PEP 660 support\npip install -e .\n\n# install flash attention\npip install flash-attn --no-build-isolation\n# install colossalai and decord\npip install colossalai decord\n``` -->\n\n### Usage\n\nPrepare a csv file for processing. The csv file can be generated by `convert_dataset.py` according to its [documentation](/tools/datasets/README.md). 
Then, run the following command to generate captions for videos/images with LLaVA:\n\n```bash\n# caption with mistral-7B\ntorchrun --nproc_per_node 8 --standalone -m tools.caption.caption_llava DATA.csv --dp-size 8 --tp-size 1 --model-path liuhaotian/llava-v1.6-mistral-7b --prompt video\n\n# caption with llava-34B\n# NOTE: remember to enable flash attention for this model\ntorchrun --nproc_per_node 8 --standalone -m tools.caption.caption_llava DATA.csv --dp-size 4 --tp-size 2 --model-path liuhaotian/llava-v1.6-34b --prompt image-3ex --flash-attention\n\n# we run this on 8xH800 GPUs\ntorchrun --nproc_per_node 8 --standalone -m tools.caption.caption_llava DATA.csv --tp-size 2 --dp-size 4 --bs 16\n\n# at least two 80G GPUs are required\ntorchrun --nproc_per_node 2 --standalone -m tools.caption.caption_llava DATA.csv --tp-size 2 --dp-size 1 --bs 16\n\n# can also caption images\ntorchrun --nproc_per_node 2 --standalone -m tools.caption.caption_llava DATA.csv --tp-size 2 --dp-size 1 --bs 16 --prompt image-3ex\n```\n\nPlease note that you should add the `--flash-attention` flag when running Llama-based LLaVA models, as it provides a speedup, but turn it off for Mistral-based ones. The reasons are discussed in [this issue](https://discuss.huggingface.co/t/flash-attention-has-no-effect-on-inference/73453).\n\nAfter running the script with `dp-size=N`, you will get `N` partial csv files. Run the following command to merge them:\n\n```bash\npython -m tools.datasets.datautil DATA_caption_part*.csv --output DATA_caption.csv\n```\n\n### Resume\n\nSometimes the process may be interrupted. 
We can resume the process by running the following command:\n\n```bash\n# merge generated results\npython -m tools.datasets.datautil DATA_caption_part*.csv --output DATA_caption.csv\n\n# get the remaining videos\npython -m tools.datasets.datautil DATA.csv --difference DATA_caption.csv --output DATA_remaining.csv\n```\n\nThen use the output csv file to resume the process.\n\n\n## GPT-4V Captioning\n\nRun the following command to generate captions for videos with GPT-4V:\n\n```bash\n# output: DATA_caption.csv\npython -m tools.caption.caption_gpt4 DATA.csv --key $OPENAI_API_KEY\n```\n\nThe cost is approximately $0.01 per video (3 frames per video).\n\n## Camera Motion Detection\n\n<!-- Install additional required packages: `tools/caption/camera_motion/requirements.txt`. -->\nInstall required packages with `pip install -v .[data]` (See [installation.md](../../docs/installation.md)).\nRun the following command to classify camera motion:\n\n```bash\n# output: meta_cmotion.csv\npython -m tools.caption.camera_motion.detect tools/caption/camera_motion/meta.csv\n```\n\nYou may additionally specify `threshold` to indicate how \"sensitive\" the detection should be as below. 
For example, `--threshold 0.2` means that a video is counted as `tilt_up` only when the tracked points move down by more than 20% of the video's shorter side between the first and last frames.\n```bash\n# output: meta_cmotion.csv\npython -m tools.caption.camera_motion.detect tools/caption/camera_motion/meta.csv --threshold 0.2\n```\n\nEach video is classified according to 8 categories:\n            `pan_right,\n            pan_left,\n            tilt_up,\n            tilt_down,\n            zoom_in,\n            zoom_out,\n            static,\n            unclassified`.\nCategories of `tilt`, `pan` and `zoom` can overlap with each other.\n\n\n## Tagging with Llama3\n\nTo understand the overall category distribution of our training dataset, we use Llama3 to generate tags based on the video captions.\n\nAfter obtaining Llama3 usage permission from huggingface/meta, you may generate tags based on the captions using Llama3 like this:\n\n```bash\ntorchrun --nproc_per_node 8 --standalone -m tools.caption.caption_llama3 meta.csv --key objects --output_prefix meta\n```\n\nThis will generate tags based on the `text` column of `meta.csv` and write the results to `output_prefix + key.csv`. Currently, prompts for `objects` and `actions` are supported.\n"
  },
  {
    "path": "Open-Sora/tools/caption/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/caption/acceleration/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/caption/acceleration/llava/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/caption/acceleration/llava/policies/__init__.py",
    "content": "from .llama import LlavaLlamaForCausalLMPolicy\nfrom .mistral import LlavaMistralForCausalLMPolicy\n"
  },
  {
    "path": "Open-Sora/tools/caption/acceleration/llava/policies/llama.py",
    "content": "from typing import Dict, Union\n\nimport torch.nn as nn\nfrom colossalai.shardformer.layer import Linear1D_Col, Linear1D_Row\nfrom colossalai.shardformer.policies.base_policy import ModulePolicyDescription, Policy, SubModuleReplacementDescription\n\n__all__ = [\"LlavaLlamaPolicy\", \"LlavaLlamaForCausalLMPolicy\"]\n\n\nclass LlavaLlamaPolicy(Policy):\n    def config_sanity_check(self):\n        pass\n\n    def preprocess(self):\n        if self.shard_config.enable_tensor_parallelism:\n            # Resize embedding\n            self.model.config.vocab_size\n            self.shard_config.tensor_parallel_size\n\n            # if vocab_size % world_size != 0:\n            #     new_vocab_size = vocab_size + world_size - vocab_size % world_size\n            #     self.model.resize_token_embeddings(new_vocab_size)\n\n        return self.model\n\n    def module_policy(self) -> Dict[Union[str, nn.Module], ModulePolicyDescription]:\n        from transformers.models.llama.modeling_llama import LlamaDecoderLayer\n\n        policy = {}\n\n        if self.shard_config.enable_tensor_parallelism:\n            decoder_attribute_replacement = {\n                \"self_attn.hidden_size\": self.model.config.hidden_size // self.shard_config.tensor_parallel_size,\n                \"self_attn.num_heads\": self.model.config.num_attention_heads // self.shard_config.tensor_parallel_size,\n            }\n            if getattr(self.model.config, \"num_key_value_heads\", False):\n                decoder_attribute_replacement[\"self_attn.num_key_value_heads\"] = (\n                    self.model.config.num_key_value_heads // self.shard_config.tensor_parallel_size\n                )\n\n            policy[LlamaDecoderLayer] = ModulePolicyDescription(\n                attribute_replacement=decoder_attribute_replacement,\n                sub_module_replacement=[\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.q_proj\",\n           
             target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.k_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.v_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.o_proj\",\n                        target_module=Linear1D_Row,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"mlp.gate_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"mlp.up_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"mlp.down_proj\",\n                        target_module=Linear1D_Row,\n                    ),\n                ],\n            )\n\n        return policy\n\n    def postprocess(self):\n        return self.model\n\n\nclass LlavaLlamaForCausalLMPolicy(LlavaLlamaPolicy):\n    def module_policy(self):\n        from transformers import LlamaForCausalLM\n\n        policy = super().module_policy()\n        if self.shard_config.enable_tensor_parallelism:\n            # add a new item for casual lm\n            new_item = {\n                LlamaForCausalLM: ModulePolicyDescription(\n                    sub_module_replacement=[\n                        SubModuleReplacementDescription(\n                            suffix=\"lm_head\", target_module=Linear1D_Col, kwargs={\"gather_output\": True}\n                        )\n                    ],\n                )\n            }\n            policy.update(new_item)\n        
return policy\n"
  },
  {
    "path": "Open-Sora/tools/caption/acceleration/llava/policies/mistral.py",
    "content": "import warnings\nfrom typing import Dict, Union\n\nimport torch.nn as nn\nfrom colossalai.shardformer.layer import Linear1D_Col, Linear1D_Row, VocabParallelEmbedding1D\nfrom colossalai.shardformer.policies.base_policy import ModulePolicyDescription, Policy, SubModuleReplacementDescription\n\n__all__ = [\"LlavaMistralPolicy\", \"LlavaMistralForCausalLMPolicy\"]\n\n\nclass LlavaMistralPolicy(Policy):\n    def config_sanity_check(self):\n        pass\n\n    def preprocess(self):\n        if self.shard_config.enable_tensor_parallelism:\n            # Resize embedding\n            vocab_size = self.model.config.vocab_size\n            world_size = self.shard_config.tensor_parallel_size\n\n            if vocab_size % world_size != 0:\n                new_vocab_size = vocab_size + world_size - vocab_size % world_size\n                self.model.resize_token_embeddings(new_vocab_size)\n\n        return self.model\n\n    def module_policy(self) -> Dict[Union[str, nn.Module], ModulePolicyDescription]:\n        from transformers.models.mistral.modeling_mistral import MistralDecoderLayer, MistralModel\n\n        policy = {}\n\n        if self.shard_config.enable_sequence_parallelism:\n            self.shard_config.enable_sequence_parallelism = False\n            warnings.warn(\n                \"Mistral doesn't support sequence parallelism now, will ignore the sequence parallelism flag.\"\n            )\n\n        if self.shard_config.enable_tensor_parallelism:\n            decoder_attribute_replacement = {\n                \"self_attn.hidden_size\": self.model.config.hidden_size // self.shard_config.tensor_parallel_size,\n                \"self_attn.num_heads\": self.model.config.num_attention_heads // self.shard_config.tensor_parallel_size,\n                \"self_attn.num_key_value_heads\": self.model.config.num_key_value_heads\n                // self.shard_config.tensor_parallel_size,\n            }\n\n            policy[MistralDecoderLayer] = 
ModulePolicyDescription(\n                attribute_replacement=decoder_attribute_replacement,\n                sub_module_replacement=[\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.q_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.k_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.v_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"self_attn.o_proj\",\n                        target_module=Linear1D_Row,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"mlp.gate_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"mlp.up_proj\",\n                        target_module=Linear1D_Col,\n                    ),\n                    SubModuleReplacementDescription(\n                        suffix=\"mlp.down_proj\",\n                        target_module=Linear1D_Row,\n                    ),\n                ],\n            )\n\n            self.append_or_create_submodule_replacement(\n                description=SubModuleReplacementDescription(\n                    suffix=\"embed_tokens\",\n                    target_module=VocabParallelEmbedding1D,\n                ),\n                policy=policy,\n                target_key=MistralModel,\n            )\n\n        return policy\n\n    def postprocess(self):\n        return self.model\n\n\nclass LlavaMistralForCausalLMPolicy(LlavaMistralPolicy):\n    def module_policy(self):\n        from transformers 
import MistralForCausalLM\n\n        policy = super().module_policy()\n\n        if self.shard_config.enable_tensor_parallelism:\n            # add a new item for causal lm\n            new_item = {\n                MistralForCausalLM: ModulePolicyDescription(\n                    sub_module_replacement=[\n                        SubModuleReplacementDescription(\n                            suffix=\"lm_head\", target_module=Linear1D_Col, kwargs=dict(gather_output=True)\n                        )\n                    ]\n                )\n            }\n            policy.update(new_item)\n        return policy\n"
  },
  {
    "path": "Open-Sora/tools/caption/camera_motion/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/caption/camera_motion/camera_motion.py",
    "content": "import os\n\nimport numpy as np\nimport torch\n\nfrom .utils import load_video\nfrom .visualizer import Visualizer\n\n\ndef transform(vector):\n    x = np.mean([item[0] for item in vector])\n    y = np.mean([item[1] for item in vector])\n    return [x, y]\n\n\nclass CameraPredict:\n    def __init__(self, device, submodules_list, factor=0.25):\n        self.device = device\n        self.grid_size = 10\n        self.factor = factor\n        try:\n            self.model = torch.hub.load(submodules_list[\"repo\"], submodules_list[\"model\"]).to(self.device)\n        except:\n            # workaround for CERTIFICATE_VERIFY_FAILED (see: https://github.com/pytorch/pytorch/issues/33288#issuecomment-954160699)\n            import ssl\n\n            ssl._create_default_https_context = ssl._create_unverified_context\n            self.model = torch.hub.load(submodules_list[\"repo\"], submodules_list[\"model\"]).to(self.device)\n\n    def infer(self, video_path, save_video=False, save_dir=\"./saved_videos\"):\n        # load video\n        video = load_video(video_path, return_tensor=False)\n        # set scale\n        height, width = video.shape[1], video.shape[2]\n        self.scale = min(height, width)\n        video = torch.from_numpy(video).permute(0, 3, 1, 2)[None].float().to(self.device)  # B T C H W\n        pred_tracks, pred_visibility = self.model(video, grid_size=self.grid_size)  # B T N 2,  B T N 1\n\n        if save_video:\n            video_name = os.path.basename(video_path)[:-4]\n            vis = Visualizer(save_dir=save_dir, pad_value=120, linewidth=3)\n            vis.visualize(video, pred_tracks, pred_visibility, filename=video_name)\n\n        return pred_tracks[0].long().detach().cpu().numpy()\n\n    def transform_class(self, vector, min_reso):  # 768*0.05\n        scale = min_reso * self.factor\n        x, y = vector\n        direction = []\n        if x > scale:\n            direction.append(\"right\")\n        elif x < -scale:\n         
   direction.append(\"left\")\n\n        if y > scale:\n            direction.append(\"down\")\n        elif y < -scale:\n            direction.append(\"up\")\n\n        return direction if direction else [\"static\"]\n\n    def get_edge_point(self, track):\n        middle = self.grid_size // 2\n        top = [list(track[0, i, :]) for i in range(middle - 2, middle + 2)]\n        down = [list(track[self.grid_size - 1, i, :]) for i in range(middle - 2, middle + 2)]\n        left = [list(track[i, 0, :]) for i in range(middle - 2, middle + 2)]\n        right = [list(track[i, self.grid_size - 1, :]) for i in range(middle - 2, middle + 2)]\n\n        return top, down, left, right\n\n    def get_edge_direction(self, track1, track2):\n        edge_points1 = self.get_edge_point(track1)\n        edge_points2 = self.get_edge_point(track2)\n\n        vector_results = []\n        for points1, points2 in zip(edge_points1, edge_points2):\n            vectors = [[end[0] - start[0], end[1] - start[1]] for start, end in zip(points1, points2)]\n            vector_results.append(vectors)\n        vector_results = list(map(transform, vector_results))\n        class_results = [self.transform_class(vector, min_reso=self.scale) for vector in vector_results]\n\n        return class_results\n\n    def classify_top_down(self, top, down):\n        results = []\n        classes = [f\"{item_t}_{item_d}\" for item_t in top for item_d in down]\n\n        results_mapping = {\n            \"left_left\": \"pan_right\",\n            \"right_right\": \"pan_left\",\n            \"down_down\": \"tilt_up\",\n            \"up_up\": \"tilt_down\",\n            \"up_down\": \"zoom_in\",\n            \"down_up\": \"zoom_out\",\n            \"static_static\": \"static\",\n        }\n        results = [results_mapping.get(cls) for cls in classes if cls in results_mapping]\n        return results if results else [\"None\"]\n\n    def classify_left_right(self, left, right):\n        results = []\n        classes 
= [f\"{item_l}_{item_r}\" for item_l in left for item_r in right]\n        results_mapping = {\n            \"left_left\": \"pan_right\",\n            \"right_right\": \"pan_left\",\n            \"down_down\": \"tilt_up\",\n            \"up_up\": \"tilt_down\",\n            \"left_right\": \"zoom_in\",\n            \"right_left\": \"zoom_out\",\n            \"static_static\": \"static\",\n        }\n        results = [results_mapping.get(cls) for cls in classes if cls in results_mapping]\n        return results if results else [\"None\"]\n\n    def camera_classify(self, track1, track2):\n        top, down, left, right = self.get_edge_direction(track1, track2)\n\n        top_results = self.classify_top_down(top, down)\n        left_results = self.classify_left_right(left, right)\n\n        results = list(set(top_results + left_results))\n        if \"None\" in results and len(results) > 1:\n            results.remove(\"None\")\n        if \"static\" in results and len(results) > 1:\n            results.remove(\"static\")\n        if len(results) == 1 and results[0] == \"None\":  # Tom added this to deal with edge cases\n            results = [\"Undetermined\"]\n        return results\n\n    def predict(self, video_path):\n        pred_track = self.infer(video_path)\n        track1 = pred_track[0].reshape((self.grid_size, self.grid_size, 2))\n        track2 = pred_track[-1].reshape((self.grid_size, self.grid_size, 2))\n        results = self.camera_classify(track1, track2)\n        return results\n\n\ndef compute_camera_motion(device, submodules_dict, video_paths, factor):\n    camera = CameraPredict(device, submodules_dict, factor)\n    # predict_results = camera.predict(video_path)\n    # return predict_results\n    all_predictions = []\n    for video_path in video_paths:\n        camera_motion_types = camera.predict(video_path)\n        all_predictions.append(\"+\".join(camera_motion_types))\n    return all_predictions\n"
  },
  {
    "path": "Open-Sora/tools/caption/camera_motion/detect.py",
    "content": "# Originally developed by https://github.com/Vchitect/VBench based on https://github.com/facebookresearch/co-tracker.\n\nimport argparse\nfrom typing import List\n\nimport pandas as pd\n\nfrom .camera_motion import compute_camera_motion\n\n\ndef process(paths: List[str], threshold: float) -> List[str]:\n    device = \"cuda\"\n    submodules = {\"repo\": \"facebookresearch/co-tracker\", \"model\": \"cotracker2\"}\n    camera_motion_types = compute_camera_motion(device, submodules, paths, factor=threshold)\n    return camera_motion_types\n\n\ndef main(args):\n    output_file = args.input.replace(\".csv\", \"_cmotion.csv\")\n    data = pd.read_csv(args.input)\n    data[\"cmotion\"] = process(data[\"path\"], args.threshold)\n    data.to_csv(output_file, index=False)\n    print(f\"Output saved to {output_file}\")\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str)\n    parser.add_argument(\"--threshold\", type=float, default=0.25)\n    args = parser.parse_args()\n    main(args)\n"
  },
  {
    "path": "Open-Sora/tools/caption/camera_motion/requirements.txt",
    "content": "decord\nptvsd\nimageio-ffmpeg\n"
  },
  {
    "path": "Open-Sora/tools/caption/camera_motion/utils.py",
    "content": "import random\n\nimport numpy as np\nimport torch\nfrom decord import VideoReader\nfrom PIL import Image, ImageSequence\n\n\ndef get_frame_indices(num_frames, vlen, sample=\"rand\", fix_start=None, input_fps=1, max_num_frames=-1):\n    if sample in [\"rand\", \"middle\"]:  # uniform sampling\n        acc_samples = min(num_frames, vlen)\n        # split the video into `acc_samples` intervals, and sample from each interval.\n        intervals = np.linspace(start=0, stop=vlen, num=acc_samples + 1).astype(int)\n        ranges = []\n        for idx, interv in enumerate(intervals[:-1]):\n            ranges.append((interv, intervals[idx + 1] - 1))\n        if sample == \"rand\":\n            try:\n                frame_indices = [random.choice(range(x[0], x[1])) for x in ranges]\n            except Exception:\n                frame_indices = np.random.permutation(vlen)[:acc_samples]\n                frame_indices.sort()\n                frame_indices = list(frame_indices)\n        elif fix_start is not None:\n            frame_indices = [x[0] + fix_start for x in ranges]\n        elif sample == \"middle\":\n            frame_indices = [(x[0] + x[1]) // 2 for x in ranges]\n        else:\n            raise NotImplementedError\n\n        if len(frame_indices) < num_frames:  # padded with last frame\n            padded_frame_indices = [frame_indices[-1]] * num_frames\n            padded_frame_indices[: len(frame_indices)] = frame_indices\n            frame_indices = padded_frame_indices\n    elif \"fps\" in sample:  # fps0.5, sequentially sample frames at 0.5 fps\n        output_fps = float(sample[3:])\n        duration = float(vlen) / input_fps\n        delta = 1 / output_fps  # gap between frames, this is also the clip length each frame represents\n        frame_seconds = np.arange(0 + delta / 2, duration + delta / 2, delta)\n        frame_indices = np.around(frame_seconds * input_fps).astype(int)\n        frame_indices = [e for e in frame_indices if e < vlen]\n        if 
max_num_frames > 0 and len(frame_indices) > max_num_frames:\n            frame_indices = frame_indices[:max_num_frames]\n            # frame_indices = np.linspace(0 + delta / 2, duration + delta / 2, endpoint=False, num=max_num_frames)\n    else:\n        raise ValueError\n    return frame_indices\n\n\ndef load_video(video_path, data_transform=None, num_frames=None, return_tensor=True, width=None, height=None):\n    \"\"\"\n    Load a video from a given path and apply optional data transformations.\n\n    The function supports loading video in GIF (.gif), PNG (.png), and MP4 (.mp4) formats.\n    Depending on the format, it processes and extracts frames accordingly.\n\n    Parameters:\n    - video_path (str): The file path to the video or image to be loaded.\n    - data_transform (callable, optional): A function that applies transformations to the video data.\n\n    Returns:\n    - frames (torch.Tensor): A tensor containing the video frames with shape (T, C, H, W),\n      where T is the number of frames, C is the number of channels, H is the height, and W is the width.\n\n    Raises:\n    - NotImplementedError: If the video format is not supported.\n\n    The function first determines the format of the video file by its extension.\n    For GIFs, it iterates over each frame and converts them to RGB.\n    For PNGs, it reads the single frame, converts it to RGB.\n    For MP4s, it reads the frames using the VideoReader class and converts them to NumPy arrays.\n    If a data_transform is provided, it is applied to the buffer before converting it to a tensor.\n    Finally, the tensor is permuted to match the expected (T, C, H, W) format.\n    \"\"\"\n    if video_path.endswith(\".gif\"):\n        frame_ls = []\n        img = Image.open(video_path)\n        for frame in ImageSequence.Iterator(img):\n            frame = frame.convert(\"RGB\")\n            frame = np.array(frame).astype(np.uint8)\n            frame_ls.append(frame)\n        buffer = 
np.array(frame_ls).astype(np.uint8)\n    elif video_path.endswith(\".png\"):\n        frame = Image.open(video_path)\n        frame = frame.convert(\"RGB\")\n        frame = np.array(frame).astype(np.uint8)\n        frame_ls = [frame]\n        buffer = np.array(frame_ls)\n    elif video_path.endswith(\".mp4\"):\n        import decord\n\n        decord.bridge.set_bridge(\"native\")\n        if width:\n            video_reader = VideoReader(video_path, width=width, height=height, num_threads=1)\n        else:\n            video_reader = VideoReader(video_path, num_threads=1)\n        frames = video_reader.get_batch(range(len(video_reader)))  # (T, H, W, C), torch.uint8\n\n        buffer = frames.asnumpy().astype(np.uint8)\n    else:\n        raise NotImplementedError\n\n    frames = buffer\n    if num_frames:\n        frame_indices = get_frame_indices(num_frames, len(frames), sample=\"middle\")\n        frames = frames[frame_indices]\n\n    if data_transform:\n        frames = data_transform(frames)\n    elif return_tensor:\n        frames = torch.Tensor(frames)\n        frames = frames.permute(0, 3, 1, 2)  # (T, C, H, W), torch.uint8\n\n    return frames\n"
  },
  {
    "path": "Open-Sora/tools/caption/camera_motion/visualizer.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the license found in the cotracker github repo. https://github.com/facebookresearch/co-tracker.\nimport os\n\nimport imageio\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nimport torchvision.transforms as transforms\nfrom matplotlib import cm\nfrom PIL import Image, ImageDraw\n\n\ndef read_video_from_path(path):\n    try:\n        reader = imageio.get_reader(path)\n    except Exception as e:\n        print(\"Error opening video file: \", e)\n        return None\n    frames = []\n    for i, im in enumerate(reader):\n        frames.append(np.array(im))\n    return np.stack(frames)\n\n\ndef draw_circle(rgb, coord, radius, color=(255, 0, 0), visible=True):\n    # Create a draw object\n    draw = ImageDraw.Draw(rgb)\n    # Calculate the bounding box of the circle\n    left_up_point = (coord[0] - radius, coord[1] - radius)\n    right_down_point = (coord[0] + radius, coord[1] + radius)\n    # Draw the circle\n    draw.ellipse(\n        [left_up_point, right_down_point],\n        fill=tuple(color) if visible else None,\n        outline=tuple(color),\n    )\n    return rgb\n\n\ndef draw_line(rgb, coord_y, coord_x, color, linewidth):\n    draw = ImageDraw.Draw(rgb)\n    draw.line(\n        (coord_y[0], coord_y[1], coord_x[0], coord_x[1]),\n        fill=tuple(color),\n        width=linewidth,\n    )\n    return rgb\n\n\ndef add_weighted(rgb, alpha, original, beta, gamma):\n    return (rgb * alpha + original * beta + gamma).astype(\"uint8\")\n\n\nclass Visualizer:\n    def __init__(\n        self,\n        save_dir: str = \"./results\",\n        grayscale: bool = False,\n        pad_value: int = 0,\n        fps: int = 10,\n        mode: str = \"rainbow\",  # 'cool', 'optical_flow'\n        linewidth: int = 2,\n        show_first_frame: int = 10,\n        tracks_leave_trace: int = 0,  # -1 for 
infinite\n    ):\n        self.mode = mode\n        self.save_dir = save_dir\n        if mode == \"rainbow\":\n            self.color_map = cm.get_cmap(\"gist_rainbow\")\n        elif mode == \"cool\":\n            self.color_map = cm.get_cmap(mode)\n        self.show_first_frame = show_first_frame\n        self.grayscale = grayscale\n        self.tracks_leave_trace = tracks_leave_trace\n        self.pad_value = pad_value\n        self.linewidth = linewidth\n        self.fps = fps\n\n    def visualize(\n        self,\n        video: torch.Tensor,  # (B,T,C,H,W)\n        tracks: torch.Tensor,  # (B,T,N,2)\n        visibility: torch.Tensor = None,  # (B, T, N, 1) bool\n        gt_tracks: torch.Tensor = None,  # (B,T,N,2)\n        segm_mask: torch.Tensor = None,  # (B,1,H,W)\n        filename: str = \"video\",\n        writer=None,  # tensorboard Summary Writer, used for visualization during training\n        step: int = 0,\n        query_frame: int = 0,\n        save_video: bool = True,\n        compensate_for_camera_motion: bool = False,\n    ):\n        if compensate_for_camera_motion:\n            assert segm_mask is not None\n        if segm_mask is not None:\n            coords = tracks[0, query_frame].round().long()\n            segm_mask = segm_mask[0, query_frame][coords[:, 1], coords[:, 0]].long()\n\n        video = F.pad(\n            video,\n            (self.pad_value, self.pad_value, self.pad_value, self.pad_value),\n            \"constant\",\n            255,\n        )\n        print(\"video shape after pad is: \", video.shape)\n        tracks = tracks + self.pad_value\n\n        print(tracks)\n        print(\"tracks shape after pad is: \", tracks.shape)\n\n        if self.grayscale:\n            transform = transforms.Grayscale()\n            video = transform(video)\n            video = video.repeat(1, 1, 3, 1, 1)\n\n        res_video = self.draw_tracks_on_video(\n            video=video,\n            tracks=tracks,\n            
visibility=visibility,\n            segm_mask=segm_mask,\n            gt_tracks=gt_tracks,\n            query_frame=query_frame,\n            compensate_for_camera_motion=compensate_for_camera_motion,\n        )\n        if save_video:\n            self.save_video(res_video, filename=filename, writer=writer, step=step)\n        return res_video\n\n    def save_video(self, video, filename, writer=None, step=0):\n        if writer is not None:\n            writer.add_video(\n                filename,\n                video.to(torch.uint8),\n                global_step=step,\n                fps=self.fps,\n            )\n        else:\n            os.makedirs(self.save_dir, exist_ok=True)\n            wide_list = list(video.unbind(1))\n            wide_list = [wide[0].permute(1, 2, 0).cpu().numpy() for wide in wide_list]\n\n            # Prepare the video file path\n            save_path = os.path.join(self.save_dir, f\"{filename}.mp4\")\n\n            # Create a writer object\n            video_writer = imageio.get_writer(save_path, fps=self.fps)\n\n            # Write frames to the video file\n            for frame in wide_list[2:-1]:\n                video_writer.append_data(frame)\n\n            video_writer.close()\n\n            print(f\"Video saved to {save_path}\")\n\n    def draw_tracks_on_video(\n        self,\n        video: torch.Tensor,\n        tracks: torch.Tensor,\n        visibility: torch.Tensor = None,\n        segm_mask: torch.Tensor = None,\n        gt_tracks=None,\n        query_frame: int = 0,\n        compensate_for_camera_motion=False,\n    ):\n        B, T, C, H, W = video.shape\n        _, _, N, D = tracks.shape\n\n        assert D == 2\n        assert C == 3\n        video = video[0].permute(0, 2, 3, 1).byte().detach().cpu().numpy()  # S, H, W, C\n        tracks = tracks[0].long().detach().cpu().numpy()  # S, N, 2\n        if gt_tracks is not None:\n            gt_tracks = gt_tracks[0].detach().cpu().numpy()\n\n        res_video = []\n\n    
    # process input video\n        for rgb in video:\n            res_video.append(rgb.copy())\n        vector_colors = np.zeros((T, N, 3))\n\n        if self.mode == \"optical_flow\":\n            import flow_vis\n\n            vector_colors = flow_vis.flow_to_color(tracks - tracks[query_frame][None])\n        elif segm_mask is None:\n            if self.mode == \"rainbow\":\n                y_min, y_max = (\n                    tracks[query_frame, :, 1].min(),\n                    tracks[query_frame, :, 1].max(),\n                )\n                norm = plt.Normalize(y_min, y_max)\n                for n in range(N):\n                    color = self.color_map(norm(tracks[query_frame, n, 1]))\n                    color = np.array(color[:3])[None] * 255\n                    vector_colors[:, n] = np.repeat(color, T, axis=0)\n            else:\n                # color changes with time\n                for t in range(T):\n                    color = np.array(self.color_map(t / T)[:3])[None] * 255\n                    vector_colors[t] = np.repeat(color, N, axis=0)\n        else:\n            if self.mode == \"rainbow\":\n                vector_colors[:, segm_mask <= 0, :] = 255\n\n                y_min, y_max = (\n                    tracks[0, segm_mask > 0, 1].min(),\n                    tracks[0, segm_mask > 0, 1].max(),\n                )\n                norm = plt.Normalize(y_min, y_max)\n                for n in range(N):\n                    if segm_mask[n] > 0:\n                        color = self.color_map(norm(tracks[0, n, 1]))\n                        color = np.array(color[:3])[None] * 255\n                        vector_colors[:, n] = np.repeat(color, T, axis=0)\n\n            else:\n                # color changes with segm class\n                segm_mask = segm_mask.cpu()\n                color = np.zeros((segm_mask.shape[0], 3), dtype=np.float32)\n                color[segm_mask > 0] = np.array(self.color_map(1.0)[:3]) * 255.0\n                
color[segm_mask <= 0] = np.array(self.color_map(0.0)[:3]) * 255.0\n                vector_colors = np.repeat(color[None], T, axis=0)\n\n        #  draw tracks\n        if self.tracks_leave_trace != 0:\n            for t in range(query_frame + 1, T):\n                first_ind = max(0, t - self.tracks_leave_trace) if self.tracks_leave_trace >= 0 else 0\n                curr_tracks = tracks[first_ind : t + 1]\n                curr_colors = vector_colors[first_ind : t + 1]\n                if compensate_for_camera_motion:\n                    diff = (tracks[first_ind : t + 1, segm_mask <= 0] - tracks[t : t + 1, segm_mask <= 0]).mean(1)[\n                        :, None\n                    ]\n\n                    curr_tracks = curr_tracks - diff\n                    curr_tracks = curr_tracks[:, segm_mask > 0]\n                    curr_colors = curr_colors[:, segm_mask > 0]\n\n                res_video[t] = self._draw_pred_tracks(\n                    res_video[t],\n                    curr_tracks,\n                    curr_colors,\n                )\n                if gt_tracks is not None:\n                    res_video[t] = self._draw_gt_tracks(res_video[t], gt_tracks[first_ind : t + 1])\n\n        #  draw points\n        for t in range(query_frame, T):\n            img = Image.fromarray(np.uint8(res_video[t]))\n            for i in range(N):\n                coord = (tracks[t, i, 0], tracks[t, i, 1])\n                visibile = True\n                if visibility is not None:\n                    visibile = visibility[0, t, i]\n                if coord[0] != 0 and coord[1] != 0:\n                    if not compensate_for_camera_motion or (compensate_for_camera_motion and segm_mask[i] > 0):\n                        img = draw_circle(\n                            img,\n                            coord=coord,\n                            radius=int(self.linewidth * 2),\n                            color=vector_colors[t, i].astype(int),\n                            
visible=visibile,\n                        )\n            res_video[t] = np.array(img)\n\n        #  construct the final rgb sequence\n        if self.show_first_frame > 0:\n            res_video = [res_video[0]] * self.show_first_frame + res_video[1:]\n        return torch.from_numpy(np.stack(res_video)).permute(0, 3, 1, 2)[None].byte()\n\n    def _draw_pred_tracks(\n        self,\n        rgb: np.ndarray,  # H x W x 3\n        tracks: np.ndarray,  # T x N x 2\n        vector_colors: np.ndarray,\n        alpha: float = 0.5,\n    ):\n        T, N, _ = tracks.shape\n        rgb = Image.fromarray(np.uint8(rgb))\n        for s in range(T - 1):\n            vector_color = vector_colors[s]\n            original = rgb.copy()\n            alpha = (s / T) ** 2\n            for i in range(N):\n                coord_y = (int(tracks[s, i, 0]), int(tracks[s, i, 1]))\n                coord_x = (int(tracks[s + 1, i, 0]), int(tracks[s + 1, i, 1]))\n                if coord_y[0] != 0 and coord_y[1] != 0:\n                    rgb = draw_line(\n                        rgb,\n                        coord_y,\n                        coord_x,\n                        vector_color[i].astype(int),\n                        self.linewidth,\n                    )\n            if self.tracks_leave_trace > 0:\n                rgb = Image.fromarray(np.uint8(add_weighted(np.array(rgb), alpha, np.array(original), 1 - alpha, 0)))\n        rgb = np.array(rgb)\n        return rgb\n\n    def _draw_gt_tracks(\n        self,\n        rgb: np.ndarray,  # H x W x 3\n        gt_tracks: np.ndarray,  # T x N x 2\n    ):\n        T, N, _ = gt_tracks.shape\n        color = np.array((211, 0, 0))\n        rgb = Image.fromarray(np.uint8(rgb))\n        for t in range(T):\n            for i in range(N):\n                # use a separate name so the full gt_tracks array is not overwritten\n                gt_track = gt_tracks[t][i]\n                #  draw a red cross\n                if gt_track[0] > 0 and gt_track[1] > 0:\n                    length = self.linewidth * 3\n                    coord_y = 
(int(gt_track[0]) + length, int(gt_track[1]) + length)\n                    coord_x = (int(gt_track[0]) - length, int(gt_track[1]) - length)\n                    rgb = draw_line(\n                        rgb,\n                        coord_y,\n                        coord_x,\n                        color,\n                        self.linewidth,\n                    )\n                    coord_y = (int(gt_track[0]) - length, int(gt_track[1]) + length)\n                    coord_x = (int(gt_track[0]) + length, int(gt_track[1]) - length)\n                    rgb = draw_line(\n                        rgb,\n                        coord_y,\n                        coord_x,\n                        color,\n                        self.linewidth,\n                    )\n        rgb = np.array(rgb)\n        return rgb\n"
  },
  {
    "path": "Open-Sora/tools/caption/camera_motion_detect.py",
    "content": "# ref: https://github.com/antiboredom/camera-motion-detector\n\nimport argparse\n\nimport cv2\nimport numpy as np\nimport pandas as pd\nfrom tqdm import tqdm\n\ntqdm.pandas()\n\n\ndef apply(df, func, **kwargs):\n    if pandas_has_parallel:\n        return df.parallel_apply(func, **kwargs)\n    return df.progress_apply(func, **kwargs)\n\n\ntry:\n    from pandarallel import pandarallel\n\n    pandarallel.initialize(progress_bar=True)\n    pandas_has_parallel = True\nexcept ImportError:\n    pandas_has_parallel = False\n\n\ndef make_empty(new_w, new_h):\n    empty = []\n    for y in range(new_h):\n        xvals = []\n        for x in range(new_w):\n            xvals.append([x, y])\n        empty.append(xvals)\n\n    empty = np.array(empty)\n    return empty\n\n\ndef get_type(mag, ang, zoom_in, tau_static=1.0, tau_zoom=(0.4, 0.6)):\n    if mag < tau_static:\n        return \"static\"\n    if zoom_in < tau_zoom[0]:\n        return \"zoom out\"\n    if zoom_in > tau_zoom[1]:\n        return \"zoom in\"\n    if ang < 45 or ang >= 315:\n        return \"pan left\"\n    if 45 <= ang < 135:\n        return \"tilt up\"\n    if 135 <= ang < 225:\n        return \"pan right\"\n    if 225 <= ang < 315:\n        return \"tilt down\"\n    return \"unknown\"\n\n\ndef get_video_type(frame_types):\n    # count the number of each type\n    counts = {}\n    max_count = 0\n    max_type = None\n    for frame_type in frame_types:\n        if frame_type not in counts:\n            counts[frame_type] = 0\n        counts[frame_type] += 1\n        if counts[frame_type] > max_count:\n            max_count = counts[frame_type]\n            max_type = frame_type\n    if max_count > len(frame_types) / 2:\n        return max_type\n    if \"static\" in counts:\n        return \"unknown\"\n    if \"zoom in\" not in counts and \"zoom out\" not in counts:\n        return \"pan/tilt\"\n    return \"dynamic\"\n\n\ndef process(path: str, frame_interval=15) -> str:\n    cap = 
cv2.VideoCapture(path)\n    count = 0\n    prvs = None\n    frame_types = []\n    while cap.isOpened():\n        ret, frame = cap.read()\n        if ret:\n            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)\n            if count == 0:\n                prvs = frame\n                h, w = frame.shape\n                empty = make_empty(w, h)\n                empty_dists = np.sqrt(\n                    np.square(empty.ravel()[::2] - (w / 2)) + np.square(empty.ravel()[1::2] - (h / 2))\n                )\n            else:\n                flow = cv2.calcOpticalFlowFarneback(prvs, frame, None, 0.5, 3, 15, 3, 5, 1.2, 0)\n                mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1], angleInDegrees=True)\n                mean_mag = np.median(mag)\n                mean_ang = np.median(ang)\n\n                flow_coords = flow + empty\n                xvals = flow_coords.ravel()[::2] - (w / 2)\n                yvals = flow_coords.ravel()[1::2] - (h / 2)\n                dists = np.sqrt(np.square(xvals) + np.square(yvals))\n                dist_diff = dists >= empty_dists\n                zoom_in_factor = np.count_nonzero(dist_diff) / len(dist_diff)\n                frame_types.append(get_type(mean_mag, mean_ang, zoom_in_factor))\n            count += frame_interval\n            cap.set(cv2.CAP_PROP_POS_FRAMES, count)\n        else:\n            cap.release()\n            break\n    video_type = get_video_type(frame_types)\n    return video_type\n\n\ndef main(args):\n    output_file = args.input.replace(\".csv\", \"_cmotion.csv\")\n    data = pd.read_csv(args.input)\n    data[\"cmotion\"] = apply(data[\"path\"], process)\n    data.to_csv(output_file, index=False)\n    print(f\"Output saved to {output_file}\")\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str)\n    parser.add_argument(\"--disable-parallel\", action=\"store_true\")\n    args = parser.parse_args()\n    if 
args.disable_parallel:\n        pandas_has_parallel = False\n    main(args)\n"
  },
  {
    "path": "Open-Sora/tools/caption/caption_gpt4.py",
    "content": "import argparse\nimport base64\nimport csv\nimport os\nfrom io import BytesIO\n\nimport requests\nimport tqdm\n\nfrom .utils import IMG_EXTENSIONS, PROMPTS, VID_EXTENSIONS, VideoTextDataset\n\n\ndef to_base64(image):\n    buffer = BytesIO()\n    image.save(buffer, format=\"JPEG\")\n    return base64.b64encode(buffer.getvalue()).decode(\"utf-8\")\n\n\ndef get_caption(frame, prompt, api_key):\n    headers = {\"Content-Type\": \"application/json\", \"Authorization\": f\"Bearer {api_key}\"}\n    payload = {\n        \"model\": \"gpt-4-vision-preview\",\n        \"messages\": [\n            {\n                \"role\": \"user\",\n                \"content\": [\n                    {\n                        \"type\": \"text\",\n                        \"text\": prompt,\n                    },\n                    {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image/jpeg;base64,{frame[0]}\"}},\n                    {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image/jpeg;base64,{frame[1]}\"}},\n                    {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image/jpeg;base64,{frame[2]}\"}},\n                ],\n            }\n        ],\n        \"max_tokens\": 300,\n    }\n    response = requests.post(\"https://api.openai.com/v1/chat/completions\", headers=headers, json=payload, timeout=60)\n    caption = response.json()[\"choices\"][0][\"message\"][\"content\"]\n    caption = caption.replace(\"\\n\", \" \")\n    return caption\n\n\ndef main(args):\n    # ======================================================\n    # 1. 
read video list\n    # ======================================================\n    dataset = VideoTextDataset(args.input)\n    output_file = os.path.splitext(args.input)[0] + \"_caption.csv\"\n    f = open(output_file, \"w\")\n    writer = csv.writer(f)\n    writer.writerow([\"video\", \"text\"])\n\n    # make sure that the prompt type matches the data type\n    data_extension = \".\" + dataset.data[\"path\"].iloc[0].split(\".\")[-1]\n    prompt_type = PROMPTS[args.prompt][\"type\"]\n    if prompt_type == \"image\":\n        assert (\n            data_extension.lower() in IMG_EXTENSIONS\n        ), \"The prompt is suitable for an image dataset but the data is not image.\"\n    elif prompt_type == \"video\":\n        assert (\n            data_extension.lower() in VID_EXTENSIONS\n        ), \"The prompt is suitable for a video dataset but the data is not video.\"\n    else:\n        raise ValueError(f\"Found invalid prompt type {prompt_type}\")\n\n    # ======================================================\n    # 2. generate captions\n    # ======================================================\n    for sample in tqdm.tqdm(dataset):\n        prompt = PROMPTS[args.prompt][\"text\"]\n        if \"text\" in args.prompt:\n            prompt = prompt.format(sample[\"text\"])\n        frames = sample[\"image\"]\n        frames = [to_base64(frame) for frame in frames]\n        caption = get_caption(frames, prompt, args.key)\n\n        writer.writerow((sample[\"path\"], caption))\n    f.close()\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str, help=\"Path to the input CSV file\")\n    parser.add_argument(\"--prompt\", type=str, default=\"video-f3-detail-3ex\")\n    parser.add_argument(\"--key\", type=str)\n    args = parser.parse_args()\n\n    main(args)\n"
  },
  {
    "path": "Open-Sora/tools/caption/caption_llama3.py",
    "content": "import argparse\nimport csv\nimport os\nimport warnings\nfrom datetime import timedelta\n\nimport pandas as pd\nimport torch\nimport torch.distributed as dist\nfrom torch.utils.data import Dataset\nfrom tqdm import tqdm\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nfrom .utils import read_file\n\nos.system(f\"cp {__file__} ~/backup/\")  # optionally backup the script\nwarnings.filterwarnings(\"ignore\")\nos.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\nfrom torch.distributed.elastic.multiprocessing.errors import record\n\n\nclass CSVTextDataset(Dataset):\n    def __init__(self, csv_path):\n        self.df = pd.read_csv(csv_path)\n        # assert text is in the columns\n        assert \"text\" in self.df.columns, \"text column not found in the csv file\"\n\n    def __len__(self):\n        return len(self.df)\n\n    def __getitem__(self, idx):\n        if idx < 0 or idx >= len(self.df):\n            raise IndexError\n        return self.df.iloc[idx]\n\n    def set_rank_and_world_size(self, rank, world_size):\n        self.rank = rank\n        self.world_size = world_size\n        self.data_per_gpu = len(self) // world_size\n        self.start_index = rank * self.data_per_gpu\n        self.end_index = (rank + 1) * self.data_per_gpu if rank != world_size - 1 else len(self)\n        self.df = self.df.iloc[self.start_index : self.end_index]\n\n    def write_to_csv(self, output_file, data, new_key):\n        \"\"\"write the part of the df to a csv file corresponding to the rank and write self.data_list as a new column\"\"\"\n        writer = csv.writer(open(output_file, \"w\"))\n        columns = self.df.columns + [new_key]\n        writer.writerow(columns)\n        for index, row in self.df.iterrows():\n            if index < self.start_index or index >= self.end_index:\n                continue\n            writer.writerow([*row, data[index - self.start_index]])\n        writer.close()\n\n\ndef pad_left(sequences, 
padding_value=0):\n    # Determine the maximum length of the sequences\n    max_len = max([s.size(0) for s in sequences])\n    # Create a list to hold the padded sequences\n    padded_sequences = []\n    for sequence in sequences:\n        # Calculate the number of padding elements needed for this sequence\n        num_padding = max_len - sequence.size(0)\n        # Create a tensor of padding values\n        padding = torch.full((num_padding,), padding_value, dtype=sequence.dtype).to(sequence.device)\n        # Concatenate the padding and the sequence to pad on the left\n        padded_sequence = torch.cat([padding, sequence], dim=0)\n        padded_sequences.append(padded_sequence)\n    # Stack the padded sequences into a batch\n    batch = torch.stack(padded_sequences)\n    return batch\n\n\n@record\ndef main(args):\n    # ======================================================\n    # 1. init environment\n    # ======================================================\n    dist.init_process_group(backend=\"nccl\", timeout=timedelta(hours=24))\n    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())\n\n    # ======================================================\n    # 2. 
Prep rank-wise dataloader\n    # ======================================================\n    dataframe = read_file(args.input)\n    print(\"read data from {}\".format(args.input))\n    dataset = CSVTextDataset(args.input)\n    dataset.set_rank_and_world_size(dist.get_rank(), dist.get_world_size())\n\n    if os.getenv(\"DEBUG_ADDRESS\") is not None and dist.get_rank() == 2:\n        import ptvsd\n\n        print(\"waiting for debugger attachment\")\n        ptvsd.enable_attach(address=(\"localhost\", int(os.getenv(\"DEBUG_ADDRESS\"))), redirect_output=True)\n        ptvsd.wait_for_attach()\n\n    output_file = args.output_prefix + f\"_rank{dist.get_rank()}\" + f\"_{args.key}.csv\"\n    output_file_handle = open(output_file, \"w\")\n    writer = csv.writer(output_file_handle)\n    columns = list(dataframe.columns) + [args.key]\n\n    writer.writerow(columns)\n\n    # the new column is named after args.key and is written to the csv file\n    print(\"the data processed on this rank will be saved to {}\".format(output_file))\n\n    def collate_fn(batch):\n        return batch\n\n    dataloader = torch.utils.data.DataLoader(\n        dataset,\n        # num_workers=2,\n        batch_size=args.batch_size,\n        collate_fn=collate_fn,\n        shuffle=False,\n    )\n\n    # ======================================================\n    # 3. 
process using llama3 and prompt\n    # ======================================================\n\n    print(\"Using model with the id {}\".format(args.model_id))\n    model_id = args.model_id\n    tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side=\"left\")\n    model = AutoModelForCausalLM.from_pretrained(\n        model_id,\n        torch_dtype=torch.bfloat16,\n        device_map=dist.get_rank() % torch.cuda.device_count(),\n    )\n    # .to(dist.get_rank() % torch.cuda.device_count())\n    dist.barrier()\n    print(\"======== Process data using LLAMA3 ========\")\n\n    def extract_batch(texts, prompt):\n        input_ids_list = [\n            tokenizer.apply_chat_template(\n                [{\"role\": \"system\", \"content\": prompt}, {\"role\": \"user\", \"content\": text}],\n                add_generation_prompt=True,\n                return_tensors=\"pt\",\n            ).to(model.device)[0]\n            for text in texts\n        ]\n\n        attention_mask_list = [\n            torch.ones(input_ids.shape, dtype=torch.long, device=model.device) for input_ids in input_ids_list\n        ]\n\n        # input_ids_batch = pad_left(\n        #     input_ids_list, padding_value=tokenizer.eos_token_id\n        # )\n\n        input_ids_batch = torch.nn.utils.rnn.pad_sequence(\n            input_ids_list, batch_first=True, padding_value=tokenizer.eos_token_id\n        )\n\n        attention_mask_batch = torch.nn.utils.rnn.pad_sequence(attention_mask_list, batch_first=True, padding_value=0)\n\n        # attention_mask_batch = pad_left(\n        #     attention_mask_list, padding_value=0\n        # )\n\n        terminators = [\n            tokenizer.eos_token_id,\n            tokenizer.convert_tokens_to_ids(\"<|eot_id|>\"),\n        ]\n        outputs = model.generate(\n            input_ids_batch,\n            max_new_tokens=512,\n            attention_mask=attention_mask_batch,\n            pad_token_id=tokenizer.eos_token_id,\n            
eos_token_id=terminators,\n            # do_sample=True,\n            # temperature=0.6,\n            # top_p=0.9,\n        )\n\n        responses = []\n        for i in range(len(texts)):\n            response = outputs[i][input_ids_list[i].shape[-1] :]\n            response = tokenizer.decode(response, skip_special_tokens=True)\n            responses.append(response)\n\n        return responses\n\n    print(\"Processing starting...\")\n    if args.prompt == \"\" and args.key == \"objects\":\n        prompt = (\n            \"You are a AI assistant to extract objects from user's text. \"\n            \"For example: user: 'In this video a dog is running around. In addition, a person is laughing at the dog.', you produce a list of objects separated by ',' and wrapped by '[' and ']': '[dog, person]' \"\n        )\n    elif args.prompt == \"\" and args.key == \"actions\":\n        prompt = (\n            \"You are a AI assistant to extract actions from user's text. \"\n            \"For example: user: 'In this video a dog is running around. 
In addition, a person is laughing at the dog.', you produce a list of actions separated by ',' and wrapped by '[' and ']': '[run, laugh]' \"\n        )\n    else:\n        prompt = args.prompt\n\n    print(\"Prompt: {}\".format(prompt))\n\n    for batch in tqdm(dataloader):\n        # get the text column from the batch\n        texts = [row[\"text\"] for row in batch]\n        list_keywords = extract_batch(texts, prompt)\n\n        for idx, keywords in enumerate(list_keywords):\n            try:\n                keywords_start = keywords.find(\"[\")\n                keywords_end = keywords.find(\"]\")\n                keywords = keywords[keywords_start + 1 : keywords_end]\n                if (\n                    \"\\n\" in keywords or len(keywords.strip()) == 0\n                ):  # we empirically observe that it produces newlines when no keywords are found\n                    keywords = \"NONE_FOUND\"\n            except Exception:\n                keywords = \"NONE_FOUND\"\n            row = batch[idx]\n            writer.writerow([*row, keywords])\n\n    output_file_handle.close()\n    dist.barrier()\n\n    if dist.get_rank() == 0:\n        collated_file = args.output_prefix + f\"_{args.key}.csv\"\n        print(\"All ranks are finished. 
Collating the processed data to {}\".format(collated_file))\n        import pandas as pd\n\n        csv_files = [args.output_prefix + f\"_rank{i}\" + f\"_{args.key}.csv\" for i in range(dist.get_world_size())]\n        # List to hold DataFrames\n        dataframes = []\n        # Read each CSV into a DataFrame and append to list\n        for file in csv_files:\n            df = pd.read_csv(file)\n            # scan each line in the df, if the ``key`` column is NaN, replace it with \"NONE_FOUND\"\n            df[args.key] = df[args.key].fillna(\"NONE_FOUND\")\n            dataframes.append(df)\n        # Concatenate all DataFrames\n        combined_df = pd.concat(dataframes, ignore_index=True)\n\n        # Save the combined DataFrame to a new CSV file\n        combined_df.to_csv(collated_file, index=False)\n        print(\"Collated data saved to {}\".format(collated_file))\n    # terminate distributed env\n    dist.destroy_process_group()\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"--model-id\", default=\"meta-llama/Meta-Llama-3-8B-Instruct\")\n    parser.add_argument(\"input\", type=str, help=\"Path to the input CSV file\")\n    parser.add_argument(\"--output_prefix\", type=str, help=\"Path to the output CSV file\")\n    parser.add_argument(\"--prompt\", type=str, default=\"\")\n    parser.add_argument(\"--batch_size\", type=int, default=32)\n    parser.add_argument(\"--key\", type=str)\n    args = parser.parse_args()\n\n    main(args)\n"
  },
  {
    "path": "Open-Sora/tools/caption/caption_llava.py",
    "content": "import argparse\nimport csv\nimport time\nimport warnings\nfrom datetime import timedelta\n\nimport torch\nimport torch.distributed as dist\nfrom colossalai.cluster import DistCoordinator, ProcessGroupMesh\nfrom colossalai.shardformer import ShardConfig, ShardFormer\nfrom colossalai.utils import get_current_device, set_seed\nfrom llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX\nfrom llava.conversation import conv_templates\nfrom llava.mm_utils import get_model_name_from_path, process_images, tokenizer_image_token\nfrom llava.model.builder import load_pretrained_model\nfrom llava.utils import disable_torch_init\nfrom torch.utils.data.distributed import DistributedSampler\nfrom tqdm import tqdm\n\nfrom ..datasets.utils import IMG_EXTENSIONS, VID_EXTENSIONS\nfrom .acceleration.llava.policies import LlavaLlamaForCausalLMPolicy, LlavaMistralForCausalLMPolicy\nfrom .utils import PROMPTS, Timer, VideoTextDataset, collate_fn\n\ndisable_torch_init()\n\n\nclass NoPaddingDistributedSampler(DistributedSampler):\n    def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True, seed=0, drop_last=False):\n        super().__init__(\n            dataset=dataset, num_replicas=num_replicas, rank=rank, seed=seed, shuffle=False, drop_last=False\n        )\n        remainder = len(self.dataset) % self.num_replicas\n        if remainder > 0 and (self.rank + 1) - remainder <= 0:\n            # if the dataset is not divisible by num_replicas\n            # the remaining items will be allocated to the first n ranks\n            self.num_samples = len(self.dataset) // self.num_replicas + 1\n        else:\n            self.num_samples = len(self.dataset) // self.num_replicas\n        self.total_size = len(dataset)\n\n    def __iter__(self):\n        if self.shuffle:\n            # deterministically shuffle based on epoch and seed\n            g = torch.Generator()\n            g.manual_seed(self.seed + self.epoch)\n            indices = 
torch.randperm(len(self.dataset), generator=g).tolist()  # type: ignore[arg-type]\n        else:\n            indices = list(range(len(self.dataset)))  # type: ignore[arg-type]\n\n        # remove tail of data to make it evenly divisible.\n        indices = indices[: self.total_size]\n\n        # subsample\n        indices = indices[self.rank : self.total_size : self.num_replicas]\n        assert len(indices) == self.num_samples\n        return iter(indices)\n\n\n@torch.inference_mode()\ndef main(args):\n    # ======================================================\n    # 1. init environment\n    # ======================================================\n    # we set a very large timeout to avoid some processes exit early\n    dist.init_process_group(backend=\"nccl\", timeout=timedelta(hours=24))\n    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())\n    set_seed(1024)\n    coordinator = DistCoordinator()\n\n    # prepare the dp and tp groups\n    assert (\n        args.dp_size * args.tp_size == coordinator.world_size\n    ), f\"DP size {args.dp_size} * TP size {args.tp_size} must equal to world size {coordinator.world_size}\"\n    mesh = ProcessGroupMesh(args.dp_size, args.tp_size)\n    dp_group = mesh.get_group_along_axis(0)\n    tp_group = mesh.get_group_along_axis(1)\n\n    # ======================================================\n    # 2. 
load model\n    # ======================================================\n    model_path = args.model_path\n    with warnings.catch_warnings():\n        warnings.simplefilter(\"ignore\")  # Pytorch non-meta copying warning fills out the console\n        tokenizer, model, image_processor, context_len = load_pretrained_model(\n            model_path=model_path,\n            model_base=None,\n            model_name=get_model_name_from_path(model_path),\n            device=get_current_device(),\n            torch_dtype=torch.float16,\n            attn_implementation=\"flash_attention_2\" if args.flash_attention else \"eager\",\n        )\n        dist.barrier()\n\n    # ======================================================\n    # 3. Apply system optimization\n    # ======================================================\n    tp_size = dist.get_world_size(tp_group)\n    shard_config = ShardConfig(\n        tensor_parallel_process_group=tp_group if tp_size > 1 else None,\n        enable_tensor_parallelism=True if tp_size > 1 else False,\n    )\n    shard_former = ShardFormer(shard_config=shard_config)\n\n    # check the model type\n    model_name = model.__class__.__name__\n    print(model_name)\n    if model_name == \"LlavaLlamaForCausalLM\":\n        model = shard_former.optimize(model, policy=LlavaLlamaForCausalLMPolicy())[0].cuda()\n    elif model_name == \"LlavaMistralForCausalLM\":\n        model = shard_former.optimize(model, policy=LlavaMistralForCausalLMPolicy())[0].cuda()\n    else:\n        print(f\"The shardformer policy for {model_name} is not implemented, skip\")\n    torch.cuda.empty_cache()\n\n    # ======================================================\n    # 4. 
Prepare dataloader\n    # ======================================================\n    # prepare prompt\n    query = PROMPTS[args.prompt][\"text\"]\n    if dist.get_rank() == 0:\n        print(f\"Prompt: {query}\")\n\n    if \"text\" in args.prompt:\n\n        def get_text_input_ids(text):\n            conv = conv_templates[\"chatml_direct\"].copy()\n            query_text = query.format(text)\n            conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + \"\\n\" + query_text)\n            prompt = conv.get_prompt()\n            # add num_frames images\n            t = prompt.split(\"<image>\")\n            prompt = t[0] + \"<image>\" * args.num_frames + t[1]\n            input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors=\"pt\")\n            input_ids = input_ids.unsqueeze(0)\n            return input_ids\n\n    else:\n        conv = conv_templates[\"chatml_direct\"].copy()\n        conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + \"\\n\" + query)\n        prompt = conv.get_prompt()\n        # add num_frames images\n        t = prompt.split(\"<image>\")\n        prompt = t[0] + \"<image>\" * args.num_frames + t[1]\n        input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors=\"pt\")\n        input_ids = input_ids.unsqueeze(0)\n\n        def get_text_input_ids(*args):\n            return input_ids\n\n    # build dataset\n    def transform(imgs):\n        imgs = process_images(imgs, image_processor, model.config)\n        imgs = imgs.to(dtype=torch.float16)\n        return imgs\n\n    dataset = VideoTextDataset(\n        args.input,\n        transform=transform,\n        num_frames=args.num_frames,\n        get_text_input_ids=get_text_input_ids,\n        resize=args.resize,\n    )\n\n    # make sure that the prompt type matches the data type\n    data_extension = \".\" + dataset.data[\"path\"].iloc[0].split(\".\")[-1]\n    prompt_type = PROMPTS[args.prompt][\"type\"]\n    if 
prompt_type == \"image\":\n        assert (\n            data_extension.lower() in IMG_EXTENSIONS\n        ), f\"The prompt is suitable for an image dataset but the data is not image. The first data is of format {data_extension}\"\n    elif prompt_type == \"video\":\n        assert (\n            data_extension.lower() in VID_EXTENSIONS\n        ), f\"The prompt is suitable for a video dataset but the data is not video. The first data is of format {data_extension}\"\n    else:\n        raise ValueError(f\"Found invalid prompt type {prompt_type}\")\n\n    total_num_videos = len(dataset)\n\n    # build sampler\n    dp_rank = dist.get_rank(dp_group)\n    dp_size = dist.get_world_size(dp_group)\n    sampler = NoPaddingDistributedSampler(dataset, rank=dp_rank, num_replicas=dp_size)\n\n    # build dataloader\n    dataloader = torch.utils.data.DataLoader(\n        dataset,\n        batch_size=args.bs,\n        shuffle=False,\n        num_workers=args.num_workers,\n        pin_memory=True,\n        prefetch_factor=args.prefetch_factor,\n        sampler=sampler,\n        collate_fn=collate_fn,\n    )\n\n    # prepare output file reader\n    output_file = args.input.replace(\".csv\", \"_caption.csv\")\n\n    # create csv writer\n    has_dp_writter = dist.get_rank(tp_group) == 0\n\n    if has_dp_writter:\n        # the dp writer takes care of the files processed on the current dp rank\n        # so we use write mode\n        output_file_split = output_file.replace(\".csv\", f\"_part{dp_rank}.csv\")\n        dp_file = open(output_file_split, \"w\")\n        dp_writer = csv.writer(dp_file)\n        dp_writer.writerow([\"path\", \"text\", \"num_frames\"])\n\n    # ======================================================\n    # 5. 
generate captions\n    # ======================================================\n    if dist.get_rank(tp_group) == 0:\n        pbar = tqdm(dataloader, position=dp_rank, desc=f\"Data Parallel Rank {dist.get_rank(dp_group)}\")\n    else:\n        pbar = dataloader\n\n    if args.profile:\n        encode_time = []\n        generate_time = []\n        output_length = []\n        total_time = []\n\n    for i, batch in enumerate(pbar):\n        # measure time\n        if args.profile:\n            torch.cuda.synchronize()\n            start_time = time.time()\n\n        video_files, frames, video_lengths, img_size_list, texts = batch\n\n        # encode the batch of inputs\n        with Timer() as encode_timer:\n            samples = []\n            for imgs, imgs_size, input_ids in zip(frames, img_size_list, texts):\n                imgs = imgs.cuda()\n                input_ids = input_ids.cuda()\n                _, _, _, _, inputs_embeds, _ = model.prepare_inputs_labels_for_multimodal(\n                    input_ids, None, None, None, None, images=imgs, image_sizes=imgs_size\n                )\n                samples.append(inputs_embeds)\n\n        # padding\n        max_len = max([sample.shape[1] for sample in samples])\n        attention_mask = torch.tensor(\n            [[0] * (max_len - samples[i].shape[1]) + [1] * samples[i].shape[1] for i in range(len(samples))]\n        ).to(model.device)\n        inputs_embeds = [\n            torch.cat(\n                [\n                    torch.zeros(\n                        (1, max_len - samples[i].shape[1], samples[i].shape[-1]),\n                        device=model.device,\n                        dtype=torch.float16,\n                    ),\n                    samples[i],\n                ],\n                dim=1,\n            )\n            for i in range(len(samples))\n        ]\n        inputs_embeds = torch.cat(inputs_embeds, dim=0)\n\n        # generate outputs\n        with Timer() as generate_timer:\n      
      output_ids = super(type(model), model).generate(\n                inputs_embeds=inputs_embeds,\n                attention_mask=attention_mask,\n                do_sample=False,  # sampling is not deterministic and may cause TP to hang\n                max_new_tokens=args.max_tokens,\n                use_cache=True,\n            )\n\n            # skip warmup and add profiling data\n            if args.profile and i >= args.profile_warmup:\n                output_length.append(output_ids.size(0) * output_ids.size(1))\n\n            outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)\n            outputs = [output.replace(\"\\n\", \" \").strip() for output in outputs]\n\n        # skip warmup and add profiling data\n        if args.profile and i >= args.profile_warmup:\n            # measure time\n            torch.cuda.synchronize()\n            time_taken = time.time() - start_time\n\n            total_time.append(time_taken)\n            encode_time.append(encode_timer.time_taken)\n            generate_time.append(generate_timer.time_taken)\n\n        # save results\n        if has_dp_writter:\n            result = list(zip(video_files, outputs, video_lengths))\n            for t in result:\n                dp_writer.writerow(t)\n\n    # display profiling info\n    if args.profile:\n        print(output_length)\n        num_samples_after_warmup = total_num_videos - args.bs * args.profile_warmup * dp_size\n        print(f\"throughput (samples/s): {num_samples_after_warmup / sum(total_time)}\")\n        print(f\"average encode time per sample: {sum(encode_time) / num_samples_after_warmup}\")\n        print(f\"average generate time per sample: {sum(generate_time) / num_samples_after_warmup}\")\n        print(f\"average number of output tokens per sample: {sum(output_length) / num_samples_after_warmup}\")\n        print(f\"Max GPU allocated / GB: {torch.cuda.max_memory_allocated() / 1024**3}\")\n        print(f\"Max GPU reserved / GB: 
{torch.cuda.max_memory_reserved() / 1024**3}\")\n\n    # ======================================================\n    # 6. shutdown\n    # ======================================================\n    # close file writing\n    if has_dp_writter:\n        dp_file.close()\n    dist.barrier()\n\n    # terminate distributed env\n    dist.destroy_process_group()\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str, help=\"Path to the input CSV file\")\n    parser.add_argument(\"--model-path\", type=str, default=\"liuhaotian/llava-v1.6-34b\")\n    parser.add_argument(\"--prompt\", type=str, default=\"video-f1-detail-3ex\")\n    parser.add_argument(\"--resize\", type=int, default=336)\n    parser.add_argument(\"--num-frames\", type=int, default=1)\n    parser.add_argument(\"--max-tokens\", type=int, default=300)\n    # speed related\n    parser.add_argument(\"--bs\", type=int, default=16)\n    parser.add_argument(\"--tp-size\", type=int, default=2)\n    parser.add_argument(\"--dp-size\", type=int, default=4)\n    parser.add_argument(\"--num-workers\", type=int, default=8)\n    parser.add_argument(\"--prefetch-factor\", type=int, default=8, help=\"Prefetch factor\")\n    parser.add_argument(\n        \"--flash-attention\",\n        action=\"store_true\",\n        help=\"Whether to use flash attention. You can turn on this flag for llama model and off for mistral model.\",\n    )\n    # debug related\n    parser.add_argument(\"--profile\", action=\"store_true\")\n    parser.add_argument(\"--profile-warmup\", type=int, default=1)\n\n    args = parser.parse_args()\n    main(args)\n"
  },
  {
    "path": "Open-Sora/tools/caption/pllava_dir/caption_pllava.py",
    "content": "import sys\nimport os\nimport os\nfrom pathlib import Path\n\ncurrent_file = Path(__file__)  # Gets the path of the current file\nfourth_level_parent = current_file.parents[3]\n\ndatasets_dir = os.path.join(fourth_level_parent, \"opensora/datasets\")\nimport sys\nsys.path.append(datasets_dir)\nfrom read_video import read_video_av\nsys.path.remove(datasets_dir)\n\nimport itertools\nimport logging\nimport multiprocessing as mp\nfrom argparse import ArgumentParser\nfrom multiprocessing import Process, Queue\n\nimport numpy as np\nimport pandas as pd\nimport torch\nimport torchvision\nimport transformers\nfrom decord import VideoReader, cpu\nfrom PIL import Image\nfrom tasks.eval.eval_utils import Conversation\nfrom tasks.eval.model_utils import load_pllava\nfrom torch.utils.data import Dataset\nfrom tqdm import tqdm\nfrom transformers.feature_extraction_utils import BatchFeature\n\nconv_template = Conversation(\n    system=\"Describe this video. Pay attention to all objects in the video. The description should be useful for AI to re-generate the video. The description should be no more than six sentences. Here are some examples of good descriptions: 1. A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about. 2. Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field. 3. 
Drone view of waves crashing against the rugged cliffs along Big Sur's garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff's edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.\",\n    roles=(\"USER:\", \"ASSISTANT:\"),\n    messages=[],\n    sep=(\" \", \"</s>\"),\n    mm_token=\"<image>\",\n)\n\nlogging.basicConfig()\nlogger = logging.getLogger(__name__)\nlogger.setLevel(logging.INFO)\n\nRESOLUTION = 672  #\n\n\ndef pllava_answer(\n    conv: Conversation,\n    model,\n    processor,\n    video_list,\n    do_sample=True,\n    max_new_tokens=200,\n    num_beams=1,\n    min_length=1,\n    top_p=0.9,\n    repetition_penalty=1.0,\n    length_penalty=1,\n    temperature=1.0,\n    stop_criteria_keywords=None,\n    print_res=False,\n):\n    # torch.cuda.empty_cache()\n    prompt = conv.get_prompt()\n    inputs_list = [processor(text=prompt, images=video, return_tensors=\"pt\") for video in video_list]\n    inputs_batched = dict()  # add batch dimension by cat\n    for input_type in list(inputs_list[0].keys()):\n        inputs_batched[input_type] = torch.cat([inputs[input_type] for inputs in inputs_list])\n    inputs_batched = BatchFeature(inputs_batched, tensor_type=\"pt\").to(model.device)\n\n    with torch.no_grad():\n        output_texts = model.generate(\n            **inputs_batched,\n            media_type=\"video\",\n            do_sample=do_sample,\n            max_new_tokens=max_new_tokens,\n            num_beams=num_beams,\n            min_length=min_length,\n            top_p=top_p,\n            repetition_penalty=repetition_penalty,\n            length_penalty=length_penalty,\n            
temperature=temperature,\n        )\n        output_texts = processor.batch_decode(\n            output_texts, skip_special_tokens=True, clean_up_tokenization_spaces=False\n        )\n    for i in range(len(output_texts)):\n        if print_res:  # debug usage\n            print(\"### PROMPTING LM WITH: \", prompt)\n            print(\"### LM OUTPUT TEXT:  \", output_texts[i])\n        if conv.roles[-1] == \"<|im_start|>assistant\\n\":\n            split_tag = \"<|im_start|> assistant\\n\"\n        else:\n            split_tag = conv.roles[-1]\n        output_texts[i] = output_texts[i].split(split_tag)[-1]\n        ending = conv.sep if isinstance(conv.sep, str) else conv.sep[1]\n        output_texts[i] = output_texts[i].removesuffix(ending).strip()\n        output_texts[i] = output_texts[i].replace(\"\\n\", \" \")\n        conv.messages[-1][1] = output_texts[i]\n    return output_texts, conv\n\n\ndef get_index(num_frames, num_segments):\n    seg_size = float(num_frames - 1) / num_segments\n    start = int(seg_size / 2)\n    offsets = np.array([start + int(np.round(seg_size * idx)) for idx in range(num_segments)])\n    return offsets\n\n\n# def load_video(video_path, num_frames, return_msg=False, resolution=336):\n#     transforms = torchvision.transforms.Resize(size=resolution)\n#     vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)\n#     total_num_frames = len(vr)\n#     frame_indices = get_index(total_num_frames, num_frames)\n#     images_group = list()\n#     for frame_index in frame_indices:\n#         img = Image.fromarray(vr[frame_index].asnumpy())\n#         images_group.append(transforms(img))\n#     if return_msg:\n#         fps = float(vr.get_avg_fps())\n#         sec = \", \".join([str(round(f / fps, 1)) for f in frame_indices])\n#         # \" \" should be added in the start and end\n#         msg = f\"The video contains {len(frame_indices)} frames sampled at {sec} seconds.\"\n#         return images_group, msg\n#     else:\n#         return 
images_group\n\n\ndef load_video(video_path, num_frames, return_msg=False, resolution=336):\n    transforms = torchvision.transforms.Resize(size=resolution)\n    # vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)\n    vframes, aframes, info = read_video_av(\n        video_path,\n        pts_unit=\"sec\",\n        output_format=\"THWC\",\n    )\n    # print(vframes.shape)\n    total_num_frames = len(vframes)\n    # print(\"Video path: \", video_path)\n    # print(\"Total number of frames: \", total_num_frames)\n    frame_indices = get_index(total_num_frames, num_frames)\n    images_group = list()\n    for frame_index in frame_indices:\n        img = Image.fromarray(vframes[frame_index].numpy())\n        images_group.append(transforms(img))\n    if return_msg:\n        # fps = float(vframes.get_avg_fps())\n        # sec = \", \".join([str(round(f / fps, 1)) for f in frame_indices])\n        # # \" \" should be added in the start and end\n        # msg = f\"The video contains {len(frame_indices)} frames sampled at {sec} seconds.\"\n        # return images_group, msg\n        raise NotImplementedError(\"return_msg not implemented yet\")\n    else:\n        return images_group\n\n\ndef collate_fn(batch):\n    return batch\n\n\nclass CSVDataset(Dataset):\n    def __init__(self, csv_path, num_frames):\n        self.df = pd.read_csv(csv_path)\n        self.data_list = self.df.path.tolist()\n        self.num_frames = num_frames\n\n    def __len__(self):\n        return len(self.data_list)\n\n    def __getitem__(self, idx):\n        if idx < 0 or idx >= len(self.data_list):\n            raise IndexError\n        try:\n            video = load_video(self.data_list[idx], self.num_frames, resolution=RESOLUTION)\n        except Exception:\n            # a video that fails to load is reported as None; infer() rejects such batches\n            return None\n        return video\n\n    def set_rank_and_world_size(self, rank, world_size):\n        # keep only this rank's contiguous shard; the last rank also takes the remainder\n        self.rank = rank\n        self.world_size = world_size\n        self.data_per_gpu = len(self) // world_size\n        start_index = rank * 
self.data_per_gpu\n        end_index = (rank + 1) * self.data_per_gpu if rank != world_size - 1 else len(self)\n        self.data_list = self.data_list[start_index:end_index]\n\n\ndef parse_args():\n    parser = ArgumentParser()\n    parser.add_argument(\"--pretrained_model_name_or_path\", type=str, required=True, default=\"llava-hf/llava-1.5-7b-hf\")\n    parser.add_argument(\n        \"--batch_size\",\n        type=int,\n        required=False,\n        default=1,\n    )\n    parser.add_argument(\n        \"--csv_path\",\n        type=str,\n        required=True,\n    )\n    parser.add_argument(\n        \"--num_frames\",\n        type=int,\n        required=True,\n        default=4,\n    )\n    parser.add_argument(\"--use_lora\", action=\"store_true\")\n    parser.add_argument(\n        \"--lora_alpha\",\n        type=int,\n        required=False,\n        default=4,\n    )\n    parser.add_argument(\n        \"--weight_dir\",\n        type=str,\n        required=False,\n        default=None,\n    )\n    parser.add_argument(\n        \"--conv_mode\",\n        type=str,\n        required=False,\n        default=\"eval_mvbench\",\n    )\n    parser.add_argument(\n        \"--pooling_shape\",\n        type=str,\n        required=False,\n        default=None,\n    )\n    parser.add_argument(\n        \"--error_message\",\n        type=str,\n        required=False,\n        default='error occured during captioning',\n    )\n    args = parser.parse_args()\n    return args\n\n\ndef load_model_and_dataset(\n    rank,\n    world_size,\n    pretrained_model_name_or_path,\n    num_frames,\n    use_lora,\n    lora_alpha,\n    weight_dir,\n    csv_path,\n    pooling_shape=(16, 12, 12),\n):\n    # remind that, once the model goes larger (30B+) may cause the memory to be heavily used up. 
Even Tearing Nodes.\n    model, processor = load_pllava(\n        pretrained_model_name_or_path,\n        num_frames=num_frames,\n        use_lora=use_lora,\n        weight_dir=weight_dir,\n        lora_alpha=lora_alpha,\n        pooling_shape=pooling_shape,\n    )\n    logger.info(\"done loading llava\")\n\n    #  position embedding\n    model = model.to(torch.device(rank))\n    model = model.eval()\n\n    dataset = CSVDataset(csv_path, num_frames)\n    dataset.set_rank_and_world_size(rank, world_size)\n    return model, processor, dataset\n\n\ndef infer(\n    model,\n    processor,\n    video_list,\n    conv_mode,\n    print_res=False,\n):\n    # check if any video in video_list is None, if so, raise an exception\n    if any([video is None for video in video_list]):\n        raise Exception(\"Video not loaded properly\")\n    conv = conv_template.copy()\n    conv.user_query(\"Describe the video in details.\", is_mm=True)\n\n    llm_responses, conv = pllava_answer(\n        conv=conv,\n        model=model,\n        processor=processor,\n        video_list=video_list,\n        max_new_tokens=256,\n        do_sample=False,\n        print_res=print_res,\n    )\n\n    return llm_responses\n\n\ndef run(rank, args, world_size, output_queue):\n    if rank == 0:\n        import os\n\n        if os.getenv(\"DEBUG_ADDRESS\") is not None:\n            import ptvsd\n\n            ptvsd.enable_attach(address=(\"localhost\", int(os.getenv(\"DEBUG_ADDRESS\"))), redirect_output=True)\n            print(\"waiting for debugger attachment\")\n            ptvsd.wait_for_attach()\n    if rank != 0:\n        transformers.utils.logging.set_verbosity_error()\n        logger.setLevel(transformers.logging.ERROR)\n\n    print_res = False\n    conv_mode = args.conv_mode\n    if args.pooling_shape is not None:\n        pooling_shape = tuple([int(x) for x in args.pooling_shape.split(\"-\")])\n    else:\n        # fall back to the default declared by load_model_and_dataset\n        pooling_shape = (16, 12, 12)\n\n    logger.info(f\"loading model and constructing dataset to gpu {rank}...\")\n    model, processor, 
dataset = load_model_and_dataset(\n        rank,\n        world_size,\n        pretrained_model_name_or_path=args.pretrained_model_name_or_path,\n        num_frames=args.num_frames,\n        use_lora=args.use_lora,\n        lora_alpha=args.lora_alpha,\n        weight_dir=args.weight_dir,\n        pooling_shape=pooling_shape,\n        csv_path=args.csv_path,\n    )\n    logger.info(f\"done model and dataset...\")\n    logger.info(\"constructing dataset...\")\n    logger.info(\"single test...\")\n    dataloader = torch.utils.data.DataLoader(\n        dataset,\n        num_workers=2,\n        batch_size=args.batch_size,\n        collate_fn=collate_fn,\n        shuffle=False,\n    )\n\n    total = 0\n    result_list = []\n    print(len(dataset))\n    for batch in tqdm(dataloader):\n        total += 1\n        try:\n            preds = infer(\n                model,\n                processor,\n                batch,\n                conv_mode=conv_mode,\n                print_res=print_res,\n            )\n        except Exception as e:\n            logger.error(f\"error in {batch}: {str(e)}\")\n            # preds = args.error_message duplicated for each video in the batch\n            preds = [args.error_message] * len(batch)\n        result_list.extend(preds)\n    output_queue.put((rank, result_list))\n    return result_list\n\n\ndef main():\n    multiprocess = True\n    mp.set_start_method(\"spawn\")\n    args = parse_args()\n    # csv_path = '/home/tom/PLLaVA/test_short_caption_part2.csv'\n    if multiprocess:\n        n_gpus = torch.cuda.device_count()\n        world_size = n_gpus\n        print(f\"world_size: {world_size}\")\n        # Create a queue to collect results from each process\n        output_queue = Queue()\n\n        # with Pool(world_size) as pool:\n        #     func = functools.partial(run, args=args, world_size=world_size)\n        #     result_lists = pool.map(func, range(world_size))\n        processes = []\n        for i in 
range(world_size):\n            # Each process will now also take the output queue as an argument\n            p = Process(target=run, args=(i, args, world_size, output_queue))\n            p.daemon = False\n            processes.append(p)\n            p.start()\n\n        results_by_rank = {}\n        for _ in range(world_size):\n            rank, results = output_queue.get()  # Retrieve results as they finish\n            results_by_rank[rank] = results\n            print(f\"Results received from rank {rank}\")\n        logger.info(\"finished running\")\n        for p in processes:\n            p.join()\n\n        # order the results by rank so they line up with the rows of the input csv\n        results_list = list(itertools.chain.from_iterable(results_by_rank[i] for i in range(world_size)))\n        # results_list = list(itertools.chain([results_by_rank[i] for i in range(world_size)]))\n        # (data[key] for key in sorted_keys)\n        # results_list = [item for sublist in results_by_rank.values() for item in sublist]\n\n    else:\n        # debug: run rank 0 in this process and read its result back from the queue\n        output_queue = Queue()\n        run(0, args, 1, output_queue)\n        _, results_list = output_queue.get()\n\n    print(results_list)\n\n    df = pd.read_csv(args.csv_path)\n    # add a new column to the dataframe\n    df[\"text\"] = results_list\n    drop_failed = True\n    if drop_failed:\n        # drop every row whose caption equals the error sentinel\n        df = df[df[\"text\"] != args.error_message]\n    # write the dataframe to a new csv file named '*_text.csv'\n    new_csv_path = args.csv_path.replace(\".csv\", \"_text.csv\")\n    df.to_csv(new_csv_path, index=False)\n    print(f\"Results saved to {new_csv_path}\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/tools/caption/utils.py",
    "content": "import time\n\nimport pandas as pd\nimport torch\nimport torchvision.transforms as transforms\nfrom torchvision.datasets.folder import pil_loader\n\nfrom tools.datasets.utils import extract_frames, is_video\n\nIMG_EXTENSIONS = (\".jpg\", \".jpeg\", \".png\", \".ppm\", \".bmp\", \".pgm\", \".tif\", \".tiff\", \".webp\")\nVID_EXTENSIONS = (\".mp4\", \".avi\", \".mov\", \".mkv\")\nPROMPTS = {\n    \"image\": {\n        \"text\": \"Describe this image and its style to generate a succinct yet informative description. Pay attention to all objects in the image. The description should be useful for AI to re-generate the image. The description should be no more than five sentences. Remember do not exceed 5 sentences.\",\n        \"type\": \"image\",\n    },\n    \"image-text\": {\n        \"text\": \"Describe this image and its style in a very detailed manner. Pay attention to all objects in the image. The description should be useful for AI to re-generate the image. The description should be no more than six sentences. Some information about the image is '{}'.\",\n        \"type\": \"image\",\n    },\n    \"image-3ex\": {\n        \"text\": \"An image is given. Describe this image and its style to generate a succinct yet informative description. Pay attention to all objects in the image. The description should be useful for AI to re-generate the video. The description should be no more than five sentences. Here are some examples of good descriptions: 1. A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick and walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about. 2. 
Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field. 3. Drone view of waves crashing against the rugged cliffs along Big Sur's garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff's edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.\",\n        \"type\": \"image\",\n    },\n    \"video\": {\n        \"text\": \"Describe this video and its style in a very detailed manner. Pay attention to all objects in the video. The description should be useful for AI to re-generate the video. The description should be no more than six sentences.\",\n        \"type\": \"video\",\n    },\n    \"video-text\": {\n        \"text\": \"Describe this video and its style in a very detailed manner. Some information about the image is '{}'. Pay attention to all objects in the video. The description should be useful for AI to re-generate the video. The description should be no more than six sentences.\",\n        \"type\": \"video\",\n    },\n    \"video-f1-detail-3ex\": {\n        \"text\": \"A video is given by providing the middle frame. Describe this video and its style to generate a description. Pay attention to all objects in the video. The description should be useful for AI to re-generate the video. The description should be no more than six sentences. 
Here are some examples of good descriptions: 1. A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about. 2. Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field. 3. Drone view of waves crashing against the rugged cliffs along Big Sur's garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff's edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.\",\n        \"type\": \"video\",\n    },\n    \"video-f1-detail-2ex-text\": {\n        \"text\": \"A video is given by providing the middle frame. Some information about the image is '{}'. Describe this video and its style to generate a description. Pay attention to all objects in the video. Do not describe each frame individually. Do not reply with words like 'first frame'. The description should be useful for AI to re-generate the video. The description should be no more than six sentences. Here are some examples of good descriptions: 1. 
A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about. 2. Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.\",\n        \"type\": \"video\",\n    },\n    \"video-f3-detail-3ex\": {\n        \"text\": \"A video is given by providing three frames in chronological order. Describe this video and its style to generate a description. Pay attention to all objects in the video. Do not describe each frame individually. Do not reply with words like 'first frame'. The description should be useful for AI to re-generate the video. The description should be no more than six sentences. Here are some examples of good descriptions: 1. A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about. 2. 
Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field. 3. Drone view of waves crashing against the rugged cliffs along Big Sur's garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff's edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.\",\n        \"type\": \"video\",\n    },\n    \"video-f3-detail-2ex-text\": {\n        \"text\": \"A video is given by providing three frames in chronological order. Some information about the image is '{}'. Describe this video and its style to generate a description. Pay attention to all objects in the video. Do not describe each frame individually. Do not reply with words like 'first frame'. The description should be useful for AI to re-generate the video. The description should be no more than six sentences. Here are some examples of good descriptions: 1. A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about. 2. 
Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.\",\n        \"type\": \"video\",\n    },\n}\n\n\nNUM_FRAMES_POINTS = {\n    1: (0.5,),\n    2: (0.25, 0.75),\n    3: (0.1, 0.5, 0.9),\n}\n\n\ndef read_file(input_path):\n    if input_path.endswith(\".csv\"):\n        return pd.read_csv(input_path)\n    elif input_path.endswith(\".parquet\"):\n        return pd.read_parquet(input_path)\n    else:\n        raise NotImplementedError(f\"Unsupported file format: {input_path}\")\n\n\nclass VideoTextDataset(torch.utils.data.Dataset):\n    def __init__(self, csv_path, transform=None, num_frames=3, get_text_input_ids=None, resize=None):\n        self.csv_path = csv_path\n        self.transform = transform\n        self.data = read_file(csv_path)\n        self.points = NUM_FRAMES_POINTS[num_frames]\n        self.get_text_input_ids = get_text_input_ids\n        self.use_text = False\n        self.resize_size = resize\n        self.resize = transforms.Resize(resize, transforms.InterpolationMode.BICUBIC) if resize is not None else None\n        if \"text\" in self.data.columns:\n            self.use_text = True\n\n    def getitem(self, index):\n        sample = self.data.iloc[index]\n        path = sample[\"path\"]\n        if not is_video(path):\n            images = [pil_loader(path)]\n            length = 1\n        else:\n            images, length = extract_frames(sample[\"path\"], points=self.points, backend=\"opencv\", return_length=True)\n        if self.resize_size is not None:\n            images_r = []\n            for img in images:\n                if img.size[0] > self.resize_size or img.size[1] > self.resize_size:\n    
                img = self.resize(img)\n                images_r.append(img)\n            images = images_r\n        imgs_size = [img.size for img in images]\n        if self.transform is not None:\n            images = self.transform(images)\n\n        # we put images into a list as pytorch dataloader does not accept Pill\n        out = dict(path=path, image=images, length=length, img_size=imgs_size)\n        if self.get_text_input_ids is not None:\n            if self.use_text:\n                out[\"text\"] = self.get_text_input_ids(sample[\"text\"])\n            else:\n                out[\"text\"] = self.get_text_input_ids()\n        else:\n            if self.use_text:\n                out[\"text\"] = sample[\"text\"]\n            else:\n                out[\"text\"] = \"\"\n        return out\n\n    def __len__(self):\n        return len(self.data)\n\n    def __getitem__(self, index):\n        return self.getitem(index)\n\n\ndef collate_fn(batch):\n    paths = [item[\"path\"] for item in batch]\n    images = [item[\"image\"] for item in batch]\n    lengths = [item[\"length\"] for item in batch]\n    img_sizes = [item[\"img_size\"] for item in batch]\n    texts = [item[\"text\"] for item in batch]\n    return paths, images, lengths, img_sizes, texts\n\n\nclass Timer:\n    def __init__(self):\n        self.time_taken = 0\n        self.start_time = 0\n        self.end_time = 0\n\n    def __enter__(self):\n        self.start_time = time.time()\n        return self\n\n    def __exit__(self, exc_type, exc_value, exc_tb):\n        self.end_time = time.time()\n        self.time_taken = self.end_time - self.start_time\n"
  },
  {
    "path": "Open-Sora/tools/datasets/README.md",
    "content": "# Dataset Management\n\n- [Dataset Management](#dataset-management)\n  - [Dataset Format](#dataset-format)\n  - [Dataset to CSV](#dataset-to-csv)\n  - [Manage datasets](#manage-datasets)\n    - [Requirement](#requirement)\n    - [Basic Usage](#basic-usage)\n    - [Score filtering](#score-filtering)\n    - [Documentation](#documentation)\n  - [Transform datasets](#transform-datasets)\n    - [Resize](#resize)\n    - [Frame extraction](#frame-extraction)\n    - [Crop Midjourney 4 grid](#crop-midjourney-4-grid)\n  - [Analyze datasets](#analyze-datasets)\n  - [Data Process Pipeline](#data-process-pipeline)\n\nAfter preparing the raw dataset according to the [instructions](/docs/datasets.md), you can use the following commands to manage the dataset.\n\n## Dataset Format\n\nAll datasets should be provided in a `.csv` file (or `parquet.gzip` to save space), which is used for both training and data preprocessing. The columns should use the following names:\n\n- `path`: the relative/absolute path or URL to the image or video file. Required.\n- `text`: the caption or description of the image or video. Required for training.\n- `num_frames`: the number of frames in the video. Required for training.\n- `width`: the width of the video frame. Required for dynamic bucket.\n- `height`: the height of the video frame. Required for dynamic bucket.\n- `aspect_ratio`: the aspect ratio of the video frame (height / width). Required for dynamic bucket.\n- `resolution`: height x width. For analysis.\n- `text_len`: the number of tokens in the text. For analysis.\n- `aes`: aesthetic score calculated by [aesthetic scorer](/tools/aesthetic/README.md). For filtering.\n- `flow`: optical flow score calculated by [UniMatch](/tools/scoring/README.md). For filtering.\n- `match`: matching score of an image-text/video-text pair calculated by [CLIP](/tools/scoring/README.md). For filtering.\n- `fps`: the frame rate of the video. 
Optional.\n- `cmotion`: the camera motion.\n\nAn example ready for training:\n\n```csv\npath, text, num_frames, height, width, aspect_ratio\n/absolute/path/to/image1.jpg, caption, 1, 720, 1280, 0.5625\n/absolute/path/to/video1.mp4, caption, 120, 720, 1280, 0.5625\n/absolute/path/to/video2.mp4, caption, 20, 256, 256, 1\n```\n\nWe use pandas to manage the `.csv` or `.parquet` files. The following code is for reading and writing files:\n\n```python\nimport pandas as pd\n\ndf = pd.read_csv(input_path)\n# to_csv returns None when given a path, so do not reassign df\ndf.to_csv(output_path, index=False)\n# or use parquet, which is smaller\ndf = pd.read_parquet(input_path)\ndf.to_parquet(output_path, index=False)\n```\n\n## Dataset to CSV\n\nAs a starting point, `convert.py` is used to convert the dataset to a CSV file. You can use the following commands to convert the dataset to a CSV file:\n\n```bash\npython -m tools.datasets.convert DATASET-TYPE DATA_FOLDER\n\n# general video folder\npython -m tools.datasets.convert video VIDEO_FOLDER --output video.csv\n# general image folder\npython -m tools.datasets.convert image IMAGE_FOLDER --output image.csv\n# imagenet\npython -m tools.datasets.convert imagenet IMAGENET_FOLDER --split train\n# ucf101\npython -m tools.datasets.convert ucf101 UCF101_FOLDER --split videos\n# vidprom\npython -m tools.datasets.convert vidprom VIDPROM_FOLDER --info VidProM_semantic_unique.csv\n```\n\n## Manage datasets\n\nUse `datautil` to manage the dataset.\n\n### Requirement\n\nFollow the \"Data Dependencies\" and \"Datasets\" sections of our [installation guide](../../docs/installation.md) to install the required packages.\n<!-- To accelerate processing speed, you can install [pandarallel](https://github.com/nalepae/pandarallel):\n\n```bash\npip install pandarallel\n``` -->\n\n<!-- To get image and video information, you need to install [opencv-python](https://github.com/opencv/opencv-python): -->\n\n<!-- ```bash\npip install opencv-python\n# If your videos are in av1 codec instead of h264, you need to\n# - install ffmpeg first\n# - 
install via conda to support av1 codec\nconda install -c conda-forge opencv\n``` -->\n\n<!-- Or to get video information, you can install ffmpeg and ffmpeg-python:\n\n```bash\npip install ffmpeg-python\n``` -->\n\n<!-- To filter a specific language, you need to install [lingua](https://github.com/pemistahl/lingua-py):\n\n```bash\npip install lingua-language-detector\n``` -->\n\n### Basic Usage\n\nYou can use the following commands to process the `csv` or `parquet` files. The output file will be saved in the same directory as the input, with suffixes indicating the processing steps applied.\n\n```bash\n# datautil takes multiple CSV files as input and merges them into one CSV file\n# output: DATA1+DATA2.csv\npython -m tools.datasets.datautil DATA1.csv DATA2.csv\n\n# shard CSV files into multiple CSV files\n# output: DATA1_0.csv, DATA1_1.csv, ...\npython -m tools.datasets.datautil DATA1.csv --shard 10\n\n# filter frames between 128 and 256, with captions\n# output: DATA1_fmin_128_fmax_256.csv\npython -m tools.datasets.datautil DATA.csv --fmin 128 --fmax 256\n\n# Disable parallel processing\npython -m tools.datasets.datautil DATA.csv --fmin 128 --fmax 256 --disable-parallel\n\n# Compute num_frames, height, width, fps, aspect_ratio for videos or images\n# output: IMG_DATA+VID_DATA_info.csv\npython -m tools.datasets.datautil IMG_DATA.csv VID_DATA.csv --info\n\n# You can run multiple operations at the same time.\npython -m tools.datasets.datautil DATA.csv --info --remove-empty-caption --remove-url --lang en\n```\n\n### Score filtering\n\nTo examine and filter the quality of the dataset by aesthetic score and clip score, you can use the following commands:\n\n```bash\n# sort the dataset by aesthetic score\n# output: DATA_sort.csv\npython -m tools.datasets.datautil DATA.csv --sort aesthetic_score\n# View examples of high aesthetic score\nhead -n 10 DATA_sort.csv\n# View examples of low aesthetic score\ntail -n 10 DATA_sort.csv\n\n# sort the dataset by clip 
score\n# output: DATA_sort.csv\npython -m tools.datasets.datautil DATA.csv --sort clip_score\n\n# filter the dataset by aesthetic score\n# output: DATA_aesmin_0.5.csv\npython -m tools.datasets.datautil DATA.csv --aesmin 0.5\n# filter the dataset by clip score\n# output: DATA_matchmin_0.5.csv\npython -m tools.datasets.datautil DATA.csv --matchmin 0.5\n```\n\n### Documentation\n\nYou can also use `python -m tools.datasets.datautil --help` to see usage.\n\n| Args                        | File suffix    | Description                                                   |\n| --------------------------- | -------------- | ------------------------------------------------------------- |\n| `--output OUTPUT`           |                | Output path                                                   |\n| `--format FORMAT`           |                | Output format (csv, parquet, parquet.gzip)                    |\n| `--disable-parallel`        |                | Disable `pandarallel`                                         |\n| `--seed SEED`               |                | Random seed                                                   |\n| `--shard SHARD`             | `_0`,`_1`, ... 
| Shard the dataset                                             |\n| `--sort KEY`                | `_sort`        | Sort the dataset by KEY                                       |\n| `--sort-descending KEY`     | `_sort`        | Sort the dataset by KEY in descending order                   |\n| `--difference DATA.csv`     |                | Remove the paths in DATA.csv from the dataset                 |\n| `--intersection DATA.csv`   |                | Keep the paths in DATA.csv from the dataset and merge columns |\n| `--info`                    | `_info`        | Get the basic information of each video and image (cv2)       |\n| `--ext`                     | `_ext`         | Remove rows if the file does not exist                        |\n| `--relpath`                 | `_relpath`     | Modify the path to relative path by root given                |\n| `--abspath`                 | `_abspath`     | Modify the path to absolute path by root given                |\n| `--remove-empty-caption`    | `_noempty`     | Remove rows with empty caption                                |\n| `--remove-url`              | `_nourl`       | Remove rows with url in caption                               |\n| `--lang LANG`               | `_lang`        | Remove rows with other language                               |\n| `--remove-path-duplication` | `_noduppath`   | Remove rows with duplicated path                              |\n| `--remove-text-duplication` | `_noduptext`   | Remove rows with duplicated caption                           |\n| `--refine-llm-caption`      | `_llm`         | Modify the caption generated by LLM                           |\n| `--clean-caption MODEL`     | `_clean`       | Modify the caption according to T5 pipeline to suit training  |\n| `--unescape`                | `_unescape`    | Unescape the caption                                          |\n| `--merge-cmotion`           | `_cmotion`     | Merge the camera motion to the caption                      
  |\n| `--count-num-token`         | `_ntoken`      | Count the number of tokens in the caption                     |\n| `--load-caption EXT`        | `_load`        | Load the caption from the file                                |\n| `--fmin FMIN`               | `_fmin`        | Filter the dataset by minimum number of frames                |\n| `--fmax FMAX`               | `_fmax`        | Filter the dataset by maximum number of frames                |\n| `--hwmax HWMAX`             | `_hwmax`       | Filter the dataset by maximum height x width                  |\n| `--aesmin AESMIN`           | `_aesmin`      | Filter the dataset by minimum aesthetic score                 |\n| `--matchmin MATCHMIN`       | `_matchmin`    | Filter the dataset by minimum clip score                      |\n| `--flowmin FLOWMIN`         | `_flowmin`     | Filter the dataset by minimum optical flow score              |\n\n## Transform datasets\n\nThe `tools.datasets.transform` module provides a set of tools to transform the dataset. The general usage is as follows:\n\n```bash\npython -m tools.datasets.transform TRANSFORM_TYPE META.csv ORIGINAL_DATA_FOLDER DATA_FOLDER_TO_SAVE_RESULTS --additional-args\n```\n\n### Resize\n\nSometimes you may need to resize the images or videos to a specific resolution. 
You can use the following commands to resize the dataset:\n\n```bash\npython -m tools.datasets.transform meta.csv /path/to/raw/data /path/to/new/data --length 2160\n```\n\n### Frame extraction\n\nTo extract frames from videos, you can use the following commands:\n\n```bash\npython -m tools.datasets.transform vid_frame_extract meta.csv /path/to/raw/data /path/to/new/data --points 0.1 0.5 0.9\n```\n\n### Crop Midjourney 4 grid\n\nRandomly select one of the 4 images in the 4 grid generated by Midjourney.\n\n```bash\npython -m tools.datasets.transform img_rand_crop meta.csv /path/to/raw/data /path/to/new/data\n```\n\n## Analyze datasets\n\nYou can easily get basic information about a `.csv` dataset by using the following commands:\n\n```bash\n# examine the first 10 rows of the CSV file\nhead -n 10 DATA1.csv\n# count the number of rows in the CSV file (approximately)\nwc -l DATA1.csv\n```\n\nFor a dataset provided in a `.csv` or `.parquet` file, you can analyze it using the following command. Plots will be automatically saved.\n\n```bash\npython -m tools.datasets.analyze DATA_info.csv\n```\n\n## Data Process Pipeline\n\n```bash\n# Suppose videos and images under ~/dataset/\n# 1. Convert dataset to CSV\npython -m tools.datasets.convert video ~/dataset --output meta.csv\n\n# 2. Get video information\npython -m tools.datasets.datautil meta.csv --info --fmin 1\n\n# 3. Get caption\n# 3.1. 
generate caption\ntorchrun --nproc_per_node 8 --standalone -m tools.caption.caption_llava meta_info_fmin1.csv --dp-size 8 --tp-size 1 --model-path liuhaotian/llava-v1.6-mistral-7b --prompt video\n# merge generated results\npython -m tools.datasets.datautil meta_info_fmin1_caption_part*.csv --output meta_caption.csv\n# merge caption and info\npython -m tools.datasets.datautil meta_info_fmin1.csv --intersection meta_caption.csv --output meta_caption_info.csv\n# clean caption\npython -m tools.datasets.datautil meta_caption_info.csv --clean-caption --refine-llm-caption --remove-empty-caption --output meta_caption_processed.csv\n# 3.2. extract caption\npython -m tools.datasets.datautil meta_info_fmin1.csv --load-caption json --remove-empty-caption --clean-caption\n\n# 4. Scoring\n# aesthetic scoring\ntorchrun --standalone --nproc_per_node 8 -m tools.scoring.aesthetic.inference meta_caption_processed.csv\npython -m tools.datasets.datautil meta_caption_processed_part*.csv --output meta_caption_processed_aes.csv\n# optical flow scoring\ntorchrun --standalone --nproc_per_node 8 -m tools.scoring.optical_flow.inference meta_caption_processed.csv\n# matching scoring\ntorchrun --standalone --nproc_per_node 8 -m tools.scoring.matching.inference meta_caption_processed.csv\n# camera motion\npython -m tools.caption.camera_motion_detect meta_caption_processed.csv\n```\n"
  },
  {
    "path": "Open-Sora/tools/datasets/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/datasets/analyze.py",
    "content": "import argparse\nimport os\n\nimport matplotlib.pyplot as plt\nimport pandas as pd\n\n\ndef read_file(input_path):\n    if input_path.endswith(\".csv\"):\n        return pd.read_csv(input_path)\n    elif input_path.endswith(\".parquet\"):\n        return pd.read_parquet(input_path)\n    else:\n        raise NotImplementedError(f\"Unsupported file format: {input_path}\")\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str, help=\"Path to the input dataset\")\n    parser.add_argument(\"--save-img\", type=str, default=\"samples/infos/\", help=\"Path to save the image\")\n    return parser.parse_args()\n\n\ndef plot_data(data, column, bins, name):\n    plt.clf()\n    data.hist(column=column, bins=bins)\n    os.makedirs(os.path.dirname(name), exist_ok=True)\n    plt.savefig(name)\n    print(f\"Saved {name}\")\n\n\ndef plot_categorical_data(data, column, name):\n    plt.clf()\n    data[column].value_counts().plot(kind=\"bar\")\n    os.makedirs(os.path.dirname(name), exist_ok=True)\n    plt.savefig(name)\n    print(f\"Saved {name}\")\n\n\nCOLUMNS = {\n    \"num_frames\": 100,\n    \"resolution\": 100,\n    \"text_len\": 100,\n    \"aes\": 100,\n    \"match\": 100,\n    \"flow\": 100,\n    \"cmotion\": None,\n}\n\n\ndef main(args):\n    data = read_file(args.input)\n\n    # === Image Data Info ===\n    image_index = data[\"num_frames\"] == 1\n    if image_index.sum() > 0:\n        print(\"=== Image Data Info ===\")\n        img_data = data[image_index]\n        print(f\"Number of images: {len(img_data)}\")\n        print(img_data.head())\n        print(img_data.describe())\n        if args.save_img:\n            for column in COLUMNS:\n                if column in img_data.columns and column not in [\"num_frames\", \"cmotion\"]:\n                    if COLUMNS[column] is None:\n                        plot_categorical_data(img_data, column, os.path.join(args.save_img, f\"image_{column}.png\"))\n        
            else:\n                        plot_data(img_data, column, COLUMNS[column], os.path.join(args.save_img, f\"image_{column}.png\"))\n\n    # === Video Data Info ===\n    if not image_index.all():\n        print(\"=== Video Data Info ===\")\n        video_data = data[~image_index]\n        print(f\"Number of videos: {len(video_data)}\")\n        if \"num_frames\" in video_data.columns:\n            total_num_frames = video_data[\"num_frames\"].sum()\n            print(f\"Number of frames: {total_num_frames}\")\n            DEFAULT_FPS = 30\n            total_hours = total_num_frames / DEFAULT_FPS / 3600\n            print(f\"Total hours (30 FPS): {int(total_hours)}\")\n        print(video_data.head())\n        print(video_data.describe())\n        if args.save_img:\n            for column in COLUMNS:\n                if column in video_data.columns:\n                    if COLUMNS[column] is None:\n                        plot_categorical_data(video_data, column, os.path.join(args.save_img, f\"video_{column}.png\"))\n                    else:\n                        plot_data(\n                            video_data, column, COLUMNS[column], os.path.join(args.save_img, f\"video_{column}.png\")\n                        )\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n    main(args)\n"
  },
  {
    "path": "Open-Sora/tools/datasets/convert.py",
    "content": "import argparse\nimport os\nimport time\n\nimport pandas as pd\nfrom torchvision.datasets import ImageNet\n\nIMG_EXTENSIONS = (\".jpg\", \".jpeg\", \".png\", \".ppm\", \".bmp\", \".pgm\", \".tif\", \".tiff\", \".webp\")\nVID_EXTENSIONS = (\".mp4\", \".avi\", \".mov\", \".mkv\", \".m2ts\")\n\n\ndef scan_recursively(root):\n    num = 0\n    for entry in os.scandir(root):\n        if entry.is_file():\n            yield entry\n        elif entry.is_dir():\n            num += 1\n            if num % 100 == 0:\n                print(f\"Scanned {num} directories.\")\n            yield from scan_recursively(entry.path)\n\n\ndef get_filelist(file_path, exts=None):\n    filelist = []\n    time_start = time.time()\n\n    # == OS Walk ==\n    # for home, dirs, files in os.walk(file_path):\n    #     for filename in files:\n    #         ext = os.path.splitext(filename)[-1].lower()\n    #         if exts is None or ext in exts:\n    #             filelist.append(os.path.join(home, filename))\n\n    # == Scandir ==\n    obj = scan_recursively(file_path)\n    for entry in obj:\n        if entry.is_file():\n            ext = os.path.splitext(entry.name)[-1].lower()\n            if exts is None or ext in exts:\n                filelist.append(entry.path)\n\n    time_end = time.time()\n    print(f\"Scanned {len(filelist)} files in {time_end - time_start:.2f} seconds.\")\n    return filelist\n\n\ndef split_by_capital(name):\n    # BoxingPunchingBag -> Boxing Punching Bag\n    new_name = \"\"\n    for i in range(len(name)):\n        if name[i].isupper() and i != 0:\n            new_name += \" \"\n        new_name += name[i]\n    return new_name\n\n\ndef process_imagenet(root, split):\n    root = os.path.expanduser(root)\n    data = ImageNet(root, split=split)\n    samples = [(path, data.classes[label][0]) for path, label in data.samples]\n    output = f\"imagenet_{split}.csv\"\n\n    df = pd.DataFrame(samples, columns=[\"path\", \"text\"])\n    df.to_csv(output, 
index=False)\n    print(f\"Saved {len(samples)} samples to {output}.\")\n\n\ndef process_ucf101(root, split):\n    root = os.path.expanduser(root)\n    video_lists = get_filelist(os.path.join(root, split))\n    classes = [x.split(\"/\")[-2] for x in video_lists]\n    classes = [split_by_capital(x) for x in classes]\n    samples = list(zip(video_lists, classes))\n    output = f\"ucf101_{split}.csv\"\n\n    df = pd.DataFrame(samples, columns=[\"path\", \"text\"])\n    df.to_csv(output, index=False)\n    print(f\"Saved {len(samples)} samples to {output}.\")\n\n\ndef process_vidprom(root, info):\n    root = os.path.expanduser(root)\n    video_lists = get_filelist(root)\n    video_set = set(video_lists)\n    # read info csv\n    infos = pd.read_csv(info)\n    abs_path = infos[\"uuid\"].apply(lambda x: os.path.join(root, f\"pika-{x}.mp4\"))\n    is_exist = abs_path.apply(lambda x: x in video_set)\n    df = pd.DataFrame(dict(path=abs_path[is_exist], text=infos[\"prompt\"][is_exist]))\n    df.to_csv(\"vidprom.csv\", index=False)\n    print(f\"Saved {len(df)} samples to vidprom.csv.\")\n\n\ndef process_general_images(root, output):\n    root = os.path.expanduser(root)\n    if not os.path.exists(root):\n        return\n    path_list = get_filelist(root, IMG_EXTENSIONS)\n    fname_list = [os.path.splitext(os.path.basename(x))[0] for x in path_list]\n    df = pd.DataFrame(dict(id=fname_list, path=path_list))\n\n    # os.makedirs(\"\") raises FileNotFoundError, so only create the parent directory when the output path has one\n    output_dir = os.path.dirname(output)\n    if output_dir:\n        os.makedirs(output_dir, exist_ok=True)\n    df.to_csv(output, index=False)\n    print(f\"Saved {len(df)} samples to {output}.\")\n\n\ndef process_general_videos(root, output):\n    root = os.path.expanduser(root)\n    if not os.path.exists(root):\n        return\n    path_list = get_filelist(root, VID_EXTENSIONS)\n    path_list = list(set(path_list))  # remove duplicates\n    fname_list = [os.path.splitext(os.path.basename(x))[0] for x in path_list]\n    relpath_list = [os.path.relpath(x, root) for x in path_list]\n    df = 
pd.DataFrame(dict(path=path_list, id=fname_list, relpath=relpath_list))\n\n    # os.makedirs(\"\") raises FileNotFoundError, so only create the parent directory when the output path has one\n    output_dir = os.path.dirname(output)\n    if output_dir:\n        os.makedirs(output_dir, exist_ok=True)\n    df.to_csv(output, index=False)\n    print(f\"Saved {len(df)} samples to {output}.\")\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"dataset\", type=str, choices=[\"imagenet\", \"ucf101\", \"vidprom\", \"image\", \"video\"])\n    parser.add_argument(\"root\", type=str)\n    parser.add_argument(\"--split\", type=str, default=\"train\")\n    parser.add_argument(\"--info\", type=str, default=None)\n    parser.add_argument(\"--output\", type=str, default=None, help=\"Output path (required for image and video)\")\n    args = parser.parse_args()\n\n    # imagenet, ucf101 and vidprom write to fixed file names; only the general converters need --output\n    if args.dataset in (\"image\", \"video\") and args.output is None:\n        parser.error(\"--output is required for the image and video datasets\")\n\n    if args.dataset == \"imagenet\":\n        process_imagenet(args.root, args.split)\n    elif args.dataset == \"ucf101\":\n        process_ucf101(args.root, args.split)\n    elif args.dataset == \"vidprom\":\n        process_vidprom(args.root, args.info)\n    elif args.dataset == \"image\":\n        process_general_images(args.root, args.output)\n    elif args.dataset == \"video\":\n        process_general_videos(args.root, args.output)\n    else:\n        raise ValueError(\"Invalid dataset\")\n"
  },
  {
    "path": "Open-Sora/tools/datasets/datautil.py",
    "content": "import argparse\nimport html\nimport json\nimport os\nimport random\nimport re\nfrom functools import partial\nfrom glob import glob\n\nimport cv2\nimport numpy as np\nimport pandas as pd\nfrom PIL import Image\nfrom tqdm import tqdm\n\nfrom opensora.datasets.read_video import read_video\n\nfrom .utils import IMG_EXTENSIONS\n\ntqdm.pandas()\n\ntry:\n    from pandarallel import pandarallel\n\n    PANDA_USE_PARALLEL = True\nexcept ImportError:\n    PANDA_USE_PARALLEL = False\n\n\ndef apply(df, func, **kwargs):\n    if PANDA_USE_PARALLEL:\n        return df.parallel_apply(func, **kwargs)\n    return df.progress_apply(func, **kwargs)\n\n\nTRAIN_COLUMNS = [\"path\", \"text\", \"num_frames\", \"fps\", \"height\", \"width\", \"aspect_ratio\", \"resolution\", \"text_len\"]\n\n# ======================================================\n# --info\n# ======================================================\n\n\ndef get_video_length(cap, method=\"header\"):\n    assert method in [\"header\", \"set\"]\n    if method == \"header\":\n        length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))\n    else:\n        cap.set(cv2.CAP_PROP_POS_AVI_RATIO, 1)\n        length = int(cap.get(cv2.CAP_PROP_POS_FRAMES))\n    return length\n\n\ndef get_info_old(path):\n    try:\n        ext = os.path.splitext(path)[1].lower()\n        if ext in IMG_EXTENSIONS:\n            im = cv2.imread(path)\n            if im is None:\n                return 0, 0, 0, np.nan, np.nan, np.nan\n            height, width = im.shape[:2]\n            num_frames, fps = 1, np.nan\n        else:\n            cap = cv2.VideoCapture(path)\n            num_frames, height, width, fps = (\n                get_video_length(cap, method=\"header\"),\n                int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),\n                int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),\n                float(cap.get(cv2.CAP_PROP_FPS)),\n            )\n        hw = height * width\n        aspect_ratio = height / width if width > 0 else 
np.nan\n        return num_frames, height, width, aspect_ratio, fps, hw\n    except:\n        return 0, 0, 0, np.nan, np.nan, np.nan\n\n\ndef get_info(path):\n    try:\n        ext = os.path.splitext(path)[1].lower()\n        if ext in IMG_EXTENSIONS:\n            return get_image_info(path)\n        else:\n            return get_video_info(path)\n    except:\n        return 0, 0, 0, np.nan, np.nan, np.nan\n\n\ndef get_image_info(path, backend=\"pillow\"):\n    if backend == \"pillow\":\n        try:\n            with open(path, \"rb\") as f:\n                img = Image.open(f)\n                img = img.convert(\"RGB\")\n            width, height = img.size\n            num_frames, fps = 1, np.nan\n            hw = height * width\n            aspect_ratio = height / width if width > 0 else np.nan\n            return num_frames, height, width, aspect_ratio, fps, hw\n        except:\n            return 0, 0, 0, np.nan, np.nan, np.nan\n    elif backend == \"cv2\":\n        try:\n            im = cv2.imread(path)\n            if im is None:\n                return 0, 0, 0, np.nan, np.nan, np.nan\n            height, width = im.shape[:2]\n            num_frames, fps = 1, np.nan\n            hw = height * width\n            aspect_ratio = height / width if width > 0 else np.nan\n            return num_frames, height, width, aspect_ratio, fps, hw\n        except:\n            return 0, 0, 0, np.nan, np.nan, np.nan\n    else:\n        raise ValueError\n\n\ndef get_video_info(path, backend=\"torchvision\"):\n    if backend == \"torchvision\":\n        try:\n            vframes, infos = read_video(path)\n            num_frames, height, width = vframes.shape[0], vframes.shape[2], vframes.shape[3]\n            if \"video_fps\" in infos:\n                fps = infos[\"video_fps\"]\n            else:\n                fps = np.nan\n            hw = height * width\n            aspect_ratio = height / width if width > 0 else np.nan\n            return num_frames, height, width, 
aspect_ratio, fps, hw\n        except Exception:\n            return 0, 0, 0, np.nan, np.nan, np.nan\n    elif backend == \"cv2\":\n        try:\n            cap = cv2.VideoCapture(path)\n            num_frames, height, width, fps = (\n                get_video_length(cap, method=\"header\"),\n                int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),\n                int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),\n                float(cap.get(cv2.CAP_PROP_FPS)),\n            )\n            hw = height * width\n            aspect_ratio = height / width if width > 0 else np.nan\n            return num_frames, height, width, aspect_ratio, fps, hw\n        except Exception:\n            return 0, 0, 0, np.nan, np.nan, np.nan\n    else:\n        raise ValueError\n\n\n# ======================================================\n# --refine-llm-caption\n# ======================================================\n\nLLAVA_PREFIX = [\n    \"The video shows\",\n    \"The video captures\",\n    \"The video features\",\n    \"The video depicts\",\n    \"The video presents\",\n    \"The video is \",\n    \"In the video,\",\n    \"The image shows\",\n    \"The image captures\",\n    \"The image features\",\n    \"The image depicts\",\n    \"The image presents\",\n    \"The image is \",\n    \"The image portrays\",\n    \"In the image,\",\n]\n\n\ndef remove_caption_prefix(caption):\n    for prefix in LLAVA_PREFIX:\n        if caption.startswith(prefix) or caption.startswith(prefix.lower()):\n            caption = caption[len(prefix) :].strip()\n            # the caption may be empty once the prefix is stripped, so guard the index\n            if caption and caption[0].islower():\n                caption = caption[0].upper() + caption[1:]\n            return caption\n    return caption\n\n\n# ======================================================\n# --merge-cmotion\n# ======================================================\n\nCMOTION_TEXT = {\n    \"static\": \"static\",\n    \"pan_right\": \"pan right\",\n    \"pan_left\": \"pan left\",\n    \"zoom_in\": 
\"zoom in\",\n    \"zoom_out\": \"zoom out\",\n    \"tilt_up\": \"tilt up\",\n    \"tilt_down\": \"tilt down\",\n    # \"pan/tilt\": \"The camera is panning.\",\n    # \"dynamic\": \"The camera is moving.\",\n    # \"unknown\": None,\n}\nCMOTION_PROBS = {\n    # hard-coded probabilities\n    \"static\": 1.0,\n    \"zoom_in\": 1.0,\n    \"zoom_out\": 1.0,\n    \"pan_left\": 1.0,\n    \"pan_right\": 1.0,\n    \"tilt_up\": 1.0,\n    \"tilt_down\": 1.0,\n    # \"dynamic\": 1.0,\n    # \"unknown\": 0.0,\n    # \"pan/tilt\": 1.0,\n}\n\n\ndef merge_cmotion(caption, cmotion):\n    text = CMOTION_TEXT[cmotion]\n    prob = CMOTION_PROBS[cmotion]\n    if text is not None and random.random() < prob:\n        caption = f\"{caption} Camera motion: {text}.\"\n    return caption\n\n\n# ======================================================\n# --lang\n# ======================================================\n\n\ndef build_lang_detector(lang_to_detect):\n    from lingua import Language, LanguageDetectorBuilder\n\n    lang_dict = dict(en=Language.ENGLISH)\n    assert lang_to_detect in lang_dict\n    valid_lang = lang_dict[lang_to_detect]\n    detector = LanguageDetectorBuilder.from_all_spoken_languages().with_low_accuracy_mode().build()\n\n    def detect_lang(caption):\n        confidence_values = detector.compute_language_confidence_values(caption)\n        confidence = [x.language for x in confidence_values[:5]]\n        if valid_lang not in confidence:\n            return False\n        return True\n\n    return detect_lang\n\n\n# ======================================================\n# --clean-caption\n# ======================================================\n\n\ndef basic_clean(text):\n    import ftfy\n\n    text = ftfy.fix_text(text)\n    text = html.unescape(html.unescape(text))\n    return text.strip()\n\n\nBAD_PUNCT_REGEX = re.compile(\n    r\"[\" + \"#®•©™&@·º½¾¿¡§~\" + \"\\)\" + \"\\(\" + \"\\]\" + \"\\[\" + \"\\}\" + \"\\{\" + \"\\|\" + \"\\\\\" + \"\\/\" + \"\\*\" + 
r\"]{1,}\"\n)  # noqa\n\n\ndef clean_caption(caption):\n    import urllib.parse as ul\n\n    from bs4 import BeautifulSoup\n\n    caption = str(caption)\n    caption = ul.unquote_plus(caption)\n    caption = caption.strip().lower()\n    caption = re.sub(\"<person>\", \"person\", caption)\n    # urls:\n    caption = re.sub(\n        r\"\\b((?:https?:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",  # noqa\n        \"\",\n        caption,\n    )  # regex for urls\n    caption = re.sub(\n        r\"\\b((?:www:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",  # noqa\n        \"\",\n        caption,\n    )  # regex for urls\n    # html:\n    caption = BeautifulSoup(caption, features=\"html.parser\").text\n\n    # @<nickname>\n    caption = re.sub(r\"@[\\w\\d]+\\b\", \"\", caption)\n\n    # 31C0—31EF CJK Strokes\n    # 31F0—31FF Katakana Phonetic Extensions\n    # 3200—32FF Enclosed CJK Letters and Months\n    # 3300—33FF CJK Compatibility\n    # 3400—4DBF CJK Unified Ideographs Extension A\n    # 4DC0—4DFF Yijing Hexagram Symbols\n    # 4E00—9FFF CJK Unified Ideographs\n    caption = re.sub(r\"[\\u31c0-\\u31ef]+\", \"\", caption)\n    caption = re.sub(r\"[\\u31f0-\\u31ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3200-\\u32ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3300-\\u33ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3400-\\u4dbf]+\", \"\", caption)\n    caption = re.sub(r\"[\\u4dc0-\\u4dff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u4e00-\\u9fff]+\", \"\", caption)\n    #######################################################\n\n    # all types of dash --> \"-\"\n    caption = re.sub(\n        r\"[\\u002D\\u058A\\u05BE\\u1400\\u1806\\u2010-\\u2015\\u2E17\\u2E1A\\u2E3A\\u2E3B\\u2E40\\u301C\\u3030\\u30A0\\uFE31\\uFE32\\uFE58\\uFE63\\uFF0D]+\",  # noqa\n        \"-\",\n        caption,\n    )\n\n    # normalize quotes to one 
standard\n    caption = re.sub(r\"[`´«»“”¨]\", '\"', caption)\n    caption = re.sub(r\"[‘’]\", \"'\", caption)\n\n    # &quot;\n    caption = re.sub(r\"&quot;?\", \"\", caption)\n    # &amp\n    caption = re.sub(r\"&amp\", \"\", caption)\n\n    # IP addresses:\n    caption = re.sub(r\"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\", \" \", caption)\n\n    # article ids:\n    caption = re.sub(r\"\\d:\\d\\d\\s+$\", \"\", caption)\n\n    # \\n\n    caption = re.sub(r\"\\\\n\", \" \", caption)\n\n    # \"#123\"\n    caption = re.sub(r\"#\\d{1,3}\\b\", \"\", caption)\n    # \"#12345..\"\n    caption = re.sub(r\"#\\d{5,}\\b\", \"\", caption)\n    # \"123456..\"\n    caption = re.sub(r\"\\b\\d{6,}\\b\", \"\", caption)\n    # filenames:\n    caption = re.sub(r\"[\\S]+\\.(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)\", \"\", caption)\n\n    #\n    caption = re.sub(r\"[\\\"\\']{2,}\", r'\"', caption)  # \"\"\"AUSVERKAUFT\"\"\"\n    caption = re.sub(r\"[\\.]{2,}\", r\" \", caption)  # \"\"\"AUSVERKAUFT\"\"\"\n\n    caption = re.sub(BAD_PUNCT_REGEX, r\" \", caption)  # ***AUSVERKAUFT***, #AUSVERKAUFT\n    caption = re.sub(r\"\\s+\\.\\s+\", r\" \", caption)  # \" . 
\"\n\n    # this-is-my-cute-cat / this_is_my_cute_cat\n    regex2 = re.compile(r\"(?:\\-|\\_)\")\n    if len(re.findall(regex2, caption)) > 3:\n        caption = re.sub(regex2, \" \", caption)\n\n    caption = basic_clean(caption)\n\n    caption = re.sub(r\"\\b[a-zA-Z]{1,3}\\d{3,15}\\b\", \"\", caption)  # jc6640\n    caption = re.sub(r\"\\b[a-zA-Z]+\\d+[a-zA-Z]+\\b\", \"\", caption)  # jc6640vc\n    caption = re.sub(r\"\\b\\d+[a-zA-Z]+\\d+\\b\", \"\", caption)  # 6640vc231\n\n    caption = re.sub(r\"(worldwide\\s+)?(free\\s+)?shipping\", \"\", caption)\n    caption = re.sub(r\"(free\\s)?download(\\sfree)?\", \"\", caption)\n    caption = re.sub(r\"\\bclick\\b\\s(?:for|on)\\s\\w+\", \"\", caption)\n    caption = re.sub(r\"\\b(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)(\\simage[s]?)?\", \"\", caption)\n    caption = re.sub(r\"\\bpage\\s+\\d+\\b\", \"\", caption)\n\n    caption = re.sub(r\"\\b\\d*[a-zA-Z]+\\d+[a-zA-Z]+\\d+[a-zA-Z\\d]*\\b\", r\" \", caption)  # j2d1a2a...\n\n    caption = re.sub(r\"\\b\\d+\\.?\\d*[xх×]\\d+\\.?\\d*\\b\", \"\", caption)\n\n    caption = re.sub(r\"\\b\\s+\\:\\s+\", r\": \", caption)\n    caption = re.sub(r\"(\\D[,\\./])\\b\", r\"\\1 \", caption)\n    caption = re.sub(r\"\\s+\", \" \", caption)\n\n    caption.strip()\n\n    caption = re.sub(r\"^[\\\"\\']([\\w\\W]+)[\\\"\\']$\", r\"\\1\", caption)\n    caption = re.sub(r\"^[\\'\\_,\\-\\:;]\", r\"\", caption)\n    caption = re.sub(r\"[\\'\\_,\\-\\:\\-\\+]$\", r\"\", caption)\n    caption = re.sub(r\"^\\.\\S+$\", \"\", caption)\n\n    return caption.strip()\n\n\ndef text_preprocessing(text, use_text_preprocessing: bool = True):\n    if use_text_preprocessing:\n        # The exact text cleaning as was in the training stage:\n        text = clean_caption(text)\n        text = clean_caption(text)\n        return text\n    else:\n        return text.lower().strip()\n\n\n# ======================================================\n# load caption\n# 
======================================================\n\n\ndef load_caption(path, ext):\n    try:\n        assert ext in [\"json\"]\n        # strip only the final extension so dots elsewhere in the path are kept\n        json_path = os.path.splitext(path)[0] + \".json\"\n        with open(json_path, \"r\") as f:\n            data = json.load(f)\n        caption = data[\"caption\"]\n        return caption\n    except Exception:\n        return \"\"\n\n\n# ======================================================\n# --score-to-text\n# ======================================================\n\nDROP_SCORE_PROB = 0.2\n\n\ndef score_to_text(data):\n    text = data[\"text\"]\n    scores = []\n    # aesthetic\n    if \"aes\" in data:\n        aes = data[\"aes\"]\n        if random.random() > DROP_SCORE_PROB:\n            score_text = f\"aesthetic score: {aes:.1f}\"\n            scores.append(score_text)\n    # motion (optical flow)\n    if \"flow\" in data:\n        flow = data[\"flow\"]\n        if random.random() > DROP_SCORE_PROB:\n            score_text = f\"motion score: {flow:.1f}\"\n            scores.append(score_text)\n    if len(scores) > 0:\n        text = f\"{text} [{', '.join(scores)}]\"\n    return text\n\n\n# ======================================================\n# read & write\n# ======================================================\n\n\ndef read_file(input_path):\n    if input_path.endswith(\".csv\"):\n        return pd.read_csv(input_path)\n    elif input_path.endswith(\".parquet\"):\n        return pd.read_parquet(input_path)\n    else:\n        raise NotImplementedError(f\"Unsupported file format: {input_path}\")\n\n\ndef save_file(data, output_path):\n    output_dir = os.path.dirname(output_path)\n    if not os.path.exists(output_dir) and output_dir != \"\":\n        os.makedirs(output_dir)\n    if output_path.endswith(\".csv\"):\n        return data.to_csv(output_path, index=False)\n    elif output_path.endswith(\".parquet\"):\n        return data.to_parquet(output_path, index=False)\n    else:\n        raise NotImplementedError(f\"Unsupported file format: 
{output_path}\")\n\n\ndef read_data(input_paths):\n    data = []\n    input_name = \"\"\n    input_list = []\n    for input_path in input_paths:\n        input_list.extend(glob(input_path))\n    print(\"Input files:\", input_list)\n    for i, input_path in enumerate(input_list):\n        if not os.path.exists(input_path):\n            continue\n        data.append(read_file(input_path))\n        input_name += os.path.basename(input_path).split(\".\")[0]\n        if i != len(input_list) - 1:\n            input_name += \"+\"\n        print(f\"Loaded {len(data[-1])} samples from '{input_path}'.\")\n    if len(data) == 0:\n        print(f\"No samples to process. Exit.\")\n        exit()\n    data = pd.concat(data, ignore_index=True, sort=False)\n    print(f\"Total number of samples: {len(data)}\")\n    return data, input_name\n\n\n# ======================================================\n# main\n# ======================================================\n# To add a new method, register it in the main, parse_args, and get_output_path functions, and update the doc at /tools/datasets/README.md#documentation\n\n\ndef main(args):\n    # reading data\n    data, input_name = read_data(args.input)\n\n    # make difference\n    if args.difference is not None:\n        data_diff = pd.read_csv(args.difference)\n        print(f\"Difference csv contains {len(data_diff)} samples.\")\n        data = data[~data[\"path\"].isin(data_diff[\"path\"])]\n        input_name += f\"-{os.path.basename(args.difference).split('.')[0]}\"\n        print(f\"Filtered number of samples: {len(data)}.\")\n\n    # make intersection\n    if args.intersection is not None:\n        data_new = pd.read_csv(args.intersection)\n        print(f\"Intersection csv contains {len(data_new)} samples.\")\n        cols_to_use = data_new.columns.difference(data.columns)\n\n        col_on = \"path\"\n        # if 'id' in data.columns and 'id' in data_new.columns:\n        #     col_on = 'id'\n        cols_to_use = 
cols_to_use.insert(0, col_on)\n        data = pd.merge(data, data_new[cols_to_use], on=col_on, how=\"inner\")\n        print(f\"Intersection number of samples: {len(data)}.\")\n\n    # get output path\n    output_path = get_output_path(args, input_name)\n\n    # preparation\n    if args.lang is not None:\n        detect_lang = build_lang_detector(args.lang)\n    if args.count_num_token == \"t5\":\n        from transformers import AutoTokenizer\n\n        tokenizer = AutoTokenizer.from_pretrained(\"DeepFloyd/t5-v1_1-xxl\")\n\n    # IO-related\n    if args.load_caption is not None:\n        assert \"path\" in data.columns\n        data[\"text\"] = apply(data[\"path\"], load_caption, ext=args.load_caption)\n    if args.info:\n        info = apply(data[\"path\"], get_info)\n        (\n            data[\"num_frames\"],\n            data[\"height\"],\n            data[\"width\"],\n            data[\"aspect_ratio\"],\n            data[\"fps\"],\n            data[\"resolution\"],\n        ) = zip(*info)\n    if args.video_info:\n        info = apply(data[\"path\"], get_video_info)\n        (\n            data[\"num_frames\"],\n            data[\"height\"],\n            data[\"width\"],\n            data[\"aspect_ratio\"],\n            data[\"fps\"],\n            data[\"resolution\"],\n        ) = zip(*info)\n    if args.ext:\n        assert \"path\" in data.columns\n        data = data[apply(data[\"path\"], os.path.exists)]\n\n    # filtering\n    if args.remove_url:\n        assert \"text\" in data.columns\n        data = data[~data[\"text\"].str.contains(r\"(?P<url>https?://[^\\s]+)\", regex=True)]\n    if args.lang is not None:\n        assert \"text\" in data.columns\n        data = data[data[\"text\"].progress_apply(detect_lang)]  # cannot parallelize\n    if args.remove_empty_path:\n        assert \"path\" in data.columns\n        data = data[data[\"path\"].str.len() > 0]\n        data = data[~data[\"path\"].isna()]\n    if args.remove_empty_caption:\n        assert 
\"text\" in data.columns\n        data = data[data[\"text\"].str.len() > 0]\n        data = data[~data[\"text\"].isna()]\n    if args.remove_path_duplication:\n        assert \"path\" in data.columns\n        data = data.drop_duplicates(subset=[\"path\"])\n    if args.path_subset:\n        data = data[data[\"path\"].str.contains(args.path_subset)]\n\n    # processing\n    if args.relpath is not None:\n        data[\"path\"] = apply(data[\"path\"], lambda x: os.path.relpath(x, args.relpath))\n    if args.abspath is not None:\n        data[\"path\"] = apply(data[\"path\"], lambda x: os.path.join(args.abspath, x))\n    if args.path_to_id:\n        data[\"id\"] = apply(data[\"path\"], lambda x: os.path.splitext(os.path.basename(x))[0])\n    if args.merge_cmotion:\n        data[\"text\"] = apply(data, lambda x: merge_cmotion(x[\"text\"], x[\"cmotion\"]), axis=1)\n    if args.refine_llm_caption:\n        assert \"text\" in data.columns\n        data[\"text\"] = apply(data[\"text\"], remove_caption_prefix)\n    if args.append_text is not None:\n        assert \"text\" in data.columns\n        data[\"text\"] = data[\"text\"] + args.append_text\n    if args.score_to_text:\n        data[\"text\"] = apply(data, score_to_text, axis=1)\n    if args.clean_caption:\n        assert \"text\" in data.columns\n        data[\"text\"] = apply(\n            data[\"text\"],\n            partial(text_preprocessing, use_text_preprocessing=True),\n        )\n    if args.count_num_token is not None:\n        assert \"text\" in data.columns\n        data[\"text_len\"] = apply(data[\"text\"], lambda x: len(tokenizer(x)[\"input_ids\"]))\n    if args.update_text is not None:\n        data_new = pd.read_csv(args.update_text)\n        num_updated = data.path.isin(data_new.path).sum()\n        print(f\"Number of updated samples: {num_updated}.\")\n        data = data.set_index(\"path\")\n        data_new = data_new[[\"path\", \"text\"]].set_index(\"path\")\n        data.update(data_new)\n        
data = data.reset_index()\n\n    # sort\n    if args.sort is not None:\n        data = data.sort_values(by=args.sort, ascending=False)\n    if args.sort_ascending is not None:\n        data = data.sort_values(by=args.sort_ascending, ascending=True)\n\n    # filtering\n    if args.filesize:\n        assert \"path\" in data.columns\n        data[\"filesize\"] = apply(data[\"path\"], lambda x: os.stat(x).st_size / 1024 / 1024)\n    if args.fsmax is not None:\n        assert \"filesize\" in data.columns\n        data = data[data[\"filesize\"] <= args.fsmax]\n    if args.remove_empty_caption:\n        assert \"text\" in data.columns\n        data = data[data[\"text\"].str.len() > 0]\n        data = data[~data[\"text\"].isna()]\n    if args.fmin is not None:\n        assert \"num_frames\" in data.columns\n        data = data[data[\"num_frames\"] >= args.fmin]\n    if args.fmax is not None:\n        assert \"num_frames\" in data.columns\n        data = data[data[\"num_frames\"] <= args.fmax]\n    if args.fpsmax is not None:\n        assert \"fps\" in data.columns\n        data = data[(data[\"fps\"] <= args.fpsmax) | np.isnan(data[\"fps\"])]\n    if args.hwmax is not None:\n        if \"resolution\" not in data.columns:\n            height = data[\"height\"]\n            width = data[\"width\"]\n            data[\"resolution\"] = height * width\n        data = data[data[\"resolution\"] <= args.hwmax]\n    if args.aesmin is not None:\n        assert \"aes\" in data.columns\n        data = data[data[\"aes\"] >= args.aesmin]\n    if args.matchmin is not None:\n        assert \"match\" in data.columns\n        data = data[data[\"match\"] >= args.matchmin]\n    if args.flowmin is not None:\n        assert \"flow\" in data.columns\n        data = data[data[\"flow\"] >= args.flowmin]\n    if args.remove_text_duplication:\n        data = data.drop_duplicates(subset=[\"text\"], keep=\"first\")\n    if args.img_only:\n        data = 
data[data[\"path\"].str.lower().str.endswith(IMG_EXTENSIONS)]\n    if args.vid_only:\n        data = data[~data[\"path\"].str.lower().str.endswith(IMG_EXTENSIONS)]\n\n    # process data\n    if args.shuffle:\n        data = data.sample(frac=1).reset_index(drop=True)  # shuffle\n    if args.head is not None:\n        data = data.head(args.head)\n\n    # train columns\n    if args.train_column:\n        all_columns = data.columns\n        columns_to_drop = all_columns.difference(TRAIN_COLUMNS)\n        data = data.drop(columns=columns_to_drop)\n\n    print(f\"Filtered number of samples: {len(data)}.\")\n\n    # shard data\n    if args.shard is not None:\n        sharded_data = np.array_split(data, args.shard)\n        for i in range(args.shard):\n            output_path_part = output_path.split(\".\")\n            output_path_s = \".\".join(output_path_part[:-1]) + f\"_{i}.\" + output_path_part[-1]\n            save_file(sharded_data[i], output_path_s)\n            print(f\"Saved {len(sharded_data[i])} samples to {output_path_s}.\")\n    else:\n        save_file(data, output_path)\n        print(f\"Saved {len(data)} samples to {output_path}.\")\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str, nargs=\"+\", help=\"path to the input dataset\")\n    parser.add_argument(\"--output\", type=str, default=None, help=\"output path\")\n    parser.add_argument(\"--format\", type=str, default=\"csv\", help=\"output format\", choices=[\"csv\", \"parquet\"])\n    parser.add_argument(\"--disable-parallel\", action=\"store_true\", help=\"disable parallel processing\")\n    parser.add_argument(\"--num-workers\", type=int, default=None, help=\"number of workers\")\n    parser.add_argument(\"--seed\", type=int, default=42, help=\"random seed\")\n\n    # special case\n    parser.add_argument(\"--shard\", type=int, default=None, help=\"shard the dataset\")\n    parser.add_argument(\"--sort\", type=str, default=None, help=\"sort 
by column\")\n    parser.add_argument(\"--sort-ascending\", type=str, default=None, help=\"sort by column (ascending order)\")\n    parser.add_argument(\"--difference\", type=str, default=None, help=\"get difference from the dataset\")\n    parser.add_argument(\n        \"--intersection\", type=str, default=None, help=\"keep the paths in csv from the dataset and merge columns\"\n    )\n    parser.add_argument(\"--train-column\", action=\"store_true\", help=\"only keep the train column\")\n\n    # IO-related\n    parser.add_argument(\"--info\", action=\"store_true\", help=\"get the basic information of each video and image\")\n    parser.add_argument(\"--video-info\", action=\"store_true\", help=\"get the basic information of each video\")\n    parser.add_argument(\"--ext\", action=\"store_true\", help=\"check if the file exists\")\n    parser.add_argument(\n        \"--load-caption\", type=str, default=None, choices=[\"json\", \"txt\"], help=\"load the caption from json or txt\"\n    )\n\n    # path processing\n    parser.add_argument(\"--relpath\", type=str, default=None, help=\"modify the path to relative path by root given\")\n    parser.add_argument(\"--abspath\", type=str, default=None, help=\"modify the path to absolute path by root given\")\n    parser.add_argument(\"--path-to-id\", action=\"store_true\", help=\"add id based on path\")\n    parser.add_argument(\n        \"--path-subset\", type=str, default=None, help=\"extract a subset data containing the given `path-subset` value\"\n    )\n    parser.add_argument(\n        \"--remove-empty-path\",\n        action=\"store_true\",\n        help=\"remove rows with empty path\",  # caused by transform, cannot read path\n    )\n\n    # caption filtering\n    parser.add_argument(\n        \"--remove-empty-caption\",\n        action=\"store_true\",\n        help=\"remove rows with empty caption\",\n    )\n    parser.add_argument(\"--remove-url\", action=\"store_true\", help=\"remove rows with url in caption\")\n   
 parser.add_argument(\"--lang\", type=str, default=None, help=\"remove rows with other language\")\n    parser.add_argument(\"--remove-path-duplication\", action=\"store_true\", help=\"remove rows with duplicated path\")\n    parser.add_argument(\"--remove-text-duplication\", action=\"store_true\", help=\"remove rows with duplicated caption\")\n\n    # caption processing\n    parser.add_argument(\"--refine-llm-caption\", action=\"store_true\", help=\"modify the caption generated by LLM\")\n    parser.add_argument(\n        \"--clean-caption\", action=\"store_true\", help=\"modify the caption according to T5 pipeline to suit training\"\n    )\n    parser.add_argument(\"--merge-cmotion\", action=\"store_true\", help=\"merge the camera motion to the caption\")\n    parser.add_argument(\n        \"--count-num-token\", type=str, choices=[\"t5\"], default=None, help=\"Count the number of tokens in the caption\"\n    )\n    parser.add_argument(\"--append-text\", type=str, default=None, help=\"append text to the caption\")\n    parser.add_argument(\"--score-to-text\", action=\"store_true\", help=\"convert score to text\")\n    parser.add_argument(\"--update-text\", type=str, default=None, help=\"update the text with the given text\")\n\n    # score filtering\n    parser.add_argument(\"--filesize\", action=\"store_true\", help=\"get the filesize of each video and image in MB\")\n    parser.add_argument(\"--fsmax\", type=int, default=None, help=\"filter the dataset by maximum filesize\")\n    parser.add_argument(\"--fmin\", type=int, default=None, help=\"filter the dataset by minimum number of frames\")\n    parser.add_argument(\"--fmax\", type=int, default=None, help=\"filter the dataset by maximum number of frames\")\n    parser.add_argument(\"--hwmax\", type=int, default=None, help=\"filter the dataset by maximum resolution\")\n    parser.add_argument(\"--aesmin\", type=float, default=None, help=\"filter the dataset by minimum aes score\")\n    
parser.add_argument(\"--matchmin\", type=float, default=None, help=\"filter the dataset by minimum match score\")\n    parser.add_argument(\"--flowmin\", type=float, default=None, help=\"filter the dataset by minimum flow score\")\n    parser.add_argument(\"--fpsmax\", type=float, default=None, help=\"filter the dataset by maximum fps\")\n    parser.add_argument(\"--img-only\", action=\"store_true\", help=\"only keep the image data\")\n    parser.add_argument(\"--vid-only\", action=\"store_true\", help=\"only keep the video data\")\n\n    # data processing\n    parser.add_argument(\"--shuffle\", default=False, action=\"store_true\", help=\"shuffle the dataset\")\n    parser.add_argument(\"--head\", type=int, default=None, help=\"return the first n rows of data\")\n\n    return parser.parse_args()\n\n\ndef get_output_path(args, input_name):\n    if args.output is not None:\n        return args.output\n    name = input_name\n    dir_path = os.path.dirname(args.input[0])\n\n    # sort\n    if args.sort is not None:\n        assert args.sort_ascending is None\n        name += \"_sort\"\n    if args.sort_ascending is not None:\n        assert args.sort is None\n        name += \"_sort\"\n\n    # IO-related\n    # for IO-related, the function must be wrapped in try-except\n    if args.info:\n        name += \"_info\"\n    if args.video_info:\n        name += \"_vinfo\"\n    if args.ext:\n        name += \"_ext\"\n    if args.load_caption:\n        name += f\"_load{args.load_caption}\"\n\n    # path processing\n    if args.relpath is not None:\n        name += \"_relpath\"\n    if args.abspath is not None:\n        name += \"_abspath\"\n    if args.remove_empty_path:\n        name += \"_noemptypath\"\n\n    # caption filtering\n    if args.remove_empty_caption:\n        name += \"_noempty\"\n    if args.remove_url:\n        name += \"_nourl\"\n    if args.lang is not None:\n        name += f\"_{args.lang}\"\n    if args.remove_path_duplication:\n        name += 
\"_noduppath\"\n    if args.remove_text_duplication:\n        name += \"_noduptext\"\n    if args.path_subset:\n        name += \"_subset\"\n\n    # caption processing\n    if args.refine_llm_caption:\n        name += \"_llm\"\n    if args.clean_caption:\n        name += \"_clean\"\n    if args.merge_cmotion:\n        name += \"_cmcaption\"\n    if args.count_num_token:\n        name += \"_ntoken\"\n    if args.append_text is not None:\n        name += \"_appendtext\"\n    if args.score_to_text:\n        name += \"_score2text\"\n    if args.update_text is not None:\n        name += \"_update\"\n\n    # score filtering\n    if args.filesize:\n        name += \"_filesize\"\n    if args.fsmax is not None:\n        name += f\"_fsmax{args.fsmax}\"\n    if args.fmin is not None:\n        name += f\"_fmin{args.fmin}\"\n    if args.fmax is not None:\n        name += f\"_fmax{args.fmax}\"\n    if args.fpsmax is not None:\n        name += f\"_fpsmax{args.fpsmax}\"\n    if args.hwmax is not None:\n        name += f\"_hwmax{args.hwmax}\"\n    if args.aesmin is not None:\n        name += f\"_aesmin{args.aesmin}\"\n    if args.matchmin is not None:\n        name += f\"_matchmin{args.matchmin}\"\n    if args.flowmin is not None:\n        name += f\"_flowmin{args.flowmin}\"\n    if args.img_only:\n        name += \"_img\"\n    if args.vid_only:\n        name += \"_vid\"\n\n    # processing\n    if args.shuffle:\n        name += f\"_shuffled_seed{args.seed}\"\n    if args.head is not None:\n        name += f\"_first_{args.head}_data\"\n\n    output_path = os.path.join(dir_path, f\"{name}.{args.format}\")\n    return output_path\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n    if args.disable_parallel:\n        PANDA_USE_PARALLEL = False\n    if PANDA_USE_PARALLEL:\n        if args.num_workers is not None:\n            pandarallel.initialize(nb_workers=args.num_workers, progress_bar=True)\n        else:\n            pandarallel.initialize(progress_bar=True)\n    if 
args.seed is not None:\n        random.seed(args.seed)\n        np.random.seed(args.seed)\n    main(args)\n"
  },
  {
    "path": "Open-Sora/tools/datasets/filter_panda10m.py",
    "content": "# TODO: remove this file before releasing\n\nimport argparse\nimport html\nimport os\nimport re\n\nimport pandas as pd\nfrom tqdm import tqdm\n\ntqdm.pandas()\n\ntry:\n    from pandarallel import pandarallel\n\n    pandarallel.initialize(progress_bar=True)\n    pandas_has_parallel = True\nexcept ImportError:\n    pandas_has_parallel = False\n\n\ndef apply(df, func, **kwargs):\n    if pandas_has_parallel:\n        return df.parallel_apply(func, **kwargs)\n    return df.progress_apply(func, **kwargs)\n\n\ndef basic_clean(text):\n    import ftfy\n\n    text = ftfy.fix_text(text)\n    text = html.unescape(html.unescape(text))\n    return text.strip()\n\n\nBAD_PUNCT_REGEX = re.compile(\n    r\"[\" + \"#®•©™&@·º½¾¿¡§~\" + \"\\)\" + \"\\(\" + \"\\]\" + \"\\[\" + \"\\}\" + \"\\{\" + \"\\|\" + \"\\\\\" + \"\\/\" + \"\\*\" + r\"]{1,}\"\n)  # noqa\n\n\ndef clean_caption(caption):\n    import urllib.parse as ul\n\n    from bs4 import BeautifulSoup\n\n    caption = str(caption)\n    caption = ul.unquote_plus(caption)\n    caption = caption.strip().lower()\n    caption = re.sub(\"<person>\", \"person\", caption)\n    # urls:\n    caption = re.sub(\n        r\"\\b((?:https?:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",  # noqa\n        \"\",\n        caption,\n    )  # regex for urls\n    caption = re.sub(\n        r\"\\b((?:www:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",  # noqa\n        \"\",\n        caption,\n    )  # regex for urls\n    # html:\n    caption = BeautifulSoup(caption, features=\"html.parser\").text\n\n    # @<nickname>\n    caption = re.sub(r\"@[\\w\\d]+\\b\", \"\", caption)\n\n    # 31C0—31EF CJK Strokes\n    # 31F0—31FF Katakana Phonetic Extensions\n    # 3200—32FF Enclosed CJK Letters and Months\n    # 3300—33FF CJK Compatibility\n    # 3400—4DBF CJK Unified Ideographs Extension A\n    # 4DC0—4DFF Yijing Hexagram Symbols\n    # 
4E00—9FFF CJK Unified Ideographs\n    caption = re.sub(r\"[\\u31c0-\\u31ef]+\", \"\", caption)\n    caption = re.sub(r\"[\\u31f0-\\u31ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3200-\\u32ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3300-\\u33ff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u3400-\\u4dbf]+\", \"\", caption)\n    caption = re.sub(r\"[\\u4dc0-\\u4dff]+\", \"\", caption)\n    caption = re.sub(r\"[\\u4e00-\\u9fff]+\", \"\", caption)\n    #######################################################\n\n    # all types of dash --> \"-\"\n    caption = re.sub(\n        r\"[\\u002D\\u058A\\u05BE\\u1400\\u1806\\u2010-\\u2015\\u2E17\\u2E1A\\u2E3A\\u2E3B\\u2E40\\u301C\\u3030\\u30A0\\uFE31\\uFE32\\uFE58\\uFE63\\uFF0D]+\",  # noqa\n        \"-\",\n        caption,\n    )\n\n    # normalize quotes to one standard\n    caption = re.sub(r\"[`´«»“”¨]\", '\"', caption)\n    caption = re.sub(r\"[‘’]\", \"'\", caption)\n\n    # &quot;\n    caption = re.sub(r\"&quot;?\", \"\", caption)\n    # &amp\n    caption = re.sub(r\"&amp\", \"\", caption)\n\n    # IP addresses:\n    caption = re.sub(r\"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\", \" \", caption)\n\n    # article ids:\n    caption = re.sub(r\"\\d:\\d\\d\\s+$\", \"\", caption)\n\n    # \\n\n    caption = re.sub(r\"\\\\n\", \" \", caption)\n\n    # \"#123\"\n    caption = re.sub(r\"#\\d{1,3}\\b\", \"\", caption)\n    # \"#12345..\"\n    caption = re.sub(r\"#\\d{5,}\\b\", \"\", caption)\n    # \"123456..\"\n    caption = re.sub(r\"\\b\\d{6,}\\b\", \"\", caption)\n    # filenames:\n    caption = re.sub(r\"[\\S]+\\.(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)\", \"\", caption)\n\n    #\n    caption = re.sub(r\"[\\\"\\']{2,}\", r'\"', caption)  # \"\"\"AUSVERKAUFT\"\"\"\n    caption = re.sub(r\"[\\.]{2,}\", r\" \", caption)  # \"\"\"AUSVERKAUFT\"\"\"\n\n    caption = re.sub(BAD_PUNCT_REGEX, r\" \", caption)  # ***AUSVERKAUFT***, #AUSVERKAUFT\n    caption = re.sub(r\"\\s+\\.\\s+\", r\" \", caption)  # \" 
. \"\n\n    # this-is-my-cute-cat / this_is_my_cute_cat\n    regex2 = re.compile(r\"(?:\\-|\\_)\")\n    if len(re.findall(regex2, caption)) > 3:\n        caption = re.sub(regex2, \" \", caption)\n\n    caption = basic_clean(caption)\n\n    caption = re.sub(r\"\\b[a-zA-Z]{1,3}\\d{3,15}\\b\", \"\", caption)  # jc6640\n    caption = re.sub(r\"\\b[a-zA-Z]+\\d+[a-zA-Z]+\\b\", \"\", caption)  # jc6640vc\n    caption = re.sub(r\"\\b\\d+[a-zA-Z]+\\d+\\b\", \"\", caption)  # 6640vc231\n\n    caption = re.sub(r\"(worldwide\\s+)?(free\\s+)?shipping\", \"\", caption)\n    caption = re.sub(r\"(free\\s)?download(\\sfree)?\", \"\", caption)\n    caption = re.sub(r\"\\bclick\\b\\s(?:for|on)\\s\\w+\", \"\", caption)\n    caption = re.sub(r\"\\b(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)(\\simage[s]?)?\", \"\", caption)\n    caption = re.sub(r\"\\bpage\\s+\\d+\\b\", \"\", caption)\n\n    caption = re.sub(r\"\\b\\d*[a-zA-Z]+\\d+[a-zA-Z]+\\d+[a-zA-Z\\d]*\\b\", r\" \", caption)  # j2d1a2a...\n\n    caption = re.sub(r\"\\b\\d+\\.?\\d*[xх×]\\d+\\.?\\d*\\b\", \"\", caption)\n\n    caption = re.sub(r\"\\b\\s+\\:\\s+\", r\": \", caption)\n    caption = re.sub(r\"(\\D[,\\./])\\b\", r\"\\1 \", caption)\n    caption = re.sub(r\"\\s+\", \" \", caption)\n\n    caption.strip()\n\n    caption = re.sub(r\"^[\\\"\\']([\\w\\W]+)[\\\"\\']$\", r\"\\1\", caption)\n    caption = re.sub(r\"^[\\'\\_,\\-\\:;]\", r\"\", caption)\n    caption = re.sub(r\"[\\'\\_,\\-\\:\\-\\+]$\", r\"\", caption)\n    caption = re.sub(r\"^\\.\\S+$\", \"\", caption)\n\n    return caption.strip()\n\n\ndef get_10m_set():\n    meta_path_10m = \"/mnt/hdd/data/Panda-70M/raw/meta/train/panda70m_training_10m.csv\"\n    meta_10m = pd.read_csv(meta_path_10m)\n\n    def process_single_caption(row):\n        text_list = eval(row[\"caption\"])\n        clean_list = [clean_caption(x) for x in text_list]\n        return str(clean_list)\n\n    ret = apply(meta_10m, process_single_caption, axis=1)\n    # ret = 
meta_10m.progress_apply(process_single_caption, axis=1)\n    print(\"==> text processed.\")\n\n    text_list = []\n    for x in ret:\n        text_list += eval(x)\n        # text_set = text_set.union(set(eval(x)))\n    text_set = set(text_list)\n    # meta_10m['caption_new'] = ret\n    # meta_10m.to_csv('/mnt/hdd/data/Panda-70M/raw/meta/train/panda70m_training_10m_new-cap.csv')\n\n    # video_id_set = set(meta_10m['videoID'])\n    # id2t = {}\n    # for idx, row in tqdm(meta_10m.iterrows(), total=len(meta_10m)):\n    #     video_id = row['videoID']\n    #     text_list = eval(row['caption'])\n    #     id2t[video_id] = set(text_list)\n\n    print(f\"==> Loaded meta_10m from '{meta_path_10m}'\")\n    return text_set\n\n\ndef filter_panda10m_text(meta_path, text_set):\n    def process_single_row(row):\n        # path = row['path']\n        t = row[\"text\"]\n        # fname = os.path.basename(path)\n        # video_id = fname[:fname.rindex('_')]\n        if t not in text_set:\n            return False\n        return True\n\n    meta = pd.read_csv(meta_path)\n    ret = apply(meta, process_single_row, axis=1)\n    # ret = meta.progress_apply(process_single_row, axis=1)\n\n    meta = meta[ret]\n    wo_ext, ext = os.path.splitext(meta_path)\n    out_path = f\"{wo_ext}_filter-10m{ext}\"\n    meta.to_csv(out_path, index=False)\n    print(f\"New meta (shape={meta.shape}) saved to '{out_path}'.\")\n\n\ndef filter_panda10m_timestamp(meta_path):\n    meta_path_10m = \"/mnt/hdd/data/Panda-70M/raw/meta/train/panda70m_training_10m.csv\"\n    meta_10m = pd.read_csv(meta_path_10m)\n\n    id2t = {}\n    for idx, row in tqdm(meta_10m.iterrows(), total=len(meta_10m)):\n        video_id = row[\"videoID\"]\n        timestamp = eval(row[\"timestamp\"])\n        timestamp = [str(tuple(x)) for x in timestamp]\n        id2t[video_id] = timestamp\n\n    # video_id_set_10m = set(meta_10m['videoID'])\n    print(f\"==> Loaded meta_10m from '{meta_path_10m}'\")\n\n    def 
process_single_row(row):\n        path = row[\"path\"]\n        t = row[\"timestamp\"]\n        fname = os.path.basename(path)\n        video_id = fname[: fname.rindex(\"_\")]\n        if video_id not in id2t:\n            return False\n        if t not in id2t[video_id]:\n            return False\n        return True\n        # return video_id in video_id_set_10m\n\n    meta = pd.read_csv(meta_path)\n    ret = apply(meta, process_single_row, axis=1)\n\n    meta = meta[ret]\n    wo_ext, ext = os.path.splitext(meta_path)\n    out_path = f\"{wo_ext}_filter-10m{ext}\"\n    meta.to_csv(out_path, index=False)\n    print(f\"New meta (shape={meta.shape}) saved to '{out_path}'.\")\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"--meta_path\", type=str, nargs=\"+\")\n    parser.add_argument(\"--num_workers\", default=5, type=int)\n\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n\n    text_set = get_10m_set()\n    for x in args.meta_path:\n        filter_panda10m_text(x, text_set)\n"
  },
  {
    "path": "Open-Sora/tools/datasets/split.py",
    "content": "import argparse\nfrom typing import List\n\nimport pandas as pd\nfrom mmengine.config import Config\n\nfrom opensora.datasets.bucket import Bucket\n\n\ndef split_by_bucket(\n    bucket: Bucket,\n    input_files: List[str],\n    output_path: str,\n    limit: int,\n    frame_interval: int,\n):\n    print(f\"Split {len(input_files)} files into {len(bucket)} buckets\")\n    total_limit = len(bucket) * limit\n    bucket_cnt = {}\n    # get all bucket id\n    for hw_id, d in bucket.ar_criteria.items():\n        for t_id, v in d.items():\n            for ar_id in v.keys():\n                bucket_id = (hw_id, t_id, ar_id)\n                bucket_cnt[bucket_id] = 0\n    output_df = None\n    # split files\n    for path in input_files:\n        df = pd.read_csv(path)\n        if output_df is None:\n            output_df = pd.DataFrame(columns=df.columns)\n        for i in range(len(df)):\n            row = df.iloc[i]\n            t, h, w = row[\"num_frames\"], row[\"height\"], row[\"width\"]\n            bucket_id = bucket.get_bucket_id(t, h, w, frame_interval)\n            if bucket_id is None:\n                continue\n            if bucket_cnt[bucket_id] < limit:\n                bucket_cnt[bucket_id] += 1\n                output_df = pd.concat([output_df, pd.DataFrame([row])], ignore_index=True)\n                if len(output_df) >= total_limit:\n                    break\n        if len(output_df) >= total_limit:\n            break\n    assert len(output_df) <= total_limit\n    if len(output_df) == total_limit:\n        print(f\"All buckets are full ({total_limit} samples)\")\n    else:\n        print(f\"Only {len(output_df)} files are used\")\n    output_df.to_csv(output_path, index=False)\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"input\", type=str, nargs=\"+\")\n    parser.add_argument(\"-o\", \"--output\", required=True)\n    parser.add_argument(\"-c\", \"--config\", required=True)\n    
parser.add_argument(\"-l\", \"--limit\", default=200, type=int)\n    args = parser.parse_args()\n    assert args.limit > 0\n\n    cfg = Config.fromfile(args.config)\n    bucket_config = cfg.bucket_config\n    # rewrite bucket_config\n    for ar, d in bucket_config.items():\n        for frames, t in d.items():\n            p, bs = t\n            if p > 0.0:\n                p = 1.0\n            d[frames] = (p, bs)\n    bucket = Bucket(bucket_config)\n    split_by_bucket(bucket, args.input, args.output, args.limit, cfg.dataset.frame_interval)\n"
  },
  {
    "path": "Open-Sora/tools/datasets/transform.py",
    "content": "import argparse\nimport os\nimport random\n\nimport cv2\nimport numpy as np\nimport pandas as pd\nfrom tqdm import tqdm\n\nfrom .utils import IMG_EXTENSIONS, extract_frames\n\ntqdm.pandas()\n\ntry:\n    from pandarallel import pandarallel\n\n    pandarallel.initialize(progress_bar=True)\n    pandas_has_parallel = True\nexcept ImportError:\n    pandas_has_parallel = False\n\n\ndef apply(df, func, **kwargs):\n    if pandas_has_parallel:\n        return df.parallel_apply(func, **kwargs)\n    return df.progress_apply(func, **kwargs)\n\n\ndef get_new_path(path, input_dir, output):\n    path_new = os.path.join(output, os.path.relpath(path, input_dir))\n    os.makedirs(os.path.dirname(path_new), exist_ok=True)\n    return path_new\n\n\ndef resize(path, length, input_dir, output):\n    path_new = get_new_path(path, input_dir, output)\n    ext = os.path.splitext(path)[1].lower()\n    assert ext in IMG_EXTENSIONS\n    img = cv2.imread(path)\n    if img is not None:\n        h, w = img.shape[:2]\n        if min(h, w) > length:\n            if h > w:\n                new_h = length\n                new_w = int(w * new_h / h)\n            else:\n                new_w = length\n                new_h = int(h * new_w / w)\n            img = cv2.resize(img, (new_w, new_h))\n        cv2.imwrite(path_new, img)\n    else:\n        path_new = \"\"\n    return path_new\n\n\ndef rand_crop(path, input_dir, output):\n    ext = os.path.splitext(path)[1].lower()\n    path_new = get_new_path(path, input_dir, output)\n    assert ext in IMG_EXTENSIONS\n    img = cv2.imread(path)\n    if img is not None:\n        h, w = img.shape[:2]\n        width, height, _ = img.shape\n        pos = random.randint(0, 3)\n        if pos == 0:\n            img_cropped = img[: width // 2, : height // 2]\n        elif pos == 1:\n            img_cropped = img[width // 2 :, : height // 2]\n        elif pos == 2:\n            img_cropped = img[: width // 2, height // 2 :]\n        else:\n            
img_cropped = img[width // 2 :, height // 2 :]\n        cv2.imwrite(path_new, img_cropped)\n    else:\n        path_new = \"\"\n    return path_new\n\n\ndef main(args):\n    data = pd.read_csv(args.input)\n    if args.method == \"img_rand_crop\":\n        data[\"path\"] = apply(data[\"path\"], lambda x: rand_crop(x, args.input_dir, args.output))\n        output_csv = args.input.replace(\".csv\", \"_rand_crop.csv\")\n    elif args.method == \"img_resize\":\n        data[\"path\"] = apply(data[\"path\"], lambda x: resize(x, args.length, args.input_dir, args.output))\n        output_csv = args.input.replace(\".csv\", f\"_resized{args.length}.csv\")\n    elif args.method == \"vid_frame_extract\":\n        points = args.points if args.points is not None else args.points_index\n        # Repeat each row once per sampling point so that rows i, i + num_points, ... map to point i.\n        num_points = len(points)\n        data = pd.DataFrame(np.repeat(data.values, num_points, axis=0), columns=data.columns)\n        data[\"point\"] = np.nan\n        for i, point in enumerate(points):\n            if isinstance(point, int):\n                data.loc[i::num_points, \"point\"] = point\n            else:\n                data.loc[i::num_points, \"point\"] = data.loc[i::num_points, \"num_frames\"] * point\n        # NOTE: this call does not match the signature of utils.extract_frames\n        # (video_path, frame_inds, points, backend, ...) and needs adapting before use.\n        data[\"path\"] = apply(data, lambda x: extract_frames(x[\"path\"], args.input_dir, args.output, x[\"point\"]), axis=1)\n        output_csv = args.input.replace(\".csv\", \"_vid_frame_extract.csv\")\n\n    data.to_csv(output_csv, index=False)\n    print(f\"Saved to {output_csv}\")\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"method\", type=str, choices=[\"img_resize\", \"img_rand_crop\", \"vid_frame_extract\"])\n    parser.add_argument(\"input\", type=str)\n    parser.add_argument(\"input_dir\", type=str)\n    parser.add_argument(\"output\", type=str)\n    parser.add_argument(\"--disable-parallel\", action=\"store_true\")\n    parser.add_argument(\"--length\", type=int, default=2160)\n    parser.add_argument(\"--seed\", type=int, 
default=42, help=\"seed for random\")\n    parser.add_argument(\"--points\", nargs=\"+\", type=float, default=None)\n    parser.add_argument(\"--points_index\", nargs=\"+\", type=int, default=None)\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n    random.seed(args.seed)\n    if args.disable_parallel:\n        pandas_has_parallel = False\n    main(args)\n"
  },
  {
    "path": "Open-Sora/tools/datasets/utils.py",
    "content": "import os\n\nimport cv2\nimport numpy as np\nfrom PIL import Image\n\nIMG_EXTENSIONS = (\".jpg\", \".jpeg\", \".png\", \".ppm\", \".bmp\", \".pgm\", \".tif\", \".tiff\", \".webp\")\nVID_EXTENSIONS = (\".mp4\", \".avi\", \".mov\", \".mkv\")\n\n\ndef is_video(filename):\n    ext = os.path.splitext(filename)[-1].lower()\n    return ext in VID_EXTENSIONS\n\n\ndef extract_frames(\n    video_path,\n    frame_inds=None,\n    points=None,\n    backend=\"opencv\",\n    return_length=False,\n    num_frames=None,\n):\n    \"\"\"\n    Args:\n        video_path (str): path to video\n        frame_inds (List[int]): indices of frames to extract\n        points (List[float]): values within [0, 1); multiply #frames to get frame indices\n    Return:\n        List[PIL.Image]\n    \"\"\"\n    assert backend in [\"av\", \"opencv\", \"decord\"]\n    assert (frame_inds is None) or (points is None)\n\n    if backend == \"av\":\n        import av\n\n        container = av.open(video_path)\n        if num_frames is not None:\n            total_frames = num_frames\n        else:\n            total_frames = container.streams.video[0].frames\n\n        if points is not None:\n            frame_inds = [int(p * total_frames) for p in points]\n\n        frames = []\n        for idx in frame_inds:\n            if idx >= total_frames:\n                idx = total_frames - 1\n            target_timestamp = int(idx * av.time_base / container.streams.video[0].average_rate)\n            container.seek(target_timestamp)\n            frame = next(container.decode(video=0)).to_image()\n            frames.append(frame)\n\n        if return_length:\n            return frames, total_frames\n        return frames\n\n    elif backend == \"decord\":\n        import decord\n\n        container = decord.VideoReader(video_path, num_threads=1)\n        if num_frames is not None:\n            total_frames = num_frames\n        else:\n            total_frames = len(container)\n\n        if points is 
not None:\n            frame_inds = [int(p * total_frames) for p in points]\n\n        frame_inds = np.array(frame_inds).astype(np.int32)\n        frame_inds[frame_inds >= total_frames] = total_frames - 1\n        frames = container.get_batch(frame_inds).asnumpy()  # [N, H, W, C]\n        frames = [Image.fromarray(x) for x in frames]\n\n        if return_length:\n            return frames, total_frames\n        return frames\n\n    elif backend == \"opencv\":\n        cap = cv2.VideoCapture(video_path)\n        if num_frames is not None:\n            total_frames = num_frames\n        else:\n            total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))\n\n        if points is not None:\n            frame_inds = [int(p * total_frames) for p in points]\n\n        frames = []\n        for idx in frame_inds:\n            if idx >= total_frames:\n                idx = total_frames - 1\n\n            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)\n\n            # HACK: sometimes OpenCV fails to read frames, return a black frame instead\n            try:\n                ret, frame = cap.read()\n                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n                frame = Image.fromarray(frame)\n            except Exception as e:\n                print(f\"[Warning] Error reading frame {idx} from {video_path}: {e}\")\n                # First, try to read the first frame\n                try:\n                    print(f\"[Warning] Try reading first frame.\")\n                    cap.set(cv2.CAP_PROP_POS_FRAMES, 0)\n                    ret, frame = cap.read()\n                    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n                    frame = Image.fromarray(frame)\n                # If that fails, return a black frame\n                except Exception as e:\n                    print(f\"[Warning] Error in reading first frame from {video_path}: {e}\")\n                    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))\n                    width = 
int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))\n                    frame = Image.new(\"RGB\", (width, height), (0, 0, 0))\n\n            # HACK: if height or width is 0, return a black frame instead\n            if frame.height == 0 or frame.width == 0:\n                height = width = 256\n                frame = Image.new(\"RGB\", (width, height), (0, 0, 0))\n\n            frames.append(frame)\n\n        if return_length:\n            return frames, total_frames\n        return frames\n    else:\n        raise ValueError\n"
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/README.md",
    "content": "# Frame Interpolation\n\nFor current version, we sample 1 frame out of 3 frames in the video. Although we are going to use VAE to avoid frame loss, we provide a frame interpolation tool to interpolate the video now. The frame interpolation tool is based on [AMT](https://github.com/MCG-NKU/AMT).\n\nInterpolation can be useful for scenery videos, but it may not be suitable for videos with fast motion.\n\n## Requirement\n\nInstall the required dependancies by following our [installation instructions](../../docs/installation.md)'s \"Data Dependencies\" and \"Frame Interpolation\" sections.\n\n<!-- ```bash\nconda install -c conda-forge opencv\npip install imageio\n``` -->\n\n## Model\n\nWe use **AMT** as our frame interpolation model. After sampling, you can use frame interpolation model to interpolate your video smoothly.\n\n## Usage\n\nThe ckpt file will be automatically downloaded in user's `.cache` directory. You can use frame interpolation to your video file or a video folder.\n\n1. Process a video file\n\n```python\npython -m tools.frame_interpolation.interpolation your_video.mp4\n```\n\n2. Process all video file in target directory\n\n```python\npython -m tools.frame_interpolation.interpolation your_video_dir --output_path samples/interpolation\n```\n\nThe output video will be stored at `output_path` and its duration time is equal `the total number of frames after frame interpolation / the frame rate`\n\n### Command Line Arguments\n\n* `input`: Path of the input video. **Video path** or **Folder path(with --folder)**\n* `--ckpt`: Pretrained model of [AMT](https://github.com/MCG-NKU/AMT). Default path: `~/.cache/amt-g.pth`.\n* `--niter`: Iterations of interpolation. With $m$ input frames, `[N_ITER]` $=n$ corresponds to $2^n\\times (m-1)+1$ output frames.\n* `--fps`: Frame rate of the input video. (Default: 8)\n* `--output_path`: **Folder Path** of the output video.\n"
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/interpolation.py",
    "content": "# this script is modified from https://github.com/MCG-NKU/AMT/blob/main/demos/demo_2x.py\nimport argparse\nimport os\nimport os.path as osp\n\nimport cv2\nimport numpy as np\nimport torch\n\nfrom opensora.utils.ckpt_utils import download_model\n\nfrom .networks.amt_g import Model\nfrom .utils.utils import InputPadder, img2tensor, tensor2img\n\nhf_endpoint = os.environ.get(\"HF_ENDPOINT\")\nif hf_endpoint is None:\n    hf_endpoint = \"https://huggingface.co\"\nVID_EXT = [\".mp4\", \".avi\", \".mov\", \".mkv\", \".flv\", \".wmv\", \".webm\"]\nnetwork_cfg = {\n    \"params\": {\n        \"corr_radius\": 3,\n        \"corr_lvls\": 4,\n        \"num_flows\": 5,\n    },\n}\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\n\ndef init():\n    \"\"\"\n    initialize the device and the anchor resolution.\n    \"\"\"\n\n    if device == \"cuda\":\n        anchor_resolution = 1024 * 512\n        anchor_memory = 1500 * 1024**2\n        anchor_memory_bias = 2500 * 1024**2\n        vram_avail = torch.cuda.get_device_properties(device).total_memory\n        print(\"VRAM available: {:.1f} MB\".format(vram_avail / 1024**2))\n    else:\n        # Do not resize in cpu mode\n        anchor_resolution = 8192 * 8192\n        anchor_memory = 1\n        anchor_memory_bias = 0\n        vram_avail = 1\n\n    return anchor_resolution, anchor_memory, anchor_memory_bias, vram_avail\n\n\ndef get_input_video_from_path(input_path):\n    \"\"\"\n    Get the input video from the input_path.\n\n    params:\n        input_path: str, the path of the input video.\n        devices: str, the device to run the model.\n    returns:\n        inputs: list, the list of the input frames.\n        scale: float, the scale of the input frames.\n        padder: InputPadder, the padder to pad the input frames.\n    \"\"\"\n\n    anchor_resolution, anchor_memory, anchor_memory_bias, vram_avail = init()\n\n    if osp.splitext(input_path)[-1].lower() in VID_EXT:\n        vcap = 
cv2.VideoCapture(input_path)\n\n        inputs = []\n        w = int(vcap.get(cv2.CAP_PROP_FRAME_WIDTH))\n        h = int(vcap.get(cv2.CAP_PROP_FRAME_HEIGHT))\n        scale = anchor_resolution / (h * w) * np.sqrt((vram_avail - anchor_memory_bias) / anchor_memory)\n        scale = 1 if scale > 1 else scale\n        scale = 1 / np.floor(1 / np.sqrt(scale) * 16) * 16\n        if scale < 1:\n            print(f\"Due to the limited VRAM, the video will be scaled by {scale:.2f}\")\n        padding = int(16 / scale)\n        padder = InputPadder((h, w), padding)\n        while True:\n            ret, frame = vcap.read()\n            if ret is False:\n                break\n            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n            frame_t = img2tensor(frame).to(device)\n            frame_t = padder.pad(frame_t)\n            inputs.append(frame_t)\n        print(f\"Loading the [video] from {input_path}, the number of frames [{len(inputs)}]\")\n    else:\n        raise TypeError(\"Input should be a video.\")\n\n    return inputs, scale, padder\n\n\ndef load_model(ckpt):\n    \"\"\"\n    load the frame interpolation model.\n    \"\"\"\n    params = network_cfg.get(\"params\", {})\n    model = Model(**params)\n    model.load_state_dict(ckpt[\"state_dict\"])\n    model = model.to(device)\n    model.eval()\n    return model\n\n\ndef interpolater(model, inputs, scale, padder, iters=1):\n    \"\"\"\n    interpolating with the interpolation model.\n\n    params:\n        model: nn.Module, the frame interpolation model.\n        inputs: list, the list of the input frames.\n        scale: float, the scale of the input frames.\n        iters: int, the number of iterations of interpolation. 
The final frames model generating is 2 ** iters * (m - 1) + 1 and m is input frames.\n    returns:\n        outputs: list, the list of the output frames.\n    \"\"\"\n\n    print(\"Start frame interpolation:\")\n    embt = torch.tensor(1 / 2).float().view(1, 1, 1, 1).to(device)\n\n    for i in range(iters):\n        print(f\"Iter {i+1}. input_frames={len(inputs)} output_frames={2*len(inputs)-1}\")\n        outputs = [inputs[0]]\n        for in_0, in_1 in zip(inputs[:-1], inputs[1:]):\n            in_0 = in_0.to(device)\n            in_1 = in_1.to(device)\n            with torch.no_grad():\n                imgt_pred = model(in_0, in_1, embt, scale_factor=scale, eval=True)[\"imgt_pred\"]\n            outputs += [imgt_pred.cpu(), in_1.cpu()]\n        inputs = outputs\n\n    outputs = padder.unpad(*outputs)\n    return outputs\n\n\ndef write(outputs, input_path, output_path, fps=30):\n    \"\"\"\n    write results to the output_path.\n    \"\"\"\n\n    if osp.exists(output_path) is False:\n        os.makedirs(output_path)\n\n    size = outputs[0].shape[2:][::-1]\n\n    _, file_name_with_extension = os.path.split(input_path)\n    file_name, _ = os.path.splitext(file_name_with_extension)\n\n    save_video_path = f\"{output_path}/fps{fps}_{file_name}.mp4\"\n    fourcc = cv2.VideoWriter_fourcc(*\"mp4v\")\n    writer = cv2.VideoWriter(save_video_path, fourcc, fps, size)\n\n    for i, imgt_pred in enumerate(outputs):\n        imgt_pred = tensor2img(imgt_pred)\n        imgt_pred = cv2.cvtColor(imgt_pred, cv2.COLOR_RGB2BGR)\n        writer.write(imgt_pred)\n    print(f\"Demo video is saved to [{save_video_path}]\")\n\n    writer.release()\n\n\ndef process(\n    model,\n    image_path,\n    output_path,\n    fps,\n    iters,\n):\n    inputs, scale, padder = get_input_video_from_path(image_path)\n    outputs = interpolater(model, inputs, scale, padder, iters)\n    write(outputs, image_path, output_path, fps)\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    
parser.add_argument(\"input\", help=\"Input video.\")\n    parser.add_argument(\"--ckpt\", type=str, default=\"./pretrained_models/amt-g.pth\", help=\"The pretrained model.\")\n    parser.add_argument(\n        \"--niters\",\n        type=int,\n        default=1,\n        help=\"Number of interpolation iterations. Each iteration roughly doubles the number of frames.\",\n    )\n    parser.add_argument(\"--output_path\", type=str, default=\"samples\", help=\"Output path.\")\n    parser.add_argument(\n        \"--fps\",\n        type=int,\n        default=8,\n        help=\"Frame rate of the input video. The output frame rate is fps * 2**niters.\",\n    )\n    parser.add_argument(\"--folder\", action=\"store_true\", help=\"If the input is a folder, set this flag.\")\n    args = parser.parse_args()\n\n    # --fps is the input frame rate; every iteration doubles it for the output video.\n    times_frame = 2**args.niters\n    old_fps = args.fps\n    args.fps = args.fps * times_frame\n    print(f\"Interpolation will turn {old_fps}fps video to {args.fps}fps video.\")\n    args.input = os.path.expanduser(args.input)\n    args.ckpt = os.path.expanduser(args.ckpt)\n    # Any input without a known video extension is treated as a folder (this overrides --folder).\n    args.folder = osp.splitext(args.input)[-1].lower() not in VID_EXT\n    args.ckpt = download_model(local_path=args.ckpt, url=hf_endpoint + \"/lalala125/AMT/resolve/main/amt-g.pth\")\n    return args\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n    ckpt_path = args.ckpt\n    input_path = args.input\n    output_path = args.output_path\n    iters = int(args.niters)\n    fps = int(args.fps)\n\n    model = load_model(ckpt_path)\n\n    if args.folder:\n        for file in os.listdir(input_path):\n            if osp.splitext(file)[-1].lower() in VID_EXT:\n                vid_path = os.path.join(input_path, file)\n                process(model, vid_path, output_path, fps, iters)\n    else:\n        process(model, input_path, output_path, fps, iters)\n\n    print(\"Interpolation is done.\")\n    print(f\"Output path: {output_path}\")\n"
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/networks/__init__.py",
    "content": "from .amt_g import Model\n"
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/networks/amt_g.py",
    "content": "import torch\nimport torch.nn as nn\n\nfrom .blocks.feat_enc import LargeEncoder\nfrom .blocks.ifrnet import Encoder, InitDecoder, IntermediateDecoder, resize\nfrom .blocks.multi_flow import MultiFlowDecoder, multi_flow_combine\nfrom .blocks.raft import BasicUpdateBlock, BidirCorrBlock, coords_grid\n\n\nclass Model(nn.Module):\n    def __init__(self, corr_radius=3, corr_lvls=4, num_flows=5, channels=[84, 96, 112, 128], skip_channels=84):\n        super(Model, self).__init__()\n        self.radius = corr_radius\n        self.corr_levels = corr_lvls\n        self.num_flows = num_flows\n\n        self.feat_encoder = LargeEncoder(output_dim=128, norm_fn=\"instance\", dropout=0.0)\n        self.encoder = Encoder(channels, large=True)\n        self.decoder4 = InitDecoder(channels[3], channels[2], skip_channels)\n        self.decoder3 = IntermediateDecoder(channels[2], channels[1], skip_channels)\n        self.decoder2 = IntermediateDecoder(channels[1], channels[0], skip_channels)\n        self.decoder1 = MultiFlowDecoder(channels[0], skip_channels, num_flows)\n\n        self.update4 = self._get_updateblock(112, None)\n        self.update3_low = self._get_updateblock(96, 2.0)\n        self.update2_low = self._get_updateblock(84, 4.0)\n\n        self.update3_high = self._get_updateblock(96, None)\n        self.update2_high = self._get_updateblock(84, None)\n\n        self.comb_block = nn.Sequential(\n            nn.Conv2d(3 * self.num_flows, 6 * self.num_flows, 7, 1, 3),\n            nn.PReLU(6 * self.num_flows),\n            nn.Conv2d(6 * self.num_flows, 3, 7, 1, 3),\n        )\n\n    def _get_updateblock(self, cdim, scale_factor=None):\n        return BasicUpdateBlock(\n            cdim=cdim,\n            hidden_dim=192,\n            flow_dim=64,\n            corr_dim=256,\n            corr_dim2=192,\n            fc_dim=188,\n            scale_factor=scale_factor,\n            corr_levels=self.corr_levels,\n            radius=self.radius,\n        )\n\n   
 def _corr_scale_lookup(self, corr_fn, coord, flow0, flow1, embt, downsample=1):\n        # convert t -> 0 to 0 -> 1 | convert t -> 1 to 1 -> 0\n        # based on linear assumption\n        t1_scale = 1.0 / embt\n        t0_scale = 1.0 / (1.0 - embt)\n        if downsample != 1:\n            inv = 1 / downsample\n            flow0 = inv * resize(flow0, scale_factor=inv)\n            flow1 = inv * resize(flow1, scale_factor=inv)\n\n        corr0, corr1 = corr_fn(coord + flow1 * t1_scale, coord + flow0 * t0_scale)\n        corr = torch.cat([corr0, corr1], dim=1)\n        flow = torch.cat([flow0, flow1], dim=1)\n        return corr, flow\n\n    def forward(self, img0, img1, embt, scale_factor=1.0, eval=False, **kwargs):\n        mean_ = torch.cat([img0, img1], 2).mean(1, keepdim=True).mean(2, keepdim=True).mean(3, keepdim=True)\n        img0 = img0 - mean_\n        img1 = img1 - mean_\n        img0_ = resize(img0, scale_factor) if scale_factor != 1.0 else img0\n        img1_ = resize(img1, scale_factor) if scale_factor != 1.0 else img1\n        b, _, h, w = img0_.shape\n        coord = coords_grid(b, h // 8, w // 8, img0.device)\n\n        fmap0, fmap1 = self.feat_encoder([img0_, img1_])  # [1, 128, H//8, W//8]\n        corr_fn = BidirCorrBlock(fmap0, fmap1, radius=self.radius, num_levels=self.corr_levels)\n\n        # f0_1: [1, c0, H//2, W//2] | f0_2: [1, c1, H//4, W//4]\n        # f0_3: [1, c2, H//8, W//8] | f0_4: [1, c3, H//16, W//16]\n        f0_1, f0_2, f0_3, f0_4 = self.encoder(img0_)\n        f1_1, f1_2, f1_3, f1_4 = self.encoder(img1_)\n\n        ######################################### the 4th decoder #########################################\n        up_flow0_4, up_flow1_4, ft_3_ = self.decoder4(f0_4, f1_4, embt)\n        corr_4, flow_4 = self._corr_scale_lookup(corr_fn, coord, up_flow0_4, up_flow1_4, embt, downsample=1)\n\n        # residue update with lookup corr\n        delta_ft_3_, delta_flow_4 = self.update4(ft_3_, flow_4, corr_4)\n        
delta_flow0_4, delta_flow1_4 = torch.chunk(delta_flow_4, 2, 1)\n        up_flow0_4 = up_flow0_4 + delta_flow0_4\n        up_flow1_4 = up_flow1_4 + delta_flow1_4\n        ft_3_ = ft_3_ + delta_ft_3_\n\n        ######################################### the 3rd decoder #########################################\n        up_flow0_3, up_flow1_3, ft_2_ = self.decoder3(ft_3_, f0_3, f1_3, up_flow0_4, up_flow1_4)\n        corr_3, flow_3 = self._corr_scale_lookup(corr_fn, coord, up_flow0_3, up_flow1_3, embt, downsample=2)\n\n        # residue update with lookup corr\n        delta_ft_2_, delta_flow_3 = self.update3_low(ft_2_, flow_3, corr_3)\n        delta_flow0_3, delta_flow1_3 = torch.chunk(delta_flow_3, 2, 1)\n        up_flow0_3 = up_flow0_3 + delta_flow0_3\n        up_flow1_3 = up_flow1_3 + delta_flow1_3\n        ft_2_ = ft_2_ + delta_ft_2_\n\n        # residue update with lookup corr (hr)\n        corr_3 = resize(corr_3, scale_factor=2.0)\n        up_flow_3 = torch.cat([up_flow0_3, up_flow1_3], dim=1)\n        delta_ft_2_, delta_up_flow_3 = self.update3_high(ft_2_, up_flow_3, corr_3)\n        ft_2_ += delta_ft_2_\n        up_flow0_3 += delta_up_flow_3[:, 0:2]\n        up_flow1_3 += delta_up_flow_3[:, 2:4]\n\n        ######################################### the 2nd decoder #########################################\n        up_flow0_2, up_flow1_2, ft_1_ = self.decoder2(ft_2_, f0_2, f1_2, up_flow0_3, up_flow1_3)\n        corr_2, flow_2 = self._corr_scale_lookup(corr_fn, coord, up_flow0_2, up_flow1_2, embt, downsample=4)\n\n        # residue update with lookup corr\n        delta_ft_1_, delta_flow_2 = self.update2_low(ft_1_, flow_2, corr_2)\n        delta_flow0_2, delta_flow1_2 = torch.chunk(delta_flow_2, 2, 1)\n        up_flow0_2 = up_flow0_2 + delta_flow0_2\n        up_flow1_2 = up_flow1_2 + delta_flow1_2\n        ft_1_ = ft_1_ + delta_ft_1_\n\n        # residue update with lookup corr (hr)\n        corr_2 = resize(corr_2, scale_factor=4.0)\n        up_flow_2 = 
torch.cat([up_flow0_2, up_flow1_2], dim=1)\n        delta_ft_1_, delta_up_flow_2 = self.update2_high(ft_1_, up_flow_2, corr_2)\n        ft_1_ += delta_ft_1_\n        up_flow0_2 += delta_up_flow_2[:, 0:2]\n        up_flow1_2 += delta_up_flow_2[:, 2:4]\n\n        ######################################### the 1st decoder #########################################\n        up_flow0_1, up_flow1_1, mask, img_res = self.decoder1(ft_1_, f0_1, f1_1, up_flow0_2, up_flow1_2)\n\n        if scale_factor != 1.0:\n            up_flow0_1 = resize(up_flow0_1, scale_factor=(1.0 / scale_factor)) * (1.0 / scale_factor)\n            up_flow1_1 = resize(up_flow1_1, scale_factor=(1.0 / scale_factor)) * (1.0 / scale_factor)\n            mask = resize(mask, scale_factor=(1.0 / scale_factor))\n            img_res = resize(img_res, scale_factor=(1.0 / scale_factor))\n\n        # Merge multiple predictions\n        imgt_pred = multi_flow_combine(self.comb_block, img0, img1, up_flow0_1, up_flow1_1, mask, img_res, mean_)\n        imgt_pred = torch.clamp(imgt_pred, 0, 1)\n\n        if eval:\n            return {\n                \"imgt_pred\": imgt_pred,\n            }\n        else:\n            up_flow0_1 = up_flow0_1.reshape(b, self.num_flows, 2, h, w)\n            up_flow1_1 = up_flow1_1.reshape(b, self.num_flows, 2, h, w)\n            return {\n                \"imgt_pred\": imgt_pred,\n                \"flow0_pred\": [up_flow0_1, up_flow0_2, up_flow0_3, up_flow0_4],\n                \"flow1_pred\": [up_flow1_1, up_flow1_2, up_flow1_3, up_flow1_4],\n                \"ft_pred\": [ft_1_, ft_2_, ft_3_],\n            }\n"
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/networks/blocks/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/networks/blocks/feat_enc.py",
    "content": "import torch\nimport torch.nn as nn\n\n\nclass BottleneckBlock(nn.Module):\n    def __init__(self, in_planes, planes, norm_fn=\"group\", stride=1):\n        super(BottleneckBlock, self).__init__()\n\n        self.conv1 = nn.Conv2d(in_planes, planes // 4, kernel_size=1, padding=0)\n        self.conv2 = nn.Conv2d(planes // 4, planes // 4, kernel_size=3, padding=1, stride=stride)\n        self.conv3 = nn.Conv2d(planes // 4, planes, kernel_size=1, padding=0)\n        self.relu = nn.ReLU(inplace=True)\n\n        num_groups = planes // 8\n\n        if norm_fn == \"group\":\n            self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes // 4)\n            self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes // 4)\n            self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n            if not stride == 1:\n                self.norm4 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n\n        elif norm_fn == \"batch\":\n            self.norm1 = nn.BatchNorm2d(planes // 4)\n            self.norm2 = nn.BatchNorm2d(planes // 4)\n            self.norm3 = nn.BatchNorm2d(planes)\n            if not stride == 1:\n                self.norm4 = nn.BatchNorm2d(planes)\n\n        elif norm_fn == \"instance\":\n            self.norm1 = nn.InstanceNorm2d(planes // 4)\n            self.norm2 = nn.InstanceNorm2d(planes // 4)\n            self.norm3 = nn.InstanceNorm2d(planes)\n            if not stride == 1:\n                self.norm4 = nn.InstanceNorm2d(planes)\n\n        elif norm_fn == \"none\":\n            self.norm1 = nn.Sequential()\n            self.norm2 = nn.Sequential()\n            self.norm3 = nn.Sequential()\n            if not stride == 1:\n                self.norm4 = nn.Sequential()\n\n        if stride == 1:\n            self.downsample = None\n\n        else:\n            self.downsample = nn.Sequential(nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm4)\n\n    def 
forward(self, x):\n        y = x\n        y = self.relu(self.norm1(self.conv1(y)))\n        y = self.relu(self.norm2(self.conv2(y)))\n        y = self.relu(self.norm3(self.conv3(y)))\n\n        if self.downsample is not None:\n            x = self.downsample(x)\n\n        return self.relu(x + y)\n\n\nclass ResidualBlock(nn.Module):\n    def __init__(self, in_planes, planes, norm_fn=\"group\", stride=1):\n        super(ResidualBlock, self).__init__()\n\n        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, padding=1, stride=stride)\n        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1)\n        self.relu = nn.ReLU(inplace=True)\n\n        num_groups = planes // 8\n\n        if norm_fn == \"group\":\n            self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n            self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n            if not stride == 1:\n                self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)\n\n        elif norm_fn == \"batch\":\n            self.norm1 = nn.BatchNorm2d(planes)\n            self.norm2 = nn.BatchNorm2d(planes)\n            if not stride == 1:\n                self.norm3 = nn.BatchNorm2d(planes)\n\n        elif norm_fn == \"instance\":\n            self.norm1 = nn.InstanceNorm2d(planes)\n            self.norm2 = nn.InstanceNorm2d(planes)\n            if not stride == 1:\n                self.norm3 = nn.InstanceNorm2d(planes)\n\n        elif norm_fn == \"none\":\n            self.norm1 = nn.Sequential()\n            self.norm2 = nn.Sequential()\n            if not stride == 1:\n                self.norm3 = nn.Sequential()\n\n        if stride == 1:\n            self.downsample = None\n\n        else:\n            self.downsample = nn.Sequential(nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm3)\n\n    def forward(self, x):\n        y = x\n        y = self.relu(self.norm1(self.conv1(y)))\n        y = 
self.relu(self.norm2(self.conv2(y)))\n\n        if self.downsample is not None:\n            x = self.downsample(x)\n\n        return self.relu(x + y)\n\n\nclass SmallEncoder(nn.Module):\n    def __init__(self, output_dim=128, norm_fn=\"batch\", dropout=0.0):\n        super(SmallEncoder, self).__init__()\n        self.norm_fn = norm_fn\n\n        if self.norm_fn == \"group\":\n            self.norm1 = nn.GroupNorm(num_groups=8, num_channels=32)\n\n        elif self.norm_fn == \"batch\":\n            self.norm1 = nn.BatchNorm2d(32)\n\n        elif self.norm_fn == \"instance\":\n            self.norm1 = nn.InstanceNorm2d(32)\n\n        elif self.norm_fn == \"none\":\n            self.norm1 = nn.Sequential()\n\n        self.conv1 = nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3)\n        self.relu1 = nn.ReLU(inplace=True)\n\n        self.in_planes = 32\n        self.layer1 = self._make_layer(32, stride=1)\n        self.layer2 = self._make_layer(64, stride=2)\n        self.layer3 = self._make_layer(96, stride=2)\n\n        self.dropout = None\n        if dropout > 0:\n            self.dropout = nn.Dropout2d(p=dropout)\n\n        self.conv2 = nn.Conv2d(96, output_dim, kernel_size=1)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode=\"fan_out\", nonlinearity=\"relu\")\n            elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):\n                if m.weight is not None:\n                    nn.init.constant_(m.weight, 1)\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def _make_layer(self, dim, stride=1):\n        layer1 = BottleneckBlock(self.in_planes, dim, self.norm_fn, stride=stride)\n        layer2 = BottleneckBlock(dim, dim, self.norm_fn, stride=1)\n        layers = (layer1, layer2)\n\n        self.in_planes = dim\n        return nn.Sequential(*layers)\n\n    def forward(self, x):\n        # if input is 
list, combine batch dimension\n        is_list = isinstance(x, tuple) or isinstance(x, list)\n        if is_list:\n            batch_dim = x[0].shape[0]\n            x = torch.cat(x, dim=0)\n\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.relu1(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n        x = self.conv2(x)\n\n        if self.training and self.dropout is not None:\n            x = self.dropout(x)\n\n        if is_list:\n            x = torch.split(x, [batch_dim, batch_dim], dim=0)\n\n        return x\n\n\nclass BasicEncoder(nn.Module):\n    def __init__(self, output_dim=128, norm_fn=\"batch\", dropout=0.0):\n        super(BasicEncoder, self).__init__()\n        self.norm_fn = norm_fn\n\n        if self.norm_fn == \"group\":\n            self.norm1 = nn.GroupNorm(num_groups=8, num_channels=64)\n\n        elif self.norm_fn == \"batch\":\n            self.norm1 = nn.BatchNorm2d(64)\n\n        elif self.norm_fn == \"instance\":\n            self.norm1 = nn.InstanceNorm2d(64)\n\n        elif self.norm_fn == \"none\":\n            self.norm1 = nn.Sequential()\n\n        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)\n        self.relu1 = nn.ReLU(inplace=True)\n\n        self.in_planes = 64\n        self.layer1 = self._make_layer(64, stride=1)\n        self.layer2 = self._make_layer(72, stride=2)\n        self.layer3 = self._make_layer(128, stride=2)\n\n        # output convolution\n        self.conv2 = nn.Conv2d(128, output_dim, kernel_size=1)\n\n        self.dropout = None\n        if dropout > 0:\n            self.dropout = nn.Dropout2d(p=dropout)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode=\"fan_out\", nonlinearity=\"relu\")\n            elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):\n                if m.weight is not None:\n                    
nn.init.constant_(m.weight, 1)\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def _make_layer(self, dim, stride=1):\n        layer1 = ResidualBlock(self.in_planes, dim, self.norm_fn, stride=stride)\n        layer2 = ResidualBlock(dim, dim, self.norm_fn, stride=1)\n        layers = (layer1, layer2)\n\n        self.in_planes = dim\n        return nn.Sequential(*layers)\n\n    def forward(self, x):\n        # if input is list, combine batch dimension\n        is_list = isinstance(x, tuple) or isinstance(x, list)\n        if is_list:\n            batch_dim = x[0].shape[0]\n            x = torch.cat(x, dim=0)\n\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.relu1(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n\n        x = self.conv2(x)\n\n        if self.training and self.dropout is not None:\n            x = self.dropout(x)\n\n        if is_list:\n            x = torch.split(x, [batch_dim, batch_dim], dim=0)\n\n        return x\n\n\nclass LargeEncoder(nn.Module):\n    def __init__(self, output_dim=128, norm_fn=\"batch\", dropout=0.0):\n        super(LargeEncoder, self).__init__()\n        self.norm_fn = norm_fn\n\n        if self.norm_fn == \"group\":\n            self.norm1 = nn.GroupNorm(num_groups=8, num_channels=64)\n\n        elif self.norm_fn == \"batch\":\n            self.norm1 = nn.BatchNorm2d(64)\n\n        elif self.norm_fn == \"instance\":\n            self.norm1 = nn.InstanceNorm2d(64)\n\n        elif self.norm_fn == \"none\":\n            self.norm1 = nn.Sequential()\n\n        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)\n        self.relu1 = nn.ReLU(inplace=True)\n\n        self.in_planes = 64\n        self.layer1 = self._make_layer(64, stride=1)\n        self.layer2 = self._make_layer(112, stride=2)\n        self.layer3 = self._make_layer(160, stride=2)\n        self.layer3_2 = self._make_layer(160, stride=1)\n\n   
     # output convolution\n        self.conv2 = nn.Conv2d(self.in_planes, output_dim, kernel_size=1)\n\n        self.dropout = None\n        if dropout > 0:\n            self.dropout = nn.Dropout2d(p=dropout)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode=\"fan_out\", nonlinearity=\"relu\")\n            elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):\n                if m.weight is not None:\n                    nn.init.constant_(m.weight, 1)\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def _make_layer(self, dim, stride=1):\n        layer1 = ResidualBlock(self.in_planes, dim, self.norm_fn, stride=stride)\n        layer2 = ResidualBlock(dim, dim, self.norm_fn, stride=1)\n        layers = (layer1, layer2)\n\n        self.in_planes = dim\n        return nn.Sequential(*layers)\n\n    def forward(self, x):\n        # if input is list, combine batch dimension\n        is_list = isinstance(x, tuple) or isinstance(x, list)\n        if is_list:\n            batch_dim = x[0].shape[0]\n            x = torch.cat(x, dim=0)\n\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.relu1(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n        x = self.layer3_2(x)\n\n        x = self.conv2(x)\n\n        if self.training and self.dropout is not None:\n            x = self.dropout(x)\n\n        if is_list:\n            x = torch.split(x, [batch_dim, batch_dim], dim=0)\n\n        return x\n"
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/networks/blocks/ifrnet.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom tools.frame_interpolation.utils.flow_utils import warp\n\n\ndef resize(x, scale_factor):\n    return F.interpolate(x, scale_factor=scale_factor, mode=\"bilinear\", align_corners=False)\n\n\ndef convrelu(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, bias=True):\n    return nn.Sequential(\n        nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias=bias),\n        nn.PReLU(out_channels),\n    )\n\n\nclass ResBlock(nn.Module):\n    def __init__(self, in_channels, side_channels, bias=True):\n        super(ResBlock, self).__init__()\n        self.side_channels = side_channels\n        self.conv1 = nn.Sequential(\n            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1, bias=bias), nn.PReLU(in_channels)\n        )\n        self.conv2 = nn.Sequential(\n            nn.Conv2d(side_channels, side_channels, kernel_size=3, stride=1, padding=1, bias=bias),\n            nn.PReLU(side_channels),\n        )\n        self.conv3 = nn.Sequential(\n            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1, bias=bias), nn.PReLU(in_channels)\n        )\n        self.conv4 = nn.Sequential(\n            nn.Conv2d(side_channels, side_channels, kernel_size=3, stride=1, padding=1, bias=bias),\n            nn.PReLU(side_channels),\n        )\n        self.conv5 = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1, bias=bias)\n        self.prelu = nn.PReLU(in_channels)\n\n    def forward(self, x):\n        out = self.conv1(x)\n\n        res_feat = out[:, : -self.side_channels, ...]\n        side_feat = out[:, -self.side_channels :, :, :]\n        side_feat = self.conv2(side_feat)\n        out = self.conv3(torch.cat([res_feat, side_feat], 1))\n\n        res_feat = out[:, : -self.side_channels, ...]\n        side_feat = out[:, -self.side_channels :, :, :]\n  
      side_feat = self.conv4(side_feat)\n        out = self.conv5(torch.cat([res_feat, side_feat], 1))\n\n        out = self.prelu(x + out)\n        return out\n\n\nclass Encoder(nn.Module):\n    def __init__(self, channels, large=False):\n        super(Encoder, self).__init__()\n        self.channels = channels\n        prev_ch = 3\n        for idx, ch in enumerate(channels, 1):\n            k = 7 if large and idx == 1 else 3\n            p = 3 if k == 7 else 1\n            self.register_module(\n                f\"pyramid{idx}\", nn.Sequential(convrelu(prev_ch, ch, k, 2, p), convrelu(ch, ch, 3, 1, 1))\n            )\n            prev_ch = ch\n\n    def forward(self, in_x):\n        fs = []\n        for idx in range(len(self.channels)):\n            out_x = getattr(self, f\"pyramid{idx+1}\")(in_x)\n            fs.append(out_x)\n            in_x = out_x\n        return fs\n\n\nclass InitDecoder(nn.Module):\n    def __init__(self, in_ch, out_ch, skip_ch) -> None:\n        super().__init__()\n        self.convblock = nn.Sequential(\n            convrelu(in_ch * 2 + 1, in_ch * 2),\n            ResBlock(in_ch * 2, skip_ch),\n            nn.ConvTranspose2d(in_ch * 2, out_ch + 4, 4, 2, 1, bias=True),\n        )\n\n    def forward(self, f0, f1, embt):\n        h, w = f0.shape[2:]\n        embt = embt.repeat(1, 1, h, w)\n        out = self.convblock(torch.cat([f0, f1, embt], 1))\n        flow0, flow1 = torch.chunk(out[:, :4, ...], 2, 1)\n        ft_ = out[:, 4:, ...]\n        return flow0, flow1, ft_\n\n\nclass IntermediateDecoder(nn.Module):\n    def __init__(self, in_ch, out_ch, skip_ch) -> None:\n        super().__init__()\n        self.convblock = nn.Sequential(\n            convrelu(in_ch * 3 + 4, in_ch * 3),\n            ResBlock(in_ch * 3, skip_ch),\n            nn.ConvTranspose2d(in_ch * 3, out_ch + 4, 4, 2, 1, bias=True),\n        )\n\n    def forward(self, ft_, f0, f1, flow0_in, flow1_in):\n        f0_warp = warp(f0, flow0_in)\n        f1_warp = warp(f1, 
flow1_in)\n        f_in = torch.cat([ft_, f0_warp, f1_warp, flow0_in, flow1_in], 1)\n        out = self.convblock(f_in)\n        flow0, flow1 = torch.chunk(out[:, :4, ...], 2, 1)\n        ft_ = out[:, 4:, ...]\n        flow0 = flow0 + 2.0 * resize(flow0_in, scale_factor=2.0)\n        flow1 = flow1 + 2.0 * resize(flow1_in, scale_factor=2.0)\n        return flow0, flow1, ft_\n"
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/networks/blocks/multi_flow.py",
    "content": "import torch\nimport torch.nn as nn\n\nfrom tools.frame_interpolation.utils.flow_utils import warp\n\nfrom .ifrnet import ResBlock, convrelu, resize\n\n\ndef multi_flow_combine(comb_block, img0, img1, flow0, flow1, mask=None, img_res=None, mean=None):\n    \"\"\"\n    A parallel implementation of multiple flow field warping\n    comb_block: An nn.Sequential object.\n    img shape: [b, c, h, w]\n    flow shape: [b, 2*num_flows, h, w]\n    mask (opt):\n        If 'mask' is None, the function conducts a simple average.\n    img_res (opt):\n        If 'img_res' is None, the function adds zero instead.\n    mean (opt):\n        If 'mean' is None, the function adds zero instead.\n    \"\"\"\n    b, c, h, w = flow0.shape\n    num_flows = c // 2\n    flow0 = flow0.reshape(b, num_flows, 2, h, w).reshape(-1, 2, h, w)\n    flow1 = flow1.reshape(b, num_flows, 2, h, w).reshape(-1, 2, h, w)\n\n    mask = mask.reshape(b, num_flows, 1, h, w).reshape(-1, 1, h, w) if mask is not None else 0.5\n    img_res = img_res.reshape(b, num_flows, 3, h, w).reshape(-1, 3, h, w) if img_res is not None else 0\n    img0 = torch.stack([img0] * num_flows, 1).reshape(-1, 3, h, w)\n    img1 = torch.stack([img1] * num_flows, 1).reshape(-1, 3, h, w)\n    mean = torch.stack([mean] * num_flows, 1).reshape(-1, 1, 1, 1) if mean is not None else 0\n\n    img0_warp = warp(img0, flow0)\n    img1_warp = warp(img1, flow1)\n    img_warps = mask * img0_warp + (1 - mask) * img1_warp + mean + img_res\n    img_warps = img_warps.reshape(b, num_flows, 3, h, w)\n    imgt_pred = img_warps.mean(1) + comb_block(img_warps.view(b, -1, h, w))\n    return imgt_pred\n\n\nclass MultiFlowDecoder(nn.Module):\n    def __init__(self, in_ch, skip_ch, num_flows=3):\n        super(MultiFlowDecoder, self).__init__()\n        self.num_flows = num_flows\n        self.convblock = nn.Sequential(\n            convrelu(in_ch * 3 + 4, in_ch * 3),\n            ResBlock(in_ch * 3, skip_ch),\n            nn.ConvTranspose2d(in_ch * 
3, 8 * num_flows, 4, 2, 1, bias=True),\n        )\n\n    def forward(self, ft_, f0, f1, flow0, flow1):\n        n = self.num_flows\n        f0_warp = warp(f0, flow0)\n        f1_warp = warp(f1, flow1)\n        out = self.convblock(torch.cat([ft_, f0_warp, f1_warp, flow0, flow1], 1))\n        delta_flow0, delta_flow1, mask, img_res = torch.split(out, [2 * n, 2 * n, n, 3 * n], 1)\n        mask = torch.sigmoid(mask)\n\n        flow0 = delta_flow0 + 2.0 * resize(flow0, scale_factor=2.0).repeat(1, self.num_flows, 1, 1)\n        flow1 = delta_flow1 + 2.0 * resize(flow1, scale_factor=2.0).repeat(1, self.num_flows, 1, 1)\n\n        return flow0, flow1, mask, img_res\n"
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/networks/blocks/raft.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\ndef resize(x, scale_factor):\n    return F.interpolate(x, scale_factor=scale_factor, mode=\"bilinear\", align_corners=False)\n\n\ndef bilinear_sampler(img, coords, mask=False):\n    \"\"\"Wrapper for grid_sample, uses pixel coordinates\"\"\"\n    H, W = img.shape[-2:]\n    xgrid, ygrid = coords.split([1, 1], dim=-1)\n    xgrid = 2 * xgrid / (W - 1) - 1\n    ygrid = 2 * ygrid / (H - 1) - 1\n\n    grid = torch.cat([xgrid, ygrid], dim=-1)\n    img = F.grid_sample(img, grid, align_corners=True)\n\n    if mask:\n        mask = (xgrid > -1) & (ygrid > -1) & (xgrid < 1) & (ygrid < 1)\n        return img, mask.float()\n\n    return img\n\n\ndef coords_grid(batch, ht, wd, device):\n    coords = torch.meshgrid(torch.arange(ht, device=device), torch.arange(wd, device=device), indexing=\"ij\")\n    coords = torch.stack(coords[::-1], dim=0).float()\n    return coords[None].repeat(batch, 1, 1, 1)\n\n\nclass SmallUpdateBlock(nn.Module):\n    def __init__(self, cdim, hidden_dim, flow_dim, corr_dim, fc_dim, corr_levels=4, radius=3, scale_factor=None):\n        super(SmallUpdateBlock, self).__init__()\n        cor_planes = corr_levels * (2 * radius + 1) ** 2\n        self.scale_factor = scale_factor\n\n        self.convc1 = nn.Conv2d(2 * cor_planes, corr_dim, 1, padding=0)\n        self.convf1 = nn.Conv2d(4, flow_dim * 2, 7, padding=3)\n        self.convf2 = nn.Conv2d(flow_dim * 2, flow_dim, 3, padding=1)\n        self.conv = nn.Conv2d(corr_dim + flow_dim, fc_dim, 3, padding=1)\n\n        self.gru = nn.Sequential(\n            nn.Conv2d(fc_dim + 4 + cdim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n        )\n\n        self.feat_head = nn.Sequential(\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            
nn.Conv2d(hidden_dim, cdim, 3, padding=1),\n        )\n\n        self.flow_head = nn.Sequential(\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, 4, 3, padding=1),\n        )\n\n        self.lrelu = nn.LeakyReLU(negative_slope=0.1, inplace=True)\n\n    def forward(self, net, flow, corr):\n        net = resize(net, 1 / self.scale_factor) if self.scale_factor is not None else net\n        cor = self.lrelu(self.convc1(corr))\n        flo = self.lrelu(self.convf1(flow))\n        flo = self.lrelu(self.convf2(flo))\n        cor_flo = torch.cat([cor, flo], dim=1)\n        inp = self.lrelu(self.conv(cor_flo))\n        inp = torch.cat([inp, flow, net], dim=1)\n\n        out = self.gru(inp)\n        delta_net = self.feat_head(out)\n        delta_flow = self.flow_head(out)\n\n        if self.scale_factor is not None:\n            delta_net = resize(delta_net, scale_factor=self.scale_factor)\n            delta_flow = self.scale_factor * resize(delta_flow, scale_factor=self.scale_factor)\n\n        return delta_net, delta_flow\n\n\nclass BasicUpdateBlock(nn.Module):\n    def __init__(\n        self,\n        cdim,\n        hidden_dim,\n        flow_dim,\n        corr_dim,\n        corr_dim2,\n        fc_dim,\n        corr_levels=4,\n        radius=3,\n        scale_factor=None,\n        out_num=1,\n    ):\n        super(BasicUpdateBlock, self).__init__()\n        cor_planes = corr_levels * (2 * radius + 1) ** 2\n\n        self.scale_factor = scale_factor\n        self.convc1 = nn.Conv2d(2 * cor_planes, corr_dim, 1, padding=0)\n        self.convc2 = nn.Conv2d(corr_dim, corr_dim2, 3, padding=1)\n        self.convf1 = nn.Conv2d(4, flow_dim * 2, 7, padding=3)\n        self.convf2 = nn.Conv2d(flow_dim * 2, flow_dim, 3, padding=1)\n        self.conv = nn.Conv2d(flow_dim + corr_dim2, fc_dim, 3, padding=1)\n\n        self.gru = nn.Sequential(\n            nn.Conv2d(fc_dim + 4 + 
cdim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n        )\n\n        self.feat_head = nn.Sequential(\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, cdim, 3, padding=1),\n        )\n\n        self.flow_head = nn.Sequential(\n            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1),\n            nn.LeakyReLU(negative_slope=0.1, inplace=True),\n            nn.Conv2d(hidden_dim, 4 * out_num, 3, padding=1),\n        )\n\n        self.lrelu = nn.LeakyReLU(negative_slope=0.1, inplace=True)\n\n    def forward(self, net, flow, corr):\n        net = resize(net, 1 / self.scale_factor) if self.scale_factor is not None else net\n        cor = self.lrelu(self.convc1(corr))\n        cor = self.lrelu(self.convc2(cor))\n        flo = self.lrelu(self.convf1(flow))\n        flo = self.lrelu(self.convf2(flo))\n        cor_flo = torch.cat([cor, flo], dim=1)\n        inp = self.lrelu(self.conv(cor_flo))\n        inp = torch.cat([inp, flow, net], dim=1)\n\n        out = self.gru(inp)\n        delta_net = self.feat_head(out)\n        delta_flow = self.flow_head(out)\n\n        if self.scale_factor is not None:\n            delta_net = resize(delta_net, scale_factor=self.scale_factor)\n            delta_flow = self.scale_factor * resize(delta_flow, scale_factor=self.scale_factor)\n        return delta_net, delta_flow\n\n\nclass BidirCorrBlock:\n    def __init__(self, fmap1, fmap2, num_levels=4, radius=4):\n        self.num_levels = num_levels\n        self.radius = radius\n        self.corr_pyramid = []\n        self.corr_pyramid_T = []\n\n        corr = BidirCorrBlock.corr(fmap1, fmap2)\n        batch, h1, w1, dim, h2, w2 = corr.shape\n        corr_T = corr.clone().permute(0, 4, 5, 3, 1, 2)\n\n        corr = corr.reshape(batch * h1 * w1, dim, h2, w2)\n        corr_T = 
corr_T.reshape(batch * h2 * w2, dim, h1, w1)\n\n        self.corr_pyramid.append(corr)\n        self.corr_pyramid_T.append(corr_T)\n\n        for _ in range(self.num_levels - 1):\n            corr = F.avg_pool2d(corr, 2, stride=2)\n            corr_T = F.avg_pool2d(corr_T, 2, stride=2)\n            self.corr_pyramid.append(corr)\n            self.corr_pyramid_T.append(corr_T)\n\n    def __call__(self, coords0, coords1):\n        r = self.radius\n        coords0 = coords0.permute(0, 2, 3, 1)\n        coords1 = coords1.permute(0, 2, 3, 1)\n        assert coords0.shape == coords1.shape, f\"coords0 shape: [{coords0.shape}] is not equal to [{coords1.shape}]\"\n        batch, h1, w1, _ = coords0.shape\n\n        out_pyramid = []\n        out_pyramid_T = []\n        for i in range(self.num_levels):\n            corr = self.corr_pyramid[i]\n            corr_T = self.corr_pyramid_T[i]\n\n            dx = torch.linspace(-r, r, 2 * r + 1, device=coords0.device)\n            dy = torch.linspace(-r, r, 2 * r + 1, device=coords0.device)\n            delta = torch.stack(torch.meshgrid(dy, dx, indexing=\"ij\"), axis=-1)\n            delta_lvl = delta.view(1, 2 * r + 1, 2 * r + 1, 2)\n\n            centroid_lvl_0 = coords0.reshape(batch * h1 * w1, 1, 1, 2) / 2**i\n            centroid_lvl_1 = coords1.reshape(batch * h1 * w1, 1, 1, 2) / 2**i\n            coords_lvl_0 = centroid_lvl_0 + delta_lvl\n            coords_lvl_1 = centroid_lvl_1 + delta_lvl\n\n            corr = bilinear_sampler(corr, coords_lvl_0)\n            corr_T = bilinear_sampler(corr_T, coords_lvl_1)\n            corr = corr.view(batch, h1, w1, -1)\n            corr_T = corr_T.view(batch, h1, w1, -1)\n            out_pyramid.append(corr)\n            out_pyramid_T.append(corr_T)\n\n        out = torch.cat(out_pyramid, dim=-1)\n        out_T = torch.cat(out_pyramid_T, dim=-1)\n        return out.permute(0, 3, 1, 2).contiguous().float(), out_T.permute(0, 3, 1, 2).contiguous().float()\n\n    @staticmethod\n    def 
corr(fmap1, fmap2):\n        batch, dim, ht, wd = fmap1.shape\n        fmap1 = fmap1.view(batch, dim, ht * wd)\n        fmap2 = fmap2.view(batch, dim, ht * wd)\n\n        corr = torch.matmul(fmap1.transpose(1, 2), fmap2)\n        corr = corr.view(batch, ht, wd, 1, ht, wd)\n        return corr / torch.sqrt(torch.tensor(dim).float())\n"
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/utils/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/utils/dist_utils.py",
    "content": "import os\n\nimport torch\n\n\ndef get_world_size():\n    \"\"\"Find OMPI world size without calling mpi functions\n    :rtype: int\n    \"\"\"\n    if os.environ.get(\"PMI_SIZE\") is not None:\n        return int(os.environ.get(\"PMI_SIZE\") or 1)\n    elif os.environ.get(\"OMPI_COMM_WORLD_SIZE\") is not None:\n        return int(os.environ.get(\"OMPI_COMM_WORLD_SIZE\") or 1)\n    else:\n        return torch.cuda.device_count()\n\n\ndef get_global_rank():\n    \"\"\"Find OMPI world rank without calling mpi functions\n    :rtype: int\n    \"\"\"\n    if os.environ.get(\"PMI_RANK\") is not None:\n        return int(os.environ.get(\"PMI_RANK\") or 0)\n    elif os.environ.get(\"OMPI_COMM_WORLD_RANK\") is not None:\n        return int(os.environ.get(\"OMPI_COMM_WORLD_RANK\") or 0)\n    else:\n        return 0\n\n\ndef get_local_rank():\n    \"\"\"Find OMPI local rank without calling mpi functions\n    :rtype: int\n    \"\"\"\n    if os.environ.get(\"MPI_LOCALRANKID\") is not None:\n        return int(os.environ.get(\"MPI_LOCALRANKID\") or 0)\n    elif os.environ.get(\"OMPI_COMM_WORLD_LOCAL_RANK\") is not None:\n        return int(os.environ.get(\"OMPI_COMM_WORLD_LOCAL_RANK\") or 0)\n    else:\n        return 0\n\n\ndef get_master_ip():\n    if os.environ.get(\"AZ_BATCH_MASTER_NODE\") is not None:\n        return os.environ.get(\"AZ_BATCH_MASTER_NODE\").split(\":\")[0]\n    elif os.environ.get(\"AZ_BATCHAI_MPI_MASTER_NODE\") is not None:\n        return os.environ.get(\"AZ_BATCHAI_MPI_MASTER_NODE\")\n    else:\n        return \"127.0.0.1\"\n"
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/utils/flow_utils.py",
    "content": "import numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom PIL import ImageFile\n\nImageFile.LOAD_TRUNCATED_IMAGES = True\n\n\ndef warp(img, flow):\n    B, _, H, W = flow.shape\n    xx = torch.linspace(-1.0, 1.0, W).view(1, 1, 1, W).expand(B, -1, H, -1)\n    yy = torch.linspace(-1.0, 1.0, H).view(1, 1, H, 1).expand(B, -1, -1, W)\n    grid = torch.cat([xx, yy], 1).to(img)\n    flow_ = torch.cat([flow[:, 0:1, :, :] / ((W - 1.0) / 2.0), flow[:, 1:2, :, :] / ((H - 1.0) / 2.0)], 1)\n    grid_ = (grid + flow_).permute(0, 2, 3, 1)\n    output = F.grid_sample(input=img, grid=grid_, mode=\"bilinear\", padding_mode=\"border\", align_corners=True)\n    return output\n\n\ndef make_colorwheel():\n    \"\"\"\n    Generates a color wheel for optical flow visualization as presented in:\n        Baker et al. \"A Database and Evaluation Methodology for Optical Flow\" (ICCV, 2007)\n        URL: http://vision.middlebury.edu/flow/flowEval-iccv07.pdf\n    Code follows the original C++ source code of Daniel Scharstein.\n    Code follows the Matlab source code of Deqing Sun.\n    Returns:\n        np.ndarray: Color wheel\n    \"\"\"\n\n    RY = 15\n    YG = 6\n    GC = 4\n    CB = 11\n    BM = 13\n    MR = 6\n\n    ncols = RY + YG + GC + CB + BM + MR\n    colorwheel = np.zeros((ncols, 3))\n    col = 0\n\n    # RY\n    colorwheel[0:RY, 0] = 255\n    colorwheel[0:RY, 1] = np.floor(255 * np.arange(0, RY) / RY)\n    col = col + RY\n    # YG\n    colorwheel[col : col + YG, 0] = 255 - np.floor(255 * np.arange(0, YG) / YG)\n    colorwheel[col : col + YG, 1] = 255\n    col = col + YG\n    # GC\n    colorwheel[col : col + GC, 1] = 255\n    colorwheel[col : col + GC, 2] = np.floor(255 * np.arange(0, GC) / GC)\n    col = col + GC\n    # CB\n    colorwheel[col : col + CB, 1] = 255 - np.floor(255 * np.arange(CB) / CB)\n    colorwheel[col : col + CB, 2] = 255\n    col = col + CB\n    # BM\n    colorwheel[col : col + BM, 2] = 255\n    colorwheel[col : col + BM, 0] = 
np.floor(255 * np.arange(0, BM) / BM)\n    col = col + BM\n    # MR\n    colorwheel[col : col + MR, 2] = 255 - np.floor(255 * np.arange(MR) / MR)\n    colorwheel[col : col + MR, 0] = 255\n    return colorwheel\n\n\ndef flow_uv_to_colors(u, v, convert_to_bgr=False):\n    \"\"\"\n    Applies the flow color wheel to (possibly clipped) flow components u and v.\n    According to the C++ source code of Daniel Scharstein\n    According to the Matlab source code of Deqing Sun\n    Args:\n        u (np.ndarray): Input horizontal flow of shape [H,W]\n        v (np.ndarray): Input vertical flow of shape [H,W]\n        convert_to_bgr (bool, optional): Convert output image to BGR. Defaults to False.\n    Returns:\n        np.ndarray: Flow visualization image of shape [H,W,3]\n    \"\"\"\n    flow_image = np.zeros((u.shape[0], u.shape[1], 3), np.uint8)\n    colorwheel = make_colorwheel()  # shape [55x3]\n    ncols = colorwheel.shape[0]\n    rad = np.sqrt(np.square(u) + np.square(v))\n    a = np.arctan2(-v, -u) / np.pi\n    fk = (a + 1) / 2 * (ncols - 1)\n    k0 = np.floor(fk).astype(np.int32)\n    k1 = k0 + 1\n    k1[k1 == ncols] = 0\n    f = fk - k0\n    for i in range(colorwheel.shape[1]):\n        tmp = colorwheel[:, i]\n        col0 = tmp[k0] / 255.0\n        col1 = tmp[k1] / 255.0\n        col = (1 - f) * col0 + f * col1\n        idx = rad <= 1\n        col[idx] = 1 - rad[idx] * (1 - col[idx])\n        col[~idx] = col[~idx] * 0.75  # out of range\n        # Note the 2-i => BGR instead of RGB\n        ch_idx = 2 - i if convert_to_bgr else i\n        flow_image[:, :, ch_idx] = np.floor(255 * col)\n    return flow_image\n\n\ndef flow_to_image(flow_uv, clip_flow=None, convert_to_bgr=False):\n    \"\"\"\n    Expects a two-dimensional flow image of shape [H,W,2].\n    Args:\n        flow_uv (np.ndarray): Flow UV image of shape [H,W,2]\n        clip_flow (float, optional): Clip maximum of flow values. Defaults to None.\n        convert_to_bgr (bool, optional): Convert output image to BGR. 
Defaults to False.\n    Returns:\n        np.ndarray: Flow visualization image of shape [H,W,3]\n    \"\"\"\n    assert flow_uv.ndim == 3, \"input flow must have three dimensions\"\n    assert flow_uv.shape[2] == 2, \"input flow must have shape [H,W,2]\"\n    if clip_flow is not None:\n        flow_uv = np.clip(flow_uv, 0, clip_flow)\n    u = flow_uv[:, :, 0]\n    v = flow_uv[:, :, 1]\n    rad = np.sqrt(np.square(u) + np.square(v))\n    rad_max = np.max(rad)\n    epsilon = 1e-5\n    u = u / (rad_max + epsilon)\n    v = v / (rad_max + epsilon)\n    return flow_uv_to_colors(u, v, convert_to_bgr)\n"
  },
  {
    "path": "Open-Sora/tools/frame_interpolation/utils/utils.py",
    "content": "import random\nimport re\nimport sys\n\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom imageio import imread, imwrite\nfrom PIL import ImageFile\n\nImageFile.LOAD_TRUNCATED_IMAGES = True\n\n\nclass AverageMeter:\n    def __init__(self):\n        self.reset()\n\n    def reset(self):\n        self.val = 0.0\n        self.avg = 0.0\n        self.sum = 0.0\n        self.count = 0\n\n    def update(self, val, n=1):\n        self.val = val\n        self.sum += val * n\n        self.count += n\n        self.avg = self.sum / self.count\n\n\nclass AverageMeterGroups:\n    def __init__(self) -> None:\n        self.meter_dict = dict()\n\n    def update(self, dict, n=1):\n        for name, val in dict.items():\n            if self.meter_dict.get(name) is None:\n                self.meter_dict[name] = AverageMeter()\n            self.meter_dict[name].update(val, n)\n\n    def reset(self, name=None):\n        if name is None:\n            for v in self.meter_dict.values():\n                v.reset()\n        else:\n            meter = self.meter_dict.get(name)\n            if meter is not None:\n                meter.reset()\n\n    def avg(self, name):\n        meter = self.meter_dict.get(name)\n        if meter is not None:\n            return meter.avg\n\n\nclass InputPadder:\n    \"\"\"Pads images such that dimensions are divisible by divisor\"\"\"\n\n    def __init__(self, dims, divisor=16):\n        self.ht, self.wd = dims[-2:]\n        pad_ht = (((self.ht // divisor) + 1) * divisor - self.ht) % divisor\n        pad_wd = (((self.wd // divisor) + 1) * divisor - self.wd) % divisor\n        self._pad = [pad_wd // 2, pad_wd - pad_wd // 2, pad_ht // 2, pad_ht - pad_ht // 2]\n\n    def pad(self, *inputs):\n        if len(inputs) == 1:\n            return F.pad(inputs[0], self._pad, mode=\"replicate\")\n        else:\n            return [F.pad(x, self._pad, mode=\"replicate\") for x in inputs]\n\n    def unpad(self, *inputs):\n        if 
len(inputs) == 1:\n            return self._unpad(inputs[0])\n        else:\n            return [self._unpad(x) for x in inputs]\n\n    def _unpad(self, x):\n        ht, wd = x.shape[-2:]\n        c = [self._pad[2], ht - self._pad[3], self._pad[0], wd - self._pad[1]]\n        return x[..., c[0] : c[1], c[2] : c[3]]\n\n\ndef img2tensor(img):\n    if img.shape[-1] > 3:\n        img = img[:, :, :3]\n    return torch.tensor(img).permute(2, 0, 1).unsqueeze(0) / 255.0\n\n\ndef tensor2img(img_t):\n    return (img_t * 255.0).detach().squeeze(0).permute(1, 2, 0).cpu().numpy().clip(0, 255).astype(np.uint8)\n\n\ndef seed_all(seed):\n    random.seed(seed)\n    np.random.seed(seed)\n    torch.manual_seed(seed)\n    torch.cuda.manual_seed_all(seed)\n\n\ndef read(file):\n    if file.endswith(\".float3\"):\n        return readFloat(file)\n    elif file.endswith(\".flo\"):\n        return readFlow(file)\n    elif file.endswith(\".ppm\"):\n        return readImage(file)\n    elif file.endswith(\".pgm\"):\n        return readImage(file)\n    elif file.endswith(\".png\"):\n        return readImage(file)\n    elif file.endswith(\".jpg\"):\n        return readImage(file)\n    elif file.endswith(\".pfm\"):\n        return readPFM(file)[0]\n    else:\n        raise Exception(\"don't know how to read %s\" % file)\n\n\ndef write(file, data):\n    if file.endswith(\".float3\"):\n        return writeFloat(file, data)\n    elif file.endswith(\".flo\"):\n        return writeFlow(file, data)\n    elif file.endswith(\".ppm\"):\n        return writeImage(file, data)\n    elif file.endswith(\".pgm\"):\n        return writeImage(file, data)\n    elif file.endswith(\".png\"):\n        return writeImage(file, data)\n    elif file.endswith(\".jpg\"):\n        return writeImage(file, data)\n    elif file.endswith(\".pfm\"):\n        return writePFM(file, data)\n    else:\n        raise Exception(\"don't know how to write %s\" % file)\n\n\ndef readPFM(file):\n    file = open(file, \"rb\")\n\n    color = 
None\n    width = None\n    height = None\n    scale = None\n    endian = None\n\n    header = file.readline().rstrip()\n    if header.decode(\"ascii\") == \"PF\":\n        color = True\n    elif header.decode(\"ascii\") == \"Pf\":\n        color = False\n    else:\n        raise Exception(\"Not a PFM file.\")\n\n    dim_match = re.match(r\"^(\\d+)\\s(\\d+)\\s$\", file.readline().decode(\"ascii\"))\n    if dim_match:\n        width, height = list(map(int, dim_match.groups()))\n    else:\n        raise Exception(\"Malformed PFM header.\")\n\n    scale = float(file.readline().decode(\"ascii\").rstrip())\n    if scale < 0:\n        endian = \"<\"\n        scale = -scale\n    else:\n        endian = \">\"\n\n    data = np.fromfile(file, endian + \"f\")\n    shape = (height, width, 3) if color else (height, width)\n\n    data = np.reshape(data, shape)\n    data = np.flipud(data)\n    return data, scale\n\n\ndef writePFM(file, image, scale=1):\n    file = open(file, \"wb\")\n\n    color = None\n\n    if image.dtype.name != \"float32\":\n        raise Exception(\"Image dtype must be float32.\")\n\n    image = np.flipud(image)\n\n    if len(image.shape) == 3 and image.shape[2] == 3:\n        color = True\n    elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1:\n        color = False\n    else:\n        raise Exception(\"Image must have H x W x 3, H x W x 1 or H x W dimensions.\")\n\n    # the file is opened in binary mode, so the header must be bytes in both the color and grayscale cases\n    file.write((\"PF\\n\" if color else \"Pf\\n\").encode())\n    file.write(\"%d %d\\n\".encode() % (image.shape[1], image.shape[0]))\n\n    endian = image.dtype.byteorder\n\n    if endian == \"<\" or endian == \"=\" and sys.byteorder == \"little\":\n        scale = -scale\n\n    file.write(\"%f\\n\".encode() % scale)\n\n    image.tofile(file)\n\n\ndef readFlow(name):\n    if name.endswith(\".pfm\") or name.endswith(\".PFM\"):\n        return readPFM(name)[0][:, :, 0:2]\n\n    f = open(name, \"rb\")\n\n    header = f.read(4)\n    if header.decode(\"utf-8\") != 
\"PIEH\":\n        raise Exception(\"Flow file header does not contain PIEH\")\n\n    width = np.fromfile(f, np.int32, 1).squeeze()\n    height = np.fromfile(f, np.int32, 1).squeeze()\n\n    flow = np.fromfile(f, np.float32, width * height * 2).reshape((height, width, 2))\n\n    return flow.astype(np.float32)\n\n\ndef readImage(name):\n    if name.endswith(\".pfm\") or name.endswith(\".PFM\"):\n        data = readPFM(name)[0]\n        if len(data.shape) == 3:\n            return data[:, :, 0:3]\n        else:\n            return data\n    return imread(name)\n\n\ndef writeImage(name, data):\n    if name.endswith(\".pfm\") or name.endswith(\".PFM\"):\n        return writePFM(name, data, 1)\n    return imwrite(name, data)\n\n\ndef writeFlow(name, flow):\n    f = open(name, \"wb\")\n    f.write(\"PIEH\".encode(\"utf-8\"))\n    np.array([flow.shape[1], flow.shape[0]], dtype=np.int32).tofile(f)\n    flow = flow.astype(np.float32)\n    flow.tofile(f)\n\n\ndef readFloat(name):\n    f = open(name, \"rb\")\n\n    if (f.readline().decode(\"utf-8\")) != \"float\\n\":\n        raise Exception(\"float file %s did not contain <float> keyword\" % name)\n\n    dim = int(f.readline())\n\n    dims = []\n    count = 1\n    for i in range(0, dim):\n        d = int(f.readline())\n        dims.append(d)\n        count *= d\n\n    dims = list(reversed(dims))\n\n    data = np.fromfile(f, np.float32, count).reshape(dims)\n    if dim > 2:\n        data = np.transpose(data, (2, 1, 0))\n        data = np.transpose(data, (1, 0, 2))\n\n    return data\n\n\ndef writeFloat(name, data):\n    f = open(name, \"wb\")\n\n    dim = len(data.shape)\n    if dim > 3:\n        raise Exception(\"bad float file dimension: %d\" % dim)\n\n    f.write((\"float\\n\").encode(\"ascii\"))\n    f.write((\"%d\\n\" % dim).encode(\"ascii\"))\n\n    if dim == 1:\n        f.write((\"%d\\n\" % data.shape[0]).encode(\"ascii\"))\n    else:\n        f.write((\"%d\\n\" % data.shape[1]).encode(\"ascii\"))\n        
f.write((\"%d\\n\" % data.shape[0]).encode(\"ascii\"))\n        for i in range(2, dim):\n            f.write((\"%d\\n\" % data.shape[i]).encode(\"ascii\"))\n\n    data = data.astype(np.float32)\n    if dim == 2:\n        data.tofile(f)\n\n    else:\n        np.transpose(data, (2, 0, 1)).tofile(f)\n\n\ndef check_dim_and_resize(tensor_list):\n    shape_list = []\n    for t in tensor_list:\n        shape_list.append(t.shape[2:])\n\n    if len(set(shape_list)) > 1:\n        desired_shape = shape_list[0]\n        print(f\"Inconsistent size of input video frames. All frames will be resized to {desired_shape}\")\n\n        resize_tensor_list = []\n        for t in tensor_list:\n            resize_tensor_list.append(torch.nn.functional.interpolate(t, size=tuple(desired_shape), mode=\"bilinear\"))\n\n        tensor_list = resize_tensor_list\n\n    return tensor_list\n"
  },
  {
    "path": "Open-Sora/tools/scene_cut/README.md",
    "content": "# Scene Detection and Video Splitting\n\n- [Scene Detection and Video Splitting](#scene-detection-and-video-splitting)\n    - [Prepare Meta Files](#prepare-meta-files)\n    - [Scene Detection](#scene-detection)\n    - [Video Splitting](#video-splitting)\n\nRaw videos often contain several scenes and are too long for training, so it is essential to split them into shorter\nclips based on scenes. Here, we provide code for scene detection and video splitting.\n\n## Prepare Meta Files\nAt this step, you should have a raw video dataset prepared. Data processing requires a meta file describing the dataset. To create a meta file from a folder, run:\n\n```bash\npython -m tools.datasets.convert video /path/to/video/folder --output /path/to/save/meta.csv\n```\nThis should output a `.csv` file with column `path`.\n\nIf you already have a meta file for the videos and want to keep its information,\n**make sure** the meta file has column `id`, which is the id of each video, and that each video is named `{id}.mp4`.\nThe following command will add a new column `path` to the meta file.\n\n```bash\npython tools/scene_cut/convert_id_to_path.py /path/to/meta.csv --folder_path /path/to/video/folder\n```\nThis should output\n- `{prefix}_path-filtered.csv` with column `path` (broken videos filtered)\n- `{prefix}_path_intact.csv` with columns `path` and `intact` (`intact` indicates whether a video is intact)\n\nNote that videos are only checked when `--mode` is given: `--mode .mp4` tries to open each video, and `--mode .json` checks the `success` flag in `{id}.json`. Without `--mode`, every row is marked as intact.\n\n\n## Scene Detection\n\nInstall the required dependencies by following the \"Data Dependencies\" and \"Scene Detection\" sections of our [installation instructions](../../docs/installation.md).\n\n<!-- The next step is to detect scenes in a video.\nWe use [`PySceneDetect`](https://github.com/Breakthrough/PySceneDetect) for this job.\n```bash\npip install scenedetect[opencv] --upgrade\n``` -->\n\n**Make sure** the input meta file has column `path`, which is the path of a video.\n\n```bash\npython tools/scene_cut/scene_detect.py 
/path/to/meta.csv\n```\nThe output is `{prefix}_timestamp.csv` with column `timestamp`. Each cell in column `timestamp` is a list of tuples,\nwith each tuple indicating the start and end timestamp of a scene\n(e.g., `[('00:00:01.234', '00:00:02.345'), ('00:00:03.456', '00:00:04.567')]`).\n\n## Video Splitting\nAfter obtaining timestamps for scenes, we conduct video splitting (cutting).\n**Make sure** the meta file contains column `timestamp`.\n\n```bash\npython tools/scene_cut/cut.py /path/to/meta.csv --save_dir /path/to/output/dir\n```\n\nThis will save video clips to `/path/to/output/dir`. The video clips are named as `{video_id}_scene-{scene_id}.mp4`\n\nTo create a new meta file for the generated clips, run:\n```bash\npython -m tools.datasets.convert video /path/to/video/folder --output /path/to/save/meta.csv\n```\n"
  },
  {
    "path": "Open-Sora/tools/scene_cut/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/scene_cut/convert_id_to_path.py",
    "content": "import argparse\nimport json\nimport os\nfrom functools import partial\n\nimport cv2\nimport numpy as np\nimport pandas as pd\nfrom mmengine.logging import print_log\nfrom moviepy.editor import VideoFileClip\nfrom pandarallel import pandarallel\nfrom tqdm import tqdm\n\ntqdm.pandas()\n\n\ndef is_intact_video(video_path, mode=\"moviepy\", verbose=False, logger=None):\n    if not os.path.exists(video_path):\n        if verbose:\n            print_log(f\"Could not find '{video_path}'\", logger=logger)\n        return False\n\n    if mode == \"moviepy\":\n        try:\n            VideoFileClip(video_path)\n            if verbose:\n                print_log(f\"The video file '{video_path}' is intact.\", logger=logger)\n            return True\n        except Exception as e:\n            if verbose:\n                print_log(f\"Error: {e}\", logger=logger)\n                print_log(f\"The video file '{video_path}' is not intact.\", logger=logger)\n            return False\n    elif mode == \"cv2\":\n        try:\n            cap = cv2.VideoCapture(video_path)\n            if cap.isOpened():\n                if verbose:\n                    print_log(f\"The video file '{video_path}' is intact.\", logger=logger)\n                return True\n        except Exception as e:\n            if verbose:\n                print_log(f\"Error: {e}\", logger=logger)\n                print_log(f\"The video file '{video_path}' is not intact.\", logger=logger)\n            return False\n    else:\n        raise ValueError\n\n\ndef has_downloaded_success(json_path):\n    if not os.path.exists(json_path):\n        return False\n\n    try:\n        with open(json_path, \"r\") as f:\n            data = json.load(f)\n            if \"success\" not in data or isinstance(data[\"success\"], bool) is False or data[\"success\"] is False:\n                return False\n    except Exception:\n        return False\n\n    return True\n\n\ndef parse_args():\n    parser = 
argparse.ArgumentParser()\n    parser.add_argument(\"meta_path\", type=str)\n    parser.add_argument(\"--folder_path\", type=str, required=True)\n    parser.add_argument(\"--mode\", type=str, default=None)\n    parser.add_argument(\"--num_workers\", type=int, default=None, help=\"#workers for pandarallel\")\n\n    args = parser.parse_args()\n    return args\n\n\ndef main():\n    args = parse_args()\n\n    meta_path = args.meta_path\n    folder_path = args.folder_path\n    mode = args.mode\n\n    def is_intact(row, mode=None):\n        video_id = row[\"id\"]\n        video_path = os.path.join(folder_path, f\"{video_id}.mp4\")\n        row[\"path\"] = video_path\n\n        if mode == \".mp4\":\n            if is_intact_video(video_path):\n                return True, video_path\n            return False, video_path\n        elif mode == \".json\":\n            # json_path = os.path.join(root_raw, f\"data/{split}/{video_id}.json\")\n            json_path = os.path.join(folder_path, f\"{video_id}.json\")\n            if has_downloaded_success(json_path):\n                return True, video_path\n            return False, video_path\n        elif mode is None:\n            return True, video_path\n        else:\n            raise ValueError\n\n    meta_dirpath = os.path.dirname(meta_path)\n    meta_fname = os.path.basename(meta_path)\n    wo_ext, ext = os.path.splitext(meta_fname)\n\n    if args.num_workers is not None:\n        pandarallel.initialize(progress_bar=True, nb_workers=args.num_workers)\n    else:\n        pandarallel.initialize(progress_bar=True)\n    is_intact_partial = partial(is_intact, mode=mode)\n\n    meta = pd.read_csv(meta_path)\n    ret = meta.parallel_apply(is_intact_partial, axis=1)\n    intact, paths = list(zip(*ret))\n\n    meta[\"intact\"] = intact\n    meta[\"path\"] = paths\n    out_path = os.path.join(meta_dirpath, f\"{wo_ext}_path_intact.csv\")\n    meta.to_csv(out_path, index=False)\n    print(f\"New meta (shape={meta.shape}) with intact 
info saved to '{out_path}'\")\n\n    # drop() returns a new frame, so we avoid modifying a filtered slice of `meta` in place\n    meta_format = meta[np.array(intact)].drop(\"intact\", axis=1)\n    out_path = os.path.join(meta_dirpath, f\"{wo_ext}_path-filtered.csv\")\n    meta_format.to_csv(out_path, index=False)\n    print(f\"New meta (shape={meta_format.shape}) with format info saved to '{out_path}'\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/tools/scene_cut/cut.py",
    "content": "import cv2  # isort:skip\n\nimport argparse\nimport ast\nimport os\nimport subprocess\nfrom functools import partial\n\nimport pandas as pd\nfrom imageio_ffmpeg import get_ffmpeg_exe\nfrom pandarallel import pandarallel\nfrom scenedetect import FrameTimecode\nfrom tqdm import tqdm\n\ntqdm.pandas()\n\n\ndef print_log(s, logger=None):\n    if logger is not None:\n        logger.info(s)\n    else:\n        print(s)\n\n\ndef process_single_row(row, args):\n    video_path = row[\"path\"]\n\n    logger = None\n\n    # check mp4 integrity\n    # if not is_intact_video(video_path, logger=logger):\n    #     return False\n    try:\n        if \"timestamp\" in row:\n            timestamp = row[\"timestamp\"]\n            if not (timestamp.startswith(\"[\") and timestamp.endswith(\"]\")):\n                return False\n            # literal_eval only parses Python literals, so a malformed cell cannot execute code\n            scene_list = ast.literal_eval(timestamp)\n            scene_list = [(FrameTimecode(s, fps=100), FrameTimecode(t, fps=100)) for s, t in scene_list]\n        else:\n            scene_list = [None]\n        # with --drop_invalid_timestamps, rows are only validated, not cut\n        if args.drop_invalid_timestamps:\n            return True\n    except Exception:\n        # scene_list is not usable here, so skip the row instead of falling through\n        return False\n\n    if \"relpath\" in row:\n        save_dir = os.path.dirname(os.path.join(args.save_dir, row[\"relpath\"]))\n        os.makedirs(save_dir, exist_ok=True)\n    else:\n        save_dir = args.save_dir\n\n    shorter_size = args.shorter_size\n    if (shorter_size is not None) and (\"height\" in row) and (\"width\" in row):\n        min_size = min(row[\"height\"], row[\"width\"])\n        if min_size <= shorter_size:\n            shorter_size = None\n\n    split_video(\n        video_path,\n        scene_list,\n        save_dir=save_dir,\n        min_seconds=args.min_seconds,\n        max_seconds=args.max_seconds,\n        target_fps=args.target_fps,\n        shorter_size=shorter_size,\n        logger=logger,\n    )\n    return True\n\n\ndef split_video(\n    video_path,\n    scene_list,\n    save_dir,\n    
min_seconds=2,\n    max_seconds=15,\n    target_fps=30,\n    shorter_size=None,\n    verbose=False,\n    logger=None,\n):\n    \"\"\"\n    scenes shorter than min_seconds will be ignored;\n    scenes longer than max_seconds will be cut to save the beginning max_seconds.\n    Currently, the saved file name pattern is f'{fname}_scene-{idx}'.mp4\n\n    Args:\n        scene_list (List[Tuple[FrameTimecode, FrameTimecode]]): each element is (s, t): start and end of a scene.\n        min_seconds (float | None)\n        max_seconds (float | None)\n        target_fps (int | None)\n        shorter_size (int | None)\n    \"\"\"\n    FFMPEG_PATH = get_ffmpeg_exe()\n\n    save_path_list = []\n    for idx, scene in enumerate(scene_list):\n        if scene is not None:\n            s, t = scene  # FrameTimecode\n            if min_seconds is not None:\n                if (t - s).get_seconds() < min_seconds:\n                    continue\n\n            duration = t - s\n            if max_seconds is not None:\n                fps = s.framerate\n                max_duration = FrameTimecode(max_seconds, fps=fps)\n                duration = min(max_duration, duration)\n\n        # save path\n        fname = os.path.basename(video_path)\n        fname_wo_ext = os.path.splitext(fname)[0]\n        # TODO: fname pattern\n        save_path = os.path.join(save_dir, f\"{fname_wo_ext}_scene-{idx}.mp4\")\n        if os.path.exists(save_path):\n            # print_log(f\"File '{save_path}' already exists. Skip.\", logger=logger)\n            continue\n        \n        # ffmpeg cmd\n        cmd = [FFMPEG_PATH]\n\n        # Only show ffmpeg output for the first call, which will display any\n        # errors if it fails, and then break the loop. 
We only show error messages\n        # for the remaining calls.\n        # cmd += ['-v', 'error']\n\n        # clip to cut\n        # Note: -ss after -i is very slow; put -ss before -i !!!\n        if scene is None:\n            cmd += [\"-nostdin\", \"-y\", \"-i\", video_path]\n        else:\n            cmd += [\"-nostdin\", \"-y\", \"-ss\", str(s.get_seconds()), \"-i\", video_path, \"-t\", str(duration.get_seconds())]\n\n        # target fps\n        if target_fps is not None:\n            cmd += [\"-r\", f\"{target_fps}\"]\n\n        # aspect ratio\n        if shorter_size is not None:\n            cmd += [\"-vf\", f\"scale='if(gt(iw,ih),-2,{shorter_size})':'if(gt(iw,ih),{shorter_size},-2)'\"]\n            # cmd += ['-vf', f\"scale='if(gt(iw,ih),{shorter_size},trunc(ow/a/2)*2)':-2\"]\n\n        cmd += [\"-map\", \"0:v\", save_path]\n        # print(cmd)\n        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)\n        stdout, stderr = proc.communicate()\n        # stdout = stdout.decode(\"utf-8\")\n        # print_log(stdout, logger=logger)\n\n        # record the path of the clip that was written, not the source video\n        save_path_list.append(save_path)\n        if verbose:\n            print_log(f\"Video clip saved to '{save_path}'\", logger=logger)\n\n    return save_path_list\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"meta_path\", type=str)\n    parser.add_argument(\"--save_dir\", type=str)\n    parser.add_argument(\n        \"--min_seconds\", type=float, default=None, help=\"if not None, clip shorter than min_seconds is ignored\"\n    )\n    parser.add_argument(\n        \"--max_seconds\", type=float, default=None, help=\"if not None, clip longer than max_seconds is truncated\"\n    )\n    parser.add_argument(\"--target_fps\", type=int, default=None, help=\"target fps of clips\")\n    parser.add_argument(\n        \"--shorter_size\", type=int, default=None, help=\"resize the shorter side while keeping the aspect ratio; never upscales\"\n    )\n    
parser.add_argument(\"--num_workers\", type=int, default=None, help=\"#workers for pandarallel\")\n    parser.add_argument(\"--disable_parallel\", action=\"store_true\", help=\"disable parallel processing\")\n    parser.add_argument(\"--drop_invalid_timestamps\", action=\"store_true\", help=\"drop rows with invalid timestamps\")\n    args = parser.parse_args()\n    return args\n\n\ndef main():\n    args = parse_args()\n    meta_path = args.meta_path\n    if not os.path.exists(meta_path):\n        print(f\"Meta file '{meta_path}' not found. Exit.\")\n        exit()\n\n    # create save_dir\n    os.makedirs(args.save_dir, exist_ok=True)\n\n    # initialize pandarallel\n    if not args.disable_parallel:\n        if args.num_workers is not None:\n            pandarallel.initialize(progress_bar=True, nb_workers=args.num_workers)\n        else:\n            pandarallel.initialize(progress_bar=True)\n    process_single_row_partial = partial(process_single_row, args=args)\n\n    # process\n    meta = pd.read_csv(args.meta_path)\n    if not args.disable_parallel:\n        results = meta.parallel_apply(process_single_row_partial, axis=1)\n    else:\n        results = meta.apply(process_single_row_partial, axis=1)\n    if args.drop_invalid_timestamps:\n        meta = meta[results]\n        assert args.meta_path.endswith(\"timestamp.csv\"), \"Only support *timestamp.csv\"\n        meta.to_csv(args.meta_path.replace(\"timestamp.csv\", \"correct_timestamp.csv\"), index=False)\n        print(f\"Corrected timestamp file saved to '{args.meta_path.replace('timestamp.csv', 'correct_timestamp.csv')}'\")\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/tools/scene_cut/scene_detect.py",
    "content": "import argparse\nimport os\n\nimport numpy as np\nimport pandas as pd\nfrom pandarallel import pandarallel\nfrom scenedetect import AdaptiveDetector, detect\nfrom tqdm import tqdm\n\ntqdm.pandas()\n\n\ndef process_single_row(row):\n    # windows\n    # from scenedetect import detect, ContentDetector, AdaptiveDetector\n\n    video_path = row[\"path\"]\n\n    detector = AdaptiveDetector(\n        adaptive_threshold=3.0,\n        # luma_only=True,\n    )\n    # detector = ContentDetector()\n    # TODO: catch error here\n    try:\n        scene_list = detect(video_path, detector, start_in_scene=True)\n        timestamp = [(s.get_timecode(), t.get_timecode()) for s, t in scene_list]\n        return True, str(timestamp)\n    except Exception as e:\n        print(f\"Video '{video_path}' with error {e}\")\n        return False, \"\"\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"meta_path\", type=str)\n    parser.add_argument(\"--num_workers\", type=int, default=None, help=\"#workers for pandarallel\")\n\n    args = parser.parse_args()\n    return args\n\n\ndef main():\n    args = parse_args()\n    meta_path = args.meta_path\n    if not os.path.exists(meta_path):\n        print(f\"Meta file '{meta_path}' not found. Exit.\")\n        exit()\n\n    if args.num_workers is not None:\n        pandarallel.initialize(progress_bar=True, nb_workers=args.num_workers)\n    else:\n        pandarallel.initialize(progress_bar=True)\n\n    meta = pd.read_csv(meta_path)\n    ret = meta.parallel_apply(process_single_row, axis=1)\n\n    succ, timestamps = list(zip(*ret))\n    meta[\"timestamp\"] = timestamps\n    meta = meta[np.array(succ)]\n\n    wo_ext, ext = os.path.splitext(meta_path)\n    out_path = f\"{wo_ext}_timestamp{ext}\"\n    meta.to_csv(out_path, index=False)\n    print(f\"New meta (shape={meta.shape}) with timestamp saved to '{out_path}'.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/tools/scoring/README.md",
    "content": "# Scoring and Filtering\n\n- [Scoring and Filtering](#scoring-and-filtering)\n  - [Aesthetic Score](#aesthetic-score)\n  - [Optical Flow Score](#optical-flow-score)\n  - [OCR](#ocr)\n  - [Matching Score](#matching-score)\n  - [Filtering](#filtering)\n\n## Aesthetic Score\n\nTo evaluate the aesthetic quality of videos, we use the scoring model from [CLIP+MLP Aesthetic Score Predictor](https://github.com/christophschuhmann/improved-aesthetic-predictor). This model is trained on 176K SAC (Simulacra Aesthetic Captions) pairs, 15K LAION-Logos (Logos) pairs, and 250K AVA (The Aesthetic Visual Analysis) image-text pairs.\n\nThe aesthetic score is between 1 and 10, where 5.5 can be considered the threshold for fair aesthetics, and 6.5 for high aesthetics. Good text-to-image models can achieve a score of 7.0 or higher.\n\nFor videos, we extract three frames by default (at 10%, 50%, and 90% of the video) and average their scores. The script also supports images as input.\nThe throughput of our code is ~1K videos/s on a single H800 GPU. It also supports running on multiple GPUs for further acceleration.\n\nFirst, install the required packages following the \"Data Dependencies\" section of our [installation instructions](../../docs/installation.md).\n\nNext, download the scoring model to `./pretrained_models/aesthetic.pth`.\n\n```bash\nwget https://github.com/christophschuhmann/improved-aesthetic-predictor/raw/main/sac+logos+ava1-l14-linearMSE.pth -O pretrained_models/aesthetic.pth\n```\n\n<!-- First, install the required packages and download the scoring model to `./pretrained_models/aesthetic.pth`.\n```bash\n# pip install\npip install git+https://github.com/openai/CLIP.git\npip install decord\n\n# get pretrained model\nwget https://github.com/christophschuhmann/improved-aesthetic-predictor/raw/main/sac+logos+ava1-l14-linearMSE.pth -O pretrained_models/aesthetic.pth\n``` -->\n\nThen, run the following command. 
**Make sure** the meta file has column `path` (path to the sample).\n```bash\ntorchrun --nproc_per_node 8 -m tools.scoring.aesthetic.inference /path/to/meta.csv --bs 1024 --num_workers 16\n```\nThis will generate multiple part files under `/path/to/parts/`, one per process (rank). Run `python -m tools.datasets.datautil /path/to/parts/meta_aes_part_*.csv --output /path/to/meta_aes.csv` to merge them.\n\n## Optical Flow Score\n\nOptical flow scores are used to assess the motion of a video. Higher optical flow scores indicate larger movement.\nWe use the [UniMatch](https://github.com/autonomousvision/unimatch) model for this task.\n\nFirst, install the required packages following the \"Data Dependencies\" section of our [installation instructions](../../docs/installation.md).\n\nNext, download the pretrained model to `./pretrained_models/unimatch/`.\n```bash\nwget https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale2-regrefine6-mixdata-train320x576-4e7b215d.pth -P ./pretrained_models/unimatch/\n```\n\nThen, run the following command. **Make sure** the meta file has column `path` (path to the sample).\n```bash\ntorchrun --standalone --nproc_per_node 8 tools/scoring/optical_flow/inference.py /path/to/meta.csv\n```\n\nThis should output `/path/to/meta_flow.csv` with column `flow`.\n\n## OCR\nSome videos contain dense text, such as news broadcasts and advertisements, which is undesirable for training.\nWe apply Optical Character Recognition (OCR) to detect text and drop samples with dense text. 
Here, we use\nthe [DBNet++](https://arxiv.org/abs/2202.10304) model implemented by [MMOCR](https://github.com/open-mmlab/mmocr/).\n\nFirst, install the required packages following the \"Data Dependencies\" and \"OCR\" sections of our [installation instructions](../../docs/installation.md).\n\n<!-- First, install [MMOCR](https://mmocr.readthedocs.io/en/dev-1.x/get_started/install.html).\nFor reference, we install packages of these versions.\n```\ntorch==2.0.1\nmmcv==2.0.1\nmmdet==3.1.0\nmmocr==1.0.1\n``` -->\n\nThen, run the following command. **Make sure** the meta file has column `path` (path to the sample).\n<!-- ```bash\ntorchrun --standalone --nproc_per_node 8 tools/scoring/ocr/inference.py /path/to/meta.csv\n``` -->\n```bash\ntorchrun --standalone --nproc_per_node 8 -m tools.scoring.ocr.inference /path/to/meta.csv\n```\nThis should output `/path/to/meta_ocr.csv` with column `ocr`, indicating the number of text regions with detection confidence > 0.3.\n\n\n## Matching Score\n\nMatching scores are calculated to evaluate the alignment between an image/video and its caption.\nHere, we use the [CLIP](https://github.com/openai/CLIP) model, which is trained on image-text pairs.\nWe simply use the cosine similarity as the matching score.\nFor videos, we extract the middle frame and compare it with the caption.\n\nFirst, install OpenAI CLIP.\n```bash\npip install git+https://github.com/openai/CLIP.git\n```\n\nThen, run the following command. **Make sure** the meta file has columns `path` (path to the sample) and `text` (caption of the sample).\n\n```bash\ntorchrun --standalone --nproc_per_node 8 tools/scoring/matching/inference.py /path/to/meta.csv\n```\n\nThis should output `/path/to/meta_match.csv` with column `match`. Higher matching scores indicate better image-text/video-text alignment.\n\n\n## Filtering\nOnce scores are obtained, it is simple to filter samples based on these scores. 
Here is an example that removes\nsamples with an aesthetic score < 5.0.\n```bash\npython -m tools.datasets.datautil /path/to/meta.csv --aesmin 5.0\n```\nThis should output `/path/to/meta_aesmin5.0.csv`, which keeps only rows with `aes` >= 5.0.\n"
  },
  {
    "path": "Open-Sora/tools/scoring/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/scoring/aesthetic/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/scoring/aesthetic/inference.py",
    "content": "# adapted from https://github.com/christophschuhmann/improved-aesthetic-predictor/blob/main/simple_inference.py\nimport cv2  # isort:skip\n\nimport argparse\nimport gc\nimport os\nfrom datetime import timedelta\n\nimport clip\nimport numpy as np\nimport pandas as pd\nimport torch\nimport torch.distributed as dist\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom einops import rearrange\nfrom torch.utils.data import DataLoader, DistributedSampler\nfrom torchvision.datasets.folder import pil_loader\nfrom tqdm import tqdm\n\nfrom tools.datasets.utils import extract_frames, is_video\n\nNUM_FRAMES_POINTS = {\n    1: (0.5,),\n    2: (0.25, 0.5),\n    3: (0.1, 0.5, 0.9),\n}\n\n\ndef merge_scores(gathered_list: list, meta: pd.DataFrame, column):\n    # reorder\n    indices_list = list(map(lambda x: x[0], gathered_list))\n    scores_list = list(map(lambda x: x[1], gathered_list))\n\n    flat_indices = []\n    for x in zip(*indices_list):\n        flat_indices.extend(x)\n    flat_scores = []\n    for x in zip(*scores_list):\n        flat_scores.extend(x)\n    flat_indices = np.array(flat_indices)\n    flat_scores = np.array(flat_scores)\n\n    # filter duplicates\n    unique_indices, unique_indices_idx = np.unique(flat_indices, return_index=True)\n    meta.loc[unique_indices, column] = flat_scores[unique_indices_idx]\n\n    # drop indices in meta not in unique_indices\n    meta = meta.loc[unique_indices]\n    return meta\n\n\nclass VideoTextDataset(torch.utils.data.Dataset):\n    def __init__(self, meta_path, transform=None, num_frames=3):\n        self.meta_path = meta_path\n        self.meta = pd.read_csv(meta_path)\n        self.transform = transform\n        self.points = NUM_FRAMES_POINTS[num_frames]\n\n    def __getitem__(self, index):\n        sample = self.meta.iloc[index]\n        path = sample[\"path\"]\n\n        # extract frames\n        if not is_video(path):\n            images = [pil_loader(path)]\n        else:\n            
num_frames = sample[\"num_frames\"] if \"num_frames\" in sample else None\n            images = extract_frames(sample[\"path\"], points=self.points, backend=\"opencv\", num_frames=num_frames)\n\n        # transform\n        images = [self.transform(img) for img in images]\n\n        # stack\n        images = torch.stack(images)\n\n        ret = dict(index=index, images=images)\n        return ret\n\n    def __len__(self):\n        return len(self.meta)\n\n\nclass MLP(nn.Module):\n    def __init__(self, input_size):\n        super().__init__()\n        self.input_size = input_size\n        self.layers = nn.Sequential(\n            nn.Linear(self.input_size, 1024),\n            nn.Dropout(0.2),\n            nn.Linear(1024, 128),\n            nn.Dropout(0.2),\n            nn.Linear(128, 64),\n            nn.Dropout(0.1),\n            nn.Linear(64, 16),\n            nn.Linear(16, 1),\n        )\n\n    def forward(self, x):\n        return self.layers(x)\n\n\nclass AestheticScorer(nn.Module):\n    def __init__(self, input_size, device):\n        super().__init__()\n        self.mlp = MLP(input_size)\n        self.clip, self.preprocess = clip.load(\"ViT-L/14\", device=device)\n\n        self.eval()\n        self.to(device)\n\n    def forward(self, x):\n        image_features = self.clip.encode_image(x)\n        image_features = F.normalize(image_features, p=2, dim=-1).float()\n        return self.mlp(image_features)\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"meta_path\", type=str, help=\"Path to the input CSV file\")\n    parser.add_argument(\"--bs\", type=int, default=1024, help=\"Batch size\")\n    parser.add_argument(\"--num_workers\", type=int, default=16, help=\"Number of workers\")\n    parser.add_argument(\"--prefetch_factor\", type=int, default=3, help=\"Prefetch factor\")\n    parser.add_argument(\"--num_frames\", type=int, default=3, help=\"Number of frames to extract\")\n    
parser.add_argument(\"--skip_if_existing\", action=\"store_true\")\n    args = parser.parse_args()\n\n    return args\n\n\ndef main():\n    args = parse_args()\n\n    meta_path = args.meta_path\n    if not os.path.exists(meta_path):\n        print(f\"Meta file '{meta_path}' not found. Exit.\")\n        exit()\n\n    wo_ext, ext = os.path.splitext(meta_path)\n    out_path = f\"{wo_ext}_aes{ext}\"\n    if args.skip_if_existing and os.path.exists(out_path):\n        print(f\"Output meta file '{out_path}' already exists. Exit.\")\n        exit()\n\n    dist.init_process_group(backend=\"nccl\", timeout=timedelta(hours=24))\n    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())\n\n    # build model\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n    model = AestheticScorer(768, device)\n    model.mlp.load_state_dict(torch.load(\"pretrained_models/aesthetic.pth\", map_location=device))\n    preprocess = model.preprocess\n\n    # build dataset\n    dataset = VideoTextDataset(args.meta_path, transform=preprocess, num_frames=args.num_frames)\n    dataloader = DataLoader(\n        dataset,\n        batch_size=args.bs,\n        num_workers=args.num_workers,\n        sampler=DistributedSampler(\n            dataset,\n            num_replicas=dist.get_world_size(),\n            rank=dist.get_rank(),\n            shuffle=False,\n            drop_last=False,\n        ),\n    )\n\n    # compute aesthetic scores\n    indices_list = []\n    scores_list = []\n    model.eval()\n    for batch in tqdm(dataloader, disable=dist.get_rank() != 0):\n        indices = batch[\"index\"]\n        images = batch[\"images\"].to(device, non_blocking=True)\n\n        B = images.shape[0]\n        images = rearrange(images, \"B N C H W -> (B N) C H W\")\n\n        # compute score\n        with torch.no_grad():\n            scores = model(images)\n\n        scores = rearrange(scores, \"(B N) 1 -> B N\", B=B)\n        scores = scores.mean(dim=1)\n        scores_np = 
scores.to(torch.float32).cpu().numpy()\n\n        indices_list.extend(indices.tolist())\n        scores_list.extend(scores_np.tolist())\n\n    # save local results\n    meta_local = merge_scores([(indices_list, scores_list)], dataset.meta, column=\"aes\")\n    save_dir_local = os.path.join(os.path.dirname(out_path), \"parts\")\n    os.makedirs(save_dir_local, exist_ok=True)\n    out_path_local = os.path.join(\n        save_dir_local, os.path.basename(out_path).replace(\".csv\", f\"_part_{dist.get_rank()}.csv\")\n    )\n    meta_local.to_csv(out_path_local, index=False)\n\n    # wait for all ranks to finish data processing\n    dist.barrier()\n\n    torch.cuda.empty_cache()\n    gc.collect()\n    gathered_list = [None] * dist.get_world_size()\n    dist.all_gather_object(gathered_list, (indices_list, scores_list))\n    if dist.get_rank() == 0:\n        meta_new = merge_scores(gathered_list, dataset.meta, column=\"aes\")\n        meta_new.to_csv(out_path, index=False)\n        print(f\"New meta with aesthetic scores saved to '{out_path}'.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/tools/scoring/matching/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/scoring/matching/inference.py",
    "content": "import argparse\nimport os\n\nimport clip\nimport colossalai\nimport numpy as np\nimport pandas as pd\nimport torch\nimport torch.distributed as dist\nimport torch.nn.functional as F\nfrom torch.utils.data import DataLoader, DistributedSampler\nfrom torchvision.datasets.folder import pil_loader\nfrom tqdm import tqdm\n\nfrom tools.datasets.utils import extract_frames, is_video\n\n\ndef merge_scores(gathered_list: list, meta: pd.DataFrame, column):\n    # reorder\n    indices_list = list(map(lambda x: x[0], gathered_list))\n    scores_list = list(map(lambda x: x[1], gathered_list))\n\n    flat_indices = []\n    for x in zip(*indices_list):\n        flat_indices.extend(x)\n    flat_scores = []\n    for x in zip(*scores_list):\n        flat_scores.extend(x)\n    flat_indices = np.array(flat_indices)\n    flat_scores = np.array(flat_scores)\n\n    # filter duplicates\n    unique_indices, unique_indices_idx = np.unique(flat_indices, return_index=True)\n    meta.loc[unique_indices, column] = flat_scores[unique_indices_idx]\n    return meta\n\n\nclass VideoTextDataset(torch.utils.data.Dataset):\n    def __init__(self, meta_path, transform):\n        self.meta_path = meta_path\n        self.meta = pd.read_csv(meta_path)\n        self.transform = transform\n\n    def __getitem__(self, index):\n        row = self.meta.iloc[index]\n        path = row[\"path\"]\n\n        if is_video(path):\n            img = extract_frames(path, points=[0.5], backend=\"opencv\")[0]\n        else:\n            img = pil_loader(path)\n\n        img = self.transform(img)\n\n        text = row[\"text\"]\n        text = clip.tokenize(text, truncate=True).squeeze()\n\n        return img, text, index\n\n    def __len__(self):\n        return len(self.meta)\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"meta_path\", type=str, help=\"Path to the input CSV file\")\n    parser.add_argument(\"--bs\", type=int, default=16, help=\"Batch size\")\n   
 parser.add_argument(\"--num_workers\", type=int, default=16, help=\"Number of workers\")\n    parser.add_argument(\"--skip_if_existing\", action=\"store_true\")\n    args = parser.parse_args()\n    return args\n\n\ndef main():\n    args = parse_args()\n\n    meta_path = args.meta_path\n    if not os.path.exists(meta_path):\n        print(f\"Meta file '{meta_path}' not found. Exit.\")\n        exit()\n\n    wo_ext, ext = os.path.splitext(meta_path)\n    out_path = f\"{wo_ext}_match{ext}\"\n    if args.skip_if_existing and os.path.exists(out_path):\n        print(f\"Output meta file '{out_path}' already exists. Exit.\")\n        exit()\n\n    colossalai.launch_from_torch({})\n\n    # build model\n    device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n    model, preprocess = clip.load(\"ViT-L/14\", device=device)\n    logit_scale = model.logit_scale.exp().item()\n\n    # build dataset\n    dataset = VideoTextDataset(meta_path=meta_path, transform=preprocess)\n    dataloader = DataLoader(\n        dataset,\n        batch_size=args.bs,\n        num_workers=args.num_workers,\n        sampler=DistributedSampler(\n            dataset,\n            num_replicas=dist.get_world_size(),\n            rank=dist.get_rank(),\n            shuffle=False,\n            drop_last=False,\n        ),\n    )\n\n    # compute scores\n    indices_list = []\n    scores_list = []\n    model.eval()\n    for imgs, text, indices in tqdm(dataloader, disable=dist.get_rank() != 0):\n        imgs = imgs.to(device)\n        text = text.to(device)\n\n        with torch.no_grad():\n            feat_img = model.encode_image(imgs)\n            feat_text = model.encode_text(text)\n\n        feat_img = F.normalize(feat_img, dim=1)\n        feat_text = F.normalize(feat_text, dim=1)\n        clip_scores = logit_scale * (feat_img * feat_text).sum(dim=1)\n        clip_scores = clip_scores.cpu().tolist()\n        indices_list.extend(indices)\n        
scores_list.extend(clip_scores)\n\n    gathered_list = [None] * dist.get_world_size()\n    dist.all_gather_object(gathered_list, (indices_list, scores_list))\n    if dist.get_rank() == 0:\n        meta_new = merge_scores(gathered_list, dataset.meta, column=\"match\")\n        meta_new.to_csv(out_path, index=False)\n        print(f\"New meta with matching scores saved to '{out_path}'.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/tools/scoring/ocr/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/scoring/ocr/dbnetpp.py",
    "content": "model = dict(\n    type=\"DBNet\",\n    backbone=dict(\n        type=\"CLIPResNet\",\n        depth=50,\n        num_stages=4,\n        out_indices=(0, 1, 2, 3),\n        frozen_stages=-1,\n        norm_cfg=dict(type=\"BN\", requires_grad=True),\n        norm_eval=False,\n        style=\"pytorch\",\n        dcn=dict(type=\"DCNv2\", deform_groups=1, fallback_on_stride=False),\n        # init_cfg=dict(\n        #     type='Pretrained',\n        #     checkpoint='https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth'),\n        stage_with_dcn=(False, True, True, True),\n    ),\n    neck=dict(\n        type=\"FPNC\",\n        in_channels=[256, 512, 1024, 2048],\n        lateral_channels=256,\n        asf_cfg=dict(attention_type=\"ScaleChannelSpatial\"),\n    ),\n    det_head=dict(\n        type=\"DBHead\",\n        in_channels=256,\n        module_loss=dict(type=\"DBModuleLoss\"),\n        postprocessor=dict(\n            type=\"DBPostprocessor\",\n            text_repr_type=\"quad\",\n            epsilon_ratio=0.002,\n        ),\n    ),\n    data_preprocessor=dict(\n        type=\"TextDetDataPreprocessor\",\n        mean=[123.675, 116.28, 103.53],\n        std=[58.395, 57.12, 57.375],\n        bgr_to_rgb=True,\n        pad_size_divisor=32,\n    ),\n    init_cfg=dict(\n        type=\"Pretrained\",\n        checkpoint=\"https://download.openmmlab.com/mmocr/textdet/dbnetpp/\"\n        \"dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015/\"\n        \"dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015_20221101_124139-4ecb39ac.pth\",\n    ),\n)\n\ntest_pipeline = [\n    # dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),\n    dict(type=\"Resize\", scale=(4068, 1024), keep_ratio=True),\n    dict(\n        type=\"PackTextDetInputs\",\n        # meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'),\n        meta_keys=(\"img_shape\", \"scale_factor\"),\n    ),\n]\n\n# Visualization\nvis_backends = 
[dict(type=\"LocalVisBackend\")]\nvisualizer = dict(\n    type=\"TextDetLocalVisualizer\",\n    name=\"visualizer\",\n    vis_backends=vis_backends,\n)\n"
  },
  {
    "path": "Open-Sora/tools/scoring/ocr/inference.py",
    "content": "import argparse\nimport os\n\nimport colossalai\nimport numpy as np\nimport pandas as pd\nimport torch\nimport torch.distributed as dist\nfrom mmengine import Config\nfrom mmengine.dataset import Compose, default_collate\nfrom mmengine.registry import DefaultScope\nfrom mmocr.datasets import PackTextDetInputs\nfrom mmocr.registry import MODELS\nfrom torch.utils.data import DataLoader, DistributedSampler\nfrom torchvision.datasets.folder import pil_loader\nfrom torchvision.transforms import CenterCrop, Compose, Resize\nfrom tqdm import tqdm\n\nfrom tools.datasets.utils import extract_frames, is_video\n\n\ndef merge_scores(gathered_list: list, meta: pd.DataFrame):\n    # reorder\n    indices_list = list(map(lambda x: x[0], gathered_list))\n    scores_list = list(map(lambda x: x[1], gathered_list))\n    flat_indices = []\n    for x in zip(*indices_list):\n        flat_indices.extend(x)\n    flat_scores = []\n    for x in zip(*scores_list):\n        flat_scores.extend(x)\n    flat_indices = np.array(flat_indices)\n    flat_scores = np.array(flat_scores)\n    # filter duplicates\n    unique_indices, unique_indices_idx = np.unique(flat_indices, return_index=True)\n    meta.loc[unique_indices, \"ocr\"] = flat_scores[unique_indices_idx]\n\n\nclass VideoTextDataset(torch.utils.data.Dataset):\n    def __init__(self, meta_path, transform):\n        self.meta_path = meta_path\n        self.meta = pd.read_csv(meta_path)\n        self.transform = transform\n        self.transform = Compose(\n            [\n                Resize(1024),\n                CenterCrop(1024),\n            ]\n        )\n        self.formatting = PackTextDetInputs(meta_keys=[\"scale_factor\"])\n\n    def __getitem__(self, index):\n        row = self.meta.iloc[index]\n        path = row[\"path\"]\n\n        if is_video(path):\n            img = extract_frames(path, frame_inds=[10], backend=\"opencv\")[0]\n        else:\n            img = pil_loader(path)\n\n        img = 
self.transform(img)\n        img_array = np.array(img)[:, :, ::-1].copy()  # bgr\n        results = {\n            \"img\": img_array,\n            \"scale_factor\": 1.0,\n            # 'img_shape': img_array.shape[-2],\n            # 'ori_shape': img_array.shape[-2],\n        }\n        results = self.formatting(results)\n        results[\"index\"] = index\n\n        return results\n\n    def __len__(self):\n        return len(self.meta)\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"meta_path\", type=str, help=\"Path to the input CSV file\")\n    parser.add_argument(\"--bs\", type=int, default=16, help=\"Batch size\")\n    parser.add_argument(\"--num_workers\", type=int, default=16, help=\"Number of workers\")\n    parser.add_argument(\"--skip_if_existing\", action=\"store_true\")\n    args = parser.parse_args()\n\n    return args\n\n\ndef main():\n    args = parse_args()\n\n    meta_path = args.meta_path\n    if not os.path.exists(meta_path):\n        print(f\"Meta file '{meta_path}' not found. Exit.\")\n        exit()\n\n    wo_ext, ext = os.path.splitext(meta_path)\n    out_path = f\"{wo_ext}_ocr{ext}\"\n    if args.skip_if_existing and os.path.exists(out_path):\n        print(f\"Output meta file '{out_path}' already exists. 
Exit.\")\n        exit()\n\n    cfg = Config.fromfile(\"./tools/scoring/ocr/dbnetpp.py\")\n    colossalai.launch_from_torch({})\n\n    device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n    DefaultScope.get_instance(\"ocr\", scope_name=\"mmocr\")  # use mmocr Registry as default\n\n    # build model\n    model = MODELS.build(cfg.model)\n    model.init_weights()\n    model.to(device)  # set data_preprocessor._device\n    print(\"==> Model built.\")\n\n    # build dataset\n    # NOTE: VideoTextDataset replaces this transform with its own Resize/CenterCrop pipeline\n    transform = Compose(cfg.test_pipeline)\n    dataset = VideoTextDataset(meta_path=meta_path, transform=transform)\n    dataloader = DataLoader(\n        dataset,\n        batch_size=args.bs,\n        num_workers=args.num_workers,\n        sampler=DistributedSampler(\n            dataset,\n            num_replicas=dist.get_world_size(),\n            rank=dist.get_rank(),\n            shuffle=False,\n            drop_last=False,\n        ),\n        collate_fn=default_collate,\n    )\n    print(\"==> Dataloader built.\")\n\n    # compute scores\n    dataset.meta[\"ocr\"] = np.nan\n    indices_list = []\n    scores_list = []\n    model.eval()\n    for data in tqdm(dataloader, disable=dist.get_rank() != 0):\n        indices_i = data[\"index\"]\n        indices_list.extend(indices_i.tolist())\n        del data[\"index\"]\n\n        # inference only: disable autograd to avoid storing activations\n        with torch.no_grad():\n            pred = model.test_step(data)  # this line will cast data to device\n\n        # score = number of detected text instances with confidence above 0.3\n        num_texts_i = [(x.pred_instances.scores > 0.3).sum().item() for x in pred]\n        scores_list.extend(num_texts_i)\n\n    gathered_list = [None] * dist.get_world_size()\n    dist.all_gather_object(gathered_list, (indices_list, scores_list))\n\n    if dist.get_rank() == 0:\n        merge_scores(gathered_list, dataset.meta)\n        dataset.meta.to_csv(out_path, index=False)\n        print(f\"New meta (shape={dataset.meta.shape}) with ocr results saved to '{out_path}'.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/__init__.py",
    "content": ""
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/inference.py",
    "content": "import cv2  # isort:skip\n\nimport argparse\nimport gc\nimport os\nfrom datetime import timedelta\n\nimport numpy as np\nimport pandas as pd\nimport torch\nimport torch.distributed as dist\nimport torch.nn.functional as F\nfrom einops import rearrange\nfrom torch.utils.data import DataLoader, DistributedSampler\nfrom torchvision.transforms.functional import pil_to_tensor\nfrom tqdm import tqdm\n\nfrom tools.datasets.utils import extract_frames\nfrom tools.scoring.optical_flow.unimatch import UniMatch\n\n# torch.backends.cudnn.enabled = False # This line enables large batch, but the speed is similar\n\n\ndef merge_scores(gathered_list: list, meta: pd.DataFrame, column):\n    # reorder\n    indices_list = list(map(lambda x: x[0], gathered_list))\n    scores_list = list(map(lambda x: x[1], gathered_list))\n\n    flat_indices = []\n    for x in zip(*indices_list):\n        flat_indices.extend(x)\n    flat_scores = []\n    for x in zip(*scores_list):\n        flat_scores.extend(x)\n    flat_indices = np.array(flat_indices)\n    flat_scores = np.array(flat_scores)\n\n    # filter duplicates\n    unique_indices, unique_indices_idx = np.unique(flat_indices, return_index=True)\n    meta.loc[unique_indices, column] = flat_scores[unique_indices_idx]\n\n    # drop indices in meta not in unique_indices\n    meta = meta.loc[unique_indices]\n    return meta\n\n\nclass VideoTextDataset(torch.utils.data.Dataset):\n    def __init__(self, meta_path, frame_inds=[0, 10, 20, 30]):\n        self.meta_path = meta_path\n        self.meta = pd.read_csv(meta_path)\n        self.frame_inds = frame_inds\n\n    def __getitem__(self, index):\n        sample = self.meta.iloc[index]\n        path = sample[\"path\"]\n\n        # extract frames\n        images = extract_frames(path, frame_inds=self.frame_inds, backend=\"opencv\")\n\n        # transform\n        images = torch.stack([pil_to_tensor(x) for x in images])\n\n        # stack\n        # shape: [N, C, H, W]; dtype: 
torch.uint8\n        images = images.float()\n        H, W = images.shape[-2:]\n        if H > W:\n            images = rearrange(images, \"N C H W -> N C W H\")\n        images = F.interpolate(images, size=(320, 576), mode=\"bilinear\", align_corners=True)\n\n        ret = dict(index=index, images=images)\n        return ret\n\n    def __len__(self):\n        return len(self.meta)\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"meta_path\", type=str, help=\"Path to the input CSV file\")\n    parser.add_argument(\"--bs\", type=int, default=4, help=\"Batch size\")  # don't use too large bs for unimatch\n    parser.add_argument(\"--num_workers\", type=int, default=16, help=\"Number of workers\")\n    parser.add_argument(\"--skip_if_existing\", action=\"store_true\")\n    args = parser.parse_args()\n    return args\n\n\ndef main():\n    args = parse_args()\n\n    meta_path = args.meta_path\n    if not os.path.exists(meta_path):\n        print(f\"Meta file '{meta_path}' not found. Exit.\")\n        exit()\n\n    wo_ext, ext = os.path.splitext(meta_path)\n    out_path = f\"{wo_ext}_flow{ext}\"\n    if args.skip_if_existing and os.path.exists(out_path):\n        print(f\"Output meta file '{out_path}' already exists. 
Exit.\")\n        exit()\n\n    torch.backends.cudnn.deterministic = True\n    torch.backends.cudnn.benchmark = False\n    dist.init_process_group(backend=\"nccl\", timeout=timedelta(hours=24))\n    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())\n\n    # build model\n    device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n    model = UniMatch(\n        feature_channels=128,\n        num_scales=2,\n        upsample_factor=4,\n        num_head=1,\n        ffn_dim_expansion=4,\n        num_transformer_layers=6,\n        reg_refine=True,\n        task=\"flow\",\n    )\n    ckpt = torch.load(\"./pretrained_models/unimatch/gmflow-scale2-regrefine6-mixdata-train320x576-4e7b215d.pth\")\n    model.load_state_dict(ckpt[\"model\"])\n    model = model.to(device)\n\n    # build dataset\n    dataset = VideoTextDataset(meta_path=meta_path, frame_inds=[0, 10, 20, 30])\n    dataloader = DataLoader(\n        dataset,\n        batch_size=args.bs,\n        num_workers=args.num_workers,\n        sampler=DistributedSampler(\n            dataset,\n            num_replicas=dist.get_world_size(),\n            rank=dist.get_rank(),\n            shuffle=False,\n            drop_last=False,\n        ),\n    )\n\n    # compute optical flow scores\n    indices_list = []\n    scores_list = []\n    model.eval()\n    for batch in tqdm(dataloader, disable=dist.get_rank() != 0):\n        indices = batch[\"index\"]\n        images = batch[\"images\"].to(device, non_blocking=True)\n\n        B = images.shape[0]\n        batch_0 = rearrange(images[:, :-1], \"B N C H W -> (B N) C H W\").contiguous()\n        batch_1 = rearrange(images[:, 1:], \"B N C H W -> (B N) C H W\").contiguous()\n\n        with torch.no_grad():\n            res = model(\n                batch_0,\n                batch_1,\n                attn_type=\"swin\",\n                attn_splits_list=[2, 8],\n                corr_radius_list=[-1, 4],\n                
prop_radius_list=[-1, 1],\n                num_reg_refine=6,\n                task=\"flow\",\n                pred_bidir_flow=False,\n            )\n            flow_maps = res[\"flow_preds\"][-1].cpu()  # [B * (N-1), 2, H, W]\n            flow_maps = rearrange(flow_maps, \"(B N) C H W -> B N H W C\", B=B)\n            flow_scores = flow_maps.abs().mean(dim=[1, 2, 3, 4])\n            flow_scores = flow_scores.tolist()\n\n        indices_list.extend(indices.tolist())\n        scores_list.extend(flow_scores)\n\n    # save local results\n    meta_local = merge_scores([(indices_list, scores_list)], dataset.meta, column=\"flow\")\n    save_dir_local = os.path.join(os.path.dirname(out_path), \"parts\")\n    os.makedirs(save_dir_local, exist_ok=True)\n    out_path_local = os.path.join(\n        save_dir_local, os.path.basename(out_path).replace(\".csv\", f\"_part_{dist.get_rank()}.csv\")\n    )\n    meta_local.to_csv(out_path_local, index=False)\n\n    # wait for all ranks to finish data processing\n    dist.barrier()\n\n    torch.cuda.empty_cache()\n    gc.collect()\n    gathered_list = [None] * dist.get_world_size()\n    dist.all_gather_object(gathered_list, (indices_list, scores_list))\n    if dist.get_rank() == 0:\n        meta_new = merge_scores(gathered_list, dataset.meta, column=\"flow\")\n        meta_new.to_csv(out_path, index=False)\n        print(f\"New meta with optical flow scores saved to '{out_path}'.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/unimatch/__init__.py",
    "content": "from .unimatch import UniMatch\n"
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/unimatch/attention.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom .utils import merge_splits, merge_splits_1d, split_feature, split_feature_1d\n\n\ndef single_head_full_attention(q, k, v):\n    # q, k, v: [B, L, C]\n    assert q.dim() == k.dim() == v.dim() == 3\n\n    scores = torch.matmul(q, k.permute(0, 2, 1)) / (q.size(2) ** 0.5)  # [B, L, L]\n    attn = torch.softmax(scores, dim=2)  # [B, L, L]\n    out = torch.matmul(attn, v)  # [B, L, C]\n\n    return out\n\n\ndef single_head_full_attention_1d(\n    q,\n    k,\n    v,\n    h=None,\n    w=None,\n):\n    # q, k, v: [B, L, C]\n\n    assert h is not None and w is not None\n    assert q.size(1) == h * w\n\n    b, _, c = q.size()\n\n    q = q.view(b, h, w, c)  # [B, H, W, C]\n    k = k.view(b, h, w, c)\n    v = v.view(b, h, w, c)\n\n    scale_factor = c**0.5\n\n    scores = torch.matmul(q, k.permute(0, 1, 3, 2)) / scale_factor  # [B, H, W, W]\n\n    attn = torch.softmax(scores, dim=-1)\n\n    out = torch.matmul(attn, v).view(b, -1, c)  # [B, H*W, C]\n\n    return out\n\n\ndef single_head_split_window_attention(\n    q,\n    k,\n    v,\n    num_splits=1,\n    with_shift=False,\n    h=None,\n    w=None,\n    attn_mask=None,\n):\n    # ref: https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py\n    # q, k, v: [B, L, C]\n    assert q.dim() == k.dim() == v.dim() == 3\n\n    assert h is not None and w is not None\n    assert q.size(1) == h * w\n\n    b, _, c = q.size()\n\n    b_new = b * num_splits * num_splits\n\n    window_size_h = h // num_splits\n    window_size_w = w // num_splits\n\n    q = q.view(b, h, w, c)  # [B, H, W, C]\n    k = k.view(b, h, w, c)\n    v = v.view(b, h, w, c)\n\n    scale_factor = c**0.5\n\n    if with_shift:\n        assert attn_mask is not None  # compute once\n        shift_size_h = window_size_h // 2\n        shift_size_w = window_size_w // 2\n\n        q = torch.roll(q, shifts=(-shift_size_h, -shift_size_w), dims=(1, 2))\n        k = 
torch.roll(k, shifts=(-shift_size_h, -shift_size_w), dims=(1, 2))\n        v = torch.roll(v, shifts=(-shift_size_h, -shift_size_w), dims=(1, 2))\n\n    q = split_feature(q, num_splits=num_splits, channel_last=True)  # [B*K*K, H/K, W/K, C]\n    k = split_feature(k, num_splits=num_splits, channel_last=True)\n    v = split_feature(v, num_splits=num_splits, channel_last=True)\n\n    scores = (\n        torch.matmul(q.view(b_new, -1, c), k.view(b_new, -1, c).permute(0, 2, 1)) / scale_factor\n    )  # [B*K*K, H/K*W/K, H/K*W/K]\n\n    if with_shift:\n        scores += attn_mask.repeat(b, 1, 1)\n\n    attn = torch.softmax(scores, dim=-1)\n\n    out = torch.matmul(attn, v.view(b_new, -1, c))  # [B*K*K, H/K*W/K, C]\n\n    out = merge_splits(\n        out.view(b_new, h // num_splits, w // num_splits, c), num_splits=num_splits, channel_last=True\n    )  # [B, H, W, C]\n\n    # shift back\n    if with_shift:\n        out = torch.roll(out, shifts=(shift_size_h, shift_size_w), dims=(1, 2))\n\n    out = out.view(b, -1, c)\n\n    return out\n\n\ndef single_head_split_window_attention_1d(\n    q,\n    k,\n    v,\n    relative_position_bias=None,\n    num_splits=1,\n    with_shift=False,\n    h=None,\n    w=None,\n    attn_mask=None,\n):\n    # q, k, v: [B, L, C]\n\n    assert h is not None and w is not None\n    assert q.size(1) == h * w\n\n    b, _, c = q.size()\n\n    b_new = b * num_splits * h\n\n    window_size_w = w // num_splits\n\n    q = q.view(b * h, w, c)  # [B*H, W, C]\n    k = k.view(b * h, w, c)\n    v = v.view(b * h, w, c)\n\n    scale_factor = c**0.5\n\n    if with_shift:\n        assert attn_mask is not None  # compute once\n        shift_size_w = window_size_w // 2\n\n        q = torch.roll(q, shifts=-shift_size_w, dims=1)\n        k = torch.roll(k, shifts=-shift_size_w, dims=1)\n        v = torch.roll(v, shifts=-shift_size_w, dims=1)\n\n    q = split_feature_1d(q, num_splits=num_splits)  # [B*H*K, W/K, C]\n    k = split_feature_1d(k, num_splits=num_splits)\n    v = 
split_feature_1d(v, num_splits=num_splits)\n\n    scores = (\n        torch.matmul(q.view(b_new, -1, c), k.view(b_new, -1, c).permute(0, 2, 1)) / scale_factor\n    )  # [B*H*K, W/K, W/K]\n\n    if with_shift:\n        # attn_mask: [K, W/K, W/K]\n        scores += attn_mask.repeat(b * h, 1, 1)  # [B*H*K, W/K, W/K]\n\n    attn = torch.softmax(scores, dim=-1)\n\n    out = torch.matmul(attn, v.view(b_new, -1, c))  # [B*H*K, W/K, C]\n\n    out = merge_splits_1d(out, h, num_splits=num_splits)  # [B, H, W, C]\n\n    # shift back\n    if with_shift:\n        out = torch.roll(out, shifts=shift_size_w, dims=2)\n\n    out = out.view(b, -1, c)\n\n    return out\n\n\nclass SelfAttnPropagation(nn.Module):\n    \"\"\"\n    flow propagation with self-attention on feature\n    query: feature0, key: feature0, value: flow\n    \"\"\"\n\n    def __init__(\n        self,\n        in_channels,\n        **kwargs,\n    ):\n        super(SelfAttnPropagation, self).__init__()\n\n        self.q_proj = nn.Linear(in_channels, in_channels)\n        self.k_proj = nn.Linear(in_channels, in_channels)\n\n        for p in self.parameters():\n            if p.dim() > 1:\n                nn.init.xavier_uniform_(p)\n\n    def forward(\n        self,\n        feature0,\n        flow,\n        local_window_attn=False,\n        local_window_radius=1,\n        **kwargs,\n    ):\n        # q, k: feature [B, C, H, W], v: flow [B, 2, H, W]\n        if local_window_attn:\n            return self.forward_local_window_attn(feature0, flow, local_window_radius=local_window_radius)\n\n        b, c, h, w = feature0.size()\n\n        query = feature0.view(b, c, h * w).permute(0, 2, 1)  # [B, H*W, C]\n\n        # a note: the ``correct'' implementation should be:\n        # ``query = self.q_proj(query), key = self.k_proj(query)''\n        # this problem is observed while cleaning up the code\n        # however, this doesn't affect the performance since the projection is a linear operation,\n        # thus the two 
projection matrices for key can be merged\n        # so I just leave it as is in order to not re-train all models :)\n        query = self.q_proj(query)  # [B, H*W, C]\n        key = self.k_proj(query)  # [B, H*W, C]\n\n        value = flow.view(b, flow.size(1), h * w).permute(0, 2, 1)  # [B, H*W, 2]\n\n        scores = torch.matmul(query, key.permute(0, 2, 1)) / (c**0.5)  # [B, H*W, H*W]\n        prob = torch.softmax(scores, dim=-1)\n\n        out = torch.matmul(prob, value)  # [B, H*W, 2]\n        out = out.view(b, h, w, value.size(-1)).permute(0, 3, 1, 2)  # [B, 2, H, W]\n\n        return out\n\n    def forward_local_window_attn(\n        self,\n        feature0,\n        flow,\n        local_window_radius=1,\n    ):\n        assert flow.size(1) == 2 or flow.size(1) == 1  # flow or disparity or depth\n        assert local_window_radius > 0\n\n        b, c, h, w = feature0.size()\n\n        value_channel = flow.size(1)\n\n        feature0_reshape = self.q_proj(feature0.view(b, c, -1).permute(0, 2, 1)).reshape(\n            b * h * w, 1, c\n        )  # [B*H*W, 1, C]\n\n        kernel_size = 2 * local_window_radius + 1\n\n        feature0_proj = self.k_proj(feature0.view(b, c, -1).permute(0, 2, 1)).permute(0, 2, 1).reshape(b, c, h, w)\n\n        feature0_window = F.unfold(\n            feature0_proj, kernel_size=kernel_size, padding=local_window_radius\n        )  # [B, C*(2R+1)^2), H*W]\n\n        feature0_window = (\n            feature0_window.view(b, c, kernel_size**2, h, w)\n            .permute(0, 3, 4, 1, 2)\n            .reshape(b * h * w, c, kernel_size**2)\n        )  # [B*H*W, C, (2R+1)^2]\n\n        flow_window = F.unfold(flow, kernel_size=kernel_size, padding=local_window_radius)  # [B, 2*(2R+1)^2), H*W]\n\n        flow_window = (\n            flow_window.view(b, value_channel, kernel_size**2, h, w)\n            .permute(0, 3, 4, 2, 1)\n            .reshape(b * h * w, kernel_size**2, value_channel)\n        )  # [B*H*W, (2R+1)^2, 2]\n\n        scores 
= torch.matmul(feature0_reshape, feature0_window) / (c**0.5)  # [B*H*W, 1, (2R+1)^2]\n\n        prob = torch.softmax(scores, dim=-1)\n\n        out = (\n            torch.matmul(prob, flow_window).view(b, h, w, value_channel).permute(0, 3, 1, 2).contiguous()\n        )  # [B, 2, H, W]\n\n        return out\n"
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/unimatch/backbone.py",
    "content": "import torch.nn as nn\n\nfrom .trident_conv import MultiScaleTridentConv\n\n\nclass ResidualBlock(nn.Module):\n    def __init__(\n        self,\n        in_planes,\n        planes,\n        norm_layer=nn.InstanceNorm2d,\n        stride=1,\n        dilation=1,\n    ):\n        super(ResidualBlock, self).__init__()\n\n        self.conv1 = nn.Conv2d(\n            in_planes, planes, kernel_size=3, dilation=dilation, padding=dilation, stride=stride, bias=False\n        )\n        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, dilation=dilation, padding=dilation, bias=False)\n        self.relu = nn.ReLU(inplace=True)\n\n        self.norm1 = norm_layer(planes)\n        self.norm2 = norm_layer(planes)\n        if not stride == 1 or in_planes != planes:\n            self.norm3 = norm_layer(planes)\n\n        if stride == 1 and in_planes == planes:\n            self.downsample = None\n        else:\n            self.downsample = nn.Sequential(nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm3)\n\n    def forward(self, x):\n        y = x\n        y = self.relu(self.norm1(self.conv1(y)))\n        y = self.relu(self.norm2(self.conv2(y)))\n\n        if self.downsample is not None:\n            x = self.downsample(x)\n\n        return self.relu(x + y)\n\n\nclass CNNEncoder(nn.Module):\n    def __init__(\n        self,\n        output_dim=128,\n        norm_layer=nn.InstanceNorm2d,\n        num_output_scales=1,\n        **kwargs,\n    ):\n        super(CNNEncoder, self).__init__()\n        self.num_branch = num_output_scales\n\n        feature_dims = [64, 96, 128]\n\n        self.conv1 = nn.Conv2d(3, feature_dims[0], kernel_size=7, stride=2, padding=3, bias=False)  # 1/2\n        self.norm1 = norm_layer(feature_dims[0])\n        self.relu1 = nn.ReLU(inplace=True)\n\n        self.in_planes = feature_dims[0]\n        self.layer1 = self._make_layer(feature_dims[0], stride=1, norm_layer=norm_layer)  # 1/2\n        self.layer2 = 
self._make_layer(feature_dims[1], stride=2, norm_layer=norm_layer)  # 1/4\n\n        # highest resolution 1/4 or 1/8\n        stride = 2 if num_output_scales == 1 else 1\n        self.layer3 = self._make_layer(\n            feature_dims[2],\n            stride=stride,\n            norm_layer=norm_layer,\n        )  # 1/4 or 1/8\n\n        self.conv2 = nn.Conv2d(feature_dims[2], output_dim, 1, 1, 0)\n\n        if self.num_branch > 1:\n            if self.num_branch == 4:\n                strides = (1, 2, 4, 8)\n            elif self.num_branch == 3:\n                strides = (1, 2, 4)\n            elif self.num_branch == 2:\n                strides = (1, 2)\n            else:\n                raise ValueError\n\n            self.trident_conv = MultiScaleTridentConv(\n                output_dim,\n                output_dim,\n                kernel_size=3,\n                strides=strides,\n                paddings=1,\n                num_branch=self.num_branch,\n            )\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode=\"fan_out\", nonlinearity=\"relu\")\n            elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):\n                if m.weight is not None:\n                    nn.init.constant_(m.weight, 1)\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def _make_layer(self, dim, stride=1, dilation=1, norm_layer=nn.InstanceNorm2d):\n        layer1 = ResidualBlock(self.in_planes, dim, norm_layer=norm_layer, stride=stride, dilation=dilation)\n        layer2 = ResidualBlock(dim, dim, norm_layer=norm_layer, stride=1, dilation=dilation)\n\n        layers = (layer1, layer2)\n\n        self.in_planes = dim\n        return nn.Sequential(*layers)\n\n    def forward(self, x):\n        x = self.conv1(x)\n        x = self.norm1(x)\n        x = self.relu1(x)\n\n        x = self.layer1(x)  # 1/2\n        x = 
self.layer2(x)  # 1/4\n        x = self.layer3(x)  # 1/8 or 1/4\n\n        x = self.conv2(x)\n\n        if self.num_branch > 1:\n            out = self.trident_conv([x] * self.num_branch)  # high to low res\n        else:\n            out = [x]\n\n        return out\n"
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/unimatch/geometry.py",
    "content": "import torch\nimport torch.nn.functional as F\n\n\ndef coords_grid(b, h, w, homogeneous=False, device=None):\n    y, x = torch.meshgrid(torch.arange(h), torch.arange(w))  # [H, W]\n\n    stacks = [x, y]\n\n    if homogeneous:\n        ones = torch.ones_like(x)  # [H, W]\n        stacks.append(ones)\n\n    grid = torch.stack(stacks, dim=0).float()  # [2, H, W] or [3, H, W]\n\n    grid = grid[None].repeat(b, 1, 1, 1)  # [B, 2, H, W] or [B, 3, H, W]\n\n    if device is not None:\n        grid = grid.to(device)\n\n    return grid\n\n\ndef generate_window_grid(h_min, h_max, w_min, w_max, len_h, len_w, device=None):\n    assert device is not None\n\n    x, y = torch.meshgrid(\n        [torch.linspace(w_min, w_max, len_w, device=device), torch.linspace(h_min, h_max, len_h, device=device)],\n    )\n    grid = torch.stack((x, y), -1).transpose(0, 1).float()  # [H, W, 2]\n\n    return grid\n\n\ndef normalize_coords(coords, h, w):\n    # coords: [B, H, W, 2]\n    c = torch.Tensor([(w - 1) / 2.0, (h - 1) / 2.0]).float().to(coords.device)\n    return (coords - c) / c  # [-1, 1]\n\n\ndef bilinear_sample(img, sample_coords, mode=\"bilinear\", padding_mode=\"zeros\", return_mask=False):\n    # img: [B, C, H, W]\n    # sample_coords: [B, 2, H, W] in image scale\n    if sample_coords.size(1) != 2:  # [B, H, W, 2]\n        sample_coords = sample_coords.permute(0, 3, 1, 2)\n\n    b, _, h, w = sample_coords.shape\n\n    # Normalize to [-1, 1]\n    x_grid = 2 * sample_coords[:, 0] / (w - 1) - 1\n    y_grid = 2 * sample_coords[:, 1] / (h - 1) - 1\n\n    grid = torch.stack([x_grid, y_grid], dim=-1)  # [B, H, W, 2]\n\n    img = F.grid_sample(img, grid, mode=mode, padding_mode=padding_mode, align_corners=True)\n\n    if return_mask:\n        mask = (x_grid >= -1) & (y_grid >= -1) & (x_grid <= 1) & (y_grid <= 1)  # [B, H, W]\n\n        return img, mask\n\n    return img\n\n\ndef flow_warp(feature, flow, mask=False, padding_mode=\"zeros\"):\n    b, c, h, w = feature.size()\n   
 assert flow.size(1) == 2\n\n    grid = coords_grid(b, h, w).to(flow.device) + flow  # [B, 2, H, W]\n\n    return bilinear_sample(feature, grid, padding_mode=padding_mode, return_mask=mask)\n\n\ndef forward_backward_consistency_check(fwd_flow, bwd_flow, alpha=0.01, beta=0.5):\n    # fwd_flow, bwd_flow: [B, 2, H, W]\n    # alpha and beta values are following UnFlow (https://arxiv.org/abs/1711.07837)\n    assert fwd_flow.dim() == 4 and bwd_flow.dim() == 4\n    assert fwd_flow.size(1) == 2 and bwd_flow.size(1) == 2\n    flow_mag = torch.norm(fwd_flow, dim=1) + torch.norm(bwd_flow, dim=1)  # [B, H, W]\n\n    warped_bwd_flow = flow_warp(bwd_flow, fwd_flow)  # [B, 2, H, W]\n    warped_fwd_flow = flow_warp(fwd_flow, bwd_flow)  # [B, 2, H, W]\n\n    diff_fwd = torch.norm(fwd_flow + warped_bwd_flow, dim=1)  # [B, H, W]\n    diff_bwd = torch.norm(bwd_flow + warped_fwd_flow, dim=1)\n\n    threshold = alpha * flow_mag + beta\n\n    fwd_occ = (diff_fwd > threshold).float()  # [B, H, W]\n    bwd_occ = (diff_bwd > threshold).float()\n\n    return fwd_occ, bwd_occ\n\n\ndef back_project(depth, intrinsics):\n    # Back project 2D pixel coords to 3D points\n    # depth: [B, H, W]\n    # intrinsics: [B, 3, 3]\n    b, h, w = depth.shape\n    grid = coords_grid(b, h, w, homogeneous=True, device=depth.device)  # [B, 3, H, W]\n\n    intrinsics_inv = torch.inverse(intrinsics)  # [B, 3, 3]\n\n    points = intrinsics_inv.bmm(grid.view(b, 3, -1)).view(b, 3, h, w) * depth.unsqueeze(1)  # [B, 3, H, W]\n\n    return points\n\n\ndef camera_transform(points_ref, extrinsics_ref=None, extrinsics_tgt=None, extrinsics_rel=None):\n    # Transform 3D points from reference camera to target camera\n    # points_ref: [B, 3, H, W]\n    # extrinsics_ref: [B, 4, 4]\n    # extrinsics_tgt: [B, 4, 4]\n    # extrinsics_rel: [B, 4, 4], relative pose transform\n    b, _, h, w = points_ref.shape\n\n    if extrinsics_rel is None:\n        extrinsics_rel = torch.bmm(extrinsics_tgt, torch.inverse(extrinsics_ref))  # 
[B, 4, 4]\n\n    points_tgt = (\n        torch.bmm(extrinsics_rel[:, :3, :3], points_ref.view(b, 3, -1)) + extrinsics_rel[:, :3, -1:]\n    )  # [B, 3, H*W]\n\n    points_tgt = points_tgt.view(b, 3, h, w)  # [B, 3, H, W]\n\n    return points_tgt\n\n\ndef reproject(points_tgt, intrinsics, return_mask=False):\n    # reproject to target view\n    # points_tgt: [B, 3, H, W]\n    # intrinsics: [B, 3, 3]\n\n    b, _, h, w = points_tgt.shape\n\n    proj_points = torch.bmm(intrinsics, points_tgt.view(b, 3, -1)).view(b, 3, h, w)  # [B, 3, H, W]\n\n    X = proj_points[:, 0]\n    Y = proj_points[:, 1]\n    Z = proj_points[:, 2].clamp(min=1e-3)\n\n    pixel_coords = torch.stack([X / Z, Y / Z], dim=1).view(b, 2, h, w)  # [B, 2, H, W] in image scale\n\n    if return_mask:\n        # valid mask in pixel space\n        mask = (\n            (pixel_coords[:, 0] >= 0)\n            & (pixel_coords[:, 0] <= (w - 1))\n            & (pixel_coords[:, 1] >= 0)\n            & (pixel_coords[:, 1] <= (h - 1))\n        )  # [B, H, W]\n\n        return pixel_coords, mask\n\n    return pixel_coords\n\n\ndef reproject_coords(\n    depth_ref, intrinsics, extrinsics_ref=None, extrinsics_tgt=None, extrinsics_rel=None, return_mask=False\n):\n    # Compute reprojection sample coords\n    points_ref = back_project(depth_ref, intrinsics)  # [B, 3, H, W]\n    points_tgt = camera_transform(points_ref, extrinsics_ref, extrinsics_tgt, extrinsics_rel=extrinsics_rel)\n\n    if return_mask:\n        reproj_coords, mask = reproject(points_tgt, intrinsics, return_mask=return_mask)  # [B, 2, H, W] in image scale\n\n        return reproj_coords, mask\n\n    reproj_coords = reproject(points_tgt, intrinsics, return_mask=return_mask)  # [B, 2, H, W] in image scale\n\n    return reproj_coords\n\n\ndef compute_flow_with_depth_pose(\n    depth_ref, intrinsics, extrinsics_ref=None, extrinsics_tgt=None, extrinsics_rel=None, return_mask=False\n):\n    b, h, w = depth_ref.shape\n    coords_init = coords_grid(b, h, w, 
device=depth_ref.device)  # [B, 2, H, W]\n\n    if return_mask:\n        reproj_coords, mask = reproject_coords(\n            depth_ref,\n            intrinsics,\n            extrinsics_ref,\n            extrinsics_tgt,\n            extrinsics_rel=extrinsics_rel,\n            return_mask=return_mask,\n        )  # [B, 2, H, W]\n        rigid_flow = reproj_coords - coords_init\n\n        return rigid_flow, mask\n\n    reproj_coords = reproject_coords(\n        depth_ref, intrinsics, extrinsics_ref, extrinsics_tgt, extrinsics_rel=extrinsics_rel, return_mask=return_mask\n    )  # [B, 2, H, W]\n\n    rigid_flow = reproj_coords - coords_init\n\n    return rigid_flow\n"
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/unimatch/matching.py",
    "content": "import torch\nimport torch.nn.functional as F\n\nfrom .geometry import coords_grid, generate_window_grid, normalize_coords\n\n\ndef global_correlation_softmax(\n    feature0,\n    feature1,\n    pred_bidir_flow=False,\n):\n    # global correlation\n    b, c, h, w = feature0.shape\n    feature0 = feature0.view(b, c, -1).permute(0, 2, 1)  # [B, H*W, C]\n    feature1 = feature1.view(b, c, -1)  # [B, C, H*W]\n\n    correlation = torch.matmul(feature0, feature1).view(b, h, w, h, w) / (c**0.5)  # [B, H, W, H, W]\n\n    # flow from softmax\n    init_grid = coords_grid(b, h, w).to(correlation.device)  # [B, 2, H, W]\n    grid = init_grid.view(b, 2, -1).permute(0, 2, 1)  # [B, H*W, 2]\n\n    correlation = correlation.view(b, h * w, h * w)  # [B, H*W, H*W]\n\n    if pred_bidir_flow:\n        correlation = torch.cat((correlation, correlation.permute(0, 2, 1)), dim=0)  # [2*B, H*W, H*W]\n        init_grid = init_grid.repeat(2, 1, 1, 1)  # [2*B, 2, H, W]\n        grid = grid.repeat(2, 1, 1)  # [2*B, H*W, 2]\n        b = b * 2\n\n    prob = F.softmax(correlation, dim=-1)  # [B, H*W, H*W]\n\n    correspondence = torch.matmul(prob, grid).view(b, h, w, 2).permute(0, 3, 1, 2)  # [B, 2, H, W]\n\n    # when predicting bidirectional flow, flow is the concatenation of forward flow and backward flow\n    flow = correspondence - init_grid\n\n    return flow, prob\n\n\ndef local_correlation_softmax(\n    feature0,\n    feature1,\n    local_radius,\n    padding_mode=\"zeros\",\n):\n    b, c, h, w = feature0.size()\n    coords_init = coords_grid(b, h, w).to(feature0.device)  # [B, 2, H, W]\n    coords = coords_init.view(b, 2, -1).permute(0, 2, 1)  # [B, H*W, 2]\n\n    local_h = 2 * local_radius + 1\n    local_w = 2 * local_radius + 1\n\n    window_grid = generate_window_grid(\n        -local_radius, local_radius, -local_radius, local_radius, local_h, local_w, device=feature0.device\n    )  # [2R+1, 2R+1, 2]\n    window_grid = window_grid.reshape(-1, 2).repeat(b, 1, 1, 1)  # 
[B, 1, (2R+1)^2, 2]\n    sample_coords = coords.unsqueeze(-2) + window_grid  # [B, H*W, (2R+1)^2, 2]\n\n    sample_coords_softmax = sample_coords\n\n    # exclude coords that are out of image space\n    valid_x = (sample_coords[:, :, :, 0] >= 0) & (sample_coords[:, :, :, 0] < w)  # [B, H*W, (2R+1)^2]\n    valid_y = (sample_coords[:, :, :, 1] >= 0) & (sample_coords[:, :, :, 1] < h)  # [B, H*W, (2R+1)^2]\n\n    valid = valid_x & valid_y  # [B, H*W, (2R+1)^2], used to mask out invalid values when softmax\n\n    # normalize coordinates to [-1, 1]\n    sample_coords_norm = normalize_coords(sample_coords, h, w)  # [-1, 1]\n    window_feature = F.grid_sample(feature1, sample_coords_norm, padding_mode=padding_mode, align_corners=True).permute(\n        0, 2, 1, 3\n    )  # [B, H*W, C, (2R+1)^2]\n    feature0_view = feature0.permute(0, 2, 3, 1).view(b, h * w, 1, c)  # [B, H*W, 1, C]\n\n    corr = torch.matmul(feature0_view, window_feature).view(b, h * w, -1) / (c**0.5)  # [B, H*W, (2R+1)^2]\n\n    # mask invalid locations\n    corr[~valid] = -1e9\n\n    prob = F.softmax(corr, -1)  # [B, H*W, (2R+1)^2]\n\n    correspondence = (\n        torch.matmul(prob.unsqueeze(-2), sample_coords_softmax).squeeze(-2).view(b, h, w, 2).permute(0, 3, 1, 2)\n    )  # [B, 2, H, W]\n\n    flow = correspondence - coords_init\n    match_prob = prob\n\n    return flow, match_prob\n\n\ndef local_correlation_with_flow(\n    feature0,\n    feature1,\n    flow,\n    local_radius,\n    padding_mode=\"zeros\",\n    dilation=1,\n):\n    b, c, h, w = feature0.size()\n    coords_init = coords_grid(b, h, w).to(feature0.device)  # [B, 2, H, W]\n    coords = coords_init.view(b, 2, -1).permute(0, 2, 1)  # [B, H*W, 2]\n\n    local_h = 2 * local_radius + 1\n    local_w = 2 * local_radius + 1\n\n    window_grid = generate_window_grid(\n        -local_radius, local_radius, -local_radius, local_radius, local_h, local_w, device=feature0.device\n    )  # [2R+1, 2R+1, 2]\n    window_grid = window_grid.reshape(-1, 
2).repeat(b, 1, 1, 1)  # [B, 1, (2R+1)^2, 2]\n    sample_coords = coords.unsqueeze(-2) + window_grid * dilation  # [B, H*W, (2R+1)^2, 2]\n\n    # flow can be zero when using features after transformer\n    if not isinstance(flow, float):\n        sample_coords = sample_coords + flow.view(b, 2, -1).permute(0, 2, 1).unsqueeze(-2)  # [B, H*W, (2R+1)^2, 2]\n    else:\n        assert flow == 0.0\n\n    # normalize coordinates to [-1, 1]\n    sample_coords_norm = normalize_coords(sample_coords, h, w)  # [-1, 1]\n    window_feature = F.grid_sample(feature1, sample_coords_norm, padding_mode=padding_mode, align_corners=True).permute(\n        0, 2, 1, 3\n    )  # [B, H*W, C, (2R+1)^2]\n    feature0_view = feature0.permute(0, 2, 3, 1).view(b, h * w, 1, c)  # [B, H*W, 1, C]\n\n    corr = torch.matmul(feature0_view, window_feature).view(b, h * w, -1) / (c**0.5)  # [B, H*W, (2R+1)^2]\n\n    corr = corr.view(b, h, w, -1).permute(0, 3, 1, 2).contiguous()  # [B, (2R+1)^2, H, W]\n\n    return corr\n\n\ndef global_correlation_softmax_stereo(\n    feature0,\n    feature1,\n):\n    # global correlation on horizontal direction\n    b, c, h, w = feature0.shape\n\n    x_grid = torch.linspace(0, w - 1, w, device=feature0.device)  # [W]\n\n    feature0 = feature0.permute(0, 2, 3, 1)  # [B, H, W, C]\n    feature1 = feature1.permute(0, 2, 1, 3)  # [B, H, C, W]\n\n    correlation = torch.matmul(feature0, feature1) / (c**0.5)  # [B, H, W, W]\n\n    # mask subsequent positions to make disparity positive\n    mask = torch.triu(torch.ones((w, w)), diagonal=1).type_as(feature0)  # [W, W]\n    valid_mask = (mask == 0).unsqueeze(0).unsqueeze(0).repeat(b, h, 1, 1)  # [B, H, W, W]\n\n    correlation[~valid_mask] = -1e9\n\n    prob = F.softmax(correlation, dim=-1)  # [B, H, W, W]\n\n    correspondence = (x_grid.view(1, 1, 1, w) * prob).sum(-1)  # [B, H, W]\n\n    # NOTE: unlike flow, disparity is typically positive\n    disparity = x_grid.view(1, 1, w).repeat(b, h, 1) - correspondence  # [B, H, W]\n\n  
  return disparity.unsqueeze(1), prob  # feature resolution\n\n\ndef local_correlation_softmax_stereo(\n    feature0,\n    feature1,\n    local_radius,\n):\n    b, c, h, w = feature0.size()\n    coords_init = coords_grid(b, h, w).to(feature0.device)  # [B, 2, H, W]\n    coords = coords_init.view(b, 2, -1).permute(0, 2, 1).contiguous()  # [B, H*W, 2]\n\n    local_h = 1\n    local_w = 2 * local_radius + 1\n\n    window_grid = generate_window_grid(\n        0, 0, -local_radius, local_radius, local_h, local_w, device=feature0.device\n    )  # [1, 2R+1, 2]\n    window_grid = window_grid.reshape(-1, 2).repeat(b, 1, 1, 1)  # [B, 1, (2R+1), 2]\n    sample_coords = coords.unsqueeze(-2) + window_grid  # [B, H*W, (2R+1), 2]\n\n    sample_coords_softmax = sample_coords\n\n    # exclude coords that are out of image space\n    valid_x = (sample_coords[:, :, :, 0] >= 0) & (sample_coords[:, :, :, 0] < w)  # [B, H*W, (2R+1)^2]\n    valid_y = (sample_coords[:, :, :, 1] >= 0) & (sample_coords[:, :, :, 1] < h)  # [B, H*W, (2R+1)^2]\n\n    valid = valid_x & valid_y  # [B, H*W, (2R+1)^2], used to mask out invalid values when softmax\n\n    # normalize coordinates to [-1, 1]\n    sample_coords_norm = normalize_coords(sample_coords, h, w)  # [-1, 1]\n    window_feature = F.grid_sample(feature1, sample_coords_norm, padding_mode=\"zeros\", align_corners=True).permute(\n        0, 2, 1, 3\n    )  # [B, H*W, C, (2R+1)]\n    feature0_view = feature0.permute(0, 2, 3, 1).contiguous().view(b, h * w, 1, c)  # [B, H*W, 1, C]\n\n    corr = torch.matmul(feature0_view, window_feature).view(b, h * w, -1) / (c**0.5)  # [B, H*W, (2R+1)]\n\n    # mask invalid locations\n    corr[~valid] = -1e9\n\n    prob = F.softmax(corr, -1)  # [B, H*W, (2R+1)]\n\n    correspondence = (\n        torch.matmul(prob.unsqueeze(-2), sample_coords_softmax)\n        .squeeze(-2)\n        .view(b, h, w, 2)\n        .permute(0, 3, 1, 2)\n        .contiguous()\n    )  # [B, 2, H, W]\n\n    flow = correspondence - coords_init  # 
flow at feature resolution\n    match_prob = prob\n\n    flow_x = -flow[:, :1]  # [B, 1, H, W]\n\n    return flow_x, match_prob\n\n\ndef correlation_softmax_depth(\n    feature0,\n    feature1,\n    intrinsics,\n    pose,\n    depth_candidates,\n    depth_from_argmax=False,\n    pred_bidir_depth=False,\n):\n    b, c, h, w = feature0.size()\n    assert depth_candidates.dim() == 4  # [B, D, H, W]\n    scale_factor = c**0.5\n\n    if pred_bidir_depth:\n        feature0, feature1 = torch.cat((feature0, feature1), dim=0), torch.cat((feature1, feature0), dim=0)\n        intrinsics = intrinsics.repeat(2, 1, 1)\n        pose = torch.cat((pose, torch.inverse(pose)), dim=0)\n        depth_candidates = depth_candidates.repeat(2, 1, 1, 1)\n\n    # depth candidates are actually inverse depth\n    warped_feature1 = warp_with_pose_depth_candidates(\n        feature1,\n        intrinsics,\n        pose,\n        1.0 / depth_candidates,\n    )  # [B, C, D, H, W]\n\n    correlation = (feature0.unsqueeze(2) * warped_feature1).sum(1) / scale_factor  # [B, D, H, W]\n\n    match_prob = F.softmax(correlation, dim=1)  # [B, D, H, W]\n\n    # for cross-task transfer (flow -> depth), extract depth with argmax at test time\n    if depth_from_argmax:\n        index = torch.argmax(match_prob, dim=1, keepdim=True)\n        depth = torch.gather(depth_candidates, dim=1, index=index)\n    else:\n        depth = (match_prob * depth_candidates).sum(dim=1, keepdim=True)  # [B, 1, H, W]\n\n    return depth, match_prob\n\n\ndef warp_with_pose_depth_candidates(\n    feature1,\n    intrinsics,\n    pose,\n    depth,\n    clamp_min_depth=1e-3,\n):\n    \"\"\"\n    feature1: [B, C, H, W]\n    intrinsics: [B, 3, 3]\n    pose: [B, 4, 4]\n    depth: [B, D, H, W]\n    \"\"\"\n\n    assert intrinsics.size(1) == intrinsics.size(2) == 3\n    assert pose.size(1) == pose.size(2) == 4\n    assert depth.dim() == 4\n\n    b, d, h, w = depth.size()\n    c = feature1.size(1)\n\n    with torch.no_grad():\n        # pixel 
coordinates\n        grid = coords_grid(b, h, w, homogeneous=True, device=depth.device)  # [B, 3, H, W]\n        # back project to 3D and transform viewpoint\n        points = torch.inverse(intrinsics).bmm(grid.view(b, 3, -1))  # [B, 3, H*W]\n        points = torch.bmm(pose[:, :3, :3], points).unsqueeze(2).repeat(1, 1, d, 1) * depth.view(\n            b, 1, d, h * w\n        )  # [B, 3, D, H*W]\n        points = points + pose[:, :3, -1:].unsqueeze(-1)  # [B, 3, D, H*W]\n        # reproject to 2D image plane\n        points = torch.bmm(intrinsics, points.view(b, 3, -1)).view(b, 3, d, h * w)  # [B, 3, D, H*W]\n        pixel_coords = points[:, :2] / points[:, -1:].clamp(min=clamp_min_depth)  # [B, 2, D, H*W]\n\n        # normalize to [-1, 1]\n        x_grid = 2 * pixel_coords[:, 0] / (w - 1) - 1\n        y_grid = 2 * pixel_coords[:, 1] / (h - 1) - 1\n\n        grid = torch.stack([x_grid, y_grid], dim=-1)  # [B, D, H*W, 2]\n\n    # sample features\n    warped_feature = F.grid_sample(\n        feature1, grid.view(b, d * h, w, 2), mode=\"bilinear\", padding_mode=\"zeros\", align_corners=True\n    ).view(\n        b, c, d, h, w\n    )  # [B, C, D, H, W]\n\n    return warped_feature\n"
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/unimatch/position.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n# https://github.com/facebookresearch/detr/blob/main/models/position_encoding.py\n\nimport math\n\nimport torch\nimport torch.nn as nn\n\n\nclass PositionEmbeddingSine(nn.Module):\n    \"\"\"\n    This is a more standard version of the position embedding, very similar to the one\n    used by the Attention is all you need paper, generalized to work on images.\n    \"\"\"\n\n    def __init__(self, num_pos_feats=64, temperature=10000, normalize=True, scale=None):\n        super().__init__()\n        self.num_pos_feats = num_pos_feats\n        self.temperature = temperature\n        self.normalize = normalize\n        if scale is not None and normalize is False:\n            raise ValueError(\"normalize should be True if scale is passed\")\n        if scale is None:\n            scale = 2 * math.pi\n        self.scale = scale\n\n    def forward(self, x):\n        # x = tensor_list.tensors  # [B, C, H, W]\n        # mask = tensor_list.mask  # [B, H, W], input with padding, valid as 0\n        b, c, h, w = x.size()\n        mask = torch.ones((b, h, w), device=x.device)  # [B, H, W]\n        y_embed = mask.cumsum(1, dtype=torch.float32)\n        x_embed = mask.cumsum(2, dtype=torch.float32)\n        if self.normalize:\n            eps = 1e-6\n            y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale\n            x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale\n\n        dim_t = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device)\n        dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)\n\n        pos_x = x_embed[:, :, :, None] / dim_t\n        pos_y = y_embed[:, :, :, None] / dim_t\n        pos_x = torch.stack((pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4).flatten(3)\n        pos_y = torch.stack((pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4).flatten(3)\n        pos = torch.cat((pos_y, 
pos_x), dim=3).permute(0, 3, 1, 2)\n        return pos\n"
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/unimatch/reg_refine.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass FlowHead(nn.Module):\n    def __init__(\n        self,\n        input_dim=128,\n        hidden_dim=256,\n        out_dim=2,\n    ):\n        super(FlowHead, self).__init__()\n\n        self.conv1 = nn.Conv2d(input_dim, hidden_dim, 3, padding=1)\n        self.conv2 = nn.Conv2d(hidden_dim, out_dim, 3, padding=1)\n        self.relu = nn.ReLU(inplace=True)\n\n    def forward(self, x):\n        out = self.conv2(self.relu(self.conv1(x)))\n\n        return out\n\n\nclass SepConvGRU(nn.Module):\n    def __init__(\n        self,\n        hidden_dim=128,\n        input_dim=192 + 128,\n        kernel_size=5,\n    ):\n        padding = (kernel_size - 1) // 2\n\n        super(SepConvGRU, self).__init__()\n        self.convz1 = nn.Conv2d(hidden_dim + input_dim, hidden_dim, (1, kernel_size), padding=(0, padding))\n        self.convr1 = nn.Conv2d(hidden_dim + input_dim, hidden_dim, (1, kernel_size), padding=(0, padding))\n        self.convq1 = nn.Conv2d(hidden_dim + input_dim, hidden_dim, (1, kernel_size), padding=(0, padding))\n\n        self.convz2 = nn.Conv2d(hidden_dim + input_dim, hidden_dim, (kernel_size, 1), padding=(padding, 0))\n        self.convr2 = nn.Conv2d(hidden_dim + input_dim, hidden_dim, (kernel_size, 1), padding=(padding, 0))\n        self.convq2 = nn.Conv2d(hidden_dim + input_dim, hidden_dim, (kernel_size, 1), padding=(padding, 0))\n\n    def forward(self, h, x):\n        # horizontal\n        hx = torch.cat([h, x], dim=1)\n        z = torch.sigmoid(self.convz1(hx))\n        r = torch.sigmoid(self.convr1(hx))\n        q = torch.tanh(self.convq1(torch.cat([r * h, x], dim=1)))\n        h = (1 - z) * h + z * q\n\n        # vertical\n        hx = torch.cat([h, x], dim=1)\n        z = torch.sigmoid(self.convz2(hx))\n        r = torch.sigmoid(self.convr2(hx))\n        q = torch.tanh(self.convq2(torch.cat([r * h, x], dim=1)))\n        h = (1 - z) * h + z * q\n\n        
return h\n\n\nclass BasicMotionEncoder(nn.Module):\n    def __init__(\n        self,\n        corr_channels=324,\n        flow_channels=2,\n    ):\n        super(BasicMotionEncoder, self).__init__()\n\n        self.convc1 = nn.Conv2d(corr_channels, 256, 1, padding=0)\n        self.convc2 = nn.Conv2d(256, 192, 3, padding=1)\n        self.convf1 = nn.Conv2d(flow_channels, 128, 7, padding=3)\n        self.convf2 = nn.Conv2d(128, 64, 3, padding=1)\n        self.conv = nn.Conv2d(64 + 192, 128 - flow_channels, 3, padding=1)\n\n    def forward(self, flow, corr):\n        cor = F.relu(self.convc1(corr))\n        cor = F.relu(self.convc2(cor))\n        flo = F.relu(self.convf1(flow))\n        flo = F.relu(self.convf2(flo))\n\n        cor_flo = torch.cat([cor, flo], dim=1)\n        out = F.relu(self.conv(cor_flo))\n        return torch.cat([out, flow], dim=1)\n\n\nclass BasicUpdateBlock(nn.Module):\n    def __init__(\n        self,\n        corr_channels=324,\n        hidden_dim=128,\n        context_dim=128,\n        downsample_factor=8,\n        flow_dim=2,\n        bilinear_up=False,\n    ):\n        super(BasicUpdateBlock, self).__init__()\n\n        self.encoder = BasicMotionEncoder(\n            corr_channels=corr_channels,\n            flow_channels=flow_dim,\n        )\n\n        self.gru = SepConvGRU(hidden_dim=hidden_dim, input_dim=context_dim + hidden_dim)\n\n        self.flow_head = FlowHead(\n            hidden_dim,\n            hidden_dim=256,\n            out_dim=flow_dim,\n        )\n\n        if bilinear_up:\n            self.mask = None\n        else:\n            self.mask = nn.Sequential(\n                nn.Conv2d(hidden_dim, 256, 3, padding=1),\n                nn.ReLU(inplace=True),\n                nn.Conv2d(256, downsample_factor**2 * 9, 1, padding=0),\n            )\n\n    def forward(self, net, inp, corr, flow):\n        motion_features = self.encoder(flow, corr)\n\n        inp = torch.cat([inp, motion_features], dim=1)\n\n        net = 
self.gru(net, inp)\n        delta_flow = self.flow_head(net)\n\n        if self.mask is not None:\n            mask = self.mask(net)\n        else:\n            mask = None\n\n        return net, mask, delta_flow\n"
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/unimatch/transformer.py",
    "content": "import torch\nimport torch.nn as nn\n\nfrom .attention import (\n    single_head_full_attention,\n    single_head_full_attention_1d,\n    single_head_split_window_attention,\n    single_head_split_window_attention_1d,\n)\nfrom .utils import generate_shift_window_attn_mask, generate_shift_window_attn_mask_1d\n\n\nclass TransformerLayer(nn.Module):\n    def __init__(\n        self,\n        d_model=128,\n        nhead=1,\n        no_ffn=False,\n        ffn_dim_expansion=4,\n    ):\n        super(TransformerLayer, self).__init__()\n\n        self.dim = d_model\n        self.nhead = nhead\n        self.no_ffn = no_ffn\n\n        # multi-head attention\n        self.q_proj = nn.Linear(d_model, d_model, bias=False)\n        self.k_proj = nn.Linear(d_model, d_model, bias=False)\n        self.v_proj = nn.Linear(d_model, d_model, bias=False)\n\n        self.merge = nn.Linear(d_model, d_model, bias=False)\n\n        self.norm1 = nn.LayerNorm(d_model)\n\n        # no ffn after self-attn, with ffn after cross-attn\n        if not self.no_ffn:\n            in_channels = d_model * 2\n            self.mlp = nn.Sequential(\n                nn.Linear(in_channels, in_channels * ffn_dim_expansion, bias=False),\n                nn.GELU(),\n                nn.Linear(in_channels * ffn_dim_expansion, d_model, bias=False),\n            )\n\n            self.norm2 = nn.LayerNorm(d_model)\n\n    def forward(\n        self,\n        source,\n        target,\n        height=None,\n        width=None,\n        shifted_window_attn_mask=None,\n        shifted_window_attn_mask_1d=None,\n        attn_type=\"swin\",\n        with_shift=False,\n        attn_num_splits=None,\n    ):\n        # source, target: [B, L, C]\n        query, key, value = source, target, target\n\n        # for stereo: 2d attn in self-attn, 1d attn in cross-attn\n        is_self_attn = (query - key).abs().max() < 1e-6\n\n        # single-head attention\n        query = self.q_proj(query)  # [B, L, C]\n        
key = self.k_proj(key)  # [B, L, C]\n        value = self.v_proj(value)  # [B, L, C]\n\n        if attn_type == \"swin\" and attn_num_splits > 1:  # self, cross-attn: both swin 2d\n            if self.nhead > 1:\n                # we observe that multihead attention slows down the speed and increases the memory consumption\n                # without bringing obvious performance gains and thus the implementation is removed\n                raise NotImplementedError\n            else:\n                message = single_head_split_window_attention(\n                    query,\n                    key,\n                    value,\n                    num_splits=attn_num_splits,\n                    with_shift=with_shift,\n                    h=height,\n                    w=width,\n                    attn_mask=shifted_window_attn_mask,\n                )\n\n        elif attn_type == \"self_swin2d_cross_1d\":  # self-attn: swin 2d, cross-attn: full 1d\n            if self.nhead > 1:\n                raise NotImplementedError\n            else:\n                if is_self_attn:\n                    if attn_num_splits > 1:\n                        message = single_head_split_window_attention(\n                            query,\n                            key,\n                            value,\n                            num_splits=attn_num_splits,\n                            with_shift=with_shift,\n                            h=height,\n                            w=width,\n                            attn_mask=shifted_window_attn_mask,\n                        )\n                    else:\n                        # full 2d attn\n                        message = single_head_full_attention(query, key, value)  # [N, L, C]\n\n                else:\n                    # cross attn 1d\n                    message = single_head_full_attention_1d(\n                        query,\n                        key,\n                        value,\n                        
h=height,\n                        w=width,\n                    )\n\n        elif attn_type == \"self_swin2d_cross_swin1d\":  # self-attn: swin 2d, cross-attn: swin 1d\n            if self.nhead > 1:\n                raise NotImplementedError\n            else:\n                if is_self_attn:\n                    if attn_num_splits > 1:\n                        # self attn shift window\n                        message = single_head_split_window_attention(\n                            query,\n                            key,\n                            value,\n                            num_splits=attn_num_splits,\n                            with_shift=with_shift,\n                            h=height,\n                            w=width,\n                            attn_mask=shifted_window_attn_mask,\n                        )\n                    else:\n                        # full 2d attn\n                        message = single_head_full_attention(query, key, value)  # [N, L, C]\n                else:\n                    if attn_num_splits > 1:\n                        assert shifted_window_attn_mask_1d is not None\n                        # cross attn 1d shift\n                        message = single_head_split_window_attention_1d(\n                            query,\n                            key,\n                            value,\n                            num_splits=attn_num_splits,\n                            with_shift=with_shift,\n                            h=height,\n                            w=width,\n                            attn_mask=shifted_window_attn_mask_1d,\n                        )\n                    else:\n                        message = single_head_full_attention_1d(\n                            query,\n                            key,\n                            value,\n                            h=height,\n                            w=width,\n                        )\n\n        else:\n            message = 
single_head_full_attention(query, key, value)  # [B, L, C]\n\n        message = self.merge(message)  # [B, L, C]\n        message = self.norm1(message)\n\n        if not self.no_ffn:\n            message = self.mlp(torch.cat([source, message], dim=-1))\n            message = self.norm2(message)\n\n        return source + message\n\n\nclass TransformerBlock(nn.Module):\n    \"\"\"self attention + cross attention + FFN\"\"\"\n\n    def __init__(\n        self,\n        d_model=128,\n        nhead=1,\n        ffn_dim_expansion=4,\n    ):\n        super(TransformerBlock, self).__init__()\n\n        self.self_attn = TransformerLayer(\n            d_model=d_model,\n            nhead=nhead,\n            no_ffn=True,\n            ffn_dim_expansion=ffn_dim_expansion,\n        )\n\n        self.cross_attn_ffn = TransformerLayer(\n            d_model=d_model,\n            nhead=nhead,\n            ffn_dim_expansion=ffn_dim_expansion,\n        )\n\n    def forward(\n        self,\n        source,\n        target,\n        height=None,\n        width=None,\n        shifted_window_attn_mask=None,\n        shifted_window_attn_mask_1d=None,\n        attn_type=\"swin\",\n        with_shift=False,\n        attn_num_splits=None,\n    ):\n        # source, target: [B, L, C]\n\n        # self attention\n        source = self.self_attn(\n            source,\n            source,\n            height=height,\n            width=width,\n            shifted_window_attn_mask=shifted_window_attn_mask,\n            attn_type=attn_type,\n            with_shift=with_shift,\n            attn_num_splits=attn_num_splits,\n        )\n\n        # cross attention and ffn\n        source = self.cross_attn_ffn(\n            source,\n            target,\n            height=height,\n            width=width,\n            shifted_window_attn_mask=shifted_window_attn_mask,\n            shifted_window_attn_mask_1d=shifted_window_attn_mask_1d,\n            attn_type=attn_type,\n            
with_shift=with_shift,\n            attn_num_splits=attn_num_splits,\n        )\n\n        return source\n\n\nclass FeatureTransformer(nn.Module):\n    def __init__(\n        self,\n        num_layers=6,\n        d_model=128,\n        nhead=1,\n        ffn_dim_expansion=4,\n    ):\n        super(FeatureTransformer, self).__init__()\n\n        self.d_model = d_model\n        self.nhead = nhead\n\n        self.layers = nn.ModuleList(\n            [\n                TransformerBlock(\n                    d_model=d_model,\n                    nhead=nhead,\n                    ffn_dim_expansion=ffn_dim_expansion,\n                )\n                for i in range(num_layers)\n            ]\n        )\n\n        for p in self.parameters():\n            if p.dim() > 1:\n                nn.init.xavier_uniform_(p)\n\n    def forward(\n        self,\n        feature0,\n        feature1,\n        attn_type=\"swin\",\n        attn_num_splits=None,\n        **kwargs,\n    ):\n        b, c, h, w = feature0.shape\n        assert self.d_model == c\n\n        feature0 = feature0.flatten(-2).permute(0, 2, 1)  # [B, H*W, C]\n        feature1 = feature1.flatten(-2).permute(0, 2, 1)  # [B, H*W, C]\n\n        # 2d attention\n        if \"swin\" in attn_type and attn_num_splits > 1:\n            # global and refine use different number of splits\n            window_size_h = h // attn_num_splits\n            window_size_w = w // attn_num_splits\n\n            # compute attn mask once\n            shifted_window_attn_mask = generate_shift_window_attn_mask(\n                input_resolution=(h, w),\n                window_size_h=window_size_h,\n                window_size_w=window_size_w,\n                shift_size_h=window_size_h // 2,\n                shift_size_w=window_size_w // 2,\n                device=feature0.device,\n            )  # [K*K, H/K*W/K, H/K*W/K]\n        else:\n            shifted_window_attn_mask = None\n\n        # 1d attention\n        if \"swin1d\" in attn_type 
and attn_num_splits > 1:\n            window_size_w = w // attn_num_splits\n\n            # compute attn mask once\n            shifted_window_attn_mask_1d = generate_shift_window_attn_mask_1d(\n                input_w=w,\n                window_size_w=window_size_w,\n                shift_size_w=window_size_w // 2,\n                device=feature0.device,\n            )  # [K, W/K, W/K]\n        else:\n            shifted_window_attn_mask_1d = None\n\n        # concat feature0 and feature1 in batch dimension to compute in parallel\n        concat0 = torch.cat((feature0, feature1), dim=0)  # [2B, H*W, C]\n        concat1 = torch.cat((feature1, feature0), dim=0)  # [2B, H*W, C]\n\n        for i, layer in enumerate(self.layers):\n            concat0 = layer(\n                concat0,\n                concat1,\n                height=h,\n                width=w,\n                attn_type=attn_type,\n                with_shift=\"swin\" in attn_type and attn_num_splits > 1 and i % 2 == 1,\n                attn_num_splits=attn_num_splits,\n                shifted_window_attn_mask=shifted_window_attn_mask,\n                shifted_window_attn_mask_1d=shifted_window_attn_mask_1d,\n            )\n\n            # update feature1\n            concat1 = torch.cat(concat0.chunk(chunks=2, dim=0)[::-1], dim=0)\n\n        feature0, feature1 = concat0.chunk(chunks=2, dim=0)  # [B, H*W, C]\n\n        # reshape back\n        feature0 = feature0.view(b, h, w, c).permute(0, 3, 1, 2).contiguous()  # [B, C, H, W]\n        feature1 = feature1.view(b, h, w, c).permute(0, 3, 1, 2).contiguous()  # [B, C, H, W]\n\n        return feature0, feature1\n"
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/unimatch/trident_conv.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# https://github.com/facebookresearch/detectron2/blob/main/projects/TridentNet/tridentnet/trident_conv.py\n\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\nfrom torch.nn.modules.utils import _pair\n\n\nclass MultiScaleTridentConv(nn.Module):\n    def __init__(\n        self,\n        in_channels,\n        out_channels,\n        kernel_size,\n        stride=1,\n        strides=1,\n        paddings=0,\n        dilations=1,\n        dilation=1,\n        groups=1,\n        num_branch=1,\n        test_branch_idx=-1,\n        bias=False,\n        norm=None,\n        activation=None,\n    ):\n        super(MultiScaleTridentConv, self).__init__()\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.kernel_size = _pair(kernel_size)\n        self.num_branch = num_branch\n        self.stride = _pair(stride)\n        self.groups = groups\n        self.with_bias = bias\n        self.dilation = dilation\n        if isinstance(paddings, int):\n            paddings = [paddings] * self.num_branch\n        if isinstance(dilations, int):\n            dilations = [dilations] * self.num_branch\n        if isinstance(strides, int):\n            strides = [strides] * self.num_branch\n        self.paddings = [_pair(padding) for padding in paddings]\n        self.dilations = [_pair(dilation) for dilation in dilations]\n        self.strides = [_pair(stride) for stride in strides]\n        self.test_branch_idx = test_branch_idx\n        self.norm = norm\n        self.activation = activation\n\n        assert len({self.num_branch, len(self.paddings), len(self.strides)}) == 1\n\n        self.weight = nn.Parameter(torch.Tensor(out_channels, in_channels // groups, *self.kernel_size))\n        if bias:\n            self.bias = nn.Parameter(torch.Tensor(out_channels))\n        else:\n            self.bias = None\n\n        nn.init.kaiming_uniform_(self.weight, 
nonlinearity=\"relu\")\n        if self.bias is not None:\n            nn.init.constant_(self.bias, 0)\n\n    def forward(self, inputs):\n        num_branch = self.num_branch if self.training or self.test_branch_idx == -1 else 1\n        assert len(inputs) == num_branch\n\n        if self.training or self.test_branch_idx == -1:\n            outputs = [\n                F.conv2d(input, self.weight, self.bias, stride, padding, self.dilation, self.groups)\n                for input, stride, padding in zip(inputs, self.strides, self.paddings)\n            ]\n        else:\n            outputs = [\n                F.conv2d(\n                    inputs[0],\n                    self.weight,\n                    self.bias,\n                    self.strides[self.test_branch_idx] if self.test_branch_idx == -1 else self.strides[-1],\n                    self.paddings[self.test_branch_idx] if self.test_branch_idx == -1 else self.paddings[-1],\n                    self.dilation,\n                    self.groups,\n                )\n            ]\n\n        if self.norm is not None:\n            outputs = [self.norm(x) for x in outputs]\n        if self.activation is not None:\n            outputs = [self.activation(x) for x in outputs]\n        return outputs\n"
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/unimatch/unimatch.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom .attention import SelfAttnPropagation\nfrom .backbone import CNNEncoder\nfrom .geometry import compute_flow_with_depth_pose, flow_warp\nfrom .matching import (\n    correlation_softmax_depth,\n    global_correlation_softmax,\n    global_correlation_softmax_stereo,\n    local_correlation_softmax,\n    local_correlation_softmax_stereo,\n    local_correlation_with_flow,\n)\nfrom .reg_refine import BasicUpdateBlock\nfrom .transformer import FeatureTransformer\nfrom .utils import feature_add_position, normalize_img, upsample_flow_with_mask\n\n\nclass UniMatch(nn.Module):\n    def __init__(\n        self,\n        num_scales=1,\n        feature_channels=128,\n        upsample_factor=8,\n        num_head=1,\n        ffn_dim_expansion=4,\n        num_transformer_layers=6,\n        reg_refine=False,  # optional local regression refinement\n        task=\"flow\",\n    ):\n        super(UniMatch, self).__init__()\n\n        self.feature_channels = feature_channels\n        self.num_scales = num_scales\n        self.upsample_factor = upsample_factor\n        self.reg_refine = reg_refine\n\n        # CNN\n        self.backbone = CNNEncoder(output_dim=feature_channels, num_output_scales=num_scales)\n\n        # Transformer\n        self.transformer = FeatureTransformer(\n            num_layers=num_transformer_layers,\n            d_model=feature_channels,\n            nhead=num_head,\n            ffn_dim_expansion=ffn_dim_expansion,\n        )\n\n        # propagation with self-attn\n        self.feature_flow_attn = SelfAttnPropagation(in_channels=feature_channels)\n\n        if not self.reg_refine or task == \"depth\":\n            # convex upsampling similar to RAFT\n            # concat feature0 and low res flow as input\n            self.upsampler = nn.Sequential(\n                nn.Conv2d(2 + feature_channels, 256, 3, 1, 1),\n                nn.ReLU(inplace=True),\n                
nn.Conv2d(256, upsample_factor**2 * 9, 1, 1, 0),\n            )\n            # thus far, all the learnable parameters are task-agnostic\n\n        if reg_refine:\n            # optional task-specific local regression refinement\n            self.refine_proj = nn.Conv2d(128, 256, 1)\n            self.refine = BasicUpdateBlock(\n                corr_channels=(2 * 4 + 1) ** 2,\n                downsample_factor=upsample_factor,\n                flow_dim=2 if task == \"flow\" else 1,\n                bilinear_up=task == \"depth\",\n            )\n\n    def extract_feature(self, img0, img1):\n        concat = torch.cat((img0, img1), dim=0)  # [2B, C, H, W]\n        features = self.backbone(concat)  # list of [2B, C, H, W], resolution from high to low\n\n        # reverse: resolution from low to high\n        features = features[::-1]\n\n        feature0, feature1 = [], []\n\n        for i in range(len(features)):\n            feature = features[i]\n            chunks = torch.chunk(feature, 2, 0)  # tuple\n            feature0.append(chunks[0])\n            feature1.append(chunks[1])\n\n        return feature0, feature1\n\n    def upsample_flow(self, flow, feature, bilinear=False, upsample_factor=8, is_depth=False):\n        if bilinear:\n            multiplier = 1 if is_depth else upsample_factor\n            up_flow = (\n                F.interpolate(flow, scale_factor=upsample_factor, mode=\"bilinear\", align_corners=True) * multiplier\n            )\n        else:\n            concat = torch.cat((flow, feature), dim=1)\n            mask = self.upsampler(concat)\n            up_flow = upsample_flow_with_mask(flow, mask, upsample_factor=self.upsample_factor, is_depth=is_depth)\n\n        return up_flow\n\n    def forward(\n        self,\n        img0,\n        img1,\n        attn_type=None,\n        attn_splits_list=None,\n        corr_radius_list=None,\n        prop_radius_list=None,\n        num_reg_refine=1,\n        pred_bidir_flow=False,\n        task=\"flow\",\n  
      intrinsics=None,\n        pose=None,  # relative pose transform\n        min_depth=1.0 / 0.5,  # inverse depth range\n        max_depth=1.0 / 10,\n        num_depth_candidates=64,\n        depth_from_argmax=False,\n        pred_bidir_depth=False,\n        **kwargs,\n    ):\n        if pred_bidir_flow:\n            assert task == \"flow\"\n\n        if task == \"depth\":\n            assert self.num_scales == 1  # multi-scale depth model is not supported yet\n\n        results_dict = {}\n        flow_preds = []\n\n        if task == \"flow\":\n            # stereo and depth tasks have normalized img in dataloader\n            img0, img1 = normalize_img(img0, img1)  # [B, 3, H, W]\n\n        # list of features, resolution low to high\n        feature0_list, feature1_list = self.extract_feature(img0, img1)  # list of features\n\n        flow = None\n\n        if task != \"depth\":\n            assert len(attn_splits_list) == len(corr_radius_list) == len(prop_radius_list) == self.num_scales\n        else:\n            assert len(attn_splits_list) == len(prop_radius_list) == self.num_scales == 1\n\n        for scale_idx in range(self.num_scales):\n            feature0, feature1 = feature0_list[scale_idx], feature1_list[scale_idx]\n\n            if pred_bidir_flow and scale_idx > 0:\n                # predicting bidirectional flow with refinement\n                feature0, feature1 = torch.cat((feature0, feature1), dim=0), torch.cat((feature1, feature0), dim=0)\n\n            feature0_ori, feature1_ori = feature0, feature1\n\n            upsample_factor = self.upsample_factor * (2 ** (self.num_scales - 1 - scale_idx))\n\n            if task == \"depth\":\n                # scale intrinsics\n                intrinsics_curr = intrinsics.clone()\n                intrinsics_curr[:, :2] = intrinsics_curr[:, :2] / upsample_factor\n\n            if scale_idx > 0:\n                assert task != \"depth\"  # not supported for multi-scale depth model\n                flow = 
F.interpolate(flow, scale_factor=2, mode=\"bilinear\", align_corners=True) * 2\n\n            if flow is not None:\n                assert task != \"depth\"\n                flow = flow.detach()\n\n                if task == \"stereo\":\n                    # construct flow vector for disparity\n                    # flow here is actually disparity\n                    zeros = torch.zeros_like(flow)  # [B, 1, H, W]\n                    # NOTE: reverse disp, disparity is positive\n                    displace = torch.cat((-flow, zeros), dim=1)  # [B, 2, H, W]\n                    feature1 = flow_warp(feature1, displace)  # [B, C, H, W]\n                elif task == \"flow\":\n                    feature1 = flow_warp(feature1, flow)  # [B, C, H, W]\n                else:\n                    raise NotImplementedError\n\n            attn_splits = attn_splits_list[scale_idx]\n            if task != \"depth\":\n                corr_radius = corr_radius_list[scale_idx]\n            prop_radius = prop_radius_list[scale_idx]\n\n            # add position to features\n            feature0, feature1 = feature_add_position(feature0, feature1, attn_splits, self.feature_channels)\n\n            # Transformer\n            feature0, feature1 = self.transformer(\n                feature0,\n                feature1,\n                attn_type=attn_type,\n                attn_num_splits=attn_splits,\n            )\n\n            # correlation and softmax\n            if task == \"depth\":\n                # first generate depth candidates\n                b, _, h, w = feature0.size()\n                depth_candidates = torch.linspace(min_depth, max_depth, num_depth_candidates).type_as(feature0)\n                depth_candidates = depth_candidates.view(1, num_depth_candidates, 1, 1).repeat(\n                    b, 1, h, w\n                )  # [B, D, H, W]\n\n                flow_pred = correlation_softmax_depth(\n                    feature0,\n                    feature1,\n         
           intrinsics_curr,\n                    pose,\n                    depth_candidates=depth_candidates,\n                    depth_from_argmax=depth_from_argmax,\n                    pred_bidir_depth=pred_bidir_depth,\n                )[0]\n\n            else:\n                if corr_radius == -1:  # global matching\n                    if task == \"flow\":\n                        flow_pred = global_correlation_softmax(feature0, feature1, pred_bidir_flow)[0]\n                    elif task == \"stereo\":\n                        flow_pred = global_correlation_softmax_stereo(feature0, feature1)[0]\n                    else:\n                        raise NotImplementedError\n                else:  # local matching\n                    if task == \"flow\":\n                        flow_pred = local_correlation_softmax(feature0, feature1, corr_radius)[0]\n                    elif task == \"stereo\":\n                        flow_pred = local_correlation_softmax_stereo(feature0, feature1, corr_radius)[0]\n                    else:\n                        raise NotImplementedError\n\n            # flow or residual flow\n            flow = flow + flow_pred if flow is not None else flow_pred\n\n            if task == \"stereo\":\n                flow = flow.clamp(min=0)  # positive disparity\n\n            # upsample to the original resolution for supervision at training time only\n            if self.training:\n                flow_bilinear = self.upsample_flow(\n                    flow, None, bilinear=True, upsample_factor=upsample_factor, is_depth=task == \"depth\"\n                )\n                flow_preds.append(flow_bilinear)\n\n            # flow propagation with self-attn\n            if (pred_bidir_flow or pred_bidir_depth) and scale_idx == 0:\n                feature0 = torch.cat((feature0, feature1), dim=0)  # [2*B, C, H, W] for propagation\n\n            flow = self.feature_flow_attn(\n                feature0,\n                flow.detach(),\n    
            local_window_attn=prop_radius > 0,\n                local_window_radius=prop_radius,\n            )\n\n            # bilinear exclude the last one\n            if self.training and scale_idx < self.num_scales - 1:\n                flow_up = self.upsample_flow(\n                    flow, feature0, bilinear=True, upsample_factor=upsample_factor, is_depth=task == \"depth\"\n                )\n                flow_preds.append(flow_up)\n\n            if scale_idx == self.num_scales - 1:\n                if not self.reg_refine:\n                    # upsample to the original image resolution\n\n                    if task == \"stereo\":\n                        flow_pad = torch.cat((-flow, torch.zeros_like(flow)), dim=1)  # [B, 2, H, W]\n                        flow_up_pad = self.upsample_flow(flow_pad, feature0)\n                        flow_up = -flow_up_pad[:, :1]  # [B, 1, H, W]\n                    elif task == \"depth\":\n                        depth_pad = torch.cat((flow, torch.zeros_like(flow)), dim=1)  # [B, 2, H, W]\n                        depth_up_pad = self.upsample_flow(depth_pad, feature0, is_depth=True).clamp(\n                            min=min_depth, max=max_depth\n                        )\n                        flow_up = depth_up_pad[:, :1]  # [B, 1, H, W]\n                    else:\n                        flow_up = self.upsample_flow(flow, feature0)\n\n                    flow_preds.append(flow_up)\n                else:\n                    # task-specific local regression refinement\n                    # supervise current flow\n                    if self.training:\n                        flow_up = self.upsample_flow(\n                            flow, feature0, bilinear=True, upsample_factor=upsample_factor, is_depth=task == \"depth\"\n                        )\n                        flow_preds.append(flow_up)\n\n                    assert num_reg_refine > 0\n                    for refine_iter_idx in range(num_reg_refine):\n 
                       flow = flow.detach()\n\n                        if task == \"stereo\":\n                            zeros = torch.zeros_like(flow)  # [B, 1, H, W]\n                            # NOTE: reverse disp, disparity is positive\n                            displace = torch.cat((-flow, zeros), dim=1)  # [B, 2, H, W]\n                            correlation = local_correlation_with_flow(\n                                feature0_ori,\n                                feature1_ori,\n                                flow=displace,\n                                local_radius=4,\n                            )  # [B, (2R+1)^2, H, W]\n                        elif task == \"depth\":\n                            if pred_bidir_depth and refine_iter_idx == 0:\n                                intrinsics_curr = intrinsics_curr.repeat(2, 1, 1)\n                                pose = torch.cat((pose, torch.inverse(pose)), dim=0)\n\n                                feature0_ori, feature1_ori = torch.cat((feature0_ori, feature1_ori), dim=0), torch.cat(\n                                    (feature1_ori, feature0_ori), dim=0\n                                )\n\n                            flow_from_depth = compute_flow_with_depth_pose(\n                                1.0 / flow.squeeze(1),\n                                intrinsics_curr,\n                                extrinsics_rel=pose,\n                            )\n\n                            correlation = local_correlation_with_flow(\n                                feature0_ori,\n                                feature1_ori,\n                                flow=flow_from_depth,\n                                local_radius=4,\n                            )  # [B, (2R+1)^2, H, W]\n\n                        else:\n                            correlation = local_correlation_with_flow(\n                                feature0_ori,\n                                feature1_ori,\n                               
 flow=flow,\n                                local_radius=4,\n                            )  # [B, (2R+1)^2, H, W]\n\n                        proj = self.refine_proj(feature0)\n\n                        net, inp = torch.chunk(proj, chunks=2, dim=1)\n\n                        net = torch.tanh(net)\n                        inp = torch.relu(inp)\n\n                        net, up_mask, residual_flow = self.refine(\n                            net,\n                            inp,\n                            correlation,\n                            flow.clone(),\n                        )\n\n                        if task == \"depth\":\n                            flow = (flow - residual_flow).clamp(min=min_depth, max=max_depth)\n                        else:\n                            flow = flow + residual_flow\n\n                        if task == \"stereo\":\n                            flow = flow.clamp(min=0)  # positive\n\n                        if self.training or refine_iter_idx == num_reg_refine - 1:\n                            if task == \"depth\":\n                                if refine_iter_idx < num_reg_refine - 1:\n                                    # bilinear upsampling\n                                    flow_up = self.upsample_flow(\n                                        flow, feature0, bilinear=True, upsample_factor=upsample_factor, is_depth=True\n                                    )\n                                else:\n                                    # last one convex upsampling\n                                    # NOTE: clamp depth due to the zero padding in the unfold in the convex upsampling\n                                    # pad depth to 2 channels as flow\n                                    depth_pad = torch.cat((flow, torch.zeros_like(flow)), dim=1)  # [B, 2, H, W]\n                                    depth_up_pad = self.upsample_flow(depth_pad, feature0, is_depth=True).clamp(\n                                     
   min=min_depth, max=max_depth\n                                    )\n                                    flow_up = depth_up_pad[:, :1]  # [B, 1, H, W]\n\n                            else:\n                                flow_up = upsample_flow_with_mask(\n                                    flow, up_mask, upsample_factor=self.upsample_factor, is_depth=task == \"depth\"\n                                )\n\n                            flow_preds.append(flow_up)\n\n        if task == \"stereo\":\n            for i in range(len(flow_preds)):\n                flow_preds[i] = flow_preds[i].squeeze(1)  # [B, H, W]\n\n        # convert inverse depth to depth\n        if task == \"depth\":\n            for i in range(len(flow_preds)):\n                flow_preds[i] = 1.0 / flow_preds[i].squeeze(1)  # [B, H, W]\n\n        results_dict.update({\"flow_preds\": flow_preds})\n\n        return results_dict\n"
  },
  {
    "path": "Open-Sora/tools/scoring/optical_flow/unimatch/utils.py",
    "content": "import torch\nimport torch.nn.functional as F\n\nfrom .position import PositionEmbeddingSine\n\n\ndef generate_window_grid(h_min, h_max, w_min, w_max, len_h, len_w, device=None):\n    assert device is not None\n\n    x, y = torch.meshgrid(\n        [torch.linspace(w_min, w_max, len_w, device=device), torch.linspace(h_min, h_max, len_h, device=device)],\n    )\n    grid = torch.stack((x, y), -1).transpose(0, 1).float()  # [H, W, 2]\n\n    return grid\n\n\ndef normalize_coords(coords, h, w):\n    # coords: [B, H, W, 2]\n    c = torch.Tensor([(w - 1) / 2.0, (h - 1) / 2.0]).float().to(coords.device)\n    return (coords - c) / c  # [-1, 1]\n\n\ndef normalize_img(img0, img1):\n    # loaded images are in [0, 255]\n    # normalize by ImageNet mean and std\n    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1).to(img1.device)\n    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1).to(img1.device)\n    img0 = (img0 / 255.0 - mean) / std\n    img1 = (img1 / 255.0 - mean) / std\n\n    return img0, img1\n\n\ndef split_feature(\n    feature,\n    num_splits=2,\n    channel_last=False,\n):\n    if channel_last:  # [B, H, W, C]\n        b, h, w, c = feature.size()\n        assert h % num_splits == 0 and w % num_splits == 0\n\n        b_new = b * num_splits * num_splits\n        h_new = h // num_splits\n        w_new = w // num_splits\n\n        feature = (\n            feature.view(b, num_splits, h // num_splits, num_splits, w // num_splits, c)\n            .permute(0, 1, 3, 2, 4, 5)\n            .reshape(b_new, h_new, w_new, c)\n        )  # [B*K*K, H/K, W/K, C]\n    else:  # [B, C, H, W]\n        b, c, h, w = feature.size()\n        assert h % num_splits == 0 and w % num_splits == 0\n\n        b_new = b * num_splits * num_splits\n        h_new = h // num_splits\n        w_new = w // num_splits\n\n        feature = (\n            feature.view(b, c, num_splits, h // num_splits, num_splits, w // num_splits)\n            .permute(0, 2, 4, 1, 3, 5)\n 
           .reshape(b_new, c, h_new, w_new)\n        )  # [B*K*K, C, H/K, W/K]\n\n    return feature\n\n\ndef merge_splits(\n    splits,\n    num_splits=2,\n    channel_last=False,\n):\n    if channel_last:  # [B*K*K, H/K, W/K, C]\n        b, h, w, c = splits.size()\n        new_b = b // num_splits // num_splits\n\n        splits = splits.view(new_b, num_splits, num_splits, h, w, c)\n        merge = (\n            splits.permute(0, 1, 3, 2, 4, 5).contiguous().view(new_b, num_splits * h, num_splits * w, c)\n        )  # [B, H, W, C]\n    else:  # [B*K*K, C, H/K, W/K]\n        b, c, h, w = splits.size()\n        new_b = b // num_splits // num_splits\n\n        splits = splits.view(new_b, num_splits, num_splits, c, h, w)\n        merge = (\n            splits.permute(0, 3, 1, 4, 2, 5).contiguous().view(new_b, c, num_splits * h, num_splits * w)\n        )  # [B, C, H, W]\n\n    return merge\n\n\ndef generate_shift_window_attn_mask(\n    input_resolution, window_size_h, window_size_w, shift_size_h, shift_size_w, device=torch.device(\"cuda\")\n):\n    # ref: https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py\n    # calculate attention mask for SW-MSA\n    h, w = input_resolution\n    img_mask = torch.zeros((1, h, w, 1)).to(device)  # 1 H W 1\n    h_slices = (slice(0, -window_size_h), slice(-window_size_h, -shift_size_h), slice(-shift_size_h, None))\n    w_slices = (slice(0, -window_size_w), slice(-window_size_w, -shift_size_w), slice(-shift_size_w, None))\n    cnt = 0\n    for h in h_slices:\n        for w in w_slices:\n            img_mask[:, h, w, :] = cnt\n            cnt += 1\n\n    mask_windows = split_feature(img_mask, num_splits=input_resolution[-1] // window_size_w, channel_last=True)\n\n    mask_windows = mask_windows.view(-1, window_size_h * window_size_w)\n    attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)\n    attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, 
float(0.0))\n\n    return attn_mask\n\n\ndef feature_add_position(feature0, feature1, attn_splits, feature_channels):\n    pos_enc = PositionEmbeddingSine(num_pos_feats=feature_channels // 2)\n\n    if attn_splits > 1:  # add position in splited window\n        feature0_splits = split_feature(feature0, num_splits=attn_splits)\n        feature1_splits = split_feature(feature1, num_splits=attn_splits)\n\n        position = pos_enc(feature0_splits)\n\n        feature0_splits = feature0_splits + position\n        feature1_splits = feature1_splits + position\n\n        feature0 = merge_splits(feature0_splits, num_splits=attn_splits)\n        feature1 = merge_splits(feature1_splits, num_splits=attn_splits)\n    else:\n        position = pos_enc(feature0)\n\n        feature0 = feature0 + position\n        feature1 = feature1 + position\n\n    return feature0, feature1\n\n\ndef upsample_flow_with_mask(flow, up_mask, upsample_factor, is_depth=False):\n    # convex upsampling following raft\n\n    mask = up_mask\n    b, flow_channel, h, w = flow.shape\n    mask = mask.view(b, 1, 9, upsample_factor, upsample_factor, h, w)  # [B, 1, 9, K, K, H, W]\n    mask = torch.softmax(mask, dim=2)\n\n    multiplier = 1 if is_depth else upsample_factor\n    up_flow = F.unfold(multiplier * flow, [3, 3], padding=1)\n    up_flow = up_flow.view(b, flow_channel, 9, 1, 1, h, w)  # [B, 2, 9, 1, 1, H, W]\n\n    up_flow = torch.sum(mask * up_flow, dim=2)  # [B, 2, K, K, H, W]\n    up_flow = up_flow.permute(0, 1, 4, 2, 5, 3)  # [B, 2, K, H, K, W]\n    up_flow = up_flow.reshape(b, flow_channel, upsample_factor * h, upsample_factor * w)  # [B, 2, K*H, K*W]\n\n    return up_flow\n\n\ndef split_feature_1d(\n    feature,\n    num_splits=2,\n):\n    # feature: [B, W, C]\n    b, w, c = feature.size()\n    assert w % num_splits == 0\n\n    b_new = b * num_splits\n    w_new = w // num_splits\n\n    feature = feature.view(b, num_splits, w // num_splits, c).view(b_new, w_new, c)  # [B*K, W/K, C]\n\n    return 
feature\n\n\ndef merge_splits_1d(\n    splits,\n    h,\n    num_splits=2,\n):\n    b, w, c = splits.size()\n    new_b = b // num_splits // h\n\n    splits = splits.view(new_b, h, num_splits, w, c)\n    merge = splits.view(new_b, h, num_splits * w, c)  # [B, H, W, C]\n\n    return merge\n\n\ndef window_partition_1d(x, window_size_w):\n    \"\"\"\n    Args:\n        x: (B, W, C)\n        window_size (int): window size\n\n    Returns:\n        windows: (num_windows*B, window_size, C)\n    \"\"\"\n    B, W, C = x.shape\n    x = x.view(B, W // window_size_w, window_size_w, C).view(-1, window_size_w, C)\n    return x\n\n\ndef generate_shift_window_attn_mask_1d(input_w, window_size_w, shift_size_w, device=torch.device(\"cuda\")):\n    # calculate attention mask for SW-MSA\n    img_mask = torch.zeros((1, input_w, 1)).to(device)  # 1 W 1\n    w_slices = (slice(0, -window_size_w), slice(-window_size_w, -shift_size_w), slice(-shift_size_w, None))\n    cnt = 0\n    for w in w_slices:\n        img_mask[:, w, :] = cnt\n        cnt += 1\n\n    mask_windows = window_partition_1d(img_mask, window_size_w)  # nW, window_size, 1\n    mask_windows = mask_windows.view(-1, window_size_w)\n    attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)  # nW, window_size, window_size\n    attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, float(0.0))\n\n    return attn_mask\n"
  },
  {
    "path": "PixArt-alpha-ToCa/Dockerfile",
    "content": "# This is a sample Dockerfile that builds a runtime container and runs the sample Gradio app.\r\n# Note, you must pass in the pretrained models when you run the container.\r\n\r\nFROM nvidia/cuda:12.2.0-runtime-ubuntu22.04\r\n\r\nWORKDIR /workspace\r\n\r\nRUN apt-get update && \\\r\n    apt-get install -y \\\r\n        git \\\r\n        python3 \\\r\n        python-is-python3 \\\r\n        python3-pip \\\r\n        python3.10-venv \\\r\n        libgl1 \\\r\n        libgl1-mesa-glx \\\r\n        libglib2.0-0 \\\r\n    && rm -rf /var/lib/apt/lists/*\r\n\r\nADD requirements.txt .\r\n\r\nRUN pip install -r requirements.txt\r\n\r\nADD . .\r\n\r\nRUN chmod a+x docker-entrypoint.sh\r\n\r\nENV DEMO_PORT=12345\r\nENTRYPOINT [ \"/workspace/docker-entrypoint.sh\" ]"
  },
  {
    "path": "PixArt-alpha-ToCa/README(PixArt-alpha).md",
    "content": "<p align=\"center\">\n  <img src=\"asset/logo.png\"  height=120>\n</p>\n\n\n### <div align=\"center\">👉 PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis<div> \n### <div align=\"center\"> ICLR 2024 Spotlight <div> \n\n<div align=\"center\">\n  <a href=\"https://github.com/PixArt-alpha/PixArt-sigma/\"><img src=\"https://img.shields.io/static/v1?label=PixArt-Sigma Code&message=Github&color=blue&logo=github-pages\"></a> &ensp;\n\n  <a href=\"https://pixart-alpha.github.io/\"><img src=\"https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages\"></a> &ensp;\n  <a href=\"https://huggingface.co/datasets/PixArt-alpha/SAM-LLaVA-Captions10M\"><img src=\"https://img.shields.io/static/v1?label=SAM-LLaVA&message=HF&color=yellow\"></a> &ensp;\n  <a href=\"https://arxiv.org/abs/2310.00426\"><img src=\"https://img.shields.io/static/v1?label=Paper&message=Arxiv:Alpha&color=red&logo=arxiv\"></a> &ensp;\n  <a href=\"https://arxiv.org/abs/2401.05252\"><img src=\"https://img.shields.io/static/v1?label=Paper&message=Arxiv:Delta&color=red&logo=arxiv\"></a> &ensp;\n  <a href=\"https://discord.gg/rde6eaE5Ta\"><img src=\"https://img.shields.io/static/v1?label=Discuss&message=Discord&color=purple&logo=discord\"></a> &ensp;\n  <a href=\"https://huggingface.co/docs/diffusers/main/en/api/pipelines/pixart\"><img src=\"https://img.shields.io/static/v1?label=Usage&message=Diffusers&color=green&\"></a> &ensp;\n  <a href=\"https://github.com/city96/ComfyUI_ExtraModels\"><img src=\"https://img.shields.io/static/v1?label=App&message=ComfyUI&&color=green\"></a> &ensp;\n\n  <a href=\"https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha\"><img src=\"https://img.shields.io/static/v1?label=Demo PixArt&message=HuggingFace&color=yellow\"></a> &ensp;\n  <a href=\"https://huggingface.co/spaces/PixArt-alpha/PixArt-LCM\"><img src=\"https://img.shields.io/static/v1?label=Demo 
PixArt-LCM&message=HuggingFace&color=yellow\"></a> &ensp;\n  <a href=\"https://openxlab.org.cn/apps/detail/PixArt-alpha/PixArt-alpha\"><img src=\"https://img.shields.io/static/v1?label=Demo PixArt&message=OpenXLab&color=purple\"></a> &ensp;\n  <a href=\"https://openxlab.org.cn/apps/detail/houshaowei/PixArt-LCM\"><img src=\"https://img.shields.io/static/v1?label=Demo PixArt-LCM&message=OpenXLab&color=purple\"></a> &ensp;\n  <a href=\"https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing\"><img src=\"https://img.shields.io/static/v1?label=Free%20Trial&message=Google%20Colab&logo=google&color=orange\"></a> &ensp;\n</div>\n\n---\n\nThis repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper exploring \nFast training diffusion models with transformers. You can find more visualizations on our [project page](https://pixart-alpha.github.io/).\n\n<img src=\"asset/logo.png\" width=\"10%\" alt=\"\" /> **PixArt-α Community**: Join our PixArt-α discord channels <a href=\"https://discord.gg/rde6eaE5Ta\" style=\"text-decoration:none;\">\n<img src=\"https://user-images.githubusercontent.com/25839884/218347213-c080267f-cbb6-443e-8532-8e1ed9a58ea9.png\" width=\"3%\" alt=\"\" /></a> for discussions. 
Coders are welcome to contribute.\n\n> [**PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis**](https://pixart-alpha.github.io/)<br>\n> [Junsong Chen*](https://lawrence-cj.github.io/), [Jincheng Yu*](https://lovesykun.cn/about.html), \n> [Chongjian Ge*](https://chongjiange.github.io/), [Lewei Yao*](https://scholar.google.com/citations?user=hqDyTg8AAAAJ&hl=zh-CN&oi=ao),\n> [Enze Xie](https://xieenze.github.io/)&#8224;,\n> [Yue Wu](https://yuewuhkust.github.io/), [Zhongdao Wang](https://zhongdao.github.io/), \n> [James Kwok](https://www.cse.ust.hk/~jamesk/), [Ping Luo](http://luoping.me/), \n> [Huchuan Lu](https://scholar.google.com/citations?hl=en&user=D3nE0agAAAAJ), \n> [Zhenguo Li](https://scholar.google.com/citations?user=XboZC1AAAAAJ)\n> <br>Huawei Noah’s Ark Lab, Dalian University of Technology, HKU, HKUST<br>\n\n> [**PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models**](https://pixart-alpha.github.io/)<br>\n> [Junsong Chen](https://lawrence-cj.github.io/), [Yue Wu](https://yuewuhkust.github.io/), [Simian Luo](https://luosiallen.github.io/),  [Enze Xie](https://xieenze.github.io/)&#8224;,\n> [Sayak Paul](https://sayak.dev/), [Ping Luo](http://luoping.me/), [Hang Zhao](), [Zhenguo Li](https://scholar.google.com/citations?user=XboZC1AAAAAJ)\n> <br>Huawei Noah’s Ark Lab, DLUT, Tsinghua University, HKU, Hugging Face<br>\n\n---\n## Breaking News 🔥🔥!!\n- (🔥 New) Apr. 12, 2024. 💥 A better version of [PixArt-Σ](https://github.com/PixArt-alpha/PixArt-sigma) training & inference code, checkpoints are all released!!!\nWelcome to collaborate and contribute. Star 🌟us if you think it is helpful!!\n\n\n- (🔥 New) Jan. 19, 2024. 💥 [PixArt-δ](https://arxiv.org/abs/2401.05252) ControlNet [app_controlnet.py](app/app_controlnet.py) and [Checkpoint](https://huggingface.co/PixArt-alpha/PixArt-ControlNet/tree/main) are released!!!\n- (🔥 New) Jan. 16, 2024. 
💥 Glad to announce that [PixArt-α](https://arxiv.org/abs/2310.00426) is accepted by ICLR 2024 (Spotlight).\n- (🔥 New) Dec. 17, 2023. 💥 PixArt supports [ComfyUI](https://github.com/comfyanonymous/ComfyUI#manual-install-windows-linux). Thanks to [@city96](https://github.com/city96/ComfyUI_ExtraModels) with his great work.\n- (🔥 New) Nov. 30, 2023. 💥 PixArt collaborates with [LCMs](https://github.com/luosiallen/latent-consistency-model) team to make the **fastest** [Training & Inference Text-to-Image Generation System](https://github.com/PixArt-alpha/PixArt-alpha).\nHere, [Training code](train_scripts/train_pixart_lcm.py) & [Inference code](scripts/inference_lcm.py) & [Weights](https://huggingface.co/PixArt-alpha/PixArt-LCM-XL-2-1024-MS) & [HF Demo](https://huggingface.co/spaces/PixArt-alpha/PixArt-LCM) [OpenXLab Demo](https://openxlab.org.cn/apps/detail/houshaowei/PixArt-LCM) are all released, we hope users will enjoy them. \nDetailed **inference speed** and **code guidance** can be found in [docs](asset/docs/pixart_lcm.md). At the same time, we update the codebase for better user experience and fix some bugs in the newest version.\n\n---\n## 🚩 **New Features/Updates**\n- ✅ Jan. 11, 2024. 💥 [PixArt-δ](https://arxiv.org/abs/2401.05252): We are excited to announce the release of the [PixArt-δ](https://arxiv.org/abs/2401.05252) technical report!!!\nThis report offers valuable insights into the training of LCM and ControlNet-like modules in Transformer Models. Along with the report, we have also released all the training and inference code for LCM & ControlNet [in this repository](https://github.com/PixArt-alpha/PixArt-alpha). \nWe encourage you to try them out and warmly welcome any Pull Requests from our users. Your contributions and feedback are highly appreciated!\n- ✅ Feb. 07, 2024. [train_diffusers.py](train_scripts/train_diffusers.py) can directly train with diffusers model and visualize during training.\n- ✅ Jan. 26, 2024. 
💥 All checkpoints of [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha), including 256px checkpoints are all available here [Download Models](#-download-models).\n- ✅ Jan. 19, 2024. 💥 [PixArt-δ](https://arxiv.org/abs/2401.05252) ControlNet [app_controlnet.py](app/app_controlnet.py) and [Checkpoint](https://huggingface.co/PixArt-alpha/PixArt-ControlNet/tree/main) is released!!!\n- ✅ Jan. 12, 2024. 💥 We release the [SAM-LLaVA-Captions](https://huggingface.co/datasets/PixArt-alpha/SAM-LLaVA-Captions10M) used in PixArt-α training.\n- ✅ Dec. 27, 2023. [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) incorporates into [ControlLLM](https://github.com/OpenGVLab/ControlLLM)!\n- ✅ Dec. 17, 2023. [PixArt-LCM-Lora](train_scripts/train_pixart_lcm_lora.py) & [PixArt-Lora](train_scripts/train_pixart_lora_hf.py) training scripts in Hugging Face style is released.\n- ✅ Dec. 13, 2023. Add multi-scale vae feature extraction in [tools/extract_features.py](https://github.com/PixArt-alpha/PixArt-alpha/blob/3b4f0afdbe39def80b41ab05c664c963edeebbcd/tools/extract_features.py#L276).\n- ✅ Dec. 01, 2023. Add a [Notebook folder](./notebooks) to help users get started with PixArt quickly! Thanks to [@kopyl](https://github.com/kopyl) for his contribution!\n- ✅ Nov. 27, 2023. 💥 **PixArt-α Community**: Join our PixArt-α discord channels <a href=\"https://discord.gg/rde6eaE5Ta\" style=\"text-decoration:none;\">\n<img src=\"https://user-images.githubusercontent.com/25839884/218347213-c080267f-cbb6-443e-8532-8e1ed9a58ea9.png\" width=\"3%\" alt=\"\" /></a> for discussions. Coders are welcome to contribute.\n- ✅ Nov. 21, 2023. 💥 [SA-Sovler](https://arxiv.org/abs/2309.05019) official code first release [here](asset/docs/sasolver.md).\n- ✅ Nov. 19, 2023. Release `PixArt + Dreambooth` training scripts.\n- ✅ Nov. 16, 2023. Diffusers support `random resolution` and `batch images` generation now. 
Besides, \nrunning `Pixart` in under 8GB GPU VRAM is available in 🧨 [diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pixart).\n- ✅ Nov. 10, 2023. Support DALL-E 3 Consistency Decoder in 🧨 diffusers.\n- ✅ Nov. 06, 2023. Release pretrained weights with 🧨 diffusers integration, Hugging Face demo, and Google Colab example.\n- ✅ Nov. 03, 2023. Release the LLaVA-captioning inference code.\n- ✅ Oct. 27, 2023. Release the training & feature extraction code.\n- ✅ Oct. 20, 2023. Collaborate with Hugging Face & Diffusers team to co-release the code and weights. (plz stay tuned.)\n- ✅ Oct. 15, 2023. Release the inference code.\n\n---\n\n## Contents\n* [Training](#-how-to-train)\n* [Inference](#-how-to-test)\n* [Download Models](#-download-models)\n* [Use diffusers](#1---using-in--diffusers)\n* [Data Processing](#-how-to-extract-t5-and-vae-features)\n* [PixArt-**α** Demo](#3---gradio-with-diffusers--faster-)\n* [PixArt-**α** 8GB VRAM](asset/docs/pixart.md)\n* [PixArt-**δ** (LCM)](asset/docs/pixart_lcm.md)\n* [PixArt-**δ** (ControlNet)](asset/docs/pixart_controlnet.md)\n* [PixArt-**δ** (Dreambooth)](asset/docs/pixart-dreambooth.md)\n* [Acknowledgement](#acknowledgements)\n* [Citation](#bibtex)\n\n\n* [PixArt-**Σ** Releasing](https://github.com/PixArt-alpha/PixArt-sigma)\n\n---\n\n## 🐱 Abstract\n<b>TL; DR: <font color=\"red\">PixArt-α</font> is a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators (e.g., Imagen, SDXL, and even Midjourney), and the training speed markedly surpasses existing large-scale T2I models, e.g., PixArt-α only takes 10.8% of Stable Diffusion v1.5's training time (675 vs. 
6,250 A100 GPU days).</b>\n\n<details><summary>CLICK for the full abstract</summary>\nThe most advanced text-to-image (T2I) models require significant training costs (e.g., millions of GPU hours), \nseriously hindering the fundamental innovation for the AIGC community while increasing CO2 emissions. \nThis paper introduces PixArt-α, a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators (e.g., Imagen, SDXL, and even Midjourney), \nreaching near-commercial application standards. Additionally, it supports high-resolution image synthesis up to 1024px resolution with low training cost. \nTo achieve this goal, three core designs are proposed: \n(1) Training strategy decomposition: We devise three distinct training steps that separately optimize pixel dependency, text-image alignment, and image aesthetic quality; \n(2) Efficient T2I Transformer: We incorporate cross-attention modules into Diffusion Transformer (DiT) to inject text conditions and streamline the computation-intensive class-condition branch; \n(3) High-informative data: We emphasize the significance of concept density in text-image pairs and leverage a large Vision-Language model to auto-label dense pseudo-captions to assist text-image alignment learning. \nAs a result, PixArt-α's training speed markedly surpasses existing large-scale T2I models, \ne.g., PixArt-α only takes 10.8% of Stable Diffusion v1.5's training time (675 vs. 6,250 A100 GPU days), \nsaving nearly $300,000 ($26,000 vs. $320,000) and reducing 90% CO2 emissions. Moreover, compared with a larger SOTA model, RAPHAEL, \nour training cost is merely 1%. Extensive experiments demonstrate that PixArt-α excels in image quality, artistry, and semantic control. 
\nWe hope PixArt-α will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.\n</details>\n\n---\n\n![A small cactus with a happy face in the Sahara desert.](asset/images/teaser.png)\n\n---\n\n# 🔥🔥🔥 Why PixArt-α? \n## Training Efficiency\nPixArt-α only takes 12% of Stable Diffusion v1.5's training time (753 vs. 6,250 A100 GPU days), saving nearly $300,000 ($28,000 vs. $320,000) and reducing 90% CO2 emissions. Moreover, compared with a larger SOTA model, RAPHAEL, our training cost is merely 1%.\n![Training Efficiency.](asset/images/efficiency.png)\n\n| Method    | Type | #Params | #Images| FID-30K ↓        | A100 GPU days |\n|-----------|------|---------|--------|------------------|---------------|\n| DALL·E    | Diff | 12.0B   | 250M   | 27.50            |               |\n| GLIDE     | Diff | 5.0B    | 250M   | 12.24            |               |\n| LDM       | Diff | 1.4B    | 400M   | 12.64            |               |\n| DALL·E 2  | Diff | 6.5B    | 650M   | 10.39            | 41,66         |\n| SDv1.5    | Diff | 0.9B    | 2000M  | 9.62             | 6,250         |\n| GigaGAN   | GAN  | 0.9B    | 2700M  | 9.09             | 4,783         |\n| Imagen    | Diff | 3.0B    | 860M   | 7.27             | 7,132         |\n| RAPHAEL   | Diff | 3.0B    | 5000M+ | 6.61             | 60,000        |\n| PixArt-α  | Diff | 0.6B    | 25M    | 7.32 (zero-shot) | 753           |\n| PixArt-α  | Diff | 0.6B    | 25M    | 5.51 (COCO FT)   | 753           |\n\n## Inference Efficiency\nPIXART-δ successfully generates **1024x1024 high resolution** images within **0.5 seconds** on an A100. With the implementation\nof 8-bit inference technology, PIXART-δ requires **less than 8GB of GPU VRAM**. 
\n\nLet us stress again how liberating it is to explore image generation so easily with PixArt-LCM.\n\n| Hardware                    | PIXART-δ (4 steps) | SDXL LoRA LCM (4 steps) | PixArt-α (14 steps) | SDXL standard (25 steps) |\n|-----------------------------|--------------------|-------------------------|---------------------|---------------------------|\n| T4 (Google Colab Free Tier) | 3.3s               | 8.4s                    | 16.0s               | 26.5s                     |\n| V100 (32 GB)                | 0.8s               | 1.2s                    | 5.5s                | 7.7s                      |\n| A100 (80 GB)                | 0.51s              | 1.2s                    | 2.2s                | 3.8s                      |\n\nThese tests were run with a batch size of 1 in all cases.\n\nFor cards with a lot of capacity, such as A100, performance increases significantly when generating multiple images at once, which is usually the case for production workloads.\n\n## High-quality Generation from PixArt-α\n\n- More samples\n<div id=\"more-samples\" style=\"display: flex; justify-content: center;\">\n  <img src=\"asset/images/more-samples1.png\" style=\"width: 50%; height: auto; object-fit: contain; margin: 5px;\">\n  <img src=\"asset/images/more-samples.png\" style=\"width: 43%; height: auto; object-fit: contain; margin: 5px;\">\n</div>\n\n- PixArt + [Dreambooth](https://dreambooth.github.io/)\n<div id=\"dreambooth\" style=\"display: flex; justify-content: center;\">\n  <img src=\"asset/images/dreambooth/dreambooth_dog.svg\" width=\"46%\" style=\"margin: 5px;\">\n  <img src=\"asset/images/dreambooth/dreambooth_m5.svg\" width=\"46%\" style=\"margin: 5px;\">\n</div>\n\n- PixArt + [ControlNet](https://github.com/lllyasviel/ControlNet)\n<div id=\"ControlNet\" style=\"display: flex; justify-content: center;\">\n  <img src=\"asset/images/controlnet/controlnet_huawei.svg\" width=\"46%\" style=\"margin: 5px;\">\n  <img 
src=\"asset/images/controlnet/controlnet_lenna.svg\" width=\"46%\" style=\"margin: 5px;\">\n</div>\n\n# 🔧 Dependencies and Installation\n\n- Python >= 3.9 (Recommend to use [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))\n- [PyTorch >= 1.13.0+cu11.7](https://pytorch.org/)\n```bash\nconda create -n pixart python=3.9\nconda activate pixart\npip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118\n\ngit clone https://github.com/PixArt-alpha/PixArt-alpha.git\ncd PixArt-alpha\npip install -r requirements.txt\n```\n\n# ⏬ Download Models\nAll models will be automatically downloaded. You can also choose to download manually from this [url](https://huggingface.co/PixArt-alpha/PixArt-alpha).\n\n| Model                       | #Params | url                                                                                                                                                                                                          | Download in OpenXLab                                                                                            |\n|:----------------------------|:--------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------|\n| T5                          | 4.3B    | [T5](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/t5-v1_1-xxl)                                                                                                                                 | [T5](https://download.openxlab.org.cn/models/PixArt-alpha/PixArt-alpha/weight/t5-v1_1-xxl.zip)                  |\n| VAE                         | 80M     | 
[VAE](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/sd-vae-ft-ema)                                                                                                                              | [VAE](https://download.openxlab.org.cn/models/PixArt-alpha/PixArt-alpha/weight/sd-vae-ft-ema.zip)               |\n| PixArt-α-SAM-256            | 0.6B    | [PixArt-XL-2-SAM-256x256.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-SAM-256x256.pth) or [diffusers version](https://huggingface.co/PixArt-alpha/PixArt-XL-2-SAM-256x256) | [256-SAM](https://download.openxlab.org.cn/models/PixArt-alpha/PixArt-alpha/weight/PixArt-XL-2-SAM-256x256.pth) |\n| PixArt-α-256                | 0.6B    | [PixArt-XL-2-256x256.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-256x256.pth) or [diffusers version](https://huggingface.co/PixArt-alpha/PixArt-XL-2-256x256)             | [256](https://download.openxlab.org.cn/models/PixArt-alpha/PixArt-alpha/weight/PixArt-XL-2-256x256.pth)         |\n| PixArt-α-256-MSCOCO-FID7.32 | 0.6B    | [PixArt-XL-2-256x256.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-256x256-MSCOCO-FID732.pth)                                                                               | [256]()                                                                                                         |\n| PixArt-α-512                | 0.6B    | [PixArt-XL-2-512x512.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-512x512.pth) or [diffusers version](https://huggingface.co/PixArt-alpha/PixArt-XL-2-512x512)             | [512](https://download.openxlab.org.cn/models/PixArt-alpha/PixArt-alpha/weight/PixArt-XL-2-512x512.pth)         |\n| PixArt-α-1024               | 0.6B    | [PixArt-XL-2-1024-MS.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-1024-MS.pth) or [diffusers 
version](https://huggingface.co/PixArt-alpha/PixArt-XL-2-1024-MS)             | [1024](https://download.openxlab.org.cn/models/PixArt-alpha/PixArt-alpha/weight/PixArt-XL-2-1024-MS.pth)        |\n| PixArt-δ-1024-LCM           | 0.6B    | [diffusers version](https://huggingface.co/PixArt-alpha/PixArt-LCM-XL-2-1024-MS)                                                                                                                             |                                                                                                                 |\n| ControlNet-HED-Encoder      | 30M     | [ControlNetHED.pth](https://huggingface.co/PixArt-alpha/PixArt-alpha/blob/main/ControlNetHED.pth)                                                                                                            |                                                                                                                 |\n| PixArt-δ-512-ControlNet     | 0.9B    | [PixArt-XL-2-512-ControlNet.pth](https://huggingface.co/PixArt-alpha/PixArt-ControlNet/tree/main)                                                                                                            | [512](https://openxlab.org.cn/models/detail/PixArt-alpha/PixArt-ControlNet)                                     |\n| PixArt-δ-1024-ControlNet    | 0.9B    | [PixArt-XL-2-1024-ControlNet.pth](https://huggingface.co/PixArt-alpha/PixArt-ControlNet/tree/main)                                                                                                           | [1024](https://openxlab.org.cn/models/detail/PixArt-alpha/PixArt-ControlNet)                                    |\n\nALSO find all models in [OpenXLab_PixArt-alpha](https://openxlab.org.cn/models/detail/PixArt-alpha/PixArt-alpha)\n\n# 🔥 How to Train\n## 1. 
PixArt Training\n\n**First of all.**\n\nThanks to [@kopyl](https://github.com/kopyl), you can reproduce the full fine-tuning flow on the [Pokemon dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) from Hugging Face with notebooks:\n1. Train with [notebooks/train.ipynb](https://github.com/PixArt-alpha/PixArt-alpha/blob/53dac066f60fe5fdbdde4f0360145ca96d4cc38c/notebooks/train.ipynb).\n2. Convert to Diffusers with [notebooks/convert-checkpoint-to-diffusers.ipynb](https://github.com/PixArt-alpha/PixArt-alpha/blob/master/notebooks/convert-checkpoint-to-diffusers.ipynb).\n3. Run inference on the checkpoint converted in step 2 with [notebooks/infer.ipynb](https://github.com/PixArt-alpha/PixArt-alpha/blob/master/notebooks/infer.ipynb).\n\n**Then, for more details.**\n\nHere we take the SAM dataset training config as an example, but you can also prepare your own dataset following this method.\n\nYou **ONLY** need to change the **config** file in [config](./configs/pixart_config) and the **dataloader** in [dataset](./diffusion/data/datasets).\n```bash\npython -m torch.distributed.launch --nproc_per_node=2 --master_port=12345 train_scripts/train.py configs/pixart_config/PixArt_xl2_img256_SAM.py --work-dir output/train_SAM_256\n```\n\nThe directory structure for the SAM dataset is:\n```\ncd ./data\n\nSA1B\n├──images/  (images are saved here)\n│  ├──sa_xxxxx.jpg\n│  ├──sa_xxxxx.jpg\n│  ├──......\n├──captions/    (corresponding captions are saved here, same name as images)\n│  ├──sa_xxxxx.txt\n│  ├──sa_xxxxx.txt\n├──partition/   (all image names are stored in txt files, where each line is an image name)\n│  ├──part0.txt\n│  ├──part1.txt\n│  ├──......\n├──caption_feature_wmask/   (run tools/extract_caption_feature.py to generate caption T5 features, same name as images except .npz extension)\n│  ├──sa_xxxxx.npz\n│  ├──sa_xxxxx.npz\n│  ├──......\n├──img_vae_feature/  (run tools/extract_img_vae_feature.py to generate image VAE features, same name as images 
except .npy extension)\n│  ├──train_vae_256/\n│  │  ├──noflip/\n│  │  │  ├──sa_xxxxx.npy\n│  │  │  ├──sa_xxxxx.npy\n│  │  │  ├──......\n\n```\n\n**Here we prepare data_toy for better understanding**\n```bash\ncd ./data\n\ngit lfs install\ngit clone https://huggingface.co/datasets/PixArt-alpha/data_toy\n```\nThen, \n[here](https://huggingface.co/datasets/PixArt-alpha/data_toy/blob/main/part0.txt) is an example of a partition/part0.txt file.\n\n---\n\nBesides, for JSON-file-guided [training](https://github.com/PixArt-alpha/PixArt-alpha/blob/fe0cb78065d64c18ecd8955a04e4f29138d47946/configs/pixart_config/PixArt_xl2_img1024_internalms.py#L3C2-L3C2),\n[here](https://huggingface.co/datasets/PixArt-alpha/data_toy/blob/main/data_info.json) is a toy JSON file for better understanding.\n\n---\n\n## 2. PixArt + DreamBooth Training\n\nFollow the `PixArt + DreamBooth` [training guidance](asset/docs/pixart-dreambooth.md).\n\n## 3. PixArt + LCM / LCM-LoRA Training\n\nFollow the `PixArt + LCM` [training guidance](asset/docs/pixart_lcm.md).\n\n## 4. PixArt + ControlNet Training\n\nFollow the `PixArt + ControlNet` [training guidance](asset/docs/pixart_controlnet.md).\n\n## 5. 
PixArt + LoRA Training\n\n```bash\npip install peft==0.6.2\n\naccelerate launch --num_processes=1 --main_process_port=36667  train_scripts/train_pixart_lora_hf.py --mixed_precision=\"fp16\" \\\n  --pretrained_model_name_or_path=PixArt-alpha/PixArt-XL-2-1024-MS \\\n  --dataset_name=lambdalabs/pokemon-blip-captions --caption_column=\"text\" \\\n  --resolution=1024 --random_flip \\\n  --train_batch_size=16 \\\n  --num_train_epochs=200 --checkpointing_steps=100 \\\n  --learning_rate=1e-06 --lr_scheduler=\"constant\" --lr_warmup_steps=0 \\\n  --seed=42 \\\n  --output_dir=\"pixart-pokemon-model\" \\\n  --validation_prompt=\"cute dragon creature\" --report_to=\"tensorboard\" \\\n  --gradient_checkpointing --checkpoints_total_limit=10 --validation_epochs=5 \\\n  --rank=16\n```\n\n# 💻 How to Test\nInference requires at least `23GB` of GPU memory using this repo, while `11GB and 8GB` using in 🧨 [diffusers](#using-in--diffusers).\n\nCurrently support:\n- [x] [IDDPM](https://arxiv.org/abs/2102.09672)\n- [x] [DPM-Solver](https://arxiv.org/abs/2206.00927)\n- [x] [SA-Solver](https://arxiv.org/abs/2309.05019)\n- [ ] [DPM-Solver-v3](https://arxiv.org/abs/2310.13268v2)\n\n## 1. Quick start with [Gradio](https://www.gradio.app/guides/quickstart)\n\nTo get started, first install the required dependencies. Make sure you've downloaded the [models](https://huggingface.co/PixArt-alpha/PixArt-alpha) to the output/pretrained_models folder, and then run on your local machine:\n\n```bash\nDEMO_PORT=12345 python app/app.py\n```\n\nAs an alternative, a sample [Dockerfile](Dockerfile) is provided to make a runtime container that starts the Gradio app.\n\n```bash\ndocker build . -t pixart\ndocker run --gpus all -it -p 12345:12345 -v <path_to_huggingface_cache>:/root/.cache/huggingface pixart\n```\n\nOr use docker-compose.  Note, if you want to change context from the 1024 to 512 or LCM version of the app just change the APP_CONTEXT env variable in the docker-compose.yml file.  
The default is 1024\n\n```bash\ndocker compose build\ndocker compose up\n```\n\nLet's have a look at a simple example using the `http://your-server-ip:12345`.\n\n\n## 2. Integration in diffusers\n### 1). Using in 🧨 diffusers\n\nMake sure you have the updated versions of the following libraries:\n\n```bash\npip install -U transformers accelerate diffusers SentencePiece ftfy beautifulsoup4\n```\n\nAnd then:\n\n```python\nimport torch\nfrom diffusers import PixArtAlphaPipeline, ConsistencyDecoderVAE, AutoencoderKL\ndevice = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n\n# You can replace the checkpoint id with \"PixArt-alpha/PixArt-XL-2-512x512\" too.\npipe = PixArtAlphaPipeline.from_pretrained(\"PixArt-alpha/PixArt-XL-2-1024-MS\", torch_dtype=torch.float16, use_safetensors=True)\n\n# If use DALL-E 3 Consistency Decoder\n# pipe.vae = ConsistencyDecoderVAE.from_pretrained(\"openai/consistency-decoder\", torch_dtype=torch.float16)\n\n# If use SA-Solver sampler\n# from diffusion.sa_solver_diffusers import SASolverScheduler\n# pipe.scheduler = SASolverScheduler.from_config(pipe.scheduler.config, algorithm_type='data_prediction')\n\n# If loading a LoRA model\n# transformer = Transformer2DModel.from_pretrained(\"PixArt-alpha/PixArt-LCM-XL-2-1024-MS\", subfolder=\"transformer\", torch_dtype=torch.float16)\n# transformer = PeftModel.from_pretrained(transformer, \"Your-LoRA-Model-Path\")\n# pipe = PixArtAlphaPipeline.from_pretrained(\"PixArt-alpha/PixArt-LCM-XL-2-1024-MS\", transformer=transformer, torch_dtype=torch.float16, use_safetensors=True)\n# del transformer\n\n# Enable memory optimizations.\n# pipe.enable_model_cpu_offload()\n\npipe.to(device)\n\nprompt = \"A small cactus with a happy face in the Sahara desert.\"\nimage = pipe(prompt).images[0]\nimage.save(\"./catcus.png\")\n```\nCheck out the [documentation](./asset/docs/sasolver.md) for more information about SA-Solver Sampler.\n\nThis integration allows running the pipeline with a batch size 
of 4 under 11 GBs of GPU VRAM. \nCheck out the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pixart) to learn more.\n\n### 2). Running the `PixArtAlphaPipeline` in under 8GB GPU VRAM\n\nGPU VRAM consumption under 8 GB is supported now, please refer to [documentation](asset/docs/pixart.md) for more information.\n\n### 3). Gradio with diffusers (Faster)\n\nTo get started, first install the required dependencies, then run on your local machine:\n\n```bash\n# diffusers version\nDEMO_PORT=12345 python app/app.py\n```\nLet's have a look at a simple example using the `http://your-server-ip:12345`.\n\nYou can also click [here](https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing) to have a free trial on Google Colab.\n\n### 4). Convert .pth checkpoint into diffusers version\n\n```bash\npython tools/convert_pixart_alpha_to_diffusers.py --image_size your_img_size --multi_scale_train (True if you use PixArtMS else False) --orig_ckpt_path path/to/pth --dump_path path/to/diffusers --only_transformer=True\n```\n\n\n## 3. Online Demo [![Hugging Face PixArt](https://img.shields.io/static/v1?label=Demo&message=HuggingFace%20Gradio&color=orange)](https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha) \n![Online Demo sample](asset/images/sample.png)\n\n# ✏️ How to LLaVA captioning\nThanks to the code base of [LLaVA-Lightning-MPT](https://huggingface.co/liuhaotian/LLaVA-Lightning-MPT-7B-preview), \nwe can caption the LAION and SAM dataset with the following launching code:\n```bash\npython tools/VLM_caption_lightning.py --output output/dir/ --data-root data/root/path --index path/to/data.json\n```\nWe present auto-labeling with custom prompts for LAION (left) and SAM (right). 
The words highlighted in green represent the original caption in LAION, while those marked in red indicate the detailed captions labeled by LLaVA.\n\n![Dialog with LLaVA.](asset/images/LLaVA-dialog.png)\n\n# ✏️ How to extract T5 and VAE features\n\nPreparing the T5 text features and VAE image features in advance speeds up the training process and saves GPU memory.\n```bash\npython tools/extract_features.py --img_size=1024 \\\n    --json_path \"data/data_info.json\" \\\n    --t5_save_root \"data/SA1B/caption_feature_wmask\" \\\n    --vae_save_root \"data/SA1B/img_vae_features\" \\\n    --pretrained_models_dir \"output/pretrained_models\" \\\n    --dataset_root \"data/SA1B/Images/\"\n```\n\n## 💪To-Do List (Congratulations🎉)\n\n- [x] Inference code\n- [x] Training code\n- [x] T5 & VAE feature extraction code\n- [x] LLaVA captioning code\n- [x] Model zoo \n- [x] Diffusers version & Hugging Face demo\n- [x] Google Colab example\n- [x] DALLE3 VAE integration\n- [x] Inference under 8GB GPU VRAM with diffusers\n- [x] Dreambooth Training code\n- [x] SA-Solver code\n- [x] PixArt-α-LCM will release soon\n- [x] Multi-scale vae feature extraction code\n- [x] PixArt-α-LCM-LoRA scripts will release soon\n- [x] PixArt-α-LoRA training scripts will release soon\n- [x] ControlNet code will be released\n- [x] SAM-LLaVA caption dataset\n- [x] ControlNet checkpoint\n- [x] 256px pre-trained models\n- [x] PixArt-Σ: Next version model with much better ability is training!\n\n# Other Source\nWe made a video comparing PixArt with the most powerful current text-to-image models.\n\n[![Watch the video](https://img.youtube.com/vi/7_6KsIITgWY/maxresdefault.jpg)](https://www.youtube.com/watch?v=7_6KsIITgWY)\n\n# 📖BibTeX\n    @misc{chen2023pixartalpha,\n          title={PixArt-$\\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis}, \n          author={Junsong Chen and Jincheng Yu and Chongjian Ge and Lewei Yao and Enze Xie and Yue Wu and Zhongdao Wang and James Kwok 
and Ping Luo and Huchuan Lu and Zhenguo Li},\n          year={2023},\n          eprint={2310.00426},\n          archivePrefix={arXiv},\n          primaryClass={cs.CV}\n    }\n    @misc{chen2024pixartdelta,\n          title={PIXART-{\\delta}: Fast and Controllable Image Generation with Latent Consistency Models}, \n          author={Junsong Chen and Yue Wu and Simian Luo and Enze Xie and Sayak Paul and Ping Luo and Hang Zhao and Zhenguo Li},\n          year={2024},\n          eprint={2401.05252},\n          archivePrefix={arXiv},\n          primaryClass={cs.CV}\n    }\n    \n# 🤗Acknowledgements\n- Thanks to [Diffusers](https://github.com/huggingface/diffusers) for their wonderful technical support and awesome collaboration!\n- Thanks to [Hugging Face](https://github.com/huggingface) for sponsoring the nice demo!\n- Thanks to [DiT](https://github.com/facebookresearch/DiT) for their wonderful work and codebase!\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=PixArt-alpha/PixArt-alpha&type=Date)](https://star-history.com/#PixArt-alpha/PixArt-alpha&Date)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/app/app.py",
    "content": "#!/usr/bin/env python\nfrom __future__ import annotations\nimport os\nimport sys\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nimport random\nimport gradio as gr\nimport numpy as np\nimport uuid\nfrom diffusers import ConsistencyDecoderVAE, PixArtAlphaPipeline, DPMSolverMultistepScheduler\nimport torch\nfrom typing import Tuple\nfrom datetime import datetime\nfrom diffusion.sa_solver_diffusers import SASolverScheduler\n\n\nDESCRIPTION = \"\"\"![Logo](https://raw.githubusercontent.com/PixArt-alpha/PixArt-alpha.github.io/master/static/images/logo.png)\n        # PixArt-Alpha 1024px\n        #### [PixArt-Alpha 1024px](https://github.com/PixArt-alpha/PixArt-alpha) is a transformer-based text-to-image diffusion system trained on text embeddings from T5. This demo uses the [PixArt-alpha/PixArt-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-XL-2-1024-MS) checkpoint.\n        #### English prompts ONLY; 提示词仅限英文\n        Don't want to queue? 
Try [OpenXLab](https://openxlab.org.cn/apps/detail/PixArt-alpha/PixArt-alpha) or [Google Colab Demo](https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing).\n        ### <span style='color: red;'>You may change the DPM-Solver inference steps from 14 to 20 if the results are not satisfactory.</span>\n        \"\"\"\nif not torch.cuda.is_available():\n    DESCRIPTION += \"\\n<p>Running on CPU 🥶 This demo does not work on CPU.</p>\"\n\nMAX_SEED = np.iinfo(np.int32).max\nCACHE_EXAMPLES = torch.cuda.is_available() and os.getenv(\"CACHE_EXAMPLES\", \"1\") == \"1\"\nMAX_IMAGE_SIZE = int(os.getenv(\"MAX_IMAGE_SIZE\", \"2048\"))\nUSE_TORCH_COMPILE = os.getenv(\"USE_TORCH_COMPILE\", \"0\") == \"1\"\nENABLE_CPU_OFFLOAD = os.getenv(\"ENABLE_CPU_OFFLOAD\", \"0\") == \"1\"\nPORT = int(os.getenv(\"DEMO_PORT\", \"15432\"))\n\ndevice = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n\n\nstyle_list = [\n    {\n        \"name\": \"(No style)\",\n        \"prompt\": \"{prompt}\",\n        \"negative_prompt\": \"\",\n    },\n    {\n        \"name\": \"Cinematic\",\n        \"prompt\": \"cinematic still {prompt} . emotional, harmonious, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy\",\n        \"negative_prompt\": \"anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured\",\n    },\n    {\n        \"name\": \"Photographic\",\n        \"prompt\": \"cinematic photo {prompt} . 35mm photograph, film, bokeh, professional, 4k, highly detailed\",\n        \"negative_prompt\": \"drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly\",\n    },\n    {\n        \"name\": \"Anime\",\n        \"prompt\": \"anime artwork {prompt} . 
anime style, key visual, vibrant, studio anime,  highly detailed\",\n        \"negative_prompt\": \"photo, deformed, black and white, realism, disfigured, low contrast\",\n    },\n    {\n        \"name\": \"Manga\",\n        \"prompt\": \"manga style {prompt} . vibrant, high-energy, detailed, iconic, Japanese comic style\",\n        \"negative_prompt\": \"ugly, deformed, noisy, blurry, low contrast, realism, photorealistic, Western comic style\",\n    },\n    {\n        \"name\": \"Digital Art\",\n        \"prompt\": \"concept art {prompt} . digital artwork, illustrative, painterly, matte painting, highly detailed\",\n        \"negative_prompt\": \"photo, photorealistic, realism, ugly\",\n    },\n    {\n        \"name\": \"Pixel art\",\n        \"prompt\": \"pixel-art {prompt} . low-res, blocky, pixel art style, 8-bit graphics\",\n        \"negative_prompt\": \"sloppy, messy, blurry, noisy, highly detailed, ultra textured, photo, realistic\",\n    },\n    {\n        \"name\": \"Fantasy art\",\n        \"prompt\": \"ethereal fantasy concept art of  {prompt} . magnificent, celestial, ethereal, painterly, epic, majestic, magical, fantasy art, cover art, dreamy\",\n        \"negative_prompt\": \"photographic, realistic, realism, 35mm film, dslr, cropped, frame, text, deformed, glitch, noise, noisy, off-center, deformed, cross-eyed, closed eyes, bad anatomy, ugly, disfigured, sloppy, duplicate, mutated, black and white\",\n    },\n    {\n        \"name\": \"Neonpunk\",\n        \"prompt\": \"neonpunk style {prompt} . cyberpunk, vaporwave, neon, vibes, vibrant, stunningly beautiful, crisp, detailed, sleek, ultramodern, magenta highlights, dark purple shadows, high contrast, cinematic, ultra detailed, intricate, professional\",\n        \"negative_prompt\": \"painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured\",\n    },\n    {\n        \"name\": \"3D Model\",\n        \"prompt\": \"professional 3d model {prompt} . 
octane render, highly detailed, volumetric, dramatic lighting\",\n        \"negative_prompt\": \"ugly, deformed, noisy, low poly, blurry, painting\",\n    },\n]\n\n\nstyles = {k[\"name\"]: (k[\"prompt\"], k[\"negative_prompt\"]) for k in style_list}\nSTYLE_NAMES = list(styles.keys())\nDEFAULT_STYLE_NAME = \"(No style)\"\nSCHEDULE_NAME = [\"DPM-Solver\", \"SA-Solver\"]\nDEFAULT_SCHEDULE_NAME = \"DPM-Solver\"\nNUM_IMAGES_PER_PROMPT = 1\n\ndef apply_style(style_name: str, positive: str, negative: str = \"\") -> Tuple[str, str]:\n    p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])\n    if not negative:\n        negative = \"\"\n    return p.replace(\"{prompt}\", positive), n + negative\n\n\nif torch.cuda.is_available():\n    pipe = PixArtAlphaPipeline.from_pretrained(\n        \"PixArt-alpha/PixArt-XL-2-1024-MS\",\n        torch_dtype=torch.float16,\n        use_safetensors=True,\n    )\n\n    if os.getenv('CONSISTENCY_DECODER', False):\n        print(\"Using DALL-E 3 Consistency Decoder\")\n        pipe.vae = ConsistencyDecoderVAE.from_pretrained(\"openai/consistency-decoder\", torch_dtype=torch.float16)\n\n    if ENABLE_CPU_OFFLOAD:\n        pipe.enable_model_cpu_offload()\n    else:\n        pipe.to(device)\n        print(\"Loaded on Device!\")\n\n    # speed-up T5\n    pipe.text_encoder.to_bettertransformer()\n\n    if USE_TORCH_COMPILE:\n        pipe.transformer = torch.compile(pipe.transformer, mode=\"reduce-overhead\", fullgraph=True)\n        print(\"Model Compiled!\")\n\n\ndef save_image(img):\n    unique_name = f'{str(uuid.uuid4())}.png'\n    save_path = os.path.join(f'output/online_demo_img/{datetime.now().date()}')\n    os.makedirs(save_path, exist_ok=True)\n    unique_name = os.path.join(save_path, unique_name)\n    img.save(unique_name)\n    return unique_name\n\n\ndef randomize_seed_fn(seed: int, randomize_seed: bool) -> int:\n    if randomize_seed:\n        seed = random.randint(0, MAX_SEED)\n    return seed\n\n\ndef generate(\n        prompt: 
str,\n        negative_prompt: str = \"\",\n        style: str = DEFAULT_STYLE_NAME,\n        use_negative_prompt: bool = False,\n        seed: int = 0,\n        width: int = 1024,\n        height: int = 1024,\n        schedule: str = 'DPM-Solver',\n        dpms_guidance_scale: float = 4.5,\n        sas_guidance_scale: float = 3,\n        dpms_inference_steps: int = 20,\n        sas_inference_steps: int = 25,\n        randomize_seed: bool = False,\n        use_resolution_binning: bool = True,\n        progress=gr.Progress(track_tqdm=True),\n):\n    seed = int(randomize_seed_fn(seed, randomize_seed))\n    generator = torch.Generator().manual_seed(seed)\n\n    if schedule == 'DPM-Solver':\n        if not isinstance(pipe.scheduler, DPMSolverMultistepScheduler):\n            pipe.scheduler = DPMSolverMultistepScheduler()\n        num_inference_steps = dpms_inference_steps\n        guidance_scale = dpms_guidance_scale\n    elif schedule == \"SA-Solver\":\n        if not isinstance(pipe.scheduler, SASolverScheduler):\n            pipe.scheduler = SASolverScheduler.from_config(pipe.scheduler.config, algorithm_type='data_prediction', tau_func=lambda t: 1 if 200 <= t <= 800 else 0, predictor_order=2, corrector_order=2)\n        num_inference_steps = sas_inference_steps\n        guidance_scale = sas_guidance_scale\n    else:\n        raise ValueError(f\"Unknown schedule: {schedule}\")\n\n    if not use_negative_prompt:\n        negative_prompt = None  # type: ignore\n    prompt, negative_prompt = apply_style(style, prompt, negative_prompt)\n\n    images = pipe(\n        prompt=prompt,\n        width=width,\n        height=height,\n        negative_prompt=negative_prompt,\n        guidance_scale=guidance_scale,\n        num_inference_steps=num_inference_steps,\n        generator=generator,\n        num_images_per_prompt=NUM_IMAGES_PER_PROMPT,\n        use_resolution_binning=use_resolution_binning,\n        output_type=\"pil\",\n    ).images\n\n    image_paths = 
[save_image(img) for img in images]\n    print(image_paths)\n    return image_paths, seed\n\n\nexamples = [\n    \"A small cactus with a happy face in the Sahara desert.\",\n    \"an astronaut sitting in a diner, eating fries, cinematic, analog film\",\n    \"Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.\",\n    \"stars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, blue and pink, brilliantly illuminated in the background.\",\n    \"professional portrait photo of an anthropomorphic cat wearing fancy gentleman hat and jacket walking in autumn forest.\",\n    \"beautiful lady, freckles, big smile, blue eyes, short ginger hair, dark makeup, wearing a floral blue vest top, soft light, dark grey background\",\n    \"Spectacular Tiny World in the Transparent Jar On the Table, interior of the Great Hall, Elaborate, Carved Architecture, Anatomy, Symetrical, Geometric and Parameteric Details, Precision Flat line Details, Pattern, Dark fantasy, Dark errie mood and ineffably mysterious mood, Technical design, Intricate Ultra Detail, Ornate Detail, Stylized and Futuristic and Biomorphic Details, Architectural Concept, Low contrast Details, Cinematic Lighting, 8k, by moebius, Fullshot, Epic, Fullshot, Octane render, Unreal ,Photorealistic, Hyperrealism\",\n    \"anthropomorphic profile of the white snow owl Crystal priestess , art deco painting, pretty and expressive eyes, ornate costume, mythical, ethereal, intricate, elaborate, hyperrealism, hyper detailed, 3D, 8K, Ultra Realistic, high octane, ultra resolution, amazing detail, perfection, In frame, photorealistic, cinematic lighting, visual clarity, shading , Lumen Reflections, Super-Resolution, gigapixel, color grading, retouch, enhanced, PBR, 
Blender, V-ray, Procreate, zBrush, Unreal Engine 5, cinematic, volumetric, dramatic, neon lighting, wide angle lens ,no digital painting blur\",\n    \"The parametric hotel lobby is a sleek and modern space with plenty of natural light. The lobby is spacious and open with a variety of seating options. The front desk is a sleek white counter with a parametric design. The walls are a light blue color with parametric patterns. The floor is a light wood color with a parametric design. There are plenty of plants and flowers throughout the space. The overall effect is a calm and relaxing space. occlusion, moody, sunset, concept art, octane rendering, 8k, highly detailed, concept art, highly detailed, beautiful scenery, cinematic, beautiful light, hyperreal, octane render, hdr, long exposure, 8K, realistic, fog, moody, fire and explosions, smoke, 50mm f2.8\",\n]\n\nwith gr.Blocks(css=\"app/style.css\") as demo:\n    gr.Markdown(DESCRIPTION)\n    gr.DuplicateButton(\n        value=\"Duplicate Space for private use\",\n        elem_id=\"duplicate-button\",\n        visible=os.getenv(\"SHOW_DUPLICATE_BUTTON\") == \"1\",\n    )\n    with gr.Group():\n        with gr.Row():\n            prompt = gr.Text(\n                label=\"Prompt\",\n                show_label=False,\n                max_lines=1,\n                placeholder=\"Enter your prompt\",\n                container=False,\n            )\n            run_button = gr.Button(\"Run\", scale=0)\n        result = gr.Gallery(label=\"Result\", columns=NUM_IMAGES_PER_PROMPT, show_label=False)\n    with gr.Accordion(\"Advanced options\", open=False):\n        with gr.Row():\n            use_negative_prompt = gr.Checkbox(label=\"Use negative prompt\", value=False, visible=True)\n        schedule = gr.Radio(\n            show_label=True,\n            container=True,\n            interactive=True,\n            choices=SCHEDULE_NAME,\n            value=DEFAULT_SCHEDULE_NAME,\n            label=\"Sampler Schedule\",\n          
  visible=True,\n        )\n        style_selection = gr.Radio(\n            show_label=True,\n            container=True,\n            interactive=True,\n            choices=STYLE_NAMES,\n            value=DEFAULT_STYLE_NAME,\n            label=\"Image Style\",\n        )\n        negative_prompt = gr.Text(\n            label=\"Negative prompt\",\n            max_lines=1,\n            placeholder=\"Enter a negative prompt\",\n            visible=True,\n        )\n        seed = gr.Slider(\n            label=\"Seed\",\n            minimum=0,\n            maximum=MAX_SEED,\n            step=1,\n            value=0,\n        )\n        randomize_seed = gr.Checkbox(label=\"Randomize seed\", value=True)\n        with gr.Row(visible=True):\n            width = gr.Slider(\n                label=\"Width\",\n                minimum=256,\n                maximum=MAX_IMAGE_SIZE,\n                step=32,\n                value=1024,\n            )\n            height = gr.Slider(\n                label=\"Height\",\n                minimum=256,\n                maximum=MAX_IMAGE_SIZE,\n                step=32,\n                value=1024,\n            )\n        with gr.Row():\n            dpms_guidance_scale = gr.Slider(\n                label=\"DPM-Solver Guidance scale\",\n                minimum=1,\n                maximum=10,\n                step=0.1,\n                value=4.5,\n            )\n            dpms_inference_steps = gr.Slider(\n                label=\"DPM-Solver inference steps\",\n                minimum=5,\n                maximum=40,\n                step=1,\n                value=14,\n            )\n        with gr.Row():\n            sas_guidance_scale = gr.Slider(\n                label=\"SA-Solver Guidance scale\",\n                minimum=1,\n                maximum=10,\n                step=0.1,\n                value=3,\n            )\n            sas_inference_steps = gr.Slider(\n                label=\"SA-Solver inference steps\",\n              
  minimum=10,\n                maximum=40,\n                step=1,\n                value=25,\n            )\n\n    gr.Examples(\n        examples=examples,\n        inputs=prompt,\n        outputs=[result, seed],\n        fn=generate,\n        cache_examples=CACHE_EXAMPLES,\n    )\n\n    use_negative_prompt.change(\n        fn=lambda x: gr.update(visible=x),\n        inputs=use_negative_prompt,\n        outputs=negative_prompt,\n        api_name=False,\n    )\n\n    gr.on(\n        triggers=[\n            prompt.submit,\n            negative_prompt.submit,\n            run_button.click,\n        ],\n        fn=generate,\n        inputs=[\n            prompt,\n            negative_prompt,\n            style_selection,\n            use_negative_prompt,\n            seed,\n            width,\n            height,\n            schedule,\n            dpms_guidance_scale,\n            sas_guidance_scale,\n            dpms_inference_steps,\n            sas_inference_steps,\n            randomize_seed,\n        ],\n        outputs=[result, seed],\n        api_name=\"run\",\n    )\n\nif __name__ == \"__main__\":\n    demo.queue(max_size=20).launch(server_name=\"0.0.0.0\", server_port=PORT, debug=True)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/app/app_512.py",
    "content": "#!/usr/bin/env python\nfrom __future__ import annotations\nimport os\nimport sys\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nimport random\nimport gradio as gr\nimport numpy as np\nimport uuid\nfrom diffusers import PixArtAlphaPipeline, ConsistencyDecoderVAE, DPMSolverMultistepScheduler\nimport torch\nfrom typing import Tuple\nfrom datetime import datetime\nfrom diffusion.data.datasets import ASPECT_RATIO_512_TEST\nfrom diffusion.model.utils import resize_and_crop_img\nfrom diffusion.sa_solver_diffusers import SASolverScheduler\n\n\nDESCRIPTION = \"\"\"![Logo](https://raw.githubusercontent.com/PixArt-alpha/PixArt-alpha.github.io/master/static/images/logo.png)\n        # PixArt-Alpha 512px\n        #### [PixArt-Alpha 512px](https://github.com/PixArt-alpha/PixArt-alpha) is a transformer-based text-to-image diffusion system trained on text embeddings from T5. This demo uses the [PixArt-alpha/PixArt-XL-2-512x512](https://huggingface.co/PixArt-alpha/PixArt-XL-2-512x512) checkpoint.\n        #### English prompts ONLY; 提示词仅限英文\n        Don't want to queue? 
Try [OpenXLab](https://openxlab.org.cn/apps/detail/PixArt-alpha/PixArt-alpha) or [Google Colab Demo](https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing).\n        \"\"\"\nif not torch.cuda.is_available():\n    DESCRIPTION += \"\\n<p>Running on CPU 🥶 This demo does not work on CPU.</p>\"\n\nMAX_SEED = np.iinfo(np.int32).max\nCACHE_EXAMPLES = torch.cuda.is_available() and os.getenv(\"CACHE_EXAMPLES\", \"1\") == \"1\"\nMAX_IMAGE_SIZE = int(os.getenv(\"MAX_IMAGE_SIZE\", \"1024\"))\nUSE_TORCH_COMPILE = os.getenv(\"USE_TORCH_COMPILE\", \"0\") == \"1\"\nENABLE_CPU_OFFLOAD = os.getenv(\"ENABLE_CPU_OFFLOAD\", \"0\") == \"1\"\nPORT = int(os.getenv(\"DEMO_PORT\", \"15432\"))\n\ndevice = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n\n\nstyle_list = [\n    {\n        \"name\": \"(No style)\",\n        \"prompt\": \"{prompt}\",\n        \"negative_prompt\": \"\",\n    },\n    {\n        \"name\": \"Cinematic\",\n        \"prompt\": \"cinematic still {prompt} . emotional, harmonious, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy\",\n        \"negative_prompt\": \"anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured\",\n    },\n    {\n        \"name\": \"Photographic\",\n        \"prompt\": \"cinematic photo {prompt} . 35mm photograph, film, bokeh, professional, 4k, highly detailed\",\n        \"negative_prompt\": \"drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly\",\n    },\n    {\n        \"name\": \"Anime\",\n        \"prompt\": \"anime artwork {prompt} . anime style, key visual, vibrant, studio anime,  highly detailed\",\n        \"negative_prompt\": \"photo, deformed, black and white, realism, disfigured, low contrast\",\n    },\n    {\n        \"name\": \"Manga\",\n        \"prompt\": \"manga style {prompt} . 
vibrant, high-energy, detailed, iconic, Japanese comic style\",\n        \"negative_prompt\": \"ugly, deformed, noisy, blurry, low contrast, realism, photorealistic, Western comic style\",\n    },\n    {\n        \"name\": \"Digital Art\",\n        \"prompt\": \"concept art {prompt} . digital artwork, illustrative, painterly, matte painting, highly detailed\",\n        \"negative_prompt\": \"photo, photorealistic, realism, ugly\",\n    },\n    {\n        \"name\": \"Pixel art\",\n        \"prompt\": \"pixel-art {prompt} . low-res, blocky, pixel art style, 8-bit graphics\",\n        \"negative_prompt\": \"sloppy, messy, blurry, noisy, highly detailed, ultra textured, photo, realistic\",\n    },\n    {\n        \"name\": \"Fantasy art\",\n        \"prompt\": \"ethereal fantasy concept art of  {prompt} . magnificent, celestial, ethereal, painterly, epic, majestic, magical, fantasy art, cover art, dreamy\",\n        \"negative_prompt\": \"photographic, realistic, realism, 35mm film, dslr, cropped, frame, text, deformed, glitch, noise, noisy, off-center, deformed, cross-eyed, closed eyes, bad anatomy, ugly, disfigured, sloppy, duplicate, mutated, black and white\",\n    },\n    {\n        \"name\": \"Neonpunk\",\n        \"prompt\": \"neonpunk style {prompt} . cyberpunk, vaporwave, neon, vibes, vibrant, stunningly beautiful, crisp, detailed, sleek, ultramodern, magenta highlights, dark purple shadows, high contrast, cinematic, ultra detailed, intricate, professional\",\n        \"negative_prompt\": \"painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured\",\n    },\n    {\n        \"name\": \"3D Model\",\n        \"prompt\": \"professional 3d model {prompt} . 
octane render, highly detailed, volumetric, dramatic lighting\",\n        \"negative_prompt\": \"ugly, deformed, noisy, low poly, blurry, painting\",\n    },\n]\n\n\nstyles = {k[\"name\"]: (k[\"prompt\"], k[\"negative_prompt\"]) for k in style_list}\nSTYLE_NAMES = list(styles.keys())\nDEFAULT_STYLE_NAME = \"(No style)\"\nSCHEDULE_NAME = [\"DPM-Solver\", \"SA-Solver\"]\nDEFAULT_SCHEDULE_NAME = \"DPM-Solver\"\nNUM_IMAGES_PER_PROMPT = 2\n\n\ndef apply_style(style_name: str, positive: str, negative: str = \"\") -> Tuple[str, str]:\n    p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])\n    if not negative:\n        negative = \"\"\n    return p.replace(\"{prompt}\", positive), n + negative\n\n\nif torch.cuda.is_available():\n    pipe = PixArtAlphaPipeline.from_pretrained(\n        \"PixArt-alpha/PixArt-XL-2-512x512\",\n        torch_dtype=torch.float16,\n        variant=\"fp16\",\n        use_safetensors=True,\n    )\n\n    if os.getenv('CONSISTENCY_DECODER', False):\n        print(\"Using DALL-E 3 Consistency Decoder\")\n        pipe.vae = ConsistencyDecoderVAE.from_pretrained(\"openai/consistency-decoder\", torch_dtype=torch.float16)\n\n    if ENABLE_CPU_OFFLOAD:\n        pipe.enable_model_cpu_offload()\n    else:\n        pipe.to(device)\n        print(\"Loaded on Device!\")\n\n    # speed-up T5\n    pipe.text_encoder.to_bettertransformer()\n\n    if USE_TORCH_COMPILE:\n        pipe.transformer = torch.compile(pipe.transformer, mode=\"reduce-overhead\", fullgraph=True)\n        print(\"Model Compiled!\")\n\n\ndef prepare_prompt_hw(height, width, ratios):\n    ar = float(height/width)\n    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))\n    default_hw = ratios[closest_ratio]\n    return int(default_hw[0]), int(default_hw[1])\n\n\ndef save_image(img):\n    unique_name = f'{str(uuid.uuid4())}.png'\n    save_path = os.path.join(f'output/online_demo_img512/{datetime.now().date()}')\n    os.makedirs(save_path, exist_ok=True)\n    
unique_name = os.path.join(save_path, unique_name)\n    img.save(unique_name)\n    return unique_name\n\n\ndef randomize_seed_fn(seed: int, randomize_seed: bool) -> int:\n    if randomize_seed:\n        seed = random.randint(0, MAX_SEED)\n    return seed\n\n\ndef classify_height_width_bin(height: int, width: int, ratios: dict):\n    ar = float(height / width)\n    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))\n    default_hw = ratios[closest_ratio]\n    return int(default_hw[0]), int(default_hw[1])\n\n\ndef generate(\n        prompt: str,\n        negative_prompt: str = \"\",\n        style: str = DEFAULT_STYLE_NAME,\n        use_negative_prompt: bool = False,\n        seed: int = 0,\n        width: int = 512,\n        height: int = 512,\n        schedule: str = 'DPM-Solver',\n        dpms_guidance_scale: float = 4.5,\n        sas_guidance_scale: float = 3,\n        dpms_inference_steps: int = 20,\n        sas_inference_steps: int = 25,\n        randomize_seed: bool = False,\n        use_resolution_binning: bool = True,\n        progress=gr.Progress(track_tqdm=True),\n):\n    seed = int(randomize_seed_fn(seed, randomize_seed))\n    generator = torch.Generator().manual_seed(seed)\n\n    if schedule == 'DPM-Solver':\n        if not isinstance(pipe.scheduler, DPMSolverMultistepScheduler):\n            pipe.scheduler = DPMSolverMultistepScheduler()\n        num_inference_steps = dpms_inference_steps\n        guidance_scale = dpms_guidance_scale\n    elif schedule == \"SA-Solver\":\n        if not isinstance(pipe.scheduler, SASolverScheduler):\n            pipe.scheduler = SASolverScheduler.from_config(pipe.scheduler.config, algorithm_type='data_prediction', tau_func=lambda t: 1 if 200 <= t <= 800 else 0, predictor_order=2, corrector_order=2)\n        num_inference_steps = sas_inference_steps\n        guidance_scale = sas_guidance_scale\n    else:\n        raise ValueError(f\"Unknown schedule: {schedule}\")\n\n    if not 
use_negative_prompt:\n        negative_prompt = None  # type: ignore\n    prompt, negative_prompt = apply_style(style, prompt, negative_prompt)\n\n    if use_resolution_binning:\n        orig_height, orig_width = height, width\n        height, width = classify_height_width_bin(height, width, ratios=ASPECT_RATIO_512_TEST)\n\n    images = pipe(\n        prompt=prompt,\n        width=width,\n        height=height,\n        negative_prompt=negative_prompt,\n        guidance_scale=guidance_scale,\n        num_inference_steps=num_inference_steps,\n        generator=generator,\n        use_resolution_binning=False,\n        num_images_per_prompt=NUM_IMAGES_PER_PROMPT,\n        output_type=\"pil\",\n    ).images\n\n    if use_resolution_binning:\n        images = [resize_and_crop_img(img, orig_width, orig_height) for img in images]\n    image_paths = [save_image(img) for img in images]\n    print(image_paths)\n    return image_paths, seed\n\n\nexamples = [\n    \"A small cactus with a happy face in the Sahara desert.\",\n    \"an astronaut sitting in a diner, eating fries, cinematic, analog film\",\n    \"Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.\",\n    \"stars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, blue and pink, brilliantly illuminated in the background.\",\n    \"professional portrait photo of an anthropomorphic cat wearing fancy gentleman hat and jacket walking in autumn forest.\",\n    \"beautiful lady, freckles, big smile, blue eyes, short ginger hair, dark makeup, wearing a floral blue vest top, soft light, dark grey background\",\n    \"Spectacular Tiny World in the Transparent Jar On the Table, interior of the Great Hall, Elaborate, Carved Architecture, Anatomy, 
Symetrical, Geometric and Parameteric Details, Precision Flat line Details, Pattern, Dark fantasy, Dark errie mood and ineffably mysterious mood, Technical design, Intricate Ultra Detail, Ornate Detail, Stylized and Futuristic and Biomorphic Details, Architectural Concept, Low contrast Details, Cinematic Lighting, 8k, by moebius, Fullshot, Epic, Fullshot, Octane render, Unreal ,Photorealistic, Hyperrealism\",\n    \"anthropomorphic profile of the white snow owl Crystal priestess , art deco painting, pretty and expressive eyes, ornate costume, mythical, ethereal, intricate, elaborate, hyperrealism, hyper detailed, 3D, 8K, Ultra Realistic, high octane, ultra resolution, amazing detail, perfection, In frame, photorealistic, cinematic lighting, visual clarity, shading , Lumen Reflections, Super-Resolution, gigapixel, color grading, retouch, enhanced, PBR, Blender, V-ray, Procreate, zBrush, Unreal Engine 5, cinematic, volumetric, dramatic, neon lighting, wide angle lens ,no digital painting blur\",\n    \"The parametric hotel lobby is a sleek and modern space with plenty of natural light. The lobby is spacious and open with a variety of seating options. The front desk is a sleek white counter with a parametric design. The walls are a light blue color with parametric patterns. The floor is a light wood color with a parametric design. There are plenty of plants and flowers throughout the space. The overall effect is a calm and relaxing space. 
occlusion, moody, sunset, concept art, octane rendering, 8k, highly detailed, concept art, highly detailed, beautiful scenery, cinematic, beautiful light, hyperreal, octane render, hdr, long exposure, 8K, realistic, fog, moody, fire and explosions, smoke, 50mm f2.8\",\n]\n\nwith gr.Blocks(css=\"scripts/style.css\") as demo:\n    gr.Markdown(DESCRIPTION)\n    gr.DuplicateButton(\n        value=\"Duplicate Space for private use\",\n        elem_id=\"duplicate-button\",\n        visible=os.getenv(\"SHOW_DUPLICATE_BUTTON\") == \"1\",\n    )\n    with gr.Group():\n        with gr.Row():\n            prompt = gr.Text(\n                label=\"Prompt\",\n                show_label=False,\n                max_lines=1,\n                placeholder=\"Enter your prompt\",\n                container=False,\n            )\n            run_button = gr.Button(\"Run\", scale=0)\n        result = gr.Gallery(label=\"Result\", columns=NUM_IMAGES_PER_PROMPT, show_label=False)\n    with gr.Accordion(\"Advanced options\", open=False):\n        with gr.Row():\n            use_negative_prompt = gr.Checkbox(label=\"Use negative prompt\", value=False, visible=False)\n        schedule = gr.Radio(\n            show_label=True,\n            container=True,\n            interactive=True,\n            choices=SCHEDULE_NAME,\n            value=DEFAULT_SCHEDULE_NAME,\n            label=\"Sampler Schedule\",\n            visible=True,\n        )\n        style_selection = gr.Radio(\n            show_label=True,\n            container=True,\n            interactive=True,\n            choices=STYLE_NAMES,\n            value=DEFAULT_STYLE_NAME,\n            label=\"Image Style\",\n        )\n        negative_prompt = gr.Text(\n            label=\"Negative prompt (no use now)\",\n            max_lines=1,\n            placeholder=\"Enter a negative prompt\",\n            visible=False,\n        )\n        seed = gr.Slider(\n            label=\"Seed\",\n            minimum=0,\n            
maximum=MAX_SEED,\n            step=1,\n            value=0,\n        )\n        randomize_seed = gr.Checkbox(label=\"Randomize seed\", value=True)\n        with gr.Row(visible=True):\n            width = gr.Slider(\n                label=\"Width\",\n                minimum=256,\n                maximum=MAX_IMAGE_SIZE,\n                step=32,\n                value=512,\n            )\n            height = gr.Slider(\n                label=\"Height\",\n                minimum=256,\n                maximum=MAX_IMAGE_SIZE,\n                step=32,\n                value=512,\n            )\n        with gr.Row():\n            dpms_guidance_scale = gr.Slider(\n                label=\"DPM-Solver Guidance scale\",\n                minimum=1,\n                maximum=10,\n                step=0.1,\n                value=4.5,\n            )\n            dpms_inference_steps = gr.Slider(\n                label=\"DPM-Solver inference steps\",\n                minimum=5,\n                maximum=40,\n                step=1,\n                value=20,\n            )\n        with gr.Row():\n            sas_guidance_scale = gr.Slider(\n                label=\"SA-Solver Guidance scale\",\n                minimum=1,\n                maximum=10,\n                step=0.1,\n                value=3,\n            )\n            sas_inference_steps = gr.Slider(\n                label=\"SA-Solver inference steps\",\n                minimum=10,\n                maximum=40,\n                step=1,\n                value=25,\n            )\n\n    gr.Examples(\n        examples=examples,\n        inputs=prompt,\n        outputs=[result, seed],\n        fn=generate,\n        cache_examples=CACHE_EXAMPLES,\n    )\n\n    use_negative_prompt.change(\n        fn=lambda x: gr.update(visible=x),\n        inputs=use_negative_prompt,\n        outputs=negative_prompt,\n        api_name=False,\n    )\n\n    gr.on(\n        triggers=[\n            prompt.submit,\n            
negative_prompt.submit,\n            run_button.click,\n        ],\n        fn=generate,\n        inputs=[\n            prompt,\n            negative_prompt,\n            style_selection,\n            use_negative_prompt,\n            seed,\n            width,\n            height,\n            schedule,\n            dpms_guidance_scale,\n            sas_guidance_scale,\n            dpms_inference_steps,\n            sas_inference_steps,\n            randomize_seed,\n        ],\n        outputs=[result, seed],\n        api_name=\"run\",\n    )\n\nif __name__ == \"__main__\":\n    demo.queue(max_size=20).launch(server_name=\"0.0.0.0\", server_port=PORT, debug=True)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/app/app_controlnet.py",
    "content": "#!/usr/bin/env python\nfrom __future__ import annotations\n\nimport argparse\nimport os\nimport random\nimport sys\n\nimport uuid\nfrom datetime import datetime\nfrom pathlib import Path\nfrom typing import List, Tuple, Union\n\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\n\nimport gradio as gr\nimport numpy as np\nimport torch\nfrom PIL import Image as PILImage\nimport torchvision.transforms as T\nimport torchvision.transforms.functional as TF\nfrom torchvision.utils import _log_api_usage_once, make_grid, save_image\n\nfrom diffusers import PixArtAlphaPipeline\nfrom diffusion import DPMS, SASolverSampler\nfrom diffusion.data.datasets import *\nfrom diffusion.model.hed import HEDdetector\nfrom diffusion.model.nets import PixArt_XL_2, PixArtMS_XL_2, ControlPixArtHalf, ControlPixArtMSHalf\nfrom diffusion.model.utils import resize_and_crop_tensor\nfrom diffusion.utils.misc import read_config\nfrom tools.download import find_model\n\n\nDESCRIPTION = \"\"\"![Logo](https://raw.githubusercontent.com/PixArt-alpha/PixArt-alpha.github.io/master/static/images/logo.png)\n        # PixArt-Delta (ControlNet)\n        #### [PixArt-Alpha 1024px](https://github.com/PixArt-alpha/PixArt-alpha) is a transformer-based text-to-image diffusion system trained on text embeddings from T5. \n        #### This demo uses the [PixArt-alpha/PixArt-XL-2-1024-ControlNet](https://huggingface.co/PixArt-alpha/PixArt-ControlNet/tree/main) checkpoint.\n        #### This demo uses the [PixArt-alpha/PixArt-XL-2-512-ControlNet](https://huggingface.co/PixArt-alpha/PixArt-ControlNet/tree/main) checkpoint.\n        #### English prompts ONLY; 提示词仅限英文\n        ### <span style='color: red;'>Please use the image size corresponding to the model as input to get the best performance. (eg. 
1024px for PixArt-XL-2-1024-ControlNet.pth)\n        \"\"\"\nif not torch.cuda.is_available():\n    DESCRIPTION += \"\\n<p>Running on CPU 🥶 This demo does not work on CPU.</p>\"\n\nMAX_SEED = np.iinfo(np.int32).max\nCACHE_EXAMPLES = torch.cuda.is_available() and os.getenv(\"CACHE_EXAMPLES\", \"1\") == \"1\"\nMAX_IMAGE_SIZE = int(os.getenv(\"MAX_IMAGE_SIZE\", \"2048\"))\nUSE_TORCH_COMPILE = os.getenv(\"USE_TORCH_COMPILE\", \"0\") == \"1\"\nENABLE_CPU_OFFLOAD = os.getenv(\"ENABLE_CPU_OFFLOAD\", \"0\") == \"1\"\nPORT = int(os.getenv(\"DEMO_PORT\", \"15432\"))\n\ndevice = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n\n\n@torch.no_grad()\ndef ndarr_image(tensor: Union[torch.Tensor, List[torch.Tensor]], **kwargs, ) -> np.ndarray:\n    if not torch.jit.is_scripting() and not torch.jit.is_tracing():\n        _log_api_usage_once(save_image)\n    grid = make_grid(tensor, **kwargs)\n    ndarr = grid.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to(\"cpu\", torch.uint8).numpy()\n    return ndarr\n\n\nstyle_list = [\n    {\n        \"name\": \"(No style)\",\n        \"prompt\": \"{prompt}\",\n        \"negative_prompt\": \"\",\n    },\n    {\n        \"name\": \"Cinematic\",\n        \"prompt\": \"cinematic still {prompt} . emotional, harmonious, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy\",\n        \"negative_prompt\": \"anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured\",\n    },\n    {\n        \"name\": \"Photographic\",\n        \"prompt\": \"cinematic photo {prompt} . 35mm photograph, film, bokeh, professional, 4k, highly detailed\",\n        \"negative_prompt\": \"drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly\",\n    },\n    {\n        \"name\": \"Anime\",\n        \"prompt\": \"anime artwork {prompt} . 
anime style, key visual, vibrant, studio anime,  highly detailed\",\n        \"negative_prompt\": \"photo, deformed, black and white, realism, disfigured, low contrast\",\n    },\n    {\n        \"name\": \"Manga\",\n        \"prompt\": \"manga style {prompt} . vibrant, high-energy, detailed, iconic, Japanese comic style\",\n        \"negative_prompt\": \"ugly, deformed, noisy, blurry, low contrast, realism, photorealistic, Western comic style\",\n    },\n    {\n        \"name\": \"Digital Art\",\n        \"prompt\": \"concept art {prompt} . digital artwork, illustrative, painterly, matte painting, highly detailed\",\n        \"negative_prompt\": \"photo, photorealistic, realism, ugly\",\n    },\n    {\n        \"name\": \"Pixel art\",\n        \"prompt\": \"pixel-art {prompt} . low-res, blocky, pixel art style, 8-bit graphics\",\n        \"negative_prompt\": \"sloppy, messy, blurry, noisy, highly detailed, ultra textured, photo, realistic\",\n    },\n    {\n        \"name\": \"Fantasy art\",\n        \"prompt\": \"ethereal fantasy concept art of  {prompt} . magnificent, celestial, ethereal, painterly, epic, majestic, magical, fantasy art, cover art, dreamy\",\n        \"negative_prompt\": \"photographic, realistic, realism, 35mm film, dslr, cropped, frame, text, deformed, glitch, noise, noisy, off-center, deformed, cross-eyed, closed eyes, bad anatomy, ugly, disfigured, sloppy, duplicate, mutated, black and white\",\n    },\n    {\n        \"name\": \"Neonpunk\",\n        \"prompt\": \"neonpunk style {prompt} . cyberpunk, vaporwave, neon, vibes, vibrant, stunningly beautiful, crisp, detailed, sleek, ultramodern, magenta highlights, dark purple shadows, high contrast, cinematic, ultra detailed, intricate, professional\",\n        \"negative_prompt\": \"painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured\",\n    },\n    {\n        \"name\": \"3D Model\",\n        \"prompt\": \"professional 3d model {prompt} . 
octane render, highly detailed, volumetric, dramatic lighting\",\n        \"negative_prompt\": \"ugly, deformed, noisy, low poly, blurry, painting\",\n    },\n]\n\n\nstyles = {k[\"name\"]: (k[\"prompt\"], k[\"negative_prompt\"]) for k in style_list}\nSTYLE_NAMES = list(styles.keys())\nDEFAULT_STYLE_NAME = \"(No style)\"\nSCHEDULE_NAME = [\"DPM-Solver\", \"SA-Solver\"]\nDEFAULT_SCHEDULE_NAME = \"DPM-Solver\"\n\ndef apply_style(style_name: str, positive: str, negative: str = \"\") -> Tuple[str, str]:\n    p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])\n    if not negative:\n        negative = \"\"\n    return p.replace(\"{prompt}\", positive), n + negative\n\n\ndef save_image(img):\n    unique_name = str(uuid.uuid4()) + '.png'\n    save_path = os.path.join(f'output/online_demo_img/{datetime.now().date()}')\n    os.makedirs(save_path, exist_ok=True)\n    unique_name = os.path.join(save_path, unique_name)\n    img.save(unique_name)\n    return unique_name\n\n\ndef randomize_seed_fn(seed: int, randomize_seed: bool) -> int:\n    if randomize_seed:\n        seed = random.randint(0, MAX_SEED)\n    return seed\n\n\n@torch.inference_mode()\ndef generate(\n        prompt: str,\n        given_image = None,\n        negative_prompt: str = \"\",\n        style: str = DEFAULT_STYLE_NAME,\n        use_negative_prompt: bool = False,\n        seed: int = 0,\n        width: int = 1024,\n        height: int = 1024,\n        schedule: str = 'DPM-Solver',\n        dpms_guidance_scale: float = 4.5,\n        sas_guidance_scale: float = 3,\n        dpms_inference_steps: int = 14,\n        sas_inference_steps: int = 25,\n        randomize_seed: bool = False,\n):\n    seed = int(randomize_seed_fn(seed, randomize_seed))\n    torch.manual_seed(seed)\n    torch.cuda.empty_cache()\n    strength = 1.0\n    c_vis = given_image\n\n    if not use_negative_prompt:\n        negative_prompt = None  # type: ignore\n    prompt, negative_prompt = apply_style(style, prompt, negative_prompt)\n\n  
  prompt_embeds, prompt_attention_mask, negative_prompt_embeds, negative_prompt_attention_mask\\\n        = pipe.encode_prompt(prompt=prompt, negative_prompt=negative_prompt)\n    prompt_embeds, negative_prompt_embeds = prompt_embeds[:, None], negative_prompt_embeds[:, None]\n    torch.cuda.empty_cache()\n\n    # condition process\n    if given_image is not None:\n        ar = torch.tensor([given_image.size[1] / given_image.size[0]], device=device)[None]\n        custom_hw = torch.tensor([given_image.size[1], given_image.size[0]], device=device)[None]\n        closest_hw = base_ratios[min(base_ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))]\n        hw = torch.tensor(closest_hw, device=device)[None]\n        condition_transform = T.Compose([\n            T.Lambda(lambda img: img.convert('RGB')),\n            T.Resize(int(min(closest_hw))),\n            T.CenterCrop([int(closest_hw[0]), int(closest_hw[1])]),\n            T.ToTensor(),\n        ])\n\n        given_image = condition_transform(given_image).unsqueeze(0).to(device)\n        hed_edge = hed(given_image) * strength\n        hed_edge = TF.normalize(hed_edge, [.5], [.5])\n        hed_edge = hed_edge.repeat(1, 3, 1, 1).to(weight_dtype)\n        posterior = vae.encode(hed_edge).latent_dist\n        condition = posterior.sample()\n        c = condition * config.scale_factor\n        c_vis = vae.decode(condition)['sample']\n        c_vis = torch.clamp(127.5 * c_vis + 128.0, 0, 255).permute(0, 2, 3, 1).to(\"cpu\", dtype=torch.uint8).numpy()[0]\n    else:\n        c = None\n        ar = torch.tensor([int(height) / int(width)], device=device)[None]\n        custom_hw = torch.tensor([int(height), int(width)], device=device)[None]\n        closest_hw = base_ratios[min(base_ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))]\n        hw = torch.tensor(closest_hw, device=device)[None]\n\n    latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)\n\n    # Sample images:\n    if schedule == 
'DPM-Solver':\n        # Create sampling noise:\n        n = prompt_embeds.shape[0]\n        z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)\n        model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=prompt_attention_mask, c=c)\n        dpm_solver = DPMS(model.forward_with_dpmsolver,\n                          condition=prompt_embeds,\n                          uncondition=negative_prompt_embeds,\n                          cfg_scale=dpms_guidance_scale,\n                          model_kwargs=model_kwargs)\n        samples = dpm_solver.sample(\n            z,\n            steps=dpms_inference_steps,\n            order=2,\n            skip_type=\"time_uniform\",\n            method=\"multistep\",\n        ).to(weight_dtype)\n    elif schedule == \"SA-Solver\":\n        # Create sampling noise:\n        n = prompt_embeds.shape[0]\n        model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=prompt_attention_mask, c=c)\n        sas_solver = SASolverSampler(model.forward_with_dpmsolver, device=device)\n        samples = sas_solver.sample(\n            S=sas_inference_steps,\n            batch_size=n,\n            shape=(4, latent_size_h, latent_size_w),\n            eta=1,\n            conditioning=prompt_embeds,\n            unconditional_conditioning=negative_prompt_embeds,\n            unconditional_guidance_scale=sas_guidance_scale,\n            model_kwargs=model_kwargs,\n        )[0].to(weight_dtype)\n\n    samples = vae.decode(samples / config.scale_factor).sample\n    torch.cuda.empty_cache()\n    samples = resize_and_crop_tensor(samples, custom_hw[0, 1], custom_hw[0, 0])\n    samples = PILImage.fromarray(ndarr_image(samples, normalize=True, value_range=(-1, 1)))\n    image_paths = [save_image(samples)]\n    c_vis = PILImage.fromarray(c_vis) if c_vis is not None else samples\n    c_paths = [save_image(c_vis)]\n    print(image_paths)\n    return image_paths, c_paths, seed\n\n\ndef get_args():\n    parser 
= argparse.ArgumentParser()\n    parser.add_argument(\"config\", type=str, help=\"config\")\n    parser.add_argument('--image_size', default=1024, type=int)\n    parser.add_argument('--model_path', type=str)\n    return parser.parse_args()\n\n\nargs = get_args()\nconfig = read_config(args.config)\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\nassert args.image_size in [512, 1024], \"We only provide pre-trained models for 512x512 and 1024x1024 resolutions.\"\nlewei_scale = {512: 1, 1024: 2}\nlatent_size = args.image_size // 8\nweight_dtype = torch.float16\nprint(f\"Inference with {weight_dtype}\")\n\nif torch.cuda.is_available():\n    hed = HEDdetector(False).to(device)\n    pipe = PixArtAlphaPipeline.from_pretrained(\n        \"PixArt-alpha/PixArt-XL-2-1024-MS\",\n        transformer=None,\n        torch_dtype=weight_dtype,\n        use_safetensors=True,\n    )\n    pipe.to(device)\n    print(\"Loaded on Device!\")\n    vae = pipe.vae\n    text_encoder = pipe.text_encoder\n    tokenizer = pipe.tokenizer\n\n    assert args.image_size == config.image_size\n    if config.image_size == 512:\n        model = PixArt_XL_2(input_size=latent_size, lewei_scale=lewei_scale[config.image_size])\n        print('model architecture ControlPixArtHalf and image size is 512')\n        model = ControlPixArtHalf(model).to(device)\n    elif config.image_size == 1024:\n        model = PixArtMS_XL_2(input_size=latent_size, lewei_scale=lewei_scale[config.image_size])\n        print('model architecture ControlPixArtMSHalf and image size is 1024')\n        model = ControlPixArtMSHalf(model).to(device)\n\n    state_dict = find_model(args.model_path)['state_dict']\n    if 'pos_embed' in state_dict:\n        del state_dict['pos_embed']\n    elif 'base_model.pos_embed' in state_dict:\n        del state_dict['base_model.pos_embed']\n    missing, unexpected = model.load_state_dict(state_dict, strict=False)\n    print('Missing keys (missing pos_embed is normal): ', missing)\n    
print('Unexpected keys', unexpected)\n    model.eval()\n    model.to(weight_dtype)\n    base_ratios = eval(f'ASPECT_RATIO_{args.image_size}_TEST')\n\nwith gr.Blocks(css=\"app/style_controlnet.css\") as demo:\n    gr.Markdown(DESCRIPTION)\n    gr.DuplicateButton(\n        value=\"Duplicate Space for private use\",\n        elem_id=\"duplicate-button\",\n        visible=os.getenv(\"SHOW_DUPLICATE_BUTTON\") == \"1\",\n    )\n    image_input = gr.Image(\n        label=\"Image\",\n        height=360,\n        width=360,\n        show_label=False,\n        sources=\"upload\",\n        type=\"pil\",\n    )\n    with gr.Group():\n        with gr.Row():\n            prompt = gr.Text(\n                label=\"Prompt\",\n                show_label=False,\n                max_lines=1,\n                placeholder=\"Enter your prompt\",\n                container=False,\n            )\n            run_button = gr.Button(\"Run\", scale=0)\n    with gr.Group():\n        with gr.Row():\n            hed_result = gr.Gallery(label=\"Hed Result\", show_label=False)\n            result = gr.Gallery(label=\"Result\", show_label=False)\n    with gr.Accordion(\"Advanced options\", open=False):\n        with gr.Row():\n            use_negative_prompt = gr.Checkbox(label=\"Use negative prompt\", value=False, visible=True)\n        schedule = gr.Radio(\n            show_label=True,\n            container=True,\n            interactive=True,\n            choices=SCHEDULE_NAME,\n            value=DEFAULT_SCHEDULE_NAME,\n            label=\"Sampler Schedule\",\n            visible=True,\n        )\n        style_selection = gr.Radio(\n            show_label=True,\n            container=True,\n            interactive=True,\n            choices=STYLE_NAMES,\n            value=DEFAULT_STYLE_NAME,\n            label=\"Image Style\",\n        )\n        negative_prompt = gr.Text(\n            label=\"Negative prompt\",\n            max_lines=1,\n            placeholder=\"Enter a negative prompt\",\n 
           visible=True,\n        )\n        seed = gr.Slider(\n            label=\"Seed\",\n            minimum=0,\n            maximum=MAX_SEED,\n            step=1,\n            value=0,\n        )\n        randomize_seed = gr.Checkbox(label=\"Randomize seed\", value=True)\n        with gr.Row(visible=True):\n            width = gr.Slider(\n                label=\"Width\",\n                minimum=256,\n                maximum=MAX_IMAGE_SIZE,\n                step=32,\n                value=config.image_size,\n            )\n            height = gr.Slider(\n                label=\"Height\",\n                minimum=256,\n                maximum=MAX_IMAGE_SIZE,\n                step=32,\n                value=config.image_size,\n            )\n        with gr.Row():\n            dpms_guidance_scale = gr.Slider(\n                label=\"DPM-Solver Guidance scale\",\n                minimum=1,\n                maximum=10,\n                step=0.1,\n                value=4.5,\n            )\n            dpms_inference_steps = gr.Slider(\n                label=\"DPM-Solver inference steps\",\n                minimum=5,\n                maximum=40,\n                step=1,\n                value=14,\n            )\n        with gr.Row():\n            sas_guidance_scale = gr.Slider(\n                label=\"SA-Solver Guidance scale\",\n                minimum=1,\n                maximum=10,\n                step=0.1,\n                value=3,\n            )\n            sas_inference_steps = gr.Slider(\n                label=\"SA-Solver inference steps\",\n                minimum=10,\n                maximum=40,\n                step=1,\n                value=25,\n            )\n\n    gr.Examples(\n        examples=[\n            [\n                \"anime superman in action\",\n                \"asset/images/controlnet/0_0.png\",\n            ],\n            [\n                \"illustration of A loving couple standing in the open kitchen of the living room, cooking 
,Couples have a full body, with characters accounting for a quarter of the screen, and the composition of the living room has a large perspective, resulting in a larger space.\",\n                \"asset/images/controlnet/0_3.png\",\n            ],\n            [\n                \"A Electric 4 seats mini VAN,simple design stylel,led headlight,front 45 angle view,sunlight,clear sky.\",\n                \"asset/images/controlnet/0_2.png\",\n            ],\n        ],\n        inputs=[prompt, image_input],\n        outputs=[result, hed_result, seed],\n        fn=generate,\n        cache_examples=CACHE_EXAMPLES,\n\n    )\n\n    use_negative_prompt.change(\n        fn=lambda x: gr.update(visible=x),\n        inputs=use_negative_prompt,\n        outputs=negative_prompt,\n        api_name=False,\n    )\n\n    gr.on(\n        triggers=[\n            prompt.submit,\n            negative_prompt.submit,\n            run_button.click,\n        ],\n        fn=generate,\n        inputs=[\n            prompt,\n            image_input,\n            negative_prompt,\n            style_selection,\n            use_negative_prompt,\n            seed,\n            width,\n            height,\n            schedule,\n            dpms_guidance_scale,\n            sas_guidance_scale,\n            dpms_inference_steps,\n            sas_inference_steps,\n            randomize_seed,\n        ],\n        outputs=[result, hed_result, seed],\n        api_name=\"run\",\n    )\n\nif __name__ == \"__main__\":\n    demo.queue(max_size=20).launch(server_name=\"0.0.0.0\", server_port=PORT, debug=True)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/app/app_lcm.py",
    "content": "#!/usr/bin/env python\nfrom __future__ import annotations\nimport os\nimport sys\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nimport random\nimport gradio as gr\nimport numpy as np\nimport uuid\nfrom diffusers import PixArtAlphaPipeline, Transformer2DModel\nfrom peft import PeftModel\nimport torch\nfrom typing import Tuple\nfrom datetime import datetime\nimport argparse\n\nDESCRIPTION = \"\"\"![Logo](https://raw.githubusercontent.com/PixArt-alpha/PixArt-alpha.github.io/master/static/images/pixart-lcm.png)\n        # PixArt-LCM 1024px\n        #### [PixArt-Alpha 1024px](https://github.com/PixArt-alpha/PixArt-alpha) is a transformer-based text-to-image diffusion system trained on text embeddings from T5. This demo uses the [PixArt-alpha/PixArt-LCM-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-LCM-XL-2-1024-MS) checkpoint.\n        #### [LCM](https://github.com/luosiallen/latent-consistency-model) is a diffusion distillation method that predicts the PF-ODE's solution directly in latent space, achieving very fast inference in a few steps.\n        #### English prompts ONLY; 提示词仅限英文\n        Don't want to queue? 
Try [OpenXLab](https://openxlab.org.cn/apps/detail/PixArt-alpha/PixArt-alpha) or [Google Colab Demo](https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing).\n        \"\"\"\nif not torch.cuda.is_available():\n    DESCRIPTION += \"\\n<p>Running on CPU 🥶 This demo does not work on CPU.</p>\"\n\nMAX_SEED = np.iinfo(np.int32).max\nCACHE_EXAMPLES = torch.cuda.is_available() and os.getenv(\"CACHE_EXAMPLES\", \"1\") == \"1\"\nMAX_IMAGE_SIZE = int(os.getenv(\"MAX_IMAGE_SIZE\", \"2048\"))\nUSE_TORCH_COMPILE = os.getenv(\"USE_TORCH_COMPILE\", \"0\") == \"1\"\nENABLE_CPU_OFFLOAD = os.getenv(\"ENABLE_CPU_OFFLOAD\", \"0\") == \"1\"\nPORT = int(os.getenv(\"DEMO_PORT\", \"15432\"))\n\ndevice = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n\n\nstyle_list = [\n    {\n        \"name\": \"(No style)\",\n        \"prompt\": \"{prompt}\",\n        \"negative_prompt\": \"\",\n    },\n    {\n        \"name\": \"Cinematic\",\n        \"prompt\": \"cinematic still {prompt} . emotional, harmonious, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy\",\n        \"negative_prompt\": \"anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured\",\n    },\n    {\n        \"name\": \"Photographic\",\n        \"prompt\": \"cinematic photo {prompt} . 35mm photograph, film, bokeh, professional, 4k, highly detailed\",\n        \"negative_prompt\": \"drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly\",\n    },\n    {\n        \"name\": \"Anime\",\n        \"prompt\": \"anime artwork {prompt} . anime style, key visual, vibrant, studio anime,  highly detailed\",\n        \"negative_prompt\": \"photo, deformed, black and white, realism, disfigured, low contrast\",\n    },\n    {\n        \"name\": \"Manga\",\n        \"prompt\": \"manga style {prompt} . 
vibrant, high-energy, detailed, iconic, Japanese comic style\",\n        \"negative_prompt\": \"ugly, deformed, noisy, blurry, low contrast, realism, photorealistic, Western comic style\",\n    },\n    {\n        \"name\": \"Digital Art\",\n        \"prompt\": \"concept art {prompt} . digital artwork, illustrative, painterly, matte painting, highly detailed\",\n        \"negative_prompt\": \"photo, photorealistic, realism, ugly\",\n    },\n    {\n        \"name\": \"Pixel art\",\n        \"prompt\": \"pixel-art {prompt} . low-res, blocky, pixel art style, 8-bit graphics\",\n        \"negative_prompt\": \"sloppy, messy, blurry, noisy, highly detailed, ultra textured, photo, realistic\",\n    },\n    {\n        \"name\": \"Fantasy art\",\n        \"prompt\": \"ethereal fantasy concept art of  {prompt} . magnificent, celestial, ethereal, painterly, epic, majestic, magical, fantasy art, cover art, dreamy\",\n        \"negative_prompt\": \"photographic, realistic, realism, 35mm film, dslr, cropped, frame, text, deformed, glitch, noise, noisy, off-center, deformed, cross-eyed, closed eyes, bad anatomy, ugly, disfigured, sloppy, duplicate, mutated, black and white\",\n    },\n    {\n        \"name\": \"Neonpunk\",\n        \"prompt\": \"neonpunk style {prompt} . cyberpunk, vaporwave, neon, vibes, vibrant, stunningly beautiful, crisp, detailed, sleek, ultramodern, magenta highlights, dark purple shadows, high contrast, cinematic, ultra detailed, intricate, professional\",\n        \"negative_prompt\": \"painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured\",\n    },\n    {\n        \"name\": \"3D Model\",\n        \"prompt\": \"professional 3d model {prompt} . 
octane render, highly detailed, volumetric, dramatic lighting\",\n        \"negative_prompt\": \"ugly, deformed, noisy, low poly, blurry, painting\",\n    },\n]\n\n\nstyles = {k[\"name\"]: (k[\"prompt\"], k[\"negative_prompt\"]) for k in style_list}\nSTYLE_NAMES = list(styles.keys())\nDEFAULT_STYLE_NAME = \"(No style)\"\nNUM_IMAGES_PER_PROMPT = 1\n\ndef apply_style(style_name: str, positive: str, negative: str = \"\") -> Tuple[str, str]:\n    p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])\n    if not negative:\n        negative = \"\"\n    return p.replace(\"{prompt}\", positive), n + negative\n\n\ndef get_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--is_lora', action='store_true', help='enable lora ckpt loading')\n    parser.add_argument('--repo_id', default=\"PixArt-alpha/PixArt-LCM-XL-2-1024-MS\", type=str)\n    parser.add_argument('--lora_repo_id', default=\"PixArt-alpha/PixArt-LCM-LoRA-XL-2-1024-MS\", type=str)\n    return parser.parse_args()\n\n\nargs = get_args()\nif torch.cuda.is_available():\n    if not args.is_lora:\n        pipe = PixArtAlphaPipeline.from_pretrained(\n            args.repo_id,\n            torch_dtype=torch.float16,\n            use_safetensors=True,\n        )\n    else:\n        assert args.lora_repo_id is not None\n        transformer = Transformer2DModel.from_pretrained(args.repo_id, subfolder=\"transformer\", torch_dtype=torch.float16)\n        transformer = PeftModel.from_pretrained(transformer, args.lora_repo_id)\n        pipe = PixArtAlphaPipeline.from_pretrained(\n            args.repo_id,\n            transformer=transformer,\n            torch_dtype=torch.float16,\n            use_safetensors=True,\n        )\n        del transformer\n\n    if ENABLE_CPU_OFFLOAD:\n        pipe.enable_model_cpu_offload()\n    else:\n        pipe.to(device)\n        print(\"Loaded on Device!\")\n\n    # speed-up T5\n    pipe.text_encoder.to_bettertransformer()\n\n    if USE_TORCH_COMPILE:\n        
pipe.transformer = torch.compile(pipe.transformer, mode=\"reduce-overhead\", fullgraph=True)\n        print(\"Model Compiled!\")\n\n\ndef save_image(img):\n    unique_name = f'{str(uuid.uuid4())}.png'\n    save_path = os.path.join(f'output/online_demo_img/{datetime.now().date()}')\n    os.makedirs(save_path, exist_ok=True)\n    unique_name = os.path.join(save_path, unique_name)\n    img.save(unique_name)\n    return unique_name\n\n\ndef randomize_seed_fn(seed: int, randomize_seed: bool) -> int:\n    if randomize_seed:\n        seed = random.randint(0, MAX_SEED)\n    return seed\n\n\ndef generate(\n        prompt: str,\n        negative_prompt: str = \"\",\n        style: str = DEFAULT_STYLE_NAME,\n        use_negative_prompt: bool = False,\n        seed: int = 0,\n        width: int = 1024,\n        height: int = 1024,\n        inference_steps: int = 4,\n        randomize_seed: bool = False,\n        use_resolution_binning: bool = True,\n        progress=gr.Progress(track_tqdm=True),\n):\n    seed = int(randomize_seed_fn(seed, randomize_seed))\n    generator = torch.Generator().manual_seed(seed)\n\n    if not use_negative_prompt:\n        negative_prompt = None  # type: ignore\n    prompt, negative_prompt = apply_style(style, prompt, negative_prompt)\n\n    images = pipe(\n        prompt=prompt,\n        width=width,\n        height=height,\n        negative_prompt=negative_prompt,\n        guidance_scale=0.,\n        num_inference_steps=inference_steps,\n        generator=generator,\n        num_images_per_prompt=NUM_IMAGES_PER_PROMPT,\n        use_resolution_binning=use_resolution_binning,\n        output_type=\"pil\",\n    ).images\n\n    image_paths = [save_image(img) for img in images]\n    print(image_paths)\n    return image_paths, seed\n\n\nexamples = [\n    \"A small cactus with a happy face in the Sahara desert.\",\n    \"an astronaut sitting in a diner, eating fries, cinematic, analog film\",\n    \"Pirate ship trapped in a cosmic maelstrom nebula, 
rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.\",\n    \"stars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, blue and pink, brilliantly illuminated in the background.\",\n    \"professional portrait photo of an anthropomorphic cat wearing fancy gentleman hat and jacket walking in autumn forest.\",\n    \"beautiful lady, freckles, big smile, blue eyes, short ginger hair, dark makeup, wearing a floral blue vest top, soft light, dark grey background\",\n    \"Spectacular Tiny World in the Transparent Jar On the Table, interior of the Great Hall, Elaborate, Carved Architecture, Anatomy, Symetrical, Geometric and Parameteric Details, Precision Flat line Details, Pattern, Dark fantasy, Dark errie mood and ineffably mysterious mood, Technical design, Intricate Ultra Detail, Ornate Detail, Stylized and Futuristic and Biomorphic Details, Architectural Concept, Low contrast Details, Cinematic Lighting, 8k, by moebius, Fullshot, Epic, Fullshot, Octane render, Unreal ,Photorealistic, Hyperrealism\",\n    \"anthropomorphic profile of the white snow owl Crystal priestess , art deco painting, pretty and expressive eyes, ornate costume, mythical, ethereal, intricate, elaborate, hyperrealism, hyper detailed, 3D, 8K, Ultra Realistic, high octane, ultra resolution, amazing detail, perfection, In frame, photorealistic, cinematic lighting, visual clarity, shading , Lumen Reflections, Super-Resolution, gigapixel, color grading, retouch, enhanced, PBR, Blender, V-ray, Procreate, zBrush, Unreal Engine 5, cinematic, volumetric, dramatic, neon lighting, wide angle lens ,no digital painting blur\",\n    \"The parametric hotel lobby is a sleek and modern space with plenty of natural light. The lobby is spacious and open with a variety of seating options. 
The front desk is a sleek white counter with a parametric design. The walls are a light blue color with parametric patterns. The floor is a light wood color with a parametric design. There are plenty of plants and flowers throughout the space. The overall effect is a calm and relaxing space. occlusion, moody, sunset, concept art, octane rendering, 8k, highly detailed, concept art, highly detailed, beautiful scenery, cinematic, beautiful light, hyperreal, octane render, hdr, long exposure, 8K, realistic, fog, moody, fire and explosions, smoke, 50mm f2.8\",\n]\n\nwith gr.Blocks(css=\"scripts/style.css\") as demo:\n    gr.Markdown(DESCRIPTION)\n    gr.DuplicateButton(\n        value=\"Duplicate Space for private use\",\n        elem_id=\"duplicate-button\",\n        visible=os.getenv(\"SHOW_DUPLICATE_BUTTON\") == \"1\",\n    )\n    with gr.Group():\n        with gr.Row():\n            prompt = gr.Text(\n                label=\"Prompt\",\n                show_label=False,\n                max_lines=1,\n                placeholder=\"Enter your prompt\",\n                container=False,\n            )\n            run_button = gr.Button(\"Run\", scale=0)\n        result = gr.Gallery(label=\"Result\", columns=NUM_IMAGES_PER_PROMPT, show_label=False)\n    with gr.Accordion(\"Advanced options\", open=False):\n        with gr.Row():\n            use_negative_prompt = gr.Checkbox(label=\"Use negative prompt\", value=False, visible=True)\n            negative_prompt = gr.Text(\n                label=\"Negative prompt\",\n                max_lines=1,\n                placeholder=\"Enter a negative prompt\",\n                visible=True,\n            )\n        style_selection = gr.Radio(\n            show_label=True,\n            container=True,\n            interactive=True,\n            choices=STYLE_NAMES,\n            value=DEFAULT_STYLE_NAME,\n            label=\"Image Style\",\n        )\n        seed = gr.Slider(\n            label=\"Seed\",\n            minimum=0,\n   
         maximum=MAX_SEED,\n            step=1,\n            value=0,\n        )\n        randomize_seed = gr.Checkbox(label=\"Randomize seed\", value=True)\n        with gr.Row(visible=True):\n            width = gr.Slider(\n                label=\"Width\",\n                minimum=256,\n                maximum=MAX_IMAGE_SIZE,\n                step=32,\n                value=1024,\n            )\n            height = gr.Slider(\n                label=\"Height\",\n                minimum=256,\n                maximum=MAX_IMAGE_SIZE,\n                step=32,\n                value=1024,\n            )\n        with gr.Row():\n            inference_steps = gr.Slider(\n                label=\"LCM inference steps\",\n                minimum=1,\n                maximum=30,\n                step=1,\n                value=4,\n            )\n    gr.Examples(\n        examples=examples,\n        inputs=prompt,\n        outputs=[result, seed],\n        fn=generate,\n        cache_examples=CACHE_EXAMPLES,\n    )\n\n    use_negative_prompt.change(\n        fn=lambda x: gr.update(visible=x),\n        inputs=use_negative_prompt,\n        outputs=negative_prompt,\n        api_name=False,\n    )\n\n    gr.on(\n        triggers=[\n            prompt.submit,\n            negative_prompt.submit,\n            run_button.click,\n        ],\n        fn=generate,\n        inputs=[\n            prompt,\n            negative_prompt,\n            style_selection,\n            use_negative_prompt,\n            seed,\n            width,\n            height,\n            inference_steps,\n            randomize_seed,\n        ],\n        outputs=[result, seed],\n        api_name=\"run\",\n    )\n\nif __name__ == \"__main__\":\n    demo.queue(max_size=20).launch(server_name=\"0.0.0.0\", server_port=PORT, debug=True)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/app/style.css",
    "content": ".gradio-container{width:680px!important}"
  },
  {
    "path": "PixArt-alpha-ToCa/app/style_controlnet.css",
    "content": ".gradio-container{width:768px!important}"
  },
  {
    "path": "PixArt-alpha-ToCa/asset/docs/pixart-dreambooth.md",
    "content": "# 🔥 How to Train PixArt + Dreambooth\n- PixArt + [Dreambooth](https://dreambooth.github.io/)\n<div id=\"dreambooth\" style=\"display: flex; justify-content: center;\">\n  <img src=\"../images/dreambooth/dreambooth_dog.svg\" width=\"46%\" style=\"margin: 5px;\">\n  <img src=\"../images/dreambooth/dreambooth_m5.svg\" width=\"46%\" style=\"margin: 5px;\">\n</div>\n\nYou **ONLY** need to change the **config** file in [config](../../configs/pixart_app_config/PixArt_xl2_img1024_dreambooth.py) and **dataloader** in [dataset](../../diffusion/data/datasets/Dreambooth.py).\n\n\nThe directory structure for Dreambooth dataset is:\n```\ncd ./data/dreambooth\n\ndataset\n├──dog6/\n│  ├──00.jpg\n│  ├──01.jpg\n│  ├──......\n├──cat/\n│  ├──00.jpg\n│  ├──01.jpg\n│  ├──......\n\n```\n\nTo get started, first install the required dependencies, then run on your local machine:\n\n```bash\ncd data/\ngit clone https://github.com/google/dreambooth.git\n\npython -m torch.distributed.launch --nproc_per_node=1 --master_port=26666 train_scripts/train_dreambooth.py configs/pixart_app_config/PixArt_xl2_img1024_dreambooth.py --work-dir output/path\n```\n\n\n"
  },
  {
    "path": "PixArt-alpha-ToCa/asset/docs/pixart.md",
    "content": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with\nthe License. You may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on\nan \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the\nspecific language governing permissions and limitations under the License.\n-->\n\n[//]: # (&#40;reference from [hugging Face]&#40;https://github.com/huggingface/diffusers/blob/docs/8bit-inference-pixart/docs/source/en/api/pipelines/pixart.md&#41;&#41;)\n\n## Running the `PixArtAlphaPipeline` in under 8GB GPU VRAM\n\nIt is possible to run the [`PixArtAlphaPipeline`] under 8GB GPU VRAM by loading the text encoder in 8-bit numerical precision. Let's walk through a full-fledged example. 
\n\nFirst, install the `bitsandbytes` library:\n\n```bash\npip install -U bitsandbytes\n```\n\nThen load the text encoder in 8-bit:\n\n```python\nimport torch\nfrom transformers import T5EncoderModel\nfrom diffusers import PixArtAlphaPipeline\n\ntext_encoder = T5EncoderModel.from_pretrained(\n    \"PixArt-alpha/PixArt-XL-2-1024-MS\",\n    subfolder=\"text_encoder\",\n    load_in_8bit=True,\n    device_map=\"auto\",\n)\npipe = PixArtAlphaPipeline.from_pretrained(\n    \"PixArt-alpha/PixArt-XL-2-1024-MS\",\n    text_encoder=text_encoder,\n    transformer=None,\n    device_map=\"auto\"\n)\n```\n\nNow, use the `pipe` to encode a prompt:\n\n```python\nwith torch.no_grad():\n    prompt = \"cute cat\"\n    prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)\n\ndel text_encoder\ndel pipe\nflush()\n```\n\n`flush()` is a utility function that clears the GPU VRAM. Define it before running the snippet above:\n\n```python\nimport gc\n\ndef flush():\n    gc.collect()\n    torch.cuda.empty_cache()\n```\n\nThen compute the latents, providing the prompt embeddings as inputs:\n\n```python\npipe = PixArtAlphaPipeline.from_pretrained(\n    \"PixArt-alpha/PixArt-XL-2-1024-MS\",\n    text_encoder=None,\n    torch_dtype=torch.float16,\n).to(\"cuda\")\n\nlatents = pipe(\n    negative_prompt=None, \n    prompt_embeds=prompt_embeds,\n    negative_prompt_embeds=negative_embeds,\n    prompt_attention_mask=prompt_attention_mask,\n    negative_prompt_attention_mask=negative_prompt_attention_mask,\n    num_images_per_prompt=1,\n    output_type=\"latent\",\n).images\n\ndel pipe.transformer\nflush()\n```\n\nNotice that while initializing `pipe`, you're setting `text_encoder` to `None` so that it's not loaded. 
\n\nOnce the latents are computed, pass them to the VAE to decode them into an image:\n\n```python\nwith torch.no_grad():\n    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0]\n# postprocess returns a list of PIL images; take the first one\nimage = pipe.image_processor.postprocess(image, output_type=\"pil\")[0]\nimage.save(\"cat.png\")\n```\n\nAll of this, put together, should allow you to run [`PixArtAlphaPipeline`] under 8GB GPU VRAM.\n\n![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pixart/8bits_cat.png)\n\nFind the script [here](https://gist.github.com/sayakpaul/3ae0f847001d342af27018a96f467e4e) that can be run end-to-end to report the memory being used.\n\n<Tip warning={true}>\n\nText embeddings computed in 8-bit can affect the quality of the generated images because the reduced precision loses information in the representation space. It's recommended to compare the outputs with and without 8-bit.\n\n</Tip>"
  },
  {
    "path": "PixArt-alpha-ToCa/asset/docs/pixart_comfyui.md",
    "content": "<!--Copyright 2023 The Huawei Noah’s Ark Lab Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with\nthe License. You may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on\nan \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the\nspecific language governing permissions and limitations under the License.\n-->\n\n## 🔥 How to use PixArt in ComfyUI\n\n### 1. Preparation for PixArt running envrironment\n\n```bash\ncd /workspace\n\nconda create -n pixart python==3.9.0\nconda activate pixart\npip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117\n\ngit clone https://github.com/PixArt-alpha/PixArt-alpha.git\ncd PixArt-alpha\npip install -r requirements.txt\n```\n\n### 2. Install ComfyUI related dependencies\n\n```bash\ncd /workspace\ngit clone https://github.com/comfyanonymous/ComfyUI.git\n\ncd ComfyUI\ngit clone https://github.com/city96/ComfyUI_ExtraModels custom_nodes/ComfyUI_ExtraModels\n```\n\n### 3. Download all the checkpoints: PixArt, VAE, T5 with script\n\n```bash\ncd /workspace/PixArt\npython tools/download.py --model_names \"PixArt-XL-2-1024-MS.pth\"\n```\nor download with urls:[PixArt ckpt](https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/PixArt-XL-2-1024-MS.pth), [VAE ckpt](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/sd-vae-ft-ema), \n[T5 ckpt](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/t5-v1_1-xxl).\n\n### 4. Put Checkpoints into corresponding folders\n```bash\ncd /workspace/ComfyUI\n\nmv /path/to/PixArt-XL-2-1024-MS.pth ./models/checkpoints/\nmv /path/to/sd-vae-ft-ema ./models/VAE/\nmv /path/to/t5-v1_1-xxl ./models/t5/\n```\n### 5. 
Run the ComfyUI web server\n```bash\ncd /workspace/ComfyUI\n\npython main.py --port 11111 --listen 0.0.0.0\n```\nOpen http://your-server-ip:11111 to play with PixArt.\n\n### 6. Create your own custom nodes\nHere are two example workflows to help you get started:\n\n1) [PixArt Text-to-Image workflow](https://huggingface.co/PixArt-alpha/PixArt-alpha/blob/main/PixArt-image-to-image-workflow.json)\n\n2) [PixArt Image-to-Image workflow](https://huggingface.co/PixArt-alpha/PixArt-alpha/blob/main/PixArt-image-to-image-workflow.json)\n\nAfter downloading these JSON files, open `http://your-server-ip:11111` in your browser and drop a JSON file into the window to load the workflow and start using PixArt in ComfyUI."
  },
  {
    "path": "PixArt-alpha-ToCa/asset/docs/pixart_controlnet.md",
    "content": "<!--Copyright 2023 The Huawei Noah’s Ark Lab Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with\nthe License. You may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on\nan \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the\nspecific language governing permissions and limitations under the License.\n-->\n\n\n## 🔥 ControlNet\nWe incorporate a ControlNet-like(https://github.com/lllyasviel/ControlNet) module enables fine-grained control over text-to-image diffusion models. We introduce a novel ControlNet-Transformer architecture, specifically tailored for Transformers, achieving explicit controllability alongside high-quality image generation.\n\nFor more details about PixArt-ControlNet, please check the technical report [PixArt-δ](https://arxiv.org/abs/2401.05252).\n\n<p align=\"center\">\n  <img src=\"../images/controlnet.PNG\"  height=480>\n</p>\n\n\n## Training the `PixArt + ControlNet` on your machine\n\n```bash\n# Train on 1024px\npython -m torch.distributed.launch --nproc_per_node=2 --master_port=12345 train_scripts/train_controlnet.py configs/pixart_app_config/PixArt_xl2_img1024_controlHed.py --work-dir output/pixartcontrolnet-xl2-img1024\n\n# Train on 512px\npython -m torch.distributed.launch --nproc_per_node=2 --master_port=12345 train_scripts/train_controlnet.py configs/pixart_app_config/PixArt_xl2_img512_controlHed.py --work-dir output/pixartcontrolnet-xl2-img512\n```\n\n## Testing the `PixArt + ControlNet`\n```bash\n# Test on 1024px\nDEMO_PORT= 12345 python app/app_controlnet.py configs/pixart_app_config/PixArt_xl2_img1024_controlHed.py --model_path path/to/1024px/PixArt-XL-2-1024-ControlNet.pth\n\n# Test on 512px\nDEMO_PORT= 12345 python 
app/app_controlnet.py configs/pixart_app_config/PixArt_xl2_img512_controlHed.py --model_path path/to/512px/pixart_controlnet_ckpt\n```\nThen open http://your-server-ip:12345 in your browser to try a simple example.\n\n"
  },
  {
    "path": "PixArt-alpha-ToCa/asset/docs/pixart_inpaint.md",
    "content": "\n```python\nimport torch\nfrom scripts.pipeline_pixart_inpaint import PixArtAlphaInpaintPipeline\nfrom PIL import Image\n\npipe = PixArtAlphaInpaintPipeline.from_pretrained(\"PixArt-alpha/PixArt-XL-2-1024-MS\", torch_dtype=torch.float16)\n\nprompt = \"\"\nimage = Image.open('')\nmask_image = Image.open('')\nout = pipe(prompt, image=image, mask_image=mask_image, strength=1.0).images[0]\nout.save('./cactus_removed.png')\n```"
  },
  {
    "path": "PixArt-alpha-ToCa/asset/docs/pixart_lcm.md",
    "content": "<!--Copyright 2023 The Huawei Noah’s Ark Lab Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with\nthe License. You may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on\nan \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the\nspecific language governing permissions and limitations under the License.\n-->\n\n<p align=\"center\">\n  <img src=\"https://raw.githubusercontent.com/PixArt-alpha/PixArt-alpha.github.io/master/static/images/pixart-lcm2.png\"  height=120>\n</p>\n\n## 🔥 Why Need PixArt-LCM\nFollowing [LCM LoRA](https://huggingface.co/blog/lcm_lora), we illustrative of the generation speed we achieve on various computers. Let us stress again how liberating it is to explore image generation so easily with PixArt-LCM.\n\n| Hardware                    | PixArt-LCM (4 steps) | SDXL LoRA LCM (4 steps) | PixArt standard (14 steps) | SDXL standard (25 steps) |\n|-----------------------------|----------------------|-------------------------|----------------------------|---------------------------|\n| T4 (Google Colab Free Tier) | 3.3s                 | 8.4s                    | 16.0s                      | 26.5s                     |\n| A100 (80 GB)                | 0.51s                | 1.2s                    | 2.2s                       | 3.8s                      |\n| V100 (32 GB)                | 0.8s                 | 1.2s                    | 5.5s                       | 7.7s                      |\n\nThese tests were run with a batch size of 1 in all cases.\n\nFor cards with a lot of capacity, such as A100, performance increases significantly when generating multiple images at once, which is usually the case for production workloads.\n\n## 
Training the `PixArt + LCM` on your machine\n\n```bash\npython -m torch.distributed.launch --nproc_per_node=2 --master_port=12345 train_scripts/train_pixart_lcm.py configs/pixart_config/PixArt_xl2_img1024_lcm.py --work-dir output/pixartlcm-xl2-img1024_ft\n```\n\n## Training the `PixArt + LCM-LoRA`\n\n```bash\npython -m torch.distributed.launch --nproc_per_node=2 --master_port=12345 train_scripts/train_pixart_lcm_lora.py configs/pixart_config/PixArt_xl2_img1024_lcm.py --work-dir output/pixartlcm-lora-xl2-img1024_ft\n```\n\n## Testing the `PixArt + LCM` on your machine\n\n```bash\nDEMO_PORT=12345 python app/app_lcm.py\n```\n\nThen open http://your-server-ip:12345 in your browser to try a simple example.\n\n## Testing the `PixArt + LCM-LoRA`\n\n```bash\nDEMO_PORT=12345 python app/app_lcm.py --is_lora --lora_repo_id output/pixartlcm-lora-xl2-img1024_ft/checkpoint-xxx\n```\n\nThen open http://your-server-ip:12345 in your browser to try a simple example.\n\n## Integration in diffusers\n### Using in 🧨 diffusers\n\nMake sure you have the updated versions of the following libraries:\n\n```bash\npip install -U transformers accelerate diffusers\n```\n\nAnd then:\n\n```python\nimport torch\nfrom diffusers import PixArtAlphaPipeline, Transformer2DModel\n\n# for PixArt-LCM\npipe = PixArtAlphaPipeline.from_pretrained(\"PixArt-alpha/PixArt-LCM-XL-2-1024-MS\", torch_dtype=torch.float16, use_safetensors=True)\n\n# for PixArt-LCM-LoRA (requires the `peft` package)\n# from peft import PeftModel\n# transformer = Transformer2DModel.from_pretrained(\"PixArt-alpha/PixArt-LCM-XL-2-1024-MS\", subfolder=\"transformer\", torch_dtype=torch.float16)\n# transformer = PeftModel.from_pretrained(transformer, \"PixArt-alpha/PixArt-LCM-LoRA-XL-2-1024-MS\")\n# pipe = PixArtAlphaPipeline.from_pretrained(\"PixArt-alpha/PixArt-LCM-XL-2-1024-MS\", transformer=transformer, torch_dtype=torch.float16, use_safetensors=True)\n# del transformer\n\n# Enable memory optimizations.\npipe.enable_model_cpu_offload()\n\nprompt = \"A small cactus with a happy face in the Sahara 
desert.\"\nimage = pipe(prompt, guidance_scale=0., num_inference_steps=4).images[0]\n```\n\nThis integration allows running the pipeline with a batch size of 4 in under 11 GB of GPU VRAM. \nCheck out the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pixart) to learn more.\n\n# More updates coming"
  },
  {
    "path": "PixArt-alpha-ToCa/asset/docs/sasolver.md",
    "content": "## SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models (Neurips 2023)\n<div align=\"center\">\n  <a href=\"https://arxiv.org/pdf/2309.05019.pdf\"><img src=\"https://img.shields.io/static/v1?label=Paper&message=Arxiv&color=red&logo=arxiv\"></a> &ensp;\n  <a href=\"https://github.com/PixArt-alpha/PixArt-alpha/blob/master/diffusion/sa_solver_diffusers.py\"><img src=\"https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages\"></a> &ensp;\n</div>\n\n> [**SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models (Neurips 2023)**](https://arxiv.org/pdf/2309.05019.pdf)<br>\n> [Shuchen Xue*](https://github.com/scxue), [Mingyang Yi]()&#8224;, \n> [Weijian Luo](), [Shifeng Zhang](), [Jiacheng Sun](),\n> [Zhenguo Li](https://scholar.google.com/citations?user=XboZC1AAAAAJ),\n> [Zhi-Ming Ma]()\n> <br>University of Chinese Academy of Sciences, Huawei Noah’s Ark Lab, Peking University<br>\n---\n\n## 🐱 Abstract\nSA-Solver is a stochastic diffusion sampler based on Stochastic Adams Method. It is training-free and can be employed into pretrained diffusion models. It is a multistep SDE solver that can do fast stochastic sampling. \n\n1. The parameter 'tau function' controls the stochasticity in the sampling process. Inspired by EDM, we choose the 'tau function' to be a piecewise constant function that is greater than 0 in the middle stage of sampling process and equals zero in the start and end stage. Specifically, we choose the default value of this parameter to be\n\n```python\ntau_func = lambda t: 1 if t >= 200 and t <= 800 else 0\n```\n\nin diffusers library and \n\n```python\ntau_t = lambda t: eta if 0.2 <= t <= 0.8 else 0\n```\n\nin ldm library. (The difference is because the time transformation * 1000).\n\nThe value '1' represents the magnitude of stochasticity. 
Higher values are recommended with more NFEs (number of function evaluations).\n\nIf you want to employ deterministic sampling (solving the diffusion ODE) in SA-Solver, please set\n\n```python\ntau_func = lambda t: 0\n```\n\nIf you want to employ the original stochastic sampling (solving the original diffusion SDE) in SA-Solver, please set\n\n```python\ntau_func = lambda t: 1\n```\n\n\n2. The parameters 'predictor_order' and 'corrector_order' control the orders of 'SA-Predictor' and 'SA-Corrector'. For unconditional generation and conditional generation with a small classifier-free guidance scale, the recommended orders are 'predictor_order = 3' and 'corrector_order = 4'; for conditional generation with a large classifier-free guidance scale (e.g. t2i), the recommended orders are 'predictor_order = 2' and 'corrector_order = 2'.\n\n"
  },
  {
    "path": "PixArt-alpha-ToCa/asset/examples.py",
    "content": "examples = [\n    [\n        \"A small cactus with a happy face in the Sahara desert.\",\n        \"dpm-solver\", 20, 4.5,\n        \"https://github.com/PixArt-alpha/PixArt-alpha.github.io/blob/master/static/images/carousel/carousel1.png\",\n        \"Prompt: A small cactus with a happy face in the Sahara desert. \\nSize: --ar 1:1.\",\n        \"Model path: PixArt-XL-2-1024x1024.pt.\\nBase image size: 1024, \\nSampling Algo: dpm-solver\"],\n    [\n        \"Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, \"\n        \"spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, \"\n        \"intricate detail. --ar 6144:4096.\",\n        \"dpm-solver\", 20, 4.5,\n        \"https://github.com/PixArt-alpha/PixArt-alpha.github.io/blob/master/static/images/samples/15.png\",\n        \"Prompt: Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, \"\n        \"spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, \"\n        \"intricate detail.\\nSize: --ar 6144:4096.\",\n        \"Model path: PixArt-XL-2-1024x1024.pt.\\nBase image size: 1024, \\nSampling Algo: dpm-solver\"],\n    [\n        \"stars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, \"\n        \"blue and pink, brilliantly illuminated in the background.\",\n        \"dpm-solver\", 20, 4.5,\n        \"https://github.com/PixArt-alpha/PixArt-alpha.github.io/blob/master/static/images/samples/13.png\",\n        \"stars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, blue and pink, brilliantly illuminated in the background.\",\n        \"Model path: PixArt-XL-2-1024x1024.pt.\\nBase image size: 1024, 
\\nSampling Algo: dpm-solver\"],\n    [\n        \"nature vs human nature, surreal, UHD, 8k, hyper details, rich colors, photograph.\",\n        \"dpm-solver\", 20, 4.5,\n        \"https://github.com/PixArt-alpha/PixArt-alpha.github.io/blob/master/static/images/samples/14.png\",\n        \"nature vs human nature, surreal, UHD, 8k, hyper details, rich colors, photograph.\",\n        \"Model path: PixArt-XL-2-1024x1024.pt.\\nBase image size: 1024, \\nSampling Algo: dpm-solver\"],\n]"
  },
  {
    "path": "PixArt-alpha-ToCa/asset/samples.txt",
    "content": "A small cactus with a happy face in the Sahara desert.\nPirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.\nbeautiful lady, freckles, big smile, blue eyes, short ginger hair, dark makeup, wearing a floral blue vest top, soft light, dark grey background\nstars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, blue and pink, brilliantly illuminated in the background.\nnature vs human nature, surreal, UHD, 8k, hyper details, rich colors, photograph.\nSpectacular Tiny World in the Transparent Jar On the Table, interior of the Great Hall, Elaborate, Carved Architecture, Anatomy, Symetrical, Geometric and Parameteric Details, Precision Flat line Details, Pattern, Dark fantasy, Dark errie mood and ineffably mysterious mood, Technical design, Intricate Ultra Detail, Ornate Detail, Stylized and Futuristic and Biomorphic Details, Architectural Concept, Low contrast Details, Cinematic Lighting, 8k, by moebius, Fullshot, Epic, Fullshot, Octane render, Unreal ,Photorealistic, Hyperrealism\nanthropomorphic profile of the white snow owl Crystal priestess , art deco painting, pretty and expressive eyes, ornate costume, mythical, ethereal, intricate, elaborate, hyperrealism, hyper detailed, 3D, 8K, Ultra Realistic, high octane, ultra resolution, amazing detail, perfection, In frame, photorealistic, cinematic lighting, visual clarity, shading , Lumen Reflections, Super-Resolution, gigapixel, color grading, retouch, enhanced, PBR, Blender, V-ray, Procreate, zBrush, Unreal Engine 5, cinematic, volumetric, dramatic, neon lighting, wide angle lens ,no digital painting blur\nThe parametric hotel lobby is a sleek and modern space with plenty of natural light. 
The lobby is spacious and open with a variety of seating options. The front desk is a sleek white counter with a parametric design. The walls are a light blue color with parametric patterns. The floor is a light wood color with a parametric design. There are plenty of plants and flowers throughout the space. The overall effect is a calm and relaxing space. occlusion, moody, sunset, concept art, octane rendering, 8k, highly detailed, concept art, highly detailed, beautiful scenery, cinematic, beautiful light, hyperreal, octane render, hdr, long exposure, 8K, realistic, fog, moody, fire and explosions, smoke, 50mm f2.8\nBright scene, aerial view, ancient city, fantasy, gorgeous light, mirror reflection, high detail, wide angle lens.\n8k uhd A man looks up at the starry sky, lonely and ethereal, Minimalism, Chaotic composition Op Art\nA middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form.\nA 4k dslr image of a lemur wearing a red magician hat and a blue coat performing magic tricks with cards in a garden.\nA alpaca made of colorful building blocks, cyberpunk\nA baby painter trying to draw very simple picture, white background\nA boy and a girl fall in love\nA dog that has been meditating all the time\nA man is sitting in a chair with his chin resting on his hand. The chair, along with the man's feet, are submerged in the sea. 
Strikingly, the man's back is on fire.\nA painter study hard to learn how to draw with many concepts in the air, white background\nA painter with low quality, white background, pixel art\nA person standing on the desert, desert waves, gossip illustration, half red, half blue, abstract image of sand, clear style, trendy illustration, outdoor, top view, clear style, precision art, ultra high definition image\nA silhouette of a grand piano overlooking a dusky cityscape viewed from a top-floor penthouse, rendered in the bold and vivid sytle of a vintage travel poster.\nA sureal parallel world where mankind avoid extinction by preserving nature, epic trees, water streams, various flowers, intricate details, rich colors, rich vegetation, cinematic, symmetrical, beautiful lighting, V-Ray render, sun rays, magical lights, photography\nA woman is shopping for fresh produce at the farmer's market.\nA worker that looks like a mixture of cow and horse is working hard to type code\nA young man dressed in ancient Chinese clothing, Asian people, White robe, Handsome, Hand gestures forming a spell, Martial arts and fairy-like vibe, Carrying a legendary-level giant sword on the back, Game character, Surrounded by runes, Cyberpunk style, neon lights, best quality, masterpiece, cg, hdr, high-definition, extremely detailed, photorealistic, epic, character design, detailed face, superhero, hero, detailed UHD, real-time, vfx, 3D rendering, 8k\nAn alien octopus floats through a protal reading a newspaper\nAn epressive oil painting of a basketbal player dunking, depicted as an explosion of  a nebula\nart collection style and fashion shoot, in the style of made of glass, dark blue and light pink, paul rand, solarpunk, camille vivier, beth didonato hair, barbiecore, hyper-realistic\nartistic\nbeautiful secen\nCrocodile in a sweater\nDesign a letter A, 3D stereoscopic Ice material Interior light blue Conceptual product design Futuristic Blind box toy Handcrafted Exquisite 3D effect Full body 
display Ultra-high precision Ultra-detailed Perfect lighting OC Renderer Blender 8k Ultra-sharp Ultra-noise reduction\nFloating,colossal,futuristic statue in the sky, awe-inspiring and serenein the style of Stuart Lippincott:2with detailed composition and subtle geometric elements.This sanctuary-ike atmosphere features crisp clarity and soft amber tones.In contrasttiny human figures surround the statueThe pieceincorporates flowing draperiesreminiscent of Shwedoff and Philip McKay's stylesemphasizing thejuxtaposition between the powerful presence of the statue and thevulnerability of the minuscule human figuresshwedoff\nknolling of a drawing tools for painter\nLeonardo da Vinci's Last Supper content, Van Goph's Starry Night Style\nLuffy from ONEPIECE, handsome face, fantasy\nphotography shot through an outdoor window of a coffee shop with neon sign lighting, window glares and reflections, depth of field, {little girl with red hair sitting at a table, portrait, kodak portra 800,105 mm f1.8\nposter of a mechanical cat, techical Schematics viewed from front and side view on light white blueprint paper, illustartion drafting style, illustation, typography, conceptual art, dark fantasy steampunk, cinematic, dark fantasy\nThe girl in the car is filled with goldfish and flowers, goldfish can fly, Kawaguchi Renko's art, natural posture, holiday dadcore, youthful energy and pressure, body stretching, goldfish simulation movies in the sky, super details, and dreamy high photography. Colorful. Covered by water and goldfish, indoor scene, close-up shot in XT4 movie\nThe image features a woman wearing a red shirt with an icon. She appears to be posing for the camera, and her outfit includes a pair of jeans. The woman seems to be in a good mood, as she is smiling. The background of the image is blurry, focusing more on the woman and her attire.\nThe towel was on top of the hard counter.\nA vast landscape made entirely of various meats spreads out before the viewer. 
tender, succulent hills of roast beef, chicken drumstick trees, bacon rivers, and ham boulders create a surreal, yet appetizing scene. the sky is adorned with pepperoni sun and salami clouds.\nI want to supplement vitamin c, please help me paint related food.\nA vibrant yellow banana-shaped couch sits in a cozy living room, its curve cradling a pile of colorful cushions. on the wooden floor, a patterned rug adds a touch of eclectic charm, and a potted plant sits in the corner, reaching towards the sunlight filtering through the window.\nA transparent sculpture of a duck made out of glass. The sculpture is in front of a painting of a landscape.\nA blue jay standing on a large basket of rainbow macarons.\nA bucket bag made of blue suede. The bag is decorated with intricate golden paisley patterns. The handle of the bag is made of rubies and pearls.\nAn alien octopus floats through a portal reading a newspaper.\nbird's eye view of a city.\nbeautiful scene\nA 2D animation of a folk music band composed of anthropomorphic autumn leaves, each playing traditional bluegrass instruments, amidst a rustic forest setting dappled with the soft light of a harvest moon.\nIn front of a deep black backdrop, a figure of middle years, her Tongan skin rich and glowing, is captured mid-twirl, her curly hair flowing like a storm behind her. Her attire resembles a whirlwind of marble and porcelain fragments. Illuminated by the gleam of scattered porcelain shards, creating a dreamlike atmosphere, the dancer manages to appear fragmented, yet maintains a harmonious and fluid form.\nDigital illustration of a beach scene crafted from yarn. The sandy beach is depicted with beige yarn, waves are made of blue and white yarn crashing onto the shore. A yarn sun sets on the horizon, casting a warm glow. 
Yarn palm trees sway gently, and little yarn seashells dot the shoreline.\nIllustration of a chic chair with a design reminiscent of a pumpkin’s form, with deep orange cushioning, in a stylish loft setting.\nA detailed oil painting of an old sea captain, steering his ship through a storm. Saltwater is splashing against his weathered face, determination in his eyes. Twirling malevolent clouds are seen above and stern waves threaten to submerge the ship while seagulls dive and twirl through the chaotic landscape. Thunder and lights embark in the distance, illuminating the scene with an eerie green glow.\nAn illustration of a human heart made of translucent glass, standing on a pedestal amidst a stormy sea. Rays of sunlight pierce the clouds, illuminating the heart, revealing a tiny universe within. The quote 'Find the universe within you' is etched in bold letters across the horizon.\nA modern architectural building with large glass windows, situated on a cliff overlooking a serene ocean at sunset\nphoto of an ancient shipwreck nestled on the ocean floor. Marine plants have claimed the wooden structure, and fish swim in and out of its hollow spaces. Sunken treasures and old cannons are scattered around, providing a glimpse into the past\nA 3D render of a coffee mug placed on a window sill during a stormy day. The storm outside the window is reflected in the coffee, with miniature lightning bolts and turbulent waves seen inside the mug. The room is dimly lit, adding to the dramatic atmosphere.A minimap diorama of a cafe adorned with indoor plants. 
Wooden beams crisscross above, and a cold brew station stands out with tiny bottles and glasses.\nAn antique botanical illustration drawn with fine lines and a touch of watercolour whimsy, depicting a strange lily crossed with a Venus flytrap, its petals poised as if ready to snap shut on any unsuspecting insects.An illustration inspired by old-world botanical sketches blends a cactus with lilac blooms into a Möbius strip, using detailed lines and subtle watercolor touches to capture nature's diverse beauty and mathematical intrigue.\nAn ink sketch style illustration of a small hedgehog holding a piece of watermelon with its tiny paws, taking little bites with its eyes closed in delight.Photo of a lychee-inspired spherical chair, with a bumpy white exterior and plush interior, set against a tropical wallpaper.\n3d digital art of an adorable ghost, glowing within, holding a heart shaped pumpkin, Halloween, super cute, spooky haunted house background\nprofessional portrait photo of an anthropomorphic cat wearing fancy gentleman hat and jacket walking in autumn forest.\nan astronaut sitting in a diner, eating fries, cinematic, analog film"
  },
  {
    "path": "PixArt-alpha-ToCa/configs/PixArt_xl2_internal.py",
    "content": "data_root = '/data/data'\ndata = dict(type='InternalData', root='images', image_list_json=['data_info.json'], transform='default_train', load_vae_feat=True)\nimage_size = 256  # the generated image resolution\ntrain_batch_size = 32\neval_batch_size = 16\nuse_fsdp=False   # if use FSDP mode\nvalid_num=0      # take as valid aspect-ratio when sample number >= valid_num\n\n# model setting\nmodel = 'PixArt_XL_2'\naspect_ratio_type = None         # base aspect ratio [ASPECT_RATIO_512 or ASPECT_RATIO_256]\nmulti_scale = False     # if use multiscale dataset model training\nlewei_scale = 1.0    # lewei_scale for positional embedding interpolation\n# training setting\nnum_workers=4\ntrain_sampling_steps = 1000\neval_sampling_steps = 250\nmodel_max_length = 120\nlora_rank = 4\n\nnum_epochs = 80\ngradient_accumulation_steps = 1\ngrad_checkpointing = False\ngradient_clip = 1.0\ngc_step = 1\nauto_lr = dict(rule='sqrt')\n\n# we use different weight decay with the official implementation since it results better result\noptimizer = dict(type='AdamW', lr=1e-4, weight_decay=3e-2, eps=1e-10)\nlr_schedule = 'constant'\nlr_schedule_args = dict(num_warmup_steps=500)\n\nsave_image_epochs = 1\nsave_model_epochs = 1\nsave_model_steps=1000000\n\nsample_posterior = True\nmixed_precision = 'fp16'\nscale_factor = 0.18215\nema_rate = 0.9999\ntensorboard_mox_interval = 50\nlog_interval = 50\ncfg_scale = 4\nmask_type='null'\nnum_group_tokens=0\nmask_loss_coef=0.\nload_mask_index=False    # load prepared mask_type index\n# load model settings\nvae_pretrained = \"/cache/pretrained_models/sd-vae-ft-ema\"\nload_from = None\nresume_from = dict(checkpoint=None, load_ema=False, resume_optimizer=True, resume_lr_scheduler=True)\nsnr_loss=False\n\n# work dir settings\nwork_dir = '/cache/exps/'\ns3_work_dir = None\n\nseed = 43\n"
  },
  {
    "path": "PixArt-alpha-ToCa/configs/PixArt_xl2_sam.py",
    "content": "data_root = '/data/data'\ndata = dict(type='SAM', root='images', image_list_txt='part0.txt', transform='default_train', load_vae_feat=True)\nimage_size = 256  # the generated image resolution\ntrain_batch_size = 32\neval_batch_size = 16\nuse_fsdp=False   # if use FSDP mode\n\n# model setting\nmodel = 'PixArt_XL_2'\naspect_ratio_type = None         # base aspect ratio [ASPECT_RATIO_512 or ASPECT_RATIO_1024]\nmulti_scale = False     # if use multiscale dataset model training\nlewei_scale = 1.0\nmodel_max_length = 120\nlora_rank = 4\n# training setting\nnum_workers=4\ntrain_sampling_steps = 1000\neval_sampling_steps = 250\n\nnum_epochs = 80\ngradient_accumulation_steps = 1\ngrad_checkpointing = False\ngc_step = 1\ngradient_clip = 1.0\nauto_lr = dict(rule='sqrt')\n\n# we use different weight decay with the official implementation since it results better result\noptimizer = dict(type='AdamW', lr=1e-4, weight_decay=3e-2, eps=1e-10)\nlr_schedule = 'constant'\nlr_schedule_args = dict(num_warmup_steps=500)\n\nsave_image_epochs = 1\nsave_model_epochs = 1\nsave_model_steps=1000000\n\nsample_posterior = True\nmixed_precision = 'fp16'\nscale_factor = 0.18215\nema_rate = 0.9999\ntensorboard_mox_interval = 50\nlog_interval = 50\ncfg_scale = 4\nmask_type='null'\nnum_group_tokens=0\nmask_loss_coef=0.\nload_mask_index=False    # load prepared mask_type index\n# load model settings\nvae_pretrained = \"/cache/pretrained_models/sd-vae-ft-ema\"\nload_from = None\nresume_from = dict(checkpoint=None, load_ema=False, resume_optimizer=True, resume_lr_scheduler=True)\nsnr_loss=False\n\n# work dir settings\nwork_dir = '/cache/exps/'\ns3_work_dir = None\n\nseed = 43\n"
  },
  {
    "path": "PixArt-alpha-ToCa/configs/pixart_app_config/PixArt_xl2_img1024_controlHed.py",
    "content": "_base_ = ['../PixArt_xl2_internal.py']\ndata_root = 'data'\nimage_list_json = ['data_info.json',]\n\ndata = dict(type='InternalDataHed', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)\nimage_size = 1024\n\n# model setting\nmodel = 'PixArtMS_XL_2'\nfp32_attention = False  # Set to True if you got NaN loss\nload_from = 'path-to-pixart-checkpoints'\nvae_pretrained = \"output/pretrained_models/sd-vae-ft-ema\"\nwindow_block_indexes = []\nwindow_size=0\nuse_rel_pos=False\nlewei_scale = 2.0\n\n# training setting\nnum_workers=10\ntrain_batch_size = 4 #  set the batch size according to your VRAM\nnum_epochs = 10 # 3\ngradient_accumulation_steps = 4\ngrad_checkpointing = True\ngradient_clip = 0.01\noptimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)\nlr_schedule_args = dict(num_warmup_steps=0)\nsave_model_epochs=5\nsave_model_steps=1000\n\nlog_interval = 20\neval_sampling_steps = 200\nwork_dir = 'output_debug/debug'\n\n# controlnet related params\ncopy_blocks_num = 13\nclass_dropout_prob = 0.5\ntrain_ratio = 1\n"
  },
  {
    "path": "PixArt-alpha-ToCa/configs/pixart_app_config/PixArt_xl2_img1024_dreambooth.py",
    "content": "_base_ = ['../PixArt_xl2_internal.py']\ndata_root = 'data/dreambooth/dataset'\n\ndata = dict(type='DreamBooth', root='dog6', prompt=['a photo of sks dog'], transform='default_train', load_vae_feat=True)\nimage_size = 1024\n\n# model setting\nmodel = 'PixArtMS_XL_2'     # model for multi-scale training\nfp32_attention = True\nload_from = 'Path/to/PixArt-XL-2-1024-MS.pth'\nvae_pretrained = \"output/pretrained_models/sd-vae-ft-ema\"\nwindow_block_indexes = []\nwindow_size=0\nuse_rel_pos=False\naspect_ratio_type = 'ASPECT_RATIO_1024'         # base aspect ratio [ASPECT_RATIO_512 or ASPECT_RATIO_256]\nmulti_scale = True     # if use multiscale dataset model training\nlewei_scale = 2.0\n\n# training setting\nnum_workers=1\ntrain_batch_size = 1\nnum_epochs = 200\ngradient_accumulation_steps = 1\ngrad_checkpointing = True\ngradient_clip = 0.01\noptimizer = dict(type='AdamW', lr=5e-6, weight_decay=3e-2, eps=1e-10)\nlr_schedule_args = dict(num_warmup_steps=0)\nauto_lr = None\n\nlog_interval = 1\nsave_model_epochs=10000\nsave_model_steps=100\nwork_dir = 'output/debug'\n"
  },
  {
    "path": "PixArt-alpha-ToCa/configs/pixart_app_config/PixArt_xl2_img512_controlHed.py",
    "content": "_base_ = ['../PixArt_xl2_internal.py']\ndata_root = 'data'\nimage_list_json = ['data_info.json',]\n\ndata = dict(type='InternalDataHed', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)\nimage_size = 512\n\n# model setting\nmodel = 'PixArt_XL_2'\nfp32_attention = False  # Set to True if you got NaN loss\nload_from = 'path-to-pixart-checkpoints'\nvae_pretrained = \"output/pretrained_models/sd-vae-ft-ema\"\nwindow_block_indexes = []\nwindow_size=0\nuse_rel_pos=False\nlewei_scale = 1.0\n\n# training setting\nnum_workers=10\ntrain_batch_size = 12 # 32  # max 96 for DiT-L/4 when grad_checkpoint\nnum_epochs = 1000 # 3\ngradient_accumulation_steps = 4\ngrad_checkpointing = True\ngradient_clip = 0.01\noptimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)\nlr_schedule_args = dict(num_warmup_steps=0)\nsave_model_epochs=5\nsave_model_steps=1000\n\nlog_interval = 20\neval_sampling_steps = 200\nwork_dir = 'output_debug/debug'\n\n# controlnet related params\ncopy_blocks_num = 13\nclass_dropout_prob = 0.5\ntrain_ratio = 0.1\n"
  },
  {
    "path": "PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img1024_internal.py",
    "content": "_base_ = ['../PixArt_xl2_internal.py']\ndata_root = 'data'\nimage_list_json = ['data_info.json',]\n\ndata = dict(type='InternalData', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)\nimage_size = 1024\n\n# model setting\nwindow_block_indexes = []\nwindow_size=0\nuse_rel_pos=False\nmodel = 'PixArt_XL_2'\nfp32_attention = True\nload_from = None\nvae_pretrained = \"output/pretrained_models/sd-vae-ft-ema\"\nlewei_scale = 2.0\n\n# training setting\nnum_workers=10\ntrain_batch_size = 2 # 32\nnum_epochs = 200 # 3\ngradient_accumulation_steps = 1\ngrad_checkpointing = True\ngradient_clip = 0.01\noptimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)\nlr_schedule_args = dict(num_warmup_steps=1000)\n\neval_sampling_steps = 200\nlog_interval = 20\nsave_model_epochs=1\nsave_model_steps=2000\nwork_dir = 'output/debug'\n"
  },
  {
    "path": "PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img1024_internalms.py",
    "content": "_base_ = ['../PixArt_xl2_internal.py']\ndata_root = 'data'\nimage_list_json = ['data_info.json',]\n\ndata = dict(type='InternalDataMS', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)\nimage_size = 1024\n\n# model setting\nmodel = 'PixArtMS_XL_2'     # model for multi-scale training\nfp32_attention = True\nload_from = None\nvae_pretrained = \"output/pretrained_models/sd-vae-ft-ema\"\nwindow_block_indexes = []\nwindow_size=0\nuse_rel_pos=False\naspect_ratio_type = 'ASPECT_RATIO_1024'         # base aspect ratio [ASPECT_RATIO_512 or ASPECT_RATIO_256]\nmulti_scale = True     # if use multiscale dataset model training\nlewei_scale = 2.0\n\n# training setting\nnum_workers=10\ntrain_batch_size = 12   # max 14 for PixArt-xL/2 when grad_checkpoint\nnum_epochs = 10 # 3\ngradient_accumulation_steps = 1\ngrad_checkpointing = True\ngradient_clip = 0.01\noptimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)\nlr_schedule_args = dict(num_warmup_steps=1000)\nsave_model_epochs=1\nsave_model_steps=2000\n\nlog_interval = 20\neval_sampling_steps = 200\nwork_dir = 'output/debug'\n"
  },
  {
    "path": "PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img1024_lcm.py",
    "content": "_base_ = ['../PixArt_xl2_internal.py']\ndata_root = 'data'\nimage_list_json = ['data_info.json',]\n\ndata = dict(type='InternalDataMS', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)\nimage_size = 1024\n\n# model setting\nmodel = 'PixArtMS_XL_2'     # model for multi-scale training\nfp32_attention = False  # Set to True if you got NaN loss\nload_from = None\nvae_pretrained = \"output/pretrained_models/sd-vae-ft-ema\"\nwindow_block_indexes = []\nwindow_size=0\nuse_rel_pos=False\naspect_ratio_type = 'ASPECT_RATIO_1024'         # base aspect ratio [ASPECT_RATIO_512 or ASPECT_RATIO_256]\nmulti_scale = True     # if use multiscale dataset model training\nlewei_scale = 2.0\n\n# training setting\nnum_workers=4\ntrain_batch_size = 16   # max 12 for PixArt-xL/2 when grad_checkpoint   16 for LCM-LoRA\nnum_epochs = 10 # 3\ngradient_accumulation_steps = 1\ngrad_checkpointing = True\ngradient_clip = 0.01\noptimizer = dict(type='AdamW', lr=2e-5, weight_decay=0.0, eps=1e-10)\n# optimizer = dict(type='CAMEWrapper', lr=1e-7, weight_decay=0.0, betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16))\nlr_schedule_args = dict(num_warmup_steps=100)\nsave_model_epochs=1\nsave_model_steps=200\nvalid_num=0      # take as valid aspect-ratio when sample number >= valid_num\n\nlog_interval = 10\neval_sampling_steps = 200\nwork_dir = 'output/debug'\n\n# LCM\nloss_type = 'huber'\nhuber_c = 0.001\nnum_ddim_timesteps=50\nw_max = 15.0\nw_min = 3.0\nema_decay = 0.95\ncfg_scale = 4.5\nclass_dropout_prob = 0.\nlora_rank = 32"
  },
  {
    "path": "PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img256_SAM.py",
    "content": "_base_ = ['../PixArt_xl2_sam.py']\ndata_root = 'data'\nimage_list_txt = ['part0.txt', 'part1.txt', 'part2.txt', 'part3.txt', 'part4.txt', 'part5.txt', 'part6.txt', 'part7.txt', 'part8.txt',\n                  'part9.txt', 'part10.txt', 'part11.txt', 'part12.txt', 'part13.txt', 'part14.txt','part15.txt','part16.txt',\n                  'part17.txt','part18.txt','part19.txt','part20.txt','part21.txt', 'part22.txt', 'part23.txt', 'part24.txt',\n                  'part25.txt', 'part26.txt', 'part27.txt', 'part28.txt', 'part29.txt', 'part30.txt', 'part31.txt']\ndata = dict(type='SAM', root='SA1B', image_list_txt=image_list_txt, transform='default_train', load_vae_feat=True)\nimage_size = 256\n\n# model setting\nwindow_block_indexes=[]\nwindow_size=0\nuse_rel_pos=False\nmodel = 'PixArt_XL_2'\nfp32_attention = True\nload_from = None\nvae_pretrained = \"output/pretrained_models/sd-vae-ft-ema\"\n\n# training setting\nuse_fsdp=False   # if use FSDP mode\nnum_workers=10\ntrain_batch_size = 176 # 32\nnum_epochs = 200 # 3\ngradient_accumulation_steps = 1\ngrad_checkpointing = True\ngradient_clip = 0.01\noptimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)\nlr_schedule_args = dict(num_warmup_steps=1000)\n\neval_sampling_steps = 200\nlog_interval = 20\nsave_model_epochs=2\nsave_model_steps=20000\nwork_dir = 'output/debug'\n"
  },
  {
    "path": "PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img256_internal.py",
    "content": "_base_ = ['../PixArt_xl2_internal.py']\ndata_root = 'data'\nimage_list_json = ['data_info.json',]\n\ndata = dict(type='InternalData', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)\nimage_size = 256\n\n# model setting\nwindow_block_indexes=[]\nwindow_size=0\nuse_rel_pos=False\nmodel = 'PixArt_XL_2'\nfp32_attention = True\nload_from = None\nvae_pretrained = \"output/pretrained_models/sd-vae-ft-ema\"\n# training setting\neval_sampling_steps = 200\n\nnum_workers=10\ntrain_batch_size = 176 # 32  # max 96 for PixArt-L/4 when grad_checkpoint\nnum_epochs = 200 # 3\ngradient_accumulation_steps = 1\ngrad_checkpointing = True\ngradient_clip = 0.01\noptimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)\nlr_schedule_args = dict(num_warmup_steps=1000)\n\nlog_interval = 20\nsave_model_epochs=5\nwork_dir = 'output/debug'\n"
  },
  {
    "path": "PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img512_internal.py",
    "content": "_base_ = ['../PixArt_xl2_internal.py']\ndata_root = 'data'\nimage_list_json = ['data_info.json',]\n\ndata = dict(type='InternalData', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)\nimage_size = 512\n\n# model setting\nwindow_block_indexes = []\nwindow_size=0\nuse_rel_pos=False\nmodel = 'PixArt_XL_2'\nfp32_attention = True\nload_from = None\nvae_pretrained = \"output/pretrained_models/sd-vae-ft-ema\"\nlewei_scale = 1.0\n\n# training setting\nuse_fsdp=False   # if use FSDP mode\nnum_workers=10\ntrain_batch_size = 38 # 32\nnum_epochs = 200 # 3\ngradient_accumulation_steps = 1\ngrad_checkpointing = True\ngradient_clip = 0.01\noptimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)\nlr_schedule_args = dict(num_warmup_steps=1000)\n\neval_sampling_steps = 200\nlog_interval = 20\nsave_model_epochs=1\nwork_dir = 'output/debug'\n"
  },
  {
    "path": "PixArt-alpha-ToCa/configs/pixart_config/PixArt_xl2_img512_internalms.py",
    "content": "_base_ = ['../PixArt_xl2_internal.py']\ndata_root = 'data'\nimage_list_json = ['data_info.json',]\n\ndata = dict(type='InternalDataMS', root='InternData', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)\nimage_size = 512\n\n# model setting\nmodel = 'PixArtMS_XL_2'     # model for multi-scale training\nfp32_attention = True\nload_from = None\nvae_pretrained = \"output/pretrained_models/sd-vae-ft-ema\"\nwindow_block_indexes = []\nwindow_size=0\nuse_rel_pos=False\naspect_ratio_type = 'ASPECT_RATIO_512'         # base aspect ratio [ASPECT_RATIO_512 or ASPECT_RATIO_256]\nmulti_scale = True     # if use multiscale dataset model training\nlewei_scale = 1.0\n\n# training setting\nnum_workers=10\ntrain_batch_size = 40   # max 40 for PixArt-xL/2 when grad_checkpoint\nnum_epochs = 20 # 3\ngradient_accumulation_steps = 1\ngrad_checkpointing = True\ngradient_clip = 0.01\noptimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)\nlr_schedule_args = dict(num_warmup_steps=1000)\nsave_model_epochs=1\nsave_model_steps=2000\n\nlog_interval = 20\neval_sampling_steps = 200\nwork_dir = 'output/debug'\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/__init__.py",
    "content": "# Modified from OpenAI's diffusion repos\n#     GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\n#     ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\n#     IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\n\nfrom .iddpm import IDDPM\nfrom .dpm_solver import DPMS\nfrom .sa_sampler import SASolverSampler\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/data/__init__.py",
    "content": "from .datasets import *\nfrom .transforms import get_transform\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/data/builder.py",
    "content": "import os\nimport time\n\nfrom mmcv import Registry, build_from_cfg\nfrom torch.utils.data import DataLoader\n\nfrom diffusion.data.transforms import get_transform\nfrom diffusion.utils.logger import get_root_logger\n\nDATASETS = Registry('datasets')\n\nDATA_ROOT = '/cache/data'\n\n\ndef set_data_root(data_root):\n    global DATA_ROOT\n    DATA_ROOT = data_root\n\n\ndef get_data_path(data_dir):\n    if os.path.isabs(data_dir):\n        return data_dir\n    global DATA_ROOT\n    return os.path.join(DATA_ROOT, data_dir)\n\n\ndef build_dataset(cfg, resolution=224, **kwargs):\n    logger = get_root_logger()\n\n    dataset_type = cfg.get('type')\n    logger.info(f\"Constructing dataset {dataset_type}...\")\n    t = time.time()\n    transform = cfg.pop('transform', 'default_train')\n    transform = get_transform(transform, resolution)\n    dataset = build_from_cfg(cfg, DATASETS, default_args=dict(transform=transform, resolution=resolution, **kwargs))\n    logger.info(f\"Dataset {dataset_type} constructed. time: {(time.time() - t):.2f} s, length (use/ori): {len(dataset)}/{dataset.ori_imgs_nums}\")\n    return dataset\n\n\ndef build_dataloader(dataset, batch_size=256, num_workers=4, shuffle=True, **kwargs):\n    return (\n        DataLoader(\n            dataset,\n            batch_sampler=kwargs['batch_sampler'],\n            num_workers=num_workers,\n            pin_memory=True,\n        )\n        if 'batch_sampler' in kwargs\n        else DataLoader(\n            dataset,\n            batch_size=batch_size,\n            shuffle=shuffle,\n            num_workers=num_workers,\n            pin_memory=True,\n            **kwargs\n        )\n    )\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/data/datasets/Dreambooth.py",
    "content": "from PIL import Image\nimport numpy as np\nimport torch\nfrom torchvision.datasets.folder import default_loader, IMG_EXTENSIONS\nfrom torch.utils.data import Dataset\nfrom diffusers.utils.torch_utils import randn_tensor\nfrom torchvision import transforms as T\nimport pathlib\nfrom diffusers.models import AutoencoderKL\n\nfrom diffusion.data.builder import get_data_path, DATASETS\nfrom diffusion.data.datasets.utils import *\n\nIMAGE_EXTENSIONS = {'bmp', 'jpg', 'jpeg', 'pgm', 'png', 'ppm', 'tif', 'tiff', 'webp', 'JPEG'}\n\n\n@DATASETS.register_module()\nclass DreamBooth(Dataset):\n    def __init__(self,\n                 root,\n                 transform=None,\n                 resolution=1024,\n                 **kwargs):\n        self.root = get_data_path(root)\n        path = pathlib.Path(self.root)\n        self.transform = transform\n        self.resolution = resolution\n        self.img_samples = sorted(\n            [file for ext in IMAGE_EXTENSIONS for file in path.glob(f'*.{ext}')]\n        )\n        self.ori_imgs_nums = len(self)\n        self.loader = default_loader\n        self.base_size = int(kwargs['aspect_ratio_type'].split('_')[-1])\n        self.aspect_ratio = eval(kwargs.pop('aspect_ratio_type'))       # base aspect ratio\n        self.ratio_nums = {}\n        for k, v in self.aspect_ratio.items():\n            self.ratio_nums[float(k)] = 0      # used for batch-sampler\n        self.data_info = {'img_hw': torch.tensor([resolution, resolution], dtype=torch.float32), 'aspect_ratio': 1.}\n\n        # image related\n        with torch.inference_mode():\n            vae = AutoencoderKL.from_pretrained(\"output/pretrained_models/sd-vae-ft-ema\")\n            imgs = []\n            for img_path in self.img_samples:\n                img = self.loader(img_path)\n                self.ratio_nums[1.0] += 1\n                if self.transform is not None:\n                    imgs.append(self.transform(img))\n            imgs = 
torch.stack(imgs, dim=0)\n            self.img_vae = vae.encode(imgs).latent_dist.sample()\n            del vae\n\n    def __getitem__(self, index):\n        return self.img_vae[index], self.data_info\n\n    @staticmethod\n    def vae_feat_loader(path):\n        # [mean, std]\n        mean, std = torch.from_numpy(np.load(path)).chunk(2)\n        sample = randn_tensor(mean.shape, generator=None, device=mean.device, dtype=mean.dtype)\n        return mean + std * sample\n\n    def load_ori_img(self, img_path):\n        # Load the image and convert it to a Tensor\n        transform = T.Compose([\n            T.Resize(256),  # Image.BICUBIC\n            T.CenterCrop(256),\n            T.ToTensor(),\n        ])\n        return transform(Image.open(img_path))\n\n    def __len__(self):\n        return len(self.img_samples)\n\n    def __getattr__(self, name):\n        if name == \"set_epoch\":\n            return lambda epoch: None\n        raise AttributeError(f\"'{type(self).__name__}' object has no attribute '{name}'\")\n\n    def get_data_info(self, idx):\n        return {'height': self.resolution, 'width': self.resolution}\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/data/datasets/InternalData.py",
    "content": "import os\nimport random\nfrom PIL import Image\nimport numpy as np\nimport torch\nfrom torchvision.datasets.folder import default_loader, IMG_EXTENSIONS\nfrom torch.utils.data import Dataset\nfrom diffusers.utils.torch_utils import randn_tensor\nfrom torchvision import transforms as T\nfrom diffusion.data.builder import get_data_path, DATASETS\nfrom diffusion.utils.logger import get_root_logger\n\nimport json\n\n\n@DATASETS.register_module()\nclass InternalData(Dataset):\n    def __init__(self,\n                 root,\n                 image_list_json='data_info.json',\n                 transform=None,\n                 resolution=256,\n                 sample_subset=None,\n                 load_vae_feat=False,\n                 input_size=32,\n                 patch_size=2,\n                 mask_ratio=0.0,\n                 load_mask_index=False,\n                 max_length=120,\n                 config=None,\n                 **kwargs):\n        self.root = get_data_path(root)\n        self.transform = transform\n        self.load_vae_feat = load_vae_feat\n        self.ori_imgs_nums = 0\n        self.resolution = resolution\n        self.N = int(resolution // (input_size // patch_size))\n        self.mask_ratio = mask_ratio\n        self.load_mask_index = load_mask_index\n        self.max_lenth = max_length\n        self.meta_data_clean = []\n        self.img_samples = []\n        self.txt_feat_samples = []\n        self.vae_feat_samples = []\n        self.mask_index_samples = []\n        self.prompt_samples = []\n\n        image_list_json = image_list_json if isinstance(image_list_json, list) else [image_list_json]\n        for json_file in image_list_json:\n            meta_data = self.load_json(os.path.join(self.root, 'partition', json_file))\n            self.ori_imgs_nums += len(meta_data)\n            meta_data_clean = [item for item in meta_data if item['ratio'] <= 4]\n            self.meta_data_clean.extend(meta_data_clean)\n            
self.img_samples.extend([os.path.join(self.root.replace('InternData', \"InternImgs\"), item['path']) for item in meta_data_clean])\n            self.txt_feat_samples.extend([os.path.join(self.root, 'caption_feature_wmask', '_'.join(item['path'].rsplit('/', 1)).replace('.png', '.npz')) for item in meta_data_clean])\n            self.vae_feat_samples.extend([os.path.join(self.root, f'img_vae_features_{resolution}resolution/noflip', '_'.join(item['path'].rsplit('/', 1)).replace('.png', '.npy')) for item in meta_data_clean])\n            self.prompt_samples.extend([item['prompt'] for item in meta_data_clean])\n\n        # Set loader and extensions\n        if load_vae_feat:\n            self.transform = None\n            self.loader = self.vae_feat_loader\n        else:\n            self.loader = default_loader\n\n        if sample_subset is not None:\n            self.sample_subset(sample_subset)  # sample dataset for local debug\n        logger = get_root_logger() if config is None else get_root_logger(os.path.join(config.work_dir, 'train_log.log'))\n        logger.info(f\"T5 max token length: {self.max_lenth}\")\n\n    def getdata(self, index):\n        img_path = self.img_samples[index]\n        npz_path = self.txt_feat_samples[index]\n        npy_path = self.vae_feat_samples[index]\n        prompt = self.prompt_samples[index]\n        data_info = {\n            'img_hw': torch.tensor([torch.tensor(self.resolution), torch.tensor(self.resolution)], dtype=torch.float32),\n            'aspect_ratio': torch.tensor(1.)\n        }\n\n        img = self.loader(npy_path) if self.load_vae_feat else self.loader(img_path)\n        txt_info = np.load(npz_path)\n        txt_fea = torch.from_numpy(txt_info['caption_feature'])     # 1xTx4096\n        attention_mask = torch.ones(1, 1, txt_fea.shape[1])     # 1x1xT\n        if 'attention_mask' in txt_info.keys():\n            attention_mask = torch.from_numpy(txt_info['attention_mask'])[None]\n        if txt_fea.shape[1] != 
self.max_lenth:\n            txt_fea = torch.cat([txt_fea, txt_fea[:, -1:].repeat(1, self.max_lenth-txt_fea.shape[1], 1)], dim=1)\n            attention_mask = torch.cat([attention_mask, torch.zeros(1, 1, self.max_lenth-attention_mask.shape[-1])], dim=-1)\n\n        if self.transform:\n            img = self.transform(img)\n\n        data_info['prompt'] = prompt\n        return img, txt_fea, attention_mask, data_info\n\n    def __getitem__(self, idx):\n        for _ in range(20):\n            try:\n                return self.getdata(idx)\n            except Exception as e:\n                print(f\"Error details: {str(e)}\")\n                idx = np.random.randint(len(self))\n        raise RuntimeError('Too many bad data.')\n\n    def get_data_info(self, idx):\n        data_info = self.meta_data_clean[idx]\n        return {'height': data_info['height'], 'width': data_info['width']}\n\n    @staticmethod\n    def vae_feat_loader(path):\n        # [mean, std]\n        mean, std = torch.from_numpy(np.load(path)).chunk(2)\n        sample = randn_tensor(mean.shape, generator=None, device=mean.device, dtype=mean.dtype)\n        return mean + std * sample\n\n    def load_ori_img(self, img_path):\n        # Load the image and convert it to a Tensor\n        transform = T.Compose([\n            T.Resize(256),  # Image.BICUBIC\n            T.CenterCrop(256),\n            T.ToTensor(),\n        ])\n        return transform(Image.open(img_path))\n\n    def load_json(self, file_path):\n        with open(file_path, 'r') as f:\n            meta_data = json.load(f)\n\n        return meta_data\n\n    def sample_subset(self, ratio):\n        sampled_idx = random.sample(list(range(len(self))), int(len(self) * ratio))\n        # Subset every per-sample list with the same indices so they stay aligned\n        self.img_samples = [self.img_samples[i] for i in sampled_idx]\n        self.txt_feat_samples = [self.txt_feat_samples[i] for i in sampled_idx]\n        self.vae_feat_samples = [self.vae_feat_samples[i] for i in sampled_idx]\n        self.prompt_samples = [self.prompt_samples[i] for i in sampled_idx]\n        self.meta_data_clean = [self.meta_data_clean[i] for i in sampled_idx]\n\n    def __len__(self):\n        return len(self.img_samples)\n\n    def __getattr__(self, name):\n        if name == \"set_epoch\":\n            return lambda epoch: None\n        raise 
AttributeError(f\"'{type(self).__name__}' object has no attribute '{name}'\")\n\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/data/datasets/InternalData_ms.py",
    "content": "import os\nimport numpy as np\nimport torch\nimport random\nfrom torchvision.datasets.folder import default_loader\nfrom diffusion.data.datasets.InternalData import InternalData\nfrom diffusion.data.builder import get_data_path, DATASETS\nfrom diffusion.utils.logger import get_root_logger\nimport torchvision.transforms as T\nfrom torchvision.transforms.functional import InterpolationMode\nfrom diffusion.data.datasets.utils import *\n\ndef get_closest_ratio(height: float, width: float, ratios: dict):\n    aspect_ratio = height / width\n    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - aspect_ratio))\n    return ratios[closest_ratio], float(closest_ratio)\n\n\n@DATASETS.register_module()\nclass InternalDataMS(InternalData):\n    def __init__(self,\n                 root,\n                 image_list_json='data_info.json',\n                 transform=None,\n                 resolution=256,\n                 sample_subset=None,\n                 load_vae_feat=False,\n                 input_size=32,\n                 patch_size=2,\n                 mask_ratio=0.0,\n                 mask_type='null',\n                 load_mask_index=False,\n                 max_length=120,\n                 config=None,\n                 **kwargs):\n        self.root = get_data_path(root)\n        self.transform = transform\n        self.load_vae_feat = load_vae_feat\n        self.ori_imgs_nums = 0\n        self.resolution = resolution\n        self.N = int(resolution // (input_size // patch_size))\n        self.mask_ratio = mask_ratio\n        self.load_mask_index = load_mask_index\n        self.mask_type = mask_type\n        self.base_size = int(kwargs['aspect_ratio_type'].split('_')[-1])\n        self.max_lenth = max_length\n        self.aspect_ratio = eval(kwargs.pop('aspect_ratio_type'))       # base aspect ratio\n        self.meta_data_clean = []\n        self.img_samples = []\n        self.txt_feat_samples = []\n        
self.vae_feat_samples = []\n        self.mask_index_samples = []\n        self.ratio_index = {}\n        self.ratio_nums = {}\n        for k, v in self.aspect_ratio.items():\n            self.ratio_index[float(k)] = []     # used for self.getitem\n            self.ratio_nums[float(k)] = 0      # used for batch-sampler\n\n        image_list_json = image_list_json if isinstance(image_list_json, list) else [image_list_json]\n        for json_file in image_list_json:\n            meta_data = self.load_json(os.path.join(self.root, 'partition_filter', json_file))\n            self.ori_imgs_nums += len(meta_data)\n            meta_data_clean = [item for item in meta_data if item['ratio'] <= 4]\n            self.meta_data_clean.extend(meta_data_clean)\n            self.img_samples.extend([os.path.join(self.root.replace('InternData', \"InternImgs\"), item['path']) for item in meta_data_clean])\n            self.txt_feat_samples.extend([os.path.join(self.root, 'caption_feature_wmask', '_'.join(item['path'].rsplit('/', 1)).replace('.png', '.npz')) for item in meta_data_clean])\n            self.vae_feat_samples.extend([os.path.join(self.root, f'img_vae_fatures_{resolution}_multiscale/ms', '_'.join(item['path'].rsplit('/', 1)).replace('.png', '.npy')) for item in meta_data_clean])\n\n        # Set loader and extensions\n        if load_vae_feat:\n            self.transform = None\n            self.loader = self.vae_feat_loader\n        else:\n            self.loader = default_loader\n\n        if sample_subset is not None:\n            self.sample_subset(sample_subset)  # sample dataset for local debug\n\n        # scan the dataset for ratio static\n        for i, info in enumerate(self.meta_data_clean[:len(self.meta_data_clean)//3]):\n            ori_h, ori_w = info['height'], info['width']\n            closest_size, closest_ratio = get_closest_ratio(ori_h, ori_w, self.aspect_ratio)\n            self.ratio_nums[closest_ratio] += 1\n            if 
len(self.ratio_index[closest_ratio]) == 0:\n                self.ratio_index[closest_ratio].append(i)\n        # print(self.ratio_nums)\n        logger = get_root_logger() if config is None else get_root_logger(os.path.join(config.work_dir, 'train_log.log'))\n        logger.info(f\"T5 max token length: {self.max_lenth}\")\n\n    def getdata(self, index):\n        img_path = self.img_samples[index]\n        npz_path = self.txt_feat_samples[index]\n        npy_path = self.vae_feat_samples[index]\n        ori_h, ori_w = self.meta_data_clean[index]['height'], self.meta_data_clean[index]['width']\n\n        # Calculate the closest aspect ratio and resize & crop image[w, h]\n        closest_size, closest_ratio = get_closest_ratio(ori_h, ori_w, self.aspect_ratio)\n        closest_size = list(map(lambda x: int(x), closest_size))\n        self.closest_ratio = closest_ratio\n\n        if self.load_vae_feat:\n            try:\n                img = self.loader(npy_path)\n                if index not in self.ratio_index[closest_ratio]:\n                    self.ratio_index[closest_ratio].append(index)\n            except Exception:\n                index = random.choice(self.ratio_index[closest_ratio])\n                return self.getdata(index)\n            h, w = (img.shape[1], img.shape[2])\n            assert h, w == (ori_h//8, ori_w//8)\n        else:\n            img = self.loader(img_path)\n            h, w = (img.size[1], img.size[0])\n            assert h, w == (ori_h, ori_w)\n\n        data_info = {'img_hw': torch.tensor([ori_h, ori_w], dtype=torch.float32)}\n        data_info['aspect_ratio'] = closest_ratio\n        data_info[\"mask_type\"] = self.mask_type\n\n        txt_info = np.load(npz_path)\n        txt_fea = torch.from_numpy(txt_info['caption_feature'])\n        attention_mask = torch.ones(1, 1, txt_fea.shape[1])\n        if 'attention_mask' in txt_info.keys():\n            attention_mask = torch.from_numpy(txt_info['attention_mask'])[None]\n\n        if not 
self.load_vae_feat:\n            if closest_size[0] / ori_h > closest_size[1] / ori_w:\n                resize_size = closest_size[0], int(ori_w * closest_size[0] / ori_h)\n            else:\n                resize_size = int(ori_h * closest_size[1] / ori_w), closest_size[1]\n            self.transform = T.Compose([\n                T.Lambda(lambda img: img.convert('RGB')),\n                T.Resize(resize_size, interpolation=InterpolationMode.BICUBIC),  # Image.BICUBIC\n                T.CenterCrop(closest_size),\n                T.ToTensor(),\n                T.Normalize([.5], [.5]),\n            ])\n\n        if self.transform:\n            img = self.transform(img)\n\n        return img, txt_fea, attention_mask, data_info\n\n    def __getitem__(self, idx):\n        for _ in range(20):\n            try:\n                return self.getdata(idx)\n            except Exception as e:\n                print(f\"Error details: {str(e)}\")\n                idx = random.choice(self.ratio_index[self.closest_ratio])\n        raise RuntimeError('Too many bad data.')\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/data/datasets/SA.py",
    "content": "import os\nimport random\nimport time\n\nimport numpy as np\nimport torch\nfrom torchvision.datasets.folder import default_loader, IMG_EXTENSIONS\nfrom torch.utils.data import Dataset\nfrom diffusers.utils.torch_utils import randn_tensor\n\nfrom diffusion.data.builder import get_data_path, DATASETS\n\n\n@DATASETS.register_module()\nclass SAM(Dataset):\n    def __init__(self,\n                 root,\n                 image_list_txt='part0.txt',\n                 transform=None,\n                 resolution=256,\n                 sample_subset=None,\n                 load_vae_feat=False,\n                 mask_ratio=0.0,\n                 mask_type='null',\n                 **kwargs):\n        self.root = get_data_path(root)\n        self.transform = transform\n        self.load_vae_feat = load_vae_feat\n        self.mask_type = mask_type\n        self.mask_ratio = mask_ratio\n        self.resolution = resolution\n        self.img_samples = []\n        self.txt_feat_samples = []\n        self.vae_feat_samples = []\n        image_list_txt = image_list_txt if isinstance(image_list_txt, list) else [image_list_txt]\n        # image_list_txt is always a list at this point, so 'all' arrives as ['all']\n        if image_list_txt == ['all']:\n            image_list_txts = os.listdir(os.path.join(self.root, 'partition'))\n            for txt in image_list_txts:\n                image_list = os.path.join(self.root, 'partition', txt)\n                with open(image_list, 'r') as f:\n                    lines = [line.strip() for line in f.readlines()]\n                    self.img_samples.extend([os.path.join(self.root, 'images', i+'.jpg') for i in lines])\n                    self.txt_feat_samples.extend([os.path.join(self.root, 'caption_feature_wmask', i+'.npz') for i in lines])\n                    # getdata() indexes vae_feat_samples unconditionally, so fill it here as in the branch below\n                    self.vae_feat_samples.extend([os.path.join(self.root, 'img_vae_feature/train_vae_256/noflip', i+'.npy') for i in lines])\n        elif isinstance(image_list_txt, list):\n            for txt in image_list_txt:\n                image_list = os.path.join(self.root, 'partition', txt)\n                with open(image_list, 'r') as f:\n                    lines = [line.strip() for line in 
f.readlines()]\n                    self.img_samples.extend([os.path.join(self.root, 'images', i + '.jpg') for i in lines])\n                    self.txt_feat_samples.extend([os.path.join(self.root, 'caption_feature_wmask', i + '.npz') for i in lines])\n                    self.vae_feat_samples.extend([os.path.join(self.root, 'img_vae_feature/train_vae_256/noflip', i + '.npy') for i in lines])\n\n        self.ori_imgs_nums = len(self)\n        # self.img_samples = self.img_samples[:10000]\n        # Set loader and extensions\n        if load_vae_feat:\n            self.transform = None\n            self.loader = self.vae_feat_loader\n        else:\n            self.loader = default_loader\n\n        if sample_subset is not None:\n            self.sample_subset(sample_subset)  # sample dataset for local debug\n\n    def getdata(self, idx):\n        img_path = self.img_samples[idx]\n        npz_path = self.txt_feat_samples[idx]\n        npy_path = self.vae_feat_samples[idx]\n        data_info = {'img_hw': torch.tensor([self.resolution, self.resolution], dtype=torch.float32),\n                     'aspect_ratio': torch.tensor(1.)}\n\n        img = self.loader(npy_path) if self.load_vae_feat else self.loader(img_path)\n        npz_info = np.load(npz_path)\n        txt_fea = torch.from_numpy(npz_info['caption_feature'])\n        attention_mask = torch.ones(1, 1, txt_fea.shape[1])\n        if 'attention_mask' in npz_info.keys():\n            attention_mask = torch.from_numpy(npz_info['attention_mask'])[None]\n\n        if self.transform:\n            img = self.transform(img)\n\n        data_info[\"mask_type\"] = self.mask_type\n\n        return img, txt_fea, attention_mask, data_info\n\n    def __getitem__(self, idx):\n        for _ in range(20):\n            try:\n                return self.getdata(idx)\n            except Exception:\n                print(self.img_samples[idx], ' info is not correct')\n                idx = np.random.randint(len(self))\n        raise 
RuntimeError('Too many bad data.')\n\n    @staticmethod\n    def vae_feat_loader(path):\n        # [mean, std]\n        mean, std = torch.from_numpy(np.load(path)).chunk(2)\n        sample = randn_tensor(mean.shape, generator=None, device=mean.device, dtype=mean.dtype)\n        return mean + std * sample\n        # return mean\n\n    def sample_subset(self, ratio):\n        sampled_idx = random.sample(list(range(len(self))), int(len(self) * ratio))\n        # Subset all per-sample lists with the same indices so they stay aligned\n        self.img_samples = [self.img_samples[i] for i in sampled_idx]\n        self.txt_feat_samples = [self.txt_feat_samples[i] for i in sampled_idx]\n        if self.vae_feat_samples:\n            self.vae_feat_samples = [self.vae_feat_samples[i] for i in sampled_idx]\n\n    def __len__(self):\n        return len(self.img_samples)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/data/datasets/__init__.py",
    "content": "from .SA import SAM\nfrom .InternalData import InternalData\nfrom .InternalData_ms import InternalDataMS\nfrom .Dreambooth import DreamBooth\nfrom .pixart_control import InternalDataHed\nfrom .utils import *\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/data/datasets/pixart_control.py",
    "content": "import os\nimport random\nfrom PIL import Image\nimport numpy as np\nimport torch\nfrom torchvision.datasets.folder import default_loader, IMG_EXTENSIONS\nfrom torch.utils.data import Dataset\nfrom diffusers.utils.torch_utils import randn_tensor\nfrom torchvision import transforms as T\nfrom diffusion.data.builder import get_data_path, DATASETS\n\nimport json, time\n\n\n@DATASETS.register_module()\nclass InternalDataHed(Dataset):\n    def __init__(self,\n                 root,\n                 image_list_json='data_info.json',\n                 transform=None,\n                 resolution=256,\n                 sample_subset=None,\n                 load_vae_feat=False,\n                 input_size=32,\n                 patch_size=2,\n                 mask_ratio=0.0,\n                 load_mask_index=False,\n                 train_ratio=1.0,\n                 mode='train',\n                 **kwargs):\n        self.root = get_data_path(root)\n        self.transform = transform\n        self.load_vae_feat = load_vae_feat\n        self.ori_imgs_nums = 0\n        self.resolution = resolution\n        self.N = int(resolution // (input_size // patch_size))\n        self.mask_ratio = mask_ratio\n        self.load_mask_index = load_mask_index\n        self.meta_data_clean = []\n        self.img_samples = []\n        self.txt_feat_samples = []\n        self.vae_feat_samples = []\n        self.hed_feat_samples = []\n        self.prompt_samples = []\n\n        image_list_json = image_list_json if isinstance(image_list_json, list) else [image_list_json]\n        for json_file in image_list_json:\n            meta_data = self.load_json(os.path.join(self.root, 'partition_filter', json_file))\n            self.ori_imgs_nums += len(meta_data)\n            meta_data_clean = [item for item in meta_data if item['ratio'] <= 4]\n            self.meta_data_clean.extend(meta_data_clean)\n            self.img_samples.extend([os.path.join(self.root.replace('InternData', 
\"InternImgs\"), item['path']) for item in meta_data_clean])\n            self.txt_feat_samples.extend([os.path.join(self.root, 'caption_features', '_'.join(item['path'].rsplit('/', 1)).replace('.png', '.npz')) for item in meta_data_clean])\n            self.vae_feat_samples.extend([os.path.join(self.root, f'img_vae_features_{resolution}resolution/noflip', '_'.join(item['path'].rsplit('/', 1)).replace('.png', '.npy')) for item in meta_data_clean])\n            self.hed_feat_samples.extend([os.path.join(self.root, f'hed_feature_{resolution}', item['path'].replace('.png', '.npz')) for item in meta_data_clean])\n            self.prompt_samples.extend([item['prompt'] for item in meta_data_clean])\n\n        total_sample = len(self.img_samples)\n        used_sample_num = int(total_sample * train_ratio)\n        print(\"using mode\", mode)\n        if mode == 'train':\n            self.img_samples = self.img_samples[:used_sample_num]\n            self.txt_feat_samples = self.txt_feat_samples[:used_sample_num]\n            self.vae_feat_samples = self.vae_feat_samples[:used_sample_num]\n            self.hed_feat_samples = self.hed_feat_samples[:used_sample_num]\n            self.prompt_samples = self.prompt_samples[:used_sample_num]\n        else:\n            self.img_samples = self.img_samples[-used_sample_num:]\n            self.txt_feat_samples = self.txt_feat_samples[-used_sample_num:]\n            self.vae_feat_samples = self.vae_feat_samples[-used_sample_num:]\n            self.hed_feat_samples = self.hed_feat_samples[-used_sample_num:]\n            self.prompt_samples = self.prompt_samples[-used_sample_num:]\n\n        # Set loader and extensions\n        if load_vae_feat:\n            self.transform = None\n            self.loader = self.vae_feat_loader\n        else:\n            self.loader = default_loader\n\n        if sample_subset is not None:\n            self.sample_subset(sample_subset)  # sample dataset for local debug\n\n    def getdata(self, index):\n 
       img_path = self.img_samples[index]\n        npz_path = self.txt_feat_samples[index]\n        npy_path = self.vae_feat_samples[index]\n        hed_npz_path = self.hed_feat_samples[index]\n        prompt = self.prompt_samples[index]\n        # only trained on single-scale 1024 res data\n        data_info = {'img_hw': torch.tensor([1024., 1024.], dtype=torch.float32), 'aspect_ratio': torch.tensor(1.)}\n\n        if self.load_vae_feat:\n            img = self.loader(npy_path)\n        else:\n            img = self.loader(img_path)\n        hed_fea = self.vae_feat_loader_npz(hed_npz_path)\n        txt_info = np.load(npz_path)\n        txt_fea = torch.from_numpy(txt_info['caption_feature'])\n        attention_mask = torch.ones(1, 1, txt_fea.shape[1])\n        if 'attention_mask' in txt_info.keys():\n            attention_mask = torch.from_numpy(txt_info['attention_mask'])[None]\n\n        if self.transform:\n            img = self.transform(img)\n\n        data_info['condition'] = hed_fea\n        data_info['prompt'] = prompt\n        return img, txt_fea, attention_mask, data_info\n\n    def __getitem__(self, idx):\n        for i in range(20):\n            try:\n                data = self.getdata(idx)\n                return data\n            except Exception as e:\n                print(f\"Error details: {str(e)}\")\n                idx = np.random.randint(len(self))\n        raise RuntimeError('Too many bad data.')\n\n    def get_data_info(self, idx):\n        data_info = self.meta_data_clean[idx]\n        return {'height': data_info['height'], 'width': data_info['width']}\n\n    @staticmethod\n    def vae_feat_loader(path):\n        # [mean, std]\n        mean, std = torch.from_numpy(np.load(path)).chunk(2)\n        sample = randn_tensor(mean.shape, generator=None, device=mean.device, dtype=mean.dtype)\n        return mean + std * sample\n\n    @staticmethod\n    def vae_feat_loader_npz(path):\n        # [mean, std]\n        mean, std = 
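torch.from_numpy(np.load(path)['arr_0']).chunk(2)\n        sample = randn_tensor(mean.shape, generator=None, device=mean.device, dtype=mean.dtype)\n        return mean + std * sample\n\n    def load_json(self, file_path):\n        with open(file_path, 'r') as f:\n            meta_data = json.load(f)\n\n        return meta_data\n\n    def sample_subset(self, ratio):\n        sampled_idx = random.sample(list(range(len(self))), int(len(self) * ratio))\n        # Subset all per-sample lists with the same indices so they stay aligned\n        self.img_samples = [self.img_samples[i] for i in sampled_idx]\n        self.txt_feat_samples = [self.txt_feat_samples[i] for i in sampled_idx]\n        self.vae_feat_samples = [self.vae_feat_samples[i] for i in sampled_idx]\n        self.hed_feat_samples = [self.hed_feat_samples[i] for i in sampled_idx]\n        self.prompt_samples = [self.prompt_samples[i] for i in sampled_idx]\n\n    def __len__(self):\n        return len(self.img_samples)\n\n    def __getattr__(self, name):\n        if name == \"set_epoch\":\n            return lambda epoch: None\n        raise AttributeError(f\"'{type(self).__name__}' object has no attribute '{name}'\")\n"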
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/data/datasets/utils.py",
    "content": "\n\nASPECT_RATIO_1024 = {\n    '0.25': [512., 2048.], '0.26': [512., 1984.], '0.27': [512., 1920.], '0.28': [512., 1856.],\n    '0.32': [576., 1792.], '0.33': [576., 1728.], '0.35': [576., 1664.], '0.4':  [640., 1600.],\n    '0.42':  [640., 1536.], '0.48': [704., 1472.], '0.5': [704., 1408.], '0.52': [704., 1344.],\n    '0.57': [768., 1344.], '0.6': [768., 1280.], '0.68': [832., 1216.], '0.72': [832., 1152.],\n    '0.78': [896., 1152.], '0.82': [896., 1088.], '0.88': [960., 1088.], '0.94': [960., 1024.],\n    '1.0':  [1024., 1024.], '1.07': [1024.,  960.], '1.13': [1088.,  960.], '1.21': [1088.,  896.],\n    '1.29': [1152.,  896.], '1.38': [1152.,  832.], '1.46': [1216.,  832.], '1.67': [1280.,  768.],\n    '1.75': [1344.,  768.], '2.0':  [1408.,  704.], '2.09':  [1472.,  704.], '2.4':  [1536.,  640.],\n    '2.5':  [1600.,  640.], '2.89':  [1664.,  576.], '3.0':  [1728.,  576.], '3.11':  [1792.,  576.],\n    '3.62':  [1856.,  512.], '3.75':  [1920.,  512.], '3.88':  [1984.,  512.], '4.0':  [2048.,  512.],\n}\n\nASPECT_RATIO_512 = {\n     '0.25': [256.0, 1024.0], '0.26': [256.0, 992.0], '0.27': [256.0, 960.0], '0.28': [256.0, 928.0],\n     '0.32': [288.0, 896.0], '0.33': [288.0, 864.0], '0.35': [288.0, 832.0], '0.4': [320.0, 800.0],\n     '0.42': [320.0, 768.0], '0.48': [352.0, 736.0], '0.5': [352.0, 704.0], '0.52': [352.0, 672.0],\n     '0.57': [384.0, 672.0], '0.6': [384.0, 640.0], '0.68': [416.0, 608.0], '0.72': [416.0, 576.0],\n     '0.78': [448.0, 576.0], '0.82': [448.0, 544.0], '0.88': [480.0, 544.0], '0.94': [480.0, 512.0],\n     '1.0': [512.0, 512.0], '1.07': [512.0, 480.0], '1.13': [544.0, 480.0], '1.21': [544.0, 448.0],\n     '1.29': [576.0, 448.0], '1.38': [576.0, 416.0], '1.46': [608.0, 416.0], '1.67': [640.0, 384.0],\n     '1.75': [672.0, 384.0], '2.0': [704.0, 352.0], '2.09': [736.0, 352.0], '2.4': [768.0, 320.0],\n     '2.5': [800.0, 320.0], '2.89': [832.0, 288.0], '3.0': [864.0, 288.0], '3.11': [896.0, 288.0],\n     '3.62': [928.0, 
256.0], '3.75': [960.0, 256.0], '3.88': [992.0, 256.0], '4.0': [1024.0, 256.0]\n     }\n\nASPECT_RATIO_256 = {\n     '0.25': [128.0, 512.0], '0.26': [128.0, 496.0], '0.27': [128.0, 480.0], '0.28': [128.0, 464.0],\n     '0.32': [144.0, 448.0], '0.33': [144.0, 432.0], '0.35': [144.0, 416.0], '0.4': [160.0, 400.0],\n     '0.42': [160.0, 384.0], '0.48': [176.0, 368.0], '0.5': [176.0, 352.0], '0.52': [176.0, 336.0],\n     '0.57': [192.0, 336.0], '0.6': [192.0, 320.0], '0.68': [208.0, 304.0], '0.72': [208.0, 288.0],\n     '0.78': [224.0, 288.0], '0.82': [224.0, 272.0], '0.88': [240.0, 272.0], '0.94': [240.0, 256.0],\n     '1.0': [256.0, 256.0], '1.07': [256.0, 240.0], '1.13': [272.0, 240.0], '1.21': [272.0, 224.0],\n     '1.29': [288.0, 224.0], '1.38': [288.0, 208.0], '1.46': [304.0, 208.0], '1.67': [320.0, 192.0],\n     '1.75': [336.0, 192.0], '2.0': [352.0, 176.0], '2.09': [368.0, 176.0], '2.4': [384.0, 160.0],\n     '2.5': [400.0, 160.0], '2.89': [416.0, 144.0], '3.0': [432.0, 144.0], '3.11': [448.0, 144.0],\n     '3.62': [464.0, 128.0], '3.75': [480.0, 128.0], '3.88': [496.0, 128.0], '4.0': [512.0, 128.0]\n}\n\nASPECT_RATIO_256_TEST = {\n     '0.25': [128.0, 512.0], '0.28': [128.0, 464.0],\n     '0.32': [144.0, 448.0], '0.33': [144.0, 432.0], '0.35': [144.0, 416.0], '0.4': [160.0, 400.0],\n     '0.42': [160.0, 384.0], '0.48': [176.0, 368.0], '0.5': [176.0, 352.0], '0.52': [176.0, 336.0],\n     '0.57': [192.0, 336.0], '0.6': [192.0, 320.0], '0.68': [208.0, 304.0], '0.72': [208.0, 288.0],\n     '0.78': [224.0, 288.0], '0.82': [224.0, 272.0], '0.88': [240.0, 272.0], '0.94': [240.0, 256.0],\n     '1.0': [256.0, 256.0], '1.07': [256.0, 240.0], '1.13': [272.0, 240.0], '1.21': [272.0, 224.0],\n     '1.29': [288.0, 224.0], '1.38': [288.0, 208.0], '1.46': [304.0, 208.0], '1.67': [320.0, 192.0],\n     '1.75': [336.0, 192.0], '2.0': [352.0, 176.0], '2.09': [368.0, 176.0], '2.4': [384.0, 160.0],\n     '2.5': [400.0, 160.0], '3.0': [432.0, 144.0],\n     '4.0': [512.0, 
128.0]\n}\n\nASPECT_RATIO_512_TEST = {\n     '0.25': [256.0, 1024.0], '0.28': [256.0, 928.0],\n     '0.32': [288.0, 896.0], '0.33': [288.0, 864.0], '0.35': [288.0, 832.0], '0.4': [320.0, 800.0],\n     '0.42': [320.0, 768.0], '0.48': [352.0, 736.0], '0.5': [352.0, 704.0], '0.52': [352.0, 672.0],\n     '0.57': [384.0, 672.0], '0.6': [384.0, 640.0], '0.68': [416.0, 608.0], '0.72': [416.0, 576.0],\n     '0.78': [448.0, 576.0], '0.82': [448.0, 544.0], '0.88': [480.0, 544.0], '0.94': [480.0, 512.0],\n     '1.0': [512.0, 512.0], '1.07': [512.0, 480.0], '1.13': [544.0, 480.0], '1.21': [544.0, 448.0],\n     '1.29': [576.0, 448.0], '1.38': [576.0, 416.0], '1.46': [608.0, 416.0], '1.67': [640.0, 384.0],\n     '1.75': [672.0, 384.0], '2.0': [704.0, 352.0], '2.09': [736.0, 352.0], '2.4': [768.0, 320.0],\n     '2.5': [800.0, 320.0], '3.0': [864.0, 288.0],\n     '4.0': [1024.0, 256.0]\n     }\n\nASPECT_RATIO_1024_TEST = {\n    '0.25': [512., 2048.], '0.28': [512., 1856.],\n    '0.32': [576., 1792.], '0.33': [576., 1728.], '0.35': [576., 1664.], '0.4':  [640., 1600.],\n    '0.42':  [640., 1536.], '0.48': [704., 1472.], '0.5': [704., 1408.], '0.52': [704., 1344.],\n    '0.57': [768., 1344.], '0.6': [768., 1280.], '0.68': [832., 1216.], '0.72': [832., 1152.],\n    '0.78': [896., 1152.], '0.82': [896., 1088.], '0.88': [960., 1088.], '0.94': [960., 1024.],\n    '1.0':  [1024., 1024.], '1.07': [1024.,  960.], '1.13': [1088.,  960.], '1.21': [1088.,  896.],\n    '1.29': [1152.,  896.], '1.38': [1152.,  832.], '1.46': [1216.,  832.], '1.67': [1280.,  768.],\n    '1.75': [1344.,  768.], '2.0':  [1408.,  704.], '2.09':  [1472.,  704.], '2.4':  [1536.,  640.],\n    '2.5':  [1600.,  640.], '3.0':  [1728.,  576.],\n    '4.0':  [2048.,  512.],\n}\n\n\ndef get_chunks(lst, n):\n    for i in range(0, len(lst), n):\n        yield lst[i:i + n]\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/data/transforms.py",
    "content": "import torchvision.transforms as T\n\nTRANSFORMS = {}\n\n\ndef register_transform(transform):\n    name = transform.__name__\n    if name in TRANSFORMS:\n        raise RuntimeError(f'Transform {name} has already been registered.')\n    TRANSFORMS.update({name: transform})\n    # Return the function so the decorated name stays bound to it instead of None\n    return transform\n\n\ndef get_transform(type, resolution):\n    transform = TRANSFORMS[type](resolution)\n    transform = T.Compose(transform)\n    transform.image_size = resolution\n    return transform\n\n\n@register_transform\ndef default_train(n_px):\n    return [\n        T.Lambda(lambda img: img.convert('RGB')),\n        T.Resize(n_px),  # Image.BICUBIC\n        T.CenterCrop(n_px),\n        # T.RandomHorizontalFlip(),\n        T.ToTensor(),\n        T.Normalize([0.5], [0.5]),\n    ]\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/dpm_solver.py",
    "content": "import torch\nfrom .model import gaussian_diffusion as gd\nfrom .model.dpm_solver import model_wrapper, DPM_Solver, NoiseScheduleVP\n\n\ndef DPMS(model, condition, uncondition, cfg_scale, model_type='noise', noise_schedule=\"linear\", guidance_type='classifier-free', model_kwargs=None, diffusion_steps=1000):\n    if model_kwargs is None:\n        model_kwargs = {}\n    betas = torch.tensor(gd.get_named_beta_schedule(noise_schedule, diffusion_steps))\n\n    ## 1. Define the noise schedule.\n    noise_schedule = NoiseScheduleVP(schedule='discrete', betas=betas)\n\n    ## 2. Convert your discrete-time `model` to the continuous-time\n    ## noise prediction model. Here is an example for a diffusion model\n    ## `model` with the noise prediction type (\"noise\") .\n    model_fn = model_wrapper(\n        model,\n        noise_schedule,\n        model_type=model_type,\n        model_kwargs=model_kwargs,\n        guidance_type=guidance_type,\n        condition=condition,\n        unconditional_condition=uncondition,\n        guidance_scale=cfg_scale,\n    )\n    ## 3. Define dpm-solver and sample by multistep DPM-Solver.\n    return DPM_Solver(model_fn, noise_schedule, algorithm_type=\"dpmsolver++\")"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/iddpm.py",
    "content": "# Modified from OpenAI's diffusion repos\n#     GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\n#     ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\n#     IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\nfrom diffusion.model.respace import SpacedDiffusion, space_timesteps\nfrom .model import gaussian_diffusion as gd\n\n\ndef IDDPM(\n        timestep_respacing,\n        noise_schedule=\"linear\",\n        use_kl=False,\n        sigma_small=False,\n        predict_xstart=False,\n        learn_sigma=True,\n        pred_sigma=True,\n        rescale_learned_sigmas=False,\n        diffusion_steps=1000,\n        snr=False,\n        return_startx=False,\n):\n    betas = gd.get_named_beta_schedule(noise_schedule, diffusion_steps)\n    if use_kl:\n        loss_type = gd.LossType.RESCALED_KL\n    elif rescale_learned_sigmas:\n        loss_type = gd.LossType.RESCALED_MSE\n    else:\n        loss_type = gd.LossType.MSE\n    if timestep_respacing is None or timestep_respacing == \"\":\n        timestep_respacing = [diffusion_steps]\n    return SpacedDiffusion(\n        use_timesteps=space_timesteps(diffusion_steps, timestep_respacing),\n        betas=betas,\n        model_mean_type=(\n            gd.ModelMeanType.START_X if predict_xstart else gd.ModelMeanType.EPSILON\n        ),\n        model_var_type=(\n            (gd.ModelVarType.LEARNED_RANGE if learn_sigma else (\n                                 gd.ModelVarType.FIXED_LARGE\n                                 if not sigma_small\n                                 else gd.ModelVarType.FIXED_SMALL\n                             )\n             )\n            if pred_sigma\n            else None\n        ),\n        loss_type=loss_type,\n        snr=snr,\n        return_startx=return_startx,\n        # rescale_timesteps=rescale_timesteps,\n    )"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/lcm_scheduler.py",
    "content": "# Copyright 2023 Stanford University Team and The HuggingFace Team. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\n# DISCLAIMER: This code is strongly influenced by https://github.com/pesser/pytorch_diffusion\n# and https://github.com/hojonathanho/diffusion\n\nimport math\nfrom dataclasses import dataclass\nfrom typing import List, Optional, Tuple, Union\n\nimport numpy as np\nimport torch\n\nfrom diffusers import ConfigMixin, SchedulerMixin\nfrom diffusers.configuration_utils import register_to_config\nfrom diffusers.utils import BaseOutput\n\n\n@dataclass\n# Copied from diffusers.schedulers.scheduling_ddpm.DDPMSchedulerOutput with DDPM->DDIM\nclass LCMSchedulerOutput(BaseOutput):\n    \"\"\"\n    Output class for the scheduler's `step` function output.\n    Args:\n        prev_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):\n            Computed sample `(x_{t-1})` of previous timestep. 
`prev_sample` should be used as next model input in the\n            denoising loop.\n        pred_original_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):\n            The predicted denoised sample `(x_{0})` based on the model output from the current timestep.\n            `pred_original_sample` can be used to preview progress or for guidance.\n    \"\"\"\n\n    prev_sample: torch.FloatTensor\n    denoised: Optional[torch.FloatTensor] = None\n\n\n# Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar\ndef betas_for_alpha_bar(\n        num_diffusion_timesteps,\n        max_beta=0.999,\n        alpha_transform_type=\"cosine\",\n):\n    \"\"\"\n    Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of\n    (1-beta) over time from t = [0,1].\n    Contains a function alpha_bar that takes an argument t and transforms it to the cumulative product of (1-beta) up\n    to that part of the diffusion process.\n    Args:\n        num_diffusion_timesteps (`int`): the number of betas to produce.\n        max_beta (`float`): the maximum beta to use; use values lower than 1 to\n                     prevent singularities.\n        alpha_transform_type (`str`, *optional*, default to `cosine`): the type of noise schedule for alpha_bar.\n                     Choose from `cosine` or `exp`\n    Returns:\n        betas (`np.ndarray`): the betas used by the scheduler to step the model outputs\n    \"\"\"\n    if alpha_transform_type == \"cosine\":\n\n        def alpha_bar_fn(t):\n            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2\n\n    elif alpha_transform_type == \"exp\":\n\n        def alpha_bar_fn(t):\n            return math.exp(t * -12.0)\n\n    else:\n        raise ValueError(f\"Unsupported alpha_tranform_type: {alpha_transform_type}\")\n\n    betas = []\n    for i in range(num_diffusion_timesteps):\n        t1 = i / num_diffusion_timesteps\n      
  t2 = (i + 1) / num_diffusion_timesteps\n        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))\n    return torch.tensor(betas, dtype=torch.float32)\n\n\ndef rescale_zero_terminal_snr(betas):\n    \"\"\"\n    Rescales betas to have zero terminal SNR Based on https://arxiv.org/pdf/2305.08891.pdf (Algorithm 1)\n    Args:\n        betas (`torch.FloatTensor`):\n            the betas that the scheduler is being initialized with.\n    Returns:\n        `torch.FloatTensor`: rescaled betas with zero terminal SNR\n    \"\"\"\n    # Convert betas to alphas_bar_sqrt\n    alphas = 1.0 - betas\n    alphas_cumprod = torch.cumprod(alphas, dim=0)\n    alphas_bar_sqrt = alphas_cumprod.sqrt()\n\n    # Store old values.\n    alphas_bar_sqrt_0 = alphas_bar_sqrt[0].clone()\n    alphas_bar_sqrt_T = alphas_bar_sqrt[-1].clone()\n\n    # Shift so the last timestep is zero.\n    alphas_bar_sqrt -= alphas_bar_sqrt_T\n\n    # Scale so the first timestep is back to the old value.\n    alphas_bar_sqrt *= alphas_bar_sqrt_0 / (alphas_bar_sqrt_0 - alphas_bar_sqrt_T)\n\n    # Convert alphas_bar_sqrt to betas\n    alphas_bar = alphas_bar_sqrt ** 2  # Revert sqrt\n    alphas = alphas_bar[1:] / alphas_bar[:-1]  # Revert cumprod\n    alphas = torch.cat([alphas_bar[:1], alphas])\n    betas = 1 - alphas\n\n    return betas\n\n\nclass LCMScheduler(SchedulerMixin, ConfigMixin):\n    \"\"\"\n    `LCMScheduler` extends the denoising procedure introduced in denoising diffusion probabilistic models (DDPMs) with\n    non-Markovian guidance.\n    This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. 
Check the superclass documentation for the generic\n    methods the library implements for all schedulers such as loading and saving.\n    Args:\n        num_train_timesteps (`int`, defaults to 1000):\n            The number of diffusion steps to train the model.\n        beta_start (`float`, defaults to 0.0001):\n            The starting `beta` value of inference.\n        beta_end (`float`, defaults to 0.02):\n            The final `beta` value.\n        beta_schedule (`str`, defaults to `\"linear\"`):\n            The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from\n            `linear`, `scaled_linear`, or `squaredcos_cap_v2`.\n        trained_betas (`np.ndarray`, *optional*):\n            Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.\n        clip_sample (`bool`, defaults to `True`):\n            Clip the predicted sample for numerical stability.\n        clip_sample_range (`float`, defaults to 1.0):\n            The maximum magnitude for sample clipping. Valid only when `clip_sample=True`.\n        set_alpha_to_one (`bool`, defaults to `True`):\n            Each diffusion step uses the alphas product value at that step and at the previous one. For the final step\n            there is no previous alpha. When this option is `True` the previous alpha product is fixed to `1`,\n            otherwise it uses the alpha value at step 0.\n        steps_offset (`int`, defaults to 0):\n            An offset added to the inference steps. 
You can use a combination of `offset=1` and\n            `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable\n            Diffusion.\n        prediction_type (`str`, defaults to `epsilon`, *optional*):\n            Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),\n            `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of [Imagen\n            Video](https://imagen.research.google/video/paper.pdf) paper).\n        thresholding (`bool`, defaults to `False`):\n            Whether to use the \"dynamic thresholding\" method. This is unsuitable for latent-space diffusion models such\n            as Stable Diffusion.\n        dynamic_thresholding_ratio (`float`, defaults to 0.995):\n            The ratio for the dynamic thresholding method. Valid only when `thresholding=True`.\n        sample_max_value (`float`, defaults to 1.0):\n            The threshold value for dynamic thresholding. Valid only when `thresholding=True`.\n        timestep_spacing (`str`, defaults to `\"leading\"`):\n            The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and\n            Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) paper for more information.\n        rescale_betas_zero_snr (`bool`, defaults to `False`):\n            Whether to rescale the betas to have zero terminal SNR. This enables the model to generate very bright and\n            dark samples instead of limiting it to samples with medium brightness. 
Loosely related to\n            [`--offset_noise`](https://github.com/huggingface/diffusers/blob/74fd735eb073eb1d774b1ab4154a0876eb82f055/examples/dreambooth/train_dreambooth.py#L506).\n    \"\"\"\n\n    # _compatibles = [e.name for e in KarrasDiffusionSchedulers]\n    order = 1\n\n    @register_to_config\n    def __init__(\n            self,\n            num_train_timesteps: int = 1000,\n            beta_start: float = 0.0001,\n            beta_end: float = 0.02,\n            beta_schedule: str = \"linear\",\n            trained_betas: Optional[Union[np.ndarray, List[float]]] = None,\n            clip_sample: bool = True,\n            set_alpha_to_one: bool = True,\n            steps_offset: int = 0,\n            prediction_type: str = \"epsilon\",\n            thresholding: bool = False,\n            dynamic_thresholding_ratio: float = 0.995,\n            clip_sample_range: float = 1.0,\n            sample_max_value: float = 1.0,\n            timestep_spacing: str = \"leading\",\n            rescale_betas_zero_snr: bool = False,\n    ):\n        if trained_betas is not None:\n            self.betas = torch.tensor(trained_betas, dtype=torch.float32)\n        elif beta_schedule == \"linear\":\n            self.betas = torch.linspace(beta_start, beta_end, num_train_timesteps, dtype=torch.float32)\n        elif beta_schedule == \"scaled_linear\":\n            # this schedule is very specific to the latent diffusion model.\n            self.betas = (\n                    torch.linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_timesteps, dtype=torch.float32) ** 2\n            )\n        elif beta_schedule == \"squaredcos_cap_v2\":\n            # Glide cosine schedule\n            self.betas = betas_for_alpha_bar(num_train_timesteps)\n        else:\n            raise NotImplementedError(f\"{beta_schedule} is not implemented for {self.__class__}\")\n\n        # Rescale for zero SNR\n        if rescale_betas_zero_snr:\n            self.betas = 
rescale_zero_terminal_snr(self.betas)\n\n        self.alphas = 1.0 - self.betas\n        self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)\n\n        # At every step in ddim, we are looking into the previous alphas_cumprod\n        # For the final step, there is no previous alphas_cumprod because we are already at 0\n        # `set_alpha_to_one` decides whether we set this parameter simply to one or\n        # whether we use the final alpha of the \"non-previous\" one.\n        self.final_alpha_cumprod = torch.tensor(1.0) if set_alpha_to_one else self.alphas_cumprod[0]\n\n        # standard deviation of the initial noise distribution\n        self.init_noise_sigma = 1.0\n\n        # setable values\n        self.num_inference_steps = None\n        self.timesteps = torch.from_numpy(np.arange(0, num_train_timesteps)[::-1].copy().astype(np.int64))\n\n    def scale_model_input(self, sample: torch.FloatTensor, timestep: Optional[int] = None) -> torch.FloatTensor:\n        \"\"\"\n        Ensures interchangeability with schedulers that need to scale the denoising model input depending on the\n        current timestep.\n        Args:\n            sample (`torch.FloatTensor`):\n                The input sample.\n            timestep (`int`, *optional*):\n                The current timestep in the diffusion chain.\n        Returns:\n            `torch.FloatTensor`:\n                A scaled input sample.\n        \"\"\"\n        return sample\n\n    def _get_variance(self, timestep, prev_timestep):\n        alpha_prod_t = self.alphas_cumprod[timestep]\n        alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod\n        beta_prod_t = 1 - alpha_prod_t\n        beta_prod_t_prev = 1 - alpha_prod_t_prev\n\n        return (beta_prod_t_prev / beta_prod_t) * (1 - alpha_prod_t / alpha_prod_t_prev)\n\n    # Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler._threshold_sample\n    def _threshold_sample(self, 
sample: torch.FloatTensor) -> torch.FloatTensor:\n        \"\"\"\n        \"Dynamic thresholding: At each sampling step we set s to a certain percentile absolute pixel value in xt0 (the\n        prediction of x_0 at timestep t), and if s > 1, then we threshold xt0 to the range [-s, s] and then divide by\n        s. Dynamic thresholding pushes saturated pixels (those near -1 and 1) inwards, thereby actively preventing\n        pixels from saturation at each step. We find that dynamic thresholding results in significantly better\n        photorealism as well as better image-text alignment, especially when using very large guidance weights.\"\n        https://arxiv.org/abs/2205.11487\n        \"\"\"\n        dtype = sample.dtype\n        batch_size, channels, height, width = sample.shape\n\n        if dtype not in (torch.float32, torch.float64):\n            sample = sample.float()  # upcast for quantile calculation, and clamp not implemented for cpu half\n\n        # Flatten sample for doing quantile calculation along each image\n        sample = sample.reshape(batch_size, channels * height * width)\n\n        abs_sample = sample.abs()  # \"a certain percentile absolute pixel value\"\n\n        s = torch.quantile(abs_sample, self.config.dynamic_thresholding_ratio, dim=1)\n        s = torch.clamp(\n            s, min=1, max=self.config.sample_max_value\n        )  # When clamped to min=1, equivalent to standard clipping to [-1, 1]\n\n        s = s.unsqueeze(1)  # (batch_size, 1) because clamp will broadcast along dim=0\n        sample = torch.clamp(sample, -s, s) / s  # \"we threshold xt0 to the range [-s, s] and then divide by s\"\n\n        sample = sample.reshape(batch_size, channels, height, width)\n        sample = sample.to(dtype)\n\n        return sample\n\n    def set_timesteps(self, num_inference_steps: int, lcm_origin_steps: int, device: Union[str, torch.device] = None):\n        \"\"\"\n        Sets the discrete timesteps used for the diffusion chain (to be 
run before inference).\n        Args:\n            num_inference_steps (`int`):\n                The number of diffusion steps used when generating samples with a pre-trained model.\n            lcm_origin_steps (`int`):\n                The number of timesteps in the LCM training schedule. The training timesteps are spaced\n                `num_train_timesteps // lcm_origin_steps` apart, and the inference timesteps are sub-sampled from them.\n            device (`str` or `torch.device`, *optional*):\n                The device to which the timesteps should be moved.\n        \"\"\"\n\n        if num_inference_steps > self.config.num_train_timesteps:\n            raise ValueError(\n                f\"`num_inference_steps`: {num_inference_steps} cannot be larger than `self.config.num_train_timesteps`:\"\n                f\" {self.config.num_train_timesteps} as the unet model trained with this scheduler can only handle\"\n                f\" maximal {self.config.num_train_timesteps} timesteps.\"\n            )\n\n        self.num_inference_steps = num_inference_steps\n\n        # LCM Timesteps Setting:  # Linear Spacing\n        c = self.config.num_train_timesteps // lcm_origin_steps\n        lcm_origin_timesteps = np.asarray(list(range(1, lcm_origin_steps + 1))) * c - 1  # LCM Training  Steps Schedule\n        skipping_step = len(lcm_origin_timesteps) // num_inference_steps\n        timesteps = lcm_origin_timesteps[::-skipping_step][:num_inference_steps]  # LCM Inference Steps Schedule\n\n        self.timesteps = torch.from_numpy(timesteps.copy()).to(device)\n\n    def get_scalings_for_boundary_condition_discrete(self, t):\n        self.sigma_data = 0.5  # Default: 0.5\n\n        # By dividing 0.1: This is almost a delta function at t=0.\n        c_skip = self.sigma_data ** 2 / ((t / 0.1) ** 2 + self.sigma_data ** 2)\n        c_out = ((t / 0.1) / ((t / 0.1) ** 2 + self.sigma_data ** 2) ** 0.5)\n        return c_skip, c_out\n\n    def step(\n            self,\n            model_output: torch.FloatTensor,\n            timeindex: int,\n            timestep: int,\n            sample: torch.FloatTensor,\n            eta: float = 0.0,\n            use_clipped_model_output: bool = False,\n            generator=None,\n            variance_noise: Optional[torch.FloatTensor] = None,\n            return_dict: bool = True,\n    ) -> Union[LCMSchedulerOutput, 
Tuple]:\n        \"\"\"\n        Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion\n        process from the learned model outputs (most often the predicted noise).\n        Args:\n            model_output (`torch.FloatTensor`):\n                The direct output from the learned diffusion model.\n            timeindex (`int`):\n                The index of `timestep` within `self.timesteps`; the entry at `timeindex + 1` is used as the previous\n                timestep.\n            timestep (`int`):\n                The current discrete timestep in the diffusion chain.\n            sample (`torch.FloatTensor`):\n                A current instance of a sample created by the diffusion process.\n            eta (`float`):\n                The weight of noise for added noise in diffusion step.\n            use_clipped_model_output (`bool`, defaults to `False`):\n                If `True`, computes \"corrected\" `model_output` from the clipped predicted original sample. Necessary\n                because predicted original sample is clipped to [-1, 1] when `self.config.clip_sample` is `True`. If no\n                clipping has happened, \"corrected\" `model_output` would coincide with the one provided as input and\n                `use_clipped_model_output` has no effect.\n            generator (`torch.Generator`, *optional*):\n                A random number generator.\n            variance_noise (`torch.FloatTensor`):\n                Alternative to generating noise with `generator` by directly providing the noise for the variance\n                itself. 
Useful for methods such as [`CycleDiffusion`].\n            return_dict (`bool`, *optional*, defaults to `True`):\n                Whether or not to return a [`~schedulers.scheduling_lcm.LCMSchedulerOutput`] or `tuple`.\n        Returns:\n            [`~schedulers.scheduling_utils.LCMSchedulerOutput`] or `tuple`:\n                If return_dict is `True`, [`~schedulers.scheduling_lcm.LCMSchedulerOutput`] is returned, otherwise a\n                tuple is returned where the first element is the sample tensor.\n        \"\"\"\n        if self.num_inference_steps is None:\n            raise ValueError(\n                \"Number of inference steps is 'None', you need to run 'set_timesteps' after creating the scheduler\"\n            )\n\n        # 1. get previous step value\n        prev_timeindex = timeindex + 1\n        if prev_timeindex < len(self.timesteps):\n            prev_timestep = self.timesteps[prev_timeindex]\n        else:\n            prev_timestep = timestep\n\n        # 2. compute alphas, betas\n        alpha_prod_t = self.alphas_cumprod[timestep]\n        alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod\n\n        beta_prod_t = 1 - alpha_prod_t\n        beta_prod_t_prev = 1 - alpha_prod_t_prev\n\n        # 3. Get scalings for boundary conditions\n        c_skip, c_out = self.get_scalings_for_boundary_condition_discrete(timestep)\n\n        # 4. Different Parameterization:\n        parameterization = self.config.prediction_type\n\n        if parameterization == \"epsilon\":  # noise-prediction\n            pred_x0 = (sample - beta_prod_t.sqrt() * model_output) / alpha_prod_t.sqrt()\n\n        elif parameterization == \"sample\":  # x-prediction\n            pred_x0 = model_output\n\n        elif parameterization == \"v_prediction\":  # v-prediction\n            pred_x0 = alpha_prod_t.sqrt() * sample - beta_prod_t.sqrt() * model_output\n\n        # 4. 
Denoise model output using boundary conditions\n        denoised = c_out * pred_x0 + c_skip * sample\n\n        # 5. Sample z ~ N(0, I), For MultiStep Inference\n        # Noise is not used for one-step sampling.\n        if len(self.timesteps) > 1:\n            noise = torch.randn(model_output.shape).to(model_output.device)\n            prev_sample = alpha_prod_t_prev.sqrt() * denoised + beta_prod_t_prev.sqrt() * noise\n        else:\n            prev_sample = denoised\n\n        if not return_dict:\n            return (prev_sample, denoised)\n\n        return LCMSchedulerOutput(prev_sample=prev_sample, denoised=denoised)\n\n    # Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler.add_noise\n    def add_noise(\n            self,\n            original_samples: torch.FloatTensor,\n            noise: torch.FloatTensor,\n            timesteps: torch.IntTensor,\n    ) -> torch.FloatTensor:\n        # Make sure alphas_cumprod and timestep have same device and dtype as original_samples\n        alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device, dtype=original_samples.dtype)\n        timesteps = timesteps.to(original_samples.device)\n\n        sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5\n        sqrt_alpha_prod = sqrt_alpha_prod.flatten()\n        while len(sqrt_alpha_prod.shape) < len(original_samples.shape):\n            sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)\n\n        sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5\n        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()\n        while len(sqrt_one_minus_alpha_prod.shape) < len(original_samples.shape):\n            sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)\n\n        return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise\n\n    # Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler.get_velocity\n    def get_velocity(\n            self, sample: torch.FloatTensor, noise: 
torch.FloatTensor, timesteps: torch.IntTensor\n    ) -> torch.FloatTensor:\n        # Make sure alphas_cumprod and timestep have same device and dtype as sample\n        alphas_cumprod = self.alphas_cumprod.to(device=sample.device, dtype=sample.dtype)\n        timesteps = timesteps.to(sample.device)\n\n        sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5\n        sqrt_alpha_prod = sqrt_alpha_prod.flatten()\n        while len(sqrt_alpha_prod.shape) < len(sample.shape):\n            sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)\n\n        sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5\n        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()\n        while len(sqrt_one_minus_alpha_prod.shape) < len(sample.shape):\n            sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)\n\n        return sqrt_alpha_prod * noise - sqrt_one_minus_alpha_prod * sample\n\n    def __len__(self):\n        return self.config.num_train_timesteps\n\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/__init__.py",
    "content": "from .nets import *\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/builder.py",
    "content": "from mmcv import Registry\n\nfrom diffusion.model.utils import set_grad_checkpoint\n\nMODELS = Registry('models')\n\n\ndef build_model(cfg, use_grad_checkpoint=False, use_fp32_attention=False, gc_step=1, **kwargs):\n    if isinstance(cfg, str):\n        cfg = dict(type=cfg)\n    model = MODELS.build(cfg, default_args=kwargs)\n    if use_grad_checkpoint:\n        set_grad_checkpoint(model, use_fp32_attention=use_fp32_attention, gc_step=gc_step)\n    return model\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/cache_functions/__init__.py",
    "content": "from .cache_cutfresh import cache_cutfresh\nfrom .fresh_ratio_scheduler import fresh_ratio_scheduler\nfrom .score_evaluate import score_evaluate\nfrom .global_force_fresh import global_force_fresh\nfrom .update_cache import update_cache\nfrom .force_init import force_init\nfrom .attention import cached_attention_forward\nfrom .cache_init import cache_init"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/cache_functions/attention.py",
    "content": "# Explicit (non-fused) attention forward that also returns the attention map\n# (softmax output averaged over dim 1) alongside the attention output.\nimport torch\nimport torch.nn.functional as F\nfrom typing import Optional, Tuple, Union\nfrom xformers.ops.fmha.attn_bias import BlockDiagonalMask\n\n\ndef cached_attention_forward(\n    query: torch.Tensor,\n    key: torch.Tensor,\n    value: torch.Tensor,\n    attn_bias: Optional[Union[torch.Tensor, BlockDiagonalMask]] = None,\n    p: float = 0.0,\n    scale: Optional[float] = None\n) -> Tuple[torch.Tensor, torch.Tensor]:\n    # Fall back to the standard 1 / sqrt(head_dim) scaling when no scale is given.\n    if scale is None:\n        scale = 1.0 / query.shape[-1] ** 0.5\n    query = query * scale\n    # Swap dims 1 and 2 so the matmuls below run over the last two dims.\n    query = query.transpose(1, 2)\n    key = key.transpose(1, 2)\n    value = value.transpose(1, 2)\n    attn = query @ key.transpose(-2, -1)\n    if attn_bias is not None:\n        # xformers bias objects must be materialized into a dense tensor; plain tensors are added as-is.\n        if not isinstance(attn_bias, torch.Tensor):\n            attn_bias = attn_bias.materialize(shape=attn.shape, dtype=attn.dtype, device=attn.device)\n        attn = attn + attn_bias\n    attn_map = attn.softmax(-1)\n    attn = F.dropout(attn_map, p)\n    attn = attn @ value\n\n    # Return the attention output and the attention map averaged over dim 1.\n    return attn.transpose(1, 2).contiguous(), attn_map.mean(dim=1)"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/cache_functions/cache_cutfresh.py",
    "content": "from .fresh_ratio_scheduler import fresh_ratio_scheduler\nfrom .score_evaluate import score_evaluate\n#from .token_merge import token_merge\nimport torch\ndef cache_cutfresh(cache_dic, tokens, current):\n    '''\n    Cut fresh tokens from the input tokens and update the cache counter.\n    \n    cache_dic: dict, the cache dictionary containing cache(main extra memory cost), indices and some other information.\n    tokens: torch.Tensor, the input tokens to be cut.\n    current: dict, the current step, layer, and module information. Particularly convenient for debugging.\n    '''\n    step = current['step']\n    layer = current['layer']\n    module = current['module']\n    \n    fresh_ratio = fresh_ratio_scheduler(cache_dic, current)\n    fresh_ratio = torch.clamp(torch.tensor(fresh_ratio, device = tokens.device), min=0, max=1)\n    # Generate the index tensor for fresh tokens\n    score = score_evaluate(cache_dic, tokens, current) # s1, s2, s3 mentioned in the paper\n    score = local_selection_with_bonus(score, 0.4, 4) # Uniform Spatial Distribution s4 mentioned in the paper\n    indices = score.argsort(dim=-1, descending=True)\n    topk = int(fresh_ratio * score.shape[1])\n    fresh_indices = indices[:, :topk]\n    stale_indices = indices[:, topk:]\n    # (B, fresh_ratio *N)\n\n    # Updating the Cache Frequency Score s3 mentioned in the paper\n    # stale tokens index + 1 in each ***module***, fresh tokens index = 0\n    cache_dic['cache_index'][-1][layer][module] += 1\n    cache_dic['cache_index'][-1][layer][module].scatter_(dim=1, index=fresh_indices, \n                                                                    src = torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))\n    cache_dic['cache_index']['layer_index'][module] += 1\n    cache_dic['cache_index']['layer_index'][module].scatter_(dim=1, index=fresh_indices, \n                                                                    src = 
torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))\n    \n    fresh_indices_expand = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])\n    if module in ['mlp', 'attn', 'cross-attn']:\n\n        fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices_expand)\n\n        return fresh_indices, fresh_tokens\n    else:\n        raise ValueError(\"Unrecognized module?\", module)\n    \ndef local_selection_with_bonus(score, bonus_ratio, grid_size=2):\n    batch_size, num_tokens = score.shape\n    image_size = int(num_tokens ** 0.5)\n    block_size = grid_size * grid_size\n    \n    assert num_tokens % block_size == 0, \"The number of tokens must be divisible by the block size.\"\n    \n    # Step 1: Reshape score to group it by blocks\n    score_reshaped = score.view(batch_size, image_size // grid_size, grid_size, image_size // grid_size, grid_size)\n    score_reshaped = score_reshaped.permute(0, 1, 3, 2, 4).contiguous()\n    score_reshaped = score_reshaped.view(batch_size, -1, block_size)  # [batch_size, num_blocks, block_size]\n    \n    # Step 2: Find the max token in each block\n    max_scores, max_indices = score_reshaped.max(dim=-1, keepdim=True)  # [batch_size, num_blocks, 1]\n    \n    # Step 3: Create a mask to identify max score tokens\n    mask = torch.zeros_like(score_reshaped)\n    mask.scatter_(-1, max_indices, 1)  # Set mask to 1 at the max indices\n    \n    # Step 4: Apply the bonus only to the max score tokens\n    score_reshaped = score_reshaped + (mask * max_scores * bonus_ratio)  # Apply bonus only to max tokens\n    \n    # Step 5: Reshape the score back to its original shape\n    score_modified = score_reshaped.view(batch_size, image_size // grid_size, image_size // grid_size, grid_size, grid_size)\n    score_modified = score_modified.permute(0, 1, 3, 2, 4).contiguous()\n    score_modified = score_modified.view(batch_size, num_tokens)\n    \n    return score_modified"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/cache_functions/cache_init.py",
    "content": "def cache_init(model_kwargs, num_steps):   \n    '''\n    Initialization for cache.\n    '''\n    cache_dic = {}\n    cache = {}\n    cache_index = {}\n    cache[-1]={}\n    cache_index[-1]={}\n    cache_index['layer_index']={}\n    cache_dic['attn_map'] = {}\n    cache_dic['attn_map'][-1] = {}\n    cache_dic['cross_attn_map'] = {}\n    cache_dic['cross_attn_map'][-1] = {}\n\n    for j in range(28):\n        cache[-1][j] = {}\n        cache_index[-1][j] = {}\n        cache_dic['attn_map'][-1][j] = {}\n        cache_dic['cross_attn_map'][-1][j] = {}\n\n    cache_dic['cache_type'] = model_kwargs['cache_type']\n    cache_dic['cache_index'] = cache_index\n    cache_dic['cache'] = cache\n    cache_dic['fresh_ratio_schedule'] = model_kwargs['ratio_scheduler']\n    cache_dic['fresh_ratio'] = model_kwargs['fresh_ratio']\n    cache_dic['fresh_threshold'] = model_kwargs['fresh_threshold']\n    cache_dic['force_fresh'] = model_kwargs['force_fresh']\n    cache_dic['soft_fresh_weight'] = model_kwargs['soft_fresh_weight']\n    #cache_dic['merge_weight'] = merge_weight\n    current = {}\n    current['num_steps'] = num_steps\n    return cache_dic, current\n    "
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/cache_functions/force_init.py",
    "content": "import torch\nfrom .force_scheduler import force_scheduler\ndef force_init(cache_dic, current, tokens):\n    '''\n    Initialization for Force Activation step.\n    '''\n    cache_dic['cache_index'][-1][current['layer']][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)\n    force_scheduler(cache_dic, current)\n    if current['layer'] == 0:\n        cache_dic['cache_index']['layer_index'][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/cache_functions/force_scheduler.py",
    "content": "import torch\ndef force_scheduler(cache_dic, current):\n    '''\n    Compute the step-dependent force-fresh threshold and store it in cache_dic['cal_threshold'].\n    '''\n    if cache_dic['fresh_ratio'] == 0:\n        # FORA\n        linear_step_weight = 0.0\n    else: \n        # TokenCache\n        linear_step_weight = 0.2\n    # step_factor grows linearly with the step, so the threshold shrinks as sampling proceeds.\n    step_factor = torch.tensor(1 - linear_step_weight + 2 * linear_step_weight * current['step'] / current['num_steps'])\n    threshold = torch.round(cache_dic['fresh_threshold'] / step_factor)\n\n    # No extra force constraint is applied to sensitive steps, because quality is already good enough without it.\n    # Feel free to experiment with one.\n    \n    cache_dic['cal_threshold'] = threshold\n    #return threshold"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/cache_functions/fresh_ratio_scheduler.py",
    "content": "import torch\ndef fresh_ratio_scheduler(cache_dic, current):\n    '''\n    Return the fresh ratio for the current step.\n    '''\n    fresh_ratio = cache_dic['fresh_ratio']\n    fresh_ratio_schedule = cache_dic['fresh_ratio_schedule']\n    step = current['step']\n    num_steps = current['num_steps']\n    threshold = cache_dic['fresh_threshold']\n    weight = 0.9\n    if fresh_ratio_schedule == 'constant':\n        return fresh_ratio\n    elif fresh_ratio_schedule == 'linear':\n        return fresh_ratio * (1 + weight - 2 * weight * step / num_steps)\n    elif fresh_ratio_schedule == 'exp':\n        #return 0.5 * (0.052 ** (step/num_steps))\n        return fresh_ratio * (weight ** (step / num_steps))\n    elif fresh_ratio_schedule == 'linear-mode':\n        mode = (step % threshold)/threshold - 0.5\n        mode_weight = 0.1\n        return fresh_ratio * (1 + weight - 2 * weight * step / num_steps + mode_weight * mode)\n    elif fresh_ratio_schedule == 'layerwise':\n        return fresh_ratio * (1 + weight - 2 * weight * current['layer'] / 27)\n    elif fresh_ratio_schedule == 'linear-layerwise':\n        step_weight = -0.9 #0.9\n        step_factor = 1 - step_weight + 2 * step_weight * step / num_steps\n        #if current['layer'] == 2:\n        #    return 1.0\n        #sigmoid\n        #sigmoid_weight = 0.13\n        #layer_factor = 2 * torch.sigmoid(torch.tensor([sigmoid_weight * (13.5 - current['layer'])]))\n        layer_weight = 0.6\n        layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27\n\n        module_weight = 1.0 #TokenCache N=8 2.5 N=6 2.5 #N=4 2.1\n        module_time_weight = 0.6\n        module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='cross-attn' else (1 + module_time_weight * module_weight)\n        \n        return fresh_ratio * layer_factor * step_factor * module_factor\n\n    elif fresh_ratio_schedule == 'ToCa':\n        step_weight = -0.9 #0.9\n        step_factor = 
1 - step_weight + 2 * step_weight * step / num_steps\n\n        layer_weight = 0.6\n        layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27\n\n        module_weight = 1.0\n        module_time_weight = 0.6\n        # With these weights, cross-attn is scaled by 0.6 (60% of the base ratio) and every other module, e.g. mlp, by 1.6 (160%).\n        # Cross-attn has the most temporal redundancy, so it computes fewer fresh tokens; mlp has less, so it computes more.\n        module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='cross-attn' else (1 + module_time_weight * module_weight)\n        \n        return fresh_ratio * layer_factor * step_factor * module_factor\n\n    else:\n        raise ValueError(\"unrecognized fresh ratio schedule\", fresh_ratio_schedule)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/cache_functions/global_force_fresh.py",
    "content": "from .force_scheduler import force_scheduler\ndef global_force_fresh(cache_dic, current):\n    '''\n    Return whether to force fresh tokens globally.\n    '''\n    first_step = (current['step'] == 0)\n    force_fresh = cache_dic['force_fresh']\n    if not first_step:\n        fresh_threshold = cache_dic['cal_threshold']\n    else:\n        fresh_threshold = cache_dic['fresh_threshold']\n\n    if force_fresh == 'global':\n        return (first_step or (current['step']% fresh_threshold == 0))\n    elif force_fresh == 'local':\n        return first_step\n    elif force_fresh == 'none':\n        return first_step\n    else:\n        raise ValueError(\"unrecognized force fresh strategy\", force_fresh)"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/cache_functions/score_evaluate.py",
    "content": "import torch\nimport torch.nn as nn\nfrom .scores import attn_score, similarity_score, norm_score\ndef score_evaluate(cache_dic, tokens, current) -> torch.Tensor:\n    '''\n    Return the score tensor (B, N) for the given tokens.\n    '''\n\n    #if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')):\n    #    # abandoned branch, if you want to explore the local force fresh strategy, this may help.\n    #    force_fresh_mask = torch.as_tensor((cache_dic['cache_index'][-1][current['layer']][current['module']] >= 2 * cache_dic['fresh_threshold']), dtype = int) # 2 because the threshold is for step, not module\n    #    force_len = force_fresh_mask.sum(dim=1)\n    #    force_indices = force_fresh_mask.argsort(dim = -1, descending = True)[:, :force_len.min()]\n    #    force_indices = force_indices[:, torch.randperm(force_indices.shape[1])]\n\n    # Just see more explanation in the version of DiT-ToCa if needed.\n\n    if cache_dic['cache_type'] == 'random':\n        score = torch.rand(int(tokens.shape[0]*0.5), tokens.shape[1], device=tokens.device)\n        score = torch.cat([score, score], dim=0).to(tokens.device)\n\n    elif cache_dic['cache_type'] == 'straight':\n        score = torch.ones(tokens.shape[0], tokens.shape[1]).to(tokens.device)\n    \n    elif cache_dic['cache_type'] == 'attention':\n        # cache_dic['attn_map'][step][layer] (B, N, N), the last dimention has get softmaxed\n        score = attn_score(cache_dic, current)\n        #score = score + 0.0 * torch.rand_like(score, device= score.device)\n    \n    elif cache_dic['cache_type'] == 'similarity':\n        score = similarity_score(cache_dic, current, tokens)\n\n    elif cache_dic['cache_type'] == 'norm':\n        score = norm_score(cache_dic, current, tokens)\n\n    elif cache_dic['cache_type'] == 'compress':\n        score1 = torch.rand(int(tokens.shape[0]*0.5), tokens.shape[1])\n        score1 = torch.cat([score1, score1], dim=0).to(tokens.device)\n      
  score2 = cache_dic['attn_map'][-1][current['layer']].sum(dim=1)#.mean(dim=0) # (B, N)\n        # normalize\n        score2 = score2 / score2.max(dim=1, keepdim=True)[0]\n        score = 0.5 * score1 + 0.5 * score2\n    \n    # abandoned the branch, if you want to explore the local force fresh strategy, this may help.\n    #if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')): # current['is_force_fresh'] is False, cause when it is True, no cut and fresh are needed\n    #        #print(torch.ones_like(force_indices, dtype=float, device=force_indices.device).dtype)\n    #    score.scatter_(dim=1, index=force_indices, src=torch.ones_like(force_indices, dtype=torch.float32, \n    #                                                                       device=force_indices.device))\n    \n    if (True and (cache_dic['force_fresh'] == 'global')):\n        soft_step_score = cache_dic['cache_index'][-1][current['layer']][current['module']].float() / (cache_dic['fresh_threshold'])\n        soft_layer_score = cache_dic['cache_index']['layer_index'][current['module']].float() / (27)\n        score = score + cache_dic['soft_fresh_weight'] * soft_step_score #+ 0.1 *soft_layer_score\n    \n    return score.to(tokens.device)"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/cache_functions/scores.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\ndef attn_score(cache_dic, current):\n    #self_attn_score = 1- cache_dic['attn_map'][-1][current['layer']].diagonal(dim1=1, dim2=2)\n    #self_attn_score = F.normalize(self_attn_score, dim=1, p=2)\n    #attention_score = F.normalize(cache_dic['attn_map'][-1][current['layer']].sum(dim=1), dim=1, p=2)\n    #cross_attn_map = F.threshold(cache_dic['cross_attn_map'][-1][current['layer']],threshold=0.0, value=0.0)\n    #cross_attention_score = F.normalize(cross_attn_map.sum(dim=-1), dim=-1, p=2)\n\n    # Note: It is important to give a same selection method for cfg and no cfg.\n    # Because the influence of **Cross-Attention** in text-contidional models makes cfg and no cfg a BIG difference.\n\n    # Same selection for cfg and no cfg\n    cond_cmap, uncond_cmap = torch.split(cache_dic['cross_attn_map'][-1][current['layer']], len(cache_dic['cross_attn_map'][-1][current['layer']]) // 2, dim=0)\n    cond_weight = 0.5\n    cmap = cond_weight * cond_cmap + (1 - cond_weight) * uncond_cmap\n\n    # Entropy score\n    cross_attention_entropy = -torch.sum(cmap * torch.log(cmap + 1e-7), dim=-1)\n    cross_attention_score   = F.normalize(1 + cross_attention_entropy, dim=1, p=2) # Note here \"1\" does not influence the sorted sequence, but provie stability.\n    score = cross_attention_score.repeat(2, 1)\n\n    # In PixArt, the cross_attention_score (s2) is used as the score, for a better text-image alignment.\n\n    # You can try conbining the self_attention_score (s1) and cross_attention_score (s2) as the final score, there exists a balance.\n    #cross_weight = 0.0\n    #score =  (1-cross_weight) * attention_score + cross_weight * cross_attention_score\n    return score\n\ndef similarity_score(cache_dic, current, tokens):\n    cosine_sim = F.cosine_similarity(tokens, cache_dic['cache'][-1][current['layer']][current['module']], dim=-1)\n\n    return F.normalize(1- cosine_sim, dim=-1, 
p=2)\n\ndef norm_score(cache_dic, current, tokens):\n    norm = tokens.norm(dim=-1, p=2)\n    return F.normalize(norm, dim=-1, p=2)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/cache_functions/token_merge.py",
    "content": "import torch\ndef token_merge(cache_dic, tokens, current, fresh_indices, stale_indices):\n    '''\n    An abandoned branch in exploring if token merge helps. The answer is no, at least no for training-free strategy.\n    '''\n    if (current['layer'] % 1 == 0):\n        fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))\n        stale_tokens = torch.gather(input = tokens, dim = 1, index = stale_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))\n        method = 'similarity'\n        if method == 'distance':\n            descending = False\n            distance = torch.cdist(stale_tokens, fresh_tokens, p=1)\n            stale_fresh_dist, stale_fresh_indices_allstale = torch.min(distance, dim=2)\n        elif method == 'similarity':\n            descending = True\n            fresh_tokens = torch.nn.functional.normalize(fresh_tokens, p=2, dim=-1)\n            stale_tokens = torch.nn.functional.normalize(stale_tokens, p=2, dim=-1)\n            similarity = stale_tokens @ fresh_tokens.transpose(1, 2)\n            stale_fresh_dist, stale_fresh_indices_allstale = torch.max(similarity, dim=2)\n        \n\n        saved_topk_stale = int((stale_fresh_dist > 0.995).sum(dim=1).min())\n        merged_stale_sequence = torch.sort(stale_fresh_dist, dim=1, descending=descending)[1][:,:saved_topk_stale]\n        stale_fresh_indices = stale_fresh_indices_allstale.gather(1, merged_stale_sequence)\n        merged_stale_sequence = stale_indices.gather(1, merged_stale_sequence)\n        merged_stale_fresh_indices = fresh_indices.gather(1, stale_fresh_indices)\n        cache_dic['merged_stale_fresh_indices'] = merged_stale_fresh_indices\n        cache_dic['merged_stale_sequence'] = merged_stale_sequence \n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/cache_functions/update_cache.py",
    "content": "import torch\ndef update_cache(fresh_indices, fresh_tokens, cache_dic, current, fresh_attn_map=None):\n    '''\n    Update the cache with the fresh tokens.\n    '''\n    step = current['step']\n    layer = current['layer']\n    module = current['module']\n    # Update the cached tokens at the positions\n    if module == 'attn':\n        # this branch is not used in the final version, but if you explore the partial fresh strategy of attention, it works (probably a few bugs).\n        indices = fresh_indices#.sort(dim=1, descending=False)[0]\n        cache_dic['attn_map'][-1][layer].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_attn_map.shape[-1]), src=fresh_attn_map)\n    elif module == 'cross-attn':\n        indices = fresh_indices#.sort(dim=1, descending=False)[0]\n        cache_dic['cross_attn_map'][-1][layer].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_attn_map.shape[-1]), src=fresh_attn_map)\n    elif module == 'mlp':\n        indices = fresh_indices\n\n    cache_dic['cache'][-1][layer][module].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_tokens.shape[-1]), src=fresh_tokens)\n    \n    \n\n        \n        "
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/diffusion_utils.py",
    "content": "# Modified from OpenAI's diffusion repos\n#     GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\n#     ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\n#     IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\n\nimport numpy as np\nimport torch as th\n\n\ndef normal_kl(mean1, logvar1, mean2, logvar2):\n    \"\"\"\n    Compute the KL divergence between two gaussians.\n    Shapes are automatically broadcasted, so batches can be compared to\n    scalars, among other use cases.\n    \"\"\"\n    tensor = next(\n        (\n            obj\n            for obj in (mean1, logvar1, mean2, logvar2)\n            if isinstance(obj, th.Tensor)\n        ),\n        None,\n    )\n    assert tensor is not None, \"at least one argument must be a Tensor\"\n\n    # Force variances to be Tensors. Broadcasting helps convert scalars to\n    # Tensors, but it does not work for th.exp().\n    logvar1, logvar2 = [\n        x if isinstance(x, th.Tensor) else th.tensor(x, device=tensor.device)\n        for x in (logvar1, logvar2)\n    ]\n\n    return 0.5 * (\n        -1.0\n        + logvar2\n        - logvar1\n        + th.exp(logvar1 - logvar2)\n        + ((mean1 - mean2) ** 2) * th.exp(-logvar2)\n    )\n\n\ndef approx_standard_normal_cdf(x):\n    \"\"\"\n    A fast approximation of the cumulative distribution function of the\n    standard normal.\n    \"\"\"\n    return 0.5 * (1.0 + th.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * th.pow(x, 3))))\n\n\ndef continuous_gaussian_log_likelihood(x, *, means, log_scales):\n    \"\"\"\n    Compute the log-likelihood of a continuous Gaussian distribution.\n    :param x: the targets\n    :param means: the Gaussian mean Tensor.\n    :param log_scales: the Gaussian log stddev Tensor.\n    :return: a tensor like x of log probabilities (in nats).\n    \"\"\"\n    centered_x = x - means\n    inv_stdv = 
th.exp(-log_scales)\n    normalized_x = centered_x * inv_stdv\n    return th.distributions.Normal(th.zeros_like(x), th.ones_like(x)).log_prob(\n        normalized_x\n    )\n\n\ndef discretized_gaussian_log_likelihood(x, *, means, log_scales):\n    \"\"\"\n    Compute the log-likelihood of a Gaussian distribution discretizing to a\n    given image.\n    :param x: the target images. It is assumed that this was uint8 values,\n              rescaled to the range [-1, 1].\n    :param means: the Gaussian mean Tensor.\n    :param log_scales: the Gaussian log stddev Tensor.\n    :return: a tensor like x of log probabilities (in nats).\n    \"\"\"\n    assert x.shape == means.shape == log_scales.shape\n    centered_x = x - means\n    inv_stdv = th.exp(-log_scales)\n    plus_in = inv_stdv * (centered_x + 1.0 / 255.0)\n    cdf_plus = approx_standard_normal_cdf(plus_in)\n    min_in = inv_stdv * (centered_x - 1.0 / 255.0)\n    cdf_min = approx_standard_normal_cdf(min_in)\n    log_cdf_plus = th.log(cdf_plus.clamp(min=1e-12))\n    log_one_minus_cdf_min = th.log((1.0 - cdf_min).clamp(min=1e-12))\n    cdf_delta = cdf_plus - cdf_min\n    log_probs = th.where(\n        x < -0.999,\n        log_cdf_plus,\n        th.where(x > 0.999, log_one_minus_cdf_min, th.log(cdf_delta.clamp(min=1e-12))),\n    )\n    assert log_probs.shape == x.shape\n    return log_probs\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/dpm_solver.py",
    "content": "import torch\nfrom tqdm import tqdm\nfrom ..model.cache_functions import cache_init\n\nclass NoiseScheduleVP:\n    def __init__(\n            self,\n            schedule='discrete',\n            betas=None,\n            alphas_cumprod=None,\n            continuous_beta_0=0.1,\n            continuous_beta_1=20.,\n            dtype=torch.float32,\n    ):\n        \"\"\"Create a wrapper class for the forward SDE (VP type).\n\n        ***\n        Update: We support discrete-time diffusion models by implementing a picewise linear interpolation for log_alpha_t.\n                We recommend to use schedule='discrete' for the discrete-time diffusion models, especially for high-resolution images.\n        ***\n\n        The forward SDE ensures that the condition distribution q_{t|0}(x_t | x_0) = N ( alpha_t * x_0, sigma_t^2 * I ).\n        We further define lambda_t = log(alpha_t) - log(sigma_t), which is the half-logSNR (described in the DPM-Solver paper).\n        Therefore, we implement the functions for computing alpha_t, sigma_t and lambda_t. For t in [0, T], we have:\n\n            log_alpha_t = self.marginal_log_mean_coeff(t)\n            sigma_t = self.marginal_std(t)\n            lambda_t = self.marginal_lambda(t)\n\n        Moreover, as lambda(t) is an invertible function, we also support its inverse function:\n\n            t = self.inverse_lambda(lambda_t)\n\n        ===============================================================\n\n        We support both discrete-time DPMs (trained on n = 0, 1, ..., N-1) and continuous-time DPMs (trained on t in [t_0, T]).\n\n        1. For discrete-time DPMs:\n\n            For discrete-time DPMs trained on n = 0, 1, ..., N-1, we convert the discrete steps to continuous time steps by:\n                t_i = (i + 1) / N\n            e.g. 
for N = 1000, we have t_0 = 1e-3 and T = t_{N-1} = 1.\n            We solve the corresponding diffusion ODE from time T = 1 to time t_0 = 1e-3.\n\n            Args:\n                betas: A `torch.Tensor`. The beta array for the discrete-time DPM. (See the original DDPM paper for details)\n                alphas_cumprod: A `torch.Tensor`. The cumprod alphas for the discrete-time DPM. (See the original DDPM paper for details)\n\n            Note that we always have alphas_cumprod = cumprod(1 - betas). Therefore, we only need to set one of `betas` and `alphas_cumprod`.\n\n            **Important**:  Please pay special attention to the `alphas_cumprod` argument:\n                The `alphas_cumprod` is the \\hat{alpha_n} array in the notation of DDPM. Specifically, DDPMs assume that\n                    q_{t_n | 0}(x_{t_n} | x_0) = N ( \\sqrt{\\hat{alpha_n}} * x_0, (1 - \\hat{alpha_n}) * I ).\n                Therefore, the notation \\hat{alpha_n} is different from the notation alpha_t in DPM-Solver. In fact, we have\n                    alpha_{t_n} = \\sqrt{\\hat{alpha_n}},\n                and\n                    log(alpha_{t_n}) = 0.5 * log(\\hat{alpha_n}).\n\n\n        2. For continuous-time DPMs:\n\n            We support the linear VPSDE for the continuous time setting. The hyperparameters for the noise\n            schedule are the default settings in Yang Song's ScoreSDE:\n\n            Args:\n                continuous_beta_0: A `float` number. The smallest beta for the linear schedule.\n                continuous_beta_1: A `float` number. The largest beta for the linear schedule.\n\n            The ending time T of the forward process is fixed to 1.\n\n        ===============================================================\n\n        Args:\n            schedule: A `str`. The noise schedule of the forward SDE. 
'discrete' for discrete-time DPMs,\n                    'linear' for continuous-time DPMs.\n        Returns:\n            A wrapper object of the forward SDE (VP type).\n\n        ===============================================================\n\n        Example:\n\n        # For discrete-time DPMs, given betas (the beta array for n = 0, 1, ..., N - 1):\n        >>> ns = NoiseScheduleVP('discrete', betas=betas)\n\n        # For discrete-time DPMs, given alphas_cumprod (the \\hat{alpha_n} array for n = 0, 1, ..., N - 1):\n        >>> ns = NoiseScheduleVP('discrete', alphas_cumprod=alphas_cumprod)\n\n        # For continuous-time DPMs (VPSDE), linear schedule:\n        >>> ns = NoiseScheduleVP('linear', continuous_beta_0=0.1, continuous_beta_1=20.)\n\n        \"\"\"\n\n        if schedule not in ['discrete', 'linear']:\n            raise ValueError(\n                f\"Unsupported noise schedule {schedule}. The schedule needs to be 'discrete' or 'linear'\"\n            )\n\n        self.schedule = schedule\n        if schedule == 'discrete':\n            if betas is not None:\n                log_alphas = 0.5 * torch.log(1 - betas).cumsum(dim=0)\n            else:\n                assert alphas_cumprod is not None\n                log_alphas = 0.5 * torch.log(alphas_cumprod)\n            self.T = 1.\n            self.log_alpha_array = self.numerical_clip_alpha(log_alphas).reshape((1, -1,)).to(dtype=dtype)\n            self.total_N = self.log_alpha_array.shape[1]\n            self.t_array = torch.linspace(0., 1., self.total_N + 1)[1:].reshape((1, -1)).to(dtype=dtype)\n        else:\n            self.T = 1.\n            self.total_N = 1000\n            self.beta_0 = continuous_beta_0\n            self.beta_1 = continuous_beta_1\n\n    def numerical_clip_alpha(self, log_alphas, clipped_lambda=-5.1):\n        \"\"\"\n        For some beta schedules such as the cosine schedule, the log-SNR has numerical issues.\n        We clip the log-SNR near t=T within -5.1 to ensure the 
stability.\n        Such a trick is very useful for diffusion models with the cosine schedule, such as i-DDPM, guided-diffusion and GLIDE.\n        \"\"\"\n        log_sigmas = 0.5 * torch.log(1. - torch.exp(2. * log_alphas))\n        lambs = log_alphas - log_sigmas\n        idx = torch.searchsorted(torch.flip(lambs, [0]), clipped_lambda)\n        if idx > 0:\n            log_alphas = log_alphas[:-idx]\n        return log_alphas\n\n    def marginal_log_mean_coeff(self, t):\n        \"\"\"\n        Compute log(alpha_t) of a given continuous-time label t in [0, T].\n        \"\"\"\n        if self.schedule == 'discrete':\n            return interpolate_fn(t.reshape((-1, 1)), self.t_array.to(t.device),\n                                  self.log_alpha_array.to(t.device)).reshape((-1))\n        elif self.schedule == 'linear':\n            return -0.25 * t ** 2 * (self.beta_1 - self.beta_0) - 0.5 * t * self.beta_0\n\n    def marginal_alpha(self, t):\n        \"\"\"\n        Compute alpha_t of a given continuous-time label t in [0, T].\n        \"\"\"\n        return torch.exp(self.marginal_log_mean_coeff(t))\n\n    def marginal_std(self, t):\n        \"\"\"\n        Compute sigma_t of a given continuous-time label t in [0, T].\n        \"\"\"\n        return torch.sqrt(1. - torch.exp(2. * self.marginal_log_mean_coeff(t)))\n\n    def marginal_lambda(self, t):\n        \"\"\"\n        Compute lambda_t = log(alpha_t) - log(sigma_t) of a given continuous-time label t in [0, T].\n        \"\"\"\n        log_mean_coeff = self.marginal_log_mean_coeff(t)\n        log_std = 0.5 * torch.log(1. - torch.exp(2. * log_mean_coeff))\n        return log_mean_coeff - log_std\n\n    def inverse_lambda(self, lamb):\n        \"\"\"\n        Compute the continuous-time label t in [0, T] of a given half-logSNR lambda_t.\n        \"\"\"\n        if self.schedule == 'linear':\n            tmp = 2. * (self.beta_1 - self.beta_0) * torch.logaddexp(-2. 
* lamb, torch.zeros((1,)).to(lamb))\n            Delta = self.beta_0 ** 2 + tmp\n            return tmp / (torch.sqrt(Delta) + self.beta_0) / (self.beta_1 - self.beta_0)\n        elif self.schedule == 'discrete':\n            log_alpha = -0.5 * torch.logaddexp(torch.zeros((1,)).to(lamb.device), -2. * lamb)\n            t = interpolate_fn(log_alpha.reshape((-1, 1)), torch.flip(self.log_alpha_array.to(lamb.device), [1]),\n                               torch.flip(self.t_array.to(lamb.device), [1]))\n            return t.reshape((-1,))\n\n\ndef model_wrapper(\n        model,\n        noise_schedule,\n        model_type=\"noise\",\n        model_kwargs={},\n        guidance_type=\"uncond\",\n        condition=None,\n        unconditional_condition=None,\n        guidance_scale=1.,\n        classifier_fn=None,\n        classifier_kwargs={},\n):\n    \"\"\"Create a wrapper function for the noise prediction model.\n\n    DPM-Solver needs to solve the continuous-time diffusion ODEs. For DPMs trained on discrete-time labels, we need to\n    firstly wrap the model function to a noise prediction model that accepts the continuous time as the input.\n\n    We support four types of the diffusion model by setting `model_type`:\n\n        1. \"noise\": noise prediction model. (Trained by predicting noise).\n\n        2. \"x_start\": data prediction model. (Trained by predicting the data x_0 at time 0).\n\n        3. \"v\": velocity prediction model. (Trained by predicting the velocity).\n            The \"v\" prediction is derivation detailed in Appendix D of [1], and is used in Imagen-Video [2].\n\n            [1] Salimans, Tim, and Jonathan Ho. \"Progressive distillation for fast sampling of diffusion models.\"\n                arXiv preprint arXiv:2202.00512 (2022).\n            [2] Ho, Jonathan, et al. \"Imagen Video: High Definition Video Generation with Diffusion Models.\"\n                arXiv preprint arXiv:2210.02303 (2022).\n\n        4. 
\"score\": marginal score function. (Trained by denoising score matching).\n            Note that the score function and the noise prediction model follows a simple relationship:\n            ```\n                noise(x_t, t) = -sigma_t * score(x_t, t)\n            ```\n\n    We support three types of guided sampling by DPMs by setting `guidance_type`:\n        1. \"uncond\": unconditional sampling by DPMs.\n            The input `model` has the following format:\n            ``\n                model(x, t_input, **model_kwargs) -> noise | x_start | v | score\n            ``\n\n        2. \"classifier\": classifier guidance sampling [3] by DPMs and another classifier.\n            The input `model` has the following format:\n            ``\n                model(x, t_input, **model_kwargs) -> noise | x_start | v | score\n            ``\n\n            The input `classifier_fn` has the following format:\n            ``\n                classifier_fn(x, t_input, cond, **classifier_kwargs) -> logits(x, t_input, cond)\n            ``\n\n            [3] P. Dhariwal and A. Q. Nichol, \"Diffusion models beat GANs on image synthesis,\"\n                in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 8780-8794.\n\n        3. \"classifier-free\": classifier-free guidance sampling by conditional DPMs.\n            The input `model` has the following format:\n            ``\n                model(x, t_input, cond, **model_kwargs) -> noise | x_start | v | score\n            ``\n            And if cond == `unconditional_condition`, the model output is the unconditional DPM output.\n\n            [4] Ho, Jonathan, and Tim Salimans. \"Classifier-free diffusion guidance.\"\n                arXiv preprint arXiv:2207.12598 (2022).\n\n\n    The `t_input` is the time label of the model, which may be discrete-time labels (i.e. 0 to 999)\n    or continuous-time labels (i.e. 
epsilon to T).\n\n    We wrap the model function to accept only `x` and `t_continuous` as inputs, and outputs the predicted noise:\n    ``\n        def model_fn(x, t_continuous) -> noise:\n            t_input = get_model_input_time(t_continuous)\n            return noise_pred(model, x, t_input, **model_kwargs)\n    ``\n    where `t_continuous` is the continuous time labels (i.e. epsilon to T). And we use `model_fn` for DPM-Solver.\n\n    ===============================================================\n\n    Args:\n        model: A diffusion model with the corresponding format described above.\n        noise_schedule: A noise schedule object, such as NoiseScheduleVP.\n        model_type: A `str`. The parameterization type of the diffusion model.\n                    \"noise\" or \"x_start\" or \"v\" or \"score\".\n        model_kwargs: A `dict`. A dict for the other inputs of the model function.\n        guidance_type: A `str`. The type of the guidance for sampling.\n                    \"uncond\" or \"classifier\" or \"classifier-free\".\n        condition: A pytorch tensor. The condition for the guided sampling.\n                    Only used for \"classifier\" or \"classifier-free\" guidance type.\n        unconditional_condition: A pytorch tensor. The condition for the unconditional sampling.\n                    Only used for \"classifier-free\" guidance type.\n        guidance_scale: A `float`. The scale for the guided sampling.\n        classifier_fn: A classifier function. Only used for the classifier guidance.\n        classifier_kwargs: A `dict`. 
A dict for the other inputs of the classifier function.\n    Returns:\n        A noise prediction model that accepts the noised data and the continuous time as the inputs.\n    \"\"\"\n   \n    def get_model_input_time(t_continuous):\n        \"\"\"\n        Convert the continuous-time `t_continuous` (in [epsilon, T]) to the model input time.\n        For discrete-time DPMs, we convert `t_continuous` in [1 / N, 1] to `t_input` in [0, 1000 * (N - 1) / N].\n        For continuous-time DPMs, we just use `t_continuous`.\n        \"\"\"\n        if noise_schedule.schedule == 'discrete':\n            return (t_continuous - 1. / noise_schedule.total_N) * 1000.\n        else:\n            return t_continuous\n\n    def noise_pred_fn(x, t_continuous, current, cache_dic, cond=None):\n        t_input = get_model_input_time(t_continuous)\n        if cond is None:\n            output = model(x, t_input, current, cache_dic, **model_kwargs)\n        else:\n            output = model(x, t_input, current, cache_dic, cond, **model_kwargs)\n        if model_type == \"noise\":\n            return output\n        elif model_type == \"x_start\":\n            alpha_t, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)\n            return (x - expand_dims(alpha_t, x.dim()) * output) / expand_dims(sigma_t, x.dim())\n        elif model_type == \"v\":\n            alpha_t, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)\n            return expand_dims(alpha_t, x.dim()) * output + expand_dims(sigma_t, x.dim()) * x\n        elif model_type == \"score\":\n            sigma_t = noise_schedule.marginal_std(t_continuous)\n            return -expand_dims(sigma_t, x.dim()) * output\n\n    def cond_grad_fn(x, t_input):\n        \"\"\"\n        Compute the gradient of the classifier, i.e. 
nabla_{x} log p_t(cond | x_t).\n        \"\"\"\n        with torch.enable_grad():\n            x_in = x.detach().requires_grad_(True)\n            log_prob = classifier_fn(x_in, t_input, condition, **classifier_kwargs)\n            return torch.autograd.grad(log_prob.sum(), x_in)[0]\n\n    def model_fn(x, t_continuous, current, cache_dic):\n        \"\"\"\n        The noise prediction model function that is used for DPM-Solver.\n        \"\"\"\n        if guidance_type == \"uncond\":\n            return noise_pred_fn(x, t_continuous, current, cache_dic)\n        elif guidance_type == \"classifier\":\n            assert classifier_fn is not None\n            t_input = get_model_input_time(t_continuous)\n            cond_grad = cond_grad_fn(x, t_input)\n            sigma_t = noise_schedule.marginal_std(t_continuous)\n            noise = noise_pred_fn(x, t_continuous, current, cache_dic)\n            return noise - guidance_scale * expand_dims(sigma_t, x.dim()) * cond_grad\n        elif guidance_type == \"classifier-free\":\n            if guidance_scale == 1. 
or unconditional_condition is None:\n                return noise_pred_fn(x, t_continuous, current, cache_dic, cond=condition)\n            x_in = torch.cat([x] * 2)\n            t_in = torch.cat([t_continuous] * 2)\n            c_in = torch.cat([unconditional_condition, condition])\n            noise_uncond, noise = noise_pred_fn(x_in, t_in, current, cache_dic, cond=c_in).chunk(2)\n            return noise_uncond + guidance_scale * (noise - noise_uncond)\n\n    assert model_type in [\"noise\", \"x_start\", \"v\", \"score\"]\n    assert guidance_type in [\"uncond\", \"classifier\", \"classifier-free\"]\n\n    return model_fn\n\n\nclass DPM_Solver:\n    def __init__(\n            self,\n            model_fn,\n            noise_schedule,\n            algorithm_type=\"dpmsolver++\",\n            correcting_x0_fn=None,\n            correcting_xt_fn=None,\n            thresholding_max_val=1.,\n            dynamic_thresholding_ratio=0.995,\n    ):\n        \"\"\"Construct a DPM-Solver.\n\n        We support both DPM-Solver (`algorithm_type=\"dpmsolver\"`) and DPM-Solver++ (`algorithm_type=\"dpmsolver++\"`).\n\n        We also support the \"dynamic thresholding\" method in Imagen[1]. For pixel-space diffusion models, you\n        can set both `algorithm_type=\"dpmsolver++\"` and `correcting_x0_fn=\"dynamic_thresholding\"` to use the\n        dynamic thresholding. The \"dynamic thresholding\" can greatly improve the sample quality for pixel-space\n        DPMs with large guidance scales. 
Note that the thresholding method is **unsuitable** for latent-space\n        DPMs (such as stable-diffusion).\n\n        To support advanced algorithms in image-to-image applications, we also support corrector functions for\n        both x0 and xt.\n\n        Args:\n            model_fn: A noise prediction model function which accepts the continuous-time input (t in [epsilon, T]):\n                ``\n                def model_fn(x, t_continuous):\n                    return noise\n                ``\n                The shape of `x` is `(batch_size, **shape)`, and the shape of `t_continuous` is `(batch_size,)`.\n            noise_schedule: A noise schedule object, such as NoiseScheduleVP.\n            algorithm_type: A `str`. Either \"dpmsolver\" or \"dpmsolver++\".\n            correcting_x0_fn: A `str` or a function with the following format:\n                ```\n                def correcting_x0_fn(x0, t):\n                    x0_new = ...\n                    return x0_new\n                ```\n                This function is to correct the outputs of the data prediction model at each sampling step. e.g.,\n                ```\n                x0_pred = data_pred_model(xt, t)\n                if correcting_x0_fn is not None:\n                    x0_pred = correcting_x0_fn(x0_pred, t)\n                xt_1 = update(x0_pred, xt, t)\n                ```\n                If `correcting_x0_fn=\"dynamic_thresholding\"`, we use the dynamic thresholding proposed in Imagen[1].\n            correcting_xt_fn: A function with the following format:\n                ```\n                def correcting_xt_fn(xt, t, step):\n                    x_new = ...\n                    return x_new\n                ```\n                This function is to correct the intermediate samples xt at each sampling step. e.g.,\n                ```\n                xt = ...\n                xt = correcting_xt_fn(xt, t, step)\n                ```\n            thresholding_max_val: A `float`. 
The max value for thresholding.\n                Valid only when use `dpmsolver++` and `correcting_x0_fn=\"dynamic_thresholding\"`.\n            dynamic_thresholding_ratio: A `float`. The ratio for dynamic thresholding (see Imagen[1] for details).\n                Valid only when use `dpmsolver++` and `correcting_x0_fn=\"dynamic_thresholding\"`.\n\n        [1] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour,\n            Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al. Photorealistic text-to-image diffusion models\n            with deep language understanding. arXiv preprint arXiv:2205.11487, 2022b.\n        \"\"\"\n        self.model = lambda x, t, current, cache_dic: model_fn(x, t.expand((x.shape[0])), current, cache_dic)\n        self.noise_schedule = noise_schedule\n        assert algorithm_type in [\"dpmsolver\", \"dpmsolver++\"]\n        self.algorithm_type = algorithm_type\n        if correcting_x0_fn == \"dynamic_thresholding\":\n            self.correcting_x0_fn = self.dynamic_thresholding_fn\n        else:\n            self.correcting_x0_fn = correcting_x0_fn\n        self.correcting_xt_fn = correcting_xt_fn\n        self.dynamic_thresholding_ratio = dynamic_thresholding_ratio\n        self.thresholding_max_val = thresholding_max_val\n\n    def dynamic_thresholding_fn(self, x0, t):\n        \"\"\"\n        The dynamic thresholding method.\n        \"\"\"\n        dims = x0.dim()\n        p = self.dynamic_thresholding_ratio\n        s = torch.quantile(torch.abs(x0).reshape((x0.shape[0], -1)), p, dim=1)\n        s = expand_dims(torch.maximum(s, self.thresholding_max_val * torch.ones_like(s).to(s.device)), dims)\n        x0 = torch.clamp(x0, -s, s) / s\n        return x0\n\n    def noise_prediction_fn(self, x, t, current, cache_dic):\n        \"\"\"\n        Return the noise prediction model.\n        \"\"\"\n        return self.model(x, t, current, cache_dic)\n\n    def 
data_prediction_fn(self, x, t, current, cache_dic):\n        \"\"\"\n        Return the data prediction model (with corrector).\n        \"\"\"\n        noise = self.noise_prediction_fn(x, t, current, cache_dic)\n        alpha_t, sigma_t = self.noise_schedule.marginal_alpha(t), self.noise_schedule.marginal_std(t)\n        x0 = (x - sigma_t * noise) / alpha_t\n        if self.correcting_x0_fn is not None:\n            x0 = self.correcting_x0_fn(x0, t)\n        return x0\n\n    def model_fn(self, x, t, current, cache_dic):\n        \"\"\"\n        Convert the model to the noise prediction model or the data prediction model.\n        \"\"\"\n        if self.algorithm_type == \"dpmsolver++\":\n            return self.data_prediction_fn(x, t, current, cache_dic)\n        else:\n            return self.noise_prediction_fn(x, t, current, cache_dic)\n\n    def get_time_steps(self, skip_type, t_T, t_0, N, device):\n        \"\"\"Compute the intermediate time steps for sampling.\n\n        Args:\n            skip_type: A `str`. The type for the spacing of the time steps. We support three types:\n                - 'logSNR': uniform logSNR for the time steps.\n                - 'time_uniform': uniform time for the time steps. (**Recommended for high-resolutional data**.)\n                - 'time_quadratic': quadratic time for the time steps. (Used in DDIM for low-resolutional data.)\n            t_T: A `float`. The starting time of the sampling (default is T).\n            t_0: A `float`. The ending time of the sampling (default is epsilon).\n            N: A `int`. 
The total number of the spacing of the time steps.\n            device: A torch device.\n        Returns:\n            A pytorch tensor of the time steps, with the shape (N + 1,).\n        \"\"\"\n        if skip_type == 'logSNR':\n            lambda_T = self.noise_schedule.marginal_lambda(torch.tensor(t_T).to(device))\n            lambda_0 = self.noise_schedule.marginal_lambda(torch.tensor(t_0).to(device))\n            logSNR_steps = torch.linspace(lambda_T.cpu().item(), lambda_0.cpu().item(), N + 1).to(device)\n            return self.noise_schedule.inverse_lambda(logSNR_steps)\n        elif skip_type == 'time_uniform':\n            return torch.linspace(t_T, t_0, N + 1).to(device)\n        elif skip_type == 'time_quadratic':\n            t_order = 2\n            return (\n                torch.linspace(\n                    t_T ** (1.0 / t_order), t_0 ** (1.0 / t_order), N + 1\n                )\n                .pow(t_order)\n                .to(device)\n            )\n        else:\n            raise ValueError(\n                f\"Unsupported skip_type {skip_type}, need to be 'logSNR' or 'time_uniform' or 'time_quadratic'\"\n            )\n\n    def get_orders_and_timesteps_for_singlestep_solver(self, steps, order, skip_type, t_T, t_0, device):\n        \"\"\"\n        Get the order of each step for sampling by the singlestep DPM-Solver.\n\n        We combine both DPM-Solver-1,2,3 to use all the function evaluations, which is named as \"DPM-Solver-fast\".\n        Given a fixed number of function evaluations by `steps`, the sampling procedure by DPM-Solver-fast is:\n            - If order == 1:\n                We take `steps` of DPM-Solver-1 (i.e. DDIM).\n            - If order == 2:\n                - Denote K = (steps // 2). 
We take K or (K + 1) intermediate time steps for sampling.\n                - If steps % 2 == 0, we use K steps of DPM-Solver-2.\n                - If steps % 2 == 1, we use K steps of DPM-Solver-2 and 1 step of DPM-Solver-1.\n            - If order == 3:\n                - Denote K = (steps // 3 + 1). We take K intermediate time steps for sampling.\n                - If steps % 3 == 0, we use (K - 2) steps of DPM-Solver-3, and 1 step of DPM-Solver-2 and 1 step of DPM-Solver-1.\n                - If steps % 3 == 1, we use (K - 1) steps of DPM-Solver-3 and 1 step of DPM-Solver-1.\n                - If steps % 3 == 2, we use (K - 1) steps of DPM-Solver-3 and 1 step of DPM-Solver-2.\n\n        ============================================\n        Args:\n            order: A `int`. The max order for the solver (2 or 3).\n            steps: A `int`. The total number of function evaluations (NFE).\n            skip_type: A `str`. The type for the spacing of the time steps. We support three types:\n                - 'logSNR': uniform logSNR for the time steps.\n                - 'time_uniform': uniform time for the time steps. (**Recommended for high-resolutional data**.)\n                - 'time_quadratic': quadratic time for the time steps. (Used in DDIM for low-resolutional data.)\n            t_T: A `float`. The starting time of the sampling (default is T).\n            t_0: A `float`. 
The ending time of the sampling (default is epsilon).\n            device: A torch device.\n        Returns:\n            orders: A list of the solver order of each step.\n        \"\"\"\n        if order == 3:\n            K = steps // 3 + 1\n            if steps % 3 == 0:\n                orders = [3, ] * (K - 2) + [2, 1]\n            elif steps % 3 == 1:\n                orders = [3, ] * (K - 1) + [1]\n            else:\n                orders = [3, ] * (K - 1) + [2]\n        elif order == 2:\n            if steps % 2 == 0:\n                K = steps // 2\n                orders = [2, ] * K\n            else:\n                K = steps // 2 + 1\n                orders = [2, ] * (K - 1) + [1]\n        elif order == 1:\n            # One outer interval per first-order step, so that len(orders) == K.\n            K = steps\n            orders = [1, ] * steps\n        else:\n            raise ValueError(\"'order' must be '1' or '2' or '3'.\")\n        if skip_type == 'logSNR':\n            # To reproduce the results in DPM-Solver paper\n            timesteps_outer = self.get_time_steps(skip_type, t_T, t_0, K, device)\n        else:\n            timesteps_outer = self.get_time_steps(skip_type, t_T, t_0, steps, device)[\n                torch.cumsum(torch.tensor([0, ] + orders), 0).to(device)]\n        return timesteps_outer, orders\n\n    def denoise_to_zero_fn(self, x, s, current=None, cache_dic=None):\n        \"\"\"\n        Denoise at the final step, which is equivalent to solving the ODE from lambda_s to infty by first-order discretization.\n        \"\"\"\n        # `data_prediction_fn` requires `current` and `cache_dic`; forward them instead of dropping them.\n        return self.data_prediction_fn(x, s, current, cache_dic)\n\n    def dpm_solver_first_update(self, x, s, t, current, cache_dic, model_s=None, return_intermediate=False):\n        \"\"\"\n        DPM-Solver-1 (equivalent to DDIM) from time `s` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            s: A pytorch tensor. The starting time, with the shape (1,).\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            model_s: A pytorch tensor. 
The model function evaluated at time `s`.\n                If `model_s` is None, we evaluate the model by `x` and `s`; otherwise we directly use it.\n            return_intermediate: A `bool`. If true, also return the model value at time `s`.\n        Returns:\n            x_t: A pytorch tensor. The approximated solution at time `t`.\n        \"\"\"\n        ns = self.noise_schedule\n        dims = x.dim()\n        lambda_s, lambda_t = ns.marginal_lambda(s), ns.marginal_lambda(t)\n        h = lambda_t - lambda_s\n        log_alpha_s, log_alpha_t = ns.marginal_log_mean_coeff(s), ns.marginal_log_mean_coeff(t)\n        sigma_s, sigma_t = ns.marginal_std(s), ns.marginal_std(t)\n        alpha_t = torch.exp(log_alpha_t)\n\n        if self.algorithm_type == \"dpmsolver++\":\n            phi_1 = torch.expm1(-h)\n            if model_s is None:\n                model_s = self.model_fn(x, s, current, cache_dic)\n            x_t = (\n                    sigma_t / sigma_s * x\n                    - alpha_t * phi_1 * model_s\n            )\n        else:\n            phi_1 = torch.expm1(h)\n            if model_s is None:\n                model_s = self.model_fn(x, s, current, cache_dic)\n            x_t = (\n                    torch.exp(log_alpha_t - log_alpha_s) * x\n                    - (sigma_t * phi_1) * model_s\n            )\n        return (x_t, {'model_s': model_s}) if return_intermediate else x_t\n\n    def singlestep_dpm_solver_second_update(self, x, s, t, current, cache_dic, r1=0.5, model_s=None, return_intermediate=False,\n                                            solver_type='dpmsolver'):\n        \"\"\"\n        Singlestep solver DPM-Solver-2 from time `s` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            s: A pytorch tensor. The starting time, with the shape (1,).\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            r1: A `float`. 
The hyperparameter of the second-order solver.\n            model_s: A pytorch tensor. The model function evaluated at time `s`.\n                If `model_s` is None, we evaluate the model by `x` and `s`; otherwise we directly use it.\n            return_intermediate: A `bool`. If true, also return the model value at time `s` and `s1` (the intermediate time).\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. We recommend to use 'dpmsolver' type.\n        Returns:\n            x_t: A pytorch tensor. The approximated solution at time `t`.\n        \"\"\"\n        if solver_type not in ['dpmsolver', 'taylor']:\n            raise ValueError(\n                f\"'solver_type' must be either 'dpmsolver' or 'taylor', got {solver_type}\"\n            )\n        if r1 is None:\n            r1 = 0.5\n        ns = self.noise_schedule\n        lambda_s, lambda_t = ns.marginal_lambda(s), ns.marginal_lambda(t)\n        h = lambda_t - lambda_s\n        lambda_s1 = lambda_s + r1 * h\n        s1 = ns.inverse_lambda(lambda_s1)\n        log_alpha_s, log_alpha_s1, log_alpha_t = ns.marginal_log_mean_coeff(s), ns.marginal_log_mean_coeff(\n            s1), ns.marginal_log_mean_coeff(t)\n        sigma_s, sigma_s1, sigma_t = ns.marginal_std(s), ns.marginal_std(s1), ns.marginal_std(t)\n        alpha_s1, alpha_t = torch.exp(log_alpha_s1), torch.exp(log_alpha_t)\n\n        if self.algorithm_type == \"dpmsolver++\":\n            phi_11 = torch.expm1(-r1 * h)\n            phi_1 = torch.expm1(-h)\n\n            if model_s is None:\n                model_s = self.model_fn(x, s, current, cache_dic)\n            x_s1 = (\n                    (sigma_s1 / sigma_s) * x\n                    - (alpha_s1 * phi_11) * model_s\n            )\n            model_s1 = self.model_fn(x_s1, s1, current, cache_dic)\n            if solver_type == 'dpmsolver':\n                x_t = (\n                        
(sigma_t / sigma_s) * x\n                        - (alpha_t * phi_1) * model_s\n                        - (0.5 / r1) * (alpha_t * phi_1) * (model_s1 - model_s)\n                )\n            elif solver_type == 'taylor':\n                x_t = (\n                        (sigma_t / sigma_s) * x\n                        - (alpha_t * phi_1) * model_s\n                        + (1. / r1) * (alpha_t * (phi_1 / h + 1.)) * (model_s1 - model_s)\n                )\n        else:\n            phi_11 = torch.expm1(r1 * h)\n            phi_1 = torch.expm1(h)\n\n            if model_s is None:\n                model_s = self.model_fn(x, s, current, cache_dic)\n            x_s1 = (\n                    torch.exp(log_alpha_s1 - log_alpha_s) * x\n                    - (sigma_s1 * phi_11) * model_s\n            )\n            model_s1 = self.model_fn(x_s1, s1, current, cache_dic)\n            if solver_type == 'dpmsolver':\n                x_t = (\n                        torch.exp(log_alpha_t - log_alpha_s) * x\n                        - (sigma_t * phi_1) * model_s\n                        - (0.5 / r1) * (sigma_t * phi_1) * (model_s1 - model_s)\n                )\n            elif solver_type == 'taylor':\n                x_t = (\n                        torch.exp(log_alpha_t - log_alpha_s) * x\n                        - (sigma_t * phi_1) * model_s\n                        - (1. / r1) * (sigma_t * (phi_1 / h - 1.)) * (model_s1 - model_s)\n                )\n        if return_intermediate:\n            return x_t, {'model_s': model_s, 'model_s1': model_s1}\n        else:\n            return x_t\n\n    def singlestep_dpm_solver_third_update(self, x, s, t, current, cache_dic, r1=1. / 3., r2=2. / 3., model_s=None, model_s1=None,\n                                           return_intermediate=False, solver_type='dpmsolver'):\n        \"\"\"\n        Singlestep solver DPM-Solver-3 from time `s` to time `t`.\n\n        Args:\n            x: A pytorch tensor. 
The initial value at time `s`.\n            s: A pytorch tensor. The starting time, with the shape (1,).\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            r1: A `float`. The hyperparameter of the third-order solver.\n            r2: A `float`. The hyperparameter of the third-order solver.\n            model_s: A pytorch tensor. The model function evaluated at time `s`.\n                If `model_s` is None, we evaluate the model by `x` and `s`; otherwise we directly use it.\n            model_s1: A pytorch tensor. The model function evaluated at time `s1` (the intermediate time given by `r1`).\n                If `model_s1` is None, we evaluate the model at `s1`; otherwise we directly use it.\n            return_intermediate: A `bool`. If true, also return the model value at time `s`, `s1` and `s2` (the intermediate times).\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. We recommend to use 'dpmsolver' type.\n        Returns:\n            x_t: A pytorch tensor. The approximated solution at time `t`.\n        \"\"\"\n        if solver_type not in ['dpmsolver', 'taylor']:\n            raise ValueError(\n                f\"'solver_type' must be either 'dpmsolver' or 'taylor', got {solver_type}\"\n            )\n        if r1 is None:\n            r1 = 1. / 3.\n        if r2 is None:\n            r2 = 2. 
/ 3.\n        ns = self.noise_schedule\n        lambda_s, lambda_t = ns.marginal_lambda(s), ns.marginal_lambda(t)\n        h = lambda_t - lambda_s\n        lambda_s1 = lambda_s + r1 * h\n        lambda_s2 = lambda_s + r2 * h\n        s1 = ns.inverse_lambda(lambda_s1)\n        s2 = ns.inverse_lambda(lambda_s2)\n        log_alpha_s, log_alpha_s1, log_alpha_s2, log_alpha_t = ns.marginal_log_mean_coeff(\n            s), ns.marginal_log_mean_coeff(s1), ns.marginal_log_mean_coeff(s2), ns.marginal_log_mean_coeff(t)\n        sigma_s, sigma_s1, sigma_s2, sigma_t = ns.marginal_std(s), ns.marginal_std(s1), ns.marginal_std(\n            s2), ns.marginal_std(t)\n        alpha_s1, alpha_s2, alpha_t = torch.exp(log_alpha_s1), torch.exp(log_alpha_s2), torch.exp(log_alpha_t)\n\n        if self.algorithm_type == \"dpmsolver++\":\n            phi_11 = torch.expm1(-r1 * h)\n            phi_12 = torch.expm1(-r2 * h)\n            phi_1 = torch.expm1(-h)\n            phi_22 = torch.expm1(-r2 * h) / (r2 * h) + 1.\n            phi_2 = phi_1 / h + 1.\n            phi_3 = phi_2 / h - 0.5\n\n            if model_s is None:\n                model_s = self.model_fn(x, s, current, cache_dic)\n            if model_s1 is None:\n                x_s1 = (\n                        (sigma_s1 / sigma_s) * x\n                        - (alpha_s1 * phi_11) * model_s\n                )\n                model_s1 = self.model_fn(x_s1, s1, current, cache_dic)\n            x_s2 = (\n                    (sigma_s2 / sigma_s) * x\n                    - (alpha_s2 * phi_12) * model_s\n                    + r2 / r1 * (alpha_s2 * phi_22) * (model_s1 - model_s)\n            )\n            model_s2 = self.model_fn(x_s2, s2, current, cache_dic)\n            if solver_type == 'dpmsolver':\n                x_t = (\n                        (sigma_t / sigma_s) * x\n                        - (alpha_t * phi_1) * model_s\n                        + (1. 
/ r2) * (alpha_t * phi_2) * (model_s2 - model_s)\n                )\n            elif solver_type == 'taylor':\n                D1_0 = (1. / r1) * (model_s1 - model_s)\n                D1_1 = (1. / r2) * (model_s2 - model_s)\n                D1 = (r2 * D1_0 - r1 * D1_1) / (r2 - r1)\n                D2 = 2. * (D1_1 - D1_0) / (r2 - r1)\n                x_t = (\n                        (sigma_t / sigma_s) * x\n                        - (alpha_t * phi_1) * model_s\n                        + (alpha_t * phi_2) * D1\n                        - (alpha_t * phi_3) * D2\n                )\n        else:\n            phi_11 = torch.expm1(r1 * h)\n            phi_12 = torch.expm1(r2 * h)\n            phi_1 = torch.expm1(h)\n            phi_22 = torch.expm1(r2 * h) / (r2 * h) - 1.\n            phi_2 = phi_1 / h - 1.\n            phi_3 = phi_2 / h - 0.5\n\n            if model_s is None:\n                model_s = self.model_fn(x, s, current, cache_dic)\n            if model_s1 is None:\n                x_s1 = (\n                        (torch.exp(log_alpha_s1 - log_alpha_s)) * x\n                        - (sigma_s1 * phi_11) * model_s\n                )\n                model_s1 = self.model_fn(x_s1, s1, current, cache_dic)\n            x_s2 = (\n                    (torch.exp(log_alpha_s2 - log_alpha_s)) * x\n                    - (sigma_s2 * phi_12) * model_s\n                    - r2 / r1 * (sigma_s2 * phi_22) * (model_s1 - model_s)\n            )\n            model_s2 = self.model_fn(x_s2, s2, current, cache_dic)\n            if solver_type == 'dpmsolver':\n                x_t = (\n                        (torch.exp(log_alpha_t - log_alpha_s)) * x\n                        - (sigma_t * phi_1) * model_s\n                        - (1. / r2) * (sigma_t * phi_2) * (model_s2 - model_s)\n                )\n            elif solver_type == 'taylor':\n                D1_0 = (1. / r1) * (model_s1 - model_s)\n                D1_1 = (1. 
/ r2) * (model_s2 - model_s)\n                D1 = (r2 * D1_0 - r1 * D1_1) / (r2 - r1)\n                D2 = 2. * (D1_1 - D1_0) / (r2 - r1)\n                x_t = (\n                        (torch.exp(log_alpha_t - log_alpha_s)) * x\n                        - (sigma_t * phi_1) * model_s\n                        - (sigma_t * phi_2) * D1\n                        - (sigma_t * phi_3) * D2\n                )\n\n        if return_intermediate:\n            return x_t, {'model_s': model_s, 'model_s1': model_s1, 'model_s2': model_s2}\n        else:\n            return x_t\n\n    def multistep_dpm_solver_second_update(self, x, model_prev_list, t_prev_list, t, solver_type=\"dpmsolver\"):\n        \"\"\"\n        Multistep solver DPM-Solver-2 from time `t_prev_list[-1]` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            model_prev_list: A list of pytorch tensor. The previous computed model values.\n            t_prev_list: A list of pytorch tensor. The previous times, each time has the shape (1,)\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. We recommend to use 'dpmsolver' type.\n        Returns:\n            x_t: A pytorch tensor. 
The approximated solution at time `t`.\n        \"\"\"\n        if solver_type not in ['dpmsolver', 'taylor']:\n            raise ValueError(\n                f\"'solver_type' must be either 'dpmsolver' or 'taylor', got {solver_type}\"\n            )\n        ns = self.noise_schedule\n        model_prev_1, model_prev_0 = model_prev_list[-2], model_prev_list[-1]\n        t_prev_1, t_prev_0 = t_prev_list[-2], t_prev_list[-1]\n        lambda_prev_1, lambda_prev_0, lambda_t = ns.marginal_lambda(t_prev_1), ns.marginal_lambda(\n            t_prev_0), ns.marginal_lambda(t)\n        log_alpha_prev_0, log_alpha_t = ns.marginal_log_mean_coeff(t_prev_0), ns.marginal_log_mean_coeff(t)\n        sigma_prev_0, sigma_t = ns.marginal_std(t_prev_0), ns.marginal_std(t)\n        alpha_t = torch.exp(log_alpha_t)\n\n        h_0 = lambda_prev_0 - lambda_prev_1\n        h = lambda_t - lambda_prev_0\n        r0 = h_0 / h\n        D1_0 = (1. / r0) * (model_prev_0 - model_prev_1)\n        if self.algorithm_type == \"dpmsolver++\":\n            phi_1 = torch.expm1(-h)\n            if solver_type == 'dpmsolver':\n                x_t = (\n                        (sigma_t / sigma_prev_0) * x\n                        - (alpha_t * phi_1) * model_prev_0\n                        - 0.5 * (alpha_t * phi_1) * D1_0\n                )\n            elif solver_type == 'taylor':\n                x_t = (\n                        (sigma_t / sigma_prev_0) * x\n                        - (alpha_t * phi_1) * model_prev_0\n                        + (alpha_t * (phi_1 / h + 1.)) * D1_0\n                )\n        else:\n            phi_1 = torch.expm1(h)\n            if solver_type == 'dpmsolver':\n                x_t = (\n                        (torch.exp(log_alpha_t - log_alpha_prev_0)) * x\n                        - (sigma_t * phi_1) * model_prev_0\n                        - 0.5 * (sigma_t * phi_1) * D1_0\n                )\n            elif solver_type == 'taylor':\n                x_t = (\n                    
    (torch.exp(log_alpha_t - log_alpha_prev_0)) * x\n                        - (sigma_t * phi_1) * model_prev_0\n                        - (sigma_t * (phi_1 / h - 1.)) * D1_0\n                )\n        return x_t\n\n    def multistep_dpm_solver_third_update(self, x, model_prev_list, t_prev_list, t, solver_type='dpmsolver'):\n        \"\"\"\n        Multistep solver DPM-Solver-3 from time `t_prev_list[-1]` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            model_prev_list: A list of pytorch tensor. The previous computed model values.\n            t_prev_list: A list of pytorch tensor. The previous times, each time has the shape (1,)\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. We recommend to use 'dpmsolver' type.\n        Returns:\n            x_t: A pytorch tensor. The approximated solution at time `t`.\n        \"\"\"\n        ns = self.noise_schedule\n        model_prev_2, model_prev_1, model_prev_0 = model_prev_list\n        t_prev_2, t_prev_1, t_prev_0 = t_prev_list\n        lambda_prev_2, lambda_prev_1, lambda_prev_0, lambda_t = ns.marginal_lambda(t_prev_2), ns.marginal_lambda(\n            t_prev_1), ns.marginal_lambda(t_prev_0), ns.marginal_lambda(t)\n        log_alpha_prev_0, log_alpha_t = ns.marginal_log_mean_coeff(t_prev_0), ns.marginal_log_mean_coeff(t)\n        sigma_prev_0, sigma_t = ns.marginal_std(t_prev_0), ns.marginal_std(t)\n        alpha_t = torch.exp(log_alpha_t)\n\n        h_1 = lambda_prev_1 - lambda_prev_2\n        h_0 = lambda_prev_0 - lambda_prev_1\n        h = lambda_t - lambda_prev_0\n        r0, r1 = h_0 / h, h_1 / h\n        D1_0 = (1. / r0) * (model_prev_0 - model_prev_1)\n        D1_1 = (1. / r1) * (model_prev_1 - model_prev_2)\n        D1 = D1_0 + (r0 / (r0 + r1)) * (D1_0 - D1_1)\n        D2 = (1. 
/ (r0 + r1)) * (D1_0 - D1_1)\n        if self.algorithm_type == \"dpmsolver++\":\n            phi_1 = torch.expm1(-h)\n            phi_2 = phi_1 / h + 1.\n            phi_3 = phi_2 / h - 0.5\n            return (\n                (sigma_t / sigma_prev_0) * x\n                - (alpha_t * phi_1) * model_prev_0\n                + (alpha_t * phi_2) * D1\n                - (alpha_t * phi_3) * D2\n            )\n        else:\n            phi_1 = torch.expm1(h)\n            phi_2 = phi_1 / h - 1.\n            phi_3 = phi_2 / h - 0.5\n            return (\n                (torch.exp(log_alpha_t - log_alpha_prev_0)) * x\n                - (sigma_t * phi_1) * model_prev_0\n                - (sigma_t * phi_2) * D1\n                - (sigma_t * phi_3) * D2\n            )\n\n    def singlestep_dpm_solver_update(self, x, s, t, current, cache_dic, order, return_intermediate=False, solver_type='dpmsolver', r1=None,\n                                     r2=None):\n        \"\"\"\n        Singlestep DPM-Solver with the order `order` from time `s` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            s: A pytorch tensor. The starting time, with the shape (1,).\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            order: A `int`. The order of DPM-Solver. We only support order == 1 or 2 or 3.\n            return_intermediate: A `bool`. If true, also return the model value at time `s`, `s1` and `s2` (the intermediate times).\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. We recommend to use 'dpmsolver' type.\n            r1: A `float`. The hyperparameter of the second-order or third-order solver.\n            r2: A `float`. The hyperparameter of the third-order solver.\n        Returns:\n            x_t: A pytorch tensor. 
The approximated solution at time `t`.\n        \"\"\"\n        if order == 1:\n            return self.dpm_solver_first_update(x, s, t, current, cache_dic, return_intermediate=return_intermediate)\n        elif order == 2:\n            return self.singlestep_dpm_solver_second_update(x, s, t, current, cache_dic, return_intermediate=return_intermediate,\n                                                            solver_type=solver_type, r1=r1)\n        elif order == 3:\n            return self.singlestep_dpm_solver_third_update(x, s, t, current, cache_dic, return_intermediate=return_intermediate,\n                                                           solver_type=solver_type, r1=r1, r2=r2)\n        else:\n            raise ValueError(f\"Solver order must be 1 or 2 or 3, got {order}\")\n\n    def multistep_dpm_solver_update(self, x, model_prev_list, t_prev_list, t, current, cache_dic, order, solver_type='dpmsolver'):\n        \"\"\"\n        Multistep DPM-Solver with the order `order` from time `t_prev_list[-1]` to time `t`.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `s`.\n            model_prev_list: A list of pytorch tensor. The previous computed model values.\n            t_prev_list: A list of pytorch tensor. The previous times, each time has the shape (1,)\n            t: A pytorch tensor. The ending time, with the shape (1,).\n            order: A `int`. The order of DPM-Solver. We only support order == 1 or 2 or 3.\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. We recommend to use 'dpmsolver' type.\n        Returns:\n            x_t: A pytorch tensor. 
The approximated solution at time `t`.\n        \"\"\"\n        if order == 1:\n            return self.dpm_solver_first_update(x, t_prev_list[-1], t, current, cache_dic, model_s=model_prev_list[-1])\n        elif order == 2:\n            return self.multistep_dpm_solver_second_update(x, model_prev_list, t_prev_list, t, solver_type=solver_type)\n        elif order == 3:\n            return self.multistep_dpm_solver_third_update(x, model_prev_list, t_prev_list, t, solver_type=solver_type)\n        else:\n            raise ValueError(f\"Solver order must be 1 or 2 or 3, got {order}\")\n\n    def dpm_solver_adaptive(self, x, order, t_T, t_0, h_init=0.05, atol=0.0078, rtol=0.05, theta=0.9, t_err=1e-5,\n                            solver_type='dpmsolver'):\n        \"\"\"\n        The adaptive step size solver based on singlestep DPM-Solver.\n\n        Args:\n            x: A pytorch tensor. The initial value at time `t_T`.\n            order: A `int`. The (higher) order of the solver. We only support order == 2 or 3.\n            t_T: A `float`. The starting time of the sampling (default is T).\n            t_0: A `float`. The ending time of the sampling (default is epsilon).\n            h_init: A `float`. The initial step size (for logSNR).\n            atol: A `float`. The absolute tolerance of the solver. For image data, the default setting is 0.0078, following [1].\n            rtol: A `float`. The relative tolerance of the solver. The default setting is 0.05.\n            theta: A `float`. The safety hyperparameter for adapting the step size. The default setting is 0.9, following [1].\n            t_err: A `float`. The tolerance for the time. We solve the diffusion ODE until the absolute error between the\n                current time and `t_0` is less than `t_err`. The default setting is 1e-5.\n            solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.\n                The type slightly impacts the performance. 
We recommend to use 'dpmsolver' type.\n        Returns:\n            x_0: A pytorch tensor. The approximated solution at time `t_0`.\n\n        [1] A. Jolicoeur-Martineau, K. Li, R. Piché-Taillefer, T. Kachman, and I. Mitliagkas, \"Gotta go fast when generating data with score-based models,\" arXiv preprint arXiv:2105.14080, 2021.\n        \"\"\"\n        ns = self.noise_schedule\n        s = t_T * torch.ones((1,)).to(x)\n        lambda_s = ns.marginal_lambda(s)\n        lambda_0 = ns.marginal_lambda(t_0 * torch.ones_like(s).to(x))\n        h = h_init * torch.ones_like(s).to(x)\n        x_prev = x\n        nfe = 0\n        if order == 2:\n            r1 = 0.5\n            lower_update = lambda x, s, t: self.dpm_solver_first_update(x, s, t, return_intermediate=True)\n            higher_update = lambda x, s, t, **kwargs: self.singlestep_dpm_solver_second_update(x, s, t, r1=r1,\n                                                                                               solver_type=solver_type,\n                                                                                               **kwargs)\n        elif order == 3:\n            r1, r2 = 1. / 3., 2. 
/ 3.\n            lower_update = lambda x, s, t: self.singlestep_dpm_solver_second_update(x, s, t, r1=r1,\n                                                                                    return_intermediate=True,\n                                                                                    solver_type=solver_type)\n            higher_update = lambda x, s, t, **kwargs: self.singlestep_dpm_solver_third_update(x, s, t, r1=r1, r2=r2,\n                                                                                              solver_type=solver_type,\n                                                                                              **kwargs)\n        else:\n            raise ValueError(\n                f\"For adaptive step size solver, order must be 2 or 3, got {order}\"\n            )\n        while torch.abs((s - t_0)).mean() > t_err:\n            t = ns.inverse_lambda(lambda_s + h)\n            x_lower, lower_noise_kwargs = lower_update(x, s, t)\n            x_higher = higher_update(x, s, t, **lower_noise_kwargs)\n            delta = torch.max(torch.ones_like(x).to(x) * atol, rtol * torch.max(torch.abs(x_lower), torch.abs(x_prev)))\n            norm_fn = lambda v: torch.sqrt(torch.square(v.reshape((v.shape[0], -1))).mean(dim=-1, keepdim=True))\n            E = norm_fn((x_higher - x_lower) / delta).max()\n            if torch.all(E <= 1.):\n                x = x_higher\n                s = t\n                x_prev = x_lower\n                lambda_s = ns.marginal_lambda(s)\n            h = torch.min(theta * h * torch.float_power(E, -1. 
/ order).float(), lambda_0 - lambda_s)\n            nfe += order\n        print('adaptive solver nfe', nfe)\n        return x\n\n    def add_noise(self, x, t, noise=None):\n        \"\"\"\n        Compute the noised input xt = alpha_t * x + sigma_t * noise.\n\n        Args:\n            x: A `torch.Tensor` with shape `(batch_size, *shape)`.\n            t: A `torch.Tensor` with shape `(t_size,)`.\n        Returns:\n            xt with shape `(t_size, batch_size, *shape)`.\n        \"\"\"\n        alpha_t, sigma_t = self.noise_schedule.marginal_alpha(t), self.noise_schedule.marginal_std(t)\n        if noise is None:\n            noise = torch.randn((t.shape[0], *x.shape), device=x.device)\n        x = x.reshape((-1, *x.shape))\n        xt = expand_dims(alpha_t, x.dim()) * x + expand_dims(sigma_t, x.dim()) * noise\n        return xt.squeeze(0) if t.shape[0] == 1 else xt\n\n    def inverse(self, x, steps=20, t_start=None, t_end=None, order=2, skip_type='time_uniform',\n                method='multistep', lower_order_final=True, denoise_to_zero=False, solver_type='dpmsolver',\n                atol=0.0078, rtol=0.05, return_intermediate=False,\n                ):\n        \"\"\"\n        Inverse the sample `x` from time `t_start` to `t_end` by DPM-Solver.\n        For discrete-time DPMs, we use `t_start=1/N`, where `N` is the total time steps during training.\n        \"\"\"\n        t_0 = 1. / self.noise_schedule.total_N if t_start is None else t_start\n        t_T = self.noise_schedule.T if t_end is None else t_end\n        assert t_0 > 0 and t_T > 0, \"Time range needs to be greater than 0. 
For discrete-time DPMs, it needs to be in [1 / N, 1], where N is the length of betas array\"\n        return self.sample(x, steps=steps, t_start=t_0, t_end=t_T, order=order, skip_type=skip_type,\n                           method=method, lower_order_final=lower_order_final, denoise_to_zero=denoise_to_zero,\n                           solver_type=solver_type,\n                           atol=atol, rtol=rtol, return_intermediate=return_intermediate)\n\n    def sample(self, x, steps=20, t_start=None, t_end=None, order=2, skip_type='time_uniform',\n               method='multistep', lower_order_final=True, denoise_to_zero=False, solver_type='dpmsolver',\n               atol=0.0078, rtol=0.05, return_intermediate=False, model_kwargs = {}, rank = None,\n               ):\n        \"\"\"\n        Compute the sample at time `t_end` by DPM-Solver, given the initial `x` at time `t_start`.\n\n        =====================================================\n\n        We support the following algorithms for both noise prediction model and data prediction model:\n            - 'singlestep':\n                Singlestep DPM-Solver (i.e. \"DPM-Solver-fast\" in the paper), which combines different orders of singlestep DPM-Solver.\n                We combine all the singlestep solvers with order <= `order` to use up all the function evaluations (steps).\n                The total number of function evaluations (NFE) == `steps`.\n                Given a fixed NFE == `steps`, the sampling procedure is:\n                    - If `order` == 1:\n                        - Denote K = steps. We use K steps of DPM-Solver-1 (i.e. DDIM).\n                    - If `order` == 2:\n                        - Denote K = (steps // 2) + (steps % 2). 
We take K intermediate time steps for sampling.\n                        - If steps % 2 == 0, we use K steps of singlestep DPM-Solver-2.\n                        - If steps % 2 == 1, we use (K - 1) steps of singlestep DPM-Solver-2 and 1 step of DPM-Solver-1.\n                    - If `order` == 3:\n                        - Denote K = (steps // 3 + 1). We take K intermediate time steps for sampling.\n                        - If steps % 3 == 0, we use (K - 2) steps of singlestep DPM-Solver-3, and 1 step of singlestep DPM-Solver-2 and 1 step of DPM-Solver-1.\n                        - If steps % 3 == 1, we use (K - 1) steps of singlestep DPM-Solver-3 and 1 step of DPM-Solver-1.\n                        - If steps % 3 == 2, we use (K - 1) steps of singlestep DPM-Solver-3 and 1 step of singlestep DPM-Solver-2.\n            - 'multistep':\n                Multistep DPM-Solver with the order of `order`. The total number of function evaluations (NFE) == `steps`.\n                We initialize the first `order` values by lower order multistep solvers.\n                Given a fixed NFE == `steps`, the sampling procedure is:\n                    Denote K = steps.\n                    - If `order` == 1:\n                        - We use K steps of DPM-Solver-1 (i.e. DDIM).\n                    - If `order` == 2:\n                        - We first use 1 step of DPM-Solver-1, then use (K - 1) steps of multistep DPM-Solver-2.\n                    - If `order` == 3:\n                        - We first use 1 step of DPM-Solver-1, then 1 step of multistep DPM-Solver-2, then (K - 2) steps of multistep DPM-Solver-3.\n            - 'singlestep_fixed':\n                Fixed order singlestep DPM-Solver (i.e. DPM-Solver-1 or singlestep DPM-Solver-2 or singlestep DPM-Solver-3).\n                We use singlestep DPM-Solver-`order` for `order`=1 or 2 or 3, with total [`steps` // `order`] * `order` NFE.\n            - 'adaptive':\n                Adaptive step size DPM-Solver (i.e. 
\"DPM-Solver-12\" and \"DPM-Solver-23\" in the paper).\n                We ignore `steps` and use adaptive step size DPM-Solver with a higher order of `order`.\n                You can adjust the absolute tolerance `atol` and the relative tolerance `rtol` to balance the computatation costs\n                (NFE) and the sample quality.\n                    - If `order` == 2, we use DPM-Solver-12 which combines DPM-Solver-1 and singlestep DPM-Solver-2.\n                    - If `order` == 3, we use DPM-Solver-23 which combines singlestep DPM-Solver-2 and singlestep DPM-Solver-3.\n\n        =====================================================\n\n        Some advices for choosing the algorithm:\n            - For **unconditional sampling** or **guided sampling with small guidance scale** by DPMs:\n                Use singlestep DPM-Solver or DPM-Solver++ (\"DPM-Solver-fast\" in the paper) with `order = 3`.\n                e.g., DPM-Solver:\n                    >>> dpm_solver = DPM_Solver(model_fn, noise_schedule, algorithm_type=\"dpmsolver\")\n                    >>> x_sample = dpm_solver.sample(x, steps=steps, t_start=t_start, t_end=t_end, order=3,\n                            skip_type='time_uniform', method='singlestep')\n                e.g., DPM-Solver++:\n                    >>> dpm_solver = DPM_Solver(model_fn, noise_schedule, algorithm_type=\"dpmsolver++\")\n                    >>> x_sample = dpm_solver.sample(x, steps=steps, t_start=t_start, t_end=t_end, order=3,\n                            skip_type='time_uniform', method='singlestep')\n            - For **guided sampling with large guidance scale** by DPMs:\n                Use multistep DPM-Solver with `algorithm_type=\"dpmsolver++\"` and `order = 2`.\n                e.g.\n                    >>> dpm_solver = DPM_Solver(model_fn, noise_schedule, algorithm_type=\"dpmsolver++\")\n                    >>> x_sample = dpm_solver.sample(x, steps=steps, t_start=t_start, t_end=t_end, order=2,\n                  
          skip_type='time_uniform', method='multistep')\n\n        We support three types of `skip_type`:\n            - 'logSNR': uniform logSNR for the time steps. **Recommended for low-resolution images**.\n            - 'time_uniform': uniform time for the time steps. **Recommended for high-resolution images**.\n            - 'time_quadratic': quadratic time for the time steps.\n\n        =====================================================\n        Args:\n            x: A PyTorch tensor. The initial value at time `t_start`\n                e.g. if `t_start` == T, then `x` is a sample from the standard normal distribution.\n            steps: An `int`. The total number of function evaluations (NFE).\n            t_start: A `float`. The starting time of the sampling.\n                If `t_start` is None, we use self.noise_schedule.T (default is 1.0).\n            t_end: A `float`. The ending time of the sampling.\n                If `t_end` is None, we use 1. / self.noise_schedule.total_N.\n                e.g. if total_N == 1000, we have `t_end` == 1e-3.\n                For discrete-time DPMs:\n                    - We recommend `t_end` == 1. / self.noise_schedule.total_N.\n                For continuous-time DPMs:\n                    - We recommend `t_end` == 1e-3 when `steps` <= 15; and `t_end` == 1e-4 when `steps` > 15.\n            order: An `int`. The order of DPM-Solver.\n            skip_type: A `str`. The type for the spacing of the time steps. 'time_uniform' or 'logSNR' or 'time_quadratic'.\n            method: A `str`. The method for sampling. 'singlestep' or 'multistep' or 'singlestep_fixed' or 'adaptive'.\n            denoise_to_zero: A `bool`. Whether to denoise to time 0 at the final step.\n                Default is `False`. If `denoise_to_zero` is `True`, the total NFE is (`steps` + 1).\n\n                This trick was first proposed by DDPM (https://arxiv.org/abs/2006.11239) and\n                score_sde (https://arxiv.org/abs/2011.13456). 
This trick can improve the FID\n                for diffusion models sampling by diffusion SDEs for low-resolution images\n                (such as CIFAR-10). However, we observed that this trick does not matter for\n                high-resolution images. As it needs an additional NFE, we do not recommend\n                it for high-resolution images.\n            lower_order_final: A `bool`. Whether to use lower order solvers at the final steps.\n                Only valid for `method=multistep` and `steps < 10`. We empirically find that\n                this trick is key to stabilizing the sampling by DPM-Solver with very few steps\n                (especially for steps <= 10). We therefore recommend setting it to `True`.\n            solver_type: A `str`. The Taylor expansion type for the solver. `dpmsolver` or `taylor`. We recommend `dpmsolver`.\n            atol: A `float`. The absolute tolerance of the adaptive step size solver. Valid when `method` == 'adaptive'.\n            rtol: A `float`. The relative tolerance of the adaptive step size solver. Valid when `method` == 'adaptive'.\n            return_intermediate: A `bool`. Whether to save the xt at each step.\n                When set to `True`, the method returns a tuple (x0, intermediates); when set to `False`, the method returns only x0.\n        Returns:\n            x_end: A PyTorch tensor. The approximated solution at time `t_end`.\n\n        \"\"\"\n        t_0 = 1. / self.noise_schedule.total_N if t_end is None else t_end\n        t_T = self.noise_schedule.T if t_start is None else t_start\n        assert t_0 > 0 and t_T > 0, \"Time range needs to be greater than 0. 
For discrete-time DPMs, it needs to be in [1 / N, 1], where N is the length of betas array\"\n        if return_intermediate:\n            assert method in ['multistep', 'singlestep',\n                              'singlestep_fixed'], \"Cannot use adaptive solver when saving intermediate values\"\n        if self.correcting_xt_fn is not None:\n            assert method in ['multistep', 'singlestep',\n                              'singlestep_fixed'], \"Cannot use adaptive solver when correcting_xt_fn is not None\"\n        device = x.device\n        intermediates = []\n\n        cache_dic, current = cache_init(model_kwargs=model_kwargs, num_steps=steps)\n        \n        with torch.no_grad():\n            if method == 'adaptive':\n                x = self.dpm_solver_adaptive(x, order=order, t_T=t_T, t_0=t_0, atol=atol, rtol=rtol,\n                                             solver_type=solver_type)\n            elif method == 'multistep':\n                assert steps >= order\n                timesteps = self.get_time_steps(skip_type=skip_type, t_T=t_T, t_0=t_0, N=steps, device=device)\n                assert timesteps.shape[0] - 1 == steps\n                # Init the initial values.\n                step = 0\n                current['step'] = step\n                t = timesteps[step]\n                t_prev_list = [t]\n                model_prev_list = [self.model_fn(x, t, current, cache_dic)]\n                if self.correcting_xt_fn is not None:\n                    x = self.correcting_xt_fn(x, t, step)\n                if return_intermediate:\n                    intermediates.append(x)\n                # Init the first `order` values by lower order multistep DPM-Solver.\n                for step in range(1, order):\n                    current['step'] = step\n                    t = timesteps[step]\n                    x = self.multistep_dpm_solver_update(x, model_prev_list, t_prev_list, t, current, cache_dic, step,\n                                        
                 solver_type=solver_type)\n                    if self.correcting_xt_fn is not None:\n                        x = self.correcting_xt_fn(x, t, step)\n                    if return_intermediate:\n                        intermediates.append(x)\n                    t_prev_list.append(t)\n                    model_prev_list.append(self.model_fn(x, t, current, cache_dic))\n                # Compute the remaining values by `order`-th order multistep DPM-Solver.\n                pbar = tqdm(range(order, steps + 1), leave=False) if (rank == 0) or (rank == None) else range(order, steps + 1)\n                for step in pbar:\n                    current['step'] = step\n                    t = timesteps[step]\n                    # We only use lower order for steps < 10\n                    if lower_order_final and steps < 10:\n                        step_order = min(order, steps + 1 - step)\n                    else:\n                        step_order = order\n                    x = self.multistep_dpm_solver_update(x, model_prev_list, t_prev_list, t, current, cache_dic, step_order,\n                                                         solver_type=solver_type)\n                    if self.correcting_xt_fn is not None:\n                        x = self.correcting_xt_fn(x, t, step)\n                    if return_intermediate:\n                        intermediates.append(x)\n                    for i in range(order - 1):\n                        t_prev_list[i] = t_prev_list[i + 1]\n                        model_prev_list[i] = model_prev_list[i + 1]\n                    t_prev_list[-1] = t\n                    # We do not need to evaluate the final model value.\n                    if step < steps:\n                        model_prev_list[-1] = self.model_fn(x, t, current, cache_dic)\n            elif method in ['singlestep', 'singlestep_fixed']:\n                if method == 'singlestep':\n                    timesteps_outer, orders = 
self.get_orders_and_timesteps_for_singlestep_solver(steps=steps,\n                                                                                                  order=order,\n                                                                                                  skip_type=skip_type,\n                                                                                                  t_T=t_T, t_0=t_0,\n                                                                                                  device=device)\n                elif method == 'singlestep_fixed':\n                    K = steps // order\n                    orders = [order, ] * K\n                    timesteps_outer = self.get_time_steps(skip_type=skip_type, t_T=t_T, t_0=t_0, N=K, device=device)\n                for step, order in enumerate(orders):\n                    s, t = timesteps_outer[step], timesteps_outer[step + 1]\n                    timesteps_inner = self.get_time_steps(skip_type=skip_type, t_T=s.item(), t_0=t.item(), N=order,\n                                                          device=device)\n                    lambda_inner = self.noise_schedule.marginal_lambda(timesteps_inner)\n                    h = lambda_inner[-1] - lambda_inner[0]\n                    r1 = None if order <= 1 else (lambda_inner[1] - lambda_inner[0]) / h\n                    r2 = None if order <= 2 else (lambda_inner[2] - lambda_inner[0]) / h\n                    x = self.singlestep_dpm_solver_update(x, s, t, order, solver_type=solver_type, r1=r1, r2=r2)\n                    if self.correcting_xt_fn is not None:\n                        x = self.correcting_xt_fn(x, t, step)\n                    if return_intermediate:\n                        intermediates.append(x)\n            else:\n                raise ValueError(f\"Got wrong method {method}\")\n            if denoise_to_zero:\n                t = torch.ones((1,)).to(device) * t_0\n                x = self.denoise_to_zero_fn(x, t)\n           
     if self.correcting_xt_fn is not None:\n                    x = self.correcting_xt_fn(x, t, step + 1)\n                if return_intermediate:\n                    intermediates.append(x)\n        return (x, intermediates) if return_intermediate else x\n\n\n#############################################################\n# other utility functions\n#############################################################\n\ndef interpolate_fn(x, xp, yp):\n    \"\"\"\n    A piecewise linear function y = f(x), using xp and yp as keypoints.\n    We implement f(x) in a differentiable way (i.e. applicable for autograd).\n    The function f(x) is well-defined for all x-axis. (For x beyond the bounds of xp, we use the outmost points of xp to define the linear function.)\n\n    Args:\n        x: PyTorch tensor with shape [N, C], where N is the batch size, C is the number of channels (we use C = 1 for DPM-Solver).\n        xp: PyTorch tensor with shape [C, K], where K is the number of keypoints.\n        yp: PyTorch tensor with shape [C, K].\n    Returns:\n        The function values f(x), with shape [N, C].\n    \"\"\"\n    N, K = x.shape[0], xp.shape[1]\n    all_x = torch.cat([x.unsqueeze(2), xp.unsqueeze(0).repeat((N, 1, 1))], dim=2)\n    sorted_all_x, x_indices = torch.sort(all_x, dim=2)\n    x_idx = torch.argmin(x_indices, dim=2)\n    cand_start_idx = x_idx - 1\n    start_idx = torch.where(\n        torch.eq(x_idx, 0),\n        torch.tensor(1, device=x.device),\n        torch.where(\n            torch.eq(x_idx, K), torch.tensor(K - 2, device=x.device), cand_start_idx,\n        ),\n    )\n    end_idx = torch.where(torch.eq(start_idx, cand_start_idx), start_idx + 2, start_idx + 1)\n    start_x = torch.gather(sorted_all_x, dim=2, index=start_idx.unsqueeze(2)).squeeze(2)\n    end_x = torch.gather(sorted_all_x, dim=2, index=end_idx.unsqueeze(2)).squeeze(2)\n    start_idx2 = torch.where(\n        torch.eq(x_idx, 0),\n        torch.tensor(0, device=x.device),\n        torch.where(\n     
       torch.eq(x_idx, K), torch.tensor(K - 2, device=x.device), cand_start_idx,\n        ),\n    )\n    y_positions_expanded = yp.unsqueeze(0).expand(N, -1, -1)\n    start_y = torch.gather(y_positions_expanded, dim=2, index=start_idx2.unsqueeze(2)).squeeze(2)\n    end_y = torch.gather(y_positions_expanded, dim=2, index=(start_idx2 + 1).unsqueeze(2)).squeeze(2)\n    return start_y + (x - start_x) * (end_y - start_y) / (end_x - start_x)\n\n\ndef expand_dims(v, dims):\n    \"\"\"\n    Expand the tensor `v` to the dim `dims`.\n\n    Args:\n        `v`: a PyTorch tensor with shape [N].\n        `dim`: a `int`.\n    Returns:\n        a PyTorch tensor with shape [N, 1, 1, ..., 1] and the total dimension is `dims`.\n    \"\"\"\n    return v[(...,) + (None,) * (dims - 1)]"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/edm_sample.py",
    "content": "import random\nimport numpy as np\nfrom tqdm import tqdm\n\nfrom diffusion.model.utils import *\n\n\n# ----------------------------------------------------------------------------\n# Proposed EDM sampler (Algorithm 2).\n\ndef edm_sampler(\n        net, latents, class_labels=None, cfg_scale=None, randn_like=torch.randn_like,\n        num_steps=18, sigma_min=0.002, sigma_max=80, rho=7,\n        S_churn=0, S_min=0, S_max=float('inf'), S_noise=1, **kwargs\n):\n    # Adjust noise levels based on what's supported by the network.\n    sigma_min = max(sigma_min, net.sigma_min)\n    sigma_max = min(sigma_max, net.sigma_max)\n\n    # Time step discretization.\n    step_indices = torch.arange(num_steps, dtype=torch.float64, device=latents.device)\n    t_steps = (sigma_max ** (1 / rho) + step_indices / (num_steps - 1) * (\n                sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho\n    t_steps = torch.cat([net.round_sigma(t_steps), torch.zeros_like(t_steps[:1])])  # t_N = 0\n\n    # Main sampling loop.\n    x_next = latents.to(torch.float64) * t_steps[0]\n    for i, (t_cur, t_next) in tqdm(list(enumerate(zip(t_steps[:-1], t_steps[1:])))):  # 0, ..., N-1\n        x_cur = x_next\n\n        # Increase noise temporarily.\n        gamma = min(S_churn / num_steps, np.sqrt(2) - 1) if S_min <= t_cur <= S_max else 0\n        t_hat = net.round_sigma(t_cur + gamma * t_cur)\n        x_hat = x_cur + (t_hat ** 2 - t_cur ** 2).sqrt() * S_noise * randn_like(x_cur)\n\n        # Euler step.\n        denoised = net(x_hat.float(), t_hat, class_labels, cfg_scale, **kwargs)['x'].to(torch.float64)\n        d_cur = (x_hat - denoised) / t_hat\n        x_next = x_hat + (t_next - t_hat) * d_cur\n\n        # Apply 2nd order correction.\n        if i < num_steps - 1:\n            denoised = net(x_next.float(), t_next, class_labels, cfg_scale, **kwargs)['x'].to(torch.float64)\n            d_prime = (x_next - denoised) / t_next\n            x_next = x_hat + (t_next - t_hat) * 
(0.5 * d_cur + 0.5 * d_prime)\n\n    return x_next\n\n\n# ----------------------------------------------------------------------------\n# Generalized ablation sampler, representing the superset of all sampling\n# methods discussed in the paper.\n\ndef ablation_sampler(\n        net, latents, class_labels=None, cfg_scale=None, feat=None, randn_like=torch.randn_like,\n        num_steps=18, sigma_min=None, sigma_max=None, rho=7,\n        solver='heun', discretization='edm', schedule='linear', scaling='none',\n        epsilon_s=1e-3, C_1=0.001, C_2=0.008, M=1000, alpha=1,\n        S_churn=0, S_min=0, S_max=float('inf'), S_noise=1,\n):\n    assert solver in ['euler', 'heun']\n    assert discretization in ['vp', 've', 'iddpm', 'edm']\n    assert schedule in ['vp', 've', 'linear']\n    assert scaling in ['vp', 'none']\n\n    # Helper functions for VP & VE noise level schedules.\n    vp_sigma = lambda beta_d, beta_min: lambda t: (np.e ** (0.5 * beta_d * (t ** 2) + beta_min * t) - 1) ** 0.5\n    vp_sigma_deriv = lambda beta_d, beta_min: lambda t: 0.5 * (beta_min + beta_d * t) * (sigma(t) + 1 / sigma(t))\n    vp_sigma_inv = lambda beta_d, beta_min: lambda sigma: ((beta_min ** 2 + 2 * beta_d * (\n            sigma ** 2 + 1).log()).sqrt() - beta_min) / beta_d\n    ve_sigma = lambda t: t.sqrt()\n    ve_sigma_deriv = lambda t: 0.5 / t.sqrt()\n    ve_sigma_inv = lambda sigma: sigma ** 2\n\n    # Select default noise level range based on the specified time step discretization.\n    if sigma_min is None:\n        vp_def = vp_sigma(beta_d=19.1, beta_min=0.1)(t=epsilon_s)\n        sigma_min = {'vp': vp_def, 've': 0.02, 'iddpm': 0.002, 'edm': 0.002}[discretization]\n    if sigma_max is None:\n        vp_def = vp_sigma(beta_d=19.1, beta_min=0.1)(t=1)\n        sigma_max = {'vp': vp_def, 've': 100, 'iddpm': 81, 'edm': 80}[discretization]\n\n    # Adjust noise levels based on what's supported by the network.\n    sigma_min = max(sigma_min, net.sigma_min)\n    sigma_max = min(sigma_max, 
net.sigma_max)\n\n    # Compute corresponding betas for VP.\n    vp_beta_d = 2 * (np.log(sigma_min ** 2 + 1) / epsilon_s - np.log(sigma_max ** 2 + 1)) / (epsilon_s - 1)\n    vp_beta_min = np.log(sigma_max ** 2 + 1) - 0.5 * vp_beta_d\n\n    # Define time steps in terms of noise level.\n    step_indices = torch.arange(num_steps, dtype=torch.float64, device=latents.device)\n    if discretization == 'vp':\n        orig_t_steps = 1 + step_indices / (num_steps - 1) * (epsilon_s - 1)\n        sigma_steps = vp_sigma(vp_beta_d, vp_beta_min)(orig_t_steps)\n    elif discretization == 've':\n        orig_t_steps = (sigma_max ** 2) * ((sigma_min ** 2 / sigma_max ** 2) ** (step_indices / (num_steps - 1)))\n        sigma_steps = ve_sigma(orig_t_steps)\n    elif discretization == 'iddpm':\n        u = torch.zeros(M + 1, dtype=torch.float64, device=latents.device)\n        alpha_bar = lambda j: (0.5 * np.pi * j / M / (C_2 + 1)).sin() ** 2\n        for j in torch.arange(M, 0, -1, device=latents.device):  # M, ..., 1\n            u[j - 1] = ((u[j] ** 2 + 1) / (alpha_bar(j - 1) / alpha_bar(j)).clip(min=C_1) - 1).sqrt()\n        u_filtered = u[torch.logical_and(u >= sigma_min, u <= sigma_max)]\n        sigma_steps = u_filtered[((len(u_filtered) - 1) / (num_steps - 1) * step_indices).round().to(torch.int64)]\n    else:\n        assert discretization == 'edm'\n        sigma_steps = (sigma_max ** (1 / rho) + step_indices / (num_steps - 1) * (\n                sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho\n\n    # Define noise level schedule.\n    if schedule == 'vp':\n        sigma = vp_sigma(vp_beta_d, vp_beta_min)\n        sigma_deriv = vp_sigma_deriv(vp_beta_d, vp_beta_min)\n        sigma_inv = vp_sigma_inv(vp_beta_d, vp_beta_min)\n    elif schedule == 've':\n        sigma = ve_sigma\n        sigma_deriv = ve_sigma_deriv\n        sigma_inv = ve_sigma_inv\n    else:\n        assert schedule == 'linear'\n        sigma = lambda t: t\n        sigma_deriv = lambda t: 1\n        
sigma_inv = lambda sigma: sigma\n\n    # Define scaling schedule.\n    if scaling == 'vp':\n        s = lambda t: 1 / (1 + sigma(t) ** 2).sqrt()\n        s_deriv = lambda t: -sigma(t) * sigma_deriv(t) * (s(t) ** 3)\n    else:\n        assert scaling == 'none'\n        s = lambda t: 1\n        s_deriv = lambda t: 0\n\n    # Compute final time steps based on the corresponding noise levels.\n    t_steps = sigma_inv(net.round_sigma(sigma_steps))\n    t_steps = torch.cat([t_steps, torch.zeros_like(t_steps[:1])])  # t_N = 0\n\n    # Main sampling loop.\n    t_next = t_steps[0]\n    x_next = latents.to(torch.float64) * (sigma(t_next) * s(t_next))\n    for i, (t_cur, t_next) in enumerate(zip(t_steps[:-1], t_steps[1:])):  # 0, ..., N-1\n        x_cur = x_next\n\n        # Increase noise temporarily.\n        gamma = min(S_churn / num_steps, np.sqrt(2) - 1) if S_min <= sigma(t_cur) <= S_max else 0\n        t_hat = sigma_inv(net.round_sigma(sigma(t_cur) + gamma * sigma(t_cur)))\n        x_hat = s(t_hat) / s(t_cur) * x_cur + (sigma(t_hat) ** 2 - sigma(t_cur) ** 2).clip(min=0).sqrt() * s(\n            t_hat) * S_noise * randn_like(x_cur)\n\n        # Euler step.\n        h = t_next - t_hat\n        denoised = net(x_hat.float() / s(t_hat), sigma(t_hat), class_labels, cfg_scale, feat=feat)['x'].to(\n            torch.float64)\n        d_cur = (sigma_deriv(t_hat) / sigma(t_hat) + s_deriv(t_hat) / s(t_hat)) * x_hat - sigma_deriv(t_hat) * s(\n            t_hat) / sigma(t_hat) * denoised\n        x_prime = x_hat + alpha * h * d_cur\n        t_prime = t_hat + alpha * h\n\n        # Apply 2nd order correction.\n        if solver == 'euler' or i == num_steps - 1:\n            x_next = x_hat + h * d_cur\n        else:\n            assert solver == 'heun'\n            denoised = net(x_prime.float() / s(t_prime), sigma(t_prime), class_labels, cfg_scale, feat=feat)['x'].to(\n                torch.float64)\n            d_prime = (sigma_deriv(t_prime) / sigma(t_prime) + s_deriv(t_prime) / 
s(t_prime)) * x_prime - sigma_deriv(\n                t_prime) * s(t_prime) / sigma(t_prime) * denoised\n            x_next = x_hat + h * ((1 - 1 / (2 * alpha)) * d_cur + 1 / (2 * alpha) * d_prime)\n\n    return x_next\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/gaussian_diffusion.py",
    "content": "# Modified from OpenAI's diffusion repos\n#     GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\n#     ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\n#     IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\n\n\nimport enum\nimport math\n\nimport numpy as np\nimport torch as th\nimport torch.nn.functional as F\n\nfrom .diffusion_utils import discretized_gaussian_log_likelihood, normal_kl\nfrom .cache_functions import cache_init\n\ndef mean_flat(tensor):\n    \"\"\"\n    Take the mean over all non-batch dimensions.\n    \"\"\"\n    return tensor.mean(dim=list(range(1, len(tensor.shape))))\n\n\nclass ModelMeanType(enum.Enum):\n    \"\"\"\n    Which type of output the model predicts.\n    \"\"\"\n\n    PREVIOUS_X = enum.auto()  # the model predicts x_{t-1}\n    START_X = enum.auto()  # the model predicts x_0\n    EPSILON = enum.auto()  # the model predicts epsilon\n\n\nclass ModelVarType(enum.Enum):\n    \"\"\"\n    What is used as the model's output variance.\n    The LEARNED_RANGE option has been added to allow the model to predict\n    values between FIXED_SMALL and FIXED_LARGE, making its job easier.\n    \"\"\"\n\n    LEARNED = enum.auto()\n    FIXED_SMALL = enum.auto()\n    FIXED_LARGE = enum.auto()\n    LEARNED_RANGE = enum.auto()\n\n\nclass LossType(enum.Enum):\n    MSE = enum.auto()  # use raw MSE loss (and KL when learning variances)\n    RESCALED_MSE = (\n        enum.auto()\n    )  # use raw MSE loss (with RESCALED_KL when learning variances)\n    KL = enum.auto()  # use the variational lower-bound\n    RESCALED_KL = enum.auto()  # like KL, but rescale to estimate the full VLB\n\n    def is_vb(self):\n        return self in [LossType.KL, LossType.RESCALED_KL]\n\n\ndef _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, warmup_frac):\n    betas = beta_end * np.ones(num_diffusion_timesteps, 
dtype=np.float64)\n    warmup_time = int(num_diffusion_timesteps * warmup_frac)\n    betas[:warmup_time] = np.linspace(beta_start, beta_end, warmup_time, dtype=np.float64)\n    return betas\n\n\ndef get_beta_schedule(beta_schedule, *, beta_start, beta_end, num_diffusion_timesteps):\n    \"\"\"\n    This is the deprecated API for creating beta schedules.\n    See get_named_beta_schedule() for the new library of schedules.\n    \"\"\"\n    if beta_schedule == \"quad\":\n        betas = (\n            np.linspace(\n                beta_start ** 0.5,\n                beta_end ** 0.5,\n                num_diffusion_timesteps,\n                dtype=np.float64,\n            )\n            ** 2\n        )\n    elif beta_schedule == \"linear\":\n        betas = np.linspace(beta_start, beta_end, num_diffusion_timesteps, dtype=np.float64)\n    elif beta_schedule == \"warmup10\":\n        betas = _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, 0.1)\n    elif beta_schedule == \"warmup50\":\n        betas = _warmup_beta(beta_start, beta_end, num_diffusion_timesteps, 0.5)\n    elif beta_schedule == \"const\":\n        betas = beta_end * np.ones(num_diffusion_timesteps, dtype=np.float64)\n    elif beta_schedule == \"jsd\":  # 1/T, 1/(T-1), 1/(T-2), ..., 1\n        betas = 1.0 / np.linspace(\n            num_diffusion_timesteps, 1, num_diffusion_timesteps, dtype=np.float64\n        )\n    else:\n        raise NotImplementedError(beta_schedule)\n    assert betas.shape == (num_diffusion_timesteps,)\n    return betas\n\n\ndef get_named_beta_schedule(schedule_name, num_diffusion_timesteps):\n    \"\"\"\n    Get a pre-defined beta schedule for the given name.\n    The beta schedule library consists of beta schedules which remain similar\n    in the limit of num_diffusion_timesteps.\n    Beta schedules may be added, but should not be removed or changed once\n    they are committed to maintain backwards compatibility.\n    \"\"\"\n    if schedule_name == \"linear\":\n        
# Linear schedule from Ho et al, extended to work for any number of\n        # diffusion steps.\n        scale = 1000 / num_diffusion_timesteps\n        return get_beta_schedule(\n            \"linear\",\n            beta_start=scale * 0.0001,\n            beta_end=scale * 0.02,\n            num_diffusion_timesteps=num_diffusion_timesteps,\n        )\n    elif schedule_name == \"squaredcos_cap_v2\":\n        return betas_for_alpha_bar(\n            num_diffusion_timesteps,\n            lambda t: math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2,\n        )\n    else:\n        raise NotImplementedError(f\"unknown beta schedule: {schedule_name}\")\n\n\ndef betas_for_alpha_bar(num_diffusion_timesteps, alpha_bar, max_beta=0.999):\n    \"\"\"\n    Create a beta schedule that discretizes the given alpha_t_bar function,\n    which defines the cumulative product of (1-beta) over time from t = [0,1].\n    :param num_diffusion_timesteps: the number of betas to produce.\n    :param alpha_bar: a lambda that takes an argument t from 0 to 1 and\n                      produces the cumulative product of (1-beta) up to that\n                      part of the diffusion process.\n    :param max_beta: the maximum beta to use; use values lower than 1 to\n                     prevent singularities.\n    \"\"\"\n    betas = []\n    for i in range(num_diffusion_timesteps):\n        t1 = i / num_diffusion_timesteps\n        t2 = (i + 1) / num_diffusion_timesteps\n        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))\n    return np.array(betas)\n\n\nclass GaussianDiffusion:\n    \"\"\"\n    Utilities for training and sampling diffusion models.\n    Original ported from this codebase:\n    https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/diffusion_utils_2.py#L42\n    :param betas: a 1-D numpy array of betas for each diffusion timestep,\n                  starting at T and going to 1.\n    \"\"\"\n\n    def __init__(\n        
self,\n        *,\n        betas,\n        model_mean_type,\n        model_var_type,\n        loss_type,\n        snr=False,\n        return_startx=False,\n    ):\n\n        self.model_mean_type = model_mean_type\n        self.model_var_type = model_var_type\n        self.loss_type = loss_type\n        self.snr = snr\n        self.return_startx = return_startx\n\n        # Use float64 for accuracy.\n        betas = np.array(betas, dtype=np.float64)\n        self.betas = betas\n        assert len(betas.shape) == 1, \"betas must be 1-D\"\n        assert (betas > 0).all() and (betas <= 1).all()\n\n        self.num_timesteps = int(betas.shape[0])\n\n        alphas = 1.0 - betas\n        self.alphas_cumprod = np.cumprod(alphas, axis=0)\n        self.alphas_cumprod_prev = np.append(1.0, self.alphas_cumprod[:-1])\n        self.alphas_cumprod_next = np.append(self.alphas_cumprod[1:], 0.0)\n        assert self.alphas_cumprod_prev.shape == (self.num_timesteps,)\n\n        # calculations for diffusion q(x_t | x_{t-1}) and others\n        self.sqrt_alphas_cumprod = np.sqrt(self.alphas_cumprod)\n        self.sqrt_one_minus_alphas_cumprod = np.sqrt(1.0 - self.alphas_cumprod)\n        self.log_one_minus_alphas_cumprod = np.log(1.0 - self.alphas_cumprod)\n        self.sqrt_recip_alphas_cumprod = np.sqrt(1.0 / self.alphas_cumprod)\n        self.sqrt_recipm1_alphas_cumprod = np.sqrt(1.0 / self.alphas_cumprod - 1)\n\n        # calculations for posterior q(x_{t-1} | x_t, x_0)\n        self.posterior_variance = (\n            betas * (1.0 - self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)\n        )\n        # below: log calculation clipped because the posterior variance is 0 at the beginning of the diffusion chain\n        self.posterior_log_variance_clipped = np.log(\n            np.append(self.posterior_variance[1], self.posterior_variance[1:])\n        ) if len(self.posterior_variance) > 1 else np.array([])\n\n        self.posterior_mean_coef1 = (\n            betas * 
np.sqrt(self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)\n        )\n        self.posterior_mean_coef2 = (\n            (1.0 - self.alphas_cumprod_prev) * np.sqrt(alphas) / (1.0 - self.alphas_cumprod)\n        )\n\n    def q_mean_variance(self, x_start, t):\n        \"\"\"\n        Get the distribution q(x_t | x_0).\n        :param x_start: the [N x C x ...] tensor of noiseless inputs.\n        :param t: the number of diffusion steps (minus 1). Here, 0 means one step.\n        :return: A tuple (mean, variance, log_variance), all of x_start's shape.\n        \"\"\"\n        mean = _extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start\n        variance = _extract_into_tensor(1.0 - self.alphas_cumprod, t, x_start.shape)\n        log_variance = _extract_into_tensor(self.log_one_minus_alphas_cumprod, t, x_start.shape)\n        return mean, variance, log_variance\n\n    def q_sample(self, x_start, t, noise=None):\n        \"\"\"\n        Diffuse the data for a given number of diffusion steps.\n        In other words, sample from q(x_t | x_0).\n        :param x_start: the initial data batch.\n        :param t: the number of diffusion steps (minus 1). 
Here, 0 means one step.\n        :param noise: if specified, the split-out normal noise.\n        :return: A noisy version of x_start.\n        \"\"\"\n        if noise is None:\n            noise = th.randn_like(x_start)\n        assert noise.shape == x_start.shape\n        return (\n            _extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start\n            + _extract_into_tensor(self.sqrt_one_minus_alphas_cumprod, t, x_start.shape) * noise\n        )\n\n    def q_posterior_mean_variance(self, x_start, x_t, t):\n        \"\"\"\n        Compute the mean and variance of the diffusion posterior:\n            q(x_{t-1} | x_t, x_0)\n        \"\"\"\n        assert x_start.shape == x_t.shape\n        posterior_mean = (\n            _extract_into_tensor(self.posterior_mean_coef1, t, x_t.shape) * x_start\n            + _extract_into_tensor(self.posterior_mean_coef2, t, x_t.shape) * x_t\n        )\n        posterior_variance = _extract_into_tensor(self.posterior_variance, t, x_t.shape)\n        posterior_log_variance_clipped = _extract_into_tensor(\n            self.posterior_log_variance_clipped, t, x_t.shape\n        )\n        assert (\n            posterior_mean.shape[0]\n            == posterior_variance.shape[0]\n            == posterior_log_variance_clipped.shape[0]\n            == x_start.shape[0]\n        )\n        return posterior_mean, posterior_variance, posterior_log_variance_clipped\n\n    def p_mean_variance(self, model, x, t, current=None, cache_dic=None, clip_denoised=True, denoised_fn=None, model_kwargs=None):\n        \"\"\"\n        Apply the model to get p(x_{t-1} | x_t), as well as a prediction of\n        the initial x, x_0.\n        :param model: the model, which takes a signal and a batch of timesteps\n                      as input.\n        :param x: the [N x C x ...] 
tensor at time t.\n        :param t: a 1-D Tensor of timesteps.\n        :param clip_denoised: if True, clip the denoised signal into [-1, 1].\n        :param denoised_fn: if not None, a function which applies to the\n            x_start prediction before it is used to sample. Applies before\n            clip_denoised.\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\n            pass to the model. This can be used for conditioning.\n        :return: a dict with the following keys:\n                 - 'mean': the model mean output.\n                 - 'variance': the model variance output.\n                 - 'log_variance': the log of 'variance'.\n                 - 'pred_xstart': the prediction for x_0.\n        \"\"\"\n        if model_kwargs is None:\n            model_kwargs = {}\n\n        B, C = x.shape[:2]\n        assert t.shape == (B,)\n        model_output = model(x, t, current=current, cache_dic=cache_dic, **model_kwargs)\n        if isinstance(model_output, tuple):\n            model_output, extra = model_output\n        else:\n            extra = None\n\n        if self.model_var_type in [ModelVarType.LEARNED, ModelVarType.LEARNED_RANGE]:\n            assert model_output.shape == (B, C * 2, *x.shape[2:])\n            model_output, model_var_values = th.split(model_output, C, dim=1)\n            min_log = _extract_into_tensor(self.posterior_log_variance_clipped, t, x.shape)\n            max_log = _extract_into_tensor(np.log(self.betas), t, x.shape)\n            # The model_var_values is [-1, 1] for [min_var, max_var].\n            frac = (model_var_values + 1) / 2\n            model_log_variance = frac * max_log + (1 - frac) * min_log\n            model_variance = th.exp(model_log_variance)\n        elif self.model_var_type in [ModelVarType.FIXED_LARGE, ModelVarType.FIXED_SMALL]:\n            model_variance, model_log_variance = {\n                # for fixedlarge, we set the initial (log-)variance like so\n               
 # to get a better decoder log likelihood.\n                ModelVarType.FIXED_LARGE: (\n                    np.append(self.posterior_variance[1], self.betas[1:]),\n                    np.log(np.append(self.posterior_variance[1], self.betas[1:])),\n                ),\n                ModelVarType.FIXED_SMALL: (\n                    self.posterior_variance,\n                    self.posterior_log_variance_clipped,\n                ),\n            }[self.model_var_type]\n            model_variance = _extract_into_tensor(model_variance, t, x.shape)\n            model_log_variance = _extract_into_tensor(model_log_variance, t, x.shape)\n        else:\n            model_variance = th.zeros_like(model_output)\n            model_log_variance = th.zeros_like(model_output)\n\n        def process_xstart(x):\n            if denoised_fn is not None:\n                x = denoised_fn(x)\n            return x.clamp(-1, 1) if clip_denoised else x\n\n        if self.model_mean_type == ModelMeanType.START_X:\n            pred_xstart = process_xstart(model_output)\n        else:\n            pred_xstart = process_xstart(\n                self._predict_xstart_from_eps(x_t=x, t=t, eps=model_output)\n            )\n        model_mean, _, _ = self.q_posterior_mean_variance(x_start=pred_xstart, x_t=x, t=t)\n\n        assert model_mean.shape == model_log_variance.shape == pred_xstart.shape == x.shape\n        return {\n            \"mean\": model_mean,\n            \"variance\": model_variance,\n            \"log_variance\": model_log_variance,\n            \"pred_xstart\": pred_xstart,\n            \"extra\": extra,\n        }\n\n    def _predict_xstart_from_eps(self, x_t, t, eps):\n        assert x_t.shape == eps.shape\n        return (\n            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t\n            - _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape) * eps\n        )\n\n    def _predict_eps_from_xstart(self, x_t, t, pred_xstart):\n     
   return (\n            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t - pred_xstart\n        ) / _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape)\n\n    def condition_mean(self, cond_fn, p_mean_var, x, t, model_kwargs=None):\n        \"\"\"\n        Compute the mean for the previous step, given a function cond_fn that\n        computes the gradient of a conditional log probability with respect to\n        x. In particular, cond_fn computes grad(log(p(y|x))), and we want to\n        condition on y.\n        This uses the conditioning strategy from Sohl-Dickstein et al. (2015).\n        \"\"\"\n        # model_kwargs defaults to None, so fall back to an empty dict before unpacking.\n        gradient = cond_fn(x, t, **(model_kwargs or {}))\n        return p_mean_var[\"mean\"].float() + p_mean_var[\"variance\"] * gradient.float()\n\n    def condition_score(self, cond_fn, p_mean_var, x, t, model_kwargs=None):\n        \"\"\"\n        Compute what the p_mean_variance output would have been, should the\n        model's score function be conditioned by cond_fn.\n        See condition_mean() for details on cond_fn.\n        Unlike condition_mean(), this instead uses the conditioning strategy\n        from Song et al. (2020).\n        \"\"\"\n        alpha_bar = _extract_into_tensor(self.alphas_cumprod, t, x.shape)\n\n        eps = self._predict_eps_from_xstart(x, t, p_mean_var[\"pred_xstart\"])\n        # model_kwargs defaults to None, so fall back to an empty dict before unpacking.\n        eps = eps - (1 - alpha_bar).sqrt() * cond_fn(x, t, **(model_kwargs or {}))\n\n        out = p_mean_var.copy()\n        out[\"pred_xstart\"] = self._predict_xstart_from_eps(x, t, eps)\n        out[\"mean\"], _, _ = self.q_posterior_mean_variance(x_start=out[\"pred_xstart\"], x_t=x, t=t)\n        return out\n\n    def p_sample(\n        self,\n        model,\n        x,\n        t,\n        current=None,\n        cache_dic=None,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n    ):\n        \"\"\"\n        Sample x_{t-1} from the model at the given timestep.\n        :param 
model: the model to sample from.\n        :param x: the current tensor x_t.\n        :param t: the value of t, starting at 0 for the first diffusion step.\n        :param clip_denoised: if True, clip the x_start prediction to [-1, 1].\n        :param denoised_fn: if not None, a function which applies to the\n            x_start prediction before it is used to sample.\n        :param cond_fn: if not None, this is a gradient function that acts\n                        similarly to the model.\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\n            pass to the model. This can be used for conditioning.\n        :return: a dict containing the following keys:\n                 - 'sample': a random sample from the model.\n                 - 'pred_xstart': a prediction of x_0.\n        \"\"\"\n        out = self.p_mean_variance(\n            model,\n            x,\n            t,\n            current=current,\n            cache_dic=cache_dic,\n            clip_denoised=clip_denoised,\n            denoised_fn=denoised_fn,\n            model_kwargs=model_kwargs,\n        )\n        noise = th.randn_like(x)\n        nonzero_mask = (\n            (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))\n        )  # no noise when t == 0\n        if cond_fn is not None:\n            out[\"mean\"] = self.condition_mean(cond_fn, out, x, t, model_kwargs=model_kwargs)\n        sample = out[\"mean\"] + nonzero_mask * th.exp(0.5 * out[\"log_variance\"]) * noise\n        return {\"sample\": sample, \"pred_xstart\": out[\"pred_xstart\"]}\n\n    def p_sample_loop(\n        self,\n        model,\n        shape,\n        noise=None,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        device=None,\n        progress=False,\n    ):\n        \"\"\"\n        Generate samples from the model.\n        :param model: the model module.\n        :param shape: the shape of the samples, (N, 
C, H, W).\n        :param noise: if specified, the noise from the encoder to sample.\n                      Should be of the same shape as `shape`.\n        :param clip_denoised: if True, clip x_start predictions to [-1, 1].\n        :param denoised_fn: if not None, a function which applies to the\n            x_start prediction before it is used to sample.\n        :param cond_fn: if not None, this is a gradient function that acts\n                        similarly to the model.\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\n            pass to the model. This can be used for conditioning.\n        :param device: if specified, the device to create the samples on.\n                       If not specified, use a model parameter's device.\n        :param progress: if True, show a tqdm progress bar.\n        :return: a non-differentiable batch of samples.\n        \"\"\"\n        final = None\n        for sample in self.p_sample_loop_progressive(\n            model,\n            shape,\n            noise=noise,\n            clip_denoised=clip_denoised,\n            denoised_fn=denoised_fn,\n            cond_fn=cond_fn,\n            model_kwargs=model_kwargs,\n            device=device,\n            progress=progress            \n        ):\n            final = sample\n        return final[\"sample\"]\n\n    def p_sample_loop_progressive(\n        self,\n        model,\n        shape,\n        noise=None,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        device=None,\n        progress=False\n    ):\n        \"\"\"\n        Generate samples from the model and yield intermediate samples from\n        each timestep of diffusion.\n        Arguments are the same as p_sample_loop().\n        Returns a generator over dicts, where each dict is the return value of\n        p_sample().\n        \"\"\"\n        if device is None:\n            device = 
next(model.parameters()).device\n        assert isinstance(shape, (tuple, list))\n        img = noise if noise is not None else th.randn(*shape, device=device)\n        indices = list(range(self.num_timesteps))[::-1]\n\n        if progress:\n            # Lazy import so that we don't depend on tqdm.\n            from tqdm.auto import tqdm\n\n            indices = tqdm(indices)\n\n        cache_dic, current = cache_init(model_kwargs=model_kwargs, num_steps=self.num_timesteps)\n        \n        for i in indices:\n            t = th.tensor([i] * shape[0], device=device)\n            with th.no_grad():\n                current['step'] = i\n                out = self.p_sample(\n                    model,\n                    img,\n                    t,\n                    current=current,\n                    cache_dic=cache_dic,\n                    clip_denoised=clip_denoised,\n                    denoised_fn=denoised_fn,\n                    cond_fn=cond_fn,\n                    model_kwargs=model_kwargs,\n                )\n                yield out\n                img = out[\"sample\"]\n\n    def ddim_sample(\n        self,\n        model,\n        x,\n        t,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        eta=0.0,\n    ):\n        \"\"\"\n        Sample x_{t-1} from the model using DDIM.\n        Same usage as p_sample().\n        \"\"\"\n        out = self.p_mean_variance(\n            model,\n            x,\n            t,\n            clip_denoised=clip_denoised,\n            denoised_fn=denoised_fn,\n            model_kwargs=model_kwargs,\n        )\n        if cond_fn is not None:\n            out = self.condition_score(cond_fn, out, x, t, model_kwargs=model_kwargs)\n\n        # Usually our model outputs epsilon, but we re-derive it\n        # in case we used x_start or x_prev prediction.\n        eps = self._predict_eps_from_xstart(x, t, out[\"pred_xstart\"])\n\n        alpha_bar = 
_extract_into_tensor(self.alphas_cumprod, t, x.shape)\n        alpha_bar_prev = _extract_into_tensor(self.alphas_cumprod_prev, t, x.shape)\n        sigma = (\n            eta\n            * th.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar))\n            * th.sqrt(1 - alpha_bar / alpha_bar_prev)\n        )\n        # Equation 12.\n        noise = th.randn_like(x)\n        mean_pred = (\n            out[\"pred_xstart\"] * th.sqrt(alpha_bar_prev)\n            + th.sqrt(1 - alpha_bar_prev - sigma ** 2) * eps\n        )\n        nonzero_mask = (\n            (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))\n        )  # no noise when t == 0\n        sample = mean_pred + nonzero_mask * sigma * noise\n        return {\"sample\": sample, \"pred_xstart\": out[\"pred_xstart\"]}\n\n    def ddim_reverse_sample(\n        self,\n        model,\n        x,\n        t,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        eta=0.0,\n    ):\n        \"\"\"\n        Sample x_{t+1} from the model using DDIM reverse ODE.\n        \"\"\"\n        assert eta == 0.0, \"Reverse ODE only for deterministic path\"\n        out = self.p_mean_variance(\n            model,\n            x,\n            t,\n            clip_denoised=clip_denoised,\n            denoised_fn=denoised_fn,\n            model_kwargs=model_kwargs,\n        )\n        if cond_fn is not None:\n            out = self.condition_score(cond_fn, out, x, t, model_kwargs=model_kwargs)\n        # Usually our model outputs epsilon, but we re-derive it\n        # in case we used x_start or x_prev prediction.\n        eps = (\n            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x.shape) * x\n            - out[\"pred_xstart\"]\n        ) / _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x.shape)\n        alpha_bar_next = _extract_into_tensor(self.alphas_cumprod_next, t, x.shape)\n\n        # Equation 12. 
reversed\n        mean_pred = out[\"pred_xstart\"] * th.sqrt(alpha_bar_next) + th.sqrt(1 - alpha_bar_next) * eps\n\n        return {\"sample\": mean_pred, \"pred_xstart\": out[\"pred_xstart\"]}\n\n    def ddim_sample_loop(\n        self,\n        model,\n        shape,\n        noise=None,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        device=None,\n        progress=False,\n        eta=0.0,\n    ):\n        \"\"\"\n        Generate samples from the model using DDIM.\n        Same usage as p_sample_loop().\n        \"\"\"\n        final = None\n        for sample in self.ddim_sample_loop_progressive(\n            model,\n            shape,\n            noise=noise,\n            clip_denoised=clip_denoised,\n            denoised_fn=denoised_fn,\n            cond_fn=cond_fn,\n            model_kwargs=model_kwargs,\n            device=device,\n            progress=progress,\n            eta=eta,\n        ):\n            final = sample\n        return final[\"sample\"]\n\n    def ddim_sample_loop_progressive(\n        self,\n        model,\n        shape,\n        noise=None,\n        clip_denoised=True,\n        denoised_fn=None,\n        cond_fn=None,\n        model_kwargs=None,\n        device=None,\n        progress=False,\n        eta=0.0,\n    ):\n        \"\"\"\n        Use DDIM to sample from the model and yield intermediate samples from\n        each timestep of DDIM.\n        Same usage as p_sample_loop_progressive().\n        \"\"\"\n        if device is None:\n            device = next(model.parameters()).device\n        assert isinstance(shape, (tuple, list))\n        img = noise if noise is not None else th.randn(*shape, device=device)\n        indices = list(range(self.num_timesteps))[::-1]\n\n        if progress:\n            # Lazy import so that we don't depend on tqdm.\n            from tqdm.auto import tqdm\n\n            indices = tqdm(indices)\n\n        for i in indices:\n        
    t = th.tensor([i] * shape[0], device=device)\n            with th.no_grad():\n                out = self.ddim_sample(\n                    model,\n                    img,\n                    t,\n                    clip_denoised=clip_denoised,\n                    denoised_fn=denoised_fn,\n                    cond_fn=cond_fn,\n                    model_kwargs=model_kwargs,\n                    eta=eta,\n                )\n                yield out\n                img = out[\"sample\"]\n\n    def _vb_terms_bpd(\n            self, model, x_start, x_t, t, clip_denoised=True, model_kwargs=None\n    ):\n        \"\"\"\n        Get a term for the variational lower-bound.\n        The resulting units are bits (rather than nats, as one might expect).\n        This allows for comparison to other papers.\n        :return: a dict with the following keys:\n                 - 'output': a shape [N] tensor of NLLs or KLs.\n                 - 'pred_xstart': the x_0 predictions.\n        \"\"\"\n        true_mean, _, true_log_variance_clipped = self.q_posterior_mean_variance(\n            x_start=x_start, x_t=x_t, t=t\n        )\n        out = self.p_mean_variance(\n            model, x_t, t, clip_denoised=clip_denoised, model_kwargs=model_kwargs\n        )\n        kl = normal_kl(\n            true_mean, true_log_variance_clipped, out[\"mean\"], out[\"log_variance\"]\n        )\n        kl = mean_flat(kl) / np.log(2.0)\n\n        decoder_nll = -discretized_gaussian_log_likelihood(\n            x_start, means=out[\"mean\"], log_scales=0.5 * out[\"log_variance\"]\n        )\n        assert decoder_nll.shape == x_start.shape\n        decoder_nll = mean_flat(decoder_nll) / np.log(2.0)\n\n        # At the first timestep return the decoder NLL,\n        # otherwise return KL(q(x_{t-1}|x_t,x_0) || p(x_{t-1}|x_t))\n        output = th.where((t == 0), decoder_nll, kl)\n        return {\"output\": output, \"pred_xstart\": out[\"pred_xstart\"]}\n\n    def training_losses(self, model, 
x_start, timestep, model_kwargs=None, noise=None, skip_noise=False):\n        \"\"\"\n        Compute training losses for a single timestep.\n        :param model: the model to evaluate loss on.\n        :param x_start: the [N x C x ...] tensor of inputs.\n        :param timestep: a batch of timestep indices.\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\n            pass to the model. This can be used for conditioning.\n        :param noise: if specified, the specific Gaussian noise to try to remove.\n        :param skip_noise: if True, use x_start directly as x_t instead of\n            sampling from q(x_t | x_0).\n        :return: a dict with the key \"loss\" containing a tensor of shape [N].\n                 Some mean or variance settings may also have other keys.\n        \"\"\"\n        t = timestep\n        if model_kwargs is None:\n            model_kwargs = {}\n        if skip_noise:\n            x_t = x_start\n        else:\n            if noise is None:\n                noise = th.randn_like(x_start)\n            x_t = self.q_sample(x_start, t, noise=noise)\n\n        terms = {}\n\n        if self.loss_type == LossType.KL or self.loss_type == LossType.RESCALED_KL:\n            terms[\"loss\"] = self._vb_terms_bpd(\n                model=model,\n                x_start=x_start,\n                x_t=x_t,\n                t=t,\n                clip_denoised=False,\n                model_kwargs=model_kwargs,\n            )[\"output\"]\n            if self.loss_type == LossType.RESCALED_KL:\n                terms[\"loss\"] *= self.num_timesteps\n        elif self.loss_type in [LossType.MSE, LossType.RESCALED_MSE]:\n            model_output = model(x_t, t, **model_kwargs)\n            if isinstance(model_output, dict) and model_output.get('x', None) is not None:\n                output = model_output['x']\n            else:\n                output = model_output\n\n            if self.return_startx and self.model_mean_type == ModelMeanType.EPSILON:\n                return self._extracted_from_training_losses_diffusers(x_t, output, t)\n            
# self.model_var_type = ModelVarType.LEARNED_RANGE:4\n            if self.model_var_type in [\n                ModelVarType.LEARNED,\n                ModelVarType.LEARNED_RANGE,\n            ]:\n                B, C = x_t.shape[:2]\n                assert output.shape == (B, C * 2, *x_t.shape[2:])\n                output, model_var_values = th.split(output, C, dim=1)\n                # Learn the variance using the variational bound, but don't let it affect our mean prediction.\n                frozen_out = th.cat([output.detach(), model_var_values], dim=1)\n                # vb variational bound\n                terms[\"vb\"] = self._vb_terms_bpd(\n                    model=lambda *args, r=frozen_out, **kwargs: r,\n                    x_start=x_start,\n                    x_t=x_t,\n                    t=t,\n                    clip_denoised=False,\n                )[\"output\"]\n                if self.loss_type == LossType.RESCALED_MSE:\n                    # Divide by 1000 for equivalence with initial implementation.\n                    # Without a factor of 1/1000, the VB term hurts the MSE term.\n                    terms[\"vb\"] *= self.num_timesteps / 1000.0\n\n            target = {\n                ModelMeanType.PREVIOUS_X: self.q_posterior_mean_variance(\n                    x_start=x_start, x_t=x_t, t=t\n                )[0],\n                ModelMeanType.START_X: x_start,\n                ModelMeanType.EPSILON: noise,\n            }[self.model_mean_type]\n            assert output.shape == target.shape == x_start.shape\n            if self.snr:\n                if self.model_mean_type == ModelMeanType.START_X:\n                    pred_noise = self._predict_eps_from_xstart(x_t=x_t, t=t, pred_xstart=output)\n                    pred_startx = output\n                elif self.model_mean_type == ModelMeanType.EPSILON:\n                    pred_noise = output\n                    pred_startx = self._predict_xstart_from_eps(x_t=x_t, t=t, eps=output)\n       
         # terms[\"mse_eps\"] = mean_flat((noise - pred_noise) ** 2)\n                # terms[\"mse_x0\"] = mean_flat((x_start - pred_startx) ** 2)\n\n                t = t[:, None, None, None].expand(pred_startx.shape)  # e.g. [128, 4, 32, 32]\n                # Regress the noise for timesteps above 249 and x_0 otherwise.\n                target = th.where(t > 249, noise, x_start)\n                output = th.where(t > 249, pred_noise, pred_startx)\n            loss = (target - output) ** 2\n            if model_kwargs.get('mask_ratio', False) and model_kwargs['mask_ratio'] > 0:\n                assert 'mask' in model_output\n                loss = F.avg_pool2d(loss.mean(dim=1), model.model.module.patch_size).flatten(1)\n                mask = model_output['mask']\n                unmask = 1 - mask\n                terms['mse'] = mean_flat(loss * unmask) * unmask.shape[1] / unmask.sum(1)\n                if model_kwargs['mask_loss_coef'] > 0:\n                    terms['mae'] = model_kwargs['mask_loss_coef'] * mean_flat(loss * mask) * mask.shape[1] / mask.sum(1)\n            else:\n                terms[\"mse\"] = mean_flat(loss)\n            terms[\"loss\"] = terms[\"mse\"] + terms[\"vb\"] if \"vb\" in terms else terms[\"mse\"]\n            if \"mae\" in terms:\n                terms[\"loss\"] = terms[\"loss\"] + terms[\"mae\"]\n        else:\n            raise NotImplementedError(self.loss_type)\n\n        return terms\n\n    def training_losses_diffusers(self, model, x_start, timestep, model_kwargs=None, noise=None, skip_noise=False):\n        \"\"\"\n        Compute training losses for a single timestep.\n        :param model: the model to evaluate loss on.\n        :param x_start: the [N x C x ...] tensor of inputs.\n        :param timestep: a batch of timestep indices.\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\n            pass to the model. 
This can be used for conditioning.\n        :param noise: if specified, the specific Gaussian noise to try to remove.\n        :return: a dict with the key \"loss\" containing a tensor of shape [N].\n                 Some mean or variance settings may also have other keys.\n        \"\"\"\n        t = timestep\n        if model_kwargs is None:\n            model_kwargs = {}\n        if skip_noise:\n            x_t = x_start\n        else:\n            if noise is None:\n                noise = th.randn_like(x_start)\n            x_t = self.q_sample(x_start, t, noise=noise)\n\n        terms = {}\n\n        if self.loss_type in [LossType.KL, LossType.RESCALED_KL]:\n            terms[\"loss\"] = self._vb_terms_bpd(\n                model=model,\n                x_start=x_start,\n                x_t=x_t,\n                t=t,\n                clip_denoised=False,\n                model_kwargs=model_kwargs,\n            )[\"output\"]\n            if self.loss_type == LossType.RESCALED_KL:\n                terms[\"loss\"] *= self.num_timesteps\n        elif self.loss_type in [LossType.MSE, LossType.RESCALED_MSE]:\n            output = model(x_t, timestep=t, **model_kwargs, return_dict=False)[0]\n\n            if self.return_startx and self.model_mean_type == ModelMeanType.EPSILON:\n                return self._extracted_from_training_losses_diffusers(x_t, output, t)\n\n            if self.model_var_type in [\n                ModelVarType.LEARNED,\n                ModelVarType.LEARNED_RANGE,\n            ]:\n                B, C = x_t.shape[:2]\n                assert output.shape == (B, C * 2, *x_t.shape[2:])\n                output, model_var_values = th.split(output, C, dim=1)\n                # Learn the variance using the variational bound, but don't let it affect our mean prediction.\n                frozen_out = th.cat([output.detach(), model_var_values], dim=1)\n                terms[\"vb\"] = self._vb_terms_bpd(\n                    model=lambda *args, 
r=frozen_out, **kwargs: r,\n                    x_start=x_start,\n                    x_t=x_t,\n                    t=t,\n                    clip_denoised=False,\n                )[\"output\"]\n                if self.loss_type == LossType.RESCALED_MSE:\n                    # Divide by 1000 for equivalence with initial implementation.\n                    # Without a factor of 1/1000, the VB term hurts the MSE term.\n                    terms[\"vb\"] *= self.num_timesteps / 1000.0\n\n            target = {\n                ModelMeanType.PREVIOUS_X: self.q_posterior_mean_variance(\n                    x_start=x_start, x_t=x_t, t=t\n                )[0],\n                ModelMeanType.START_X: x_start,\n                ModelMeanType.EPSILON: noise,\n            }[self.model_mean_type]\n            assert output.shape == target.shape == x_start.shape\n            if self.snr:\n                if self.model_mean_type == ModelMeanType.START_X:\n                    pred_noise = self._predict_eps_from_xstart(x_t=x_t, t=t, pred_xstart=output)\n                    pred_startx = output\n                elif self.model_mean_type == ModelMeanType.EPSILON:\n                    pred_noise = output\n                    pred_startx = self._predict_xstart_from_eps(x_t=x_t, t=t, eps=output)\n                # terms[\"mse_eps\"] = mean_flat((noise - pred_noise) ** 2)\n                # terms[\"mse_x0\"] = mean_flat((x_start - pred_startx) ** 2)\n\n                t = t[:, None, None, None].expand(pred_startx.shape)  # [128, 4, 32, 32]\n                # best\n                target = th.where(t > 249, noise, x_start)\n                output = th.where(t > 249, pred_noise, pred_startx)\n            loss = (target - output) ** 2\n            terms[\"mse\"] = mean_flat(loss)\n            terms[\"loss\"] = terms[\"mse\"] + terms[\"vb\"] if \"vb\" in terms else terms[\"mse\"]\n            if \"mae\" in terms:\n                terms[\"loss\"] = terms[\"loss\"] + terms[\"mae\"]\n        
else:\n            raise NotImplementedError(self.loss_type)\n\n        return terms\n\n    def _extracted_from_training_losses_diffusers(self, x_t, output, t):\n        B, C = x_t.shape[:2]\n        assert output.shape == (B, C * 2, *x_t.shape[2:])\n        output = th.split(output, C, dim=1)[0]\n        return output, self._predict_xstart_from_eps(x_t=x_t, t=t, eps=output), x_t\n\n    def _prior_bpd(self, x_start):\n        \"\"\"\n        Get the prior KL term for the variational lower-bound, measured in\n        bits-per-dim.\n        This term can't be optimized, as it only depends on the encoder.\n        :param x_start: the [N x C x ...] tensor of inputs.\n        :return: a batch of [N] KL values (in bits), one per batch element.\n        \"\"\"\n        batch_size = x_start.shape[0]\n        t = th.tensor([self.num_timesteps - 1] * batch_size, device=x_start.device)\n        qt_mean, _, qt_log_variance = self.q_mean_variance(x_start, t)\n        kl_prior = normal_kl(\n            mean1=qt_mean, logvar1=qt_log_variance, mean2=0.0, logvar2=0.0\n        )\n        return mean_flat(kl_prior) / np.log(2.0)\n\n    def calc_bpd_loop(self, model, x_start, clip_denoised=True, model_kwargs=None):\n        \"\"\"\n        Compute the entire variational lower-bound, measured in bits-per-dim,\n        as well as other related quantities.\n        :param model: the model to evaluate loss on.\n        :param x_start: the [N x C x ...] tensor of inputs.\n        :param clip_denoised: if True, clip denoised samples.\n        :param model_kwargs: if not None, a dict of extra keyword arguments to\n            pass to the model. 
This can be used for conditioning.\n        :return: a dict containing the following keys:\n                 - total_bpd: the total variational lower-bound, per batch element.\n                 - prior_bpd: the prior term in the lower-bound.\n                 - vb: an [N x T] tensor of terms in the lower-bound.\n                 - xstart_mse: an [N x T] tensor of x_0 MSEs for each timestep.\n                 - mse: an [N x T] tensor of epsilon MSEs for each timestep.\n        \"\"\"\n        device = x_start.device\n        batch_size = x_start.shape[0]\n\n        vb = []\n        xstart_mse = []\n        mse = []\n        for t in list(range(self.num_timesteps))[::-1]:\n            t_batch = th.tensor([t] * batch_size, device=device)\n            noise = th.randn_like(x_start)\n            x_t = self.q_sample(x_start=x_start, t=t_batch, noise=noise)\n            # Calculate VLB term at the current timestep\n            with th.no_grad():\n                out = self._vb_terms_bpd(\n                    model,\n                    x_start=x_start,\n                    x_t=x_t,\n                    t=t_batch,\n                    clip_denoised=clip_denoised,\n                    model_kwargs=model_kwargs,\n                )\n            vb.append(out[\"output\"])\n            xstart_mse.append(mean_flat((out[\"pred_xstart\"] - x_start) ** 2))\n            eps = self._predict_eps_from_xstart(x_t, t_batch, out[\"pred_xstart\"])\n            mse.append(mean_flat((eps - noise) ** 2))\n\n        vb = th.stack(vb, dim=1)\n        xstart_mse = th.stack(xstart_mse, dim=1)\n        mse = th.stack(mse, dim=1)\n\n        prior_bpd = self._prior_bpd(x_start)\n        total_bpd = vb.sum(dim=1) + prior_bpd\n        return {\n            \"total_bpd\": total_bpd,\n            \"prior_bpd\": prior_bpd,\n            \"vb\": vb,\n            \"xstart_mse\": xstart_mse,\n            \"mse\": mse,\n        }\n\n\ndef _extract_into_tensor(arr, timesteps, broadcast_shape):\n    \"\"\"\n    
Extract values from a 1-D numpy array for a batch of indices.\n    :param arr: the 1-D numpy array.\n    :param timesteps: a tensor of indices into the array to extract.\n    :param broadcast_shape: a larger shape of K dimensions with the batch\n                            dimension equal to the length of timesteps.\n    :return: a tensor of shape [batch_size, 1, ...] where the shape has K dims.\n    \"\"\"\n    res = th.from_numpy(arr).to(device=timesteps.device)[timesteps].float()\n    while len(res.shape) < len(broadcast_shape):\n        res = res[..., None]\n    return res + th.zeros(broadcast_shape, device=timesteps.device)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/hed.py",
    "content": "# This is an improved version and model of HED edge detection with Apache License, Version 2.0.\n# Please use this implementation in your products\n# This implementation may produce slightly different results from Saining Xie's official implementations,\n# but it generates smoother edges and is more suitable for ControlNet as well as other image-to-image translations.\n# Different from official models and other implementations, this is an RGB-input model (rather than BGR)\n# and in this way it works better for gradio's RGB protocol\nimport sys\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent.parent))\nfrom torch import nn\nimport torch\nimport numpy as np\nfrom torchvision import transforms as T\nfrom tqdm import tqdm\nfrom torch.utils.data import Dataset, DataLoader\nimport json\nfrom PIL import Image\nimport torchvision.transforms.functional as TF\nfrom accelerate import Accelerator\nfrom diffusers.models import AutoencoderKL\nimport os\n\nimage_resize = 1024\n\n\nclass DoubleConvBlock(nn.Module):\n    def __init__(self, input_channel, output_channel, layer_number):\n        super().__init__()\n        self.convs = torch.nn.Sequential()\n        self.convs.append(torch.nn.Conv2d(in_channels=input_channel, out_channels=output_channel, kernel_size=(3, 3), stride=(1, 1), padding=1))\n        for i in range(1, layer_number):\n            self.convs.append(torch.nn.Conv2d(in_channels=output_channel, out_channels=output_channel, kernel_size=(3, 3), stride=(1, 1), padding=1))\n        self.projection = torch.nn.Conv2d(in_channels=output_channel, out_channels=1, kernel_size=(1, 1), stride=(1, 1), padding=0)\n\n    def forward(self, x, down_sampling=False):\n        h = x\n        if down_sampling:\n            h = torch.nn.functional.max_pool2d(h, kernel_size=(2, 2), stride=(2, 2))\n        for conv in self.convs:\n            h = conv(h)\n            h = 
torch.nn.functional.relu(h)\n        return h, self.projection(h)\n\n\nclass ControlNetHED_Apache2(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.norm = torch.nn.Parameter(torch.zeros(size=(1, 3, 1, 1)))\n        self.block1 = DoubleConvBlock(input_channel=3, output_channel=64, layer_number=2)\n        self.block2 = DoubleConvBlock(input_channel=64, output_channel=128, layer_number=2)\n        self.block3 = DoubleConvBlock(input_channel=128, output_channel=256, layer_number=3)\n        self.block4 = DoubleConvBlock(input_channel=256, output_channel=512, layer_number=3)\n        self.block5 = DoubleConvBlock(input_channel=512, output_channel=512, layer_number=3)\n\n    def forward(self, x):\n        h = x - self.norm\n        h, projection1 = self.block1(h)\n        h, projection2 = self.block2(h, down_sampling=True)\n        h, projection3 = self.block3(h, down_sampling=True)\n        h, projection4 = self.block4(h, down_sampling=True)\n        h, projection5 = self.block5(h, down_sampling=True)\n        return projection1, projection2, projection3, projection4, projection5\n\n\nclass InternData(Dataset):\n    def __init__(self):\n        ####\n        with open('data/InternData/partition/data_info.json', 'r') as f:\n            self.j = json.load(f)\n        self.transform = T.Compose([\n            T.Lambda(lambda img: img.convert('RGB')),\n            T.Resize(image_resize),  # Image.BICUBIC\n            T.CenterCrop(image_resize),\n            T.ToTensor(),\n        ])\n\n    def __len__(self):\n        return len(self.j)\n\n    def getdata(self, idx):\n\n        path = self.j[idx]['path']\n        image = Image.open(\"data/InternImgs/\" + path)\n        image = self.transform(image)\n        return image, path\n\n    def __getitem__(self, idx):\n        for i in range(20):\n            try:\n                data = self.getdata(idx)\n                return data\n            except Exception as e:\n                print(f\"Error 
details: {str(e)}\")\n                idx = np.random.randint(len(self))\n        raise RuntimeError('Too many bad data.')\n\nclass HEDdetector(nn.Module):\n    def __init__(self, feature=True, vae=None):\n        super().__init__()\n        self.model = ControlNetHED_Apache2()\n        self.model.load_state_dict(torch.load('output/pretrained_models/ControlNetHED.pth', map_location='cpu'))\n        self.model.eval()\n        self.model.requires_grad_(False)\n        if feature:\n            if vae is None:\n                self.vae = AutoencoderKL.from_pretrained(\"output/pretrained_models/sd-vae-ft-ema\")\n            else:\n                self.vae = vae\n            self.vae.eval()\n            self.vae.requires_grad_(False)\n        else:\n            self.vae = None\n\n    def forward(self, input_image):\n        B, C, H, W = input_image.shape\n        with torch.inference_mode():\n            edges = self.model(input_image * 255.)\n            edges = torch.cat([TF.resize(e, [H, W]) for e in edges], dim=1)\n            edge = 1 / (1 + torch.exp(-torch.mean(edges, dim=1, keepdim=True)))\n            edge.clip_(0, 1)\n            if self.vae:\n                edge = TF.normalize(edge, [.5], [.5])\n                edge = edge.repeat(1, 3, 1, 1)\n                posterior = self.vae.encode(edge).latent_dist\n                edge = torch.cat([posterior.mean, posterior.std], dim=1).cpu().numpy()\n        return edge\n\n\ndef main():\n    dataset = InternData()\n    dataloader = DataLoader(dataset, batch_size=10, shuffle=False, num_workers=8, pin_memory=True)\n    hed = HEDdetector()\n\n    accelerator = Accelerator()\n    hed, dataloader = accelerator.prepare(hed, dataloader)\n\n\n    for img, path in tqdm(dataloader):\n        out = hed(img.cuda())\n        for p, o in zip(path, out):\n            save = f'data/InternalData/hed_feature_{image_resize}/' + p.replace('.png', '.npz')\n            if os.path.exists(save):\n                continue\n            
os.makedirs(os.path.dirname(save), exist_ok=True)\n            np.savez_compressed(save, o)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/llava/__init__.py",
    "content": "from diffusion.model.llava.llava_mpt import LlavaMPTForCausalLM, LlavaMPTConfig"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/llava/llava_mpt.py",
    "content": "#    Copyright 2023 Haotian Liu\n#\n#    Licensed under the Apache License, Version 2.0 (the \"License\");\n#    you may not use this file except in compliance with the License.\n#    You may obtain a copy of the License at\n#\n#        http://www.apache.org/licenses/LICENSE-2.0\n#\n#    Unless required by applicable law or agreed to in writing, software\n#    distributed under the License is distributed on an \"AS IS\" BASIS,\n#    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n#    See the License for the specific language governing permissions and\n#    limitations under the License.\n\n\nfrom typing import List, Optional, Tuple, Union\nimport warnings\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch.nn import CrossEntropyLoss\n\nimport math\n\nfrom transformers import AutoConfig, AutoModelForCausalLM, CLIPVisionModel, CLIPImageProcessor\n\nfrom transformers.modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast\n\nfrom diffusion.model.llava.mpt.modeling_mpt import MPTConfig, MPTForCausalLM, MPTModel\n\n\nDEFAULT_IMAGE_TOKEN = \"<image>\"\nDEFAULT_IMAGE_PATCH_TOKEN = \"<im_patch>\"\nDEFAULT_IM_START_TOKEN = \"<im_start>\"\nDEFAULT_IM_END_TOKEN = \"<im_end>\"\n\n\nclass LlavaMPTConfig(MPTConfig):\n    model_type = \"llava_mpt\"\n\n\nclass LlavaMPTModel(MPTModel):\n    config_class = LlavaMPTConfig\n\n    def __init__(self, config: MPTConfig, mm_vision_tower=None, mm_hidden_size=None):\n        super(LlavaMPTModel, self).__init__(config)\n\n        if hasattr(config, \"mm_vision_tower\"):\n            # HACK: for FSDP\n            self.vision_tower = [CLIPVisionModel.from_pretrained(config.mm_vision_tower)]\n            # self.vision_tower = CLIPVisionModel.from_pretrained(config.mm_vision_tower)\n\n        if hasattr(config, \"use_mm_proj\"):\n            self.mm_projector = nn.Linear(config.mm_hidden_size, config.d_model)\n\n    def initialize_vision_modules(self, 
vision_tower, mm_vision_select_layer,\n                                  pretrain_mm_mlp_adapter=None, tune_mm_mlp_adapter=False):\n        self.config.mm_vision_tower = vision_tower\n\n        image_processor = CLIPImageProcessor.from_pretrained(vision_tower)\n\n        if not hasattr(self, 'vision_tower'):\n            vision_tower = CLIPVisionModel.from_pretrained(vision_tower)\n        else:\n            vision_tower = self.vision_tower[0]\n        vision_tower.requires_grad_(False)\n        vision_tower = vision_tower.to(torch.float16)\n        self.vision_tower = [vision_tower]\n\n        vision_config = vision_tower.config\n        num_patches = (vision_config.image_size // vision_config.patch_size) ** 2\n\n        self.config.use_mm_proj = True\n        self.config.mm_hidden_size = vision_config.hidden_size\n        self.config.mm_vision_select_layer = mm_vision_select_layer\n\n        if not hasattr(self, 'mm_projector'):\n            self.mm_projector = nn.Linear(vision_config.hidden_size, self.config.d_model)\n\n        if pretrain_mm_mlp_adapter is not None:\n            mm_projector_weights = torch.load(pretrain_mm_mlp_adapter, map_location='cpu')\n            self.mm_projector.load_state_dict({k.split('.')[-1]: v for k, v in mm_projector_weights.items() if 'mm_projector' in k})\n\n        return dict(\n            image_processor=image_processor,\n            image_token_len=num_patches,\n            vision_config=vision_config\n        )\n\n    def forward(self, input_ids: torch.LongTensor, past_key_values: Optional[List[Tuple[torch.FloatTensor]]]=None, attention_mask: Optional[torch.ByteTensor]=None, prefix_mask: Optional[torch.ByteTensor]=None, sequence_id: Optional[torch.LongTensor]=None, return_dict: Optional[bool]=None, output_attentions: Optional[bool]=None, output_hidden_states: Optional[bool]=None, use_cache: Optional[bool]=None, images=None):\n\n        # HACK: replace back original embeddings for LLaVA pretraining\n        
orig_embeds_params = getattr(self, 'orig_embeds_params', None)\n        # if orig_embeds_params is not None:\n        #     orig_embeds_params = orig_embeds_params[0]\n        #     with torch.no_grad():\n        #         self.get_input_embeddings().weight.data[:-2] = orig_embeds_params[:-2].data\n\n        inputs_embeds = self.wte(input_ids)\n\n        vision_tower = getattr(self, 'vision_tower', None)\n        if vision_tower is not None and (input_ids.shape[1] != 1 or self.training) and images is not None:\n            # TODO: this is a modified multimodal LLM -- Haotian Liu\n            vision_tower = vision_tower[0]  # HACK: for FSDP\n            with torch.no_grad():\n                if type(images) is list:\n                    # variable length images\n                    image_features = []\n                    for image in images:\n                        image_forward_out = vision_tower(image.unsqueeze(0), output_hidden_states=True)\n                        select_hidden_state_layer = getattr(self.config, \"mm_vision_select_layer\", -1)\n                        select_hidden_state = image_forward_out.hidden_states[select_hidden_state_layer]\n                        image_feature = select_hidden_state[:, 1:]\n                        image_features.append(image_feature)\n                else:\n                    image_forward_outs = vision_tower(images, output_hidden_states=True)\n                    select_hidden_state_layer = getattr(self.config, \"mm_vision_select_layer\", -1)\n                    select_hidden_state = image_forward_outs.hidden_states[select_hidden_state_layer]\n                    image_features = select_hidden_state[:, 1:]\n            if type(images) is list:\n                image_features = [self.mm_projector(image_feature)[0] for image_feature in image_features]\n            else:\n                image_features = self.mm_projector(image_features)\n            dummy_image_features = torch.zeros(256, 1024, 
device=inputs_embeds.device, dtype=inputs_embeds.dtype)\n            dummy_image_features = self.mm_projector(dummy_image_features)\n\n            new_input_embeds = []\n            cur_image_idx = 0\n            for cur_input_ids, cur_input_embeds in zip(input_ids, inputs_embeds):\n                if (cur_input_ids == vision_tower.config.im_patch_token).sum() == 0:\n                    # multimodal LLM, but the current sample is not multimodal\n                    cur_input_embeds = cur_input_embeds + (0. * dummy_image_features).sum()\n                    new_input_embeds.append(cur_input_embeds)\n                    continue\n                cur_image_features = image_features[cur_image_idx]\n                num_patches = cur_image_features.shape[0]\n                if vision_tower.config.use_im_start_end:\n                    if (cur_input_ids == vision_tower.config.im_start_token).sum() != (cur_input_ids == vision_tower.config.im_end_token).sum():\n                        raise ValueError(\"The number of image start tokens and image end tokens should be the same.\")\n                    image_start_tokens = torch.where(cur_input_ids == vision_tower.config.im_start_token)[0]\n                    for image_start_token_pos in image_start_tokens:\n                        cur_image_features = image_features[cur_image_idx].to(device=cur_input_embeds.device)\n                        num_patches = cur_image_features.shape[0]\n                        if cur_input_ids[image_start_token_pos + num_patches + 1] != vision_tower.config.im_end_token:\n                            raise ValueError(\"The image end token should follow the image start token.\")\n                        if orig_embeds_params is not None:\n                            cur_new_input_embeds = torch.cat((cur_input_embeds[:image_start_token_pos].detach(), cur_input_embeds[image_start_token_pos:image_start_token_pos+1], cur_image_features, cur_input_embeds[image_start_token_pos + num_patches + 
1:image_start_token_pos + num_patches + 2], cur_input_embeds[image_start_token_pos + num_patches + 2:].detach()), dim=0)\n                        else:\n                            cur_new_input_embeds = torch.cat((cur_input_embeds[:image_start_token_pos+1], cur_image_features, cur_input_embeds[image_start_token_pos + num_patches + 1:]), dim=0)\n                        cur_image_idx += 1\n                else:\n                    if (cur_input_ids == vision_tower.config.im_patch_token).sum() != num_patches:\n                        raise ValueError(\"The number of image patch tokens should be the same as the number of image patches.\")\n                    masked_indices = torch.where(cur_input_ids == vision_tower.config.im_patch_token)[0]\n                    mask_index_start = masked_indices[0]\n                    if (masked_indices != torch.arange(mask_index_start, mask_index_start+num_patches, device=masked_indices.device, dtype=masked_indices.dtype)).any():\n                        raise ValueError(\"The image patch tokens should be consecutive.\")\n                    if orig_embeds_params is not None:\n                        cur_new_input_embeds = torch.cat((cur_input_embeds[:mask_index_start].detach(), cur_image_features, cur_input_embeds[mask_index_start+num_patches:].detach()), dim=0)\n                    else:\n                        cur_new_input_embeds = torch.cat((cur_input_embeds[:mask_index_start], cur_image_features, cur_input_embeds[mask_index_start+num_patches:]), dim=0)\n                new_input_embeds.append(cur_new_input_embeds)\n            inputs_embeds = torch.stack(new_input_embeds, dim=0)\n\n        return super(LlavaMPTModel, self).forward(input_ids=None, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache, tok_emb=inputs_embeds)\n\n\nclass 
LlavaMPTForCausalLM(MPTForCausalLM):\n    config_class = LlavaMPTConfig\n    supports_gradient_checkpointing = True\n\n    def __init__(self, config):\n        super(MPTForCausalLM, self).__init__(config)\n\n        if not config.tie_word_embeddings:\n            raise ValueError('MPTForCausalLM only supports tied word embeddings')\n        self.transformer = LlavaMPTModel(config)\n        self.logit_scale = None\n        if config.logit_scale is not None:\n            logit_scale = config.logit_scale\n            if isinstance(logit_scale, str):\n                if logit_scale == 'inv_sqrt_d_model':\n                    logit_scale = 1 / math.sqrt(config.d_model)\n                else:\n                    raise ValueError(f\"logit_scale={logit_scale!r} is not recognized as an option; use numeric value or 'inv_sqrt_d_model'.\")\n            self.logit_scale = logit_scale\n\n    def get_model(self):\n        return self.transformer\n\n    def _set_gradient_checkpointing(self, module, value=False):\n        if isinstance(module, LlavaMPTModel):\n            module.gradient_checkpointing = value\n\n    def forward(self, input_ids: torch.LongTensor, past_key_values: Optional[List[Tuple[torch.FloatTensor]]]=None, attention_mask: Optional[torch.ByteTensor]=None, prefix_mask: Optional[torch.ByteTensor]=None, sequence_id: Optional[torch.LongTensor]=None, labels: Optional[torch.LongTensor]=None, return_dict: Optional[bool]=None, output_attentions: Optional[bool]=None, output_hidden_states: Optional[bool]=None, use_cache: Optional[bool]=None, images=None):\n        return_dict = return_dict if return_dict is not None else self.config.return_dict\n        use_cache = use_cache if use_cache is not None else self.config.use_cache\n        outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, 
output_hidden_states=output_hidden_states, use_cache=use_cache, images=images)\n        logits = F.linear(outputs.last_hidden_state, self.transformer.wte.weight)\n        if self.logit_scale is not None:\n            if self.logit_scale == 0:\n                warnings.warn(f'Multiplying logits by self.logit_scale={self.logit_scale!r}. This will produce uniform (uninformative) outputs.')\n            logits *= self.logit_scale\n        loss = None\n        if labels is not None:\n            labels = torch.roll(labels, shifts=-1)\n            labels[:, -1] = -100\n            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.to(logits.device).view(-1))\n        return CausalLMOutputWithPast(loss=loss, logits=logits, past_key_values=outputs.past_key_values, hidden_states=outputs.hidden_states)\n\n    def prepare_inputs_for_generation(self, input_ids, past_key_values=None, inputs_embeds=None, **kwargs):\n        if inputs_embeds is not None:\n            raise NotImplementedError('inputs_embeds is not implemented for MPT yet')\n        attention_mask = kwargs['attention_mask'].bool()\n        if attention_mask[:, -1].sum() != attention_mask.shape[0]:\n            raise NotImplementedError('MPT does not support generation with right padding.')\n        if self.transformer.attn_uses_sequence_id and self.training:\n            sequence_id = torch.zeros_like(input_ids[:1])\n        else:\n            sequence_id = None\n        if past_key_values is not None:\n            input_ids = input_ids[:, -1].unsqueeze(-1)\n        if self.transformer.prefix_lm:\n            prefix_mask = torch.ones_like(attention_mask)\n            if kwargs.get('use_cache') == False:\n                raise NotImplementedError('MPT with prefix_lm=True does not support use_cache=False.')\n        else:\n            prefix_mask = None\n        return {'input_ids': input_ids, 'attention_mask': attention_mask, 'prefix_mask': prefix_mask, 'sequence_id': sequence_id, 'past_key_values': 
past_key_values, 'use_cache': kwargs.get('use_cache', True), \"images\": kwargs.get(\"images\", None)}\n\n    def initialize_vision_tokenizer(self, mm_use_im_start_end, tokenizer, device,\n                                    tune_mm_mlp_adapter=False, pretrain_mm_mlp_adapter=None):\n        vision_config = self.get_model().vision_tower[0].config\n        vision_config.use_im_start_end = mm_use_im_start_end\n        tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True)\n        self.resize_token_embeddings(len(tokenizer))\n\n        if mm_use_im_start_end:\n            num_new_tokens = tokenizer.add_tokens([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True)\n            self.resize_token_embeddings(len(tokenizer))\n            vision_config.im_start_token, vision_config.im_end_token = tokenizer.convert_tokens_to_ids([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN])\n\n            if num_new_tokens > 0:\n                input_embeddings = (\n                    self._extracted_from_initialize_vision_tokenizer_14(\n                        num_new_tokens\n                    )\n                )\n            if tune_mm_mlp_adapter:\n                self.get_model().orig_embeds_params = [self.get_input_embeddings().weight.data.clone().to(device=device)]\n                for p in self.get_input_embeddings().parameters():\n                    p.requires_grad = True\n                for p in self.get_output_embeddings().parameters():\n                    p.requires_grad = False\n\n            if pretrain_mm_mlp_adapter:\n                mm_projector_weights = torch.load(pretrain_mm_mlp_adapter, map_location='cpu')\n                embed_tokens_weight = mm_projector_weights['transformer.wte.weight']\n                assert num_new_tokens == 2\n                if input_embeddings.shape == embed_tokens_weight.shape:\n                    input_embeddings[-num_new_tokens:] = embed_tokens_weight[-num_new_tokens:]\n                elif 
embed_tokens_weight.shape[0] == num_new_tokens:\n                    input_embeddings[-num_new_tokens:] = embed_tokens_weight\n                else:\n                    raise ValueError(f\"Unexpected embed_tokens_weight shape. Pretrained: {embed_tokens_weight.shape}. Current: {input_embeddings.shape}. Number of new tokens: {num_new_tokens}.\")\n\n        vision_config.im_patch_token = tokenizer.convert_tokens_to_ids([DEFAULT_IMAGE_PATCH_TOKEN])[0]\n\n    # TODO Rename this here and in `initialize_vision_tokenizer`\n    def _extracted_from_initialize_vision_tokenizer_14(self, num_new_tokens):\n        result = self.get_input_embeddings().weight.data\n        output_embeddings = self.get_output_embeddings().weight.data\n\n        input_embeddings_avg = result[:-num_new_tokens].mean(dim=0, keepdim=True)\n        output_embeddings_avg = output_embeddings[:-num_new_tokens].mean(\n            dim=0, keepdim=True)\n\n        result[-num_new_tokens:] = input_embeddings_avg\n        output_embeddings[-num_new_tokens:] = output_embeddings_avg\n\n        return result\n\nAutoConfig.register(\"llava_mpt\", LlavaMPTConfig)\nAutoModelForCausalLM.register(LlavaMPTConfig, LlavaMPTForCausalLM)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/llava/mpt/attention.py",
    "content": "\"\"\"Attention layers.\"\"\"\nimport math\nimport warnings\nfrom typing import Optional\nimport torch\nimport torch.nn as nn\nfrom einops import rearrange\nfrom torch import nn\nfrom .norm import LPLayerNorm\n\ndef _reset_is_causal(num_query_tokens: int, num_key_tokens: int, original_is_causal: bool):\n    if original_is_causal and num_query_tokens != num_key_tokens:\n        if num_query_tokens != 1:\n            raise NotImplementedError('MPT does not support query and key with different number of tokens, unless number of query tokens is 1.')\n        else:\n            return False\n    return original_is_causal\n\ndef scaled_multihead_dot_product_attention(query, key, value, n_heads, softmax_scale=None, attn_bias=None, key_padding_mask=None, is_causal=False, dropout_p=0.0, training=False, needs_weights=False, multiquery=False):\n    q = rearrange(query, 'b s (h d) -> b h s d', h=n_heads)\n    k = rearrange(key, 'b s (h d) -> b h d s', h=1 if multiquery else n_heads)\n    v = rearrange(value, 'b s (h d) -> b h s d', h=1 if multiquery else n_heads)\n    min_val = torch.finfo(q.dtype).min\n    (b, _, s_q, d) = q.shape\n    s_k = k.size(-1)\n    if softmax_scale is None:\n        softmax_scale = 1 / math.sqrt(d)\n    attn_weight = q.matmul(k) * softmax_scale\n    if attn_bias is not None:\n        if attn_bias.size(-1) not in [1, s_k] or attn_bias.size(-2) not in [\n            1,\n            s_q,\n        ]:\n            raise RuntimeError(f'attn_bias (shape: {attn_bias.shape}) is expected to broadcast to shape: {attn_weight.shape}.')\n        attn_weight = attn_weight + attn_bias\n    if key_padding_mask is not None:\n        if attn_bias is not None:\n            warnings.warn('Propogating key_padding_mask to the attention module ' + 'and applying it within the attention module can cause ' + 'unneccessary computation/memory usage. 
Consider integrating ' + 'into attn_bias once and passing that to each attention ' + 'module instead.')\n        attn_weight = attn_weight.masked_fill(~key_padding_mask.view((b, 1, 1, s_k)), min_val)\n    if is_causal:\n        s = max(s_q, s_k)\n        causal_mask = attn_weight.new_ones(s, s, dtype=torch.float16)\n        causal_mask = causal_mask.tril()\n        causal_mask = causal_mask.to(torch.bool)\n        causal_mask = ~causal_mask\n        causal_mask = causal_mask[-s_q:, -s_k:]\n        attn_weight = attn_weight.masked_fill(causal_mask.view(1, 1, s_q, s_k), min_val)\n    attn_weight = torch.softmax(attn_weight, dim=-1)\n    if dropout_p:\n        attn_weight = torch.nn.functional.dropout(attn_weight, p=dropout_p, training=training, inplace=True)\n    out = attn_weight.matmul(v)\n    out = rearrange(out, 'b h s d -> b s (h d)')\n    return (out, attn_weight) if needs_weights else (out, None)\n\ndef check_valid_inputs(*tensors, valid_dtypes=None):\n    if valid_dtypes is None:\n        valid_dtypes = [torch.float16, torch.bfloat16]\n    for tensor in tensors:\n        if tensor.dtype not in valid_dtypes:\n            raise TypeError(f'tensor.dtype={tensor.dtype!r} must be in valid_dtypes={valid_dtypes!r}.')\n        if not tensor.is_cuda:\n            raise TypeError(f'Inputs must be cuda tensors (tensor.is_cuda={tensor.is_cuda!r}).')\n\ndef flash_attn_fn(query, key, value, n_heads, softmax_scale=None, attn_bias=None, key_padding_mask=None, is_causal=False, dropout_p=0.0, training=False, needs_weights=False, multiquery=False):\n    try:\n        from flash_attn import bert_padding, flash_attn_interface\n    except:\n        raise RuntimeError('Please install flash-attn==1.0.3.post0')\n    check_valid_inputs(query, key, value)\n    if attn_bias is not None:\n        raise NotImplementedError('attn_bias not implemented for flash attn.')\n    (batch_size, seqlen) = query.shape[:2]\n    if key_padding_mask is None:\n        key_padding_mask = 
torch.ones_like(key[:, :, 0], dtype=torch.bool)\n    query_padding_mask = key_padding_mask[:, -query.size(1):]\n    (query_unpad, indices_q, cu_seqlens_q, max_seqlen_q) = bert_padding.unpad_input(query, query_padding_mask)\n    query_unpad = rearrange(query_unpad, 'nnz (h d) -> nnz h d', h=n_heads)\n    (key_unpad, _, cu_seqlens_k, max_seqlen_k) = bert_padding.unpad_input(key, key_padding_mask)\n    key_unpad = rearrange(key_unpad, 'nnz (h d) -> nnz h d', h=1 if multiquery else n_heads)\n    (value_unpad, _, _, _) = bert_padding.unpad_input(value, key_padding_mask)\n    value_unpad = rearrange(value_unpad, 'nnz (h d) -> nnz h d', h=1 if multiquery else n_heads)\n    if multiquery:\n        key_unpad = key_unpad.expand(key_unpad.size(0), n_heads, key_unpad.size(-1))\n        value_unpad = value_unpad.expand(value_unpad.size(0), n_heads, value_unpad.size(-1))\n    dropout_p = dropout_p if training else 0.0\n    reset_is_causal = _reset_is_causal(query.size(1), key.size(1), is_causal)\n    output_unpad = flash_attn_interface.flash_attn_unpadded_func(query_unpad, key_unpad, value_unpad, cu_seqlens_q, cu_seqlens_k, max_seqlen_q, max_seqlen_k, dropout_p, softmax_scale=softmax_scale, causal=reset_is_causal, return_attn_probs=needs_weights)\n    output = bert_padding.pad_input(rearrange(output_unpad, 'nnz h d -> nnz (h d)'), indices_q, batch_size, seqlen)\n    return (output, None)\n\ndef triton_flash_attn_fn(query, key, value, n_heads, softmax_scale=None, attn_bias=None, key_padding_mask=None, is_causal=False, dropout_p=0.0, training=False, needs_weights=False, multiquery=False):\n    try:\n        from flash_attn import flash_attn_triton\n    except:\n        raise RuntimeError('Please install flash-attn==1.0.3.post0 and triton==2.0.0.dev20221202')\n    check_valid_inputs(query, key, value)\n    if dropout_p:\n        raise NotImplementedError('Dropout not implemented for attn_impl: triton.')\n    if needs_weights:\n        raise NotImplementedError('attn_impl: triton 
cannot return attn weights.')\n    if key_padding_mask is not None:\n        warnings.warn('Propagating key_padding_mask to the attention module ' + 'and applying it within the attention module can cause ' + 'unnecessary computation/memory usage. Consider integrating ' + 'into attn_bias once and passing that to each attention ' + 'module instead.')\n        (b_size, s_k) = key_padding_mask.shape[:2]\n        if attn_bias is None:\n            attn_bias = query.new_zeros(b_size, 1, 1, s_k)\n        attn_bias = attn_bias.masked_fill(~key_padding_mask.view((b_size, 1, 1, s_k)), torch.finfo(query.dtype).min)\n    query = rearrange(query, 'b s (h d) -> b s h d', h=n_heads)\n    key = rearrange(key, 'b s (h d) -> b s h d', h=1 if multiquery else n_heads)\n    value = rearrange(value, 'b s (h d) -> b s h d', h=1 if multiquery else n_heads)\n    if multiquery:\n        key = key.expand(*key.shape[:2], n_heads, key.size(-1))\n        value = value.expand(*value.shape[:2], n_heads, value.size(-1))\n    reset_is_causal = _reset_is_causal(query.size(1), key.size(1), is_causal)\n    attn_output = flash_attn_triton.flash_attn_func(query, key, value, attn_bias, reset_is_causal, softmax_scale)\n    output = attn_output.view(*attn_output.shape[:2], -1)\n    return (output, None)\n\nclass MultiheadAttention(nn.Module):\n    \"\"\"Multi-head self attention.\n\n    Using the torch or triton attention implementation also enables the user\n    to apply an additive bias.\n    \"\"\"\n\n    def __init__(self, d_model: int, n_heads: int, attn_impl: str='triton', clip_qkv: Optional[float]=None, qk_ln: bool=False, softmax_scale: Optional[float]=None, attn_pdrop: float=0.0, low_precision_layernorm: bool=False, device: Optional[str]=None):\n        super().__init__()\n        self.attn_impl = attn_impl\n        self.clip_qkv = clip_qkv\n        self.qk_ln = qk_ln\n        self.d_model = d_model\n        self.n_heads = n_heads\n        self.softmax_scale = softmax_scale\n        if self.softmax_scale is 
None:\n            self.softmax_scale = 1 / math.sqrt(self.d_model / self.n_heads)\n        self.attn_dropout_p = attn_pdrop\n        self.Wqkv = nn.Linear(self.d_model, 3 * self.d_model, device=device)\n        fuse_splits = (d_model, 2 * d_model)\n        self.Wqkv._fused = (0, fuse_splits)\n        if self.qk_ln:\n            layernorm_class = LPLayerNorm if low_precision_layernorm else nn.LayerNorm\n            self.q_ln = layernorm_class(self.d_model, device=device)\n            self.k_ln = layernorm_class(self.d_model, device=device)\n        if self.attn_impl == 'flash':\n            self.attn_fn = flash_attn_fn\n        elif self.attn_impl == 'triton':\n            self.attn_fn = triton_flash_attn_fn\n            warnings.warn('While `attn_impl: triton` can be faster than `attn_impl: flash` ' + 'it uses more memory. When training larger models this can trigger ' + 'alloc retries which hurts performance. If encountered, we recommend ' + 'using `attn_impl: flash` if your model does not use `alibi` or `prefix_lm`.')\n        elif self.attn_impl == 'torch':\n            self.attn_fn = scaled_multihead_dot_product_attention\n            if torch.cuda.is_available():\n                warnings.warn('Using `attn_impl: torch`. 
If your model does not use `alibi` or ' + '`prefix_lm` we recommend using `attn_impl: flash` otherwise ' + 'we recommend using `attn_impl: triton`.')\n        else:\n            raise ValueError(f'attn_impl={attn_impl!r} is an invalid setting.')\n        self.out_proj = nn.Linear(self.d_model, self.d_model, device=device)\n        self.out_proj._is_residual = True\n\n    def forward(self, x, past_key_value=None, attn_bias=None, attention_mask=None, is_causal=True, needs_weights=False):\n        qkv = self.Wqkv(x)\n        if self.clip_qkv:\n            qkv.clamp_(min=-self.clip_qkv, max=self.clip_qkv)\n        (query, key, value) = qkv.chunk(3, dim=2)\n        key_padding_mask = attention_mask\n        if self.qk_ln:\n            dtype = query.dtype\n            query = self.q_ln(query).to(dtype)\n            key = self.k_ln(key).to(dtype)\n        if past_key_value is not None:\n            if len(past_key_value) != 0:\n                key = torch.cat([past_key_value[0], key], dim=1)\n                value = torch.cat([past_key_value[1], value], dim=1)\n            past_key_value = (key, value)\n        if attn_bias is not None:\n            attn_bias = attn_bias[:, :, -query.size(1):, -key.size(1):]\n        (context, attn_weights) = self.attn_fn(query, key, value, self.n_heads, softmax_scale=self.softmax_scale, attn_bias=attn_bias, key_padding_mask=key_padding_mask, is_causal=is_causal, dropout_p=self.attn_dropout_p, training=self.training, needs_weights=needs_weights)\n        return (self.out_proj(context), attn_weights, past_key_value)\n\nclass MultiQueryAttention(nn.Module):\n    \"\"\"Multi-Query self attention.\n\n    Using the torch or triton attention implementation also enables the user\n    to apply an additive bias.\n    \"\"\"\n\n    def __init__(self, d_model: int, n_heads: int, attn_impl: str='triton', clip_qkv: Optional[float]=None, qk_ln: bool=False, softmax_scale: Optional[float]=None, attn_pdrop: float=0.0, low_precision_layernorm: bool=False, device: 
Optional[str]=None):\n        super().__init__()\n        self.attn_impl = attn_impl\n        self.clip_qkv = clip_qkv\n        self.qk_ln = qk_ln\n        self.d_model = d_model\n        self.n_heads = n_heads\n        self.head_dim = d_model // n_heads\n        self.softmax_scale = softmax_scale\n        if self.softmax_scale is None:\n            self.softmax_scale = 1 / math.sqrt(self.head_dim)\n        self.attn_dropout_p = attn_pdrop\n        self.Wqkv = nn.Linear(d_model, d_model + 2 * self.head_dim, device=device)\n        fuse_splits = (d_model, d_model + self.head_dim)\n        self.Wqkv._fused = (0, fuse_splits)\n        if self.qk_ln:\n            layernorm_class = LPLayerNorm if low_precision_layernorm else nn.LayerNorm\n            self.q_ln = layernorm_class(d_model, device=device)\n            self.k_ln = layernorm_class(self.head_dim, device=device)\n        if self.attn_impl == 'flash':\n            self.attn_fn = flash_attn_fn\n        elif self.attn_impl == 'triton':\n            self.attn_fn = triton_flash_attn_fn\n            warnings.warn('While `attn_impl: triton` can be faster than `attn_impl: flash` ' + 'it uses more memory. When training larger models this can trigger ' + 'alloc retries which hurts performance. If encountered, we recommend ' + 'using `attn_impl: flash` if your model does not use `alibi` or `prefix_lm`.')\n        elif self.attn_impl == 'torch':\n            self.attn_fn = scaled_multihead_dot_product_attention\n            if torch.cuda.is_available():\n                warnings.warn('Using `attn_impl: torch`. 
If your model does not use `alibi` or ' + '`prefix_lm` we recommend using `attn_impl: flash` otherwise ' + 'we recommend using `attn_impl: triton`.')\n        else:\n            raise ValueError(f'attn_impl={attn_impl!r} is an invalid setting.')\n        self.out_proj = nn.Linear(self.d_model, self.d_model, device=device)\n        self.out_proj._is_residual = True\n\n    def forward(self, x, past_key_value=None, attn_bias=None, attention_mask=None, is_causal=True, needs_weights=False):\n        qkv = self.Wqkv(x)\n        if self.clip_qkv:\n            qkv.clamp_(min=-self.clip_qkv, max=self.clip_qkv)\n        (query, key, value) = qkv.split([self.d_model, self.head_dim, self.head_dim], dim=2)\n        key_padding_mask = attention_mask\n        if self.qk_ln:\n            dtype = query.dtype\n            query = self.q_ln(query).to(dtype)\n            key = self.k_ln(key).to(dtype)\n        if past_key_value is not None:\n            if len(past_key_value) != 0:\n                key = torch.cat([past_key_value[0], key], dim=1)\n                value = torch.cat([past_key_value[1], value], dim=1)\n            past_key_value = (key, value)\n        if attn_bias is not None:\n            attn_bias = attn_bias[:, :, -query.size(1):, -key.size(1):]\n        (context, attn_weights) = self.attn_fn(query, key, value, self.n_heads, softmax_scale=self.softmax_scale, attn_bias=attn_bias, key_padding_mask=key_padding_mask, is_causal=is_causal, dropout_p=self.attn_dropout_p, training=self.training, needs_weights=needs_weights, multiquery=True)\n        return (self.out_proj(context), attn_weights, past_key_value)\n\ndef attn_bias_shape(attn_impl, n_heads, seq_len, alibi, prefix_lm, causal, use_sequence_id):\n    if attn_impl == 'flash':\n        return None\n    elif attn_impl in ['torch', 'triton']:\n        if alibi:\n            if (prefix_lm or not causal) or use_sequence_id:\n                return (1, n_heads, seq_len, seq_len)\n            return (1, n_heads, 1, 
seq_len)\n        elif prefix_lm or use_sequence_id:\n            return (1, 1, seq_len, seq_len)\n        return None\n    else:\n        raise ValueError(f'attn_impl={attn_impl!r} is an invalid setting.')\n\ndef build_attn_bias(attn_impl, attn_bias, n_heads, seq_len, causal=False, alibi=False, alibi_bias_max=8):\n    if attn_impl == 'flash':\n        return None\n    elif attn_impl in ['torch', 'triton']:\n        if alibi:\n            (device, dtype) = (attn_bias.device, attn_bias.dtype)\n            attn_bias = attn_bias.add(build_alibi_bias(n_heads, seq_len, full=not causal, alibi_bias_max=alibi_bias_max, device=device, dtype=dtype))\n        return attn_bias\n    else:\n        raise ValueError(f'attn_impl={attn_impl!r} is an invalid setting.')\n\ndef gen_slopes(n_heads, alibi_bias_max=8, device=None):\n    _n_heads = 2 ** math.ceil(math.log2(n_heads))\n    m = torch.arange(1, _n_heads + 1, dtype=torch.float32, device=device)\n    m = m.mul(alibi_bias_max / _n_heads)\n    slopes = 1.0 / torch.pow(2, m)\n    if _n_heads != n_heads:\n        slopes = torch.concat([slopes[1::2], slopes[::2]])[:n_heads]\n    return slopes.view(1, n_heads, 1, 1)\n\ndef build_alibi_bias(n_heads, seq_len, full=False, alibi_bias_max=8, device=None, dtype=None):\n    alibi_bias = torch.arange(1 - seq_len, 1, dtype=torch.int32, device=device).view(1, 1, 1, seq_len)\n    if full:\n        alibi_bias = alibi_bias - torch.arange(1 - seq_len, 1, dtype=torch.int32, device=device).view(1, 1, seq_len, 1)\n        alibi_bias = alibi_bias.abs().mul(-1)\n    slopes = gen_slopes(n_heads, alibi_bias_max, device=device)\n    alibi_bias = alibi_bias * slopes\n    return alibi_bias.to(dtype=dtype)\nATTN_CLASS_REGISTRY = {'multihead_attention': MultiheadAttention, 'multiquery_attention': MultiQueryAttention}"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/llava/mpt/blocks.py",
    "content": "\"\"\"GPT Blocks used for the GPT Model.\"\"\"\nfrom typing import Dict, Optional, Tuple\nimport torch\nimport torch.nn as nn\nfrom .attention import ATTN_CLASS_REGISTRY\nfrom .norm import NORM_CLASS_REGISTRY\n\nclass MPTMLP(nn.Module):\n\n    def __init__(self, d_model: int, expansion_ratio: int, device: Optional[str]=None):\n        super().__init__()\n        self.up_proj = nn.Linear(d_model, expansion_ratio * d_model, device=device)\n        self.act = nn.GELU(approximate='none')\n        self.down_proj = nn.Linear(expansion_ratio * d_model, d_model, device=device)\n        self.down_proj._is_residual = True\n\n    def forward(self, x):\n        return self.down_proj(self.act(self.up_proj(x)))\n\nclass MPTBlock(nn.Module):\n\n    def __init__(self, d_model: int, n_heads: int, expansion_ratio: int, attn_config: Dict = None, resid_pdrop: float=0.0, norm_type: str='low_precision_layernorm', device: Optional[str]=None, **kwargs):\n        if attn_config is None:\n            attn_config = {\n                'attn_type': 'multihead_attention',\n                'attn_pdrop': 0.0,\n                'attn_impl': 'triton',\n                'qk_ln': False,\n                'clip_qkv': None,\n                'softmax_scale': None,\n                'prefix_lm': False,\n                'attn_uses_sequence_id': False,\n                'alibi': False,\n                'alibi_bias_max': 8,\n            }\n        del kwargs\n        super().__init__()\n        norm_class = NORM_CLASS_REGISTRY[norm_type.lower()]\n        attn_class = ATTN_CLASS_REGISTRY[attn_config['attn_type']]\n        self.norm_1 = norm_class(d_model, device=device)\n        self.attn = attn_class(attn_impl=attn_config['attn_impl'], clip_qkv=attn_config['clip_qkv'], qk_ln=attn_config['qk_ln'], softmax_scale=attn_config['softmax_scale'], attn_pdrop=attn_config['attn_pdrop'], d_model=d_model, n_heads=n_heads, device=device)\n        self.norm_2 = norm_class(d_model, device=device)\n        
self.ffn = MPTMLP(d_model=d_model, expansion_ratio=expansion_ratio, device=device)\n        self.resid_attn_dropout = nn.Dropout(resid_pdrop)\n        self.resid_ffn_dropout = nn.Dropout(resid_pdrop)\n\n    def forward(self, x: torch.Tensor, past_key_value: Optional[Tuple[torch.Tensor]]=None, attn_bias: Optional[torch.Tensor]=None, attention_mask: Optional[torch.ByteTensor]=None, is_causal: bool=True) -> Tuple[torch.Tensor, Optional[Tuple[torch.Tensor]]]:\n        a = self.norm_1(x)\n        (b, _, past_key_value) = self.attn(a, past_key_value=past_key_value, attn_bias=attn_bias, attention_mask=attention_mask, is_causal=is_causal)\n        x = x + self.resid_attn_dropout(b)\n        m = self.norm_2(x)\n        n = self.ffn(m)\n        x = x + self.resid_ffn_dropout(n)\n        return (x, past_key_value)"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/llava/mpt/configuration_mpt.py",
    "content": "\"\"\"A HuggingFace-style model configuration.\"\"\"\nfrom typing import Dict, Optional, Union\nfrom transformers import PretrainedConfig\nattn_config_defaults: Dict = {'attn_type': 'multihead_attention', 'attn_pdrop': 0.0, 'attn_impl': 'triton', 'qk_ln': False, 'clip_qkv': None, 'softmax_scale': None, 'prefix_lm': False, 'attn_uses_sequence_id': False, 'alibi': False, 'alibi_bias_max': 8}\ninit_config_defaults: Dict = {'name': 'kaiming_normal_', 'fan_mode': 'fan_in', 'init_nonlinearity': 'relu'}\n\nclass MPTConfig(PretrainedConfig):\n    model_type = 'mpt'\n\n    def __init__(self, d_model: int=2048, n_heads: int=16, n_layers: int=24, expansion_ratio: int=4, max_seq_len: int=2048, vocab_size: int=50368, resid_pdrop: float=0.0, emb_pdrop: float=0.0, learned_pos_emb: bool=True, attn_config: Dict=attn_config_defaults, init_device: str='cpu', logit_scale: Optional[Union[float, str]]=None, no_bias: bool=False, verbose: int=0, embedding_fraction: float=1.0, norm_type: str='low_precision_layernorm', use_cache: bool=False, init_config: Dict=init_config_defaults, **kwargs):\n        \"\"\"The MPT configuration class.\n\n        Args:\n            d_model (int): The size of the embedding dimension of the model.\n            n_heads (int): The number of attention heads.\n            n_layers (int): The number of layers in the model.\n            expansion_ratio (int): The ratio of the up/down scale in the MLP.\n            max_seq_len (int): The maximum sequence length of the model.\n            vocab_size (int): The size of the vocabulary.\n            resid_pdrop (float): The dropout probability applied to the attention output before combining with residual.\n            emb_pdrop (float): The dropout probability for the embedding layer.\n            learned_pos_emb (bool): Whether to use learned positional embeddings\n            attn_config (Dict):  A dictionary used to configure the model's attention module:\n                attn_type (str): type of 
attention to use. Options: multihead_attention, multiquery_attention\n                attn_pdrop (float): The dropout probability for the attention layers.\n                attn_impl (str): The attention implementation to use. One of 'torch', 'flash', or 'triton'.\n                qk_ln (bool): Whether to apply layer normalization to the queries and keys in the attention layer.\n                clip_qkv (Optional[float]): If not None, clip the queries, keys, and values in the attention layer to\n                    this value.\n                softmax_scale (Optional[float]): If not None, scale the softmax in the attention layer by this value. If None,\n                    use the default scale of ``1/sqrt(d_keys)``.\n                prefix_lm (Optional[bool]): Whether the model should operate as a Prefix LM. This requires passing an\n                    extra `prefix_mask` argument which indicates which tokens belong to the prefix. Tokens in the prefix\n                    can attend to one another bi-directionally. Tokens outside the prefix use causal attention.\n                attn_uses_sequence_id (Optional[bool]): Whether to restrict attention to tokens that have the same sequence_id.\n                    When the model is in `train` mode, this requires passing an extra `sequence_id` argument which indicates\n                    which sub-sequence each token belongs to.\n                    Defaults to ``False`` meaning any provided `sequence_id` will be ignored.\n                alibi (bool): Whether to use the alibi bias instead of position embeddings.\n                alibi_bias_max (int): The maximum value of the alibi bias.\n            init_device (str): The device to use for parameter initialization.\n            logit_scale (Optional[Union[float, str]]): If not None, scale the logits by this value.\n            no_bias (bool): Whether to remove the bias parameters from all layers.\n            verbose (int): The verbosity level. 
0 is silent.\n            embedding_fraction (float): The fraction to scale the gradients of the embedding layer by.\n            norm_type (str): choose type of norm to use\n            multiquery_attention (bool): Whether to use multiquery attention implementation.\n            use_cache (bool): Whether or not the model should return the last key/values attentions\n            init_config (Dict): A dictionary used to configure the model initialization:\n                init_config.name: The parameter initialization scheme to use. Options: 'default_', 'baseline_',\n                    'kaiming_uniform_', 'kaiming_normal_', 'neox_init_', 'small_init_', 'xavier_uniform_', or\n                    'xavier_normal_'. These mimic the parameter initialization methods in PyTorch.\n                init_div_is_residual (Union[int, float, str, bool]): Value to divide initial weights by if ``module._is_residual`` is True.\n                emb_init_std (Optional[float]): The standard deviation of the normal distribution used to initialize the embedding layer.\n                emb_init_uniform_lim (Optional[Union[Tuple[float, float], float]]): The lower and upper limits of the uniform distribution\n                    used to initialize the embedding layer. 
Mutually exclusive with ``emb_init_std``.\n                init_std (float): The standard deviation of the normal distribution used to initialize the model,\n                    if using the baseline_ parameter initialization scheme.\n                init_gain (float): The gain to use for parameter initialization with kaiming or xavier initialization schemes.\n                fan_mode (str): The fan mode to use for parameter initialization with kaiming initialization schemes.\n                init_nonlinearity (str): The nonlinearity to use for parameter initialization with kaiming initialization schemes.\n                ---\n                See llmfoundry.models.utils.param_init_fns.py for info on other param init config options\n        \"\"\"\n        self.d_model = d_model\n        self.n_heads = n_heads\n        self.n_layers = n_layers\n        self.expansion_ratio = expansion_ratio\n        self.max_seq_len = max_seq_len\n        self.vocab_size = vocab_size\n        self.resid_pdrop = resid_pdrop\n        self.emb_pdrop = emb_pdrop\n        self.learned_pos_emb = learned_pos_emb\n        self.attn_config = attn_config\n        self.init_device = init_device\n        self.logit_scale = logit_scale\n        self.no_bias = no_bias\n        self.verbose = verbose\n        self.embedding_fraction = embedding_fraction\n        self.norm_type = norm_type\n        self.use_cache = use_cache\n        self.init_config = init_config\n        if 'name' in kwargs:\n            del kwargs['name']\n        if 'loss_fn' in kwargs:\n            del kwargs['loss_fn']\n        super().__init__(**kwargs)\n        self._validate_config()\n\n    def _set_config_defaults(self, config, config_defaults):\n        for (k, v) in config_defaults.items():\n            if k not in config:\n                config[k] = v\n        return config\n\n    def _validate_config(self):\n        self.attn_config = self._set_config_defaults(self.attn_config, attn_config_defaults)\n        
self.init_config = self._set_config_defaults(self.init_config, init_config_defaults)\n        if self.d_model % self.n_heads != 0:\n            raise ValueError('d_model must be divisible by n_heads')\n        if any((prob < 0 or prob > 1 for prob in [self.attn_config['attn_pdrop'], self.resid_pdrop, self.emb_pdrop])):\n            raise ValueError(\"self.attn_config['attn_pdrop'], resid_pdrop, emb_pdrop are probabilities and must be between 0 and 1\")\n        if self.attn_config['attn_impl'] not in ['torch', 'flash', 'triton']:\n            raise ValueError(f\"Unknown attn_impl={self.attn_config['attn_impl']}\")\n        if self.attn_config['prefix_lm'] and self.attn_config['attn_impl'] not in ['torch', 'triton']:\n            raise NotImplementedError('prefix_lm only implemented with torch and triton attention.')\n        if self.attn_config['alibi'] and self.attn_config['attn_impl'] not in ['torch', 'triton']:\n            raise NotImplementedError('alibi only implemented with torch and triton attention.')\n        if self.attn_config['attn_uses_sequence_id'] and self.attn_config['attn_impl'] not in ['torch', 'triton']:\n            raise NotImplementedError('attn_uses_sequence_id only implemented with torch and triton attention.')\n        if self.embedding_fraction > 1 or self.embedding_fraction <= 0:\n            raise ValueError('model.embedding_fraction must be between 0 (exclusive) and 1 (inclusive)!')\n        if isinstance(self.logit_scale, str) and self.logit_scale != 'inv_sqrt_d_model':\n            raise ValueError(f\"self.logit_scale={self.logit_scale!r} is not recognized as an option; use numeric value or 'inv_sqrt_d_model'.\")\n        if self.init_config.get('name', None) is None:\n            raise ValueError(f\"self.init_config={self.init_config!r} 'name' needs to be set.\")\n        if not self.learned_pos_emb and (not self.attn_config['alibi']):\n            raise ValueError(\n                'Positional information must be provided to the 
model using either learned_pos_emb or alibi.'\n            )"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/llava/mpt/modeling_mpt.py",
    "content": "\"\"\"A simple, flexible implementation of a GPT model.\n\nInspired by https://github.com/karpathy/minGPT/blob/master/mingpt/model.py\n\"\"\"\nimport math\nimport warnings\nfrom typing import List, Optional, Tuple, Union\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom transformers import PreTrainedModel, PreTrainedTokenizer, PreTrainedTokenizerFast\nfrom transformers.modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast\nfrom .attention import attn_bias_shape, build_attn_bias\nfrom .blocks import MPTBlock\nfrom .norm import NORM_CLASS_REGISTRY\nfrom .configuration_mpt import MPTConfig\nfrom .param_init_fns import MODEL_INIT_REGISTRY, generic_param_init_fn_\nTokenizer = Union[PreTrainedTokenizer, PreTrainedTokenizerFast]\n\nfrom transformers.utils import logging\nlogger = logging.get_logger(__name__)\n\nclass MPTPreTrainedModel(PreTrainedModel):\n    config_class = MPTConfig\n    base_model_prefix = 'model'\n\nclass MPTModel(MPTPreTrainedModel):\n\n    def __init__(self, config: MPTConfig):\n        config._validate_config()\n        super().__init__(config)\n        self.attn_impl = config.attn_config['attn_impl']\n        self.prefix_lm = config.attn_config['prefix_lm']\n        self.attn_uses_sequence_id = config.attn_config['attn_uses_sequence_id']\n        self.alibi = config.attn_config['alibi']\n        self.alibi_bias_max = config.attn_config['alibi_bias_max']\n        if config.norm_type.lower() not in NORM_CLASS_REGISTRY.keys():\n            norm_options = ' | '.join(NORM_CLASS_REGISTRY.keys())\n            raise NotImplementedError(f'Requested norm type ({config.norm_type}) is not implemented within this repo (Options: {norm_options}).')\n        norm_class = NORM_CLASS_REGISTRY[config.norm_type.lower()]\n        self.embedding_fraction = config.embedding_fraction\n        self.wte = nn.Embedding(config.vocab_size, config.d_model, device=config.init_device)\n        if not self.alibi:\n         
   self.wpe = nn.Embedding(config.max_seq_len, config.d_model, device=config.init_device)\n        self.emb_drop = nn.Dropout(config.emb_pdrop)\n        self.blocks = nn.ModuleList([MPTBlock(device=config.init_device, **config.to_dict()) for _ in range(config.n_layers)])\n        self.norm_f = norm_class(config.d_model, device=config.init_device)\n        if config.init_device != 'meta':\n            self.apply(self.param_init_fn)\n        self.is_causal = not self.prefix_lm\n        self._attn_bias_initialized = False\n        self.attn_bias = None\n        self.attn_bias_shape = attn_bias_shape(self.attn_impl, config.n_heads, config.max_seq_len, self.alibi, prefix_lm=self.prefix_lm, causal=self.is_causal, use_sequence_id=self.attn_uses_sequence_id)\n        if config.no_bias:\n            for module in self.modules():\n                if hasattr(module, 'bias') and isinstance(module.bias, nn.Parameter):\n                    if config.verbose:\n                        warnings.warn(f'Removing bias ({module.bias}) from {module}.')\n                    module.register_parameter('bias', None)\n        if config.verbose and config.verbose > 2:\n            print(self)\n        if 'verbose' not in self.config.init_config:\n            self.config.init_config['verbose'] = self.config.verbose\n        if self.config.init_config['verbose'] > 1:\n            init_fn_name = self.config.init_config['name']\n            warnings.warn(f'Using {init_fn_name} initialization.')\n        self.gradient_checkpointing = False\n\n    def get_input_embeddings(self):\n        return self.wte\n\n    def set_input_embeddings(self, value):\n        self.wte = value\n\n    @torch.no_grad()\n    def _attn_bias(self, device, dtype, attention_mask: Optional[torch.ByteTensor]=None, prefix_mask: Optional[torch.ByteTensor]=None, sequence_id: Optional[torch.LongTensor]=None):\n        if not self._attn_bias_initialized:\n            if self.attn_bias_shape:\n                self.attn_bias = 
torch.zeros(self.attn_bias_shape, device=device, dtype=dtype)\n                self.attn_bias = build_attn_bias(self.attn_impl, self.attn_bias, self.config.n_heads, self.config.max_seq_len, causal=self.is_causal, alibi=self.alibi, alibi_bias_max=self.alibi_bias_max)\n            self._attn_bias_initialized = True\n        if self.attn_impl == 'flash':\n            return (self.attn_bias, attention_mask)\n        if self.attn_bias is not None:\n            self.attn_bias = self.attn_bias.to(dtype=dtype, device=device)\n        attn_bias = self.attn_bias\n        if self.prefix_lm:\n            assert isinstance(attn_bias, torch.Tensor)\n            assert isinstance(prefix_mask, torch.Tensor)\n            attn_bias = self._apply_prefix_mask(attn_bias, prefix_mask)\n        if self.attn_uses_sequence_id and sequence_id is not None:\n            assert isinstance(attn_bias, torch.Tensor)\n            attn_bias = self._apply_sequence_id(attn_bias, sequence_id)\n        if attention_mask is not None:\n            s_k = attention_mask.shape[-1]\n            if attn_bias is None:\n                attn_bias = torch.zeros((1, 1, 1, s_k), device=device, dtype=dtype)\n            else:\n                attn_bias = attn_bias[:, :, :, -s_k:]\n            if prefix_mask is not None and attention_mask.shape != prefix_mask.shape:\n                raise ValueError(f'attention_mask shape={attention_mask.shape} ' + f'and prefix_mask shape={prefix_mask.shape} are not equal.')\n            min_val = torch.finfo(attn_bias.dtype).min\n            attn_bias = attn_bias.masked_fill(~attention_mask.view(-1, 1, 1, s_k), min_val)\n        return (attn_bias, None)\n\n    def _apply_prefix_mask(self, attn_bias: torch.Tensor, prefix_mask: torch.Tensor):\n        (s_k, s_q) = attn_bias.shape[-2:]\n        if s_k != self.config.max_seq_len or s_q != self.config.max_seq_len:\n            raise ValueError(\n                f'attn_bias does not match the expected shape. 
The last two dimensions should both be {self.config.max_seq_len} '\n                + f'but are {s_k} and {s_q}.'\n            )\n        seq_len = prefix_mask.shape[-1]\n        if seq_len > self.config.max_seq_len:\n            raise ValueError(f'prefix_mask sequence length cannot exceed max_seq_len={self.config.max_seq_len}')\n        attn_bias = attn_bias[..., :seq_len, :seq_len]\n        causal = torch.tril(torch.ones((seq_len, seq_len), dtype=torch.bool, device=prefix_mask.device)).view(1, 1, seq_len, seq_len)\n        prefix = prefix_mask.view(-1, 1, 1, seq_len)\n        cannot_attend = ~torch.logical_or(causal, prefix.bool())\n        return self._extracted_from__apply_sequence_id_15(attn_bias, cannot_attend)\n\n    def _apply_sequence_id(self, attn_bias: torch.Tensor, sequence_id: torch.LongTensor):\n        seq_len = sequence_id.shape[-1]\n        if seq_len > self.config.max_seq_len:\n            raise ValueError(f'sequence_id sequence length cannot exceed max_seq_len={self.config.max_seq_len}')\n        attn_bias = attn_bias[..., :seq_len, :seq_len]\n        cannot_attend = torch.logical_not(torch.eq(sequence_id.view(-1, seq_len, 1), sequence_id.view(-1, 1, seq_len))).unsqueeze(1)\n        return self._extracted_from__apply_sequence_id_15(attn_bias, cannot_attend)\n\n    # TODO Rename this here and in `_apply_prefix_mask` and `_apply_sequence_id`\n    def _extracted_from__apply_sequence_id_15(self, attn_bias, cannot_attend):\n        min_val = torch.finfo(attn_bias.dtype).min\n        attn_bias = attn_bias.masked_fill(cannot_attend, min_val)\n        return attn_bias\n\n    def forward(self, input_ids: torch.LongTensor, past_key_values: Optional[List[Tuple[torch.FloatTensor]]]=None, attention_mask: Optional[torch.ByteTensor]=None, prefix_mask: Optional[torch.ByteTensor]=None, sequence_id: Optional[torch.LongTensor]=None, return_dict: Optional[bool]=None, output_attentions: Optional[bool]=None, output_hidden_states: Optional[bool]=None, use_cache: 
Optional[bool]=None, tok_emb: Optional[torch.FloatTensor]=None):\n        return_dict = return_dict if return_dict is not None else self.config.return_dict\n        use_cache = use_cache if use_cache is not None else self.config.use_cache\n\n        if self.gradient_checkpointing and self.training and use_cache:\n            logger.warning_once(\n                \"`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\"\n            )\n            use_cache = False\n        if attention_mask is not None:\n            attention_mask = attention_mask.bool()\n        if prefix_mask is not None:\n            prefix_mask = prefix_mask.bool()\n        if not return_dict:\n            raise NotImplementedError('return_dict False is not implemented yet for MPT')\n        if output_attentions:\n            raise NotImplementedError('output_attentions is not implemented yet for MPT')\n        if attention_mask is not None and attention_mask[:, 0].sum() != attention_mask.shape[0] and self.training:\n            raise NotImplementedError('MPT does not support training with left padding.')\n        if self.prefix_lm and prefix_mask is None:\n            raise ValueError('prefix_mask is a required argument when MPT is configured with prefix_lm=True.')\n        if self.training:\n            if self.attn_uses_sequence_id and sequence_id is None:\n                raise ValueError('sequence_id is a required argument when MPT is configured with attn_uses_sequence_id=True ' + 'and the model is in train mode.')\n            elif self.attn_uses_sequence_id is False and sequence_id is not None:\n                warnings.warn('MPT received non-None input for `sequence_id` but is configured with attn_uses_sequence_id=False. ' + 'This input will be ignored. 
If you want the model to use `sequence_id`, set attn_uses_sequence_id to True.')\n        if input_ids is not None:\n            S = input_ids.size(1)\n            assert S <= self.config.max_seq_len, f'Cannot forward input with seq_len={S}, this model only supports seq_len<={self.config.max_seq_len}'\n            tok_emb = self.wte(input_ids)\n        else:\n            assert tok_emb is not None\n            S = tok_emb.size(1)\n        if self.alibi:\n            x = tok_emb\n        else:\n            past_position = 0\n            if past_key_values is not None:\n                if len(past_key_values) != self.config.n_layers:\n                    raise ValueError(\n                        f'past_key_values must provide a past_key_value for each attention layer in the network (len(past_key_values)={len(past_key_values)!r}; self.config.n_layers={self.config.n_layers!r}).'\n                    )\n                past_position = past_key_values[0][0].size(1)\n            if S + past_position > self.config.max_seq_len:\n                raise ValueError(f'Cannot forward input with past sequence length {past_position} and current sequence length {S}, this model only supports total sequence length <= {self.config.max_seq_len}.')\n            # Use tok_emb.device: input_ids may be None when tok_emb is passed in directly.\n            pos = torch.arange(past_position, S + past_position, dtype=torch.long, device=tok_emb.device).unsqueeze(0)\n            if attention_mask is not None:\n                pos = torch.clamp(pos - torch.cumsum((~attention_mask).to(torch.int32), dim=1)[:, past_position:], min=0)\n            pos_emb = self.wpe(pos)\n            x = tok_emb + pos_emb\n        if self.embedding_fraction == 1:\n            x = self.emb_drop(x)\n        else:\n            x_shrunk = x * self.embedding_fraction + x.detach() * (1 - self.embedding_fraction)\n            assert isinstance(self.emb_drop, nn.Module)\n            x = self.emb_drop(x_shrunk)\n        (attn_bias, attention_mask) = self._attn_bias(device=x.device, dtype=x.dtype, 
attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id)\n        if use_cache and past_key_values is None:\n            past_key_values = [() for _ in range(self.config.n_layers)]\n        all_hidden_states = () if output_hidden_states else None\n        for (b_idx, block) in enumerate(self.blocks):\n            if output_hidden_states:\n                assert all_hidden_states is not None\n                all_hidden_states = all_hidden_states + (x,)\n            past_key_value = past_key_values[b_idx] if past_key_values is not None else None\n            if self.gradient_checkpointing and self.training:\n                (x, past_key_value) = torch.utils.checkpoint.checkpoint(\n                    block,\n                    x, past_key_value, attn_bias, attention_mask, self.is_causal\n                )\n            else:\n                (x, past_key_value) = block(x, past_key_value=past_key_value, attn_bias=attn_bias, attention_mask=attention_mask, is_causal=self.is_causal)\n            if past_key_values is not None:\n                past_key_values[b_idx] = past_key_value\n        x = self.norm_f(x)\n        return BaseModelOutputWithPast(last_hidden_state=x, past_key_values=past_key_values, hidden_states=all_hidden_states)\n\n    def param_init_fn(self, module):\n        init_fn_name = self.config.init_config['name']\n        MODEL_INIT_REGISTRY[init_fn_name](module=module, n_layers=self.config.n_layers, d_model=self.config.d_model, **self.config.init_config)\n\n    def fsdp_wrap_fn(self, module):\n        return isinstance(module, MPTBlock)\n\n    def activation_checkpointing_fn(self, module):\n        return isinstance(module, MPTBlock)\n\nclass MPTForCausalLM(MPTPreTrainedModel):\n\n    def __init__(self, config: MPTConfig):\n        super().__init__(config)\n        if not config.tie_word_embeddings:\n            raise ValueError('MPTForCausalLM only supports tied word embeddings')\n        self.transformer = MPTModel(config)\n     
   self.logit_scale = None\n        if config.logit_scale is not None:\n            logit_scale = config.logit_scale\n            if isinstance(logit_scale, str):\n                if logit_scale == 'inv_sqrt_d_model':\n                    logit_scale = 1 / math.sqrt(config.d_model)\n                else:\n                    raise ValueError(f\"logit_scale={logit_scale!r} is not recognized as an option; use numeric value or 'inv_sqrt_d_model'.\")\n            self.logit_scale = logit_scale\n\n    def get_input_embeddings(self):\n        return self.transformer.wte\n\n    def set_input_embeddings(self, value):\n        self.transformer.wte = value\n\n    def get_output_embeddings(self):\n        return self.transformer.wte\n\n    def set_output_embeddings(self, new_embeddings):\n        self.transformer.wte = new_embeddings\n\n    def set_decoder(self, decoder):\n        self.transformer = decoder\n\n    def get_decoder(self):\n        return self.transformer\n\n    def forward(self, input_ids: torch.LongTensor, past_key_values: Optional[List[Tuple[torch.FloatTensor]]]=None, attention_mask: Optional[torch.ByteTensor]=None, prefix_mask: Optional[torch.ByteTensor]=None, sequence_id: Optional[torch.LongTensor]=None, labels: Optional[torch.LongTensor]=None, return_dict: Optional[bool]=None, output_attentions: Optional[bool]=None, output_hidden_states: Optional[bool]=None, use_cache: Optional[bool]=None):\n        return_dict = return_dict if return_dict is not None else self.config.return_dict\n        use_cache = use_cache if use_cache is not None else self.config.use_cache\n        outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache)\n        logits = F.linear(outputs.last_hidden_state, self.transformer.wte.weight)\n        if 
self.logit_scale is not None:\n            if self.logit_scale == 0:\n                warnings.warn(f'Multiplying logits by self.logit_scale={self.logit_scale!r}. This will produce uniform (uninformative) outputs.')\n            logits *= self.logit_scale\n        loss = None\n        if labels is not None:\n            labels = torch.roll(labels, shifts=-1)\n            labels[:, -1] = -100\n            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.to(logits.device).view(-1))\n        return CausalLMOutputWithPast(loss=loss, logits=logits, past_key_values=outputs.past_key_values, hidden_states=outputs.hidden_states)\n\n    def param_init_fn(self, module):\n        init_fn_name = self.config.init_config['name']\n        MODEL_INIT_REGISTRY[init_fn_name](module=module, n_layers=self.config.n_layers, d_model=self.config.d_model, **self.config.init_config)\n\n    def fsdp_wrap_fn(self, module):\n        return isinstance(module, MPTBlock)\n\n    def activation_checkpointing_fn(self, module):\n        return isinstance(module, MPTBlock)\n\n    def prepare_inputs_for_generation(self, input_ids, past_key_values=None, inputs_embeds=None, **kwargs):\n        if inputs_embeds is not None:\n            raise NotImplementedError('inputs_embeds is not implemented for MPT yet')\n        attention_mask = kwargs['attention_mask'].bool()\n        if attention_mask[:, -1].sum() != attention_mask.shape[0]:\n            raise NotImplementedError('MPT does not support generation with right padding.')\n        if self.transformer.attn_uses_sequence_id and self.training:\n            sequence_id = torch.zeros_like(input_ids[:1])\n        else:\n            sequence_id = None\n        if past_key_values is not None:\n            input_ids = input_ids[:, -1].unsqueeze(-1)\n        if self.transformer.prefix_lm:\n            prefix_mask = torch.ones_like(attention_mask)\n            if kwargs.get('use_cache') == False:\n                raise NotImplementedError('MPT with 
prefix_lm=True does not support use_cache=False.')\n        else:\n            prefix_mask = None\n        return {'input_ids': input_ids, 'attention_mask': attention_mask, 'prefix_mask': prefix_mask, 'sequence_id': sequence_id, 'past_key_values': past_key_values, 'use_cache': kwargs.get('use_cache', True)}\n\n    @staticmethod\n    def _reorder_cache(past_key_values, beam_idx):\n        \"\"\"Used by HuggingFace generate when using beam search with kv-caching.\n\n        See https://github.com/huggingface/transformers/blob/3ec7a47664ebe40c40f4b722f6bb1cd30c3821ec/src/transformers/models/gpt2/modeling_gpt2.py#L1122-L1133\n        for an example in transformers.\n        \"\"\"\n        return [\n            tuple(\n                (past_state.index_select(0, beam_idx) for past_state in layer_past)\n            )\n            for layer_past in past_key_values\n        ]"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/llava/mpt/norm.py",
    "content": "import torch\n\ndef _cast_if_autocast_enabled(tensor):\n    if torch.is_autocast_enabled():\n        if tensor.device.type == 'cuda':\n            dtype = torch.get_autocast_gpu_dtype()\n        elif tensor.device.type == 'cpu':\n            dtype = torch.get_autocast_cpu_dtype()\n        else:\n            raise NotImplementedError()\n        return tensor.to(dtype=dtype)\n    return tensor\n\nclass LPLayerNorm(torch.nn.LayerNorm):\n\n    def __init__(self, normalized_shape, eps=1e-05, elementwise_affine=True, device=None, dtype=None):\n        super().__init__(normalized_shape=normalized_shape, eps=eps, elementwise_affine=elementwise_affine, device=device, dtype=dtype)\n\n    def forward(self, x):\n        module_device = x.device\n        downcast_x = _cast_if_autocast_enabled(x)\n        downcast_weight = _cast_if_autocast_enabled(self.weight) if self.weight is not None else self.weight\n        downcast_bias = _cast_if_autocast_enabled(self.bias) if self.bias is not None else self.bias\n        with torch.autocast(enabled=False, device_type=module_device.type):\n            return torch.nn.functional.layer_norm(downcast_x, self.normalized_shape, downcast_weight, downcast_bias, self.eps)\n\ndef rms_norm(x, weight=None, eps=1e-05):\n    output = x / torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)\n    return output * weight if weight is not None else output\n\nclass RMSNorm(torch.nn.Module):\n\n    def __init__(self, normalized_shape, eps=1e-05, weight=True, dtype=None, device=None):\n        super().__init__()\n        self.eps = eps\n        if weight:\n            self.weight = torch.nn.Parameter(torch.ones(normalized_shape, dtype=dtype, device=device))\n        else:\n            self.register_parameter('weight', None)\n\n    def forward(self, x):\n        return rms_norm(x.float(), self.weight, self.eps).to(dtype=x.dtype)\n\nclass LPRMSNorm(RMSNorm):\n\n    def __init__(self, normalized_shape, eps=1e-05, weight=True, dtype=None, 
device=None):\n        super().__init__(normalized_shape=normalized_shape, eps=eps, weight=weight, dtype=dtype, device=device)\n\n    def forward(self, x):\n        downcast_x = _cast_if_autocast_enabled(x)\n        downcast_weight = _cast_if_autocast_enabled(self.weight) if self.weight is not None else self.weight\n        with torch.autocast(enabled=False, device_type=x.device.type):\n            return rms_norm(downcast_x, downcast_weight, self.eps).to(dtype=x.dtype)\nNORM_CLASS_REGISTRY = {'layernorm': torch.nn.LayerNorm, 'low_precision_layernorm': LPLayerNorm, 'rmsnorm': RMSNorm, 'low_precision_rmsnorm': LPRMSNorm}"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/llava/mpt/param_init_fns.py",
    "content": "import math\nimport warnings\nfrom collections.abc import Sequence\nfrom functools import partial\nfrom typing import Optional, Tuple, Union\nimport torch\nfrom torch import nn\nfrom .norm import NORM_CLASS_REGISTRY\n\ndef torch_default_param_init_fn_(module: nn.Module, verbose: int=0, **kwargs):\n    del kwargs\n    if verbose > 1:\n        warnings.warn(\"Initializing network using module's reset_parameters attribute\")\n    if hasattr(module, 'reset_parameters'):\n        module.reset_parameters()\n\ndef fused_init_helper_(module: nn.Module, init_fn_):\n    _fused = getattr(module, '_fused', None)\n    if _fused is None:\n        raise RuntimeError('Internal logic error')\n    (dim, splits) = _fused\n    splits = (0, *splits, module.weight.size(dim))\n    for (s, e) in zip(splits[:-1], splits[1:]):\n        slice_indices = [slice(None)] * module.weight.ndim\n        slice_indices[dim] = slice(s, e)\n        init_fn_(module.weight[slice_indices])\n\ndef generic_param_init_fn_(module: nn.Module, init_fn_, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, verbose: int=0, **kwargs):\n    del kwargs\n    if verbose > 1:\n        warnings.warn('If model has bias parameters they are initialized to 0.')\n    init_div_is_residual = init_div_is_residual\n    if init_div_is_residual is False:\n        div_is_residual = 1.0\n    elif init_div_is_residual is True:\n        div_is_residual = math.sqrt(2 * n_layers)\n    elif isinstance(init_div_is_residual, (float, int)):\n        div_is_residual = init_div_is_residual\n    elif isinstance(init_div_is_residual, str) and init_div_is_residual.isnumeric():\n        div_is_residual = float(init_div_is_residual)\n    else:\n        div_is_residual = 1.0\n        raise ValueError(f'Expected init_div_is_residual to be boolean or numeric, got 
{init_div_is_residual}')\n    if init_div_is_residual is not False and verbose > 1:\n        warnings.warn(\n            f'Initializing _is_residual layers then dividing them by {div_is_residual:.3f}. Set `init_div_is_residual: false` in init config to disable this.'\n        )\n    if isinstance(module, nn.Linear):\n        if hasattr(module, '_fused'):\n            fused_init_helper_(module, init_fn_)\n        else:\n            init_fn_(module.weight)\n        if module.bias is not None:\n            torch.nn.init.zeros_(module.bias)\n        if init_div_is_residual is not False and getattr(module, '_is_residual', False):\n            with torch.no_grad():\n                module.weight.div_(div_is_residual)\n    elif isinstance(module, nn.Embedding):\n        if emb_init_std is not None:\n            std = emb_init_std\n            if std == 0:\n                warnings.warn('Embedding layer initialized to 0.')\n            emb_init_fn_ = partial(torch.nn.init.normal_, mean=0.0, std=std)\n            if verbose > 1:\n                warnings.warn(f'Embedding layer initialized using normal distribution with mean=0 and std={std!r}.')\n        elif emb_init_uniform_lim is not None:\n            lim = emb_init_uniform_lim\n            if isinstance(lim, Sequence):\n                if len(lim) > 2:\n                    raise ValueError(f'Uniform init requires a min and a max limit. 
User input: {lim}.')\n                if lim[0] == lim[1]:\n                    warnings.warn(f'Embedding layer initialized to {lim[0]}.')\n            else:\n                if lim == 0:\n                    warnings.warn('Embedding layer initialized to 0.')\n                lim = [-lim, lim]\n            (a, b) = lim\n            emb_init_fn_ = partial(torch.nn.init.uniform_, a=a, b=b)\n            if verbose > 1:\n                warnings.warn(f'Embedding layer initialized using uniform distribution in range {lim}.')\n        else:\n            emb_init_fn_ = init_fn_\n        emb_init_fn_(module.weight)\n    elif isinstance(module, tuple(set(NORM_CLASS_REGISTRY.values()))):\n        if verbose > 1:\n            warnings.warn(\n                'Norm weights are set to 1. If norm layer has a bias it is initialized to 0.'\n            )\n        if hasattr(module, 'weight') and module.weight is not None:\n            torch.nn.init.ones_(module.weight)\n        if hasattr(module, 'bias') and module.bias is not None:\n            torch.nn.init.zeros_(module.bias)\n    elif isinstance(module, nn.MultiheadAttention):\n        if module._qkv_same_embed_dim:\n            _extracted_from_generic_param_init_fn__69(module, d_model, init_fn_)\n        else:\n            assert module.q_proj_weight is not None and module.k_proj_weight is not None and (module.v_proj_weight is not None)\n            assert module.in_proj_weight is None\n            init_fn_(module.q_proj_weight)\n            init_fn_(module.k_proj_weight)\n            init_fn_(module.v_proj_weight)\n        if module.in_proj_bias is not None:\n            torch.nn.init.zeros_(module.in_proj_bias)\n        if module.bias_k is not None:\n            torch.nn.init.zeros_(module.bias_k)\n        if module.bias_v is not None:\n            torch.nn.init.zeros_(module.bias_v)\n        init_fn_(module.out_proj.weight)\n        if init_div_is_residual is not False and getattr(module.out_proj, '_is_residual', False):\n  
          with torch.no_grad():\n                module.out_proj.weight.div_(div_is_residual)\n        if module.out_proj.bias is not None:\n            torch.nn.init.zeros_(module.out_proj.bias)\n    else:\n        for _ in module.parameters(recurse=False):\n            raise NotImplementedError(f'{module.__class__.__name__} parameters are not initialized by param_init_fn.')\n\n\n# TODO Rename this here and in `generic_param_init_fn_`\ndef _extracted_from_generic_param_init_fn__69(module, d_model, init_fn_):\n    assert module.in_proj_weight is not None\n    assert module.q_proj_weight is None and module.k_proj_weight is None and (module.v_proj_weight is None)\n    assert d_model is not None\n    _d = d_model\n    splits = (0, _d, 2 * _d, 3 * _d)\n    for (s, e) in zip(splits[:-1], splits[1:]):\n        init_fn_(module.in_proj_weight[s:e])\n\ndef _normal_init_(std, mean=0.0):\n    return partial(torch.nn.init.normal_, mean=mean, std=std)\n\ndef _normal_param_init_fn_(module: nn.Module, std: float, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, verbose: int=0, **kwargs):\n    del kwargs\n    init_fn_ = _normal_init_(std=std)\n    if verbose > 1:\n        warnings.warn(f'Using torch.nn.init.normal_ init fn mean=0.0, std={std}')\n    generic_param_init_fn_(module=module, init_fn_=init_fn_, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)\n\ndef baseline_param_init_fn_(module: nn.Module, init_std: float, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, verbose: int=0, **kwargs):\n    del kwargs\n    if init_std is None:\n        raise 
ValueError(\"You must set model.init_config['init_std'] to a float value to use the default initialization scheme.\")\n    _normal_param_init_fn_(module=module, std=init_std, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)\n\ndef small_param_init_fn_(module: nn.Module, n_layers: int, d_model: int, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, verbose: int=0, **kwargs):\n    del kwargs\n    std = math.sqrt(2 / (5 * d_model))\n    _normal_param_init_fn_(module=module, std=std, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)\n\ndef neox_param_init_fn_(module: nn.Module, n_layers: int, d_model: int, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, verbose: int=0, **kwargs):\n    \"\"\"From section 2.3.1 of GPT-NeoX-20B:\n\n    An Open-Source Autoregressive Language Model, Black et al. 
(2022)\n    see https://github.com/EleutherAI/gpt-neox/blob/9610391ab319403cef079b438edd016a2443af54/megatron/model/init_functions.py#L151\n    and https://github.com/EleutherAI/gpt-neox/blob/main/megatron/model/transformer.py\n    \"\"\"\n    del kwargs\n    residual_div = n_layers / math.sqrt(10)\n    if verbose > 1:\n        warnings.warn(f'setting init_div_is_residual to {residual_div}')\n    small_param_init_fn_(module=module, d_model=d_model, n_layers=n_layers, init_div_is_residual=residual_div, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)\n\ndef kaiming_uniform_param_init_fn_(module: nn.Module, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, init_gain: float=0, fan_mode: str='fan_in', init_nonlinearity: str='leaky_relu', verbose: int=0, **kwargs):\n    del kwargs\n    if verbose > 1:\n        warnings.warn(\n            f'Using nn.init.kaiming_uniform_ init fn with parameters: a={init_gain}, mode={fan_mode}, nonlinearity={init_nonlinearity}'\n        )\n    kaiming_uniform_ = partial(nn.init.kaiming_uniform_, a=init_gain, mode=fan_mode, nonlinearity=init_nonlinearity)\n    generic_param_init_fn_(module=module, init_fn_=kaiming_uniform_, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)\n\ndef kaiming_normal_param_init_fn_(module: nn.Module, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, init_gain: float=0, fan_mode: str='fan_in', init_nonlinearity: str='leaky_relu', verbose: int=0, **kwargs):\n    del kwargs\n    if verbose > 1:\n        warnings.warn(\n            f'Using 
nn.init.kaiming_normal_ init fn with parameters: a={init_gain}, mode={fan_mode}, nonlinearity={init_nonlinearity}'\n        )\n    kaiming_normal_ = partial(torch.nn.init.kaiming_normal_, a=init_gain, mode=fan_mode, nonlinearity=init_nonlinearity)\n    generic_param_init_fn_(module=module, init_fn_=kaiming_normal_, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)\n\ndef xavier_uniform_param_init_fn_(module: nn.Module, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, init_gain: float=0, verbose: int=0, **kwargs):\n    del kwargs\n    xavier_uniform_ = partial(torch.nn.init.xavier_uniform_, gain=init_gain)\n    if verbose > 1:\n        warnings.warn(\n            f'Using torch.nn.init.xavier_uniform_ init fn with parameters: gain={init_gain}'\n        )\n    generic_param_init_fn_(module=module, init_fn_=xavier_uniform_, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)\n\ndef xavier_normal_param_init_fn_(module: nn.Module, n_layers: int, d_model: Optional[int]=None, init_div_is_residual: Union[int, float, str, bool]=True, emb_init_std: Optional[float]=None, emb_init_uniform_lim: Optional[Union[Tuple[float, float], float]]=None, init_gain: float=0, verbose: int=0, **kwargs):\n    xavier_normal_ = partial(torch.nn.init.xavier_normal_, gain=init_gain)\n    if verbose > 1:\n        warnings.warn(\n            f'Using torch.nn.init.xavier_normal_ init fn with parameters: gain={init_gain}'\n        )\n    generic_param_init_fn_(module=module, init_fn_=xavier_normal_, d_model=d_model, n_layers=n_layers, init_div_is_residual=init_div_is_residual, emb_init_std=emb_init_std, 
emb_init_uniform_lim=emb_init_uniform_lim, verbose=verbose)\nMODEL_INIT_REGISTRY = {'default_': torch_default_param_init_fn_, 'baseline_': baseline_param_init_fn_, 'kaiming_uniform_': kaiming_uniform_param_init_fn_, 'kaiming_normal_': kaiming_normal_param_init_fn_, 'neox_init_': neox_param_init_fn_, 'small_init_': small_param_init_fn_, 'xavier_uniform_': xavier_uniform_param_init_fn_, 'xavier_normal_': xavier_normal_param_init_fn_}"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/nets/PixArt.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# GLIDE: https://github.com/openai/glide-text2im\n# MAE: https://github.com/facebookresearch/mae/blob/main/models_mae.py\n# --------------------------------------------------------\nimport math\nimport torch\nimport torch.nn as nn\nimport os\nimport numpy as np\nfrom timm.models.layers import DropPath\nfrom timm.models.vision_transformer import PatchEmbed, Mlp\n\nfrom diffusion.model.builder import MODELS\nfrom diffusion.model.utils import auto_grad_checkpoint, to_2tuple\nfrom diffusion.model.nets.PixArt_blocks import t2i_modulate, CaptionEmbedder, WindowAttention, MultiHeadCrossAttention, T2IFinalLayer, TimestepEmbedder, LabelEmbedder, FinalLayer\nfrom diffusion.utils.logger import get_root_logger\nfrom diffusion.model.cache_functions import global_force_fresh, cache_cutfresh, update_cache, force_init\nimport json\n\nclass PixArtBlock(nn.Module):\n    \"\"\"\n    A PixArt block with adaptive layer norm (adaLN-single) conditioning.\n    \"\"\"\n\n    def __init__(self, hidden_size, num_heads, mlp_ratio=4.0, drop_path=0., window_size=0, input_size=None, use_rel_pos=False, **block_kwargs):\n        super().__init__()\n        self.hidden_size = hidden_size\n        self.norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.attn = WindowAttention(hidden_size, num_heads=num_heads, qkv_bias=True,\n                                    input_size=input_size if window_size == 0 else (window_size, window_size),\n                                    use_rel_pos=use_rel_pos, **block_kwargs)\n        self.cross_attn = MultiHeadCrossAttention(hidden_size, num_heads, **block_kwargs)\n        self.norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, 
eps=1e-6)\n        # to be compatible with lower version pytorch\n        approx_gelu = lambda: nn.GELU(approximate=\"tanh\")\n        self.mlp = Mlp(in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0)\n        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()\n        self.window_size = window_size\n        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size ** 0.5)\n\n    def forward(self, x, y, t, current, cache_dic, mask=None, **kwargs):\n        B, N, C = x.shape\n\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (self.scale_shift_table[None] + t.reshape(B, 6, -1)).chunk(6, dim=1)\n        is_force_fresh = global_force_fresh(cache_dic, current)\n        current['is_force_fresh'] = is_force_fresh\n        \n        if is_force_fresh: # Compute all tokens, and save them to cache\n            current['module'] = 'attn'\n            cache_dic['cache'][-1][current['layer']][current['module']], cache_dic['attn_map'][-1][current['layer']] = self.attn(t2i_modulate(self.norm1(x), shift_msa, scale_msa))#.reshape(B, N, C)\n            force_init(cache_dic, current, x)\n            x = x + self.drop_path(gate_msa * cache_dic['cache'][-1][current['layer']][current['module']])\n\n            current['module'] = 'cross-attn'\n            cache_dic['cache'][-1][current['layer']][current['module']], cache_dic['cross_attn_map'][-1][current['layer']] = self.cross_attn(x, y, mask)\n            force_init(cache_dic, current, x)\n            x = x + cache_dic['cache'][-1][current['layer']][current['module']]\n\n            current['module'] = 'mlp'\n            cache_dic['cache'][-1][current['layer']][current['module']] = self.mlp(t2i_modulate(self.norm2(x), shift_mlp, scale_mlp))\n            force_init(cache_dic, current, x)\n            x = x + self.drop_path(gate_mlp * cache_dic['cache'][-1][current['layer']][current['module']])\n\n        else: \n      
      current['module'] = 'attn' \n            # no partial computation for attn. if you want to have an exploration, below may help.\n            #fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x, current)\n            #fresh_tokens, fresh_attn_map = self.attn(t2i_modulate(self.norm1(fresh_tokens), shift_msa, scale_msa))#.reshape(B, N, C)\n            #update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current, fresh_attn_map=fresh_attn_map)\n            #cache_dic['cache'][-1][current['layer']][current['module']], cache_dic['attn_map'][-1][current['layer']] = self.attn(t2i_modulate(self.norm1(x), shift_msa, scale_msa))#.reshape(B, N, C)\n            \n            x = x + self.drop_path(gate_msa * cache_dic['cache'][-1][current['layer']][current['module']])\n\n            current['module'] = 'cross-attn'\n            fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x, current)\n            fresh_tokens, fresh_cross_attn_map = self.cross_attn(fresh_tokens, y, mask)\n            update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current, fresh_attn_map=fresh_cross_attn_map)\n\n            x = x + cache_dic['cache'][-1][current['layer']][current['module']]\n\n            current['module'] = 'mlp'\n            fresh_indices, fresh_tokens = cache_cutfresh(cache_dic, x, current)\n            fresh_tokens = self.mlp(t2i_modulate(self.norm2(fresh_tokens), shift_mlp, scale_mlp))\n            update_cache(fresh_indices, fresh_tokens=fresh_tokens, cache_dic=cache_dic, current=current)\n            \n            x = x + self.drop_path(gate_mlp * cache_dic['cache'][-1][current['layer']][current['module']])\n\n        return x\n\n#############################################################################\n#                                 Core PixArt Model                                
#\n#################################################################################\n@MODELS.register_module()\nclass PixArt(nn.Module):\n    \"\"\"\n    Diffusion model with a Transformer backbone.\n    \"\"\"\n\n    def __init__(self, input_size=32, patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, class_dropout_prob=0.1, pred_sigma=True, drop_path: float = 0., window_size=0, window_block_indexes=None, use_rel_pos=False, caption_channels=4096, lewei_scale=1.0, config=None, model_max_length=120, **kwargs):\n        if window_block_indexes is None:\n            window_block_indexes = []\n        super().__init__()\n        self.pred_sigma = pred_sigma\n        self.in_channels = in_channels\n        self.out_channels = in_channels * 2 if pred_sigma else in_channels\n        self.patch_size = patch_size\n        self.num_heads = num_heads\n        # Store as a scalar; a trailing comma here would silently create a 1-tuple.\n        self.lewei_scale = lewei_scale\n\n        self.x_embedder = PatchEmbed(input_size, patch_size, in_channels, hidden_size, bias=True)\n        self.t_embedder = TimestepEmbedder(hidden_size)\n        num_patches = self.x_embedder.num_patches\n        self.base_size = input_size // self.patch_size\n        # Will use fixed sin-cos embedding:\n        self.register_buffer(\"pos_embed\", torch.zeros(1, num_patches, hidden_size))\n\n        approx_gelu = lambda: nn.GELU(approximate=\"tanh\")\n        self.t_block = nn.Sequential(\n            nn.SiLU(),\n            nn.Linear(hidden_size, 6 * hidden_size, bias=True)\n        )\n        self.y_embedder = CaptionEmbedder(in_channels=caption_channels, hidden_size=hidden_size, uncond_prob=class_dropout_prob, act_layer=approx_gelu, token_num=model_max_length)\n        drop_path = [x.item() for x in torch.linspace(0, drop_path, depth)]  # stochastic depth decay rule\n        self.blocks = nn.ModuleList([\n            PixArtBlock(hidden_size, num_heads, mlp_ratio=mlp_ratio, drop_path=drop_path[i],\n                          input_size=(input_size 
// patch_size, input_size // patch_size),\n                          window_size=window_size if i in window_block_indexes else 0,\n                          use_rel_pos=use_rel_pos if i in window_block_indexes else False)\n            for i in range(depth)\n        ])\n        self.final_layer = T2IFinalLayer(hidden_size, patch_size, self.out_channels)\n\n        self.initialize_weights()\n\n        if config:\n            logger = get_root_logger(os.path.join(config.work_dir, 'train_log.log'))\n            logger.warning(f\"lewei scale: {self.lewei_scale}, base size: {self.base_size}\")\n        else:\n            print(f'Warning: lewei scale: {self.lewei_scale}, base size: {self.base_size}')\n\n    def forward(self, x, timestep, current, cache_dic, y, mask=None, data_info=None, **kwargs):\n        \"\"\"\n        Forward pass of PixArt.\n        x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)\n        t: (N,) tensor of diffusion timesteps\n        y: (N, 1, 120, C) tensor of class labels\n        \"\"\"\n        x = x.to(self.dtype)\n        timestep = timestep.to(self.dtype)\n        y = y.to(self.dtype)\n        pos_embed = self.pos_embed.to(self.dtype)\n        self.h, self.w = x.shape[-2]//self.patch_size, x.shape[-1]//self.patch_size\n        x = self.x_embedder(x) + pos_embed  # (N, T, D), where T = H * W / patch_size ** 2\n        t = self.t_embedder(timestep.to(x.dtype))  # (N, D)\n        t0 = self.t_block(t)\n        y = self.y_embedder(y, self.training)  # (N, 1, L, D)\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, x.shape[-1])\n     
   for i, block in enumerate(self.blocks):\n            current['layer'] = i\n            x = auto_grad_checkpoint(block, x, y, t0, current, cache_dic, y_lens)  # (N, T, D) #support grad checkpoint\n        x = self.final_layer(x, t)  # (N, T, patch_size ** 2 * out_channels)\n        x = self.unpatchify(x)  # (N, out_channels, H, W)\n        return x\n\n    def forward_with_dpmsolver(self, x, timestep, current, cache_dic, y, mask=None, **kwargs):\n        \"\"\"\n        DPM-Solver does not need the variance prediction, so only the first half of the output channels is returned.\n        \"\"\"\n        # https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb\n        model_out = self.forward(x, timestep, current, cache_dic, y, mask)\n        return model_out.chunk(2, dim=1)[0]\n\n    def forward_with_cfg(self, x, timestep, current, cache_dic, y, cfg_scale, mask=None, **kwargs):\n        \"\"\"\n        Forward pass of PixArt, but also batches the unconditional forward pass for classifier-free guidance.\n        \"\"\"\n        # https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb\n        half = x[: len(x) // 2]\n        combined = torch.cat([half, half], dim=0)\n        # Unpack kwargs; passing the dict positionally would bind it to `data_info`.\n        model_out = self.forward(combined, timestep, current, cache_dic, y, mask, **kwargs)\n        model_out = model_out['x'] if isinstance(model_out, dict) else model_out\n        eps, rest = model_out[:, :3], model_out[:, 3:]\n        cond_eps, uncond_eps = torch.split(eps, len(eps) // 2, dim=0)\n        half_eps = uncond_eps + cfg_scale * (cond_eps - uncond_eps)\n        eps = torch.cat([half_eps, half_eps], dim=0)\n        return torch.cat([eps, rest], dim=1)\n\n    def unpatchify(self, x):\n        \"\"\"\n        x: (N, T, patch_size**2 * C)\n        imgs: (N, C, H, W)\n        \"\"\"\n        c = self.out_channels\n        p = self.x_embedder.patch_size[0]\n        h = w = int(x.shape[1] ** 0.5)\n        assert h * w == x.shape[1]\n\n        x = x.reshape(shape=(x.shape[0], h, w, p, p, c))\n        x = 
torch.einsum('nhwpqc->nchpwq', x)\n        return x.reshape(shape=(x.shape[0], c, h * p, w * p))\n\n    def initialize_weights(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                torch.nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.constant_(module.bias, 0)\n\n        self.apply(_basic_init)\n\n        # Initialize (and freeze) pos_embed by sin-cos embedding:\n        pos_embed = get_2d_sincos_pos_embed(self.pos_embed.shape[-1], int(self.x_embedder.num_patches ** 0.5), lewei_scale=self.lewei_scale, base_size=self.base_size)\n        self.pos_embed.data.copy_(torch.from_numpy(pos_embed).float().unsqueeze(0))\n\n        # Initialize patch_embed like nn.Linear (instead of nn.Conv2d):\n        w = self.x_embedder.proj.weight.data\n        nn.init.xavier_uniform_(w.view([w.shape[0], -1]))\n\n        # Initialize timestep embedding MLP:\n        nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)\n        nn.init.normal_(self.t_block[1].weight, std=0.02)\n\n        # Initialize caption embedding MLP:\n        nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)\n        nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)\n\n        # Zero-out the cross-attention output projection in PixArt blocks:\n        for block in self.blocks:\n            nn.init.constant_(block.cross_attn.proj.weight, 0)\n            nn.init.constant_(block.cross_attn.proj.bias, 0)\n\n        # Zero-out output layers:\n        nn.init.constant_(self.final_layer.linear.weight, 0)\n        nn.init.constant_(self.final_layer.linear.bias, 0)\n\n    @property\n    def dtype(self):\n        return next(self.parameters()).dtype\n\n\ndef get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False, extra_tokens=0, lewei_scale=1.0, base_size=16):\n    \"\"\"\n    grid_size: 
int of the grid height and width\n    return:\n    pos_embed: [grid_size*grid_size, embed_dim] or [1+grid_size*grid_size, embed_dim] (w/ or w/o cls_token)\n    \"\"\"\n    if isinstance(grid_size, int):\n        grid_size = to_2tuple(grid_size)\n    grid_h = np.arange(grid_size[0], dtype=np.float32) / (grid_size[0]/base_size) / lewei_scale\n    grid_w = np.arange(grid_size[1], dtype=np.float32) / (grid_size[1]/base_size) / lewei_scale\n    grid = np.meshgrid(grid_w, grid_h)  # here w goes first\n    grid = np.stack(grid, axis=0)\n    grid = grid.reshape([2, 1, grid_size[1], grid_size[0]])\n\n    pos_embed = get_2d_sincos_pos_embed_from_grid(embed_dim, grid)\n    if cls_token and extra_tokens > 0:\n        pos_embed = np.concatenate([np.zeros([extra_tokens, embed_dim]), pos_embed], axis=0)\n    return pos_embed\n\n\ndef get_2d_sincos_pos_embed_from_grid(embed_dim, grid):\n    assert embed_dim % 2 == 0\n\n    # use half of dimensions to encode grid_h\n    emb_h = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[0])  # (H*W, D/2)\n    emb_w = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[1])  # (H*W, D/2)\n\n    return np.concatenate([emb_h, emb_w], axis=1)\n\n\ndef get_1d_sincos_pos_embed_from_grid(embed_dim, pos):\n    \"\"\"\n    embed_dim: output dimension for each position\n    pos: a list of positions to be encoded: size (M,)\n    out: (M, D)\n    \"\"\"\n    assert embed_dim % 2 == 0\n    omega = np.arange(embed_dim // 2, dtype=np.float64)\n    omega /= embed_dim / 2.\n    omega = 1. 
/ 10000 ** omega  # (D/2,)\n\n    pos = pos.reshape(-1)  # (M,)\n    out = np.einsum('m,d->md', pos, omega)  # (M, D/2), outer product\n\n    emb_sin = np.sin(out)  # (M, D/2)\n    emb_cos = np.cos(out)  # (M, D/2)\n\n    return np.concatenate([emb_sin, emb_cos], axis=1)\n\n\n#################################################################################\n#                                   PixArt Configs                                  #\n#################################################################################\n@MODELS.register_module()\ndef PixArt_XL_2(**kwargs):\n    return PixArt(depth=28, hidden_size=1152, patch_size=2, num_heads=16, **kwargs)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/nets/PixArtMS.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# GLIDE: https://github.com/openai/glide-text2im\n# MAE: https://github.com/facebookresearch/mae/blob/main/models_mae.py\n# --------------------------------------------------------\nimport torch\nimport torch.nn as nn\nfrom timm.models.layers import DropPath\nfrom timm.models.vision_transformer import Mlp\n\nfrom diffusion.model.builder import MODELS\nfrom diffusion.model.utils import auto_grad_checkpoint, to_2tuple\nfrom diffusion.model.nets.PixArt_blocks import t2i_modulate, CaptionEmbedder, WindowAttention, MultiHeadCrossAttention, T2IFinalLayer, TimestepEmbedder, SizeEmbedder\nfrom diffusion.model.nets.PixArt import PixArt, get_2d_sincos_pos_embed\n\n\nclass PatchEmbed(nn.Module):\n    \"\"\" 2D Image to Patch Embedding\n    \"\"\"\n    def __init__(\n            self,\n            patch_size=16,\n            in_chans=3,\n            embed_dim=768,\n            norm_layer=None,\n            flatten=True,\n            bias=True,\n    ):\n        super().__init__()\n        patch_size = to_2tuple(patch_size)\n        self.patch_size = patch_size\n        self.flatten = flatten\n        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size, bias=bias)\n        self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()\n\n    def forward(self, x):\n        x = self.proj(x)\n        if self.flatten:\n            x = x.flatten(2).transpose(1, 2)  # BCHW -> BNC\n        x = self.norm(x)\n        return x\n\n\nclass PixArtMSBlock(nn.Module):\n    \"\"\"\n    A PixArt block with adaptive layer norm zero (adaLN-Zero) conditioning.\n    \"\"\"\n\n    def __init__(self, hidden_size, num_heads, mlp_ratio=4.0, drop_path=0., window_size=0, 
input_size=None, use_rel_pos=False, **block_kwargs):\n        super().__init__()\n        self.hidden_size = hidden_size\n        self.norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.attn = WindowAttention(hidden_size, num_heads=num_heads, qkv_bias=True,\n                              input_size=input_size if window_size == 0 else (window_size, window_size),\n                              use_rel_pos=use_rel_pos, **block_kwargs)\n        self.cross_attn = MultiHeadCrossAttention(hidden_size, num_heads, **block_kwargs)\n        self.norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        # to be compatible with lower version pytorch\n        approx_gelu = lambda: nn.GELU(approximate=\"tanh\")\n        self.mlp = Mlp(in_features=hidden_size, hidden_features=int(hidden_size * mlp_ratio), act_layer=approx_gelu, drop=0)\n        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()\n        self.window_size = window_size\n        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size ** 0.5)\n\n    def forward(self, x, y, t, mask=None, **kwargs):\n        B, N, C = x.shape\n\n        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (self.scale_shift_table[None] + t.reshape(B, 6, -1)).chunk(6, dim=1)\n        x = x + self.drop_path(gate_msa * self.attn(t2i_modulate(self.norm1(x), shift_msa, scale_msa)))\n        x = x + self.cross_attn(x, y, mask)\n        x = x + self.drop_path(gate_mlp * self.mlp(t2i_modulate(self.norm2(x), shift_mlp, scale_mlp)))\n\n        return x\n\n\n#############################################################################\n#                                 Core PixArt Model                                #\n#################################################################################\n@MODELS.register_module()\nclass PixArtMS(PixArt):\n    \"\"\"\n    Diffusion model with a Transformer backbone.\n    \"\"\"\n\n    def 
__init__(self, input_size=32, patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, class_dropout_prob=0.1, learn_sigma=True, pred_sigma=True, drop_path: float = 0., window_size=0, window_block_indexes=None, use_rel_pos=False, caption_channels=4096, lewei_scale=1., config=None, model_max_length=120, **kwargs):\n        if window_block_indexes is None:\n            window_block_indexes = []\n        super().__init__(\n            input_size=input_size,\n            patch_size=patch_size,\n            in_channels=in_channels,\n            hidden_size=hidden_size,\n            depth=depth,\n            num_heads=num_heads,\n            mlp_ratio=mlp_ratio,\n            class_dropout_prob=class_dropout_prob,\n            learn_sigma=learn_sigma,\n            pred_sigma=pred_sigma,\n            drop_path=drop_path,\n            window_size=window_size,\n            window_block_indexes=window_block_indexes,\n            use_rel_pos=use_rel_pos,\n            lewei_scale=lewei_scale,\n            config=config,\n            model_max_length=model_max_length,\n            **kwargs,\n        )\n        self.h = self.w = 0\n        approx_gelu = lambda: nn.GELU(approximate=\"tanh\")\n        self.t_block = nn.Sequential(\n            nn.SiLU(),\n            nn.Linear(hidden_size, 6 * hidden_size, bias=True)\n        )\n        self.x_embedder = PatchEmbed(patch_size, in_channels, hidden_size, bias=True)\n        self.y_embedder = CaptionEmbedder(in_channels=caption_channels, hidden_size=hidden_size, uncond_prob=class_dropout_prob, act_layer=approx_gelu, token_num=model_max_length)\n        self.csize_embedder = SizeEmbedder(hidden_size//3)  # c_size embed\n        self.ar_embedder = SizeEmbedder(hidden_size//3)     # aspect ratio embed\n        drop_path = [x.item() for x in torch.linspace(0, drop_path, depth)]  # stochastic depth decay rule\n        self.blocks = nn.ModuleList([\n            PixArtMSBlock(hidden_size, num_heads, 
mlp_ratio=mlp_ratio, drop_path=drop_path[i],\n                          input_size=(input_size // patch_size, input_size // patch_size),\n                          window_size=window_size if i in window_block_indexes else 0,\n                          use_rel_pos=use_rel_pos if i in window_block_indexes else False)\n            for i in range(depth)\n        ])\n        self.final_layer = T2IFinalLayer(hidden_size, patch_size, self.out_channels)\n\n        self.initialize()\n\n    def forward(self, x, timestep, y, mask=None, data_info=None, **kwargs):\n        \"\"\"\n        Forward pass of PixArt.\n        x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)\n        t: (N,) tensor of diffusion timesteps\n        y: (N, 1, 120, C) tensor of class labels\n        \"\"\"\n        bs = x.shape[0]\n        x = x.to(self.dtype)\n        timestep = timestep.to(self.dtype)\n        y = y.to(self.dtype)\n        c_size, ar = data_info['img_hw'].to(self.dtype), data_info['aspect_ratio'].to(self.dtype)\n        self.h, self.w = x.shape[-2]//self.patch_size, x.shape[-1]//self.patch_size\n        pos_embed = torch.from_numpy(get_2d_sincos_pos_embed(self.pos_embed.shape[-1], (self.h, self.w), lewei_scale=self.lewei_scale, base_size=self.base_size)).unsqueeze(0).to(x.device).to(self.dtype)\n        x = self.x_embedder(x) + pos_embed  # (N, T, D), where T = H * W / patch_size ** 2\n        t = self.t_embedder(timestep)  # (N, D)\n        csize = self.csize_embedder(c_size, bs)  # (N, D)\n        ar = self.ar_embedder(ar, bs)  # (N, D)\n        t = t + torch.cat([csize, ar], dim=1)\n        t0 = self.t_block(t)\n        y = self.y_embedder(y, self.training)  # (N, D)\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, 
x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, x.shape[-1])\n        for block in self.blocks:\n            x = auto_grad_checkpoint(block, x, y, t0, y_lens, **kwargs)  # (N, T, D) #support grad checkpoint\n        x = self.final_layer(x, t)  # (N, T, patch_size ** 2 * out_channels)\n        x = self.unpatchify(x)  # (N, out_channels, H, W)\n        return x\n\n    def forward_with_dpmsolver(self, x, timestep, y, data_info, **kwargs):\n        \"\"\"\n        dpm solver donnot need variance prediction\n        \"\"\"\n        # https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb\n        model_out = self.forward(x, timestep, y, data_info=data_info, **kwargs)\n        return model_out.chunk(2, dim=1)[0]\n\n    def forward_with_cfg(self, x, timestep, y, cfg_scale, data_info, **kwargs):\n        \"\"\"\n        Forward pass of PixArt, but also batches the unconditional forward pass for classifier-free guidance.\n        \"\"\"\n        # https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb\n        half = x[: len(x) // 2]\n        combined = torch.cat([half, half], dim=0)\n        model_out = self.forward(combined, timestep, y, data_info=data_info)\n        eps, rest = model_out[:, :3], model_out[:, 3:]\n        cond_eps, uncond_eps = torch.split(eps, len(eps) // 2, dim=0)\n        half_eps = uncond_eps + cfg_scale * (cond_eps - uncond_eps)\n        eps = torch.cat([half_eps, half_eps], dim=0)\n        return torch.cat([eps, rest], dim=1)\n\n    def unpatchify(self, x):\n        \"\"\"\n        x: (N, T, patch_size**2 * C)\n        imgs: (N, H, W, C)\n        \"\"\"\n        c = self.out_channels\n        p = self.x_embedder.patch_size[0]\n        assert self.h * self.w == x.shape[1]\n\n        x = x.reshape(shape=(x.shape[0], self.h, self.w, p, p, c))\n        x = torch.einsum('nhwpqc->nchpwq', x)\n        
return x.reshape(shape=(x.shape[0], c, self.h * p, self.w * p))\n\n    def initialize(self):\n        # Initialize transformer layers:\n        def _basic_init(module):\n            if isinstance(module, nn.Linear):\n                torch.nn.init.xavier_uniform_(module.weight)\n                if module.bias is not None:\n                    nn.init.constant_(module.bias, 0)\n\n        self.apply(_basic_init)\n\n        # Initialize patch_embed like nn.Linear (instead of nn.Conv2d):\n        w = self.x_embedder.proj.weight.data\n        nn.init.xavier_uniform_(w.view([w.shape[0], -1]))\n\n        # Initialize timestep embedding MLP:\n        nn.init.normal_(self.t_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.t_embedder.mlp[2].weight, std=0.02)\n        nn.init.normal_(self.t_block[1].weight, std=0.02)\n        nn.init.normal_(self.csize_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.csize_embedder.mlp[2].weight, std=0.02)\n        nn.init.normal_(self.ar_embedder.mlp[0].weight, std=0.02)\n        nn.init.normal_(self.ar_embedder.mlp[2].weight, std=0.02)\n\n        # Initialize caption embedding MLP:\n        nn.init.normal_(self.y_embedder.y_proj.fc1.weight, std=0.02)\n        nn.init.normal_(self.y_embedder.y_proj.fc2.weight, std=0.02)\n\n        # Zero-out adaLN modulation layers in PixArt blocks:\n        for block in self.blocks:\n            nn.init.constant_(block.cross_attn.proj.weight, 0)\n            nn.init.constant_(block.cross_attn.proj.bias, 0)\n\n        # Zero-out output layers:\n        nn.init.constant_(self.final_layer.linear.weight, 0)\n        nn.init.constant_(self.final_layer.linear.bias, 0)\n\n\n#################################################################################\n#                                   PixArt Configs                                  #\n#################################################################################\n@MODELS.register_module()\ndef PixArtMS_XL_2(**kwargs):\n    
return PixArtMS(depth=28, hidden_size=1152, patch_size=2, num_heads=16, **kwargs)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/nets/PixArt_blocks.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# --------------------------------------------------------\n# References:\n# GLIDE: https://github.com/openai/glide-text2im\n# MAE: https://github.com/facebookresearch/mae/blob/main/models_mae.py\n# --------------------------------------------------------\nimport math\nimport torch\nimport torch.nn as nn\nfrom timm.models.vision_transformer import Mlp, Attention as Attention_\nfrom einops import rearrange, repeat\nimport xformers.ops\n\nfrom diffusion.model.utils import add_decomposed_rel_pos\nfrom diffusion.model.cache_functions import cached_attention_forward\n\n\ndef modulate(x, shift, scale):\n    return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)\n\n\ndef t2i_modulate(x, shift, scale):\n    return x * (1 + scale) + shift\n\n\nclass MultiHeadCrossAttention(nn.Module):\n    def __init__(self, d_model, num_heads, attn_drop=0., proj_drop=0., **block_kwargs):\n        super(MultiHeadCrossAttention, self).__init__()\n        assert d_model % num_heads == 0, \"d_model must be divisible by num_heads\"\n\n        self.d_model = d_model\n        self.num_heads = num_heads\n        self.head_dim = d_model // num_heads\n\n        self.q_linear = nn.Linear(d_model, d_model)\n        self.kv_linear = nn.Linear(d_model, d_model*2)\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(d_model, d_model)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n    def forward(self, x, cond, mask=None):\n        # query: img tokens; key/value: condition; mask: if padding tokens\n        B, N, C = x.shape\n\n        q = self.q_linear(x).view(1, -1, self.num_heads, self.head_dim)\n        kv = self.kv_linear(cond).view(1, -1, 2, self.num_heads, self.head_dim)\n        k, v = kv.unbind(2)\n        attn_bias = None\n        if mask is not 
None:\n            attn_bias = xformers.ops.fmha.BlockDiagonalMask.from_seqlens([N] * B, mask)\n        #x = xformers.ops.memory_efficient_attention(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n        # we need to save the cross-attn map here, so we use our own function for cross-attention, not the xformers.ops.memory_efficient_attention\n        # maybe there is a future version of xformers.ops.memory_efficient_attention that can return the attn_map\n        x, attn_map = cached_attention_forward(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n        x = x.view(B, -1, C)\n        attn_map = attn_map.view(B, -1, attn_map.shape[-1])\n        x = self.proj(x)\n        x = self.proj_drop(x)\n\n        #q = self.q_linear(x).reshape(B, -1, self.num_heads, self.head_dim)\n        #kv = self.kv_linear(cond).reshape(B, -1, 2, self.num_heads, self.head_dim)\n        #k, v = kv.unbind(2)\n        #attn_bias = None\n        #if mask is not None:\n        #    attn_bias = torch.zeros([B * self.num_heads, q.shape[1], k.shape[1]], dtype=q.dtype, device=q.device)\n        #    attn_bias.masked_fill_(mask.squeeze(1).repeat(self.num_heads, 1, 1) == 0, float('-inf'))\n        ##x = xformers.ops.memory_efficient_attention(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n        #x, attn_map = cached_attention_forward(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n        #x = x.contiguous().reshape(B, -1, C)\n        #x = self.proj(x)\n        #x = self.proj_drop(x)\n\n        return x, attn_map\n\n\nclass WindowAttention(Attention_):\n    \"\"\"Multi-head Attention block with relative position embeddings.\"\"\"\n\n    def __init__(\n        self,\n        dim,\n        num_heads=8,\n        qkv_bias=True,\n        use_rel_pos=False,\n        rel_pos_zero_init=True,\n        input_size=None,\n        **block_kwargs,\n    ):\n        \"\"\"\n        Args:\n            dim (int): Number of input channels.\n            num_heads (int): Number of attention heads.\n       
     qkv_bias (bool): If True, add a learnable bias to query, key, value.\n            use_rel_pos (bool): If True, add relative positional embeddings to the attention map.\n            rel_pos_zero_init (bool): If True, zero initialize relative positional parameters.\n            input_size (tuple(int, int) or None): Input resolution for calculating the relative positional\n                parameter size.\n        \"\"\"\n        super().__init__(dim, num_heads=num_heads, qkv_bias=qkv_bias, **block_kwargs)\n\n        self.use_rel_pos = use_rel_pos\n        if self.use_rel_pos:\n            # initialize relative positional embeddings\n            self.rel_pos_h = nn.Parameter(torch.zeros(2 * input_size[0] - 1, self.head_dim))\n            self.rel_pos_w = nn.Parameter(torch.zeros(2 * input_size[1] - 1, self.head_dim))\n\n            if not rel_pos_zero_init:\n                nn.init.trunc_normal_(self.rel_pos_h, std=0.02)\n                nn.init.trunc_normal_(self.rel_pos_w, std=0.02)\n\n    def forward(self, x, mask=None):\n        B, N, C = x.shape\n        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)\n        q, k, v = qkv.unbind(2)\n        if use_fp32_attention := getattr(self, 'fp32_attention', False):\n            q, k, v = q.float(), k.float(), v.float()\n\n        attn_bias = None\n        if mask is not None:\n            attn_bias = torch.zeros([B * self.num_heads, q.shape[1], k.shape[1]], dtype=q.dtype, device=q.device)\n            attn_bias.masked_fill_(mask.squeeze(1).repeat(self.num_heads, 1, 1) == 0, float('-inf'))\n        #x = xformers.ops.memory_efficient_attention(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n        #attn_map = None\n        # We need to save the self-attention map here, so we use our own attention function instead of xformers.ops.memory_efficient_attention.\n        # A future version of xformers.ops.memory_efficient_attention may be able to return the attention map.\n        # However, you can use the 
xformers.ops.memory_efficient_attention for self-attention, and use our own function for cross-attention.\n        # This is because in our final version, only cross attention map is used, you can use the xformers.ops.memory_efficient_attention for self-attention for a faster speed, if you don't need the self-attention score(s1).\n        x, attn_map = cached_attention_forward(q, k, v, p=self.attn_drop.p, attn_bias=attn_bias)\n        x = x.view(B, N, C)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x, attn_map\n\n\n#################################################################################\n#   AMP attention with fp32 softmax to fix loss NaN problem during training     #\n#################################################################################\nclass Attention(Attention_):\n    def forward(self, x):\n        B, N, C = x.shape\n        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)\n        q, k, v = qkv.unbind(0)  # make torchscript happy (cannot use tensor as tuple)\n        use_fp32_attention = getattr(self, 'fp32_attention', False)\n        if use_fp32_attention:\n            q, k = q.float(), k.float()\n        with torch.cuda.amp.autocast(enabled=not use_fp32_attention):\n            attn = (q @ k.transpose(-2, -1)) * self.scale\n            attn = attn.softmax(dim=-1)\n\n        attn = self.attn_drop(attn)\n\n        x = (attn @ v).transpose(1, 2).reshape(B, N, C)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass FinalLayer(nn.Module):\n    \"\"\"\n    The final layer of PixArt.\n    \"\"\"\n\n    def __init__(self, hidden_size, patch_size, out_channels):\n        super().__init__()\n        self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.linear = nn.Linear(hidden_size, patch_size * patch_size * out_channels, bias=True)\n        self.adaLN_modulation = nn.Sequential(\n            
nn.SiLU(),\n            nn.Linear(hidden_size, 2 * hidden_size, bias=True)\n        )\n\n    def forward(self, x, c):\n        shift, scale = self.adaLN_modulation(c).chunk(2, dim=1)\n        x = modulate(self.norm_final(x), shift, scale)\n        x = self.linear(x)\n        return x\n\n\nclass T2IFinalLayer(nn.Module):\n    \"\"\"\n    The final layer of PixArt.\n    \"\"\"\n\n    def __init__(self, hidden_size, patch_size, out_channels):\n        super().__init__()\n        self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.linear = nn.Linear(hidden_size, patch_size * patch_size * out_channels, bias=True)\n        self.scale_shift_table = nn.Parameter(torch.randn(2, hidden_size) / hidden_size ** 0.5)\n        self.out_channels = out_channels\n\n    def forward(self, x, t):\n        shift, scale = (self.scale_shift_table[None] + t[:, None]).chunk(2, dim=1)\n        x = t2i_modulate(self.norm_final(x), shift, scale)\n        x = self.linear(x)\n        return x\n\n\nclass MaskFinalLayer(nn.Module):\n    \"\"\"\n    The final layer of PixArt.\n    \"\"\"\n\n    def __init__(self, final_hidden_size, c_emb_size, patch_size, out_channels):\n        super().__init__()\n        self.norm_final = nn.LayerNorm(final_hidden_size, elementwise_affine=False, eps=1e-6)\n        self.linear = nn.Linear(final_hidden_size, patch_size * patch_size * out_channels, bias=True)\n        self.adaLN_modulation = nn.Sequential(\n            nn.SiLU(),\n            nn.Linear(c_emb_size, 2 * final_hidden_size, bias=True)\n        )\n    def forward(self, x, t):\n        shift, scale = self.adaLN_modulation(t).chunk(2, dim=1)\n        x = modulate(self.norm_final(x), shift, scale)\n        x = self.linear(x)\n        return x\n\n\nclass DecoderLayer(nn.Module):\n    \"\"\"\n    The final layer of PixArt.\n    \"\"\"\n\n    def __init__(self, hidden_size, decoder_hidden_size):\n        super().__init__()\n        self.norm_decoder = 
nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.linear = nn.Linear(hidden_size, decoder_hidden_size, bias=True)\n        self.adaLN_modulation = nn.Sequential(\n            nn.SiLU(),\n            nn.Linear(hidden_size, 2 * hidden_size, bias=True)\n        )\n    def forward(self, x, t):\n        shift, scale = self.adaLN_modulation(t).chunk(2, dim=1)\n        x = modulate(self.norm_decoder(x), shift, scale)\n        x = self.linear(x)\n        return x\n\n\n#################################################################################\n#               Embedding Layers for Timesteps and Class Labels                 #\n#################################################################################\nclass TimestepEmbedder(nn.Module):\n    \"\"\"\n    Embeds scalar timesteps into vector representations.\n    \"\"\"\n\n    def __init__(self, hidden_size, frequency_embedding_size=256):\n        super().__init__()\n        self.mlp = nn.Sequential(\n            nn.Linear(frequency_embedding_size, hidden_size, bias=True),\n            nn.SiLU(),\n            nn.Linear(hidden_size, hidden_size, bias=True),\n        )\n        self.frequency_embedding_size = frequency_embedding_size\n\n    @staticmethod\n    def timestep_embedding(t, dim, max_period=10000):\n        \"\"\"\n        Create sinusoidal timestep embeddings.\n        :param t: a 1-D Tensor of N indices, one per batch element.\n                          These may be fractional.\n        :param dim: the dimension of the output.\n        :param max_period: controls the minimum frequency of the embeddings.\n        :return: an (N, D) Tensor of positional embeddings.\n        \"\"\"\n        # https://github.com/openai/glide-text2im/blob/main/glide_text2im/nn.py\n        half = dim // 2\n        freqs = torch.exp(\n            -math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32, device=t.device) / half)\n        args = t[:, None].float() * freqs[None]\n        
embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)\n        if dim % 2:\n            embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)\n        return embedding\n\n    def forward(self, t):\n        t_freq = self.timestep_embedding(t, self.frequency_embedding_size).to(self.dtype)\n        return self.mlp(t_freq)\n\n    @property\n    def dtype(self):\n        # Return the data type of the model parameters\n        return next(self.parameters()).dtype\n\n\nclass SizeEmbedder(TimestepEmbedder):\n    \"\"\"\n    Embeds scalar size conditions (e.g. image size or aspect ratio) into vector representations.\n    \"\"\"\n\n    def __init__(self, hidden_size, frequency_embedding_size=256):\n        super().__init__(hidden_size=hidden_size, frequency_embedding_size=frequency_embedding_size)\n        self.mlp = nn.Sequential(\n            nn.Linear(frequency_embedding_size, hidden_size, bias=True),\n            nn.SiLU(),\n            nn.Linear(hidden_size, hidden_size, bias=True),\n        )\n        self.frequency_embedding_size = frequency_embedding_size\n        self.outdim = hidden_size\n\n    def forward(self, s, bs):\n        if s.ndim == 1:\n            s = s[:, None]\n        assert s.ndim == 2\n        if s.shape[0] != bs:\n            s = s.repeat(bs//s.shape[0], 1)\n            assert s.shape[0] == bs\n        b, dims = s.shape[0], s.shape[1]\n        s = rearrange(s, \"b d -> (b d)\")\n        s_freq = self.timestep_embedding(s, self.frequency_embedding_size).to(self.dtype)\n        s_emb = self.mlp(s_freq)\n        s_emb = rearrange(s_emb, \"(b d) d2 -> b (d d2)\", b=b, d=dims, d2=self.outdim)\n        return s_emb\n\n    @property\n    def dtype(self):\n        # Return the data type of the model parameters\n        return next(self.parameters()).dtype\n\n\nclass LabelEmbedder(nn.Module):\n    \"\"\"\n    Embeds class labels into vector representations. 
Also handles label dropout for classifier-free guidance.\n    \"\"\"\n\n    def __init__(self, num_classes, hidden_size, dropout_prob):\n        super().__init__()\n        use_cfg_embedding = dropout_prob > 0\n        self.embedding_table = nn.Embedding(num_classes + use_cfg_embedding, hidden_size)\n        self.num_classes = num_classes\n        self.dropout_prob = dropout_prob\n\n    def token_drop(self, labels, force_drop_ids=None):\n        \"\"\"\n        Drops labels to enable classifier-free guidance.\n        \"\"\"\n        if force_drop_ids is None:\n            drop_ids = torch.rand(labels.shape[0]).cuda() < self.dropout_prob\n        else:\n            drop_ids = force_drop_ids == 1\n        labels = torch.where(drop_ids, self.num_classes, labels)\n        return labels\n\n    def forward(self, labels, train, force_drop_ids=None):\n        use_dropout = self.dropout_prob > 0\n        if (train and use_dropout) or (force_drop_ids is not None):\n            labels = self.token_drop(labels, force_drop_ids)\n        return self.embedding_table(labels)\n\n\nclass CaptionEmbedder(nn.Module):\n    \"\"\"\n    Embeds class labels into vector representations. 
Also handles label dropout for classifier-free guidance.\n    \"\"\"\n\n    def __init__(self, in_channels, hidden_size, uncond_prob, act_layer=nn.GELU(approximate='tanh'), token_num=120):\n        super().__init__()\n        self.y_proj = Mlp(in_features=in_channels, hidden_features=hidden_size, out_features=hidden_size, act_layer=act_layer, drop=0)\n        self.register_buffer(\"y_embedding\", nn.Parameter(torch.randn(token_num, in_channels) / in_channels ** 0.5))\n        self.uncond_prob = uncond_prob\n\n    def token_drop(self, caption, force_drop_ids=None):\n        \"\"\"\n        Drops labels to enable classifier-free guidance.\n        \"\"\"\n        if force_drop_ids is None:\n            drop_ids = torch.rand(caption.shape[0]).cuda() < self.uncond_prob\n        else:\n            drop_ids = force_drop_ids == 1\n        caption = torch.where(drop_ids[:, None, None, None], self.y_embedding, caption)\n        return caption\n\n    def forward(self, caption, train, force_drop_ids=None):\n        if train:\n            assert caption.shape[2:] == self.y_embedding.shape\n        use_dropout = self.uncond_prob > 0\n        if (train and use_dropout) or (force_drop_ids is not None):\n            caption = self.token_drop(caption, force_drop_ids)\n        caption = self.y_proj(caption)\n        return caption\n\n\nclass CaptionEmbedderDoubleBr(nn.Module):\n    \"\"\"\n    Embeds class labels into vector representations. 
Also handles label dropout for classifier-free guidance.\n    \"\"\"\n\n    def __init__(self, in_channels, hidden_size, uncond_prob, act_layer=nn.GELU(approximate='tanh'), token_num=120):\n        super().__init__()\n        self.proj = Mlp(in_features=in_channels, hidden_features=hidden_size, out_features=hidden_size, act_layer=act_layer, drop=0)\n        self.embedding = nn.Parameter(torch.randn(1, in_channels) / 10 ** 0.5)\n        self.y_embedding = nn.Parameter(torch.randn(token_num, in_channels) / 10 ** 0.5)\n        self.uncond_prob = uncond_prob\n\n    def token_drop(self, global_caption, caption, force_drop_ids=None):\n        \"\"\"\n        Drops labels to enable classifier-free guidance.\n        \"\"\"\n        if force_drop_ids is None:\n            drop_ids = torch.rand(global_caption.shape[0]).cuda() < self.uncond_prob\n        else:\n            drop_ids = force_drop_ids == 1\n        global_caption = torch.where(drop_ids[:, None], self.embedding, global_caption)\n        caption = torch.where(drop_ids[:, None, None, None], self.y_embedding, caption)\n        return global_caption, caption\n\n    def forward(self, caption, train, force_drop_ids=None):\n        assert caption.shape[2: ] == self.y_embedding.shape\n        global_caption = caption.mean(dim=2).squeeze()\n        use_dropout = self.uncond_prob > 0\n        if (train and use_dropout) or (force_drop_ids is not None):\n            global_caption, caption = self.token_drop(global_caption, caption, force_drop_ids)\n        y_embed = self.proj(global_caption)\n        return y_embed, caption"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/nets/__init__.py",
    "content": "from .PixArt import PixArt, PixArt_XL_2\nfrom .PixArtMS import PixArtMS, PixArtMS_XL_2, PixArtMSBlock\nfrom .pixart_controlnet import ControlPixArtHalf, ControlPixArtMSHalf"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/nets/pixart_controlnet.py",
    "content": "import re\nimport torch\nimport torch.nn as nn\n\nfrom copy import deepcopy\nfrom torch import Tensor\nfrom torch.nn import Module, Linear, init\nfrom typing import Any, Mapping\n\nfrom diffusion.model.nets import PixArtMSBlock, PixArtMS, PixArt\nfrom diffusion.model.nets.PixArt import get_2d_sincos_pos_embed\nfrom diffusion.model.utils import auto_grad_checkpoint\n\n\n# The implementation of ControlNet-Half architrecture\n# https://github.com/lllyasviel/ControlNet/discussions/188\nclass ControlT2IDitBlockHalf(Module):\n    def __init__(self, base_block: PixArtMSBlock, block_index: 0) -> None:\n        super().__init__()\n        self.copied_block = deepcopy(base_block)\n        self.block_index = block_index\n\n        for p in self.copied_block.parameters():\n            p.requires_grad_(True)\n\n        self.copied_block.load_state_dict(base_block.state_dict())\n        self.copied_block.train()\n        \n        self.hidden_size = hidden_size = base_block.hidden_size\n        if self.block_index == 0:\n            self.before_proj = Linear(hidden_size, hidden_size)\n            init.zeros_(self.before_proj.weight)\n            init.zeros_(self.before_proj.bias)\n        self.after_proj = Linear(hidden_size, hidden_size) \n        init.zeros_(self.after_proj.weight)\n        init.zeros_(self.after_proj.bias)\n\n    def forward(self, x, y, t, mask=None, c=None):\n        \n        if self.block_index == 0:\n            # the first block\n            c = self.before_proj(c)\n            c = self.copied_block(x + c, y, t, mask)\n            c_skip = self.after_proj(c)\n        else:\n            # load from previous c and produce the c for skip connection\n            c = self.copied_block(c, y, t, mask)\n            c_skip = self.after_proj(c)\n        \n        return c, c_skip\n        \n\n# The implementation of ControlPixArtHalf net\nclass ControlPixArtHalf(Module):\n    # only support single res model\n    def __init__(self, base_model: 
PixArt, copy_blocks_num: int = 13) -> None:\n        super().__init__()\n        self.base_model = base_model.eval()\n        self.controlnet = []\n        self.copy_blocks_num = copy_blocks_num\n        self.total_blocks_num = len(base_model.blocks)\n        for p in self.base_model.parameters():\n            p.requires_grad_(False)\n\n        # Copy first copy_blocks_num block\n        for i in range(copy_blocks_num):\n            self.controlnet.append(ControlT2IDitBlockHalf(base_model.blocks[i], i))\n        self.controlnet = nn.ModuleList(self.controlnet)\n    \n    def __getattr__(self, name: str) -> Tensor or Module:\n        if name in ['forward', 'forward_with_dpmsolver', 'forward_with_cfg', 'forward_c', 'load_state_dict']:\n            return self.__dict__[name]\n        elif name in ['base_model', 'controlnet']:\n            return super().__getattr__(name)\n        else:\n            return getattr(self.base_model, name)\n\n    def forward_c(self, c):\n        self.h, self.w = c.shape[-2]//self.patch_size, c.shape[-1]//self.patch_size\n        pos_embed = torch.from_numpy(get_2d_sincos_pos_embed(self.pos_embed.shape[-1], (self.h, self.w), lewei_scale=self.lewei_scale, base_size=self.base_size)).unsqueeze(0).to(c.device).to(self.dtype)\n        return self.x_embedder(c) + pos_embed if c is not None else c\n\n    # def forward(self, x, t, c, **kwargs):\n    #     return self.base_model(x, t, c=self.forward_c(c), **kwargs)\n    def forward(self, x, timestep, y, mask=None, data_info=None, c=None, **kwargs):\n        # modify the original PixArtMS forward function\n        if c is not None:\n            c = c.to(self.dtype)\n            c = self.forward_c(c)\n        \"\"\"\n        Forward pass of PixArt.\n        x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)\n        t: (N,) tensor of diffusion timesteps\n        y: (N, 1, 120, C) tensor of class labels\n        \"\"\"\n        x = x.to(self.dtype)\n        timestep 
= timestep.to(self.dtype)\n        y = y.to(self.dtype)\n        pos_embed = self.pos_embed.to(self.dtype)\n        self.h, self.w = x.shape[-2]//self.patch_size, x.shape[-1]//self.patch_size\n        x = self.x_embedder(x) + pos_embed  # (N, T, D), where T = H * W / patch_size ** 2\n        t = self.t_embedder(timestep.to(x.dtype))  # (N, D)\n        t0 = self.t_block(t)\n        y = self.y_embedder(y, self.training)  # (N, 1, L, D)\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, x.shape[-1])\n\n        # define the first layer\n        x = auto_grad_checkpoint(self.base_model.blocks[0], x, y, t0, y_lens, **kwargs)  # (N, T, D) #support grad checkpoint\n\n        if c is not None:\n            # update c\n            for index in range(1, self.copy_blocks_num + 1):\n                c, c_skip = auto_grad_checkpoint(self.controlnet[index - 1], x, y, t0, y_lens, c, **kwargs)\n                x = auto_grad_checkpoint(self.base_model.blocks[index], x + c_skip, y, t0, y_lens, **kwargs)\n        \n            # update x\n            for index in range(self.copy_blocks_num + 1, self.total_blocks_num):\n                x = auto_grad_checkpoint(self.base_model.blocks[index], x, y, t0, y_lens, **kwargs)\n        else:\n            for index in range(1, self.total_blocks_num):\n                x = auto_grad_checkpoint(self.base_model.blocks[index], x, y, t0, y_lens, **kwargs)\n\n        x = self.final_layer(x, t)  # (N, T, patch_size ** 2 * out_channels)\n        x = self.unpatchify(x)  # (N, out_channels, H, W)\n        return x\n\n    def forward_with_dpmsolver(self, x, t, y, data_info, c, 
**kwargs):\n        model_out = self.forward(x, t, y, data_info=data_info, c=c, **kwargs)\n        return model_out.chunk(2, dim=1)[0]\n\n    # def forward_with_dpmsolver(self, x, t, y, data_info, c, **kwargs):\n    #     return self.base_model.forward_with_dpmsolver(x, t, y, data_info=data_info, c=self.forward_c(c), **kwargs)\n\n    def forward_with_cfg(self, x, t, y, cfg_scale, data_info, c, **kwargs):\n        return self.base_model.forward_with_cfg(x, t, y, cfg_scale, data_info, c=self.forward_c(c), **kwargs)\n\n    def load_state_dict(self, state_dict: Mapping[str, Any], strict: bool = True):\n        if all((k.startswith('base_model') or k.startswith('controlnet')) for k in state_dict.keys()):\n            return super().load_state_dict(state_dict, strict)\n        else:\n            new_key = {}\n            for k in state_dict.keys():\n                new_key[k] = re.sub(r\"(blocks\\.\\d+)(.*)\", r\"\\1.base_block\\2\", k)\n            for k, v in new_key.items():\n                if k != v:\n                    print(f\"replace {k} to {v}\")\n                    state_dict[v] = state_dict.pop(k)\n\n            return self.base_model.load_state_dict(state_dict, strict)\n    \n    def unpatchify(self, x):\n        \"\"\"\n        x: (N, T, patch_size**2 * C)\n        imgs: (N, C, H, W)\n        \"\"\"\n        c = self.out_channels\n        p = self.x_embedder.patch_size[0]\n        assert self.h * self.w == x.shape[1]\n\n        x = x.reshape(shape=(x.shape[0], self.h, self.w, p, p, c))\n        x = torch.einsum('nhwpqc->nchpwq', x)\n        imgs = x.reshape(shape=(x.shape[0], c, self.h * p, self.w * p))\n        return imgs\n\n    @property\n    def dtype(self):\n        # Return the data type of the model parameters\n        return next(self.parameters()).dtype\n\n\n# The implementation for PixArtMS_Half + 1024 resolution\nclass ControlPixArtMSHalf(ControlPixArtHalf):\n    # support multi-scale res model (multi-scale model can also be applied to single reso training & inference)\n    def 
__init__(self, base_model: PixArtMS, copy_blocks_num: int = 13) -> None:\n        super().__init__(base_model=base_model, copy_blocks_num=copy_blocks_num)\n\n    def forward(self, x, timestep, y, mask=None, data_info=None, c=None, **kwargs):\n        # modify the original PixArtMS forward function\n        \"\"\"\n        Forward pass of PixArt.\n        x: (N, C, H, W) tensor of spatial inputs (images or latent representations of images)\n        t: (N,) tensor of diffusion timesteps\n        y: (N, 1, 120, C) tensor of class labels\n        \"\"\"\n        if c is not None:\n            c = c.to(self.dtype)\n            c = self.forward_c(c)\n        bs = x.shape[0]\n        x = x.to(self.dtype)\n        timestep = timestep.to(self.dtype)\n        y = y.to(self.dtype)\n        c_size, ar = data_info['img_hw'].to(self.dtype), data_info['aspect_ratio'].to(self.dtype)\n        self.h, self.w = x.shape[-2]//self.patch_size, x.shape[-1]//self.patch_size\n\n        pos_embed = torch.from_numpy(get_2d_sincos_pos_embed(self.pos_embed.shape[-1], (self.h, self.w), lewei_scale=self.lewei_scale, base_size=self.base_size)).unsqueeze(0).to(x.device).to(self.dtype)\n        x = self.x_embedder(x) + pos_embed  # (N, T, D), where T = H * W / patch_size ** 2\n        t = self.t_embedder(timestep)  # (N, D)\n        csize = self.csize_embedder(c_size, bs)  # (N, D)\n        ar = self.ar_embedder(ar, bs)  # (N, D)\n        t = t + torch.cat([csize, ar], dim=1)\n        t0 = self.t_block(t)\n        y = self.y_embedder(y, self.training)  # (N, D)\n        if mask is not None:\n            if mask.shape[0] != y.shape[0]:\n                mask = mask.repeat(y.shape[0] // mask.shape[0], 1)\n            mask = mask.squeeze(1).squeeze(1)\n            y = y.squeeze(1).masked_select(mask.unsqueeze(-1) != 0).view(1, -1, x.shape[-1])\n            y_lens = mask.sum(dim=1).tolist()\n        else:\n            y_lens = [y.shape[2]] * y.shape[0]\n            y = y.squeeze(1).view(1, -1, 
x.shape[-1])\n\n        # define the first layer\n        x = auto_grad_checkpoint(self.base_model.blocks[0], x, y, t0, y_lens, **kwargs)  # (N, T, D) #support grad checkpoint\n\n        if c is not None:\n            # update c\n            for index in range(1, self.copy_blocks_num + 1):\n                c, c_skip = auto_grad_checkpoint(self.controlnet[index - 1], x, y, t0, y_lens, c, **kwargs)\n                x = auto_grad_checkpoint(self.base_model.blocks[index], x + c_skip, y, t0, y_lens, **kwargs)\n        \n            # update x\n            for index in range(self.copy_blocks_num + 1, self.total_blocks_num):\n                x = auto_grad_checkpoint(self.base_model.blocks[index], x, y, t0, y_lens, **kwargs)\n        else:\n            for index in range(1, self.total_blocks_num):\n                x = auto_grad_checkpoint(self.base_model.blocks[index], x, y, t0, y_lens, **kwargs)\n\n        x = self.final_layer(x, t)  # (N, T, patch_size ** 2 * out_channels)\n        x = self.unpatchify(x)  # (N, out_channels, H, W)\n        return x\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/respace.py",
    "content": "# Modified from OpenAI's diffusion repos\n#     GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\n#     ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\n#     IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\n\nimport numpy as np\nimport torch as th\n\nfrom .gaussian_diffusion import GaussianDiffusion\n\n\ndef space_timesteps(num_timesteps, section_counts):\n    \"\"\"\n    Create a list of timesteps to use from an original diffusion process,\n    given the number of timesteps we want to take from equally-sized portions\n    of the original process.\n    For example, if there's 300 timesteps and the section counts are [10,15,20]\n    then the first 100 timesteps are strided to be 10 timesteps, the second 100\n    are strided to be 15 timesteps, and the final 100 are strided to be 20.\n    If the stride is a string starting with \"ddim\", then the fixed striding\n    from the DDIM paper is used, and only one section is allowed.\n    :param num_timesteps: the number of diffusion steps in the original\n                          process to divide up.\n    :param section_counts: either a list of numbers, or a string containing\n                           comma-separated numbers, indicating the step count\n                           per section. 
As a special case, use \"ddimN\" where N\n                           is a number of steps to use the striding from the\n                           DDIM paper.\n    :return: a set of diffusion steps from the original process to use.\n    \"\"\"\n    if isinstance(section_counts, str):\n        if section_counts.startswith(\"ddim\"):\n            desired_count = int(section_counts[len(\"ddim\") :])\n            for i in range(1, num_timesteps):\n                if len(range(0, num_timesteps, i)) == desired_count:\n                    return set(range(0, num_timesteps, i))\n            raise ValueError(\n                f\"cannot create exactly {desired_count} steps with an integer stride\"\n            )\n        section_counts = [int(x) for x in section_counts.split(\",\")]\n    size_per = num_timesteps // len(section_counts)\n    extra = num_timesteps % len(section_counts)\n    start_idx = 0\n    all_steps = []\n    for i, section_count in enumerate(section_counts):\n        size = size_per + (1 if i < extra else 0)\n        if size < section_count:\n            raise ValueError(\n                f\"cannot divide section of {size} steps into {section_count}\"\n            )\n        frac_stride = 1 if section_count <= 1 else (size - 1) / (section_count - 1)\n        cur_idx = 0.0\n        taken_steps = []\n        for _ in range(section_count):\n            taken_steps.append(start_idx + round(cur_idx))\n            cur_idx += frac_stride\n        all_steps += taken_steps\n        start_idx += size\n    return set(all_steps)\n\n\nclass SpacedDiffusion(GaussianDiffusion):\n    \"\"\"\n    A diffusion process which can skip steps in a base diffusion process.\n    :param use_timesteps: a collection (sequence or set) of timesteps from the\n                          original diffusion process to retain.\n    :param kwargs: the kwargs to create the base diffusion process.\n    \"\"\"\n\n    def __init__(self, use_timesteps, **kwargs):\n        self.use_timesteps = 
set(use_timesteps)\n        self.timestep_map = []\n        self.original_num_steps = len(kwargs[\"betas\"])\n\n        base_diffusion = GaussianDiffusion(**kwargs)  # pylint: disable=missing-kwoa\n        last_alpha_cumprod = 1.0\n        new_betas = []\n        for i, alpha_cumprod in enumerate(base_diffusion.alphas_cumprod):\n            if i in self.use_timesteps:\n                new_betas.append(1 - alpha_cumprod / last_alpha_cumprod)\n                last_alpha_cumprod = alpha_cumprod\n                self.timestep_map.append(i)\n        kwargs[\"betas\"] = np.array(new_betas)\n        super().__init__(**kwargs)\n\n    def p_mean_variance(\n        self, model, *args, **kwargs\n    ):  # pylint: disable=signature-differs\n        return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)\n\n    def training_losses(\n        self, model, *args, **kwargs\n    ):  # pylint: disable=signature-differs\n        return super().training_losses(self._wrap_model(model), *args, **kwargs)\n\n    def training_losses_diffusers(\n        self, model, *args, **kwargs\n    ):  # pylint: disable=signature-differs\n        return super().training_losses_diffusers(self._wrap_model(model), *args, **kwargs)\n\n    def condition_mean(self, cond_fn, *args, **kwargs):\n        return super().condition_mean(self._wrap_model(cond_fn), *args, **kwargs)\n\n    def condition_score(self, cond_fn, *args, **kwargs):\n        return super().condition_score(self._wrap_model(cond_fn), *args, **kwargs)\n\n    def _wrap_model(self, model):\n        if isinstance(model, _WrappedModel):\n            return model\n        return _WrappedModel(\n            model, self.timestep_map, self.original_num_steps\n        )\n\n    def _scale_timesteps(self, t):\n        # Scaling is done by the wrapped model.\n        return t\n\n\nclass _WrappedModel:\n    def __init__(self, model, timestep_map, original_num_steps):\n        self.model = model\n        self.timestep_map = timestep_map\n      
  # self.rescale_timesteps = rescale_timesteps\n        self.original_num_steps = original_num_steps\n\n    def __call__(self, x, timestep, **kwargs):\n        map_tensor = th.tensor(self.timestep_map, device=timestep.device, dtype=timestep.dtype)\n        new_ts = map_tensor[timestep]\n        # if self.rescale_timesteps:\n        #     new_ts = new_ts.float() * (1000.0 / self.original_num_steps)\n        return self.model(x, timestep=new_ts, **kwargs)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/sa_solver.py",
    "content": "import torch\nimport torch.nn.functional as F\nimport math\nfrom tqdm import tqdm\n\n\nclass NoiseScheduleVP:\n    def __init__(\n            self,\n            schedule='discrete',\n            betas=None,\n            alphas_cumprod=None,\n            continuous_beta_0=0.1,\n            continuous_beta_1=20.,\n            dtype=torch.float32,\n    ):\n        \"\"\"Thanks to DPM-Solver for their code base\"\"\"\n        \"\"\"Create a wrapper class for the forward SDE (VP type).\n        ***\n        Update: We support discrete-time diffusion models by implementing a picewise linear interpolation for log_alpha_t.\n                We recommend to use schedule='discrete' for the discrete-time diffusion models, especially for high-resolution images.\n        ***\n        The forward SDE ensures that the condition distribution q_{t|0}(x_t | x_0) = N ( alpha_t * x_0, sigma_t^2 * I ).\n        We further define lambda_t = log(alpha_t) - log(sigma_t), which is the half-logSNR (described in the DPM-Solver paper).\n        Therefore, we implement the functions for computing alpha_t, sigma_t and lambda_t. For t in [0, T], we have:\n            log_alpha_t = self.marginal_log_mean_coeff(t)\n            sigma_t = self.marginal_std(t)\n            lambda_t = self.marginal_lambda(t)\n        Moreover, as lambda(t) is an invertible function, we also support its inverse function:\n            t = self.inverse_lambda(lambda_t)\n        ===============================================================\n        We support both discrete-time DPMs (trained on n = 0, 1, ..., N-1) and continuous-time DPMs (trained on t in [t_0, T]).\n        1. For discrete-time DPMs:\n            For discrete-time DPMs trained on n = 0, 1, ..., N-1, we convert the discrete steps to continuous time steps by:\n                t_i = (i + 1) / N\n            e.g. 
for N = 1000, we have t_0 = 1e-3 and T = t_{N-1} = 1.\n            We solve the corresponding diffusion ODE from time T = 1 to time t_0 = 1e-3.\n            Args:\n                betas: A `torch.Tensor`. The beta array for the discrete-time DPM. (See the original DDPM paper for details)\n                alphas_cumprod: A `torch.Tensor`. The cumprod alphas for the discrete-time DPM. (See the original DDPM paper for details)\n            Note that we always have alphas_cumprod = cumprod(1 - betas). Therefore, we only need to set one of `betas` and `alphas_cumprod`.\n            **Important**:  Please pay special attention to the `alphas_cumprod` argument:\n                The `alphas_cumprod` is the \\hat{alpha_n} array in the notation of DDPM. Specifically, DDPMs assume that\n                    q_{t_n | 0}(x_{t_n} | x_0) = N ( \\sqrt{\\hat{alpha_n}} * x_0, (1 - \\hat{alpha_n}) * I ).\n                Therefore, the notation \\hat{alpha_n} is different from the notation alpha_t in DPM-Solver. In fact, we have\n                    alpha_{t_n} = \\sqrt{\\hat{alpha_n}},\n                and\n                    log(alpha_{t_n}) = 0.5 * log(\\hat{alpha_n}).\n        2. For continuous-time DPMs:\n            We support two types of VPSDEs: linear (DDPM) and cosine (improved-DDPM). The hyperparameters for the noise\n            schedule are the default settings in DDPM and improved-DDPM:\n            Args:\n                beta_min: A `float` number. The smallest beta for the linear schedule.\n                beta_max: A `float` number. The largest beta for the linear schedule.\n                cosine_s: A `float` number. The hyperparameter in the cosine schedule.\n                cosine_beta_max: A `float` number. The hyperparameter in the cosine schedule.\n                T: A `float` number. The ending time of the forward process.\n        ===============================================================\n        Args:\n            schedule: A `str`. 
The noise schedule of the forward SDE. 'discrete' for discrete-time DPMs,\n                    'linear' or 'cosine' for continuous-time DPMs.\n        Returns:\n            A wrapper object of the forward SDE (VP type).\n\n        ===============================================================\n        Example:\n        # For discrete-time DPMs, given betas (the beta array for n = 0, 1, ..., N - 1):\n        >>> ns = NoiseScheduleVP('discrete', betas=betas)\n        # For discrete-time DPMs, given alphas_cumprod (the \\hat{alpha_n} array for n = 0, 1, ..., N - 1):\n        >>> ns = NoiseScheduleVP('discrete', alphas_cumprod=alphas_cumprod)\n        # For continuous-time DPMs (VPSDE), linear schedule:\n        >>> ns = NoiseScheduleVP('linear', continuous_beta_0=0.1, continuous_beta_1=20.)\n        \"\"\"\n\n        if schedule not in ['discrete', 'linear', 'cosine']:\n            raise ValueError(\n                f\"Unsupported noise schedule {schedule}. The schedule needs to be 'discrete' or 'linear' or 'cosine'\"\n            )\n\n        self.schedule = schedule\n        if schedule == 'discrete':\n            if betas is not None:\n                log_alphas = 0.5 * torch.log(1 - betas).cumsum(dim=0)\n            else:\n                assert alphas_cumprod is not None\n                log_alphas = 0.5 * torch.log(alphas_cumprod)\n            self.total_N = len(log_alphas)\n            self.T = 1.\n            self.t_array = torch.linspace(0., 1., self.total_N + 1)[1:].reshape((1, -1)).to(dtype=dtype)\n            self.log_alpha_array = log_alphas.reshape((1, -1,)).to(dtype=dtype)\n        else:\n            self.total_N = 1000\n            self.beta_0 = continuous_beta_0\n            self.beta_1 = continuous_beta_1\n            self.cosine_s = 0.008\n            self.cosine_beta_max = 999.\n            self.cosine_t_max = math.atan(self.cosine_beta_max * (1. + self.cosine_s) / math.pi) * 2. * (\n                        1. 
+ self.cosine_s) / math.pi - self.cosine_s\n            self.cosine_log_alpha_0 = math.log(math.cos(self.cosine_s / (1. + self.cosine_s) * math.pi / 2.))\n            self.schedule = schedule\n            self.T = 0.9946 if schedule == 'cosine' else 1.\n\n    def marginal_log_mean_coeff(self, t):\n        \"\"\"\n        Compute log(alpha_t) of a given continuous-time label t in [0, T].\n        \"\"\"\n        if self.schedule == 'discrete':\n            return interpolate_fn(t.reshape((-1, 1)), self.t_array.to(t.device),\n                                  self.log_alpha_array.to(t.device)).reshape((-1))\n        elif self.schedule == 'linear':\n            return -0.25 * t ** 2 * (self.beta_1 - self.beta_0) - 0.5 * t * self.beta_0\n        elif self.schedule == 'cosine':\n            log_alpha_fn = lambda s: torch.log(torch.cos((s + self.cosine_s) / (1. + self.cosine_s) * math.pi / 2.))\n            return log_alpha_fn(t) - self.cosine_log_alpha_0\n\n    def marginal_alpha(self, t):\n        \"\"\"\n        Compute alpha_t of a given continuous-time label t in [0, T].\n        \"\"\"\n        return torch.exp(self.marginal_log_mean_coeff(t))\n\n    def marginal_std(self, t):\n        \"\"\"\n        Compute sigma_t of a given continuous-time label t in [0, T].\n        \"\"\"\n        return torch.sqrt(1. - torch.exp(2. * self.marginal_log_mean_coeff(t)))\n\n    def marginal_lambda(self, t):\n        \"\"\"\n        Compute lambda_t = log(alpha_t) - log(sigma_t) of a given continuous-time label t in [0, T].\n        \"\"\"\n        log_mean_coeff = self.marginal_log_mean_coeff(t)\n        log_std = 0.5 * torch.log(1. - torch.exp(2. * log_mean_coeff))\n        return log_mean_coeff - log_std\n\n    def inverse_lambda(self, lamb):\n        \"\"\"\n        Compute the continuous-time label t in [0, T] of a given half-logSNR lambda_t.\n        \"\"\"\n        if self.schedule == 'linear':\n            tmp = 2. * (self.beta_1 - self.beta_0) * torch.logaddexp(-2. 
* lamb, torch.zeros((1,)).to(lamb))\n            Delta = self.beta_0 ** 2 + tmp\n            return tmp / (torch.sqrt(Delta) + self.beta_0) / (self.beta_1 - self.beta_0)\n        elif self.schedule == 'discrete':\n            log_alpha = -0.5 * torch.logaddexp(torch.zeros((1,)).to(lamb.device), -2. * lamb)\n            t = interpolate_fn(log_alpha.reshape((-1, 1)), torch.flip(self.log_alpha_array.to(lamb.device), [1]),\n                               torch.flip(self.t_array.to(lamb.device), [1]))\n            return t.reshape((-1,))\n        else:\n            log_alpha = -0.5 * torch.logaddexp(-2. * lamb, torch.zeros((1,)).to(lamb))\n            t_fn = lambda log_alpha_t: torch.arccos(torch.exp(log_alpha_t + self.cosine_log_alpha_0)) * 2. * (\n                        1. + self.cosine_s) / math.pi - self.cosine_s\n            return t_fn(log_alpha)\n\n    def edm_sigma(self, t):\n        return self.marginal_std(t) / self.marginal_alpha(t)\n\n    def edm_inverse_sigma(self, edmsigma):\n        alpha = 1 / (edmsigma ** 2 + 1).sqrt()\n        sigma = alpha * edmsigma\n        lambda_t = torch.log(alpha / sigma)\n        return self.inverse_lambda(lambda_t)\n\n\ndef model_wrapper(\n        model,\n        noise_schedule,\n        model_type=\"noise\",\n        model_kwargs={},\n        guidance_type=\"uncond\",\n        condition=None,\n        unconditional_condition=None,\n        guidance_scale=1.,\n        classifier_fn=None,\n        classifier_kwargs={},\n):\n    \"\"\"Thanks to DPM-Solver for their code base\"\"\"\n    \"\"\"Create a wrapper function for the noise prediction model.\n    SA-Solver needs to solve the continuous-time diffusion SDEs. For DPMs trained on discrete-time labels, we need to\n    firstly wrap the model function to a noise prediction model that accepts the continuous time as the input.\n    We support four types of the diffusion model by setting `model_type`:\n        1. \"noise\": noise prediction model. 
(Trained by predicting noise).\n        2. \"x_start\": data prediction model. (Trained by predicting the data x_0 at time 0).\n        3. \"v\": velocity prediction model. (Trained by predicting the velocity).\n            The derivation of the \"v\" prediction is detailed in Appendix D of [1], and it is used in Imagen-Video [2].\n            [1] Salimans, Tim, and Jonathan Ho. \"Progressive distillation for fast sampling of diffusion models.\"\n                arXiv preprint arXiv:2202.00512 (2022).\n            [2] Ho, Jonathan, et al. \"Imagen Video: High Definition Video Generation with Diffusion Models.\"\n                arXiv preprint arXiv:2210.02303 (2022).\n\n        4. \"score\": marginal score function. (Trained by denoising score matching).\n            Note that the score function and the noise prediction model follow a simple relationship:\n            ```\n                noise(x_t, t) = -sigma_t * score(x_t, t)\n            ```\n    We support three types of guided sampling by DPMs by setting `guidance_type`:\n        1. \"uncond\": unconditional sampling by DPMs.\n            The input `model` has the following format:\n            ``\n                model(x, t_input, **model_kwargs) -> noise | x_start | v | score\n            ``\n        2. \"classifier\": classifier guidance sampling [3] by DPMs and another classifier.\n            The input `model` has the following format:\n            ``\n                model(x, t_input, **model_kwargs) -> noise | x_start | v | score\n            ``\n            The input `classifier_fn` has the following format:\n            ``\n                classifier_fn(x, t_input, cond, **classifier_kwargs) -> logits(x, t_input, cond)\n            ``\n            [3] P. Dhariwal and A. Q. Nichol, \"Diffusion models beat GANs on image synthesis,\"\n                in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 8780-8794.\n        3. 
\"classifier-free\": classifier-free guidance sampling by conditional DPMs.\n            The input `model` has the following format:\n            ``\n                model(x, t_input, cond, **model_kwargs) -> noise | x_start | v | score\n            ``\n            And if cond == `unconditional_condition`, the model output is the unconditional DPM output.\n            [4] Ho, Jonathan, and Tim Salimans. \"Classifier-free diffusion guidance.\"\n                arXiv preprint arXiv:2207.12598 (2022).\n\n    The `t_input` is the time label of the model, which may be discrete-time labels (i.e. 0 to 999)\n    or continuous-time labels (i.e. epsilon to T).\n    We wrap the model function to accept only `x` and `t_continuous` as inputs, and outputs the predicted noise:\n    ``\n        def model_fn(x, t_continuous) -> noise:\n            t_input = get_model_input_time(t_continuous)\n            return noise_pred(model, x, t_input, **model_kwargs)\n    ``\n    where `t_continuous` is the continuous time labels (i.e. epsilon to T). And we use `model_fn` for SA-Solver.\n    ===============================================================\n    Args:\n        model: A diffusion model with the corresponding format described above.\n        noise_schedule: A noise schedule object, such as NoiseScheduleVP.\n        model_type: A `str`. The parameterization type of the diffusion model.\n                    \"noise\" or \"x_start\" or \"v\" or \"score\".\n        model_kwargs: A `dict`. A dict for the other inputs of the model function.\n        guidance_type: A `str`. The type of the guidance for sampling.\n                    \"uncond\" or \"classifier\" or \"classifier-free\".\n        condition: A pytorch tensor. The condition for the guided sampling.\n                    Only used for \"classifier\" or \"classifier-free\" guidance type.\n        unconditional_condition: A pytorch tensor. 
The condition for the unconditional sampling.\n                    Only used for \"classifier-free\" guidance type.\n        guidance_scale: A `float`. The scale for the guided sampling.\n        classifier_fn: A classifier function. Only used for the classifier guidance.\n        classifier_kwargs: A `dict`. A dict for the other inputs of the classifier function.\n    Returns:\n        A noise prediction model that accepts the noised data and the continuous time as the inputs.\n    \"\"\"\n\n    def get_model_input_time(t_continuous):\n        \"\"\"\n        Convert the continuous-time `t_continuous` (in [epsilon, T]) to the model input time.\n        For discrete-time DPMs, we convert `t_continuous` in [1 / N, 1] to `t_input` in [0, 1000 * (N - 1) / N].\n        For continuous-time DPMs, we just use `t_continuous`.\n        \"\"\"\n        if noise_schedule.schedule == 'discrete':\n            return (t_continuous - 1. / noise_schedule.total_N) * 1000.\n        else:\n            return t_continuous\n\n    def noise_pred_fn(x, t_continuous, cond=None):\n        t_input = get_model_input_time(t_continuous)\n        if cond is None:\n            output = model(x, t_input, **model_kwargs)\n        else:\n            output = model(x, t_input, cond, **model_kwargs)\n        if model_type == \"noise\":\n            return output\n        elif model_type == \"x_start\":\n            alpha_t, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)\n            return (x - alpha_t[0] * output) / sigma_t[0]\n        elif model_type == \"v\":\n            alpha_t, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)\n            return alpha_t[0] * output + sigma_t[0] * x\n        elif model_type == \"score\":\n            sigma_t = noise_schedule.marginal_std(t_continuous)\n            return -sigma_t[0] * output\n\n    def cond_grad_fn(x, t_input):\n        \"\"\"\n        Compute the 
gradient of the classifier, i.e. nabla_{x} log p_t(cond | x_t).\n        \"\"\"\n        with torch.enable_grad():\n            x_in = x.detach().requires_grad_(True)\n            log_prob = classifier_fn(x_in, t_input, condition, **classifier_kwargs)\n            return torch.autograd.grad(log_prob.sum(), x_in)[0]\n\n    def model_fn(x, t_continuous):\n        \"\"\"\n        The noise prediction model function that is used for SA-Solver.\n        \"\"\"\n        if guidance_type == \"uncond\":\n            return noise_pred_fn(x, t_continuous)\n        elif guidance_type == \"classifier\":\n            assert classifier_fn is not None\n            t_input = get_model_input_time(t_continuous)\n            cond_grad = cond_grad_fn(x, t_input)\n            sigma_t = noise_schedule.marginal_std(t_continuous)\n            noise = noise_pred_fn(x, t_continuous)\n            return noise - guidance_scale * sigma_t * cond_grad\n        elif guidance_type == \"classifier-free\":\n            if guidance_scale == 1. 
or unconditional_condition is None:\n                return noise_pred_fn(x, t_continuous, cond=condition)\n            x_in = torch.cat([x] * 2)\n            t_in = torch.cat([t_continuous] * 2)\n            c_in = torch.cat([unconditional_condition, condition])\n            noise_uncond, noise = noise_pred_fn(x_in, t_in, cond=c_in).chunk(2)\n            return noise_uncond + guidance_scale * (noise - noise_uncond)\n\n    assert model_type in [\"noise\", \"x_start\", \"v\", \"score\"]\n    assert guidance_type in [\"uncond\", \"classifier\", \"classifier-free\"]\n    return model_fn\n\n\nclass SASolver:\n    def __init__(\n            self,\n            model_fn,\n            noise_schedule,\n            algorithm_type=\"data_prediction\",\n            correcting_x0_fn=None,\n            correcting_xt_fn=None,\n            thresholding_max_val=1.,\n            dynamic_thresholding_ratio=0.995\n    ):\n        \"\"\"\n        Construct a SA-Solver\n        The default value for algorithm_type is \"data_prediction\" and we recommend not to change it to\n        \"noise_prediction\". 
For details, please see Appendix A.2.4 in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf\n        \"\"\"\n\n        self.model = lambda x, t: model_fn(x, t.expand((x.shape[0])))\n        self.noise_schedule = noise_schedule\n        assert algorithm_type in [\"data_prediction\", \"noise_prediction\"]\n\n        if correcting_x0_fn == \"dynamic_thresholding\":\n            self.correcting_x0_fn = self.dynamic_thresholding_fn\n        else:\n            self.correcting_x0_fn = correcting_x0_fn\n\n        self.correcting_xt_fn = correcting_xt_fn\n        self.dynamic_thresholding_ratio = dynamic_thresholding_ratio\n        self.thresholding_max_val = thresholding_max_val\n\n        self.predict_x0 = algorithm_type == \"data_prediction\"\n\n        self.sigma_min = float(self.noise_schedule.edm_sigma(torch.tensor([1e-3])))\n        self.sigma_max = float(self.noise_schedule.edm_sigma(torch.tensor([1])))\n\n    def dynamic_thresholding_fn(self, x0, t=None):\n        \"\"\"\n        The dynamic thresholding method.\n        \"\"\"\n        dims = x0.dim()\n        p = self.dynamic_thresholding_ratio\n        s = torch.quantile(torch.abs(x0).reshape((x0.shape[0], -1)), p, dim=1)\n        s = expand_dims(torch.maximum(s, self.thresholding_max_val * torch.ones_like(s).to(s.device)), dims)\n        x0 = torch.clamp(x0, -s, s) / s\n        return x0\n\n    def noise_prediction_fn(self, x, t):\n        \"\"\"\n        Return the noise prediction model.\n        \"\"\"\n        return self.model(x, t)\n\n    def data_prediction_fn(self, x, t):\n        \"\"\"\n        Return the data prediction model (with corrector).\n        \"\"\"\n        noise = self.noise_prediction_fn(x, t)\n        alpha_t, sigma_t = self.noise_schedule.marginal_alpha(t), self.noise_schedule.marginal_std(t)\n        x0 = (x - sigma_t * noise) / alpha_t\n        if self.correcting_x0_fn is not None:\n            x0 = self.correcting_x0_fn(x0)\n        return x0\n\n    def model_fn(self, x, t):\n    
    \"\"\"\n        Convert the model to the noise prediction model or the data prediction model.\n        \"\"\"\n\n        if self.predict_x0:\n            return self.data_prediction_fn(x, t)\n        else:\n            return self.noise_prediction_fn(x, t)\n\n    def get_time_steps(self, skip_type, t_T, t_0, N, order, device):\n        \"\"\"Compute the intermediate time steps for sampling.\n        \"\"\"\n        if skip_type == 'logSNR':\n            lambda_T = self.noise_schedule.marginal_lambda(torch.tensor(t_T).to(device))\n            lambda_0 = self.noise_schedule.marginal_lambda(torch.tensor(t_0).to(device))\n            logSNR_steps = lambda_T + torch.linspace(torch.tensor(0.).cpu().item(),\n                                                     (lambda_0 - lambda_T).cpu().item() ** (1. / order), N + 1).pow(\n                order).to(device)\n            return self.noise_schedule.inverse_lambda(logSNR_steps)\n        elif skip_type == 'time':\n            t = torch.linspace(t_T ** (1. / order), t_0 ** (1. / order), N + 1).pow(order).to(device)\n            return t\n        elif skip_type == 'karras':\n            sigma_min = max(0.002, self.sigma_min)\n            sigma_max = min(80, self.sigma_max)\n            sigma_steps = torch.linspace(sigma_max ** (1. / 7), sigma_min ** (1. 
/ 7), N + 1).pow(7).to(device)\n            return self.noise_schedule.edm_inverse_sigma(sigma_steps)\n        else:\n            raise ValueError(\n                f\"Unsupported skip_type {skip_type}, need to be 'logSNR' or 'time' or 'karras'\"\n            )\n\n    def denoise_to_zero_fn(self, x, s):\n        \"\"\"\n        Denoise at the final step, which is equivalent to solve the ODE from lambda_s to infty by first-order discretization.\n        \"\"\"\n        return self.data_prediction_fn(x, s)\n\n    def get_coefficients_exponential_negative(self, order, interval_start, interval_end):\n        \"\"\"\n        Calculate the integral of exp(-x) * x^order dx from interval_start to interval_end\n        For calculating the coefficient of gradient terms after the lagrange interpolation,\n        see Eq.(15) and Eq.(18) in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf\n        For noise_prediction formula.\n        \"\"\"\n        assert order in [0, 1, 2, 3], \"order is only supported for 0, 1, 2 and 3\"\n\n        if order == 0:\n            return torch.exp(-interval_end) * (torch.exp(interval_end - interval_start) - 1)\n        elif order == 1:\n            return torch.exp(-interval_end) * (\n                        (interval_start + 1) * torch.exp(interval_end - interval_start) - (interval_end + 1))\n        elif order == 2:\n            return torch.exp(-interval_end) * (\n                        (interval_start ** 2 + 2 * interval_start + 2) * torch.exp(interval_end - interval_start) - (\n                            interval_end ** 2 + 2 * interval_end + 2))\n        elif order == 3:\n            return torch.exp(-interval_end) * (\n                        (interval_start ** 3 + 3 * interval_start ** 2 + 6 * interval_start + 6) * torch.exp(\n                    interval_end - interval_start) - (interval_end ** 3 + 3 * interval_end ** 2 + 6 * interval_end + 6))\n\n    def get_coefficients_exponential_positive(self, order, interval_start, 
interval_end, tau):\n        \"\"\"\n        Calculate the integral of exp(x(1+tau^2)) * x^order dx from interval_start to interval_end\n        For calculating the coefficient of gradient terms after the lagrange interpolation,\n        see Eq.(15) and Eq.(18) in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf\n        For data_prediction formula.\n        \"\"\"\n        assert order in [0, 1, 2, 3], \"order is only supported for 0, 1, 2 and 3\"\n\n        # after change of variable(cov)\n        interval_end_cov = (1 + tau ** 2) * interval_end\n        interval_start_cov = (1 + tau ** 2) * interval_start\n\n        if order == 0:\n            return torch.exp(interval_end_cov) * (1 - torch.exp(-(interval_end_cov - interval_start_cov))) / (\n            (1 + tau ** 2))\n        elif order == 1:\n            return torch.exp(interval_end_cov) * ((interval_end_cov - 1) - (interval_start_cov - 1) * torch.exp(\n                -(interval_end_cov - interval_start_cov))) / ((1 + tau ** 2) ** 2)\n        elif order == 2:\n            return torch.exp(interval_end_cov) * ((interval_end_cov ** 2 - 2 * interval_end_cov + 2) - (\n                        interval_start_cov ** 2 - 2 * interval_start_cov + 2) * torch.exp(\n                -(interval_end_cov - interval_start_cov))) / ((1 + tau ** 2) ** 3)\n        elif order == 3:\n            return torch.exp(interval_end_cov) * (\n                        (interval_end_cov ** 3 - 3 * interval_end_cov ** 2 + 6 * interval_end_cov - 6) - (\n                            interval_start_cov ** 3 - 3 * interval_start_cov ** 2 + 6 * interval_start_cov - 6) * torch.exp(\n                    -(interval_end_cov - interval_start_cov))) / ((1 + tau ** 2) ** 4)\n\n    def lagrange_polynomial_coefficient(self, order, lambda_list):\n        \"\"\"\n        Calculate the coefficient of lagrange polynomial\n        For lagrange interpolation\n        \"\"\"\n        assert order in [0, 1, 2, 3]\n        assert order == len(lambda_list) - 
1\n        if order == 0:\n            return [[1]]\n        elif order == 1:\n            return [[1 / (lambda_list[0] - lambda_list[1]), -lambda_list[1] / (lambda_list[0] - lambda_list[1])],\n                    [1 / (lambda_list[1] - lambda_list[0]), -lambda_list[0] / (lambda_list[1] - lambda_list[0])]]\n        elif order == 2:\n            denominator1 = (lambda_list[0] - lambda_list[1]) * (lambda_list[0] - lambda_list[2])\n            denominator2 = (lambda_list[1] - lambda_list[0]) * (lambda_list[1] - lambda_list[2])\n            denominator3 = (lambda_list[2] - lambda_list[0]) * (lambda_list[2] - lambda_list[1])\n            return [[1 / denominator1,\n                     (-lambda_list[1] - lambda_list[2]) / denominator1,\n                     lambda_list[1] * lambda_list[2] / denominator1],\n\n                    [1 / denominator2,\n                     (-lambda_list[0] - lambda_list[2]) / denominator2,\n                     lambda_list[0] * lambda_list[2] / denominator2],\n\n                    [1 / denominator3,\n                     (-lambda_list[0] - lambda_list[1]) / denominator3,\n                     lambda_list[0] * lambda_list[1] / denominator3]\n                    ]\n        elif order == 3:\n            denominator1 = (lambda_list[0] - lambda_list[1]) * (lambda_list[0] - lambda_list[2]) * (\n                        lambda_list[0] - lambda_list[3])\n            denominator2 = (lambda_list[1] - lambda_list[0]) * (lambda_list[1] - lambda_list[2]) * (\n                        lambda_list[1] - lambda_list[3])\n            denominator3 = (lambda_list[2] - lambda_list[0]) * (lambda_list[2] - lambda_list[1]) * (\n                        lambda_list[2] - lambda_list[3])\n            denominator4 = (lambda_list[3] - lambda_list[0]) * (lambda_list[3] - lambda_list[1]) * (\n                        lambda_list[3] - lambda_list[2])\n            return [[1 / denominator1,\n                     (-lambda_list[1] - lambda_list[2] - lambda_list[3]) / 
denominator1,\n                     (lambda_list[1] * lambda_list[2] + lambda_list[1] * lambda_list[3] + lambda_list[2] * lambda_list[\n                         3]) / denominator1,\n                     (-lambda_list[1] * lambda_list[2] * lambda_list[3]) / denominator1],\n\n                    [1 / denominator2,\n                     (-lambda_list[0] - lambda_list[2] - lambda_list[3]) / denominator2,\n                     (lambda_list[0] * lambda_list[2] + lambda_list[0] * lambda_list[3] + lambda_list[2] * lambda_list[\n                         3]) / denominator2,\n                     (-lambda_list[0] * lambda_list[2] * lambda_list[3]) / denominator2],\n\n                    [1 / denominator3,\n                     (-lambda_list[0] - lambda_list[1] - lambda_list[3]) / denominator3,\n                     (lambda_list[0] * lambda_list[1] + lambda_list[0] * lambda_list[3] + lambda_list[1] * lambda_list[\n                         3]) / denominator3,\n                     (-lambda_list[0] * lambda_list[1] * lambda_list[3]) / denominator3],\n\n                    [1 / denominator4,\n                     (-lambda_list[0] - lambda_list[1] - lambda_list[2]) / denominator4,\n                     (lambda_list[0] * lambda_list[1] + lambda_list[0] * lambda_list[2] + lambda_list[1] * lambda_list[\n                         2]) / denominator4,\n                     (-lambda_list[0] * lambda_list[1] * lambda_list[2]) / denominator4]\n\n                    ]\n\n    def get_coefficients_fn(self, order, interval_start, interval_end, lambda_list, tau):\n        \"\"\"\n        Calculate the coefficient of gradients.\n        \"\"\"\n        assert order in [1, 2, 3, 4]\n        assert order == len(lambda_list), 'the length of lambda list must be equal to the order'\n        coefficients = []\n        lagrange_coefficient = self.lagrange_polynomial_coefficient(order - 1, lambda_list)\n        for i in range(order):\n            coefficient = sum(\n                
lagrange_coefficient[i][j]\n                * self.get_coefficients_exponential_positive(\n                    order - 1 - j, interval_start, interval_end, tau\n                )\n                if self.predict_x0\n                else lagrange_coefficient[i][j]\n                * self.get_coefficients_exponential_negative(\n                    order - 1 - j, interval_start, interval_end\n                )\n                for j in range(order)\n            )\n            coefficients.append(coefficient)\n        assert len(coefficients) == order, 'the length of coefficients does not match the order'\n        return coefficients\n\n    def adams_bashforth_update(self, order, x, tau, model_prev_list, t_prev_list, noise, t):\n        \"\"\"\n        SA-Predictor, without the \"rescaling\" trick in Appendix D in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf\n        \"\"\"\n        assert order in [1, 2, 3, 4], \"order of stochastic adams bashforth method is only supported for 1, 2, 3 and 4\"\n\n        # get noise schedule\n        ns = self.noise_schedule\n        alpha_t = ns.marginal_alpha(t)\n        sigma_t = ns.marginal_std(t)\n        lambda_t = ns.marginal_lambda(t)\n        alpha_prev = ns.marginal_alpha(t_prev_list[-1])\n        sigma_prev = ns.marginal_std(t_prev_list[-1])\n        gradient_part = torch.zeros_like(x)\n        h = lambda_t - ns.marginal_lambda(t_prev_list[-1])\n        lambda_list = [ns.marginal_lambda(t_prev_list[-(i + 1)]) for i in range(order)]\n        gradient_coefficients = self.get_coefficients_fn(order, ns.marginal_lambda(t_prev_list[-1]), lambda_t,\n                                                         lambda_list, tau)\n\n        for i in range(order):\n            if self.predict_x0:\n                gradient_part += (1 + tau ** 2) * sigma_t * torch.exp(- tau ** 2 * lambda_t) * gradient_coefficients[\n                    i] * model_prev_list[-(i + 1)]\n            else:\n                gradient_part += -(1 + tau ** 2) 
* alpha_t * gradient_coefficients[i] * model_prev_list[-(i + 1)]\n\n        if self.predict_x0:\n            noise_part = sigma_t * torch.sqrt(1 - torch.exp(-2 * tau ** 2 * h)) * noise\n        else:\n            noise_part = tau * sigma_t * torch.sqrt(torch.exp(2 * h) - 1) * noise\n\n        if self.predict_x0:\n            x_t = torch.exp(-tau ** 2 * h) * (sigma_t / sigma_prev) * x + gradient_part + noise_part\n        else:\n            x_t = (alpha_t / alpha_prev) * x + gradient_part + noise_part\n\n        return x_t\n\n    def adams_moulton_update(self, order, x, tau, model_prev_list, t_prev_list, noise, t):\n        \"\"\"\n        SA-Corrector, without the \"rescaling\" trick in Appendix D in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf\n        \"\"\"\n\n        assert order in [1, 2, 3, 4], \"order of stochastic adams moulton method is only supported for 1, 2, 3 and 4\"\n\n        # get noise schedule\n        ns = self.noise_schedule\n        alpha_t = ns.marginal_alpha(t)\n        sigma_t = ns.marginal_std(t)\n        lambda_t = ns.marginal_lambda(t)\n        alpha_prev = ns.marginal_alpha(t_prev_list[-1])\n        sigma_prev = ns.marginal_std(t_prev_list[-1])\n        gradient_part = torch.zeros_like(x)\n        h = lambda_t - ns.marginal_lambda(t_prev_list[-1])\n        t_list = t_prev_list + [t]\n        lambda_list = [ns.marginal_lambda(t_list[-(i + 1)]) for i in range(order)]\n        gradient_coefficients = self.get_coefficients_fn(order, ns.marginal_lambda(t_prev_list[-1]), lambda_t,\n                                                         lambda_list, tau)\n\n        for i in range(order):\n            if self.predict_x0:\n                gradient_part += (1 + tau ** 2) * sigma_t * torch.exp(- tau ** 2 * lambda_t) * gradient_coefficients[\n                    i] * model_prev_list[-(i + 1)]\n            else:\n                gradient_part += -(1 + tau ** 2) * alpha_t * gradient_coefficients[i] * model_prev_list[-(i + 1)]\n\n        if 
self.predict_x0:\n            noise_part = sigma_t * torch.sqrt(1 - torch.exp(-2 * tau ** 2 * h)) * noise\n        else:\n            noise_part = tau * sigma_t * torch.sqrt(torch.exp(2 * h) - 1) * noise\n\n        if self.predict_x0:\n            x_t = torch.exp(-tau ** 2 * h) * (sigma_t / sigma_prev) * x + gradient_part + noise_part\n        else:\n            x_t = (alpha_t / alpha_prev) * x + gradient_part + noise_part\n\n        return x_t\n\n    def adams_bashforth_update_few_steps(self, order, x, tau, model_prev_list, t_prev_list, noise, t):\n        \"\"\"\n        SA-Predictor, with the \"rescaling\" trick in Appendix D in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf\n        \"\"\"\n\n        assert order in [1, 2, 3, 4], \"order of stochastic adams bashforth method is only supported for 1, 2, 3 and 4\"\n\n        # get noise schedule\n        ns = self.noise_schedule\n        alpha_t = ns.marginal_alpha(t)\n        sigma_t = ns.marginal_std(t)\n        lambda_t = ns.marginal_lambda(t)\n        alpha_prev = ns.marginal_alpha(t_prev_list[-1])\n        sigma_prev = ns.marginal_std(t_prev_list[-1])\n        gradient_part = torch.zeros_like(x)\n        h = lambda_t - ns.marginal_lambda(t_prev_list[-1])\n        lambda_list = [ns.marginal_lambda(t_prev_list[-(i + 1)]) for i in range(order)]\n        gradient_coefficients = self.get_coefficients_fn(order, ns.marginal_lambda(t_prev_list[-1]), lambda_t,\n                                                         lambda_list, tau)\n\n        if self.predict_x0:\n            if order == 2:  ## if order = 2 we do a modification that does not influence the convergence order similar to unipc. Note: This is used only for few steps sampling.\n                # The added term is O(h^3). 
Empirically we find it will slightly improve the image quality.\n                # ODE case\n                # gradient_coefficients[0] += 1.0 * torch.exp(lambda_t) * (h ** 2 / 2 - (h - 1 + torch.exp(-h))) / (ns.marginal_lambda(t_prev_list[-1]) - ns.marginal_lambda(t_prev_list[-2]))\n                # gradient_coefficients[1] -= 1.0 * torch.exp(lambda_t) * (h ** 2 / 2 - (h - 1 + torch.exp(-h))) / (ns.marginal_lambda(t_prev_list[-1]) - ns.marginal_lambda(t_prev_list[-2]))\n                gradient_coefficients[0] += 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (\n                            h ** 2 / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (\n                                (1 + tau ** 2) ** 2)) / (ns.marginal_lambda(t_prev_list[-1]) - ns.marginal_lambda(\n                    t_prev_list[-2]))\n                gradient_coefficients[1] -= 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (\n                            h ** 2 / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (\n                                (1 + tau ** 2) ** 2)) / (ns.marginal_lambda(t_prev_list[-1]) - ns.marginal_lambda(\n                    t_prev_list[-2]))\n\n        for i in range(order):\n            if self.predict_x0:\n                gradient_part += (1 + tau ** 2) * sigma_t * torch.exp(- tau ** 2 * lambda_t) * gradient_coefficients[\n                    i] * model_prev_list[-(i + 1)]\n            else:\n                gradient_part += -(1 + tau ** 2) * alpha_t * gradient_coefficients[i] * model_prev_list[-(i + 1)]\n\n        if self.predict_x0:\n            noise_part = sigma_t * torch.sqrt(1 - torch.exp(-2 * tau ** 2 * h)) * noise\n        else:\n            noise_part = tau * sigma_t * torch.sqrt(torch.exp(2 * h) - 1) * noise\n\n        if self.predict_x0:\n            x_t = torch.exp(-tau ** 2 * h) * (sigma_t / sigma_prev) * x + gradient_part + noise_part\n        else:\n            x_t = (alpha_t / alpha_prev) * x + gradient_part + noise_part\n\n  
      return x_t\n\n    def adams_moulton_update_few_steps(self, order, x, tau, model_prev_list, t_prev_list, noise, t):\n        \"\"\"\n        SA-Corrector, with the \"rescaling\" trick in Appendix D in SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf\n        \"\"\"\n\n        assert order in [1, 2, 3, 4], \"order of stochastic adams moulton method is only supported for 1, 2, 3 and 4\"\n\n        # get noise schedule\n        ns = self.noise_schedule\n        alpha_t = ns.marginal_alpha(t)\n        sigma_t = ns.marginal_std(t)\n        lambda_t = ns.marginal_lambda(t)\n        alpha_prev = ns.marginal_alpha(t_prev_list[-1])\n        sigma_prev = ns.marginal_std(t_prev_list[-1])\n        gradient_part = torch.zeros_like(x)\n        h = lambda_t - ns.marginal_lambda(t_prev_list[-1])\n        t_list = t_prev_list + [t]\n        lambda_list = [ns.marginal_lambda(t_list[-(i + 1)]) for i in range(order)]\n        gradient_coefficients = self.get_coefficients_fn(order, ns.marginal_lambda(t_prev_list[-1]), lambda_t,\n                                                         lambda_list, tau)\n\n        if self.predict_x0:\n            if order == 2:  ## if order = 2 we do a modification that does not influence the convergence order similar to UniPC. Note: This is used only for few steps sampling.\n                # The added term is O(h^3). 
Empirically we find it will slightly improve the image quality.\n                # ODE case\n                # gradient_coefficients[0] += 1.0 * torch.exp(lambda_t) * (h / 2 - (h - 1 + torch.exp(-h)) / h)\n                # gradient_coefficients[1] -= 1.0 * torch.exp(lambda_t) * (h / 2 - (h - 1 + torch.exp(-h)) / h)\n                gradient_coefficients[0] += 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (\n                            h / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (\n                                (1 + tau ** 2) ** 2 * h))\n                gradient_coefficients[1] -= 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (\n                            h / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (\n                                (1 + tau ** 2) ** 2 * h))\n\n        for i in range(order):\n            if self.predict_x0:\n                gradient_part += (1 + tau ** 2) * sigma_t * torch.exp(- tau ** 2 * lambda_t) * gradient_coefficients[\n                    i] * model_prev_list[-(i + 1)]\n            else:\n                gradient_part += -(1 + tau ** 2) * alpha_t * gradient_coefficients[i] * model_prev_list[-(i + 1)]\n\n        if self.predict_x0:\n            noise_part = sigma_t * torch.sqrt(1 - torch.exp(-2 * tau ** 2 * h)) * noise\n        else:\n            noise_part = tau * sigma_t * torch.sqrt(torch.exp(2 * h) - 1) * noise\n\n        if self.predict_x0:\n            x_t = torch.exp(-tau ** 2 * h) * (sigma_t / sigma_prev) * x + gradient_part + noise_part\n        else:\n            x_t = (alpha_t / alpha_prev) * x + gradient_part + noise_part\n\n        return x_t\n\n    def sample_few_steps(self, x, tau, steps=5, t_start=None, t_end=None, skip_type='time', skip_order=1,\n                         predictor_order=3, corrector_order=4, pc_mode='PEC', return_intermediate=False\n                         ):\n        \"\"\"\n        For the PC-mode, please refer to the wiki page\n        
https://en.wikipedia.org/wiki/Predictor%E2%80%93corrector_method#PEC_mode_and_PECE_mode\n        'PEC' needs one model evaluation per step, while 'PECE' needs two.\n        We recommend pc_mode='PEC' when the number of function evaluations (NFEs) is limited; 'PECE' mode is only for testing with sufficient NFEs.\n        \"\"\"\n\n        skip_first_step = False\n        skip_final_step = True\n        lower_order_final = True\n        denoise_to_zero = False\n\n        assert pc_mode in ['PEC', 'PECE'], 'Predictor-corrector mode only supports PEC and PECE'\n        t_0 = 1. / self.noise_schedule.total_N if t_end is None else t_end\n        t_T = self.noise_schedule.T if t_start is None else t_start\n        assert t_0 > 0 and t_T > 0, \"Time range needs to be greater than 0. For discrete-time DPMs, it needs to be in [1 / N, 1], where N is the length of betas array\"\n\n        device = x.device\n        intermediates = []\n        with torch.no_grad():\n            assert steps >= max(predictor_order, corrector_order - 1)\n            timesteps = self.get_time_steps(skip_type=skip_type, t_T=t_T, t_0=t_0, N=steps, order=skip_order,\n                                            device=device)\n            assert timesteps.shape[0] - 1 == steps\n            # Initialize the starting values.\n            step = 0\n            t = timesteps[step]\n            noise = torch.randn_like(x)\n            t_prev_list = [t]\n            # do not evaluate if skip_first_step\n            if skip_first_step:\n                if self.predict_x0:\n                    alpha_t = self.noise_schedule.marginal_alpha(t)\n                    sigma_t = self.noise_schedule.marginal_std(t)\n                    model_prev_list = [(1 - sigma_t) / alpha_t * x]\n                else:\n                    model_prev_list = [x]\n            else:\n                model_prev_list = [self.model_fn(x, t)]\n\n            if self.correcting_xt_fn is not None:\n                x = self.correcting_xt_fn(x, t, step)\n            if 
return_intermediate:\n                intermediates.append(x)\n\n            # determine the first several values\n            for step in tqdm(range(1, max(predictor_order, corrector_order - 1))):\n\n                t = timesteps[step]\n                predictor_order_used = min(predictor_order, step)\n                corrector_order_used = min(corrector_order, step + 1)\n                noise = torch.randn_like(x)\n                # predictor step\n                x_p = self.adams_bashforth_update_few_steps(order=predictor_order_used, x=x, tau=tau(t),\n                                                            model_prev_list=model_prev_list, t_prev_list=t_prev_list,\n                                                            noise=noise, t=t)\n                # evaluation step\n                model_x = self.model_fn(x_p, t)\n\n                # update model_list\n                model_prev_list.append(model_x)\n                # corrector step\n                if corrector_order > 0:\n                    x = self.adams_moulton_update_few_steps(order=corrector_order_used, x=x, tau=tau(t),\n                                                            model_prev_list=model_prev_list, t_prev_list=t_prev_list,\n                                                            noise=noise, t=t)\n                else:\n                    x = x_p\n\n                # evaluation step if correction and mode = pece\n                if corrector_order > 0 and pc_mode == 'PECE':\n                    model_x = self.model_fn(x, t)\n                    del model_prev_list[-1]\n                    model_prev_list.append(model_x)\n\n                if self.correcting_xt_fn is not None:\n                    x = self.correcting_xt_fn(x, t, step)\n                if return_intermediate:\n                    intermediates.append(x)\n\n                t_prev_list.append(t)\n\n            for step in tqdm(range(max(predictor_order, corrector_order - 1), steps + 1)):\n                if 
lower_order_final:\n                    predictor_order_used = min(predictor_order, steps - step + 1)\n                    corrector_order_used = min(corrector_order, steps - step + 2)\n\n                else:\n                    predictor_order_used = predictor_order\n                    corrector_order_used = corrector_order\n                t = timesteps[step]\n                noise = torch.randn_like(x)\n\n                # predictor step\n                if skip_final_step and step == steps and not denoise_to_zero:\n                    x_p = self.adams_bashforth_update_few_steps(order=predictor_order_used, x=x, tau=0,\n                                                                model_prev_list=model_prev_list,\n                                                                t_prev_list=t_prev_list, noise=noise, t=t)\n                else:\n                    x_p = self.adams_bashforth_update_few_steps(order=predictor_order_used, x=x, tau=tau(t),\n                                                                model_prev_list=model_prev_list,\n                                                                t_prev_list=t_prev_list, noise=noise, t=t)\n\n                # evaluation step\n                # do not evaluate if skip_final_step and step = steps\n                if not skip_final_step or step < steps:\n                    model_x = self.model_fn(x_p, t)\n\n                # update model_list\n                # do not update if skip_final_step and step = steps\n                if not skip_final_step or step < steps:\n                    model_prev_list.append(model_x)\n\n                # corrector step\n                # do not correct if skip_final_step and step = steps\n                if corrector_order > 0 and (not skip_final_step or step < steps):\n                    x = self.adams_moulton_update_few_steps(order=corrector_order_used, x=x, tau=tau(t),\n                                                            
model_prev_list=model_prev_list,\n                                                            t_prev_list=t_prev_list, noise=noise, t=t)\n                else:\n                    x = x_p\n\n                # evaluation step if mode = pece and step != steps\n                if corrector_order > 0 and (pc_mode == 'PECE' and step < steps):\n                    model_x = self.model_fn(x, t)\n                    del model_prev_list[-1]\n                    model_prev_list.append(model_x)\n\n                if self.correcting_xt_fn is not None:\n                    x = self.correcting_xt_fn(x, t, step)\n                if return_intermediate:\n                    intermediates.append(x)\n\n                t_prev_list.append(t)\n                del model_prev_list[0]\n\n            if denoise_to_zero:\n                t = torch.ones((1,)).to(device) * t_0\n                x = self.denoise_to_zero_fn(x, t)\n                if self.correcting_xt_fn is not None:\n                    x = self.correcting_xt_fn(x, t, step + 1)\n                if return_intermediate:\n                    intermediates.append(x)\n        return (x, intermediates) if return_intermediate else x\n\n    def sample_more_steps(self, x, tau, steps=20, t_start=None, t_end=None, skip_type='time', skip_order=1,\n                          predictor_order=3, corrector_order=4, pc_mode='PEC', return_intermediate=False\n                          ):\n        \"\"\"\n        For the PC-mode, please refer to the wiki page\n        https://en.wikipedia.org/wiki/Predictor%E2%80%93corrector_method#PEC_mode_and_PECE_mode\n        'PEC' needs one model evaluation per step while 'PECE' needs two model evaluations.\n        We recommend pc_mode='PEC' when NFEs are limited. 
'PECE' mode is only for testing with sufficient NFEs.\n        \"\"\"\n\n        skip_first_step = False\n        skip_final_step = False\n        lower_order_final = True\n        denoise_to_zero = True\n\n        assert pc_mode in ['PEC', 'PECE'], 'Predictor-corrector mode only supports PEC and PECE'\n        t_0 = 1. / self.noise_schedule.total_N if t_end is None else t_end\n        t_T = self.noise_schedule.T if t_start is None else t_start\n        assert t_0 > 0 and t_T > 0, \"Time range needs to be greater than 0. For discrete-time DPMs, it needs to be in [1 / N, 1], where N is the length of betas array\"\n\n        device = x.device\n        intermediates = []\n        with torch.no_grad():\n            assert steps >= max(predictor_order, corrector_order - 1)\n            timesteps = self.get_time_steps(skip_type=skip_type, t_T=t_T, t_0=t_0, N=steps, order=skip_order,\n                                            device=device)\n            assert timesteps.shape[0] - 1 == steps\n            # Init the initial values.\n            step = 0\n            t = timesteps[step]\n            noise = torch.randn_like(x)\n            t_prev_list = [t]\n            # do not evaluate if skip_first_step\n            if skip_first_step:\n                if self.predict_x0:\n                    alpha_t = self.noise_schedule.marginal_alpha(t)\n                    sigma_t = self.noise_schedule.marginal_std(t)\n                    model_prev_list = [(1 - sigma_t) / alpha_t * x]\n                else:\n                    model_prev_list = [x]\n            else:\n                model_prev_list = [self.model_fn(x, t)]\n\n            if self.correcting_xt_fn is not None:\n                x = self.correcting_xt_fn(x, t, step)\n            if return_intermediate:\n                intermediates.append(x)\n\n            # determine the first several values\n            for step in tqdm(range(1, max(predictor_order, corrector_order - 1))):\n\n                t = timesteps[step]\n     
           predictor_order_used = min(predictor_order, step)\n                corrector_order_used = min(corrector_order, step + 1)\n                noise = torch.randn_like(x)\n                # predictor step\n                x_p = self.adams_bashforth_update(order=predictor_order_used, x=x, tau=tau(t),\n                                                  model_prev_list=model_prev_list, t_prev_list=t_prev_list, noise=noise,\n                                                  t=t)\n                # evaluation step\n                model_x = self.model_fn(x_p, t)\n\n                # update model_list\n                model_prev_list.append(model_x)\n                # corrector step\n                if corrector_order > 0:\n                    x = self.adams_moulton_update(order=corrector_order_used, x=x, tau=tau(t),\n                                                  model_prev_list=model_prev_list, t_prev_list=t_prev_list, noise=noise,\n                                                  t=t)\n                else:\n                    x = x_p\n\n                # evaluation step if mode = pece\n                if corrector_order > 0 and pc_mode == 'PECE':\n                    model_x = self.model_fn(x, t)\n                    del model_prev_list[-1]\n                    model_prev_list.append(model_x)\n                if self.correcting_xt_fn is not None:\n                    x = self.correcting_xt_fn(x, t, step)\n                if return_intermediate:\n                    intermediates.append(x)\n\n                t_prev_list.append(t)\n\n            for step in tqdm(range(max(predictor_order, corrector_order - 1), steps + 1)):\n                if lower_order_final:\n                    predictor_order_used = min(predictor_order, steps - step + 1)\n                    corrector_order_used = min(corrector_order, steps - step + 2)\n\n                else:\n                    predictor_order_used = predictor_order\n                    corrector_order_used = 
corrector_order\n                t = timesteps[step]\n                noise = torch.randn_like(x)\n\n                # predictor step\n                if skip_final_step and step == steps and not denoise_to_zero:\n                    x_p = self.adams_bashforth_update(order=predictor_order_used, x=x, tau=0,\n                                                      model_prev_list=model_prev_list, t_prev_list=t_prev_list,\n                                                      noise=noise, t=t)\n                else:\n                    x_p = self.adams_bashforth_update(order=predictor_order_used, x=x, tau=tau(t),\n                                                      model_prev_list=model_prev_list, t_prev_list=t_prev_list,\n                                                      noise=noise, t=t)\n\n                # evaluation step\n                # do not evaluate if skip_final_step and step = steps\n                if not skip_final_step or step < steps:\n                    model_x = self.model_fn(x_p, t)\n\n                # update model_list\n                # do not update if skip_final_step and step = steps\n                if not skip_final_step or step < steps:\n                    model_prev_list.append(model_x)\n\n                # corrector step\n                # do not correct if skip_final_step and step = steps\n                if corrector_order > 0:\n                    if not skip_final_step or step < steps:\n                        x = self.adams_moulton_update(order=corrector_order_used, x=x, tau=tau(t),\n                                                      model_prev_list=model_prev_list, t_prev_list=t_prev_list,\n                                                      noise=noise, t=t)\n                    else:\n                        x = x_p\n                else:\n                    x = x_p\n\n                # evaluation step if mode = pece and step != steps\n                if corrector_order > 0 and (pc_mode == 'PECE' and step < steps):\n  
                  model_x = self.model_fn(x, t)\n                    del model_prev_list[-1]\n                    model_prev_list.append(model_x)\n\n                if self.correcting_xt_fn is not None:\n                    x = self.correcting_xt_fn(x, t, step)\n                if return_intermediate:\n                    intermediates.append(x)\n\n                t_prev_list.append(t)\n                del model_prev_list[0]\n\n            if denoise_to_zero:\n                t = torch.ones((1,)).to(device) * t_0\n                x = self.denoise_to_zero_fn(x, t)\n                if self.correcting_xt_fn is not None:\n                    x = self.correcting_xt_fn(x, t, step + 1)\n                if return_intermediate:\n                    intermediates.append(x)\n        if return_intermediate:\n            return x, intermediates\n        else:\n            return x\n\n    def sample(self, mode, x, tau, steps, t_start=None, t_end=None, skip_type='time', skip_order=1, predictor_order=3,\n               corrector_order=4, pc_mode='PEC', return_intermediate=False\n               ):\n        \"\"\"\n        For the PC-mode, please refer to the wiki page\n        https://en.wikipedia.org/wiki/Predictor%E2%80%93corrector_method#PEC_mode_and_PECE_mode\n        'PEC' needs one model evaluation per step while 'PECE' needs two model evaluations.\n        We recommend pc_mode='PEC' when NFEs are limited. 'PECE' mode is only for testing with sufficient NFEs.\n\n        'few_steps' mode is recommended. 
The differences between 'few_steps' and 'more_steps' are as follows:\n        1) 'few_steps' does not correct at the final step and does not denoise to zero, while 'more_steps' does both.\n        Thus the NFEs for 'few_steps' = steps, and the NFEs for 'more_steps' = steps + 2.\n        For most experiments and tasks, we find these two operations do little to improve sample quality.\n        2) 'few_steps' uses a rescaling trick, as in Appendix D of the SA-Solver paper https://arxiv.org/pdf/2309.05019.pdf\n        We find it slightly improves sample quality, especially with few steps.\n        \"\"\"\n        assert mode in ['few_steps', 'more_steps'], \"mode must be either 'few_steps' or 'more_steps'\"\n        if mode == 'few_steps':\n            return self.sample_few_steps(x=x, tau=tau, steps=steps, t_start=t_start, t_end=t_end, skip_type=skip_type,\n                                         skip_order=skip_order, predictor_order=predictor_order,\n                                         corrector_order=corrector_order, pc_mode=pc_mode,\n                                         return_intermediate=return_intermediate)\n        else:\n            return self.sample_more_steps(x=x, tau=tau, steps=steps, t_start=t_start, t_end=t_end, skip_type=skip_type,\n                                          skip_order=skip_order, predictor_order=predictor_order,\n                                          corrector_order=corrector_order, pc_mode=pc_mode,\n                                          return_intermediate=return_intermediate)\n\n\n#############################################################\n# other utility functions\n#############################################################\n\ndef interpolate_fn(x, xp, yp):\n    \"\"\"\n    A piecewise linear function y = f(x), using xp and yp as keypoints.\n    We implement f(x) in a differentiable way (i.e. applicable for autograd).\n    The function f(x) is well-defined for all x. 
(For x beyond the bounds of xp, we use the outermost points of xp to define the linear function.)\n    Args:\n        x: PyTorch tensor with shape [N, C], where N is the batch size, C is the number of channels (we use C = 1 for DPM-Solver).\n        xp: PyTorch tensor with shape [C, K], where K is the number of keypoints.\n        yp: PyTorch tensor with shape [C, K].\n    Returns:\n        The function values f(x), with shape [N, C].\n    \"\"\"\n    N, K = x.shape[0], xp.shape[1]\n    all_x = torch.cat([x.unsqueeze(2), xp.unsqueeze(0).repeat((N, 1, 1))], dim=2)\n    sorted_all_x, x_indices = torch.sort(all_x, dim=2)\n    x_idx = torch.argmin(x_indices, dim=2)\n    cand_start_idx = x_idx - 1\n    start_idx = torch.where(\n        torch.eq(x_idx, 0),\n        torch.tensor(1, device=x.device),\n        torch.where(\n            torch.eq(x_idx, K), torch.tensor(K - 2, device=x.device), cand_start_idx,\n        ),\n    )\n    end_idx = torch.where(torch.eq(start_idx, cand_start_idx), start_idx + 2, start_idx + 1)\n    start_x = torch.gather(sorted_all_x, dim=2, index=start_idx.unsqueeze(2)).squeeze(2)\n    end_x = torch.gather(sorted_all_x, dim=2, index=end_idx.unsqueeze(2)).squeeze(2)\n    start_idx2 = torch.where(\n        torch.eq(x_idx, 0),\n        torch.tensor(0, device=x.device),\n        torch.where(\n            torch.eq(x_idx, K), torch.tensor(K - 2, device=x.device), cand_start_idx,\n        ),\n    )\n    y_positions_expanded = yp.unsqueeze(0).expand(N, -1, -1)\n    start_y = torch.gather(y_positions_expanded, dim=2, index=start_idx2.unsqueeze(2)).squeeze(2)\n    end_y = torch.gather(y_positions_expanded, dim=2, index=(start_idx2 + 1).unsqueeze(2)).squeeze(2)\n    cand = start_y + (x - start_x) * (end_y - start_y) / (end_x - start_x)\n    return cand\n\n\ndef expand_dims(v, dims):\n    \"\"\"\n    Expand the tensor `v` to `dims` dimensions.\n    Args:\n        `v`: a PyTorch tensor with shape [N].\n        `dims`: an `int`.\n    Returns:\n        a PyTorch tensor 
with shape [N, 1, 1, ..., 1] and the total dimension is `dims`.\n    \"\"\"\n    return v[(...,) + (None,) * (dims - 1)]"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/t5.py",
    "content": "# -*- coding: utf-8 -*-\nimport os\nimport re\nimport html\nimport urllib.parse as ul\n\nimport ftfy\nimport torch\nfrom bs4 import BeautifulSoup\nfrom transformers import T5EncoderModel, AutoTokenizer\nfrom huggingface_hub import hf_hub_download\n\nclass T5Embedder:\n\n    available_models = ['t5-v1_1-xxl']\n    bad_punct_regex = re.compile(r'['+'#®•©™&@·º½¾¿¡§~'+'\\)'+'\\('+'\\]'+'\\['+'\\}'+'\\{'+'\\|'+'\\\\'+'\\/'+'\\*' + r']{1,}')  # noqa\n\n    def __init__(self, device, dir_or_name='t5-v1_1-xxl', *, local_cache=False, cache_dir=None, hf_token=None, use_text_preprocessing=True,\n                 t5_model_kwargs=None, torch_dtype=None, use_offload_folder=None, model_max_length=120):\n        self.device = torch.device(device)\n        self.torch_dtype = torch_dtype or torch.bfloat16\n        if t5_model_kwargs is None:\n            t5_model_kwargs = {'low_cpu_mem_usage': True, 'torch_dtype': self.torch_dtype}\n            if use_offload_folder is not None:\n                t5_model_kwargs['offload_folder'] = use_offload_folder\n                t5_model_kwargs['device_map'] = {\n                    'shared': self.device,\n                    'encoder.embed_tokens': self.device,\n                    'encoder.block.0': self.device,\n                    'encoder.block.1': self.device,\n                    'encoder.block.2': self.device,\n                    'encoder.block.3': self.device,\n                    'encoder.block.4': self.device,\n                    'encoder.block.5': self.device,\n                    'encoder.block.6': self.device,\n                    'encoder.block.7': self.device,\n                    'encoder.block.8': self.device,\n                    'encoder.block.9': self.device,\n                    'encoder.block.10': self.device,\n                    'encoder.block.11': self.device,\n                    'encoder.block.12': 'disk',\n                    'encoder.block.13': 'disk',\n                    'encoder.block.14': 
'disk',\n                    'encoder.block.15': 'disk',\n                    'encoder.block.16': 'disk',\n                    'encoder.block.17': 'disk',\n                    'encoder.block.18': 'disk',\n                    'encoder.block.19': 'disk',\n                    'encoder.block.20': 'disk',\n                    'encoder.block.21': 'disk',\n                    'encoder.block.22': 'disk',\n                    'encoder.block.23': 'disk',\n                    'encoder.final_layer_norm': 'disk',\n                    'encoder.dropout': 'disk',\n                }\n            else:\n                t5_model_kwargs['device_map'] = {'shared': self.device, 'encoder': self.device}\n\n        self.use_text_preprocessing = use_text_preprocessing\n        self.hf_token = hf_token\n        self.cache_dir = cache_dir or os.path.expanduser('~/.cache/IF_')\n        self.dir_or_name = dir_or_name\n        tokenizer_path, path = dir_or_name, dir_or_name\n        if local_cache:\n            cache_dir = os.path.join(self.cache_dir, dir_or_name)\n            tokenizer_path, path = cache_dir, cache_dir\n        elif dir_or_name in self.available_models:\n            cache_dir = os.path.join(self.cache_dir, dir_or_name)\n            for filename in [\n                'config.json', 'special_tokens_map.json', 'spiece.model', 'tokenizer_config.json',\n                'pytorch_model.bin.index.json', 'pytorch_model-00001-of-00002.bin', 'pytorch_model-00002-of-00002.bin'\n            ]:\n                hf_hub_download(repo_id=f'DeepFloyd/{dir_or_name}', filename=filename, cache_dir=cache_dir,\n                                force_filename=filename, token=self.hf_token)\n            tokenizer_path, path = cache_dir, cache_dir\n        else:\n            cache_dir = os.path.join(self.cache_dir, 't5-v1_1-xxl')\n            for filename in [\n                'config.json', 'special_tokens_map.json', 'spiece.model', 'tokenizer_config.json',\n            ]:\n                
hf_hub_download(repo_id='DeepFloyd/t5-v1_1-xxl', filename=filename, cache_dir=cache_dir,\n                                force_filename=filename, token=self.hf_token)\n            tokenizer_path = cache_dir\n\n        print(tokenizer_path)\n        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)\n        self.model = T5EncoderModel.from_pretrained(path, **t5_model_kwargs).eval()\n        self.model_max_length = model_max_length\n\n    def get_text_embeddings(self, texts):\n        texts = [self.text_preprocessing(text) for text in texts]\n\n        text_tokens_and_mask = self.tokenizer(\n            texts,\n            max_length=self.model_max_length,\n            padding='max_length',\n            truncation=True,\n            return_attention_mask=True,\n            add_special_tokens=True,\n            return_tensors='pt'\n        )\n\n        text_tokens_and_mask['input_ids'] = text_tokens_and_mask['input_ids']\n        text_tokens_and_mask['attention_mask'] = text_tokens_and_mask['attention_mask']\n\n        with torch.no_grad():\n            text_encoder_embs = self.model(\n                input_ids=text_tokens_and_mask['input_ids'].to(self.device),\n                attention_mask=text_tokens_and_mask['attention_mask'].to(self.device),\n            )['last_hidden_state'].detach()\n        return text_encoder_embs, text_tokens_and_mask['attention_mask'].to(self.device)\n\n    def text_preprocessing(self, text):\n        if self.use_text_preprocessing:\n            # The exact text cleaning as was in the training stage:\n            text = self.clean_caption(text)\n            text = self.clean_caption(text)\n            return text\n        else:\n            return text.lower().strip()\n\n    @staticmethod\n    def basic_clean(text):\n        text = ftfy.fix_text(text)\n        text = html.unescape(html.unescape(text))\n        return text.strip()\n\n    def clean_caption(self, caption):\n        caption = str(caption)\n        caption = 
ul.unquote_plus(caption)\n        caption = caption.strip().lower()\n        caption = re.sub('<person>', 'person', caption)\n        # urls:\n        caption = re.sub(\n            r'\\b((?:https?:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))',  # noqa\n            '', caption)  # regex for urls\n        caption = re.sub(\n            r'\\b((?:www:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))',  # noqa\n            '', caption)  # regex for urls\n        # html:\n        caption = BeautifulSoup(caption, features='html.parser').text\n\n        # @<nickname>\n        caption = re.sub(r'@[\\w\\d]+\\b', '', caption)\n\n        # 31C0—31EF CJK Strokes\n        # 31F0—31FF Katakana Phonetic Extensions\n        # 3200—32FF Enclosed CJK Letters and Months\n        # 3300—33FF CJK Compatibility\n        # 3400—4DBF CJK Unified Ideographs Extension A\n        # 4DC0—4DFF Yijing Hexagram Symbols\n        # 4E00—9FFF CJK Unified Ideographs\n        caption = re.sub(r'[\\u31c0-\\u31ef]+', '', caption)\n        caption = re.sub(r'[\\u31f0-\\u31ff]+', '', caption)\n        caption = re.sub(r'[\\u3200-\\u32ff]+', '', caption)\n        caption = re.sub(r'[\\u3300-\\u33ff]+', '', caption)\n        caption = re.sub(r'[\\u3400-\\u4dbf]+', '', caption)\n        caption = re.sub(r'[\\u4dc0-\\u4dff]+', '', caption)\n        caption = re.sub(r'[\\u4e00-\\u9fff]+', '', caption)\n        #######################################################\n\n        # all types of dash --> \"-\"\n        caption = re.sub(\n            r'[\\u002D\\u058A\\u05BE\\u1400\\u1806\\u2010-\\u2015\\u2E17\\u2E1A\\u2E3A\\u2E3B\\u2E40\\u301C\\u3030\\u30A0\\uFE31\\uFE32\\uFE58\\uFE63\\uFF0D]+',  # noqa\n            '-', caption)\n\n        # normalize quotes to a single standard\n        caption = re.sub(r'[`´«»“”¨]', '\"', caption)\n        caption = re.sub(r'[‘’]', \"'\", caption)\n\n        # &quot;\n  
      caption = re.sub(r'&quot;?', '', caption)\n        # &amp\n        caption = re.sub(r'&amp', '', caption)\n\n        # IP addresses:\n        caption = re.sub(r'\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}', ' ', caption)\n\n        # article ids:\n        caption = re.sub(r'\\d:\\d\\d\\s+$', '', caption)\n\n        # \\n\n        caption = re.sub(r'\\\\n', ' ', caption)\n\n        # \"#123\"\n        caption = re.sub(r'#\\d{1,3}\\b', '', caption)\n        # \"#12345..\"\n        caption = re.sub(r'#\\d{5,}\\b', '', caption)\n        # \"123456..\"\n        caption = re.sub(r'\\b\\d{6,}\\b', '', caption)\n        # filenames:\n        caption = re.sub(r'[\\S]+\\.(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)', '', caption)\n\n        #\n        caption = re.sub(r'[\\\"\\']{2,}', r'\"', caption)  # \"\"\"AUSVERKAUFT\"\"\"\n        caption = re.sub(r'[\\.]{2,}', r' ', caption)  # \"\"\"AUSVERKAUFT\"\"\"\n\n        caption = re.sub(self.bad_punct_regex, r' ', caption)  # ***AUSVERKAUFT***, #AUSVERKAUFT\n        caption = re.sub(r'\\s+\\.\\s+', r' ', caption)  # \" . 
\"\n\n        # this-is-my-cute-cat / this_is_my_cute_cat\n        regex2 = re.compile(r'(?:\\-|\\_)')\n        if len(re.findall(regex2, caption)) > 3:\n            caption = re.sub(regex2, ' ', caption)\n\n        caption = self.basic_clean(caption)\n\n        caption = re.sub(r'\\b[a-zA-Z]{1,3}\\d{3,15}\\b', '', caption)  # jc6640\n        caption = re.sub(r'\\b[a-zA-Z]+\\d+[a-zA-Z]+\\b', '', caption)  # jc6640vc\n        caption = re.sub(r'\\b\\d+[a-zA-Z]+\\d+\\b', '', caption)  # 6640vc231\n\n        caption = re.sub(r'(worldwide\\s+)?(free\\s+)?shipping', '', caption)\n        caption = re.sub(r'(free\\s)?download(\\sfree)?', '', caption)\n        caption = re.sub(r'\\bclick\\b\\s(?:for|on)\\s\\w+', '', caption)\n        caption = re.sub(r'\\b(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)(\\simage[s]?)?', '', caption)\n        caption = re.sub(r'\\bpage\\s+\\d+\\b', '', caption)\n\n        caption = re.sub(r'\\b\\d*[a-zA-Z]+\\d+[a-zA-Z]+\\d+[a-zA-Z\\d]*\\b', r' ', caption)  # j2d1a2a...\n\n        caption = re.sub(r'\\b\\d+\\.?\\d*[xх×]\\d+\\.?\\d*\\b', '', caption)\n\n        caption = re.sub(r'\\b\\s+\\:\\s+', r': ', caption)\n        caption = re.sub(r'(\\D[,\\./])\\b', r'\\1 ', caption)\n        caption = re.sub(r'\\s+', ' ', caption)\n\n        caption.strip()\n\n        caption = re.sub(r'^[\\\"\\']([\\w\\W]+)[\\\"\\']$', r'\\1', caption)\n        caption = re.sub(r'^[\\'\\_,\\-\\:;]', r'', caption)\n        caption = re.sub(r'[\\'\\_,\\-\\:\\-\\+]$', r'', caption)\n        caption = re.sub(r'^\\.\\S+$', '', caption)\n\n        return caption.strip()\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/timestep_sampler.py",
    "content": "# Modified from OpenAI's diffusion repos\n#     GLIDE: https://github.com/openai/glide-text2im/blob/main/glide_text2im/gaussian_diffusion.py\n#     ADM:   https://github.com/openai/guided-diffusion/blob/main/guided_diffusion\n#     IDDPM: https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\n\nfrom abc import ABC, abstractmethod\n\nimport numpy as np\nimport torch as th\nimport torch.distributed as dist\n\n\ndef create_named_schedule_sampler(name, diffusion):\n    \"\"\"\n    Create a ScheduleSampler from a library of pre-defined samplers.\n    :param name: the name of the sampler.\n    :param diffusion: the diffusion object to sample for.\n    \"\"\"\n    if name == \"uniform\":\n        return UniformSampler(diffusion)\n    elif name == \"loss-second-moment\":\n        return LossSecondMomentResampler(diffusion)\n    else:\n        raise NotImplementedError(f\"unknown schedule sampler: {name}\")\n\n\nclass ScheduleSampler(ABC):\n    \"\"\"\n    A distribution over timesteps in the diffusion process, intended to reduce\n    variance of the objective.\n    By default, samplers perform unbiased importance sampling, in which the\n    objective's mean is unchanged.\n    However, subclasses may override sample() to change how the resampled\n    terms are reweighted, allowing for actual changes in the objective.\n    \"\"\"\n\n    @abstractmethod\n    def weights(self):\n        \"\"\"\n        Get a numpy array of weights, one per diffusion step.\n        The weights needn't be normalized, but must be positive.\n        \"\"\"\n\n    def sample(self, batch_size, device):\n        \"\"\"\n        Importance-sample timesteps for a batch.\n        :param batch_size: the number of timesteps.\n        :param device: the torch device to save to.\n        :return: a tuple (timesteps, weights):\n                 - timesteps: a tensor of timestep indices.\n                 - weights: a tensor of weights to scale the 
resulting losses.\n        \"\"\"\n        w = self.weights()\n        p = w / np.sum(w)\n        indices_np = np.random.choice(len(p), size=(batch_size,), p=p)\n        indices = th.from_numpy(indices_np).long().to(device)\n        weights_np = 1 / (len(p) * p[indices_np])\n        weights = th.from_numpy(weights_np).float().to(device)\n        return indices, weights\n\n\nclass UniformSampler(ScheduleSampler):\n    def __init__(self, diffusion):\n        self.diffusion = diffusion\n        self._weights = np.ones([diffusion.num_timesteps])\n\n    def weights(self):\n        return self._weights\n\n\nclass LossAwareSampler(ScheduleSampler):\n    def update_with_local_losses(self, local_ts, local_losses):\n        \"\"\"\n        Update the reweighting using losses from a model.\n        Call this method from each rank with a batch of timesteps and the\n        corresponding losses for each of those timesteps.\n        This method will perform synchronization to make sure all of the ranks\n        maintain the exact same reweighting.\n        :param local_ts: an integer Tensor of timesteps.\n        :param local_losses: a 1D Tensor of losses.\n        \"\"\"\n        batch_sizes = [\n            th.tensor([0], dtype=th.int32, device=local_ts.device)\n            for _ in range(dist.get_world_size())\n        ]\n        dist.all_gather(\n            batch_sizes,\n            th.tensor([len(local_ts)], dtype=th.int32, device=local_ts.device),\n        )\n\n        # Pad all_gather batches to be the maximum batch size.\n        batch_sizes = [x.item() for x in batch_sizes]\n        max_bs = max(batch_sizes)\n\n        timestep_batches = [th.zeros(max_bs, device=local_ts.device) for _ in batch_sizes]\n        loss_batches = [th.zeros(max_bs, device=local_losses.device) for _ in batch_sizes]\n        dist.all_gather(timestep_batches, local_ts)\n        dist.all_gather(loss_batches, local_losses)\n        timesteps = [\n            x.item() for y, bs in 
zip(timestep_batches, batch_sizes) for x in y[:bs]\n        ]\n        losses = [x.item() for y, bs in zip(loss_batches, batch_sizes) for x in y[:bs]]\n        self.update_with_all_losses(timesteps, losses)\n\n    @abstractmethod\n    def update_with_all_losses(self, ts, losses):\n        \"\"\"\n        Update the reweighting using losses from a model.\n        Sub-classes should override this method to update the reweighting\n        using losses from the model.\n        This method directly updates the reweighting without synchronizing\n        between workers. It is called by update_with_local_losses from all\n        ranks with identical arguments. Thus, it should have deterministic\n        behavior to maintain state across workers.\n        :param ts: a list of int timesteps.\n        :param losses: a list of float losses, one per timestep.\n        \"\"\"\n\n\nclass LossSecondMomentResampler(LossAwareSampler):\n    def __init__(self, diffusion, history_per_term=10, uniform_prob=0.001):\n        self.diffusion = diffusion\n        self.history_per_term = history_per_term\n        self.uniform_prob = uniform_prob\n        self._loss_history = np.zeros(\n            [diffusion.num_timesteps, history_per_term], dtype=np.float64\n        )\n        self._loss_counts = np.zeros([diffusion.num_timesteps], dtype=np.int64)\n\n    def weights(self):\n        if not self._warmed_up():\n            return np.ones([self.diffusion.num_timesteps], dtype=np.float64)\n        weights = np.sqrt(np.mean(self._loss_history ** 2, axis=-1))\n        weights /= np.sum(weights)\n        weights *= 1 - self.uniform_prob\n        weights += self.uniform_prob / len(weights)\n        return weights\n\n    def update_with_all_losses(self, ts, losses):\n        for t, loss in zip(ts, losses):\n            if self._loss_counts[t] == self.history_per_term:\n                # Shift out the oldest loss term.\n                self._loss_history[t, :-1] = self._loss_history[t, 1:]\n             
   self._loss_history[t, -1] = loss\n            else:\n                self._loss_history[t, self._loss_counts[t]] = loss\n                self._loss_counts[t] += 1\n\n    def _warmed_up(self):\n        return (self._loss_counts == self.history_per_term).all()\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/model/utils.py",
    "content": "import os\nimport sys\nimport torch.nn as nn\nfrom torch.utils.checkpoint import checkpoint, checkpoint_sequential\nimport torch.nn.functional as F\nimport torch\nimport torch.distributed as dist\nimport re\nimport math\nfrom collections.abc import Iterable\nfrom itertools import repeat\nfrom torchvision import transforms as T\nimport random\nfrom PIL import Image\n\n\ndef _ntuple(n):\n    def parse(x):\n        if isinstance(x, Iterable) and not isinstance(x, str):\n            return x\n        return tuple(repeat(x, n))\n    return parse\n\n\nto_1tuple = _ntuple(1)\nto_2tuple = _ntuple(2)\n\ndef set_grad_checkpoint(model, use_fp32_attention=False, gc_step=1):\n    assert isinstance(model, nn.Module)\n\n    def set_attr(module):\n        module.grad_checkpointing = True\n        module.fp32_attention = use_fp32_attention\n        module.grad_checkpointing_step = gc_step\n    model.apply(set_attr)\n\n\ndef auto_grad_checkpoint(module, *args, **kwargs):\n    if getattr(module, 'grad_checkpointing', False):\n        if not isinstance(module, Iterable):\n            return checkpoint(module, *args, **kwargs)\n        gc_step = module[0].grad_checkpointing_step\n        return checkpoint_sequential(module, gc_step, *args, **kwargs)\n    return module(*args, **kwargs)\n\n\ndef checkpoint_sequential(functions, step, input, *args, **kwargs):\n\n    # Hack for keyword-only parameter in a python 2.7-compliant way\n    preserve = kwargs.pop('preserve_rng_state', True)\n    if kwargs:\n        raise ValueError(\"Unexpected keyword arguments: \" + \",\".join(kwargs))\n\n    def run_function(start, end, functions):\n        def forward(input):\n            for j in range(start, end + 1):\n                input = functions[j](input, *args)\n            return input\n        return forward\n\n    if isinstance(functions, torch.nn.Sequential):\n        functions = list(functions.children())\n\n    # the last chunk has to be non-volatile\n    end = -1\n    segment 
= len(functions) // step\n    for start in range(0, step * (segment - 1), step):\n        end = start + step - 1\n        input = checkpoint(run_function(start, end, functions), input, preserve_rng_state=preserve)\n    return run_function(end + 1, len(functions) - 1, functions)(input)\n\n\ndef window_partition(x, window_size):\n    \"\"\"\n    Partition into non-overlapping windows with padding if needed.\n    Args:\n        x (tensor): input tokens with [B, H, W, C].\n        window_size (int): window size.\n\n    Returns:\n        windows: windows after partition with [B * num_windows, window_size, window_size, C].\n        (Hp, Wp): padded height and width before partition\n    \"\"\"\n    B, H, W, C = x.shape\n\n    pad_h = (window_size - H % window_size) % window_size\n    pad_w = (window_size - W % window_size) % window_size\n    if pad_h > 0 or pad_w > 0:\n        x = F.pad(x, (0, 0, 0, pad_w, 0, pad_h))\n    Hp, Wp = H + pad_h, W + pad_w\n\n    x = x.view(B, Hp // window_size, window_size, Wp // window_size, window_size, C)\n    windows = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)\n    return windows, (Hp, Wp)\n\n\ndef window_unpartition(windows, window_size, pad_hw, hw):\n    \"\"\"\n    Unpartition windows into the original sequences and remove padding.\n    Args:\n        windows (tensor): input tokens with [B * num_windows, window_size, window_size, C].\n        window_size (int): window size.\n        pad_hw (Tuple): padded height and width (Hp, Wp).\n        hw (Tuple): original height and width (H, W) before padding.\n\n    Returns:\n        x: unpartitioned sequences with [B, H, W, C].\n    \"\"\"\n    Hp, Wp = pad_hw\n    H, W = hw\n    B = windows.shape[0] // (Hp * Wp // window_size // window_size)\n    x = windows.view(B, Hp // window_size, Wp // window_size, window_size, window_size, -1)\n    x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, Hp, Wp, -1)\n\n    if Hp > H or Wp > W:\n        x = x[:, :H, :W, 
:].contiguous()\n    return x\n\n\ndef get_rel_pos(q_size, k_size, rel_pos):\n    \"\"\"\n    Get relative positional embeddings according to the relative positions of\n        query and key sizes.\n    Args:\n        q_size (int): size of query q.\n        k_size (int): size of key k.\n        rel_pos (Tensor): relative position embeddings (L, C).\n\n    Returns:\n        Extracted positional embeddings according to relative positions.\n    \"\"\"\n    max_rel_dist = int(2 * max(q_size, k_size) - 1)\n    # Interpolate rel pos if needed.\n    if rel_pos.shape[0] != max_rel_dist:\n        # Interpolate rel pos.\n        rel_pos_resized = F.interpolate(\n            rel_pos.reshape(1, rel_pos.shape[0], -1).permute(0, 2, 1),\n            size=max_rel_dist,\n            mode=\"linear\",\n        )\n        rel_pos_resized = rel_pos_resized.reshape(-1, max_rel_dist).permute(1, 0)\n    else:\n        rel_pos_resized = rel_pos\n\n    # Scale the coords with short length if shapes for q and k are different.\n    q_coords = torch.arange(q_size)[:, None] * max(k_size / q_size, 1.0)\n    k_coords = torch.arange(k_size)[None, :] * max(q_size / k_size, 1.0)\n    relative_coords = (q_coords - k_coords) + (k_size - 1) * max(q_size / k_size, 1.0)\n\n    return rel_pos_resized[relative_coords.long()]\n\n\ndef add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, q_size, k_size):\n    \"\"\"\n    Calculate decomposed Relative Positional Embeddings from :paper:`mvitv2`.\n    https://github.com/facebookresearch/mvit/blob/19786631e330df9f3622e5402b4a419a263a2c80/mvit/models/attention.py   # noqa B950\n    Args:\n        attn (Tensor): attention map.\n        q (Tensor): query q in the attention layer with shape (B, q_h * q_w, C).\n        rel_pos_h (Tensor): relative position embeddings (Lh, C) for height axis.\n        rel_pos_w (Tensor): relative position embeddings (Lw, C) for width axis.\n        q_size (Tuple): spatial sequence size of query q with (q_h, q_w).\n        k_size 
(Tuple): spatial sequence size of key k with (k_h, k_w).\n\n    Returns:\n        attn (Tensor): attention map with added relative positional embeddings.\n    \"\"\"\n    q_h, q_w = q_size\n    k_h, k_w = k_size\n    Rh = get_rel_pos(q_h, k_h, rel_pos_h)\n    Rw = get_rel_pos(q_w, k_w, rel_pos_w)\n\n    B, _, dim = q.shape\n    r_q = q.reshape(B, q_h, q_w, dim)\n    rel_h = torch.einsum(\"bhwc,hkc->bhwk\", r_q, Rh)\n    rel_w = torch.einsum(\"bhwc,wkc->bhwk\", r_q, Rw)\n\n    attn = (\n        attn.view(B, q_h, q_w, k_h, k_w) + rel_h[:, :, :, :, None] + rel_w[:, :, :, None, :]\n    ).view(B, q_h * q_w, k_h * k_w)\n\n    return attn\n\ndef mean_flat(tensor):\n    return tensor.mean(dim=list(range(1, tensor.ndim)))\n\n\n#################################################################################\n#                          Token Masking and Unmasking                          #\n#################################################################################\ndef get_mask(batch, length, mask_ratio, device, mask_type=None, data_info=None, extra_len=0):\n    \"\"\"\n    Get the binary mask for the input sequence.\n    Args:\n        - batch: batch size\n        - length: sequence length\n        - mask_ratio: ratio of tokens to mask\n        - data_info: dictionary with info for reconstruction\n    return:\n        mask_dict with following keys:\n        - mask: binary mask, 0 is keep, 1 is remove\n        - ids_keep: indices of tokens to keep\n        - ids_restore: indices to restore the original order\n    \"\"\"\n    assert mask_type in ['random', 'fft', 'laplacian', 'group']\n    mask = torch.ones([batch, length], device=device)\n    len_keep = int(length * (1 - mask_ratio)) - extra_len\n\n    if mask_type in ['random', 'group']:\n        noise = torch.rand(batch, length, device=device)  # noise in [0, 1]\n        ids_shuffle = torch.argsort(noise, dim=1)  # ascend: small is keep, large is remove\n        ids_restore = torch.argsort(ids_shuffle, dim=1)\n      
  # keep the first subset\n        ids_keep = ids_shuffle[:, :len_keep]\n        ids_removed = ids_shuffle[:, len_keep:]\n\n    elif mask_type in ['fft', 'laplacian']:\n        if 'strength' in data_info:\n            strength = data_info['strength']\n\n        else:\n            N = data_info['N'][0]\n            img = data_info['ori_img']\n            # Get the dimensions of the original image\n            _, C, H, W = img.shape\n            if mask_type == 'fft':\n                # Reshape the image into patches (3, H/N, N, W/N, N)\n                reshaped_image = img.reshape((batch, -1, H // N, N, W // N, N))\n                fft_image = torch.fft.fftn(reshaped_image, dim=(3, 5))\n                # Take the absolute value and sum to get the frequency strength\n                strength = torch.sum(torch.abs(fft_image), dim=(1, 3, 5)).reshape((batch, -1,))\n            elif mask_type == 'laplacian':\n                laplacian_kernel = torch.tensor([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=torch.float32, device=img.device).reshape(1, 1, 3, 3)\n                laplacian_kernel = laplacian_kernel.repeat(C, 1, 1, 1)\n                # Reshape the image into patches (3, H/N, N, W/N, N)\n                reshaped_image = img.reshape(-1, C, H // N, N, W // N, N).permute(0, 2, 4, 1, 3, 5).reshape(-1, C, N, N)\n                laplacian_response = F.conv2d(reshaped_image, laplacian_kernel, padding=1, groups=C)\n                strength = laplacian_response.sum(dim=[1, 2, 3]).reshape((batch, -1,))\n\n        # Normalize the frequency strength, then sample with torch.multinomial\n        probabilities = strength / (strength.max(dim=1)[0][:, None]+1e-5)\n        ids_shuffle = torch.multinomial(probabilities.clip(1e-5, 1), length, replacement=False)\n        ids_keep = ids_shuffle[:, :len_keep]\n        ids_restore = torch.argsort(ids_shuffle, dim=1)\n        ids_removed = ids_shuffle[:, len_keep:]\n\n    mask[:, :len_keep] = 0\n    mask = torch.gather(mask, dim=1, index=ids_restore)\n\n    return {'mask': mask,\n            'ids_keep': ids_keep,\n            'ids_restore': ids_restore,\n            'ids_removed': 
ids_removed}\n\n\ndef mask_out_token(x, ids_keep, ids_removed=None):\n    \"\"\"\n    Mask out the tokens specified by ids_keep.\n    Args:\n        - x: input sequence, [N, L, D]\n        - ids_keep: indices of tokens to keep\n    return:\n        - x_masked: masked sequence\n    \"\"\"\n    N, L, D = x.shape  # batch, length, dim\n    x_remain = torch.gather(x, dim=1, index=ids_keep.unsqueeze(-1).repeat(1, 1, D))\n    if ids_removed is not None:\n        x_masked = torch.gather(x, dim=1, index=ids_removed.unsqueeze(-1).repeat(1, 1, D))\n        return x_remain, x_masked\n    else:\n        return x_remain\n\n\ndef mask_tokens(x, mask_ratio):\n    \"\"\"\n    Perform per-sample random masking by per-sample shuffling.\n    Per-sample shuffling is done by argsort random noise.\n    x: [N, L, D], sequence\n    \"\"\"\n    N, L, D = x.shape  # batch, length, dim\n    len_keep = int(L * (1 - mask_ratio))\n\n    noise = torch.rand(N, L, device=x.device)  # noise in [0, 1]\n\n    # sort noise for each sample\n    ids_shuffle = torch.argsort(noise, dim=1)  # ascend: small is keep, large is remove\n    ids_restore = torch.argsort(ids_shuffle, dim=1)\n\n    # keep the first subset\n    ids_keep = ids_shuffle[:, :len_keep]\n    x_masked = torch.gather(x, dim=1, index=ids_keep.unsqueeze(-1).repeat(1, 1, D))\n\n    # generate the binary mask: 0 is keep, 1 is remove\n    mask = torch.ones([N, L], device=x.device)\n    mask[:, :len_keep] = 0\n    mask = torch.gather(mask, dim=1, index=ids_restore)\n\n    return x_masked, mask, ids_restore\n\n\ndef unmask_tokens(x, ids_restore, mask_token):\n    # x: [N, T, D] if extras == 0 (i.e., no cls token) else x: [N, T+1, D]\n    mask_tokens = mask_token.repeat(x.shape[0], ids_restore.shape[1] - x.shape[1], 1)\n    x = torch.cat([x, mask_tokens], dim=1)\n    x = torch.gather(x, dim=1, index=ids_restore.unsqueeze(-1).repeat(1, 1, x.shape[2]))  # unshuffle\n    return x\n\n\n# Parse 'None' to None and others to float value\ndef 
parse_float_none(s):\n    assert isinstance(s, str)\n    return None if s == 'None' else float(s)\n\n\n#----------------------------------------------------------------------------\n# Parse a comma separated list of numbers or ranges and return a list of ints.\n# Example: '1,2,5-10' returns [1, 2, 5, 6, 7, 8, 9, 10]\n\ndef parse_int_list(s):\n    if isinstance(s, list): return s\n    ranges = []\n    range_re = re.compile(r'^(\\d+)-(\\d+)$')\n    for p in s.split(','):\n        if m := range_re.match(p):\n            ranges.extend(range(int(m.group(1)), int(m.group(2))+1))\n        else:\n            ranges.append(int(p))\n    return ranges\n\n\ndef init_processes(fn, args):\n    \"\"\" Initialize the distributed environment. \"\"\"\n    os.environ['MASTER_ADDR'] = args.master_address\n    os.environ['MASTER_PORT'] = str(random.randint(2000, 6000))\n    print(f'MASTER_ADDR = {os.environ[\"MASTER_ADDR\"]}')\n    print(f'MASTER_PORT = {os.environ[\"MASTER_PORT\"]}')\n    torch.cuda.set_device(args.local_rank)\n    dist.init_process_group(backend='nccl', init_method='env://', rank=args.global_rank, world_size=args.global_size)\n    fn(args)\n    if args.global_size > 1:\n        cleanup()\n\n\ndef mprint(*args, **kwargs):\n    \"\"\"\n    Print only from rank 0.\n    \"\"\"\n    if dist.get_rank() == 0:\n        print(*args, **kwargs)\n\n\ndef cleanup():\n    \"\"\"\n    End DDP training.\n    \"\"\"\n    dist.barrier()\n    mprint(\"Done!\")\n    dist.barrier()\n    dist.destroy_process_group()\n\n\n#----------------------------------------------------------------------------\n# logging info.\nclass Logger(object):\n    \"\"\"\n    Redirect stderr to stdout, optionally print stdout to a file,\n    and optionally force flushing on both stdout and the file.\n    \"\"\"\n\n    def __init__(self, file_name=None, file_mode=\"w\", should_flush=True):\n        self.file = None\n\n        if file_name is not None:\n            self.file = open(file_name, file_mode)\n\n       
 self.should_flush = should_flush\n        self.stdout = sys.stdout\n        self.stderr = sys.stderr\n\n        sys.stdout = self\n        sys.stderr = self\n\n    def __enter__(self):\n        return self\n\n    def __exit__(self, exc_type, exc_value, traceback):\n        self.close()\n\n    def write(self, text):\n        \"\"\"Write text to stdout (and a file) and optionally flush.\"\"\"\n        if len(text) == 0: # workaround for a bug in VSCode debugger: sys.stdout.write(''); sys.stdout.flush() => crash\n            return\n\n        if self.file is not None:\n            self.file.write(text)\n\n        self.stdout.write(text)\n\n        if self.should_flush:\n            self.flush()\n\n    def flush(self):\n        \"\"\"Flush written text to both stdout and a file, if open.\"\"\"\n        if self.file is not None:\n            self.file.flush()\n\n        self.stdout.flush()\n\n    def close(self):\n        \"\"\"Flush, close possible files, and remove stdout/stderr mirroring.\"\"\"\n        self.flush()\n\n        # if using multiple loggers, prevent closing in wrong order\n        if sys.stdout is self:\n            sys.stdout = self.stdout\n        if sys.stderr is self:\n            sys.stderr = self.stderr\n\n        if self.file is not None:\n            self.file.close()\n\n\nclass StackedRandomGenerator:\n    def __init__(self, device, seeds):\n        super().__init__()\n        self.generators = [torch.Generator(device).manual_seed(int(seed) % (1 << 32)) for seed in seeds]\n\n    def randn(self, size, **kwargs):\n        assert size[0] == len(self.generators)\n        return torch.stack([torch.randn(size[1:], generator=gen, **kwargs) for gen in self.generators])\n\n    def randn_like(self, input):\n        return self.randn(input.shape, dtype=input.dtype, layout=input.layout, device=input.device)\n\n    def randint(self, *args, size, **kwargs):\n        assert size[0] == len(self.generators)\n        return torch.stack([torch.randint(*args, 
size=size[1:], generator=gen, **kwargs) for gen in self.generators])\n\n\ndef prepare_prompt_ar(prompt, ratios, device='cpu', show=True):\n    # get aspect_ratio or ar\n    aspect_ratios = re.findall(r\"--aspect_ratio\\s+(\\d+:\\d+)\", prompt)\n    ars = re.findall(r\"--ar\\s+(\\d+:\\d+)\", prompt)\n    custom_hw = re.findall(r\"--hw\\s+(\\d+:\\d+)\", prompt)\n    if show:\n        print(\"aspect_ratios:\", aspect_ratios, \"ars:\", ars, \"hws:\", custom_hw)\n    prompt_clean = prompt.split(\"--aspect_ratio\")[0].split(\"--ar\")[0].split(\"--hw\")[0]\n    if len(aspect_ratios) + len(ars) + len(custom_hw) == 0 and show:\n        print( \"Wrong prompt format. Set to default ar: 1. change your prompt into format '--ar h:w or --hw h:w' for correct generating\")\n    if len(aspect_ratios) != 0:\n        ar = float(aspect_ratios[0].split(':')[0]) / float(aspect_ratios[0].split(':')[1])\n    elif len(ars) != 0:\n        ar = float(ars[0].split(':')[0]) / float(ars[0].split(':')[1])\n    else:\n        ar = 1.\n    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))\n    if len(custom_hw) != 0:\n        custom_hw = [float(custom_hw[0].split(':')[0]), float(custom_hw[0].split(':')[1])]\n    else:\n        custom_hw = ratios[closest_ratio]\n    default_hw = ratios[closest_ratio]\n    prompt_show = f'prompt: {prompt_clean.strip()}\\nSize: --ar {closest_ratio}, --bin hw {ratios[closest_ratio]}, --custom hw {custom_hw}'\n    return prompt_clean, prompt_show, torch.tensor(default_hw, device=device)[None], torch.tensor([float(closest_ratio)], device=device)[None], torch.tensor(custom_hw, device=device)[None]\n\n\ndef resize_and_crop_tensor(samples: torch.Tensor, new_width: int, new_height: int):\n    orig_hw = torch.tensor([samples.shape[2], samples.shape[3]], dtype=torch.int)\n    custom_hw = torch.tensor([int(new_height), int(new_width)], dtype=torch.int)\n\n    if (orig_hw != custom_hw).all():\n        ratio = max(custom_hw[0] / orig_hw[0], custom_hw[1] 
/ orig_hw[1])\n        resized_width = int(orig_hw[1] * ratio)\n        resized_height = int(orig_hw[0] * ratio)\n\n        transform = T.Compose([\n            T.Resize((resized_height, resized_width)),\n            T.CenterCrop(custom_hw.tolist())\n        ])\n        return transform(samples)\n    else:\n        return samples\n\n\ndef resize_and_crop_img(img: Image, new_width, new_height):\n    orig_width, orig_height = img.size\n\n    ratio = max(new_width/orig_width, new_height/orig_height)\n    resized_width = int(orig_width * ratio)\n    resized_height = int(orig_height * ratio)\n\n    img = img.resize((resized_width, resized_height), Image.LANCZOS)\n\n    left = (resized_width - new_width)/2\n    top = (resized_height - new_height)/2\n    right = (resized_width + new_width)/2\n    bottom = (resized_height + new_height)/2\n\n    img = img.crop((left, top, right, bottom))\n\n    return img\n\n\n\ndef mask_feature(emb, mask):\n    if emb.shape[0] == 1:\n        keep_index = mask.sum().item()\n        return emb[:, :, :keep_index, :], keep_index\n    else:\n        masked_feature = emb * mask[:, None, :, None]\n        return masked_feature, emb.shape[2]"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/sa_sampler.py",
    "content": "\"\"\"SAMPLING ONLY.\"\"\"\n\nimport torch\nimport numpy as np\n\nfrom diffusion.model.sa_solver import NoiseScheduleVP, model_wrapper, SASolver\nfrom .model import gaussian_diffusion as gd\n\n\nclass SASolverSampler(object):\n    def __init__(self, model,\n                 noise_schedule=\"linear\",\n                 diffusion_steps=1000,\n                 device='cpu',\n                 ):\n        super().__init__()\n        self.model = model\n        self.device = device\n        to_torch = lambda x: x.clone().detach().to(torch.float32).to(device)\n        betas = torch.tensor(gd.get_named_beta_schedule(noise_schedule, diffusion_steps))\n        alphas = 1.0 - betas\n        self.register_buffer('alphas_cumprod', to_torch(np.cumprod(alphas, axis=0)))\n\n    def register_buffer(self, name, attr):\n        if type(attr) == torch.Tensor and attr.device != torch.device(\"cuda\"):\n            attr = attr.to(torch.device(\"cuda\"))\n        setattr(self, name, attr)\n\n    @torch.no_grad()\n    def sample(self, S, batch_size, shape, conditioning=None, callback=None, normals_sequence=None, img_callback=None, quantize_x0=False, eta=0., mask=None, x0=None, temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None, verbose=True, x_T=None, log_every_t=100, unconditional_guidance_scale=1., unconditional_conditioning=None, model_kwargs=None, **kwargs):\n        if model_kwargs is None:\n            model_kwargs = {}\n        if conditioning is not None:\n            if isinstance(conditioning, dict):\n                cbs = conditioning[list(conditioning.keys())[0]].shape[0]\n                if cbs != batch_size:\n                    print(f\"Warning: Got {cbs} conditionings but batch-size is {batch_size}\")\n            elif conditioning.shape[0] != batch_size:\n                print(f\"Warning: Got {conditioning.shape[0]} conditionings but batch-size is {batch_size}\")\n\n        # sampling\n        C, H, W = shape\n        size = 
(batch_size, C, H, W)\n\n        device = self.device\n        img = torch.randn(size, device=device) if x_T is None else x_T\n        ns = NoiseScheduleVP('discrete', alphas_cumprod=self.alphas_cumprod)\n\n        model_fn = model_wrapper(\n            self.model,\n            ns,\n            model_type=\"noise\",\n            guidance_type=\"classifier-free\",\n            condition=conditioning,\n            unconditional_condition=unconditional_conditioning,\n            guidance_scale=unconditional_guidance_scale,\n            model_kwargs=model_kwargs,\n        )\n\n        sasolver = SASolver(model_fn, ns, algorithm_type=\"data_prediction\")\n\n        tau_t = lambda t: eta if 0.2 <= t <= 0.8 else 0\n\n        x = sasolver.sample(mode='few_steps', x=img, tau=tau_t, steps=S, skip_type='time', skip_order=1, predictor_order=2, corrector_order=2, pc_mode='PEC', return_intermediate=False)\n\n        return x.to(device), None"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/sa_solver_diffusers.py",
    "content": "# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\n# DISCLAIMER: check https://arxiv.org/abs/2309.05019\n# The codebase is modified based on https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py\n\nimport math\nfrom typing import List, Optional, Tuple, Union, Callable\n\nimport numpy as np\nimport torch\n\nfrom diffusers.configuration_utils import ConfigMixin, register_to_config\nfrom diffusers.utils.torch_utils import randn_tensor\nfrom diffusers.schedulers.scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin, SchedulerOutput\n\n\n# Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar\ndef betas_for_alpha_bar(\n        num_diffusion_timesteps,\n        max_beta=0.999,\n        alpha_transform_type=\"cosine\",\n):\n    \"\"\"\n    Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of\n    (1-beta) over time from t = [0,1].\n\n    Contains a function alpha_bar that takes an argument t and transforms it to the cumulative product of (1-beta) up\n    to that part of the diffusion process.\n\n\n    Args:\n        num_diffusion_timesteps (`int`): the number of betas to produce.\n        max_beta (`float`): the maximum beta to use; use values lower than 1 to\n                     prevent singularities.\n        alpha_transform_type (`str`, *optional*, default to `cosine`): the type of noise schedule for alpha_bar.\n 
                    Choose from `cosine` or `exp`\n\n    Returns:\n        betas (`torch.Tensor`): the betas used by the scheduler to step the model outputs\n    \"\"\"\n    if alpha_transform_type == \"cosine\":\n\n        def alpha_bar_fn(t):\n            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2\n\n    elif alpha_transform_type == \"exp\":\n\n        def alpha_bar_fn(t):\n            return math.exp(t * -12.0)\n\n    else:\n        raise ValueError(f\"Unsupported alpha_transform_type: {alpha_transform_type}\")\n\n    betas = []\n    for i in range(num_diffusion_timesteps):\n        t1 = i / num_diffusion_timesteps\n        t2 = (i + 1) / num_diffusion_timesteps\n        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))\n    return torch.tensor(betas, dtype=torch.float32)\n\n\nclass SASolverScheduler(SchedulerMixin, ConfigMixin):\n    \"\"\"\n    `SASolverScheduler` is a fast dedicated high-order solver for diffusion SDEs.\n\n    This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic\n    methods the library implements for all schedulers such as loading and saving.\n\n    Args:\n        num_train_timesteps (`int`, defaults to 1000):\n            The number of diffusion steps to train the model.\n        beta_start (`float`, defaults to 0.0001):\n            The starting `beta` value of inference.\n        beta_end (`float`, defaults to 0.02):\n            The final `beta` value.\n        beta_schedule (`str`, defaults to `\"linear\"`):\n            The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. 
Choose from\n            `linear`, `scaled_linear`, or `squaredcos_cap_v2`.\n        trained_betas (`np.ndarray`, *optional*):\n            Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.\n        predictor_order (`int`, defaults to 2):\n            The predictor order, which can be `1`, `2`, `3`, or `4`. It is recommended to use `predictor_order=2` for guided\n            sampling, and `predictor_order=3` for unconditional sampling.\n        corrector_order (`int`, defaults to 2):\n            The corrector order, which can be `1`, `2`, `3`, or `4`. It is recommended to use `corrector_order=2` for guided\n            sampling, and `corrector_order=3` for unconditional sampling.\n        predictor_corrector_mode (`str`, defaults to `PEC`):\n            The predictor-corrector mode, which can be `PEC` or `PECE`. It is recommended to use `PEC` mode for fast\n            sampling, and `PECE` for high-quality sampling (PECE needs around twice as many model evaluations as PEC).\n        prediction_type (`str`, defaults to `epsilon`, *optional*):\n            Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),\n            `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of [Imagen\n            Video](https://imagen.research.google/video/paper.pdf) paper).\n        thresholding (`bool`, defaults to `False`):\n            Whether to use the \"dynamic thresholding\" method. This is unsuitable for latent-space diffusion models such\n            as Stable Diffusion.\n        dynamic_thresholding_ratio (`float`, defaults to 0.995):\n            The ratio for the dynamic thresholding method. Valid only when `thresholding=True`.\n        sample_max_value (`float`, defaults to 1.0):\n            The threshold value for dynamic thresholding. 
Valid only when `thresholding=True` and\n            `algorithm_type=\"data_prediction\"`.\n        algorithm_type (`str`, defaults to `data_prediction`):\n            Algorithm type for the solver; can be `data_prediction` or `noise_prediction`. It is recommended to use `data_prediction`\n            with `predictor_order=2` and `corrector_order=2` for guided sampling like in Stable Diffusion.\n        lower_order_final (`bool`, defaults to `True`):\n            Whether to use lower-order solvers in the final steps.\n        use_karras_sigmas (`bool`, *optional*, defaults to `False`):\n            Whether to use Karras sigmas for step sizes in the noise schedule during the sampling process. If `True`,\n            the sigmas are determined according to a sequence of noise levels {σi}.\n        lambda_min_clipped (`float`, defaults to `-inf`):\n            Clipping threshold for the minimum value of `lambda(t)` for numerical stability. This is critical for the\n            cosine (`squaredcos_cap_v2`) noise schedule.\n        variance_type (`str`, *optional*):\n            Set to \"learned\" or \"learned_range\" for diffusion models that predict variance. If set, the model's output\n            contains the predicted Gaussian variance.\n        timestep_spacing (`str`, defaults to `\"linspace\"`):\n            The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and\n            Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) for more information.\n        steps_offset (`int`, defaults to 0):\n            An offset added to the inference steps. 
You can use a combination of `offset=1` and\n            `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable\n            Diffusion.\n    \"\"\"\n\n    _compatibles = [e.name for e in KarrasDiffusionSchedulers]\n    order = 1\n\n    @register_to_config\n    def __init__(\n            self,\n            num_train_timesteps: int = 1000,\n            beta_start: float = 0.0001,\n            beta_end: float = 0.02,\n            beta_schedule: str = \"linear\",\n            trained_betas: Optional[Union[np.ndarray, List[float]]] = None,\n            predictor_order: int = 2,\n            corrector_order: int = 2,\n            predictor_corrector_mode: str = 'PEC',\n            prediction_type: str = \"epsilon\",\n            tau_func: Callable = lambda t: 1 if t >= 200 and t <= 800 else 0,\n            thresholding: bool = False,\n            dynamic_thresholding_ratio: float = 0.995,\n            sample_max_value: float = 1.0,\n            algorithm_type: str = \"data_prediction\",\n            lower_order_final: bool = True,\n            use_karras_sigmas: Optional[bool] = False,\n            lambda_min_clipped: float = -float(\"inf\"),\n            variance_type: Optional[str] = None,\n            timestep_spacing: str = \"linspace\",\n            steps_offset: int = 0,\n    ):\n        if trained_betas is not None:\n            self.betas = torch.tensor(trained_betas, dtype=torch.float32)\n        elif beta_schedule == \"linear\":\n            self.betas = torch.linspace(beta_start, beta_end, num_train_timesteps, dtype=torch.float32)\n        elif beta_schedule == \"scaled_linear\":\n            # this schedule is very specific to the latent diffusion model.\n            self.betas = (\n                    torch.linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_timesteps, dtype=torch.float32) ** 2\n            )\n        elif beta_schedule == \"squaredcos_cap_v2\":\n            # Glide cosine schedule\n        
    self.betas = betas_for_alpha_bar(num_train_timesteps)\n        else:\n            raise NotImplementedError(f\"{beta_schedule} is not implemented for {self.__class__}\")\n\n        self.alphas = 1.0 - self.betas\n        self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)\n        # Currently we only support VP-type noise schedule\n        self.alpha_t = torch.sqrt(self.alphas_cumprod)\n        self.sigma_t = torch.sqrt(1 - self.alphas_cumprod)\n        self.lambda_t = torch.log(self.alpha_t) - torch.log(self.sigma_t)\n\n        # standard deviation of the initial noise distribution\n        self.init_noise_sigma = 1.0\n\n        if algorithm_type not in [\"data_prediction\", \"noise_prediction\"]:\n            raise NotImplementedError(f\"{algorithm_type} is not implemented for {self.__class__}\")\n\n        # settable values\n        self.num_inference_steps = None\n        timesteps = np.linspace(0, num_train_timesteps - 1, num_train_timesteps, dtype=np.float32)[::-1].copy()\n        self.timesteps = torch.from_numpy(timesteps)\n        self.timestep_list = [None] * max(predictor_order, corrector_order - 1)\n        self.model_outputs = [None] * max(predictor_order, corrector_order - 1)\n\n        self.tau_func = tau_func\n        self.predict_x0 = algorithm_type == \"data_prediction\"\n        self.lower_order_nums = 0\n        self.last_sample = None\n\n    def set_timesteps(self, num_inference_steps: int = None, device: Union[str, torch.device] = None):\n        \"\"\"\n        Sets the discrete timesteps used for the diffusion chain (to be run before inference).\n\n        Args:\n            num_inference_steps (`int`):\n                The number of diffusion steps used when generating samples with a pre-trained model.\n            device (`str` or `torch.device`, *optional*):\n                The device to which the timesteps should be moved. 
If `None`, the timesteps are not moved.\n        \"\"\"\n        # Clipping the minimum of all lambda(t) for numerical stability.\n        # This is critical for the cosine (squaredcos_cap_v2) noise schedule.\n        clipped_idx = torch.searchsorted(torch.flip(self.lambda_t, [0]), self.config.lambda_min_clipped)\n        last_timestep = ((self.config.num_train_timesteps - clipped_idx).numpy()).item()\n\n        # \"linspace\", \"leading\", and \"trailing\" correspond to the annotations in Table 2 of https://arxiv.org/abs/2305.08891\n        if self.config.timestep_spacing == \"linspace\":\n            timesteps = (\n                np.linspace(0, last_timestep - 1, num_inference_steps + 1).round()[::-1][:-1].copy().astype(np.int64)\n            )\n\n        elif self.config.timestep_spacing == \"leading\":\n            step_ratio = last_timestep // (num_inference_steps + 1)\n            # creates integer timesteps by multiplying by ratio\n            # casting to int to avoid issues when num_inference_steps is a power of 3\n            timesteps = (np.arange(0, num_inference_steps + 1) * step_ratio).round()[::-1][:-1].copy().astype(np.int64)\n            timesteps += self.config.steps_offset\n        elif self.config.timestep_spacing == \"trailing\":\n            step_ratio = self.config.num_train_timesteps / num_inference_steps\n            # creates integer timesteps by multiplying by ratio\n            # casting to int to avoid issues when num_inference_steps is a power of 3\n            timesteps = np.arange(last_timestep, 0, -step_ratio).round().copy().astype(np.int64)\n            timesteps -= 1\n        else:\n            raise ValueError(\n                f\"{self.config.timestep_spacing} is not supported. 
Please make sure to choose one of 'linspace', 'leading' or 'trailing'.\"\n            )\n\n        sigmas = np.array(((1 - self.alphas_cumprod) / self.alphas_cumprod) ** 0.5)\n        if self.config.use_karras_sigmas:\n            log_sigmas = np.log(sigmas)\n            sigmas = self._convert_to_karras(in_sigmas=sigmas, num_inference_steps=num_inference_steps)\n            timesteps = np.array([self._sigma_to_t(sigma, log_sigmas) for sigma in sigmas]).round()\n            timesteps = np.flip(timesteps).copy().astype(np.int64)\n\n        self.sigmas = torch.from_numpy(sigmas)\n\n        # when num_inference_steps == num_train_timesteps, we can end up with\n        # duplicates in timesteps.\n        _, unique_indices = np.unique(timesteps, return_index=True)\n        timesteps = timesteps[np.sort(unique_indices)]\n\n        self.timesteps = torch.from_numpy(timesteps).to(device)\n\n        self.num_inference_steps = len(timesteps)\n\n        self.model_outputs = [\n                                 None,\n                             ] * max(self.config.predictor_order, self.config.corrector_order - 1)\n        self.lower_order_nums = 0\n        self.last_sample = None\n\n    # Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler._threshold_sample\n    def _threshold_sample(self, sample: torch.FloatTensor) -> torch.FloatTensor:\n        \"\"\"\n        \"Dynamic thresholding: At each sampling step we set s to a certain percentile absolute pixel value in xt0 (the\n        prediction of x_0 at timestep t), and if s > 1, then we threshold xt0 to the range [-s, s] and then divide by\n        s. Dynamic thresholding pushes saturated pixels (those near -1 and 1) inwards, thereby actively preventing\n        pixels from saturation at each step. 
We find that dynamic thresholding results in significantly better\n        photorealism as well as better image-text alignment, especially when using very large guidance weights.\"\n\n        https://arxiv.org/abs/2205.11487\n        \"\"\"\n        dtype = sample.dtype\n        batch_size, channels, height, width = sample.shape\n\n        if dtype not in (torch.float32, torch.float64):\n            sample = sample.float()  # upcast for quantile calculation, and clamp not implemented for cpu half\n\n        # Flatten sample for doing quantile calculation along each image\n        sample = sample.reshape(batch_size, channels * height * width)\n\n        abs_sample = sample.abs()  # \"a certain percentile absolute pixel value\"\n\n        s = torch.quantile(abs_sample, self.config.dynamic_thresholding_ratio, dim=1)\n        s = torch.clamp(\n            s, min=1, max=self.config.sample_max_value\n        )  # When clamped to min=1, equivalent to standard clipping to [-1, 1]\n\n        s = s.unsqueeze(1)  # (batch_size, 1) because clamp will broadcast along dim=0\n        sample = torch.clamp(sample, -s, s) / s  # \"we threshold xt0 to the range [-s, s] and then divide by s\"\n\n        sample = sample.reshape(batch_size, channels, height, width)\n        sample = sample.to(dtype)\n\n        return sample\n\n    # Copied from diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler._sigma_to_t\n    def _sigma_to_t(self, sigma, log_sigmas):\n        # get log sigma\n        log_sigma = np.log(sigma)\n\n        # get distribution\n        dists = log_sigma - log_sigmas[:, np.newaxis]\n\n        # get sigmas range\n        low_idx = np.cumsum((dists >= 0), axis=0).argmax(axis=0).clip(max=log_sigmas.shape[0] - 2)\n        high_idx = low_idx + 1\n\n        low = log_sigmas[low_idx]\n        high = log_sigmas[high_idx]\n\n        # interpolate sigmas\n        w = (low - log_sigma) / (low - high)\n        w = np.clip(w, 0, 1)\n\n        # transform interpolation 
to time range\n        t = (1 - w) * low_idx + w * high_idx\n        t = t.reshape(sigma.shape)\n        return t\n\n    # Copied from diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler._convert_to_karras\n    def _convert_to_karras(self, in_sigmas: torch.FloatTensor, num_inference_steps) -> torch.FloatTensor:\n        \"\"\"Constructs the noise schedule of Karras et al. (2022).\"\"\"\n\n        sigma_min: float = in_sigmas[-1].item()\n        sigma_max: float = in_sigmas[0].item()\n\n        rho = 7.0  # 7.0 is the value used in the paper\n        ramp = np.linspace(0, 1, num_inference_steps)\n        min_inv_rho = sigma_min ** (1 / rho)\n        max_inv_rho = sigma_max ** (1 / rho)\n        return (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho\n\n    def convert_model_output(\n            self, model_output: torch.FloatTensor, timestep: int, sample: torch.FloatTensor\n    ) -> torch.FloatTensor:\n        \"\"\"\n        Convert the model output to the corresponding type the DPMSolver/DPMSolver++ algorithm needs. DPM-Solver is\n        designed to discretize an integral of the noise prediction model, and DPM-Solver++ is designed to discretize an\n        integral of the data prediction model.\n\n        <Tip>\n\n        The algorithm and model type are decoupled. 
You can use either DPMSolver or DPMSolver++ for both noise\n        prediction and data prediction models.\n\n        </Tip>\n\n        Args:\n            model_output (`torch.FloatTensor`):\n                The direct output from the learned diffusion model.\n            timestep (`int`):\n                The current discrete timestep in the diffusion chain.\n            sample (`torch.FloatTensor`):\n                A current instance of a sample created by the diffusion process.\n\n        Returns:\n            `torch.FloatTensor`:\n                The converted model output.\n        \"\"\"\n\n        # SA-Solver_data_prediction needs to solve an integral of the data prediction model.\n        if self.config.algorithm_type in [\"data_prediction\"]:\n            if self.config.prediction_type == \"epsilon\":\n                # SA-Solver only needs the \"mean\" output.\n                if self.config.variance_type in [\"learned\", \"learned_range\"]:\n                    model_output = model_output[:, :3]\n                alpha_t, sigma_t = self.alpha_t[timestep], self.sigma_t[timestep]\n                x0_pred = (sample - sigma_t * model_output) / alpha_t\n            elif self.config.prediction_type == \"sample\":\n                x0_pred = model_output\n            elif self.config.prediction_type == \"v_prediction\":\n                alpha_t, sigma_t = self.alpha_t[timestep], self.sigma_t[timestep]\n                x0_pred = alpha_t * sample - sigma_t * model_output\n            else:\n                raise ValueError(\n                    f\"prediction_type given as {self.config.prediction_type} must be one of `epsilon`, `sample`, or\"\n                    \" `v_prediction` for the SASolverScheduler.\"\n                )\n\n            if self.config.thresholding:\n                x0_pred = self._threshold_sample(x0_pred)\n\n            return x0_pred\n\n        # SA-Solver_noise_prediction needs to solve an integral of the noise prediction model.\n        
elif self.config.algorithm_type in [\"noise_prediction\"]:\n            if self.config.prediction_type == \"epsilon\":\n                # SA-Solver only needs the \"mean\" output.\n                if self.config.variance_type in [\"learned\", \"learned_range\"]:\n                    epsilon = model_output[:, :3]\n                else:\n                    epsilon = model_output\n            elif self.config.prediction_type == \"sample\":\n                alpha_t, sigma_t = self.alpha_t[timestep], self.sigma_t[timestep]\n                epsilon = (sample - alpha_t * model_output) / sigma_t\n            elif self.config.prediction_type == \"v_prediction\":\n                alpha_t, sigma_t = self.alpha_t[timestep], self.sigma_t[timestep]\n                epsilon = alpha_t * model_output + sigma_t * sample\n            else:\n                raise ValueError(\n                    f\"prediction_type given as {self.config.prediction_type} must be one of `epsilon`, `sample`, or\"\n                    \" `v_prediction` for the SASolverScheduler.\"\n                )\n\n            if self.config.thresholding:\n                alpha_t, sigma_t = self.alpha_t[timestep], self.sigma_t[timestep]\n                x0_pred = (sample - sigma_t * epsilon) / alpha_t\n                x0_pred = self._threshold_sample(x0_pred)\n                epsilon = (sample - alpha_t * x0_pred) / sigma_t\n\n            return epsilon\n\n    def get_coefficients_exponential_negative(self, order, interval_start, interval_end):\n        \"\"\"\n        Calculate the integral of exp(-x) * x^order dx from interval_start to interval_end\n        \"\"\"\n        assert order in [0, 1, 2, 3], \"order is only supported for 0, 1, 2 and 3\"\n\n        if order == 0:\n            return torch.exp(-interval_end) * (torch.exp(interval_end - interval_start) - 1)\n        elif order == 1:\n            return torch.exp(-interval_end) * (\n                        (interval_start + 1) * torch.exp(interval_end - 
interval_start) - (interval_end + 1))\n        elif order == 2:\n            return torch.exp(-interval_end) * (\n                        (interval_start ** 2 + 2 * interval_start + 2) * torch.exp(interval_end - interval_start) - (\n                            interval_end ** 2 + 2 * interval_end + 2))\n        elif order == 3:\n            return torch.exp(-interval_end) * (\n                        (interval_start ** 3 + 3 * interval_start ** 2 + 6 * interval_start + 6) * torch.exp(\n                    interval_end - interval_start) - (interval_end ** 3 + 3 * interval_end ** 2 + 6 * interval_end + 6))\n\n    def get_coefficients_exponential_positive(self, order, interval_start, interval_end, tau):\n        \"\"\"\n        Calculate the integral of exp(x(1+tau^2)) * x^order dx from interval_start to interval_end\n        \"\"\"\n        assert order in [0, 1, 2, 3], \"order is only supported for 0, 1, 2 and 3\"\n\n        # after change of variable(cov)\n        interval_end_cov = (1 + tau ** 2) * interval_end\n        interval_start_cov = (1 + tau ** 2) * interval_start\n\n        if order == 0:\n            return torch.exp(interval_end_cov) * (1 - torch.exp(-(interval_end_cov - interval_start_cov))) / (\n            (1 + tau ** 2))\n        elif order == 1:\n            return torch.exp(interval_end_cov) * ((interval_end_cov - 1) - (interval_start_cov - 1) * torch.exp(\n                -(interval_end_cov - interval_start_cov))) / ((1 + tau ** 2) ** 2)\n        elif order == 2:\n            return torch.exp(interval_end_cov) * ((interval_end_cov ** 2 - 2 * interval_end_cov + 2) - (\n                        interval_start_cov ** 2 - 2 * interval_start_cov + 2) * torch.exp(\n                -(interval_end_cov - interval_start_cov))) / ((1 + tau ** 2) ** 3)\n        elif order == 3:\n            return torch.exp(interval_end_cov) * (\n                        (interval_end_cov ** 3 - 3 * interval_end_cov ** 2 + 6 * interval_end_cov - 6) - (\n                        
    interval_start_cov ** 3 - 3 * interval_start_cov ** 2 + 6 * interval_start_cov - 6) * torch.exp(\n                    -(interval_end_cov - interval_start_cov))) / ((1 + tau ** 2) ** 4)\n\n    def lagrange_polynomial_coefficient(self, order, lambda_list):\n        \"\"\"\n        Calculate the coefficient of lagrange polynomial\n        \"\"\"\n\n        assert order in [0, 1, 2, 3]\n        assert order == len(lambda_list) - 1\n        if order == 0:\n            return [[1]]\n        elif order == 1:\n            return [[1 / (lambda_list[0] - lambda_list[1]), -lambda_list[1] / (lambda_list[0] - lambda_list[1])],\n                    [1 / (lambda_list[1] - lambda_list[0]), -lambda_list[0] / (lambda_list[1] - lambda_list[0])]]\n        elif order == 2:\n            denominator1 = (lambda_list[0] - lambda_list[1]) * (lambda_list[0] - lambda_list[2])\n            denominator2 = (lambda_list[1] - lambda_list[0]) * (lambda_list[1] - lambda_list[2])\n            denominator3 = (lambda_list[2] - lambda_list[0]) * (lambda_list[2] - lambda_list[1])\n            return [[1 / denominator1,\n                     (-lambda_list[1] - lambda_list[2]) / denominator1,\n                     lambda_list[1] * lambda_list[2] / denominator1],\n\n                    [1 / denominator2,\n                     (-lambda_list[0] - lambda_list[2]) / denominator2,\n                     lambda_list[0] * lambda_list[2] / denominator2],\n\n                    [1 / denominator3,\n                     (-lambda_list[0] - lambda_list[1]) / denominator3,\n                     lambda_list[0] * lambda_list[1] / denominator3]\n                    ]\n        elif order == 3:\n            denominator1 = (lambda_list[0] - lambda_list[1]) * (lambda_list[0] - lambda_list[2]) * (\n                        lambda_list[0] - lambda_list[3])\n            denominator2 = (lambda_list[1] - lambda_list[0]) * (lambda_list[1] - lambda_list[2]) * (\n                        lambda_list[1] - lambda_list[3])\n            
denominator3 = (lambda_list[2] - lambda_list[0]) * (lambda_list[2] - lambda_list[1]) * (\n                        lambda_list[2] - lambda_list[3])\n            denominator4 = (lambda_list[3] - lambda_list[0]) * (lambda_list[3] - lambda_list[1]) * (\n                        lambda_list[3] - lambda_list[2])\n            return [[1 / denominator1,\n                     (-lambda_list[1] - lambda_list[2] - lambda_list[3]) / denominator1,\n                     (lambda_list[1] * lambda_list[2] + lambda_list[1] * lambda_list[3] + lambda_list[2] * lambda_list[\n                         3]) / denominator1,\n                     (-lambda_list[1] * lambda_list[2] * lambda_list[3]) / denominator1],\n\n                    [1 / denominator2,\n                     (-lambda_list[0] - lambda_list[2] - lambda_list[3]) / denominator2,\n                     (lambda_list[0] * lambda_list[2] + lambda_list[0] * lambda_list[3] + lambda_list[2] * lambda_list[\n                         3]) / denominator2,\n                     (-lambda_list[0] * lambda_list[2] * lambda_list[3]) / denominator2],\n\n                    [1 / denominator3,\n                     (-lambda_list[0] - lambda_list[1] - lambda_list[3]) / denominator3,\n                     (lambda_list[0] * lambda_list[1] + lambda_list[0] * lambda_list[3] + lambda_list[1] * lambda_list[\n                         3]) / denominator3,\n                     (-lambda_list[0] * lambda_list[1] * lambda_list[3]) / denominator3],\n\n                    [1 / denominator4,\n                     (-lambda_list[0] - lambda_list[1] - lambda_list[2]) / denominator4,\n                     (lambda_list[0] * lambda_list[1] + lambda_list[0] * lambda_list[2] + lambda_list[1] * lambda_list[\n                         2]) / denominator4,\n                     (-lambda_list[0] * lambda_list[1] * lambda_list[2]) / denominator4]\n\n                    ]\n\n    def get_coefficients_fn(self, order, interval_start, interval_end, lambda_list, tau):\n        assert 
order in [1, 2, 3, 4]\n        assert order == len(lambda_list), 'the length of lambda list must be equal to the order'\n        coefficients = []\n        lagrange_coefficient = self.lagrange_polynomial_coefficient(order - 1, lambda_list)\n        for i in range(order):\n            coefficient = sum(\n                lagrange_coefficient[i][j]\n                * self.get_coefficients_exponential_positive(\n                    order - 1 - j, interval_start, interval_end, tau\n                )\n                if self.predict_x0\n                else lagrange_coefficient[i][j]\n                * self.get_coefficients_exponential_negative(\n                    order - 1 - j, interval_start, interval_end\n                )\n                for j in range(order)\n            )\n            coefficients.append(coefficient)\n        assert len(coefficients) == order, 'the length of coefficients does not match the order'\n        return coefficients\n\n    def stochastic_adams_bashforth_update(\n            self,\n            model_output: torch.FloatTensor,\n            prev_timestep: int,\n            sample: torch.FloatTensor,\n            noise: torch.FloatTensor,\n            order: int,\n            tau: torch.FloatTensor,\n    ) -> torch.FloatTensor:\n        \"\"\"\n        One step for the SA-Predictor.\n\n        Args:\n            model_output (`torch.FloatTensor`):\n                The direct output from the learned diffusion model at the current timestep.\n            prev_timestep (`int`):\n                The previous discrete timestep in the diffusion chain.\n            sample (`torch.FloatTensor`):\n                A current instance of a sample created by the diffusion process.\n            noise (`torch.FloatTensor`):\n                Gaussian noise with the same shape as `sample`, used for the stochastic term of the update.\n            order (`int`):\n                The order of SA-Predictor at this timestep.\n            tau (`torch.FloatTensor`):\n                The value of `tau_func` at the current timestep, which scales the injected noise (0 gives a\n                deterministic step).\n\n        Returns:\n            `torch.FloatTensor`:\n                The sample tensor at the previous timestep.\n        \"\"\"\n\n        assert noise is not None\n        timestep_list 
= self.timestep_list\n        model_output_list = self.model_outputs\n        s0, t = self.timestep_list[-1], prev_timestep\n        lambda_t, lambda_s0 = self.lambda_t[t], self.lambda_t[s0]\n        alpha_t, alpha_s0 = self.alpha_t[t], self.alpha_t[s0]\n        sigma_t, sigma_s0 = self.sigma_t[t], self.sigma_t[s0]\n        gradient_part = torch.zeros_like(sample)\n        h = lambda_t - lambda_s0\n        lambda_list = [self.lambda_t[timestep_list[-(i + 1)]] for i in range(order)]\n        gradient_coefficients = self.get_coefficients_fn(order, lambda_s0, lambda_t, lambda_list, tau)\n\n        x = sample\n\n        if self.predict_x0 and order == 2:\n            gradient_coefficients[0] += 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (\n                        h ** 2 / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (\n                            (1 + tau ** 2) ** 2)) / (self.lambda_t[timestep_list[-1]] - self.lambda_t[\n                timestep_list[-2]])\n            gradient_coefficients[1] -= 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (\n                        h ** 2 / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (\n                            (1 + tau ** 2) ** 2)) / (self.lambda_t[timestep_list[-1]] - self.lambda_t[\n                timestep_list[-2]])\n\n        for i in range(order):\n            if self.predict_x0:\n\n                gradient_part += (1 + tau ** 2) * sigma_t * torch.exp(- tau ** 2 * lambda_t) * gradient_coefficients[\n                    i] * model_output_list[-(i + 1)]\n            else:\n                gradient_part += -(1 + tau ** 2) * alpha_t * gradient_coefficients[i] * model_output_list[-(i + 1)]\n\n        if self.predict_x0:\n            noise_part = sigma_t * torch.sqrt(1 - torch.exp(-2 * tau ** 2 * h)) * noise\n        else:\n            noise_part = tau * sigma_t * torch.sqrt(torch.exp(2 * h) - 1) * noise\n\n        if self.predict_x0:\n            x_t = torch.exp(-tau ** 2 * h) * 
(sigma_t / sigma_s0) * x + gradient_part + noise_part\n        else:\n            x_t = (alpha_t / alpha_s0) * x + gradient_part + noise_part\n\n        x_t = x_t.to(x.dtype)\n        return x_t\n\n    def stochastic_adams_moulton_update(\n            self,\n            this_model_output: torch.FloatTensor,\n            this_timestep: int,\n            last_sample: torch.FloatTensor,\n            last_noise: torch.FloatTensor,\n            this_sample: torch.FloatTensor,\n            order: int,\n            tau: torch.FloatTensor,\n    ) -> torch.FloatTensor:\n        \"\"\"\n        One step for the SA-Corrector.\n\n        Args:\n            this_model_output (`torch.FloatTensor`):\n                The model outputs at `x_t`.\n            this_timestep (`int`):\n                The current timestep `t`.\n            last_sample (`torch.FloatTensor`):\n                The generated sample before the last predictor `x_{t-1}`.\n            last_noise (`torch.FloatTensor`):\n                The noise drawn for the last predictor step, reused here by the corrector.\n            this_sample (`torch.FloatTensor`):\n                The generated sample after the last predictor `x_{t}`.\n            order (`int`):\n                The order of SA-Corrector at this step.\n            tau (`torch.FloatTensor`):\n                The value of `tau_func` for this step, which scales the injected noise (0 gives a deterministic\n                step).\n\n        Returns:\n            `torch.FloatTensor`:\n                The corrected sample tensor at the current timestep.\n        \"\"\"\n\n        assert last_noise is not None\n        timestep_list = self.timestep_list\n        model_output_list = self.model_outputs\n        s0, t = self.timestep_list[-1], this_timestep\n        lambda_t, lambda_s0 = self.lambda_t[t], self.lambda_t[s0]\n        alpha_t, alpha_s0 = self.alpha_t[t], self.alpha_t[s0]\n        sigma_t, sigma_s0 = self.sigma_t[t], self.sigma_t[s0]\n        gradient_part = torch.zeros_like(this_sample)\n        h = lambda_t - lambda_s0\n        t_list = timestep_list + [this_timestep]\n        lambda_list = [self.lambda_t[t_list[-(i + 1)]] for i in range(order)]\n        model_prev_list = model_output_list + [this_model_output]\n\n        gradient_coefficients = 
self.get_coefficients_fn(order, lambda_s0, lambda_t, lambda_list, tau)\n\n        x = last_sample\n\n        if self.predict_x0 and order == 2:\n            gradient_coefficients[0] += 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (\n                        h / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (\n                            (1 + tau ** 2) ** 2 * h))\n            gradient_coefficients[1] -= 1.0 * torch.exp((1 + tau ** 2) * lambda_t) * (\n                        h / 2 - (h * (1 + tau ** 2) - 1 + torch.exp((1 + tau ** 2) * (-h))) / (\n                            (1 + tau ** 2) ** 2 * h))\n\n        for i in range(order):\n            if self.predict_x0:\n                gradient_part += (1 + tau ** 2) * sigma_t * torch.exp(- tau ** 2 * lambda_t) * gradient_coefficients[\n                    i] * model_prev_list[-(i + 1)]\n            else:\n                gradient_part += -(1 + tau ** 2) * alpha_t * gradient_coefficients[i] * model_prev_list[-(i + 1)]\n\n        if self.predict_x0:\n            noise_part = sigma_t * torch.sqrt(1 - torch.exp(-2 * tau ** 2 * h)) * last_noise\n        else:\n            noise_part = tau * sigma_t * torch.sqrt(torch.exp(2 * h) - 1) * last_noise\n\n        if self.predict_x0:\n            x_t = torch.exp(-tau ** 2 * h) * (sigma_t / sigma_s0) * x + gradient_part + noise_part\n        else:\n            x_t = (alpha_t / alpha_s0) * x + gradient_part + noise_part\n\n        x_t = x_t.to(x.dtype)\n        return x_t\n\n    def step(\n            self,\n            model_output: torch.FloatTensor,\n            timestep: int,\n            sample: torch.FloatTensor,\n            generator=None,\n            return_dict: bool = True,\n    ) -> Union[SchedulerOutput, Tuple]:\n        \"\"\"\n        Predict the sample from the previous timestep by reversing the SDE. 
This function propagates the sample with\n        the SA-Solver.\n\n        Args:\n            model_output (`torch.FloatTensor`):\n                The direct output from learned diffusion model.\n            timestep (`int`):\n                The current discrete timestep in the diffusion chain.\n            sample (`torch.FloatTensor`):\n                A current instance of a sample created by the diffusion process.\n            generator (`torch.Generator`, *optional*):\n                A random number generator.\n            return_dict (`bool`):\n                Whether or not to return a [`~schedulers.scheduling_utils.SchedulerOutput`] or `tuple`.\n\n        Returns:\n            [`~schedulers.scheduling_utils.SchedulerOutput`] or `tuple`:\n                If return_dict is `True`, [`~schedulers.scheduling_utils.SchedulerOutput`] is returned, otherwise a\n                tuple is returned where the first element is the sample tensor.\n\n        \"\"\"\n        if self.num_inference_steps is None:\n            raise ValueError(\n                \"Number of inference steps is 'None', you need to run 'set_timesteps' after creating the scheduler\"\n            )\n\n        if isinstance(timestep, torch.Tensor):\n            timestep = timestep.to(self.timesteps.device)\n        step_index = (self.timesteps == timestep).nonzero()\n        if len(step_index) == 0:\n            step_index = len(self.timesteps) - 1\n        else:\n            step_index = step_index.item()\n\n        use_corrector = (\n                step_index > 0 and self.last_sample is not None\n        )\n\n        model_output_convert = self.convert_model_output(model_output, timestep, sample)\n\n        if use_corrector:\n            current_tau = self.tau_func(self.timestep_list[-1])\n            sample = self.stochastic_adams_moulton_update(\n                this_model_output=model_output_convert,\n                this_timestep=timestep,\n                last_sample=self.last_sample,\n      
          last_noise=self.last_noise,\n                this_sample=sample,\n                order=self.this_corrector_order,\n                tau=current_tau,\n            )\n\n        prev_timestep = 0 if step_index == len(self.timesteps) - 1 else self.timesteps[step_index + 1]\n\n        for i in range(max(self.config.predictor_order, self.config.corrector_order - 1) - 1):\n            self.model_outputs[i] = self.model_outputs[i + 1]\n            self.timestep_list[i] = self.timestep_list[i + 1]\n\n        self.model_outputs[-1] = model_output_convert\n        self.timestep_list[-1] = timestep\n\n        noise = randn_tensor(\n            model_output.shape, generator=generator, device=model_output.device, dtype=model_output.dtype\n        )\n\n        if self.config.lower_order_final:\n            this_predictor_order = min(self.config.predictor_order, len(self.timesteps) - step_index)\n            this_corrector_order = min(self.config.corrector_order, len(self.timesteps) - step_index + 1)\n        else:\n            this_predictor_order = self.config.predictor_order\n            this_corrector_order = self.config.corrector_order\n\n        self.this_predictor_order = min(this_predictor_order, self.lower_order_nums + 1)  # warmup for multistep\n        self.this_corrector_order = min(this_corrector_order, self.lower_order_nums + 2)  # warmup for multistep\n        assert self.this_predictor_order > 0\n        assert self.this_corrector_order > 0\n\n        self.last_sample = sample\n        self.last_noise = noise\n\n        current_tau = self.tau_func(self.timestep_list[-1])\n        prev_sample = self.stochastic_adams_bashforth_update(\n            model_output=model_output_convert,\n            prev_timestep=prev_timestep,\n            sample=sample,\n            noise=noise,\n            order=self.this_predictor_order,\n            tau=current_tau,\n        )\n\n        if self.lower_order_nums < max(self.config.predictor_order, 
self.config.corrector_order - 1):\n            self.lower_order_nums += 1\n\n        if not return_dict:\n            return (prev_sample,)\n\n        return SchedulerOutput(prev_sample=prev_sample)\n\n    def scale_model_input(self, sample: torch.FloatTensor, *args, **kwargs) -> torch.FloatTensor:\n        \"\"\"\n        Ensures interchangeability with schedulers that need to scale the denoising model input depending on the\n        current timestep.\n\n        Args:\n            sample (`torch.FloatTensor`):\n                The input sample.\n\n        Returns:\n            `torch.FloatTensor`:\n                A scaled input sample.\n        \"\"\"\n        return sample\n\n    # Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler.add_noise\n    def add_noise(\n            self,\n            original_samples: torch.FloatTensor,\n            noise: torch.FloatTensor,\n            timesteps: torch.IntTensor,\n    ) -> torch.FloatTensor:\n        # Make sure alphas_cumprod and timestep have same device and dtype as original_samples\n        alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device, dtype=original_samples.dtype)\n        timesteps = timesteps.to(original_samples.device)\n\n        sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5\n        sqrt_alpha_prod = sqrt_alpha_prod.flatten()\n        while len(sqrt_alpha_prod.shape) < len(original_samples.shape):\n            sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)\n\n        sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5\n        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()\n        while len(sqrt_one_minus_alpha_prod.shape) < len(original_samples.shape):\n            sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)\n\n        return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise\n\n    def __len__(self):\n        return self.config.num_train_timesteps"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/utils/__init__.py",
    "content": ""
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/utils/checkpoint.py",
    "content": "import os\nimport re\nimport torch\n\nfrom diffusion.utils.logger import get_root_logger\n\n\ndef save_checkpoint(work_dir,\n                    epoch,\n                    model,\n                    model_ema=None,\n                    optimizer=None,\n                    lr_scheduler=None,\n                    keep_last=False,\n                    step=None,\n                    ):\n    os.makedirs(work_dir, exist_ok=True)\n    state_dict = dict(state_dict=model.state_dict())\n    if model_ema is not None:\n        state_dict['state_dict_ema'] = model_ema.state_dict()\n    if optimizer is not None:\n        state_dict['optimizer'] = optimizer.state_dict()\n    if lr_scheduler is not None:\n        state_dict['scheduler'] = lr_scheduler.state_dict()\n    if epoch is not None:\n        state_dict['epoch'] = epoch\n        file_path = os.path.join(work_dir, f\"epoch_{epoch}.pth\")\n        if step is not None:\n            file_path = file_path.split('.pth')[0] + f\"_step_{step}.pth\"\n    logger = get_root_logger()\n    torch.save(state_dict, file_path)\n    logger.info(f'Saved checkpoint of epoch {epoch} to {file_path.format(epoch)}.')\n    if keep_last:\n        for i in range(epoch):\n            previous_ckgt = file_path.format(i)\n            if os.path.exists(previous_ckgt):\n                os.remove(previous_ckgt)\n\n\ndef load_checkpoint(checkpoint,\n                    model,\n                    model_ema=None,\n                    optimizer=None,\n                    lr_scheduler=None,\n                    load_ema=False,\n                    resume_optimizer=True,\n                    resume_lr_scheduler=True\n                    ):\n    assert isinstance(checkpoint, str)\n    ckpt_file = checkpoint\n    checkpoint = torch.load(ckpt_file, map_location=\"cpu\")\n\n    state_dict_keys = ['pos_embed', 'base_model.pos_embed', 'model.pos_embed']\n    for key in state_dict_keys:\n        if key in checkpoint['state_dict']:\n            del 
checkpoint['state_dict'][key]\n            if 'state_dict_ema' in checkpoint and key in checkpoint['state_dict_ema']:\n                del checkpoint['state_dict_ema'][key]\n            break\n\n    if load_ema:\n        state_dict = checkpoint['state_dict_ema']\n    else:\n        state_dict = checkpoint.get('state_dict', checkpoint)  # to be compatible with the official checkpoint\n    # model.load_state_dict(state_dict)\n    missing, unexpect = model.load_state_dict(state_dict, strict=False)\n    if model_ema is not None:\n        model_ema.load_state_dict(checkpoint['state_dict_ema'], strict=False)\n    if optimizer is not None and resume_optimizer:\n        optimizer.load_state_dict(checkpoint['optimizer'])\n    if lr_scheduler is not None and resume_lr_scheduler:\n        lr_scheduler.load_state_dict(checkpoint['scheduler'])\n    logger = get_root_logger()\n    if optimizer is not None:\n        epoch = checkpoint.get('epoch', re.match(r'.*epoch_(\\d*).*.pth', ckpt_file).group(1))\n        logger.info(f'Resume checkpoint of epoch {epoch} from {ckpt_file}. Load ema: {load_ema}, '\n                    f'resume optimizer: {resume_optimizer}, resume lr scheduler: {resume_lr_scheduler}.')\n        return epoch, missing, unexpect\n    logger.info(f'Load checkpoint from {ckpt_file}. Load ema: {load_ema}.')\n    return missing, unexpect\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/utils/data_sampler.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\nimport os\nfrom typing import Sequence\nfrom torch.utils.data import BatchSampler, Sampler, Dataset\nfrom random import shuffle, choice\nfrom copy import deepcopy\nfrom diffusion.utils.logger import get_root_logger\n\n\nclass AspectRatioBatchSampler(BatchSampler):\n    \"\"\"A sampler wrapper for grouping images with similar aspect ratio into a same batch.\n\n    Args:\n        sampler (Sampler): Base sampler.\n        dataset (Dataset): Dataset providing data information.\n        batch_size (int): Size of mini-batch.\n        drop_last (bool): If ``True``, the sampler will drop the last batch if\n            its size would be less than ``batch_size``.\n        aspect_ratios (dict): The predefined aspect ratios.\n    \"\"\"\n\n    def __init__(self,\n                 sampler: Sampler,\n                 dataset: Dataset,\n                 batch_size: int,\n                 aspect_ratios: dict,\n                 drop_last: bool = False,\n                 config=None,\n                 valid_num=0,   # take as valid aspect-ratio when sample number >= valid_num\n                 **kwargs) -> None:\n        if not isinstance(sampler, Sampler):\n            raise TypeError('sampler should be an instance of ``Sampler``, '\n                            f'but got {sampler}')\n        if not isinstance(batch_size, int) or batch_size <= 0:\n            raise ValueError('batch_size should be a positive integer value, '\n                             f'but got batch_size={batch_size}')\n        self.sampler = sampler\n        self.dataset = dataset\n        self.batch_size = batch_size\n        self.aspect_ratios = aspect_ratios\n        self.drop_last = drop_last\n        self.ratio_nums_gt = kwargs.get('ratio_nums', None)\n        self.config = config\n        assert self.ratio_nums_gt\n        # buckets for each aspect ratio\n        self._aspect_ratio_buckets = {ratio: [] for ratio in aspect_ratios}\n        
self.current_available_bucket_keys =  [str(k) for k, v in self.ratio_nums_gt.items() if v >= valid_num]\n        logger = get_root_logger() if config is None else get_root_logger(os.path.join(config.work_dir, 'train_log.log'))\n        logger.warning(f\"Using valid_num={valid_num} in config file. Available {len(self.current_available_bucket_keys)} aspect_ratios: {self.current_available_bucket_keys}\")\n\n    def __iter__(self) -> Sequence[int]:\n        for idx in self.sampler:\n            data_info = self.dataset.get_data_info(idx)\n            height, width =  data_info['height'], data_info['width']\n            ratio = height / width\n            # find the closest aspect ratio\n            closest_ratio = min(self.aspect_ratios.keys(), key=lambda r: abs(float(r) - ratio))\n            if closest_ratio not in self.current_available_bucket_keys:\n                continue\n            bucket = self._aspect_ratio_buckets[closest_ratio]\n            bucket.append(idx)\n            # yield a batch of indices in the same aspect ratio group\n            if len(bucket) == self.batch_size:\n                yield bucket[:]\n                del bucket[:]\n\n        # yield the rest data and reset the buckets\n        for bucket in self._aspect_ratio_buckets.values():\n            while len(bucket) > 0:\n                if len(bucket) <= self.batch_size:\n                    if not self.drop_last:\n                        yield bucket[:]\n                    bucket = []\n                else:\n                    yield bucket[:self.batch_size]\n                    bucket = bucket[self.batch_size:]\n\n\nclass BalancedAspectRatioBatchSampler(AspectRatioBatchSampler):\n    def __init__(self, *args, **kwargs):\n        super().__init__(*args, **kwargs)\n        # Assign samples to each bucket\n        self.ratio_nums_gt = kwargs.get('ratio_nums', None)\n        assert self.ratio_nums_gt\n        self._aspect_ratio_buckets = {float(ratio): [] for ratio in 
self.aspect_ratios.keys()}\n        self.original_buckets = {}\n        self.current_available_bucket_keys =  [k for k, v in self.ratio_nums_gt.items() if v >= 3000]\n        self.all_available_keys = deepcopy(self.current_available_bucket_keys)\n        self.exhausted_bucket_keys = []\n        self.total_batches = len(self.sampler) // self.batch_size\n        self._aspect_ratio_count = {}\n        for k in self.all_available_keys:\n            self._aspect_ratio_count[float(k)] = 0\n            self.original_buckets[float(k)] = []\n        logger = get_root_logger(os.path.join(self.config.work_dir, 'train_log.log'))\n        logger.warning(f\"Available {len(self.current_available_bucket_keys)} aspect_ratios: {self.current_available_bucket_keys}\")\n\n    def __iter__(self) -> Sequence[int]:\n        i = 0\n        for idx in self.sampler:\n            data_info = self.dataset.get_data_info(idx)\n            height, width = data_info['height'], data_info['width']\n            ratio = height / width\n            closest_ratio = float(min(self.aspect_ratios.keys(), key=lambda r: abs(float(r) - ratio)))\n            if closest_ratio not in self.all_available_keys:\n                continue\n            if self._aspect_ratio_count[closest_ratio] < self.ratio_nums_gt[closest_ratio]:\n                self._aspect_ratio_count[closest_ratio] += 1\n                self._aspect_ratio_buckets[closest_ratio].append(idx)\n                self.original_buckets[closest_ratio].append(idx)    # Save the original samples for each bucket\n            if not self.current_available_bucket_keys:\n                self.current_available_bucket_keys, self.exhausted_bucket_keys = self.exhausted_bucket_keys, []\n\n            if closest_ratio not in self.current_available_bucket_keys:\n                continue\n            key = closest_ratio\n            bucket = self._aspect_ratio_buckets[key]\n            if len(bucket) == self.batch_size:\n                yield bucket[:self.batch_size]\n 
               del bucket[:self.batch_size]\n                i += 1\n                self.exhausted_bucket_keys.append(key)\n                self.current_available_bucket_keys.remove(key)\n\n        for _ in range(self.total_batches - i):\n            key = choice(self.all_available_keys)\n            bucket = self._aspect_ratio_buckets[key]\n            if len(bucket) >= self.batch_size:\n                yield bucket[:self.batch_size]\n                del bucket[:self.batch_size]\n\n                # If a bucket is exhausted\n                if not bucket:\n                    self._aspect_ratio_buckets[key] = deepcopy(self.original_buckets[key][:])\n                    shuffle(self._aspect_ratio_buckets[key])\n            else:\n                self._aspect_ratio_buckets[key] = deepcopy(self.original_buckets[key][:])\n                shuffle(self._aspect_ratio_buckets[key])\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/utils/dist_utils.py",
    "content": "\"\"\"\nThis file contains primitives for multi-gpu communication.\nThis is useful when doing distributed training.\n\"\"\"\nimport os\nimport pickle\nimport shutil\n\nimport gc\nimport mmcv\nimport torch\nimport torch.distributed as dist\nfrom mmcv.runner import get_dist_info\n\n\ndef is_distributed():\n    return get_world_size() > 1\n\n\ndef get_world_size():\n    if not dist.is_available():\n        return 1\n    return dist.get_world_size() if dist.is_initialized() else 1\n\n\ndef get_rank():\n    if not dist.is_available():\n        return 0\n    return dist.get_rank() if dist.is_initialized() else 0\n\n\ndef get_local_rank():\n    if not dist.is_available():\n        return 0\n    return int(os.getenv('LOCAL_RANK', 0)) if dist.is_initialized() else 0\n\n\ndef is_master():\n    return get_rank() == 0\n\n\ndef is_local_master():\n    return get_local_rank() == 0\n\n\ndef get_local_proc_group(group_size=8):\n    world_size = get_world_size()\n    if world_size <= group_size or group_size == 1:\n        return None\n    assert world_size % group_size == 0, f'world size ({world_size}) should be evenly divided by group size ({group_size}).'\n    process_groups = getattr(get_local_proc_group, 'process_groups', {})\n    if group_size not in process_groups:\n        num_groups = dist.get_world_size() // group_size\n        groups = [list(range(i * group_size, (i + 1) * group_size)) for i in range(num_groups)]\n        process_groups.update({group_size: [torch.distributed.new_group(group) for group in groups]})\n        get_local_proc_group.process_groups = process_groups\n\n    group_idx = get_rank() // group_size\n    return get_local_proc_group.process_groups.get(group_size)[group_idx]\n\n\ndef synchronize():\n    \"\"\"\n    Helper function to synchronize (barrier) among all processes when\n    using distributed training\n    \"\"\"\n    if not dist.is_available():\n        return\n    if not dist.is_initialized():\n        return\n    world_size = 
dist.get_world_size()\n    if world_size == 1:\n        return\n    dist.barrier()\n\n\ndef all_gather(data):\n    \"\"\"\n    Run all_gather on arbitrary picklable data (not necessarily tensors)\n    Args:\n        data: any picklable object\n    Returns:\n        list[data]: list of data gathered from each rank\n    \"\"\"\n    to_device = torch.device(\"cuda\")\n    # to_device = torch.device(\"cpu\")\n\n    world_size = get_world_size()\n    if world_size == 1:\n        return [data]\n\n    # serialized to a Tensor\n    buffer = pickle.dumps(data)\n    storage = torch.ByteStorage.from_buffer(buffer)\n    tensor = torch.ByteTensor(storage).to(to_device)\n\n    # obtain Tensor size of each rank\n    local_size = torch.LongTensor([tensor.numel()]).to(to_device)\n    size_list = [torch.LongTensor([0]).to(to_device) for _ in range(world_size)]\n    dist.all_gather(size_list, local_size)\n    size_list = [int(size.item()) for size in size_list]\n    max_size = max(size_list)\n\n    tensor_list = [\n        torch.ByteTensor(size=(max_size,)).to(to_device) for _ in size_list\n    ]\n    if local_size != max_size:\n        padding = torch.ByteTensor(size=(max_size - local_size,)).to(to_device)\n        tensor = torch.cat((tensor, padding), dim=0)\n    dist.all_gather(tensor_list, tensor)\n\n    data_list = []\n    for size, tensor in zip(size_list, tensor_list):\n        buffer = tensor.cpu().numpy().tobytes()[:size]\n        data_list.append(pickle.loads(buffer))\n\n    return data_list\n\n\ndef reduce_dict(input_dict, average=True):\n    \"\"\"\n    Args:\n        input_dict (dict): all the values will be reduced\n        average (bool): whether to do average or sum\n    Reduce the values in the dictionary from all processes so that process with rank\n    0 has the averaged results. 
Returns a dict with the same fields as\n    input_dict, after reduction.\n    \"\"\"\n    world_size = get_world_size()\n    if world_size < 2:\n        return input_dict\n    with torch.no_grad():\n        reduced_dict = _extracted_from_reduce_dict_14(input_dict, average, world_size)\n    return reduced_dict\n\n\n# TODO Rename this here and in `reduce_dict`\ndef _extracted_from_reduce_dict_14(input_dict, average, world_size):\n    names = []\n    values = []\n    # sort the keys so that they are consistent across processes\n    for k in sorted(input_dict.keys()):\n        names.append(k)\n        values.append(input_dict[k])\n    values = torch.stack(values, dim=0)\n    dist.reduce(values, dst=0)\n    if dist.get_rank() == 0 and average:\n        # only main process gets accumulated, so only divide by\n        # world_size in this case\n        values /= world_size\n    return dict(zip(names, values))\n\n\ndef broadcast(data, **kwargs):\n    if get_world_size() == 1:\n        return data\n    data = [data]\n    dist.broadcast_object_list(data, **kwargs)\n    return data[0]\n\n\ndef all_gather_cpu(result_part, tmpdir=None, collect_by_master=True):\n    rank, world_size = get_dist_info()\n    if tmpdir is None:\n        tmpdir = './tmp'\n    if rank == 0:\n        mmcv.mkdir_or_exist(tmpdir)\n    synchronize()\n    # dump the part result to the dir\n    mmcv.dump(result_part, os.path.join(tmpdir, f'part_{rank}.pkl'))\n    synchronize()\n    if collect_by_master and rank != 0:\n        return None\n    # load results of all parts from tmp dir\n    results = []\n    for i in range(world_size):\n        part_file = os.path.join(tmpdir, f'part_{i}.pkl')\n        results.append(mmcv.load(part_file))\n    if not collect_by_master:\n        synchronize()\n    # remove tmp dir\n    if rank == 0:\n        shutil.rmtree(tmpdir)\n    return results\n\ndef all_gather_tensor(tensor, group_size=None, group=None):\n    if group_size is None:\n        group_size = 
get_world_size()\n    if group_size == 1:\n        output = [tensor]\n    else:\n        output = [torch.zeros_like(tensor) for _ in range(group_size)]\n        dist.all_gather(output, tensor, group=group)\n    return output\n\n\ndef gather_difflen_tensor(feat, num_samples_list, concat=True, group=None, group_size=None):\n    world_size = get_world_size()\n    if world_size == 1:\n        return feat if concat else [feat]\n    num_samples, *feat_dim = feat.size()\n    # padding to max number of samples\n    feat_padding = feat.new_zeros((max(num_samples_list), *feat_dim))\n    feat_padding[:num_samples] = feat\n    # gather\n    feat_gather = all_gather_tensor(feat_padding, group=group, group_size=group_size)\n    for r, num in enumerate(num_samples_list):\n        feat_gather[r] = feat_gather[r][:num]\n    if concat:\n        feat_gather = torch.cat(feat_gather)\n    return feat_gather\n\n\nclass GatherLayer(torch.autograd.Function):\n    '''Gather tensors from all process, supporting backward propagation.\n    '''\n\n    @staticmethod\n    def forward(ctx, input):\n        ctx.save_for_backward(input)\n        num_samples = torch.tensor(input.size(0), dtype=torch.long, device=input.device)\n        ctx.num_samples_list = all_gather_tensor(num_samples)\n        output = gather_difflen_tensor(input, ctx.num_samples_list, concat=False)\n        return tuple(output)\n\n    @staticmethod\n    def backward(ctx, *grads):  # tuple(output)'s grad\n        input, = ctx.saved_tensors\n        num_samples_list = ctx.num_samples_list\n        rank = get_rank()\n        start, end = sum(num_samples_list[:rank]), sum(num_samples_list[:rank + 1])\n        grads = torch.cat(grads)\n        if is_distributed():\n            dist.all_reduce(grads)\n        grad_out = torch.zeros_like(input)\n        grad_out[:] = grads[start:end]\n        return grad_out, None, None\n\n\nclass GatherLayerWithGroup(torch.autograd.Function):\n    '''Gather tensors from all process, supporting 
backward propagation.\n    '''\n\n    @staticmethod\n    def forward(ctx, input, group, group_size):\n        ctx.save_for_backward(input)\n        ctx.group_size = group_size\n        output = all_gather_tensor(input, group=group, group_size=group_size)\n        return tuple(output)\n\n    @staticmethod\n    def backward(ctx, *grads):  # tuple(output)'s grad\n        input, = ctx.saved_tensors\n        grads = torch.stack(grads)\n        if is_distributed():\n            dist.all_reduce(grads)\n        grad_out = torch.zeros_like(input)\n        grad_out[:] = grads[get_rank() % ctx.group_size]\n        return grad_out, None, None\n\n\ndef gather_layer_with_group(data, group=None, group_size=None):\n    if group_size is None:\n        group_size = get_world_size()\n    return GatherLayerWithGroup.apply(data, group, group_size)\n\nfrom typing import Union\nimport math\n# from torch.distributed.fsdp.fully_sharded_data_parallel import TrainingState_, _calc_grad_norm\n\n@torch.no_grad()\ndef clip_grad_norm_(\n    self, max_norm: Union[float, int], norm_type: Union[float, int] = 2.0\n) -> torch.Tensor:\n    self._lazy_init()\n    self._wait_for_previous_optim_step()\n    assert self._is_root, \"clip_grad_norm should only be called on the root (parent) instance\"\n    self._assert_state(TrainingState_.IDLE)\n\n    max_norm = float(max_norm)\n    norm_type = float(norm_type)\n    # Computes the max norm for this shard's gradients and syncs across workers\n    local_norm = _calc_grad_norm(self.params_with_grad, norm_type).cuda()  # type: ignore[arg-type]\n    if norm_type == math.inf:\n        total_norm = local_norm\n        dist.all_reduce(total_norm, op=torch.distributed.ReduceOp.MAX, group=self.process_group)\n    else:\n        total_norm = local_norm ** norm_type\n        dist.all_reduce(total_norm, group=self.process_group)\n        total_norm = total_norm ** (1.0 / norm_type)\n\n    clip_coef = torch.tensor(max_norm, dtype=total_norm.dtype, device=total_norm.device) / (total_norm 
+ 1e-6)\n    if clip_coef < 1:\n        # multiply by clip_coef, aka, (max_norm/total_norm).\n        for p in self.params_with_grad:\n            assert p.grad is not None\n            p.grad.detach().mul_(clip_coef.to(p.grad.device))\n    return total_norm\n\n\ndef flush():\n    gc.collect()\n    torch.cuda.empty_cache()\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/utils/logger.py",
    "content": "import logging\nimport os\nimport torch.distributed as dist\nfrom datetime import datetime\nfrom .dist_utils import is_local_master\nfrom mmcv.utils.logging import logger_initialized\n\n\ndef get_root_logger(log_file=None, log_level=logging.INFO, name='PixArt'):\n    \"\"\"Get root logger.\n\n    Args:\n        log_file (str, optional): File path of log. Defaults to None.\n        log_level (int, optional): The level of logger.\n            Defaults to logging.INFO.\n        name (str): logger name\n    Returns:\n        :obj:`logging.Logger`: The obtained logger\n    \"\"\"\n    if log_file is None:\n        log_file = '/dev/null'\n    return get_logger(name=name, log_file=log_file, log_level=log_level)\n\n\ndef get_logger(name, log_file=None, log_level=logging.INFO):\n    \"\"\"Initialize and get a logger by name.\n\n    If the logger has not been initialized, this method will initialize the\n    logger by adding one or two handlers, otherwise the initialized logger will\n    be directly returned. During initialization, a StreamHandler will always be\n    added. If `log_file` is specified and the process rank is 0, a FileHandler\n    will also be added.\n\n    Args:\n        name (str): Logger name.\n        log_file (str | None): The log filename. If specified, a FileHandler\n            will be added to the logger.\n        log_level (int): The logger level. 
Note that only the process of\n            rank 0 is affected, and other processes will set the level to\n            \"Error\" thus be silent most of the time.\n\n    Returns:\n        logging.Logger: The expected logger.\n    \"\"\"\n    logger = logging.getLogger(name)\n    logger.propagate = False  # disable root logger to avoid duplicate logging\n\n    if name in logger_initialized:\n        return logger\n    # handle hierarchical names\n    # e.g., logger \"a\" is initialized, then logger \"a.b\" will skip the\n    # initialization since it is a child of \"a\".\n    for logger_name in logger_initialized:\n        if name.startswith(logger_name):\n            return logger\n\n    stream_handler = logging.StreamHandler()\n    handlers = [stream_handler]\n\n    rank = dist.get_rank() if dist.is_available() and dist.is_initialized() else 0\n    # only rank 0 will add a FileHandler\n    if rank == 0 and log_file is not None:\n        file_handler = logging.FileHandler(log_file, 'w')\n        handlers.append(file_handler)\n\n    formatter = logging.Formatter(\n        '%(asctime)s - %(name)s - %(levelname)s - %(message)s')\n    for handler in handlers:\n        handler.setFormatter(formatter)\n        handler.setLevel(log_level)\n        logger.addHandler(handler)\n\n    # only rank0 for each node will print logs\n    log_level = log_level if is_local_master() else logging.ERROR\n    logger.setLevel(log_level)\n\n    logger_initialized[name] = True\n\n    return logger\n\ndef rename_file_with_creation_time(file_path):\n    # Get the file's creation time\n    creation_time = os.path.getctime(file_path)\n    creation_time_str = datetime.fromtimestamp(creation_time).strftime('%Y-%m-%d_%H-%M-%S')\n\n    # Build the new file name\n    dir_name, file_name = os.path.split(file_path)\n    name, ext = os.path.splitext(file_name)\n    new_file_name = f\"{name}_{creation_time_str}{ext}\"\n    new_file_path = os.path.join(dir_name, new_file_name)\n\n    # Rename the file\n    os.rename(file_path, new_file_path)\n    
print(f\"File renamed to: {new_file_path}\")\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/utils/lr_scheduler.py",
    "content": "from diffusers import get_cosine_schedule_with_warmup, get_constant_schedule_with_warmup\nfrom torch.optim import Optimizer\nfrom torch.optim.lr_scheduler import LambdaLR\nimport math\n\nfrom diffusion.utils.logger import get_root_logger\n\n\ndef build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio):\n    if not config.get('lr_schedule_args', None):\n        config.lr_schedule_args = {}\n    if config.get('lr_warmup_steps', None):\n        config['num_warmup_steps'] = config.get('lr_warmup_steps')  # for compatibility with old version\n\n    logger = get_root_logger()\n    logger.info(\n        f'Lr schedule: {config.lr_schedule}, ' + \",\".join(\n            [f\"{key}:{value}\" for key, value in config.lr_schedule_args.items()]) + '.')\n    if config.lr_schedule == 'cosine':\n        lr_scheduler = get_cosine_schedule_with_warmup(\n            optimizer=optimizer,\n            **config.lr_schedule_args,\n            num_training_steps=(len(train_dataloader) * config.num_epochs),\n        )\n    elif config.lr_schedule == 'constant':\n        lr_scheduler = get_constant_schedule_with_warmup(\n            optimizer=optimizer,\n            **config.lr_schedule_args,\n        )\n    elif config.lr_schedule == 'cosine_decay_to_constant':\n        assert lr_scale_ratio >= 1\n        lr_scheduler = get_cosine_decay_to_constant_with_warmup(\n            optimizer=optimizer,\n            **config.lr_schedule_args,\n            final_lr=1 / lr_scale_ratio,\n            num_training_steps=(len(train_dataloader) * config.num_epochs),\n        )\n    else:\n        raise RuntimeError(f'Unrecognized lr schedule {config.lr_schedule}.')\n    return lr_scheduler\n\n\ndef get_cosine_decay_to_constant_with_warmup(optimizer: Optimizer,\n                                             num_warmup_steps: int,\n                                             num_training_steps: int,\n                                             final_lr: float = 0.0,\n         
                                    num_decay: float = 0.667,\n                                             num_cycles: float = 0.5,\n                                             last_epoch: int = -1\n                                             ):\n    \"\"\"\n    Create a schedule with a cosine annealing lr followed by a constant lr.\n\n    Args:\n        optimizer ([`~torch.optim.Optimizer`]):\n            The optimizer for which to schedule the learning rate.\n        num_warmup_steps (`int`):\n            The number of steps for the warmup phase.\n        num_training_steps (`int`):\n            The number of total training steps.\n        final_lr (`float`, *optional*, defaults to 0.0):\n            The final constant lr factor (relative to the base lr) after cosine decay.\n        num_decay (`float`, *optional*, defaults to 0.667):\n            The fraction of `num_training_steps` over which the cosine decay runs; after that point the lr\n            factor stays at `final_lr`.\n        num_cycles (`float`, *optional*, defaults to 0.5):\n            The number of cosine waves in the decay phase; the default 0.5 is a single half-cosine from the\n            maximum down to `final_lr`.\n        last_epoch (`int`, *optional*, defaults to -1):\n            The index of the last epoch when resuming training.\n\n    Return:\n        `torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.\n    \"\"\"\n\n    def lr_lambda(current_step):\n        if current_step < num_warmup_steps:\n            return float(current_step) / float(max(1, num_warmup_steps))\n\n        num_decay_steps = int(num_training_steps * num_decay)\n        if current_step > num_decay_steps:\n            return final_lr\n\n        progress = float(current_step - num_warmup_steps) / float(max(1, num_decay_steps - num_warmup_steps))\n        return (\n            max(\n                0.0,\n                0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)),\n            )\n            * (1 - final_lr)\n        ) + final_lr\n\n    return LambdaLR(optimizer, lr_lambda, last_epoch)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/utils/misc.py",
    "content": "import collections\nimport datetime\nimport os\nimport random\nimport subprocess\nimport time\nfrom multiprocessing import JoinableQueue, Process\n\nimport numpy as np\nimport torch\nimport torch.distributed as dist\nfrom mmcv import Config\nfrom mmcv.runner import get_dist_info\n\nfrom diffusion.utils.logger import get_root_logger\n\nos.environ[\"MOX_SILENT_MODE\"] = \"1\"  # mute moxing log\n\n\ndef read_config(file):\n    # solve config loading conflict when multi-processes\n    import time\n    while True:\n        config = Config.fromfile(file)\n        if len(config) == 0:\n            time.sleep(0.1)\n            continue\n        break\n    return config\n\n\ndef init_random_seed(seed=None, device='cuda'):\n    \"\"\"Initialize random seed.\n\n    If the seed is not set, the seed will be automatically randomized,\n    and then broadcast to all processes to prevent some potential bugs.\n\n    Args:\n        seed (int, Optional): The seed. Default to None.\n        device (str): The device where the seed will be put on.\n            Default to 'cuda'.\n\n    Returns:\n        int: Seed to be used.\n    \"\"\"\n    if seed is not None:\n        return seed\n\n    # Make sure all ranks share the same random seed to prevent\n    # some potential bugs. 
Please refer to\n    # https://github.com/open-mmlab/mmdetection/issues/6339\n    rank, world_size = get_dist_info()\n    seed = np.random.randint(2 ** 31)\n    if world_size == 1:\n        return seed\n\n    if rank == 0:\n        random_num = torch.tensor(seed, dtype=torch.int32, device=device)\n    else:\n        random_num = torch.tensor(0, dtype=torch.int32, device=device)\n    dist.broadcast(random_num, src=0)\n    return random_num.item()\n\n\ndef set_random_seed(seed, deterministic=False):\n    \"\"\"Set random seed.\n\n    Args:\n        seed (int): Seed to be used.\n        deterministic (bool): Whether to set the deterministic option for\n            CUDNN backend, i.e., set `torch.backends.cudnn.deterministic`\n            to True and `torch.backends.cudnn.benchmark` to False.\n            Default: False.\n    \"\"\"\n    random.seed(seed)\n    np.random.seed(seed)\n    torch.manual_seed(seed)\n    torch.cuda.manual_seed_all(seed)\n    if deterministic:\n        torch.backends.cudnn.deterministic = True\n        torch.backends.cudnn.benchmark = False\n\nclass SimpleTimer:\n    def __init__(self, num_tasks, log_interval=1, desc=\"Process\"):\n        self.num_tasks = num_tasks\n        self.desc = desc\n        self.count = 0\n        self.log_interval = log_interval\n        self.start_time = time.time()\n        self.logger = get_root_logger()\n\n    def log(self):\n        self.count += 1\n        if (self.count % self.log_interval) == 0 or self.count == self.num_tasks:\n            time_elapsed = time.time() - self.start_time\n            avg_time = time_elapsed / self.count\n            eta_sec = avg_time * (self.num_tasks - self.count)\n            eta_str = str(datetime.timedelta(seconds=int(eta_sec)))\n            elapsed_str = str(datetime.timedelta(seconds=int(time_elapsed)))\n            log_info = f\"{self.desc} [{self.count}/{self.num_tasks}], elapsed_time:{elapsed_str},\" \\\n                       f\" avg_time: {avg_time}, eta: 
{eta_str}.\"\n            self.logger.info(log_info)\n\n\nclass DebugUnderflowOverflow:\n    \"\"\"\n    This debug class helps detect and understand where the model starts getting very large or very small, and more\n    importantly `nan` or `inf` weight and activation elements.\n    There are 2 working modes:\n    1. Underflow/overflow detection (default)\n    2. Specific batch absolute min/max tracing without detection\n    Mode 1: Underflow/overflow detection\n    To activate the underflow/overflow detection, initialize the object with the model :\n    ```python\n    debug_overflow = DebugUnderflowOverflow(model)\n    ```\n    then run the training as normal and if `nan` or `inf` gets detected in at least one of the weight, input or\n    output elements this module will throw an exception and will print `max_frames_to_save` frames that lead to this\n    event, each frame reporting\n    1. the fully qualified module name plus the class name whose `forward` was run\n    2. the absolute min and max value of all elements for each module weights, and the inputs and output\n    For example, here is the header and the last few frames in detection report for `google/mt5-small` run in fp16 mixed precision :\n    ```\n    Detected inf/nan during batch_number=0\n    Last 21 forward frames:\n    abs min  abs max  metadata\n    [...]\n                      encoder.block.2.layer.1.DenseReluDense.wi_0 Linear\n    2.17e-07 4.50e+00 weight\n    1.79e-06 4.65e+00 input[0]\n    2.68e-06 3.70e+01 output\n                      encoder.block.2.layer.1.DenseReluDense.wi_1 Linear\n    8.08e-07 2.66e+01 weight\n    1.79e-06 4.65e+00 input[0]\n    1.27e-04 2.37e+02 output\n                      encoder.block.2.layer.1.DenseReluDense.wo Linear\n    1.01e-06 6.44e+00 weight\n    0.00e+00 9.74e+03 input[0]\n    3.18e-04 6.27e+04 output\n                      encoder.block.2.layer.1.DenseReluDense T5DenseGatedGeluDense\n    1.79e-06 4.65e+00 input[0]\n    3.18e-04 6.27e+04 output\n           
           encoder.block.2.layer.1.dropout Dropout\n    3.18e-04 6.27e+04 input[0]\n    0.00e+00      inf output\n    ```\n    You can see here that `T5DenseGatedGeluDense.forward` resulted in output activations whose absolute max value\n    was around 62.7K, which is very close to fp16's top limit of 64K. In the next frame we have `Dropout`, which\n    rescales the remaining elements after it has zeroed some of them, which pushes the absolute max value to more than\n    64K, and we get an overflow.\n    As you can see, it's the previous frames that we need to look into when the numbers start getting very large for\n    fp16.\n    The tracking is done in a forward hook, which gets invoked immediately after `forward` has completed.\n    By default the last 21 frames are printed. You can change the default to adjust for your needs. For example :\n    ```python\n    debug_overflow = DebugUnderflowOverflow(model, max_frames_to_save=100)\n    ```\n    If you intend to use this debugging feature in a training run that may take hours to complete, first validate\n    that you have set it up correctly by running it with normal tracing enabled for one or a few batches, as\n    explained in the next section.\n    Mode 2: Specific batch absolute min/max tracing without detection\n    The second work mode is per-batch tracing with the underflow/overflow detection feature turned off.\n    Let's say you want to watch the absolute min and max values for all the ingredients of each `forward` call of a\n    given batch, and only do that for batches 1 and 3. Then you instantiate this class as :\n    ```python\n    debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1,3])\n    ```\n    And now full batches 1 and 3 will be traced using the same format as explained above. 
Batches are 0-indexed.\n    This is helpful if you know that the program starts misbehaving after a certain batch number, so you can\n    fast-forward right to that area.\n    Early stopping:\n    You can also specify the batch number after which to stop the training, with :\n    ```python\n    debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1,3], abort_after_batch_num=3)\n    ```\n    This feature is mainly useful in the tracing mode, but you can use it for any mode.\n    **Performance**:\n    As this module measures absolute `min`/`max` of each weight of the model on every forward it'll slow the\n    training down. Therefore remember to turn it off once the debugging needs have been met.\n    Args:\n        model (`nn.Module`):\n            The model to debug.\n        max_frames_to_save (`int`, *optional*, defaults to 21):\n            How many frames back to record\n        trace_batch_nums (`List[int]`, *optional*, defaults to `[]`):\n            Which batch numbers to trace (turns detection off)\n        abort_after_batch_num (`int`, *optional*):\n            The batch number after which to abort\n    \"\"\"\n\n    def __init__(self, model, max_frames_to_save=21, trace_batch_nums=None, abort_after_batch_num=None):\n        if trace_batch_nums is None:\n            trace_batch_nums = []\n        self.model = model\n        self.trace_batch_nums = trace_batch_nums\n        self.abort_after_batch_num = abort_after_batch_num\n\n        # keep a bounded FIFO buffer of frames to dump as soon as inf/nan is encountered to give context to the problem emergence\n        self.frames = collections.deque([], max_frames_to_save)\n        self.frame = []\n        self.batch_number = 0\n        self.total_calls = 0\n        self.detected_overflow = False\n        self.prefix = \"                 \"\n\n        self.analyse_model()\n\n        self.register_forward_hook()\n\n    def save_frame(self, frame=None):\n        if frame is not None:\n   
         self.expand_frame(frame)\n        self.frames.append(\"\\n\".join(self.frame))\n        self.frame = []  # start a new frame\n\n    def expand_frame(self, line):\n        self.frame.append(line)\n\n    def trace_frames(self):\n        print(\"\\n\".join(self.frames))\n        self.frames.clear()  # clear in place so the buffer stays a bounded deque\n\n    def reset_saved_frames(self):\n        self.frames.clear()  # clear in place so the buffer stays a bounded deque\n\n    def dump_saved_frames(self):\n        # print the report header, then the saved frames themselves\n        print(f\"\\nDetected inf/nan during batch_number={self.batch_number}\")\n        print(f\"Last {len(self.frames)} forward frames:\")\n        print(f\"{'abs min':8} {'abs max':8} metadata\")\n        print(\"\\n\".join(self.frames))\n        print(\"\\n\\n\")\n        self.frames.clear()\n\n    def analyse_model(self):\n        # extract the fully qualified module names, to be able to report at run time. e.g.:\n        # encoder.block.2.layer.0.SelfAttention.o\n        #\n        # for shared weights only the first shared module name will be registered\n        self.module_names = {m: name for name, m in self.model.named_modules()}\n        # self.longest_module_name = max(len(v) for v in self.module_names.values())\n\n    def analyse_variable(self, var, ctx):\n        if torch.is_tensor(var):\n            self.expand_frame(self.get_abs_min_max(var, ctx))\n            if self.detect_overflow(var, ctx):\n                self.detected_overflow = True\n        elif var is None:\n            self.expand_frame(f\"{'None':>17} {ctx}\")\n        else:\n            self.expand_frame(f\"{'not a tensor':>17} {ctx}\")\n\n    def batch_start_frame(self):\n        self.expand_frame(f\"\\n\\n{self.prefix} *** Starting batch number={self.batch_number} ***\")\n        self.expand_frame(f\"{'abs min':8} {'abs max':8} metadata\")\n\n    def batch_end_frame(self):\n        self.expand_frame(f\"{self.prefix} *** Finished batch number={self.batch_number - 1} ***\\n\\n\")\n\n    def create_frame(self, module, input, output):\n        self.expand_frame(f\"{self.prefix} 
{self.module_names[module]} {module.__class__.__name__}\")\n\n        # params\n        for name, p in module.named_parameters(recurse=False):\n            self.analyse_variable(p, name)\n\n        # inputs\n        if isinstance(input, tuple):\n            for i, x in enumerate(input):\n                self.analyse_variable(x, f\"input[{i}]\")\n        else:\n            self.analyse_variable(input, \"input\")\n\n        # outputs\n        if isinstance(output, tuple):\n            for i, x in enumerate(output):\n                # possibly a tuple of tuples\n                if isinstance(x, tuple):\n                    for j, y in enumerate(x):\n                        self.analyse_variable(y, f\"output[{i}][{j}]\")\n                else:\n                    self.analyse_variable(x, f\"output[{i}]\")\n        else:\n            self.analyse_variable(output, \"output\")\n\n        self.save_frame()\n\n    def register_forward_hook(self):\n        self.model.apply(self._register_forward_hook)\n\n    def _register_forward_hook(self, module):\n        module.register_forward_hook(self.forward_hook)\n\n    def forward_hook(self, module, input, output):\n        # - input is a tuple of packed inputs (could be non-Tensors)\n        # - output could be a Tensor or a tuple of Tensors and non-Tensors\n\n        last_frame_of_batch = False\n\n        trace_mode = self.batch_number in self.trace_batch_nums\n        if trace_mode:\n            self.reset_saved_frames()\n\n        if self.total_calls == 0:\n            self.batch_start_frame()\n        self.total_calls += 1\n\n        # count batch numbers - the very first forward hook of the batch will be called when the\n        # batch completes - i.e. 
it gets called very last - we know this batch has finished\n        if module == self.model:\n            self.batch_number += 1\n            last_frame_of_batch = True\n\n        self.create_frame(module, input, output)\n\n        # if last_frame_of_batch:\n        #     self.batch_end_frame()\n\n        if trace_mode:\n            self.trace_frames()\n\n        if last_frame_of_batch:\n            self.batch_start_frame()\n\n        if self.detected_overflow and not trace_mode:\n            self.dump_saved_frames()\n\n            # now we can abort, as it's pointless to continue running\n            raise ValueError(\n                \"DebugUnderflowOverflow: inf/nan detected, aborting as there is no point running further. \"\n                \"Please scroll up above this traceback to see the activation values prior to this event.\"\n            )\n\n        # abort after certain batch if requested to do so\n        if self.abort_after_batch_num is not None and self.batch_number > self.abort_after_batch_num:\n            raise ValueError(\n                f\"DebugUnderflowOverflow: aborting after {self.batch_number} batches due to `abort_after_batch_num={self.abort_after_batch_num}` arg\"\n            )\n\n    @staticmethod\n    def get_abs_min_max(var, ctx):\n        abs_var = var.abs()\n        return f\"{abs_var.min():8.2e} {abs_var.max():8.2e} {ctx}\"\n\n    @staticmethod\n    def detect_overflow(var, ctx):\n        \"\"\"\n        Report whether the tensor contains any `nan` or `inf` entries.\n        This is useful for detecting overflows/underflows and best to call right after the function that did some math that\n        modified the tensor in question.\n        This function contains a few other helper features that you can enable and tweak directly if you want to track\n        various other things.\n        Args:\n            var: the tensor variable to check\n            ctx: the message to print as a context\n        Return:\n            `True` if 
`inf` or `nan` was detected, `False` otherwise\n        \"\"\"\n        detected = False\n        if torch.isnan(var).any().item():\n            detected = True\n            print(f\"{ctx} has nans\")\n        if torch.isinf(var).any().item():\n            detected = True\n            print(f\"{ctx} has infs\")\n        if var.dtype == torch.float32 and torch.ge(var.abs(), 65535).any().item():\n            detected = True\n            print(f\"{ctx} has overflow values {var.abs().max().item()}.\")\n        return detected\n"
  },
  {
    "path": "PixArt-alpha-ToCa/diffusion/utils/optimizer.py",
    "content": "import math\n\nfrom mmcv import Config\nfrom mmcv.runner import build_optimizer as mm_build_optimizer, OPTIMIZER_BUILDERS, DefaultOptimizerConstructor, \\\n    OPTIMIZERS\nfrom mmcv.utils import _BatchNorm, _InstanceNorm\nfrom torch.nn import GroupNorm, LayerNorm\n\nfrom .logger import get_root_logger\n\nfrom typing import Tuple, Optional, Callable\n\nimport torch\nfrom torch.optim.optimizer import Optimizer\n\n\ndef auto_scale_lr(effective_bs, optimizer_cfg, rule='linear', base_batch_size=256):\n    assert rule in ['linear', 'sqrt']\n    logger = get_root_logger()\n    # scale by world size\n    if rule == 'sqrt':\n        scale_ratio = math.sqrt(effective_bs / base_batch_size)\n    elif rule == 'linear':\n        scale_ratio = effective_bs / base_batch_size\n    optimizer_cfg['lr'] *= scale_ratio\n    logger.info(f'Automatically adapt lr to {optimizer_cfg[\"lr\"]:.7f} (using {rule} scaling rule).')\n    return scale_ratio\n\n\n@OPTIMIZER_BUILDERS.register_module()\nclass MyOptimizerConstructor(DefaultOptimizerConstructor):\n\n    def add_params(self, params, module, prefix='', is_dcn_module=None):\n        \"\"\"Add all parameters of module to the params list.\n\n        The parameters of the given module will be added to the list of param\n        groups, with specific rules defined by paramwise_cfg.\n\n        Args:\n            params (list[dict]): A list of param groups, it will be modified\n                in place.\n            module (nn.Module): The module to be added.\n            prefix (str): The prefix of the module\n\n        \"\"\"\n        # get param-wise options\n        custom_keys = self.paramwise_cfg.get('custom_keys', {})\n        # first sort with alphabet order and then sort with reversed len of str\n        # sorted_keys = sorted(sorted(custom_keys.keys()), key=len, reverse=True)\n\n        bias_lr_mult = self.paramwise_cfg.get('bias_lr_mult', 1.)\n        bias_decay_mult = self.paramwise_cfg.get('bias_decay_mult', 1.)\n    
    norm_decay_mult = self.paramwise_cfg.get('norm_decay_mult', 1.)\n        bypass_duplicate = self.paramwise_cfg.get('bypass_duplicate', False)\n\n        # special rules for norm layers and depth-wise conv layers\n        is_norm = isinstance(module,\n                             (_BatchNorm, _InstanceNorm, GroupNorm, LayerNorm))\n\n        for name, param in module.named_parameters(recurse=False):\n            base_lr = self.base_lr\n            if name == 'bias' and not is_norm and not is_dcn_module:\n                base_lr *= bias_lr_mult\n\n            # apply weight decay policies\n            base_wd = self.base_wd\n            # norm decay\n            if is_norm:\n                if self.base_wd is not None:\n                    base_wd *= norm_decay_mult\n            elif name == 'bias' and not is_dcn_module:\n                if self.base_wd is not None:\n                    # TODO: current bias_decay_mult will have an effect on DCN\n                    base_wd *= bias_decay_mult\n\n            param_group = {'params': [param]}\n            if not param.requires_grad:\n                param_group['requires_grad'] = False\n                params.append(param_group)\n                continue\n            if bypass_duplicate and self._is_in(param_group, params):\n                logger = get_root_logger()\n                logger.warning(f'{prefix} is duplicate. 
It is skipped since '\n                            f'bypass_duplicate={bypass_duplicate}')\n                continue\n            # if the parameter match one of the custom keys, ignore other rules\n            is_custom = False\n            for key in custom_keys:\n                scope, key_name = key if isinstance(key, tuple) else (None, key)\n                if scope is not None and scope not in f'{prefix}':\n                    continue\n                if key_name in f'{prefix}.{name}':\n                    is_custom = True\n                    if 'lr_mult' in custom_keys[key]:\n                        # if 'base_classes' in f'{prefix}.{name}' or 'attn_base' in f'{prefix}.{name}':\n                        #     param_group['lr'] = self.base_lr\n                        # else:\n                        param_group['lr'] = self.base_lr * custom_keys[key]['lr_mult']\n                    elif 'lr' not in param_group:\n                        param_group['lr'] = base_lr\n                    if self.base_wd is not None:\n                        if 'decay_mult' in custom_keys[key]:\n                            param_group['weight_decay'] = self.base_wd * custom_keys[key]['decay_mult']\n                        elif 'weight_decay' not in param_group:\n                            param_group['weight_decay'] = base_wd\n\n            if not is_custom:\n                # bias_lr_mult affects all bias parameters\n                # except for norm.bias dcn.conv_offset.bias\n                if base_lr != self.base_lr:\n                    param_group['lr'] = base_lr\n                if base_wd != self.base_wd:\n                    param_group['weight_decay'] = base_wd\n            params.append(param_group)\n\n        for child_name, child_mod in module.named_children():\n            child_prefix = f'{prefix}.{child_name}' if prefix else child_name\n            self.add_params(\n                params,\n                child_mod,\n                prefix=child_prefix,\n        
        is_dcn_module=is_dcn_module)\n\n\ndef build_optimizer(model, optimizer_cfg):\n    # default parameter-wise config\n    logger = get_root_logger()\n\n    if hasattr(model, 'module'):\n        model = model.module\n    # set optimizer constructor\n    optimizer_cfg.setdefault('constructor', 'MyOptimizerConstructor')\n    # parameter-wise setting: cancel weight decay for some specific modules\n    custom_keys = dict()\n    for name, module in model.named_modules():\n        if hasattr(module, 'zero_weight_decay'):\n            custom_keys |= {\n                (name, key): dict(decay_mult=0)\n                for key in module.zero_weight_decay\n            }\n\n    paramwise_cfg = Config(dict(cfg=dict(custom_keys=custom_keys)))\n    if given_cfg := optimizer_cfg.get('paramwise_cfg'):\n        paramwise_cfg.merge_from_dict(dict(cfg=given_cfg))\n    optimizer_cfg['paramwise_cfg'] = paramwise_cfg.cfg\n    # build optimizer\n    optimizer = mm_build_optimizer(model, optimizer_cfg)\n\n    weight_decay_groups = dict()\n    lr_groups = dict()\n    for group in optimizer.param_groups:\n        if not group.get('requires_grad', True): continue\n        lr_groups.setdefault(group['lr'], []).append(group)\n        weight_decay_groups.setdefault(group['weight_decay'], []).append(group)\n\n    learnable_count, fix_count = 0, 0\n    for p in model.parameters():\n        if p.requires_grad:\n            learnable_count += 1\n        else:\n            fix_count += 1\n    fix_info = f\"{learnable_count} are learnable, {fix_count} are fix\"\n    lr_info = \"Lr group: \" + \", \".join([f'{len(group)} params with lr {lr:.5f}' for lr, group in lr_groups.items()])\n    wd_info = \"Weight decay group: \" + \", \".join(\n        [f'{len(group)} params with weight decay {wd}' for wd, group in weight_decay_groups.items()])\n    opt_info = f\"Optimizer: total {len(optimizer.param_groups)} param groups, {fix_info}. 
{lr_info}; {wd_info}.\"\n    logger.info(opt_info)\n\n    return optimizer\n\n\n@OPTIMIZERS.register_module()\nclass Lion(Optimizer):\n    def __init__(\n            self,\n            params,\n            lr: float = 1e-4,\n            betas: Tuple[float, float] = (0.9, 0.99),\n            weight_decay: float = 0.0,\n    ):\n        assert lr > 0.\n        assert all(0. <= beta <= 1. for beta in betas)\n\n        defaults = dict(lr=lr, betas=betas, weight_decay=weight_decay)\n\n        super().__init__(params, defaults)\n\n    @staticmethod\n    def update_fn(p, grad, exp_avg, lr, wd, beta1, beta2):\n        # stepweight decay\n        p.data.mul_(1 - lr * wd)\n\n        # weight update\n        update = exp_avg.clone().lerp_(grad, 1 - beta1).sign_()\n        p.add_(update, alpha=-lr)\n\n        # decay the momentum running average coefficient\n        exp_avg.lerp_(grad, 1 - beta2)\n\n    @staticmethod\n    def exists(val):\n        return val is not None\n\n    @torch.no_grad()\n    def step(\n            self,\n            closure: Optional[Callable] = None\n    ):\n\n        loss = None\n        if self.exists(closure):\n            with torch.enable_grad():\n                loss = closure()\n\n        for group in self.param_groups:\n            for p in filter(lambda p: self.exists(p.grad), group['params']):\n\n                grad, lr, wd, beta1, beta2, state = p.grad, group['lr'], group['weight_decay'], *group['betas'], \\\n                                                    self.state[p]\n\n                # init state - exponential moving average of gradient values\n                if len(state) == 0:\n                    state['exp_avg'] = torch.zeros_like(p)\n\n                exp_avg = state['exp_avg']\n\n                self.update_fn(\n                    p,\n                    grad,\n                    exp_avg,\n                    lr,\n                    wd,\n                    beta1,\n                    beta2\n                )\n\n        
return loss\n"
  },
  {
    "path": "PixArt-alpha-ToCa/docker-compose.yml",
    "content": "version: \"3.8\"\nservices:\n  pixart:\n    container_name: pixart\n    image: pixart:latest\n    build:\n      context: .\n    ports:\n      - 12345:12345\n    environment:\n      - APP_CONTEXT=1024 #1024, 512, LCM\n    tmpfs:\n      - /tmp      \n    volumes:\n      - ./docker/cache/gradio:/workspace/gradio_cached_examples/30:rw\n      - ./docker/cache/huggingface:/root/.cache/huggingface:rw\n    deploy:\n      resources:\n        reservations:\n          devices:\n            - driver: nvidia\n              device_ids: ['0']\n              capabilities: [gpu]\n\n"
  },
  {
    "path": "PixArt-alpha-ToCa/docker-entrypoint.sh",
    "content": "#!/usr/bin/env bash\nset -Eeuo pipefail\n# Check if APP_CONTEXT matches one of the specific values\nif [ \"$APP_CONTEXT\" = \"1024\" ]; then\n    echo \"APP_CONTEXT is 1024\"\n    /usr/bin/python /workspace/app/app.py \"$@\"\nelif [ \"$APP_CONTEXT\" = \"512\" ]; then\n    echo \"APP_CONTEXT is 512\"\n    /usr/bin/python /workspace/app/app_512.py \"$@\"\nelif [ \"$APP_CONTEXT\" = \"LCM\" ]; then\n    echo \"APP_CONTEXT is LCM\"\n    /usr/bin/python /workspace/app/app_lcm.py \"$@\"\nelse\n    echo \"APP_CONTEXT is not set to 1024, 512, or LCM, defaulting to 1024\"\n    /usr/bin/python /workspace/app/app.py \"$@\"\nfi\n\n"
  },
  {
    "path": "PixArt-alpha-ToCa/docker-readme.md",
    "content": ""
  },
  {
    "path": "PixArt-alpha-ToCa/environment-pixart.yml",
    "content": "name: pixart\nchannels:\n  - defaults\ndependencies:\n  - _libgcc_mutex=0.1=main\n  - _openmp_mutex=5.1=1_gnu\n  - ca-certificates=2024.7.2=h06a4308_0\n  - ld_impl_linux-64=2.38=h1181459_1\n  - libffi=3.3=he6710b0_2\n  - libgcc-ng=11.2.0=h1234567_1\n  - libgomp=11.2.0=h1234567_1\n  - libstdcxx-ng=11.2.0=h1234567_1\n  - ncurses=6.4=h6a678d5_0\n  - openssl=1.1.1w=h7f8727e_0\n  - pip=24.2=py39h06a4308_0\n  - python=3.9.0=hdb3f193_2\n  - readline=8.2=h5eee18b_0\n  - setuptools=72.1.0=py39h06a4308_0\n  - sqlite=3.45.3=h5eee18b_0\n  - tk=8.6.14=h39e8969_0\n  - wheel=0.43.0=py39h06a4308_0\n  - xz=5.4.6=h5eee18b_1\n  - zlib=1.2.13=h5eee18b_1\n  - pip:\n    - absl-py==2.1.0\n    - accelerate==0.34.0\n    - addict==2.4.0\n    - aiofiles==23.2.1\n    - aiohappyeyeballs==2.4.0\n    - aiohttp==3.10.5\n    - aiosignal==1.3.1\n    - altair==5.4.1\n    - annotated-types==0.7.0\n    - anyio==4.4.0\n    - async-timeout==4.0.3\n    - attrs==24.2.0\n    - beautifulsoup4==4.12.3\n    - bs4==0.0.2\n    - certifi==2024.8.30\n    - charset-normalizer==3.3.2\n    - click==8.1.7\n    - coloredlogs==15.0.1\n    - contourpy==1.3.0\n    - cycler==0.12.1\n    - datasets==2.21.0\n    - diffusers==0.31.0.dev0\n    - dill==0.3.8\n    - einops==0.8.0\n    - exceptiongroup==1.2.2\n    - fastapi==0.112.2\n    - ffmpy==0.4.0\n    - filelock==3.15.4\n    - fonttools==4.53.1\n    - frozenlist==1.4.1\n    - fsspec==2024.6.1\n    - ftfy==6.2.3\n    - gradio==4.1.1\n    - gradio-client==0.7.0\n    - grpcio==1.66.1\n    - h11==0.14.0\n    - httpcore==1.0.5\n    - httpx==0.27.2\n    - huggingface-hub==0.24.6\n    - humanfriendly==10.0\n    - idna==3.8\n    - importlib-metadata==8.4.0\n    - importlib-resources==6.4.4\n    - jinja2==3.1.4\n    - jsonschema==4.23.0\n    - jsonschema-specifications==2023.12.1\n    - kiwisolver==1.4.5\n    - markdown==3.7\n    - markdown-it-py==3.0.0\n    - markupsafe==2.1.5\n    - matplotlib==3.9.2\n    - mdurl==0.1.2\n    - mmcv==1.7.0\n    - mpmath==1.3.0\n    
- multidict==6.0.5\n    - multiprocess==0.70.16\n    - narwhals==1.6.1\n    - networkx==3.2.1\n    - numpy==1.26.4\n    - nvidia-cublas-cu12==12.1.3.1\n    - nvidia-cuda-cupti-cu12==12.1.105\n    - nvidia-cuda-nvrtc-cu12==12.1.105\n    - nvidia-cuda-runtime-cu12==12.1.105\n    - nvidia-cudnn-cu12==9.1.0.70\n    - nvidia-cufft-cu12==11.0.2.54\n    - nvidia-curand-cu12==10.3.2.106\n    - nvidia-cusolver-cu12==11.4.5.107\n    - nvidia-cusparse-cu12==12.1.0.106\n    - nvidia-nccl-cu12==2.20.5\n    - nvidia-nvjitlink-cu12==12.6.68\n    - nvidia-nvtx-cu12==12.1.105\n    - opencv-python==4.10.0.84\n    - optimum==1.21.4\n    - orjson==3.10.7\n    - packaging==24.1\n    - pandas==2.2.2\n    - peft==0.6.2\n    - pillow==10.4.0\n    - platformdirs==4.2.2\n    - protobuf==3.20.2\n    - psutil==6.0.0\n    - pyarrow==17.0.0\n    - pydantic==2.8.2\n    - pydantic-core==2.20.1\n    - pydub==0.25.1\n    - pygments==2.18.0\n    - pyparsing==3.1.4\n    - python-dateutil==2.9.0.post0\n    - python-multipart==0.0.9\n    - pytorch-fid==0.3.0\n    - pytz==2024.1\n    - pyyaml==6.0.2\n    - referencing==0.35.1\n    - regex==2024.7.24\n    - requests==2.32.3\n    - rich==13.8.0\n    - rpds-py==0.20.0\n    - safetensors==0.4.4\n    - scipy==1.13.1\n    - semantic-version==2.10.0\n    - sentencepiece==0.1.99\n    - shellingham==1.5.4\n    - six==1.16.0\n    - sniffio==1.3.1\n    - soupsieve==2.6\n    - starlette==0.38.4\n    - sympy==1.13.2\n    - tensorboard==2.17.1\n    - tensorboard-data-server==0.7.2\n    - tensorboardx==2.6.2.2\n    - timm==0.6.12\n    - tokenizers==0.19.1\n    - tomli==2.0.1\n    - tomlkit==0.12.0\n    - torch==2.4.0\n    - torchaudio==2.1.1+cu118\n    - torchvision==0.16.1+cu118\n    - tqdm==4.66.5\n    - transformers==4.43.4\n    - triton==3.0.0\n    - typer==0.12.5\n    - typing-extensions==4.12.2\n    - tzdata==2024.1\n    - urllib3==2.2.2\n    - uvicorn==0.30.6\n    - wcwidth==0.2.13\n    - websockets==11.0.3\n    - werkzeug==3.0.4\n    - xformers==0.0.27.post2\n 
   - xxhash==3.5.0\n    - yapf==0.40.1\n    - yarl==1.9.7\n    - zipp==3.20.1\nprefix: /root/miniconda3/envs/pixart\n"
  },
  {
    "path": "PixArt-alpha-ToCa/environment.yml",
    "content": "name: PixArt\nchannels:\n  - pytorch\n  - nvidia\ndependencies:\n  - python >= 3.8\n  - pytorch >= 1.13\n  - torchvision\n  - pytorch-cuda=11.7\n  - pip:\n    - timm==0.6.12\n    - diffusers\n    - accelerate\n    - mmcv==1.7.0\n    - diffusers\n    - accelerate==0.15.0\n    - tensorboard\n    - transformers==4.26.1\n    - sentencepiece~=0.1.97\n    - ftfy~=6.1.1\n    - beautifulsoup4~=4.11.1\n    - opencv-python\n    - bs4\n    - einops\n    - xformers"
  },
  {
    "path": "PixArt-alpha-ToCa/notebooks/PixArt_xl2_img512_internal_for_pokemon_sample_training.py",
    "content": "_base_ = ['/workspace/PixArt-alpha/configs/PixArt_xl2_internal.py']\ndata_root = '/workspace'\n\nimage_list_json = ['data_info.json',]\n\ndata = dict(type='InternalData', root='/workspace/pixart-pokemon', image_list_json=image_list_json, transform='default_train', load_vae_feat=True)\nimage_size = 512\n\n# model setting\nwindow_block_indexes = []\nwindow_size=0\nuse_rel_pos=False\nmodel = 'PixArt_XL_2'\nfp32_attention = True\nload_from = \"/workspace/PixArt-alpha/output/pretrained_models/PixArt-XL-2-512x512.pth\"\nvae_pretrained = \"output/pretrained_models/sd-vae-ft-ema\"\nlewei_scale = 1.0\n\n# training setting\nuse_fsdp=False   # if use FSDP mode\nnum_workers=10\ntrain_batch_size = 38 # 32\nnum_epochs = 200 # 3\ngradient_accumulation_steps = 1\ngrad_checkpointing = True\ngradient_clip = 0.01\noptimizer = dict(type='AdamW', lr=2e-5, weight_decay=3e-2, eps=1e-10)\nlr_schedule_args = dict(num_warmup_steps=1000)\n\neval_sampling_steps = 200\nlog_interval = 20\nsave_model_steps=100\nwork_dir = 'output/debug'\n"
  },
  {
    "path": "PixArt-alpha-ToCa/notebooks/convert-checkpoint-to-diffusers.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"id\": \"2878bb5d-33a3-4a5b-b15c-c832c700129b\",\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"/workspace/PixArt-alpha\\n\"\n     ]\n    },\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"/usr/local/lib/python3.10/dist-packages/IPython/core/magics/osm.py:417: UserWarning: using dhist requires you to install the `pickleshare` library.\\n\",\n      \"  self.shell.db['dhist'] = compress_dhist(dhist)[-100:]\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"%cd PixArt-alpha\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 14,\n   \"id\": \"7dd2d98c-3f8f-40f1-a9e1-bc916774afb3\",\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Total number of transformer parameters: 610856096\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"!python tools/convert_pixart_alpha_to_diffusers.py \\\\\\n\",\n    \"    --orig_ckpt_path \\\"/workspace/PixArt-alpha/output/trained_model/checkpoints/epoch_5_step_110.pth\\\" \\\\\\n\",\n    \"    --dump_path \\\"/workspace/PixArt-alpha/output/diffusers_trained\\\" \\\\\\n\",\n    \"    --only_transformer=True \\\\\\n\",\n    \"    --image_size 512 \\\\\\n\",\n    \"    --multi_scale_train=False\\n\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3 (ipykernel)\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.12\"\n  }\n },\n \"nbformat\": 4,\n 
\"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "PixArt-alpha-ToCa/notebooks/infer.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"id\": \"8b2458c4-c461-4ddc-af94-fcd837357da4\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from diffusers import PixArtAlphaPipeline\\n\",\n    \"import torch\\n\",\n    \"from diffusers import Transformer2DModel\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"81a5bc0f-682b-4ff9-92e9-43b68b3df8fc\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# for comparison\\n\",\n    \"\\n\",\n    \"orig_pipe = pipe = PixArtAlphaPipeline.from_pretrained(\\\"PixArt-alpha/PixArt-XL-2-512x512\\\", torch_dtype=torch.float16)\\n\",\n    \"orig_pipe = orig_pipe.to(\\\"cuda\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"efc07821-5479-4ca3-a2c6-114ac484fd1e\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"transformer = Transformer2DModel.from_pretrained(\\\"/workspace/PixArt-alpha/output/diffusers_trained/transformer\\\", torch_dtype=torch.float16)\\n\",\n    \"pipe = PixArtAlphaPipeline.from_pretrained(\\\"PixArt-alpha/PixArt-XL-2-512x512\\\", torch_dtype=torch.float16, transformer=transformer)\\n\",\n    \"pipe = pipe.to(\\\"cuda\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 22,\n   \"id\": \"57da873b-2c13-463b-b558-ee69522ccefc\",\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"application/vnd.jupyter.widget-view+json\": {\n       \"model_id\": \"d69c7683773c4c25914764800ec1ef4f\",\n       \"version_major\": 2,\n       \"version_minor\": 0\n      },\n      \"text/plain\": [\n       \"  0%|          | 0/20 [00:00<?, ?it/s]\"\n      ]\n     },\n     \"metadata\": {},\n     \"output_type\": \"display_data\"\n    },\n    {\n     \"data\": {\n      \"image/png\": 
\"iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAIAAAB7GkOtAAEAAElEQVR4nOy9daBlV5E9vKr2Ofc+b3d3i3VHOi7EiZIggcECBA06gcGH38DA4PINMDDIEFyGAEkIcXdvSXfSSbv36+fv3nvO2bvq+2Pvc+/rJMzAIEn6nQVpef3kyjlVtVetWkUuS8lEABEBABQgQBFAKFCgQIEC+x8UQiKSx/4CBQoUKDB8oNwo/AsUKFCgwDACMZSAguopUKBAgWEHLkJ/gQIFCgxP8HP9AAoUKFCgwHODIgEUKFCgwDBFkQAKFChQYJiiSAAFChQoMExRJIACBQoUGKYoEkCBAgUKDFMUCaBAgQIFhimKBFCgQIECwxRFAihQoECBYYoiARQoUKDAMEWRAAoUKFBgmKJIAAUKFCgwTFEkgAIFChQYpigSQIECBQoMUxQJoECBAgWGKYoEUKBAgQLDFEUCKFCgQIFhiiIBFChQoMAwRZEAChQoUGCYokgABQoUKDBMUSSAAgUKFBimKBJAgQIFCgxTFAmgQIECBYYpigRQoECBAsMURQIoUKBAgWGKIgEUKFCgwDBFkQAKFChQYJiiSAAFChQoMExRJIACBQoUGKYoEkCBAgUKDFMUCaBAgQIFhimKBFCgQIECwxRFAihQoECBYYoiARQoUKDAMEWRAAoUKFBgmKJIAAUKFCgwTFEkgAIFChQYpigSQIECBQoMUxQJoECBAgWGKYoEUKBAgQLDFEUCKFCgQIFhiiIBFChQoMAwRZEAChQoUGCYokgABQoUKDBMUSSAAgUKFBimKBJAgQIFCgxTFAmgQIECBYYpigRQoECBAsMURQIoUKBAgWGKIgEUKFCgwDBFkQAKFChQYJiiSAAFChQoMExRJIACBQoUGKYoEkCBAgUKDFMUCaBAgQIFhimKBFCgQIECwxRFAihQoECBYYoiARQoUKDAMEWRAAoUKFBgmKJIAAUKFCgwTFEkgAIFChQYpigSQIECBQoMUxQJoECBAgWGKYoEUKBAgQLDFEUCKFCgQIFhiiIBFChQoMAwRZEAChQoUGCYInquH0CBAgUKPA+hjT8IAALTc/ho/kYgVf3fP6tAgQIF9lMoQM/2N1EoYAElNAGiIAL2+eQXPIoTQIECBYYL9o310PAhVSUFVCCiBBJrs0xqA0lPpbqpd/fDm57q3rNt7sRJ55586qi21v0pBRQJoECBAvsfnkZsqIZArwKCqhAyZzOX9vf1dO7du7e7b9Bmu/r69nR19+zt42yga/fmWn//rv6eQU339Fb7Ogc62kfA4LVnn0P7Udjcf55JgQIFhjfygh4kgKooAFUnNsuy6uBgd0/Ptm1bN2zcsHHLpo1bt23Zvmlv556+gZ7UZQmpaY3QwqaJS6Wm5hLHTTqyo600uW1kS/vc1qlpf+2hu7Y0t4wCSEWI9xP5TJEAChQo8AKEQmnInwFRslaVaGAwHaj0Pf7kkw89/PDGTRu3bl6/cdP6gVrfYGUwc1lc0lJTXG6KRoxqGTW9deGkRa1t5XJby7gJo5qbS83tho0Kab91lSSt9Gra7TY/unHl3Y+9530fPOekY6HUiP76gmeDigRQoECBFxBU1REZECngrHPOCnSgv7Jl+85HV619dMXK1WtX79q7fc/eXQNJf1tH85hxHR3TW2dOmjd23KhR49rHTRw5esyIEaPbSsZETERaLkeaSRSpMWnV9leT2t7+/oGq7e7q7N5ZefKe9ZtX9x5zyClvvOiNJTI8NOS/wKM/ChVQgQIFnt/QIXpMqAJEA4OVvv6+zTt2PLzy0d9fc9WGDU909XYOpIMwtqM9njl34px50ydPmzxz5pTRoztGj+hobm2Km2LDTKzKKioOGRROkIFt5khSRmq1P+V0Z1ff5h3d6zZ3DQxkj938eLKlrz2b+PtfXzd58sRSOdpPqJ8cxQmgQIECzzmehUxRFSJSkKhasdXqYG9Pz7onNq5c+/g999/x6IqHd3
Rud2Xb1BRPnDlm+XGLFyybN3nymOmTx4wbPbLJlGImA2YDqCg5qFWCqGikoo5EVGAodgpRMDRTV0mT3X3dj2/etWlb75OPd9kB6u+Uvs3Vb1/+xQmTJzSVo/2A83kaigRQoECB5xyEwOQTAAEECuLe/sH+/sHHnnjy/ofvvfnm6zZveXxwsJdKOnnqmOVnLZy78Pipk8eNGzdy7JgR5XIUl0rMEFUFjJKSgkhYoaRgKCmpA4FUVKBkGE6c1qqsLDEPWGzb23vPg2u27awO9IpJm3t39/XtrLzqNf9w4vEnNJdM/jD3KxQUUIECBZ5bqNZnr4Aks32DfRs2bnpg5SO33nnXmpWP7tq73cXZ6EkjFh884/DDlixaMnPsyPZSOWppiSIiJiFRNhBhJYif2VUV/12ZBKIqpCBDDlBVEmWoQpIUCrGMXZW++x5+5IEHVw70ANoWo61v6+C2NVsnjxj/u1/9Ztq4sRGzKoFoP0sBxQmgQIECf3+oBtUmKUiBWpLu6dq78vHHbrjthkdW3Ldp65ODyUDr6HjBAZPPWHr0vLkzZ86eNnbsqFLMTKIkCqhaYhaAmEUVkQIEgUKIxBCDoHAKgYANFEpONVNjlAxltmRI1CSbNq+/6a5H1z61o6eCOGpvx4jqrqTvib2lbvryVz83efTYiBng/S32AygSQIECBf7O0PArKSgVu2PXjgcffujGm25e+cQjO/t2SimdOH3kWSceu2DJ7FnTp4wZ0dzaFMcRKTlyTo1V578DMZjEsOeNlCBKgJABhJTDWIASgcgX/FaYTBSTqrUZLKjCtfseuOvW2+/evdsqt5fiEca02ordvWVHz57et7z6jUcdsbwUsyoB2C8TQEEBFShQ4O8AH2hIFKKwgj1dvQ+tXX3V9VevXnFXV9fOUoeZs2jqIUvnzZs7ecrUsa1tzaUoVnJEyJw1xsCAlRiqIsxEBCgYjJya0dBEIJCqECmUVAlK6pwzxMRGbAYi50xNaUd351U3X7di9VpDpShqMlmriUeiqjvWbtv51M7D5h/5X9/7/sSxI3zlr1okgL8LRISYaP/rthQoMBwRpnN9/9QquvZ2PbJy9XW33nL3igc2795Q7sCM2aOOOvKQRQfPHj9mdHOM5jLBpVyCOJiYnarhSJVBGpGSgIwBhEFQIWUChWwA+IpfVVmZAKHcAQJECgWpUwvqyir3rlpx1ZU3DVSzUnszwZByE1rKWdve9XvXr9zULM3XXn3t/LmzSsbU4+N+GZKeXxSQAk502/ZNI0eOLDW3C6RsfPcdtJ++AQUK7I8Ikn3f3XWgzr09D65cccNdNzzw8G3b9mxxpWz6/AmvufCQJUvmTJk6tlSKYoqIUyiEQBFbVS5BCaxCbJkNKaCWDMCqqqJg4pzmyaO0+mAPL/0xgBAUFhyJUJpJprx66+bf3njTxk2bI+oodZQQOWthuCWSlsFttd3rerkWf+rz/7Jg7sy4Ti/tp+U/nm8JgAAR96WvfHVX584Js2ZB7bjmltGjJk6eOHvCuI6O1qZRo0e3tLc3tTRzVGIiKAwxQYsTQ4ECzxPkFD9bla6ePSseeeSGW+688+F7N+7ZhLZs8qzW0884cvHSBdMnTRjTRE0RCE6NgwgTAQwlEEOJlQlKZFQ8lwPiiFRViAAiBgjsiR8DUlIV8mx9iNoAlKBMSeoypq29ndfe9sD9j65UF8fxaOayEDmkHEdGSq4f25/auXfDzgsufOkF57w0MqZO+++v0R/PtwQAIIrjKGr6/S23xg+vGDeyxGmXEokZxRxFqRsxcnTTyI6pM+ZMGDNl8qTxk8ZNmDxx8vgx40aP6WgqlcqGGGr25/erQIHnKcSX/YBT9PQOPLhq1W+u/8MD99+ya8cmjbLJc8edeeKSAw6aPX/O5JaO5lLcBJIIIhBwrCpMBKV6s5UNBSJYKPxRCT70hxRDgDLAxAL2qqCIhBnOOWJwzE7UOmQa99rsmlvvuP3BR/oqKDWNYhdRbK
xmqsxoK3PcmsZbHtvQvWXrkvlzPvmJf2lpioZJTfm86wEAevv9d5/76ovi8a3vfv9rJ44t9/d19/bb/sGUUt25q3N3d081zQb3DMKpIVT60vETJrd3jD5o/pIFc+Ysmjdv9pSpLU1lo8ZEjCErHP4qQ3z73SRggQL/Z2j9NwVlont7e1asWn3dzTff+eBdG7evR3M2elLzwQcvPGjZ7FkzpowZ3dZSMqwOsUKZSckz9IaIwAoONxcRMRMxAeoJfvUqTCWi3AKOiKFCALFRJVEQFKpMCkMQdcoOmoHvW73u2lvu2LylS0yzK5UMlwwrGU3VGuJmtHRErf1PdK269aFmW/r+t79/ymmnGq8ZHQb3+vMuASi0rzrwyje86prrrznrlS9+1SVnzl88JWIoIamaLE3Zqh1MBnt6e3s6d3Xt7e6vPPnklt17Bmt9g719yYiWMTMnzVs8d/FBiw+eMnnqrOnT2pqbGAookbfx2P/f1AIF/nbwLpwaSn5NkrSzq+veFQ9fe+cNDz16R2fXTilh3MyOpQdPnztn+txZU8eMHtVaKisrEQkRkXGwzDAKEmGwRqQKhjIpgQle48MMINC7BDDyI0HIAYq890sAMYNIRACyMJGVOKOos7/vR1f8/pHHNqhEXG6OuMTEFJFjcWKZTHNUbkqbsj3JQ1femfTWPv7BD733Xe9taWkaPlHieZcAAHWK6274wwWveQ13YN7S0W+77KULFs8tRcrUUlJuQ8wCFRuVOVHnJKokNklcb9fAtu17t2/vfPyxjbu37HWZtpjmGZNnzJ4ye/7c+fPnz582fUpTqclQCYo8GRQoUOBPhYa4rwKtVCvr1z9193333HnXfWs2rd5T3RuNwuRZHQsXz5q7aObUSePHtpdby3FJoQxWgiEBEUIPl1nrjA+IAK/mAYFZwT6eE+BJfALBEEDgMD8gMACYVaAkkTHOOd8ZIJZUokpk7nz40V//7qbu3jSK25hbyUhsWKHMxrIA2sTlFpRbspa7f3fLnnWdpx570g9/9F9jRo1g5iIBPJcQ6EBl8KJXveq2R+9rmkKzDmx7yctOOuaE5c2mXMqoRQw5iHOGQcgIBiBmFscOVE3TWmb37hnYsmHr5vXbtmzZvXlDV1ePi0yppWX0CceedOKxLzpg8YIxHa0MJuAvyQMFHVRgv4bCx1qwiAooFe3p71255olrb7v5vgfu3bb18cwlLWNaZiyYuPSQ6TNnjZs6bUxrW1McxQpf76sJwZ+IILCRiUmZVH1gBxHXy3gChSKfDIjBDFLyLA9IDUEocLlMnuoheFNosAN5oScyV9rVX/vZddfc/eAKE3WYqMVoZDgiVjKkClWCQYSorKUObnngt3d1bdo7bcyUX/36lwvmzC6VSs/1y/53xfMxAfjAesNNN130ltdjfLk0pjJ6NM55+UknH3/suPKIks2Q2jguEyA2M5EhYYUoEZisE2LORESpWkv7+9LtW/ufeGrnhvVbtu3o2rlld+fO3jHto4496qiLznvFsUcvbzJNTmD4z4vkz/qSFcmgwAsfQfcIv0SRWKAC9PT1PfDAI1ffdMMt9926vXOHRW3UhI4Z88YfcvDCeQdOnzh2REfZNJdAajUSVVb2rL0Rr9MTMUQUEYhIiaAMgIi8Zt//QCYmEJigDDYhI6gQALAyQUL/Vw0BRPBqfwdlow5klBPLa9Zv/8lvr9y4oztuHalc8ttbYsMWwmQESswRTBM3tVHrY7c/uPWBbaWK+eV///KEk48tRfxX7Be+IPB8TAAAFJpZ96a3X/LL269a9KLFlcFOW9p9zOGHXHjmmTMntEWZRKZdRdWRiVidKoGZASWQgh0pEdssI0Jay6ppbe+u3p079nTu6uzeXVm7dtvadbsGu5IpE2e89LyXnXve+TOnTy8B+PPaPsPnIikwXKAQ33n1kp6evv6HVq/672t+c8f9t+3euZFapH1CadaCyYcdPG/evJlTpowul01Eka/QlUhBVj
JlIfIjWazgiDiCGmVQpEogZZ8giHJ+Xxnw9I0v9xnEAJHR0BhQCjY/3vPB+OQAFuUMbJyyFTOYuD/ccc81N9+pWYmiDjIxRWAGK4hhQVYdxxKpaZLWjqh94wMbV9+9IuqXz//Lpy5+y5ubm6L9ZtHjn47naQIAVIAn1j1+2NknTjps5oyDp3b1bqr27Jo3c9JFF5x40Mx5lLWRapmNnwtXBVhVwCBRkjAKQoYAsUROGbUkqVQG+7vt9m09nbsq27Z23//AyrWPbWRuPu7QYy5+7T8ce/QxLXErwOYZ10ER7AvsN3jWsSYNJb8SuJpVH3vi8Rtvuem6m25ct+XxNK6NmzVi9pKJixfOnrdoxqSxo9tjilkZouydlSGOlViUvAenQkREFUoUs4mYjTJgWEGGQELeqpMYQM7lgInzHKDsNUFhwktZVUkVTARGJOqiyDhywlaFE+Gtu3v++8obVz65Po5GMrUwxwpFLFAXwTgRjWNicqiWqNTm2qUzueXn96Y99h8uOPtLX/7C6HGj/1waYP/A8zYBANBU7Zve/aaf3vb7A08/pmNSU5p0dnZuHN9uXnrmGUcuPbYVUs6sISGKRFVZCGxgiMhaJWYQiwipMhklIQAMB0lqFZdJb1dl187uDZt23XvX6kcfemr77sG5Mxa++eI3n3/2S8aNbCeVEj/vhiQKFPgrItfzqIAVcNAduzuvuunG313/+yfW3J+kfeOnNh+4dNYhS+dOnzFpwqRR5XLMMEpghkAQqCLx/4FIvde+qoiEwEJgMmw48vemEhmIChERgcIGAOV8nJPJMEBM0GDjQzDQMPzrK3RiIQOnIoatmP6KPvjYE7/5w419g1mp3EJUYjKkzExOPJekKiT+OMHUxk1Ng+aO396x98nKsYce9Y1vfXnRwnmRAcDDsNB7PicAKNy6jU8de/4ZbvqI+csXR+X+xPX29exoYzrzRaedefTy8VwSyTiCiqeAiJVVAZCIkjcVInbWgUBsxDmwEglIVcUqVbOse2+ybvPu2+5a8dA9q9ev2zalddJrX/YPL7vworkzZ0TE/MwGwfC7Sgrsj/A+OVBQxaYPrVjxiyv++/b7b9+xd1v7pPLcxROPOPSAxYunjR8zorVkIiOkqgZQA1JREEhCAzfEeiFAgkZIFaoiECgxQIYjNjFMoHz8DUThRqprMZjYgJly5x6QkgbVkCiYoX5SOFMiJUqVd3QO3HL3w/c9+FhmI4rKbGJlS6REBCElIiZRZWaBEBCTGW06Hrl69aYVG8Y3TfrKV7583stfXGL/fYfjBOnzNgF4BYKz4r7y1S984FufPfrCF5tWrlAnRVVbqbi+yjHLlrzmjLPGjWwXVyMVopiIoaSqXselVuF7S0San3tVVdn5bx9gKMlSRLxh3Z4V9z9++0333f/AU+0ds88682Uvv/C8g5csboWJoA1p2LAYECmwX0I1WPR4WkW2795+w523/OoPP1/1xMpUq9MWjjn+6EWLFsydPXNqa0s5YhYoEzkIgQTifdcg/htQfSRAFQqGqoQjhYpfyaJqQGATGcPEBjDh1iYlsBeCslf2gJW8XghEgDQeNJSZADgVMpEjscoDSbryyY033/Hglq1dbFoMl5hIWYXJc0oMcuzJIyEYoyhxqYxy17qdD165oikrf/gDH3zne99VLoPZDNsb+nmbADxUIT29XfOOPRKzOw477fCB6m6LPlat9iTsBpZMGvPql7x8xuTx0AyAihiKCQAcAApBW5lYc4MSkNe2CYiIFSIwUJeRYUtswV29gw8+uPEP195/650rylH7iceecN6JZ55y7NGjW0fFiKLCla7ACxWaC/mpu6979crHfnP1b29/+I5d/Vtbx8sBh8479JilC2fPmNTR3FwiEhVfITFBTcgbfhQ3mK8FRwbNlzmGk7eGsVwnImEZI4hNxMzMBjDepjnU/hwmu1SM1/4D8AofFcpvXgchAwdh5gym5uzO7t67H1j90KonBvtBpo
WZ/bEeRI4ARaQEgiNVIhhloRJFJRfFA003/Owa7ecjFx9x+c9+OHnSOArzxs/Vm/Ic4/mcAHy0Fqvy7R/+4B2ffN8xFx49aurYfttbS3qNGmROB/pGt7W+8sILDpg7LZKKIYpMOXCSREYZBFXlELUVyiDVoDqAQry/uDFkrcBAGDZLTak0OCAPP/zkT//7xrsfWBNh1EGHHH7hGReee/pZo0vlZnD8HL8yBQr86dCg6lEk1q7fvOUPN//+uhuuW/fkCtNGsxeNWX7kwiMOXzBm/OjmplZlYYGSUyBYbPpObmDsxVvuwys0c0sezVWjCHW7CtR5H2gFEzGzIWImVjAhrOsN/j5QYoWE0V/fOCCFEikxERkr3tOfySn1DqaPb952+wMrt2ztsZZBTUwEY4mUxEAg7CcQBMSh18waU9xkSy22vOKGR3as2zGuffz3vvf9Y45eHoXFAsOX130+JwD4t0XgBqqVE08/abPdccS5x0k7Kmk/skxsBotksFLS7KXnnHLCsiVNMZgMgQwZ9leXejYQAIPqQ+weEsoZFf8hcYCBISg5QFOJdtfsjXev/s2vr1+1el2JRx9/5MmvPu9Vpx+9vIPLJcTDTjJW4IWB3MPYc/xEVnX7rh133//Qb6++6tHV93dXdoybMfLgI+ceeeSyBfOmjGwpR2TBVsEwqg4ASSjOVV3uwqD5xpV8gTtp4OilwQKFB5ALSb0zIzHBMMEPBHiLT38CIN9JJrDPKArJ1aFgQwZQ5VQNRKnmsLu3775HHnt45VPdfRamhVACCaloLMF1GqxMBCESBYtCWQ1zE8otSfuOlZsevv7+Nm7753/71JvfdEkpP33sd4t+/ww8zxOAhzrIjdddc/6bXz7r+IUzDluQIRHNsiRJrZSoVBvod7b39KOWnn3qSSNiNipxXIYLlxSTgYRTJcK8oH+/NdwkTAIhsO8bizLUmrimECuCUqmzUrnr1oevuuquhx7dVG6afvxRL37LK199xOID/FHAPMcvToECQ6EKb4zGSugb7F+5etXlv/vZ3Q/csadz54h2e8CyGUefuGTeolmTJ4yNo5iFPA/vIJ6SV+sMsxD7kE5e9ONrqeCxTEqNbKAgUdH6EcDH9rzCZxAp2Fu3h1+UlPMZXg1LvIg0zA4zkUK9I5BTUsdqlfursm7bnjseefTJp7arlMlvg1eNiCDqWJWUoazkQutYAN9rMCVTbqPWrjUDD137IPUn73rbW//xIx/saGkO7tP7tdvz/4oXRAKAQNM0ueg1r7zq4WuPfcWpLePLVlMn1gE2FXGWYDGwd/HcKS8948wZ48YYUSWYiGDBahRKMI1jnobTHsGLy4acasVrjYWQKWxEBILAOMRd1fSmOx795a9vWvX41nHt0846/ayXnfXSw+cuaAFHMPQXmUoUKPBXQB6RVcAbNm+87sYbf3/z71etecS1JOPnjjjy6EOPOnTO9Mnj2lqjOFbjnLKKMMg4JQDMrF7SoyFaayj8KWQJaH4PkaqKbyoo/J/zEeLcvI0Yub4zjHuF2S9DflQgWDTCE7MUVP8s6iJEIAJlqXMZ0a7eyqNrNz782JNde2ugFoFRODLWwJEQKTuGQAyYIAJSw0rWd5KNiWLXYrronl8/2LO596TlR/7w5z8eN7YjNsY/oeHJ/NTxwkgAvvLYvHXbUaccl46tHnfhGZZrFRkQQ7CSOUdQOJek/RM6Wl9++mkHz5naZOCnxJkiUiIxOfuT+5CoXx7qV8VB4EBEogwoqacyfcMYEKUIStJU6k8rt956389/esOqx7eNGD3nrJPPf/35Fy2ZOceAymAz3C+nAs8BPAMvBAL115J7H3r4J7/+2T333NLVtXHslI7lx8497KgDFy+cO3JUeylWC0tgJTFKAlEvugcF8Sbys7K3XPOGnAjVP8LtQJ7xCb+K39XSeCQgGDYcCnsf1sNMgKogDHP5b8UAWJmhIFU4JsAwSJ1SKjKYuie2dt636olNm3e4RJ
RKpExEyhAVUiFiEpLAGoFASuxUNEqIjYEpm5grzY9c+9CutbuXTFv0jW9964hDDzT+KxqK1OGLF0oCgEKc6m9+8YtXvueS5S85euKBEweywUxr1jkTGSsimYhTSapjmnHmMcuPP/zg5sioqiFWq4YiUi9fUKY6W0l+16iSqOeJQkLQkCIAf7QVkGFvSshiSjsHq9ff+vDvfnvjunWbxjZPPe1F55592kVHLVnYQRJTxODhflkV+HtBg0cCtu3a9btrrvnlb3+15slVKNfmHDDx6JMOOPzgA2dMHdfRGhtAyYpYRCzK3n6ZQNrg9sOlHwp58dL+oeyIKogRVKCad8+cOPWObBCtMz9EOfvvg2ygelQlEC6ihpkaCYCJXJgpBgmhkrrdXT1rNmxd8fiWzt6Ks8SI2c+rAQCcF5DmmSUXe7A4QgTiRKGxa22V8lP3bXns7sdGNXV88d++9IqXv7RsfKob7uSPxwsmAQBQSJLUXnfJW//77l+/+I1ncmtUc7VEqmqYQFnmDFitE1dr0+SE5YeefMzyjnJkSNmByUC99hn+EieVMPwRLnu/OVoJ6isjnyoUKl4z6n2sSJQYzGzivqzy4D2P/vq3N9316GbTPO+0Ey98w/nnHDp7XiviyC+tLtJAgb8JNHduIAd5ZNWqy3/5o9tuu2FX7562sbrsqFknHHvAkiXzxo0ZbZggRqBsIOJUlBhKBCHAASKob1jxpEzO86uGYVxP3AfalOqTXKoQVafqVBVB8h/uLCYi5sD/UBBeh+8afjHwds8c+raqIFViq6hmrqdSXb9916rHN2zevivJCABT5HVHnD9QZ3I9h7+Hc0WRH0s2LOWoyVTatq7csvr2lc2u/KF/+sClb3t7SzlmMvvxjt8/Fy+gBBBGwzZt2nLCmSfLBHvEWcfZkh2UAVERhTHGicIpaSbVWiTJsgPnn3XicRM72iImUoGCEEFVxYC8TgEA2Bf8pEJg9aq2QAH5M64fTBc0IrohGBAZI4Z7s+yOFU/+8re33LNi/biWqaccecL5x5x14qGHtiA2iIvLrMBfF+rDNmMwrd1x1z3/9bMf3P3A7VlUmTJr1FHHHXrkcYtnzxw3qpkiFrIsTE5MfbmVaDjmKgAWyfX77G+CoPn0TL9nRvNR3Xz5ih/TUn8sVlXAioqKk9zc3zM/FKx8hsTZ0J+goMZThTIZUTAYgACJo76ksn3P3pWPb3xqy87+fgvEhkPlBYWDEsBCgEqQcxAgvnmgJAQGRAmRchtaBjebW6+8OetJXnLmuV//96+OHTPCUCHa2AcvoAQQkDn73e99+72fvOyQM5aPXzAx4cyRtZKCmdQRGcksKUOsrfXOnT7p/BedMGfKxJgzEjDFANgZEMQ3oAQgIWVAXCMB+CvUZwgl3yyGMpMvmZgjWCJSYgMRbTIDWrvj3ke//8OrH9/Y09Ey66Rjznzry1934OQZJUi5EAoV+IvhY64K1FBP3+BVN/zh8p/+eNWqh6lZlh027tjjDz3siAPGTxpVLkVKNiIn6gBDnm9XFXGBepewLd0NEW3mPL0v48kTQARiXyl7qig37gF5V33fLUYmIip+t6PWx3ihnt4JS9opby8QSAXhXAGFkiHrkAkGK7K9t3/Nxg3rntrc05NaR0wxkwFSIkcaEUjyB5prkpDz+KFroQCRxGzipIX66NZf3ZX1ZYceuPQ/v/PtOdOmxFHsX8riZF7HCy8BCLRvoOf8l15457p7T3jpi1qmtlVsTcj6QoDViIoK4FSylCWdMCI659STDpg3o8wcCasKq/HD5whUoLfA9ZoHJfJXM4E0l7eFHEAUTAzh/QrBCopUwQrWxNCONP35Vbf+5orbdu0dWDhp6WvOftlFZ5w9uW1UjMgUMqEC/xcooCKBWN/d2/O73131Xz//4ap1D7WMbVl86OxTzzjhsKXTJoxqM3BkrCqDxFfUykxM6hzBl8YILL83aSAOKszAffofpgoIlHK3fqoX9qp5A5
gCAaWkUAuIqnjSlPKxKg01O7zcM3wTBohhRZWYhMiJy1SrSdY7UHv8qe1rN23dvqcrzWBMEwkRiFU1yphInQlSvTzw54d2AljhXyElIoaUyZQHRt171V27Nu6Z0DLhu//1/VNedDx5FVIR//fFCy8BAFDoqjVrTznvdB1vDz/35KglrUrFqgMROSaGqLCFqooVSLW5TGccf8zyJQs6mqIIYkDiW0XC3oiEIIAIsdf8AP6ylXBsDcdm0mB6yAohYvGeo4ghGnGmrA5OSmbjnr2/ueKGa294aNdeXbrghIsvesOLjzthFJVKGH524wX+AqhK6E+R2bRj+/d/cvmvr/nvrdvWjpnRevTx844++vBlByzqaG9mFmElIieOjVGnrMowjklFiMQQiSOAHSsJAMdB4uALHIiP/76uz+v3YM8PwAdezxeR1jWc/qBsxY/eE0GCbiJ8N9Ig9vdnbTLqHf0FIMeSqVYz7R6sbtnVuXbT5i3bOgf6UmPKZIjgGGABhWF98sPDSs6rjBp6paAvVSFlEkZU0lKzLT9xx/p1DzzeSs3f+MY3zz373LgccejyPRdv5PMYL9QEYCHf+vo33vPpDx1y1lEzDhpflWqCTCSLKAKROGEYVVVxIiIubSJ71NIDTj582biO5shvJxWCI7+0zn9XCaYRQ1Rr4g+bgrx/lT8AkNc0g8TvsFZLgCEhZstcI7N6w5YfXXHtbXesddR89vJz3/yKVy6bvaQVJVMwQgX+N2hDk0kbtm/6rx/+4hdX/mzX3o2j57YsO3buCccdcciBs8a2tpWcg7c8Z2/FHGQ33vTHF8UIIhlShV/UAnUEGCUNzp1BCZFf93XvBwBhptiffTUvvfNmMQl8H1i8Z5DPFErBHVQp6IN8i5aF1AkzZVZqcANJumtv37qt25/asn1v7wAkYpT8KklBLSJiMaTs/PYwMCAS+hLqJT8CAgkRqYoSDEsJ5ajSMrCp59bf3MtVfPL/ffRd73xXuVwKr0QR/Z+BF2QCAKDQ/srgq1772utX33LGq0/RVk4pSZHmI4cMJ0SAaCYCkNiMJFs8dco5Jx8zfXIbO4koNiCyDIILs8IMWH+NiRc8i+8Q28bx0ZdB3sDQ1zee5gzDjewHGv3S6wFj71v5+I9+8bvHHt0ybuKCV5z1xte9+BWTyh0leHuK4jxQ4OkIgVhUmbfs3vX//ee3r77mt3t2bR47o/2Ek+Ydc/yyJQfNa20tRUDYkwvxXdzc/SEQ+t66JxfmhEzgzSHyQXgf9OtnXglCTT/K5fkerhf89Q4AEFhRqIq3/BQRqFfUGSH2oqCYIBCnjggckSrEKoBMMJC63f2Dm3Z2rnty6+7Orky0ThSFqRwmUoVwECIhtCZCC8BXbd6nlMWpEGesJtZSs5Z7NtQeuumRyp7qG1/zmk9/8pNtrS3GRKFyKxLAM/BCTQAABPrUpqfOuPCcWsvAEecclzRVU3IQIYhXi0FFFETsRNQRCzgbnDSm9YwTDz9k/tyyMpwYYnHKbES9/UlQB4VBMPWXoPMfIZJgfhgEEvlccUgN8G0z31swpIjiGsuOysDvrr/hiqtu3bHFnH3cue96ySVHLjqoTEyFIKFAAyFEiQgz7+zZ893Lf/zj3/x8264NY6a0HXHU/FNfdPTSg2aOaCsTRMkvQfXB0LMj+Z2seYtKxct98pkXX9zXjwcauqh5ad/w86wrPYc8tiETwCFteJWoiCjI/05goljA4rl4tQJRUmWIiBOxVtJU9vZXN+/eu37bru079iY18ZYVxKqwgABMavKEQ/7wTfXphJCglNSvnCdVp6xMLkYUJU2y1973h5V7NnadeMzRP/7Rj8ePHRlxpMED5u/0Rr6w8AJOAApkyK644oq3vu9t4w+ZuORFSxNKrKSAdf7KFvhLiADnhJVUXZbVxrTEpxx95PID5o8ox6RCApAJ8wGhCSyAyW+mIADVoDbzGranXU6BHiKqM6tEbCAgIo0oMdWntm/+zu
VX3n7XxinNiy59/WUvOeXFI+JyDCmawwXgqRgBM3UN9Hz/Jz/5yZU/27rzyVHTSkcdOffYo5YffsCi9rYykzoSeGIcyA+jyDer1G9kyv8UpnsDsZPT9koqon4QHgj9XBFP1/glSkCdM/EHEv99Ke8ThIRDXv/jVDRkJCMEcWBWglNC5pwVcUA1k56B2vbO7k0792zetmtwMIOqgd/+4psQyv54ovnzCcO6AhDyif18IAHwlRqlBDZqmqkJPfGDNzy4Z333wXOXfPNb3zxw8aLI8PC2evvf8QJOAAAEUstq//KpT37tx18/5uyj22eNqnFqtaZEqo5gFGAJ9I0TpxASTWvVFoPjD11y4mHLxo7oMETqREUIkZLjBhEa7i8ln0xECeHec0RhX13QBWlDYoD81M0qiNiQOI4ki6gzG/zNzXf/+pe3bN1izz7qle98zRuWzZnVFLoCxUU6HOHX8IoAzP21ym+uvOq/fvrd9TufbJvetOigSccfc+jhh8wb39FRsqrsRAj1KXYf/clLeHKJWj3U56RJkOvkPS0o/KZe9QqHoNgRbxzKxJ7HrF+L/svrwmj/dyAMDWsw0yUnTgC/R95bulmxRJo5l6lUreutVHfu7d24Y/e2XXv7uqrQCDBEatj5l8AnGIaycP2+ygNTSEU6lHSlsOeFODMwURaXk/ipe3euffCJCW2jvvT5L5533tkRc5hgK+6tP44XdgIAoNAdnXve/M4337vyzqNfdgqNsRWXgpxv4YKUwXCOiKHigqoTBKDat3jOrDNOOnL62NElIogwGQCqjoMHlm9twUEZwgoldSSkIGFPSQ6pkii/yYQoSIiIoGpY1Te+TDlOIlq97fF///Yv7rl548Sxi9/z+nf+w9kXtnG5uRgZG27I624hZILrb7vlW5d/Z83aR7nVLT1k4lHHLjv++IM7RrSwKBkYQOAgPkLnGsw6K5KH8vrfNBTO/mNat+7XOgDJc4aXcALeuJ+CnUNOHOXNYfH0S5gBU1aCUMgkDlAigTpYUQdRq5KJ1qz0VJI9PdUtO7u27tqzt6cvS21EhhSG/EiCI5AGGYYy6n1qYMhTyo/i+RMm369wIDJEZSppX9S1fu/DN61p0ZYPf/iyt77pzeVSFG7nIYej4hZ7JvaHBGAhDz744AWvfiVP4uVnLsvKaQLr95ESK4QZ0NzOwVkHYnWi4pytTBnbcs5Jxx4wY37JEMFChWG8KShBhYzm3BBBPKfpDUIBhJqfJJc9U16ThauOCKpBbS3KEbMS0lh3JNVfXnHND398TXVXfPbJF77/kvcum7OgWSNDxRr6/R4KqLrQYwXTqsef/Op3v33TrVdmcW3qvLGnnXnSqScunjS6g52FcSJERutLVLya5xl6lnCN5t8fOTeUMzx51siHt4KBs4fkS32JiL1ZP/mrOrSVw4/IWwq5hSI5iKoQOFMIkXWZlTRzqapmTvqrdm9/dUdnz5Yd3bv39FWqCRlm8tt5XZhJo/p38021MIhcd5sIDQ0EOjb3lFYhEDkoSjDNtty3Ue67/t5ql73oJS/7yte+2NHaxBSFs1ER9/9HvOATAACBJs5974eXv/cTl80/ZOqi45ckcVKVhIj9ajt4Z3JVEiWosMKReJGEq7SV+IxjjjviwEVtTQKbRhSrEAuBfY0TWgCEUDYB6iUT4Vr1wwHwU5P142YghoRUVISZ/eiLMkFdRBWu3b9i9eU/uPquO9aNbV/yT+/66KvPOX+UiWIwF12B/RgqEiifaPue3d/58fd+ftUv96ads6bxCS9afvrJx0+ZPDYmq6xMxqljZkggfpSG2LMN4RqBIbR//efkCp+ha7xyOogU6sSrjfKLO6+Q6ttfNGeHKARnEmj954lCBU6dgDMVqy7LMutSByTWdfdXd3YNbtm1d9fu7v7BQXA4SoPhVBnMwoC6XObDQV6Hhkln/jTyv3gTIAUp4BQwoJjiKCm7Lnvf7x8c2F098dgTvvkf35o8frThop
D6U7E/JAAADtpbGbjkzW+76pbfHXn+keMWjqogcWoVDmxUlARMhkQ8exh0bCIiakhKNjnp6MOPPXRxRymKxCjEIBZ1YBZ1/t7z9uFhDEUlmEkg9KoAhJkAYEgVJl4r5A0MScFkmKAOHHFC2Ny96/s/uvK3V9yfDERnnvDyD7/zHctmLygXR4H9E6oSLEYSMr+++nffu/w/1mxaGY/SZccecuG5y5cunN9KRJyJCAwDLKLGH181X6JYL2hzpSfwtIDZ+D1IPP1yRnizcwmfn3duJbBFqgrv3EmU93qD6afXRJPPFl5xJAoRQGBVrGotSTJ1zrlMtL+Sdvb3bd/dtW1HT99ALbOqKoYc+4YxUUbKYOOYCJYABYeJm/qKeWo8g/rsbtAs+d61IyWjVHKluL/j/j/c07W1e8rIyd//8Y8OX3awKSj/Pwf7SQIAYIH773/gwosu2FvqOelVp5ZGGStJhlRAhplBEOXgRAKCKPkbgNSRUUhWOeSgmeeccOKYptioiyhWgnNKfkeRCupug160AZVwLBfKZdMCAwgFsTKFo7hPFRw8c6FMyszkIBTxIOxtN935ve9duXJN16gJC975+ne/5cKXj4ybS36rZYEXLvZhH4IIJyOs37nzS9/4+jU3XSPZpvmLpp//0lOPOvqQMe0lQRYxhKwTZmYi9ttNPMdfD4oAUBfCPMvPDPy5Br1Onf/x/y75b6qqoiJhbMCfBcIqXg2eQFAiFSJlUUskQmRFxPOhKjZ1mbWpuNRmDlFidU9PdVtn97bde3p7+2tJJk68HSiHgwY7qBIxhLT+hJ7xPOqyVVL2ev/cj8LLjpg0QhS5KKpFT9y5cfPqLe2m5fOf/+JLL7ggMly3oCjunj8F+08CEMDBfe87333bRy8bv3DkoWcfFrVI1Vn1x2aBIVIhDmJiXxyJYWOdQkidZRpcOGPyWcceNX3cGFYDUoZRtewnykKh74U/gRtVKOcyCSj7+1VJWf3p1X9OzmT607T6ZXYAGWYlEznI41u2Xv7Lq664+h7XM+olZ1/4sbe8a9GU2TEiQwUd9MJHMBRHJRv86bVXf/Py761/as2oia1nnHLwBeefOnPauAgJqYpxfsZQyG/I9bJjqosi95F6/lFtS+NAClVRqTcAKHyRBiG/wIVOQCCGcjsgN1RIRGA4EhKCONJMXKqSOZvazFpnnROlJJO+Srars2/zju49XX21LLPOMSl7ZZDfUUmsfr/k0H1LVM8BecxWaONko1DiYCnh638HVqMUa4kG4l2rd6y8Yy1XzQc++L7L3nuZn/hVBReh/0/G/pMAACi0d3Dw3Ze97/Jf/eCA0w5asHxWgjQlUbjALKoh8hPs5PsBSmBQ5hyBnE1J7cyxHWefeOLsaRNiOAb7nXaqqjD5xepPA6J5saX5vZPfQn4s3vcGgDDDmIsbvIOtKBkGG3VgApWiXtt9+933f+O7N6xb0z138tLPfPBfT11+bDMoghZHgRcoQhEOqOrj65/6t29/4Q933FDlviOWTLroFeccc8QhLc0Ra+bUMbFAocrEEqbZBSR+whxDI38eK5/OBw35LM3bwHXVfqhfwuaUBsUv9QGwXDfqt8IPPTKIwu9LytSmWZaKTVyWWmcdWSfVmnT3Vbbs6dzVubdSSdPMAgQiOGcCdQOXG8/lyyDzo3RAo3G27wf9/ZYvoVElcowolnIpizvX9a64dXXSnf7DRa/4zKc+NaKjPYpi5FmrwJ+I/SoBAGqhG7dsPvOcczb1bT3m3ENGzR6VRJJKFvlChKJwVQVPEb/JQshQlgmRsABZdeKo9pOPXX7g7JnNJiKjJKKqTHFwjSPJe1X12RQ/yZI77SqDIGFYwBc0QTEdhvRzkZCAVE2JDakzRmuRPrRl039c/vvbr3+s2Yx9w3lvec/rXjttxKgIUdEZfsFBFArHxJ29fT/89Y9+/Oufb+rd1DoSLzrlyFe85Nh50yYZB5BTsSAimCCIVwNlgoDdkIsFuX
gZCDXHkL/to3Wsf0b9EEph02/e3ZV6Dwy+uSD+RBxEniEpiAKicBDnuX7RxKWJtYnLrKgVVDPp70927Ozetbenp38gTVMK/kOsChXHAINV4cJqSM3ZGR2SAKjxNMKzUuRpwjuuKDGpAxGTMxpHSVOyPX3k5sf2bOo6+vAjLv/h5VMmjotNVAh+/g/YzxJAKLx/c/Vv33jpW9E2eMwFJ5lxUU1Tz2eGLXjqnQvZi5pJRcnLIBRKEFZx7SV30pFHHHbg7LaSifx+CxdxsAtVABzOAarCSiocmFPKJdpK4duGK79xeedGWwQvr2aNSRExrGrWZHbb6m+vuvEH37+mazOOXHbcpz7w0SMXLC0BMYrO8AsFeZBlvf/hh77+3f+4a8UDA9I9a/7EN7zqxccedWBrkzrNmNmpsFLExjkhVYoA4WC3kHsUYt/Wbh1PD3b7KIKonjAofzC+caAQUXj3NB/6pXEgIP+o/WJ4B7WqVjVzLnNZarNMbOrEilYzqdTcrr39uzq7u7oHKtWqN1JU56IgmQbgAm8DkSCKftqL9MwnsM8/UW4OpACRJeJIo9iadI9Zc/uaXRs6F0yb96WvfOnYo5cbL1/C0/ouBf537G8JAIAANal+9MMf+8p3vjlz2bRFJy7kdrVqhXzoVYChQsI+PMO7RhCUSKyIt/m0tZKxRy6dd9QhB4xrHxlHETsip+wvSxBQnxfzU5EC1Osq3/jN98n4E4AfsMmdWZCLlJnhHDMzA06ImWxEfcbe9vCq7/3Hb9Y+tH58x7wPXXrZa859WbtpjhEVl/fzHKri25Z91d7/+smPf/zLHz+5ZW37pNFnnXfcy1922rRxreyqpI5iOCL2VYhzgDHMfhOL10Tmgk/Vp0dKHVo1h9+GBr59FKIhqtdr7nACyHU+IlBVbwetAoH6UWGnyNRl4mrOZS7LXGadS61LrVSq0jNQ2b27Z3dnX99ANSib4KDeBAJO1JBhFYTDcVgvU3ehG6LzGYr8aeSpy2/rVlGNwGSNcmyb4n6z5vaNTz66fvzIsV/47BcvvPB8w/4r8h5HgT8H+2EC8Bf3Uxs3vvHt77hzxe0HHj9v1qGzXJwlsEJgZhIiOFYDeHEPsThCGH0Rhljx2h8jgwctnHPs4QdP6OhoNqbE7J2kvfUuyFCYyZT6jVov2Hwe2Hdwn3Nbq/wQTLkIj1TFGKIIEDiJ1Jpo4+5dv/jJDVf89s6B/ui8U1//wXe9b+GkSS0wcVHkPF8h6pTgrL3v0Qe+9p//fvMdN2bNtcOOnPnq155/+AGLmmJm40QdHEUmFlKxlgyIWCVWxERCmqlaHiLz2ScBUFCiDS0j/D80PkcRvBXCIUJFXW7M7+94AUIHWAQKsYAqiaj/n4Vm0ExcmmWpuMylicvSTAbTrHsw2dPZ39nZ0987YFVBpC7zU2EgBRnx37LB9/jjMlCnf+iZoX8ocuqfAvsDqJBGTCWUqKe0fdXOlbeuaTMtH//Ex974+oubmiKGKer+/zP2wwSAnAi64ebb3nTpW7or2444f3n7rLaEMkeqcAaGoH5tO4XaxI/LiD90OlGAWEVExFbnz516/LIDZ48b3xybEjNISQ2UVA2C0bhQ0FKT1tdO5BVQeEBh6TzqUoyhgo2wi8zPxTMBQmwkMnv6kyuuue5Hv7x555N2wqyF//T6d7z23LPb0FwqjgLPL6iE4tp19/d/77++85Nf/XRj5/qxk5rOevkp519w/Ixxo2JnlZyoM2xEDSk7ccawEsQJU+xUFWzgmBzljDyAXHcQCJ1QRe8b8yhw5/ULShsMY8POYUhTGjnP4zzbA1FVp1aciGbiMriaOBFXTZIks5U07c9sZ8/gzt29/b2VLLHOQWABB3GsIPUiH5Lcrk6Dj3/oYFBjNPl/BvnDj4alBOwTilHT4pr3rqk8cPODaY+9+DVv+Myn/19HexNTQf3/Rdg/E4BHNXVf/urXPvOlT8WToyNfsjweEaVwDqkSE4OFySt71G
+Y8Jeo522cMsQpawzi1PZOHTf25OWHzJ82oa0pJqeGS0RgjUQkdIMhuZoCCt9mgOeC/c0M5LeC/7PPDo2bulH1EAHEqkysMJFluX/12u//6Ppb7nmM0hGvOPeVH3r7e+e2T2rCcJCIDr046Zn/UFfC1GPlcxELVL2vDdy9D9z32a98/r6HH9KouuyIRa9//SnLDp1TLhvnDaYgKkxU32erAPtLRVSU/F89ie63ndAQ5ianHhX5YG94CfKHgIaJj1KQJIS6X+qKB1EVkBUnIPHqGhHnnFNVJ367S+qshaRqa2lSqWWVVLr6azu7+/d291UGa1nNQnxSCu0DViJl8akgZByw5go4Pz+jQxmg+jOov78NAssrRUGq5FQ1Io4RRdWWwW2Vh254tNJZO+3kU7/2la9NnjS22PD+l2N/TgACdPX1vu+y9/70ql9NO3DikpMWoY0TTeFXzJEh9dv0QjkV6i7UbxvykmXrLDSdNKZ09LJFS+bO7yi3lGAIYijyCmeEUYDGS6m+uZAXZTKkAxZWzWhY0+Q/7pvDoNCkBlgIhsg3CawpbRro/vGvr7vyqrt7dw4esvC0z7/vY8fNX1ymeP9iPfPzUkO0jmfkgJClRTRzjuufx0SkTOyJN/LZ8W/+2iigTkDMu7t3ffcH37/8Z5fv3Lt1/LQJZ5934ktfdurkiQauAlIxCoo4cBp1yQtI2St2csIjbNMiIl9BB5scFdQXIDaKCKA+KeAPlLnQJ99gp+EoqkJghQpUFFbUqrNKwiqSQURcyA2ZOCc2cTYTO1hL+qvpQDXt7qvu6Rro7q9WK0kwE1KHsHAgbEeC+E5Y/SYgqh9Scu6pEfr35bHQeC5hZAdgMPyeSRZplibZFd/7+3u6dw0umL74e9//7tJDFtf1UShOAH8B9ucEAECBNRueeM3rX//ImgeXnH7Q7GUzXGQdnIMFU96YUhYmSH3Sl+BUiZmcd7olBtTZ/hGtTYcfdMDSefPHd7SWGYZYRZmisHggXOj+HpGhnblQgGleq4a7eh/9czgH0D5nZaUwNSYxVYD7V6766S//cOedW2aMOvj/u+yTpxx+dMQcPV9PAs+s7v7HT27UqoCKIs3sQH//jl27O/fs7u7t6evtySRL06zcVM6sG6gM9vT128wmSTViU2ppcqkbO3bcvNlzx42fNG706IljRpfLUVyK/TpDQ/yMPSd/+fNTvyvugRUPfebLn7n99lu5xR162PzXvOYlhy9fWGqyhNTBhsXlBIIjcEPar/my9PDn0K0lzk+VnrbJXxxqhM88yIPrtI8Gk7ScbA8lNpGKP5GKqkCdinXi1FpVR8453/OFiFjJErECrWV2oFrr6a/21bK+anVvz2BPb59NRazzvlrqlP00l4ZjBITDo/B5zT+cp0X5ISNqz/6KEsLUr0KgBmCKjGXpliduW79j7fYx7WO/8Y1vnnbKyVEc+dehCP1/Ifb7BKAWcvXvr3n1xRenLdmyk5dMOWBCFkuKTMhBlWEIZBwxYHMelfPxGa+ZIyZPlopN25uipQvmHLpw3vjR7c1RHEURCef7w4TrjGs4CjfiuJ+DQaOWCwUfoV7Ged6TtXHP+54AVNgQwFSL5PHde358xXXXX79ybN/Ed7/2bW+86B/ao5bn4Z5hzfPfH79FNS9lVZRFUatVO/fs2bDhqYcffXjNujXbtm/v6u3p7u2vJbU0TVWsU2vVqYqIOOeEFAo2KJfLxnCaOYW2tbeQxi2l5mljx02aPGXhggMXLFgwd878OTOnx1FcMjERG/4rvFzeTafXuu/++Hs/+N5/bNmzoWNM6czzTnnlRWfNnDKKURNKvbuxCoiMhl5RfuTbR8vTODoOec1kSAcpBEc/eKKhpdQoN1TVEamKaYRZQnCo9Ttb2Hllp1hxDhArzkIyp1YU4pyIqEs0G8xsf3+tb7DWV0l6+ivdfYMDlTRLs+ClSKHYz+t5/+DUe/l4+TPqyqV67Y+hXQt9+pMe8ioQQM
wqKoAhjTV2vbzjkW1r715fds2f+MRH3vaWN8eluBiL+WthP08AAARazexn/u2zn/7K59onlI44+7C2qc0Jpc4IEdSCiY0w+/KJnFNiCn+DAhA/jiJKRCRZzZCbP3PikcsOmD5pdAmmKWplgIVUnd8EELxCwyWde+gC+dJ5XzkF/p/UazYoN31h1Ft3DAAOSgBzpM4quTQq70Xt+tvu/9Xlt+/YUDnrqLM/+5FPTG4dU6Lnd1u4MaApAJxTBVmSgYHeNWsff/CBRx56+JEn1q3t6u4eSPoFKbHjSIVJVRhKjFLJNLWWOka0t7e3jh4zcvSYkR0d7c3NTa1tzeWmUpamLW1NAKx1g9W0Vks2P7F9/aYdu3d0VWuuOWqaPnnGvJkLTjjuxMOXHTp+/FhDbGCYzf8pkKjXem7Z0fmFb339V1f8rFbdNnvhhDe86cJTTz+mqcTOJcziJwENsQuGtDk9Ei6SeoYHUI+NfhbFR0y3z8+sSwf8xrl8n7tI6F45v5U6/N9bx7GCnDoFWYWFs5I3ep2IFQexKpk6kHMqSWYHklr3YNo3kAxUsv6Bak9//0ClkomQsoqy99ESQjjghg15+fPKj8GN6J8/y30+0vho429K+cSv9/4RhjFZFGfxtkd3P/XARhmQi1/z6n/+2Mfb2lr8xO/TvkWB/xv2/wSggACdPV3/8OpX33jXLdOXTV183Mx4VJONnCPHjgggjT3fn7fL6ptWw4yAKphZRFUc1JFWp08bc+hBi+ZNmToibo2ZDNjfvXXi2u9L8qM1LtyWDAVBcqopfCYNuUG8+EE1d7QKv4mSUVFiTS20HPXB3vzAI1f//sE1d21dNHneVz78meULl0Vget5VRjrkV6+xhUbo6u+//vpbrrr+D3fefXN/X49zKUgQaxRzeXRpyuSxU6ePnztvese4MR0j2pqbSx3trU0tTU3lUmRMZCIiEBOTP5z5Jr4wE1SdCEVGBWpN6rRSSbZv79m0fvOdt97/+Mp1PXv6Jo2asGD2whe96PSTj3/R3DnzWuMSEfhPPhMoBCrk3Ion13/kc5+57rprmjtw+pkHXnLpy+fNmqKuAhGw+tWKwWsnED2SE1zkWSnsUwcHm9kws0X7MP3wn+tFx+y9hSAQERWR3CjO+GuOgzKNoSyAQC1JJs6qiDrnnHXirIpVUacqjpEhq2ZJ70Cld2BwoGIHU+kfqPb1Dg5UE7HW20SLn3iE73g1+rh5By3v+Ybj69Mq/GcMMwz5ECEcHZRJRZTFQEqIqb+pZ93eFbc/VenMTj/lhG//57fHjfE7fusvaoG/FPt/AkDIAXrPvfe/4W1vemrPhvlHzZh/+CzXJCllngLyigsWZVUOXVtW0voEL/tGmyoYXi/nJJkweuRRBx+0ePqkkc1NJTZkQELqHLEBIEHOoeEOobp/hKuT0Y3bJJynG6gvbAWHKWUHIsMAQa2QVCJ5dO2ma//wyD23rW2vjviXf/znc089pwkxP78KIw2eCGCxApg1a9Z+5Tv/ce0t1+/esYmbyMSutZ0nTxyzcPHMJQfNmzpr6sTJ45ubSnGkyiKxdxTzbRjyByo/IOU7JeS9AohUhZiIyO+2ZWLNlI0RAbFxlpIk2bl5z4r7HnvozhWrH91YqWhH29gD5i0554yzTzjhhPkL5sRU4mDU9D89HVWr0Ntuv+2yf/now+tXLZw1+vVvuODFZx/VPiISTQTC/gRJpE680pg0tzILuyWo/n5T+KjmjW/NB8lBqn7/lyJvGKt43tCJSpjd9TIzH5ENUd02M8ykCyiDpGSty1ScWOecWFUnoipKLM5kkO5qz96+/r7BWs26wUrS1zfQNziYpZbVRGpUoCJCYXlvLmEGDeF2/L0S7CfCqsh9r4JwodPQv/jf2Z+tlQVCJlWgJKWyLfWsq625Y+3AruTwA5d+5etfO2DxPGaf4Ar81TAsEgAABZzqz37x80s/8I+2qXrIqUsmzR+TcOZUHTkxrAoWGFAkzF
AXump5p8lf86TELM4KQ1WyNB3T1nL0wQuXzJkxqrUtirlEgKgKExPAKg7wDrh1DyLK6R/N/+YLPp8LpH4eCExq+FRhZgcIERNYBVAhVJVXbdl2860r77lu1cCe2sUXvvldr3vL+JZRudfEc4zQRiGoaC1LNzz11De+8d0rb7hiV/eOljGlaXNHL1w8b/EhC+fNmz5u3Mi4pJEhGEdWhRxBxKgwwUBDG4YJDFEirnfN66N2dbtkf1ajsD1FVMkYoyoRlURALqrW3JNrtz9w94q7br1vx1M705qMHz1+6YFLTzrm9FNfdNK0aVPLpSYQ8TMkhv4SsEl65bVXfPwrn96w48mjT136lkvOPfTAhayZaE3VMbPhWKDihCloC1iBYD2I8LgDTa5ELCrBNyInDInzPsAQw7eQQER8Izc0m7xOVOEXuSiUvA8PIKKqZElT2EydtVbEwqqIOIVTAcOCBit2b3d3V6WvYm3iXP9AOtAzkKRpBoVTcspCEJ9g4TX+aAyxhARACDPMqGeFZ/Z5h8SZxnMKRwcikH/X2VRjirUao0sfuXZd99bBaWPGffUrXz3ljJNi5vAjng8X9/6C4ZIAPAbS2gc//JHv/+Q/W6e0LT3tkKbxTY4yx9axMhE5IihLxCAhF8bnwwZIX1rle/JAorBwpGiGO+ygAw45aNao5lKz4ZIxRiKIBTgch4nFkZeBB59HL5/Tes1Pfj2efztybgDq7YaUiYKo1BKIYEBgkCID1SKzYXvn3XevvvKqu3Zt6Dtz+Ys//b7/N3viVCLhsETpuTkQKMQfl3p6B2++/e6vff2rq1bdN1jrXHjw1NNOW7Zk8dxZi2Y1jSwrG6vWsTg4JSg7dhQWjOdREABxUEcRyK/0ITCFT8ojAu2jq0Hg0zhvhhoVIUTOOgFHHA32DD65atMd1971yH1rtm/vTQYxYeK0Q5cuPf/cc4478biJY8YR4hLHwdlJAYWDXv7jH37iXz+YlAbf8PbTXvaKMyaMak9cQv686NSYsN0EREYJzH51kFLuNIUg/lVAxesyVXNn2aEL2Sn/my+5RQGIKETyPS3IzcVBCmUVJf9jIAQnkjmbqXXWM0Uk4pfDWLDRGH3V6p6u7u6+gcFqasWkVgb6BwYGK4m1NpNQ4CtYPemvrn50afA/ucOQEvx6JKrnh32vhfyXxnuUW6OD/NSYwjgSisSUtVzZiXUPPLlz7d5xrWM/9JEPXPza18TRs6TkAn85hlcCcMDmLZtee/HrH1736JQDJ84/bh63cyqpshKJOmLASOw/1wddH5bDOdd3CIgIZJ0fVCHNsiiSJYtmLD9w/qSOkU2xKStDJDKxN11UqN8nFuigOuUfiNq8lAp7OQik0jgU+PpOGQh3om9PkIDYCRBFFmZbte+mu1dc+7t7tq3ZuWTEQV/8508vP/hQCuvJ/s4IT5kJvf09V19zzec+/9UntzzRPC4+aPm8C151/LypU8d0NMWxJC61ZbUKMJwTP/rARCriN3gidNL98/ADejnFETIDPUNJQ0MTAOVyFaIwoeS3njg/leQgIM1MT3d13ertd9503123P7Rn4142mDl56rnnnPPS8y46/JBDS2j2KxwrSe3b3//yv339C2PHj3jXB99w2umzmyLnRC1UyRAZdX4YUAlKZAAwmTAllh+FRJWYfDnvxI/f5poZ8tu48iZSPtalCvEObsG20wvE8qQXagX1amHrVKCZOutc6jKFSgYVwBlVopg1wmAt2dm1e09XVyVNrKgVrlSz/p5KUsuciMvDOKS+8y5E9XyjUZ4dQjOg0dIeInxD/Vns2/zNw79fBObbyKowAJIIxtTKcT8/dsfObY/vKim/5lWv+uQn/7mttUQUFbX/3wLDKwH4p3r7nbe/+R1v39G3+YBTlk5cMj7h1JEVzfzeMHYmfC4pS51u9rFI8gtaSUUZTkRZM+fYpQtmzjj8wEXTx40a0RQZJQgz+/vZz8oEjtc/klDcNYYj/fdVgO
tGvoH1DmE/TK8Fy174Da5Myk7Flajbpg8+vPam61Y+csuTLRj9ics+dd7pZ7YSR3/Hm6YxC6248qqr/uVzn3hy06oZs9tOOe+oY05ZPm7SuJZSRBCIOlJLyMgSlIMclAWqQszwpS7q7cHQPRFfytcrx8YTy1vm/gtUiSjvoEoQ2gYqCggLdlnF+entWIWVS+TcYHfvqvvWXveb2x+5f01flxs3ctzJR5182bv+ceGiA5XSL3/lk9/53ueXvWjJ695z6ey5k0eUegQCihQsAsMEziWX+fsYrh7/8L3Al/KTnap1TuH3w4cukWH2DWKpXxC+PPbsvwben4jgt1SAvJ+5KhFTps4661SdU+esSNjHogpCzJHpz9z2nq5de7p6B3qdAxFnNqtUqwOVmkudComQhBe7XsrXtcxDq/v8tW/84z4ORX/k6sgvdICUmMiFAXwHkFGUJOa+5nX3PbFtZTcl5txzzvzs5z87blQ7m+e3wu2FjOGVAAAoYNV+59vf+cDHPtQxdcTikxa1TW9JTWY18e1W4ghKrEJgrxR1DB+aAD+oyYAFQH64hiwIWerE2umTxh5xyPy5k8a3NzWVEBs23uXTMEGEQi1oglYCyG+khl2EJ4LzjzABAgllYdAlCUBgf/OLIQbYqUUc94u7f81TN1374B23rs32lt/+2rf+4xvfNKbcxvirz0A9C0SFlIhp1drVn/vc52+4/ZqW8fTil5/44jMOnThxFCIVdmJFCSpGhBz785GK39JZX1dYz5NhSy2RkpCCBUTsY2n96RAQhE95WyCEXE+0UENuFag7VbATqz6PgFKnRJEyomDDibSqG9dtu+3399x+9b27dvSOiMctWjS/FCe7dj1+zgWHn/8PL+6YNlPVxZwwVCXyuq1QCqsQwYmru/D7xqxn/YIuH2Ebr/PBO/QwiABmDpaxwVc27w/4XKB5gZ2fcvLmgThRUcrEZmpFVR1UxLtdMRNFxlrd09e7Ydeuzt5eK2qisqSoVpK+an+WZdYJBEZZ/fWm4cnk15t/gRsDZqE0aRxknyliyL8o/EL12iAMbsO//XAEQmaUjbIZ4F0rKmvueaJs4/nT5vznD7+7cMFsJi5q/78dhl0CAKBA78DAhz7ywR/87MfjZo868PRl0WitaVWMAmJgfIBhgVFvzAaFOMqjSl6VB+qT1akDiMBpbXD0qLalBy1YMGPK2Pb2EkzMkaGMVJiIvD5FYhB8TAvRLlD8+UR/PmUTlmgA4XAdPkcIrKSiYgyp8y3iKLUO5bhC9MSmjTdc9/ANV6/o3F47ffkpX/3kZ2aMmRAh+tutFROIWuXI9A1U/v3rX/qvH34nS/Zc8JrTzn/1GSPHdBBniVTJUL5YMCcQmCDOKVRFfVekHrrrRsh+E0Pwx5dc5EL5wDbymN+oOnO7zCF9YhUCe6mXd1gTqFM456xXtxjjxJo4surNeqKylpqpqbpt8JY/3H7dtXd37u2cOX3CP77/jVNmmaY246IyxaUYpYjInzCUXJiBDS4MYd+6eNmPf9CB/AlVfGj2EIkIhRmq8PD9XKBVp+GMKKE40CHDX0GDL1acc+JERODUibh8GzUBkaqpOe3q792xa8+evs5EbKkck0ZJ1fb31QZqSeKsim/DciAjneO8c9QI8PgjOk6tv1VDvqDxCY1PRH6kITDlD15JlWCUIhfJAPq39D9281NRWp40cuRX//1rxx57jDGF0f/fFsM0AQh0/dYNb7j4jfc8+MCCYxfMPGISOshypqykhiFQNgArSEnC4C5yWkHrwy9KyvnIpmf8rU3bO6IlC6cfOGvOuPaOlrhkyDVFrCIMJjEkkebkZwhkngjdp0ryN12eF/KaV/OyKwhK/UYnFTAE5EAOaiPdsLP75lsfvuHaFduf6DlszkFf+NinDl+4NMLfYsu8Oieet3jgoYc/8IkPPvTIrSecduRb3vGS2bOnWJM4JKl1ZAhkAInIqSMLqFfJixP1Hgjk1FfDSmRyrk
GFiUB1m/xAkTdmJNBom9YpoPDnPJEE9Tzlk3XkxDmoU3Eifn+hZ4msqIthIiOixmpJo7LGMUU9tgoXlazEJalUt8fNOnL0+BFNowzY5MYddfbLT8R6aaaqOoV4foYA7wxI9aga7jzfqc5zvI/FUIULjD81Jh1CzoA/CoiKc86GF1HVqVMBlJmU2KlmmeuvZVt2de3q3p05GzdFTJzWXLVa6+0ZrCZOlYkJzNY5hBkDUJiEx9Cgq89MAE/n+huJImTxQJbSkK/0l60B/HJ3QB0RIhfFtbhnQ7rq1sek17Vw+Yv//rmXXXAhM/t37/khats/MRwTADwRBHfVlb9/+zvftSfdfeiLD5u4aLwrJZYSITXEJAbQfLuvI8A4Y1QBCMPmtzEIJJqHbN/Molpaa2riBTOnHLJgztRxo5ojU2IyhAiGYEhiDZ8vQRUaXIRyLrguF/Q/QYVyoiMvAUkhYPbaEWVkYn3tSCAll0K293TffttjN1+3au1jOye3zfzy//vc6UccV6bI/PVuJYU6cRFHfX29n/3C537wi69PXjDikvdcePAhS9rKpcylGrtMLCM2YFJ2LkPkoGTJKFid8wMOyH0lnToQGFGDV/Yew0EGD2JDwf2mHmL2jQ3hYKBQMHnNOgcGBhAY55ySJtam1qaaWQlbGaM4hpCyOhKoRCYicawaUSkTtaxEqNXSpFYj6PixIya0j2pjikCqxuuRSASo28oq/CpdiCBkAL+hHb6nj7rXCKmqYQ6nndz+QaHO+4vktYaX/qj4nrkTVVHJrPMsEggK57cRgLiSaddgpbOnp7N3b6WWxOVyk2mWVCqDg339lf5a1TkFmBwEXuXmtxtRw0cOQN3Q7Zmx/tl+p3qv+GmHsvA8/LtFIhGgxJmQsMJoxNVSZXt17e3raztsE0Wf/rf/9/JXvKypKSY2/qcX8f9vh2GaAOAXh6Xp17781X/+/KeaxpaOOOfItmlxQomjDBCDkpA4kAETCYsaMaxKgKPg7wUguPyoMrMT543bMoWqi5HNmzV56cJ5k8aMaI+jplIcc6wOhkqAAk7hOHim1/nUcKrI1R2Ur9UISiFt3HUqBCVRJQcXblUVE7HLnDJZoj29tTsfXnXtdStX3b9tjBn1+cv+9eVnnVsi+qucAwSq1pkoevCRhz700X969MkHzrpo+Sted9bkqSPJpc45GAWTOKgQw7CSkrMkvkgGVNV4YTugfouI+vAWFoiHJriS+tYLgUOl7Lu4mrM99WhDoRssUCYmcaRQGP8OCSCAgyZpkjqXOmvzOV0wFGwAJvY8vkCJYJjUETESdk7FOZclKmnWFPOEjlETRrSWyDAMERkhalj3hAXoouJC9PfJW3KNr2rjnfTtAS/+r1NX6vvh/gtURcibOfvCX0XEirNOVMWpsr8OWFK1SeYGK9nu3sG9A301myCSZtMMZ9Ik6e0a6O0dTJxzxkBh/CMKdLwlApTz3oOEdbxDdP15DH964G9An+VP+ZNE4zyrRokICaBG2WRxsptW37xmcHui/XjHO97ysX/5aBxrxBFQ1/UW+Fth+K6ZZaClVHrjJZesemzNz3/78ycffPKAMYui5piMOmRWHYMjMqSB5nEkru7HoqSeNPUEKpOI5GObYgypGBWz7qndtYpdduC8mVPGSKZNBhFHUAs4Y8THBIIR3yAINb/kwz++WvRWYn4sWZnCmI/m9kFQJYGQhBFla4lJBIYxpq104tGLW1qbW/n+Ffdt+sePf6AcxxecdgbB/IU3ladklOmnP//5Rz79z+VS7z//6yXHn7aUIitp1THYGBBsZg0RxQbOOREoWJlARsWJKBkFk6+cNbRdNETMXCyr4fUmMgINvWz1g9nIc0Cuoc/DDkNzv0p/fPDrbSVzLhOX2sxZv/oTxKwQCEEteQlA3cEDsAoRMWBYVXVRzGTEGRLVnspgZGRES2tTFHxko9C8DW0aBQAm7/4k/huGhdH+FQz/+beaKe9n+KODijrA+P
wYTgCkAhWodVlmnSiEVOEE6sgJaX8l7avW+gaqA4PJoK2RQUtcMiZymevp6d3b051kmVOGGhJVqFXl8FRDA6N+8MoNRRpXiQ7NBEPTQP2KyB98vSXc+AaBycrbPOQlTsIambTJ9cpTD23s31EjR6+9+FXv/sd3l2MwGww5ShT422H4JgCP0aNHfuyfP/LkpicfWvtI29iW2YdNM63sNMjxUGc/A9ngl+xxnYj3NhL5PIyv5eC7AgpWRFt3dqqk1WTm3KlTRpajmLJSxICSeDoXCiGONIw9he8TKtMGFdRw2fVVsYQZAW/wjhAlFN4IgZidKCna4/JRSxePbO9oa7n7tt+vevdH3tMa//uZJ53G/9e6ytegpFpLs89/+fP//o0vH3DEgg995L1zFky06Bepgsj4c6VQxIaI1AEgQxQSGqCAYeM8EYz8oTdK+qBwQSCUc8or74XUmyH1sn9oQM3ZCxUJZFDmNHNZKtY6l4lzGuIpKO8y5IIuoF6Fk5KKEBE5EYgaitQqM3Ecq3VpmnUPCDGhzKUoItJMxYBUNVwcjZk+T9145S78Y8x5/7y5H5Kcp4Ygos7/FggVFlXrxIl13glVRUmFNHM2yZIkTQarSU9/MlhLUydCiOMSE5HDQM/A7q7ugUpiLZGJiQjsd2zljdj8vBmC95DY3vjj00yJ/mfKoP5u5FNsQcnlvwkbFVESIoqsSbvN9hU7dq7pRM0cd9Th73rvO8eM62Bi3yzB0wm+An99DF8KyMMXPrfde8eb3/a2zbvXH3nu0aPndWSUOBaQiL9JlfMZYG/DQCwNzVveImYCERyF7S/qvM+ndUzS0Vo+eMH8hbOnjCg3laNyOTIxC5MwRVAFG1+sciOQBQWL1kdrFMhXZAPsRPwOVwBQdQjNAT8loCQARI0xnFmbqW7e2nXjHx684ke3jy/P+u//+sUBc+aXoj97Skyhos5QtH3r9re//13X3vn7l1109JsvfcXYMSMECZFTciCoX7QcJPjsU6gXxYs/wXgneYFVdTTUCKERfLSR9vwHBGRC3qrbPvjf8iBRr2F9KsxUAXWimbOZdZmzTkRBVrUxbeWVpsReaRoma0O8Ch3XkKHATpwSDEMzB4Uia2tq7mhuaW9qKkURKQzIic3XVAWaSnPj5tzBH3nrPxS4UufZKQx8hckAIe+7JlDnxD9+UaeAg9Rsreay/ko6UK1WUttbqYqwiESmBCicpDXX0zPQ29+XWe9sayhcqGrEKDTMXzytnn9m9N/34/8TBdT4eL1waXyM8gwHWFITu7Kpxevv3b51xY5arz384KWf+bdPHn7oQVFMBFMwP383DPcEAEAAC/vDH/zwPf/0T6XR0dJTDx41szkxLkMixh/gIyW/PJLDviLHAMCiCpcbkkGJQnuYxY8SqITpGWs7mpsWzZ2xcOa0Me0jyyVTMlqKUDJ+QpjJ79UD8s4bAH8GkVw44jkEKFRATlRgvY2Mv1e9JZioQJRJmdk5FRE2EQiZ4217un7zq7t+88O7l81b/otv/WTK2NHRnzMp7BkrIqxcsfZtl136+MbVr3jzWa979dGjRpecAAYQFWs59+mB1qML+3jmeStVFThVUlWr6iB59Kf8B0lOQQRvgaAnGbrjS4MZRk6pExGUyL8gouIgiXVOxIp1zjlRH/0B8n2GuoQojBN7Ct+TTxjSb6GGMlWCEF/C6iB1hqg5jtubm1uamkomMkQqwnVxTx73FPm4rK+2cz6F8sfjtQWaR3+/XZqVBCLeflbEihPn53tdkqX9SW2gNjg4mFZtVlE4oogidQqntaqtDA7091UqNaeisSn5I5QKvHyZhX1Txbs6h9ezkVH37eE+MzzoPl/zbKnCv3T1Fo0qACavWWOWSCLtbela17nurs2De7MpY8d/+atfPvXUE5pK8AYqRfT/u6FIAKEo6x+sfOYzn/vat78xelr7stOXlcZqlRNnLBlvzwAjHCgfVd+89fILCSyEL+1s3sfzH3
OOPa9jSBEbN3fG1CXzZ40d0dbRUiLVmKNShIgMqzKY8/GknP/MDYJyqzgh8tHBivOhQYI40A8FQ1XJubprlnUAhQZ1Jrp9d+13v7zz97+494SDz/zpt7/TUY7/dH8VUVGi66+/+e3veXtWtm962zkvPmf5yJGp1VQ59v0JEqsMztmPvNXJPnRC1SnUWyAElaQK5QEjf+FCnK7X+lQvpvMmS0P6GXoCoqRe2aniRKxzVlzinPhBKh94/NMILF0YJPbh2SuLlAH4cp1V2R/tiCACJnhmR/IKXsURQ0UI2hSVW5vKreWmUmQMMcQFkiofAwu0Va5/ryc2BCOgfCxA82QYBELqxFmoVZtZmzqXplnN2sFarZqklSSrpWmWqSPJfF2dps5qpb9WrdaSJEsTxxwxGSgJi09CpBpUtAphNA45Q+6Fp98bOejZP6l+XgVC0m+odD1NCar7KgqIYjVRwn3r7cM3rMAAjWwaedk/vf9Nl7wuNspBAbzvYyrwt8Rw7wEgb1e1t7a+9x/fs2nL1l9decXj9z654JgZpZFRatSKzQO8L+VCEFGg4TjQKJbYk/L5PcAMdk5916CWYc3GzYNJ/4HzZ08bN6G5XCIFE4i9QJoVauqGK1D25gW+pvZuYDnJweQNBzSs5YayshcG+UoLIKcCYjAlWcqRAevoMc2nvOTwvX0D1/366m9+/z/e/7ZLVa2h/+Ua8C4EDu5b//ndj3/q4+OnjHrHu99w8mkLyk1Vp45M5AeIKPApIQoAoZ3qGR5RgZJT50ekcmrLh48htWYe/cP0FAKhE96oeqr1YSUkYBUVJ86KpDazIkEk4wUteW2bsxLBRMKHZ0Dytez+gecckt8GQf6l9qE6lPFBhBqa1qwiqc2QKIBMopKJ/crH3LoOvtQeSmgPJa3qrs7+tdAgeYVCU3Gpy1KxaZYmma2lWS1La2k2WE2TLEs9gQakiXOpy6wd7O+3mSSpdVaYo9iUADg/EKzeTDt4miggXF9H/fR7Qeuhe0h5/8ywnItx6++JT9LIj06aH7F8VoeyEtSoQcLJ7mzd3U+4Ptds+E1vfO1r3/DqOAb5/lrB+/99USQAIL+Cx4wa+eEPfWjL1i13PnovlWnR0XOiDnVqwV6u7xSAMtTb0KtnjY2DD73CsEQajMfYH+WhGhkEjSPDkVm/dWeSunSuzpgwES0RlMTAsEYMA84PAGAKc0A+YNSP0pSTyGHXTG4nKWp9ZDEMdeJZBgfrHCRCqilElGTM2Ojs8w/fs3HnJz73sRlTprz8nAv/51dGAILLnPvAhz/0rcu/uejwme9+36uXLp1Tbhpwxqn4vVpQ55SViTSokzwX5We94LyjATRPnHXqJrz49SBIeeJAENj6GJqLZDQ0BzXM2ZIV58Iqc5c5Z/1IrKoSnPoRUp+g698gL1DVF+ZDzOVD9KW6OV8okMOkWm444ckmMiKu3uRNMqualKxtaQIr4sgYIDDe+a5EIce5J5E/1/ljkbeabag8RZy4TKWqrppWa0mWplmaZbXMpS6tJGkttaJIM0mtTZKkVstcYm2aicJaESLiSAX+BwurNyJkQX2wXEioEWYb19WQe6ExO7fvx4feMlT/NT/ChYZJ/mfOXx6BcQCMRpHG1U7Z8OD2vVv62svNl779zW9/99taykQNWq7A3xUFBdSAAgLcccedr7vkrZt2bVx62oHTl43TZrGwDpYMqyhJpAxA2DvHCIwSqwIiTBn5ydVGYRsGOz2Fwl6couLs+BFtB82bM2fqzJaWUrnEpZhKFBtoZBikpBr5+0/AFD7itf9hEaBKpqrQTNSpiHjzFxBgIL40E/iWtCpD4NI0IWOy1A3WsG7tzu988de8p3TjlTfPnTwrMs9KBAVT0p6+vZde+u7fXP+7g4+ee8m7Ljr08JllqsGIEoBIHZG3rWao1uUlEFEHJ4rMiqjk5xc/BpU/t7AbERKOTVBf/hPUT3D59rofcFNyCOFT/IJgkczZ1Fkr3v
O4XsCH6M2EsDPM7+cK1HQQW+WplHzLQdgq4H1XFUz+P6+VhDoYQLySx2/0qQt4fGuEROMoiqOoHEWlOI6ImE04HYZ5Qh6i7lLVYNXGQYKpTsl3ep21SZb0ZbaaJWlmrdg0sUnmajatpalTSWq2VknT1GZZ5hzUkjrnD0NK/gUSJlaoBqkwWHzTJNef5VvqG8cr3cc2qXFL/LFb5Y9+VPNur1ECNFOjRI4QcRLHA9H6e3Y/9dDGqKYvOfusL/77l0aNaDNeBVeE/+cCRQJ4OhIn3//eD9/3T++vlipHnHXopEVjtJRllFoSJmKJAIVGua5NAkfij8jKIGI4P+jpFx2FEUsE1wbAACyu2tJcmjdz1uL5s8aMbI5Um6NyZGCAKCLDRKoRIvZfR6FAFlIBfKvTKZTUDyg5EVVYiKoYAoMVCJZnRkV9LxRW1MIlie1P9JG71//8K1cfPuuYK37867ZyyzN343oGZdveXW9665vvuvO25acc8Ia3n7/kwOnl2AplOSdm4Byz33LpKRXvHcZWJHOZqlqRBovjOaLgaIS68kdznj/PEGBv+s915wQVsFNx6vxCKy/ssWGbudaN5Oo8Efmf9nQ5IeUVfr6d0WcXwLEAMEoUDJ+c/wYqjgmAUfiQLaxGya99Rm7QoFBihiETGY6NiY0xxsTGQJSJvc01RKEMkjDgR54aYVGkYi1cLbO1JE2qaeaySiaZuExcmmW1Wpr4f0oy61yaZDa1zop1pEokXvzlwOQ8PxVeNFIg9DYk/OA6p1YfPKF9gwANOf38UTztH582AaAgMqosBDIVhRqNShq7rtKWR7ZvWbGj2pVeeM5Zn/7cpydPGGuYiVjzwb8Cf2cUFNDTERl+9Wsv2r5t8ye/+NlHb1kVtx4ydmYLG8ORimfZgZzI9a01/3UNW5icNQ0Sdg1/aLCcgMCUBhO7+sknara6cM70CaNHQk0MjhnixGhkDEOVlRhhaXyumvdtYG81Cb/0VSQv5ZQsFCQQEVEh61mFzDmxSGxaE1ut2QyYOH/MoacccM8Vd3z5K1/58Ac+ZPa9+Tz5sXHb1kvec8nt995x1nlHXvzWC2YvGc9aFRU2JI6C145hhYMSEWsYeVKBcxLsCjTng/PIXM8GGFpIBy4iiMfh1Dv3+KgiTtWpeC2/dWKdy1wI/SHoB8+wITpRNMaEkTdvck6aQic597IPAYxAoeIXIjhSVRcxq6jQkCVcqgALHDV+IPulJgJJrTgnNpLIuYw4YkOsjDB9l58+yA/0OohYl1lXzWpJltWszazLrKtlaZJpZm1mbZplg5U0zbI0s1lmnYhYJ85PguXXo0+c4ak1avn8POY7sHnoD699vfXin3hdVIVnBvinf2ToB/YhiQj5oUtFYRRABEJqODF7nujZ8OBmW7GHHbjsAx/+0JSJ43yGblCBBf7uKE4A+6D+WnR2d7/v/R/40U9+OGLumCPOOLh1QtnGaYYU5Bevc70NRr76pGA1ibo6sbH3O+8J+1zg5aEEZnIuU3GTx409eMGiaePHNMcmiiiO2LABNI7KkSKCRCQNMSJUFZnXBRJ5V8vMOfGrZCBK5OBrZc+9iIQ14FpJk0FrByoW4MRKpcte+5837lzZf93vbj5k4SJjcqMgBZzs2tt58aVvufm+G8582dFvvOScWTPGRpEVEjgyzF5eA1YEr84QeZx4eY9Ya506EDnNDZsRXokwzEaNFoCG0bkwQuu5ES95ciLWSSYuc5Ja66xzohL26MJLbAwREXPePR5iFaT1bLCPohQY8i/+DVMJzpRMwbEHnpYyRkRUEBN77s3l3W1+2kCCQplYxXnnV2ZiIsPGnwBi4ymg0LW2EtJYZjXNslqaptZm4jLn0symLrU1l2RZktkkSZPMWSfWioj6wTSIAH42nXJpkeR9WD8wqIFuCxek1jMwDXnMTydeniX6/2+3ytAuSqPtCwCOxJDGKElf3LO+d+1t62yPmz9j1pe/9qXlRxxmDFMYui
ii/3OGIgE8CxTqgC3btl188RtvvfvOyUsmHHTywvI4k3KmECVRMoCSmnz4iwLxEVyi8yKX6h1NqX9v7wdAxH4DmULFpiNb25YuWDhj6oSW5jIbjYhjw4bjiDkiFwMcBpYI3ipAnFM4dU5Z1CUq1llPRjBB/GIQp9Y5EScuTW2WCippOpDZWmZdKgLDWtr8yParvn/ThSdc+M0v/kdzXCKQCoiwc/u2N73lDXeveuDU84953TvPmT69LSYLOBCzxmLBsAAkSPxDVesDpIg4Veucp4PyZ05Djk3eeTh/jfLGhhKEICJCPoOpVUltljqXOJtmznd4QRxKfmICMbMBKLBFOeNDyDntPCDVtZdDfq6X54e3iyEQk3fcnRPDRkispgCBo6D44rD3i8OsGkJXWusLvCSfIgeTXwlEJorqnSEFEmt9RZ/YzGbiRFIrmbVWXZq5apZkqU1rWZpm1rosvLUKYYWKiL8CRAlKnJfQElrYhCDvD5V9nmL9i7NvoNUhb8s+H/8fYsLQDJr/2nh/8+8b3mhlZ8pZuX+jW3Xb6t6tAxM6xn/uC/963nkvjkuGC8n/8wBFAnh2+JLpznvvfvNbLl371Jo5R85dcPQsGgFl6zhzrIaYnKEgCfR6diXxMh7JD9iUW0PuM+vEpGHDn+dAWAGNxS2YN3vB/GltpVIpjiNDpBQbYwxFxjBRTCiR/1I/tQQrzjp1qhmpU+eLfgTHGJdZZ5211tmslmbZQJLVVPpraaqZOpckrsm0uIHokRtXrvzDY7/87q9POvKkUilipg2bN77jHW9/dNUDL375Ma99y/ljJpc5skSelWEIQ4jZL5pVVZhc2wPfnxBxEiaYAPIsTS7oCdJKVVH2RqB5gxuezVan6kefnLpMXOpsYm0q1lpxwb3NMDODvV0wgcKWTgDIp/LqYTBgSELI3986HRRiuH8H66c4gTFgQ2ADaCapWAiBDXnPTQwRsgaxKXmle/25MoUSFwqoUOpsmmXWucxa65xzcM5Z0cxlqbWZy2pplqQ2SbM0tWLVp28f/HMFj4pzfmFaUCKD/MN2DUpLNVx4hLy720h6z3atU96QqVNHf8Z90mgd14fdVEgNNFZjKu3VHdW1dzzZt32wzE0f++iHXvPqV7a0lIzXHRS1/3ONIgH8UQjgID/7yS/e+/4P9WZ9c4+aMXv5FG6RjJz4qTAxBCWJCFB2ef019AUdetoO4hT/NyJvb0+ZdcQCIlVHcFOnjj94/rwJI0aDHQiRAQmZyMSRKTGXCBExfGuZRMVlIpm1TqEEFhJI4pJE1FqXOZtlzma2llZrSTqQZhVns0xTmzi1mdiIm2Jq6dk8cOXXrjxy5rF/uPKGmPmO22591/vftXXHxjdfeuFLXnnSiAlNikTZESmzUVFyhkBKGSBKBgIfET2xI/k0loYFjGElShhW8Dy4kgKOXHhd4Hu8UCaBqCIMuBE5DX73mXPW+b1XIuKnIyi3evZDeTlJ1ojoYZNanYir0yCaH8zyPjAQ4raCvDpGSBGDnXXrt+8eMWLUxNGtmbqEMhVV7x3hRanKAmn8eB98BUzEzD7Jq7jM2cFammZZar0jBUQhFjazaZZZa5MsTTNXS7MsyzLnHf5zDwnNKRywiojzmwKgwYE5nDi9zKcuscoZH3+8yVmWIS2AodcoGlet4v8Uk0PFD09WKpEa1bKNdXfLo9c/3LVjQPr5rZe+5aMf/1BLkzFsQmOiiP7PNYom8B8FAQZ8wQUX7Ni1+zNf/MLGhza2j22fuHCkiaAMhQMscq15cB6rl05Bjhc6Bf4W9ZrzMDmsDIWSjYwIwTlrODIcb964J6vowtkzpk4c01SmLLUqkWbOGFMyJjZkiExkyPfW4M0L1FkrTlmZjVGhJK3V0sxaZzOXprY/rQ6kSSUT5xylTtUJkXVkKbWC5o7S7IVTH7n/kcv/8zubNzz1g5/+KG2uveNjLz/33GM6Ohim5r0ZGOSsY7CSBu06DHm2nsIT01
xWmE++ksBPRWijw5u/QIzgqOTJaobnr1jZj75512qj8LlCrYgoWRHnJHXWiWRO8tczt8tuLFFj7z+WUzN1hqJua+k/2HhEQ2bYiARMTEqGS7ffv+LG6+/8xDvfvmjhVOU0NarMAiEHE5bEBbt/KPlpPmYG1ErY2ZIkaS1LK1WbWit+gsGp8x4VLktqaS2xfoBLBM6JEsAmN6cLhlAUTlfOv775AKLAj17k3RUMIbzq088Ir3L9+eajEPn1mnM69VfmTygKh9JHRPmP9o9ODGJjI9eLJ+5d27u91yR481sufs973tncFBlmbTSMCzzHKE4A/wsU6K9W//nj//LN732raULLQactGDG9ycVqxRIrE5HGAs+dB6KhHvEIRBDNlUJDUoT63bZCNlel+6UerCBV19pEC+fOmDF9QjmOoZEX1LNBySgrxVGpFEcRERMbJiayWZYkSZZkmUgWac0m1WparaU2tUmW9blsMMuyTCTLWCwTV1IRBoGMmpLEXY/23P7T+6IqJYP9U2aPe+W7LjjzpcePbFU2VnxHFiZUtw0HmMYAb+AdxNe7nqf3tL5XvyoFD7sGaSzB4x4h7tTzBteDyJDQLKSAgzqoCjvVzLnE2kSsE2d9A6Hee8/DHA+JYtQgOeqxp94crv9VFEowyEc5mIgM79xb/cilX2gHffrTl42aZAZKWWZYnEZap7eC1USwlBAlMqouTV0tS5MkTdK0lqZOjCpZkcw657I0TdNazVqbWucUzuU2T2BAvCMQoCREfleAekcjq6GgCMRSvXZvbBPwr2qebrVOcGn9xanH7MZF/ozmwL5/e7ZgTXlC9SMXngoVQgTEruS64+0Pb3787g1UwUvPP+dTn/23iVMnmIb6tMDzAkUC+N+hwPZde//pwx/48W9/NXpmx2FnHlIeSxkyx+oVmqzsx2CDvsVTyMgr4ry0yu/YeoQJN3LOnJIIiJiECY4iO2HiyLkzp49obyvFDFDmwpYCZBoZE5dMXIpjE5fiKGYiaFqzvZWBvsFakqWpS2q1pFqtVlI7CGctaQomgWYCscRpljGM0bictnSv2nvnz+/GgJ05Y/TFb3vpsRce2dJhYqNCGZC7WTaaG/X2X6Ah/JP1k8sKr5nJp36R214PITN8R0Rh6sHLvwIRON+OkkeJvMRWb7tN4d8ykSRzmbhqmlrfE/VZQBuxvh7hqKF49GGyHn9Cks7JEgGCxZwyQ8FMAEWIV9y++l8//p/nnXvGRW86NWkZpNYIjiJ1qc1ETMQRhEVEIE6t4chzVpVKbcDrN62fVDBOXJIl1Wo1S7PMOWfVOXXBOiRIJyn0+b3hHAUaR+DEWXGE4E1EWrdcGvLM6kkun+ENxX9OQO4beZ8eh+mPhvpnR547/SHYCQlDWE2UNce1+Ml7Nm1ZuSXpSV/y4rM/8clPzpgxJYpyYW+RAZ43KCig/x0ETJow5mMf/9CmrVvvuOfux0evn3fUtNIYVoLzdDCJupxUptxwII/1dWlEOBk0AiqQhybxYwSsTCTqlFkcb9u1t78yOGX8+MmTJrY3NYPZWUklUyeaCKowUWSYWCmOoqgUGSLLREypdZVaVkvSWuqS1GZwzoIsKVSATJ1vTTOZyDLVovVr1qVZdfaMSZde9ppjzzjcjFJFknuyN9qbnnSmIbV2oCfCE9dG/GmwChQif6NVmH/lM9gYCi8HBcUr8vgMEqgBJBcPRUQwhhTKUQLJxDamfD1tkvcB6hR3QwIafmT9AFA/HAR3NP8zyCgpMxmGLD1yybEnHnPTzQ+ecvIRLZMq4uKOkWMkdaVS2TpVayElZoIymyYRl2bJQKU2WE2SJM2sOqvOudTWsjSrZUmWZpkTJ0IwuT1P/lgMQeuy4sAh+naAX1VDQl5M6jcC6T6jDvs0nxoxVutv4ZAPhfcll0LlZ9Shx4L/CeRTDIXRFO94SEqCskau12x6dNPGRzZXutJjjjj0Qx/76IyZU6KwT6/g/Z9fKE4AfxIUsL
B33XX/6990ycbO9XOOnrvgiFlRm6ZIhYQZEOODiGdjNac1GncnfAmWswVD/4l881H8mD4zrFMiIqMEjY2OGNE+ddK4sSPHlEola62IS7IkSVNxzrABqctEGXEpNqWIRQcGKoO1aqWSVm3qrKpAXajWHYtTRw4GJtKmFm1adcualXesOHDOjLde8trDTlzc3BYhSoWERIlln8tjCGdQZ29ypgfkN1lBnXjXT/8JJCr5sqwwOYFgqxnsN8N/5Ae/8ni1T+70aVIE4QQggFOkVtLM1pxLrc1E6jZvQ84VlPdFqfHAaZ8XP7wX5AfP8hOcP+aQITCTll15x9bs/33kc0cdtuh1b3jJDbf8oWVa+1HHHp1VUmHnXKbi3YHYqSZp0j9Y6euv1FKbOWed2kxSmwzWqja1WRbWb0mu1Hn6DZg/9kbyBFTgnAuK2vyqyl/h+hMd+v4MfSEx5Mt0yM/Ijz5K9RSRS6X+BPjOCxFgHIQ5IzVRFpuK2bmia80dj0nNHLn0sE9++l8OO2yZMSA8Y9a8wPMAxQngTwIBBtFRRy3/4mc+ffGllz51x1MdLW3TDhoTNxtL5CRjXz/6UFafTsXQsozye5bDP+QcLfIgBL9TGDCGxQlpRNDM2r3dvQODAyPbOseNHtvR3maiSJiFuJrWaukAOHxd1mdFVFKnThObeXmoOkSOSeHgQCxQVhOriVzMSXntg6tW3rl6yfxZH//4u+YdMEOjTKPU18DMftBUhx5onhZEGuV/Xlc3Bh5Ql5/k7AzqhFjjO4QoRvUGYv21QuMbqz9W1ed4Pe+gEbEYEyms10D6bzc0+ANoBP96TV3/98YfhzQHcg0NiAwIDsJWdPLUsRe+9JRf/PK/jzvqmANnHvX+f/14y4fHHHTIHCdOI4YLgp80Tfr7B/sHa7U0yzJJJUtqLknSmk2TLJPM5W9zMCca4oFUvxZCEA4dIw9psFiNyd/AyzUSwT4BH0/7S/3Eo0Ne5fytzT326rT+kLPTM74oaF3D8clLsgCNlKUa7X1892N3PUFZae60qZe9/x8PPWyZ8WLpQvPzvESRlv9UMFAiPvOMMy97xztjF62+Y/X21Z08EEcuIjEKIvLlf1h56gmNPA9QHl1yZgL1FgGg9dubiFgd4CRihnUiAEeKeDDRHXv7nti0Zc2GjZu27+zp609SlzodTLLunsHOvf09vZVKLR2o1AaqtcEktU6tI7WG1HihijKsOnVkJIrSuDlr2vzg5odvfeyYww/+f59816Jl01BK4pLzc7gM8i5z1CDV/SMcGv39Q84fuv8w5V/BUNZ8NPVpt35O0AyRsiPnYiiM9+Zf4wmP3L7Nc+IMYiJDFDH7YMp+I1seDPed+qJ9E8vQB0ND/+rnf8mf5DgfTgMpq+rgsS9aNmnquKv/cMOI9klzxy344me+2bm3s8nvdiMiImeTWq1WqSXVNKumaSVJBgZrfZXBvsGBajURJ0SGlBV+0ME5Pw0y9CE1Xte8tPdzg/n6+nARDWnA52xXmHv+E7BPmqAhvzY+Vn+T8z803gtVgm9TESk5EVASqS1lLaXBpu41PU8+sNlVdOaEqR/88Ide9KITDeuQG6DA8w5FAvjzUCrH77ns0rdf8kYd0JU3P75tTadWECGisHLJM9m5ApQa8RHwbbo8PgJD6lXypV4eB5mJVYUNE1gVIAZFzCa10tPbv33n7qee2rJx0/bOPb02U+eQpW5wMBkcqKU1myaSJjZNncvEWefbjFbFOlWAQZyhWdq3Pbb1vmvvmTN+yic/+r4FS6YJW+JURCgcYYLRQr2zGuJCnfwf8poEpkLz8B+Gsxrx+pn3vq9th/xLyAchztUZ6SGfH2pnMJM/8xATmOAdF5jyzJP/vPDThwj+kX/TIf9e/7f8BOM/JwRhyvWrVtg2dzSd88oXP7n9qcceW/3yF5/Xuxk//8GN6qJIWFUTl/XXKv0D1cFqWqmmlWo6MFjr66tUq4nzLp3C8CMDQJgE8K
udKThP+Esk/JpLpLz16z5pKuip8isnl1c1wvQfR734GOLe9oxg/2x5RILPkHfBCCyb+nY0NFbmgbh/Q/+aO9b270xGtYy45K2XvOSC80ysTKbgmJ/PKBLAnwcGWqPmD3/kw//w0le4/mzVnY/tfLKbanEkhsXbGlN+aoc2LG+0Ee69gVeDykAkMAoD8lI6Yh81GWDDpE6cs7AKNQBDjRPOrFarSf9Atburr9KfpjVxGWyqWSqiUDKiDAULq6oja9WqCAsbG48wHbvW7b796rvHtJc++9n3TZzZRmwNW0BVvIG085nA8w20T2zcJz7UCRffmoX3xt6nnkUgkYZElnqi89+4vtQyhDyEfWFDdgWHmt7HSYbvmFNwHVYlnxK0ngFQ3/f1LG/h09JCo9Xg9UcN0ZBPhKqqpI5Tp+nByxYtPWzmzXdcO3rM5EMXLbv+t/fff9dqdrGmbrBS6R2odVcGu/sH+voH+3orAwM1P7zmtzN4D1fvBELeDhS8DyeFehAn+BhdT0dEjZjfeEPoaVF7n3dmnz8PPb4RwN6UCmoCfbPvq/MsL12uR4ISEXkfUxgL1cg1ldOWwS3V+657UAdkZKn5be9468VveF1klCiqP9wCz08UCeDPg78Rx4wY86lP/uuZJ5+edqerblq9d30vW8NqWFmcqxehjfGnenU6tILOb0eiuhgRgXFXIvGOkS4ijlCKyJBTWKgF/n/23jvekqu4E/9Wne4bXpycR6M4yhrlBEISGUnkHAw2xgFwAmy82NjGgAP22us1Tmt71/7B2gavyZiMEMpxpNFIM5Im5/RmXr6h+5yq3x/nnO6+b0ZCxlgSmlcf6c19993b4XT3t6q+lYRIjDhSBasJWKC+zxxEKRcrQtbBilNRB1+jwGy5YRvdA+1bv3SLtvIP//rPn3PhKZQqG4GCYzlWwaVU4KUKEeW7EZkKDAUiunqYLYCYys0Vln6kv8q/UQH2lQh64Usg2ul+JcOMGRBijUK5j5mUU/EvVTYSkJWqfwjRUyVSIg0VaSAViFPX4OSGV71srD2+fefm177spX2u8U9/98WxsU7WpfGJ7uGpyZHJ6SNTU2OtqXae5+qrthgmxMAr1JZXAsUlnyGiKqJOQ9zDE/SCSJqFDVTO91gnfawFeJw3e5y7sNUZAYTY0Zw0Nv9ThdSINTMTe6bWfW+9mzSJ6Ote86qf+7mfa/alzAn1bGNWnokyqwD+w+I56GWLF/3Z//gfL7ji6s5I64Eb1x3ZMZXaeqopg5zCUZgiq6HQK4YZCSFZ3hvGSj7DXUAW6kBQw0oGMIXGUGZBbPVLTMQqEDEg31hSRUQdxPnW874bpfpJMgbW5RAFMSNtah0Tyfe/cEdnbOpn3vmK51//HKROqOvbicI3u/bHBUR81hKkZ8JIZIYoRBliezTvx4S/MhOxx1v2hAeDfMmT30bIGELElcgmFXx4RDoJQ3N9zZ2HbtaKi+E5uABVxzhgqhrKPdyHllY3CvfBM/DQMGeFUljbnr9k/uUvvPimu28antf34kueu2X9oX/6py/sn+jsOXRk58FDB8eOTHdafjaZdwKJfMNuxCB09C7iQSlmwiT5Fv/iF0qhGob9lOmt4QR6CLlYd1fxt7TcSfW1Vt+c8bEZEugmUiJJSGuqZMWpyRiuZptpu9HZY9ff+tjUyGRfrf7KV73iv/3Wb8ybO8iI09hmdcAzW2YVwA8j/oFbtWrlX/71J6+6/Ir2/qkHvv3Q4R3jnIPUxJQWDUl+BCVwaQ370mDy8wg12LEg+EkhIRzgpwVTSC4SEBXzC0MtqM9XJAaIlaOJJgRWR8xkRcQhMUzKcGpyk+a1B2+8f9cje69/yYXv+vk31+ckllq+i4zClVZ5hbaP6fTRWJ1pJ5ZuTKS3gGhec/kfc4iQh0wef9i93HaFn4/brx5S4WkoYuUXVL1OQKF+IsOGo9GH4oaqTknxezzwEl8JvrWE9wMUvoUGFBc/7yLXnz/wyL3XPO
/KlQNLvvvF++9d+9C+scOtTsc6ESUBhKGsobA57M3PmdEK846YHVtZbZ/eo9GtEY/+VYas+E5cGEUMOvWcYGUJopMVSaSC5KGeBZ9xiWOKUPABVVhVlVhVJAVzO7EH3Kbbth/Z2mrWamevXv0bv/XbixfPZ2KNWv2J3JJZeQbIrAL4IYVADD3lxJP//JOfPP+c86e2HXrw+w+P7W6lWS11qRFWja0aPU4Vz2IkmgsaWsIgLW8eCuAbbvoYZBHcREjYLrI+YoolFUyMT2JXQ8SANSSsFo4YaZOadTu07ubHNt738OUXnfjhj/5S//xajpaQ07AfxCycHtTUmK7aa8mVZE602LXoTMAgPzDBj3L0QwmJQvdOH7alqFaiwimRQov9oogB+F34McCll+CxUaUgg0pFdBSu/UDRarCatHRAvGcg6luwqYpbMH/O819x5Z3rbp3oTF96wXPcAb7v2+t0gjljsio+d8q3to7uFAV3oiDrywKL6HUEV6rgg0Aqvr9GL8dWKIfo65RKy6s+mom61PtvxfcIP6tfqn47qkxiMCvDqVWTAWpcmnTrmDAP3b5xdNdog2trzj7vf/71X65auTy0+ZzF/R8TmVUAP7x4eD/7zLP+4n/++dJlqya2HF737Ycm904mlowwg0RcsNDiHEIGKJAjfgtVeyu8WaBi4Iy9KxBNyGCfxyCDz9eJOAwAqo4UIiCQIUOgRLiR1fc/tO/hO9adsnL4w7/zC8PL+yXN4CcBUzAxPU199BnGw5qhGOJBRmJCe7/HFCz/HleAS+Y9muOleqlapJEIKv5Y7J8AiIgVyX2iUzkS8hhUxgzw69lXz2eo8kdooUmC7exdMSjUkbLigvPXnHrOqf/2tX81jWR+38DW+/a3R6whX7ILVacaeZkKYkdIjdkAcfGCIxhoLA3j4WMKUOUwC5WJHj+m6DY186SK468cwMwVOGolgOgJVUPSAIQIAklAidTchGy5d8ehbRPG4eRlyz/y8d8768wzTcK+nVPpYc3KM1tmFcB/ShhkwJdfevlf/PmfDw8vGN8+8sgdW1r720mWkONI1DApcUgp17LoaCZFG7kB9U0eA5iEagKvDTRAfhyjFz1tVQCsREpMpI58WwNSk6ppuMGDm8fv+vZtS+cN/MaH33vqBSdYyp1adapOPPxTsJ6PMvRjRPdosqYEsohe8QX8iBfW6AdQNTIcabCg/7TcVzyhMtIZE4E0KFE/cwYWyFV8pqvrdQB6aJyjjrvXmK7uuSQtNP6IfkAMjACqKlARmxC97V1vG1zUf+u9X5u3eBBjnZ33705dwkIAM5lA9pB6ZygUKAe9EpVAOQEzQL+Ii7PRfL5QgfdaaITSVj+KXq8mgh6Le6fwraBji51XTr28GkrRBoGSiIAygq1JPe2mPFbbvX7/9od21Kl+0ooVf/Anf3DFpZdAZbbc98dOZi/Yf1YYRMDLX37DH/3hH/b1De57bN9DtzwyfbBtHBs1oT4g0jZaPP8hBcSbw8zRqi8c/ZjtXbVMtYDPXhgown+igBPys1YIgEMf9U3snLz9azcPNeTXPviui55/nksyQQ4mVQnp9BrxqKqYtPRQAhz2CsWwrf90cExicicVcBKT3AtHBihSQItdROa7yIwJhxDQX8owrzqoFWdd+C9OiETcoWImwleO+Vi4eDQ5gsiFxWQkJpD6OfbkrKpLMuqX93z4p1/9M9ddePUZA3OTnRt2tEdsDanx4A1lgJRjSn9InPen4SPhgQbzkXxPo5WEWTDYCSGiVCjX6FAcdbw9gWWU2cZa+UxxZbX3VwrtSyopXgi0HgGqfs5Rokhy0+w0d99/aNNdO43jPk5/8yMffv611yYJGzZFIces/LjIbCuIH4EwoKCfeOtPTE9PfOT3Prbvsf31vvrpzzm1NidxRpRFyBkKVIsGpSuAeEc74B+0bNqL0B2B4NMFC0OcSps5Ouj+a/AD5wXGM88ENlzTwcO7p2//8vdSyd/3wZ
+84sXnaiN37AwlHB9VlZBJGZFm5gN8DEOz/BmQQ70aI1U4769U6BuKxAfiV4CQIuuPgeJ+ffv+OF/Ln52GzwkcMavCilonuXW5c1ZDNW085LCkMw53Bu19lARyJLRp0oCIXKgxjaehSkJgsSRMndoCft7rLnXTVBvif/6Hbzx2584LX3yu9k13kKsSF+xcqc7C5mK4JtQxqPhKbQPRcCViEZr4aEBwBQo2jqpqsjiLSFpRtSNScD8qSVUz6TBU6alQR1FcK1UlElUklNRcmrYHtj+447F7HmFrFsyZ84k//eMbXnY9sVLopDSL/T9mMusB/GjEAHVjfv7nfv7d73x3nfv3PLpv812bZcolMP5ZVhFQ6AMU/QGq/ESRih6BT6AivpG+VnjkmAIECsMUAYBC93hfR+Ztz4ZN7CF75zdupWzqve998/Nf/pykX4Vy9kHiYFFr3GmwKwsCpTTWQdFB8QY+ozxmUKiXKs6IQySymPoe8l5KdPDMT2n9R/yNnwnLoqqiEqx+FQWsk9xZKzZ3zoo4VRfSgYByRXUGLh5DjvkJKvGw+Fx4wxdtAVBWIcTO1KpiGpoOyMve8vzTzljx2P1bjmybbmjdUMqAig1uGYmUVxYF1RbDNgF3VRygRFy4huWVQcEBFi/C3VJm7MZrFI97hhovIsSFB1D5TC91FDQVk6eclNQQG2eSTrr7wUPrvr+RcgzV+j78kd9++Q2vSGrGsDlq6Wblx0NmFcCPTBioc+2DH/z1n/7Jd6SS7tm4b9/Gg9yhmqYMVkIOEQYTGfXjYlkRsoO8u80h4Roexv1jLM6FGlIVwMHXGBCFrHefTkqJumiLq7KiKf1uNL31yzfaifGfffdrb3jjNaapAmtMAivwXDMFaxOosiBRCxR4H3/GhH5wUApMZILmYoBE4XxqqidqHKnA1/UWdEPMhWWWaBCX9LMW/wUNIkxCcIAvc3DQXNFx0nGuI66rzkIdVEjFFyIUiqWg0o4GpcehiCqgqUoihLDZ8GcJDFysDPANVp1Yl+T1efr2d7+8Uac7vnu3narVXIOQ+KGdYCZNWH2EpjCSqUz38VV6EoLlBVOkCh/d9gMWAs6zR3KfaxqTjONCFusXoxYhkQiikdISP77aa4kyplwEG8jrcgZIRJUtIImamqbUru3fNLLu9rWp8PBA/8f+6Pfe+IY3JqyGZjHkx1hmL96PUgiYM9D/8d/96Ote8Zps0m65f9uuDXupqwmxHzDlk/YVIn6sOyMWSZVOgS+YKoMAVFQDRGufVDx8Q4hUVcUpGfa5NWSpDw13yN3yhe/l46O/8PNvfe1br0/7Ich9BndpeZd1SYWU2Zw+pZQiRkc/pOR/KND7ARUrNIL/Y8h7j4hXBjM8uBVMhFbQJ3gi5R7VqToNxn4uNhObqctVrB87U1rzlXgEyndn8kHFMRzbTaim2Wg4hfCaoD45lwH2A+dVRB1U1ZE78/LVr3vLiyYPj2xdu6lm2TCB2BCROFUHYpLCKwocWOTOIteHXl0sPqajkZ2aEQWofPQoHe4POdYz++tSKdSITkWhiop4RLmaGphNAxjLaSc5uGHs3m+vt62shuS3PvKRN77hjY1GatiIHMPfmJUfF5lVAD9KIYChQ32Dv/exP3zDa147daS18Y5NB3eMUkapmEQZDg5WjYbKrch2BMAEQu2X1xdMlfzwgLcFFSQAmBwJSJl9HS8Mc537pg/L97/wne6R0ff94lte/YZrkz6ybMn4giahas5ggec9OZfV172nV/BDhSHrKxiouslqAmdglgoaPfyMiiHia4VS8tglKqJO1PqMT9XcqXWaW3E2tNUpVFWxE4931Ujvsc39+O6xtIASqLrNqApR0CMCUQjIeRsdCrVgQ694ywvOP/+Eh+980I5I3SXkr7ELtna5sTDtsTKogDzbXkl/kmigRwouEjgR+Gd4OhRuICqcIKVQW1JcPSXf/Kc865hWXOyIiJVIoGqcqjNqalJLOo29Gw
49fNvD2qKhxtAn/uQTb3vzW2sp+8gwVa/9rPy4yawC+JELMfGyRQv++BN/fPn5V7T2tx++bdPejfs401QMgYVUyHEYouoJFo4/TexsWebOF5nzQCS6hQzYZ6X4VMLoQmjNcXskv/mrN2t76n2/8o7r3/BiHtCccnCglSgWD5e2o3oKP0BxNEXLlMWodmZazooq0kvxMtYj9C5KBXG1YEKKzxS5nhGbBSoQF5P9c5Wuc13nMiee/Q9J9MUGKukrvfvFMVVA5a8zZUYubJh4XCniihqYfTs6Ayioa/OBBbWfeu8bnM0fvmujsWkSSDwwKyQXEfHpPwjuisYXcdWiggnd8MK6EVDmBQU1Uro9Pdk+/ijFx0YKb63C+xfnUJ5ixUcqfQcAasDGMbVoZNPYg7dsyCbzwaTvt3/nd9721rfXGjVjkqofMSs/pjKrAH6UUvjlDFqycOmf/NGfnnTCSeO7xzbfu2t81zR11eSAkIiq+DRBZd/wK9TiYkahMEXDrDC0CEiUjXCClFzCIFUhaEJa12ZnRG/58u3Syt//qz91/euu1YbrwpIxEKg4+O40CCko0QIPqqiKlSET5Jj/zTjbYLFqIREXZgQYfV581FbEoBI7FIh5jj6ZVUUlzFIUtaK5Rddp5pCrOG/+Fzx29bhm2P8/8FJp9ddqGCQ2gtaYIYsIjMGU5gjHGvqmssu1e+qa017x2hc9un7D5PYp02oaTZSQuxwAwaAk/mMMt0wR9YehRVAoKkt/hlVd1/Mi6NFAnIUVDDeR+kgBEHpJaQ99hLjHUAlOpBAIyAHOKKeamHZtdOvEQ7dvoHYyt2/wY3/w0Xf+5E8lRhM2T2qRZ+UZL7MK4EcpFfAhBl1ywQV/99d/Pbd/7uiOsU1rd4zuPsI5JZowDMQxA0RCCvJcrwoVTeKCOVyYah7jYkMhVYLzOaMsbKBOUk3aB/Ob//02abc+8Cs/++JXX631LEfOJlBMKuLJF/FjXmLEoQBiLRj4o6zgGWeJCmAWwIWSwwggXskoQgx0gIiYTElvxYSiCMhFSSz5k3dQKyLinIhV0VAn64mZqgoCKv88Gelhriv+SLSbC3yN74YjIo20uaoSjDqCCEMz2Ny0X/2Oly1btfDmb9zRPWxT+H7IRU2XP3aKtQGowDHFCjCUJH3cdcwULjRdcfm4gv5Q9SWExaZLrw4zFUj8SZWDCN38QIAR1GwyuavzwE0Pjx+Y5sz89kd++x1vf3ujmRpO4jV6EiMIZuWZLbMK4EcqFZOWQQR93lVX/9X/+OS8/jn7Hjuwbe3e7kiHLZEAgIgKq7IHy0C5xCewIOpLugYI/LEY3wDIKZzH1Rqa04fszV+9LZvufOg3fuYFLz9fGllet5oQgWCFnItxRj9kRo865IDS/ggqaKGRj6gk7FS/AniGokryA2WCqjf5maovEFlq3/BMZ8BakTKkChXx3oALtbDhCIqy2qJ4ye96hoH7RNIbNp7xF4oJNZW115i0H8CcYo8PIvbWdoasfyG99Z3XTU9ObX94s7aIc2MoJZDAKnvvQTUGsMMA+ML9Crk7oQBEK3mf5aWqsHfxhgjVZPDxHYlVaGWVYLhKlchBmVJMIXSiyqIQBmpqaq5vbPvU3d+8MxvL5/UP//f/+UfveOvbU0OhzSeCLn+SKz0rz1iZVQD/JVLYbobMa1/z2v/+8T9saPPQptFH79nePjiVSmJgPLCKOigITKpMZYw3bkU1NgdAbCstJGIERgwRuaTWTfNRvefGe/PpqV/9tZ+9+rqLqZZZOGJv+zNisj9iugeVxnmJ9lQa/jGbvPRAHvdEKRRLaYSbeOYzKaCAXN5XIMCPxS1DiMFa7TF1fd5plV/ytnfV5kcBoJWD1fh37Xn7cS5T+Z3qb947Oeq7lXQtViY1PtYqTuCEmQjusmsvvPKq8zdv2DK+YyLpMsN4mFVnEed7URH/jwfpT6HM+ew5lkobqGo8QFChjGJuQKSDgm4t1yGeUBmm13
hdfMKrssIo1Ww6ub1z57/fnY3bJhq/8eEPveXNb6k1ksD7V67trPy4y6wC+C8UH29Nid/81rf9t1/+ADru4OZDW9fuyickQcLKChAbUoKI54gR+Fjfm98bar4ylgkJaWI09c3QmNQQ1fKmG6vf8fV7p0am3/9r73zR9Rcpd5UlSRK1voFmBuMcIdZMBTTRgiKoWITe6AYVGSjQHvwM7FBZ3uqLTFEBckLZqCiy2JF5LnkNIk2IEja+UzTHiY6hb45nISh0yGAwKYkfokARH0N6VKktKoqsVMBPLFr9zLFBjXpea9U7QuHokIKZAIizCm00kjf/5A39DbPhrvWNrM9Y45xqtN0Z8GfEIb0nrKkTJxrUW+wNASBmcoIJxEoch9WEGRC+VsAh+JAUisqr517G6lU09iWk0OGOnIqQBYSVDBJktYk9nXu+dbcb1znNgY987Ld/5qd+upaooVne/1koswrgv1wUSJPkfR/4lXe9/Z1uWg5uGtm3aZ+d7hphBosoSIkrJHSFjI5FOwBJTAMHKSXMIpoI2XHc+a07pw6N/PJ73vmS664Et4RyGLVivcUngEeJOE2gsCHLHcUqX//nCttQtoKuYHfg88tDJRAXUWTETVb+HrUJxa/HQDCRARkfD1AgtkqiUqmQQWgwGZOECtqpJGiK0EvMaOw5AHocaC8P8PH1xFHeQ0E4eV5eg97yXIqGwLGV7spTF7/qddceOHRo39YD1IWhhBAhn+AJm3gC3nKPfZ8UoQ2sV4aR6w978TUgDr4GOvQTESqZfF8CViSz9pyJ+mTTcrgY+X5QIAUrEjF1m0ztzu74yt1juyfr0nzf+9730z/1082+mqEkhqMfT1POyo+lzCqAp0IY2t9o/tZHfuflL7nejnV3rt05tXfa2ITFQOCfaEuiHHnZQJEXProSlEnAVtQxMVPCarotufO7t4/s2fdLP//G62+4xNRypNGlcEKq6respfaIbEOVVQ7IUcmrL2iZGTROJRgRvhg1SYUw95jvTdGy+V0lZhjL3bx5H3uFVlVKJIkohohLGqkC2BTDCVWGfCY80RNRQE9SKopZCycgqshy/Xyxlqhaspq0X/Sq55166tL1ax+UMTS6tUQTOBWxcUiAqC/r1XLSQXSriguiZWWDnxrhpOjhWgQjfJyg+Epg2XoOnjQ0X4oqBxQKlcmpOgZSmLTbGN/auvfr93UO6dyBef/tQ7/23p//hVpqOPb5mUX+Z5/MKoD/cgn2rOrcuXP/9E/+9IKzzps+ML35ru2T+yeRewRVgRV21k8fRMDVYHBpeHj9htgQFMYqOrV1t60f2b3vp3/yda98/Quo2bXGwhARQTiwPB4ofO68WN9aRwu+XAMXXSQFRSj1zgNVQbhq0lfOLLD/RMTFEEhQbGWG8p2Cko8h5UCj+/kw0SWQuPtAGxVx0Miaa3XvYZMROGcQH0cf7xPIEzkBR/25zHcqrxIoJPt4LM9NVp+X/sTPviGz3cfWbuY2pZyADTOrcwDYsAjEl26FthgUWgxJeE1KKmUltFhV53tfQCP7X9HJxSEV/SvKoDnFjnJlh6YYR2eQETJdOrx1/K5v3NvalzXQ/5Nve/t73vveZn8jSVKUemlWnm0yqwD+yyW4zsSGzPJlK/7gD/5o+fxlI5sPbFm7szPWIUuszGr80xhiqRKQQOEQI3wOSlADSl0dU7WHb9q8a8OhN77+pW95x3Wu3s2NJZOI9fyxU4IQK0j8AEUnuRXrG8+UdniJ/hHEIuTPAFst/IMg1a0AMbexh1gqPgjVnjKx6DNEQiuqF6Xwv4QzFytwqqIikCItk0rl1Gv7V7Lle462MNIf7wJVPzZTZkYTojtTiXYXNnv8iKg4FTH5GRedePU15+7YtG18T8tOk1iCMIFJBE6gJNaJEyq0XLxgvtYYDijQ34kvgIvaothb2dihYMeodJSCRldVUibloFegCgdyBpQoUzvdu+HQ3d9em43QnIHhn/uZd/76h/
5bf18j5vvPYv+zVmYVwH+9VPJcFHrlVc/91V/5AGt6cPPB3et3yXROghibI8R0cEHs6U8KhkBDdNSBO7WNdz66Zf3WF1979Tvf8wapT1mTU0qi6hu4EEEQWrMp1DrJxebinK8xLazaOGkeBdVS5nGWJHr1l5lnVpBIkXCOXY38Nv1Uc40hg+BYoAyk+iRHAsAU8iMFcOqcOitixeXihHxVQTS0iRi+eJqrvIRWfs6QMhvmceQHaofiVYUcK92OwIX5pRIwQWEcJGnIK9/44r7Bxn233z890qmpgQgzASQi1ubiSXwHqO96SnAg34jOqTgVC7UqceiBSmR5IrWmRRe86kWKvSuU4jxhEp9N5i8VgYjUKBmHtIsd6w7c8411nf0uxcArr7/+1z70/gUL5hg2RS7ArDxbZVYB/JdLb2yUGOZtP/lTb3/bO6jjDqzfN759SjNiB7Kh+Zf6mcAGIDVKJEasMpQILCln9Q13bNzy0LbrX3zZL77/9TyQ2YZoyqSM3CUaUvJJDITFeeMZDpQ517XOOhfol5hfFBPqfWy4zDxCAOuiT12pEgqmOb6g+FkAvnRBY6JRZdhN6TaEoK+GlqLECJjlRKxzubXdPM9snovNxVr11bbkN+a/U7TRi90yenLlZ4BWZNUf9wKVn67EGB5fp3h9pATh0Ie0cIIIDOecU+tIhPJFJyx449te1m6N73loO0+mqdTEqhMREVaGQpRVQ+8jdSJO1HnQV/WknRAckWMWQ0JQVjX+kmmldx+VAQNijb1klUiUQKTsu7h6I0Ihhkzqamayb+Mt29d/90HTag72D7/1ba//+O9/bN7wEM+MAs3Ks1NmFcBTJGX0DWg2mh/+rd+65LyLOyPdnffsHt85aXI2ltQG0lfIScjMVlY1SqRkFImt73l0x5aHHr3ysgt/5Vd/sn8uHDuTGIJRIQMGICJaIcglps9n4jo2y52LoUx/QEVg9qhIqgaDG1qU8RY4HqKfWhA50WmIiMwMjtAcin4JMf8nkkVFHqePkYiqEnJnM+uyPM+czZ11ImHATRy+VpjyCgl0UJnX2BOKnrH48byfwBXQx3k9Q6hove3D1RrXEkKkjNA6W3PnYOTaG6689Ioztj26fWJPizJASK26zIXlc+J5fxWNTcFDDmjVX4tBAo4dWn3fN6/DfZVBqFQOjKES+bwBEGslws5C5FIQO9Wp7vqbH3n0jq2mXW+mgy96wTW/9lsfWLBofpqYMOBlFv6f7TKrAJ4iKWECxITFi5Z+7Pf+6MQVq8Z2H9l08+apXZ0GagaJOg5zpEAKdk6VckNKjuu2cXjzoXW3PXjJBWf+6q+/MxlWYUmTmuakThMiIperKAyibR6IIyIHVUJXbMtmmQiYNcZqA/FMDuxZZ0WgIWIlWOChCtK9yOuHBMDx1UeWyLE6VjFAAjWqBhpsdo/8IcQpZTRTRMUpBAxloyaxRF1oR6UrkgOOVVlAagpfIkab4+wUxLBnmekU1cRMAqei+7S4Lr1XqWTVj5ZobfsQLDninDhnOBahUJUbO12LiCMml+RmKH/5G5/fbKYP3bNep2C6CZNhk2Q2EyeJGHIAmHyNQ1AGRdVD2cUHgf1XFRGIkFSLqkFQ1kIRR6onAYzzKw0BO4Imkqa21jrUXfvd+7fetzW1/YNDg2998+t+/xMfO2HJ4nRmRu+sPJtlVgE8dVLRASDg0ksv+cM/+qOFwwumdkzvXLvj4JYD6AoLkRAJFOLUgVlFrOs2qNE6NHXnd+48beXJv/GhXxpclDjucgJnLcBMLCrKyh6ffSp4aF0QYFGgVlzbdru269QG21+LRjGhpism3gDR4i4O3v8T7MtAIFHBBcVCsB6bv1ocHN/jwlVAzKcJDAqBiBOTGjYgCrVREoaXx3Z5paNC0JCWT4WJW5zGE2PYk/iM/iAQpMqHNG6yiHWElRJVEbYnnn/Cda95/uHR0b0b9icdJiERNeAEBHEi8A1Oo3MVXIvoJcWeR7
7yNyRPFQ6YCHx8AKqicAInECUoq7BzsEhIWcmAFakwtXls2+gD31y3d/0R6tb7k4E3vfq1H/6tD5588qoacygW1sfRgbPy7JJZBfCUiscLX0vPoBe9+GUf+Z2Pzmk2R7fv3b9+d7avRblRy2rVU8JKTsk0zEA+obd8444l8wc+9Os/O7y4IcZxAkDJdxX1lcMa5gYAkJAWEiz3or1Q5mw773addT7hqDCci3CwDzr76qCSOK9k1ACxM0WAvZhoUmF0ik0VWYe+w088f18moD7eCYgQAFbUiOpJ0lerN9N6aowhYvUzrDiMYynM0wLxi5qxGGuOKfFPLI8fEyg/cmxXwPsfrODQdDPCfqzkqkRUREScOmJ96WuvOuP0hY+u25gdcSYzkgNi4MjBOnIhpdN38hEfZ2EKXL8q1JE4EmGncKRCEoj+0A1OldT5rC9GzJISUXEgK5yLccpKrqbj9T0Pj6y7aeORx8aR1xfMX/S+9/3ix3/vo4sXLjJQgFVj5GdWjgOZVQBPgwSYVK2Z5E1vftPHfud3a9300IaDe9funtgxqpkyDJw3MW3KnOS89nvr8mn55V98xynnLrMmp0SZDIG96e5ZAZ+EHguEpFIqqwCYmJisSDvLOnlm/Yj6mAjks5AqR1hY29HGjaQ/xVY04ZMFQRSTi4pOERoIn2hRKmlki8QnIQIxqV1UlUGsqJMZTBpDjf7Bel9/Wq8xs1Q8hXhCFI67fAfVVNRjrfmMK/DEFcKlVHQAVd+lGIgFevwFv2hCvusPSEUgkKF5zbe96zUW7pH7H9FxbmgdQk6gBA1zZQAF+xI4H533QYWiSJgE3i0LTlHBx2lZnxbEKfyEYYSoscK0kO93W27b+sjNG8e2T6BTO2npif/jz/74V3/jA0Pz55okFnzNIv/xJMnTfQDHsRAztJ7WX/+Wt+/edfBv//bPDm8aabU7J6er+1bWQZwI102SZn1bHty5b/uet7/tVRddvaarXSQ5AFHh0OjdQw5JKB72iZWBD9cCKIgIrCxdsVNZN00SkzIIXHZXAGZgXHwRYp2R/i8JnxBs8FFRoeBqRE4nNoEIdInvQqDwBQ5Ww9grlHEGJcAAYLAxNdPoGNNWalvr1CkEYZh9KJPrIXECvxb3+CNrWBZ9lvJ3QrXRQsgH9ZpR49HFWKz6dj8QciLdMy889ZWvvfbz/3zjkkVLF61eKn2aIc9VDce2bqIQKIcVhCj8OSspg5QB3wDa009clg0SWIGocmI/URVisjWXiUxmk7tGDzxycGTX4ayVk+XhoeHf/b3fee3rXl5jFueY2R/trBxXMqsAnjaJzIk0Bvp+/aMfWrZizkc//rHRHWOb0y3LuguXnryoyQ1jaWoke/jOjReee+Zr3nAd1eGoQ0wiGq01qTDaAa1i+WeYPRh2p2AiAlvotOumuUmYGyYJyBXhS4sUdwqxaFAs3iqoldBNIqI/EP+lkhmKWCwa/AyNoWlVOPWIHoucA9+tvkaJCMZ3TDMppQBsK+84iCLOntQyIF0Q5+HY/LISxRBFr53fU4XwpKVnK1pcvMp1rLxT9MwBkUJECY4NxFhu8Itf99y1dz24fu1Dz5m7IF1OecJ+mX2pFoViD1ZyJKWXVeg1IvXlIUTwU9iK+ggVB9+STsgQkSMSshnyiWx079jBrYcObj/g2tblwjAM7RtKzr3ojIQI4owx5e0zK8eTzGr8p1uIDcFw8vqf+qn3vO+DjObuh/Ztv23nxKapWjvNDts7b7xjIDU/9bNvGF7U55DB+F6dLqCqsipHBqaoJwXKB9oHDX03MTAzDGcik1k2nWeZc5G5iV8JtUJl6kmoSwuGetn/Mn6k6ANUBogRtxjYHlEralWtaq7IVS3USWX6FQEMJRL2A9NCaiSD6kla44RjP7QC5GNZGGL8GkXTtBC7DsGBIh4RM1n/4zhXFHkVUokwFCHugm+jyEeFJnkKFXGi4tgOrUhf+aarDbmtDz7KrSTN/XSgMN
YtpmSJhuNVIgX7UZJK7OIF9y1Aw9VRFWF1CVyNLCfQBjp1O5aMPdredtueu7+09oFvrdu9YVc21XW5MINIAJo4MvGXn/izkSP7Z1QWz8pxJbMewNMswWCF9Nf73/ue947s3f+pT/3f0cfG7ht5eNfi/bnrHtl/+F2/9OazLzxN0IYJ/INPtCnaKlRMcYrIGyfKFDPCi6/COaCT51PUNkRpLaVg9ftQbFABFPl9ibxP+D5Y1PmDL6q9IsdDRKHZg9cFQhQGUar6bhYa3Qq/ARBUYxjZG/hxchmgICPhaCjwLCjP2JMxkQ2aiWBUWZAC9H8IG5dipLvad4fKP2LGrn3UPVQKCDi2ZVCoE5cQXXLtRQ/csvWOGzcs2bpw3mlzJREr0Vnwg+LV640QBRKoL5cjqKiS54RUyJfFESDEjtnVBMg63enRqZGtR/btPDh1aNy2umV0gAkQNawOgEy1uv/62S/Onzf8B3/034v1mekwzcqzXWYVwNMvokpkANfXaH7kd393/uDCT//zpx/btnn6cIdTc/aFq1983bVIrDXWGKhVIhYw1E929EwBldAY6oE8X8wKcmqBgnICwMwgoVaeMZmE0maS1rhI6QSR8fFkAkJ/eVH4VHJPPQcSSD0uxsmzTBTQXUIFqwpFrh/wEeDIFmmY5gUKLYpVAFKICWSUKEFUHMSRC8cQh0AWzTWKQrSQxhQ6HRTuTBGE+OElRmd7Ygra60doQf8XnFRRYOe1Iqmq+ChrfSB51dte+tDaRx6778HLF15r0jQzmVEQhYYfRgxEmVj82hIxWJwQEZwSRBl+qqYRk+SmaVPbktaRfNeOvft27R4bOSydqJUNuGYazXqjr9E33Ogb6nPWjew+3Bpra9dMTrX/6Z8+85Lrr7vquS9IjMEs+h9/Uozum5WnTSJ2QNURkYXs2Lb7c5/93O9/4uPNObVf+/13XfmS87o0jcSRCgucTxhkQgBNFJwE+cnewTwmQETVqnMQpyoQzxwroOIgWjNmsFYfaDT7kyTxQUUK+OX5Fh9WFonJPNFoj60nix7GnnD3m+BAeUAtxIXAbwmZCuWCmwkxzkjNE7F4qkadIhPtOtu2rmNzqy62iwskTxlpDX5OYMK0J8PnP2vUHst30Oq5ACiqEZTgU3AZhQ/iXSelwNCbVFPq1L7w11/5p7/51lnnnrv8slW2X5SsiiqDAN8kHARhVVIwwYLARGLYqIqIGEoNJWQlm8wPbTqwa9Pu0QOjIq5vTv/w3DkLli9ImmZg/tDQ/KGkmdQaCRE4gaowG8lwZOehtd+6//CuCYPk2muu/NSn/mnBvIXGzI58Oe5k1gN4JkikKYgIksKsXH7ipVc+r3+gcfHzzrr0eWeq6bJxTpw6ZZ+oQRwzLJViKiJFnkJjDRYpkSrHLpMMVoWBioqCHGumMpl1lJlIG0masom8TkhijF3HqCwvi1mHVLRF9sFeiHdERJ36Sl8VBVxBCQXVEltBBIcFgV9BrBojBXxGuzoR55y4MDVTo0PiTzxEhDUkrRdB3xKcf1jop9BA//EciKPej2GPno5zUa2H0DipCMBiyTXq2TXXX37Ht9c9+vDDi05c0rdsoFNTIadwCiUlJnJUxHMIDDLkhIjY2NqAabq2O7j30KaNmw7u2A2HxcvnXvjyc+eduKg2ZwCJMQmxCQEasHbFkq/uAKmTWl+ycHjp1UsG7vjcnXsfGfv+TXf9679+9t0/995Z/uc4lFkP4BkiZTq5ABPjeMvb3vzghtt/73/96uqLF1maViPiBEpGId7QL+On0Q6N2FhkoUBVVKyKI3W+144nI1REVUgBZdW6SfrraX+t2TBpwmwgEKfE3rSOLoWq+t71MXUzKIDYfNi/TwBCMUKoR/BHRsE9QZgBA4lFveTTZcjXP/ugAzvVTKSb2651ue+DHBNbKegeUfLzEYGKJxDX8z8EZVVy/6gvHquNqFZaAEWHoOCnirBwdfOhYSoRpSZhq6Zbv+NLD/
7Zx/9h3txFl7zoimye61BXWUglEcPETonUkIphw2ASAxjNJZvMdj62Z/tj21tHxsxwetrZp5x21sl98wfzpnV1K6QwrKox6B1XFwBDICqamoSVajDtHdM3/cudrcOd005c/vl/+8IpJ59qeNYiPL5k9no/Q8RDIhQkIl//xjduv/OuV7zx2hNPW+7clNaEVH2DxgJlKz3fe7dTRn9RaoJgO3PUNARSP4xLVHLRVp4rjNZQU05DdilFYinWCxMktqMXT9Mr/ExBrxs0BgUKtPZfL86QQl4MAPVFvhxDtRQjuapkIbmTTFzmxDeDIxQGf5mAFAzrmBjau5g/5FWoKlIUfouWybHVHZRvaeW3YnpPjyfhTxmikrs8TZIE7vxrzrj0xtO+9/VNJ2zbt3hoWW5EYAmsSkJExIYStmQ6nOYmm3ajB49se3TrgX378k42vHDgzKvOXXT+qmR+A6m2UgtWJ55bEyZWEd+TzoeQVVVFATCRiFVQpnZw5dCFL1hz59fu2bxzx9/977/73d/9aLNmZrNBjyuZVQDPFPEUh8B1u61PfvLPFiwfeu3bXpHUrLD4lEnmQH17vz5+z5MeFBsRAJEF9zFXb3WSctEhngElsHo70YeU0XWiyABxSSpMTMTsC7YQR8iG7zsNrWeKojAVQEUq8NcDo1rB6BBeCO9zyHdBZIAUQK4uF3Rzm4mzsay5zPXRGG4GV7dXwf8nwq9jZAtVoP3ob/ZEegOJpUd9uLqBHq9Ai0B0pLB8+zqXu0x1YEHj+je/6NZbN21Y/+i8kxbU55gcztcCs+OabRAYue1MdLc+tnvHlh3jB8dhMXfF0KlXnLvyjBPrixtTtZatZ0LqQ/gkpCxwUHIMVn9ZvJcvgO/uEYu0CSaDLD5z8fLty7av3/mvn/vcDddf/5wrn5fMOgHHk8xe7GeEaJiaQozkuzd9f9Puh6979dVD8wWpECt7Jh+kIgQiDjaphBhAbOJfGpvwSfQCBgTkKwWKuKnvxQ9RCq0riUQ1V7QyK0KasGEY1eCUKIK1TwF/JUSSoxcASMgw1TJPh1QDtQOEWoRoTEckJcSUJUABgWaqmdOuc13rBOo8laHku4UGur8sf/KnwxUnI4YnnlBmqIEn/kIMT5efib9VdHD4mxRKjuJCxLpp9Qn7YcSWMUps1Z15xTnPv+Gyb372riPbjyw/ezkbtYBTqgtjLN+7+8C2x7Ye2j+iFkNzePVlJ6w6f/XAsnloUsZ5x0wRROE4nLdnfNi3WRIV311PRWNkRGMYRRks6rrops3kzKtWT41OH9g28ld/+clLLr6cG8lscdDxI7MK4BkhMZwJAb7yla8m9fyKa85rzDGOOkDs6xMjp94ajtn6HIMBQQmEDQbdAMCnWfo2nHGMbIGc4Ng0mlTEEnXFwYph1Dh0IQBFLkhCmzIfQ44UkJ8qo8xMiDAUs1I51F6VzcUoEiO+CY6qglihTjRXl0M71uY+juyz/kPg2PeJkBjujnUPMbuoKCY+dtQ2ilZ+Vt6j6rrN/IrXnTP+UO4dlWhADycUrijFELcfhKDk1LJhkHZtNyF66Wuu+f637nlo7Ybly1bOWTZnQqYOHT687oHNI4/sz7raPz897bwTTzvrlKH5c2QAeYNy7kiiTsV3DTKe9SHxKQQQJUIYoOPgbYJwNUo9Rj6EDnFM1JzTf/olp903OnnTbbd+5zvfuO5lr2AzCwvHi8xe6WeQKHT/wcM333TbOReceso5yx131SiJhuhpSJ0vp0VWiBWOpWC+aaYvG/J5md7eF6Fi5HxI4Ve4IlgcjD5V57QLMWqcCAe73cdxw+487+9RMdjuRR/LcjBXQJ0yjOBD11WyJSARnCKHZiqZtVYldyGbCMRF1Dc0Air7TwcqLIZFQsZS6Rc8jjz+X2fwU0f/tXihsZ9EpX1euBKV3yvN6rSyEa8pRQSqbAyxO/2CZS96yUVf+ed77rnxvtpQbffInunu9Ny5zUuuXTPv5EVm3h
BqDHKtRJXEz0Zj7y9SDLSETI7emHQvfaVl9hRiZw4HsEVuarzw1Lkn7l6+/b5tf/k3f3X1NS8c7C8mgs3Ks1xmFcAzS772tW+Nt8df+KKX9Q2xGqvkKFZnQTWkQgIFxFSZicL6ZQJ8l+gQlA0EDYFDyzgCAAaXDL0qQiswdULKISJNnkgIPoLffA8ZTrE7KMOrgWjpF+AdVZXGfKAQytUwsN466ajNnBMRW0JZkc1ZmPsabfXIAkWG3bs7CCrgieQH4fsPkKOoIJ2hUUIjOF+jTVW1EQarAepEDTMRnLMEbdZrL37V8777tfu3bdo6uHjggsvOWn7WCbXBPjF5t5Fbto7LaEtQP8VyIFRuU1ymyumXYYmq39Pr74iAc9han1l13gkju8fueeCBr33931//mjfwbE3A8SGzdN8zRRSwar/89S8vOXHBeRedRqyAg7rAeSj8xQqsjjeSe5sQM8KwFQWp70JJAhKEpjgUtEB47ee9cAVnfRNiKFSUfDVvGDqi0Bj41WJzTGyIGIYpYRjjh0AWh+EpaeIKL+MLuVSRi81VOiJt51qSd3NrnbNOoCTiT8dHCHy0m2Lj68jFlDmOBeRRoTPiJ3+IKzBTExQ7paLtUM/uysUspwKUtn/00KKaqLZREj+6LeGOdFadt+LFr7186NSha97wosWXndJdXh+bm030593ECgnUAf46hjpnLZOAFWEucFADJdiH//zOynY/VJ6lv7IiKhZZY1H9hLNXosb/88//fGJySmazw48PmVUAzxSxzu3bs+/B9fdffPUl/fMS4ryc5Vqa3ihgrsyAjFBbwKV/M76P8Bk/rTe+KNIxS0OxaOYW4B4ClYLARpi4Urzy6fzMTGBiUwCiT82XAFpwCqckIX5ADpKJZKJtZ9t51sqzjs2tH2QcghKBK/cN0FAg0Yy4bTXrqQS8H6UcpULiLo7eTw8ZVOjo4rMaDzKqLk+iKURdDqU+felbrj75kuXdQZnut1Pc6nI3Z+uHKJAwij5/xZ4UVQoqvqdU7K2iAbzaKIYqRGcCPqDjJM+dQ0LLTl82d8XcR7Zv/spXviTi/lNrNys/JjKrAJ4pQoyvf+M7lMill52XNCAsEBdy9o6GHG92xrJaZQE5b3sK/Kwqb3dytFGVS3vWI0RUBEAsQo4SggZwos6FQTNecSDMfC/2HouP4Xcdkno89FtVK7ACq+oAK8hEMie5c7mTdmYzJ9ZXp3lXwXe4JvL+S/QheEZkt6gi7nkHM+YD/NASVu6oaIEea+O9PkHsAxpfU9EtL4YnPIgHh0YVTkXY2aQ7uLx24sVL2v2tjNsMYfVz0iT0/uSwSF6dlE6PouoMzKgMmXH8WowU9j5dqCcM9YEucWY+Vp6x0tT5L/7yL1udlshRd92sPOtkVgE8I0Sh1rrPff5zF19y7vIVcx2rt5v9XyP1XJq8sdg3gHlI8yDfyscPYFcghAu50nqBQgRT1ecGhbrcCF5xf8VkMVFxTqUYOKbF/JiA1D5+7C19K2r9T9HcqXWSO5c5yZ1kzmbWdvO8+M+pOBG/m8pJFfR/CcRUxgOCl4GZXL/ODLjO/Ot/+HJU6yoA9KxPz5Yr71a5J4rbiV1JY1O5kEBbKAwVsXmW9qVLT17coinLztP6Sgiz0kqrXmfsqvJm1ezX4rCKldVYzKdliFhVlUFkSFQza8XIwlPmz1k2Z9OOLV//8ldF7H903Wblx05mFcAzQpzqpm1bDhzed8nl5/QPsRgVFngaJj7FFXgJpDP8OxqYF8AgUPBaoKYf78tKnppnAlH4lyMjzApW9s30KSbuK5GAnEAUzokV3+Ct6PDgoZ99bwnnxDnKnXSt/0+7uWS5ZrlkNr7pJHOSi+YgRwxlImZmTycBvs4tHBXFozEEo2SIQOQ/S+w/Gin3QiEdZaFHDv6HEKrGcHs3GfRuNORnRBwoRoK9Ji68Fa9ySYXIZ+34km5WEEQU0l59xuKBYRbpquYqElYknEKvmqn83h
uZqEoI4BTqQEt1AF/hQeoL+BQEFeTOYlBOPGcF1fP/7x//T9bOHqcV0qw8e2RWATxDRL7yxa/1DdROPmNFUrcw3AP4QPVFxZRHGRb1bAlMwCcGBcAJnRyo5Ho8zrL/j1BCf2nnejtRQszWKcSpFbHOOhEralWsqHPijf3cSdfazLrMucy5rrW5SC5iRaw6K86KuDDn3tcSAwixTFLP+YBCAKA4J8Q2dwr1rTRjkCGgYJyN+zhYrz0vtbSSe8IJvbR+JeJQMi3V9df4ZnxRBtg1Bl2KF5H4Cc4S4MuzKcbRfT8mMNikCZvB/r4kUmDV1J/S3EfZADsa+zOCH/H6afVIj1oijQ0E4dOFVSC+YeycFUPzV8xb/9j6h9evc9bOaoBnt8wqgKdfFLBiv3/brWeuOWvesrmSCKmFOpThzQJvPDVNVEEXFKmY5GkaEpBz8AFaw8RRGzAxMzGRIWKECAAFBA5bR2V+ozdbPWrnikzQFXSdeou+7VxL3LSzbee6ql0nVuCcBqfBNwSNXoyEwTJVm10KP6IkNCjyQRQSmPxEdJ+iVEagQ7iAI77iKFboWGHhsolnAZ5VLA0fp4JbL5Yk+Fxlak/YytGhGSVSMiDjjXsikAFYAFc079ZAAqlCIVBlIhXX36gtmTsXNhMnxMYHCWJM/XFunKNOuaSACscjgH81khQiw/FrEn0RdWx5kFaes3zaTv+/z/1bbmdDwc9ymVUAT784tRs2rD9wZM8Za05HapWskkPso9lrgc4IRcbSJwrPvLByAmZKDbNArVAY7cIUEjKjsQ8PehTKACgkk8dc0YDJkcMgAF4TBIpfxDrJrbNOrIgVFYRCM0XVkSiguMLnayWBMtZyxUBweWwF+19JKi1PPFJVFStZY65MhQTvET3qZ88Kh9daZNeUvH9UitVlOSoxtFypEmkJ8MV0vqgOiPyR+jWPUQInYpjnzRlMSFRUVAhV6NdeRkpn/Hus34q3NN42veoiVpGFfGJVQEVFWBasWjCwcO5XvvG1wyMjblYHPKtlVgE8/cJEt952G6V21ckLTV2EtWJbarTsCg+gKhGDlBTMlCQmtc7meZ5Z58hw2lRiTaAGRGA/Fh6RCAqZnCWzTdGGjeZjUexbwqbPGxGJY78KSzrCiUdhH6ao3F7VmgUtUxgrzEp0ZwK4xjOs0OwFiFGMfhSZNyqlijw2J/7EUj2cGZwPRT1bsm/lMZTKICxA5LT8gE0APpRfUQ/RnfMKs0jMTE0y2NefMMMqueLUKuV3xSH2rEa8Q452gGbqjKNUX+STKByniIqoM33pqjNX7Tiw4wtf/MKzOxdIIVoYWselzCqAp19E9bbbb1916uKhxakmFqyqJK5q5RV4WL4ZUCcy+wZkBKyaps3+wTnNpA6oak6qRMJcoHFAzWhfMxMzsQGxjwSUgmIfJRqHf6GFBRzejzEJ9dmg/lOkcTdl2VaIMT8ORvcyXlV8q5I6IXmpeqz+ozpjW9WNPh6Q9Zr/1TcLxO5ROxVvJmrHeEFmbJcIzLGywZBhMlSek6+xUD+sDUQieX9fOtDfdM4KyvDGUQdePc3Kbive1owF6Dmp3vP1PwvlpBBHaknmr1zYN6//3774b9blz+pQMDmg67LD00e2794m7rhzd2YVwNMsArd7z57HHtl05XMvTPodEqccul0e88GLj2xpAUYyBYCQYuJgZ9ODu5yr1bjGmiOxyiBVYiawz92PWfYaGgZppGZCKXGJCJEMqXLiPbhYGOhc0M5+m1Iaw9ErCFZ7VBQzTPqqeCVQsivVD1Te9TlNlTTRAMvHZkuOipf2fgAooL3YQ4/2Q3QFqDgRihRK+ZFCMYTDUT+1xkATwMQkXYq1dRycO2VVSuo8OKcPHCi90LePquRT1AmVii6dCfEzpWI9RFey2h4ilnl4b0RUnXG1heny1Ss3bnnkwYcecs8uWFTAqVpxVjE2Zddv3PI7H/voVddc+/O/+MtT3dbjGwrPTplVAE+zKH
Djd282DT5l9fJaXQW5OBtwrfT6ew3TKm0TLWViZQOTpOvu3v7+d/3hX378X/btGDOSJkxkrLCATYj5ImSs+BqrgvSPf0FBbZfmZS+wzHRHIssc4JeqbEnxsUDNRw+kiv0z+P1iJ9XtoyCRylWpfK0MHfRwQCV9ohX07KVCqmqmV0UEaC6WolJrVnXJoj8SPaO4vuV2w2LEMV2EULOtIsJgH99WRS2l4eEms6/CrhxZOHNV9EydPCoE0PsG9fzhWBxQz7cJoYm1I7UNt/i0xVnivvyVLzp18iyCRVGbU7b30L6/+bt/fMc73/ra173o3ofunLKd+kC/WufHXx8/MqsAnk7xz+RNt918ynmrhhc3KHWGC979GB4AHeOJJ1+c5bv1s5gje8f2PzT2hb//9p9//FMPPrCPu6YpJhEBWY3xVw+Xyv5eZxQ9+rVMUSmzG6OOiE0aCsiOqFsa8bHSgMrdVEl8iud0NEU940QrQQH0KgNUkLogsBUiUCH/X+hajeh1hHKH0nAvthKOpWfThZpRKvC/SgHFng4VFSzaq5YRmjb4RFdSZWVSsDI0gbL/I4MAoTDek8gYAobn9CU1gWZRZwChh8NRbX5m3gsz5eiPFS7asb6kiH6AqOSUDS4bHJw38NnPfebQyMjRG/8xkWB2OFWrbrrb3j+655vf//rb3/HG57/owk/+7QdXnNr51Fc/9sHfetPQILpTh5OaoeOsD+psN9CnUxQ63Z7atHXDVS+/JKmLSq4sxsdPY+/LY1nH/rsFSBIUAgHbNIVIPtBf16xx9zceOTwy9aZ3PvfFL76yXue2ZgIBGxXfsMFbk7Ek2ONciGXGptFF5+W4U4omtNcgHvdCcWlPFJJim048HpGl5b9Hn5/nJIpYeO9RUJUiK4LJxbSrsLDFEHn0kuEUFU+xuaom6IHLooo3kjtatjvl0F+HY9KsarnhXiIIpLFTm0hBJRFYYZBomNZIpKRO62mtVjdZywVAPibaH2O5HvftY3hSOOYtRQVB5NtMaQ2LT1qy6Y5H773nnhuuu55N+oTH8MwShRAgYFFxag8fPvzQhoc+/7XPfefmb46O7j5l9ao3veeG615z9ZzGUCOlnXbCoTM8fx6bhIorfHzIrAJ4OkWhGzZsaOeTZ645xaQQVohTKUhtjcVS0fikKmRqYf2GTxPnznWnWrCuv9FgJI/evffP9n5h1+7szW+6oTbIgpZJyQLilIhUmEjjxEI/+f3YN38ZY6WKTihjwOFg4w/ETvWVfPPYdI6i8V4uwbGfuMpQ4bDhALWVtBilOJuy52CBCjcWjqe6oZkXoTTwi2k5AcOp5H8I8OO1WMkxwEJNkzpCTtJVp5AA7aGTQ3mRhAB2vvGCgBAiMb7XKTtYIlLVXCwMc8KDwwOHpse1cjGoOLwnJ/o4rytL5C9NXJ8Z7pZ3puoy/4T5j93jbrzpO9e95Do1z2hgLE5ToAoIMN3KRsYO37du7be+/rm1a29tTe4/4eQVb37rtWsuPWvFoqV9c1LwFCWjHZiR8fGJltabc4wxpT1wfMisAnjaRIE8tzfe9N0VJy0antenpsJpeIvatxGO3MKMHvMlAGtAcVYiRWu6Y5KUOBkaHLIT+aEd45/65JfbE+6n3nNDbbi/oy3hnCjxPWkAUFAfFUZdCZFFOVqKZB/0IlSEfqrCe+WjKKib4ui1chI/QCo6BohOQEGLhHho5LC9R4TYbKd0T4JlrtFKjweoxSkEVcG+TY5CiWCISZ0yG1JlcqySaA3itm/ZecTK0hXLh4bIGRWCE5igqore3XEhix2EMu0QeFFhQZioTEBC3N9sHtKxyJMVZ93rt8xc+N71Puai9X6kNDFmYH/wq4xAanPSgYVDN95y4/jk+Px584meoYxxsTRW3cTk6MMbH/3+XXffetft27eut2Z6+UkLr3/rNZddedac/qFas7+/P3V515jMoZtxV7W2e/eedjevNwfifL3jSAfMKoCnU0xq7rjjzj
MuOWV4uOa048tii55oweYvVUDvfVnlppU8C24cjoxMESdEmudZPW3M0WRs/+T/91dfHutM/ML739w3nIrrOhZllZCsbzxN7etyNTTtjyyGUqwmLVgXIKgBKnyGwp6PBDlVnkrqNcHLc3lyz1mE5QpT4zWe+hEzRW+14hDLCYilkRut+Kgyeoit6GuVGik21iQGWA0JO0lA2jCsadfs3DT+jS/f+I2v32zb3ee+9ppf+sANSrkCxjBZ3yJVK6qSCo+CVbzvRJFgIiIiAygxIdOEuVmrU4WXOuqqz0Soikv2BNrgWIuqYdhcuUaxpZ6IU2Ye4CUnLd119/bbb7/jhuuvf8YAo7/koeu4QjOXHzpy6L51a7/5vW88sPb2kdEDSR+ffvbyd7/2qtNOP3HhvMW1WiOhlAyJAmRNPbViYQyjMTbd3bpzt81tYkjFgo8vSDy+zvYZIuqfNNU9+/aPjB886aTLag2yTE5R7YsA7XnUlWYYayWAUWy3YDM7fmhMhPKsS8K5s0zpykUrDo4f/sJffGVydOzdH3jt0lP62ppn1qphIxBmjdPlI9BWqSZVLWAhuCCFkd+bq+pJ6yI7nqITE/725Gz9o+2v8pSJwlQxP8cYkU6ageYkYbRZyUeVR1RcAgWOokAICiFi+GxZZWICZcSGU1LI5OjUg2s33fiFux+5fUd3+ygaKVxt132j2eFuc4G6xG/LROVWnEukv4JaiksR1Wro+aA+kCHNZi1NEkiMYR/bE4vn0bt2M9fy8aRwiCq+kd8ccQBXESWjc1fO3Xz35tvuuv366657JtjGBX8ooHanvWPX9tvvuu27t37v4S0Pjk4eGZhXP/X0Va+88oVnn3vS4vmDfQ2ppwxrFCQORMQQUSJmZw10YGKqs2nXgUceG6mnzWatrk+01M9OmVUAT4NQzOz+3vdvBGjZqmVKQiRMBRWvqOQRVk1T9CCf35z4jgkibNtu7579UHEsucpA//BFF17wM+/+uf0HDv3dp//h+1+9fe++iff+t9ede9EiIpurKpGSIzaxkrTI06kmw/U8FVR5z9MuJcaip9vwEyLX4yHJ4yNMqXMUEiZWVtimCtNEUigwX28VAtrBcAwpTMFG94a4hnw4BZhJhXw3HhJNmaXVnjg8cc9tD3zz/925ae2YsUPLTzj71R9489jYvn/9zP+Z3rK7fXhyYOGg83lVCgUTOyXxXL9q3F+8gkGdxjVXbxEgUEL1NKnVanmHVCmM8CxpoJK9Kdbrhwet6MZVzAot1aZREducX6/Pad58x60T09Nzh4Z/6F39RyQuVMU3CX6JAkSdzO3ae+A7N9/43Zu++dgja3M71bewce75J11y+YtPWrVy+dJlzf7UkAgrs1gnhMTbNIZIWeAAdWli2ra24+Dhb3173dgIpWlteO6QMccdHh53J/y0S3FXC9yXv/TVxSsWD8xtqnYDoUHsWwQA0UCt+AEBwEqiNvxFYhJ/d9odOTDhBLV6krj09a997a/+xgfmLpinTi6+/MKvfvmbf/WPf/XnH/3UO375uiuuOadGrqUZjBgnyim00genxHeUHBBAQJgp3MPnV7URgjVbJS8qIB3JoQgzPUsyY51mQluRGUtV2ItMi4YPROLMMxleGyj7EyKCZ9s5rqLfDBNT/CIryBmWtJakU+OtLZs2ff/rd95z4+ZDOybrZvi5a65/3uWvOuXUNbWB5ujYlv/3mb+bao3v2r53wWlnOWQgTsiIb/mpXFwylEddNo0AxFcGsLJPrCKFKpKE6oYzEeVwVFWvK5xZhRD7z0gV/RGOKlwQESVFvZEuWLF46yPbN2/ZfOGa8xN+ymYFK0Kjat9UD1b10KGRW2694/P//u8PPHTvVPfwohPnrHnhaRdfdM7Z5502f3i4lnJqVMURW3XqhyKRM8yGASZRcSBjDAmpk2Rk8sj377jjwL7WSaeevm/T7hrPTDw4HmRWATzVUgTfOll72+6tL7jhcuIMRoh8C7AeBo
ioQAyqbiLmpwQLEhAig5z27hzptDIVw8Rvessb3/9r71u0aJ5CRXTlkmWvecWrmrXGp//9U3/xiS9NdWrPf8E5iSGLrhJ8LJhAEEikeQgx9ExhzHnYeUS0guFGmbOp5TH6f4PWKkzXo7GejvWyUH5xMYpNqFYxNH48KIzIsvh/SMIRCYVy16hUVQisID+aVw2pCBtAataZuvCebQceuHfzbd+8f9MD2+2EGRo+8blXXPGcy69dvnw5ad62o9KaqtfN3EVzDo2O792+/2J7HpNlZn9lRBDGpXkT21dje5+Fgqr1vbsh4qeyda06UXVqVPv7m5OT7RjGKLA/rkVJWpXvPK48oY9Q3lta6nuEAIWCyCWYt2Lujge3fO/m751/3pon2M+PUFTFdydUKBGPjY2v37ju//7bv9x567cPTx6ev3zospeddtVVV5148qpFixbV0hRiFAKyAgGzQo0BlEWVvWvllI2FilMVTSyS0Sy/bf36h7fvSRv94xNjzMnCeQv5OAsAYFYBPF0iKvfdd+9E5/DSExbW+g0olwJhS3grqGEgsEKRL55ByxAUzCbZsXmvOlKnffXB173+dQuXLfDbYcMDQ00IXfO85050Jv7qn//uf338M5MHX/qKt1zdYHS1AyOwCphoeOsMrRNQiChyU0XMtAeQSY+B7zEuUGwpnE7vxsO7Fan+QhT0UdFjDSSIg9B7tE6oaItmuEJ882ghjjlBqqJg8n0qRKxRMppylkxPZ48+sOO+79x/z53rR7ZPMw0tWnj2xVdee8HZly5auIzJtacmVTOT8EBzaO78eatWrtp3YPdD9z9y3ZteVKvVHDzHDCIGyQwqo1gHv0oCYTCTceoANcwiIiJJwvVGTbUF9uFkjfdCReM+Gdv/ydFDZTavbz8UNQF8E2+m5twmpXTPvXdpyBH4Lw0EiCqUWFUt3M5du2763s2f+dK/rt90X3OeufA5J73nha87a83qucN9zdQZoxASEu+V+hlGSkpgUlUV5jheM/i2gfmbEHvbYw/c8eBGa2qNYcYRNzk1lXB63Nn/swrg6RImuuPOuxyyE1Yv10RgFFJh/Y/BigQUiMVVRRYlPLNBoqTpxg07cytM9RWLV5y75qxgaGoo52Xp5O3ppYtXPe+Ca778jS//3Z/+2+jo5E++6yV9TXS0A0PwGSzKoYXlMY4iFkepluNjovVdxieCme1fR42lgPclCmA6yiPQ0NY6cPhF9DkawoE0F5/s43WRllUKKN5RIjJCAsCo8SumXptCCcwQVUewSUJsKG+3pw601967555vP7Bl7ebxkUxyXnnSmisvfOE555y/YM586WYuP5iLE+eY2CBRa+tJ84KzLrvrrnt27jx46MCRxYPzwI44BVwwpKGF/q74cSSRyfJVy2CGiDjxjQgE0uxLOBFRq9EDqHJux6TMjpInBf+FbkG4rdQ3CJGYJqxG+ubU5y6ct27Dg/sPHVqxaIn5L6mVLVrKkhK12t2771/7/770mZtv/ubY4d1nX3T6h37zDWsuOnPp0vlJjVWMQo1CnERfyi+yMrOCoBLoQO9IiGUiInWAqGmpW7dry0333TeV55bRYTtvsNFs1hYtWfi4PQqfvTKrAJ4uodvuuG3psgX9Q3U1VtRRQasg8trhpX8+A4oU9q9/dGNqtoqq5LJ5wxYoWHHepRfU+xqBQPVfEcfkaqnW2Qw0+gfTgUP7p//5z75k2lPv+IVXpn2NDuXEMAJRn3DkU+m93VkGIgoHpEwBKn6hmH8ZSIqyCozQazhW0Kli+0f7uHhZ6ImiQUbI6KG4BsXHIvejsT4ZrGrIOHHCZBTeuPQVvJJADSUiJhvNd+w48ODdj264bcfe7eM6rtKuwQo7PmXpuZeef1Ut7XamR7Nu7rUGKalJfB4qCa85c02jVj9y4PC6+ze+5NQrSR04BZhUwP68g74rKKoiTh7XM7BYhpjBTEziEoZopmTiUs20u38QtB9FGz3+58pN+qzgkr
EDg0iQpMmcxfN2rdt+z333rHjZDT94o/9RUQGIiLt5Z+uuXV+78ZYvffHftu1YV5trrnnpZa981XtWLlo60J+aVARWod7eF4Gq8asGdeIEBGYKHQ/VwTCIxSFJDUBORIgm83ztzk3fvfv+qcmMDNQ5p0kn6zRqjRVLlh9fRcAAZhXA0yICTLUnNjy88ZXvuKrRr0q5kpIUES+g8mSW5tlMgjy4BEweZujwofED+w6lCbNg3txBJjDIqRITBJrnedZptybz9kRncpxsXhfutviz//trArzzl95cS1sZug4+MAlRpTI3iCqjpQjoBfPghvQcc9HPODoFJeZT71lVT6mH24rKo2hKgSKZP2b5ICAsqYYAcIiPkIpXjipggJwKkSGCI5cAJre1I2N202P777tl8857t9rx3EjakDmStsnY6c6UgCbGDh46sL3e1yA1hhOoIyZmShIVUlHbyVunn3rSyiWLNh8YXXvHw9ded5npJ1VSSAxYkA9lFoXJ1TavhUKVMEcHzMzEIkiTOptEtKR+UF2WJ0VVxM0/OV6jPMbgtgT0J4Kkbnjp8Kb7ujff8v1Xvez6J7W5J9pT7G9aRI2IDo8evvueOz/1xc/cee8t1kxfdvHSt/3CDeevuXDO0NyUU8dCal0I6ITe5kxKpJbZt9HgoCtBcEQUeDiIMezEilJHaGR6+t6tm25e98Do6BSblFXShFPmsUPjanV4zvCTda6eRTKrAJ4GUdVHHtk82RpbfeYJtaZzHM2vmYG9ym8zHvsY7iQiqPi2NzseO9iezpNaqqDJrKXsVMHEEFLrsqwzMTY+Njl6eOJga3rU5Zkh1DXJR2uf/YtvNpPht/zSq1SOdLlDJrTsLJJqime1sOmDQR9s7vBKy79V7HQq+0FUW0oUp4WSPqq4ONU3i/MnxA5tKCKX5DtnFryTZ5x8izYOQXRRqwmxUOqSvINDeyfXrdv6wK0bDj460rADC9NhbjYAM51NtDJxZNkkksrW3Q9v27N62ZITa9SopTCUmARIQ5jBqetm7b6BRZdedNkjX9i8c8vB6cnWQLPunQ2vM4HC3q9cuR4KrHgRBhuIKkBprd5s9k23coWEkWiVc/6BUlnVJ6kutKLEiUL5X2DulGVgfh83a2vXP2Bdbnxf8R9KtOQQyYqb7kyvX//Qv3/nm9+/83s7d2xdccaSV7/r+VdfdcHqE4f6G3XkpKTOWSEJY8tIfbICkZIPnWtJiwIQVUMkPg2NSQlO4chNdbub9x2689ENj+7c1WrnRAbqg0B+KzJYHxzsG3iGFjr/V8qsAngaxKm76aab++f0n3DqMiVHcEWLHaBA/hmWyIw/+twcDzAqSMjxYw9skswlxlCzvn3Ltm07di2ft6jZ7FNwa6rdak/uHTl0YHR0++4de/fvJNiEjTqtc9Nl7n9/8p8G5vZf/5orkprVmhUR8jkqKFNR/FEEZj7s2h+Lgooa+mrPZH/AP9ikKlGqqvXKeOdRHy71SOCCIq2iIflHyDAJSI0jKDvSDDLtdm06+MAte9bfufXQ3qn+tP/UuacM1fpT59pZ3upM1k0nq0OnXcLsEjedj2/dsXPuwLKkv5nnooklsCHEhgHiJO/m2XOee9VnvvKFQ/s72zftPm/BasD3DAqeC1XOLCxVeZo+4BGWLtYqOEDSNO3r65ucGmUu9eqM5Xn8t6rUWFBXP8gRoKM3FG8zp0T1Oemc+UMbNjy4ZfOWM04/0/yHkVJFxa8L2IxNTGzeuumr3/ja92+5cfvOzbU+ufDys9/7gV866bSTl8wZ5LpV7XS7loJ1Dyaj5HyiAwMqDO+hqhjAEIV+tiriAz3EAlhxmctbmT04enjjjh3rt+/ePz6eWzWUClliUcskSg5q3dlnnZ2kP07d7n5UMqsAnmrxj+Jtt91xzgVn9w8ZIFe1qDLaj/e9o9WCKpgEUCbq8qNrN7MYkyYK3vDoo3/6J3/8smtecN45a7jWaLc6Y2MHt23fuX3nnr
37DrbamUkT7VgVIkIKI+3mn/7W386tmRe+/qoJN56ZjNVbW0VijyIyLQVgx56fvrPZ4wF9mchYsDdlEKOsMCgYo3hqpZTehUaMr4SS44FEjIUnrxnCOcgklu3Y5N4tex68bdv6O7Yf3p43zaIThk9bNH/x8FDdSXti4rCmEEeuC0cwSUKgBEkntwdHdrZba2pJX61GxhhVP+CLmRLAKDTPsrPPOnvBwkV7j+x+8O7H1lx2lsLG1N24Nl6VzTghb8f68Ekw/4OrpVDA1moJVLRY42NC/+PdHcWihhWpMnCPJ+V3i2QzHxRwqrVaOn/pgo2P7Ln33nvOOP2MJ9xOzyGqT8EBMZnM2p27d333xpu+/p2v3b/hvpy7y06e9xPvf/Xzrj5/2YKFSQ3MInba5U5Z/BghqBrf3IOCS6XiN+tnWbDvlC2hPEWdIgc61na73cnp9sjYkT0HDm7bs3/v4dEpp6JMGgct+JEMTCSYODy2+vLTkuOvCgyzCuApFgUEmju77+C+l7zoCpgc7EJxUpkkWCX9K3ZZkVLv4Y980wJ43mDq8NTuzbtZAWvIUHt6fN29d/azO3jgwNCcearamZ4enxg/uO/A1PhEt21tBmaTKKkK4FLUSPgv/vjTy09YcdLlK3PrKBEixGJURMoJFXQJzWw4HCbFwh3vqx9tUfZEAcJ69OCSB8qC29GK41FqBh+goF4dEQ4mRCqUiAwrpD420j/y6MGHbr1v3dr1+7dPJOhbNLx01cITlyw5adGyRZpADHZs3zKy/xBRTsjUiQLMxigSsVMT+8fHR4YH5hMSQsKUMiXglCgxnECNy+y8ufOuuPSKz3/j8xvXbxkdnRxu9AcQZYIgxFHKhKjyRD3I+kYcni/yrxwUcHnWhYLKiHrRY+OoqIAWP4vYwgyL/snw2lrVAfDQCu+ikGFetnz+I+xuu/PWt771J3x+6xNvVCGq4vX9dKt9yx23/9Pn/uXOe++Ynti77LT5N/zEmsuuPG/N2af39TeNqqVpp3DQpA6IL+YykoAcFAIVglEYQMiQivWOEnxDVRWr2rW242Q6s9OZHWu1xibGDxw5fODw4YMjh7vW5eKIEkANyICskqpywqQgq65jzzz99Hqt9iRW6dkmswrgqRZV3b1nT9ZtL1uxglIGclVlPx0kfKJqzXmmPdQAxCff44U3eiVhR4Z3bT1w5NAkUc2kSZa5mqFGszY2NXFwbH+XctbEZd2RsSOjk+PdLM9yWEcEYiInKiBxSJN0ZH/rF9/7kY//9QfPvugkka5vcQxFaH8fwMxrKi7ik56dlfiHor+YJ+Z77M8e0hsleROBkWayFR47S9wMnXO8aio1ZlEzCmKoqiGtI5F84NufvePWf7kxH2/lmVo3ODRn2YKhJatPP+3k089Jmn2m2bA2W7R42X133HWok9s8SziFZJxybnMm7rTa4+NH3LIum74kSZjZmCRlkxg2YCa2DtbSxRdc8qVvfvHAgckjRyaGFveDvW4mosJT6fFsCgincFKeLGIgzOgRp41GzTBKB6xnTZ5AZlr6FLRpjwdyrC32aIuiA5RAjcBAB4eapsbrNz7UybrNGjPPpIF8azkAChUnYCXC4ZEj373xW5/6l8/e98g9GMjPv/L05z//+WvOPn35ikX9jTpsR2naOYgfqyCh3RWzUbD3fwQgsIg3R6DqAFWGiGYOWa5dm3W63el2Zzzrjk61Rqemj0xOjo5PTE5NdzIrClETqtlFmBjimMgBECTGkOVsvG0zx/wDldqzUGYVwFMqBIjKxoc3c4plJywUGmcGuZgXER/eaO+Vt2NhescGNj7XJfj3JObRR3ZlmdZrxESJMf2NZn/S1580h/r7SUDQls2mOl3XdZyLdJ1hcgSBMDGJJGSsczVKOvu7X/70Vy64+AMd1yVD5bHA651Kr4g4iz0aqPEEAq+hkbcpIF3LOYfVeWclMlFxqqHeOKZ7KsCAUtlo35vVvsoWhfvkd8pKLKLa1+
CpHVsPbhuppX0wdeL+GtcWn7j01AtPX7J4xYJFi/sbfXm3a7tZPjZ+d+vwZGuckSaUpqaWSW7IiOq+AztOP/0sgk1SrdWolkjKtpZQo5bUTZImtSxzF1140YK58yaO7N384K5VZ65gViJyYn2wtAwG+wuoAIWRj+o7T0NV1HcAIRCUxGozrYu1SE3sDlL1C3tf9YiWrkDPq2OEnvF4W6j4oEwKlizvDAwPzF0wd8u2rRsfeeyC89YcvXcin3kvCgLTyFjr7/6/f/70P//Djh0bhxfxS1592Stf+bxTTlsxb7gp7GyOPOsSQwS+ZRIRM3n1qOJZHnGiSgyABSJqncKpdrO8k+ftbt7K3FQ3a3c7rW4+3WmPdbvj05MTE1PTnU63m7lcolr1BgsTE4QIyqwq3rcgyTSltL+//3FX9FktswrgqRbr3I03fX/h0vkw1o/fg7drwjA6QpH6H6WankwFIvscUFDOpJY2PbyJyOc1cKNp6o1as96/bMnK4cE5QwvnjI+Nj3ezgQGeNCId6zvjOFEBjDouSAOiGnjegkUu9KKUeEwzIhRFTwUqTflweOHwQ6xYoRXoA4p2cceyR2OoALH4tTxtjTkpMfBd9DArQ9KeoOKAdtZYSfMlywfJiMI4ZaNZ3yAtXb589Wlnnr569UB/XyNNRbU93RkeMIcndh88fMBMpcakKgoFGbK5HBkfydy0qc2HsZwkSY0aDdOop2nKaWKcaG6z4eE5F1140Te+t2ffviNZ7mpJyKD1RWcKKbHFF+VFQs03TWKi0AvIr5QSM6UpkpRtoNQqPCCOevW4UrSNOjYl9DjXoCD0QASBQKCGUNMFyxdt2LXp4Y0PXXDueUfBpaiqkgHx3r37P//FL/z9P/zjI9sfHl5Rf/XPPO9Nb37R6lUrGokSdW3WkQQOLErsHIMpSYQ0VJ6Rgtg/CwxDTAKXi+RiW91OK8tb3azT7XY6WatjpzPbyrPpbneq1RmbmJ5stbvdzFrnb1BDBhAVNUTs489BIbOKUwUZAOhO5XMG5p58ysnHJf7PKoCnVgQA45tf/+Yr33FVo985Z9k3KiPubXCGY92Mscyq2JpCWcik3VHZvmlnkiBNkr56XUn7++prLrrgqmuuXnXyqna7U2/w6NShe753y5FHt1Du6mltstMCNGFQ9K3ZkFXq5u60M08hw3CxriuiO0UbO5r9MXDpfw2wHunqIjAAhDACgePApkIKGI/Wf6h29kxAoRAi512mqhchh6i64ktf6AUQU1fzU9ecyvVv2W6biWuJzlu46LTTzjtr9VkLFw4yA8QA6vVkaOj0617ysr0H9rWmpjpd7ms0O+22QDnhTndyanqUsLTeMI16rVZParUkSRNTS6hmKFElm5rm86587ndu/Maux/ZNjbXn1htkwMSxuJUAPwWehCgyJeXJ+VOiCrOvgPpcdxFV5ri28Y8zVejjAVdYrvDRWNUxc1u95E9xKX1HICIliKolt3DpQrWPrX1g7Ztf/ybjLWv15bb+usmhI0c+9dnPfOpTn9667aEFi/Vnf+WyG1519erVJ9W5nrsWgTIRNSwQYjIKJiFITlAYw45IBaQggROBc+LEdaxt59lUuzPdbk9nrmNtJ8u63azVtu08n2q1pzvtVqubZZkVpyKhEkyIlJSSsJR+5E40Uoh8JQElzK6T9yXNhQsWHo/wP6sAnmJR1UMHRzsytXjZYNIQMkbUGV+zEgw1/7n4hWCQlU9lESAgMEGFwIKx3a2JfZOGKUl4oJGqk9NWnvSiF7/wwksvqyV1wAFYma8czuv7Nm55LNlDBmqMgMgh9CmDQMnmMrSw//xLznM2I+ZQolniBhDzWgILhcjxzOAleqj8aJdT6H5ZKQWqnKT/ks/3KNI5iYFYgUaRXELRmK7YI/kPiMYOzKRQdbaz8ozlQ4v6RndZl2WaNE4+6cznXvu8RUuGq1BKTKZmLr/yioc2PLxj2/ZWd9q2uNHXbLc7gDrknWyq0a
g1mmmjWasZrtXTei1NayZJEpOoSShN0gvOvWDO0NyDu8eOHBibv3gFyHrauTh6it5PcR2VfLKoVwhUhFN88z5j2BimohNoaR8QzVi48FIr7ymOUhRUXiNFr/6oSuVjcYtKIDh1w/MHkPAjmzdZscb33vGtN2y+99ChL371C5/+v59Z/9jaxny+4e2XveUtV59z9kmNFE7zLBc1LM4wG0dgSQgEEmUSgTooqZ9GJFALZ8VmNu92s3aWT3U701k23e62Ot1W7rrWdfJup521Wlm7nXe6WZ7n1kJFmMK9EyLngpjFULZUhK8N9IvIMMSuk/dzrdlszoisHycyqwCeOvG2yNZt263NVq5eosYSC4HEKXrvvuCCH1VDVJDrYWY8w0jCjndu3pN1xSSmZkw9x0Dad9Hp55615vw0aQLw4wk5TVasPv+y51yzddeRbYe2sTJLmE8lquJUIFazsy9bs2D5nBwTTBJKbFFCUPUAS97naF6iKMvV8vMB3JXh5zLNJCW8IRzgkRCznQjEHHUGe/SSGIhQXwdWKKdAGzGrsKjCDi2Zc9LZJ4/ufJi1MTxv7gtfevWqZXMqYFkAJSW19Oprr/n2d785MTXemW7Xa2m30zWkBmZsYrRvoF6vpfWU+5qNet3UEmOMIUNpatJamhpzwooTz1p99v1b7z20e/SUM1cyQyl0ONDY/z+oc6qO9olcGgCOhrSIKpjZJIacAKzqqMTxHvs9rnmxoscgMrSE9HILxxQtkdJXdpBSyLsUk9eHqG9OY8vWxzp5xoYMmURpcnry81/88l/97d9s2Lh2qC970+suft1PvPyUM05oDKRW865QJkbZkBgQsSEVF1R0bNrBIlCxJJlIZvN2ZttZ1s7sdCdrdbtTedbOs3Y373bzdpZ3bd7utLNOnmV51s1FoQ6EhNS7lyFKxcq+lk1ZcwigLN6wEAnhMyIHVupMd5f1HafmP2YVwFMpBDhnH31sS/+8gaSPVZ2LZsvRD+QMogQxwSbSLAEsjLgGma2P7W13aCDRWkr9XF81b/5zrzh/sG+gdwOU9jUvueqaRx7dfM/9j027LLFKVFNSH4IUUtRp1fmnuTqLE8MxBluU8oY2C4jdfkoCvuB+NOytcA5QoIqveGKfw0lFlOAYkMUInEeZORpsZJ+a6LEpppsixBoAIvatfjyHRA4uGUjPuvzM+76xTqR72rmnX3ntpc1GDB8UbS/9x5lOOOGEhQsWb9my1TCnaa2W1jqSqbrRw4cHBvqH5/Q1G321WmIMpUmaGJPWEjYmSUxiuK9/4AXXvPjO9Xfv2ba/O31ms+aNTVWtdNAMKO0Xxfs6CO3uKWR8+qZoKvDBfII4FY63RDzYclV69ULPLRPVA5NKLCQo7qRSdxythivankJ8hdWppINJ30Dt4L4DE2PTjcXNVnvsq1/44t//70/ftf7e1HSe+6Jz3/ML159/4WpTMybRrmQgtmKYmIhUjADWCrOBgToQs4OKiojrWDtts6lut9PptLvdVjdrZbad5a1u1s5sx+bd3GZd221neW67WabilKBOmBgMqFPfg0MhTATyDYO8CqDQca+83iAQkYioRdZyq847ua+v7/jUALMK4CkVIX1ow4YTTlnRGEhBHQ5mEAPAUSb/MVz94rGlojmO62h747bttaR28iDqjLmUrV46fNopJ5JSQc4UBuPgshWrzz1/yYKvT4+NwYoSKXPXKkity9JBnHXOiawZB+zXMgmz1x0gzw0HUJnhqlDPMSOWFUXuhskoNExN8cQPCqococ6/ynUFlaJBH3mtEb0hBUAhgF4eiSEncJQK8tUXntrXb9pZtmzp0sGBfhWnxDF4XaguJeJ6vTZv3kJjak5barVWq3eznIgnJicUbniwv95oJoaTNDXGpIbZEJskSbnerNXr9csvuWLuwJxH1m9+3pFLm3MHCNB4XAonMcMnZC759SUl5p7UV1XPhCRArZ6ilUdnCOVIySJ0UMpM9EdE+tDYtb
qDcPK9HE+4BkHtlvgPISiEQKopzZ0/tO/Agd3bDnzvazf9z3/8wwcfWA/C5dcu/vmffOtzrrlwzvw+C2fhOiChlJCYBCJW2QFCRIZCEW8O5Lmbdlmr05ludaayfDqznTzvZt1OlnWzvOtsN887eW670s263Sxz1kLEWT//jkSEfBABChUiaAi6kFcDUHjvkHxs2jdm9TebqEINcz2t5227aP4S1ZkLepzIrAJ4KkVVZPPmR1ZffCKnjjk8yFpQtr3mWG8goALkVIQVnTM8MTY5vmffipqeAszpS+YN8JLBtDZ3Ptj4B6DaYM6YZOXypacsHmod2DM5gXY3UzI11q7BVNedsHLhuWedBBHyiXLFTgubsQz7HkX7zHhZHH8wcRU+0q0QhGG73titfLUHxXrXQcqKAh8gJlIXhpoHRUDl90SVwMQMcStPWrJw+fwdo4czznLVXEw9icY5+Q0ogaGwzkhSm5bcgkFKjLSWZpmdmBjLsu7c+fPSmievKTGmliRgpIaTJDEgJpxx+ukL5y44sm9PdzpXqwqnpGEgge+f4wumY3oTASp+YglCppC/H7xT5lcp1AhDK8tZLk7Zd6NCa8V/oosV1q2SdyuRais8uKBHoUrMWknkjR8jwFBO5595zmA2593v/pmNGzeItpefN/T2n7jqTW98ycrFczlxbWnnTIKaU1aqiSUyIsSefHe5dQ6Zc1Pd9nTbTnWzqazTyfJ2lrUy185tZm1m8zy3xX/WWpuLc05EVJUBUhb15BQXT4N3AwMtKtG7i+5o0HUB+0EgPxhCFZpD2nZ4cJiPywAAZhXAUyY+gimkh0b3X716jWki+PwUDeDHsUCKO1OrkKgARARIahNjk43Dhy9eNHzWQNpUrklnmIFmUwPrUkELEANzBvpX9kPng4b6YGpTmYyqbh6bOtKVKy86d8Hc/kmZNjWQKyhlKjcR7NB4wAE7iI5x/BGuA/Ht+RtSwPh2Lb4/DCoJ6kXSPMEpuEyMUfgOXx5D/XFATVAnDggjFWM8FT4hlpSc2oEFzVPPXbVn8/jWTdu27T2IxcNDfQM1JkMEQESVWBRZBxsfO7hz76F2JkpgSihh0wCcs8h37d35ypWv6nbbYi1zWktYIWw4YUNEaZomCc2dO/zcy6/656//w8iuIyecMV+JvR4RCQhNyrFswS8umEOCq1aiKn56mYgQk4epYCj05IOipHSqv5YqUilUi2jVX/KZPQKNix0uD2Ixmki457yp7KurlblOzQWNhet3rF+/7sG87eYODz7nJZe95k0vuOY5q02zO9ltdfMuakk7k9x1lQlpnlvJutY5dPI8y/K802l18uncdqzNcpdZlznJnMud7bo8s846l3Wts9bmubPO5SIqSlABHDy/D8STiismJZeHSD+GJ0rV+JWjyJuGmxhsDKnVqfF23pWFC5ccn+Y/ZhXAUyYEEsjGh9dPtcfmzhuOfG+AIVTjb0f59kcn20TaWJyzozv2LzF8Qi1ZPJj0Q6XDJjVI0uLTcQchKNtf57l1xRxTp8TUkw4aB0CPPDCazO+//KWXuSQMKSm/WrHLKbJK5XsaUz1LIzTsLlhiMXtUC2UWc2GYWPwA9xDaFcQnNDoa4SQqCYyFnYyA9DARCqhIl2RlkIpAidKaOffyM2/97vqt27d99CMfO2Fx39lnnXvK8pPPOfvMgaFmalIIso5s23Xwc5//4mPrH6JuzkScsChYDBujTjc+tqHZV+/rqzm1YokgDCIDBjE4bSSkmqb03Oc89zNf/b/7dh8RB2J/iYVK+Om5IPE8JDBbIeKhvoEOA0X6Z5jLWb0oPUsdt1Qp0uu5hSKeR12g7JuIVI+iV8n7+jQCjJIx9SY1aALf/d73Nt74SJrS2c855eWvuu7ca85evHRg59iITLccHCXoijjl3Elmu1bzTFwmaq1mos6pOues5opcxDrJxeVWurnNXN7Jsqxrbe7EiopTb+QrAP+S2LcmDMOmtfBKdMZt59
2/kjz0d6sQeZIIVDxnSsbw1PhUgnTxwiU/dH/TH3eZVQBPnRB0w4aHNenOX9wPzdWbdyWzA4+mx/QFingn4nPAMAROBQc37lrZ4FUDZrCW9zmrtUSycZme0Maw35AE2xoASK2bGmugO7+fTeLypMssQ5QKt0887YQzLzoxRwesTlxSFByU7HH4QTMOrOdD8YGMz2URpqUYUQw9xgIT7nnb8GSWVUuhhxjFpkB+Z7FVGoqsyRAR5lA74H0KhKiDEkEz1znpvJOGVgxP7hs/a/7Aiv7OxMYb71z7vRv/La8NDvXPWYCkeWRsYvPWjUd27+nTzpIhl7ddK1cLAiEXUqbHHnv0yNjoqhOWGmabi4ozxKKOiAlkEs5zC8aF569ZOHfJlo17Oq1Oo1/BQQuFow+KrLe9Z1y04PAEV9FTMj4xqKDHZkB/9UWPUqleCSUIERM7KymMy3IhTRvm6AHong5jw86332E2qDfRMK366O4Dd3zz+52JzkUvPOesC85ZcNrC/pXDm6a37tqVNgVKAmJi01XrZ7qJiJJ1JFYB+HpHcioi4nJk1mVZ1rV5t+OyLMts5kRUIE5jRMhrQhcy9n3jNhSdSSiuofb0nCrrKoqwk7c6SgezzDNTNZx0Wt1G0jj55JOfwAV/dsusAnjqRBTr1t2/5KShdFAKG0XDo14wuV6ir19RDQVFQH6cuSosWctH9o3Nr1F/U+oJsSEyppOPZft2pnOXwKToua8Fkk+N7KolebPJYtCtiSZuero9pfaa51/eN8wt2yYmKmO/5VFoz7ao55+jHx6tvh8iyN7UrRhuxCEmqQU/G7DPH66UcYiI/RVNE0zq0gTU6CP4tWMicZxrPn/1nIGV9YlD7auvPPU5F54wPXFwZN+BHbt2b9m8c2rPVsm12e2e3B0/YdCCeKxjxigfy6SVq6XEkebg0fHxdQ89fOKJyzmlRmoUBgKiBKHqTFnZMBYuWnLW6ec8vPv+rJ3VJCGjIApJTV45xVUMSYoxFyh6TKShiZqCwBxiACgGdFbdqJkLXlHYFaePfP9SAZMhR63x6aSe1JsNJzaQRQEpFeKrcNUIamgkLq1xbezA2EN33HVw554Vy5aeds3qhScv5aH6ATc6emicEmrW6k3Ua0lKKYvC1ANjRaSGIYBTiKiqdU4ya7Mst5nzGZxZbp1TiO8U7QlRUn/5Q/erkLhTlFNUrnD4rbiRgl9Z/iXcCF4bVPlTr2dZCaLTE9NzBhb09/XNegCz8l8rCiiZ22+958LXnJbUcgWIA9gBqGDsDCSNOBvslh4iwMC4rH5w8/6zB5uNxNWhKRylQonuuPNb55x6ChpzgVpPhWnWPbx/a427zQagtgtTqzd3bztCcxvPfdH5uWRqlFFy7yW2HmUvRtuqqrSK7gyItrxq9e89//j5wIVT7n123+cz7Fnjcxx3qYrq9hxFUkpUisefQKriaWIGK3E6qKecvXzv7XfpoV1D+fx5A3bZclrVrK/qYmrE5S3bHWuPdvPMyGQ3m6jlh2r5EcJ0Ql1HfXnt0Fh3Isu+e/Mt1730hUkaGKeC1PHHzkzEqDfql118xT2P3D12ZGJo2SL4ob6+KaiYaJNWObVyKcPpIZJfPqBBocspeuK9R4uWm4tbJF9NERaRCIZB40emBuf2D/g8KO9CxaQgZVUSFtTRTPIGd+iBu+7fsW3T8PDwtS+7qm/xQhpKRnXK6ZRJeHJa6sZ0udtN67VawjZlkxiHlEkFIg7qnIhVss52bZbneZ45m2c2z52zvuI3Fm4ogSHSkw5FPtirrMTw/Z41TnwujP3qCR+1GjHNoLCvNNbZsRrmhLp0eN/oWSetaQ40jxXEOi5kVgE8RSLQkUNTubjlKxYkNQE40CxU4GyZmx347ao5AwBEJKBIAhkLy90RW580Q0OcGscqzJzWSNGe3rJ293c/u+LqV1L/MqUa/NBBRefwzrEdG5pGGzVyQs4Q6rx7auqkS85ZcsqCDNPwliCUYF
A+aMegWsNBFz8QYaz8eOUcKt/yh484MSuEdSPtXLR5iIRTWJpYORTg0W8/BiqjwigtXw4ZRgR2lig5+4IzbzN37d2xU90ZyCdqdmwOj7WSCeoeanctO9fOWgwkCdcZdcODlrqubqU5OUjrWvtbubv7rrtb7Xaz0Uc+raWo4CIA4JQJqNXo8ksu+4t/NPv3ja44b0lkdByUi6SnaMHHAC1QcFsKFX+WvgLOK4uC8tIqTPW+rt4miLeS747tHUdSp5Y1aU+0B4b7AN/OtZwpDfhQOlgJTnc9tmPL+kfI2iuuumTOqmWuwS10BS1KSRWu49ghg3OUdUyHCAmZmkko6HRyIiJqnXNirYgTp6LOxfo+YgY79vaBkk/QIVXtte2VfbsJEMQnART3U9Uzrt6bke8paMMwVqDkIBWkIpIqUZuy8c6ShYvouOwD6mVWATx1sm3Xtqnu1NIVCyk+mwBQVLzGNPjSLCwkUichsMoAqbBNkvr+3ROYlub8OpM1CYFTlyTMef/k/vG7v07Illz+Gh5cgiRVZ7LpyYlH78TYrv5EayDrkNeSw1PTh/PsNdc9r5YqkZIx5Aq+payrrx5PQcP716TFzEINoBWedFNEHv33Cvu3SMsID2fcPsMzFmGco8TgRVUPUeEISGTJAjfkK2xLA5EIgGNDVvMTTljSHOzfuG13K5vuT1vkcnSpRk2yabvVyTMlTr37kRoaTLQubC2JQ95gXjUv3zl2YOem+9eufcE1z0WiPru+cp2CPW0YZ5x2+py+udsfG1nzfE5rBBJRYlZyQmCKDS2kOMIC5aM6UN9ZjUiKxMXyv16joMeBrOpXqnyWVZQShhXNJW9lqZhq0BcU9A/BAFyrmUcffKQ91jr1/JNXnnJCN3WTacexGCUVZQ1kXAImgrPIxRKBJe/4BFZAQ8Mnr7ElkPEKGC7C+36Gi68L92dSIDtF/y8kR7EKIF5vxNsnHvexTj4sDHG8QI4UYGiY4wYiYoF1nVHRtq5Zc15i+DiF/1kF8JQJgQ7s36cm6xtqMFSVON7sMbVSq89k+U9PhjIJe59ZyEhS55HdB2u5qbPWWDgJrRZSdim1JEv2r7tldOzgsrOfP7zyTDGNkUfuOfTALTXb4VQNJ2TTpJbu3HU4a9bPOP8MNoAaOIsKOswkfuJhzWQxZrztn7wwtkaLfFeU5n5ZMwzEMDA89e3Vnfpp9t4oLoneIgUkpjYilsSVfkiwnn1LT1JF4mjZkvnzli18YOvW1nS7mSJribY44X517DpCwsZQYlicGCcNp3VAEkHiVNG/au60w92HRv/ti1+45pqrDIAQUizrhwr13exrDg3M273zCJuGSg7NFWCkBJCGmiwJx8vqk1MicEKhSqKiqgIXFGlMH1LtdaiqEhRt1IhBCVOINEBJJQWbnKSdJZxQBWljHCCcTKeVnXjSickpRtO0lXQdORUYNYCwC0cDJacKghKRhH5NLmryqLR8j47Y9iIUO4dMTmJf2Ec9XyDfLrE4AxQPhc8fKDXkjJXo8Z17bkqNlQ5M4YYECUETMSMHpijjk044MTZiPB61wKwCeIrEuXzjow/OXTHQHKip+sZo/gYv7r4Z6D9TvFHFoTiXhFUlO7B3u+t2TZ6mnDALpyQKgmEmdGUgP6jTRw5tXz8xvEhMfWLsSD5+qM4Q32kGmjYaDx85cvLla+YuHBTqGoKIqgSNFBXUzGdjxu+BegnmfagQBgmRg5IR78dT6O4Z3IQQ3Q0hvxAippC+IUjAAnVFbqz0OCER3Nlvp0C+oEYRiQ8oIAwWlWZ/87xLzrz5s9/dtWv/wlWLKJ/MwS3X6mgHrJwQKYlTGD/1MWFiZeJEEkYf2YtOnN9m9+2vf33fb3541eJ5GuorCsgJNBVA/YNm/rx5Dx/Y0Z3OmzUGiNgHQn1eKJQ8IcQ+B1Z8Wz0CSWCoEbpziEIgMY1WVQvkO/YdUvDhEWJD2qcmhtVJIiZrZ5JRYhqkFMpmg54Gk7cslG
oApblR0cyTRIkynGhRj6YFOGvRiCOop+DYaInplSLEEO9HYe4XTnCvWzPjtgegcb2rZXBVM2SGEeI/SMUoOyKAyYX5qaIMk6oZPTDR1xg66eQTUX36jjM5TmPfT72wMTt2b5uzeJATg9Ljhb9hK3cfRXM4/C28W8kDBymzJiKJ5lMTByfaUw6JMwlISDRJmdmItcZ2a7abtFuYOJjt3tDZs4EndqaSM9QwO0OcajvvHARd+LznpmkOyVWViJlizk3kUX/AuZEWR01a5Su8ViiGuMAbjcU2C3DQGAf20FI1G4MGirUGFQqA4r793sqD1LidQhIxrO7855ztUrrv3keN65dpm0+08+mWZJlhY7hGwuzU5KgJp8QpoZYgZVejbNB0lw3wpScvwsS+b33920XQoYpYChGFc9i+Y99Dj9y3cN5cVuOciKqhRDVSJwr4JB+/tuILWAP9LaoiPRhbQbbiFojzFmJmS3mvaO8RFf/6hs1Os+kOhE2SxFADEFrmEADyuWlshER89ZoSOR9RBxCa7UCLtSaScieBwSqOKVzh8ioUp18e7VESUn4Kd8jfAlT8qXLCR39dqy9i3DyYFf7QhRjMbIS5g8lDk8sWL583b16RKHocyqwCeIoks90DY3vnLRlmE0zraLRyBTE9++G/4dM/4tNd4oHPomeSxDEGFqf7UtmbJdO2z1GNE8fe8ktJDByp/92CYLuJE4gzhATMbCip7Z7sdAf6zzjnpES7oiJOy+fmiYH/qL9RJaQZ4nVEQiqswiIsakTZKUvxpBYhAV+TG1Ev/BcYfo0fDN8g+PQR9ZRzSZ6UIYZSCXnAUQVZsavPX7hoxeAt9z2SteskLNl4d2KEXSdh8VQ1EyfGGMPGwKTECThRk4Co228mTxziy06a/2+f/uR0blFgMQgg5wMhRA88+NhLXvF8Hjzyxnc9P20KEQjG5RLij+G4vLEdWRotzjMo3VjtQIGZJ9942QdIZtKCx7pGcUli9ySVkILa6WTGmLRelxkeXIHOvjxb1QAkcfkqn6zejD1K0CO7X0gtYLy6jzgGIdjxGq9Pz8+qSivq5uJUTCrYsN7T7nWd48WvGPUEsJI6CICUUmPZtVx7sn3Ouef0NRvlvXL8yawCeIrEip2YGBuaU3oAJab3GCAanwwtfgZbmUJjLgAQIk0kwYUvOXv4wuXfOXJkG/dlSR9MYtSaMIyQACERKJMjOBg1RmAoMTA1TqRee/jw2IkXnrZw8QCp88n/IuqtvxkZPJUD9D+LZ7K0QIPhFZnawkhVimgWf6lsjnqf+rCHYA9XEkOOEYwuuYTqkVVf+r2LqDrVofnpc15wwf079+4Zmcggeaa2bTVTFZO1RZ2QgMGJ4TQlTpRZDSmRJMY2tTU3za49ebnd9cjae+4QKffh4JMn8ed/88+v/ulXLT93zgc/8e6V58wHtRGYLiZwKPUt+HwVpdCXLeb7hCwnjYlAsfdDacv3wn2h8GZeohnry+w5N2pNtpt9fZyGbqoxhZXCdaGK7e3KLdOMvRS6tvi98BRpBjQXny+ulvZup9QmM26DWHJe6v5jmBzR9SOt3kYV7Vg9Hn+qBAJSTdrjXZe5008/yyQpiiaHx5/MKoCnQhTIul0VOzDY54t7fKQS6DWV/P0aPOmep5hiYg0I6sFd1HXtaWtO+M2//OUL3/WqL+/f82iXp7ihUAPlkGZNhtloypQACZQMQE6ZmYhHsvoDo+7aF7yk7nKFEBNiQFJ7MvF7pdc8K0Cp0FshkAlIAOjYwFIqCB6sTl8iLDFBsrQMw0e0ZxlmHEvQN/F1JA0iGPlgQsArIYY6e+0Nl0sz+febvp113NSBKTclNdM0jutCKVMjoUYiNXZMeULSANWUE5gESc2gZtsnDZmXnrHsn//XXxSl1Q4wIHH46J/85fs/9v6XvPE5H/z9dy44pb/rjiDNocrgEgALvYkKpxUTF1XJtzzr9V1QBL+LeGZlKSL0VeLDJQ4GRUokzD
Bi3eTExMDwIBkqcnDLGy5QOEErUPXQKutb6tpCTfVeEK0UfsT74piGREWlVFR/mSfQe57VTZT+Xakuom9V2WKkF0MZHimHZB+nRmujIy1Wc/rpqxNDsZns8SizCuApEdXc5kilb6gp4gggcKzRQaQ4/Scp4mDZ0cSjQCREIUoqrDC5SzuWuamXve6id/zNbz48D7ccPDRh+vLE+LxSgkKExLGCBCJKYEOsDGuTdfs644OrTj31pFS7KqriSNUfGAD1w2KOSTGgcsDwKCxV4iXCeFG46pRE/cA/37axYpcV1fzVAF9pFGo0m2fiCiJZXHhTRxmloeMSqSoYzumKUxetecGaz9xxy7797bwLq9LOunnecbZbS5Mk5SRRTiStJ7V6UmNTZ8NsGCRWUuf6pLNm8cKxh9ft2bQdAht399E//cRH/+5Df/APv3rDmy6f5nGrLTIG4rtUgJkpZjhpHPEVYa6ntE8iAVRQQZWSvIqZUDnR4t4pDeuKW+STi0UcCHlmO51Ovb+pfgBc+FZMuymCuaWhfgztP+NYgqcS36q4rYVGPtpB6X2LClUX0rpKq6PnU1WJLFBcR6Lqb/6PPbxQ0AFEIHWilJvDe47M7R9auWRZzyEffzKrAJ4SIWR516prNBsgqEqo7SwqYIuxMKVhWPG8PVaEWKkahU8kJTVOkRPBdOYuT2740C92zjn1lgP7xmt9kjZUAWaFr/AkASyRISXnEsEYGt88OH3uSy+cPzQFOJ9jLRqMVPVdemacQ0nwlAn9gBBcbNfseScGmJRJmbWuIZEbhDIzXABVFrADSfQQiEIbZNGiG11s7+mPIOoBVSlcp5IlK6HFBwYo/EcKqAgsqNNxb3r/DSP1yW/cc9+RNlvbVK2ZpJ40G9aIsjKhkaQpJSm4Blcjl7AxxhiTkLjUtppiL1019//+7R8qh1Fkv/eHv/8Pn//LP/vH961e02w3xtHUnEDEhslDjogowGAEXt2vhE9UDVFg8Wk/KlrCL4X0fN8VAhwWP9T3UnFnIJLjFPOJAF8z4nslKQgkaE+3c+uG5w8LaaleqgBcuBylg1a1xqs3Qu+LeAHoqDePRQr1ft/vpcw6K/f7hFZ5YYTEJODqDgtui+KCeOh3loBamkiOw/sOn3ryyavPPC20TD/6yI8PmVUAT4V4my7LbLfVNsxEJL4FSsUALMxfLUyaUkJ6DUDwLYX9f6zqVDx0qVDTXPPzb5k8a8V3tu4Z0SFJ+gUsRn0/FhCDObMWoLx/8KaRkZ19jWuuv1bZgQ1EQxKIlo9UZCJ6rP3yWIvWy/4lxcafUIQcDIaqCJEaUiZNWJgcGzCDwsg+CURN5EFi54IKdxB8orD3imVXTd/ubV6Ego5WghBASuQk6QgPzxv4qQ+9+cuPrtt4eFrcYIJ+pjoZTZrgmqapL6YAAwwxoSsnJymnUeEluwABAABJREFUCVKVppGThwYf+M7Xdm3fScBXv/3Nv//CX7//o+9ZdeqSnNua5DmFAIH45jbBMQodVj1GB8euEsImZu9GCZUhVCeqHGC6+GIPMs6gZRBDJnERRURJmZkcWlNtkyRcN8Tq28yhXLIYoJ0BgiEuoTPfDC9m3BnRaz1aM8wUmvFv/GKpDJ6cPU5ex0ZfuZIqVWwt3B5KRExMgFHTnbCdqez01ac36zUmc4wTP25kVgE8JaKa59ZaQWpEHEiIo33iH6jYCLkw7opxLBrT63w/lEC1kIi68ASKaOBrVGrNF3/gl6bOXv3VDbvGzLycyZLAdBTtGgs5Q1Sfv/TEmx4e+fzDm9/y7retGOjA95cIWIzCqKxk41XwFPGPhdpSghpVBoRJiSEEp05JDVnDkmkNZoi0j/I6SdPogJFh2EbWyZ3kgKiEcR/qYwEQIRWIRkM+8hWxSU44gmikBqNRCw7bL15oteorGQSkKiQ20TzvPP+VL3rTr/3clx5cu1+sSQZNLpzniUo9oaSmaaKGchJLqkaRqk0oNyRJAuK8Lq
2lTX3peee85y1v+avP/suvfOJD7/3dnzz54kUtN+0goqJO2A/58o4UxZHvRcKqIlryPr+FiYyPjTsV+OB3dGAQgVVLy74Mm1YBFJWLV6huMuQHMdbRmDg0NTA0mPYllvKiTs7zRAVvFC/50eir5QtfrVHde3ljHOvmf2JsnZHQVNXiP0hoxsve7CMKSxDuCigZZhZmaw7uP8Jizj3vPKou5nEps4VgT4kQsiyTwEkI4CIMMCBEXLmXoyddxIjD5FhV+KHqUDgoMRJVqDoGE5ELSgDK/PL3vvnr2Re+eP+Dzzlj2Yn9C7U9leWSoS4mSRYO/Z9bHv6HRzdd+otvO++yExI3Zl1NySoQ0y5KlI3H7nG+aC9Qmo6BQiD28CyQ4HCrEis7bXXb37t9w4DOXT53sXa7R6ZbW3fv3bTlkb5hXHPt+WeefVpST1nL1sQ+WxDlA11SuiXRr5WGcNVlK2gBCq2EikiCQkVBJE4sYEYPT5z+wkuWnLzw//32751Tn3fpipPTTo1UNW/BaMKcNlJRB0sAG8PKhCQRAtVFBGlNhvrTfQcf/e2PffCjf/FbJ57cdPkYJU58y3n4HjfhKqoPqxTr6+tf/YH5xnca2qHF/MnAiaiS+vbIRMUZRRUYlXJp+x/T6FYGCUhVjfLU6OSiZctg4sCxgjjXGQZwxYc4+k4uVY/OVBNaPQ76QSgeLf7CvtDyAj4ZmYn+PfRT9EH9L6EEnQjKRDU1+7cdanLziisuN8F1PU75H8wqgKdMnErX5lPTLWNIRRQgEwYTFlPhe+zsALEUrUXPymtkhsXPWAcDECdKYIEIQV1OfbWX/Orbv/m//vEz377jmtWXnjp/lUlp38jh7QdHv3PL/Q+3D//ER3/m4mvO65dxB3EwgVyO7ZULBCCfpNP7XHltUBQroLAXSclPe/Q5j1BiY1A7vPfQ33/60/V2vZ70jRwez5Gfd/np173qurPPO5nY+Zoor3rCEEotHt2i9LSqeCr54+VzWz3CGEz2Bq6W0OBUCJ6Uspm26quH3vGFv77p7//PZ7933xnNhafOWTyvb2nNtbXTISJjRFKffG+QmrYxyixwIjKZH167Z9tFLzjl3W9/zeBKbpsJQ13xNaY+/yocXZm07i8z4ohOqqJVGNAoAidwUpaIS1DpiC5XCb4z4KpyzxT1dCCARFQZSWqmD0+1JqeH581xJEqiscoiLFbU59ASD48F79XvYObfw63xZBG8d7+Vo/7hLPJiU/Gf+FT5lwQIBAwjuY4dGjt79RknnnQCF6UJx6vMKoCnQnwqSifrdrOuqDVUNDAgYlYRPy4X6qupOBi/RVAMgM9kiLwCiEVDraYr4FKVyQlxy3abdXP9e95858pln/0/X7OZG0LiJltukOpXnPrhd//msiX1hnSdgzIrrBY4GczTkJxSYG3PP2FkTcUPoFhmCRAM1Imf2cvaHKq//SdeP0BDn/rkZ0ZHJlecsPTNb3vNtS+7lOZwhmkhS4WTowBYEbqD9aZl/P/svXeYZdlVH/pba59z761bubqqq3NP98x0z/TkPNOTk2ak0UigDCIYCWz8jA3YgMHvAfZz+gzm2WAERjbI8EBkhISEAsoaTdQkTZ7pnFN1daUbztl7rffH3vucc6uqJwjB9/mp9teh6t4T9tnnnBV+67fW8pdd/BpCu0vsxdJjQKSUgBAtbF9dHoDmDAPTtguG2zf+8Jvk3ste+txDX3z8xeQQT6TN4UbDqGZ5VwggyhRtdFsJLOc83OxfNbZm24Y7v++tndHxfCDtmIWcM6tCKkogMLyWKRatBFQiyhIFE0XKl/p8DRUrzqkTCAgOzsKJV/ykBewSvIslBrpSNKQLH1IJBIGwJvMzbXXcHO5T4xCgdhRSO0ynUFzloQtFS70fLf+UUw96VB7wrIMieNWjwV97LFGBld0rhyKKbxlIWQlEzsyfbmcL3SuvuKJeS2npXt9hY0UB/P0MEiVntTXXgYAShe+AqMSGFT
D+1WGoKlOoc+nfdyHfDpvUBR3gESP/LcF3E2dAVISInQiY2m5G077r33XvBbfd8cTDj6m1W7dsHVwzjFX9pKep08ryzBiTEllxQeiHw4cDBxtyUTw6ZKZWPP0CbqaAXMB7AaoCtlmeIH3v+9+6qm/gxW++8v7ve9/45pF5mmrnrSStMUEVhlggCM2diqijlvKklCQlb3apmFr2JS6JQv5iVBXKyo6kZlJnu21Xo4lztv3gBTve0Z09enLqyNFjJ063Wws1w5ykDdMYHhmcHBlpDo80BhuuL6X+OgSzkoPQlgUylGgg2CL2N0CpMr1BTKQMKCspCZUAlu/Ly1BAjbhcnfhSEEysCgiRUNSsCKUYyFfoqUjawnAOjkXEvQVsSJRY+PTx6YGBQdOXWM60Wka1fEKrgEn1hvfo/0WWdnX1l679WT7v3aTQAa97EMXKUa9yzGJjAFAJNU4Sk9UPHzzMTq++9mpjuHDRvmPHigL4+xgEHRgeVJO2Wl2lkB0Khlg1ojUFCwyngiSps+GUlOHgRETFsrOaq4CYBBAfExVRQ0FMwJdZExBcKCSpSjSfZR2Z5770qnuuJko6nWwBOXVOsczXkSRpH0Gds8RJbMEqMWnGzxmMYHwjEFHC5wEWCgZ3ASJwKG3j91cQITFJbtuOs7vffdPd77k177ZnMZVRK22mPqIdPPMg56OPs8gWjZa7apEsFr59FdkTlVk0AcN+AVMzSk7EoK9LmiXMmteMNjas2bBh7WZKmAROLamqOnEKUuU2wYFs5ogoMeRsliTsBAplqKohBokviR8bqxdAWgGtlSEWL61JfakGQFSsc05ECQLxmYKhIGr0vQq8Wwv/pwJzxMNH494nAwqleXrq0Knx9euRKkgK/6T6gMaDLZH+PXpm0c+0+IMerbLEQzvbrXpdowR3gnf8WqOKXhKTwKqYmk2O7joy0By46eabjTFLFuI7bqwogL+fQSnXTFo/dXoOqjAgESaipI8cC9dnZrITu7qvPPGynj6aLbQM583B/sFVg6vWTgyvXTWwerQ+PJAn1jWEuGuMqFUwORFSQ+xEHIjUkKgiFwKJgRMn2iXt2FYOFicuMSBhQt0FSUiWYNRVWhGEl55itLLMzlLfSdy/Mq6Q+lrUdhZ4sJXIRGKJOs01UWXpykwuQokRFaMJKzvnvBnLPphBpYBXrxfgw77RW6IC1Y0pcnG65eTDRz3vPlAoKSZWp0ogC/GGZKJwCuTCDBWfrZZDfFqDE6NqSEKRUBJVw5G8adTCMQxp4OiTKhUZDN4Q90rKezog5VAUiKhYQAarwtdbUyOSKpgoy6wIuZwN4IQgCNV8IBUVWfECCq4AKGxKSkzkqGFr2Zmss9AdXzcCY6PriKp+jKvWM5aATEvH8qjQYteMzhI1QOUeViGgs4QBSsjxbIdaNI3eyLYBGaJ81s4en92544bVE2MUAmnf0WNFAfx9DAL19w+MDq86dfqMY9RYNJM61eDkxef3f+7jX5l6dn96oHv1udvX9Q8ODQwmtT6XSef4ialH97y0MDedt3Wg3tw2sfGqc865aPPI2vFGbTBH5jS35IxvfkWSixJ7CiLZzAGwQuJckrCX9wk4ArWaWwlIDZniPS2seoEHJgqmv1aUhFRlRWFaw5OZPIKgsXiwACHFlNQ5JMTCqhCJHV/hSw9VX18lIqm8zBrBm0AEoqqYWE4aVISIFtg4EPWI/5CZWNUBSC2gQmBHjhjio7K+SZW/eS4clgp0XYSZVMWvksZcovgvQZ3Cc+1JISBmgJVJKVE2vjqQ38GJsloFpbRmYE2fzM21W23Tme92c2eFjEfzCzKsz2pWEPkKzRoA+mLZQ/0gZRCYTJonx4+dguG+4T7fSqiAxMrlKv8ptUgVVzrreI3vX58TQL1y/yybF18uf8LiAmjxxwT4CoxJZs4cne/OZtdce02tZlakP1YUwN/PIHA9qZ236ZyHd325NTPfHGv2m7EDT+z/09/+9MGn91
27ZeP92y7beGn/gKqhJK0xsZIycZqYMavkOD05Ozs913nkw198rD2fTfRf9447Lr3zimZ/ynDOWpiaiFVRUutUSQ0xGGwFbFjA6jLDCVQFjgE13iQV51mIoTcrsefvkJIQMcdKDuyhZwnQhI/ThixbKBFYqxTSgECreitM1UcxDRknAlKwFlkOQISYvQADopT35CApK6JpiQi9ht8eZHxEjigIPAUQW68pPB0IROS9Md+LXmMyhAIeOBZfNI0osHAVoACiM1jV191jQx5bSbwOZGJVYlanIKWaMSmSxBmSFFmKLG11cObM/Fx79siBY4cOHZo6M31qZmqg2RxZNbJm0/rBiYnhWidruDPzc4IUbASwVlNiYhJnQWBiEY16NkzSryPDWGtRIwNTR3pwz6HBsYGkny1Zr8CgS41yfwer/M+l8vgs6vY1H39a4l8sOlhhXSzSO+FXisco+WcFL7YSz6BqtCLQABgQX+kqodwc2n0o4eTW224lold/iL5DxooC+HsaCSXbt2z73CMfz1uSDDae/pvHP/6bnxuYqX3g6jvXJjIqmszNNurEXGOOOfzEmndYkZj6xjpvqI9dcuObz7Tx0vEjj/3eg5/5H7//vT/x/Rffdm2Wmrm8I0YoYaiqGIEQsZUQsXWixImI76xHTgFfSkFAEEfMUA/4O4i32dm/tUQcI5sOvqS9pn52GjrjAgTlkOhaCN2iG20kr4owyHCoeqDsCxSEegRAmXdGEcJZxCfU6GzErcO5zyJ/CoGgMWtAi7Qq79AAvv+thqzlwBYCUYjNhpkzIlU8TCvwtKBMPu+hCEYQVMT4Pl+JAUjVJmT6KDWWuy07dfz0K88dfOmp/a88f/DksZmFM52806HcJZoSm9yplTzLHFJggi+6css1t12zZmTjgm3Pd1u55oZBAg46C8pKzJ7QGfMk/NoDCjYMUrGuvZBNnTpz3pVbKIEWCV+I3sOiJSsC1Msv6/IYzCKN/Goa4fUK3XhjvfQvPo1zXhSrWDQFLY4BeJo1CGTZnZGp/afOWbvp0ksv9Q/9ig5YUQB/T8OQuey8Cxqozc/0vfjykd/7+d+5Yf1F111xQZNcw1rAijNiaqaRqlOFEDOYwawkDg5Ok4S0szBByZrJ/hvWXvvC9OY/+uU/fPjrz3/3T3ywPtjM8zmXUO4ECgMIqYP1bd29Te8LE4iSsoZSoQr4zlAeiukRIOI1kIF4sIIISgxw4JAQKVgDOiQSSzFwhaAoXjIGZRB76MK7FP6kBR1WYymJCN4rF4mqMWhZynw9u+TvGctDz1p+E5wC/3sV+kAQPCWAFJN3g09ErCyEPFEIWNjXvGAkzA6ASBfGKGrs6p2W/frfPPknH/nsqZeP6wKGmxgbbm7qH5zctKlhao1arV6rG2LD1HG21c2OTp0+cPLEc3+1+4UHjl1688Wbrzy/b2w4wYLVzKpYqDGkrE6FocQQ8WsbPB2PpzEBoITMiaOnWGR0cpXjQnguk0gX1/Z1OFi6zG+v42ZUdlgK/fdOofI5FRv3AHsoKJ7+WzrbEUTVMBJKkrw+fWy2M9e+4213DA40mcyK9MeKAvjWxrdgOySgi87fMoLhRz7z8vOf/fr16y68YcN5w9o1kvt2d6ZRz60YS8wgY8gpKUGEiJTJwKg4mK5zll23j9IrR8c33/Y9H3320d/4yV/4xz//j4Y3TpzJ264OdiQi3cxyQkwqAnIKkJIRhSMHb0UGl9/EtCMtnGqPfAuBSR2UIUxM6o32aN1HLEa9tlBlMgjQQqzgoAUWEymmFFsbagE1VKo/hrV9ldWtGvYROTrbKPHnReZhVZ0U2/gLA4iLbQhV+UOAb0wY1BUH5URCwiwq3pAnITVEiVIKk1Gyb9fBj/7nP9r90IF+Hblm7bZNk5P9tVrNCMEZ0YTgcsuwaoUM2MlgPZ3YsPaiDRtOzWUvHjj60mcef+EbT199+6XnXLK12azNYb5DzsZuu9GkJzB5UzdqX1WGYU
7EHNl3on9wqDnWFLaecOV1WqxMob0L8Tr4NVTQkpZBcapDl/lSF33R8+syQYf4u3+UejV6BfxB9Ucq/gGIWMmpaK2TnNx7sm5qV195XZLUvqWX+P+Hg5YkdK+MVxvB3qnyJV/3jvPt6fve8/bnX3zugnT4H157+ahbqLFNwH1pPROb9HHNoG5SA0Hiy6UREakoJaQkAuI0EUFKSF0iXdGk0R1IH5re/buPffkn/su/Gt+x6VRrhut1cRCnpOqLTyZgVaswjhRwnlYe3mCEtrpAlMlBNHAgLyoRwERMbMgQwbAEqN8ow3g4SCAGiWjReVsLymKkrQR4WrRo7xvqvsVScEpCvoRaWOOiPk1psheGe2F30hKBHqXIWR5srZaKqKSOLRGE0eOgkndYFvBWeM2ZJXCEVBlOTaJWhBKok37TZMEffeRTj33i4XW6utZOxoeHm6Y2UE+hTjRXCBRkyM9H1KmqYQOoy8VQwsQiyVSePXtgz0unTg6dM3b5nddOnjOWmfaCtETjA1g6SN55Ig/vMEyiTGfsl/78sXMu2Lr5xnNsX0cko1iwAqhCQIWeLPOYX+1RLrIGlhuExbekPOCr7bf0GMXP6PUCK1to+XPI+/M7BGSOwGoYNWmYo7Uv/eEDq2rDn/7C5zetGze0YvsCK8Xg3tBQwKlMz55pa+YEvszv69+70ei/6tJrTp9YmFi1PnWSkLJRZurmuRpSJefLgIFVDHzlIAETQ0HCiSbaBazaTNvOcZ3h8npn4caRje8479pf+8n/eOLxPaPJoMvzLuUgYmYIKVTI93dSKEgilVIhvolwcW0I1RQDVz7gASq+5KhILi63LrM2E2uhVtSKy0UEooADRJ0Tn8jq/QBVCNSBVNQSRMSKOA3SnYAi45lAKiRCqqFWQUXix7JolURfP6o2+mLsoPe3CG54VKpyzeHroiQALd03EkuooD+Rr/KppOBQ1ZWYiITJkXMNbhzdM/2hn/utp3/ngfs3X/v2i2/c2r9qvMYD9RzSATKosDKI1DEJwTG5hDXRnMQZYxIwxFkmO9YwO7dvv+eiy/uOu6/9v5/d/eVnajM6qM2UjCqU2KcKE+BJugwQkTIrwI5PHZ52zo1MjCvDiVUoc3zly0UoUite1aNCcT8QCaTVP5VtFi3ukvVcbix3+5bcgyItsQciio8HQmwqqjZVCrkTSrnMHpvN5/JLL71q7ZrVTCtyL4yVhXhjw5H81p/+rw/81D/8wtMPtJic+iSe1x4EMOj2m24xad+Bo8dzhTpRC+uE04RAIspgElLxYpoF6kgs1CmLGBFmIp/+hdR1yNokR55xK7v3/IvfsubSX/vpX5o9ams8BEsKI+J7lmimzpJCHatlDcWRFSQEV+Z/At5OD9FY/wMR+cLNnh8iArWiudPMua6VjpWOc12VTNF1YiGOoQxHcHBOnFNRVksigBpi5jpMTcg4sAOcr2sNVahwrP/sUwFUEdiXETSqjCh2SihfEcIQVfGjy0j0IC+DsqPygBRLjJbSiErN4Y+mHOArJUewRCKcoKbIyVgI+qneZ1Y99eCBD/+fH5576PAPX3X9PRu35IcPDKaZMapQB6uBSKu+4rRXsqxiFERKKqrqIJJSZqzTLIFsHhq8+4Idlw2vf+4Lz3z5jx7oHJShbKDPpXAea2MJDBkmJYhCJDFct8nhXUf7BwbGJoeJHQBfFhUaLy5ca1gIej0AUE/lNDqL4H6Vj2jJN1T5T3vlu1YmVplivHexnC4CTSvOX4sjskLFKNc43fPi3sTybbfcwiuQR2WsKIA3MBSYbc199jN/9bmvfOL7/un7/tP//OXjrdMKV6Aor7qvEvj6624cG10905q3eRaMF0AUDlADCxWBr/gvGgU2q4Q0I0B8PzBlISfOsSJBCsH8zP0XXXFFes5v/dx/wZQOykAipCJpUmMiJwokCC+ThugsVAlC7HNRQb5vjBSd3QMXSYMlBfVKQZRIiJyqqFpAAOuQO4dEhN
VCMmRCDgwhceosLCn3JQODjeHR5sRo/+RoY81QbcygrkwudCtwohIqXoun7igACaWRF6HJ1T/L+AUVqaQVo7YUCwhhiorIC95IcYDKIbU8G1EhglRYLEkCTgBmA4JRSrL0gT/76p//h49snKYP3HTntWs2LRw9vrAwp8YAREKs7Bed1OdNh9rPSr40iPpb5AvrQb0bp5B8oMbXnLf1tvMulkPzD3z0M6efOzpoGw1NIFAEJ0xFWZHAiBMSdOc7J46enFi/2gySJgqAmEWKHL4CkKma8K+N/1SAmMW2f++GZ/tYl25IYdFL6b+4BsmrjeV1DcH7ZCRd5GfszInW+OjE29/73YmhlQyAYqwsxOsd/rHdtXv/oYNHLr10w7nbR/7b//hPP/qLP/bIC093VPOzJjuGQSAmXjW26t577pvrtDPN4nMrzlPj1RE5DyarMNRHGUXJxfR9361XABJfdEbUQXPNUwG3uz94y62DLx/8/Ec/XqeBJAeIujlE60w1VdbKi8UKBrNyKglrQpKypCycKqVKJgAbREocSxr4ZKSgpZiImIRJCJJAE0M1J3DqlH1VHOfUgg2bgRpN1O2qY3uyD/37v/rAW//tB97+f//Gf/rY809NOzfRSMeHasMNTgyEYH2Qj8hXxCOGQSA9LuJs90orbx2Wlugi3IACrWkZMVGohFBbhgo5FM+w6LQa/AcF+ch4QnAER4CKuHTok3/xtS/++p9dz83vvWz7ef3JwvzsK6dPzyXIufB1iqhbkWMdj05AaBTn1YAaBTmoWLAKHEl386qRuy+9aG1S/9pffGX3N15q2lq/1IxlA/axfSdKihonxvG+XfuE3OR5a7N6brlr2NeeI4CjTq/+WxHrVLnm3hVYYvDT8lue1TVY/MGrCvqzKxEU2FXvZt4PIAoKFZRSmmrj2IGphbn2zltuGWg2eSUBoDJWIiGvdxDggGdefmW203rH7XdefsNF33zp4Cc/9bkf+8V/8MPf80+++953r+obScB81qeZABjgPW/7rs/+yR8cm5lbu7q/BgKR2NzAwMApSQlkKJwApL5XIwmr8QJIPLlZQSCxQiAlSWH7MveP7r7vZ//0ry666frzLpqwDsQs6hhKrCKBEk8ASESFyEAhEookM3txGbwZz03yYCoF9g4UapgUIiKGA+3HkQMZkFqoIWElEU41YWrML5gXH3/xS3/x5acefc4u1OtmKOHkM7sf/MQff2FkTd+VN++49fZrN5+3eXB02KHdzTo5ulZcoCGp+lw0EQ29CtArmJV6JVVvcWNUN69WTdACUNZyz2Vt4OB8+JIOPi7ig5FOhCkB+SxhK5YIjc/98Ve/8ut/ee/4mnu2XTiesltoHZjvHrcdRwnDSLwMlCWTl8Qi4tn89KIDQwKvwZUkG2rUbjhv+8DBI89+7un507Pbb76kOZR2JAOYWHM4MTCW3YIeePnQqrVr+ieaypmoLxYOw1SE3RHLeFD1giszeU3kvrLHq6FAtOhulIuw7PEXo3ZnHwHwocA9QLwWBcCGoeJy13CNQ7uO1NPGjTff2Neoh8yAlQFgRQG8oaGQvYdfpGZ3/ZZVmzZPTKweveryrQ9++cu/8Yl//YXHv/RD7/7RnRddN5A0CUjPdgClnTuvveii7XuP7Nkxur1ZYxgwGVX2lQ4cKcUW6qRKSqIGxJUMHQJKeroChkkAFZtCJurNu9dd8me/9tFf/M2frEvSZo+gWwOGkoAU3vxXJo8IW2dAyqQ+McCzyw2RjwAoR2g9WOKC0PyFidiLXAeCsBJZNkQgI32pmvnTrSe+8fIXPv3Y7kde6rON9cPnNIf76qaWEPleuKcWZh7+iye/8rFH15yz7sY7r7py5yUbN61JTLdLLasZoKEkAwgM8ikDosGcL1/2uLD+3wIBLkVr/IiCzIm7lckEheVbgBE9ToMvvhn39NsYkypE4VhdApX68Gf/5KEv/erHb1+9+U0XbJlQpq7MUXZsbtY6SpUcnM
+7KCpBR+9lKeQSrjEqPfWVJxQAQ1lJpE585dbNI0N9jzy+a6Ezf/GdN/YN9bWplbPVBKpSw9CBfYdnFlrbrruU+yRjV5j4PvZeSGSf2YZFS9mjE6nyRTni18WRFjlfPVsu2duvxNJ0hGUdtSWb9BzOL2ClGHZIJgkhA1Vpz3RmT81uXLvlrnvuMKy0EgGujBUF8AaGQqamT1IN6UAC2IQxNlS/+803r92x+aGHXvyJX/5HV6299p1veedN19+8qm+EQClMz/sBAmGgr/HBf/CBD//8j0/n1EyozoiES4iQkihDApnDI/bkkYFI1VcpmX8CwImAwEYNnHHdu7Zf9PVP/PlzDzx37k07Msk0VVUSsQaxUE9JfxQVQUj7jXm5xDErCl4+ecKOAixEQoZVFJwY67WVIcCSMYZQM33GmNljC1/6zJNf+euHD7w03W+GNzbPadJAvZYwQ7KMUxYFq1nbnFjTXN0VN3Ns/lMf/sLn//jL2y7efMs911584wVDw6vy7kJbFzJYJ05VjWeuB2FeEjn9IMCXzax+omGzyt3r0RpUlXBBlkWSZ4WtWNjHRUiEfP02ToxKztCGNL/0yW9+5r987M7R8fsu3DZmkHRtl/iM7U45KNU1dL0pJlQmy0Wdo4WY1Phf2XZBEYuCAgQYR2AWe+7kRGqSR194+YXWI5e96ZL+Nck0SQKqS8oLtPeFfcNjo+MbV7vECYl63E4C4qPx6ovEgWUe9mJhlhPLuuj/cEmFfl4K9fRsfRYx//qk/9J9on4ubrhn1xIjFT564HQ2n19729WrV4/yCujdO1YUwOsf6tTNz8w3qG66cE4sO7hun+FLtm7cuGb1zst3PPipx37ml//xurHt737b99xz272bVm80qnUyVVzIEO54090f+dWNu04vjG8YqjtLhg2xFYLxtrSvjq8KJmGfceAplRKN+FhqB0QUyymTJTKMVQb3bD7/r//X5378yotrdXScE0HC/ijqY77E8DXFSBnK4kRACQfKPwDAV3rwBBwQwAF0Cia1BJnlyDe0FU5RmzudffULj332T788s6c7lI6eM7w1obQOTpmdZAJO6knunDIlTN3M1RKTMo83B0YGB+YW5p9/ZP9jD78wsWnwlntuufHWy8fXDxvTzpNWLg6igVQamxZQhP4XSXhEWzeWcyu0QkQeNLJ74jclw2iRS1DsVORckRrmwI8SxwrDI08+tvtT/+aPd6aD37Vjy7gRwHVTZGqmZ22WASTCsVAGSvFffa7Qe85o5Mac6ADUUIjBs3EiZICubB4dHb/sqi8/8fSjf/PEle/e2aizUTdAjT27jkwfm778zmvMEOfUJQgHt8LHPyMGVNzOZSRrL4a/5HtaXhr7r84qx6nn/zeIwyzvghRfUbkJKZRYwZJoRgf3HKubxm233sK8gv0sHisK4A0MZ13edaxkxCSgDE5YrXW1hEf7+oa3nbN+3ZpL9xx+5GvP/Oc/+A//9cP/z5tvect73v6Oay69sk51VmOIARBjfO26m9/yXY9+7KNXbhjkBB54UZBoaCfrVIjIqFENlE0q2oKAAOKicxcBxFAByDeMabC7csvWv3zg87te2LX1qnWkEsrVCEVbCcH7BzMlTMZBSK2owie4FuYnBCE0KqpMPu6sPrvAgQkOBqau9dYZ+5UvPvT5j3/t+K7Z8cG1W1aNplpLiQ2ruFzZeRNWnGOQZ6LW6ybLcwDEDOtG+gYGBgayzJ45PvNXv/WZz/7Jp6++9aLb7r5+84Wbmw3T4VbmOj4pgUt0mqBK7C+/yE4FgJB+XAiGSoGKCAAtFl9V2VGUuA5GO2lEgMipMDOTEGUNW3vxhan/9Qu/v71L77zyovX1OqntJtxN0OrKqVYbSInIhmZh4Tw9oj64FktN42CaF64H+R8oVIkCRFTY5v2JuXrb+V87vPv5Lz12+R1XNU09X7CvPL17cNXIpvPXC7o++66S2CGhGl48Q0GGLWGzJSuzVGzqIhEcIb
a/k7HoNEu/iDGAynYKQDNkM2722Py5a7bcdPPNJmZKr6iBYqwogNc/1BgeHOl3hNOnZ1SMgSIh1UQEhjRJadVI/brLNl+8dd2h26588rHnP/OFP/jE5/7i+qtuf/fb3nHHjbcP1pspEkNg0Jve9PbPf/T3Zlp21WDKlMNBwRHtZYIJFqd4ES/qa+koAUi8G0DqO0h66qAIlImhItn4cN+59dGvfPqhbVe/K3HqTDxa7FXixAHGdm1n1o4MDqVpn5BTzS1y+ELNauHLYxKI4NSX7LeUkGhXjCbGkNb66s35U+4rn37mc3/5tfkjs6PNsfPHNxoYBjErkBPYsBERGAKRiDMgMkZFcidkCAo2UBBYybqmMX2DY+PNwTOd+Yc+/vwDn/nmhVfsuOvNN557xbmN/rrjOYGIqgtCWmPZagBCsfy+QomCr+BvnI9rh0zRGGONIqM3kNDjFWhF9nk5KapKDKPgPOm25c//w0f7D8+89dqL1g/WAVEDOAA812q3MquJrxJEKCZWniosb3FOf4MEpeqFomiBEJwAQNT55GSkpArJ88lVw9en5zz63IunVu+78rrrH3r8yZOzU1fecbkZpC6rkjJERZR8+W/fs9KfPFyfagURqz7xFTCnKvEJ5acUC/+9btbmGxxLpD9VBX+YZuFTkHrMkpUpTbR+dP9h17X3ve2tE6tWsfeDV6R/ZawogNc/mNmMjU3Oz3dbCwuSW7BnDzKxusyxcSlxLU1qA2Zo28aN68cvu/rC5144+PDXH/sX//cXt67b+o573/Hu+945MTZpKN164fZkeNULxw5PDpw7yEosxHAiHsYRUWM8/9/jGaqhKDOiIAieu8R6O0QgIRgF2SRNL51c84knduVzjhsswZIkB+9QkAKGkXCyd/euxx78xoYNG2646frJNaMp9+V5R2FB7NRyqDjGZFJ16jQnI6xZzdSb9Xo2q1/9m8f/5mOPnnxlbqw5vnl0nMSkMMSqYok94KCqxMwOPuvABFnM5JuhE0icGCYFjPF1Q6Vu0omB0fH+4bmsc+gbh/77I78zsXXs5jsvv+6ea0bHh3PtdLWTs3WwrL43DTP5zGmDGC4prD0NLM/IwfHgUCntF4m9ioiOixxTxISZWFmc1hIDV/vMb3/26Nf3vOOybReuXdVwuXAqhqywU3Nqtuso8eW3PXuXishE75nC3fUluReBUSXwH6YacG6CT+MmEhiCzTcOjOTrth5++tjR+qHnn949vmZ84wWbrLGhR5sSlClWcC2TsFExnHuWoYCHqJCtizaJAKQiYj7LOgrfhrHE0Vg6z7CNFnEi8pU2km7t+Csnh/uGr7/mxiRJVwigS8eKAngDI0V6zuTG8cFV33z86TvvvcUoibocYkWTWqJCVoSEOYEhHa03B8c2bt22+vLLN+966dgjDzz1S//93/yPj/z6O+9713ve+YHV4+uvu+uWz//ehy7etLneB5PnSJKk1rCqgCaqzlkG++RQaNkWVomEfEcQR94nUADwwVxVVQOIPX9yNH3qlRe+eXDz9VtF2w5i1YB8dXRHgHNiTHL55TsuOH/Hgw98/b996DfSvsbNO2+/6orLB/oHQN0k7cJaYYFaiFVQLUnYoV4fnWvrlz713IOfevToi6ebZmjT4DqjSQKYhMQ5AhEbQMVnK7CGQnIeePDV1nx/MfWlC1iDt0GAJmyIoVaIabDebJq044am957+ww995pN/+tD1t15x/d3XrNu+rmHa1rSgWS6WoOR8bQtfubTIJAtpr0ylmEUUar0FhcJPQUEWQlEpUEhJvfhwhFSY8mTf86e//McPXDLUuGbTqhoLQdlZuLRW65uabx3vZC0YB/KqOSiliEj5yRQ1mDw2p1piMdWWt8X8QlUiZlH1QWErMCBAyOLcybXJXONrX3qwJe1r33INDaZdaiuEJR6iTEFAZBr1XvuryvCq+R9npiVUVvgQ33YZq4t/C+yoimYsLiuWbAKpgeXWmc7U4dlrLr
n6uuuvYQhgvt2T+99+rCiANzAYuGT7RZS5fQdPLOTdRiqAGtSVVZxAhcRbtAQiZiWiZrO2/YLNazeuufja7UeOnnj+iZf/6Au/9zt/+Ntb1lzYmWsfaddenuoOrakNGzYMkZzYiM/mN6LRnAk1dQoeSgEOBzs18NR9B1tiNcDEYN8k+OGvPbXxxgvh5hEiuCLRdlMmJ9akpj6W3n3/HVfedN03HvnG5z7z15/8q49fdsGlN153zdr1qwYH+7meUZoZVRV1WTY/p5/64vOPfunp9r75Jg2u69vEztTAYBFyog4cZK4K+eZkCoAksNpDFyxwtM8lRlkjSk8AVJCYRFRVxZi0DqwdWTMmEzNn5r7ypw8+/IXHtu7YdOVNl2y76tyhycHBpuScUTd3UCvOKqAact0841urMfhCzlbk3qKfohYo62NU3AeQaKqUDT74sWfyw/NXX7ZlXV/KImyMtaCE26onW+22wiYGkJDRvPhcXhnGu6jwpqvnslexlHjSGLUgOC2Be99VTQVIWDI3kDRm5turL984es6Ypa5jR6FStGeVBvYUFc2eF4/ivEWFkGUkuiIGJCqxlb8rF2CJ9MfiexZUUFA/PmhFysQ1Sfa8cpQ7etPOm4Yade/9rLgAi8aKAnhjY/vWLeeu3/T43q8e2T21/aLJXFqiyhzyprysVkhI9XfETGTc0GjSHB5cs67/8ku2vvX+W5584NlP/tkDu144BjQfOnho0/Bko5nUSY3COnCaQslZRQqQMlwiwmAKFZWdghypD/5S4HUH155BJOwyqdeSjQNDX33ilfaZbr3fFwpTXzUgMCFEhEgpt9IhMsPjtXvevnPnHVc8+/jTn/3Lrzz52BOrR9ffsPOWdRvHJyZW9dWxf8/Bp5564ZuPvNQ5pSP1gbX1CaaE1ZAhIZ90acQJcYixRt+kCHF6rMpEUaGxjxi02DLYbhBRJd9/i4hgiE1iEk3qQzximwut2We//MIzD74wuXXVph3nXnrdtsnzV4+PDyR1y2me5bmG9GFAySgzEQuTb//om6B5YI2qPkCpTquYSzUE4JEFNgzF/NTMS1//xtokPW/dZI0JxL4DpJB0xZ5udUPFn4g5RwcE0WIlxOZiFLsfhxOUQyvk0CIRwaMcvhUNEVRdcE3E1A4dPpH3J1uvPc/VbJdyduIngeIaKpkSFNVt9YwozheWYnmBGQG28MzHJfw7HxQXRXvuTiAFx4pASkpw7HI5svfQ5Niqd779bexL6/7dz/B/u7GiAN7YGBsZuv6q6z//2Ff37j5xwaXrnQiIPdG7qBNjKERmmTzVXhNlYzgj1RqNbxi8+/03X3z7xV/+ykuf+q3PP3/w0ONHublxw2RfkgBMZG3ORtMElhwRsW/LJRTfS1USy8SIqalKCHlSUAcYynNxoNHB4bmDu48fOr7lotEcuYiGEsRgkE9LJSEBQ8jmKnlX643k+luvvvr6aw89f/yrn37443/28fkznZSG8pbMnZqvcW3z5JqtzbEG9xlYIdd1VhMVHxsVLZoUMgW4VVUBDtnFxCEMG0xdlEIf0cMBAJiERRSqzIaYmFlEVDVJE064rzE+MjTabnVnXpl55NmnH/7zJ0fWD5978bpLb7hg68Vrx9aMga2jTIWdSKbqxHriERuupQakINEo/70O8lQrAIsMcMQYAHyira8Haujk7kOdw6cuGByaGBwwyJM0ESuGSUxyerYz37XB5VIgxN0D6BSRFoVn5PZEBHrboFFw/1AY5FTuGW88ETOYrGLe2hdOHNyw87yBjUNdbvtIeRmqrVj3iGb7ciM6cL1fL8Lhy3lWci20svG3ZyyZwzKTpsqGGoJKpATLpw/Pdea6N19z03nbzzN/t2GK/43HigJ4Y8OYZOfOGwd+Z81Tj+1983dd5UusSURiIrAc6BZsOLI5oA5JLcmsIzgRt3pt473vvWLn5es/8u/+8KuPnRysjV+3ZniIXDO1SS0RqIPRTGsJQAiJuj4/mG
NVgCD0PUgtXl5YUaewgpa4NG2Yttv9/EubLrzW19lXAiuHejAAQUSE2DDgxDLIukxEROTcS9adf/77Tx2f2vXc/uce3f3YA882GmYgSWzmuibjhOpp6iyM6bOaw0lChERI1YmEdQCph1GIVJkisq4xrTayRwK1iUJwFlU4mlRJKAZQyVpnjKn3pXXFYHN49eiEip6enj91dOaBl5994BPPT2wYetPbb9t2yfozC6ePnzx1YurUmemZjus6YxsjtQ3nrb3y8h2TE8PGhAQ3DxaQFtWpewr0VLigfor+F2OUTuw7mnTbG1av6UtrDFGnhlmEusDJhVbLF8NjUnFAlNblQ+SlUeSGVjyDOCqURvWtFQqopeihRiJiGGBWTjhNn3z2BYzVL7jhki5yJUmINJfQJRjFRVXjEMuOakGF8t/i2/CYo+DdUuX7s+iUb9MoZqKLPg0AIgGwziUJJ1zTPDn40hG27s1vfUuSmBBdWZH+S8aKAijH67APlMFXXnHthedd9MhjL50+uTAwnjpPrrEeiDQxpkYEqIA1ZKiKiHZhODWqyip5Ls5t3jrywX/5zv/28x/72p7dDbflktWDCSkSC5MQm5QYBFHJBUzKJAKCLxjss5IIECZmIYhARB2oZW07d9Nz7a6VUdPc841XbnvrdVKDkCvs3SB4VBNOnEBVIMaRJMSOISRzMs/SHlrXuHbDRVffetnb3n/Xy08deO6xZ154etcrJw8n4GZtqNno62v015JamqTEIGIRYTIBMRdPxTEetgLYN9sqIBAABZAQTGSPm3mmPxOUJZCciJCkiamlLII8F8OSq7XWZt2uI5cCg1Rrt7qHnj/wP1/+yPDqZnM8aYz0rV4/MTYxvnlyS32oMTTWbIzUGrV6bn3lIsDnQPuUuoiRxychmtzVQaECB4vOnJxKgLHBJhs2UFVWJhhqZZ25VsfBKFSljNNUkowL/YZKrBnR9FbEWE8JPlXiAAT4djwqYGMAFWZnkmOzM/vmT13+zp00bJy02edoUJEqUurUytUtsuvRK2CXeRVCMlqE/gUsIUrub22oElI5yN/d6IHt4uzUpOxUEqsyk08fOLVm1eSb73trLUnwut7u78Txna4A/DsnDoZf1wNCoP7+wQ/+gx/88X/744899MTd9+90MhOQSQXBspcT/g1RUkq81GY2uWQEp2QkkyRNyJC47raLNv3jX3zXf/vJ//7VA/uazR1ESGxGqfZzvclImRNCwhBiozF+6fl/ML4Qj7Nw0Eykk+cda+dsPjPfzbrqYNYMju5/+WjeUdQTpVxFDCUgUOhiwKpC/qU2GgpHO8eGBYKGdNSSOqolQ5vNdZvOu/z2cxYW5o7vO3Z416FnH91z8tCpY9P7bYcalDZqQ8PDY/19zYQTVTDYgNgkqs6pI0++EIUq+YhsLDhR3AUUUtKYAFkwOGEnYsgYg27X5lmn3em2OvOt1ux8ezanLieukdb6m4Nr1/Zzvb9/cuPg2qG1G1ev3joxuXF12kjq9dSkzASGKIuSCodUNkZskiu+9ZpWZCPgIxMKJWWwQlXFK6lE0JqZJWBosGnYk2ZVxVgyp2bbC7kIJz6H2kSfJpbuQCHHo6wkofLRistQkPKJQrjXgRygSqwwKpKoJsTChk1yRvKHXnlpzeXbVl+4rkttQFigKmRMLBsSDtxzfcsI6V5fodRMxYJEHD0Y/+rrJKkKgblX3Sx5mapOxbc+KoenGBUJFCsCSJGAjcXxPcezM603/8C9Q80m0TKlKVaGH9/pCuD01PT0wulV61Y30UdWmJLE0NlVAQFg8D133nbOh9Z++W+euv3+25nSXLs+RBvYcAEFUoI6dQoisCoSk/gHldn45ldGa9LFhRdt+KGfeed//ZcfffLkScjQALmaceZMp7+W9vfXm31JatgYNkwEmICZsEJFIM5lVjvWtl2+kGXzWbcj0smVJbHEjSRtn5hzjnKnlIiqeKISIBxiaaH+j3cJPGJALrjUCh
AnVsWxsBEdpYGxgYF15597/bZb3ndv3ranp6aOHJw6uuvwk4++vGvP3vbheWMpTerDfcOD/QONepMYtSRREyIPHA7MEGUyGkRvNOaISNlJpkqZzfPcZjbrZN12t53n7Xa3LciQYGSsf/XFqy45/9zNO7au3zyxamy02T9AdadCnIASdpw738eMHcGCPDBl1OfxRgkL+AqnHjiO2He40TGzqUzIhZfYKgpHWaubMho1hioZn5DGjpLZVtc6KFOo3sQFmh5DCRHkKtRBaceGpKoyPhtSEPxxgoLQEGgicsJITUZ4avee+SFz4x1X5EnHqRivzZRFVFUYJrpdr/k2BIyu8ntBByq0UkgphJIKGYU6gJhMoeXikRTfLqFfTo4WXUXpskQ4SsixafGJvScbWr/zrjfV6gm8pl9RAcuN72gFIIpPfvrjP/vLvzC+cfK2m255/9u+95ILLlVNObRDWX4YpdWjI+++795f/8Pf2vP86XO311RyZfKixCpATAoWBPAbShCBsmMiVVhliCqDmJlATO6GN1375KP7vvi7X+PO1gsmJvqoS5yddinnmZlBPU0SwymzIWJf7FxFVcWps5qL7QpydRZwogIlQ7mIg+lv1NzRzsFdh9ZcvQmUxaQCDfRM/y6xQoXEfxaqjpnCIHY+ssbqiHz/KiNq2NS7tT6zYXR8w9Y15saLvvsH7806+fSpMy8+t2f/3oP7Xjg8dWTqwOmDtp07C/KNCGMZMgYDSmQ4lhdSVWJy8BUzhYnEaFpPav21gcnm6tVjk+tG1m5YO7F2bP2GyVXjQ/X+urB1iYoRJuR5l42KdQQShNyvCFV7kj2LqhIMs4p4QD+UuQgNWRRUxGo1RhyL5F0FQAwIVNRl6jI1hEZfQ6M4FuaZrj3RlTxKfWbSgnaLqHHQG7eMioGCiBbvSQYHiRRwoeIGG5/AAXEMw8xWoJw8e2D/ns709e+7iYeRkYIJvucLkS8Qstjsf61REHJR3ZNQSFBlCBRKiVDSRW5Ra9YsOSFH5cbV8a1K/4riCJpkKfwPCumGUKga1NglM6faU8enL7/88utv2OmJZCvS/2zjO1oBMGHtxGQ7z/bNH/r1v/jNP/jT3779+ts/+K5/fPM1tzf6E1/tmLWngGB8gfk973j///rox/74dz/1r3/l+9RKV62q7/ahKp5oTd6yDhALDEMUIGUVz6PzvQ8dq4Had3zgjgf/5pFH9h5tNkc2D5Kppc5Zb54uiHBMYvFSGiIAqy8fClWQkBdgLE7UiRLDgBk18PSxqbW6SbzsDaRD9dlRBB/BiJA8SpodVcxUn0GkDk5zJg4n0cyChASJJpyIwWj/wA0br7iRryFrslxac+3WTKs7lzkH282ybp51bbvTtpntdjpZlnc7mYhjNkktaTQatb5as782NjrU7Gs0h5uNgUa9WTM1wwkpOSEIWxVRsi3tCqs6FXGGGRCPdagngUtIlTNkBIICdtJo2fu7o/EjoLT2F41Yf0cBOJASk4EV1xVmpKlXYkRkHDDd6S64YB4LQs2jInaMuMT+tGWwNVJ8IrIRqV7EEW73cKIAysxkSZksFJTsO3HimeMHz731kvHzJjtZR4woREPZJih6qe+vZohrKWMXb1PsFvRCSCtnMZK4TufUkal1mzbSELyxEx6gYK6jR/T2lO57fSOWvCqeSv+hFuWqqnYMqbMuaddO7DnJubnlpjv76g1asf5fdXxHKwAAV1133bYLL9inu9/7Y++efunwy888/OP/5qubxrZff/1tt9x67yUXXTY2MGyApHg5lEBgMhs2n//B93/wV37/l19++obN29f5hlZgMpQQEcHX6xEO1D/24pUAZQ5MdBLP3HfMZGl87egP/l/v+0//5PcfPriref65q9NaLeHM5TkoMQkxuk6YCOT7RgKkvrBjAKpVALAEeaVMIpomSS2p739x78X3XikJhcz+2D41AtSlRVq+6D1EzeL103gNPg4NZoYSmBwAVisOxg
KWEqWa6W9yc2LQWE44gVPyzYUDJdtjbSG5wZczFVZRJ+qszZXUiQWyXH1GlxKzQGAEJBAYioXhfBOsHuyeovyRYL4XgdQiy7eC9dCysj+mtvqDFwa8IZAjySVlGA65ZgRywOmFBSsoqegBptbFhy2miTI0WcDmFMVnsRsRxXbJCXJNWS3gOD29kD95+NDYJWu337S9K5mwEqk6V8D0HIV1z92tTqVHOaCkPfWoiugPUGRwAaTwNOS01pif7s7V54cHhoUcfJ0iVI4akx1KSf2GRixcHlGxCrYEqpB3Q31TQ4a7fGLvqZHm0P3337/S/fE1x3e6AhgdGfnB7/mef/6r/7yepvd/8J3dzj2nj0699OTur+763Ef/60f7O33bN2674dKbrr382gu2XTg8MFDjOmAIaJj0vrfd8+/+2y888fQ3N21b1+1kVAN72yuY/ghAgxedUARaCKgQwipCUCepMaJy452XvefHj/7Bf/zrbxwZuP6cDeNJwomSOFGX52Bm5xSGPX1RJMYZEDtcaKj8ApA68VZ+I61PHZliEIGYCQ6AEhOkUvIA5AGQiFEDlUyfOKiCV0cajYDgy58JgRSOlAQ5gUQtEalRNWwBShFEaQF9R4hJYzg4dMCh4HKpiXwYf3GFRPNmpgODQMZD+R6AC3weLSzHYO9HBaG66Kp0EZOkWI5CZJbZQ0X5BBISK4aZjRcvrKQd66Y7eU7Gl1wKhutyNdZimKiYHaIaqmwS8fegCtQzi4nIqObEfKrjnjh4INk4tuOuS2yaiThwMAKKRYvRh4oOWHyVhemvPTd/maFQLudGBBEhmzSawyMTxw6dGDtnnFm0UBflxVQ9AERlerbzlPqwUCTRWkE0+6Ph7xOAQ2krKMAMtpg/Od+d6Vxy6bU7LriQ2ayY/68+vtMVgCrecttb/vtHP/QXH/2rddvWrR0b33jexsnN49fnV+TZwvFXju1+Yd/nXv6zP/n6b9fs0NrR9ZNjq7ds2DIxPrFh/frDR3atmpSubc115zu2DSFDppYIEQwnRD5BOMQGoAAHaJqLd5sZJCC1IoaNyeUHfuxtAv2Tf//pWoqrz1k3YJhJ1Tr1gSwFiZASMamEJOAQJyx6NsZ4pqgS02Bz8NjRk5RbSgDjtxXAVGVxtPFQQBSVFaoiADEGFwATJY0GliKyGrVAnX3jSg1hT795TGyKejD69BUz0evMQogCqmDiGLGN0qG06zUmXEWrO045lisozOBCnRS0kGgiR+xFK3uVjJbKdvEuUELsk4wBOOGZdredOee7WDLFCfREVKMojpGAct0puBpx08pKhuslVeMzKziZ7din9x9d6OfL77wqGXKiuXJMBSxvlwa9EnQm0dnkezDv/Y2pTGrJoPKJ8EklAMv4xMiBl3a7mU46bPKaUbg4a6VSw1Us9/JmLDpP5TGLBTuWmwhpMEaK2m8AyCgbMeSSfbv3qtWdN96aJAkVpUZWxlnGd7oCYKK1o6t/4gf++Y/90k999s+/+O4fuAfNOhp5M2Vq9I1cuWXbVRfkQLednznRmpmePXPm9ONTD8w9PZU93DJIvuuH7738tkvn8zYRnFgD6eS5ITJsmJPEmISMAZNhL+0oxGEp5j+Z8FYQHGCFbJ69+4fvnD3d+tSHv5K5fOfWzYMGxKTirPggMvne30Ag8RCK6jaKUE84JHsxUV+tfubkkXyuaxomtHcBSmdeY8YqlbKoVwVU06GCAI7gUNQDCLhHkRpUZM9GQ5qKg4XrlRKWiV8VtqX/hasziogUBW+qkNdaTqQqQUKWWXngChBPkd6J8uhxy7B91EsaPQ4/dwF7S9gwUCMwhU67Djzb6uaOAAY7raxgj0DGIpGm5Q/FapUGPHzwhSjgbKoQxVzuHt+792TirnjLrfV1NcWsC4q1ZBlVVGoUqGeT/lROrpqGXC5CcWeCXKdwOwyD4GDrg/31et/+lw6cf8151tfjAwRVhyacKK6IVs57tgkVX8YHWxdtUX
hH4teJiDRXO9udO9WaHF/znu95T8IUUxNWxlnHd7oCIFCN07fd8Y6Pf/JTn//8J7dfsvH6nZewOiEfWbWGMlbpH6TBwf4N2u+wltQkCrUKlzqbI8kzWJ8NLKoKOAWsJThmMsSGk0QMiDhJWLhPKYGyqBrYGJJVsSCDBDbLlOgHfuZ9eZJ+8Tc/30zdVRs39zETia/eA6iIq1CyQ1JrVbL7h96XLWg2mnbaTp04sXZ8fSYWvuGvC74CQgpPIWCjBRgdguKXqLEK1CiiKsEMq4o3WgKsKFB24g4wgDdJe/SNVjeIWidaw0G+hR38XApZoJG3CR8xiFopchP9XlzqFCooMhUztTLd8j9vRhd8SCKhlI1hlsBpIrQdTbWyILHjjrFCTQ8GEueL6jVT5QJjUAExBY4AGIYQhE3b0Td27zlCrUveunPg/Jp1cyAlZYUrnofiXGU4trjms9n21VEVzYUWCwrCB2SCjycChSSNfGLNyOFXDpx/4fkmMdZI7F2ki56A8uSLjYyetfdaLJo1flF7Ng/eJIyCAKfkmNTAGEqmp87MnGzfe+9bNqxfs1L7//WMlQgJiGh4oP8Xf/znBrrjH/9/v3J412lDhgDV3ACAUxXnXGY7TjLJOyy5cxmTKDJKYlw0dPAlUTgVgQo0F9d1tp115rqt+W5rtt1ayDrtvNPOux1n29bm4nI4hVqgbdtdyayydSyu+wP/7K23/YNrHzt8/LF9u1uUmqQBIhFnnfMcVVFfIEdFA+4tFSPOv0JEqNWMzXD66HRKSSGCCzubqAwJV9ajBGdAIXhdYD9FkABxX5SSp1fgBDHv/3DADXytIDBp8IdKr0j9h8w+8dl/4neP4rE4XDx8uJYyQUkr/kb1LhcfaPWT5SVE8RXFtfIgW1FrJkmJWUmVlUV5IcdCps4fmSkqZe+dabw/Xg9WJH+EsKJuKaSsAlAJOo2ZHaCUzDo8vm//QW1dcN+1YztWWV1QVmiRYV29j1FnV5Zgeemv0e/wd6H6SY9aD6Eaz4oKk1RSotzko+tHbO5O7j9pxDCZIuF78UnjjetRf4GIVaxGMR2/S29FQUB9IIdCWA2hiamqKLo4vn+6kTQvu/yqJDHL+hcrY9FYUQAgUMJ00fYdP/uBn84Ouo/9r090ZzklGCNgEjgruahTUhVLIBFREStOyQIWCl+KM9q5SiCBOBFRdSJW1TqxVnKbtbPWdD5/KmudzBamstZ01prptKcXWgt5t+PsQrebw+awoi41/EM/931v/cnbnzo59bmnn5jpZElaVzgmEYgDBBRKAgWZHF9TggfnfSnIWmIG09r+Vw6xlJQUhYcyQhkcUKDOUJSkBIRK9nGJKERAKwXGNJKiKtY/9fwprV8vSQvNQuU7TuUGiC98KSh6ty0PBipw4FKvxMlq3L5UVksAkEqhy8oJyj39n4icaAHfiTjTgOlPLFTBCs7UnG515q0DE5GDSCQiFTBFPGVY+Mqf8KWAJJTMA5RKuoBPViaTtoSe2ndk9+zsxXffOHnJOqddIk3ALAoJlLNoPUffSaPyrqzb0qe/vJ0ViVma2xV9H5kBUuh6MpST7Z/oGxpo7n95H1kmxz7wgAIyDKtass38k1qR91r8TiiouRXXqfBv420jgEmI1DdKJrBBahf45OEzo0PDb7n/zWbF/H99Y0UBAACD0zT94e//wPfe/e7nH977+b/6SoK+GmpE6siZNCUmiKoDKamIinqEhSrkiRiQIoRsJ40CShVwUBGxKl21Xc26kmcu79qsm2ddl7W6ndw6J2Ild5oriXU5p3rPD9x20wdu3Eetzz7z5JHZhbrpN2CIqIiHPSTEflW8EUUhMOBxEZAyYbAxMH3oNPvpcZwthVyFCs2ikLMF+loMDa9xj6cQwZOqvO8R11r8WqRTLRqF/4Qe21ArB9TyT6lqgnSOXyGUnlv8p3L20qoPXy1nH1LP/72eBIFIiEQl0WS45ggkCjWZo6lWJ9NEQ8qpt4C1WC
1afIriLGWgwidzqAr5RC4GEYFJCCBecO7pvXv3njl5yT071126waGbq4UQQZlJFIH+E0VuYUQrqqqvxx0oF/JspvIivREeCiWNhWih5LOeE55Yv2bm9JnudNc4f/sK36n60FTO2rvS8YdKBANa6sdCK/vHLRaiCK1SfUWnHPPHF/K5bMcFOyYmRtlnUayM1xorCiAMJmrW6j/7Yz9760V3fOoPPv+VTz6MLE1YQbkonKi6opu2gEnhRbBEYmJ82IO5E+3z4PVrACq8QgiVm8WpWHFWRFRycQKx4tSTHCFwWX9/864P3n3NB24/OpR85uXn959ZIBo2rm6cCdmtXgrBW+e+/UhhWwY7c7x/eP/LB7sLlj08oRFkCKJfKxKqB/pdMqoWfLH9khetOEblJT+r8RmVaO95qbIRUWHml0cvz/pq5u3iEy6dbQV/qKoNVA3nqG9CuimQoDbaZxnOqgjPde2ZTmahylXUvWKM98wtqF8tXZ14mQTftoGJQkcJApmkK/zUvgOvzJ/cce81629Yl5s5UZeyIVVxEtsdczXfrHctlrtHVIp+XXaDyoblj+Ho4RH3aY+kEOjYmjGn7vShqSRPIpRXqvOqs1M56rJqOLCXlrvdxVwYygIViEKJiQVk5fihk2Rx7913psZEPGtlvMZYUQAoqCTMZt3qNb/2i/95q7noL/773zz24Au2S8akFk5UTcpgdSJAIPsH67tEMQNAGR7WKtgBBFwm0GhUKFQJCF4CQmKwr1CmTkQtWVOTxsBA36V3bd/5wdv0/OEv7nrmiYP7lRpNrkvXiViBE4LAOdVgJBeyHwDAREONvvbp9tzpBYKB8Vz4KuaBimnmp15Y84VNHz0FhFTaJSKmgpxQcbTCVEfVHo9/CmlZ3X05bVHF/eOXvUpLK1tRddUXHa3nhhSOQXHuxU9Fz2fezlZAIUMTI11CO0eX+HSrPZ9nAY8I+r2ELGLago8hlB5h8X2Fn0Mq8GR/5lTAoHTe0ZN79+1pnb7wvmvX37ChjTOOMoIoOX8vJN7yyNXR6hJR6Yf03q6K2qSe1Vp+ULllz13yFyCqjeFm31D/wT0H2TKFhgXxXL32QuX/ZfT9koku1p0xyB7y6JWUoIkyLeipg2cmhtfcededafL6KjuujBUFAJSeOoMN89YtF/z6f/zQoIz//v/4y0e+utu1+vq0hkyttVA1xCQhQAf4F67Xs44MF1RkjUbz0Rt5XuhQ5NDHjYPkCulOBCUHZwcaKZPw6toNH7h3431XfmPhwBdefH4WSd0kiWNS45wvLdFjTRVKicH9tYYsZIf2HyVKVJSEWA0JkYtlXyL6X5Rm65XXFMLbATnx2iru0SNUS5+/Io6XhJmX3IC4V2/MYJntqKI2yj3j0hdOTylgChGMxY5Oj3yPEZyoxRfDJr03Wd3Y5KhlzGeuJZjqdrPgiZWyP/JmQrwyrEevTgxbhugFxZ4OagAiNUmjbfnxXUdenp275M23rr1mU5cXHOXECjjfRVkWydfKL+XlLTXul2rvINzP6gdoXKTid4oXSUSUAE0dWzs+OzPrWo7EFDiYRoO+CNFUDlk9XRkLLk6oceVLbVU2SiBEP8o45rw2O2UXZrLrbrx5zZrVK82/Xv9YUQBA5dEmcFJLr7th56/90q/Ws8E//MgnvvHYK2mWNtIGG8o0D2iQOilN3dKiLP7TYBhF6UMAqbIUIEMMqKl/jWLXD/KFngECqaiApWYx2GjOdRemdWbypnMufvcNB8dan3rukRPtLDUNsT5WS059pbJ4Su9ogEhRU65ZPrjniAusfzDAPpEntspbbD4XQU8qXP6guoqobyHWKYiCGJMu5FyJ3WORcKkojUIQxB8qpvtSNVCKESoUZ080orD/NRbWLO9y2FErWwZDspytVxnFh1H2R62izESi42tWoZGcyd1sbmfbmaVYWqkymaosKw6DYg4azVhST6oJ2cwmESIlM2/dIy+9uHv++BX33zp5xcZM2p
l0CarwydP+6ri4d4uvEoVEX06sVx7Zypav5gH0uBKlolYAIurqOrJupNPNzxybh41mRRnML7jES85ROSgtuV3as5VfouIt8zF4ERF0zMn9p/q475Ybb62lKZW+wsp4jbGiAIDKo60AA2lCt+5803/+579ippt/+ZG/fvqRfWj1JbbBlMCkhtPU1MIb46vz9FjOlSIvFAx9jv/6343CKBFYyUeKDZTUNz/xrXABFQiswrFoHzglm+eteVoYvXjtld9z2+wW88Cep493u6lpsPOBaBshqYAthYpgIo3EjDUH9718oJs5wDBCl0IlwDjAeQ4moAQh/6+XSwpSsCpBGcoKVjWB9VQ1t6vkolIVVMzsxe/iYgs8iIsiklKag1o5agF0hBdcI66CUEaitBajxb/oxBW0jVRZ1Iga1URhFCxgBQezvfBeQnGPeFkKgoyuGuX+5sGZ2enczue5hQqIo3enUGESX/M0yOhIyw1PSWREeV0s8CtsPI6U1BYcPfDS8wcwfe333Dx53UiXpoSyWpKABFCwUKBo9ay/X7tyAXp0wdKHvrJ1j0+1yJ9D7+6FVtPADVMIYMkOjDVraXLq8EnjjEeB/CoWqSW9ZZHCWxLNoOIJBEolHpRpnE1pkwSXSRwp1ZIUmTm+98SmybX33nOXoUBwWPECXs9YUQA9owAjaml63z33/5sf/7dzr9BH/vMffeMLz9CCadp+zhop99dNmlKijtTFalUVH6K0SqOlXLzupDELqrAyw7MeWv+qqm+DRRSKpymUFXVJUqQiZrbd4vG+a95xK++Y+PyLT+w7M00mdU6LM6vPR1OVkCIgJBisNU8cPNHt2FCKGUJwFNNy4eV5CfvE4UntFKVNgeYwKFQX0gpwFFmehWHe81spWhb9QWUSxZtffLko7Fz5r5hjz/2LYqdk4PS4NkCF+aRxBQrfpVfmqULFI/8UCPoqcCpucKh/eHL1/ump6U6365SYEaqQhp2DegumdeEH9ThDgbPrg0FE4NSSEU5PtrOvvfT8yTS7/r13TFwy0XKzYvKUiFT88wblwq06q5R7Tfv3NTZYpAN6dwvLFhO0WB0k6a81hwbPTM2yNexK/VmoberV+v7u9qx7sKYqofHl5oTwuIKZCGRs0jrdXZhq3XzTTRMjQ8zmta58ZZTjOz0TuDoqiAEZoK/e9/3v+YDmjX/zSz/12T/6/IF9e+qDAydn8vpIdu75w9dcdSVToqrMxdNakkIDzECFQxBFIEg98hKQ9WB3K4HK2r0BxmGQiiixkCh8RWRiMp3uQqO/ftl33/Ykf/2hbzzfPWfrprFViaq4XBVkjBeCoQ6RqDE8VGvuWTg1PX1mcGRYVX1DLAdiTSmGQsuEgFDSWIl8s0YUZquqr8CpUITmJRE+CFay94EKMzF882rLXhygMF570SJCkd9WmvfL3r2INvRAc9ojN1C5WeqvyedHKMAcrsoHVVVDklGRQEsggZCSkMH4uesefPSrl7XPN5JaEjVUNBjwWJ4HePzpGJASZClSp0mdMhwxExKlREzt0PTs4/v32mFz4zvf0txU6+YtGGeUVSyIjGEJxJ9oeSyvAsoaQJVlPtvSvbayKLfUMmQVV0l9fRLTTIbHR469fNzNZ6bBkpCqqE+WKIpreK8lviWhTXRw+CIrtpx2ufDlJ8UXCqaExWjO+3ft66s177rjzsRQebKV8TrGigcQhn+aRX2zxECrbiR05123XHXZVcf2TX/tk9/4+me/fmT3XtvOIanNHIGMMUUANR5oEeeNNMbCECweInCwl0uIwX8bPvQdpfy3qprn1pIImMX70pw5O2/sJW+9Zmzn1of27XnpyAlxtTpqCacgOIFC1ZNMnbBqf6PebnVPTZ2hBCCnJELO9xkGOHISEXMZiLw/UYnxFgYdR+ckCvYiTlBh4gfXpgxuFGtcsbL1LH+W3pnq+99zhDJDqOdX9B5t0c9hEYu4BSCRlqmqtrD3Y3d7je6aMkCk7EidW3
/5xukaDh+bIk2Nb7SQsIaFKrKU4DVC6POCkLARWraIMpiIRY2jREyy+9jxB198RsfqN7/nnoH1g7lre3VDodiqqmh0JYtHrserKJ/l12Pi09IfX2OUYKnndhEjRp6IMTw+7MTNn1owjrUSsY0TXF4JxbMvB9pTRcsVRYQK/8/nindwfN+p7efv2HnTTlqR/m9wrHgAYXgMA4DEIsezrdaXv/7FX/vQL7+w+4Wrr73krnfcvvH8tY3+OtVVTcchd+K8rUiBDgFv4Bc1bkobrTSaStc9NqAiICCmKI1gAqmKMqBW2622U0fkCMxCxCSkXelqwjveehXX6ekv7pbc7dhwTpM1t21f50aUDAfqasrGZHTm1Bmbr2F1vnl94sAkAoBIyHq/pDS3i3WpGOhlykOVZx1QgYBHF4a2f09jNR4UucQVeOH1B+oUKKCBQh1Qr+DwbkvP1BbdYvhbFIqVUbgTFTNIUVFtCPwoUHErlUjBEJE1567lUTo0NTu5bkK0a+HUGVKmcPQyiZjIZ40ENeoRf6gaKIiEmJO0w/WnX9nz8on9G6/beundN+oQd92cGN+nTaEgBknhm4Ul7o0z967W8oOW/a7qJb3GqLrJ8LY5FfBW/1CTGccPnRjdPEwNIlPg99U+M1XosThq5dclz19wwkrZHp5SFkq1Nn3iTL6Q33ffW/oaCcc67CvjdY4VBQCFisAwOejc3NzRY4cffvThL37xSy/ufXbBzW84d82/+KEfO+/8c2oDqmQtupZyKzlImIxCqGgn25PeqIoKQO3FV8D2URQ2LgQhhbqP4dUIyLsSgbs2Pz09R8LETGpI2YAAUZgcuoB8x71X9Jm+Rz/37JE8u2XL+ROadl1mDZFhgpAgBTXSWpLx3MlWu+sSypMECRIRFViBUYKBECCh+SzCW1qVrMH2il47wn8AyEtdBNkUPfboBESvPZiMhWMfwCMqqN1x917Mpnqjip8Ui4REcRoKyrf4rtQRPZvFA1bo89DY1WqJL1L0aASgIGNVRteODq8fOfD0zKXsb4qX+yRVFql3f+L6xNMRkfGN5BVGkuR0p/Ponr2HO2eufctVG687zzVdrguaeEJYBHwkPGFl4KhnEeI1R9laoCZL5P0yOoAq/y7Z5awS1Yt+QlHnXOsDadIwc9PzjJQ0BwgqUYn20JSXBWoqiFZ5eVT5LvyoxExKFgpkfGTPyYHG0O2331GrpWeb6so42/hOVwAKOMVXv/7gr/zaLx06eKDVPt3hvDleO2fLuju+94prbr22f7gfqrm0uq7j1DlxYBAHIJOjbVI83VRxksuzVBAiwJeLVwCxQD5rkWVMHmDxAVgjglMzMyfPnGE17JNeGHCWwQIBUabZbG7Pf9OlgxsmvvQXD5x87vHv2rxt48BoK5t3ImCFU2apMw1RMj81O7eQE2eNmjYMPPyvCpBzKkQG5Dt8e3YTAiGJwJGrHuYYXmB/wUHf+e7HlZeYPDjguTFKUZRR4Ql5YaiFxxETbYOnsMRIrPxYkeVFlDm6HsUMKhIwHhFUSPGKpAlmdKgwQNF9UdWic4N6/EVhVbvkSCip166+ZtuXn3js2ML8qqGaFWvIIVJZVEshFs/nU/WEmInIkXGoW01fOnrmiUOH8qHk1u+/Z/W5gx3TEu0KhNWEPPMYE6ouRNXUOMuj/SoOVsVvWORTnVXex7tNxW7FsqnHMsUoN7l/qJnN5W5ezKDxze+cWp/dHFd/GSVPPf+VdkN1MuF0RFAVARuT1hr5tDu69/i1F9+w/YJtdNbJr4yzju90BQCACXNz81977IFrbr/80uuuGN/Yv/G8jaNjI2k9yfKsg2m16uBgRK1yYlRV1JWufZBB/jWvGJVRE5SGY9wsaAQPkfs3IgaAvfT3kQJV6dh875HDC508gWGweHuZFFAmCIQktYIz3YXh80bv/JE3Pfixr//5c9+8ffP2C0ZHa+JPbclJLjTUqHMnnz7dtrxQS6m/0ddf66uZWsqq7N
SpN149SoFo78OXj0CF7kqG4Jsdo4hkcHh3qSKY/NJKJBOpMsoYohccMUiuwStCCXEUwFpYw4qFGyU3VRUGYqyBFpuWpYgLWWzFbQnfc2mbxvBs7G6rnlIlEJXc2UxsZvNWN887to9q26684MtjT+2bOTHS3JCyappJqFFcKjUi8rQuPxEmCLFTdmwWcrywb883jx4avWjbbffdMDipXcw7tWBDCiAyzAIQFb0loopCKK5ikag/Kwa0zKhs+yoCNKjYqo6N90KhpBAV6qP+sb5TJ6bnp9uNyUQDh6riO2pxJZWzFatVzWNGPHKcYHAavB43pAp0eOHYjJvr3H7zjY1aQkSV52tlvK7xna4A/IN40807JzdvuPe9916xcx032q1sNrPtLtSxWCsEMgQIyACk6pQM+ZAxBZYIEMrLF6ZrxIzjm0Ph5S1afGgBoERpqsQMhYgaMgxa6HR3Hzly6NgpowmDScGhnERCPkKgIBaQOpJM0RxJ7nzfLbsfeP4rj750am782i07+iBsUU8SMa7GVMttu53NaJtI6qbbn3SajXqzhnotSTklgNlPMDJ8ijWKmA2BSA0iqzJ0vQd7PijI5zoQB/3GTGRYQMpkAg5A6h2MMncginARZYaoMJOIEpdkqmBLF0EVj7GjpCFRIZV63YaqQlJv5ROJioZuylHvAOGAqk7UORHR3Dqr4px2u3nmbCfLW3nezvNWlnW6jtu6dnjNuis2H3lw/458dX8tyR0pOSWmQJKCFuXefJd6YkMMNl2p7T126tlDu88gu+F9N268egunC23tSMKqTCqG4USIIT48pFrIwiVs+qBoCu6T/7zAg17r8Y9x6mUOutwORYG5KMKJihQMohqaw33AmenjpzduX2t9owJvQvQcP6ZqxRvrb1bwpEsNXqBAod+Dxq7crDBqqOOO7ToyMTT23ne9mwvHbWW8kfGdrgD8GBoc/K577/+ND/3mf7z0p8b7QDVS53KrUOLIxxEIE8Ona4lnt0jFEKsYrKTxDUThlkZ4Ab7iV/iKWH0VXwITBwiaWEVbWXbwxMkX9+7POhYgCu3tYiEfAcNz+lXggMQSWadJg86/6+KhDWN7v/CNo898dcfqLdtXb2RKc3RbyOsGmXMdVYXt5nlLO7WWaTD66mlfvZEmaa2WsiHj5yK+B32V0ymAR3uDKe2znKJv4zODODgLCt882XDMegrgFiLOUgpHoEcUFyCNly9V+FiLIhQSbx5T5d0PdmrVHwuEGM9JVBXAiZfw4nxDBV+424V/rXN5bvPc5lasOuc0y5wVzZzLVZ0gV3ECuyCUn1m7Y8PxZ/cdmZ9qjq41lCp1VbuKJJJrVVWJjDdOFYkYzHQ6T+3et//MqdqG5u333T5+zoSttTKxlBC8BgGs1ShcyxrRy0E6FVeg97uIzxSe6NJ9zyosl5X+lSc9BlYQTXSv6whKrtZfT+u19lyb1TdG8BSmyB0uD+YfrGUuqDi0514FXe8tJI3JwEScMS3wzIn5nVfcOL56DXNE91bGGxkrCgAEGOgH3/89H/307/32b/3Zj/7MO5v1PmOcERtQa0OkjpVYVcl3rOXwyKN8s6LEp4hLFGBpeaIQD4zkzwJBEqgEu9Yo6UK3s+vQwb0Hj8+3MgUZY0Qcw3e+C/RqYqMQZRiw+A6RKZyqVbf64g1rN6164evPPvjQgX2nZ88dG2sMNo+1un1q8hyGYaEimjvnrOsqz3WFqJOapJ4maZr01dIkMY1aYpgN+StV8tUjoFAHeLKsKhiqEpchegxRw/lgnToiYvIke4KvPaG9HkZQBqoKZg4XWGBMQRYU4hyoIDMU4ZpCxAeBFJp4BUdNRBWkTh3UiVjrnIgTiKqIWhHnnKqvtC3OOSvinCggAufK2k/B7RA0mylptvrcNVuu3777b3YP9fWv6xs2XWVSYlWIZ3opITaHTrtqXjp8+oXDhzq1bMudF553w8X14SRHJirEiUgATOBFJiJztCJqC2O9FJ
yLpWivLV+mUhdqnCpbUsXboyJd6zXeluppgmMb9JNA+obqaT2Zn2mRYwYrQSVyv8rqJwGn8+crTYGeeQIotiKgCLSJAglSY+vHDpzKO/KWt31XaphphdH+rYwVBQAABNq64fx33/G+3/jEh5OafP+PvGN8aJQ0yyV31gpUHRtAKfcUfkRDRBEQiiLqp6QRoKBg9BSotKovHeG/VCWKVjyTEcApsiybmZs9eOTYoWMn51oOSA2UQa70LSKXSBQc3j8l8f0dRSGkTnMzkmy69+LxCzft+/yLX3/xlUST+Zqew43MWYEoV2iKQrkISLtZZ6ENIiRMhqmWmDRJammtliRpjQ2blA0zYisaL3YDT0UR045UVa2i2kKxgLE9PhQzeKPvX0ZQxNvLAekNEp8K61dFFQrfCE1EnYiIr4ap6sUPpOCiIlRrClBRyIsWFUCcOCfOOeeTPhSq6pwER0LDuYIQUl8OQxkMARkyAJjFSid31NCNl2xv759+7Jlnt9tt64ZGBmpNlTzRXCAOAiKr7BzlwBN7dj9zYrq2emjnXdesvWCtHUCOFnGADyXAWaQiBojUSS2kIlVM6B7sJIzlYZte4J0WfVPRBq+JF1XOU6iBQEyKGA1Qa6RJPe1Md6Qr1EdkSCJAF2JS0VNDBQtCz8yicgngV/xEI9uaRKyYbjK1f2q0f9WtO29LkqRwuFfGGxorCgAACKhx+i/+4T/7zOc/89k/eGTuJL73+9++dsvqtJYkyHMn4FzFqbJKjUgAG7NSouEKRHHmrZXwTPcSnLkQ/F7KEPvMWlZoN3cL7fbxk6cOHT0yN9vOMmVNgyuhykVlm4LsHr4hhXohouRAUGILJyCYpH/zqsvee+vcEyee+/JTThzqiRPnjIqIUaiyqJBKEOUxbJHnDgRo18P5hmDYJGmSGGPY1GtJkhhjDDMbYsMwzFBv2SsHi1U0WPUhVVQ0AO4UmC0RJephbZIi9DSICQQRcY5sGAVExMM1iBi+hp/Dono3QOJXEurmqwhERL1FLypQkdi9XH3HZaKQ2F0xdDVmaagoSJ1PEhMAMLyQZYOD/Re99YYXBp945tlX9pxubB7dODwwPNRsGiLDyMRmSqcW5p7Y/dyx7sK6q8675u6rhlb1daQdwjjKTBAVitGOsgExAkxOBZOo4gP1mv5L4qfFEXoec+3dz9PQSohNK9stfUdeTUXEnHeTJo1mX+tUe2G6VR/kwBBCgdhRIcxp0dQWzb4sC1RRGQSFMjETd+Y6M8fO3H/P21ePjxZJcSvjjY4VBRCGQjesW/8vf+Qnf/hf/fSDf/3CwZcO3fzWq2687aoNG9emjT7bmRPbdqpWIJ4ws5htUj7NkfGu6Eko8k6DFmamB0WsuMy6+XZnanrmxKlTp6fPtNuZCAPGu8wCCc93EVcuyOBRJEA8tZSVIKJERI4JbDXh4drG67Y+//Tzbq6Vu1xF1MM54uMYhrxlFpn93oWBQn3DWetyBUGok3s5y8QEYWJiGDZskDAbNmmaGGMSwwzPYyUCkYEWGlEjHBGTHorlqqiAKlYdJqQVcpWG4ZsxBF0RDHzx0p7EqRNRiCiciDhfvtWpg3cdvB+lqLSFD1UfIIG3HmMIGo5fTE7U1+4AEcCsKm3p1AbN9rdeM7Z97Z6HXnxi/4v5fuGcB+v9owODBjy10J5amG3V8mvuvn77DRe5RtbSFhKrSiSIRoCC1DswCDBjuC2lqR8nVP5effx6oZ349VK5raW0XiLSX0WMlideslF8yBWASUyj2Zdn+fyZhebmkbwamwnRGADUezil3gsrv1rC6iEiBifOHD96xi7kd915Zz1NVuT/tzxWFEAYBFbIu9797t/83d9//MWXW1P66Jee3Htg3+YdW7bvuGzblg0jfQPiWjnleQ6hRGMP8MgsDCBPJbAZwGvyeoCYNOacMvnWX90sn2u1zszOnZyaPjMzNzfXtqrMiRITjLpKW94KFKokFE1nAAySAmhSJVYGB5
oLKUh3vbTr5LHTzYGmncvFwqVkQCRgX5YgwErsQfUImUTEgUKFOYkvqPWFhERBouoCfqGqBGbmqEkKpIfJBNaoNwaZiUOeARsTCEMU/0bHv+JJFbKuYuwHRQDvCDjnQSFxIioqqiKFyxH7pEU0IV5eATsUvwewRQsKanl+iiY2BUhKxDDBOiXuku0qDMyq8zdOnrOpNTU9fWDq5K4zJ/aceP7QcSjnNYysHnr7ffeMbh1s1Vod6UgCIjJqFNaR9RiJFhkWQPyxV+IXFeDi31J9Va6vVxgulf49mxR8hart/2rStMd/CFtKfDwZMDVOG2ytnZ+ZX0NjPdP3K9qjkopqWSUViQoDh+ITH6qLKIGMMjkDa44dPDExPnnlldcSvdp8V8arjxUFUAwFaKA58Av/6l999/vf357myY1DfX0jx49Pnzz90HPP1s7fMLl968aRoaGEG05z4a5TbxAWb0y0aiOrHZE7H3n+otDcuTyTbidbaHfOzMydnpmZnpmbn2+pMpSJjSf6M1RIiDiWY4vvSkwdKKJoPhytAXf3aQQAAJYakvq8Pv+1FwBsOndTK1/odDqUMozHw4k40FE9mh9eJYkC058lRF+9TuMiDSxEwgNp3ZvnFJtS+sOqF2zF8aJSVHgPgCOuVcqf8J9GwzsKEEUpfCK0FNRsyUoJDkGQHlR+BmivZCvcjriIGkmIVORjRwu8enD1VXkITA4hlUHgjFjOIZwwaKK+dsN5W3cOyJQ98Ozhh77yjYm1I7fedVtjtG8BMx3qKIeHw6kUUCFCSjkVmWdF0DPS5wvXqPyxsKjPitD0uqkx0lusNlWlf3GgVxlLdYxW7guBmCltJA55t50zTLwLceLlo4ug0Rcb+OGI/kUKdy0qDVK/8tQ6tTA3tfC+N79v7bo1ZwW/VsbrGCsKoBiB13nXHbfef/utH/vkJxbW1dFNBkabtkZnsvZDr7z89OH9myY3nLvpwvHBRi1JGbknimpIAoiPKQBAo3Tz5EPnRES6ebdr3Wy7PTfbOj0zNzMzNz/fViVj6v5QIVlKQBBmKFz5JsRSEai+I4iuvlc+RCKeSo8E9UZ38Nhz+2cPT2+/5IJzrzz/m0dfGDw1NdlcnTA7aEVQxlBtMN45CHBEg5TA5IO0EA15TaGVjH+PGZ7LoTHfIcri2HgACg4qTAVErNHIrjDxK8K6IpN6zVKKwHC0D4vPe9ajPNpySEI1NVujpRmQl4pILQ8YGSwU9yBHIGILAZOSEZWWdo1yYkg0E8m7+cILx1465/oNl990paPujCxk6DISn0ahLidfP0GK9uXhLleUYUEE0qqNjACiVa6nuHdaiNy4ZlUfIRZXK0CbssF6IW0rz3DPClR+Rs8Ui8fIk/gpTQ2R2sx6PoQAqD4XhZYmlFhUvH/xMIgPBhE48K5I/EPW4PqJYydlQW7eeROroxUh9rcYK2vXMxiocfJvf+EXP/nZTx85MjVyYqJ/skmGOamldc6de2Xfvv2HjkwOj5y7ccOGtRMpJ2wIRKQUemR7kiOVstKp5NZ1u1mr3Zmbn5+Za83Mzc8ttNvtXJUMpQp2ooCHU5gCbFSYar3mFuDhkcL+j8WICICIsCGGgWhdOWmbZx5+ZnjtyMU7L28PWDPfmJ2aWb9ukhIiQFl8EDna9IGsUaSfBv8mpCqAACZiYoWnZEe0IpI4K/ZhkPlVKaUhTgtCYIYgwBgBBCpcgUVJTVgij1B6JpUR9y+YuCXoEyZTzWkodg4Uxd4zRaVTuQFBUnpmu3pnK5b+s0oMkHEqpKxIjx499tAXvrp27cYLd16cp9ZJ7ljYMUMZvnk8Q13kOGmJ/cSzFm5ZZcbVeQeToDpdIDCovFQv/aDSRY3CvRpKjgd9TSvaa4kloGShoZTAxGTSBEDeySveR5Hujt5VpeJhQXETohoiKjwwgXocUeAILTpzZG5ibOKGm25MzEr1/7/VWFEAiwYRcMGOC/+PH/nRX/2fv3b80PE1WyfSJq
k4n6Bbr6UsemJ65vjMmcE9AxvXrp0cGR4aaSaGSZEkxncFceKsOutsq9Pp2s5CqzPfas8tdOfm251WnmdWIQCDSKC+zVNBx9AIgXpOOC81yCrYSI989W6AECkz0KTa7udfnj926pb33CWrkq7p9K8Zmj11SnNQzfPZlYlJXGF4RYmuRVaNrz6fBNJR0BPqaykXNM0grQpPHXF6GkmvUdqEYvvejGdP2UEUb54lGrVaOEop1Sswh9cuhQqMQqSUaYXcE0X1i0V4RymIdPESVy8iXF4U/6RqQC44AjDGWHK+njRzWqM0O73w2OcfWjMycePNN+TIbZ6rIVYmQFWduGDYk/fDJCpciFctPqdKIYXBrOVUI42gFP3FapAHSRTCEApuQnlDKNS59avFWKJBzz6qi6OLUnVjxEgBzytgw+IksIV9eKNw1sKVhgvx38S0EMTkj0IPelfP515CRBM2NVPvzNqj+078wDv+weBAcwX//1uOFQWweBDAwP/5s//yrz7913te2rv+vHVrRlal9cQa6ztsgf0mOtfqvLx3z37DA0N9gwMDQ82BvnqdiKAus1mW552s3el02912u9PtdPNupuoAMHkmWyhnAI8rVwRVJEKWkyokWPHiaZG5E/D0EAUIeVY1NWjlzzz85KrNa0c3rOlQ5lLUmukZI612e7jZYIaoi85GOGY0IyNKwMwoKPkhwC1aePGB5xTFZFWKFoiS+rL6FCRUD67NPh20gmhX+I/hXy/ICwiCer+vCOoeYVZgGdW9lso76vlx8SKjBw0JcsyrSQqZ22CCiFAgPpma1mXWPvK5rw02Bm65+U4HsciVAVElEdJYHEICqgYoQyEBHIsq0S+XqCIk1RVMKSquw8eDQqosAfA1+6CsygAKPoK3oCk8YxK0cuFcVNbpdQ2q/ESVpS6+MkyAEHO1c00RTqJi6lVPorryGlPLKfhtgAJqmEgIHZo9NkcdvebKK2s184ZmvjKWjhUFsMwgYHxs9MP/9Ve+64e+75UnXhlaUx+cHAInVnNxwfolIohYVevc/Kns5NR8ahL2jR1BTpyqOHEiIiHF1DeMIkIUiSXEE03n0tIFsEimLZ1jBevw7wuRc5oYSsk0pe/gS3taM/NX3Xldt086RkDUSGuNWmN2YWFopMbMSr5rLZcnC4m3hQ2t8Lx4T4EX9TlkfqKBs+5bJoY5lcK9sNgJWkQBUP2wCHdSpWnAMle51DwH0JMV1TsqwdLys9IpUFQmU0yoUChUbF7Ml0qJ69fIn1zAbAARZzkhw0kifSZLv/HVh6lLN7z5+rwvb1PHkRoAmov4gDGT8U2WyYmSCAPeWoaC4AMCQeGSeothyWyJyFOpmFRZSZ2AQAJfcQgxmKxckAUIos4X41DlkMdQ6u4lK3bW0eNp9X7jFVpgANdqtaiE4t0N0BQFziuqqx4UW/gq6ENP+1cHMf6aBcbp0f1Hz9m8+aYbbyKsOAB/27GiAJYdpMBtd975D7//H/4/f/Abu5/Yt2Pn+fXRAVVmEYWADdTTdGCICXCq1uUEUriAFUSuiie7MBkGizrAN/CNTnGPUVuI0MJmpUqArmKraWV/ivwaQWoShZAY08Zzjz03snVseMvqtrG++btJkkZff7fddm6YjRCXbPxoK1aMX3/QUCeAQwU4VvjEsWiboerWBzmrMQU3qoIeNntUXlpeV2HOn03lFZerSz7r3azQRKqLVIRPQ+v5sMfSDxcQoZaqDih3KMomAIAoGHBJkio5EjKu9sKj35w6duqm229qDve1peMpupT4np9QdUbFs3ahYowyITGUJEmScM3U0oQTTpjZMIuIinrtHCtEwInkzubOZZltdbu5kzy3IGKwQaKh8jYJxIkDfMMur980tJj2aX/xrvRe51lcgfjxoshMtNTj2oQV8r6oGGMUYPLFjYoFLzC/6o3QeLxo73N4g4p7qiRQSlxCbZw5NnvXfe+eXDO5Uvzhbz9WFMDyg0FA8hM/+U8/8+
XPH3rxpWMDA1uu2ZqkPgFMRYTJVwVQcc4/9YH4QhqY597XDWE5UlFSjZWHqTDlAAQ5HiVh1TSKn1QURclnjAK6QIUDx5oM+Pi+Q6cPz1/zfbdlfeqoqyBVYUZffzNrt61zdW9oKUlh/lIBMJRGcQH4BkSWIOqrXUrUAxWEN0ywGswLkiNQWKOwiYtQGveFLSoV/YdCP5TCGYW12CuwtKrACoCsYJkoqv8Vp4sRigpz1+/kvTSurAAQF0cVREwsVpH6o9T6ksaBZ/buevrZq667amLj5IK087olYSe5c8QAyBqWVNQYrdfTwUb/UKM+Ojw0PNTf39/X7OtLOanVUgYlxihgqGxRTGSYSUVMYiAQqHOaW9vOsnY3PzU9c3p27szswskzM/OdrOustxmUyRKpEyIDo2Rh1Otd8dZ41P/RYSivsBxhVcrlLjaI9kp1a3+bRMRpUkvKXL3i3lIF5+lVAPF3jZyC6o1VAowadrWpY/Oa8w07bzTkEcqV8bcaKwrgrEOh68Yn//VP/9wP/qMfPPbSyYGx5sQFkxmzhDwhUWElYSaVyI4IKGdBxY/ySxRQgaOC/aLyKqcu962ILC0N0HCWIIAJiIa6wEE47ZoXHnuxb2xwfMOaDtoK9X3giTlNE+lQZl3qb70C7OfS49dH2U0RXQIAMEQFyiEb119cIIAEIY+q9C98glJmFPBLRdx6yw+V81cFQFRDpUosuT0VG51691pkYVaEVmVJy22qxqwCJd1WQ8Z2KeZCAQtVFWYjBIFrUGPh5Pw3vvLoxg0btl2yrSWdrJZbsQShFExaS7iZmqGB/uH+5ujA4NjwyFCjPtzsq7Gp11NjyDDDF4UIXl+wCRQqLjRpUAWRg2FVIGGt14b76wLasHqVsnFCuchCuzM1P3f81Km9hw+fnpldyDqOCKy5IyZS31RONIQAohsQPNUIxZSLu8QjW/xB4SsVN0qh1kElrSWOpOpelLfI55uXO/fokFBHJB7SWx5CYp2rd82ZIzOrBscvuvASYwwKwtMKEvStjhUFcNZBIAbuv/+tP/Hkj/3m7/7m/qf3D61eZcZqqrkyjC8w6XteUcgQhU+Ois+6F2/xAVcHD2yyFmGuJS9Y7wTKoYHhF6Rq8fJEMiVUlQksbLL63Mm5gwdPXnzj5ZJqTmrgCCoqDE4Sk6UmU+kDtEzSKtBUirAVCLFsJ4Ut4DPHCODANFXxkUuo06pRF3bRCk8oXkZh/6PHQC8grx7JHangVKqLyvIUfkvlmIu0QDhNyU+qkukrC01Ld4kzoZggVp0AE4taQ5RokrKhFn35sw8mfcnVd17TqtkF0xWyBEnI1gyG6n0To4MbVw1NjA6PDg030lo9bRjyXFwyRAqfI+EZvRLdSQJEFInh4uKJyFcm9XwpZSGAWYlESfpqPJiaieGR89eMXHfhufOt7tFTJ5/ZtfvgySkGWyVRgqjxJSwQgXgUvN9y5c8mUZdsQwXdNDyJorZrSXlgeLBIAaTwvSv2oyLO5N3W6AlSSDCMN8GzzdQxUZokcHT84PH7bv2uc7duVhWEJhNnmevKeB1jRQG82iCgbpKf+al/+dhDjzz4/NcPrj286fpNXGOF5DZnNsy+5K+EvK/42EZCm9cBPj4XHlTpqUDzqkMrNlqU+xqFZMml8xR+ZgVIuc+lu59/GQnWbd+Ya0ahuA2Hog+snCQiCM3FguKAD+f6FscgxJY1ITHWiwr2kAiH1H82BCYSiMJp7rmnRTVOrxS1aJAetRaibwHfgSziPPFyJF5yoIuGS9ZCGZWrRpUfglm+ZAXDkaNOqwBcVNmgopKKHm3xzkHLNggSbgWrgg0DapwZpqFHH3rqzJHT97zrrnQ4aWlXyTLljTQdHxjasHpsw+TkWH9ztK9WT0xiEkO+oCwMiEAxCE+kqk7Zs2FBoThdsNQppmvHmYV11mAFqBhWUQtShiaJqTHVBxurBjZu33Lu4ROnHnnqyb3Hj1txjlJ1EJQwe4wzoR
S7vTp1yTtRXa2SfhuCAMLSFRJOGzWB63G3lv+hks4Q7gcFkMpPgpSUSNmIyeZtPptdcdml9ZoxK2b/t2OsKIDXGAQMDgz9ws//+/d+8B0vPPnC0PqBVZvHLTuFqBIiEF4a8xE0DSacf7MLuCYQYNBrS5393ECwzXrf01LZhG1IRIgNq5GOe+W5XRObNzSGmgvaNkTirBLYkBVL4HqSKokVx4aYSCW8gkQAGKFYabS5/Q8MhLLPUbpH15sMSMCUOmu5DFmHEK8hY9WSD05qqBkTwXmORi0AiChBQSwq0CB5laq6ATG2vHSdAl7VI9rRs8ARZwrgfrFrKPET7c3Cri2Emtc6WoReiFRUVAxpirSB5qn9U88/9fylV108vmWiZVrKtgE32myet37NlsnVY0P9A/VGklAtSX02VKwBDhGBKlfcOvi0PBIoQmFV5mhiA2p88wWNl+3rVIsCSqos6ogTjvrQsBDYSGfr6uF1d9y878Sxh5569ujUGaupCFRTVRGONkqxlhUMqly9csl6Vz/0PStySAhOOwvtJE1qgzVHeaWYU/CEI04Yi55qVXEXHomoPzbBcyhISHI9dvj4YP/wBRde5Pdb0v1zZbzhsRJFee3BhJ03X/fT//xnuFN78asvdE7Nc8RHgNjcxQ+fxEPeiqHIZIP4xk5UATLOTmOsDvVkwPhiKJZjRAZoiJhBgpOHTndPtc7dca5jC/Ixai/cNSFOwEouy+atzZwvBs2qBo7VBWOMIFBfX9OjseFCw4tM4cWE73LiJSsRjKewmCRhkxiTGpMakxiqJ/XUJKkx9SRNkyRN/Z/UGGLDxvjiQgVopowYmCRQJMerhiJvyy6axm2i+4EKxbYkZKFYRF/UIhrAvYoi3kfE4hqFAA5JagpCkiQkpLmiax574PFVowOXXn2Rpl2qd5sDbts5626/5qprL7xw0+rx8cFmf2rqZEhAwr48DjkmxwyTcEpqSJnA4VkJ10ihdWaI7TMpqxCKMqgKLYoKKcWkBAOFOPU9WLyOIQMl26hjy/qJu2+++rIdGxu1rjG5UysRtonXf3ZXqvqw9a48ok3gFTQrzZ2e66v3mTorROK6hxoUXg8X9VXDI1U5XYWJpqqeO81MTIZzOrr7xI7tl+y45GLCiuz/9owVD+C1hxd2P/D9P/TQww/9xV/92a7H9227/cK0LxV1TkU5ZvRAStpdafZ4xFM9mBwkyLJjOVCopLWEiRS4EApjVhWkZDhRJ4nS3pd3JYP1iTVrcnIKZSgpOVFiBZNChwf6h4dqnFLXZs7BEEl0AQSq4vmCABlP/6dYCsifPADSIbsrpBFoKH4BmGDfA/BGbAhdAKq+ZTAAqCgZKhisAmFRhQFgC4tYq7pSGbR04ZYF/INnFDePQfJwD8rPencqXbIgVQtrODpZ8OoIzEQqRElfrX/fM/tOHjv1lrff1Ddmun0Lk2MD501u3LFh81i9wSQJgzSkiakSfHs1pZgWx5BoPGvl4ajiMVH5FDON4Xf2ljI0ZnJ471JjqhVBVADNHZTEkKaMyZG+ay/ZlhrzzRf35VZBMc+s+OfsQrXUnxWHIVRAEfjMOFK4XOZm5oZXDZtarGNY1C0qPBd/d2Jh0AIqRWQde5pVeMJVIcxqJEfekp033Nisp4bKuMjK+NuMFQXwOocO9w/84s/9/ANf+/r+Z48kQ40Lrttq0kRhRaFimRIUHcBRpNJGLnwQm0VNnSUmZ/Xn8rXwv0V8IL43hflUEHX8Nsxk29mhvYfXbdpk+usdnoUvR0SUGHaSqzCRjI6O3nPj9Vm7tfvg3uPTU5nNuk7FCSeGQI4UzKK+/VUExIt3l2KbVi+tok9PgA8QaEwL86FhACKOilLPgcoUmsl7hmNCLI5AoaEjB4M9YEn+5Exx3bT8HxUJsAhEKCWhlzxFQCFupYv2B6GIdofjaJD3FP4tMCRv8Bo11LYvf/P5Cy/aumnbGgzk4+vGLjzn/HPGJuuidc+4IQFxaJ
PTw2wq8KQi1BJVeQi4EKlvChmWQQEXfZZgTaCYcnhiSv6B1wG+h67XuiIEYWgzTc85Z/OuQ6dmF9olGFZI9EX+UPkhLftdCR95/QbO5jut+YXV529CTYWdpx6gMAIQlW9BHisanxW3VSvZzRQAMbF65uhMX9o877zzjOFyYis64G83VhTA6xoe696+fcdP/Yuf/tl/9/N7n9i/avWqteeOGCWwgg3gi1yCtIDVCsQzvDnLvGRV6kUJXRcwEYDSpgtHooosK45ExGoolzPHT9tWfs6F51hjHamKGG+qS2g5rOqajWS4pmMjo1vG+uY683tPHj96ZnZ2ZqGdZdaF/q2kCKWKfAFo5RJwDUHZgLyX1jbBgxWxhJzvAiZsuILOhHAwRZfA/88ADJGE4jUiqhRKbZOgKBi0GCum6uJRlM+I32k0qatqImpXKm9Cz02pqnD4BNoizqnBmiUylBhb27fniMtmbrn1jrHJ+tD6gcsuvHDINFJR9uCfItj6DBViNhRkZWQEhCqyVEw9aiwCYlVqqKgo4HyLg4qsrjxScRHCLOMXPlSjYkAqJApHQgqSBGKg6hkJKIDgXssknmW5T6PKEn+F7EO1xuRmdnous3ZwYoCMxBxzLWpAeDpEeGQ1MA+kGvQPXbJ95F7FqUk4MTV0zeG9R4eaY9fdcG1Va6+Mv+VYUQCvd3gg6Ef+0Q9/8atf+NQX/uaVx/Yyb5zYPA6IlZx8Hf5QOodK3L5q32i0MwuNEKUUgJCWVA1kVuUdxagCiqqKoNDNC2BAYDI+vOsI6hhdO9ZBV2A5WFiqKsTghERg864xrmak2V8faKbDQ0PnO5mdnT955sz07Mzp+bl2lme5c+oxWAJIEJpNAvEygnGK4H8UHrlS6E+pPnLqaewamwwUcjd4CRSvDlAyBIEhNh78FVGvDHwfYIkQR3E/tOIY9eRKB24qBeS5yJXr0cHFUaK4LCVn4WH4T0qQBOprOKnNbbfxzSeeu/2OS865cGRs/dAF27Y1UYPNjcIwRf6UB0l89SiK9jrHmGkw1BF6fGpUjeVMvc50Kk4DO14UxEUkO5AKKGgThKsvtCWDfMFBj8cbOItTJ6bn5jq+JJAvSaiVG1NZFsRlocqHhT/qf/ahXIVRcS7t1s8cnjEm6R8bkNgDWOPjS4FpFlYSFNmu8UnyzwNRbIxG8NYDLNM8dabmLz33sqHhQeMJb8u8oyvjDY8VBfAGBgFN0/yVX/ovB773PS8e2rX7MR1o9g+saUCRi4TCKwRSX43RKGKabWm6xRc7GHqxz0p5BlSs2MLxDtY3w7c/LOUYM/vetgYwOR/fd2L1htWm3zjqIJRTAYhVkQTRpWqddQ5EqpwoDyToT9LxVcm6kaE8s/NZZ3a+NbOwMNdaaHU67dxlLu86caJWyYmA/RsfnJHoFEhocelxMD9RJYBMTC0oArMBNfLlP2M3qcgngWeAsIEyQcln+xhhEREfCi66QVYRnUKqVIg9njMDqgq0Qg3FHLeQwxcdKy3ciGL1i0w2BXy4hWtp+soz++p9ev1dV5x7wdoN69fXnWErUG8JeKNaAAYMqSEigmNfJgRcKcQj3iEQ9Sxa8bCeP63AOXVOxYe/w9R90AVFzjRF8Kf0C5WgEFZiolytqjKpYXXWtBbkpb1H5rsWXGDzpR9RaJOKJqxK2sqDquUjbNUlyglSzmonDkwNjA43RmsZtxWOKD7PYVkpAlxx5UPWffAB1Kc3ILSmY29hWJ0+NN2d7rz3ve9OmJfkH6+Mb32sKIA3MLxwOW/Tlt/6tQ/d/653zh/q7P7GwXOvWNO/pimsOTkAKo6VA1BC5XsS+6UHI7aI+BWiaQmtmShGUP0f8qI3iilvVpHAG0TsOJ/PZ6fmt158nkusSSBKDBIVXyNUnYA0cjGZYEiJRBM2AjFESZqKSQcajVXNQQdk1rba7VaWL2StuW631c3aWd7NrROxvsadihNxIqJwGjJ2QEREKhLmCaEQP/bYvv
f7FQE28JwXLaRNaeMXlqKHrggmMQwNhS5F4du+R+NeA5YQZ1HZPxw5zk18k3ef6ADAJ2GpInSn7BF4CP6TB2XY30oWQ9Yceunld33P3Rdedv7qoYGG1Ekcq7Axok697inYqX46pArygHjFvyOfiu39FEZMuwCcioNfXhE//ZgwEK/HH6UAwfyzFAhMqqSkFo4ZzvmyRZQ7vHjgyKGTp0UpAVeIlBSxHkKvyD/ru+CZnCrEwgoCTG4609n86daWi7ZwA2SIQKrEVOj20mUsckFQgHfeQ/BOUfFAsJKyyXnuxILpJlddcXWarFQA/XaOFQXwxob3l6++8tpf/aX/+s/+2f9xcvcJgVx0y/Z0iMSpQokTK05JDTMp2KNCAfUMBlU0sQo5EFNmK0ogvAEipXwDVIJQJbB3OCBgJQEZ4VMnTsNictO6HFbEkQpQMG+CXcwwEHK+sCeMz23ybWxIyad2JTVSULOWDDbqTiXXkcy5zObdLHPOtjuZFc1zm7s8E9fuZq08b3e7We4c4MSDNia0D5Mg4Dx9NKT5e4alqELJcGCSIpDtY9xXAonQ09w5Kg6/OKxQGMTQI7zkYw29awJYLxJC5kHGKVSFyQTIIchL9WFqhDBJYCp6MQoEIclKzqkxmnLN2PrRfcdWr0pvuX7H+PBILamT87m1opprLJsHJQ5tHW2Bc2hEdcgru3CvvZdE4lMjlASaa+78Ivpnp4SptMAEK0qOAAgJVHyPHwUrnKgk5MvEGiJz4PSJR198sZs7YxJRKeIMBVDpR4FIFqK2AgnF35QAX6eTGUw5pXly8MBRyWXV+klXUzVS+CZagH1EHFIQVSvZbZFp7BeDFQJSdaqWVci23clDp+64+02r10yupH99e8eKAnjDgwADes+73tGfJj/0oz9y9KWTfcPNTRdN1gfTDNaqY/ZGsHrhRyW0UETxdPExNRqp/vfI5yDfYczjNSFySSwk3rAk9c3cCZQiOXV4qtbX1xxsdrirLLG3iIJYS2INqao6IQbEM8cDB1+LNzv47QRWZk7BDU0VNZGmaLBJnXNWNBM33+m2smyh02lnWe6kmzsnzjoRJ9aKU+scxKmoOOdzvKAKUQf2lHYXQQEA5DlLgNdv7GEjUa9LFEGgR+M3MnXibfHEmQLNIRNURsDbVQAJORNBLHneIZOqEFhi/xQKUFRwI5ihqsYQwGKRWHPglVfe+923bd4wWTP1RNnfPDKIWj5o9yKRoKdcQgU7C7MrtgVAEHKi4tRJeVNQ6KLyASkUm/+A4MET/1iIigGSpOZcJqICmWllDz793JmFHEjIgQxTBciLAWRUZlYO7fnZZ+76ubFXJAlx4vjwnqP1voH+sUElpzGcGyDBiDAR4r4VFmhgBhQnUIgRNqROqUsLUy3t6rVX3pCYpJzQiib4dowVBfCtDJ8D9ea33vc7H/7IB3/0gy899DIyOf+68+r9idq2GhAHsoN41reawMmDj/oF17cUy5HbF8uyaXCTS5VQQbIVDIh6rCAe1umJ48dXrR5FyoKArYR3K6aXejmjqk5yQAADRLXj65aSZ2FC1fnMsuJ7gJWUmB1bBdSQMDulRr2eOdfpZl3nurntZrlVZ0WcE3HOiVgLa8VZZ8VaK06cdS53VhTWehKtiFBxJk9W8jKaw4UGm1li3qsPBvhdgpQODdUqUJpKNJy9E6ZEIMOIPEqCCfsGdCgKWWhsZRxcNufrPbHXT3rm9OlGnW65ZWd/o8nksSyr6vyKcwB7wlJzSVXiGPmJ5nbxTQj+BgPeibNqJQBavluQL6nhtVf0lzxk43vMAAIWiiyCUOeTxKmA2JiFXL/y5FMHjp5gSj3uRJ5khVLw+1FJAl5exFLMi9agkpQ1IaG5mdnp6ak1kxvTfmPZSvB2gHgSv28BfPpFKNKcA0gUT+/fjIRMXevHj51qJPV77r3XcAySr0j/b9NYUQDf6lCkJr3/rW/53Q//z+/9oR966eu71fGWy9Y2x+tdsT
7D1kMeHrzwdNGyeLN/SSvGnX+kvaMcodEgIdSzOUrngdj7DOJLSMLA5K1sZurMBZddhhS+xrOX9wg2sxBFNF3gRKLbH49ZvvVEitCdXhDZNR7eZ6iSCpGCffMbsDENMn1krGrX2m7dOhGnzqk6cU6cCDlHKuLEOSe5c9brAGutuFxcnjubi7POORGnnvISMX3WiOsTG1YARlUZ6mvqibD3C4LiFK1QKwEoCVWIs95mpQrsRkH6xk73wRwvTHYlH8/2yL0xSV3T3fteuebqyycmhqHWUApRYhWC86m63gKPVNcigF+dQsSDKJag8ACOCtSps+qciMBb8/5ZCGhUIaqVKkn8Qd1QdI+UiYnY9xMgTee77qFnXnhm1z5IquSbl8W+xtSjj6peaO8oTh0eQ3jM3q9LrqlLjhw65vJ8Yv0ENUmNFM9q0BTwiRRVjw3leRUFI8h/52lDsKAOdWc6wwNjG9ZvSHilA/C3eawogG91ePFH/KZ73/Ir/+FX/sk//ccvP/aS2HzbdVuTIc4ckNL/x957x2t2XuWhz1rv3t936syZqtEUTVEvVpdsWW5yA4OxwQQTk9AvJDekQMJNIMklhYSE/HLDDcXmJiQQAqEFCJCAjcG4YbnIsqxmdY1GZfqc+p3v23u/a637x3rfvfcZiXuTH5bGf5zX8pxzvrrrKs961rOMk7ICmZEKU18LAkCOerxnh7K1Rhvvu/Vws90S7jxfJ06SMRYsFEYr55brqNsv3q4QYiMzIk4dRD4osoV2jUTVK6VdvJwFN89rkaJsHNh7WYFAwfmhfuuWzMQ0YFay6QFHKaJpIxoliqcBRqamULWg6lrxGlVFTM0aiRJVREW0biRKbGppGmlijCIqJEoiakbiYJEZXEs4AEbGEGV16MMApDEFvTqm9fCXhEVTcCuIjKuTWdCMw/lRMj85igCoWuCgaiEUQcLK2Rfe9pb3lUwgNhFSE6Suhw1Scpn1ko5mgqbaRKz1Avk7DWJ+xDSF5TkXQv60VPbJmGEysebPJtdFRGDnjJKKVCg/8/Cjn/riI4YhFcHFh6CKrgFh42X9UvhKjstbmIqA3CJRcEAox8MXnjhdFsX2vduoVPOejvYjW8wOaaMT1pbr9RkOAnJOS6AALqisxvH5o8ff9fb3cvB0aNMHfDnXpgP4cy0zK0L4ju/6VtL6e//W9z3++SeowKHr9xZbi0hRAowl3S2c4IWM/uTbOaEXPRZQd0umVvjMmoABxkZqIDI1JlgATLmxpRNny8Dz27aIShKJSV1cni6oU+pgULMooi5tSaG73bPzyZm6R2yWgjHn3ECTVSMOLd0GINe2NA5MpYUhmxaFiNaxiapC6uJ5xmQgRVAzVRIR1VKhamhEVU1Uo1qMGiWKWFSNUWOUGKOqqkLcgwjUfF49DCym6tQbg9cn8u4kC0eWbTEllmdma7ZBeRvre3aABOQoJS0H4ggNgVaeX961ff7I/n0wCwGAiUkb67ZRuaVaRA/ES6e0F8TnVMTdgSZWlbSngb2RwJExtHlMetK8pyoTNX3aMIMJSU+ITM343ocf/fR995sVAiahwMhegvqXWnvdtf/Si57a+NOcp0mGQnl0ZrJ8Ym1hYWF6a1lTFFWP4i3rpLc+OWFCvXIXfDeyT0tgqSd+tVbLsZDwpje+cVCWtHGDN9eff206gD/fIgowZv7L3/Edyvw3//bffPq+pxHtytuPlDvCWCaqIDZhUeWQsFvrB97JgGbcuG/9jQxgMpAxm2avQaA0aENMApgoDDF97vi5uYUtw63DdVpzjAgZD7dUi/RQjAyIJuJK9OZk+VSj7t3hRnkuh5kpaaoVmgKhmyIAsqw3QDAGk1kwI2ajINBhCGIWEaOqwsQV3dQIUIOIqkM+QCMSzdRMoFHEbOCSbSImUUTN4X8TE9Uoomr+ehWYOWii3rxmUVU01TLUTAydwGjqMO7bZjIgCTq1fpgMRkxsDDWNSoGKsiApHz/2wmtuv3FmyE
wGggmIWSEmZmbMbve4A1TaTiq389pG7hnxIAZM1dRUk5cFmBOOlHDxXE9O8T6TMRGU0qy5tBNetcnZZG3lfU8+/dF7v1hFAnEIgdSCi8tpqqD/L2DpiSJLgLcVG3MgQQBPYfj40Sebqr7oksvCHFuoQIq2xaOdleFXVt6bnGRBfSqdU6UzJKamASFQOHPizNCGt996Wz8b2lxfrrXpAP5cKwWRZoNB+Z3f8Z1M+Gt/628+++AJVbv8lv3DXUVFKiagVEoEQXMJLKf01LoBN8JEyJMkPY1P0V/rGswACFh91jBqYGyLJ88tXLxTgyAwIMQ9pBkgZlOBQU0J5JMMrHFWfa7DdZi4fw8nvr2bHkOaUN7zXwmpR+tEOkSbgEDMTiYiLoN5hdMdgAMmVkAUYlEIUU1MxZMAjpY5/laolOr2yu1EymDU3KNIVInpI6N59dm5SuKbp2Jw3qtnBVyo80OdkuqoiqlLeRCApIXHIGIlUkZJKNgMwzC7fOrM1V/31qIMHMwsqgnDCe9KKemgpH2US6xtL20fXzcAloEamNfCU407XxgbQ3Tq+LLd5Udi6gWb3EgbDKSKRuj+o898/N77YlM4l4yV2mJPIuZvCMJ7CdGfsSxxQ42gSmRkJYUwYVqzE0+f4KLYsn/BpoUKIwUZm2kXWuTbJQ0ja6s06RpNqJfTuJKrUQoNr54ZHT50+d6D+zYbwF6OtekAvgyLiAIQCv62b//Oqen57/mr3/3co89R0Kted/Vwa6zrxkyNLJoGL8C1sVHGXbKxT4wW/1spsdwJQOpCUhCxBYCgqiykNOTBaHltdXl01e03C8w1PRM+C0sfYGACUeAigEklRYz52zkZfwC5G4c6zR+HgtJ0J4GmIDOTmPzV/RGX2po6MqaARFWyghNS4Rx9eGiOQh0ggomZd/zGKGqq3gZl0GSzTdT3LaiRmqoZRKOIKdS0IVWR2IiqmZailjrVLCcQQIwCK1qKp1e2k6nywNS5PgayAGMuClAU4kEYjE5NZFQdOXgo0dYNzGQGUXE2JBOpGcHIHH9jELLuveZ6SsfhJIJCxTTtC2XZO6YkhYwO/EsYFpSQZ6QBAaQmBim88UpMjRoaPvTMsT+5577lUcU0SPKuCQxDmwB11Wnkn46a9S1tjgwMnnSYEcOYTIhloGVohmeeX1o+u7x1546Zi+brgShJ6vBuPyDFEP1CgrWdy0itv7ljxDQQlxSssXq9WV5af/fXvc6BuD/HPbq5XnptOoAv3zIMCv6Lf/GbLr5o+3d93/cefeC54czMoev2T8+XNcXaYjf6CWoGTpQRam1DisEcaGnlhCwZY8t4RS7tAkwKY6VTL5yJwgu7tpML6FDCe7oezFRzVA7BYI3UUeOAinxb9sCftkcZGWJuqwM9XCOFgy2m1UaHHajV4tNKKfYj/1AysoS4GIO8SRjE/ilewLBgDmWLCuBjx0hUBFkLwrdPlV3zwd2DqRv9KCrRsw0VEUluBRLFjBwvShmGZySWasC+zw7PszKUzWeqlWHL1Ny5J08PaLB1ywybWKKuQlV889zOcjaZzoP0/u8O9ss9yn3+r/+HdNrazMFJuATqXpkEwFOvBhFBLHqyoQbRWKAgLr909NjH7v3C6lokngZpIHRzZPyL2mksf8bKpZEuTklvTm13xIGZVBsZ1OH5J05q1L2H9/EcCyqxiKRClB1Kei86zSbkC6d1MZblrpwTFa3gsHh6OYBvufXWshz8r92Mm+t/bm06gC/fIhCsZLzxrrt++9f/21//wb/+6Y9/eu3Myo1vunVqm8LWzcy0UUJgNjVRYeYU85Nr7wMZ6c+/BMfihbNZ9yDV0uRU01gMwtFnnpnfMV/ODoQbSygrkrJWmjhFYB/pSDE2MUaf+NIjgrthSEaQACfn54jQWnsPf3nbu2ZeMOimYKILLLkFDVpykaMV7auSBcgdUwCDzdTM0wYi0wJECjGiKLEx0TQix7RXv0gpj09EcTcALxSbiI
h6TzVUxCTVs9t/YeRv8mMgSGO3WAkCgVIxAGxuajhaX5mbnZoqC5ioemHfPDlhZqZAPcA7kFlGeey8g5ZcauczATBB0T2QGKy5Mw3ZDiu5H6CcPjKIA0G1NlhEcfT4yT+97wuLqyPioWXGAScXn1OPzI7tX73+b3bfOd7u0TYNLv5J3sIXUAQr15eqE8+cnp6bvejQbi5M2EjJSK1XYEgn3DZ8W6I3+P7lNkfvTjQ1ER0Ww/W19dnB9B133M6b8M/LszYdwJd3EcwK5puuf9Wv/6df/oEf/Lu/+du/cfbE8qvuvG7fVbsl1DVZA4uxCVyGECxH2HBait+vzvrJzUmZx544LZl6bQYjRQhBqubUidOHr7yCZsiC5Jmy6CGv1vsL8Bl7yJhvJqMAbdDe3au+eQmJSLlBh2J0Ow2HK7Lxy2O+iJC0HFI4mS1hYr5Swpfy5yC/JsXCCUpigkkOQnP/Qm84lvNciVw3zczIBdiQRjYTQSn3zRFC3n4zA9xWBTLPh0gpOwADjFS1KIgDhlxO1ta2zM+VUwUo2fb2AKafOW2DweXxjBzdaDE2P5kJ/zH4WebAJpY61txDMlrNy77t8wvAEyBlAMRkpKIhFET8pWePf/K++88srQOFl7ETW7hXUchUgBetHkCz4cF8reQGZkMaasOllc88eWwymhy4fN/8rtlJqEQlpzOW4pBcBCd0TRHt9eV9AWxgb71270QIHKzB2tL6oYOXb906x7Q5u/BlWZuH9cu9iAjEsH179v/sT77/b/+Nv7P43Nm7/8cnn773OayGoQwLMFsbHBozMQcCGBRywxcbiKwAytzARUi6Lc5sISZisNBQps4dX67Xm/1H9lGpQj59JTNI2ve4xU32Vp1L39mA7DHcnjmdMhvm9H+D95ChQ3dsg+XecF8nkMahjdzU0MOP2uYmQs5q+lFnm1uQLzCFVFpVULK2zMzM6UggTSQjgJiIKcAImhi0DAqcvjO1VBkIxkSB4FMpC+aCqCAKxEXggrkouCy5LENBXCLMTc0sL61Oz88zEZkxc66Iph1yilOLphmZuiI2ctNycgNEBGZiYiYOPvGQKJBveS7atJV2tB+i5FoWlqJnsBEJKBpiI/zMycVP3f+lZ08vRyuMC8/LnCzKadxm5tn0V4fQvMj8917LRmTBR7dZkIKY62KyWD396NGyKPZdcQgzjJAKzdReMxtP6oa4xNJ/yPlBygTNYFwUZT2OZ08u3nrrreWg/P9AqzbXn2dtOoCXZxkTYceOhX/54z/2q7/wyxfN7P7073ziE7/92XNHV4exHIZBQO7ucrtAaO0TMTzKT7QNZ6YYszF3sbiCyQhDnXn+yZMYYutFWxtESTY36UMk50FGZETEzEQQE5GYBoxY38xnUDxTdZKRT1CJiqq149rbPc3b3f1ALvq1qI5bvewI/G3Z6HfvzFBT/zAmaXgzN6++a5RRLmIK7BJ2ydBzIGYw1KBGBhPLHPukkZa4qGZOS/J6ALodBrxrzV0gFGQIpEj1aCI2i+asT/d0DrSBvDqbN9yrC60nbMP47LySa8t0fJ/J6Qwkag1iPsBoDSUUau2hNKgamRUcTiwufeq+B46fXQZNAawxcgBnqeW+aaeNdj4TX9F+S88nWNd9mEEkK8RYCRg0xYlHnl9fGs3vWth5eGfDtSBynvWGVAynNtnYsDvtdnRuLl0TiS2rNF5sZC3edP31RZFn6WyuL/fadAAvw7JsFmEF6Ju+8b1/+MEPvftrvv74/Uc/+t8++dRnj/FKOcSg1EDGaBkxpOa0bpgyjMCu+dnybIyC5wdOOhSewnAQp45+6dglV+znGW1CFcqg5kNc2Aip65/geAgMxByImRnmsHDCYAyqUDUVb/L11/sNmlAqghOMWvA8pRY9dkbCq9smI8sFZmqdSdsVihTSWiYnJiuYCIqcEwTybZNGY1RREyPLPKJM8ifNDQmSPpYM7PMwfRdVVSTGqIKcGCVBIUpMq7bpTt
OmQ9VimkJApkwo6kmcmZlppFGIQsV5sUSUIn0vJ0Azym95oA2ykyAYkVEanNlScswH5mQP1B6ihLWg9U0ggAmBAIgLQlGU4uTS+DMPP/b0qXNN4+4WRSjYYKr+Zc5y7Q40kMgBcDt/HibkyRq1fppABlJK2hysRdCiXoxHH3meiS+79jCmJJJEabwvo3/+zvP66D1KCZ/0uMeJZMSBATVpls+sbpleuP3WW3Petrm+/GvTAbwMqzOITGA2veLyK//Tf/xP/+CH/kE823z+w1+896MPrj67HpQLDgA7pG5pAHaLnrIDNcRsBCU1SkaXWMsimFiQsHp6vHJ6+dJXXSZlQyUZlDmVDtx0E8Cc8CCDMaMsi1AU7G2syOLBORCj1nAjJwgJ6umh/4bkTnoGIsM5ObDPIWXv3k0v4+5VyIiItyclAAfuBjPwpElTyFIOkGNqpDFjOdWxzhKrKQxMzOQjdCwPVIQH8ulPsGNxqhD/V1TURMz8ZWIASYwiUldCIBGbNA0CnKHk7jY1PHQb7gGwqSvZZRKQpQPU5Vuafk3UeG+Lc6kHzUfdsqcVMiFjqLf4uiyTRDq1tP65Lz3+1LOnYsMEkCqgTLkAnSoOBtJ+DnZ+ZP4Sl7HlH2TkbhUECUasIVSDFx4/vnxufcfFu3cc3o5pRaHMlHKefC37JdV6mhevDmGEpRKAGYMGPDj3wsrh/ZcvLGzPdOXN9eVfmw7g5VzJxHEAbZ3f8iP/5B//7m/+zkVbLnrqi0/d98kHx2eVI7MV4MJAQqZMQj4WhMmUiEMgNeMklGhgspCC6cA0YH7owftmd81tu3h34+AxhBNZyBuwiDjjCmXBgUOwUIA4qCUsOQWcbdSZAs0cowJo/QmyMfN7dSNI3X6Mf2hboLUMk6N9NbITQRvdInOKNBUHcoDqmUKjjZoqkgZRYkimD3SDrtnueg2YUtVaTUViFFWLolHaITamZoqkualQIWtYYsighx8G9x9kxihCCcXayurqeBTJjMmYlNg4cTN9BEGAum90C+4ZlZJGFcvZhQLiEw9cDQkmsMYaRTSoIDq5U2Fti4AYnMvELIQaEA6ojU8sjz/zyGMPPvPsqBY1I0hwMW3VxCRuyz3WHTK8lDXunaUWujfAWIkQDGZoQHGAUE5m1k/HZx99lokO3niEtnHDY7Xaa97MQcVS1pcKx5Qvgg4i81lo7eE2Upd601otUmHF6pnRq19752BY5ktic33516YDeGUWmdmgKN75te+49wuffc9X/4UTDx3/wkfuPf3kEtc0IA4hACRpkBaImYiFNEIdxWELbD5aNhhByFhZGzv21NGDV15qhRmiKyQbGwcwcwipSAoCBVfvJCYqysDMud/Vet2plBHYVKtEa9d9ZRi7Jfi9+K5MliaTPtGa0jb/773WWsOQS8JElGYkOrcdqU8qIzaufODNDqkSYanB2Emf3g+Q6Opi5m3C3mDsmI/2TJ9lDIaUkhSpf4RDXejQLlWrq2owNRONQ1lEa4RqI/+63C9N3TFtKyupKJJYt6kfwc2fpHjfxBLWlMJnI1GpNeZ+OseBmMFEQdSimppNannuzNI9jz72xDPHx2OXLDUQU9Kd5h5xl1o3DqArv3eXZ/K1vb87z02U1GUtCMhYMKiGzz90aunU6s59Fy0c2GahbqwWVa9nmyn3cjw6/1O7MnBvO8gICjVWClxSuXpiFCJffeQqBm9WgF++tekAXqGVqpzQ3Tv3/PzP/dwHfvoDWMO9H/780tFRWC4H1bC0cmDFQAJFIBqBJaBmcSYHEwIRg3xEWMFcWnHihbOV6eXXHjFutOVHEhkl+54awtDdhQZjnxcomnk8nbH2F/iQmWTU0mNuk9HyaHJW8OLILH1oih9bT0HIJB1qtwwZV9r4Xp8Fpp3NdMdmrblX9SheNSb9n/ZBE/H2MTVTldg0sWmiuJiQphwlfW6ubPvyYrDHpZ5QSPp6mJoVpAWVZVhaPtdUsa6TOhE4yVdHmO
T/UoAPQ1vi8KYE08b/E21Ua5NKNJqD+V6cRq7GgPJ4AIWaE/BVrIlRTRHWann6zJl7H3vs8WPPjSbRDE4QIHeADKSkhNEe677nO9/8t7+86AkfUIMIVmhBcXpgs8snlo89cpQGxaHrDxZzFksJTmLLMg+cfeFLfUt7JXXPGLlH9C7GMKDByunVKYRbb7m5LYtvrpdjbfYBvJKLYMRkW2anvuu7vuPO19zxvd/3v3/+D+45ct2lOw9uXdi7pZwaGFO0WMMaExCIKRXvkOqwIYVWNF2Uj3/piT0HLx5sn5qUIzNHfJBkjinPQ8lvTmQZVVWtqyrGWBYBlBt+KX0+iEyVuPUm+RdHRGB5mq53puYgPHesWgYPsi/K2UQO+tIPb2jy5/I39U2EG0M1U5iQSS4f+9zfBG1oGjWJXHH2VCLpAkVpmtiIiIh/j9PxTVtxvISwczq6vjHJ6qX4110LawB4yMUUnzl3+vSJxcFgYboMMBAVRnBlVknUl/bD03F1f9l5uuTlLJX31dKEzhbSMqixz4aRlG25hrMRLCrWquqpF049euzZ4yeXqhpioe3lMGp3wrLR79v3vDbY0/waAiwz9V1/EFAyQENBMGEJIQ5sVBx94OnlxaVLX3XprsM741A0ie0l/0pA362n+CK3mFF+sGMQpPE/ICIjmGhoivVz413bd+3bt599LNEmBPTyrM0M4JVdifdHJfiaa675nd/67e/+5u++/0P3fPEj9z9195PrxyZYRdEMghUuMcbKjrgkUjkps4/Z5dWl9ePPnbr0mqubMgqZlwACkUMFjkeAjANRoBDIACIORVDTqpqISC/8T9uW7IDfclnRMjdrGbKiGjLu31r1zP9sgYZUwHXHkJoRqC0xWBIBAvzuJ8oWzMxVDdRcGlpFzYF7NUgWjItqUU1SppCgFcuRoiqiaFXXtYf/ampwuQZNKJJTi3LEDSjARqEVyW+9UWrO46aKq+PRTa+5ZjxZvf8Lj8RJ2URSkHsmkIlKO8xGWxFS3xH44xSNo6ExbVSjiZiISmMdSBVVG7NoiObbrI1EgSo3GpqG4prEZxdX7nny6OefePLoybOjaJEIZqwgNTJi82pRB6y9ZKyfk7jujGUqQPKNbCAEWCANBFZVMpTKU9X084+ceOaxo/MLMwevPxi2BAuS5Oy6ihF6znxDBroxI6D+RedkLCYKFLThcyeXr7n2+iJwzio218uyNh3ABVlExGy2Y/v2H/8X/+J3f/t/7J/b/+AnH/rTD/7p4/c8vX58XE7CQIuBhsJnSYKSng6ZQQOVQ5p94qGnZ+fnt120oKEGAeyNq+TFSW+EYgZIic2S4qS4HHzdNOuTyrs5nXuaQzK/JxOtvr1BUyU5wQnIFHbaCOFsMDXW/ueDfakrrdqGdyVfotl8OAKfZaK10TRBTLQthybGqsFcIk/zJ4pZVKtjUzd1E1PtV3z4jM8nE2s/x6WnFRah0ZRUSdpiNCnSDOFAPnahgNmuPXODId3/wFOjJtRqkcygYqLwgrC/w3LLln9FCu8VGjU2Ko2KmPjeaaIg5eKEk4mYmIPAKtMamJiMY1wZ16dWlp9+4cR9jzz50KNHT51ZbSL7WDQH3illaX4qOlivPZXJEHeNBqBOrcF32oMMtD7ew3I1YzJSDGVqfHz85L1PShMPX39oy4GFiipin5lgbfpBbedH3oQukMhflj/d3Q4RKMk5QRlcj2T93OjGG24oAtPG62pzfXnXJgR0IZbn3MxsOizCO772q2+68fqf+Q8/8+9//t8/ePfDLxw7ffjqSy46uHu4fUABkTVylRuBjCmwDmJNTzzy+LU3vYqmNDKpaChSTU3VnCaemf8WAszABGbMThfTZUnMVSNRjAKFPAm9j39k9AIJ2nFj7zT2Nk7MnCAj7xzgvHMJBvIMPyEzPf5JSwfPNNP2CYCgKsn6m8bExnQsiABoC2+5ZhEBTg0yMxUfFdDEWMfGjDsOZuKZEOWirzNq8zh3ED
hh/grLEJcL7ogJsXERVOLs1nD4in1PPn3i2RNnD83OzVJwNq2auJIPkbElSClBHJq+2SiXR9LW+EG0NOErVYi8I4PUbKyoGhVAVOvYrKyPT54+9/zJ0+dWJk2jpiUxQZTB7NRTaj0tpU5zIk6SUdk799mU+bi3xV5fnCVHQIBaNAusjKKMU7LCT97/pZXTS7su2XXguv02WwsLQ2HCBJ+9DGvVr9uroP26zi9Z3xtYgivVjJQIWF1cLa24+YYbeZP+8zKvTQdwIVaXibOZhcD7L9n3j//RP/3qt3ztP/1nP/qRj//J6WMn91568WXXHt57eBe2shRkHqKyGYeimH7miefHVbXnsotsqjFycWcCs9d4OaDwQnFBXNBgEIqyHJYDAuampwZFGE4No0ojEpgDccvK9G3K8Cz1AGRqAWXeSBdJT1AmreeXta/om39/SwKGstEDweNfzhQgycQYUU12U7NJajeRiLw+3Cvn1k0TGx8x2Y4zbvk4Le7sdQRKCYyfBUChqRcsKwUxsQu3GoHFBDo1Pbjlzht/8f4/fuDhJ3defOVgahqpX47E30lUsLsd+JYhDXVvyVBAqvcae6ceSM0gSTBHzCTapB5Xhog4is3qyvjs4tK5xdVzK2ujSSViMIIRiwHGCbj3GUF+SDUjK6aO5G0E4ZAr/L7B6TQ5NKlgsEHB7nC9S4VKK3hSPvelZ5959JlyZnj5zVfO7pkeFWMikEK1BX3IkgBqe27Ph6D6NYKsDETwUowpcVHqYPXcyuxgy7U3XMub5v9lXpsO4IItT7UzO4iGXL7hDa/77f/227/yS7/8f7//px555IGTx08eetXBw7cdnNs9Ewq2Wpi4lGILlY9+9t69h/YMt8xOwtjYmLiWejgYMMAWmWnAtHVuduvM1NYts3PTUxy4GJZ1XbtGvRGNYxyLBC6KgjoD3ZXqejPCiHLvldmGWzpbNgNaeTW0FV8zgHy6X88dWJ4NaJBMF/ImXkTTaNpoTIxPU1XTdoKO5I0hc8VQ59XEqGqJ4N9E0ShJ1dNMOZdytReHMrcuKlNdlYgV5vNmTS2PO1BipoKaRkomLmnSyKEjl+7Y8vnPf/pzN99+YGq6mJ4JUGNwCGxmRQg+DTEqXI/VyEgJZqTGqZHWj5Mp0Dh3l4ORRtWqiQ5emcVJo+OmWlwdnVtcXVqarK6tx5j6r8m1QpzrCSinXKrTj6bUzsFOIU0xubdZGJErryUH2IfwAgK0IA5ijaAJAYWGop6bbmZf+NILj33+6Vr0ylsO7bpiZz1U5jSfgYkUIpmtkM16Kipn3/LiLNBScpmUay2EYCCmsLI8PnLZlUyb9v9lX5sO4IKu82pjZvMzM9/zvd/z9e/7hp//uZ//uV/8D49/8dFTiyeuuuXQjl07tixsG4bZLTw89aUTZx45+5bvfBOmY1NotFiUXAxLSM1EW2eHC1vndm7Zsn1u6/z0cGZ6MCymzLQhHU1GdXQM2sy0aRrhwgrqIJt+tExJjNjhnhQU56JdCh8T64Uy+77VjXA4uEWKkLEk6jRushlOFgoaTaNJVAFIVExbBVD1soMYMoVG1UxFomrTSHTJfxPJlQwzZXQjWHKmQjkWdhxJu3qlqY/MYfjwNoWBmKKpRSqpgFpsIhdzi+eWL9q19czZxUcefXwwc6WFqYJDUUBVA4UmCplxYPVeXspyCmY+fNkABXmtWAy1ai1aNaOqaeoYo4jBRLSu67W1am00Xl4bra6M62gaU5FWocScD3AaFGFZRjA/6MW9XL93kKU9KZl/hPSRrSAIxJRJnXWAwlRlYGFQl2tPLz9xz1OrZ9YvvuyiS647yPNorFITMwsU4JKrULBrzWU2by7zZhlWSjhQyzmjlptFIHghXypZObN22zfcMRwOzssmN9eXfW06gK+QZdk8GQG75nf8Hz/wg9/+nd/2B3/wuz/3Sz/3xMe+9NzWkxcfvmRufmHPzOwf/8FH57cPdh3YFVkNMRSi0JJ4fm
qwY2HLRVtnd+3Yvn1+y9xwqgzeChaiSW0iOlCrjYwNFrVumqYYiAb4KL7O9LdNuL1oP/9l+Z7sdYn14mvrRo1Q7zlCDvPSQ0lSwjmUZiomUTVqFNftb7mSOU5XU2vZQSKioqKNSBTv7e1w9WzXEjaCxFZ1H5aFrC25O4bHz0o+A9JbHiyAYaaIKBhEFpnn5reder755V/89b/6zd904rlTf/zRP9135GBZErFNTU2TYnrAMOOCVYRQMFy3FQwnuEPAMFGjGlqpVFHGTRxP6kldNXUUJ72a1U1cG4+XV8bjtfGkajQakoiDl2u453p9B5PSq189Zpp0h1ogKkN8lrAz6k4KMiPIAB/tDEQT4ghjVh7E6fpM9cBnHzz7/OnZbdOHbj0ydWCmLsfGVihpmk7tcy28fcRTNssQXx+Aa/1xOlGJM5qTzhipKII0onW89MiRotisAL/sa9MBfIWsNheg/IftXtj1rd/8bV//7nd97I8+8d9++7//6ec++cTSY6B66eTotvfcUGwZrtuyWjTVwYB3b9tyaOeuPdsWdi/Mz00NCRSYOOOtpIBZQUVAjEaU8YcYJUokZqAnuJ559mmLMk6QDXqy/t3dS8Qpn3eTStlJdLWETPtpq4AJf3D0P0qMqtEkmuTvpmwfkiynqIhBRJoYY4xpLLDl7lnkDMV8tGMaMpu/2qfn+GYnSKItc3gZVn0ImcvzBBcIRUElm2nUYnpqNKrf/89/9sbrr3j7m2598slnfuX3/+hzn3ni5tfsX5ifFipJbVILAeX00MQGIbBFJgQjJEW4YBAxqUXWYzOu66qpqiZWtUZRjVCzKDKe1Cuj8fp4MhlHbdS7QUKK0M28yNOdhQ2XT5KTcoZPYu6aeeE85Tqc9hxegjBCmlmQJQTZRIkMQQtQaVPVaXv8M0+dPbbERbjsxsN7rtrTTDXKPjeTGNxnA7OxwXX5vI/Nk4+UAeYKeAuC5Y3yH4yiKMh4tLI+Pz1z6JKDjPDnvq021//P2nQAX7GLAoi43Daz8xve9Z6v+er3rC2f/Q+/+O//3g/+g22Xbr32+mvHMp4UYxQ6N1Uc2bPv0n0H9y9sGbBOD4d5pgxBjZgNVDArghQamCHwOVhRpI4xCicByQST9KThgFy0c3OcrSgA14OwhDJ0kVpLEspYgFuC9un0BMRSx6wjPyamUR29MedCKgxsaqaiUbVuGlETiTFKbu01uCtrIWd0/VRJujQZm7TVaq3Zd5YLFMpscGFSJSgTwaBNE0tmABoHszRcPMPv/5n/cMm22X/2Q98zM1Xvv2TnW9/22j/444/v3Ps1dHCqiiPRZnV9PJia2oKFOS4DxEzVSFOJWyVYZaibetJUk0lV1bU0IqqaLCA3dbM2WV9ZXR+N6liLCZmkagwZkXoHm/Vx8eyAkTKulmSUmjSSu005QmJncq4TmB8kTvE6G1GjQlwTUyE8jFN2durhTz9+9JHnKNq+6/cfue1ImNWGIzFYs5Jdl+2lgY/EnmKwmZIROHN7narUK/0mTAhQ0wKhLIcW+ezi8vaFHde+6mpvz/jz3USb6/9nbTqAr+jllAyBFINyZmHmvgceRok3fO2dYQFjrPBU2Ltj+7WHLju0fddcUcyEdrAuEYiJXTnZodWSy8jKxAwSVROLLHVs6iYwUZo53yYg1hHGN3D2O0h2g0JjUmvoQUZtVKjoJoJlGMCRCouev0AFJm7bXB/boR6zKBJFY2yiaF03jvVIL+a3jFhZxnXIq67eWWspi8i+wlpIyVlEOcVQImiaHaZqKk3jAjiN1HM2u7TS/OxP/Fqs5cd/5vu3zljdNHPzU3/hm97+J5/43B/990999TvfVA7jWhzXdcMc5uaWD158cV3WM8NBoGAhimkjsRGtEKvYVHXdxEZEKabNi43WzWRlbX15db0e1yogZWiiRZmZ5j7odBB7R7krsOa5M+moZwIvKA2Xd3DOfM6m64WYcHLZMAoKtaBMWhoGzdBO0TP3nX
jqi8dkUm/ds+2K264Y7gzrNFGVkOoa6SzQhuwwaW6kzvC0fUQE4+DXTYL+OCFYAJjJCKbKKJbOLF+568ZBOeBN6//yr00H8JW+CBzAVaz+9Y/9+K/88i9f/vpLdt+wcBrn5raWVx85/KpLrtgSBqUp+SwwIlVzZmISKTP4hCwxZaJQMAtrQ06imUSZEgkaiBAIbA4woBOQQJ4fDjippL9l1iYERJr4/h3ygmSNEuzugj5qyMgPxFTUG3HJjJ3Kb2ai1qg2GiVKVTdRYuOID6C5ruvItf9FmlioLVxlSK1lbqMcBHdEPEFLLck1izSYCTNEGzJBkFAMVcvZMHv2TPPvf/aXqtWln/2pf7Bv3zZtzEJBxgcWtvz1b3/Xv/mZX73v8w/O7Ni+3qwUXASz6dnl5bPjhS3zu/fsmJ4tDU3JQZqalKM2US26QwYbQ6M0Tb2+Pl5cXK0mTWzIlFgDMYQikboQrAaKlsx7S7xFatZO8H2yxt453uu7S3F/IukbyNudFRbI+wYp/Y2iYUJh5bRM65mph+9++MkvPiNjmdpZ3HTX1dsOTE9CjaGQQEVC1zWST3TL/QTYr4e2luRcN7+yvO2BXLCboZZdkAaGTrB8cvmqt10xGBZ9GvLmepnWpgP4Sl8CQPDrv/IbP/HT//aiK7ff/s6bFoeLOy5eePXVV18yv3tOA0tkKABhBRMJIykAs8PwCahVkLlKF7vpVBXEWMUmBG8/zZIyIMv9921zFygT6/PKEEOLQzBaKmJ+WSdw4+aHEv4kpiBK41YoyTn4vJcYpW6kik3C+hvJwm0tecTFMfyLjIg0WfgM7nveI+kFG9uvjKgtUSREyryEgVpBAZEMJMEU82F6soz/9FO/sXZm5QP/5geuu+wiKKgAGavoNA2+6q47P/vZB//kngdvesNdMzNbiGESo9jxEyeeO/7C4Lly69b5hS2zC/NbS7apYqgQM7JABjKl2MRqUi8uLq+PR3Vj1sCMvEJvAoMaLCAwZXlRAjzH60i3ROZn19pWDmp3C8RATGbavOWWUmaQRvEQwMQKgJUIQ4RQ0/iF+rG7jz59/9OyFsuFqStuvHzfNXvqaREWUyMVIxZvlU6tgQn0S1lXW69IZ95HHXuJyBLW2NWICQAHghlbaNa4Wdc9F+3ZtP6vzNp0AF/pywx3333PD//ID2OKrr7j4EVXbDt07a5L9++f1yHXEpQAUYACKZmpBYDMgWxKDH0yB2Ypz6F16QARY9XRpHLjISiGzOyixuST2L2AmAjePlayzfMTERRZej9RetDBwtkWuOi0wUTN9TtjovnACK700MQYRWqJdRPrOjYSfR6LJWudWEjUWsMcbKoTT4w8C1HJkj/OlUf2DrnAbqqZMQMzYyaLRgEWEOsabMwDyPQMQjMe/JcP/Mbisef/xY/+jVuvP2hRqBjGpkGgEMxQz5RT3/c93/L40/9kcvb5vQcuq21cBZXYFAUVXJjo4snFlTPLz/HJ4VRRhmI4HJZlCSaD1nVdV3Xd1KqkUbyBj2CphQDm4h4EtlRpYcfQRTWlP5nIn2azUU9y38lBqi4Km/XdLCBJhQQq1OfFB6mtCgWYmZvBsJpdfmzpobsfPvn0GdLAW/mqOw5feueRuNWUG4b5DDQhNSKmYGYKTRuZdfA6npilhr/kvpP/JoMqGWDsU49UTcFMZrZ0etUauvnm6ymfr831sq5NB/CVvEwsLi3X//Cf/N2VevXa2w9/1XvecsUt+7dsGUwph+hjT4woKKmZFwx6aKxllouj35RmZrnSjxGpaiUR/giTESiEgpktDRFAUqGkhOUSVLWtFAAdwu4vSGYe3h+alpoakNT2TVxuJ6qqWRNFTJoodYx108QoUaWJrszcmn4QkagRcw502zQk8Yha7CdNoVHzoqPbxha1at9oamCvcZCouvCOiRFBScXiQAaD4dbf+YX/8cT9j/3kv/j+17/hOqGaGCAhIiaDz1vRat+uhb/9Pd/6Yx/4hW
3bd9JMUZSsg2C1kIGNmEszU+HxWrOuFfHYKy0ed4P9+BEQCAbNlHhib97lbEZ91CV7yTs3ZSdVi1ToBSUKvhdclUDErK1eBhQZFgoEUTOIOwQKBLOhFHYunHj4ucfvObr4/FLBpRLv3L/j0puPFLuGFU9cG1BUOTDDBIkC4C656/2ynvVP5fa20ttua3v+TGEc2IIpGcewdmY00MHhSw7wZhPYK7I2HcBX7lIAxP/mJ//l5x76/CVX7Pym73jHLXdcQUU1JGqaRpWDU8Ezq9vSYHHJBboUt/udmNGd1BGlaWiXTsS0sjR8qiwHHMrAgZkdHMj9poR2bG0mFQKZedJC/xnqSao2ruhgCo2iYtqoeddu1Chmk7r22L9poku2SY4x0yIApB2u4NC+f1OiPLYgjyF3CqQ6s6Wyp+92t7GUwQeDW0kDhAo2H802PZgaljv/x3/92Bc/f98P/d1vfeObrxWqUTAIGoUoaycxlGIp+tpbbvq6Nzz+kfs+f8l118eGi8CNNaZgKl0ciA1KzGWh0DTexBMYJQabKShF9fBGr0T1Mfe//tqQJIbyiSQj7zPOuVYCXLKRTU+SqzvAKZVKSSaIiCioAgHEWhRNsCU+/cC5x+95euXMChFJGcNseeObr5/aM9OEWn1yaB7aRUDRayawrqKSUhFY0pTtmALtBDkvAnhgkhnDRWCNkaKOV9YP7ju4deuWTfP/yqxNB/AVutyS3v/Q53/6//mpuT3D9/2Vd938+supaEJQUWLjRJjxkehgj8IInaXPUZdXOFtcJulEu72MIiJSk9YxVk1TlYOpshyW5bAsgloISWW6SO4ldRl5XThX/Frb78oHqjBRE0BNvOYZVWOMjUhMYm3SNLFRiTG6EnIe6EgZ5edkrEHZXPqw3WT981QABUi1LfFm9qMvcrfghrUtSbfRJxnU6UMEYvZRCTTkmaHMP3jPg5/9xJ/8ze99z3vf9/ZGRqEAFWyiFBgwg2ROjgkbrPmOb/0L9z/x8LFjTx44dI1oRUVpQqqpDZiJjaGqTJwyszZvUqFec3SGxZKOm0I0k7MIZKapWts7x22t13cwk4LISJXFgIJCujCQJpeZKRXBCMHAiqEVtsRP3fPUsw+cHC/VhEK0jlH2XrZ/+pKFOBSFuaZdOhE+mQ3UYj6AdduUjm7+f9qxRCrzjmFHJluumRHEjIghNDq3ctuNNzFv2qVXaG0e6K/QJYi1Vn/vh/+hDeI3f9fXve6rXkVTNUFUjS042oOs+25ppoY5y86ZdTlUTNGuWb7zMobjXbUNUEk9rm29rodlPT0YTA+HQymnBoNCqeBQENx0hVRVBoDMKQQAlwJ2nreoRtMo2qioSCPSiDUSE5Mn9+56QmBmmox+m09kXCNDB2YZVW5phznqd0hBu9onLDfEutwnZXOYPjxVEpJf4ewInfgfmILpkLadeHb0h//1v/2d7/0L7/sLd9VxdTAMHEKsYyBSR8Q80nUCfyCTpmD9R3//+7/7b/3IqfKZXfsOiAkHayBQFFw2ojALqRRC/l6vZ7Q9DF5qybAOURooRnlrFQxNz7T4T66OZHvqYT8lzM6r/ShSygMFg9X9lpEFGBNzXdoqPf25Z48+8Nz60mQQphoyEAWynRdtCVPccE2k7vAUapbI+ZlRyuf3HrRuoE2zcq0ibWX2UvBrxz1bRABrjWbUXHvddWVZ8GYB4BVZmw7gK3EpANB/+vn//Cd3f/wbv+PN7/zG15Zz0bQxRQBHkUEIaewqEuiew2VNesQOh6ebyABy6TTbECGrqEQgRhFR1E0ZqrWimBlODYfF9GA4VRSDsigDT3EomArvSHX0PBtlH8YrKmoQaCOxjrFppJEmRolRKhdri6Jp6G6i+SdblpmbHlRqG+OnzSfz9gBzAqhlZ9Zy+k01YeVebkTixrSuqguPU8jpfVHpee+N1cDMylM0XFkZ/eZ/+aUf+L5vffdX38nluDCAEGOktlkO4o1tTAxYU8cyFI
Bsm5395z/8w3/17/yD6eHWLTt2NDpiDoCpRgI7udYhNCMlN+dtMRrZowJtiYVaFnyLqni8bzlv8eeS/ma7eUgNFVl2yb2ekc+7VGMEUACVEoomTE7pE/c9ffT+Y/Wo2bZ924FDBx595HFZk1CUoSisICVhRDBICEoG4fZAZ6y/d+W2DtbawQLUnopuuzNjCwCnowGh0blxKXzVkSsDbVr/V2htOoCvxNVIPH7y1D/9l//qmluu+Kbvevv0PJM1liS3nC+RwNwU+meKhd9UQkixXv4fqG3dMU1IuapZFK1F1cetwBq1IFI1cToORuuT4aCcKgeDshgWoeAQOBS8cX6XQs1EJJpE1TrGGJtGJEaNKipR1KIrWOYmJTM4YMXESX/fyYNmfX3QDDkoYNGrlz0eaAJKLKu75WREiSmDDIY0Hgv9wrivZIkYBFUtgxWBS5sPNv17v/xL3/zOt77zHbcp1huVoiw0wvuDsxHzbohAhKhiBZpgJkqIV1560Y/98Pf//R/9yWtuu23H3ovVxkaNmCgAZTPOAtUMuHkWuBZ0Bwy5JzO07Re+7+m3/Eokzo8XAvKm5YoAQQyWa8ySkgQzM2JjhMDMGng0WHt+/akvHjv26LFqPN558cKR6/fvPbz35NILJx9bKyTwxAjRILnMwkTECJY1nSydp3xQe1BU55/ShqVHkTwdZXqB5EEPzFqsL48K4Vtuvqnz2pvrZV6bDuArbinMEH7kH//IWOp3f/M7du2dK0KjoGBEzBotKbZ39Aqn3rSquwxAUxWQFfBE2zQh7c6LhPsCkaZuoogklpCZ1gVxVVeDoijGoSyKYRHKInBgJgouM0mmRpYkOVXUXMotjWZ3sNnpfo7KJL0x8oougU1FLI3m8vzEAX7LtJZc3DSDgRPM3KY01ivspuJpxqeTNMWLKIReYmjBpECBQGrCxEwoag6Y/rVf/I1brrzqW7/xXYGWmGEcYoxFKMkKM8m9bs5rtNSpaySiDAvMbPbG1976ne/9xp/6xd987ZvetGP3bCSxorRIpr5zxokU70Y8zTukXD/NqZylgo2RpoYM86YtRhq/m48RU/ozOUICCRkxWTr6xiQ+/4ARCrWAIkiBNTvzyJlHPv/42ecXwXTRkT1X3HHZ1r3buRhuv2TXqadPkoV6HEXUexbMlAwgbqvnLR3Lzu8PyacjP9i9EGTtqfJnErIkoShpjeJaNTc7v7CwbbMH+BVbmw7gK24Z9I8+8qHf/dDvvuGrX3fnXdeXYV1hHioppa4uNfMR8DDNmH7WTwPDlNw0kndZiRoULGbis0g8vo5WN1FNo5nFXGIlrsyqOlakw7IclGiCBtZciBRTcdWclJEYDKTQXN3raKbB0eeEmLsenbZ0zWT+nYoEQM1cNCZH8NSWHNJQd294SvhIqkTDWlwnI0MgELhVH+IMkntQbaylERM0SiwGysyFzc4OtvzGz//eQjn9fX/lm2HLRKAQTBGMO94RUvuSqYfqCgOYgoG58M5axOq7vu0bB8PB+z/wyze95XVbdu8mNIFrjSoEZjJ1nJ7JzA9qAnASgyYVWtPuZM6OkU/2DACR93ukGlCHdRFAxgrzsZJQIGoAcfDtZkJZTsqiKepzzdGHn3zi/seqUT2Ymz547cG9N1xSbi2aIhqqPVfuefyeh23V1s6NueEwCMTiJl/TtB7KbloZnBWvM/yfO0Haq9l/dLlAC0smz+0elS1i8fTKq+98U1GGzS6wV2xtOoCvrNVYXFo89fd/+O/uObjwF7/z7bNbGk2gKZDFLmFw22+pJbat9WoCfFOYC4+FXRdAfHJirg4AsCypr9qGcGSW7JuYTeomig6Hg0FJRSi8lwySOgecwEGAGWv68BwepuKkw00QVbKeOKQmBAomlrFqsqwQnQCEXDnMfBKYUvZ0QK4QWz9fQJKeyaASJazHwGRmbI6OAWTKGko2FTJjmfrEH336zNHjP/ajPzQ7HYMfJVGAmJ0Bm7wsgb0BwM
xCHiUPZjOhQCIxBApc/+X3fg019H//3C/edOebd128W4OYzzJIEwgoad4lFCe5bUpwSk4GWuA8o/lIChJkZIwcPSefAHOHzK7oaTAumZmCyyqxFgOb0iV77skXnrr/ybMnTjNj9/6L9t9wePuRHTQLK1SImZqZPYP5PVsWl8drSxXGMcxAWi+bcrjkT7tt7NbGEL9bXRNYerNZbjcnA0yM6mC13nrzLUXgTQjoFVubDuAraBmspOJf/8RPHF96/i9/y7svPjjFQyPOiEBCX8iynWj7rlqo2wBNgFAyHa6044Qf9RYrR2lg0TQmabVWYSZLOvikPxUj01rFilKlLIrAKMDkpD3AI1lRC2DzQS5p2K6aQjXPX0eu2WrL58nQRkuqSaWMXCRMQWCqZlJSr3SLQa3yfVv38OOQ4GlKTBqylENIchrkcghRIkIkJhYe0uzRh05+7IOf/zt/432HD88RReVE8nTJBiDRaAgw0mzICNCAYEYEVphF44BoMZhwKP7yX3rH3v07/v7/+RNX3HDLocuuLAZ1FcYqda7UKwHEHCHpvJmko+/7bOwlDs0dHESqpEwMQZJM9RNsILVA7A0fSJgZUWBp1JRCMxgiVOeap5949vlHn148dbagsGvfjgNXHd5xeAdvL3QQwU4RUlev23PZnuWnjk2WJ/VSNbVtEEPIpFzK8FPL5W89ft9oZ6rniy9wPzOdP6BEEDVrKp0qp1/zmtvbJsTN9QqsTQfwFbTM9KFHHvvV3/ivV1938PVvuWYwh9Rjb20dLb2S8r8pdqbu2axx2fFgjCCKaBpVo/fdA4B5AS4Hm8mSti28BDaQRjOJ0sSamJkZ5nVg5kCB2DFoJVUDWVRVqCHNcXfBYFEP+x2wcUtKAMzryZxVioFEWKEXDQK3rtmNWpTE99VfTtkaZmiGkOJpR1AsNQwzFErRq8kiMtSpuGYf/rWPven2N77jba8uB6sKZhAxkwWDUJazRFf6dOAdSG25DLHAAQRTDWwulRxC87a33l6P/tqP/8QvLp9Yvvbma2d2TFUFVEQsTTYjnz5JyffBkPyqzybzDts8z9GgRKQwL22TEbdWOQ1MY3Xup3KgQBZMUK/Vy6dXnnn0+dPHTlRrk9np8pLDB/Zcsmfhkt28tdTpaCRgteB9XjASULH34J4np1+QSVw9szR7cE/SmSMzjQoF2BtCOtebLpj+aesxlbIj9+YRbxlPowvccTMVCJOxNBPdv2/Ppu1/JdemA/hKWQYY2f/1E/9KEO96x+2792/jEDXl35YrfEBKpz0GRroJe233SbIz60AYoEbi1l/F671eFhCXYjOcdy+7Mm9qmGVASVU1jWrP5D72Ts8E+ydZsBwSWiIHdtGhfyzBiChYagrygR8+GD3ViYHsIjokoQWB/K82akxeMR+VfBQpzXdRnySjSfWHTdSMyKgmJoo0xVPcbPnwb32MG/nub/u6opxQoEFZxkaJiaCZVJlD8Cwk6gbZaZwOyAQk0QWYA/FKVlmsv/qdr92xY+FH/uFPfuZjZ6+84fZt+3aVU6sTnXiBXFQTqua9tGqcavltNmTZH1NA8JyOWdRIhYzzyScBJ4WfoimoLkJNo+Xx6ROnn3vy2NLpRRgWdizsvfLSbQd3Tu+c5pkilqohNtYEJmZSNVeVUwTVOLdrbuvO2eVnV86eXrpILobmInw6wrm7EOg1g7Wdgd35xobTmH7Nr/dCMTGYFGhQj0VrjlWz6QBeybXpAL5SVtT4wP33/sGHP3jTG1910+uvGgxUGSZGrf2HY7/ZTMP/bUFwo9ZFdL03ZDAxjSaNxCgi5nx8aVSiiGm+sRPunswwUdGmEklhKGu8a7L1jvJQpusbzGvPKe5Lsa1vJJGHtHAl0l6DVgY4kPYQyLIClDOCtK8JKQJau5z2PzckADmOVsegiNrDR6YwNoqkgJgZKxc2//Bnnn7wc4/82N/73w8emlJaMUMToxNmWj3jVsmY0LN25mkCAeAEPR
lxUBhEODCIuAyw5jV3XveB9//oj/yjf3v3hz964LLLL73mwNaL5mOIE63AkmUr3CQaLHDCfMxg3lYNEJubZ4UqAgGBDCZExkYobEgSOMIaHq/WZ184e+rp588cf6Ge1Fu2zl961eHdBy6e3b01zIVmUNeFGtVEQoSCSVSg7dx4MlOmEAratW/bmWfPrJxdQyQrU9EpZXBdhaIP96cIogv602VlWSwon67UtWDp0PmeClYW1y/au2/Hrh1fnttpc/3PrU0H8BWxvOD40+//WczVb/iqa3dcPJBSQgic5155KN1133YxsaXwLINAbeJNxmriMsu11HVsGkk6ayJW1dEpQLDACWOhHhc9FxUy49L1RY2zIIRRvpOBth03dzYRoeDUI+o2tIV7AW+hTaCOqkHNRUpTMVgzJNKCRabsHaeZUoSU9KRSrbNJU11VE4pFasoQzgVUMzOLVSwGVpYD1ukzZ0af+NBH3v66G17/+suLciJlQawaNTB3ovVm3lmQgKpk+5W99EIUkp9MRVkigNhEjUmgTADksisuev9P/pN/9W9+/qOfuuezJ09fcd0VF+/fMZiZKaaaJlYITEkVO3lDJShQAJ5bAOz0pcBEYDVix4GESilN2SZYXRmdO3Hy1HPHV86dsxi37Nhy5U0Hdx65eLB9xoaFFlaF2siYTFU8t1MxGFxzzbLPDsxQtUALu7cZ62h5PU6Epnzuo9dX2s5f6nF1LKdK3bWTL0J3aBmjtMxNa727gUCDoqxH9c7tu5FHxGyuV2ZtOoCviGWmDz/06Ic/8pFbXnft9a8+RLOlWpNql2QwdoZPC4q090g2uO1Tqcsm5AqpwB2ANFEaFeejS5SmiVHFBYOt/eR8R6vBxYk5aeU7KORf2DX/ZBYOshCFb0uSifAZVBnRT+hAauBtIRxKb1VLrUvq1KJU+fV99PkB6EoVLYvJVKWPSACgoOybISZiZlAWgwCmZSi0abRRbmY/84f3bCnDd33n108vBKWGwCZIfW4GhbmVB5jTOJX2GCeWKgwKyeeFkxt0JQVHn8hBnXrr7sGP/Iu/dtdHvvCbv/LBL973hccfLvZefmDfpfumt85IYRIrEmN3XCagAJcq8iqwEYgYKKUMMnCsqKqq9ZXR0umVc8eXzp0411TjqZly78UXXXH9ke27dhQzQQZWc9MMVWxiwczFYs0YRMYw8fPb7+MlMjUNCAbMzc9wQaPV9XrcDHaECPFrzc8x1Hm2Lez/Ypu9oQZMieCVKiqdGwfSxdsYIm684UZqXen/5J2zuf58a9MBXPhlgJD+m5/5v8I83vr1b9yyaw5UE7OOlZi0w8TJJ3Gn6Df7AR+A0qKtIeOwRhCgMRlLXI9NrVHEiEgNtUkldZQ2mAOyME0mj3Y+JjFqmGAEZo0NZRKqG35C97/cCZAcU6ardmEmCDkaTNF9quRKxn9cwZhSddqQOK/tTKvsfpIWEHIm0SL0fnQUJgHRx6q4YIWpWQC4lMET9z3z1INP/42/8nUHLtshoaZAEJCBQsKuua1ImKZaPDqrRdmmEVMuZfrWam4wNiY2IYDACjQDq+968zVvf+stjz/29H/8L7/7sU999tgTR6+46fpte7dPzc5pUWtTU2FkJipFIAaIgocADDKhteXJyplzS6eWF5eX1ydrYnF6y/TOHdtuuOa6LdvnirmhMMA2ZlOKCmWYeQ+zpnQunwNttdw6U5uUOcj9wPTsdFkWdR3Ho/EQM87Ccg0+UO7Mc0CyLX5kCKglc7XFG83XmFl2pZTKWkwM0WYisZZXvepVbV/f5npl1qYDuPAranz4kfs/9umPXHfHlZfdeDCwOeBLCSzNfVEtDn4eSy71J8FVvhJ9ECRqjclE4iTGuomipmpEFEXHk2pSNaZELiqXSJo52CfKUJN1KTmRqGkUEKGdSZI3JbufzNQnO+/ZXCHuq/TkhML3iTKEn0P81thqMr5JMtQ3OKkJufU1j8dTmUI0ehuZigolUSQfgt
s0EnSIsd738S/ceOWV7/jq1wSqxECmjJAIP5yNeaYWtcg/uhpmrlVYt39Z5AC5E5lgRmwKNhOCgKNwPHLl7n/8z77/qSdP/t5vfuT3P/xHTz4qV914/dbdu8JwqsaYApHPfnHpaCYwlGh5eemZx54alvM79u06eN3hqdmpwSBoIUaqokYSrdLgNN4u/YKi26p0oPOZ6HgE/ZIKzGAq09MzxTCsr1bVuGbME+rs8XNq2aWMKetpwccN1r/73HRhUaYROLDIxADWVtchdnD/gc0e4Fd4bTqAC7wMKDj86m/+RgyTO++6ccu20kIkI21iFnKwRI/0cmAnstV3AylwZ8BYCSTGFcVxbNabelw3MaoqABZYDWliFFEYMwcf2UId+5KgagZm55d6XEgAClYC12n8YuKfIuHCidzdblOy7j2dmi5peTHKm0yr6xqkt1iqD7qUkCWVCII3EQOtL3ATpMk5pN40U7BBmIgAgSmEAwZhiqvZh+55tBkvf8v7/tL8loFyDEVQER+PqznlsERWRGpJTu7AMtrdOYKekfP9Sn1MuYYOU3XpaUDMIsCI1VWXzl/1/d/4vve+7td+63c+fPdnlpePHLzmWrBpbIoA00gBRhBTY2Zgy/btt73pIhDXSDN1aquVVEkpwMzYCM6uTEc084f892ygLfc2tCmZv5p6HSYGCmUoQhBRqSMTm1Gf+9/ubK4DtBlAxiT7ykBtOSeBlaCWt6wWTaaKQVOtz07NHD5yYBP+eYXXpgO4wEtMzi4tfujDH778yn3X33aIpxQMFeNQwHuSbENS3PF8suWhtiDnubYBZA1kYs261OtNXcdoAjZmhpo2MVZ1LdGIoJZzcXcgKQInECXmvm8Ag0DsPE4h4qCdDgTQqwajRyWlVn25d0t3Br5/mxNZykLc0LvQUDLk6SlzIXp0vWRtRQEJ5cj1BYBSwTyBQmQUYEy6jvVT8ZF7Hnvra2+49dYrFBUFAMYc2uSK2oyhzVdycTyjJCkla3Md32NLBjGbvwwaEShLeyppYbBhCdUxKFxyYPsP/sB3v+cb3/0rv/mHdz/46b2XHhnMzrnOh7FnVIxAJIrAE1MgCgvYRL2UmqlTRjC4DqzllCfb0uSz8qlFr5byUovMTIkRipAmtyFdVcwMUtHUG9Ev1HccLaQNa9OB/FtuV+l9rROEXOZ16+wWi8LDTYv0iq7Nw30hl0ebv/5ff/3s0un3/tVv2LqjDMHLlmBmgeZguJdx++IOJrE2HTAAaer6ROOoqdaqatI0ospGTjBU1bqJVRWbKGlCPOVwLo/t9m8iI1ElYuJkZOtJtXxuZX5hexiE3k0NZAihpYIjWYFEjk8bngCEXBJQZJIR+RgwE7iGQ4J6svXPJshy0Tm7Acv73nqUVl+fciuDAmTGyiBIYJRf/PR9U4y/+BffOTUFKYxDMFGfmQJiSh4F3T8bAuV8NpLDTDvXmjzrkqJUI2gVqk0jeW+xaNP4ZGYzUmuqSw/s/KHv//bHX3jhp//9f16rl7Zu36NK4qrOBIsRRBxio7kvOJ/5tvifbDC1elCGnhfqZ4o50+r9nU5QmzpkjJ8AGDPnRo1ECm6loKnnzbuLwdB/LF1L7fWFNlxwdxTKMmiFc6cW3/yqO8syYHO9smvTAVzIJdA6rv/qb/zGxUf23HzHVcNCGxERC1aoSr7BKJtWApCbKD3mA3CepSKB1RpHcbLWVOOmjj5UigkEn/s4qetJVVeThoMRcZmxerI0QybbdgNItVYyZrJamnH1+INP7dsb9xzeYyUAJWU3M9oOX8ySAW2gvhGosvzR+Rfzf320uyM9aTdzKtB/NfqmpTV9yIF/SjhcQdmdF7R1JWXNy+dWjj32xLe+750HLtutGPsLc4SeauEdvJZqAEY56kci5xjlyVzWNil3BtThFM05QoLnCg6AmTEFFvO5WiSQsghNHKOYHNm/8I/+3t/873/ykU/de//s3E4pgqThbQxwlJx1mHFy2SllAr
Vz2fNWehrTP/y9A0fZ3G/IAqh7lau9qUQKPBgMiJjaun/qNkZqSc4pXuZlZZSpzU/bM0/IdIHklAwwVVgY8GD51PLFr93Dm4OAX/G16QAu5DLTuz/7mcefe+xb/trXbFkwKiQgzds2TVFlDtDbYN9Nokv8OjrhK/V81aqjGEd1M6nqJsaUeZO5MH0UbSptxhLH4lWFJgPZXQtASi6S7otoDGCORFLM2Pyj9z6xa9sezGulDRTMUIWRs4/QId/tPiZmSNcojM4AtSara+RKMb31tqPdJjrvY9O84/bfTColUyUqoGSmYsKkJYYDbPn8p+/etTD91W+/LQxiJC04QNXpTb7HuV8t5xAMV4NIwSu5OW63NVu2FgbqYv8WI3EJaEdWkriGpW5tM2ijDDZCg2jbp4fv+5q37Nu548Of+RzRnAgCERdcq5oph1JMAuCyEBndyocjfUcHvRgS8GRtyI7WjZ4HxqRz5CkaDKomgqIsZ+Zm1FRT96/1St0pmUgNvx3da0P4nw4LaSo8WFspafv5VGpp1upD+w+GsFkBfqXXpgO4YMsAVfzmb/3W1u0zV193RTldmk4ACuRSvyGFwVkSwBIUlHAUc8OXxoYowGISTcfSrDXVqJpUVaNeQWRSaC0KUJSoUbQxqVUt80B7mG3aMoMRFMrEgIoRGQ8w2H/wkqOPHz/37NmtR+Yai4hiZEYM8kIhp7D8fDuQrEWOEnP0mu2I5S/tLephW10OkJ7Lzipr0uXuIpcpcOPmfoktFGRKPCkni/L8E8e/+1vevO/ggpFwQUQg5Ra4z71ouUpNRL22pGzjkNE3S3azowl1JjZhM14K6RWQW5zE94qZxRjQIGUgkqYpC7rr1TctbJ3/tQ99zGjIKBpRBCJlJsqS2amWa3kcTnvaNgByG44ZNqZQLaEnn6oEXpFBjaHRmqouy3I4PfBGPNfVCOxC2z5rxzIBrUtQyRi55pyvrXY7KMcyyU0SoQjFZLVpRs3s1Mxm+P/Kr00HcMGWAUurS3/8Rx+99vZDV1y1DzYhLi3GpBpsRK6b7yiCx6CqmjFUUQXUx2xTMDGpNE5MViaT9VjXjWgEGxcUACjEk4D18Xh9NG4aUYUknkui1vemsCawIQWERg4CNFwPZoe7L9r1wBfuf+PeOwK4CQCZT6k3kAv7tyXZjavXxHz+T3ThfTYrqTiQkagUlyYr3L3RgR6kakMGhcwJTGYkLk7BMixs6v7PP7Rzy9TXfv1d5YCoAIg0WpFxa/QYr0BrtRKKTwT4XPp87jQBHZTNPNgtbFst9mYxAD4MksjAbp0TbJbxGxCrgMw4lGTgZnLzZUfCYOZXPvghES14GCPUfDi7qVlgR6C6QN7bonsl6d6x7Y5VPsD57zbhan2zQYkYhqpqYqPTW2eHc4NoE0tFevdtnNNTZI5ur4aPHoKX9zBN/ULuOMwTygxKas1Ig5aHDh/ItLLN9cqtzZzrgi2D/fFHPr7erF33qsu2zhM4iMsXk6WRKig81BIyhaqpmKlaVImqjseqmZE1qhON69Ks1dV6U1dNbBoVJWIGkRqiYFI1K6vjpeXR2npVR1UDKbN5BBvYC31tJTF5IJ9CxcGHvBAry94jFy8tr547tjiQoDAlC4xcPm7ZJj423DUAuDWU1CFa/f8SjmKGpE2UGaAp6fEsiFIFtP0vF5BzxTjbPMpImZoKxcZq1qCj8uSTz9/1xtfOLZSGiVgUM+ptAnIy0WEo+QsBKJl3M2XL5QxVtSR51EsUOiSt9VXqrRN5gg0SmOTvs1xeJTKwGohCgF59yZ53vfFO5rHahMmKIiTCFmemlO+oi6wiW/NsW9ORQ6qIoGXIbkB+0pVouW5sWYSjXm+0trm5OQouDJ4SnnavX7z6H5dzEeQjkDZGYZbRJGIQYI3Wq7Iwt/3A/ouZz8sbN9fLvjYzgAu2FPJrv/1rxSzd/NprYLXbLAZ5/G8wSTYQ6hLO3Z3m4ZSKqKcAlWEs9Xpdr1eTup
ZoYgQK1JjWtSksSlwfTZZX15dX1tfHtSgAcIaVOrEuZCQY3vqbvspJnxAio607tuy5aNdD9zzw+p1vGA54PQr7lGIv2WYsHF0xubUGXayNNrRPZWB/NMld9HApy7x+AOhbmbyxoDYKzXmDZwpK1FhDQFEMpmzL04+eLK2+6203hWloQQzXIGqlqPsf2C0vd6bIltKQxcwB8mGXXTEeKfTvEJlkVQ0gdZ+RFTyTozBLgmiJsqPKDGWGYQrxNVdfubK29rEv3FPVVIQpjaacvCFaIMWyEl93uDekTu2mbUCAzsfb/LVETFBlo9W1dVPbsmMbCoiJ15zyddt+YvbcyLXdjHzlU9x9myX/124gkxHDIDpaHh05clkIAS/hnDbXy7s2M4ALsDwkq5q1z33hc9sObZnbU3JhUDb1WDCrdao0IlGiREmzXMwkZe4kZg20gTjov1pPRlU1qWs1AYGYFFLFZqLN6mR85tzKydNLZ8+urK1OmiZpKGgXlydkIkMKlFr1Qdabu5iCN8LVN157+tzKsSeeC1oysbIp1A2h5W4izwB68IO1lB+0WI3/m9CHDEi0x6g9UunfNvDuf1z3xhYS8UfVhAOBYTWmZO7Yw08e3LP9yGX7eFAggEA+FhjwHIjacfft16at84PRzjjI1j9wknLesNn5GHaHNX0a57owkHvu2kObWEzOWgJDmUAFFWWMb7r55rtuu01lNQThfLN2jjAH/T09p+6fPgyUDlM+9m2KkE9V3m1NRd71lQk0zC3MgtXjgYxrcUby291tOwjbiym7/A2HpTX9eQvNNYXCyuLaRbv2bVKALsjadAAXYDne8dSjTy8trV7xqivKWatRRa0UTWNNVIkmjcWooqZpfLqYWh4IzmSswiaEsdhq3axNqknVSKOsoeCiBEOsrmLVxPG4Wl4enTm7vLyyPh7HGA2KdqirtRCC+v2fYI5E8m6Z7m4bmaxAzZOZ3TP7L9n/mbu/GJdkKg7IyGfEU3vf52ZTahuEetBN95L0jR1E0dl9mKmZ5vnB2fZa9gZooQ70DU2LRqQp6og8U8w8f/TZ088/+7avvWtmroCJqZizY9POAd2cW/Q/UZEG3RtSOSYBN8xMzAn7yV6zd4bzcfNjSSAmYwYxegbSUZmeiCb5FB73ooqCeAb06qtedcct1zXNmqoSsQ/xsRSJd0iLok1G3LimNGsDOtMCUNR/JLvQlKUAkdZW1pnD1h0LQprqUG3p/UV5WMq5cn2jh4b1HUViNHfHyIgpqNJ4Uh289HDYtP8XYm06gAuziOz+++9vtNpz8XYtZNyMG60baaoYG2mi1qIiSAIIbvXd8BNYRSqRcYxr0izVk5V6MpY6mhhAYI00mcS11cloNFlZHZ05vXL29MpoeVKNVSORBjKGUg+p9TAyhfvGZmRKCu//z4W5nPirBpqguvyWK2D04CceCiMOysqszuNPL0pG2+17a7uTC+hiaTNYGhtppim9sWT93X5l0eDWrrSpQjb+vWQAmV7jBk0NwEyYeuL+x2am6Jbbri/YSDUDIoq+58guxbIZVC8jePux5vg1uUImr4pQL3HABrOKZCr93JFzSVPHhbd4eALRt4qJi0tkxOYv1KkS115+eOvclJGaCiWkXntHIuHsOTl5SaBng9W283733C37ZBJaObMynJqa3jIAfH5ZukbO2830ARu9X/7wHhwG9DLNtnGPVC1Wwo0c2r+fabML7AKszRrAhVmE8NjTTwiNZuYVFNfryaBElKiiTFBXtEeeFsWJjy5mqqhUa5L12IyaSS0qGX0X0bqq1sdNVVVVLVVVj9fH4/VaxCTCI9AWJkmhWpahTw919zEnTIGyiIzbSwUTKWOwc+r6W6/+4kcfPHzgkq3XztWWdHNMBblS6mVNaqmSlEqVHT+RMmuE8pdSBrRdCW8jIpRt8J99UBMJJ70wUKESY62nT77wqusvn98xo4hEZirMAb2agbN4MpZkLcrSUhy9Ouy2H275Tb
KwTaK6dLgMoT0EIEuwiWl2p270c7bE5ONqumYz8sfMwJU0Y6kZumvbjsW1JUWWwAMs8Y5MAJBp6qBLn0I9ROw8698e/O4vMgMYHEAao03iaHVlftvsYGHQ8MQgCfBvPdqf4Vd6+9XD4zacwlwHsuQ+q4lapIMHDgTeDEYvwNp0ABdgeUD58CMP7tw92H3RrKpRMahN1JQD5Tm9TvxkkKmxqSnQqFYSx9KMmskkxmhpnoqqTOp6PKomk7VaIYLJeDJeq5oqSmPk7ArLDcR+K1MvdOuMna9O2cXNdQJzzMBkajEKc3Hw2iOP3fvMZz5+75t23zHcgXrgg39VRdRHYhkb1OM601avaMOiDeYBie+DHvoEtPhKX7Ny46e86Fc3pIapYnrpxNpkZeXyKw8PZkO0CQEMpqxZlng/uam1jfNzJN3B25waYnMYm71NGxdTKmBzch/dATWDT1Zot8w7DBKlP81J85BZATYjUrKJ1CONZ0bLSyuTgqcCFaJi6Mosbkxb6Z0+9gLqjm3eoZ7j9Tf23uXwTwAFK6uVuqkm2w/swZQpRUuNxvkMnXfEzzsD2QN1Z6onFZo611zDwnlRtVktUsUXXxub6xVYmw7gAiw3GxOprr3+0m3bt1cxoABpY+zjRQgwFTG1JqZprSqxaqSSWElsRIRMBQo0sRmPJ+OqrutaIlRtNKkmo6qaNFILlMiCqPNrnOfXj2wp+wI3+F6bADlKQpzAIcdvPaw0EBCYFaZD3Pa22/74Vz70yCfuu/mtdzRzkzpoQUTMUDATCyEPiUwGsXUqPccDOCi9QbWgpZVgw+va3639oxWwcyQfLTsfJoppKk6ePsOkV157uJjhCBmQH1G4xJ2DKZ3HA9pgnrwsSckbhRBA8LZeVSMKqVaSX97bKJjlmQq5mB6SmWYjU3Crz8lerlBOGRMDZLU1EViNkzOra2eWR+dGzUSIKLBFkBhx6lDLEF5r1bvj1Ttm1ulVbDDc+cUGELFBxYSKplxcXI0ad16yzQaKpEibWbndOerFCy9KCLKLbr1Ge6786rO2paFar63G3OzcZgXggqxNB3ABlpqeOnvuuVPHr33jxYPZclQ3MBpILSYgBpGoikQViVFF0IiqSjREk0YsileKtW7qOjaiGtVEqRlX41E9GlciZsIBBZLKASWGZQsZ95CSXuzWhq208dlcFm4RG4OSSaHz++auuuWqxz7zyO49zy7ctCvMQqkhJRCbNWbgwOzibhso6K2JzQjLS/A7s5JO97aNeFD7Rw/m8H5gawVUmRnlqedPL2yd233RNiUVE296yG0FyRG2tW6k0Dl3eSVmC9hRf6R/QZrl1nIWsXHL+rCaM0lzMOydVN44oUQEH0MGZgrKoqQRUpksjqul9dHi6nhxXJ9ZXl9ZHfs4IMtONBdH4JOi2+J3/6TaiwCZ3hFuz7pnLkaBJMowTi2fWmUKCzu2gNRy6uBhwPnJV+fR0WFv/VP14sA+552mYNBoeZ1puGPb5ijgC7M2HcAFWEy8Pl45d+7M6vpMY1aNV0ouzDSqRjNTi6J108QoGk0MUVREBTAl1/OpoxeLY6MqtUkjsWmqyaSuG4nQCNJWKdIod+Yn6aBUiATgwVg/lkMPuWjjdENLAmznPxkphEu99jXXnHnqxOc+/YW79r11avewHqqxF0+NmdpIOHHps02njD+4R7JO1qCH8mTAOW9JTlRA7Wf5kxsAo4y7u/Eteerc6XN7dy9s2zFnqNWUUHThPrkKECg7vnRkmOCofK7w5jqv7wzaVtwUPrffnDAVzc7I+wxcr41So0TuuPIzEpjNSEmNTNhqs3EtK5PJ6dF4ZTxZHVUrq+Ols2vjtXVY0x2o3motf8uzt04RiNAdxpdYG3pzAUYRZHj2+PL0/Oz8jtnGJpbFjja0Z/SuF+ujTef5h43tJWg9tCObZkS8vra+dXZux45t1kYJm+sVXJsO4AIsg00mlTKaQKNm0mBST7hqtI4iqiLqMj0uxW
4UoqARETUVi2qxVhGJkghDUklT1VEsNpDIBApIoaIz+zL6A6BvPDvT2vMBvXygi9f9ldSaOjNjgIgjaZgdvOFdb/i9X/qDez9y76u/6tXlReWEmmDMDA9WFWlaDKWYOUPluZiJLl7euAVt/pH+8F2h3jZhY6ngRYGuwSysj9Z2XnrpcDYAGBQDF9jMNV2AjIk5zSFm8g64ziUg5QHJhfa2ZiPM3x6bbBIp51KJ/mheZ/WDaAogeC3AAGYzbUjH2qzV9fL6eHF9vLjerE3q1dW1pXNr55ZWqthYizadv7v5zFDnhbpXtJvePbJxEci8LgIWljUbLY127t3GM2RsqkpdorbR1KfyUM/B9DOE/HCPwAUAZgpiAxEH1BQn1b6LLhkOBrxp/S/E2nQAF2ApbDKZKDCY2bIykSauBWEaUyXRxYddhI2ZwWhEnRbaiDaNNI3GRjSKxBhrbapoqqaomggNjMK006rsEfpbQ/liA3Ceveib+g0RHeWMnthETc2YQsN1uTO8/mvu+Mh//fhDdz989Vuv46mCyybC1OKgKNrm4Iy3tCH7S0D7rXndEOSmQcgZjOhtdGuQ20SGuu1nAjeVVJN6246tU7NDMylCwSbEbeWX3e4USfLe86O0nZSRKCKfspihHp+lCU7HmXobbhkX2uCJqC0O54oCEyixjihGaIU4kmZpfbyyXi1N6tVxszySlfW1xaVzo5WVphYx52m+OPw//8zlP6kFnqi1wl0QkGP39GL2nSy0WDy9NBlNLr5knw3MSIiSklH+NstmvTtzmVDQfUGXdyRSl+djaP8goOBQrdVxEm+96zbe1AG9QGvTAVyARWARGgwGCFNN1OXR8tzWmZKnXUnRlNhLctEakXEjTYOqliZqVTciolGkkRgFyt6lz4bSggIKMUpSEm5zDQAU3KmXJbP50rTuLg3PQmbUhnDJpVA7gx4mEg02tIUrd15x+zWP3f3wcDh32auP2Lw0A5JgjUVCGdpPc80ykBr1Q9YubM0KN+cdL2t/68MKaYu7B6i1dQYKYGZpDA1mZqeKMrVZExCS3U+je5mYwVkJwTq8J+03t9WRvOOkKm1xuIdstduAzPXsNpC8vEsOAxG8+guqUa/reGkyWRyvr4yqlUmzOmnWRs1oXZYXl1dHq01VJ7fEoNSOYF3mkw5Jbr3rAvQc+mcfYVkA1H1kYp22PcSsIApCZ4+fK8Jg2+7tqdyQexe6QD9/S4sO5Rfm3e0lAb1OvYQG+kapKZibcVOPqquvumqTA3qh1qYDuACLgPnZmcHUcG29Xp8ooQxcBCuIKCAwQWFNHSexWVkbjycyqbSqRAQiAgJiA0I0mAhHZiNVTSr1bB0LsW9aU+jZycd31t96JrVraMo88lRzzCJpBKiADEwGIBCBlUyC3fCm69aXVr/0mS9u2zKz+5q9xUzg6RokcB0h76piAkwNPg0ra+f3Y0rr2yw3pHlU7fmcUXTWv7VR7fvMoIzQ1IDR1PSACyYWzyPIpU/ZR6IRg0IrAtfFyP6T89FyPMt8giGItIOAel/b7Yd3WLcfl2c8+uxiVlWo8kRlNdbnJqNz6+srk2o0juOqGVXj0erK2uJ4sl6LENNAEI1AAnTTYNCWzsmoPU8dJtQF4/1jTBny7/mKxDwiqFnUk8dP79i5fWbrjGLi7WZkMHDu8DVLPsTSpNAWRuyfGesfly7bSL2A7E5U66oOxgcP7A+bOhAXaG06gAuwCChCAaPTi+dW1ic7d8/Xk3ERpgfDaYta1fWkrtfW1tdG49Fo0iiLsEQn2YMIPgzGldzM+SgZTielNCssi291KKwbLEujflMJ0ldqENiYCmxEb1MqbyAQgzVPEwaZRhBbU8bb33LLn5xd+9MP331LfdslNx6oB9YwmAwqqhoCRdPATGA1133r4xnJyJ4XTKcw27eBehvXblgGuDzCzEbHmCwQx0ahVJRBJBYBDGUmjzcDBaLECO32nDZ8Q0vxTCFs222LdkN7QX
F3rPOmUddUZqQAKVQJjUotUk3i0vpocby+WK2vNNV6FatJnIyqpZXVyWRNGos+2b0nrs3IbNc+rt792Tmu8/xqQqY6WCg9TeZk39R+MV6djJbXLrnqSJgLTaGJsw8DlBBarMeyu+lCixwyoD1A3QZuSOgs52GkqEb1oJzeuXMnNtcFWpsO4AIsgy1smd+6sOX46UUIYxKLYqjg1dXxeFKvjsaro/F4fVJHI01aONCcuqd4PU2NMRKjFKRl6DnbsBQmpvhMNpj3NFOekOBzjyIzRpvVnTcCLqlZyiulBnb7qEbECpGAsHXwune/4aP//aOf/+jniGXvrfuqEGjKjJMSJHOwxgrjwOxQiCtMpDZWOm8i70b0Kf30Z63/KBFpu5Xt7huZIBhDQRS4CETRQgCTAgFeKScfrtz1uHq+0kpkZOm9bPXTX543pIlsBN3gmjiBUHACrKopBaslglhUK8jqpF5er1bW1lcnk+UmThqpK5mMq9FotLY2mVRSNQpn3rdSpwDIekqcnYnNdYguzO/SKWQRzozapINKZJSqCmAYiSkXMjh18rRG27Vvtw5UgsKVQZJeFLdTB7IP6LCl7CoNXdbWc+XdNoNApAQjNq5GNVkYlOVm+H+h1qYDuACLgIWdC7u373z6+KnxanVq/Vw5PYhxPKon1aSq6thEUzFjYoRkdKy7kz0Ga5s7k45ZCzjnr0BnGtq14eket9PtSYq2NzBEWzp+i/z298NTBwNgShYHsdw39dp33PHR3/74PR+79xbBgZv2r4dxM7AiEEQ1WgCCq9kZqY+OcrpkNllEvW/KwSrhvJA31Xlb60x5exyuyg3MxgCMRMVMDcac4H923Dw5HMrl2c6KmSF3QCHH/SBY7u5K30kZe89bRUAAKZEQ2ExhIFaQKmsVm9H6ZGl9tDQar1Rx1DTrUapo1ViqtaZaHY8m46qO0QgICk2ZHDqvD2sHwZ8fYG8s8reJk3kM39VhOz+W25kBYiBiMBmcfubs3Pzc9NZhbbGSBmRMME1Rf98TJmSpd/20oGJLjbV+RSe5sO5iIkU9FrJik/9zAdemA7ggixh86SWHP/qle868cC7yok0p0dAIiB6fMRGbQVQZxlkhp29+LeuzdSaoC/u6L9rwA+c/1TZ2brgHe6Bx77VOULKNwbffymLGBG1YxWT+kq1vfM8bP/k7n/jCx++d3zq15crtZGJBFC4bCgmmVoOZvAvXgYhEVqVscttdavlImvfWs4O84yn4zRmNJUNHxDBjMkAn1aSJysHxayPA2PcHGaMhMwVBU5BNWVouyy7krgT2IoKBIF48dTYRsjl0tyQm5lQpmKpO6moc5czK2vLa+sqoGtVNbdqIVlHGk6Zar8ejST1uYlRNn4pku1tCUbLYL6GnsaF60X/YWi+e87k0Aa7N/mBkQhqMg03LCp97fnnP3n1T24txWPVpQ64fZYAhKkNA1NU3vO2ujRDaikd7VfWrI23bhKUjLxorMwlkmw7ggq3N4vsFWabQ177m9c0KrZxZpzCIgkasiSbGRsGMVAAlNnLsNXertv+lz0lOoecaNlhygEBsSXlm43/nvd3gqH4vnejJMwBk/ZZ/f3d3O2cTo0CFZuuhhbe8543zO2c/+sFPLT6yODsZBmME5hAU2ogoLEaNChWoQkUtmolBzMRMQAKIIf9LYiTkgTUpkl6S99UakfrwsrSnMJCjTgYzAWwyETWIqcB8XkLae6fWExLSlv8TVTHN2+KTANLSLFPtKULSSm4jaoOYCGKjsYE0JBOT5WpyYnn5iROnjp46+8K51TNr1ajSca1rVbM6qpeX1laWR2vjZhy1NoqAY2H91GLDGX/pC6p9UXb51j3ef03XLuwHEBYAGEJTrJ+q6zVduHirzlZaCHycvbpEHwAzEqM0psh6+WCXquUEwXKS0b8ccxLlEyIpINTjqNGaGDc9wIVamxnAhVkMvvbKV22Znls6uzK/dxdIRFCwkzHEUwSzPLKrfz
tZe1+9KOA/H6LpHu0Qkg0BvJnlinGbwL/E23sI+IZIOxscj6RNjIKRKTdSNzM7Z9/y3rd87Pc//qcf+tM77fXbrt3WRKEyqlGEFBRAIBMCk/bKk6Y9tmLXpWxA6iFLSUHiJCksC/ZQlwwApG6LbUhEIYzXqqayYgoGqELJAFUDweEHtRyeJkzJ1Hp2rQdyE8HghQMQjLOIkeVTYmpNjFGgkVA1srxenV1eOb20sjJu6gaqENIIaZo4GU/W16t60jSVRoWBFMoEU+pmtKXD3vnhlzo//Z/pGmlBu7ZQ0D/Flg8nGbEEs1DEcPzYc6Eotl+yoy5roAkwWLunRtBCusBgw3d7pmE5TVJPLimr1qXrp+tFJB9yR1KjsGI4KM/fqc31Sq1NB3BhVkC4ePe+Sy4+fPb06b2yjQim0RDIm4Q0W/zOpiVgBEAfvEV6tM8732A2kbjnPaJgu9p7cgOe3FqMLrDrIUmWepFac+KbSt7cr+YcvzJUXJcXh9e/+7Wf+r3Pf/KP/vT2+tUXX7m7mq7jAMSmEplLgrmmTvp0dZ2djVuaYJWUpSTb5WiRg9yuV+8BM7cgtwEEpqmZQRjw6sr6uKoL1SkMYD63gAkgE+RiNzKTKlN9vEaQ8CbH3rNyhh+AYDADG4RgBlPTaKIWxaSxONa4uDo+tbhyZnFtZbU2HqiSQdWskaoaV+P1ajKuBKTiBto5pklVdKOl/zMCf2Rz3ntR+9L2AG7cmQ3+xEAKBA2Y0MlnX9iysGVqYaA0rqUhAyMQHBLUXI5o08LeNUZt21dC+ruUIB+xhLERvOXOoNWkJnA5mArF5iSAC7Y2IaALsgjA/Mz8W1735sVzKyALlIF+UjMY96koyPdsG+F1VtlaW9wBQS1MxN5xmnT287wPUEZk4WQWAihhJy2IkpFngHqfb7l3qvd1BgIzyPvRAAGZsdShHlvF26ff9J437j148X1/cs+5x8+Wk7mpem5KysJCHrLOHq76viu10bYP4co2P3NOydx+UCbaONHSYR9KM3uRUGpVEW2KQbG0srY+rpo6appGkw6FQsVUzQTmCqh5SxLfyg8XAa1yRN5WUzUxiGpUraNOmnocx+NmMmmacaVLa/r8qfXHjp16+vkzZ5eaSstaUGmcaD1aH60sjpaXRuNJoyii02qMkPUYPH/xKBpo29deCibJCWJ3VVnvDRm0y2Ta7ikzSjAjASFYE1aW1pZWT++7fAcPNZqmIW9mSiokyupfr+TzgqAwJROCEASkIAFFRiQ0jIY1simTwhQ+GEjBZlBThVlBoaAQq3qqKANvOoALtjYdwAVbDHv97a9ePzNePbFGDZUhEBmbAzOZf5iQ6v7KcVwPpOivbMN6OK91tgDpyRz35m9Bn0uY4lH07F16tN22NMQrjQw2UzCYQQWCiaooEVFZVBSbWX3TO1+3a9/Oj3/ok8/c+8zUapjSmUAlQVSS8SYzh9PVB4mRgbjzbTmY7LYfAAiWB85Yh1AhR6uqJhqp1Jn52brWGFU0Nk0dzYySYRKopLmPoiTaTn2EE0RT3UXNDFCQ+kweQ1RtTCuJtTRVrNebyagZj5rJJFZrdXV6ZfTEc6cfffr4C6dWVtesaUgaq5tYVZO1pbXR2nhcNU1ELVaLePEB3pyr6PiV+Xy9FOiz4Wf3xJ+dJ3SflaAsg9NwyUwxsMGp508ZxZ0Ht0nhBSlxpwp3zPlycwBfjZTSCIH2GmMDqwcUhCRz1OpnGHIhCgw19bpKjM3c7DRvToO8cGsTArpgqwS9+sYbDsxsO/aFp29+802T0Kg1ZAjB4dnMgDc638jnfBvozHp/tTDIBtDn/I9ouT5d2dBfb6DcqJVem9MNAKkLKf9pBjgzhondJITAYgohJbOgZBanp9/03rf86Qc/8cVP3Lty6szVd95SXDSHcllJCCALEcoczIgN7EPHyIgcm89iRrmDqMt5cgdSWxRwx5ACYQaITW1+65ZxPanrSBbG48nUzACCwJJzGU
39bF7WNfigRlXLHVjWHh4zn1rJamqoozYao5oaaVSJigZ2ZqU6fmblxKmV0XgM5oKHBayJ44k2Vax95mcUNXEeDbnSjkNQLYryEmd1AxD3Eu6hrRq0Z6Y9JIaNT/hRMjCIYwhS8iQce/TY/MLU/NZBY9HUiNny1eUgThuRAL1rI22DkbeGU6pbgciS3Id5SzYZGdTzCApM4Mmkrpt6157ds/NzL97dzfXKrE0HcCHXtoUt7/nqd/7UL73/xteCpmEMUvEJwKnJCMBLmG7gvMDQNv7a2a2XsCTp3Z3xt57xz5x+YGNA2en/9L7X2lZdIqhap9tgAKkJERsCjQe1DPTVX3VbYH3iC8cmy3rtHVctXDoczyCaiBIFNpiIEAU3NmlbiPOxQFdkPK8NDL1pMq36kT+vRgGzW2ZfePZ0jBbrenp2WDWqhEFOfs3MZzJ21tK0Bf6NkoMwqDe9RVOBiIipKImIGjRKFONxFc+urh09sXhmeb2qNFA5CAXU1mM9qceVNFFExCCtD/NGuAT2bCRNbhAZaqskLz7dG894/mDb8Lp8mvOfyZqTC+ZRw6Mz9bnnzl15016eMeEa3rZs5t3LKcHK1ZpcacpnP32T5T7B/P3BO83ymYMZ5wjDrGA2mDS6dX7LS16lm+uVWZsO4EIuBr33G97z/n/3s8888dzem/cMQ2kUPelOJqgXeqf14mz5vNsn2e0X5Q1o0aMNL865woZovx9OZjZOhqGt3YockDtb3dF4kKmmVlGKMI4mYBFtZmanbn77q0saPHnvU/d9dPXq1et3XL632LJWh0YjG0sIgYDIpinm12AEg/bYqS1nPHstF75uDae1T7ExWaitmp4erK6Nz51b2r5zanpGEGujoIqCudXrz2XfhDUl1qebftP89VAjMY1iZpFMo0UX7o6GlfXJqcXlZ0+cXl6vG9HAA4LGuhGL43rSVE3imZq14JwfJXehmneMurOeMf3zTuSfYf37579NhixfMtb9pNRpAlZDtFgiHnvyeKGD3fMHZwcLTVgUjQZzpAeAQ2YEStUKpa5Ona+1Fq/rWrKV0J41EuTCEmXUMMZIoJ07d/JLljc21yuyNh3AhVwMXHfNtdcduvGej3zuwFX7p7baSEUojYRvB1u1so7pbRtv6w0rsSfbJ6hnL/ES6UGn6+KGyPrf04Z41L27jSVzcpJ7Ya01LkoENijnxk9jgENlOpgvb3vHnTOD4f2fefjTf/ipS5+75Oo7r5zfPbOOptJoHA1w1Uw1ZU46CE4JzYfivL3eqCZkOQb1wrJEFMWO3QuyvH70yROHLj28NlqfniJlM7CqM4FSk6tj722e0VL+TaGpIuPwt5lBVX04mxFX0c6srD13+tzJxeX19WgKJg5MqnFNq1qbRs2i9s2cZQWjVoGps/75JTkD7J3MlzrdG9eL2sQyp7Z39h15Yj/TQkJVdfroyRBnV8/FledXw85iJhQa1NiE1BIY5jNyfBNDzrMsdfKZ5nI5EWkfu1TP5dytefdd2lKejCdMdsn+/UXYpIFesLXpAC7kYvCwnP7n//RHv+pbvu6JTz124xuvG07pOirrbIGl6ucGq94LEc+LDan1Aen59jl7Ufif3EprzXvk8fTuPqicuffZZBmMsjxxphhacI2gZKmTHIAyBYUpi5mgsGvedtPufbs/90efe/yBp08eP3ndG27efcVODLnWiZK45J0RCGwgM3YaYtjgg6ivwZD6mFscyzdWlYLVPNmycwagE8+ebapLo46JhqLQIgQ2CuoNwyYKQOFkoFyDN1LkOndCQdTrAKJRoY3R2rg6vTJ65rnFxdW1SRSiIsDYoFabxVqbRiMZBypgguwwkz/r12hSpO82mjacupd09f9fziCdastGuj0o+QyTqjFZwcxSrJxeXz9Xb5/eeezhk4898RTN0fzu6fkt08O5qfmtc9OzU8OZYSgKLpUDUQATG7vGBYiUCGaMYGDX2WYFpEhtBhpqFRNol70RAR
w0TNbGbLZz5wKdf2FurldubTqAC77sjW+64823vf4jd39w/xUHFg7NMDcC8zJsH6TvDEGHFmw0A6mE2ALAG4JjtLag/eLuuRZkboFna4u9+eNalIiS6+iABkp6NYnzYW3vAYjZYKpQRgBCUYUopjtu3HvXzrse+uiDTzx69FMf/NNDJy658rarp3dMj7kRRLOamKAAM0iVjRRB/bs5fy+nWjlIW4Z7K+pDrgiktdWmgadnzz2/Ml6LM1tCFaVRUSm4CByJieFbyKxI8a63iaXmApAZBKoKdehfoCoT0+X1yfHT554/fmZtbGLMBUtsQmCYVfUEwYyIrWAzNhXLkznbbMYPq5+1FmbLJ7vLCl7iksFGD5CbPM4rGfhLKRczHKUhg4KYjEQVkOLUs2esqn/qp/+1lfbkU0898ewTJ5ZOPn/i2bOPnTlVnY4xej96URIPOJTMAw6DEMpQDsowDMOpQSjDoBxQMDGtVieTURUCLeyZ33rRbLF1Og5YGGoaDFkpD4NQro/GzGFh6za8aKM31yu2Nh3ABV6BSFR+4Vd+4ZJDRz76B59693e+fTgsqiACVUPBLfjTWv/2rX0L38I37Z/IiHjbGGada8ivMTN4yN4LR93ZdFBy+0RuhTWPUr2T1hLB3AgEhauHAZpeZwRidn4rGoswCFPDOrV3+qavu+OiK/bf9+n7nrz3mcXR2lW3XTu/e0c5qAIrIpxJxMSiGpjVKE+VQZoQZgSzkCb4Zul6Qq6sMhE3avPTw4WZLY899KQ0bzFwhJEYm7FE3wkyEIw4GIFcKMLBH3dqxmIa1TSqiohEFRtHOTUeHX3+5NK5NbgTH00AACmwSURBVDWzEMyMtGFTiSRmVEA1sZg0oecuemPUOwddrpQx9Pa0IdckUvXFzrsOetlZ+17rPd5GCY6K5fPXgoFBC7bheBlPP3L8LW+89bKrd6+Oli/Zf8Nd8YZaRaKKNqvLaytLy4tLKyvr68uryyurayujldX18fpktLq2Nj43WRuPztSrdVPXVT2pa2ni1GBw+aEjb3n961ab5Y/+4SfmDu04dPNhmquijYkIEBgZQ0mXRism5baFXbxp/S/c2nQAF26lmI8UfObZ5duvueOzj33qyU9/6crXXyoFS8rflSjAOuNNG6xAa3n9Lu969Lvsv7UMnTxka827GoAlkAPUmpIcOPZRZH+lAkq5vEfGbm1SJi/p492saRL6NCR1NmMQAgI3hLDAB2+77OJ9ex74zAMPPPToZ07efeXNV++7Ync5PyA2ZZHcP2oKzhqfRqYp9mcgbST365JJUUKVjRBixI5dOx6/72GpUTdqGtn7X9k7h4m8BUw9MFYXE1VTVW1EiIOo1o3EKFHEVJsmnl5afeqFM+tVbUIAMxkjmpp3IivUhLIv9ePGKUUxo6yV0J6b7BISjt7L4c6/Xl78YFun6bn/9gJJV4dlvpb7ZvIWQaNyHJ564Fw8x1/1trfFuKbNpGpqiVKEgkyHIQwXtu7Zsd3AoODe3AiqJqYiFiXGGOuqGk/GdVXFqE1TX7zn4h3btgbC3Ja5I7uO/MMP/Muq0StvOzg9N5QQI0AKlw9cH42msO2pR547dGDftm0zGzLRzfVKrU0HcOFWum9psjJ+8oGHfuB7/rcf/8nTX7r73kPXHpy6aFBr5CKQqqgQOCAgSWdmc5/KAN0EX4/1e1gC9WPJ7n3IkHaC88VpL23HK/exY2qdAdhFlNGLw30+lRH5NEpTQBnEHDS/1J0NUQCMGcRsqjCg8EJxcdGhg4cPXHXHa45/6IN/8Pjd9588unDw2st2HdgZpgcoVSVyEaCqwcwJKOwlB5cR8mZWiqo99weFGkMJxKFR3nNg7+Ofe+Cxh5685U1XWlQKpAGqKo2ScTAGALZENzISFwoCi2JSj0RUjdSsAa2N6xMnzpw6fU4lmDKl0gQZwCCfk4PUcpub1NJ57gfu2di1rH/a8PsGEMd6b7SXfAnQ5gDtIUibRT3H75
mEghAsFHFQrcjRh5695rLrrrvqusl4wiAT81FFxCSmxFpTVAgAUhCxRQMQQAXTkImGbDNTxDNMgZmYuK4a4Wp9XNdx8sbXvvF9jz3+y3/y65PVxZted+NwIVAZDRLAk3E9GdWHdu2B2Cf/8CP7D+y+8rprZ7bM/y/cPpvry7E2HcAFXAkjefi++7fPzIUtcz/41//6D/zo//Gx3/3km77lzVNzg4k2akIcCFARyqB/ryaYm2LzT+o93OL62RAArtzTPZ70jJXSvFx/JEM+BPQZli0WZT6sJBksa79fCcYcDKZpSpkSOc/SI2MmIigVxhAulAk0qdcm6+Nzx5aOP/JsXK6mVvnc0tLiC/ctXLLzwBUH9l6+Zzg9FJhwBCQpNasRkakahIsgohQISTE77wUxYBojEYxl654tYDx8/9O3vvmGOq6oUjBQAaaCrQiWvKEiRhFVFZWqljrGKJ4JmCqiyOJofOrs4upKFYUKTr2unDF76ersbZGkrbQnC59ncqZHepdCrzDQK+SkU7FhbZDubqG5DS/unGG3HarGIA4MiIGCDI8/8mxzdv1r3vu2mUEpk4qo8JpPJyBhBkEAO4PHgJ52vzNEBQZTEtRRiLyxQVCWBNao9Xd/27cXZfGLv/ufR8srN77++p0H5ymQNWF1VNcr8fLbD+/duy1OyqWl1U9/6lOXHDx45LJLuSw2U4FXbG06gAu4yGBnz5ytJuMCYMW+Xfv/8jd860/8yr999AuPXXvntcZWaVTSkMaVt41XeLEV6QTEWieQOXf55d6an/IGv9FdTIEsEEDGQNJ46cmGZbGQrLPsnD8Gez3RbQXnad/e/+mIkAMeasoloCg5QEIgFBJCxOri6MSJxReeev70seOTlfGWcv7G62541TdcX0/qzz/wxfueeuSLzz107tjqJZcf3Hbx9mI6xHJE1KgZcuBtphoVYI1GUFDSgjMznzlcIBBLxOrM3Mxg1/SjDx8bryhvYyWTOkKYGU6HJOUIrTXWdSOidayjmYrFKKYQU1EdTSYnTy2O1xq1ELgwCHE31TijbhnV70XsyAF7j8ll2HgakT6ipbBiA5//vNe1fi77ij6I1H1tfrt76sBkaqoWmAoajs/R0QeOX7Z/312vvSXBX2hDjJxUUmoAzBeeZb+QKwnoBtSQu3ywmTruJFaVYfi/ve/bt81t+cD/+MAnfv+TN7/2xkOHD09NDUanj2Eihw/unZ7BWGmAKaX47LNHV1eWr7jmmtm5uY1ubnO9XGvTAVzIRaCVpRVWZWOwIurrX3Pnb/7R7z1yzxMHrzowfXE5cYl8H8DtyzIm008F+lw/tyOJgd8nS/qd3GUNPbQgq6C14WWL/HTfkIoETI7mUyJL5i0z1XZDQmAn7xOHgJJBzERCFGVtafz8Ey8sPX9u8fgyVdi+devbbn3L13zVO6+47EoWmAk4fMt7/9JjTx39z7/+a5/7wmePP/zcjn07dx/etfvwji27ZrlktSimjQqYVY0Dw4zYtW1y8mNw/U41KJo4qPcc2n3soWOnTq5unx5oOSk4qIpprZoOqf8RozRe6hUxZQduqiaePrW4NhqpsFkRKLgWktvFvs1Ox67LsqiLznM9pTv5L7oYuj/y+aQNL2jPMVH7fNYptSTxY+lre9vgp9gpVQgC4qEOjj50eny6/pp3v3371rlqMgrEsOTFqd0C69UkEgZo+UG/iHpEtPZaIDaAAaOgGoPhfe997y133vxTv/BTn/vDz586cOaWW284++hiWZdXXXZEmgpmUCO2qXK4srp6/333Xn3VtQs7trsCHzbXy7n6V8rmekWXW4YHvvjwmaeeLSkoq0Irot/5yO9/4Jfff/mbLrv+LZeuYz1qAWW2JJUM61llAO2DvT97xqgFhfP8qw2lws4bJEw/Iwob7Q55KTGDCcikwlSYJUOqapKRIXARlAK4NLZIQXl9HE8+d+r00eNnnz3TjKrZ6flDey+57ZY7rrj08isvvWJ+yy
yiwbRp6iKEGKUoC0KopX7+5ImPfeKjv/8Hf3h25Uy5dWrbvh0HL9u/46Idw63TUsSaG4EoKWBRhZ0mlCRpxJSE1cjUbBB4+Zlzn/mte979Pe9607teXYe1kkUV0mgTY2xUVEUtKkRNIEZg5iZqI7K8Mj57bkka05gVcojYzCy2EqHJ8G88cP3YvH8su5PzEgT41pnYRoO/4ZOoM8td417u6d7gRjxiJwQAokKlBKJgQ17ij/7HBw/M7/+XP/pD27dNx6byyV/UFfzbC2WDfaDz/u4uofOXIVNbjWHBmNdp/Mef/vC/+3fvHw5nltZ0x8K2//wzH5gfzDQxcgQKNhMlYuamaa685uqL9u2HKnhTsPJlXJsZwAVbKdRmCLSgwgA1IuD2G2/77Q/vf+aRk1fdfvFgq0U0qiVblvLthXbAhmpAztKzbUD7VMrW3ZRYCpE7Q26UxFo2eIMNm2pIAjlG6E1gcftiSmAiBnNQIwEaqarqxIlz5547d/rYmdVzo1LDZXsPvfn133DNZdccOnTZ/JZ5ChgOB7FppKmlakIITAS1sggqItYMirB/z8Xf8a3f/k1f/01f/MIXP/bJT3zqi5/99CN3F/ODXft27Tm8Z/v+bdNbpmkYRCUEJW8ISNNduEEkEGlRGJFg2/wOmho89aWjb/ra28ViRAOlGMn1oMUQASEgBEgQadYmk8XllXOLSxKJqQxUUjAz6Y42Uc6bekjNRjR+g32nFz1sOM+un39yk6XvI3sbVnbuKTujNmBPjxgbt317VMBIQDzE8KmHj1Yrq1/1nrfu3LlQT1Zf7Kn6H3/eQxvaT9Lf/SiSuh8+RNQM0Ng001ODd731a++48faf/MAHPvapu6+75s752fk4HpuCQvAGDGIzEWJ+5EsPF0W5Y/eul3Cjm+vLtzYzgAu5DHjmyacfvfehqcG0kSqkMRXwf/ngr/7yH/6XG77qyFW3Xb6mVaNEESxIcXgCdvqpQG6xz+yTXmgIuGFps3VrjX7eiJ6NIcrGPYVwyRWkCBFERj5pPRAXxgCzsRhP1nVtPS6fWj7z7Mnlk2dHSyssdtH2nTddd8Orrrz22suv2bl9x3Q5VQQy02iRSopR3KwCxCARCUxGMDUuYLBoRqFgQcEDET5z9uQXHnzo/23v3GIsu47zXFVr77PPpe/dMyI5F3J4E6WhJFK0SVOyLSdwEAQJEARKgCAwAgMBkofAeTD8kBe/JYjzmrznLUCCGAiQy0NgA4kB2nJIiheJoShRQ4rk3DnT09OXc9l7rao81Fpr79PdQw4hUjPWqW8G3X1u++zLOVWr/lVV689ffun1H751MB1jRdXK6MTpzdWT6+sn14seUd9BwUIgGIhAMEBwxM5BUdXlS//zL0N/8gf/9nd9NQvk2QMIMYMIBQ6NBM8yq8P+eLp9e/fmrW0OUpV9IqenhiEwBydY6grASJyXZYb2ZLe9HTryfVLKjzNkmEWVQ9/EVFXXlfpU4cm+QgBi9VuSmKJGo87cEVPs/EYYikAQ+lSFj/v/+z/92RPLZ//oD/9oebkMwWPq2vapEcAdHVF3ZJKfKLE7KBIhSB08OUeubIL/2cXLG5ubD64t+aZBIt150vbeIlKQZnw9+9xzw9EyWLvoLwyLAO4lCLC+uuo5BGSVYJ0IIb3wtWf+65/+8fUPtp97fnnUW556ZueFmT1zENY1rdJ6kVG61zZmWbqNNim3kxEt3Up/JdGIIWd/aohBUVFuBQ1M+j8BAWLwgQM3dTOehYPdyf7tya0bO3u3Dia3JmHKVVmc3jr9zFd+4/yTX336qa9sbm5UDsvSIWs5VAjAGFe+lAJIc+d15xxpylA0tyjoSOetMUjjA5/c2vqbv/Xbv/2dv35wMH777R+/+uZrr7z5+qU3PrrAFwC4NyoHa4Ph2rA/qkYrw+XlAQMzQwjMDZa1C5Mw3h9vX91feahqnKQKNmLhAK6u693x/rXrt/Z292dNIOoNqwELIgeWAACE5F
SNYdGM1Gjk8uxoMvm50Ta0A/P2iif5J90TXbEc9wTuxHLpvbpD7ORvUFJbOcitK0B7MAF6RAQMKOyoGPjqnTeuTq/P/v7v/O3N9Wo684SEyMLQUbOiqARJd/yEYbjkTxu2d6WfuhQ1BmZAKVwh6Jq6HlTVlx9+BJwE74lIQOsnUCBWfUBgRJzV/t133/36176B6Gwy4AvCHMA9Zri80qsqz1yQoKBDApRHTz3yzCPn/+K1l//4yp9srK8Phivl0JX9qlcUrixcSa4kVzhySM4hiCsIkQpXCDMSAgIhMosDUl08BNbMjhAYEJgZECUwoSN0iChemEE8M0vdNL72fubrWWjqenIwbWazycFkOp5NDyazWe2bpmk8sLiiGA0GJ0+eePyBs+d//cnHHjl37uwjq2urVTHAOPgLCMDcAKVuRIIimm/UKiFxBiEtLEtIap5dQGFBQhYpeijsRRpCWl3qfevFZ7794rNNaG7evPnu+z996513Ll659OHFD2+8d3M31ADQNI0P2tknAEi/V/HU1TT58RsfPbv5FFdhrz6Y1mFWcz2rJ+PpdH86mU28FxDsl5UjB6FxTjyAc47jai3qYbXdXdBQSdp2eno4OhubXHA02Wons1/oWNM2cSgSHUs+QUnSix3VYv+16BWw9esEBALsAAMACCITEolDDrUDKqFHTX9vmy+8/rPnv/orv/7Ci+yDIxEGSNM5qYn/oV08fKPdd4E81khHE482dTFBQUCHKjAi+6pwHAJzQ+wEczu+NF+us8jAEGDQK29v3/z4+scnH3rgrr5LxmfHHMA9puwV65sbN27eiGnygCJQlYN/+g//2fnXf+WDDy+ND/bHO7W/We81ByE0gYMIBxFB34QGCJi1pQEjADM756ggCSkjU0BEgg/A4IroNjSOEC+iK2NpnwPPUSli/c66wpW9XlW6st/vr1Xrq1sro1PD9Y31jdX19dXVk1tbG5vrq6vL1aBXlk5z8VEkgJdmAk7LxXSWGGNr/dw5utNTLhuMpEvlRyXWJbMAAAa1niIQGBik4cDOFZtbaye/9MK3fu3XmGUyme4fjMfjg8lksr+3T+R8CFVVjZaXitJVxfK//w//7k//x5+cefY0r9Xb451b++N6FsK05gacdyKOSDTZPc+5YFzlRJctjAP6TvFdK9fP5eHm6tuuNJcmiyF16cjb7LiB1Fg1zs/Ez0mq5NAgI5YToLarQ46d1pAhrtOl54rV2xMhAKNw6Yc/fvMCTfG7f/e7g/4gzGYMpPV9eX8Td6UMd+oNoD0Lca6pfQJm0w7IWrlNRSe6SJ1GEVKhBAIyBC6wuHL50ubJLVdax9AvBHMA957Hn3zsyvcuF9gnNTkEEOTx0489ee5pdEIYmLUXmQQOQYRD8IEb3/jQzJp6PJ1N6/rmre3d/f29/f0b2zf29vZ2bt3aPziYzSYhcAieRQaDwcrKyubG5mhpVPX6hOjQVWWv6pVE1CtKVxRLw+GgPxgNRv3BYDQYDgb9sihc4QpXiDASEhEiEmGsEgIRYUARDogYOJC2RaBYW9yxcZ3GFHl6Id2RtYeczSQdwxmH1OnPqBYEICTNPWUfdHvDQTkarCFuCIBD7TtGmFabFIB//A+++/K/+t7PfnzxzDcf8DOZhhBIHLqyJAQn7EFYSyNYxBGhrnQIIOBjf7nYUC3tXBqlS/sTIOVmtlMo0OkKka1sekaeuoUjT83zMtk6prfWNwxp0kcAG/WgqNmcCIIBJQB4QGGBqlftfjT96Rsf/o0Xnn/x+Webep8AHVKnHZ109uoz0hG5MMuMem1j1kEa4Mv8AaabKoVpWgIBIjlmKJAO9vbG4/Hy6sqdohDj58EcwL1n7eTG2vLq9GAqojExEGJgbqYHQJ4cYxo+g6DKuoVzFRFAH0ckgkJE54gBiUgAONV7URqRIunssQTPAFi4In//MI5zRQSEAwBA6oasBpmFEYOmgIN4iKXAufGntgNWTd8xpz6geV4asr
mK4/rU+nJeV77Tt7ttLZpt4ZxSrW9DMeNE3ywAQACvXeGi5ILgqHzqyfNnNs/+v7/40aPPnKlchcVE2EMBUgsUIoE4MGEa+kuaI0ndqeeMbwpvWht/aMfbXZX2riMNXDsH1H2o83dMtRUmbaos2U/Gwg4hlqBXHkRn67XJnwACkiAEhKII5dvf+9FGb/Wf/KPfKcR7IL3SWcPvBivS3a/DHK8NtSIQdu9s56ZjpJfL1dsuRSLdpyNx8I6cc8QsLDKdzpZXj9sR4+fGHMC9BgEAv/bsM//3pZcCkgoNDgi4QadfUWTpmLEoJ0dtRU0UIaKgQxEfPYQ+ORtazdlkkco5FkZpQNq6Kd0N6UwdIwkCqbhNaezepiTGSVBMM8txhKd2KRUJZAmg1TOSqpGHvodnRDM5n71Thzp/2lqz0Wlol6xWTICXqNGolsTCRdV/4bnn/+P/+S+zndlgMBzK/tjXcZcFIMpwmuIJLIJO2ye3dRZZ5IlqXdaykobRivzxJGQ3J9Kx/njkj7nPRNeKSjSRyCSdGCm32RMhh8gcAFK9tl5MBCR0UpAUJa/+7AdXr793+fd/95+fPf3QeLznXJq9zu+Szqm0l+awm74LWs/RLSLTh1K8J9ANmjAtjBMn1INzJCAcYhxlSUBfHFZkcV+wtrp25uGHa54GQIA0SGIBQQlIQsCE4pARmUiQmFCQGJEJGSGAprKTjsxBEASBEUX/qyVyCCweQYSDMAMLas9jEQkCrP8BGTAgsKCuzMKIgsAQ/8fAIBWOQuzDE32I5KQXTEYwOa22FcJhs3/MOBMhjcMP3Y3RIwjmvwkJYiKTHrruU7S+DAIocQY38F/79ndoCt//szddXa6Xy6PeUoE9JIcIFA8NJa4EjNr/EiieUEEAEqE86xs9Jkoa8IueIuzI95ia68w5jxREQIq10ok9dC5aYUiS8h+3JRq0qUAFuoCjXikkcA7IMRUN4QxhQjuXDt586e3vfPP5v/d3/tZ4Osa2oUMnKJFWmusGIMd9YI8Bj7nVkf3i5TkUPsTTJCiMEPPaKC2Jh8IgVLrllZVuqaPxOWIRwP0BwpfPf3Vvd/fW9rYrq6CNhQkBGRg4RvqQbIwGAvmrqyVJnThaf0pKCtfbInNj9jTozzsQXxnl65S00w7l4rNE8tPTYD7fkzqywdwIPb/DoTvmbkraeM5El5TW+qlf/Vw7276Vxj3SmuV4cEEefuTcN89/4+X/9ertunnhN58bOedgOuUJQ3CIAUC0EyhzbG6DIFrwjCmcAgEUSlV0ksOZrk9ozxEnMQagNbA5csrTqIdVGIkCltrIuD3qvKX6Rw1CGFDAaU0FOgcNOsQKkIKrD8K1y9vvvPLRkyee+Je//3vQjBGZyHFgihd67vLi3Jm8O+bjhXxvzHySuVhgzgvGcUPME9APNrGOfAAKmjSzx89+uSzLzxaEGHeNFYLdJ4gwhFnz/e+/fHtnp8A+AQsRgmgbuCjUSmoX1prFfE+ru2LbB7q11rEmM04Qtq/OT2j/TqYp2x2Zfwocte93YTY6m5/Tf7rbyM/sqhNdif2Y7gnzL+reSIZfRXwSAATHlfvgo/f/8N/86/cvvAtnhmfPnzp97szqyeVq2OPAgYUlAIoAM4pIiMuN5dyfpM64OGmZ05fSeY5PglxMJ0AorS9rfXLribtHnw9VAOLsR6pZyEmfAoAoiEAoyMiCzkGcjQcPTmi8N9u+eGv74s0bH93GuvzVb/zGH/zev9gcAE8mjC5WFAilLNZWxbpDZ4dPIA1EUmbSoVr0bqCn5ybXt0XRKZ03AQ3liAoKLFNozpw9d+7co0TYNiU0PlfMAdw3iIBIPavfeO21ne2dqqiYQ0zqT98cgNaap29dO7w6bEK6t+PT0jC08/jxNh2TZzm6m3nDGPe684rjD+zYQXybCdqx2u3hSNql/MDhjjjZbsr8ZufuwlY6UAvjAJHATf3s9be//5//+3/74U/fmtzeB5DiS8snzp
w4c/bU+tbKyuZIoA4le/aBhIl1ARzWTWtTNV3apD2KNPndPebUTRM6rivVYAsm24nxWOPIX++JgZC6fJ0GihIJIwICETgXtB7bcQMFhKn3H1/dvnTh8s7FW2HCy8O1sydPf+vF33r6qa89duqR0jXIHoRFiw41JjxSwHvkFHbOd5y07dS5HXNxZf5XfIHkqfX8UtQwh4EIASjNIDFIE5q1jbVT5x7b2NpwYNb/C8QcwP2EiAiw5w/evfD+e+9R4QBYO0FDHD1GH5D+tTO3mTa3QjepktGRp6pNOmS+5x++824ebyHu9EE6uq25t4o6VatBASRvdEiS6O5+K+0ct932gVYDQgQRJNE0VVcUjmrEg2nzwQeXX3ntldd++MpbF96ajfcBAgxp66GNM19+8OwTp6rVfgMcwDMKo6ZHiUoWzHOztYLRScWDPqxfRdsnKb0KkiCU11HWlyFA7DGBcTyNGAQRWVjAiRCRA+RADoQbt/vx/s6l29feuzQ+mPRc/yvnnnrmqeeffurrD516aGk4AuDCAQYfpNbJDQAU1oZ2MBflfcrYHztX/lDMOPda1BkUxDYywhT6iSBSOgmAREB6WAiBEbCqyo2NrQfPPNQfjKAoyOGnfBaNnw9zAPcVcVDvG75x9dq7P3m7brxDB7HzcC72P6SKdH/dIRrQR4686lgRZi5y/zRPcMyfdySO7I8ZtB99o6MRAECuFYbuweTfnSwgSO+i+gLl2WcUIIoqjecA4ErqQU9E2Pu96e233/3Jn7/6l6/+4IcXL13z3Ljl/nCjGq0tn33iwfUvrY1W+oJCVdlgCDBm1sb3qd4L40BXAKK0ohGA6uCaiZtb90SFHEVAXTwixXRTRgQQZiJEYQdIIATaJqQMNc2mcuv67SsfXb119ePpzngog7NfOvvtX/3Nx889ee6RR1eXByWVBIzCQYIjaLx3+cx1ZSc9syKH/WmaIG4jlPn5p2MuYreHEAKAtsoQ5DRMofTqeLmRUZhZAMBRf9DfWN/cOnFiaXnFlVpngpai8gvAHMD9h14Rxp2bNy9c+Mn+7l6eZc3fsnQzTU4eNd0dfST9PBKzt1/iuTvvENsfG+xnteauD+6INHWHcSRAZ8gfTWd7+84KRBSyNYkVGUHncHVzuZQsWkICAKYAhMJBil4h6IAKoWJy0Nze3b967do77//k/Ysf3di5fvn6h9PZuOyXqxuro63h5qmlla3lquoBOQYQgiAsIl5YYsansMRkIgZmAgZOddDq6QWjlA8gKKJpSCIsgALMgMLCMhM/DePdyfbVm9tXb+9cPyi4tzZYefTUo8+c/+b5p77+8INnlkd9CVyWBYGwNAhBKKRGTzrpgKwpXJJ7+rcnsGvQj5F+JGlVrWvtBD0AsZVctPS6EckeGfUoU8MMzXoikqrqr66tLa+srp/YqvoVgBbtxSU/sXv5jS8McwD3KSIMARjClYtXLn304f7+vogU5FKxJKQ+vxj18fTCjqw8v8Gjd3am4NJrj3DngOLI5ju/2qM4ZMQhK1fHqkiftv27NwlRfdeXkWQX0E55ShKGABAFmFgbF4s4CEKuLKkERGFsGt80vD85eP0Hb7311o9efvW1i1cuM+wCMBZUVsXSynA4GgyXh4Ol/mBp2Ov3ykFVlEVvWBWlc4Oi6AFXjKUE5BBLqJk9S2Bi5Ia59rs7e9PbB/WsmU1m3nMzberaz2ZBamjGstxf2to6cfrEg88+/ezj55549JHHql6BUKAEEg6hAcchBEJSTS1eN0QWjiOEnAAmHUsOrR/X54tkmR/b65WuQVRk2gSt+auYihAENHFBWIRFAjM5Knu90XC0vLyysrq6tLTUH1RIDgkA43KTd/9pMD4vzAHcr+jQTIIE4BB2trevX7t269a2Lt4NiBStPuIRu9gabZzfHnYMabLBaVjcNfVHRJlj7O4hE55CAWkfTgPBz3TYhyTmY7zE8YlAh4WJfFQSiwNAYyVGiKmayehJalCURu
vaSg8kDnAZiZwAOSx9gyiunobLlz6+evXSB5c/uHTl/QvvX7h56/rOzk7dTBtpG0SjQyBdm9KxC1jxysnh1qmN5ZXlACIhOoAwDc3ESxPqST3sDZZHS2vLGw89cOrE5oOrK+trS5sFVGfPnO33q9HSsHCIIITIHLzUzB4IEDXXVKeHgYBiG4446xFLBmMs0I35uhesvVzYeg9oLwVqswYd8AtmLyHpX7z8cWFIJOf6VX84Gg1HSytrq/3BsNfvaaU6EoHECsO2MPizflKMzwNzAPc/cSjFEnwTxvt7N29u397ZmU7GwQfhoE3fEJKUrN/c45VaVaoT87pRRynKqdvzrzy6Y8f/hjSevIOxPnxs2Qp03QmkjBM47EikNeB3MhoSVQokljh7oqp0nvSMU65q83WOIAjqtCi0wRUgMManAAgKgSNEcmVVViICLA03IuibZu/g4GA83t3d3d/f29vfn81mO7s79ay+fuPm1WvXuJQHTp984IGTD558sF9VCNgv+4PBYNgfINLK8opzRb8aFK5w6ApXFOQIyaELjRcJzEEgsHgR8eKJSMfYROh945zjwLo0s3YLTR5ej5nzwB2S0Z7/BKjfT5+YpNR3PACmyxAX3MmtQgSRiFzhyl41GAyGw+FotFQNBqPRgIi0edTcp6k7WjGrf68xB/BXgfwNZRWJAVG48bPpdDqZTCbjejZr6qauZ7Pp1DcNi3AIzLnyC1LsLt15vU5e5d0pMofG5UcVhKyrZPFZZE7zObI9abcAkCSI9F7JKmGnT0Xe/ieeKgEQRBFwKmgIAkogEIgSSRK1AQGACVC03Wh8ROvlUlFx3AUEirI9MbCOgRGICIWB0HHspaxN8xwiEpFmuJRlKYIcL5/EWu3AEiV/SNmisYm2FqCx1i+7JOkwo2gDUBDRhlECAuRQV5PLVWXUGeFjbA+RPHIU47sNO9I1yDO06fSmdYb1HQWJiLAoiqrfr6p+fzDoVVV/NCx7VdWvXFEgISJhahF1t4V8xr3DHMBfZQQgfaFBIiGEEEJTN977yWQSQpjNZl5b+DeN9z54HzgEHwCEWcWR9nufE0yTwKSWOA8Msxrclf3nnUfH1LRikhz71EjH7mPrTOZcR1dwPqpvtVtqz0us/koKD+RCpe7mAHJuFTK0UyIcW1oIEAtg+zVRJQQBUyv7zpnQbBddK1I6Xyxd9IYJBMnlI1Xbmqr2NOLBVOellV/aYFWzS1HXoYmlfMIxdMFcuU0IIpQWds9BlD4DhKF78tugTQAgOh3tBygSJ2MdOVf0emVR9gaDQb8/GA2HVVX1B/2iLImIHAEAIqn3OCYl2bjvMQfwy0hq9Ja+jgiYogdWLcl77wOHpm6auvbeN3Xtg/d14733vvHecwghBBFm5iTxYjaX+jbZELd9JjDd3xn6YSviJPtz6EMnc3apuybJJ1mUzvu375gqoxlzANHZNLR+JG05+wVJ3T0BUhYnaHZOfELa1UM7n3xkKrnAztlAhtxzT4fo+frkDJ18GDlEU+udFmpROx7tcj5uYuaYg6qZNVrj3YnrYsUCahJOUm4AmLl7ZhDJOdfrlWXZK3u94WhU9nrDpaWyLHtVz7nSFS5WKjjq9HE1W//LgDmAxUZkzixrSMEAICEEEfGNZ+a6qTWqYA45kshOIoTAProKEW29LzkiySJDsjidcAAg6VMx2sizgsm8dFIUsWuy52pSOyQDq5YQonyj/mG+E3NnZ9JsqCS9vC0/a3ceclub+HpO76cOJ07EQB6mR7lFIGW1AyBwbDGk8vy80+p4SFKVT9uDC0BHx8e0dzFsSkk3KZ+psxu6m3rwIgLiiqJXlmWvqqpeFHCqftWvemVJzvV6PdDQJu/w3Fk1fgkxB2B8Iockg/kHNOWdmYWZQ2Bm770w++A5BN/4EHzwPoSg3oI5sBLSH9rxvUNOJ0nv0xm6A6Rx+dwO5VKm+QkF0qbZWafJpjvbbTWtulqmE21CmfySxGR5yVn7Se
fQG535lUMnJdUDJyuv0wBy5JsmebIkHx6kPRUA5E4gFScPIB+lSkEoRIgEhFT2ekRUFGWhU8hErijKsuz1+4RUDfq9qlf2SgAsyiKa+Kzb2Lh+UTEHYHyRSGeqt52rSL9UctaQgTnNXbMAZD8R4g/OAQe3TxdmlsBBWMMPTpqVlmBpPALQFl5JW6qbhtII2uenAJVutLM0xb1vYxiW7hwEHIk94iahDVg6JyFuCDsPpRAnTrcgAAAR6QQyAQCBLsBGziEhEhZFWZRFURbOubJXOVf0+72yLHu9HiJSURCpKN9J3lFovuXo0T00FhVzAMYvmkO2p3sz/921V8mJtBOX8VfqkZeUq6Q/cQAADswizAwgHFgjFf3JIeh8ZxNCHbyXAMzNeAIAwTORAyRyRAjBewBiEV14J+6RJLmncxxd2USVdyLSEIBIc4N0ehjJOeecI+cKl9EBOyIQuTiPrJ4hLuqmG8/pqblQK1b55o5PrczWnjgz9cYdMQdg/FLSmXXuzBQIZLPZijSg43tdAA1RhLMw3yYn6c92emJOuml1svYGHHn4OA65vjm/F+dD0k6aITc+f8wBGAvHUWMb/cThWYQYXBzuQfRJpviTHr4b+92NgczeG1805gAMwzAWFGvAZBiGsaCYAzAMw1hQzAEYhmEsKOYADMMwFhRzAIZhGAuKOQDDMIwFxRyAYRjGgmIOwDAMY0ExB2AYhrGgmAMwDMNYUMwBGIZhLCjmAAzDMBYUcwCGYRgLijkAwzCMBcUcgGEYxoJiDsAwDGNBMQdgGIaxoJgDMAzDWFDMARiGYSwo5gAMwzAWFHMAhmEYC4o5AMMwjAXFHIBhGMaCYg7AMAxjQTEHYBiGsaCYAzAMw1hQzAEYhmEsKOYADMMwFhRzAIZhGAuKOQDDMIwFxRyAYRjGgmIOwDAMY0ExB2AYhrGgmAMwDMNYUMwBGIZhLCjmAAzDMBYUcwCGYRgLijkAwzCMBcUcgGEYxoJiDsAwDGNBMQdgGIaxoJgDMAzDWFDMARiGYSwo5gAMwzAWFGKQe70PhmEYxj2A8F7vgWEYhvGLRwD+P9+1NmicLKQOAAAAAElFTkSuQmCC\\n\",\n      \"text/plain\": [\n       \"<PIL.Image.Image image mode=RGB size=512x512>\"\n      ]\n     },\n     \"execution_count\": 22,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"prompt = \\\"A green pokemon on white background\\\"\\n\",\n    \"image = pipe(prompt=prompt).images[0]\\n\",\n    \"image\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3 (ipykernel)\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.12\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 
5\n}\n"
  },
  {
    "path": "PixArt-alpha-ToCa/notebooks/train.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"c423d2a1-475e-482e-b759-f16456fd6707\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Install\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"0440d6a7-78b9-49e9-98a2-9a5ed75e1a2f\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!git clone https://github.com/kopyl/PixArt-alpha.git\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"0abadf51-a7e3-4091-bb02-0bdd8d28fb73\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%cd PixArt-alpha\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"4df1af24-f439-485d-a946-966dbf16c49b\",\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"!pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117\\n\",\n    \"!pip install -r requirements.txt\\n\",\n    \"!pip install wandb\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d44474fd-0b92-48fc-b4cf-142b59d3917c\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Download model\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"06b1c1c9-f8b1-4719-8564-2383eac9ff28\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!python tools/download.py --model_names \\\"PixArt-XL-2-512x512.pth\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"f298a89c-d2a5-4da7-8304-c1390da0ba58\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Make dataset out of Hugginggface dataset\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"e17b8883-0a5c-4fa3-a7d0-e8ee95e42027\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"from tqdm.notebook import tqdm\\n\",\n    
\"from datasets import load_dataset\\n\",\n    \"import json\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"92957b2c-6765-48ee-9296-d6739066d74d\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"dataset = load_dataset(\\\"lambdalabs/pokemon-blip-captions\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"0095cdda-c31a-48ee-a115-076a5fc393c3\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"root_dir = \\\"/workspace/pixart-pokemon\\\"\\n\",\n    \"images_dir = \\\"images\\\"\\n\",\n    \"captions_dir = \\\"captions\\\"\\n\",\n    \"\\n\",\n    \"images_dir_absolute = os.path.join(root_dir, images_dir)\\n\",\n    \"captions_dir_absolute = os.path.join(root_dir, captions_dir)\\n\",\n    \"\\n\",\n    \"if not os.path.exists(root_dir):\\n\",\n    \"    os.makedirs(os.path.join(root_dir, images_dir))\\n\",\n    \"\\n\",\n    \"if not os.path.exists(os.path.join(root_dir, images_dir)):\\n\",\n    \"    os.makedirs(os.path.join(root_dir, images_dir))\\n\",\n    \"if not os.path.exists(os.path.join(root_dir, captions_dir)):\\n\",\n    \"    os.makedirs(os.path.join(root_dir, captions_dir))\\n\",\n    \"\\n\",\n    \"image_format = \\\"png\\\"\\n\",\n    \"json_name = \\\"partition/data_info.json\\\"\\n\",\n    \"if not os.path.exists(os.path.join(root_dir, \\\"partition\\\")):\\n\",\n    \"    os.makedirs(os.path.join(root_dir, \\\"partition\\\"))\\n\",\n    \"\\n\",\n    \"absolute_json_name = os.path.join(root_dir, json_name)\\n\",\n    \"data_info = []\\n\",\n    \"\\n\",\n    \"order = 0\\n\",\n    \"for item in tqdm(dataset[\\\"train\\\"]): \\n\",\n    \"    image = item[\\\"image\\\"]\\n\",\n    \"    image.save(f\\\"{images_dir_absolute}/{order}.{image_format}\\\")\\n\",\n    \"    with open(f\\\"{captions_dir_absolute}/{order}.txt\\\", \\\"w\\\") as text_file:\\n\",\n    \"        text_file.write(item[\\\"text\\\"])\\n\",\n    \"    
\\n\",\n    \"    width, height = 512, 512\\n\",\n    \"    ratio = 1\\n\",\n    \"    data_info.append({\\n\",\n    \"        \\\"height\\\": height,\\n\",\n    \"        \\\"width\\\": width,\\n\",\n    \"        \\\"ratio\\\": ratio,\\n\",\n    \"        \\\"path\\\": f\\\"images/{order}.{image_format}\\\",\\n\",\n    \"        \\\"prompt\\\": item[\\\"text\\\"],\\n\",\n    \"    })\\n\",\n    \"        \\n\",\n    \"    order += 1\\n\",\n    \"\\n\",\n    \"with open(absolute_json_name, \\\"w\\\") as json_file:\\n\",\n    \"    json.dump(data_info, json_file)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"25be1c03\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Extract features\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"9f07a4f5-1873-48bf-86d0-9304942de5d3\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!python /workspace/PixArt-alpha/tools/extract_features.py \\\\\\n\",\n    \"    --img_size 512 \\\\\\n\",\n    \"    --json_path \\\"/workspace/pixart-pokemon/partition/data_info.json\\\" \\\\\\n\",\n    \"    --t5_save_root \\\"/workspace/pixart-pokemon/caption_feature_wmask\\\" \\\\\\n\",\n    \"    --vae_save_root \\\"/workspace/pixart-pokemon/img_vae_features\\\" \\\\\\n\",\n    \"    --pretrained_models_dir \\\"/workspace/PixArt-alpha/output/pretrained_models\\\" \\\\\\n\",\n    \"    --dataset_root \\\"/workspace/pixart-pokemon\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"9fc653d0\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!wandb login REPLACE_THIS_WITH_YOUR_AUTH_TOKEN_OF_WANDB\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"2cf1fd1a\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Train model\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"ea0e9dab-17bc-45ed-9c81-b670bbb8de47\",\n   \"metadata\": {},\n   
\"outputs\": [],\n   \"source\": [\n    \"!python -m torch.distributed.launch \\\\\\n\",\n    \"    train_scripts/train.py \\\\\\n\",\n    \"    /workspace/PixArt-alpha/notebooks/PixArt_xl2_img512_internal_for_pokemon_sample_training.py \\\\\\n\",\n    \"    --work-dir output/trained_model \\\\\\n\",\n    \"    --report_to=\\\"wandb\\\" \\\\\\n\",\n    \"    --loss_report_name=\\\"train_loss\\\"\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3 (ipykernel)\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.13\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "PixArt-alpha-ToCa/requirements.txt",
    "content": "torch==2.1.1\ntorchaudio==2.1.1\ntorchvision==0.16.1\nmmcv==1.7.0\ngit+https://github.com/huggingface/diffusers\ntimm==0.6.12\naccelerate\ntensorboard\ntensorboardX\ntransformers\nsentencepiece~=0.1.99\nftfy\nbeautifulsoup4\nprotobuf==3.20.2\ngradio==4.1.1\nyapf==0.40.1\nopencv-python\nbs4\neinops\nxformers\noptimum\npeft==0.6.2"
  },
  {
    "path": "PixArt-alpha-ToCa/scripts/infer_pixart_8_bits.py",
    "content": "# pip install -U accelerate transformers bitsandbytes\n# pip install -U git+https://github.com/huggingface/diffusers\n\nfrom transformers import T5EncoderModel\nfrom diffusers import PixArtAlphaPipeline\nimport torch\nimport gc\n\n\ndef flush():\n    gc.collect()\n    torch.cuda.empty_cache()\n\ndef bytes_to_giga_bytes(bytes):\n    return bytes / 1024 / 1024 / 1024\n\n# Loading in 8 bits needs `bitsandbytes`.\ntext_encoder = T5EncoderModel.from_pretrained(\n    \"PixArt-alpha/PixArt-XL-2-1024-MS\",\n    subfolder=\"text_encoder\",\n    load_in_8bit=True,\n    device_map=\"auto\",\n\n)\n\npipe = PixArtAlphaPipeline.from_pretrained(\n    \"PixArt-alpha/PixArt-XL-2-1024-MS\",\n    text_encoder=text_encoder,\n    transformer=None,\n    device_map=\"auto\"\n)\n\nwith torch.no_grad():\n    prompt = \"cute cat\"\n    prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)\n\ndel text_encoder\ndel pipe\nflush()\n\npipe = PixArtAlphaPipeline.from_pretrained(\n    \"PixArt-alpha/PixArt-XL-2-1024-MS\",\n    text_encoder=None,\n    torch_dtype=torch.float16,\n).to(\"cuda\")\n\nlatents = pipe(\n    negative_prompt=None,\n    prompt_embeds=prompt_embeds,\n    negative_prompt_embeds=negative_embeds,\n    prompt_attention_mask=prompt_attention_mask,\n    negative_prompt_attention_mask=negative_prompt_attention_mask,\n    num_images_per_prompt=1,\n    output_type=\"latent\",\n).images\n\ndel pipe.transformer\nflush()\n\nwith torch.no_grad():\n    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0]\nimage = pipe.image_processor.postprocess(image, output_type=\"pil\")\n\nimage[0].save(\"cat.png\")\n\nprint(f\"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB\")"
  },
  {
    "path": "PixArt-alpha-ToCa/scripts/inference.py",
    "content": "import os\nimport sys\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nimport warnings\nwarnings.filterwarnings(\"ignore\")  # ignore warning\nimport re\nimport argparse\nfrom datetime import datetime\nfrom tqdm import tqdm\nimport torch\nfrom torchvision.utils import save_image\nfrom diffusers.models import AutoencoderKL\n\nfrom diffusion.model.utils import prepare_prompt_ar\nfrom diffusion import IDDPM, DPMS, SASolverSampler\nfrom tools.download import find_model\nfrom diffusion.model.nets import PixArtMS_XL_2, PixArt_XL_2\nfrom diffusion.model.t5 import T5Embedder\n#from diffusion.data.datasets import get_chunks, ASPECT_RATIO_512_TEST, ASPECT_RATIO_1024_TEST\nfrom diffusion.data.datasets import get_chunks, ASPECT_RATIO_256_TEST, ASPECT_RATIO_512_TEST, ASPECT_RATIO_1024_TEST\n\n\ndef get_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--image_size', default=256, type=int)\n    parser.add_argument('--t5_path', default='../autodl-tmp/pretrained_models/t5_ckpts', type=str) # change to your own path\n    parser.add_argument('--tokenizer_path', default='../autodl-tmp/pretrained_models/sd-vae-ft-ema', type=str) # change to your own path\n    parser.add_argument('--txt_file', default='asset/samples.txt', type=str) # change to your own path\n    parser.add_argument('--model_path', default='../autodl-tmp/pretrained_models/PixArt-XL-2-1024x1024.pth', type=str) # change to your own path\n    parser.add_argument('--bs', default=1, type=int)\n    parser.add_argument('--cfg_scale', default=4.5, type=float)\n    parser.add_argument('--sampling_algo', default='dpm-solver', type=str, choices=['iddpm', 'dpm-solver', 'sa-solver'])\n    parser.add_argument('--seed', default=0, type=int)\n    parser.add_argument('--dataset', default='custom', type=str)\n    parser.add_argument('--step', default=-1, type=int)\n    parser.add_argument('--save_name', 
default='test_sample', type=str)\n    parser.add_argument(\"--fresh_ratio\", type=float, default=0.30)\n    parser.add_argument(\"--cache_type\", type=str, choices=['random', 'attention','similarity','norm', 'compress'], default='attention')\n    parser.add_argument(\"--ratio_scheduler\", type=str, default='ToCa', choices=['linear', 'cosine', 'exp', 'constant','linear-mode','layerwise','ToCa'])\n    parser.add_argument(\"--force_fresh\", type=str, choices=['global', 'local'], default='global',\n                        help=\"Force fresh strategy. global: refresh all tokens. local: refresh tokens reaching the fresh step threshold.\")\n    parser.add_argument(\"--fresh_threshold\", type=int, default=3)\n    parser.add_argument(\"--soft_fresh_weight\", type=float, default=0.25,\n                        help=\"soft weight for updating the stale tokens by adding extra scores.\")\n    \n    return parser.parse_args()\n\n\ndef set_env(seed=0):\n    torch.manual_seed(seed)\n    torch.set_grad_enabled(False)\n    for _ in range(30):\n        torch.randn(1, 4, args.image_size, args.image_size)\n\n\n@torch.inference_mode()\ndef visualize(items, bs, sample_steps, cfg_scale):\n\n    for chunk in tqdm(list(get_chunks(items, bs)), unit='batch'):\n\n        prompts = []\n        if bs == 1:\n            prompt_clean, _, hw, ar, custom_hw = prepare_prompt_ar(chunk[0], base_ratios, device=device, show=False)  # ar for aspect ratio\n            if args.image_size == 1024:\n                latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)\n            else:\n                hw = torch.tensor([[args.image_size, args.image_size]], dtype=torch.float, device=device).repeat(bs, 1)\n                ar = torch.tensor([[1.]], device=device).repeat(bs, 1)\n                latent_size_h, latent_size_w = latent_size, latent_size\n            prompts.append(prompt_clean.strip())\n        else:\n            hw = torch.tensor([[args.image_size, args.image_size]], dtype=torch.float, 
device=device).repeat(bs, 1)\n            ar = torch.tensor([[1.]], device=device).repeat(bs, 1)\n            for prompt in chunk:\n                prompts.append(prepare_prompt_ar(prompt, base_ratios, device=device, show=False)[0].strip())\n            latent_size_h, latent_size_w = latent_size, latent_size\n\n        null_y = model.y_embedder.y_embedding[None].repeat(len(prompts), 1, 1)[:, None]\n\n        with torch.no_grad():\n            caption_embs, emb_masks = t5.get_text_embeddings(prompts)\n            caption_embs = caption_embs.float()[:, None]\n            print('finish embedding')\n\n            if args.sampling_algo == 'iddpm':\n                # Create sampling noise:\n                n = len(prompts)\n                z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device).repeat(2, 1, 1, 1)\n                model_kwargs = dict(y=torch.cat([caption_embs, null_y]),\n                                    cfg_scale=cfg_scale, data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks,\n                                    cache_type = args.cache_type,\n                                    fresh_ratio = args.fresh_ratio,\n                                    fresh_threshold = args.fresh_threshold,\n                                    force_fresh = args.force_fresh,\n                                    ratio_scheduler = args.ratio_scheduler,\n                                    soft_fresh_weight = args.soft_fresh_weight)\n                diffusion = IDDPM(str(sample_steps))\n                # Sample images:\n                samples = diffusion.p_sample_loop(\n                    model.forward_with_cfg, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=True,\n                    device=device\n                )\n                samples, _ = samples.chunk(2, dim=0)  # Remove null class samples\n            elif args.sampling_algo == 'dpm-solver':\n                # Create sampling noise:\n                n = len(prompts)\n        
        z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)\n                model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks,\n                                    cache_type = args.cache_type,\n                                    fresh_ratio = args.fresh_ratio,\n                                    fresh_threshold = args.fresh_threshold,\n                                    force_fresh = args.force_fresh,\n                                    ratio_scheduler = args.ratio_scheduler,\n                                    soft_fresh_weight = args.soft_fresh_weight)\n                dpm_solver = DPMS(model.forward_with_dpmsolver,\n                                  condition=caption_embs,\n                                  uncondition=null_y,\n                                  cfg_scale=cfg_scale,\n                                  model_kwargs=model_kwargs)\n                samples = dpm_solver.sample(\n                    z,\n                    steps=sample_steps,\n                    order=2,\n                    skip_type=\"time_uniform\",\n                    method=\"multistep\",\n                    model_kwargs = model_kwargs,\n                )\n            elif args.sampling_algo == 'sa-solver':\n                # Create sampling noise:\n                n = len(prompts)\n                model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks,\n                                    cache_type = args.cache_type,\n                                    fresh_ratio = args.fresh_ratio,\n                                    fresh_threshold = args.fresh_threshold,\n                                    force_fresh = args.force_fresh,\n                                    ratio_scheduler = args.ratio_scheduler,\n                                    soft_fresh_weight = args.soft_fresh_weight)\n                sa_solver = SASolverSampler(model.forward_with_dpmsolver, device=device)\n                samples = 
sa_solver.sample(\n                    S=25,\n                    batch_size=n,\n                    shape=(4, latent_size_h, latent_size_w),\n                    eta=1,\n                    conditioning=caption_embs,\n                    unconditional_conditioning=null_y,\n                    unconditional_guidance_scale=cfg_scale,\n                    model_kwargs=model_kwargs,\n                    \n                )[0]\n        samples = vae.decode(samples / 0.18215).sample\n        torch.cuda.empty_cache()\n        # Save images:\n        os.umask(0o000)  # file permission: 666; dir permission: 777\n        for i, sample in enumerate(samples):\n            save_path = os.path.join(save_root, f\"{prompts[i][:100]}.jpg\")\n            print(\"Saving path: \", save_path)\n            save_image(sample, save_path, nrow=1, normalize=True, value_range=(-1, 1))\n\n\nif __name__ == '__main__':\n    args = get_args()\n    # Setup PyTorch:\n    seed = args.seed\n    set_env(seed)\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n    assert args.sampling_algo in ['iddpm', 'dpm-solver', 'sa-solver']\n\n    # only support fixed latent size currently\n    latent_size = args.image_size // 8\n    lewei_scale = {256: 1, 512: 1, 1024: 2}     # trick for positional embedding interpolation\n    #lewei_scale = {512: 1, 1024: 2}     # trick for positional embedding interpolation\n    sample_steps_dict = {'iddpm': 100, 'dpm-solver': 20, 'sa-solver': 25}\n    sample_steps = args.step if args.step != -1 else sample_steps_dict[args.sampling_algo]\n    weight_dtype = torch.float16\n    print(f\"Inference with {weight_dtype}\")\n\n    # model setting\n    if args.image_size in [256, 512]:\n        model = PixArt_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)\n    else:\n        model = PixArtMS_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)\n\n    print(f\"Generating sample from ckpt: 
{args.model_path}\")\n    state_dict = find_model(args.model_path)\n    del state_dict['state_dict']['pos_embed']\n    missing, unexpected = model.load_state_dict(state_dict['state_dict'], strict=False)\n    print('Missing keys: ', missing)\n    print('Unexpected keys', unexpected)\n    model.eval()\n    model.to(weight_dtype)\n    base_ratios = eval(f'ASPECT_RATIO_{args.image_size}_TEST')\n\n    vae = AutoencoderKL.from_pretrained(args.tokenizer_path).to(device)\n    t5 = T5Embedder(device=\"cuda\", local_cache=True, cache_dir=args.t5_path, torch_dtype=torch.float)\n    work_dir = os.path.join(*args.model_path.split('/')[:-2])\n    work_dir = f'/{work_dir}' if args.model_path[0] == '/' else work_dir\n\n    # data setting\n    with open(args.txt_file, 'r') as f:\n        items = [item.strip() for item in f.readlines()]\n\n    # img save setting\n    try:\n        epoch_name = re.search(r'.*epoch_(\\d+).*.pth', args.model_path).group(1)\n        step_name = re.search(r'.*step_(\\d+).*.pth', args.model_path).group(1)\n    except Exception:\n        epoch_name = 'unknown'\n        step_name = 'unknown'\n    img_save_dir = os.path.join(work_dir, 'vis')\n    os.umask(0o000)  # file permission: 666; dir permission: 777\n    os.makedirs(img_save_dir, exist_ok=True)\n\n    save_root = os.path.join(img_save_dir, f\"{datetime.now().date()}_{args.dataset}_epoch{epoch_name}_step{step_name}_scale{args.cfg_scale}_step{sample_steps}_size{args.image_size}_bs{args.bs}_samp{args.sampling_algo}_seed{seed}\")\n    os.makedirs(save_root, exist_ok=True)\n    visualize(items, args.bs, sample_steps, args.cfg_scale)"
  },
  {
    "path": "PixArt-alpha-ToCa/scripts/inference_ddp.py",
    "content": "import os\nimport sys\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nimport warnings\nwarnings.filterwarnings(\"ignore\")  # ignore warning\nimport re\nimport argparse\nfrom datetime import datetime\nfrom tqdm import tqdm\nimport torch\nfrom torchvision.utils import save_image\nfrom diffusers.models import AutoencoderKL\nimport torch.distributed as dist\nfrom torch.utils.data import DataLoader, DistributedSampler\n\nfrom diffusion.model.utils import prepare_prompt_ar\nfrom diffusion import IDDPM, DPMS, SASolverSampler\nfrom tools.download import find_model\nfrom diffusion.model.nets import PixArtMS_XL_2, PixArt_XL_2\nfrom diffusion.model.t5 import T5Embedder\nfrom diffusion.data.datasets import get_chunks, ASPECT_RATIO_256_TEST, ASPECT_RATIO_512_TEST, ASPECT_RATIO_1024_TEST\n\n\ndef get_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--image_size', default=256, type=int)\n    parser.add_argument('--t5_path', default='../autodl-tmp/pretrained_models/t5_ckpts', type=str) # change to your t5 path\n    parser.add_argument('--tokenizer_path', default='../autodl-tmp/pretrained_models/sd-vae-ft-ema', type=str) # change to your tokenizer path\n    parser.add_argument('--txt_file', default='asset/samples.txt', type=str) # change to your txt prompt file\n    parser.add_argument('--model_path', default='../autodl-tmp/pretrained_models/PixArt-XL-2-1024x1024.pth', type=str)\n    parser.add_argument('--bs', default=1, type=int)\n    parser.add_argument('--cfg_scale', default=4.5, type=float)\n    parser.add_argument('--sampling_algo', default='dpm-solver', type=str, choices=['iddpm', 'dpm-solver', 'sa-solver'])\n    parser.add_argument('--seed', default=0, type=int)\n    parser.add_argument('--dataset', default='custom', type=str)\n    parser.add_argument('--step', default=-1, type=int)\n    parser.add_argument('--save_name', default='test_sample', 
type=str)\n    parser.add_argument(\"--fresh_ratio\", type=float, default=0.30)\n    parser.add_argument(\"--cache_type\", type=str, choices=['random', 'attention', 'similarity', 'norm', 'compress'], default='attention')\n    parser.add_argument(\"--ratio_scheduler\", type=str, default='ToCa', choices=['linear', 'cosine', 'exp', 'constant', 'linear-mode', 'layerwise', 'ToCa'])\n    parser.add_argument(\"--force_fresh\", type=str, choices=['global', 'local'], default='global')\n    parser.add_argument(\"--fresh_threshold\", type=int, default=3)\n    parser.add_argument(\"--soft_fresh_weight\", type=float, default=0.25)\n    return parser.parse_args()\n\n\ndef setup_ddp():\n    dist.init_process_group(backend='nccl')\n    local_rank = dist.get_rank()\n    torch.cuda.set_device(local_rank)\n    return local_rank\n\n\ndef cleanup_ddp():\n    dist.destroy_process_group()\n\n\ndef set_env(seed=0, local_rank=None):\n    global_seed = seed + local_rank\n    torch.manual_seed(global_seed)\n    torch.cuda.manual_seed(global_seed)\n    #torch.cuda.manual_seed_all(global_seed)\n    torch.set_grad_enabled(False)\n    return torch.device(f'cuda:{local_rank}')\n\n\n\n@torch.inference_mode()\ndef visualize(items, bs, sample_steps, cfg_scale, device):\n    sampler = DistributedSampler(items, shuffle=False, num_replicas=dist.get_world_size(), rank=dist.get_rank())\n    data_loader = DataLoader(items, batch_size=bs, sampler=sampler, drop_last=False)\n    \n    pbar = tqdm(data_loader, unit='batch') if dist.get_rank() == 0 else data_loader\n    for chunk in pbar:\n        prompts = []\n        if bs == 1:\n            prompt_clean, _, hw, ar, custom_hw = prepare_prompt_ar(chunk[0], base_ratios, device=device, show=False)  # ar for aspect ratio\n            if args.image_size == 1024:\n                latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)\n            else:\n                hw = torch.tensor([[args.image_size, args.image_size]], dtype=torch.float, 
device=device).repeat(bs, 1)\n                ar = torch.tensor([[1.]], device=device).repeat(bs, 1)\n                latent_size_h, latent_size_w = latent_size, latent_size\n            prompts.append(prompt_clean.strip())\n        else:\n            hw = torch.tensor([[args.image_size, args.image_size]], dtype=torch.float, device=device).repeat(bs, 1)\n            ar = torch.tensor([[1.]], device=device).repeat(bs, 1)\n            for prompt in chunk:\n                prompts.append(prepare_prompt_ar(prompt, base_ratios, device=device, show=False)[0].strip())\n            latent_size_h, latent_size_w = latent_size, latent_size\n\n\n        null_y = model.module.y_embedder.y_embedding[None].repeat(len(prompts), 1, 1)[:, None]\n\n        with torch.no_grad():\n            caption_embs, emb_masks = t5.get_text_embeddings(prompts)\n            caption_embs = caption_embs.float()[:, None]\n            #print('finish embedding')\n\n            if args.sampling_algo == 'iddpm':\n                # This branch has not been tested; there may be bugs.\n                n = len(prompts)\n                z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device).repeat(2, 1, 1, 1)\n                model_kwargs = dict(y=torch.cat([caption_embs, null_y]),\n                                    cfg_scale=cfg_scale, data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks,\n                                    cache_type=args.cache_type,\n                                    fresh_ratio=args.fresh_ratio,\n                                    fresh_threshold=args.fresh_threshold,\n                                    force_fresh=args.force_fresh,\n                                    ratio_scheduler=args.ratio_scheduler,\n                                    soft_fresh_weight=args.soft_fresh_weight)\n                diffusion = IDDPM(str(sample_steps))\n                samples = diffusion.p_sample_loop(\n                    model.module.forward_with_cfg, z.shape, z, 
clip_denoised=False, model_kwargs=model_kwargs, progress=True,\n                    device=device\n                )\n                samples, _ = samples.chunk(2, dim=0)\n\n            elif args.sampling_algo == 'dpm-solver':\n                # Main strategy; tested and confirmed to work.\n                n = len(prompts)\n                z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)\n                model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks,\n                                    cache_type=args.cache_type,\n                                    fresh_ratio=args.fresh_ratio,\n                                    fresh_threshold=args.fresh_threshold,\n                                    force_fresh=args.force_fresh,\n                                    ratio_scheduler=args.ratio_scheduler,\n                                    soft_fresh_weight=args.soft_fresh_weight)\n                dpm_solver = DPMS(model.module.forward_with_dpmsolver,\n                                  condition=caption_embs,\n                                  uncondition=null_y,\n                                  cfg_scale=cfg_scale,\n                                  model_kwargs=model_kwargs)\n                samples = dpm_solver.sample(\n                    z,\n                    steps=sample_steps,\n                    order=2,\n                    skip_type=\"time_uniform\",\n                    method=\"multistep\",\n                    model_kwargs=model_kwargs,\n                    rank = dist.get_rank()\n                )\n            # not supported now\n            elif args.sampling_algo == 'sa-solver':\n                n = len(prompts)\n                model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks,\n                                    cache_type=args.cache_type,\n                                    fresh_ratio=args.fresh_ratio,\n                                    
fresh_threshold=args.fresh_threshold,\n                                    force_fresh=args.force_fresh,\n                                    ratio_scheduler=args.ratio_scheduler,\n                                    soft_fresh_weight=args.soft_fresh_weight)\n                sa_solver = SASolverSampler(model.module.forward_with_dpmsolver, device=device)\n                samples = sa_solver.sample(\n                    S=25,\n                    batch_size=n,\n                    shape=(4, latent_size_h, latent_size_w),\n                    eta=1,\n                    conditioning=caption_embs,\n                    unconditional_conditioning=null_y,\n                    unconditional_guidance_scale=cfg_scale,\n                    model_kwargs=model_kwargs,\n                )[0]\n\n        samples = vae.decode(samples / 0.18215).sample\n        torch.cuda.empty_cache()\n\n        dist.barrier()\n        #if dist.get_rank() == 0:\n        os.umask(0o000)\n        for i, sample in enumerate(samples):\n            save_path = os.path.join(save_root, f\"{prompts[i][:100]}.jpg\")\n            #print(\"Saving path: \", save_path)\n            save_image(sample, save_path, nrow=1, normalize=True, value_range=(-1, 1))\n\n\nif __name__ == '__main__':\n    args = get_args()\n    \n    # Setup DDP\n    local_rank = setup_ddp()\n    \n    # Setup environment\n    device = set_env(args.seed, local_rank)\n    \n    # only support fixed latent size currently\n    latent_size = args.image_size // 8\n    lewei_scale = {256: 1, 512: 1, 1024: 2}\n    sample_steps_dict = {'iddpm': 100, 'dpm-solver': 20, 'sa-solver': 25}\n    sample_steps = args.step if args.step != -1 else sample_steps_dict[args.sampling_algo]\n    weight_dtype = torch.float16\n    print(f\"Inference with {weight_dtype}\")\n\n    # model setting\n    if args.image_size in [256, 512]:\n        model = PixArt_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)\n    else:\n        model = 
PixArtMS_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)\n\n    print(f\"Generating sample from ckpt: {args.model_path}\")\n    state_dict = find_model(args.model_path)\n    del state_dict['state_dict']['pos_embed']\n    missing, unexpected = model.load_state_dict(state_dict['state_dict'], strict=False)\n    print('Missing keys: ', missing)\n    print('Unexpected keys', unexpected)\n    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])\n    model.module.eval()\n    model.module.to(weight_dtype)\n    base_ratios = eval(f'ASPECT_RATIO_{args.image_size}_TEST')\n\n    vae = AutoencoderKL.from_pretrained(args.tokenizer_path).to(device)\n    t5 = T5Embedder(device=\"cuda\", local_cache=True, cache_dir=args.t5_path, torch_dtype=torch.float)\n    work_dir = os.path.join(*args.model_path.split('/')[:-2])\n    work_dir = f'/{work_dir}' if args.model_path[0] == '/' else work_dir\n\n    with open(args.txt_file, 'r') as f:\n        items = [item.strip() for item in f.readlines()]\n\n    epoch_name = re.search(r'.*epoch_(\\d+).*.pth', args.model_path).group(1) if re.search(r'.*epoch_(\\d+).*.pth', args.model_path) else 'unknown'\n    step_name = re.search(r'.*step_(\\d+).*.pth', args.model_path).group(1) if re.search(r'.*step_(\\d+).*.pth', args.model_path) else 'unknown'\n    img_save_dir = os.path.join(work_dir, 'vis')\n    os.umask(0o000)\n    os.makedirs(img_save_dir, exist_ok=True)\n\n    save_root = os.path.join(img_save_dir, f\"{datetime.now().date()}_{args.dataset}_epoch{epoch_name}_step{step_name}_scale{args.cfg_scale}_step{sample_steps}_size{args.image_size}_bs{args.bs}_samp{args.sampling_algo}_seed{args.seed}\")\n    os.makedirs(save_root, exist_ok=True)\n\n    visualize(items, args.bs, sample_steps, args.cfg_scale, device)\n    \n    cleanup_ddp()\n"
  },
  {
    "path": "PixArt-alpha-ToCa/scripts/inference_lcm.py",
    "content": "import os\nimport sys\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nimport warnings\nwarnings.filterwarnings(\"ignore\")  # ignore warning\nimport re\nimport argparse\nfrom datetime import datetime\nfrom tqdm import tqdm\nimport torch\nfrom torchvision.utils import save_image\nfrom diffusers.models import AutoencoderKL\n\nfrom diffusion.model.utils import prepare_prompt_ar\nfrom tools.download import find_model\nfrom diffusion.model.nets import PixArtMS_XL_2, PixArt_XL_2\nfrom diffusion.model.t5 import T5Embedder\nfrom diffusion.data.datasets import get_chunks\nfrom diffusion.lcm_scheduler import LCMScheduler\nfrom diffusion.data.datasets import ASPECT_RATIO_512_TEST, ASPECT_RATIO_1024_TEST\n\n\n\ndef get_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--image_size', default=1024, type=int)\n    parser.add_argument('--t5_path', default='output/pretrained_models/t5_ckpts', type=str)\n    parser.add_argument('--tokenizer_path', default='output/pretrained_models/sd-vae-ft-ema', type=str)\n    parser.add_argument('--txt_file', default='asset/samples.txt', type=str)\n    parser.add_argument('--model_path', default='output/pretrained_models/PixArt-XL-2-1024x1024.pth', type=str)\n    parser.add_argument('--bs', default=1, type=int)\n    parser.add_argument('--cfg_scale', default=4.5, type=float)\n    parser.add_argument('--sample_steps', default=4, type=int)\n    parser.add_argument('--seed', default=0, type=int)\n    parser.add_argument('--dataset', default='custom', type=str)\n    parser.add_argument('--step', default=-1, type=int)\n    parser.add_argument('--save_name', default='test_sample', type=str)\n\n    return parser.parse_args()\n\n\ndef set_env(seed=0):\n    torch.manual_seed(seed)\n    torch.set_grad_enabled(False)\n    for _ in range(30):\n        torch.randn(1, 4, args.image_size, args.image_size)\n\n@torch.inference_mode()\ndef 
visualize(items, bs, sample_steps, cfg_scale):\n    # 4. Prepare timesteps\n    scheduler.set_timesteps(sample_steps, 50)\n    timesteps = scheduler.timesteps\n\n    for chunk in tqdm(list(get_chunks(items, bs)), unit='batch'):\n\n        prompts = []\n        if bs == 1:\n            prompt_clean, _, hw, ar, custom_hw = prepare_prompt_ar(chunk[0], base_ratios, device=device, show=False)  # ar for aspect ratio\n            if args.image_size == 1024:\n                latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)\n            else:\n                hw = torch.tensor([[args.image_size, args.image_size]], dtype=torch.float, device=device).repeat(bs, 1)\n                ar = torch.tensor([[1.]], device=device).repeat(bs, 1)\n                latent_size_h, latent_size_w = latent_size, latent_size\n            prompts.append(prompt_clean.strip())\n        else:\n            hw = torch.tensor([[args.image_size, args.image_size]], dtype=torch.float, device=device).repeat(bs, 1)\n            ar = torch.tensor([[1.]], device=device).repeat(bs, 1)\n            # Clean every prompt in the batch (the loop variable was previously undefined here)\n            for prompt in chunk:\n                prompts.append(prepare_prompt_ar(prompt, base_ratios, device=device, show=False)[0].strip())\n            latent_size_h, latent_size_w = latent_size, latent_size\n\n        with torch.no_grad():\n            caption_embs, emb_masks = t5.get_text_embeddings(prompts)\n            caption_embs = caption_embs.float()[:, None]\n            print('finish embedding')\n\n            # Create sampling noise:\n            n = len(prompts)\n            latents = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)\n            model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)\n\n            # 7. 
LCM MultiStep Sampling Loop:\n            for i, t in tqdm(list(enumerate(timesteps))):\n                ts = torch.full((bs,), t, device=device, dtype=torch.long)\n\n                # model prediction (v-prediction, eps, x)\n                model_pred = model(latents, ts, caption_embs, **model_kwargs)[:, :4]\n\n                # compute the previous noisy sample x_t -> x_t-1\n                latents, denoised = scheduler.step(model_pred, i, t, latents, return_dict=False)\n\n        samples = vae.decode(denoised / 0.18215).sample\n        torch.cuda.empty_cache()\n        # Save images:\n        os.umask(0o000)  # file permission: 666; dir permission: 777\n        for i, sample in enumerate(samples):\n            save_path = os.path.join(save_root, f\"{prompts[i][:100]}.jpg\")\n            print(\"Saving path: \", save_path)\n            save_image(sample, save_path, nrow=1, normalize=True, value_range=(-1, 1))\n\n\nif __name__ == '__main__':\n    args = get_args()\n    # Setup PyTorch:\n    seed = args.seed\n    set_env(seed)\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\n    # only support fixed latent size currently\n    latent_size = args.image_size // 8\n    lewei_scale = {512: 1, 1024: 2}     # trick for positional embedding interpolation\n    sample_steps = args.sample_steps\n\n    # Initalize Scheduler:\n    scheduler = LCMScheduler(beta_start=0.0001, beta_end=0.02, beta_schedule=\"linear\", prediction_type=\"epsilon\")\n\n    # model setting\n    if args.image_size == 512:\n        model = PixArt_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)\n    else:\n        model = PixArtMS_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)\n\n    print(f\"Generating sample from ckpt: {args.model_path}\")\n    state_dict = find_model(args.model_path)\n    del state_dict['state_dict']['pos_embed']\n    missing, unexpected = model.load_state_dict(state_dict['state_dict'], 
strict=False)\n    print('Missing keys: ', missing)\n    print('Unexpected keys', unexpected)\n    model.eval()\n    base_ratios = eval(f'ASPECT_RATIO_{args.image_size}_TEST')\n\n    vae = AutoencoderKL.from_pretrained(args.tokenizer_path).to(device)\n    t5 = T5Embedder(device=\"cuda\", local_cache=True, cache_dir=args.t5_path, torch_dtype=torch.float)\n    work_dir = os.path.join(*args.model_path.split('/')[:-2])\n    work_dir = f'/{work_dir}' if args.model_path[0] == '/' else work_dir\n\n    # data setting\n    with open(args.txt_file, 'r') as f:\n        items = [item.strip() for item in f.readlines()]\n\n    # img save setting\n    try:\n        epoch_name = re.search(r'.*epoch_(\\d+).*.pth', args.model_path).group(1)\n        step_name = re.search(r'.*step_(\\d+).*.pth', args.model_path).group(1)\n    except Exception:\n        epoch_name = 'unknown'\n        step_name = 'unknown'\n    img_save_dir = os.path.join(work_dir, 'vis')\n    os.umask(0o000)  # file permission: 666; dir permission: 777\n    os.makedirs(img_save_dir, exist_ok=True)\n\n    save_root = os.path.join(img_save_dir, f\"{datetime.now().date()}_{args.dataset}_epoch{epoch_name}_step{step_name}_scale{args.cfg_scale}_step{sample_steps}_size{args.image_size}_bs{args.bs}_sampLCM_seed{seed}\")\n    os.makedirs(save_root, exist_ok=True)\n    visualize(items, args.bs, sample_steps, args.cfg_scale)\n\n"
  },
  {
    "path": "PixArt-alpha-ToCa/scripts/interface.py",
    "content": "import argparse\nimport sys\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nimport os\nimport random\nimport torch\nfrom torchvision.utils import save_image\nfrom diffusion import IDDPM, DPMS, SASolverSampler\nfrom diffusers.models import AutoencoderKL\nfrom tools.download import find_model\nfrom datetime import datetime\nfrom typing import List, Union\nimport gradio as gr\nimport numpy as np\nfrom gradio.components import Textbox, Image\nfrom diffusion.model.utils import prepare_prompt_ar, resize_and_crop_tensor\nfrom diffusion.model.nets import PixArtMS_XL_2, PixArt_XL_2\nfrom diffusion.model.t5 import T5Embedder\nfrom torchvision.utils import _log_api_usage_once, make_grid\nfrom diffusion.data.datasets import ASPECT_RATIO_512_TEST, ASPECT_RATIO_1024_TEST\nfrom asset.examples import examples\n\n\nMAX_SEED = np.iinfo(np.int32).max\n\n\ndef get_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--image_size', default=1024, type=int)\n    parser.add_argument('--model_path', default='output/pretrained_models/PixArt-XL-2-1024-MS.pth', type=str)\n    parser.add_argument('--t5_path', default='output/pretrained_models', type=str)\n    parser.add_argument('--tokenizer_path', default='output/pretrained_models/sd-vae-ft-ema', type=str)\n    parser.add_argument('--llm_model', default='t5', type=str)\n    parser.add_argument('--port', default=7788, type=int)\n\n    return parser.parse_args()\n\n\n@torch.no_grad()\ndef ndarr_image(tensor: Union[torch.Tensor, List[torch.Tensor]], **kwargs,) -> None:\n    if not torch.jit.is_scripting() and not torch.jit.is_tracing():\n        _log_api_usage_once(save_image)\n    grid = make_grid(tensor, **kwargs)\n    # Add 0.5 after unnormalizing to [0, 255] to round to the nearest integer\n    return grid.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to(\"cpu\", torch.uint8).numpy()\n\n\ndef set_env(seed=0):\n    
torch.manual_seed(seed)\n    torch.set_grad_enabled(False)\n    for _ in range(30):\n        torch.randn(1, 4, args.image_size, args.image_size)\n\n\ndef randomize_seed_fn(seed: int, randomize_seed: bool) -> int:\n    if randomize_seed:\n        seed = random.randint(0, MAX_SEED)\n    return seed\n\n\n@torch.inference_mode()\ndef generate_img(prompt, sampler, sample_steps, scale, seed=0, randomize_seed=False):\n    seed = int(randomize_seed_fn(seed, randomize_seed))\n    set_env(seed)\n\n    os.makedirs(f'output/demo/online_demo_prompts/', exist_ok=True)\n    save_promt_path = f'output/demo/online_demo_prompts/tested_prompts{datetime.now().date()}.txt'\n    with open(save_promt_path, 'a') as f:\n        f.write(prompt + '\\n')\n    print(prompt)\n    prompt_clean, prompt_show, hw, ar, custom_hw = prepare_prompt_ar(prompt, base_ratios, device=device)      # ar for aspect ratio\n    prompt_clean = prompt_clean.strip()\n    if isinstance(prompt_clean, str):\n        prompts = [prompt_clean]\n\n    caption_embs, emb_masks = llm_embed_model.get_text_embeddings(prompts)\n    caption_embs = caption_embs[:, None]\n\n    null_y = model.y_embedder.y_embedding[None].repeat(len(prompts), 1, 1)[:, None]\n\n    latent_size_h, latent_size_w = int(hw[0, 0]//8), int(hw[0, 1]//8)\n    # Sample images:\n    if sampler == 'iddpm':\n        # Create sampling noise:\n        n = len(prompts)\n        z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device).repeat(2, 1, 1, 1)\n        model_kwargs = dict(y=torch.cat([caption_embs, null_y]),\n                            cfg_scale=scale, data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)\n        diffusion = IDDPM(str(sample_steps))\n        samples = diffusion.p_sample_loop(\n            model.forward_with_cfg, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=True,\n            device=device\n        )\n        samples, _ = samples.chunk(2, dim=0)  # Remove null class samples\n    elif sampler == 
'dpm-solver':\n        # Create sampling noise:\n        n = len(prompts)\n        z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)\n        model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)\n        dpm_solver = DPMS(model.forward_with_dpmsolver,\n                          condition=caption_embs,\n                          uncondition=null_y,\n                          cfg_scale=scale,\n                          model_kwargs=model_kwargs)\n        samples = dpm_solver.sample(\n            z,\n            steps=sample_steps,\n            order=2,\n            skip_type=\"time_uniform\",\n            method=\"multistep\",\n        )\n    elif sampler == 'sa-solver':\n        # Create sampling noise:\n        n = len(prompts)\n        model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)\n        sa_solver = SASolverSampler(model.forward_with_dpmsolver, device=device)\n        samples = sa_solver.sample(\n            S=sample_steps,\n            batch_size=n,\n            shape=(4, latent_size_h, latent_size_w),\n            eta=1,\n            conditioning=caption_embs,\n            unconditional_conditioning=null_y,\n            unconditional_guidance_scale=scale,\n            model_kwargs=model_kwargs,\n        )[0]\n    samples = vae.decode(samples / 0.18215).sample\n    torch.cuda.empty_cache()\n    samples = resize_and_crop_tensor(samples, custom_hw[0,1], custom_hw[0,0])\n    display_model_info = f'Model path: {args.model_path},\\nBase image size: {args.image_size}, \\nSampling Algo: {sampler}'\n    return ndarr_image(samples, normalize=True, value_range=(-1, 1)), prompt_show, display_model_info, seed\n\n\nif __name__ == '__main__':\n    from diffusion.utils.logger import get_root_logger\n    args = get_args()\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n    logger = get_root_logger()\n\n    assert args.image_size in [512, 1024], \"We only provide pre-trained 
models for 512x512 and 1024x1024 resolutions.\"\n    lewei_scale = {512: 1, 1024: 2}\n    latent_size = args.image_size // 8\n    t5_device = {512: 'cuda', 1024: 'cuda'}\n    if args.image_size == 512:\n        model = PixArt_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)\n    else:\n        model = PixArtMS_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size]).to(device)\n    state_dict = find_model(args.model_path)\n    del state_dict['state_dict']['pos_embed']\n    missing, unexpected = model.load_state_dict(state_dict['state_dict'], strict=False)\n    logger.warning(f'Missing keys: {missing}')\n    logger.warning(f'Unexpected keys: {unexpected}')\n    model.eval()\n    base_ratios = eval(f'ASPECT_RATIO_{args.image_size}_TEST')\n\n    vae = AutoencoderKL.from_pretrained(args.tokenizer_path).to(device)\n\n    if args.llm_model == 't5':\n        llm_embed_model = T5Embedder(device=t5_device[args.image_size], local_cache=True, cache_dir=args.t5_path, torch_dtype=torch.float)\n    else:\n        print('We support t5 only, please initialize the llm again')\n        sys.exit()\n\n    title = f\"\"\"\n        '' Unleashing your Creativity \\n ''\n        <div style='display: flex; align-items: center; justify-content: center; text-align: center;'>\n            <img src='https://raw.githubusercontent.com/PixArt-alpha/PixArt-alpha.github.io/master/static/images/logo.png' style='width: 400px; height: auto; margin-right: 10px;' />\n            {args.image_size}px\n        </div>\n    \"\"\"\n    DESCRIPTION = \"\"\"# PixArt-Alpha 1024px\n            ## If PixArt-Alpha is helpful, please help to ⭐ the [Github Repo](https://github.com/PixArt-alpha/PixArt) and recommend it to your friends 😊\n            #### [PixArt-Alpha 1024px](https://github.com/PixArt-alpha/PixArt-alpha) is a transformer-based text-to-image diffusion system trained on text embeddings from T5. 
This demo uses the [PixArt-alpha/PixArt-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-XL-2-1024-MS) checkpoint.\n            #### English prompts ONLY; 提示词仅限英文\n            Don't want to queue? Try [OpenXLab](https://openxlab.org.cn/apps/detail/PixArt-alpha/PixArt-alpha) or [Google Colab Demo](https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing).\n            \"\"\"\n    if not torch.cuda.is_available():\n        DESCRIPTION += \"\\n<p>Running on CPU 🥶 This demo does not work on CPU.</p>\"\n\n    demo = gr.Interface(\n        fn=generate_img,\n        inputs=[Textbox(label=\"Note: To specify an aspect ratio or a custom height and width, \"\n                              \"use --ar h:w (or --aspect_ratio h:w) or --hw h:w. If neither is given, the default settings are used.\",\n                        placeholder=\"Please enter your prompt. \\n\"),\n                gr.Radio(\n                    choices=[\"iddpm\", \"dpm-solver\"],\n                    label=\"Sampler\",\n                    interactive=True,\n                    value='dpm-solver',\n                ),\n                gr.Slider(\n                    label='Sample Steps',\n                    minimum=1,\n                    maximum=100,\n                    value=14,\n                    step=1\n                ),\n                gr.Slider(\n                    label='Guidance Scale',\n                    minimum=0.1,\n                    maximum=30.0,\n                    value=4.5,\n                    step=0.1\n                ),\n                gr.Slider(\n                    label=\"Seed\",\n                    minimum=0,\n                    maximum=MAX_SEED,\n                    step=1,\n                    value=0,\n                ),\n                gr.Checkbox(label=\"Randomize seed\", value=True),\n                ],\n        outputs=[Image(type=\"numpy\", label=\"Img\"),\n                 
Textbox(label=\"clean prompt\"),\n                 Textbox(label=\"model info\"),\n                 gr.Slider(label='seed')],\n        title=title,\n        description=DESCRIPTION,\n        examples=examples,\n    )\n    demo.launch(server_name=\"0.0.0.0\", server_port=args.port, debug=True)"
  },
  {
    "path": "PixArt-alpha-ToCa/scripts/interface_controlnet.py",
    "content": "import argparse\nimport os\nfrom datetime import datetime\nimport numpy as np\nimport sys\nfrom pathlib import Path\nfrom typing import List, Union\n\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\n\nimport gradio as gr\nfrom gradio.components import Textbox, Image, Slider\nimport torch\nimport torchvision.transforms as T\nimport torchvision.transforms.functional as TF\nfrom torchvision.utils import _log_api_usage_once, make_grid, save_image\n\nfrom diffusion import IDDPM, DPMS, SASolverSampler\nfrom diffusion.data.datasets import *\nfrom diffusion.model.hed import HEDdetector\nfrom diffusion.model.nets import PixArtMS_XL_2, ControlPixArtHalf, ControlPixArtMSHalf\nfrom diffusion.model.t5 import T5Embedder\nfrom diffusion.model.utils import prepare_prompt_ar, resize_and_crop_tensor\nfrom diffusion.utils.misc import read_config\nfrom diffusers.models import AutoencoderKL\nfrom tools.download import find_model\n\nvae_scale = 0.18215\n\nDESCRIPTION = \"\"\"![Logo](https://raw.githubusercontent.com/PixArt-alpha/PixArt-alpha.github.io/master/static/images/logo.png)\n        # PixArt-Alpha 1024px + ControlNet. This is the demo for ControlNet combined with 1024px PixArt-Alpha.\n        # The input reference image need to be around 1024x1024. 
Descriptive prompts also need to be provided.\n        # You may change the random seed if the results are not satisfactory.\n        \"\"\"\n\n\ndef get_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"config\", type=str, help=\"config\")\n    parser.add_argument('--num_sampling_steps', default=14, type=int)\n    parser.add_argument('--cfg_scale', default=4.5, type=float)\n    parser.add_argument('--image_size', default=1024, type=int)\n    parser.add_argument('--model_path', type=str)\n    parser.add_argument('--tokenizer_path', default='output/pretrained_models/sd-vae-ft-ema', type=str)\n\n    parser.add_argument('--llm_model', default='t5', type=str)\n\n    parser.add_argument('--sampling_algo', default='dpm-solver', type=str, choices=['iddpm', 'dpm-solver', 'sa-solver'])\n\n    parser.add_argument('--port', default=7788, type=int)\n    parser.add_argument('--condition_strength', default=1, type=float)\n\n    return parser.parse_args()\n\n\n@torch.no_grad()\ndef ndarr_image(tensor: Union[torch.Tensor, List[torch.Tensor]], **kwargs, ) -> np.ndarray:\n    if not torch.jit.is_scripting() and not torch.jit.is_tracing():\n        _log_api_usage_once(save_image)\n    grid = make_grid(tensor, **kwargs)\n    ndarr = grid.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to(\"cpu\", torch.uint8).numpy()\n    return ndarr\n\n\ndef set_env():\n    torch.manual_seed(0)\n    torch.set_grad_enabled(False)\n\n\n@torch.inference_mode()\ndef generate_img(prompt, given_image, seed):\n    torch.manual_seed(seed)\n    torch.cuda.empty_cache()\n    strength = args.condition_strength  # defaults to 1\n    c_vis = given_image\n\n    save_promt_path = f'{save_prompt_path}/tested_prompts{datetime.now().date()}.txt'\n    with open(save_promt_path, 'a') as f:\n        f.write(prompt + '\\n')\n    prompt_clean, prompt_show, hw, ar, custom_hw = prepare_prompt_ar(prompt, base_ratios, device=device)  # ar for aspect ratio\n    prompt_clean = prompt_clean.strip()\n    if isinstance(prompt_clean, str):\n   
     prompts = [prompt_clean]\n\n    caption_embs, emb_masks = llm_embed_model.get_text_embeddings(prompts)\n    caption_embs = caption_embs[:, None]\n\n    null_y = model.y_embedder.y_embedding[None].repeat(len(prompts), 1, 1)[:, None]\n\n    # condition process\n    if given_image is not None:\n        ar = torch.tensor([given_image.size[1] / given_image.size[0]], device=device)[None]\n        custom_hw = torch.tensor([given_image.size[1], given_image.size[0]], device=device)[None]\n        closest_hw = base_ratios[min(base_ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))]\n        hw = torch.tensor(closest_hw, device=device)[None]\n        condition_transform = T.Compose([\n            T.Lambda(lambda img: img.convert('RGB')),\n            T.Resize(int(min(closest_hw))),\n            T.CenterCrop([int(closest_hw[0]), int(closest_hw[1])]),\n            T.ToTensor(),\n        ])\n\n        given_image = condition_transform(given_image).unsqueeze(0).to(device)\n        hed_edge = hed(given_image) * strength\n        hed_edge = TF.normalize(hed_edge, [.5], [.5])\n        hed_edge = hed_edge.repeat(1, 3, 1, 1)\n        posterior = vae.encode(hed_edge).latent_dist\n        condition = posterior.sample()\n        c = condition * vae_scale\n        c_vis = vae.decode(condition)['sample']\n        c_vis = torch.clamp(127.5 * c_vis + 128.0, 0, 255).permute(0, 2, 3, 1).to(\"cpu\", dtype=torch.uint8).numpy()[0]\n    else:\n        c = None\n\n    latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)\n    # Sample images:\n    if args.sampling_algo == 'iddpm':\n        # Create sampling noise (match the aspect-ratio bucket, as the other samplers do):\n        n = len(prompts)\n        z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device).repeat(2, 1, 1, 1)\n        model_kwargs = dict(y=torch.cat([caption_embs, null_y]), cfg_scale=args.cfg_scale,\n                            data_info={'img_hw': hw, 'aspect_ratio': ar},\n                            mask=emb_masks, c=c)\n        diffusion = 
IDDPM(str(args.num_sampling_steps))\n        samples = diffusion.p_sample_loop(\n            model.forward_with_cfg, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=True,\n            device=device\n        )\n        samples, _ = samples.chunk(2, dim=0)  # Remove null class samples\n    elif args.sampling_algo == 'dpm-solver':\n        # Create sampling noise:\n        n = len(prompts)\n        z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)\n        model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks, c=c)\n        dpm_solver = DPMS(model.forward_with_dpmsolver,\n                          condition=caption_embs,\n                          uncondition=null_y,\n                          cfg_scale=args.cfg_scale,\n                          model_kwargs=model_kwargs)\n        samples = dpm_solver.sample(\n            z,\n            steps=args.num_sampling_steps,\n            order=2,\n            skip_type=\"time_uniform\",\n            method=\"multistep\",\n        )\n\n    elif args.sampling_algo == 'sa-solver':\n        # Create sampling noise:\n        n = len(prompts)\n        model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks, c=c)\n        sas_solver = SASolverSampler(model.forward_with_dpmsolver, device=device)\n        samples = sas_solver.sample(\n            S=args.num_sampling_steps,\n            batch_size=n,\n            shape=(4, latent_size_h, latent_size_w),\n            eta=1,\n            conditioning=caption_embs,\n            unconditional_conditioning=null_y,\n            unconditional_guidance_scale=args.cfg_scale,\n            model_kwargs=model_kwargs,\n        )[0]\n\n    samples = vae.decode(samples / vae_scale).sample\n    torch.cuda.empty_cache()\n    samples = resize_and_crop_tensor(samples, custom_hw[0, 1], custom_hw[0, 0])\n\n    return ndarr_image(samples, normalize=True, value_range=(-1, 1)), c_vis, prompt_show\n\n\nif __name__ == 
'__main__':\n    args = get_args()\n    config = read_config(args.config)\n    set_env()\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n    save_prompt_path = 'output/demo/online_demo_prompts/'\n    os.makedirs(save_prompt_path, exist_ok=True)\n\n    assert args.image_size in [512, 1024], \"We only provide pre-trained models for 512x512 and 1024x1024 resolutions.\"\n    lewei_scale = {512: 1, 1024: 2}\n    latent_size = args.image_size // 8\n    weight_dtype = torch.float16\n    print(f\"Inference with {weight_dtype}\")\n\n    model = PixArtMS_XL_2(input_size=latent_size, lewei_scale=lewei_scale[args.image_size])\n    if config.image_size == 512:\n        print('model architecture ControlPixArtHalf and image size is 512')\n        model = ControlPixArtHalf(model).to(device)\n    elif config.image_size == 1024:\n        print('model architecture ControlPixArtMSHalf and image size is 1024')\n        model = ControlPixArtMSHalf(model).to(device)\n\n    state_dict = find_model(args.model_path)['state_dict']\n    if 'pos_embed' in state_dict:\n        del state_dict['pos_embed']\n    elif 'base_model.pos_embed' in state_dict:\n        del state_dict['base_model.pos_embed']\n    missing, unexpected = model.load_state_dict(state_dict, strict=False)\n    print('Missing keys (missing pos_embed is normal): ', missing)\n    print('Unexpected keys', unexpected)\n    model.eval()\n    model.to(weight_dtype)\n    display_model_info = f'model path: {args.model_path},\\n base image size: {args.image_size}'\n    base_ratios = eval(f'ASPECT_RATIO_{args.image_size}_TEST')\n\n    vae = AutoencoderKL.from_pretrained(args.tokenizer_path).to(device)\n    hed = HEDdetector(False).to(device)\n\n    if args.llm_model == 't5':\n        print(\"begin load t5\")\n        llm_embed_model = T5Embedder(device=device, local_cache=True, cache_dir='data/t5_ckpts', torch_dtype=torch.float)\n        print(\"finish load t5\")\n    else:\n        print(f'We support t5 only, please 
initialize the llm again')\n        sys.exit()\n\n    gr.Markdown(DESCRIPTION)\n    demo = gr.Interface(fn=generate_img,\n                        inputs=[\n                            Textbox(label=\"Enter a reference image, the resolution of image need around 1024 x 1024\",\n                                    placeholder=\"Please enter your prompt. \\n\"),\n                            Image(type=\"pil\", label=\"Condition\"),\n                            Slider(minimum=0., maximum=10000., value=0, step=2, label='seed'),\n                            ],\n                        outputs=[Image(type=\"numpy\", label=\"Img\"),\n                                 Image(type=\"numpy\", label=\"HED Edge Map\"),\n                                 Textbox(label=\"clean prompt\"),]\n                        )\n    demo.queue(max_size=20).launch(server_name=\"0.0.0.0\", server_port=args.port, debug=True)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/scripts/pipeline_pixart_inpaint.py",
    "content": "# Copyright 2023 PixArt-Alpha Authors and The HuggingFace Team. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\nimport html\nimport inspect\nimport re\nimport urllib.parse as ul\nfrom typing import Callable, List, Optional, Tuple, Union\n\nimport torch\nimport torch.nn.functional as F\nfrom transformers import T5EncoderModel, T5Tokenizer\n\nfrom diffusers.image_processor import PipelineImageInput, PixArtImageProcessor, VaeImageProcessor\nfrom diffusers.models import AutoencoderKL, Transformer2DModel\nfrom diffusers.pipelines.pipeline_utils import DiffusionPipeline, ImagePipelineOutput\nfrom diffusers.schedulers import DPMSolverMultistepScheduler\nfrom diffusers.utils import (\n    BACKENDS_MAPPING,\n    deprecate,\n    is_bs4_available,\n    is_ftfy_available,\n    logging,\n    replace_example_docstring,\n)\nfrom diffusers.utils.torch_utils import randn_tensor\n\n\nlogger = logging.get_logger(__name__)  # pylint: disable=invalid-name\n\nif is_bs4_available():\n    from bs4 import BeautifulSoup\n\nif is_ftfy_available():\n    import ftfy\n\nEXAMPLE_DOC_STRING = \"\"\"\n    Examples:\n        ```py\n        >>> import torch\n        >>> from diffusers import PixArtAlphaInpaintPipeline\n\n        >>> # You can replace the checkpoint id with \"PixArt-alpha/PixArt-XL-2-512x512\" too.\n        >>> pipe = PixArtAlphaInpaintPipeline.from_pretrained(\"PixArt-alpha/PixArt-XL-2-1024-MS\", torch_dtype=torch.float16)\n        >>> # 
Enable memory optimizations.\n        >>> pipe.enable_model_cpu_offload()\n\n        >>> prompt = \"\"\n        >>> image = Image.open('')\n        >>> image = pipe(prompt,\n                        image=image,\n                        mask_image=mask_image,\n                        strength=1.0).images[0]\n        ```\n\"\"\"\n\nASPECT_RATIO_1024_BIN = {\n    \"0.25\": [512.0, 2048.0],\n    \"0.28\": [512.0, 1856.0],\n    \"0.32\": [576.0, 1792.0],\n    \"0.33\": [576.0, 1728.0],\n    \"0.35\": [576.0, 1664.0],\n    \"0.4\": [640.0, 1600.0],\n    \"0.42\": [640.0, 1536.0],\n    \"0.48\": [704.0, 1472.0],\n    \"0.5\": [704.0, 1408.0],\n    \"0.52\": [704.0, 1344.0],\n    \"0.57\": [768.0, 1344.0],\n    \"0.6\": [768.0, 1280.0],\n    \"0.68\": [832.0, 1216.0],\n    \"0.72\": [832.0, 1152.0],\n    \"0.78\": [896.0, 1152.0],\n    \"0.82\": [896.0, 1088.0],\n    \"0.88\": [960.0, 1088.0],\n    \"0.94\": [960.0, 1024.0],\n    \"1.0\": [1024.0, 1024.0],\n    \"1.07\": [1024.0, 960.0],\n    \"1.13\": [1088.0, 960.0],\n    \"1.21\": [1088.0, 896.0],\n    \"1.29\": [1152.0, 896.0],\n    \"1.38\": [1152.0, 832.0],\n    \"1.46\": [1216.0, 832.0],\n    \"1.67\": [1280.0, 768.0],\n    \"1.75\": [1344.0, 768.0],\n    \"2.0\": [1408.0, 704.0],\n    \"2.09\": [1472.0, 704.0],\n    \"2.4\": [1536.0, 640.0],\n    \"2.5\": [1600.0, 640.0],\n    \"3.0\": [1728.0, 576.0],\n    \"4.0\": [2048.0, 512.0],\n}\n\nASPECT_RATIO_512_BIN = {\n    \"0.25\": [256.0, 1024.0],\n    \"0.28\": [256.0, 928.0],\n    \"0.32\": [288.0, 896.0],\n    \"0.33\": [288.0, 864.0],\n    \"0.35\": [288.0, 832.0],\n    \"0.4\": [320.0, 800.0],\n    \"0.42\": [320.0, 768.0],\n    \"0.48\": [352.0, 736.0],\n    \"0.5\": [352.0, 704.0],\n    \"0.52\": [352.0, 672.0],\n    \"0.57\": [384.0, 672.0],\n    \"0.6\": [384.0, 640.0],\n    \"0.68\": [416.0, 608.0],\n    \"0.72\": [416.0, 576.0],\n    \"0.78\": [448.0, 576.0],\n    \"0.82\": [448.0, 544.0],\n    \"0.88\": [480.0, 544.0],\n    \"0.94\": [480.0, 512.0],\n    
\"1.0\": [512.0, 512.0],\n    \"1.07\": [512.0, 480.0],\n    \"1.13\": [544.0, 480.0],\n    \"1.21\": [544.0, 448.0],\n    \"1.29\": [576.0, 448.0],\n    \"1.38\": [576.0, 416.0],\n    \"1.46\": [608.0, 416.0],\n    \"1.67\": [640.0, 384.0],\n    \"1.75\": [672.0, 384.0],\n    \"2.0\": [704.0, 352.0],\n    \"2.09\": [736.0, 352.0],\n    \"2.4\": [768.0, 320.0],\n    \"2.5\": [800.0, 320.0],\n    \"3.0\": [864.0, 288.0],\n    \"4.0\": [1024.0, 256.0],\n}\n\n\n# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.retrieve_timesteps\ndef retrieve_timesteps(\n    scheduler,\n    num_inference_steps: Optional[int] = None,\n    device: Optional[Union[str, torch.device]] = None,\n    timesteps: Optional[List[int]] = None,\n    **kwargs,\n):\n    \"\"\"\n    Calls the scheduler's `set_timesteps` method and retrieves timesteps from the scheduler after the call. Handles\n    custom timesteps. Any kwargs will be supplied to `scheduler.set_timesteps`.\n\n    Args:\n        scheduler (`SchedulerMixin`):\n            The scheduler to get timesteps from.\n        num_inference_steps (`int`):\n            The number of diffusion steps used when generating samples with a pre-trained model. If used,\n            `timesteps` must be `None`.\n        device (`str` or `torch.device`, *optional*):\n            The device to which the timesteps should be moved to. If `None`, the timesteps are not moved.\n        timesteps (`List[int]`, *optional*):\n                Custom timesteps used to support arbitrary spacing between timesteps. If `None`, then the default\n                timestep spacing strategy of the scheduler is used. 
If `timesteps` is passed, `num_inference_steps`\n                must be `None`.\n\n    Returns:\n        `Tuple[torch.Tensor, int]`: A tuple where the first element is the timestep schedule from the scheduler and the\n        second element is the number of inference steps.\n    \"\"\"\n    if timesteps is not None:\n        accepts_timesteps = \"timesteps\" in set(inspect.signature(scheduler.set_timesteps).parameters.keys())\n        if not accepts_timesteps:\n            raise ValueError(\n                f\"The current scheduler class {scheduler.__class__}'s `set_timesteps` does not support custom\"\n                f\" timestep schedules. Please check whether you are using the correct scheduler.\"\n            )\n        scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)\n        timesteps = scheduler.timesteps\n        num_inference_steps = len(timesteps)\n    else:\n        scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)\n        timesteps = scheduler.timesteps\n    return timesteps, num_inference_steps\n\n\n# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.retrieve_latents\ndef retrieve_latents(\n    encoder_output: torch.Tensor, generator: Optional[torch.Generator] = None, sample_mode: str = \"sample\"\n):\n    if hasattr(encoder_output, \"latent_dist\") and sample_mode == \"sample\":\n        return encoder_output.latent_dist.sample(generator)\n    elif hasattr(encoder_output, \"latent_dist\") and sample_mode == \"argmax\":\n        return encoder_output.latent_dist.mode()\n    elif hasattr(encoder_output, \"latents\"):\n        return encoder_output.latents\n    else:\n        raise AttributeError(\"Could not access latents of provided encoder_output\")\n\n\nclass PixArtAlphaInpaintPipeline(DiffusionPipeline):\n    r\"\"\"\n    Pipeline for text-guided image inpainting using PixArt-Alpha.\n\n    This model inherits from [`DiffusionPipeline`]. 
Check the superclass documentation for the generic methods the\n    library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)\n\n    Args:\n        vae ([`AutoencoderKL`]):\n            Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.\n        text_encoder ([`T5EncoderModel`]):\n            Frozen text-encoder. PixArt-Alpha uses\n            [T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5EncoderModel), specifically the\n            [t5-v1_1-xxl](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/t5-v1_1-xxl) variant.\n        tokenizer (`T5Tokenizer`):\n            Tokenizer of class\n            [T5Tokenizer](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Tokenizer).\n        transformer ([`Transformer2DModel`]):\n            A text conditioned `Transformer2DModel` to denoise the encoded image latents.\n        scheduler ([`SchedulerMixin`]):\n            A scheduler to be used in combination with `transformer` to denoise the encoded image latents.\n    \"\"\"\n\n    bad_punct_regex = re.compile(\n        r\"[\"\n        + \"#®•©™&@·º½¾¿¡§~\"\n        + r\"\\)\"\n        + r\"\\(\"\n        + r\"\\]\"\n        + r\"\\[\"\n        + r\"\\}\"\n        + r\"\\{\"\n        + r\"\\|\"\n        + \"\\\\\"\n        + r\"\\/\"\n        + r\"\\*\"\n        + r\"]{1,}\"\n    )  # noqa\n\n    _optional_components = [\"tokenizer\", \"text_encoder\"]\n    model_cpu_offload_seq = \"text_encoder->transformer->vae\"\n\n    def __init__(\n        self,\n        tokenizer: T5Tokenizer,\n        text_encoder: T5EncoderModel,\n        vae: AutoencoderKL,\n        transformer: Transformer2DModel,\n        scheduler: DPMSolverMultistepScheduler,\n    ):\n        super().__init__()\n\n        self.register_modules(\n            tokenizer=tokenizer, text_encoder=text_encoder, vae=vae, transformer=transformer, 
scheduler=scheduler\n        )\n\n        self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1)\n        self.image_processor = PixArtImageProcessor(vae_scale_factor=self.vae_scale_factor)\n        self.mask_processor = VaeImageProcessor(\n            vae_scale_factor=self.vae_scale_factor, do_normalize=False, do_binarize=True, do_convert_grayscale=True\n        )\n\n    # Adapted from https://github.com/PixArt-alpha/PixArt-alpha/blob/master/diffusion/model/utils.py\n    def mask_text_embeddings(self, emb, mask):\n        if emb.shape[0] == 1:\n            keep_index = mask.sum().item()\n            return emb[:, :, :keep_index, :], keep_index\n        else:\n            masked_feature = emb * mask[:, None, :, None]\n            return masked_feature, emb.shape[2]\n\n    # Adapted from diffusers.pipelines.deepfloyd_if.pipeline_if.encode_prompt\n    def encode_prompt(\n        self,\n        prompt: Union[str, List[str]],\n        do_classifier_free_guidance: bool = True,\n        negative_prompt: str = \"\",\n        num_images_per_prompt: int = 1,\n        device: Optional[torch.device] = None,\n        prompt_embeds: Optional[torch.FloatTensor] = None,\n        negative_prompt_embeds: Optional[torch.FloatTensor] = None,\n        prompt_attention_mask: Optional[torch.FloatTensor] = None,\n        negative_prompt_attention_mask: Optional[torch.FloatTensor] = None,\n        clean_caption: bool = False,\n        **kwargs,\n    ):\n        r\"\"\"\n        Encodes the prompt into text encoder hidden states.\n\n        Args:\n            prompt (`str` or `List[str]`, *optional*):\n                prompt to be encoded\n            negative_prompt (`str` or `List[str]`, *optional*):\n                The prompt not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds`\n                instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`). 
For\n                PixArt-Alpha, this should be \"\".\n            do_classifier_free_guidance (`bool`, *optional*, defaults to `True`):\n                whether to use classifier free guidance or not\n            num_images_per_prompt (`int`, *optional*, defaults to 1):\n                number of images that should be generated per prompt\n            device (`torch.device`, *optional*):\n                torch device to place the resulting embeddings on\n            prompt_embeds (`torch.FloatTensor`, *optional*):\n                Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not\n                provided, text embeddings will be generated from `prompt` input argument.\n            negative_prompt_embeds (`torch.FloatTensor`, *optional*):\n                Pre-generated negative text embeddings. For PixArt-Alpha, this should be the embeddings of the \"\"\n                string.\n            clean_caption (`bool`, defaults to `False`):\n                If `True`, the function will preprocess and clean the provided caption before encoding.\n        \"\"\"\n\n        if \"mask_feature\" in kwargs:\n            deprecation_message = \"The use of `mask_feature` is deprecated. It is no longer used in any computation and that doesn't affect the end results. It will be removed in a future version.\"\n            deprecate(\"mask_feature\", \"1.0.0\", deprecation_message, standard_warn=False)\n\n        if device is None:\n            device = self._execution_device\n\n        if prompt is not None and isinstance(prompt, str):\n            batch_size = 1\n        elif prompt is not None and isinstance(prompt, list):\n            batch_size = len(prompt)\n        else:\n            batch_size = prompt_embeds.shape[0]\n\n        # See Section 3.1. 
of the paper.\n        max_length = 120\n\n        if prompt_embeds is None:\n            prompt = self._text_preprocessing(prompt, clean_caption=clean_caption)\n            text_inputs = self.tokenizer(\n                prompt,\n                padding=\"max_length\",\n                max_length=max_length,\n                truncation=True,\n                add_special_tokens=True,\n                return_tensors=\"pt\",\n            )\n            text_input_ids = text_inputs.input_ids\n            untruncated_ids = self.tokenizer(prompt, padding=\"longest\", return_tensors=\"pt\").input_ids\n\n            if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal(\n                text_input_ids, untruncated_ids\n            ):\n                removed_text = self.tokenizer.batch_decode(untruncated_ids[:, max_length - 1 : -1])\n                logger.warning(\n                    \"The following part of your input was truncated because T5 can only handle sequences up to\"\n                    f\" {max_length} tokens: {removed_text}\"\n                )\n\n            prompt_attention_mask = text_inputs.attention_mask\n            prompt_attention_mask = prompt_attention_mask.to(device)\n\n            prompt_embeds = self.text_encoder(text_input_ids.to(device), attention_mask=prompt_attention_mask)\n            prompt_embeds = prompt_embeds[0]\n\n        if self.text_encoder is not None:\n            dtype = self.text_encoder.dtype\n        elif self.transformer is not None:\n            dtype = self.transformer.dtype\n        else:\n            dtype = None\n\n        prompt_embeds = prompt_embeds.to(dtype=dtype, device=device)\n\n        bs_embed, seq_len, _ = prompt_embeds.shape\n        # duplicate text embeddings and attention mask for each generation per prompt, using mps friendly method\n        prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)\n        prompt_embeds = prompt_embeds.view(bs_embed * num_images_per_prompt, 
seq_len, -1)\n        prompt_attention_mask = prompt_attention_mask.view(bs_embed, -1)\n        prompt_attention_mask = prompt_attention_mask.repeat(num_images_per_prompt, 1)\n\n        # get unconditional embeddings for classifier free guidance\n        if do_classifier_free_guidance and negative_prompt_embeds is None:\n            uncond_tokens = [negative_prompt] * batch_size\n            uncond_tokens = self._text_preprocessing(uncond_tokens, clean_caption=clean_caption)\n            max_length = prompt_embeds.shape[1]\n            uncond_input = self.tokenizer(\n                uncond_tokens,\n                padding=\"max_length\",\n                max_length=max_length,\n                truncation=True,\n                return_attention_mask=True,\n                add_special_tokens=True,\n                return_tensors=\"pt\",\n            )\n            negative_prompt_attention_mask = uncond_input.attention_mask\n            negative_prompt_attention_mask = negative_prompt_attention_mask.to(device)\n\n            negative_prompt_embeds = self.text_encoder(\n                uncond_input.input_ids.to(device), attention_mask=negative_prompt_attention_mask\n            )\n            negative_prompt_embeds = negative_prompt_embeds[0]\n\n        if do_classifier_free_guidance:\n            # duplicate unconditional embeddings for each generation per prompt, using mps friendly method\n            seq_len = negative_prompt_embeds.shape[1]\n\n            negative_prompt_embeds = negative_prompt_embeds.to(dtype=dtype, device=device)\n\n            negative_prompt_embeds = negative_prompt_embeds.repeat(1, num_images_per_prompt, 1)\n            negative_prompt_embeds = negative_prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1)\n\n            negative_prompt_attention_mask = negative_prompt_attention_mask.view(bs_embed, -1)\n            negative_prompt_attention_mask = negative_prompt_attention_mask.repeat(num_images_per_prompt, 1)\n        else:\n  
          negative_prompt_embeds = None\n            negative_prompt_attention_mask = None\n\n        return prompt_embeds, prompt_attention_mask, negative_prompt_embeds, negative_prompt_attention_mask\n\n    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_extra_step_kwargs\n    def prepare_extra_step_kwargs(self, generator, eta):\n        # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature\n        # eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.\n        # eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502\n        # and should be between [0, 1]\n\n        accepts_eta = \"eta\" in set(inspect.signature(self.scheduler.step).parameters.keys())\n        extra_step_kwargs = {}\n        if accepts_eta:\n            extra_step_kwargs[\"eta\"] = eta\n\n        # check if the scheduler accepts generator\n        accepts_generator = \"generator\" in set(inspect.signature(self.scheduler.step).parameters.keys())\n        if accepts_generator:\n            extra_step_kwargs[\"generator\"] = generator\n        return extra_step_kwargs\n\n    def check_inputs(\n        self,\n        prompt,\n        height,\n        width,\n        negative_prompt,\n        callback_steps,\n        prompt_embeds=None,\n        negative_prompt_embeds=None,\n        prompt_attention_mask=None,\n        negative_prompt_attention_mask=None,\n    ):\n        if height % 8 != 0 or width % 8 != 0:\n            raise ValueError(f\"`height` and `width` have to be divisible by 8 but are {height} and {width}.\")\n\n        if (callback_steps is None) or (\n            callback_steps is not None and (not isinstance(callback_steps, int) or callback_steps <= 0)\n        ):\n            raise ValueError(\n                f\"`callback_steps` has to be a positive integer but is {callback_steps} of type\"\n                f\" 
{type(callback_steps)}.\"\n            )\n\n        if prompt is not None and prompt_embeds is not None:\n            raise ValueError(\n                f\"Cannot forward both `prompt`: {prompt} and `prompt_embeds`: {prompt_embeds}. Please make sure to\"\n                \" only forward one of the two.\"\n            )\n        elif prompt is None and prompt_embeds is None:\n            raise ValueError(\n                \"Provide either `prompt` or `prompt_embeds`. Cannot leave both `prompt` and `prompt_embeds` undefined.\"\n            )\n        elif prompt is not None and (not isinstance(prompt, str) and not isinstance(prompt, list)):\n            raise ValueError(f\"`prompt` has to be of type `str` or `list` but is {type(prompt)}\")\n\n        if prompt is not None and negative_prompt_embeds is not None:\n            raise ValueError(\n                f\"Cannot forward both `prompt`: {prompt} and `negative_prompt_embeds`:\"\n                f\" {negative_prompt_embeds}. Please make sure to only forward one of the two.\"\n            )\n\n        if negative_prompt is not None and negative_prompt_embeds is not None:\n            raise ValueError(\n                f\"Cannot forward both `negative_prompt`: {negative_prompt} and `negative_prompt_embeds`:\"\n                f\" {negative_prompt_embeds}. 
Please make sure to only forward one of the two.\"\n            )\n\n        if prompt_embeds is not None and prompt_attention_mask is None:\n            raise ValueError(\"Must provide `prompt_attention_mask` when specifying `prompt_embeds`.\")\n\n        if negative_prompt_embeds is not None and negative_prompt_attention_mask is None:\n            raise ValueError(\"Must provide `negative_prompt_attention_mask` when specifying `negative_prompt_embeds`.\")\n\n        if prompt_embeds is not None and negative_prompt_embeds is not None:\n            if prompt_embeds.shape != negative_prompt_embeds.shape:\n                raise ValueError(\n                    \"`prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but\"\n                    f\" got: `prompt_embeds` {prompt_embeds.shape} != `negative_prompt_embeds`\"\n                    f\" {negative_prompt_embeds.shape}.\"\n                )\n            if prompt_attention_mask.shape != negative_prompt_attention_mask.shape:\n                raise ValueError(\n                    \"`prompt_attention_mask` and `negative_prompt_attention_mask` must have the same shape when passed directly, but\"\n                    f\" got: `prompt_attention_mask` {prompt_attention_mask.shape} != `negative_prompt_attention_mask`\"\n                    f\" {negative_prompt_attention_mask.shape}.\"\n                )\n\n    # Copied from diffusers.pipelines.deepfloyd_if.pipeline_if.IFPipeline._text_preprocessing\n    def _text_preprocessing(self, text, clean_caption=False):\n        if clean_caption and not is_bs4_available():\n            logger.warn(BACKENDS_MAPPING[\"bs4\"][-1].format(\"Setting `clean_caption=True`\"))\n            logger.warn(\"Setting `clean_caption` to False...\")\n            clean_caption = False\n\n        if clean_caption and not is_ftfy_available():\n            logger.warn(BACKENDS_MAPPING[\"ftfy\"][-1].format(\"Setting `clean_caption=True`\"))\n            
logger.warn(\"Setting `clean_caption` to False...\")\n            clean_caption = False\n\n        if not isinstance(text, (tuple, list)):\n            text = [text]\n\n        def process(text: str):\n            if clean_caption:\n                text = self._clean_caption(text)\n                text = self._clean_caption(text)\n            else:\n                text = text.lower().strip()\n            return text\n\n        return [process(t) for t in text]\n\n    # Copied from diffusers.pipelines.deepfloyd_if.pipeline_if.IFPipeline._clean_caption\n    def _clean_caption(self, caption):\n        caption = str(caption)\n        caption = ul.unquote_plus(caption)\n        caption = caption.strip().lower()\n        caption = re.sub(\"<person>\", \"person\", caption)\n        # urls:\n        caption = re.sub(\n            r\"\\b((?:https?:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",\n            # noqa\n            \"\",\n            caption,\n        )  # regex for urls\n        caption = re.sub(\n            r\"\\b((?:www:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",\n            # noqa\n            \"\",\n            caption,\n        )  # regex for urls\n        # html:\n        caption = BeautifulSoup(caption, features=\"html.parser\").text\n\n        # @<nickname>\n        caption = re.sub(r\"@[\\w\\d]+\\b\", \"\", caption)\n\n        # 31C0—31EF CJK Strokes\n        # 31F0—31FF Katakana Phonetic Extensions\n        # 3200—32FF Enclosed CJK Letters and Months\n        # 3300—33FF CJK Compatibility\n        # 3400—4DBF CJK Unified Ideographs Extension A\n        # 4DC0—4DFF Yijing Hexagram Symbols\n        # 4E00—9FFF CJK Unified Ideographs\n        caption = re.sub(r\"[\\u31c0-\\u31ef]+\", \"\", caption)\n        caption = re.sub(r\"[\\u31f0-\\u31ff]+\", \"\", caption)\n        caption = re.sub(r\"[\\u3200-\\u32ff]+\", \"\", caption)\n        
caption = re.sub(r\"[\\u3300-\\u33ff]+\", \"\", caption)\n        caption = re.sub(r\"[\\u3400-\\u4dbf]+\", \"\", caption)\n        caption = re.sub(r\"[\\u4dc0-\\u4dff]+\", \"\", caption)\n        caption = re.sub(r\"[\\u4e00-\\u9fff]+\", \"\", caption)\n        #######################################################\n\n        # все виды тире / all types of dash --> \"-\"\n        caption = re.sub(\n            r\"[\\u002D\\u058A\\u05BE\\u1400\\u1806\\u2010-\\u2015\\u2E17\\u2E1A\\u2E3A\\u2E3B\\u2E40\\u301C\\u3030\\u30A0\\uFE31\\uFE32\\uFE58\\uFE63\\uFF0D]+\",\n            # noqa\n            \"-\",\n            caption,\n        )\n\n        # кавычки к одному стандарту\n        caption = re.sub(r\"[`´«»“”¨]\", '\"', caption)\n        caption = re.sub(r\"[‘’]\", \"'\", caption)\n\n        # &quot;\n        caption = re.sub(r\"&quot;?\", \"\", caption)\n        # &amp\n        caption = re.sub(r\"&amp\", \"\", caption)\n\n        # ip adresses:\n        caption = re.sub(r\"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\", \" \", caption)\n\n        # article ids:\n        caption = re.sub(r\"\\d:\\d\\d\\s+$\", \"\", caption)\n\n        # \\n\n        caption = re.sub(r\"\\\\n\", \" \", caption)\n\n        # \"#123\"\n        caption = re.sub(r\"#\\d{1,3}\\b\", \"\", caption)\n        # \"#12345..\"\n        caption = re.sub(r\"#\\d{5,}\\b\", \"\", caption)\n        # \"123456..\"\n        caption = re.sub(r\"\\b\\d{6,}\\b\", \"\", caption)\n        # filenames:\n        caption = re.sub(r\"[\\S]+\\.(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)\", \"\", caption)\n\n        #\n        caption = re.sub(r\"[\\\"\\']{2,}\", r'\"', caption)  # \"\"\"AUSVERKAUFT\"\"\"\n        caption = re.sub(r\"[\\.]{2,}\", r\" \", caption)  # \"\"\"AUSVERKAUFT\"\"\"\n\n        caption = re.sub(self.bad_punct_regex, r\" \", caption)  # ***AUSVERKAUFT***, #AUSVERKAUFT\n        caption = re.sub(r\"\\s+\\.\\s+\", r\" \", caption)  # \" . 
\"\n\n        # this-is-my-cute-cat / this_is_my_cute_cat\n        regex2 = re.compile(r\"(?:\\-|\\_)\")\n        if len(re.findall(regex2, caption)) > 3:\n            caption = re.sub(regex2, \" \", caption)\n\n        caption = ftfy.fix_text(caption)\n        caption = html.unescape(html.unescape(caption))\n\n        caption = re.sub(r\"\\b[a-zA-Z]{1,3}\\d{3,15}\\b\", \"\", caption)  # jc6640\n        caption = re.sub(r\"\\b[a-zA-Z]+\\d+[a-zA-Z]+\\b\", \"\", caption)  # jc6640vc\n        caption = re.sub(r\"\\b\\d+[a-zA-Z]+\\d+\\b\", \"\", caption)  # 6640vc231\n\n        caption = re.sub(r\"(worldwide\\s+)?(free\\s+)?shipping\", \"\", caption)\n        caption = re.sub(r\"(free\\s)?download(\\sfree)?\", \"\", caption)\n        caption = re.sub(r\"\\bclick\\b\\s(?:for|on)\\s\\w+\", \"\", caption)\n        caption = re.sub(r\"\\b(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)(\\simage[s]?)?\", \"\", caption)\n        caption = re.sub(r\"\\bpage\\s+\\d+\\b\", \"\", caption)\n\n        caption = re.sub(r\"\\b\\d*[a-zA-Z]+\\d+[a-zA-Z]+\\d+[a-zA-Z\\d]*\\b\", r\" \", caption)  # j2d1a2a...\n\n        caption = re.sub(r\"\\b\\d+\\.?\\d*[xх×]\\d+\\.?\\d*\\b\", \"\", caption)\n\n        caption = re.sub(r\"\\b\\s+\\:\\s+\", r\": \", caption)\n        caption = re.sub(r\"(\\D[,\\./])\\b\", r\"\\1 \", caption)\n        caption = re.sub(r\"\\s+\", \" \", caption)\n\n        caption.strip()\n\n        caption = re.sub(r\"^[\\\"\\']([\\w\\W]+)[\\\"\\']$\", r\"\\1\", caption)\n        caption = re.sub(r\"^[\\'\\_,\\-\\:;]\", r\"\", caption)\n        caption = re.sub(r\"[\\'\\_,\\-\\:\\-\\+]$\", r\"\", caption)\n        caption = re.sub(r\"^\\.\\S+$\", \"\", caption)\n\n        return caption.strip()\n\n    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents\n    def prepare_latents(\n        self,\n        batch_size,\n        num_channels_latents,\n        height,\n        width,\n        dtype,\n        device,\n      
  generator,\n        latents=None,\n        image=None,\n        timestep=None,\n        is_strength_max=True,\n        return_image_latents=True,\n    ):\n        shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)\n        if isinstance(generator, list) and len(generator) != batch_size:\n            raise ValueError(\n                f\"You have passed a list of generators of length {len(generator)}, but requested an effective batch\"\n                f\" size of {batch_size}. Make sure the batch size matches the length of the generators.\"\n            )\n\n        if (image is None or timestep is None) and not is_strength_max:\n            raise ValueError(\n                \"Since strength < 1. initial latents are to be initialised as a combination of Image + Noise.\"\n                \"However, either the image or the noise timestep has not been provided.\"\n            )\n\n        if return_image_latents or (latents is None and not is_strength_max):\n            image = image.to(device=device, dtype=dtype)\n\n            if image.shape[1] == 4:\n                image_latents = image\n            else:\n                image_latents = self._encode_vae_image(image=image, generator=generator)\n            image_latents = image_latents.repeat(batch_size // image_latents.shape[0], 1, 1, 1)\n\n        if latents is None:\n            noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)\n            # if strength is 1. 
then initialise the latents to noise, else initialise to image + noise\n            latents = noise if is_strength_max else self.scheduler.add_noise(image_latents, noise, timestep)\n            # if pure noise then scale the initial latents by the scheduler's init sigma\n            latents = latents * self.scheduler.init_noise_sigma if is_strength_max else latents\n        else:\n            noise = latents.to(device)\n            latents = noise * self.scheduler.init_noise_sigma\n\n        # `latents` is already scaled by `init_noise_sigma` in both branches above, so it is not scaled again here\n        return latents, noise, image_latents\n\n    def _encode_vae_image(self, image: torch.Tensor, generator: torch.Generator):\n        if isinstance(generator, list):\n            image_latents = [\n                retrieve_latents(self.vae.encode(image[i : i + 1]), generator=generator[i])\n                for i in range(image.shape[0])\n            ]\n            image_latents = torch.cat(image_latents, dim=0)\n        else:\n            image_latents = retrieve_latents(self.vae.encode(image), generator=generator)\n\n        image_latents = self.vae.config.scaling_factor * image_latents\n\n        return image_latents\n\n    def prepare_mask_latents(\n        self, mask, batch_size, height, width, dtype, device, generator, do_classifier_free_guidance\n    ):\n        # resize the mask to latents shape as we concatenate the mask to the latents\n        # we do that before converting to dtype to avoid breaking in case we're using cpu_offload\n        # and half precision\n        mask = torch.nn.functional.interpolate(\n            mask, size=(height // self.vae_scale_factor, width // self.vae_scale_factor)\n        )\n        mask = mask.to(device=device, dtype=dtype)\n\n        if mask.shape[0] < batch_size:\n            if not batch_size % mask.shape[0] == 0:\n                raise ValueError(\n                    \"The passed mask and 
the required batch size don't match. Masks are supposed to be duplicated to\"\n                    f\" a total batch size of {batch_size}, but {mask.shape[0]} masks were passed. Make sure the number\"\n                    \" of masks that you pass is divisible by the total requested batch size.\"\n                )\n            mask = mask.repeat(batch_size // mask.shape[0], 1, 1, 1)\n\n        mask = torch.cat([mask] * 2) if do_classifier_free_guidance else mask\n\n        return mask\n\n    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.StableDiffusionImg2ImgPipeline.get_timesteps\n    def get_timesteps(self, num_inference_steps, strength, device):\n        # get the original timestep using init_timestep\n        init_timestep = min(int(num_inference_steps * strength), num_inference_steps)\n\n        t_start = max(num_inference_steps - init_timestep, 0)\n        timesteps = self.scheduler.timesteps[t_start * self.scheduler.order :]\n\n        return timesteps, num_inference_steps - t_start\n\n    @torch.no_grad()\n    @replace_example_docstring(EXAMPLE_DOC_STRING)\n    def __call__(\n        self,\n        prompt: Union[str, List[str]] = None,\n        image: PipelineImageInput = None,\n        mask_image: PipelineImageInput = None,\n        strength: float = 1.0,\n        negative_prompt: str = \"\",\n        num_inference_steps: int = 20,\n        timesteps: List[int] = None,\n        guidance_scale: float = 4.5,\n        num_images_per_prompt: Optional[int] = 1,\n        height: Optional[int] = None,\n        width: Optional[int] = None,\n        eta: float = 0.0,\n        generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,\n        latents: Optional[torch.FloatTensor] = None,\n        prompt_embeds: Optional[torch.FloatTensor] = None,\n        prompt_attention_mask: Optional[torch.FloatTensor] = None,\n        negative_prompt_embeds: Optional[torch.FloatTensor] = None,\n        
negative_prompt_attention_mask: Optional[torch.FloatTensor] = None,\n        output_type: Optional[str] = \"pil\",\n        return_dict: bool = True,\n        callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,\n        callback_steps: int = 1,\n        clean_caption: bool = True,\n        use_resolution_binning: bool = True,\n        **kwargs,\n    ) -> Union[ImagePipelineOutput, Tuple]:\n        \"\"\"\n        Function invoked when calling the pipeline for generation.\n\n        Args:\n            prompt (`str` or `List[str]`, *optional*):\n                The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`\n                instead.\n            image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):\n                `Image`, numpy array or tensor representing an image batch to be inpainted (the parts of the image\n                masked out with `mask_image` are repainted according to `prompt`). For both numpy array and pytorch\n                tensor, the expected value range is `[0, 1]`. If it's a tensor or a list of tensors, the\n                expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a list of arrays, the\n                expected shape should be `(B, H, W, C)` or `(H, W, C)`. Image latents can also be passed as `image`;\n                latents passed directly are not encoded again.\n            mask_image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):\n                `Image`, numpy array or tensor representing an image batch to mask `image`. White pixels in the mask\n                are repainted while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a\n                single channel (luminance) before use. 
If it's a numpy array or pytorch tensor, it should contain one\n                color channel (L) instead of 3, so the expected shape for a pytorch tensor would be `(B, 1, H, W)`, `(B,\n                H, W)`, `(1, H, W)`, or `(H, W)`. For a numpy array, the expected shape would be `(B, H, W, 1)`, `(B, H, W)`, `(H, W,\n                1)`, or `(H, W)`.\n            negative_prompt (`str` or `List[str]`, *optional*):\n                The prompt or prompts not to guide the image generation. If not defined, one has to pass\n                `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is\n                not greater than `1`).\n            num_inference_steps (`int`, *optional*, defaults to 20):\n                The number of denoising steps. More denoising steps usually lead to a higher quality image at the\n                expense of slower inference.\n            timesteps (`List[int]`, *optional*):\n                Custom timesteps to use for the denoising process. If not defined, equally spaced `num_inference_steps`\n                timesteps are used. Must be in descending order.\n            guidance_scale (`float`, *optional*, defaults to 4.5):\n                Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).\n                `guidance_scale` is defined as `w` of equation 2. of [Imagen\n                Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >\n                1`. 
A higher guidance scale encourages the model to generate images that are closely linked to the text `prompt`,\n                usually at the expense of lower image quality.\n            num_images_per_prompt (`int`, *optional*, defaults to 1):\n                The number of images to generate per prompt.\n            height (`int`, *optional*, defaults to self.transformer.config.sample_size * self.vae_scale_factor):\n                The height in pixels of the generated image.\n            width (`int`, *optional*, defaults to self.transformer.config.sample_size * self.vae_scale_factor):\n                The width in pixels of the generated image.\n            eta (`float`, *optional*, defaults to 0.0):\n                Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to\n                [`schedulers.DDIMScheduler`], will be ignored for others.\n            generator (`torch.Generator` or `List[torch.Generator]`, *optional*):\n                One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)\n                to make generation deterministic.\n            latents (`torch.FloatTensor`, *optional*):\n                Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image\n                generation. Can be used to tweak the same generation with different prompts. If not provided, a latents\n                tensor will be generated by sampling using the supplied random `generator`.\n            prompt_embeds (`torch.FloatTensor`, *optional*):\n                Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not\n                provided, text embeddings will be generated from `prompt` input argument.\n            prompt_attention_mask (`torch.FloatTensor`, *optional*): Pre-generated attention mask for text embeddings.\n            negative_prompt_embeds (`torch.FloatTensor`, *optional*):\n                Pre-generated negative text embeddings. 
For PixArt-Alpha this negative prompt should be \"\". If not\n                provided, `negative_prompt_embeds` will be generated from the `negative_prompt` input argument.\n            negative_prompt_attention_mask (`torch.FloatTensor`, *optional*):\n                Pre-generated attention mask for negative text embeddings.\n            output_type (`str`, *optional*, defaults to `\"pil\"`):\n                The output format of the generated image. Choose between\n                [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.\n            return_dict (`bool`, *optional*, defaults to `True`):\n                Whether or not to return a [`~pipelines.ImagePipelineOutput`] instead of a plain tuple.\n            callback (`Callable`, *optional*):\n                A function that will be called every `callback_steps` steps during inference. The function will be\n                called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.\n            callback_steps (`int`, *optional*, defaults to 1):\n                The frequency at which the `callback` function will be called. If not specified, the callback will be\n                called at every step.\n            clean_caption (`bool`, *optional*, defaults to `True`):\n                Whether or not to clean the caption before creating embeddings. Requires `beautifulsoup4` and `ftfy` to\n                be installed. If the dependencies are not installed, the embeddings will be created from the raw\n                prompt.\n            use_resolution_binning (`bool`, defaults to `True`):\n                If set to `True`, the requested height and width are first mapped to the closest resolutions using\n                `ASPECT_RATIO_1024_BIN`. After the produced latents are decoded into images, they are resized back to\n                the requested resolution. 
Useful for generating non-square images.\n\n        Examples:\n\n        Returns:\n            [`~pipelines.ImagePipelineOutput`] or `tuple`:\n                If `return_dict` is `True`, [`~pipelines.ImagePipelineOutput`] is returned, otherwise a `tuple` is\n                returned where the first element is a list with the generated images\n        \"\"\"\n        if \"mask_feature\" in kwargs:\n            deprecation_message = \"The use of `mask_feature` is deprecated. It is no longer used in any computation and that doesn't affect the end results. It will be removed in a future version.\"\n            deprecate(\"mask_feature\", \"1.0.0\", deprecation_message, standard_warn=False)\n        # 1. Check inputs. Raise error if not correct\n        height = height or self.transformer.config.sample_size * self.vae_scale_factor\n        width = width or self.transformer.config.sample_size * self.vae_scale_factor\n        if use_resolution_binning:\n            aspect_ratio_bin = (\n                ASPECT_RATIO_1024_BIN if self.transformer.config.sample_size == 128 else ASPECT_RATIO_512_BIN\n            )\n            orig_height, orig_width = height, width\n            height, width = self.image_processor.classify_height_width_bin(height, width, ratios=aspect_ratio_bin)\n\n        self.check_inputs(\n            prompt,\n            height,\n            width,\n            negative_prompt,\n            callback_steps,\n            prompt_embeds,\n            negative_prompt_embeds,\n            prompt_attention_mask,\n            negative_prompt_attention_mask,\n        )\n\n        # 2. 
Default height and width to transformer\n        if prompt is not None and isinstance(prompt, str):\n            batch_size = 1\n        elif prompt is not None and isinstance(prompt, list):\n            batch_size = len(prompt)\n        else:\n            batch_size = prompt_embeds.shape[0]\n\n        device = self._execution_device\n\n        # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)\n        # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`\n        # corresponds to doing no classifier free guidance.\n        do_classifier_free_guidance = guidance_scale > 1.0\n\n        # 3. Encode input prompt\n        (\n            prompt_embeds,\n            prompt_attention_mask,\n            negative_prompt_embeds,\n            negative_prompt_attention_mask,\n        ) = self.encode_prompt(\n            prompt,\n            do_classifier_free_guidance,\n            negative_prompt=negative_prompt,\n            num_images_per_prompt=num_images_per_prompt,\n            device=device,\n            prompt_embeds=prompt_embeds,\n            negative_prompt_embeds=negative_prompt_embeds,\n            prompt_attention_mask=prompt_attention_mask,\n            negative_prompt_attention_mask=negative_prompt_attention_mask,\n            clean_caption=clean_caption,\n        )\n        if do_classifier_free_guidance:\n            prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)\n            prompt_attention_mask = torch.cat([negative_prompt_attention_mask, prompt_attention_mask], dim=0)\n\n        # 4. Prepare timesteps\n        timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, timesteps)\n        timesteps, num_inference_steps = self.get_timesteps(\n            num_inference_steps=num_inference_steps, strength=strength, device=device\n        )\n\n        # at which timestep to set the initial noise (n.b. 
50% if strength is 0.5)\n        latent_timestep = timesteps[:1].repeat(batch_size * num_images_per_prompt)\n        # create a boolean to check if the strength is set to 1. if so then initialise the latents with pure noise\n        is_strength_max = strength == 1.0\n        init_image = self.image_processor.preprocess(image, height=height, width=width)\n        init_image = init_image.to(dtype=torch.float32)\n\n        # 5. Prepare latents.\n        latent_channels = self.transformer.config.in_channels\n        latents_outputs = self.prepare_latents(\n            batch_size * num_images_per_prompt,\n            latent_channels,\n            height,\n            width,\n            prompt_embeds.dtype,\n            device,\n            generator,\n            latents,\n            image=init_image,\n            timestep=latent_timestep,\n            is_strength_max=is_strength_max,\n        )\n        latents, noise, image_latents = latents_outputs\n\n        mask_condition = self.mask_processor.preprocess(mask_image, height=height, width=width)\n        mask = self.prepare_mask_latents(\n            mask_condition,\n            batch_size * num_images_per_prompt,\n            height,\n            width,\n            prompt_embeds.dtype,\n            device,\n            generator,\n            do_classifier_free_guidance,\n        )\n\n        # 6. Prepare extra step kwargs. 
TODO: Logic should ideally just be moved out of the pipeline\n        extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)\n\n        # 6.1 Prepare micro-conditions.\n        added_cond_kwargs = {\"resolution\": None, \"aspect_ratio\": None}\n        if self.transformer.config.sample_size == 128:\n            resolution = torch.tensor([height, width]).repeat(batch_size * num_images_per_prompt, 1)\n            aspect_ratio = torch.tensor([float(height / width)]).repeat(batch_size * num_images_per_prompt, 1)\n            resolution = resolution.to(dtype=prompt_embeds.dtype, device=device)\n            aspect_ratio = aspect_ratio.to(dtype=prompt_embeds.dtype, device=device)\n            added_cond_kwargs = {\"resolution\": resolution, \"aspect_ratio\": aspect_ratio}\n\n        # 7. Denoising loop\n        num_warmup_steps = max(len(timesteps) - num_inference_steps * self.scheduler.order, 0)\n\n        with self.progress_bar(total=num_inference_steps) as progress_bar:\n            for i, t in enumerate(timesteps):\n                latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents\n                latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)\n\n                current_timestep = t\n                if not torch.is_tensor(current_timestep):\n                    # TODO: this requires sync between CPU and GPU. 
So try to pass timesteps as tensors if you can\n                    # This would be a good case for the `match` statement (Python 3.10+)\n                    is_mps = latent_model_input.device.type == \"mps\"\n                    if isinstance(current_timestep, float):\n                        dtype = torch.float32 if is_mps else torch.float64\n                    else:\n                        dtype = torch.int32 if is_mps else torch.int64\n                    current_timestep = torch.tensor([current_timestep], dtype=dtype, device=latent_model_input.device)\n                elif len(current_timestep.shape) == 0:\n                    current_timestep = current_timestep[None].to(latent_model_input.device)\n                # broadcast to batch dimension in a way that's compatible with ONNX/Core ML\n                current_timestep = current_timestep.expand(latent_model_input.shape[0])\n\n                # predict noise model_output\n                noise_pred = self.transformer(\n                    latent_model_input,\n                    encoder_hidden_states=prompt_embeds,\n                    encoder_attention_mask=prompt_attention_mask,\n                    timestep=current_timestep,\n                    added_cond_kwargs=added_cond_kwargs,\n                    return_dict=False,\n                )[0]\n\n                # perform guidance\n                if do_classifier_free_guidance:\n                    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)\n                    noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)\n\n                # learned sigma\n                if self.transformer.config.out_channels // 2 == latent_channels:\n                    noise_pred = noise_pred.chunk(2, dim=1)[0]\n                else:\n                    noise_pred = noise_pred\n\n                # compute previous image: x_t -> x_t-1\n                latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, 
return_dict=False)[0]\n\n                init_latents_proper = image_latents\n                if do_classifier_free_guidance:\n                    init_mask, _ = mask.chunk(2)\n                else:\n                    init_mask = mask\n\n                if i < len(timesteps) - 1:\n                    noise_timestep = timesteps[i + 1]\n                    init_latents_proper = self.scheduler.add_noise(\n                        init_latents_proper, noise, torch.tensor([noise_timestep])\n                    )\n\n                latents = (1 - init_mask) * init_latents_proper + init_mask * latents\n\n                # call the callback, if provided\n                if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):\n                    progress_bar.update()\n                    if callback is not None and i % callback_steps == 0:\n                        step_idx = i // getattr(self.scheduler, \"order\", 1)\n                        callback(step_idx, t, latents)\n\n        if not output_type == \"latent\":\n            image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]\n            if use_resolution_binning:\n                image = self.image_processor.resize_and_crop_tensor(image, orig_width, orig_height)\n        else:\n            image = latents\n\n        if not output_type == \"latent\":\n            image = self.image_processor.postprocess(image, output_type=output_type)\n\n        # Offload all models\n        self.maybe_free_model_hooks()\n\n        if not return_dict:\n            return (image,)\n\n        return ImagePipelineOutput(images=image)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/scripts/pipeline_pixart_reference.py",
    "content": "# Copyright 2023 PixArt-Alpha Authors and The HuggingFace Team. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\nimport html\nimport inspect\nimport re\nimport urllib.parse as ul\nfrom typing import Callable, List, Optional, Tuple, Union\nfrom PIL import Image\n\nimport torch\nimport torch.nn.functional as F\nfrom transformers import T5EncoderModel, T5Tokenizer\n\nfrom diffusers.image_processor import VaeImageProcessor, PipelineImageInput\nfrom diffusers.models import AutoencoderKL, Transformer2DModel\nfrom diffusers.schedulers import DPMSolverMultistepScheduler\nfrom diffusers.utils import (\n    BACKENDS_MAPPING,\n    deprecate,\n    is_bs4_available,\n    is_ftfy_available,\n    logging,\n    replace_example_docstring,\n)\nfrom diffusers.utils.torch_utils import randn_tensor\nfrom diffusers.pipelines.pipeline_utils import DiffusionPipeline, ImagePipelineOutput\n\nlogger = logging.get_logger(__name__)  # pylint: disable=invalid-name\n\nif is_bs4_available():\n    from bs4 import BeautifulSoup\n\nif is_ftfy_available():\n    import ftfy\n\nEXAMPLE_DOC_STRING = \"\"\"\n    Examples:\n        ```py\n        >>> import PIL\n        >>> from io import BytesIO\n        >>> import requests\n        >>> import torch\n        \n        >>> from diffusers import PixArtAlphaReferencePipeline\n        \n        >>> def download_image(url):\n        ...     response = requests.get(url)\n        ...     
return PIL.Image.open(BytesIO(response.content)).convert(\"RGB\")\n\n        >>> # You can replace the checkpoint id with \"PixArt-alpha/PixArt-XL-2-512x512\" too.\n        >>> pipe = PixArtAlphaReferencePipeline.from_pretrained(\"PixArt-alpha/PixArt-XL-2-1024-MS\", torch_dtype=torch.float16)\n        >>> pipe = pipe.to('cuda')\n        \n        >>> img_url = \"http://p1.qhimgs4.com/t01fef6f9d5e69335dd.jpg\"\n        >>> ref_image = download_image(img_url).crop((0, 0, 2160, 2160)).resize((1024, 1024))\n        >>> image_out = pipe(\n        ...           prompt='',\n        ...            height=1024,\n        ...            width=1024,\n        ...            image=ref_image,\n        ...            num_inference_steps=20,\n        ...            guidance_scale=4.0,\n        ...            ).images[0]\n        ```\n\"\"\"\n\nASPECT_RATIO_1024_BIN = {\n    \"0.25\": [512.0, 2048.0],\n    \"0.28\": [512.0, 1856.0],\n    \"0.32\": [576.0, 1792.0],\n    \"0.33\": [576.0, 1728.0],\n    \"0.35\": [576.0, 1664.0],\n    \"0.4\": [640.0, 1600.0],\n    \"0.42\": [640.0, 1536.0],\n    \"0.48\": [704.0, 1472.0],\n    \"0.5\": [704.0, 1408.0],\n    \"0.52\": [704.0, 1344.0],\n    \"0.57\": [768.0, 1344.0],\n    \"0.6\": [768.0, 1280.0],\n    \"0.68\": [832.0, 1216.0],\n    \"0.72\": [832.0, 1152.0],\n    \"0.78\": [896.0, 1152.0],\n    \"0.82\": [896.0, 1088.0],\n    \"0.88\": [960.0, 1088.0],\n    \"0.94\": [960.0, 1024.0],\n    \"1.0\": [1024.0, 1024.0],\n    \"1.07\": [1024.0, 960.0],\n    \"1.13\": [1088.0, 960.0],\n    \"1.21\": [1088.0, 896.0],\n    \"1.29\": [1152.0, 896.0],\n    \"1.38\": [1152.0, 832.0],\n    \"1.46\": [1216.0, 832.0],\n    \"1.67\": [1280.0, 768.0],\n    \"1.75\": [1344.0, 768.0],\n    \"2.0\": [1408.0, 704.0],\n    \"2.09\": [1472.0, 704.0],\n    \"2.4\": [1536.0, 640.0],\n    \"2.5\": [1600.0, 640.0],\n    \"3.0\": [1728.0, 576.0],\n    \"4.0\": [2048.0, 512.0],\n}\n\nASPECT_RATIO_512_BIN = {\n    \"0.25\": [256.0, 1024.0],\n    \"0.28\": [256.0, 
928.0],\n    \"0.32\": [288.0, 896.0],\n    \"0.33\": [288.0, 864.0],\n    \"0.35\": [288.0, 832.0],\n    \"0.4\": [320.0, 800.0],\n    \"0.42\": [320.0, 768.0],\n    \"0.48\": [352.0, 736.0],\n    \"0.5\": [352.0, 704.0],\n    \"0.52\": [352.0, 672.0],\n    \"0.57\": [384.0, 672.0],\n    \"0.6\": [384.0, 640.0],\n    \"0.68\": [416.0, 608.0],\n    \"0.72\": [416.0, 576.0],\n    \"0.78\": [448.0, 576.0],\n    \"0.82\": [448.0, 544.0],\n    \"0.88\": [480.0, 544.0],\n    \"0.94\": [480.0, 512.0],\n    \"1.0\": [512.0, 512.0],\n    \"1.07\": [512.0, 480.0],\n    \"1.13\": [544.0, 480.0],\n    \"1.21\": [544.0, 448.0],\n    \"1.29\": [576.0, 448.0],\n    \"1.38\": [576.0, 416.0],\n    \"1.46\": [608.0, 416.0],\n    \"1.67\": [640.0, 384.0],\n    \"1.75\": [672.0, 384.0],\n    \"2.0\": [704.0, 352.0],\n    \"2.09\": [736.0, 352.0],\n    \"2.4\": [768.0, 320.0],\n    \"2.5\": [800.0, 320.0],\n    \"3.0\": [864.0, 288.0],\n    \"4.0\": [1024.0, 256.0],\n}\n\n\n# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.retrieve_timesteps\ndef retrieve_timesteps(\n        scheduler,\n        num_inference_steps: Optional[int] = None,\n        device: Optional[Union[str, torch.device]] = None,\n        timesteps: Optional[List[int]] = None,\n        **kwargs,\n):\n    \"\"\"\n    Calls the scheduler's `set_timesteps` method and retrieves timesteps from the scheduler after the call. Handles\n    custom timesteps. Any kwargs will be supplied to `scheduler.set_timesteps`.\n\n    Args:\n        scheduler (`SchedulerMixin`):\n            The scheduler to get timesteps from.\n        num_inference_steps (`int`):\n            The number of diffusion steps used when generating samples with a pre-trained model. If used,\n            `timesteps` must be `None`.\n        device (`str` or `torch.device`, *optional*):\n            The device to which the timesteps should be moved to. 
If `None`, the timesteps are not moved.\n        timesteps (`List[int]`, *optional*):\n                Custom timesteps used to support arbitrary spacing between timesteps. If `None`, then the default\n                timestep spacing strategy of the scheduler is used. If `timesteps` is passed, `num_inference_steps`\n                must be `None`.\n\n    Returns:\n        `Tuple[torch.Tensor, int]`: A tuple where the first element is the timestep schedule from the scheduler and the\n        second element is the number of inference steps.\n    \"\"\"\n    if timesteps is not None:\n        accepts_timesteps = \"timesteps\" in set(inspect.signature(scheduler.set_timesteps).parameters.keys())\n        if not accepts_timesteps:\n            raise ValueError(\n                f\"The current scheduler class {scheduler.__class__}'s `set_timesteps` does not support custom\"\n                f\" timestep schedules. Please check whether you are using the correct scheduler.\"\n            )\n        scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)\n        timesteps = scheduler.timesteps\n        num_inference_steps = len(timesteps)\n    else:\n        scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)\n        timesteps = scheduler.timesteps\n    return timesteps, num_inference_steps\n\n\n# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.retrieve_latents\ndef retrieve_latents(\n        encoder_output: torch.Tensor, generator: Optional[torch.Generator] = None, sample_mode: str = \"sample\"\n):\n    if hasattr(encoder_output, \"latent_dist\") and sample_mode == \"sample\":\n        return encoder_output.latent_dist.sample(generator)\n    elif hasattr(encoder_output, \"latent_dist\") and sample_mode == \"argmax\":\n        return encoder_output.latent_dist.mode()\n    elif hasattr(encoder_output, \"latents\"):\n        return encoder_output.latents\n    else:\n        raise AttributeError(\"Could not 
access latents of provided encoder_output\")\n\n\nclass PixArtAlphaReferencePipeline(DiffusionPipeline):\n    r\"\"\"\n    Pipeline for image-to-image generation using PixArt-Alpha.\n\n    This model inherits from [`DiffusionPipeline`]. Check the superclass documentation for the generic methods the\n    library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)\n\n    Args:\n        vae ([`AutoencoderKL`]):\n            Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.\n        text_encoder ([`T5EncoderModel`]):\n            Frozen text-encoder. PixArt-Alpha uses\n            [T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5EncoderModel), specifically the\n            [t5-v1_1-xxl](https://huggingface.co/PixArt-alpha/PixArt-alpha/tree/main/t5-v1_1-xxl) variant.\n        tokenizer (`T5Tokenizer`):\n            Tokenizer of class\n            [T5Tokenizer](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Tokenizer).\n        transformer ([`Transformer2DModel`]):\n            A text conditioned `Transformer2DModel` to denoise the encoded image latents.\n        scheduler ([`SchedulerMixin`]):\n            A scheduler to be used in combination with `transformer` to denoise the encoded image latents.\n    \"\"\"\n\n    bad_punct_regex = re.compile(\n        r\"[\"\n        + \"#®•©™&@·º½¾¿¡§~\"\n        + r\"\\)\"\n        + r\"\\(\"\n        + r\"\\]\"\n        + r\"\\[\"\n        + r\"\\}\"\n        + r\"\\{\"\n        + r\"\\|\"\n        + \"\\\\\"\n        + r\"\\/\"\n        + r\"\\*\"\n        + r\"]{1,}\"\n    )  # noqa\n\n    _optional_components = [\"tokenizer\", \"text_encoder\"]\n    model_cpu_offload_seq = \"text_encoder->transformer->vae\"\n\n    def __init__(\n            self,\n            tokenizer: T5Tokenizer,\n            text_encoder: T5EncoderModel,\n            vae: AutoencoderKL,\n            
transformer: Transformer2DModel,\n            scheduler: DPMSolverMultistepScheduler,\n    ):\n        super().__init__()\n\n        self.register_modules(\n            tokenizer=tokenizer, text_encoder=text_encoder, vae=vae, transformer=transformer, scheduler=scheduler\n        )\n\n        self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1)\n        self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)\n        self.mask_processor = VaeImageProcessor(\n            vae_scale_factor=self.vae_scale_factor, do_normalize=False, do_binarize=True, do_convert_grayscale=True\n        )\n\n    # Adapted from https://github.com/PixArt-alpha/PixArt-alpha/blob/master/diffusion/model/utils.py\n    def mask_text_embeddings(self, emb, mask):\n        if emb.shape[0] == 1:\n            keep_index = mask.sum().item()\n            return emb[:, :, :keep_index, :], keep_index\n        else:\n            masked_feature = emb * mask[:, None, :, None]\n            return masked_feature, emb.shape[2]\n\n    # Adapted from diffusers.pipelines.deepfloyd_if.pipeline_if.encode_prompt\n    def encode_prompt(\n            self,\n            prompt: Union[str, List[str]],\n            do_classifier_free_guidance: bool = True,\n            negative_prompt: str = \"\",\n            num_images_per_prompt: int = 1,\n            device: Optional[torch.device] = None,\n            prompt_embeds: Optional[torch.FloatTensor] = None,\n            negative_prompt_embeds: Optional[torch.FloatTensor] = None,\n            prompt_attention_mask: Optional[torch.FloatTensor] = None,\n            negative_prompt_attention_mask: Optional[torch.FloatTensor] = None,\n            clean_caption: bool = False,\n            **kwargs,\n    ):\n        r\"\"\"\n        Encodes the prompt into text encoder hidden states.\n\n        Args:\n            prompt (`str` or `List[str]`, *optional*):\n                prompt to be encoded\n            negative_prompt (`str` or 
`List[str]`, *optional*):\n                The prompt not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds`\n                instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is not greater than `1`). For\n                PixArt-Alpha, this should be \"\".\n            do_classifier_free_guidance (`bool`, *optional*, defaults to `True`):\n                whether to use classifier free guidance or not\n            num_images_per_prompt (`int`, *optional*, defaults to 1):\n                number of images that should be generated per prompt\n            device: (`torch.device`, *optional*):\n                torch device to place the resulting embeddings on\n            prompt_embeds (`torch.FloatTensor`, *optional*):\n                Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not\n                provided, text embeddings will be generated from `prompt` input argument.\n            negative_prompt_embeds (`torch.FloatTensor`, *optional*):\n                Pre-generated negative text embeddings. For PixArt-Alpha, this should be the embeddings of the \"\"\n                string.\n            clean_caption (`bool`, defaults to `False`):\n                If `True`, the function will preprocess and clean the provided caption before encoding.\n        \"\"\"\n\n        if \"mask_feature\" in kwargs:\n            deprecation_message = \"The use of `mask_feature` is deprecated. It is no longer used in any computation and that doesn't affect the end results. 
It will be removed in a future version.\"\n            deprecate(\"mask_feature\", \"1.0.0\", deprecation_message, standard_warn=False)\n\n        if device is None:\n            device = self._execution_device\n\n        if prompt is not None and isinstance(prompt, str):\n            batch_size = 1\n        elif prompt is not None and isinstance(prompt, list):\n            batch_size = len(prompt)\n        else:\n            batch_size = prompt_embeds.shape[0]\n\n        # See Section 3.1. of the paper.\n        max_length = 120\n\n        if prompt_embeds is None:\n            prompt = self._text_preprocessing(prompt, clean_caption=clean_caption)\n            text_inputs = self.tokenizer(\n                prompt,\n                padding=\"max_length\",\n                max_length=max_length,\n                truncation=True,\n                add_special_tokens=True,\n                return_tensors=\"pt\",\n            )\n            text_input_ids = text_inputs.input_ids\n            untruncated_ids = self.tokenizer(prompt, padding=\"longest\", return_tensors=\"pt\").input_ids\n\n            if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal(\n                    text_input_ids, untruncated_ids\n            ):\n                removed_text = self.tokenizer.batch_decode(untruncated_ids[:, max_length - 1: -1])\n                logger.warning(\n                    \"The following part of your input was truncated because this pipeline only handles sequences up to\"\n                    f\" {max_length} tokens: {removed_text}\"\n                )\n\n            prompt_attention_mask = text_inputs.attention_mask\n            prompt_attention_mask = prompt_attention_mask.to(device)\n\n            prompt_embeds = self.text_encoder(text_input_ids.to(device), attention_mask=prompt_attention_mask)\n            prompt_embeds = prompt_embeds[0]\n\n        if self.text_encoder is not None:\n            dtype = self.text_encoder.dtype\n        elif 
self.transformer is not None:\n            dtype = self.transformer.dtype\n        else:\n            dtype = None\n\n        prompt_embeds = prompt_embeds.to(dtype=dtype, device=device)\n\n        bs_embed, seq_len, _ = prompt_embeds.shape\n        # duplicate text embeddings and attention mask for each generation per prompt, using mps friendly method\n        prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)\n        prompt_embeds = prompt_embeds.view(bs_embed * num_images_per_prompt, seq_len, -1)\n        prompt_attention_mask = prompt_attention_mask.view(bs_embed, -1)\n        prompt_attention_mask = prompt_attention_mask.repeat(num_images_per_prompt, 1)\n\n        # get unconditional embeddings for classifier free guidance\n        if do_classifier_free_guidance and negative_prompt_embeds is None:\n            uncond_tokens = [negative_prompt] * batch_size\n            uncond_tokens = self._text_preprocessing(uncond_tokens, clean_caption=clean_caption)\n            max_length = prompt_embeds.shape[1]\n            uncond_input = self.tokenizer(\n                uncond_tokens,\n                padding=\"max_length\",\n                max_length=max_length,\n                truncation=True,\n                return_attention_mask=True,\n                add_special_tokens=True,\n                return_tensors=\"pt\",\n            )\n            negative_prompt_attention_mask = uncond_input.attention_mask\n            negative_prompt_attention_mask = negative_prompt_attention_mask.to(device)\n\n            negative_prompt_embeds = self.text_encoder(\n                uncond_input.input_ids.to(device), attention_mask=negative_prompt_attention_mask\n            )\n            negative_prompt_embeds = negative_prompt_embeds[0]\n\n        if do_classifier_free_guidance:\n            # duplicate unconditional embeddings for each generation per prompt, using mps friendly method\n            seq_len = negative_prompt_embeds.shape[1]\n\n            
negative_prompt_embeds = negative_prompt_embeds.to(dtype=dtype, device=device)\n\n            negative_prompt_embeds = negative_prompt_embeds.repeat(1, num_images_per_prompt, 1)\n            negative_prompt_embeds = negative_prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1)\n\n            negative_prompt_attention_mask = negative_prompt_attention_mask.view(bs_embed, -1)\n            negative_prompt_attention_mask = negative_prompt_attention_mask.repeat(num_images_per_prompt, 1)\n        else:\n            negative_prompt_embeds = None\n            negative_prompt_attention_mask = None\n\n        return prompt_embeds, prompt_attention_mask, negative_prompt_embeds, negative_prompt_attention_mask\n\n    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_extra_step_kwargs\n    def prepare_extra_step_kwargs(self, generator, eta):\n        # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature\n        # eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.\n        # eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502\n        # and should be between [0, 1]\n\n        accepts_eta = \"eta\" in set(inspect.signature(self.scheduler.step).parameters.keys())\n        extra_step_kwargs = {}\n        if accepts_eta:\n            extra_step_kwargs[\"eta\"] = eta\n\n        # check if the scheduler accepts generator\n        accepts_generator = \"generator\" in set(inspect.signature(self.scheduler.step).parameters.keys())\n        if accepts_generator:\n            extra_step_kwargs[\"generator\"] = generator\n        return extra_step_kwargs\n\n    def check_inputs(\n            self,\n            prompt,\n            image,\n            height,\n            width,\n            negative_prompt,\n            callback_steps,\n            prompt_embeds=None,\n            negative_prompt_embeds=None,\n            
prompt_attention_mask=None,\n            negative_prompt_attention_mask=None,\n    ):\n        if height % 8 != 0 or width % 8 != 0:\n            raise ValueError(f\"`height` and `width` have to be divisible by 8 but are {height} and {width}.\")\n\n        if (callback_steps is None) or (\n                callback_steps is not None and (not isinstance(callback_steps, int) or callback_steps <= 0)\n        ):\n            raise ValueError(\n                f\"`callback_steps` has to be a positive integer but is {callback_steps} of type\"\n                f\" {type(callback_steps)}.\"\n            )\n\n        if prompt is not None and prompt_embeds is not None:\n            raise ValueError(\n                f\"Cannot forward both `prompt`: {prompt} and `prompt_embeds`: {prompt_embeds}. Please make sure to\"\n                \" only forward one of the two.\"\n            )\n        elif prompt is None and prompt_embeds is None:\n            raise ValueError(\n                \"Provide either `prompt` or `prompt_embeds`. Cannot leave both `prompt` and `prompt_embeds` undefined.\"\n            )\n        elif prompt is not None and (not isinstance(prompt, str) and not isinstance(prompt, list)):\n            raise ValueError(f\"`prompt` has to be of type `str` or `list` but is {type(prompt)}\")\n\n        if prompt is not None and negative_prompt_embeds is not None:\n            raise ValueError(\n                f\"Cannot forward both `prompt`: {prompt} and `negative_prompt_embeds`:\"\n                f\" {negative_prompt_embeds}. Please make sure to only forward one of the two.\"\n            )\n\n        if negative_prompt is not None and negative_prompt_embeds is not None:\n            raise ValueError(\n                f\"Cannot forward both `negative_prompt`: {negative_prompt} and `negative_prompt_embeds`:\"\n                f\" {negative_prompt_embeds}. 
Please make sure to only forward one of the two.\"\n            )\n\n        if prompt_embeds is not None and prompt_attention_mask is None:\n            raise ValueError(\"Must provide `prompt_attention_mask` when specifying `prompt_embeds`.\")\n\n        if negative_prompt_embeds is not None and negative_prompt_attention_mask is None:\n            raise ValueError(\"Must provide `negative_prompt_attention_mask` when specifying `negative_prompt_embeds`.\")\n\n        if prompt_embeds is not None and negative_prompt_embeds is not None:\n            if prompt_embeds.shape != negative_prompt_embeds.shape:\n                raise ValueError(\n                    \"`prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but\"\n                    f\" got: `prompt_embeds` {prompt_embeds.shape} != `negative_prompt_embeds`\"\n                    f\" {negative_prompt_embeds.shape}.\"\n                )\n            if prompt_attention_mask.shape != negative_prompt_attention_mask.shape:\n                raise ValueError(\n                    \"`prompt_attention_mask` and `negative_prompt_attention_mask` must have the same shape when passed directly, but\"\n                    f\" got: `prompt_attention_mask` {prompt_attention_mask.shape} != `negative_prompt_attention_mask`\"\n                    f\" {negative_prompt_attention_mask.shape}.\"\n                )\n\n        if image is None:\n            raise ValueError(\n                \"Provide `image`. 
Cannot leave `image` undefined.\"\n            )\n\n    # Copied from diffusers.pipelines.deepfloyd_if.pipeline_if.IFPipeline._text_preprocessing\n    def _text_preprocessing(self, text, clean_caption=False):\n        if clean_caption and not is_bs4_available():\n            logger.warn(BACKENDS_MAPPING[\"bs4\"][-1].format(\"Setting `clean_caption=True`\"))\n            logger.warn(\"Setting `clean_caption` to False...\")\n            clean_caption = False\n\n        if clean_caption and not is_ftfy_available():\n            logger.warn(BACKENDS_MAPPING[\"ftfy\"][-1].format(\"Setting `clean_caption=True`\"))\n            logger.warn(\"Setting `clean_caption` to False...\")\n            clean_caption = False\n\n        if not isinstance(text, (tuple, list)):\n            text = [text]\n\n        def process(text: str):\n            if clean_caption:\n                text = self._clean_caption(text)\n                text = self._clean_caption(text)\n            else:\n                text = text.lower().strip()\n            return text\n\n        return [process(t) for t in text]\n\n    # Copied from diffusers.pipelines.deepfloyd_if.pipeline_if.IFPipeline._clean_caption\n    def _clean_caption(self, caption):\n        caption = str(caption)\n        caption = ul.unquote_plus(caption)\n        caption = caption.strip().lower()\n        caption = re.sub(\"<person>\", \"person\", caption)\n        # urls:\n        caption = re.sub(\n            r\"\\b((?:https?:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",\n            # noqa\n            \"\",\n            caption,\n        )  # regex for urls\n        caption = re.sub(\n            r\"\\b((?:www:(?:\\/{1,3}|[a-zA-Z0-9%])|[a-zA-Z0-9.\\-]+[.](?:com|co|ru|net|org|edu|gov|it)[\\w/-]*\\b\\/?(?!@)))\",\n            # noqa\n            \"\",\n            caption,\n        )  # regex for urls\n        # html:\n        caption = BeautifulSoup(caption, 
features=\"html.parser\").text\n\n        # @<nickname>\n        caption = re.sub(r\"@[\\w\\d]+\\b\", \"\", caption)\n\n        # 31C0—31EF CJK Strokes\n        # 31F0—31FF Katakana Phonetic Extensions\n        # 3200—32FF Enclosed CJK Letters and Months\n        # 3300—33FF CJK Compatibility\n        # 3400—4DBF CJK Unified Ideographs Extension A\n        # 4DC0—4DFF Yijing Hexagram Symbols\n        # 4E00—9FFF CJK Unified Ideographs\n        caption = re.sub(r\"[\\u31c0-\\u31ef]+\", \"\", caption)\n        caption = re.sub(r\"[\\u31f0-\\u31ff]+\", \"\", caption)\n        caption = re.sub(r\"[\\u3200-\\u32ff]+\", \"\", caption)\n        caption = re.sub(r\"[\\u3300-\\u33ff]+\", \"\", caption)\n        caption = re.sub(r\"[\\u3400-\\u4dbf]+\", \"\", caption)\n        caption = re.sub(r\"[\\u4dc0-\\u4dff]+\", \"\", caption)\n        caption = re.sub(r\"[\\u4e00-\\u9fff]+\", \"\", caption)\n        #######################################################\n\n        # все виды тире / all types of dash --> \"-\"\n        caption = re.sub(\n            r\"[\\u002D\\u058A\\u05BE\\u1400\\u1806\\u2010-\\u2015\\u2E17\\u2E1A\\u2E3A\\u2E3B\\u2E40\\u301C\\u3030\\u30A0\\uFE31\\uFE32\\uFE58\\uFE63\\uFF0D]+\",\n            # noqa\n            \"-\",\n            caption,\n        )\n\n        # кавычки к одному стандарту\n        caption = re.sub(r\"[`´«»“”¨]\", '\"', caption)\n        caption = re.sub(r\"[‘’]\", \"'\", caption)\n\n        # &quot;\n        caption = re.sub(r\"&quot;?\", \"\", caption)\n        # &amp\n        caption = re.sub(r\"&amp\", \"\", caption)\n\n        # ip adresses:\n        caption = re.sub(r\"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\", \" \", caption)\n\n        # article ids:\n        caption = re.sub(r\"\\d:\\d\\d\\s+$\", \"\", caption)\n\n        # \\n\n        caption = re.sub(r\"\\\\n\", \" \", caption)\n\n        # \"#123\"\n        caption = re.sub(r\"#\\d{1,3}\\b\", \"\", caption)\n        # \"#12345..\"\n        caption = 
re.sub(r\"#\\d{5,}\\b\", \"\", caption)\n        # \"123456..\"\n        caption = re.sub(r\"\\b\\d{6,}\\b\", \"\", caption)\n        # filenames:\n        caption = re.sub(r\"[\\S]+\\.(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)\", \"\", caption)\n\n        #\n        caption = re.sub(r\"[\\\"\\']{2,}\", r'\"', caption)  # \"\"\"AUSVERKAUFT\"\"\"\n        caption = re.sub(r\"[\\.]{2,}\", r\" \", caption)  # \"\"\"AUSVERKAUFT\"\"\"\n\n        caption = re.sub(self.bad_punct_regex, r\" \", caption)  # ***AUSVERKAUFT***, #AUSVERKAUFT\n        caption = re.sub(r\"\\s+\\.\\s+\", r\" \", caption)  # \" . \"\n\n        # this-is-my-cute-cat / this_is_my_cute_cat\n        regex2 = re.compile(r\"(?:\\-|\\_)\")\n        if len(re.findall(regex2, caption)) > 3:\n            caption = re.sub(regex2, \" \", caption)\n\n        caption = ftfy.fix_text(caption)\n        caption = html.unescape(html.unescape(caption))\n\n        caption = re.sub(r\"\\b[a-zA-Z]{1,3}\\d{3,15}\\b\", \"\", caption)  # jc6640\n        caption = re.sub(r\"\\b[a-zA-Z]+\\d+[a-zA-Z]+\\b\", \"\", caption)  # jc6640vc\n        caption = re.sub(r\"\\b\\d+[a-zA-Z]+\\d+\\b\", \"\", caption)  # 6640vc231\n\n        caption = re.sub(r\"(worldwide\\s+)?(free\\s+)?shipping\", \"\", caption)\n        caption = re.sub(r\"(free\\s)?download(\\sfree)?\", \"\", caption)\n        caption = re.sub(r\"\\bclick\\b\\s(?:for|on)\\s\\w+\", \"\", caption)\n        caption = re.sub(r\"\\b(?:png|jpg|jpeg|bmp|webp|eps|pdf|apk|mp4)(\\simage[s]?)?\", \"\", caption)\n        caption = re.sub(r\"\\bpage\\s+\\d+\\b\", \"\", caption)\n\n        caption = re.sub(r\"\\b\\d*[a-zA-Z]+\\d+[a-zA-Z]+\\d+[a-zA-Z\\d]*\\b\", r\" \", caption)  # j2d1a2a...\n\n        caption = re.sub(r\"\\b\\d+\\.?\\d*[xх×]\\d+\\.?\\d*\\b\", \"\", caption)\n\n        caption = re.sub(r\"\\b\\s+\\:\\s+\", r\": \", caption)\n        caption = re.sub(r\"(\\D[,\\./])\\b\", r\"\\1 \", caption)\n        caption = re.sub(r\"\\s+\", \" \", caption)\n\n        
caption = caption.strip()\n\n        caption = re.sub(r\"^[\\\"\\']([\\w\\W]+)[\\\"\\']$\", r\"\\1\", caption)\n        caption = re.sub(r\"^[\\'\\_,\\-\\:;]\", r\"\", caption)\n        caption = re.sub(r\"[\\'\\_,\\-\\:\\-\\+]$\", r\"\", caption)\n        caption = re.sub(r\"^\\.\\S+$\", \"\", caption)\n\n        return caption.strip()\n\n    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents\n    def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None,\n                        image=None,\n                        timestep=None,\n                        is_strength_max=True,\n                        return_image_latents=True,\n                        ):\n        shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)\n        if isinstance(generator, list) and len(generator) != batch_size:\n            raise ValueError(\n                f\"You have passed a list of generators of length {len(generator)}, but requested an effective batch\"\n                f\" size of {batch_size}. Make sure the batch size matches the length of the generators.\"\n            )\n\n        if (image is None or timestep is None) and not is_strength_max:\n            raise ValueError(\n                \"Since strength < 1. 
initial latents are to be initialised as a combination of Image + Noise.\"\n                \"However, either the image or the noise timestep has not been provided.\"\n            )\n\n        if return_image_latents or (latents is None and not is_strength_max):\n            image = image.to(device=device, dtype=dtype)\n\n            if image.shape[1] == 4:\n                image_latents = image\n            else:\n                image_latents = self._encode_vae_image(image=image, generator=generator)\n            image_latents = image_latents.repeat(batch_size // image_latents.shape[0], 1, 1, 1)\n\n        if latents is None:\n            noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)\n            # if strength is 1. then initialise the latents to noise, else initial to image + noise\n            latents = noise if is_strength_max else self.scheduler.add_noise(image_latents, noise, timestep)\n            # if pure noise then scale the initial latents by the  Scheduler's init sigma\n            latents = latents * self.scheduler.init_noise_sigma if is_strength_max else latents\n        else:\n            noise = latents.to(device)\n            latents = noise * self.scheduler.init_noise_sigma\n\n        # scale the initial noise by the standard deviation required by the scheduler\n        latents = latents * self.scheduler.init_noise_sigma\n        return latents, noise, image_latents\n\n    @staticmethod\n    def classify_height_width_bin(height: int, width: int, ratios: dict) -> Tuple[int, int]:\n        \"\"\"Returns binned height and width.\"\"\"\n        ar = float(height / width)\n        closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))\n        default_hw = ratios[closest_ratio]\n        return int(default_hw[0]), int(default_hw[1])\n\n    @staticmethod\n    def resize_and_crop_tensor(samples: torch.Tensor, new_width: int, new_height: int) -> torch.Tensor:\n        orig_height, orig_width = 
samples.shape[2], samples.shape[3]\n\n        # Check if resizing is needed\n        if orig_height != new_height or orig_width != new_width:\n            ratio = max(new_height / orig_height, new_width / orig_width)\n            resized_width = int(orig_width * ratio)\n            resized_height = int(orig_height * ratio)\n\n            # Resize\n            samples = F.interpolate(\n                samples, size=(resized_height, resized_width), mode=\"bilinear\", align_corners=False\n            )\n\n            # Center Crop\n            start_x = (resized_width - new_width) // 2\n            end_x = start_x + new_width\n            start_y = (resized_height - new_height) // 2\n            end_y = start_y + new_height\n            samples = samples[:, :, start_y:end_y, start_x:end_x]\n\n        return samples\n\n    def _encode_vae_image(self, image: torch.Tensor, generator: torch.Generator):\n        if isinstance(generator, list):\n            image_latents = [\n                retrieve_latents(self.vae.encode(image[i: i + 1]), generator=generator[i])\n                for i in range(image.shape[0])\n            ]\n            image_latents = torch.cat(image_latents, dim=0)\n        else:\n            image_latents = retrieve_latents(self.vae.encode(image), generator=generator)\n\n        image_latents = self.vae.config.scaling_factor * image_latents\n\n        return image_latents\n\n    def prepare_mask_latents(\n            self, mask, batch_size, height, width, dtype, device, generator, do_classifier_free_guidance\n    ):\n        # resize the mask to latents shape as we concatenate the mask to the latents\n        # we do that before converting to dtype to avoid breaking in case we're using cpu_offload\n        # and half precision\n        mask = torch.nn.functional.interpolate(\n            mask, size=(height // self.vae_scale_factor, width // self.vae_scale_factor)\n        )\n        mask = mask.to(device=device, dtype=dtype)\n\n        if 
mask.shape[0] < batch_size:\n            if not batch_size % mask.shape[0] == 0:\n                raise ValueError(\n                    \"The passed mask and the required batch size don't match. Masks are supposed to be duplicated to\"\n                    f\" a total batch size of {batch_size}, but {mask.shape[0]} masks were passed. Make sure the number\"\n                    \" of masks that you pass is divisible by the total requested batch size.\"\n                )\n            mask = mask.repeat(batch_size // mask.shape[0], 1, 1, 1)\n\n        mask = torch.cat([mask] * 2) if do_classifier_free_guidance else mask\n\n        return mask\n\n    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.StableDiffusionImg2ImgPipeline.get_timesteps\n    def get_timesteps(self, num_inference_steps, strength, device):\n        # get the original timestep using init_timestep\n        init_timestep = min(int(num_inference_steps * strength), num_inference_steps)\n\n        t_start = max(num_inference_steps - init_timestep, 0)\n        timesteps = self.scheduler.timesteps[t_start * self.scheduler.order:]\n\n        return timesteps, num_inference_steps - t_start\n\n    @torch.no_grad()\n    @replace_example_docstring(EXAMPLE_DOC_STRING)\n    def __call__(\n            self,\n            prompt: Union[str, List[str]] = None,\n            image: PipelineImageInput = None,\n            strength: float = 1.0,\n            negative_prompt: str = \"\",\n            num_inference_steps: int = 20,\n            timesteps: List[int] = None,\n            guidance_scale: float = 4.5,\n            num_images_per_prompt: Optional[int] = 1,\n            height: Optional[int] = None,\n            width: Optional[int] = None,\n            eta: float = 0.0,\n            generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,\n            latents: Optional[torch.FloatTensor] = None,\n            prompt_embeds: Optional[torch.FloatTensor] = 
None,\n            prompt_attention_mask: Optional[torch.FloatTensor] = None,\n            negative_prompt_embeds: Optional[torch.FloatTensor] = None,\n            negative_prompt_attention_mask: Optional[torch.FloatTensor] = None,\n            output_type: Optional[str] = \"pil\",\n            return_dict: bool = True,\n            callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,\n            callback_steps: int = 1,\n            clean_caption: bool = True,\n            use_resolution_binning: bool = True,\n            **kwargs,\n    ) -> Union[ImagePipelineOutput, Tuple]:\n        \"\"\"\n        Function invoked when calling the pipeline for generation.\n\n        Args:\n            prompt (`str` or `List[str]`, *optional*):\n                The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`\n                instead.\n            image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):\n                The reference image that guides the image generation.\n            strength (`float`, *optional*, defaults to 1.0):\n                Controls how many denoising steps are run: `int(num_inference_steps * strength)`, capped at\n                `num_inference_steps`. When `strength` is 1.0, the latents are initialised with pure noise.\n            negative_prompt (`str` or `List[str]`, *optional*):\n                The prompt or prompts not to guide the image generation. If not defined, one has to pass\n                `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is\n                less than `1`).\n            num_inference_steps (`int`, *optional*, defaults to 20):\n                The number of denoising steps. More denoising steps usually lead to a higher quality image at the\n                expense of slower inference.\n            timesteps (`List[int]`, *optional*):\n                Custom timesteps to use for the denoising process. If not defined, equally spaced `num_inference_steps`\n                timesteps are used. 
Must be in descending order.\n            guidance_scale (`float`, *optional*, defaults to 4.5):\n                Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).\n                `guidance_scale` is defined as `w` of equation 2. of [Imagen\n                Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >\n                1`. Higher guidance scale encourages to generate images that are closely linked to the text `prompt`,\n                usually at the expense of lower image quality.\n            num_images_per_prompt (`int`, *optional*, defaults to 1):\n                The number of images to generate per prompt.\n            height (`int`, *optional*, defaults to self.unet.config.sample_size):\n                The height in pixels of the generated image.\n            width (`int`, *optional*, defaults to self.unet.config.sample_size):\n                The width in pixels of the generated image.\n            eta (`float`, *optional*, defaults to 0.0):\n                Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to\n                [`schedulers.DDIMScheduler`], will be ignored for others.\n            generator (`torch.Generator` or `List[torch.Generator]`, *optional*):\n                One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)\n                to make generation deterministic.\n            latents (`torch.FloatTensor`, *optional*):\n                Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image\n                generation. Can be used to tweak the same generation with different prompts. 
If not provided, a latents\n                tensor will be generated by sampling using the supplied random `generator`.\n            prompt_embeds (`torch.FloatTensor`, *optional*):\n                Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not\n                provided, text embeddings will be generated from `prompt` input argument.\n            prompt_attention_mask (`torch.FloatTensor`, *optional*): Pre-generated attention mask for text embeddings.\n            negative_prompt_embeds (`torch.FloatTensor`, *optional*):\n                Pre-generated negative text embeddings. For PixArt-Alpha this negative prompt should be \"\". If not\n                provided, negative_prompt_embeds will be generated from `negative_prompt` input argument.\n            negative_prompt_attention_mask (`torch.FloatTensor`, *optional*):\n                Pre-generated attention mask for negative text embeddings.\n            output_type (`str`, *optional*, defaults to `\"pil\"`):\n                The output format of the generated image. Choose between\n                [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.\n            return_dict (`bool`, *optional*, defaults to `True`):\n                Whether or not to return a [`~pipelines.ImagePipelineOutput`] instead of a plain tuple.\n            callback (`Callable`, *optional*):\n                A function that will be called every `callback_steps` steps during inference. The function will be\n                called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.\n            callback_steps (`int`, *optional*, defaults to 1):\n                The frequency at which the `callback` function will be called. 
If not specified, the callback will be\n                called at every step.\n            clean_caption (`bool`, *optional*, defaults to `True`):\n                Whether or not to clean the caption before creating embeddings. Requires `beautifulsoup4` and `ftfy` to\n                be installed. If the dependencies are not installed, the embeddings will be created from the raw\n                prompt.\n            use_resolution_binning (`bool` defaults to `True`):\n                If set to `True`, the requested height and width are first mapped to the closest resolutions using\n                `ASPECT_RATIO_1024_BIN`. After the produced latents are decoded into images, they are resized back to\n                the requested resolution. Useful for generating non-square images.\n\n        Examples:\n\n        Returns:\n            [`~pipelines.ImagePipelineOutput`] or `tuple`:\n                If `return_dict` is `True`, [`~pipelines.ImagePipelineOutput`] is returned, otherwise a `tuple` is\n                returned where the first element is a list with the generated images\n        \"\"\"\n        if \"mask_feature\" in kwargs:\n            deprecation_message = \"The use of `mask_feature` is deprecated. It is no longer used in any computation and that doesn't affect the end results. It will be removed in a future version.\"\n            deprecate(\"mask_feature\", \"1.0.0\", deprecation_message, standard_warn=False)\n        # 1. Check inputs. 
Raise error if not correct\n        height = height or self.transformer.config.sample_size * self.vae_scale_factor\n        width = width or self.transformer.config.sample_size * self.vae_scale_factor\n\n        # Build a canvas twice as wide as requested and paste the reference image into its left half.\n        width *= 2\n        ref = image\n        image = Image.new(\"RGB\", (width, height), (255, 255, 255))\n        image.paste(ref, (0, 0))\n\n        # Mask: black (kept) over the left half, white (generated) over the right half.\n        mask_image = Image.new(\"RGB\", (width, height), (255, 255, 255))\n        black_rect = Image.new(\"RGB\", (width // 2, height), (0, 0, 0))\n        mask_image.paste(black_rect, (0, 0))\n\n        if use_resolution_binning:\n            aspect_ratio_bin = (\n                ASPECT_RATIO_1024_BIN if self.transformer.config.sample_size == 128 else ASPECT_RATIO_512_BIN\n            )\n            orig_height, orig_width = height, width\n            height, width = self.classify_height_width_bin(height, width, ratios=aspect_ratio_bin)\n\n        self.check_inputs(\n            prompt,\n            image,\n            height,\n            width,\n            negative_prompt,\n            callback_steps,\n            prompt_embeds,\n            negative_prompt_embeds,\n            prompt_attention_mask,\n            negative_prompt_attention_mask,\n        )\n\n        # 2. Determine the batch size from the prompt or the prompt embeddings\n        if prompt is not None and isinstance(prompt, str):\n            batch_size = 1\n        elif prompt is not None and isinstance(prompt, list):\n            batch_size = len(prompt)\n        else:\n            batch_size = prompt_embeds.shape[0]\n\n        device = self._execution_device\n\n        # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)\n        # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`\n        # corresponds to doing no classifier free guidance.\n        do_classifier_free_guidance = guidance_scale > 1.0\n\n        # 3. 
Encode input prompt\n        (\n            prompt_embeds,\n            prompt_attention_mask,\n            negative_prompt_embeds,\n            negative_prompt_attention_mask,\n        ) = self.encode_prompt(\n            prompt,\n            do_classifier_free_guidance,\n            negative_prompt=negative_prompt,\n            num_images_per_prompt=num_images_per_prompt,\n            device=device,\n            prompt_embeds=prompt_embeds,\n            negative_prompt_embeds=negative_prompt_embeds,\n            prompt_attention_mask=prompt_attention_mask,\n            negative_prompt_attention_mask=negative_prompt_attention_mask,\n            clean_caption=clean_caption,\n        )\n        if do_classifier_free_guidance:\n            prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)\n            prompt_attention_mask = torch.cat([negative_prompt_attention_mask, prompt_attention_mask], dim=0)\n\n        # 4. Prepare timesteps\n        timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, timesteps)\n        timesteps, num_inference_steps = self.get_timesteps(\n            num_inference_steps=num_inference_steps, strength=strength, device=device\n        )\n\n        # at which timestep to set the initial noise (n.b. 50% if strength is 0.5)\n        latent_timestep = timesteps[:1].repeat(batch_size * num_images_per_prompt)\n        # create a boolean to check if the strength is set to 1. if so then initialise the latents with pure noise\n        is_strength_max = strength == 1.0\n        init_image = self.image_processor.preprocess(image, height=height, width=width)\n        init_image = init_image.to(dtype=torch.float32)\n\n        # 5. 
Prepare latents.\n        latent_channels = self.transformer.config.in_channels\n        latents_outputs = self.prepare_latents(\n            batch_size * num_images_per_prompt,\n            latent_channels,\n            height,\n            width,\n            prompt_embeds.dtype,\n            device,\n            generator,\n            latents,\n            image=init_image,\n            timestep=latent_timestep,\n            is_strength_max=is_strength_max,\n        )\n        latents, noise, image_latents = latents_outputs\n\n        mask_condition = self.mask_processor.preprocess(mask_image, height=height, width=width)\n        mask = self.prepare_mask_latents(\n            mask_condition,\n            batch_size * num_images_per_prompt,\n            height,\n            width,\n            prompt_embeds.dtype,\n            device,\n            generator,\n            do_classifier_free_guidance,\n        )\n\n        # 6. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline\n        extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)\n\n        # 6.1 Prepare micro-conditions.\n        added_cond_kwargs = {\"resolution\": None, \"aspect_ratio\": None}\n        if self.transformer.config.sample_size == 128:\n            resolution = torch.tensor([height, width]).repeat(batch_size * num_images_per_prompt, 1)\n            aspect_ratio = torch.tensor([float(height / width)]).repeat(batch_size * num_images_per_prompt, 1)\n            resolution = resolution.to(dtype=prompt_embeds.dtype, device=device)\n            aspect_ratio = aspect_ratio.to(dtype=prompt_embeds.dtype, device=device)\n            added_cond_kwargs = {\"resolution\": resolution, \"aspect_ratio\": aspect_ratio}\n\n        # 7. 
Denoising loop\n        num_warmup_steps = max(len(timesteps) - num_inference_steps * self.scheduler.order, 0)\n\n        latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents\n        with self.progress_bar(total=num_inference_steps) as progress_bar:\n            for i, t in enumerate(timesteps):\n                latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)\n\n                current_timestep = t\n                if not torch.is_tensor(current_timestep):\n                    # TODO: this requires sync between CPU and GPU. So try to pass timesteps as tensors if you can\n                    # This would be a good case for the `match` statement (Python 3.10+)\n                    is_mps = latent_model_input.device.type == \"mps\"\n                    if isinstance(current_timestep, float):\n                        dtype = torch.float32 if is_mps else torch.float64\n                    else:\n                        dtype = torch.int32 if is_mps else torch.int64\n                    current_timestep = torch.tensor([current_timestep], dtype=dtype, device=latent_model_input.device)\n                elif len(current_timestep.shape) == 0:\n                    current_timestep = current_timestep[None].to(latent_model_input.device)\n                # broadcast to batch dimension in a way that's compatible with ONNX/Core ML\n\n                # predict noise model_output\n                noise_pred = self.transformer(\n                    latent_model_input,\n                    encoder_hidden_states=prompt_embeds,\n                    encoder_attention_mask=prompt_attention_mask,\n                    timestep=current_timestep,\n                    added_cond_kwargs=added_cond_kwargs,\n                    return_dict=False,\n                )[0]\n\n                # perform guidance\n                if do_classifier_free_guidance:\n                    noise_pred_uncond, noise_pred_text = 
noise_pred.chunk(2)\n                    noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)\n\n                # learned sigma\n                if self.transformer.config.out_channels // 2 == latent_channels:\n                    noise_pred = noise_pred.chunk(2, dim=1)[0]\n                else:\n                    noise_pred = noise_pred\n\n                # compute previous image: x_t -> x_t-1\n                latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]\n\n                init_latents_proper = image_latents\n                if do_classifier_free_guidance:\n                    init_mask, _ = mask.chunk(2)\n                else:\n                    init_mask = mask\n\n                if i < len(timesteps) - 1:\n                    noise_timestep = timesteps[i + 1]\n                    init_latents_proper = self.scheduler.add_noise(\n                        init_latents_proper, noise, torch.tensor([noise_timestep])\n                    )\n                latents_ = latents\n                latents = (1 - init_mask) * init_latents_proper + init_mask * latents\n\n                latent_model_input = torch.cat([latents_] + [latents]) if do_classifier_free_guidance else latents\n\n                # call the callback, if provided\n                if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):\n                    progress_bar.update()\n                    if callback is not None and i % callback_steps == 0:\n                        step_idx = i // getattr(self.scheduler, \"order\", 1)\n                        callback(step_idx, t, latents)\n        if not output_type == \"latent\":\n            image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]\n            if use_resolution_binning:\n                image = self.resize_and_crop_tensor(image, orig_width, orig_height)\n        else:\n           
 image = latents\n\n        image = image.chunk(2, -1)[1]\n        if not output_type == \"latent\":\n            image = self.image_processor.postprocess(image, output_type=output_type)\n\n        # Offload all models\n        self.maybe_free_model_hooks()\n\n        if not return_dict:\n            return (image,)\n\n        return ImagePipelineOutput(images=image)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/timing_analysis.py",
    "content": "import json\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nwith open('timing_info.json', 'r') as f:\n    data = json.load(f)\n\nattn_times = []\ncross_attn_times = []\nmlp_times = []\nblock_times = []\n\nfor entry in data:\n    timing_info = entry['timing_info']\n    attn_times.extend(timing_info['attn_time'])\n    cross_attn_times.extend(timing_info['cross_attn_time'])\n    mlp_times.extend(timing_info['mlp_time'])\n    block_times.extend(timing_info['block_time'])\n\naverage_attn_time = np.mean(attn_times)\naverage_cross_attn_time = np.mean(cross_attn_times)\naverage_mlp_time = np.mean(mlp_times)\naverage_block_time = np.mean(block_times)\n\nprint(f\"Average Attention Time: {average_attn_time:.4f} ms\")\nprint(f\"Average Cross Attention Time: {average_cross_attn_time:.4f} ms\")\nprint(f\"Average MLP Time: {average_mlp_time:.4f} ms\")\nprint(f\"Average Block Time: {average_block_time:.4f} ms\")\n\nlabels = ['Attention', 'Cross Attention', 'MLP', 'Block']\navg_times = [average_attn_time, average_cross_attn_time, average_mlp_time, average_block_time]\n\nplt.bar(labels, avg_times, color=['blue', 'green', 'red', 'orange'])\nplt.ylabel('Average Time (ms)')\nplt.title('Average Time per Module')\n\nplt.savefig('module_average_times.png')\n"
  },
  {
    "path": "PixArt-alpha-ToCa/timing_info.json",
    "content": "[{\"timing_info\": {\"block_time\": [10.906271934509277], \"attn_time\": [7.704576015472412], \"cross_attn_time\": [0.9379839897155762], \"mlp_time\": [2.0203518867492676]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.602560043334961], \"attn_time\": [0.5560320019721985], \"cross_attn_time\": [0.5662720203399658], \"mlp_time\": [0.30105599761009216]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4970879554748535], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2969599962234497]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4981119632720947], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4755840301513672], \"attn_time\": [0.4925439953804016], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4776320457458496], \"attn_time\": [0.48742398619651794], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.2969599962234497]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4428160190582275], \"attn_time\": [0.4925439953804016], \"cross_attn_time\": [0.5038080215454102], \"mlp_time\": [0.2764799892902374]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, 
{\"timing_info\": {\"block_time\": [1.4407680034637451], \"attn_time\": [0.4761599898338318], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.46943998336792], \"attn_time\": [0.49459201097488403], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.465343952178955], \"attn_time\": [0.48230400681495667], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4632960557937622], \"attn_time\": [0.48742398619651794], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4612480401992798], \"attn_time\": [0.4761599898338318], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4592000246047974], \"attn_time\": [0.48230400681495667], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.435647964477539], \"attn_time\": [0.47308799624443054], \"cross_attn_time\": [0.506879985332489], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, 
{\"timing_info\": {\"block_time\": [1.46943998336792], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5421439409255981], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5631999969482422], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.474560022354126], \"attn_time\": [0.4915199875831604], \"cross_attn_time\": [0.5048320293426514], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4725120067596436], \"attn_time\": [0.48947200179100037], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4499839544296265], \"attn_time\": [0.48230400681495667], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4899200201034546], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4551039934158325], \"attn_time\": [0.48947200179100037], \"cross_attn_time\": [0.5120000243186951], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, 
{\"timing_info\": {\"block_time\": [1.46943998336792], \"attn_time\": [0.48947200179100037], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5011839866638184], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4776320457458496], \"attn_time\": [0.4853760004043579], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.457152009010315], \"attn_time\": [0.4843519926071167], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4960639476776123], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.462272047996521], \"attn_time\": [0.4843519926071167], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4888960123062134], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.30105599761009216]}, \"current\": {\"num_steps\": 20, \"step\": 0, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, 
{\"timing_info\": {\"block_time\": [1.733855962753296], \"attn_time\": [0.579584002494812], \"cross_attn_time\": [0.567296028137207], \"mlp_time\": [0.3266560137271881]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5523840188980103], \"attn_time\": [0.5181440114974976], \"cross_attn_time\": [0.5355520248413086], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5104000568389893], \"attn_time\": [0.48742398619651794], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5196160078048706], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5360000133514404], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5345280170440674], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5185920000076294], \"attn_time\": [0.49459201097488403], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.481727957725525], \"attn_time\": [0.48844799399375916], \"cross_attn_time\": [0.5038080215454102], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": 
{\"block_time\": [1.528831958770752], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4796799421310425], \"attn_time\": [0.48127999901771545], \"cross_attn_time\": [0.5099520087242126], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5052800178527832], \"attn_time\": [0.48947200179100037], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5001599788665771], \"attn_time\": [0.4843519926071167], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5380480289459229], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5063040256500244], \"attn_time\": [0.4904960095882416], \"cross_attn_time\": [0.5130239725112915], \"mlp_time\": [0.2959359884262085]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": 
{\"block_time\": [1.5206400156021118], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5104000568389893], \"attn_time\": [0.4904960095882416], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.2959359884262085]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [2.950144052505493], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [1.1509759426116943], \"mlp_time\": [0.9451519846916199]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.29900801181793213]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4940160512924194], \"attn_time\": [0.4904960095882416], \"cross_attn_time\": [0.506879985332489], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5267839431762695], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5011839866638184], \"attn_time\": [0.4843519926071167], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": 
{\"block_time\": [1.4899200201034546], \"attn_time\": [0.4864000082015991], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [3.0791680812835693], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [1.8472959995269775]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.6936960220336914], \"attn_time\": [0.6215680241584778], \"cross_attn_time\": [0.5591040253639221], \"mlp_time\": [0.30003198981285095]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5421439409255981], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5355520248413086], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5452159643173218], \"attn_time\": [0.5150719881057739], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2969599962234497]}, \"current\": {\"num_steps\": 20, \"step\": 1, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": 
{\"block_time\": [1.6070079803466797], \"attn_time\": [0.5406720042228699], \"cross_attn_time\": [0.5591040253639221], \"mlp_time\": [0.3092480003833771]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.598464012145996], \"attn_time\": [0.5355520248413086], \"cross_attn_time\": [0.5427200198173523], \"mlp_time\": [0.30822399258613586]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5472639799118042], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5437440276145935], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5370240211486816], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.29900801181793213]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.56876802444458], \"attn_time\": [0.5191680192947388], \"cross_attn_time\": [0.5427200198173523], \"mlp_time\": [0.2969599962234497]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5554560422897339], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.3041279911994934]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5544320344924927], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5437440276145935], \"mlp_time\": [0.2969599962234497]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.5380480289459229], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5738879442214966], \"attn_time\": [0.5263360142707825], \"cross_attn_time\": [0.536575973033905], \"mlp_time\": [0.30105599761009216]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.558527946472168], \"attn_time\": [0.5191680192947388], \"cross_attn_time\": [0.536575973033905], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5472639799118042], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5386239886283875], \"mlp_time\": [0.2969599962234497]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5185920000076294], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.563647985458374], \"attn_time\": [0.5191680192947388], \"cross_attn_time\": [0.5396479964256287], \"mlp_time\": [0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5441919565200806], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5355520248413086], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.6773120164871216], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.3491840064525604]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5656960010528564], \"attn_time\": [0.5191680192947388], \"cross_attn_time\": [0.5345280170440674], \"mlp_time\": [0.3041279911994934]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5493119955062866], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5345280170440674], \"mlp_time\": [0.30105599761009216]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.546239972114563], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.5345280170440674], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.533951997756958], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5345280170440674], \"mlp_time\": [0.2969599962234497]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.521664023399353], \"attn_time\": [0.49459201097488403], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.572864055633545], \"attn_time\": [0.5283839702606201], \"cross_attn_time\": [0.5386239886283875], \"mlp_time\": [0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.558527946472168], \"attn_time\": [0.5191680192947388], \"cross_attn_time\": [0.5386239886283875], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.504256010055542], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5109760165214539], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.533951997756958], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.2969599962234497]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.539072036743164], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.532480001449585], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.516543984413147], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 2, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.5578240156173706], \"attn_time\": [0.5150719881057739], \"cross_attn_time\": [0.5457919836044312], \"mlp_time\": [0.2969599962234497]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5022079944610596], \"attn_time\": [0.4915199875831604], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5001599788665771], \"attn_time\": [0.49459201097488403], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5022079944610596], \"attn_time\": [0.4915199875831604], \"cross_attn_time\": [0.5140479803085327], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.491968035697937], \"attn_time\": [0.4843519926071167], \"cross_attn_time\": [0.5130239725112915], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.5185920000076294], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5452159643173218], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2979840040206909]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4970879554748535], \"attn_time\": [0.4853760004043579], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5001599788665771], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.521664023399353], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5011839866638184], \"attn_time\": [0.48742398619651794], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [2.6859519481658936], \"attn_time\": [1.0670080184936523], \"cross_attn_time\": [0.8294399976730347], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.5206400156021118], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4960639476776123], \"attn_time\": [0.48947200179100037], \"cross_attn_time\": [0.5109760165214539], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5063040256500244], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5370240211486816], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5022079944610596], \"attn_time\": [0.4915199875831604], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.4843519926071167], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5011839866638184], \"attn_time\": [0.48742398619651794], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.5380480289459229], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.2959359884262085]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5001599788665771], \"attn_time\": [0.4843519926071167], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4878720045089722], \"attn_time\": [0.4853760004043579], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5370240211486816], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.511423945426941], \"attn_time\": [0.49459201097488403], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.2969599962234497]}, \"current\": {\"num_steps\": 20, \"step\": 3, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.547327995300293], \"attn_time\": [0.5232639908790588], \"cross_attn_time\": [0.5335040092468262], \"mlp_time\": [0.30105599761009216]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5308799743652344], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4940160512924194], \"attn_time\": [0.48947200179100037], \"cross_attn_time\": [0.5079039931297302], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.4960639476776123], \"attn_time\": [0.4833280146121979], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4981119632720947], \"attn_time\": [0.48947200179100037], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.533951997756958], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5400960445404053], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.29900801181793213]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.49459201097488403], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5011839866638184], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.521664023399353], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5083520412445068], \"attn_time\": [0.4904960095882416], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.4904960095882416], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2979840040206909]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.499135971069336], \"attn_time\": [0.4853760004043579], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.521664023399353], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.5134719610214233], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5196160078048706], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5032320022583008], \"attn_time\": [0.4864000082015991], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4960639476776123], \"attn_time\": [0.4833280146121979], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5022079944610596], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5130239725112915], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.533951997756958], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5345280170440674], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5237120389938354], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 4, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.53711998462677], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5437440276145935], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5360000133514404], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.532480001449585], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5022079944610596], \"attn_time\": [0.4915199875831604], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.539072036743164], \"attn_time\": [0.5160959959030151], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5411200523376465], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.3020800054073334]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.5175679922103882], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5063040256500244], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5022079944610596], \"attn_time\": [0.4904960095882416], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.509376049041748], \"attn_time\": [0.4925439953804016], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5206400156021118], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.5083520412445068], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5001599788665771], \"attn_time\": [0.48844799399375916], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5073280334472656], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5104000568389893], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5073280334472656], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5073280334472656], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5052800178527832], \"attn_time\": [0.4925439953804016], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.491968035697937], \"attn_time\": [0.4904960095882416], \"cross_attn_time\": [0.5120000243186951], \"mlp_time\": [0.2764799892902374]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5196160078048706], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5011839866638184], \"attn_time\": [0.4925439953804016], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5022079944610596], \"attn_time\": [0.4925439953804016], \"cross_attn_time\": [0.5150719881057739], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.481727957725525], \"attn_time\": [0.48230400681495667], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 5, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": 
[1.5565439462661743], \"attn_time\": [0.5181440114974976], \"cross_attn_time\": [0.5416960120201111], \"mlp_time\": [0.3041279911994934]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4960639476776123], \"attn_time\": [0.48742398619651794], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.521664023399353], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5011839866638184], \"attn_time\": [0.4915199875831604], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5267839431762695], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5196160078048706], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5335040092468262], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], 
\"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.509376049041748], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.509376049041748], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5185920000076294], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5267839431762695], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4970879554748535], \"attn_time\": [0.48844799399375916], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2764799892902374]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4735360145568848], \"attn_time\": 
[0.48127999901771545], \"cross_attn_time\": [0.5120000243186951], \"mlp_time\": [0.2764799892902374]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5032320022583008], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5124479532241821], \"attn_time\": [0.49459201097488403], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5257600545883179], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5063040256500244], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5083520412445068], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5073280334472656], \"attn_time\": 
[0.4925439953804016], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5104000568389893], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5257600545883179], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5472639799118042], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5237120389938354], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 6, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.535904049873352], \"attn_time\": 
[0.5171200037002563], \"cross_attn_time\": [0.5355520248413086], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5196160078048706], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.521664023399353], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5452159643173218], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5032320022583008], \"attn_time\": [0.4997119903564453], 
\"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5206400156021118], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5052800178527832], \"attn_time\": [0.48947200179100037], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5130239725112915], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5032320022583008], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.509376049041748], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5196160078048706], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5032320022583008], \"attn_time\": [0.49561598896980286], 
\"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5441919565200806], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.2959359884262085]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5360000133514404], \"attn_time\": [0.5150719881057739], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.532480001449585], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5083520412445068], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5104000568389893], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5120000243186951], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5089280009269714], 
\"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5534080266952515], \"attn_time\": [0.5171200037002563], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2969599962234497]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5308799743652344], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.528831958770752], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5411200523376465], \"attn_time\": [0.5160959959030151], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 7, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.546463966369629], \"attn_time\": [0.5181440114974976], 
\"cross_attn_time\": [0.536575973033905], \"mlp_time\": [0.3031040132045746]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5206400156021118], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [3.84716796875], \"attn_time\": [0.749567985534668], \"cross_attn_time\": [0.5457919836044312], \"mlp_time\": [0.30720001459121704]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5130239725112915], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5237120389938354], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5104000568389893], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5022079944610596], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5196160078048706], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": 
[0.5171200037002563], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5308799743652344], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.528831958770752], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5308799743652344], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5257600545883179], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5052800178527832], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.521664023399353], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.511423945426941], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": 
[0.5181440114974976], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.528831958770752], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5022079944610596], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5140479803085327], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.509376049041748], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5298559665679932], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.511423945426941], \"attn_time\": [0.49459201097488403], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5308799743652344], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": 
[0.5191680192947388], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5144959688186646], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5370240211486816], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5140479803085327], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5185920000076294], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5150719881057739], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 8, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5425920486450195], \"attn_time\": [0.5160959959030151], \"cross_attn_time\": 
[0.5375999808311462], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5267839431762695], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5144959688186646], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.528831958770752], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5196160078048706], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5257600545883179], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5253120064735413], 
\"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5104000568389893], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5144959688186646], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4981119632720947], \"attn_time\": [0.49459201097488403], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.2744320034980774]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4981119632720947], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5140479803085327], \"mlp_time\": [0.2764799892902374]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5196160078048706], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5150719881057739], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": 
[0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5083520412445068], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5150719881057739], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5150719881057739], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5052800178527832], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.516543984413147], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5124479532241821], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5150719881057739], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.509376049041748], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2764799892902374]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.533951997756958], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": 
[0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5083520412445068], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2764799892902374]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5206400156021118], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5140479803085327], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.511423945426941], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4981119632720947], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5109760165214539], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.499135971069336], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 9, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5166079998016357], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": 
[0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5237120389938354], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5063040256500244], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5073280334472656], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": 
[0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5124479532241821], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5130239725112915], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5073280334472656], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5120000243186951], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5073280334472656], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5237120389938354], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.521664023399353], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5144959688186646], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5144959688186646], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": 
[0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5124479532241821], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5257600545883179], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.516543984413147], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5820800065994263], \"attn_time\": [0.536575973033905], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": [0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.6005120277404785], \"attn_time\": [0.5335040092468262], \"cross_attn_time\": [0.5478399991989136], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.551360011100769], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.536575973033905], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5370240211486816], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": 
[0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.6680959463119507], \"attn_time\": [0.5765119791030884], \"cross_attn_time\": [0.5488640069961548], \"mlp_time\": [0.30003198981285095]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5718400478363037], \"attn_time\": [0.5171200037002563], \"cross_attn_time\": [0.5437440276145935], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5523840188980103], \"attn_time\": [0.5150719881057739], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5380480289459229], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5335040092468262], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5360000133514404], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.546239972114563], \"attn_time\": [0.5171200037002563], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 10, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5750720500946045], \"attn_time\": [0.5294079780578613], \"cross_attn_time\": [0.5529599785804749], \"mlp_time\": 
[0.29900801181793213]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5237120389938354], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5544320344924927], \"attn_time\": [0.5222399830818176], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.2959359884262085]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5380480289459229], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.532480001449585], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.539072036743164], \"attn_time\": [0.5171200037002563], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5375999808311462], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.539072036743164], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": 
[0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.528831958770752], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5335040092468262], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5646719932556152], \"attn_time\": [0.5345280170440674], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5400960445404053], \"attn_time\": [0.5150719881057739], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5360000133514404], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5335040092468262], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5411200523376465], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": 
[0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5534080266952515], \"attn_time\": [0.5181440114974976], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.29900801181793213]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.521664023399353], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [2.9480960369110107], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.8601599931716919], \"mlp_time\": [1.1284480094909668]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5503360033035278], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.5406720042228699], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5335040092468262], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5104000568389893], \"attn_time\": [0.4925439953804016], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5032320022583008], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": 
[0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5083520412445068], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.516543984413147], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5493119955062866], \"attn_time\": [0.5150719881057739], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 11, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5667200088500977], \"attn_time\": [0.5396479964256287], \"cross_attn_time\": [0.5335040092468262], \"mlp_time\": 
[0.30617600679397583]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5605759620666504], \"attn_time\": [0.5222399830818176], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": [0.2959359884262085]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5482879877090454], \"attn_time\": [0.5160959959030151], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5237120389938354], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5104000568389893], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5421439409255981], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.539072036743164], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5206400156021118], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": 
[0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.533951997756958], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5349760055541992], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.511423945426941], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.533951997756958], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5257600545883179], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.8565119504928589], \"attn_time\": [0.6768640279769897], \"cross_attn_time\": [0.5652480125427246], \"mlp_time\": 
[0.317440003156662]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5615999698638916], \"attn_time\": [0.5171200037002563], \"cross_attn_time\": [0.5447679758071899], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5534080266952515], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5523840188980103], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": [0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5441919565200806], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5345280170440674], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5472639799118042], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5386239886283875], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.551360011100769], \"attn_time\": [0.5150719881057739], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5431679487228394], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": 
[0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.546239972114563], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.536575973033905], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5441919565200806], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5345280170440674], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.558527946472168], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5335040092468262], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.551360011100769], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.532480001449585], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5534080266952515], \"attn_time\": [0.5181440114974976], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 12, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.593727946281433], \"attn_time\": [0.5375999808311462], \"cross_attn_time\": [0.5550079941749573], \"mlp_time\": 
[0.3051519989967346]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5708160400390625], \"attn_time\": [0.5171200037002563], \"cross_attn_time\": [0.5447679758071899], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5575040578842163], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5396479964256287], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5677440166473389], \"attn_time\": [0.5150719881057739], \"cross_attn_time\": [0.5406720042228699], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5575040578842163], \"attn_time\": [0.5160959959030151], \"cross_attn_time\": [0.5386239886283875], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5595519542694092], \"attn_time\": [0.5160959959030151], \"cross_attn_time\": [0.5396479964256287], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5503360033035278], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5355520248413086], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5779839754104614], \"attn_time\": [0.5242879986763], \"cross_attn_time\": [0.536575973033905], \"mlp_time\": 
[0.3031040132045746]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5267839431762695], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5411200523376465], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5360000133514404], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5411200523376465], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5355520248413086], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5605759620666504], \"attn_time\": [0.5171200037002563], \"cross_attn_time\": [0.5375999808311462], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5052800178527832], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4940160512924194], \"attn_time\": [0.4915199875831604], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": 
[0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5380480289459229], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5308799743652344], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5298559665679932], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.533951997756958], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5308799743652344], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5206400156021118], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": 
[0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5370240211486816], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.511423945426941], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5421439409255981], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.504256010055542], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.2764799892902374]}, \"current\": {\"num_steps\": 20, \"step\": 13, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5440959930419922], \"attn_time\": [0.5212159752845764], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": 
[0.30003198981285095]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5073280334472656], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5109760165214539], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5124479532241821], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.504256010055542], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5052800178527832], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5206400156021118], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": 
[0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5349760055541992], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.516543984413147], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.504256010055542], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.5160959959030151], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5022079944610596], \"attn_time\": [0.49459201097488403], \"cross_attn_time\": [0.5120000243186951], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5063040256500244], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5099520087242126], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5237120389938354], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": 
[0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [5.377024173736572], \"attn_time\": [1.6383999586105347], \"cross_attn_time\": [1.7756160497665405], \"mlp_time\": [1.4632960557937622]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5380480289459229], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.539072036743164], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5206400156021118], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5349760055541992], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5267839431762695], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": 
[0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.521664023399353], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5421439409255981], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5400960445404053], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5360000133514404], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5196160078048706], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 14, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.540992021560669], \"attn_time\": [0.5160959959030151], \"cross_attn_time\": [0.5335040092468262], \"mlp_time\": 
[0.3031040132045746]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5370240211486816], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5335040092468262], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5595519542694092], \"attn_time\": [0.5171200037002563], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.3031040132045746]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5375999808311462], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5124479532241821], \"attn_time\": [0.4904960095882416], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5120000243186951], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5185920000076294], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5400960445404053], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": 
[0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.2979840040206909]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4981119632720947], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5130239725112915], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5185920000076294], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5237120389938354], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5380480289459229], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5441919565200806], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5308799743652344], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": 
[0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5185920000076294], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.516543984413147], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.528831958770752], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5267839431762695], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5083520412445068], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": 
[0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5063040256500244], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5032320022583008], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5124479532241821], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5181440114974976], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 15, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.540287971496582], \"attn_time\": [0.5181440114974976], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": 
[0.2979840040206909]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5120000243186951], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.509376049041748], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5185920000076294], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5257600545883179], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5237120389938354], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5130239725112915], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": 
[0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5144959688186646], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5140479803085327], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5001599788665771], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5308799743652344], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": 
[0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4970879554748535], \"attn_time\": [0.48947200179100037], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.516543984413147], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5052800178527832], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5150719881057739], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5185920000076294], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5185920000076294], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5360000133514404], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5345280170440674], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5140479803085327], \"mlp_time\": 
[0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5120000243186951], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5140479803085327], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.516543984413147], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [2.0336639881134033], \"attn_time\": [0.7659519910812378], \"cross_attn_time\": [0.6256639957427979], \"mlp_time\": [0.3164159953594208]}, \"current\": {\"num_steps\": 20, \"step\": 16, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.549888014793396], \"attn_time\": [0.5150719881057739], \"cross_attn_time\": [0.5447679758071899], \"mlp_time\": 
[0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.546239972114563], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5386239886283875], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.532480001449585], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5196160078048706], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5150719881057739], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.532480001449585], \"mlp_time\": 
[0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5482879877090454], \"attn_time\": [0.5160959959030151], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.546239972114563], \"attn_time\": [0.5171200037002563], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5144959688186646], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4981119632720947], \"attn_time\": [0.4904960095882416], \"cross_attn_time\": [0.5150719881057739], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.521664023399353], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.509376049041748], \"attn_time\": [0.4864000082015991], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": 
[0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5370240211486816], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5335040092468262], \"mlp_time\": [0.28569599986076355]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.532480001449585], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5124479532241821], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5298559665679932], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": 
[0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5380480289459229], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5267839431762695], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5196160078048706], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5308799743652344], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5283839702606201], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5360000133514404], \"attn_time\": [0.5160959959030151], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 17, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5399680137634277], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5427200198173523], \"mlp_time\": 
[0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5257600545883179], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5400960445404053], \"attn_time\": [0.5150719881057739], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.528831958770752], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5298559665679932], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5144959688186646], \"attn_time\": [0.4976640045642853], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5124479532241821], \"attn_time\": [0.4935680031776428], \"cross_attn_time\": [0.5304319858551025], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5237120389938354], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5345280170440674], \"mlp_time\": 
[0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5206400156021118], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.491968035697937], \"attn_time\": [0.4915199875831604], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.27750399708747864]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5267839431762695], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.532480001449585], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5083520412445068], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5140479803085327], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": 
[0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.28672000765800476]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.521664023399353], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5273600220680237], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5155199766159058], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5319039821624756], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.499135971069336], \"attn_time\": [0.49459201097488403], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [14.97599983215332], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5130239725112915], \"mlp_time\": 
[13.750271797180176]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5267839431762695], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5124479532241821], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5109760165214539], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5104000568389893], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5232639908790588], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5267839431762695], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.502784013748169], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5380480289459229], \"attn_time\": [0.5109760165214539], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.29388800263404846]}, \"current\": {\"num_steps\": 20, \"step\": 18, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5278079509735107], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": 
[0.2949120104312897]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 0, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4960639476776123], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5150719881057739], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 1, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.533951997756958], \"attn_time\": [0.5038080215454102], \"cross_attn_time\": [0.5314559936523438], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 2, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5032320022583008], \"attn_time\": [0.49459201097488403], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2877439856529236]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 3, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.481727957725525], \"attn_time\": [0.48025599122047424], \"cross_attn_time\": [0.5150719881057739], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 4, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.511423945426941], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 5, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.499135971069336], \"attn_time\": [0.4997119903564453], \"cross_attn_time\": [0.5140479803085327], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 6, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5063040256500244], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5222399830818176], \"mlp_time\": 
[0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 7, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5247360467910767], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5242879986763], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 8, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5329279899597168], \"attn_time\": [0.5089280009269714], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 9, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.521664023399353], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5263360142707825], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 10, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.533951997756958], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.532480001449585], \"mlp_time\": [0.2836480140686035]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 11, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.4950400590896606], \"attn_time\": [0.48947200179100037], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.27955201268196106]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 12, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5144959688186646], \"attn_time\": [0.5048320293426514], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 13, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5134719610214233], \"attn_time\": [0.5007359981536865], \"cross_attn_time\": [0.5160959959030151], \"mlp_time\": 
[0.28467199206352234]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 14, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5298559665679932], \"attn_time\": [0.5120000243186951], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 15, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5257600545883179], \"attn_time\": [0.5099520087242126], \"cross_attn_time\": [0.5181440114974976], \"mlp_time\": [0.289792001247406]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 16, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5226880311965942], \"attn_time\": [0.506879985332489], \"cross_attn_time\": [0.5171200037002563], \"mlp_time\": [0.2826240062713623]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 17, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5431679487228394], \"attn_time\": [0.5171200037002563], \"cross_attn_time\": [0.52019202709198], \"mlp_time\": [0.29183998703956604]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 18, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5431679487228394], \"attn_time\": [0.5140479803085327], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": [0.2959359884262085]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 19, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5052800178527832], \"attn_time\": [0.5017600059509277], \"cross_attn_time\": [0.5058559775352478], \"mlp_time\": [0.2887679934501648]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 20, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5144959688186646], \"attn_time\": [0.49561598896980286], \"cross_attn_time\": [0.5253120064735413], \"mlp_time\": 
[0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 21, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5175679922103882], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5294079780578613], \"mlp_time\": [0.2805759906768799]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 22, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5257600545883179], \"attn_time\": [0.5058559775352478], \"cross_attn_time\": [0.5191680192947388], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 23, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5349760055541992], \"attn_time\": [0.5079039931297302], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.29286399483680725]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 24, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.5052800178527832], \"attn_time\": [0.4915199875831604], \"cross_attn_time\": [0.5150719881057739], \"mlp_time\": [0.2908160090446472]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 25, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.516543984413147], \"attn_time\": [0.4986880123615265], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.2815999984741211]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 26, \"is_force_fresh\": true, \"module\": \"mlp\"}}, {\"timing_info\": {\"block_time\": [1.504256010055542], \"attn_time\": [0.49663999676704407], \"cross_attn_time\": [0.5212159752845764], \"mlp_time\": [0.27852800488471985]}, \"current\": {\"num_steps\": 20, \"step\": 19, \"layer\": 27, \"is_force_fresh\": true, \"module\": \"mlp\"}}]"
  },
  {
    "path": "PixArt-alpha-ToCa/tools/VLM_caption_lightning.py",
    "content": "# {'model': 'LLaVA-7B-v0', 'prompt': 'You are LLaVA, a large language and vision assistant trained by UW Madison WAIV Lab.You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.Follow the instructions carefully and explain your answers in detail.###Human: Hi!###Assistant: Hi there!  How can I help you today?\\n###Human: ?\\n<image>###Assistant:', 'temperature': 0.2, 'max_new_tokens': 512, 'stop': '###', 'images': \"List of 1 images: ['793f00027d3dc5bd69445a388a2f289c']\"}\nimport sys\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nimport argparse\nimport torch\nfrom transformers import AutoTokenizer, CLIPImageProcessor, CLIPVisionModel, AutoConfig\nfrom diffusion.model.llava import LlavaMPTForCausalLM\nfrom PIL import Image\nfrom tqdm import tqdm\nfrom os import path, makedirs\nfrom torch.utils.data import Dataset, DataLoader\nimport json\n\n\nDEFAULT_IMAGE_TOKEN = \"<image>\"\nDEFAULT_IMAGE_PATCH_TOKEN = \"<im_patch>\"\nDEFAULT_IM_START_TOKEN = \"<im_start>\"\nDEFAULT_IM_END_TOKEN = \"<im_end>\"\n\n\ndef expand2square(pil_img, background_color=(122, 116, 104)):\n    width, height = pil_img.size\n    if width == height:\n        return pil_img\n    elif width > height:\n        result = Image.new(pil_img.mode, (width, width), background_color)\n        result.paste(pil_img, (0, (width - height) // 2))\n        return result\n    else:\n        result = Image.new(pil_img.mode, (height, height), background_color)\n        result.paste(pil_img, ((height - width) // 2, 0))\n        return result\n\n\ndef pad2square(image):\n    max_hw, min_hw = max(image.size), min(image.size)\n    aspect_ratio = max_hw / min_hw\n    max_len, min_len = 800, 400\n    shortest_edge = int(min(max_len / aspect_ratio, min_len, min_hw))\n    longest_edge = int(shortest_edge * aspect_ratio)\n    W, H = image.size\n 
   if H > W:\n        H, W = longest_edge, shortest_edge\n    else:\n        H, W = shortest_edge, longest_edge\n    image = image.resize((W, H))\n    return image\n\n\ndef load_model(model_path):\n    tokenizer = AutoTokenizer.from_pretrained(model_path)\n    model = LlavaMPTForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, low_cpu_mem_usage=True)\n\n    mm_use_im_start_end = getattr(model.config, \"mm_use_im_start_end\", False)\n    tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True)\n    if mm_use_im_start_end:\n        tokenizer.add_tokens([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True)\n\n    vision_tower = model.get_model().vision_tower[0]\n    if vision_tower.device.type == 'meta':\n        vision_tower = CLIPVisionModel.from_pretrained(\n            vision_tower.config._name_or_path, torch_dtype=torch.float16, low_cpu_mem_usage=True).cuda()\n        model.get_model().vision_tower[0] = vision_tower\n    else:\n        vision_tower.to(device='cuda', dtype=torch.float16)\n    vision_config = vision_tower.config\n    vision_config.im_patch_token = tokenizer.convert_tokens_to_ids(\n        [DEFAULT_IMAGE_PATCH_TOKEN])[0]\n    vision_config.use_im_start_end = mm_use_im_start_end\n    if mm_use_im_start_end:\n        vision_config.im_start_token, vision_config.im_end_token = tokenizer.convert_tokens_to_ids(\n            [DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN])\n\n    model.cuda()\n\n    if hasattr(model.config, \"max_sequence_length\"):\n        context_len = model.config.max_sequence_length\n    else:\n        context_len = 2048\n\n    return tokenizer, model, context_len\n\n\nclass SanitizedLaion(Dataset):\n    def __init__(self, root_dir, index_file, prompt, config, img_extension='.jpg', caption=True) -> None:\n        super().__init__()\n        self.root_dir = root_dir\n        self.image_processor = CLIPImageProcessor.from_pretrained(AutoConfig.from_pretrained(config).mm_vision_tower, 
torch_dtype=torch.float16)\n        self.prompt = prompt\n        self.img_extension = img_extension\n        self.caption=caption\n\n        if '.txt' in index_file:\n            with open(index_file, 'r') as f:\n                self.lines = f.readlines()\n        elif '.json' in index_file:\n            with open(index_file, 'r') as f:\n                self.lines = json.load(f)\n        else:\n            raise ValueError(f'{index_file} format not supported')\n\n    def __len__(self):\n        return len(self.lines)\n\n    def __getitem__(self, idx):\n        item = self.lines[idx]\n        caption = item['prompt'].strip()\n        prompt = self.prompt.format(caption) if self.caption else self.prompt\n        with open(path.join(self.root_dir, item['path']), 'rb') as f:\n            img = pad2square(Image.open(f).convert('RGB'))\n        return self.image_processor(img, return_tensors='pt')['pixel_values'].squeeze(), prompt, item['path'].split(self.img_extension)[0]\n\n\n@torch.no_grad()\ndef caption(tokenizer, model, context_len, images, prompt, prefix):\n    images = images.to(model.device, dtype=torch.float16)\n    # HACK: 256 is the max image token length hacked\n    replace_token = DEFAULT_IMAGE_PATCH_TOKEN * 256\n    if getattr(model.config, 'mm_use_im_start_end', False):\n        replace_token = DEFAULT_IM_START_TOKEN + replace_token + DEFAULT_IM_END_TOKEN\n\n    prompt = list(map(lambda p: p.replace(DEFAULT_IMAGE_TOKEN, replace_token), prompt))\n\n    temperature = 0.2\n    max_new_tokens = 1024\n    stop_str = '<|im_end|>'\n\n    max_src_len = context_len - max_new_tokens - 8\n    input_ids = tokenizer(prompt).input_ids\n    input_ids = list(map(lambda input_id: input_id[-max_src_len:], input_ids))\n    lens = list(map(lambda x: len(x), input_ids))\n    longest = max(lens)\n    input_ids = list(map(lambda x: x if len(x) == longest else [tokenizer.pad_token_id] * (longest - len(x)) + x, input_ids))\n\n    pred_ids = torch.zeros([images.shape[0], 0], 
device=model.device, dtype=torch.long)\n    past_key_values = None\n    finish = [False] * images.shape[0]\n    for i in tqdm(range(max_new_tokens), leave=False):\n        if i == 0:\n            out = model(\n                torch.as_tensor(input_ids).cuda(),\n                use_cache=True,\n                images=images)\n            del images\n        else:\n            attention_mask = torch.ones(1, past_key_values[0][0].shape[-2] + 1, device=\"cuda\")\n            out = model(input_ids=token,\n                        use_cache=True,\n                        attention_mask=attention_mask,\n                        past_key_values=past_key_values)\n        past_key_values = out.past_key_values\n        logits = out.logits\n        last_token_logits = logits[:, -1]\n        if temperature < 1e-4:\n            # Greedy decoding: take the per-sample argmax and keep a [batch, 1] shape,\n            # matching the output of torch.multinomial in the sampling branch.\n            token = torch.argmax(last_token_logits, dim=-1, keepdim=True)\n        else:\n            probs = torch.softmax(last_token_logits / temperature, dim=-1)\n            token = torch.multinomial(probs, num_samples=1)\n\n        pred_ids = torch.concatenate([pred_ids, token], dim=1)\n\n        for ii in torch.nonzero(token.cpu() == tokenizer.eos_token_id, as_tuple=True)[0]:\n            if finish[ii]:\n                continue\n            ii = int(ii)\n            output = tokenizer.decode(pred_ids[ii][:-1]).removesuffix(stop_str)\n            finish[ii] = True\n            yield output, prefix[ii]\n\n        if all(finish):\n            break\n\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"--model-path\", type=str, default=\"liuhaotian/LLaVA-Lightning-MPT-7B-preview\")\n    parser.add_argument(\"--data-root\", type=str, required=True)\n    parser.add_argument('--index', type=str, required=True)\n    parser.add_argument('--output', type=str, required=True)\n    args = parser.parse_args()\n\n    prompt = \"\"\"<|im_start|>system\n    - You are LLaVA, a large language and vision assistant trained by UW Madison WAIV Lab.\n    - 
You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.\n    - You should follow the instructions carefully and explain your answers in detail.<|im_end|><|im_start|>user\n    Given the caption of this image \"{}\", describe this image in a very detailed manner\n    <image><|im_end|><|im_start|>assistant\\n\"\"\"\n\n    prompt_nocap = \"\"\"<|im_start|>system\n    - You are LLaVA, a large language and vision assistant trained by UW Madison WAIV Lab.\n    - You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.\n    - You should follow the instructions carefully and explain your answers in detail.<|im_end|><|im_start|>user\n    Describe this image in a very detailed manner\n    <image><|im_end|><|im_start|>assistant\\n\"\"\"\n    d = SanitizedLaion(args.data_root, args.index, prompt, args.model_path, img_extension='.png')\n    l = DataLoader(d, batch_size=32, pin_memory=True, num_workers=10)\n\n    tokenizer, model, context_len = load_model(args.model_path)\n    # model = torch.compile(model)\n    for b in tqdm(l):\n        for c, p in caption(tokenizer, model, context_len, *b):\n            o = path.join(args.output, f'{p}.txt')\n            makedirs(path.dirname(o), exist_ok=True, mode=0o755)\n            with open(o, 'w') as k:\n                k.write(c)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/tools/convert_pixart_alpha_to_diffusers.py",
    "content": "import argparse\nimport os\n\nimport torch\nfrom transformers import T5EncoderModel, T5Tokenizer\n\nfrom diffusers import AutoencoderKL, DPMSolverMultistepScheduler, PixArtAlphaPipeline, Transformer2DModel\n\n\nckpt_id = \"PixArt-alpha/PixArt-alpha\"\n# https://github.com/PixArt-alpha/PixArt-alpha/blob/0f55e922376d8b797edd44d25d0e7464b260dcab/scripts/inference.py#L125\ninterpolation_scale = {256: 0.5, 512: 1, 1024: 2}\n\n\ndef main(args):\n    all_state_dict = torch.load(args.orig_ckpt_path, map_location='cpu')\n    state_dict = all_state_dict.pop(\"state_dict\")\n    converted_state_dict = {}\n\n    # Patch embeddings.\n    converted_state_dict[\"pos_embed.proj.weight\"] = state_dict.pop(\"x_embedder.proj.weight\")\n    converted_state_dict[\"pos_embed.proj.bias\"] = state_dict.pop(\"x_embedder.proj.bias\")\n\n    # Caption projection.\n    converted_state_dict[\"caption_projection.linear_1.weight\"] = state_dict.pop(\"y_embedder.y_proj.fc1.weight\")\n    converted_state_dict[\"caption_projection.linear_1.bias\"] = state_dict.pop(\"y_embedder.y_proj.fc1.bias\")\n    converted_state_dict[\"caption_projection.linear_2.weight\"] = state_dict.pop(\"y_embedder.y_proj.fc2.weight\")\n    converted_state_dict[\"caption_projection.linear_2.bias\"] = state_dict.pop(\"y_embedder.y_proj.fc2.bias\")\n\n    # AdaLN-single LN\n    converted_state_dict[\"adaln_single.emb.timestep_embedder.linear_1.weight\"] = state_dict.pop(\n        \"t_embedder.mlp.0.weight\"\n    )\n    converted_state_dict[\"adaln_single.emb.timestep_embedder.linear_1.bias\"] = state_dict.pop(\"t_embedder.mlp.0.bias\")\n    converted_state_dict[\"adaln_single.emb.timestep_embedder.linear_2.weight\"] = state_dict.pop(\n        \"t_embedder.mlp.2.weight\"\n    )\n    converted_state_dict[\"adaln_single.emb.timestep_embedder.linear_2.bias\"] = state_dict.pop(\"t_embedder.mlp.2.bias\")\n\n    if args.image_size == 1024 and args.multi_scale_train:\n        # Resolution.\n        
converted_state_dict[\"adaln_single.emb.resolution_embedder.linear_1.weight\"] = state_dict.pop(\n            \"csize_embedder.mlp.0.weight\"\n        )\n        converted_state_dict[\"adaln_single.emb.resolution_embedder.linear_1.bias\"] = state_dict.pop(\n            \"csize_embedder.mlp.0.bias\"\n        )\n        converted_state_dict[\"adaln_single.emb.resolution_embedder.linear_2.weight\"] = state_dict.pop(\n            \"csize_embedder.mlp.2.weight\"\n        )\n        converted_state_dict[\"adaln_single.emb.resolution_embedder.linear_2.bias\"] = state_dict.pop(\n            \"csize_embedder.mlp.2.bias\"\n        )\n        # Aspect ratio.\n        converted_state_dict[\"adaln_single.emb.aspect_ratio_embedder.linear_1.weight\"] = state_dict.pop(\n            \"ar_embedder.mlp.0.weight\"\n        )\n        converted_state_dict[\"adaln_single.emb.aspect_ratio_embedder.linear_1.bias\"] = state_dict.pop(\n            \"ar_embedder.mlp.0.bias\"\n        )\n        converted_state_dict[\"adaln_single.emb.aspect_ratio_embedder.linear_2.weight\"] = state_dict.pop(\n            \"ar_embedder.mlp.2.weight\"\n        )\n        converted_state_dict[\"adaln_single.emb.aspect_ratio_embedder.linear_2.bias\"] = state_dict.pop(\n            \"ar_embedder.mlp.2.bias\"\n        )\n    # Shared norm.\n    converted_state_dict[\"adaln_single.linear.weight\"] = state_dict.pop(\"t_block.1.weight\")\n    converted_state_dict[\"adaln_single.linear.bias\"] = state_dict.pop(\"t_block.1.bias\")\n\n    for depth in range(28):\n        # Transformer blocks.\n        converted_state_dict[f\"transformer_blocks.{depth}.scale_shift_table\"] = state_dict.pop(\n            f\"blocks.{depth}.scale_shift_table\"\n        )\n\n        # Attention is all you need 🤘\n\n        # Self attention.\n        q, k, v = torch.chunk(state_dict.pop(f\"blocks.{depth}.attn.qkv.weight\"), 3, dim=0)\n        q_bias, k_bias, v_bias = torch.chunk(state_dict.pop(f\"blocks.{depth}.attn.qkv.bias\"), 3, dim=0)\n   
     converted_state_dict[f\"transformer_blocks.{depth}.attn1.to_q.weight\"] = q\n        converted_state_dict[f\"transformer_blocks.{depth}.attn1.to_q.bias\"] = q_bias\n        converted_state_dict[f\"transformer_blocks.{depth}.attn1.to_k.weight\"] = k\n        converted_state_dict[f\"transformer_blocks.{depth}.attn1.to_k.bias\"] = k_bias\n        converted_state_dict[f\"transformer_blocks.{depth}.attn1.to_v.weight\"] = v\n        converted_state_dict[f\"transformer_blocks.{depth}.attn1.to_v.bias\"] = v_bias\n        # Projection.\n        converted_state_dict[f\"transformer_blocks.{depth}.attn1.to_out.0.weight\"] = state_dict.pop(\n            f\"blocks.{depth}.attn.proj.weight\"\n        )\n        converted_state_dict[f\"transformer_blocks.{depth}.attn1.to_out.0.bias\"] = state_dict.pop(\n            f\"blocks.{depth}.attn.proj.bias\"\n        )\n\n        # Feed-forward.\n        converted_state_dict[f\"transformer_blocks.{depth}.ff.net.0.proj.weight\"] = state_dict.pop(\n            f\"blocks.{depth}.mlp.fc1.weight\"\n        )\n        converted_state_dict[f\"transformer_blocks.{depth}.ff.net.0.proj.bias\"] = state_dict.pop(\n            f\"blocks.{depth}.mlp.fc1.bias\"\n        )\n        converted_state_dict[f\"transformer_blocks.{depth}.ff.net.2.weight\"] = state_dict.pop(\n            f\"blocks.{depth}.mlp.fc2.weight\"\n        )\n        converted_state_dict[f\"transformer_blocks.{depth}.ff.net.2.bias\"] = state_dict.pop(\n            f\"blocks.{depth}.mlp.fc2.bias\"\n        )\n\n        # Cross-attention.\n        q = state_dict.pop(f\"blocks.{depth}.cross_attn.q_linear.weight\")\n        q_bias = state_dict.pop(f\"blocks.{depth}.cross_attn.q_linear.bias\")\n        k, v = torch.chunk(state_dict.pop(f\"blocks.{depth}.cross_attn.kv_linear.weight\"), 2, dim=0)\n        k_bias, v_bias = torch.chunk(state_dict.pop(f\"blocks.{depth}.cross_attn.kv_linear.bias\"), 2, dim=0)\n\n        converted_state_dict[f\"transformer_blocks.{depth}.attn2.to_q.weight\"] = 
q\n        converted_state_dict[f\"transformer_blocks.{depth}.attn2.to_q.bias\"] = q_bias\n        converted_state_dict[f\"transformer_blocks.{depth}.attn2.to_k.weight\"] = k\n        converted_state_dict[f\"transformer_blocks.{depth}.attn2.to_k.bias\"] = k_bias\n        converted_state_dict[f\"transformer_blocks.{depth}.attn2.to_v.weight\"] = v\n        converted_state_dict[f\"transformer_blocks.{depth}.attn2.to_v.bias\"] = v_bias\n\n        converted_state_dict[f\"transformer_blocks.{depth}.attn2.to_out.0.weight\"] = state_dict.pop(\n            f\"blocks.{depth}.cross_attn.proj.weight\"\n        )\n        converted_state_dict[f\"transformer_blocks.{depth}.attn2.to_out.0.bias\"] = state_dict.pop(\n            f\"blocks.{depth}.cross_attn.proj.bias\"\n        )\n\n    # Final block.\n    converted_state_dict[\"proj_out.weight\"] = state_dict.pop(\"final_layer.linear.weight\")\n    converted_state_dict[\"proj_out.bias\"] = state_dict.pop(\"final_layer.linear.bias\")\n    converted_state_dict[\"scale_shift_table\"] = state_dict.pop(\"final_layer.scale_shift_table\")\n\n    # DiT XL/2\n    transformer = Transformer2DModel(\n        sample_size=args.image_size // 8,\n        num_layers=28,\n        attention_head_dim=72,\n        in_channels=4,\n        out_channels=8,\n        patch_size=2,\n        attention_bias=True,\n        num_attention_heads=16,\n        cross_attention_dim=1152,\n        activation_fn=\"gelu-approximate\",\n        num_embeds_ada_norm=1000,\n        norm_type=\"ada_norm_single\",\n        norm_elementwise_affine=False,\n        norm_eps=1e-6,\n        caption_channels=4096,\n    )\n    transformer.load_state_dict(converted_state_dict, strict=True)\n\n    assert transformer.pos_embed.pos_embed is not None\n    state_dict.pop(\"pos_embed\")\n    state_dict.pop(\"y_embedder.y_embedding\")\n    assert len(state_dict) == 0, f\"State dict is not empty, {state_dict.keys()}\"\n\n    num_model_params = sum(p.numel() for p in 
transformer.parameters())\n    print(f\"Total number of transformer parameters: {num_model_params}\")\n\n    if args.only_transformer:\n        transformer.save_pretrained(os.path.join(args.dump_path, \"transformer\"))\n    else:\n        scheduler = DPMSolverMultistepScheduler()\n\n        vae = AutoencoderKL.from_pretrained(ckpt_id, subfolder=\"sd-vae-ft-ema\")\n\n        tokenizer = T5Tokenizer.from_pretrained(ckpt_id, subfolder=\"t5-v1_1-xxl\")\n        text_encoder = T5EncoderModel.from_pretrained(ckpt_id, subfolder=\"t5-v1_1-xxl\")\n\n        pipeline = PixArtAlphaPipeline(\n            tokenizer=tokenizer, text_encoder=text_encoder, transformer=transformer, vae=vae, scheduler=scheduler\n        )\n\n        pipeline.save_pretrained(args.dump_path)\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n\n    # Set multi_scale_train=True if the PixArtMS structure was used during training; otherwise set it to False.\n    parser.add_argument(\"--multi_scale_train\", default=True, type=str, required=True, help=\"Whether the Multi-Scale PixArtMS structure was used during training.\")\n    parser.add_argument(\"--orig_ckpt_path\", default=None, type=str, required=False, help=\"Path to the checkpoint to convert.\")\n    parser.add_argument(\n        \"--image_size\",\n        default=1024,\n        type=int,\n        choices=[256, 512, 1024],\n        required=False,\n        help=\"Image size of the pretrained model: 256, 512, or 1024.\",\n    )\n    parser.add_argument(\"--dump_path\", default=None, type=str, required=True, help=\"Path to the output pipeline.\")\n    # argparse's type=bool treats any non-empty string (including \"False\") as True, so parse the value explicitly.\n    parser.add_argument(\"--only_transformer\", default=True, type=lambda s: s.lower() in (\"1\", \"true\", \"yes\"), required=True, help=\"If true, save only the transformer; otherwise save the full pipeline.\")\n\n    args = parser.parse_args()\n    main(args)\n"
  },
  {
    "path": "PixArt-alpha-ToCa/tools/download.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\n\"\"\"\nFunctions for downloading pre-trained PixArt models\n\"\"\"\nfrom torchvision.datasets.utils import download_url\nimport torch\nimport os\nimport argparse\n\n\npretrained_models = {'PixArt-XL-2-512x512.pth', 'PixArt-XL-2-1024-MS.pth'}\nvae_models = {\n    'sd-vae-ft-ema/config.json',\n    'sd-vae-ft-ema/diffusion_pytorch_model.bin'\n}\nt5_models = {\n    't5-v1_1-xxl/config.json', 't5-v1_1-xxl/pytorch_model-00001-of-00002.bin',\n    't5-v1_1-xxl/pytorch_model-00002-of-00002.bin', 't5-v1_1-xxl/pytorch_model.bin.index.json',\n    't5-v1_1-xxl/special_tokens_map.json', 't5-v1_1-xxl/spiece.model',\n    't5-v1_1-xxl/tokenizer_config.json',\n}\n\n\ndef find_model(model_name):\n    \"\"\"\n    Finds a pre-trained PixArt model, downloading it if necessary. Alternatively, loads a model from a local path.\n    \"\"\"\n    if model_name in pretrained_models:\n        return download_model(model_name)\n    assert os.path.isfile(model_name), f'Could not find PixArt checkpoint at {model_name}'\n    return torch.load(model_name, map_location=lambda storage, loc: storage)\n\n\ndef download_model(model_name):\n    \"\"\"\n    Downloads a pre-trained PixArt model from the web.\n    \"\"\"\n    assert model_name in pretrained_models\n    local_path = f'output/pretrained_models/{model_name}'\n    if not os.path.isfile(local_path):\n        os.makedirs('output/pretrained_models', exist_ok=True)\n        web_path = f'https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/{model_name}'\n        download_url(web_path, 'output/pretrained_models')\n    return torch.load(local_path, map_location=lambda storage, loc: storage)\n\n\ndef download_other(model_name, model_zoo, output_dir):\n    \"\"\"\n    Downloads an auxiliary model file (e.g. VAE or T5 weights) listed in model_zoo from the web.\n    
\"\"\"\n    assert model_name in model_zoo\n    local_path = os.path.join(output_dir, model_name)\n    if not os.path.isfile(local_path):\n        os.makedirs(output_dir, exist_ok=True)\n        web_path = f'https://huggingface.co/PixArt-alpha/PixArt-alpha/resolve/main/{model_name}'\n        print(web_path)\n        download_url(web_path, os.path.join(output_dir, model_name.split('/')[0]))\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--model_names', nargs='+', type=str, default=pretrained_models)\n    args = parser.parse_args()\n    model_names = set(args.model_names)\n\n    # Download the T5 and VAE weights\n    for t5_model in t5_models:\n        download_other(t5_model, t5_models, 'output/pretrained_models/t5_ckpts')\n    for vae_model in vae_models:\n        download_other(vae_model, vae_models, 'output/pretrained_models/')\n    # Download PixArt checkpoints\n    for model in model_names:\n        download_model(model)\n    print('Done.')\n"
  },
  {
    "path": "PixArt-alpha-ToCa/tools/extract_features.py",
    "content": "import os\nfrom pathlib import Path\nimport sys\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nfrom PIL import Image\nimport torch\nfrom torchvision import transforms as T\nimport numpy as np\nimport json\nfrom tqdm import tqdm\nimport argparse\nimport threading\nfrom queue import Queue\nfrom torch.utils.data import DataLoader, RandomSampler\nfrom accelerate import Accelerator\nfrom torchvision.transforms.functional import InterpolationMode\nfrom torchvision.datasets.folder import default_loader\n\nfrom diffusion.model.t5 import T5Embedder\nfrom diffusers.models import AutoencoderKL\nfrom diffusion.data.datasets.InternalData import InternalData\nfrom diffusion.utils.misc import SimpleTimer\nfrom diffusion.utils.data_sampler import AspectRatioBatchSampler\nfrom diffusion.data.builder import DATASETS\nfrom diffusion.data import ASPECT_RATIO_512, ASPECT_RATIO_1024\n\n\ndef get_closest_ratio(height: float, width: float, ratios: dict):\n    aspect_ratio = height / width\n    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - aspect_ratio))\n    return ratios[closest_ratio], float(closest_ratio)\n\n\n@DATASETS.register_module()\nclass DatasetMS(InternalData):\n    def __init__(self, root, image_list_json=None, transform=None, resolution=1024, load_vae_feat=False, aspect_ratio_type=None, start_index=0, end_index=100000000, **kwargs):\n        if image_list_json is None:\n            image_list_json = ['data_info.json']\n        assert os.path.isabs(root), 'root must be an absolute path'\n        self.root = root\n        self.img_dir_name = 'InternalImgs'        # change according to your data structure\n        self.json_dir_name = 'InternalData'        # change according to your data structure\n        self.transform = transform\n        self.load_vae_feat = load_vae_feat\n        self.resolution = resolution\n        
self.meta_data_clean = []\n        self.img_samples = []\n        self.txt_feat_samples = []\n        self.aspect_ratio = aspect_ratio_type\n        assert self.aspect_ratio in [ASPECT_RATIO_1024, ASPECT_RATIO_512]\n        self.ratio_index = {}\n        self.ratio_nums = {}\n        for k, v in self.aspect_ratio.items():\n            self.ratio_index[float(k)] = []     # used for self.getitem\n            self.ratio_nums[float(k)] = 0      # used for batch-sampler\n\n        image_list_json = image_list_json if isinstance(image_list_json, list) else [image_list_json]\n        for json_file in image_list_json:\n            meta_data = self.load_json(os.path.join(self.root, 'partition', json_file))\n            meta_data_clean = [item for item in meta_data if item['ratio'] <= 4]\n            self.meta_data_clean.extend(meta_data_clean)\n            self.img_samples.extend([os.path.join(self.root.replace(self.json_dir_name, self.img_dir_name), item['path']) for item in meta_data_clean])\n\n        self.img_samples = self.img_samples[start_index: end_index]\n        # scan the dataset for ratio static\n        for i, info in enumerate(self.meta_data_clean[:len(self.meta_data_clean)//3]):\n            ori_h, ori_w = info['height'], info['width']\n            closest_size, closest_ratio = get_closest_ratio(ori_h, ori_w, self.aspect_ratio)\n            self.ratio_nums[closest_ratio] += 1\n            if len(self.ratio_index[closest_ratio]) == 0:\n                self.ratio_index[closest_ratio].append(i)\n\n        # Set loader and extensions\n        if self.load_vae_feat:\n            raise ValueError(\"No VAE loader here\")\n        self.loader = default_loader\n\n    def __getitem__(self, idx):\n        data_info = {}\n        for _ in range(20):\n            try:\n                img_path = self.img_samples[idx]\n                img = self.loader(img_path)\n                if self.transform:\n                    img = self.transform(img)\n                # Calculate 
closest aspect ratio and resize & crop image[w, h]\n                if isinstance(img, Image.Image):\n                    h, w = (img.size[1], img.size[0])\n                    assert h, w == (self.meta_data_clean[idx]['height'], self.meta_data_clean[idx]['width'])\n                    closest_size, closest_ratio = get_closest_ratio(h, w, self.aspect_ratio)\n                    closest_size = list(map(lambda x: int(x), closest_size))\n                    transform = T.Compose([\n                        T.Lambda(lambda img: img.convert('RGB')),\n                        T.Resize(closest_size, interpolation=InterpolationMode.BICUBIC),  # Image.BICUBIC\n                        T.CenterCrop(closest_size),\n                        T.ToTensor(),\n                        T.Normalize([.5], [.5]),\n                    ])\n                    img = transform(img)\n                    data_info['img_hw'] = torch.tensor([h, w], dtype=torch.float32)\n                    data_info['aspect_ratio'] = closest_ratio\n                # change the path according to your data structure\n                return img, '_'.join(self.img_samples[idx].rsplit('/', 2)[-2:]) # change from 'serial-number-of-dir/serial-number-of-image.png' ---> 'serial-number-of-dir_serial-number-of-image.png'\n            except Exception as e:\n                print(f\"Error details: {str(e)}\")\n                idx = np.random.randint(len(self))\n        raise RuntimeError('Too many bad data.')\n\n    def get_data_info(self, idx):\n        data_info = self.meta_data_clean[idx]\n        return {'height': data_info['height'], 'width': data_info['width']}\n\n\ndef extract_caption_t5_do(q):\n    while not q.empty():\n        item = q.get()\n        extract_caption_t5_job(item)\n        q.task_done()\n\n\ndef extract_caption_t5_job(item):\n    global mutex\n    global t5\n    global t5_save_dir\n\n    with torch.no_grad():\n        caption = item['prompt'].strip()\n        if isinstance(caption, str):\n            
caption = [caption]\n\n        save_path = os.path.join(t5_save_dir, Path(item['path']).stem)\n        if os.path.exists(f\"{save_path}.npz\"):\n            return\n        try:\n            mutex.acquire()\n            caption_emb, emb_mask = t5.get_text_embeddings(caption)\n            mutex.release()\n            emb_dict = {\n                'caption_feature': caption_emb.float().cpu().data.numpy(),\n                'attention_mask': emb_mask.cpu().data.numpy(),\n            }\n            np.savez_compressed(save_path, **emb_dict)\n        except Exception as e:\n            print(e)\n\n\ndef extract_caption_t5():\n    global t5\n    global t5_save_dir\n    # global images_extension\n    t5 = T5Embedder(device=\"cuda\", local_cache=True, cache_dir=f'{args.pretrained_models_dir}/t5_ckpts', model_max_length=120)\n    t5_save_dir = args.t5_save_root\n    os.makedirs(t5_save_dir, exist_ok=True)\n\n    train_data_json = json.load(open(args.json_path, 'r'))\n    train_data = train_data_json[args.start_index: args.end_index]\n\n    global mutex\n    mutex = threading.Lock()\n    jobs = Queue()\n\n    for item in tqdm(train_data):\n        jobs.put(item)\n\n    for _ in range(20):\n        worker = threading.Thread(target=extract_caption_t5_do, args=(jobs,))\n        worker.start()\n\n    jobs.join()\n\n\ndef extract_img_vae_do(q):\n    while not q.empty():\n        item = q.get()\n        extract_img_vae_job(item)\n        q.task_done()\n\n\ndef extract_img_vae_job(item):\n    return\n\n\ndef extract_img_vae():\n    vae = AutoencoderKL.from_pretrained(f'{args.pretrained_models_dir}/sd-vae-ft-ema').to(device)\n\n    train_data_json = json.load(open(args.json_path, 'r'))\n    image_names = set()\n\n    vae_save_root = f'{args.vae_save_root}/{image_resize}resolution'\n    os.umask(0o000)  # file permission: 666; dir permission: 777\n    os.makedirs(vae_save_root, exist_ok=True)\n\n    vae_save_dir = os.path.join(vae_save_root, 'noflip')\n    os.makedirs(vae_save_dir, 
exist_ok=True)\n\n    for item in train_data_json:\n        image_name = item['path']\n        if image_name in image_names:\n            continue\n        image_names.add(image_name)\n    lines = sorted(image_names)\n    lines = lines[args.start_index: args.end_index]\n\n    _, images_extension = os.path.splitext(lines[0])\n\n    transform = T.Compose([\n        T.Lambda(lambda img: img.convert('RGB')),\n        T.Resize(image_resize),  # Image.BICUBIC\n        T.CenterCrop(image_resize),\n        T.ToTensor(),\n        T.Normalize([.5], [.5]),\n    ])\n\n    os.umask(0o000)  # file permission: 666; dir permission: 777\n    for image_name in tqdm(lines):\n        save_path = os.path.join(vae_save_dir, Path(image_name).stem)\n        if os.path.exists(f\"{save_path}.npy\"):\n            continue\n        try:\n            img = Image.open(f'{args.dataset_root}/{image_name}')\n            img = transform(img).to(device)[None]\n\n            with torch.no_grad():\n                posterior = vae.encode(img).latent_dist\n                z = torch.cat([posterior.mean, posterior.std], dim=1).detach().cpu().numpy().squeeze()\n\n            np.save(save_path, z)\n        except Exception as e:\n            print(e)\n            print(image_name)\n\n\ndef save_results(results, paths, signature, work_dir):\n    timer = SimpleTimer(len(results), log_interval=100, desc=\"Saving Results\")\n    # save to npy\n    new_paths = []\n    os.umask(0o000)  # file permission: 666; dir permission: 777\n    for res, p in zip(results, paths):\n        file_name = p.split('.')[0] + '.npy'\n        new_folder = signature\n        save_folder = os.path.join(work_dir, new_folder)\n        if os.path.exists(save_folder):\n            raise FileExistsError(f\"{save_folder} exists. BE careful not to overwrite your files. 
Comment this error raising for overwriting!!\")\n        os.makedirs(save_folder, exist_ok=True)\n        new_paths.append(os.path.join(new_folder, file_name))\n        np.save(os.path.join(save_folder, file_name), res)\n        timer.log()\n    # save paths\n    with open(os.path.join(work_dir, f\"VAE-{signature}.txt\"), 'w') as f:\n        f.write('\\n'.join(new_paths))\n\n\ndef inference(vae, dataloader, signature, work_dir):\n    timer = SimpleTimer(len(dataloader), log_interval=100, desc=\"VAE-Inference\")\n\n    for batch in dataloader:\n        with torch.no_grad():\n            with torch.cuda.amp.autocast(enabled=True):\n                posterior = vae.encode(batch[0]).latent_dist\n                results = torch.cat([posterior.mean, posterior.std], dim=1).detach().cpu().numpy()\n        path = batch[1]\n        save_results(results, path, signature=signature, work_dir=work_dir)\n        timer.log()\n\n\ndef extract_img_vae_multiscale(bs=1):\n\n    assert image_resize in [512, 1024]\n    work_dir = os.path.abspath(args.vae_save_root)\n    os.umask(0o000)  # file permission: 666; dir permission: 777\n    os.makedirs(work_dir, exist_ok=True)\n    accelerator = Accelerator(mixed_precision='fp16')\n    vae = AutoencoderKL.from_pretrained(f'{args.pretrained_models_dir}/sd-vae-ft-ema').to(device)\n\n    signature = 'ms'\n\n    aspect_ratio_type = ASPECT_RATIO_1024 if image_resize == 1024 else ASPECT_RATIO_512\n    dataset = DatasetMS(args.dataset_root, image_list_json=[args.json_file], transform=None, sample_subset=None,\n                        aspect_ratio_type=aspect_ratio_type, start_index=args.start_index, end_index=args.end_index)\n\n    # create AspectRatioBatchSampler\n    sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset, batch_size=bs, aspect_ratios=dataset.aspect_ratio, ratio_nums=dataset.ratio_nums)\n\n    # create DataLoader\n    dataloader = DataLoader(dataset, batch_sampler=sampler, num_workers=13, 
pin_memory=True)\n    dataloader = accelerator.prepare(dataloader)\n\n    inference(vae, dataloader, signature=signature, work_dir=work_dir)\n    accelerator.wait_for_everyone()\n\n    print('done')\n\n\ndef get_args():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"--multi_scale\", action='store_true', default=False, help=\"multi-scale feature extraction\")\n    parser.add_argument(\"--img_size\", default=512, type=int, help=\"image scale for multi-scale feature extraction\")\n    parser.add_argument('--start_index', default=0, type=int)\n    parser.add_argument('--end_index', default=1000000, type=int)\n\n    parser.add_argument('--json_path', type=str)\n    parser.add_argument('--t5_save_root', default='data/data_toy/caption_feature_wmask', type=str)\n    parser.add_argument('--vae_save_root', default='data/data_toy/img_vae_features', type=str)\n    parser.add_argument('--dataset_root', default='data/data_toy', type=str)\n    parser.add_argument('--pretrained_models_dir', default='output/pretrained_models', type=str)\n\n    # for multi-scale (ms) VAE feature extraction\n    parser.add_argument('--json_file', type=str)\n    return parser.parse_args()\n\n\nif __name__ == '__main__':\n\n    args = get_args()\n    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n    image_resize = args.img_size\n\n    # prepare extracted caption t5 features for training\n    extract_caption_t5()\n\n    # prepare extracted image vae features for training\n    if args.multi_scale:\n        print(f'Extracting Multi-scale Image Resolution based on {image_resize}')\n        extract_img_vae_multiscale(bs=1)    # recommend bs = 1 for AspectRatioBatchSampler\n    else:\n        print(f'Extracting Single Image Resolution {image_resize}')\n        extract_img_vae()"
  },
  {
    "path": "PixArt-alpha-ToCa/train.sh",
    "content": "CUDA_VISIBLE_DEVICES=5,6,7 python -m torch.distributed.launch --nproc_per_node=3 \\\n    --master_port=26662 train_scripts/train_controlnet.py \\\n    configs/pixart_app_config/PixArt_xl2_img1024_controlHed_Half.py \\\n    --work-dir output/debug"
  },
  {
    "path": "PixArt-alpha-ToCa/train_latents.py",
    "content": "import os\n\nimport sys\nimport types\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nimport argparse\nimport datetime\nimport time\nimport warnings\nwarnings.filterwarnings(\"ignore\")  # ignore warning\nimport torch\nimport torch.nn as nn\nfrom accelerate import Accelerator, InitProcessGroupKwargs\nfrom accelerate.utils import DistributedType\nfrom diffusers.models import AutoencoderKL\nfrom torch.utils.data import RandomSampler\nfrom mmcv.runner import LogBuffer\nfrom copy import deepcopy\nfrom PIL import Image\nimport numpy as np\n\nfrom diffusion import IDDPM\nfrom diffusion.utils.checkpoint import save_checkpoint, load_checkpoint\nfrom diffusion.utils.dist_utils import synchronize, get_world_size, clip_grad_norm_\nfrom diffusion.data.builder import build_dataset, build_dataloader, set_data_root\nfrom diffusion.model.builder import build_model\nfrom diffusion.utils.logger import get_root_logger\nfrom diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow\nfrom diffusion.utils.optimizer import build_optimizer, auto_scale_lr\nfrom diffusion.utils.lr_scheduler import build_lr_scheduler\nfrom diffusion.utils.data_sampler import AspectRatioBatchSampler, BalancedAspectRatioBatchSampler\n\ndef set_fsdp_env():\n    os.environ[\"ACCELERATE_USE_FSDP\"] = 'true'\n    os.environ[\"FSDP_AUTO_WRAP_POLICY\"] = 'TRANSFORMER_BASED_WRAP'\n    os.environ[\"FSDP_BACKWARD_PREFETCH\"] = 'BACKWARD_PRE'\n    os.environ[\"FSDP_TRANSFORMER_CLS_TO_WRAP\"] = 'PixArtBlock'\n\n\ndef ema_update(model_dest: nn.Module, model_src: nn.Module, rate):\n    param_dict_src = dict(model_src.named_parameters())\n    for p_name, p_dest in model_dest.named_parameters():\n        p_src = param_dict_src[p_name]\n        assert p_src is not p_dest\n        p_dest.data.mul_(rate).add_((1 - rate) * p_src.data)\n\n\n\n\n\ndef train():\n    if config.get('debug_nan', 
False):\n        DebugUnderflowOverflow(model)\n        logger.info('NaN debugger registered. Start to detect overflow during training.')\n    time_start, last_tic = time.time(), time.time()\n    log_buffer = LogBuffer()\n        \n    start_step = start_epoch * len(train_dataloader)\n    global_step = 0\n    total_steps = len(train_dataloader) * config.num_epochs\n\n    # load_vae_feat = getattr(train_dataloader.dataset, 'load_vae_feat', False)\n    # Now you train the model\n    for epoch in range(start_epoch + 1, config.num_epochs + 1):\n        data_time_start= time.time()\n        data_time_all = 0\n        for step, batch in enumerate(train_dataloader):\n            data_time_all += time.time() - data_time_start\n            # if load_vae_feat:\n            z = batch[0]\n            # else:\n            #     with torch.no_grad():\n            #         with torch.cuda.amp.autocast(enabled=config.mixed_precision == 'fp16'):\n            #             posterior = vae.encode(batch[0]).latent_dist\n            #             if config.sample_posterior:\n            #                 z = posterior.sample()\n            #             else:\n            #                 z = posterior.mode()\n            clean_images = z * config.scale_factor\n            y = batch[1]\n            y_mask = batch[2]\n            data_info = batch[3]\n\n            # Sample a random timestep for each image\n            bs = clean_images.shape[0]\n            timesteps = torch.randint(0, config.train_sampling_steps, (bs,), device=clean_images.device).long()\n            grad_norm = None\n            with accelerator.accumulate(model):\n                # Predict the noise residual\n                optimizer.zero_grad()\n                loss_term = train_diffusion.training_losses(model, clean_images, timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info))\n                loss = loss_term['loss'].mean()\n                accelerator.backward(loss)\n                if 
accelerator.sync_gradients:\n                    grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)\n                optimizer.step()\n                lr_scheduler.step()\n                if accelerator.sync_gradients:\n                    ema_update(model_ema, model, config.ema_rate)\n\n            lr = lr_scheduler.get_last_lr()[0]\n            logs = {args.loss_report_name: accelerator.gather(loss).mean().item()}\n            if grad_norm is not None:\n                logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())\n            log_buffer.update(logs)\n            \n            # logging on terminal\n            if (step + 1) % config.log_interval == 0 or (step + 1) == 1:\n                t = (time.time() - last_tic) / config.log_interval\n                t_d = data_time_all / config.log_interval\n                avg_time = (time.time() - time_start) / (global_step + 1)\n                eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - start_step - global_step - 1))))\n                eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))\n                # avg_loss = sum(loss_buffer) / len(loss_buffer)\n                log_buffer.average()\n                info = f\"Step/Epoch [{(epoch-1)*len(train_dataloader)+step+1}/{epoch}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, \" \\\n                       f\"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}, s:({model.module.h}, {model.module.w}), \"\n                info += ', '.join([f\"{k}:{v:.4f}\" for k, v in log_buffer.output.items()])\n                logger.info(info)\n                last_tic = time.time()\n                log_buffer.clear()\n                data_time_all = 0\n            logs.update(lr=lr)\n            accelerator.log(logs, step=global_step + start_step)\n\n            global_step += 1\n            data_time_start= time.time()\n\n            
synchronize()\n            if accelerator.is_main_process:\n                if ((epoch - 1) * len(train_dataloader) + step + 1) % config.save_model_steps == 0:\n                    os.umask(0o000)\n                    save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),\n                                    epoch=epoch,\n                                    step=(epoch - 1) * len(train_dataloader) + step + 1,\n                                    model=accelerator.unwrap_model(model),\n                                    model_ema=accelerator.unwrap_model(model_ema),\n                                    optimizer=optimizer,\n                                    lr_scheduler=lr_scheduler\n                                    )\n            synchronize()\n\n        synchronize()\n        if accelerator.is_main_process:\n            if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:\n                os.umask(0o000)\n                save_checkpoint(os.path.join(config.output_dir, 'checkpoints'),\n                                epoch=epoch,\n                                step=(epoch - 1) * len(train_dataloader) + step + 1,\n                                model=accelerator.unwrap_model(model),\n                                model_ema=accelerator.unwrap_model(model_ema),\n                                optimizer=optimizer,\n                                lr_scheduler=lr_scheduler\n                                )\n            ########### EVAL ###################\n            if epoch % config.save_image_epochs == 0 or epoch == config.num_epochs:                \n                if config.validation_prompts is not None:\n                    logger.info(\"Running inference for collecting generated images...\")\n      \n                    assert config.eval_sampler in ['iddpm', 'dpm-solver', 'sa-solver']\n                    sample_steps_dict = {'iddpm': 100, 'dpm-solver': 20, 'sa-solver': 25}\n                    sample_steps = 
config.eval_steps if config.eval_steps != -1 else sample_steps_dict[config.eval_sampler]\n                    # base_ratios = eval(f'ASPECT_RATIO_{config.image_size}_TEST')\n                    \n                    eval_dir = os.path.join(config.output_dir, 'eval')\n                    os.makedirs(eval_dir, exist_ok=True)\n                    save_path = os.path.join(eval_dir, f'{epoch}_{global_step}.png')\n                    \n                    model.eval()\n                    images = []\n                    # device = t5.device\n                    for ip, prompt in enumerate(config.validation_prompts):\n                        prompts = [prompt]\n                        # prompts = []\n                        # prompt_clean, _, hw, ar, custom_hw = prepare_prompt_ar(prompt, base_ratios, device=device, show=False)  # ar for aspect ratio\n                        # if config.image_size == 1024:\n                            # latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)\n                        # else:\n                        #     hw = torch.tensor([[config.image_size, config.image_size]], dtype=torch.float, device=device).repeat(bs, 1)\n                        #     ar = torch.tensor([[1.]], device=device).repeat(bs, 1)\n                        #     latent_size_h, latent_size_w = latent_size, latent_size\n                        # prompts.append(prompt_clean.strip())\n                        null_y = model.module.y_embedder.y_embedding[None].repeat(len(prompts), 1, 1)[:, None]\n                        \n                        with torch.no_grad():\n                            caption_embs, emb_masks, len_prompts = val_txt_embs[ip]\n                            # caption_embs, emb_masks = t5.get_text_embeddings(prompts)\n                            # caption_embs = caption_embs.float()[:, None]\n                            print(f'finish embedding')\n                            n = len_prompts\n                            if 
config.eval_sampler == 'iddpm':\n                                # Create sampling noise:\n                                z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device).repeat(2, 1, 1, 1)\n                                model_kwargs = dict(y=torch.cat([caption_embs, null_y]),\n                                                    cfg_scale=config.cfg_scale, data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)\n                                diffusion = IDDPM(str(sample_steps))\n                                # Sample images:\n                                samples = diffusion.p_sample_loop(\n                                    model.module.forward_with_cfg, z.shape, z, clip_denoised=False, model_kwargs=model_kwargs, progress=True,\n                                    device=device\n                                )\n                                samples, _ = samples.chunk(2, dim=0)  # Remove null class samples\n                            elif config.eval_sampler == 'dpm-solver':\n                                # Create sampling noise:\n                                z = torch.randn(n, 4, latent_size_h, latent_size_w, device=device)\n                                model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)\n                                dpm_solver = DPMS(model.module.forward_with_dpmsolver,\n                                                condition=caption_embs,\n                                                uncondition=null_y,\n                                                cfg_scale=config.cfg_scale,\n                                                model_kwargs=model_kwargs)\n                                samples = dpm_solver.sample(\n                                    z,\n                                    steps=sample_steps,\n                                    order=2,\n                                    skip_type=\"time_uniform\",\n                                    
method=\"multistep\",\n                                )\n                            elif config.eval_sampler == 'sa-solver':\n                                # Create sampling noise:\n                                model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)\n                                sa_solver = SASolverSampler(model.module.forward_with_dpmsolver, device=device)\n                                samples = sa_solver.sample(\n                                    S=25,\n                                    batch_size=n,\n                                    shape=(4, latent_size_h, latent_size_w),\n                                    eta=1,\n                                    conditioning=caption_embs,\n                                    unconditional_conditioning=null_y,\n                                    unconditional_guidance_scale=config.cfg_scale,\n                                    model_kwargs=model_kwargs,\n                                )[0]\n                        samples = vae.decode(samples / 0.18215).sample\n                        # decode image\n                        image = make_grid(samples, nrow=1, normalize=True, value_range=(-1, 1))\n                        image = image.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to(\"cpu\", torch.uint8).numpy()\n                        image = Image.fromarray(image)\n                        images.append(image)\n                        \n                    image_grid = make_image_grid(images, 2, len(images)//2)\n                    image_grid.save(save_path)\n                    for tracker in accelerator.trackers:\n                        if tracker.name == \"tensorboard\":\n                            np_images = np.stack([np.asarray(img) for img in images])\n                            tracker.writer.add_images(\"validation\", np_images, epoch, dataformats=\"NHWC\")\n                        elif tracker.name == \"comet_ml\":\n                         
   logger.info('Logging validation images')\n                            tracker.writer.log_image(image_grid, name=f\"{epoch}\", step=global_step)\n                        else:\n                            logger.warning(f\"image logging not implemented for {tracker.name}\")\n                    \n                    del images, image, samples, image_grid\n                    torch.cuda.empty_cache()\n                        \n        model.train()\n        synchronize()\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Process some integers.\")\n    parser.add_argument(\"config\", type=str, help=\"config\")\n    parser.add_argument(\"--cloud\", action='store_true', default=False, help=\"cloud or local machine\")\n    parser.add_argument('--work-dir', help='the dir to save logs and models')\n    parser.add_argument('--resume-from', help='the dir to resume the training')\n    parser.add_argument('--load-from', default=None, help='the dir to load a ckpt for training')\n    parser.add_argument('--local-rank', type=int, default=-1)\n    parser.add_argument('--local_rank', type=int, default=-1)\n    parser.add_argument('--debug', action='store_true')\n    parser.add_argument(\n        \"--report_to\",\n        type=str,\n        default=\"tensorboard\",\n        help=(\n            'The integration to report the results and logs to. Supported platforms are `\"tensorboard\"`'\n            ' (default), `\"wandb\"` and `\"comet_ml\"`. 
Use `\"all\"` to report to all integrations.'\n        ),\n    )\n    parser.add_argument(\n        \"--tracker_project_name\",\n        type=str,\n        default=\"text2image-fine-tune\",\n        help=(\n            \"The `project_name` argument passed to Accelerator.init_trackers for\"\n            \" more information see https://huggingface.co/docs/accelerate/v0.17.0/en/package_reference/accelerator#accelerate.Accelerator\"\n        ),\n    )\n    parser.add_argument(\"--loss_report_name\", type=str, default=\"loss\")\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == '__main__':\n    args = parse_args()\n    config = read_config(args.config)\n    if args.work_dir is not None:\n        # update configs according to CLI args if args.work_dir is not None\n        config.work_dir = args.work_dir\n    if args.cloud:\n        config.data_root = '/data/data'\n    if args.resume_from is not None:\n        config.load_from = None\n        config.resume_from = dict(\n            checkpoint=args.resume_from,\n            load_ema=False,\n            resume_optimizer=True,\n            resume_lr_scheduler=True)\n    if args.debug:\n        config.log_interval = 1\n        config.train_batch_size = 8\n        config.valid_num = 100\n\n    os.umask(0o000)\n    config.output_dir = os.path.join(config.work_dir, \n                                     f\"\"\"{config.model}_{config.dataset_alias}_{config.image_size}_batch{config.train_batch_size}_{config.lr_schedule}_lr{config.optimizer['lr']}_warmup{config.lr_schedule_args['num_warmup_steps']}_gas{config.gradient_accumulation_steps}\"\"\")        \n    os.makedirs(config.output_dir, exist_ok=True)\n\n    init_handler = InitProcessGroupKwargs()\n    init_handler.timeout = datetime.timedelta(seconds=5400)  # change timeout to avoid a strange NCCL bug\n    # Initialize accelerator and tensorboard logging\n    if config.use_fsdp:\n        init_train = 'FSDP'\n        from accelerate import 
FullyShardedDataParallelPlugin\n        from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig\n        set_fsdp_env()\n        fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)\n    else:\n        init_train = 'DDP'\n        fsdp_plugin = None\n\n    even_batches = True\n    if config.multi_scale:\n        even_batches = False\n        \n    if args.report_to == \"comet_ml\":\n        import comet_ml\n        comet_ml.init(\n            project_name=args.tracker_project_name,\n        )     \n\n    accelerator = Accelerator(\n        mixed_precision=config.mixed_precision,\n        gradient_accumulation_steps=config.gradient_accumulation_steps,\n        log_with=args.report_to,\n        project_dir=os.path.join(config.output_dir, \"logs\"),\n        fsdp_plugin=fsdp_plugin,\n        even_batches=even_batches,\n        kwargs_handlers=[init_handler]\n    )\n\n    logger = get_root_logger(os.path.join(config.output_dir, 'train_log.log'))\n\n    config.seed = init_random_seed(config.get('seed', None))\n    set_random_seed(config.seed)\n\n    if accelerator.is_main_process:\n        config.dump(os.path.join(config.output_dir, 'config.py'))\n\n    logger.info(f\"Config: \\n{config.pretty_text}\")\n    logger.info(f\"World_size: {get_world_size()}, seed: {config.seed}\")\n    logger.info(f\"Initializing: {init_train} for training\")\n    image_size = config.image_size  # @param [256, 512]\n    latent_size = int(image_size) // 8\n    pred_sigma = getattr(config, 'pred_sigma', True)\n    learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma\n    model_kwargs={\"window_block_indexes\": config.window_block_indexes, \"window_size\": config.window_size,\n                  \"use_rel_pos\": config.use_rel_pos, \"lewei_scale\": config.lewei_scale, 'config':config,\n                  'model_max_length': config.model_max_length}\n    \n    if config.validation_prompts 
is not None:\n        logger.info('Precompute validation prompt embeddings')\n        from diffusion.model.utils import prepare_prompt_ar\n        from diffusion import IDDPM, DPMS, SASolverSampler\n        from diffusion.model.t5 import T5Embedder\n        from diffusion.data.datasets import ASPECT_RATIO_256_TEST, ASPECT_RATIO_512_TEST, ASPECT_RATIO_1024_TEST\n        from diffusers.utils import  make_image_grid\n        from torchvision.utils import make_grid\n        \n        t5 = T5Embedder(device=\"cuda\", local_cache=True, cache_dir='output/pretrained_models/t5_ckpts', torch_dtype=torch.float)\n        device = t5.device\n        base_ratios = eval(f'ASPECT_RATIO_{config.image_size}_TEST')\n        pbs = 1\n        val_txt_embs = []\n        for prompt in config.validation_prompts:\n            prompts = []\n            prompt_clean, _, hw, ar, custom_hw = prepare_prompt_ar(prompt, base_ratios, device=device, show=False)  # ar for aspect ratio\n            if config.image_size == 1024:\n                latent_size_h, latent_size_w = int(hw[0, 0] // 8), int(hw[0, 1] // 8)\n            else:\n                hw = torch.tensor([[config.image_size, config.image_size]], dtype=torch.float, device=device).repeat(pbs, 1)\n                ar = torch.tensor([[1.]], device=device).repeat(pbs, 1)\n                latent_size_h, latent_size_w = latent_size, latent_size\n            prompts.append(prompt_clean.strip())\n            \n            with torch.no_grad():\n                caption_embs, emb_masks = t5.get_text_embeddings(prompts)\n                caption_embs = caption_embs.float()[:, None]\n                val_txt_embs.append([caption_embs, emb_masks, len(prompts)])\n        del t5\n        import gc         # garbage collect library\n        gc.collect()\n        torch.cuda.empty_cache()\n        logger.info('[ DONE ]')\n\n    # build models\n    train_diffusion = IDDPM(str(config.train_sampling_steps), learn_sigma=learn_sigma, pred_sigma=pred_sigma, 
snr=config.snr_loss)\n    model = build_model(config.model,\n                        config.grad_checkpointing,\n                        config.get('fp32_attention', False),\n                        input_size=latent_size,\n                        learn_sigma=learn_sigma,\n                        pred_sigma=pred_sigma,\n                        **model_kwargs).train()\n    logger.info(f\"{model.__class__.__name__} Model Parameters: {sum(p.numel() for p in model.parameters()):,}\")\n    logger.info(f\"T5 max token length: {config.model_max_length}\")\n    model_ema = deepcopy(model).eval()\n\n    if config.load_from is not None:\n        if args.load_from is not None:\n            config.load_from = args.load_from\n        missing, unexpected = load_checkpoint(config.load_from, model, load_ema=config.get('load_ema', False))\n        logger.warning(f'Missing keys: {missing}')\n        logger.warning(f'Unexpected keys: {unexpected}')\n\n    ema_update(model_ema, model, 0.)\n    if not config.data.load_vae_feat:\n        vae = AutoencoderKL.from_pretrained(config.vae_pretrained).cuda()\n\n    # prepare for FSDP clip grad norm calculation\n    if accelerator.distributed_type == DistributedType.FSDP:\n        for m in accelerator._models:\n            m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)\n\n    # build dataloader\n    set_data_root(config.data_root)\n    dataset = build_dataset(config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type)\n    if config.multi_scale:\n        batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n                                                batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,\n                                                ratio_nums=dataset.ratio_nums, config=config, valid_num=config.valid_num)\n        # used for balanced sampling\n        # batch_sampler = 
BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n        #                                                 batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,\n        #                                                 ratio_nums=dataset.ratio_nums)\n        train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)\n    else:\n        logger.info(f'Batch size {config.train_batch_size}')\n        train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)\n\n    # build optimizer and lr scheduler\n    lr_scale_ratio = 1\n    if config.get('auto_lr', None):\n        lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,\n                                       config.optimizer, **config.auto_lr)\n    optimizer = build_optimizer(model, config.optimizer)\n    lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)\n\n    timestamp = time.strftime(\"%Y-%m-%d_%H:%M:%S\", time.localtime())\n\n    if accelerator.is_main_process:\n        tracker_config = dict(vars(config))\n        accelerator.init_trackers(args.tracker_project_name, tracker_config)\n        accelerator.get_tracker(\"comet_ml\").writer.add_tags([config.model, \n                                                            config.dataset_alias, \n                                                            config.image_size, \n                                                            config.lr_schedule, \n                                                            f'bs{config.train_batch_size}',\n                                                            f'gs{config.gradient_accumulation_steps}'\n                                                            ])\n\n    start_epoch = 0\n    if config.resume_from is not None and config.resume_from['checkpoint'] is 
not None:\n        start_epoch, missing, unexpected = load_checkpoint(**config.resume_from,\n                                                           model=model,\n                                                           model_ema=model_ema,\n                                                           optimizer=optimizer,\n                                                           lr_scheduler=lr_scheduler,\n                                                           )\n\n        logger.warning(f'Missing keys: {missing}')\n        logger.warning(f'Unexpected keys: {unexpected}')\n    # Prepare everything\n    # There is no specific order to remember, you just need to unpack the\n    # objects in the same order you gave them to the prepare method.\n    model, model_ema = accelerator.prepare(model, model_ema)\n    optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)\n    train()\n"
  },
  {
    "path": "PixArt-alpha-ToCa/train_scripts/train.py",
    "content": "import argparse\nimport datetime\nimport os\nimport sys\nimport time\nimport types\nimport warnings\nfrom copy import deepcopy\nfrom pathlib import Path\n\nimport torch\nimport torch.nn as nn\nfrom accelerate import Accelerator, InitProcessGroupKwargs\nfrom accelerate.utils import DistributedType\nfrom diffusers.models import AutoencoderKL\nfrom mmcv.runner import LogBuffer\nfrom torch.utils.data import RandomSampler\n\nfrom diffusion import IDDPM\nfrom diffusion.data.builder import build_dataset, build_dataloader, set_data_root\nfrom diffusion.model.builder import build_model\nfrom diffusion.utils.checkpoint import save_checkpoint, load_checkpoint\nfrom diffusion.utils.data_sampler import AspectRatioBatchSampler, BalancedAspectRatioBatchSampler\nfrom diffusion.utils.dist_utils import get_world_size, clip_grad_norm_\nfrom diffusion.utils.logger import get_root_logger\nfrom diffusion.utils.lr_scheduler import build_lr_scheduler\nfrom diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow\nfrom diffusion.utils.optimizer import build_optimizer, auto_scale_lr\n\nwarnings.filterwarnings(\"ignore\")  # ignore warning\n\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\n\n\ndef set_fsdp_env():\n    os.environ[\"ACCELERATE_USE_FSDP\"] = 'true'\n    os.environ[\"FSDP_AUTO_WRAP_POLICY\"] = 'TRANSFORMER_BASED_WRAP'\n    os.environ[\"FSDP_BACKWARD_PREFETCH\"] = 'BACKWARD_PRE'\n    os.environ[\"FSDP_TRANSFORMER_CLS_TO_WRAP\"] = 'PixArtBlock'\n\n\ndef ema_update(model_dest: nn.Module, model_src: nn.Module, rate):\n    param_dict_src = dict(model_src.named_parameters())\n    for p_name, p_dest in model_dest.named_parameters():\n        p_src = param_dict_src[p_name]\n        assert p_src is not p_dest\n        p_dest.data.mul_(rate).add_((1 - rate) * p_src.data)\n\ndef train():\n    if config.get('debug_nan', False):\n        DebugUnderflowOverflow(model)\n        
logger.info('NaN debugger registered. Start to detect overflow during training.')\n    time_start, last_tic = time.time(), time.time()\n    log_buffer = LogBuffer()\n\n    start_step = start_epoch * len(train_dataloader)\n    global_step = 0\n    total_steps = len(train_dataloader) * config.num_epochs\n\n    load_vae_feat = getattr(train_dataloader.dataset, 'load_vae_feat', False)\n    # Now you train the model\n    for epoch in range(start_epoch + 1, config.num_epochs + 1):\n        data_time_start= time.time()\n        data_time_all = 0\n        for step, batch in enumerate(train_dataloader):\n            data_time_all += time.time() - data_time_start\n            if load_vae_feat:\n                z = batch[0]\n            else:\n                with torch.no_grad():\n                    with torch.cuda.amp.autocast(enabled=config.mixed_precision == 'fp16'):\n                        posterior = vae.encode(batch[0]).latent_dist\n                        if config.sample_posterior:\n                            z = posterior.sample()\n                        else:\n                            z = posterior.mode()\n            clean_images = z * config.scale_factor\n            y = batch[1]\n            y_mask = batch[2]\n            data_info = batch[3]\n\n            # Sample a random timestep for each image\n            bs = clean_images.shape[0]\n            timesteps = torch.randint(0, config.train_sampling_steps, (bs,), device=clean_images.device).long()\n            grad_norm = None\n            with accelerator.accumulate(model):\n                # Predict the noise residual\n                optimizer.zero_grad()\n                loss_term = train_diffusion.training_losses(model, clean_images, timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info))\n                loss = loss_term['loss'].mean()\n                accelerator.backward(loss)\n                if accelerator.sync_gradients:\n                    grad_norm = 
accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)\n                optimizer.step()\n                lr_scheduler.step()\n                if accelerator.sync_gradients:\n                    ema_update(model_ema, model, config.ema_rate)\n\n            lr = lr_scheduler.get_last_lr()[0]\n            logs = {args.loss_report_name: accelerator.gather(loss).mean().item()}\n            if grad_norm is not None:\n                logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())\n            log_buffer.update(logs)\n            if (step + 1) % config.log_interval == 0 or (step + 1) == 1:\n                t = (time.time() - last_tic) / config.log_interval\n                t_d = data_time_all / config.log_interval\n                avg_time = (time.time() - time_start) / (global_step + 1)\n                eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - start_step - global_step - 1))))\n                eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))\n                # avg_loss = sum(loss_buffer) / len(loss_buffer)\n                log_buffer.average()\n                info = f\"Step/Epoch [{(epoch-1)*len(train_dataloader)+step+1}/{epoch}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, \" \\\n                       f\"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}, s:({model.module.h}, {model.module.w}), \"\n                info += ', '.join([f\"{k}:{v:.4f}\" for k, v in log_buffer.output.items()])\n                logger.info(info)\n                last_tic = time.time()\n                log_buffer.clear()\n                data_time_all = 0\n            logs.update(lr=lr)\n            accelerator.log(logs, step=global_step + start_step)\n\n            global_step += 1\n            data_time_start= time.time()\n\n            if ((epoch - 1) * len(train_dataloader) + step + 1) % config.save_model_steps == 0:\n                
accelerator.wait_for_everyone()\n                if accelerator.is_main_process:\n                    os.umask(0o000)\n                    save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),\n                                    epoch=epoch,\n                                    step=(epoch - 1) * len(train_dataloader) + step + 1,\n                                    model=accelerator.unwrap_model(model),\n                                    model_ema=accelerator.unwrap_model(model_ema),\n                                    optimizer=optimizer,\n                                    lr_scheduler=lr_scheduler\n                                    )\n\n        if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:\n            accelerator.wait_for_everyone()\n            if accelerator.is_main_process:\n                os.umask(0o000)\n                save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),\n                                epoch=epoch,\n                                step=(epoch - 1) * len(train_dataloader) + step + 1,\n                                model=accelerator.unwrap_model(model),\n                                model_ema=accelerator.unwrap_model(model_ema),\n                                optimizer=optimizer,\n                                lr_scheduler=lr_scheduler\n                                )\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Process some integers.\")\n    parser.add_argument(\"config\", type=str, help=\"config\")\n    parser.add_argument(\"--cloud\", action='store_true', default=False, help=\"cloud or local machine\")\n    parser.add_argument('--work-dir', help='the dir to save logs and models')\n    parser.add_argument('--resume-from', help='the dir to resume the training')\n    parser.add_argument('--load-from', default=None, help='the dir to load a ckpt for training')\n    parser.add_argument('--local-rank', type=int, default=-1)\n    
parser.add_argument('--local_rank', type=int, default=-1)\n    parser.add_argument('--debug', action='store_true')\n    parser.add_argument(\n        \"--report_to\",\n        type=str,\n        default=\"tensorboard\",\n        help=(\n            'The integration to report the results and logs to. Supported platforms are `\"tensorboard\"`'\n            ' (default), `\"wandb\"` and `\"comet_ml\"`. Use `\"all\"` to report to all integrations.'\n        ),\n    )\n    parser.add_argument(\n        \"--tracker_project_name\",\n        type=str,\n        default=\"text2image-fine-tune\",\n        help=(\n            \"The `project_name` argument passed to Accelerator.init_trackers for\"\n            \" more information see https://huggingface.co/docs/accelerate/v0.17.0/en/package_reference/accelerator#accelerate.Accelerator\"\n        ),\n    )\n    parser.add_argument(\"--loss_report_name\", type=str, default=\"loss\")\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == '__main__':\n    args = parse_args()\n    config = read_config(args.config)\n    if args.work_dir is not None:\n        # update configs according to CLI args if args.work_dir is not None\n        config.work_dir = args.work_dir\n    if args.cloud:\n        config.data_root = '/data/data'\n    if args.resume_from is not None:\n        config.load_from = None\n        config.resume_from = dict(\n            checkpoint=args.resume_from,\n            load_ema=False,\n            resume_optimizer=True,\n            resume_lr_scheduler=True)\n    if args.debug:\n        config.log_interval = 1\n        config.train_batch_size = 8\n        config.valid_num = 100\n\n    os.umask(0o000)\n    os.makedirs(config.work_dir, exist_ok=True)\n\n    init_handler = InitProcessGroupKwargs()\n    init_handler.timeout = datetime.timedelta(seconds=5400)  # change timeout to avoid a strange NCCL bug\n    # Initialize accelerator and tensorboard logging\n    if config.use_fsdp:\n        init_train = 
'FSDP'\n        from accelerate import FullyShardedDataParallelPlugin\n        from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig\n        set_fsdp_env()\n        fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)\n    else:\n        init_train = 'DDP'\n        fsdp_plugin = None\n\n    even_batches = True\n    if config.multi_scale:\n        even_batches = False\n\n    accelerator = Accelerator(\n        mixed_precision=config.mixed_precision,\n        gradient_accumulation_steps=config.gradient_accumulation_steps,\n        log_with=args.report_to,\n        project_dir=os.path.join(config.work_dir, \"logs\"),\n        fsdp_plugin=fsdp_plugin,\n        even_batches=even_batches,\n        kwargs_handlers=[init_handler]\n    )\n\n    logger = get_root_logger(os.path.join(config.work_dir, 'train_log.log'))\n\n    config.seed = init_random_seed(config.get('seed', None))\n    set_random_seed(config.seed)\n\n    if accelerator.is_main_process:\n        config.dump(os.path.join(config.work_dir, 'config.py'))\n\n    logger.info(f\"Config: \\n{config.pretty_text}\")\n    logger.info(f\"World_size: {get_world_size()}, seed: {config.seed}\")\n    logger.info(f\"Initializing: {init_train} for training\")\n    image_size = config.image_size  # @param [256, 512, 1024]\n    latent_size = int(image_size) // 8\n    pred_sigma = getattr(config, 'pred_sigma', True)\n    learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma\n    model_kwargs={\"window_block_indexes\": config.window_block_indexes, \"window_size\": config.window_size,\n                  \"use_rel_pos\": config.use_rel_pos, \"lewei_scale\": config.lewei_scale, 'config':config,\n                  'model_max_length': config.model_max_length}\n\n    # build models\n    train_diffusion = IDDPM(str(config.train_sampling_steps), learn_sigma=learn_sigma, pred_sigma=pred_sigma, snr=config.snr_loss)\n    model = 
build_model(config.model,\n                        config.grad_checkpointing,\n                        config.get('fp32_attention', False),\n                        input_size=latent_size,\n                        learn_sigma=learn_sigma,\n                        pred_sigma=pred_sigma,\n                        **model_kwargs).train()\n    logger.info(f\"{model.__class__.__name__} Model Parameters: {sum(p.numel() for p in model.parameters()):,}\")\n    model_ema = deepcopy(model).eval()\n\n    if config.load_from is not None:\n        if args.load_from is not None:\n            config.load_from = args.load_from\n        missing, unexpected = load_checkpoint(config.load_from, model, load_ema=config.get('load_ema', False))\n        logger.warning(f'Missing keys: {missing}')\n        logger.warning(f'Unexpected keys: {unexpected}')\n\n    ema_update(model_ema, model, 0.)\n    if not config.data.load_vae_feat:\n        vae = AutoencoderKL.from_pretrained(config.vae_pretrained).cuda()\n\n    # prepare for FSDP clip grad norm calculation\n    if accelerator.distributed_type == DistributedType.FSDP:\n        for m in accelerator._models:\n            m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)\n\n    # build dataloader\n    set_data_root(config.data_root)\n    dataset = build_dataset(config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type)\n    if config.multi_scale:\n        batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n                                                batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,\n                                                ratio_nums=dataset.ratio_nums, config=config, valid_num=config.valid_num)\n        # used for balanced sampling\n        # batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n        #                                                 
batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,\n        #                                                 ratio_nums=dataset.ratio_nums)\n        train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)\n    else:\n        train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)\n\n    # build optimizer and lr scheduler\n    lr_scale_ratio = 1\n    if config.get('auto_lr', None):\n        lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,\n                                       config.optimizer, **config.auto_lr)\n    optimizer = build_optimizer(model, config.optimizer)\n    lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)\n\n    timestamp = time.strftime(\"%Y-%m-%d_%H:%M:%S\", time.localtime())\n\n    if accelerator.is_main_process:\n        tracker_config = dict(vars(config))\n        try:\n            accelerator.init_trackers(args.tracker_project_name, tracker_config)\n        except Exception:\n            accelerator.init_trackers(f\"tb_{timestamp}\")\n\n    start_epoch = 0\n    if config.resume_from is not None and config.resume_from['checkpoint'] is not None:\n        start_epoch, missing, unexpected = load_checkpoint(**config.resume_from,\n                                                           model=model,\n                                                           model_ema=model_ema,\n                                                           optimizer=optimizer,\n                                                           lr_scheduler=lr_scheduler,\n                                                           )\n\n        logger.warning(f'Missing keys: {missing}')\n        logger.warning(f'Unexpected keys: {unexpected}')\n    # Prepare everything\n    # There is no specific order to remember, you just need to unpack the\n   
 # objects in the same order you gave them to the prepare method.\n    model, model_ema = accelerator.prepare(model, model_ema)\n    optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)\n    train()\n"
  },
  {
    "path": "PixArt-alpha-ToCa/train_scripts/train_controlnet.py",
    "content": "import argparse\nimport datetime\nimport os\nimport sys\nimport time\nimport types\nimport warnings\nfrom pathlib import Path\n\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\n\nimport torch\nfrom accelerate import Accelerator, InitProcessGroupKwargs\nfrom accelerate.utils import DistributedType\nfrom mmcv.runner import LogBuffer\nfrom torch.utils.data import RandomSampler\n\nfrom diffusion import IDDPM\nfrom diffusion.data.builder import build_dataset, build_dataloader, set_data_root\nfrom diffusion.model.builder import build_model\nfrom diffusion.model.nets import PixArtMS, ControlPixArtHalf, ControlPixArtMSHalf\nfrom diffusion.utils.checkpoint import save_checkpoint, load_checkpoint\nfrom diffusion.utils.data_sampler import AspectRatioBatchSampler, BalancedAspectRatioBatchSampler\nfrom diffusion.utils.dist_utils import synchronize, get_world_size, clip_grad_norm_\nfrom diffusion.utils.logger import get_root_logger\nfrom diffusion.utils.lr_scheduler import build_lr_scheduler\nfrom diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow\nfrom diffusion.utils.optimizer import build_optimizer, auto_scale_lr\n\nwarnings.filterwarnings(\"ignore\")  # ignore warning\n\n\ndef set_fsdp_env():\n    os.environ[\"ACCELERATE_USE_FSDP\"] = 'true'\n    os.environ[\"FSDP_AUTO_WRAP_POLICY\"] = 'TRANSFORMER_BASED_WRAP'\n    os.environ[\"FSDP_BACKWARD_PREFETCH\"] = 'BACKWARD_PRE'\n    os.environ[\"FSDP_TRANSFORMER_CLS_TO_WRAP\"] = 'PixArtBlock'\n\n\ndef train():\n    if config.get('debug_nan', False):\n        DebugUnderflowOverflow(model)\n        logger.info('NaN debugger registered. 
Start to detect overflow during training.')\n    time_start, last_tic = time.time(), time.time()\n    log_buffer = LogBuffer()\n\n    start_step = start_epoch * len(train_dataloader)\n    global_step = 0\n    total_steps = len(train_dataloader) * config.num_epochs\n\n    load_vae_feat = getattr(train_dataloader.dataset, 'load_vae_feat', False)\n    if not load_vae_feat:\n        raise ValueError(\"Only support load vae features for now.\")\n    # Now you train the model\n    for epoch in range(start_epoch + 1, config.num_epochs + 1):\n        data_time_start = time.time()\n        data_time_all = 0\n        for step, batch in enumerate(train_dataloader):\n            data_time_all += time.time() - data_time_start\n            z = batch[0]  # 4 x 4 x 128 x 128 z:vae output, 3x1024x1024->vae->4x128x128\n            clean_images = z * config.scale_factor  # vae needed scale factor\n            y = batch[1]  # 4 x 1 x 120 x 4096 # T5 extracted feature of caption, 120 token, 4096\n            y_mask = batch[2]  # 4 x 1 x 1 x 120 # caption indicate whether valid\n            data_info = batch[3]\n\n            # Sample a random timestep for each image\n            bs = clean_images.shape[0]\n            timesteps = torch.randint(0, config.train_sampling_steps, (bs,), device=clean_images.device).long()\n            grad_norm = None\n            with accelerator.accumulate(model):\n                # Predict the noise residual\n                optimizer.zero_grad()\n                loss_term = train_diffusion.training_losses(model, clean_images, timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info, c=data_info['condition'] * config.scale_factor))\n                loss = loss_term['loss'].mean()\n                accelerator.backward(loss)\n                if accelerator.sync_gradients:\n                    grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)\n                optimizer.step()\n                lr_scheduler.step()\n\n  
          lr = lr_scheduler.get_last_lr()[0]\n            logs = {\"loss\": accelerator.gather(loss).mean().item()}\n            if grad_norm is not None:\n                logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())\n            log_buffer.update(logs)\n            if (step + 1) % config.log_interval == 0 or (step + 1) == 1:\n                t = (time.time() - last_tic) / config.log_interval\n                t_d = data_time_all / config.log_interval\n                avg_time = (time.time() - time_start) / (global_step + 1)\n                eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - start_step - global_step - 1))))\n                eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))\n                # avg_loss = sum(loss_buffer) / len(loss_buffer)\n                log_buffer.average()\n                info = f\"Step/Epoch [{(epoch - 1) * len(train_dataloader) + step + 1}/{epoch}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, \" \\\n                       f\"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}, s:({data_info['img_hw'][0][0].item()}, {data_info['img_hw'][0][1].item()}), \"\n                info += ', '.join([f\"{k}:{v:.4f}\" for k, v in log_buffer.output.items()])\n                logger.info(info)\n                last_tic = time.time()\n                log_buffer.clear()\n                data_time_all = 0\n            logs.update(lr=lr)\n            accelerator.log(logs, step=global_step + start_step)\n\n            if (global_step + 1) % 1000 == 0 and config.s3_work_dir is not None:\n                logger.info(f\"s3_work_dir: {config.s3_work_dir}\")\n\n            global_step += 1\n            data_time_start = time.time()\n\n            synchronize()\n            if accelerator.is_main_process:\n                if ((epoch - 1) * len(train_dataloader) + step + 1) % config.save_model_steps == 0:\n                    
os.umask(0o000)  # file permission: 666; dir permission: 777\n                    save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),\n                                    epoch=epoch,\n                                    step=(epoch - 1) * len(train_dataloader) + step + 1,\n                                    model=accelerator.unwrap_model(model),\n                                    optimizer=optimizer,\n                                    lr_scheduler=lr_scheduler\n                                    )\n            synchronize()\n\n        synchronize()\n        # After each epoch you optionally sample some demo images with evaluate() and save the model\n        if accelerator.is_main_process:\n            if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:\n                os.umask(0o000)  # file permission: 666; dir permission: 777\n                save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),\n                                epoch=epoch,\n                                step=(epoch - 1) * len(train_dataloader) + step + 1,\n                                model=accelerator.unwrap_model(model),\n                                optimizer=optimizer,\n                                lr_scheduler=lr_scheduler\n                                )\n        synchronize()\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Process some integers.\")\n    parser.add_argument(\"config\", type=str, help=\"config\")\n    parser.add_argument(\"--cloud\", action='store_true', default=False, help=\"cloud or local machine\")\n    parser.add_argument('--work-dir', help='the dir to save logs and models')\n    parser.add_argument('--resume_from', help='the checkpoint to resume training from')\n    parser.add_argument('--local-rank', type=int, default=-1)\n    parser.add_argument('--local_rank', type=int, default=-1)\n    parser.add_argument('--debug', action='store_true')\n    parser.add_argument(\n        
\"--report_to\",\n        type=str,\n        default=\"tensorboard\",\n        help=(\n            'The integration to report the results and logs to. Supported platforms are `\"tensorboard\"`'\n            ' (default), `\"wandb\"` and `\"comet_ml\"`. Use `\"all\"` to report to all integrations.'\n        ),\n    )\n    parser.add_argument(\n        \"--tracker_project_name\",\n        type=str,\n        default=\"text2image-fine-tune\",\n        help=(\n            \"The `project_name` argument passed to Accelerator.init_trackers for\"\n            \" more information see https://huggingface.co/docs/accelerate/v0.17.0/en/package_reference/accelerator#accelerate.Accelerator\"\n        ),\n    )\n    parser.add_argument('--lr', type=float, default=2e-4)\n    parser.add_argument('--data_root', type=str, default=None)\n    parser.add_argument('--resume_optimizer', action='store_true')\n    parser.add_argument('--resume_lr_scheduler', action='store_true')\n\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == '__main__':\n    args = parse_args()\n    config = read_config(args.config)\n    if args.work_dir is not None:\n        # update configs according to CLI args if args.work_dir is not None\n        config.work_dir = args.work_dir\n    if args.cloud:\n        config.data_root = '/data/data'\n    if args.data_root:\n        config.data_root = args.data_root\n    if args.resume_from is not None:\n        config.load_from = None\n        config.resume_from = dict(\n            checkpoint=args.resume_from,\n            load_ema=False,\n            resume_optimizer=args.resume_optimizer,\n            resume_lr_scheduler=args.resume_lr_scheduler)\n    if args.debug:\n        config.log_interval = 1\n        config.train_batch_size = 6\n        config.optimizer.update({'lr': args.lr})\n\n    os.umask(0o000)  # file permission: 666; dir permission: 777\n    os.makedirs(config.work_dir, exist_ok=True)\n\n    init_handler = InitProcessGroupKwargs()\n    
init_handler.timeout = datetime.timedelta(seconds=9600)  # change timeout to avoid a strange NCCL bug\n    # Initialize accelerator and tensorboard logging\n    if config.use_fsdp:\n        init_train = 'FSDP'\n        from accelerate import FullyShardedDataParallelPlugin\n        from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig\n        set_fsdp_env()\n        fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)\n    else:\n        init_train = 'DDP'\n        fsdp_plugin = None\n\n    even_batches = True\n    if config.multi_scale:\n        even_batches = False\n\n    accelerator = Accelerator(\n        mixed_precision=config.mixed_precision,\n        gradient_accumulation_steps=config.gradient_accumulation_steps,\n        log_with=args.report_to,\n        project_dir=os.path.join(config.work_dir, \"logs\"),\n        fsdp_plugin=fsdp_plugin,\n        even_batches=even_batches,\n        kwargs_handlers=[init_handler]\n    )\n\n    logger = get_root_logger(os.path.join(config.work_dir, 'train_log.log'))\n\n    config.seed = init_random_seed(config.get('seed', None))\n    set_random_seed(config.seed)\n\n    if accelerator.is_main_process:\n        config.dump(os.path.join(config.work_dir, 'config.py'))\n\n    logger.info(f\"Config: \\n{config.pretty_text}\")\n    logger.info(f\"World_size: {get_world_size()}, seed: {config.seed}\")\n    logger.info(f\"Initializing: {init_train} for training\")\n    image_size = config.image_size  # @param [512, 1024]\n    latent_size = int(image_size) // 8\n    pred_sigma = getattr(config, 'pred_sigma', True)\n    learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma\n    model_kwargs={\"window_block_indexes\": config.window_block_indexes, \"window_size\": config.window_size,\n                  \"use_rel_pos\": config.use_rel_pos, \"lewei_scale\": config.lewei_scale, 'config':config,\n                  
'model_max_length': config.model_max_length}\n\n    # build models\n    train_diffusion = IDDPM(str(config.train_sampling_steps))\n    model: PixArtMS = build_model(config.model,\n                                  config.grad_checkpointing,\n                                  config.get('fp32_attention', False),\n                                  input_size=latent_size,\n                                  learn_sigma=learn_sigma,\n                                  pred_sigma=pred_sigma,\n                                  **model_kwargs)\n\n    if config.load_from is not None and args.resume_from is None:\n        # load from PixArt model\n        missing, unexpected = load_checkpoint(config.load_from, model)\n        logger.warning(f'Missing keys: {missing}')\n        logger.warning(f'Unexpected keys: {unexpected}')\n\n    if image_size == 1024:\n        model: ControlPixArtMSHalf = ControlPixArtMSHalf(model, copy_blocks_num=config.copy_blocks_num).train()\n    else:\n        model: ControlPixArtHalf = ControlPixArtHalf(model, copy_blocks_num=config.copy_blocks_num).train()\n\n    logger.info(f\"{model.__class__.__name__} Model Parameters: {sum(p.numel() for p in model.parameters()):,}\")\n    logger.info(f\"T5 max token length: {config.model_max_length}\")\n\n    # if args.local_rank == 0:\n    #     for name, params in model.named_parameters():\n    #         if params.requires_grad == False: logger.info(f\"freeze param: {name}\")\n    #\n    #     for name, params in model.named_parameters():\n    #         if params.requires_grad == True: logger.info(f\"trainable param: {name}\")\n\n    # prepare for FSDP clip grad norm calculation\n    if accelerator.distributed_type == DistributedType.FSDP:\n        for m in accelerator._models:\n            m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)\n\n    # build dataloader\n    set_data_root(config.data_root)\n    dataset = build_dataset(config.data, resolution=image_size, 
aspect_ratio_type=config.aspect_ratio_type, train_ratio=config.train_ratio)\n    if config.multi_scale:\n        batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n                                                batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,\n                                                ratio_nums=dataset.ratio_nums, config=config, valid_num=1)\n        # batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n        #                                                 batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,\n        #                                                 ratio_nums=dataset.ratio_nums)\n        train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)\n    else:\n        train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)\n\n    # build optimizer and lr scheduler\n    lr_scale_ratio = 1\n    if config.get('auto_lr', None):\n        lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,\n                                       config.optimizer, **config.auto_lr)\n    optimizer = build_optimizer(model.controlnet, config.optimizer)\n    lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)\n\n    timestamp = time.strftime(\"%Y-%m-%d_%H:%M:%S\", time.localtime())\n\n    if accelerator.is_main_process:\n        tracker_config = dict(vars(config))\n        try:\n            accelerator.init_trackers(args.tracker_project_name, tracker_config)\n        except Exception:\n            accelerator.init_trackers(f\"tb_{timestamp}\")\n\n    start_epoch = 0\n    if config.resume_from is not None and config.resume_from['checkpoint'] is not None:\n        if args.resume_optimizer == False or 
args.resume_lr_scheduler == False:\n            missing, unexpected = load_checkpoint(args.resume_from, model)\n        else:\n            start_epoch, missing, unexpected = load_checkpoint(**config.resume_from,\n                                                               model=model,\n                                                               optimizer=optimizer,\n                                                               lr_scheduler=lr_scheduler,\n                                                               )\n\n        logger.warning(f'Missing keys: {missing}')\n        logger.warning(f'Unexpected keys: {unexpected}')\n    # Prepare everything\n    # There is no specific order to remember, you just need to unpack the\n    # objects in the same order you gave them to the prepare method.\n    model = accelerator.prepare(model,)\n    optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)\n    train()\n"
  },
  {
    "path": "PixArt-alpha-ToCa/train_scripts/train_diffusers.py",
    "content": "import argparse\nimport datetime\nimport os\nimport sys\nimport time\nimport types\nimport warnings\nfrom pathlib import Path\n\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\n\nimport accelerate\nimport gc\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom accelerate import Accelerator, InitProcessGroupKwargs\nfrom accelerate.utils import DistributedType\nfrom copy import deepcopy\nfrom diffusers import AutoencoderKL, Transformer2DModel, PixArtAlphaPipeline, DPMSolverMultistepScheduler\nfrom mmcv.runner import LogBuffer\nfrom packaging import version\nfrom torch.utils.data import RandomSampler\nfrom transformers import T5Tokenizer, T5EncoderModel\n\nfrom diffusion import IDDPM\nfrom diffusion.data.builder import build_dataset, build_dataloader, set_data_root\nfrom diffusion.utils.data_sampler import AspectRatioBatchSampler, BalancedAspectRatioBatchSampler\nfrom diffusion.utils.dist_utils import get_world_size, clip_grad_norm_, flush\nfrom diffusion.utils.logger import get_root_logger, rename_file_with_creation_time\nfrom diffusion.utils.lr_scheduler import build_lr_scheduler\nfrom diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow\nfrom diffusion.utils.optimizer import build_optimizer, auto_scale_lr\n\nwarnings.filterwarnings(\"ignore\")  # ignore warning\n\n\ndef set_fsdp_env():\n    os.environ[\"ACCELERATE_USE_FSDP\"] = 'true'\n    os.environ[\"FSDP_AUTO_WRAP_POLICY\"] = 'TRANSFORMER_BASED_WRAP'\n    os.environ[\"FSDP_BACKWARD_PREFETCH\"] = 'BACKWARD_PRE'\n    os.environ[\"FSDP_TRANSFORMER_CLS_TO_WRAP\"] = 'Transformer2DModel'\n\n\ndef ema_update(model_dest: nn.Module, model_src: nn.Module, rate):\n    param_dict_src = dict(model_src.named_parameters())\n    for p_name, p_dest in model_dest.named_parameters():\n        p_src = param_dict_src[p_name]\n        assert p_src is not p_dest\n        p_dest.data.mul_(rate).add_((1 - rate) 
* p_src.data)\n\n\ndef token_drop(y, y_mask, force_drop_ids=None):\n    \"\"\"\n    Drops labels to enable classifier-free guidance.\n    \"\"\"\n    if force_drop_ids is None:\n        drop_ids = torch.rand(y.shape[0]).cuda() < config.class_dropout_prob\n    else:\n        drop_ids = force_drop_ids == 1\n    y = torch.where(drop_ids[:, None, None], uncond_prompt_embeds, y)\n    y_mask = torch.where(drop_ids[:, None], uncond_prompt_attention_mask, y_mask)\n    return y, y_mask\n\n\ndef get_null_embed(npz_file, max_length=120):\n    if os.path.exists(npz_file) and (npz_file.endswith('.npz') or npz_file.endswith('.pth')):\n        data = torch.load(npz_file)\n        uncond_prompt_embeds = data['uncond_prompt_embeds'].to(accelerator.device)\n        uncond_prompt_attention_mask = data['uncond_prompt_attention_mask'].to(accelerator.device)\n    else:\n        tokenizer = T5Tokenizer.from_pretrained(args.pipeline_load_from, subfolder=\"tokenizer\")\n        text_encoder = T5EncoderModel.from_pretrained(args.pipeline_load_from, subfolder=\"text_encoder\")\n        uncond = tokenizer(\"\", max_length=max_length, padding=\"max_length\", truncation=True, return_tensors=\"pt\")\n        uncond_prompt_embeds = text_encoder(uncond.input_ids, attention_mask=uncond.attention_mask)[0]\n\n        torch.save({\n            'uncond_prompt_embeds': uncond_prompt_embeds.cpu(),\n            'uncond_prompt_attention_mask': uncond.attention_mask.cpu()\n        }, npz_file)\n\n        uncond_prompt_embeds = uncond_prompt_embeds.to(accelerator.device)\n        uncond_prompt_attention_mask = uncond.attention_mask.to(accelerator.device)\n\n    return uncond_prompt_embeds, uncond_prompt_attention_mask\n\n\ndef prepare_vis():\n    if accelerator.is_main_process:\n        # preparing embeddings for visualization. 
We do this here to save GPU memory\n        validation_prompts = [\n            \"dog\",\n            \"portrait photo of a girl, photograph, highly detailed face, depth of field\",\n            \"Self-portrait oil painting, a beautiful cyborg with golden hair, 8k\",\n            \"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k\",\n            \"A photo of beautiful mountain with realistic sunset and blue lake, highly detailed, masterpiece\",\n        ]\n        logger.info(\"Preparing Visualization prompt embeddings...\")\n        logger.info(f\"Loading text encoder and tokenizer from {args.pipeline_load_from} ...\")\n        skip = True\n        for prompt in validation_prompts:\n            if not os.path.exists(f'output/tmp/{prompt}_{max_length}token.pth'):\n                skip = False\n                break\n        if accelerator.is_main_process and not skip:\n            print(\"Saving visualization prompt text embeddings to output/tmp/\")\n            tokenizer = T5Tokenizer.from_pretrained(args.pipeline_load_from, subfolder=\"tokenizer\")\n            text_encoder = T5EncoderModel.from_pretrained(args.pipeline_load_from, subfolder=\"text_encoder\").to(accelerator.device)\n            for prompt in validation_prompts:\n                caption_token = tokenizer(prompt, max_length=max_length, padding=\"max_length\", truncation=True, return_tensors=\"pt\").to(accelerator.device)\n                caption_emb = text_encoder(caption_token.input_ids, attention_mask=caption_token.attention_mask)[0]\n                torch.save({'caption_embeds': caption_emb, 'emb_mask': caption_token.attention_mask}, f'output/tmp/{prompt}_{max_length}token.pth')\n        flush()\n\n\n@torch.inference_mode()\ndef log_validation(model, accelerator, weight_dtype, step):\n\n\n    logger.info(\"Running validation... 
\")\n\n    model = accelerator.unwrap_model(model)\n    pipeline = PixArtAlphaPipeline.from_pretrained(\n        args.pipeline_load_from,\n        transformer=model,\n        tokenizer=None,\n        text_encoder=None,\n        torch_dtype=weight_dtype,\n    )\n    pipeline = pipeline.to(accelerator.device)\n    pipeline.set_progress_bar_config(disable=True)\n\n    generator = torch.Generator(device=accelerator.device).manual_seed(0)\n\n    validation_prompts = [\n        \"dog\",\n        \"portrait photo of a girl, photograph, highly detailed face, depth of field\",\n        \"Self-portrait oil painting, a beautiful cyborg with golden hair, 8k\",\n        \"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k\",\n        \"A photo of beautiful mountain with realistic sunset and blue lake, highly detailed, masterpiece\",\n    ]\n    image_logs = []\n    images = []\n    latents = []\n    for _, prompt in enumerate(validation_prompts):\n        embed = torch.load(f'output/tmp/{prompt}_{max_length}token.pth', map_location='cpu')\n        caption_embs, emb_masks = embed['caption_embeds'].to(accelerator.device), embed['emb_mask'].to(accelerator.device)\n        latents.append(pipeline(\n            num_inference_steps=14,\n            num_images_per_prompt=1,\n            generator=generator,\n            guidance_scale=4.5,\n            prompt_embeds=caption_embs,\n            prompt_attention_mask=emb_masks,\n            negative_prompt=None,\n            negative_prompt_embeds=uncond_prompt_embeds,\n            negative_prompt_attention_mask=uncond_prompt_attention_mask,\n            output_type=\"latent\",\n        ).images)\n\n    flush()\n\n    for latent in latents:\n        images.append(pipeline.vae.decode(latent.to(weight_dtype) / pipeline.vae.config.scaling_factor, return_dict=False)[0])\n    for prompt, image in zip(validation_prompts, images):\n        image = pipeline.image_processor.postprocess(image, output_type=\"pil\")\n        
image_logs.append({\"validation_prompt\": prompt, \"images\": image})\n\n    for tracker in accelerator.trackers:\n        if tracker.name == \"tensorboard\":\n            for log in image_logs:\n                images = log[\"images\"]\n                validation_prompt = log[\"validation_prompt\"]\n                formatted_images = []\n                for image in images:\n                    formatted_images.append(np.asarray(image))\n\n                formatted_images = np.stack(formatted_images)\n\n                tracker.writer.add_images(validation_prompt, formatted_images, step, dataformats=\"NHWC\")\n        elif tracker.name == \"wandb\":\n            import wandb\n            formatted_images = []\n\n            for log in image_logs:\n                images = log[\"images\"]\n                validation_prompt = log[\"validation_prompt\"]\n                for image in images:\n                    image = wandb.Image(image, caption=validation_prompt)\n                    formatted_images.append(image)\n\n            tracker.log({\"validation\": formatted_images})\n        else:\n            logger.warning(f\"image logging not implemented for {tracker.name}\")\n\n    del pipeline\n    gc.collect()\n    torch.cuda.empty_cache()\n    return image_logs\n\n\ndef train(model):\n    if config.get('debug_nan', False):\n        DebugUnderflowOverflow(model)\n        logger.info('NaN debugger registered. 
Start to detect overflow during training.')\n    time_start, last_tic = time.time(), time.time()\n    log_buffer = LogBuffer()\n\n    global_step = start_step + 1\n\n    load_vae_feat = getattr(train_dataloader.dataset, 'load_vae_feat', False)\n\n    # Now you train the model\n    for epoch in range(start_epoch + 1, config.num_epochs + 1):\n        data_time_start= time.time()\n        data_time_all = 0\n        for step, batch in enumerate(train_dataloader):\n            data_time_all += time.time() - data_time_start\n            if load_vae_feat:\n                z = batch[0]\n            else:\n                with torch.no_grad():\n                    with torch.cuda.amp.autocast(enabled=config.mixed_precision == 'fp16'):\n                        posterior = vae.encode(batch[0]).latent_dist\n                        if config.sample_posterior:\n                            z = posterior.sample()\n                        else:\n                            z = posterior.mode()\n            latents = (z * config.scale_factor).to(weight_dtype)\n            y = batch[1].squeeze(1).to(weight_dtype)\n            y_mask = batch[2].squeeze(1).squeeze(1).to(weight_dtype)\n            y, y_mask = token_drop(y, y_mask)   # classifier-free guidance\n            data_info = {'resolution': batch[3]['img_hw'].to(weight_dtype), 'aspect_ratio': batch[3]['aspect_ratio'].to(weight_dtype),}\n\n            # Sample a random timestep for each image\n            bs = latents.shape[0]\n            timesteps = torch.randint(0, config.train_sampling_steps, (bs,), device=latents.device).long()\n            grad_norm = None\n            with accelerator.accumulate(model):\n                # Predict the noise residual\n                optimizer.zero_grad()\n                loss_term = train_diffusion.training_losses_diffusers(\n                    model, latents, timesteps,\n                    model_kwargs = dict(encoder_hidden_states=y, encoder_attention_mask=y_mask, 
added_cond_kwargs=data_info),\n                )\n                loss = loss_term['loss'].mean()\n                accelerator.backward(loss)\n                if accelerator.sync_gradients:\n                    grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)\n                optimizer.step()\n                lr_scheduler.step()\n\n                # if accelerator.sync_gradients:\n                #     ema_update(model_ema, accelerator.unwrap_model(model), config.ema_rate)\n\n            lr = lr_scheduler.get_last_lr()[0]\n            logs = {args.loss_report_name: accelerator.gather(loss).mean().item()}\n            if grad_norm is not None:\n                logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())\n            log_buffer.update(logs)\n            if (step + 1) % config.log_interval == 0 or (step + 1) == 1:\n                t = (time.time() - last_tic) / config.log_interval\n                t_d = data_time_all / config.log_interval\n                avg_time = (time.time() - time_start) / (global_step - start_step)\n                eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - global_step - 1))))\n                eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))\n                # avg_loss = sum(loss_buffer) / len(loss_buffer)\n                log_buffer.average()\n                info = f\"Step/Epoch [{global_step}/{epoch}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, \" \\\n                       f\"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e},\" \\\n                       f\"s:({data_info['resolution'][0][0].item()}, {data_info['resolution'][0][1].item()}), \"\n                       # f\"s:({data_info['resolution'][0][0].item() * relative_to_1024 // 8}, {data_info['resolution'][0][1].item() * relative_to_1024 // 8}), \"\n                info += ', '.join([f\"{k}:{v:.4f}\" for k, v in 
log_buffer.output.items()])\n                logger.info(info)\n                last_tic = time.time()\n                log_buffer.clear()\n                data_time_all = 0\n            logs.update(lr=lr)\n            accelerator.log(logs, step=global_step)\n\n            global_step += 1\n            data_time_start= time.time()\n\n            accelerator.wait_for_everyone()\n            if accelerator.is_main_process:\n                if global_step % config.save_model_steps == 0:\n                    save_path = os.path.join(os.path.join(config.work_dir, 'checkpoints'), f\"checkpoint-{global_step}\")\n                    os.umask(0o000)\n                    logger.info(f\"Start to save state to {save_path}\")\n                    accelerator.save_state(save_path)\n                    logger.info(f\"Saved state to {save_path}\")\n\n                if global_step % config.eval_sampling_steps == 0 or (step + 1) == 1:\n                    log_validation(model, accelerator, weight_dtype, global_step)\n\n        accelerator.wait_for_everyone()\n        if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:\n            os.umask(0o000)\n            save_path = os.path.join(os.path.join(config.work_dir, 'checkpoints'), f\"checkpoint-{global_step}\")\n            logger.info(f\"Start to save state to {save_path}\")\n            model = accelerator.unwrap_model(model)\n            model.save_pretrained(save_path)\n            logger.info(f\"Saved state to {save_path}\")\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Process some integers.\")\n    parser.add_argument(\"config\", type=str, help=\"config\")\n    parser.add_argument(\"--cloud\", action='store_true', default=False, help=\"cloud or local machine\")\n    parser.add_argument('--work-dir', help='the dir to save logs and models')\n    parser.add_argument('--resume-from', help='the dir to resume the training')\n    parser.add_argument('--load-from', default=None, 
help='the dir to load a ckpt for training')\n    parser.add_argument('--local-rank', type=int, default=-1)\n    parser.add_argument('--local_rank', type=int, default=-1)\n    parser.add_argument('--debug', action='store_true')\n    parser.add_argument(\"--pipeline_load_from\", default='output/pretrained_models/pixart_omega_sdxl_256px_diffusers_from512', type=str, help=\"path for loading text_encoder, tokenizer and vae\")\n    parser.add_argument(\n        \"--report_to\",\n        type=str,\n        default=\"tensorboard\",\n        help=(\n            'The integration to report the results and logs to. Supported platforms are `\"tensorboard\"`'\n            ' (default), `\"wandb\"` and `\"comet_ml\"`. Use `\"all\"` to report to all integrations.'\n        ),\n    )\n    parser.add_argument(\n        \"--tracker_project_name\",\n        type=str,\n        default=\"text2image-pixart-omega\",\n        help=(\n            \"The `project_name` argument passed to Accelerator.init_trackers for\"\n            \" more information see https://huggingface.co/docs/accelerate/v0.17.0/en/package_reference/accelerator#accelerate.Accelerator\"\n        ),\n    )\n    parser.add_argument(\"--loss_report_name\", type=str, default=\"loss\")\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == '__main__':\n    args = parse_args()\n    config = read_config(args.config)\n    if args.work_dir is not None:\n        # update configs according to CLI args if args.work_dir is not None\n        config.work_dir = args.work_dir\n    if args.cloud:\n        config.data_root = '/data/data'\n    if args.resume_from is not None:\n        config.resume_from = args.resume_from\n    if args.debug:\n        config.log_interval = 1\n        config.train_batch_size = 32\n        config.valid_num = 100\n\n    os.umask(0o000)\n    os.makedirs(config.work_dir, exist_ok=True)\n\n    init_handler = InitProcessGroupKwargs()\n    init_handler.timeout = datetime.timedelta(seconds=5400)  # 
change timeout to avoid a strange NCCL bug\n    # Initialize accelerator and tensorboard logging\n    if config.use_fsdp:\n        init_train = 'FSDP'\n        from accelerate import FullyShardedDataParallelPlugin\n        from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig\n        set_fsdp_env()\n        fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)\n    else:\n        init_train = 'DDP'\n        fsdp_plugin = None\n\n    even_batches = True\n    if config.multi_scale:\n        # no trailing comma here: `False,` would build the truthy tuple (False,)\n        even_batches = False\n\n    accelerator = Accelerator(\n        mixed_precision=config.mixed_precision,\n        gradient_accumulation_steps=config.gradient_accumulation_steps,\n        log_with=args.report_to,\n        project_dir=os.path.join(config.work_dir, \"logs\"),\n        fsdp_plugin=fsdp_plugin,\n        even_batches=even_batches,\n        kwargs_handlers=[init_handler]\n    )\n\n    log_name = 'train_log.log'\n    if accelerator.is_main_process:\n        if os.path.exists(os.path.join(config.work_dir, log_name)):\n            rename_file_with_creation_time(os.path.join(config.work_dir, log_name))\n    logger = get_root_logger(os.path.join(config.work_dir, log_name))\n\n    logger.info(accelerator.state)\n    config.seed = init_random_seed(config.get('seed', None))\n    set_random_seed(config.seed)\n\n    if accelerator.is_main_process:\n        config.dump(os.path.join(config.work_dir, 'config.py'))\n\n    logger.info(f\"Config: \\n{config.pretty_text}\")\n    logger.info(f\"World_size: {get_world_size()}, seed: {config.seed}\")\n    logger.info(f\"Initializing: {init_train} for training\")\n    image_size = config.image_size  # @param [256, 512, 1024]\n    latent_size = int(image_size) // 8\n    relative_to_1024 = float(image_size / 1024)\n    pred_sigma = getattr(config, 'pred_sigma', True)\n    learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma\n\n    # 
Create the unconditional prompt embedding for classifier-free guidance\n    logger.info(\"Embedding for classifier free guidance\")\n    max_length = config.model_max_length\n    uncond_prompt_embeds, uncond_prompt_attention_mask = get_null_embed(\n        f'output/pretrained_models/null_embed_diffusers_{max_length}token.pth', max_length=max_length\n    )\n    # Prepare the embeddings for visualization here, before the models are built, to save GPU memory\n    prepare_vis()\n\n    # build models\n    train_diffusion = IDDPM(str(config.train_sampling_steps), learn_sigma=learn_sigma, pred_sigma=pred_sigma, snr=config.snr_loss)\n    model = Transformer2DModel.from_pretrained(config.load_from, subfolder=\"transformer\").train()\n    logger.info(f\"{model.__class__.__name__} Model Parameters: {sum(p.numel() for p in model.parameters()):,}\")\n    logger.info(f\"lewei scale: {model.pos_embed.interpolation_scale} base size: {model.pos_embed.base_size}\")\n    # model_ema = deepcopy(model).eval()\n\n    # 9. Handle mixed precision and device placement\n    # For mixed precision training we cast all non-trainable weights to half-precision,\n    # as these weights are only used for inference; keeping them in full precision is not required.\n    weight_dtype = torch.float32\n    if accelerator.mixed_precision == \"fp16\":\n        weight_dtype = torch.float16\n    elif accelerator.mixed_precision == \"bf16\":\n        weight_dtype = torch.bfloat16\n\n    # 11. Enable optimizations\n    # model.enable_xformers_memory_efficient_attention()    # not available for now\n\n    # for name, params in model.named_parameters():\n    #     if params.requires_grad == False: logger.info(f\"freeze param: {name}\")\n    #\n    # for name, params in model.named_parameters():\n    #     if params.requires_grad == True: logger.info(f\"trainable param: {name}\")\n\n    # 10. 
Handle saving and loading of checkpoints\n    # `accelerate` 0.16.0 will have better support for customized saving\n    if version.parse(accelerate.__version__) >= version.parse(\"0.16.0\"):\n        # create custom saving & loading hooks so that `accelerator.save_state(...)` serializes in a nice format\n        def save_model_hook(models, weights, output_dir):\n            if accelerator.is_main_process:\n                transformer_ = accelerator.unwrap_model(models[0])\n                # save weights in peft format to be able to load them back\n                transformer_.save_pretrained(output_dir)\n\n                for _, model in enumerate(models):\n                    # make sure to pop weight so that corresponding model is not saved again\n                    weights.pop()\n\n        def load_model_hook(models, input_dir):\n\n            for i in range(len(models)):\n                # pop models so that they are not loaded again\n                model = models.pop()\n\n                # load diffusers style into model\n                load_model = Transformer2DModel.from_pretrained(input_dir)\n                model.register_to_config(**load_model.config)\n\n                model.load_state_dict(load_model.state_dict())\n                del load_model\n\n        accelerator.register_save_state_pre_hook(save_model_hook)\n        accelerator.register_load_state_pre_hook(load_model_hook)\n\n    if config.grad_checkpointing:\n        model.enable_gradient_checkpointing()\n\n    if not config.data.load_vae_feat:\n        vae = AutoencoderKL.from_pretrained(config.vae_pretrained).cuda()\n\n    # prepare for FSDP clip grad norm calculation\n    if accelerator.distributed_type == DistributedType.FSDP:\n        for m in accelerator._models:\n            m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)\n\n    # build dataloader\n    set_data_root(config.data_root)\n    logger.info(f\"ratio of real user prompt: {config.real_prompt_ratio}\")\n    dataset = 
build_dataset(\n        config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type,\n        real_prompt_ratio=config.real_prompt_ratio, max_length=max_length, config=config,\n    )\n    if config.multi_scale:\n        batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n                                                batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,\n                                                ratio_nums=dataset.ratio_nums, config=config, valid_num=config.valid_num)\n        # used for balanced sampling\n        # batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n        #                                                 batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,\n        #                                                 ratio_nums=dataset.ratio_nums)\n        train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)\n    else:\n        train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)\n\n    # build optimizer and lr scheduler\n    lr_scale_ratio = 1\n    if config.get('auto_lr', None):\n        lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,\n                                       config.optimizer, **config.auto_lr)\n    optimizer = build_optimizer(model, config.optimizer)\n    lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)\n\n    timestamp = time.strftime(\"%Y-%m-%d_%H:%M:%S\", time.localtime())\n\n    if accelerator.is_main_process:\n        tracker_config = dict(vars(config))\n        accelerator.init_trackers(f\"tb_{timestamp}_{args.tracker_project_name}\")\n        logger.info(f\"Training tracker at 
tb_{timestamp}_{args.tracker_project_name}\")\n\n    start_epoch = 0\n    start_step = 0\n    total_steps = len(train_dataloader) * config.num_epochs\n\n    # Prepare everything\n    # There is no specific order to remember, you just need to unpack the\n    # objects in the same order you gave them to the prepare method.\n    # model, model_ema = accelerator.prepare(model, model_ema)\n    model = accelerator.prepare(model)\n    optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)\n\n    if config.resume_from is not None:\n        if config.resume_from != \"latest\":\n            path = os.path.basename(config.resume_from)\n        else:\n            # Get the most recent checkpoint\n            dirs = os.listdir(os.path.join(config.work_dir, 'checkpoints'))\n            dirs = [d for d in dirs if d.startswith(\"checkpoint\")]\n            dirs = sorted(dirs, key=lambda x: int(x.split(\"-\")[1]))\n            path = dirs[-1] if len(dirs) > 0 else None\n\n        if path is None:\n            accelerator.print(f\"Checkpoint '{config.resume_from}' does not exist. Starting a new training run.\")\n            config.resume_from = None\n        else:\n            accelerator.print(f\"Resuming from checkpoint {path}\")\n            accelerator.load_state(os.path.join(config.work_dir, 'checkpoints', path))\n            start_step = int(path.split(\"-\")[1])\n            start_epoch = start_step // len(train_dataloader)\n\n    train(model)"
  },
  {
    "path": "PixArt-alpha-ToCa/train_scripts/train_dreambooth.py",
    "content": "import os\nimport sys\nimport types\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nimport argparse\nimport datetime\nimport time\nimport warnings\nwarnings.filterwarnings(\"ignore\")  # ignore warning\n\nfrom mmcv.runner import LogBuffer\nfrom copy import deepcopy\nfrom diffusion.utils.checkpoint import save_checkpoint, load_checkpoint\n\nimport torch\nimport torch.nn as nn\nfrom accelerate import Accelerator, InitProcessGroupKwargs\nfrom accelerate.utils import DistributedType\nfrom torch.utils.data import RandomSampler\n\nfrom diffusion import IDDPM\nfrom diffusion.utils.dist_utils import synchronize, get_world_size, clip_grad_norm_\nfrom diffusion.data.builder import build_dataset, build_dataloader, set_data_root\nfrom diffusion.model.builder import build_model\nfrom diffusion.utils.logger import get_root_logger\nfrom diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow\nfrom diffusion.utils.optimizer import build_optimizer, auto_scale_lr\nfrom diffusion.utils.lr_scheduler import build_lr_scheduler\nfrom diffusion.model.t5 import T5Embedder\nfrom diffusion.utils.data_sampler import AspectRatioBatchSampler\n\ndef set_fsdp_env():\n    os.environ[\"ACCELERATE_USE_FSDP\"] = 'true'\n    os.environ[\"FSDP_AUTO_WRAP_POLICY\"] = 'TRANSFORMER_BASED_WRAP'\n    os.environ[\"FSDP_BACKWARD_PREFETCH\"] = 'BACKWARD_PRE'\n    os.environ[\"FSDP_TRANSFORMER_CLS_TO_WRAP\"] = 'PixArtBlock'\n\n\ndef ema_update(model_dest: nn.Module, model_src: nn.Module, rate):\n    param_dict_src = dict(model_src.named_parameters())\n    for p_name, p_dest in model_dest.named_parameters():\n        p_src = param_dict_src[p_name]\n        assert p_src is not p_dest\n        p_dest.data.mul_(rate).add_((1 - rate) * p_src.data)\n\ndef train():\n    if config.get('debug_nan', False):\n        DebugUnderflowOverflow(model)\n        logger.info('NaN debugger 
registered. Start to detect overflow during training.')\n    time_start, last_tic = time.time(), time.time()\n    log_buffer = LogBuffer()\n\n    start_step = start_epoch * len(train_dataloader)\n    global_step = 0\n    total_steps = len(train_dataloader) * config.num_epochs\n    # txt related\n    prompt = config.data.prompt if isinstance(config.data.prompt, list) else [config.data.prompt]\n    llm_embed_model = T5Embedder(device=\"cpu\", local_cache=True, cache_dir='output/pretrained_models/t5_ckpts', torch_dtype=torch.float)\n    prompt_embs, attention_mask = llm_embed_model.get_text_embeddings(prompt)\n    prompt_embs, attention_mask = prompt_embs[None].cuda(), attention_mask[None].cuda()\n    del llm_embed_model\n\n    # Now you train the model\n    for epoch in range(start_epoch + 1, config.num_epochs + 1):\n        data_time_start= time.time()\n        data_time_all = 0\n        for step, batch in enumerate(train_dataloader):\n            data_time_all += time.time() - data_time_start\n            z = batch[0]\n            clean_images = z * config.scale_factor\n            y = prompt_embs\n            y_mask = attention_mask\n            data_info = batch[1]\n\n            # Sample a random timestep for each image\n            bs = clean_images.shape[0]\n            timesteps = torch.randint(0, config.train_sampling_steps, (bs,), device=clean_images.device).long()\n            grad_norm = None\n            with accelerator.accumulate(model):\n                # Predict the noise residual\n                optimizer.zero_grad()\n                loss_term = train_diffusion.training_losses(model, clean_images, timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info))\n                loss = loss_term['loss'].mean()\n                accelerator.backward(loss)\n                if accelerator.sync_gradients:\n                    grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)\n                optimizer.step()\n          
      lr_scheduler.step()\n                if accelerator.sync_gradients:\n                    ema_update(model_ema, model, config.ema_rate)\n\n            lr = lr_scheduler.get_last_lr()[0]\n            logs = {\"loss\": accelerator.gather(loss).mean().item()}\n            if grad_norm is not None:\n                logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())\n            log_buffer.update(logs)\n            if (step + 1) % config.log_interval == 0:\n                t = (time.time() - last_tic) / config.log_interval\n                t_d = data_time_all / config.log_interval\n                avg_time = (time.time() - time_start) / (global_step + 1)\n                eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - start_step - global_step - 1))))\n                eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))\n                # avg_loss = sum(loss_buffer) / len(loss_buffer)\n                log_buffer.average()\n                info = f\"Steps [{(epoch-1)*len(train_dataloader)+step+1}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, \" \\\n                       f\"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}, s:({model.module.h}, {model.module.w}), \"\n                info += ', '.join([f\"{k}:{v:.4f}\" for k, v in log_buffer.output.items()])\n                logger.info(info)\n                last_tic = time.time()\n                log_buffer.clear()\n                data_time_all = 0\n            logs.update(lr=lr)\n            accelerator.log(logs, step=global_step + start_step)\n\n            global_step += 1\n            data_time_start= time.time()\n\n            synchronize()\n            if accelerator.is_main_process:\n                if ((epoch - 1) * len(train_dataloader) + step + 1) % config.save_model_steps == 0:\n                    os.umask(0o000)\n                    save_checkpoint(os.path.join(config.work_dir, 
'checkpoints'),\n                                    epoch=epoch,\n                                    step=(epoch - 1) * len(train_dataloader) + step + 1,\n                                    model=accelerator.unwrap_model(model),\n                                    model_ema=accelerator.unwrap_model(model_ema),\n                                    optimizer=optimizer,\n                                    lr_scheduler=lr_scheduler\n                                    )\n            synchronize()\n\n        synchronize()\n        if accelerator.is_main_process:\n            if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:\n                os.umask(0o000)\n                save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),\n                                epoch=epoch,\n                                step=(epoch - 1) * len(train_dataloader) + step + 1,\n                                model=accelerator.unwrap_model(model),\n                                model_ema=accelerator.unwrap_model(model_ema),\n                                optimizer=optimizer,\n                                lr_scheduler=lr_scheduler\n                                )\n        synchronize()\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Process some integers.\")\n    parser.add_argument(\"config\", type=str, help=\"config\")\n    parser.add_argument('--work-dir', help='the dir to save logs and models')\n    parser.add_argument('--resume-from', help='the dir to resume the training')\n    parser.add_argument('--load-from', default=None, help='the dir to load a ckpt for training')\n    parser.add_argument('--local-rank', type=int, default=-1)\n    parser.add_argument('--local_rank', type=int, default=-1)\n    parser.add_argument('--debug', action='store_true')\n\n    parser.add_argument('--save_step', type=int, default=100)\n    parser.add_argument('--lr', type=float, default=5e-6)\n    parser.add_argument('--train_class', 
type=str)\n    parser.add_argument('--prompt', type=str, default='a photo of sks dog')\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == '__main__':\n    args = parse_args()\n    config = read_config(args.config)\n    if args.work_dir is not None:\n        # update configs according to CLI args if args.work_dir is not None\n        config.work_dir = args.work_dir\n    if args.resume_from is not None:\n        config.resume_from = dict(\n            checkpoint=args.resume_from,\n            load_ema=False,\n            resume_optimizer=True,\n            resume_lr_scheduler=True)\n    if args.debug:\n        config.log_interval = 1\n        config.train_batch_size = 1\n\n        config.save_model_steps=args.save_step\n        config.data.update({'prompt': [args.prompt], 'root': args.train_class})\n        config.optimizer.update({'lr': args.lr})\n\n    os.umask(0o000)\n    os.makedirs(config.work_dir, exist_ok=True)\n\n    init_handler = InitProcessGroupKwargs()\n    init_handler.timeout = datetime.timedelta(seconds=5400)  # change timeout to avoid a strange NCCL bug\n    # Initialize accelerator and tensorboard logging\n    if config.use_fsdp:\n        init_train = 'FSDP'\n        from accelerate import FullyShardedDataParallelPlugin\n        from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig\n        set_fsdp_env()\n        fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)\n    else:\n        init_train = 'DDP'\n        fsdp_plugin = None\n\n    even_batches = True\n    if config.multi_scale:\n        # no trailing comma here: `False,` would build the truthy tuple (False,)\n        even_batches = False\n\n    accelerator = Accelerator(\n        mixed_precision=config.mixed_precision,\n        gradient_accumulation_steps=config.gradient_accumulation_steps,\n        log_with=\"tensorboard\",\n        project_dir=os.path.join(config.work_dir, \"logs\"),\n        fsdp_plugin=fsdp_plugin,\n        even_batches=even_batches,\n  
      kwargs_handlers=[init_handler]\n    )\n\n    logger = get_root_logger(os.path.join(config.work_dir, 'train_log.log'))\n\n    config.seed = init_random_seed(config.get('seed', None))\n    set_random_seed(config.seed)\n\n    if accelerator.is_main_process:\n        config.dump(os.path.join(config.work_dir, 'config.py'))\n\n    logger.info(f\"Config: \\n{config.pretty_text}\")\n    logger.info(f\"World_size: {get_world_size()}, seed: {config.seed}\")\n    logger.info(f\"Initializing: {init_train} for training\")\n    image_size = config.image_size  # @param [256, 512]\n    latent_size = int(image_size) // 8\n    pred_sigma = getattr(config, 'pred_sigma', True)\n    learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma\n    model_kwargs={\"window_block_indexes\": config.window_block_indexes, \"window_size\": config.window_size,\n                  \"use_rel_pos\": config.use_rel_pos, \"lewei_scale\": config.lewei_scale, 'config':config,\n                  'model_max_length': config.model_max_length}\n\n    # build models\n    train_diffusion = IDDPM(str(config.train_sampling_steps))\n    eval_diffusion = IDDPM(str(config.eval_sampling_steps))\n\n    model = build_model(config.model,\n                        config.grad_checkpointing,\n                        config.get('fp32_attention', False),\n                        input_size=latent_size,\n                        learn_sigma=learn_sigma,\n                        pred_sigma=pred_sigma,\n                        **model_kwargs).train()\n    logger.info(f\"{config.model} Model Parameters: {sum(p.numel() for p in model.parameters()):,}\")\n    model_ema = deepcopy(model).eval()\n\n    if config.load_from is not None:\n        if args.load_from is not None:\n            config.load_from = args.load_from\n        missing, unexpected = load_checkpoint(config.load_from, model, load_ema=config.get('load_ema', False))\n        # model.reparametrize()\n        if accelerator.is_main_process:\n            
print('Warning Missing keys: ', missing)\n            print('Warning Unexpected keys', unexpected)\n\n    ema_update(model_ema, model, 0.)\n\n    # prepare for FSDP clip grad norm calculation\n    if accelerator.distributed_type == DistributedType.FSDP:\n        for m in accelerator._models:\n            m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)\n\n    # build dataloader\n    logger.warning(f\"Training prompt: {config.data['prompt']}, Training data class: {config.data['root']}\")\n    set_data_root(config.data_root)\n    dataset = build_dataset(config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type)\n    if config.multi_scale:\n        batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n                                                batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,\n                                                ratio_nums=dataset.ratio_nums, config=config, valid_num=1)\n        # batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n        #                                                 batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,\n        #                                                 ratio_nums=dataset.ratio_nums)\n        train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)\n    else:\n        train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)\n\n    # build optimizer and lr scheduler\n    lr_scale_ratio = 1\n    if config.get('auto_lr', None):\n        lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,\n                                       config.optimizer,\n                                       **config.auto_lr)\n    optimizer = build_optimizer(model, 
config.optimizer)\n    lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)\n\n    timestamp = time.strftime(\"%Y-%m-%d_%H:%M:%S\", time.localtime())\n\n    if accelerator.is_main_process:\n        accelerator.init_trackers(f\"tb_{timestamp}\")\n\n    start_epoch = 0\n    if config.resume_from is not None and config.resume_from['checkpoint'] is not None:\n        start_epoch, missing, unexpected = load_checkpoint(**config.resume_from,\n                                                           model=model,\n                                                           model_ema=model_ema,\n                                                           optimizer=optimizer,\n                                                           lr_scheduler=lr_scheduler,\n                                                           )\n\n        if accelerator.is_main_process:\n            print('Warning Missing keys: ', missing)\n            print('Warning Unexpected keys', unexpected)\n    # Prepare everything\n    # There is no specific order to remember, you just need to unpack the\n    # objects in the same order you gave them to the prepare method.\n    model, model_ema = accelerator.prepare(model, model_ema)\n    optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)\n    train()"
  },
  {
    "path": "PixArt-alpha-ToCa/train_scripts/train_pixart_lcm.py",
    "content": "import os\nimport sys\nimport types\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nimport argparse\nimport datetime\nimport time\nimport warnings\nwarnings.filterwarnings(\"ignore\")  # ignore warning\nimport torch\nimport torch.nn as nn\nfrom accelerate import Accelerator, InitProcessGroupKwargs\nfrom accelerate.utils import DistributedType\nfrom diffusers.models import AutoencoderKL\nfrom torch.utils.data import RandomSampler\nfrom mmcv.runner import LogBuffer\nfrom copy import deepcopy\nimport numpy as np\nimport torch.nn.functional as F\nfrom tqdm import tqdm\n\nfrom diffusion import IDDPM\nfrom diffusion.utils.checkpoint import save_checkpoint, load_checkpoint\nfrom diffusion.utils.dist_utils import synchronize, get_world_size, clip_grad_norm_\nfrom diffusion.data.builder import build_dataset, build_dataloader, set_data_root\nfrom diffusion.model.builder import build_model\nfrom diffusion.utils.logger import get_root_logger\nfrom diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow\nfrom diffusion.utils.optimizer import build_optimizer, auto_scale_lr\nfrom diffusion.utils.lr_scheduler import build_lr_scheduler\nfrom diffusion.utils.data_sampler import AspectRatioBatchSampler, BalancedAspectRatioBatchSampler\nfrom diffusion.lcm_scheduler import LCMScheduler\nfrom torchvision.utils import save_image\n\n\ndef set_fsdp_env():\n    os.environ[\"ACCELERATE_USE_FSDP\"] = 'true'\n    os.environ[\"FSDP_AUTO_WRAP_POLICY\"] = 'TRANSFORMER_BASED_WRAP'\n    os.environ[\"FSDP_BACKWARD_PREFETCH\"] = 'BACKWARD_PRE'\n    os.environ[\"FSDP_TRANSFORMER_CLS_TO_WRAP\"] = 'PixArtBlock'\n\n\ndef ema_update(model_dest: nn.Module, model_src: nn.Module, rate):\n    param_dict_src = dict(model_src.named_parameters())\n    for p_name, p_dest in model_dest.named_parameters():\n        p_src = param_dict_src[p_name]\n        assert p_src is not 
p_dest\n        p_dest.data.mul_(rate).add_((1 - rate) * p_src.data)\n\n\ndef append_dims(x, target_dims):\n    \"\"\"Appends dimensions to the end of a tensor until it has target_dims dimensions.\"\"\"\n    dims_to_append = target_dims - x.ndim\n    if dims_to_append < 0:\n        raise ValueError(f\"input has {x.ndim} dims but target_dims is {target_dims}, which is less\")\n    return x[(...,) + (None,) * dims_to_append]\n\n\n# From LCMScheduler.get_scalings_for_boundary_condition_discrete\ndef scalings_for_boundary_conditions(timestep, sigma_data=0.5, timestep_scaling=10.0):\n    # Use the timestep_scaling argument instead of the hard-coded `/ 0.1` (equivalent at the default of 10.0)\n    scaled_timestep = timestep * timestep_scaling\n    c_skip = sigma_data**2 / (scaled_timestep**2 + sigma_data**2)\n    c_out = scaled_timestep / (scaled_timestep**2 + sigma_data**2) ** 0.5\n    return c_skip, c_out\n\n\ndef extract_into_tensor(a, t, x_shape):\n    b, *_ = t.shape\n    out = a.gather(-1, t)\n    return out.reshape(b, *((1,) * (len(x_shape) - 1)))\n\n\nclass DDIMSolver:\n    def __init__(self, alpha_cumprods, timesteps=1000, ddim_timesteps=50):\n        # DDIM sampling parameters\n        step_ratio = timesteps // ddim_timesteps\n\n        self.ddim_timesteps = (np.arange(1, ddim_timesteps + 1) * step_ratio).round().astype(np.int64) - 1\n        self.ddim_alpha_cumprods = alpha_cumprods[self.ddim_timesteps]\n        self.ddim_alpha_cumprods_prev = np.asarray(\n            [alpha_cumprods[0]] + alpha_cumprods[self.ddim_timesteps[:-1]].tolist()\n        )\n        # convert to torch tensors\n        self.ddim_timesteps = torch.from_numpy(self.ddim_timesteps).long()\n        self.ddim_alpha_cumprods = torch.from_numpy(self.ddim_alpha_cumprods)\n        self.ddim_alpha_cumprods_prev = torch.from_numpy(self.ddim_alpha_cumprods_prev)\n\n    def to(self, device):\n        self.ddim_timesteps = self.ddim_timesteps.to(device)\n        self.ddim_alpha_cumprods = self.ddim_alpha_cumprods.to(device)\n        self.ddim_alpha_cumprods_prev = self.ddim_alpha_cumprods_prev.to(device)\n        return self\n\n    def ddim_step(self, 
pred_x0, pred_noise, timestep_index):\n        alpha_cumprod_prev = extract_into_tensor(self.ddim_alpha_cumprods_prev, timestep_index, pred_x0.shape)\n        dir_xt = (1.0 - alpha_cumprod_prev).sqrt() * pred_noise\n        x_prev = alpha_cumprod_prev.sqrt() * pred_x0 + dir_xt\n        return x_prev\n\n\n@torch.no_grad()\ndef log_validation(model, step, device):\n    if hasattr(model, 'module'):\n        model = model.module\n    scheduler = LCMScheduler(beta_start=0.0001, beta_end=0.02, beta_schedule=\"linear\", prediction_type=\"epsilon\")\n    scheduler.set_timesteps(4, 50)\n    infer_timesteps = scheduler.timesteps\n\n    dog_embed = torch.load('data/tmp/dog.pth', map_location='cpu')\n    caption_embs, emb_masks = dog_embed['dog_text'].to(device), dog_embed['dog_mask'].to(device)\n    hw = torch.tensor([[1024, 1024]], dtype=torch.float, device=device).repeat(1, 1)\n    ar = torch.tensor([[1.]], device=device).repeat(1, 1)\n    # Create sampling noise:\n    infer_latents = torch.randn(1, 4, 1024, 1024, device=device)\n    model_kwargs = dict(data_info={'img_hw': hw, 'aspect_ratio': ar}, mask=emb_masks)\n    logger.info(\"Running validation... \")\n\n    # 7. LCM MultiStep Sampling Loop:\n    for i, t in tqdm(list(enumerate(infer_timesteps))):\n        ts = torch.full((1,), t, device=device, dtype=torch.long)\n\n        # model prediction (v-prediction, eps, x)\n        model_pred = model(infer_latents, ts, caption_embs, **model_kwargs)[:, :4]\n\n        # compute the previous noisy sample x_t -> x_t-1\n        infer_latents, denoised = scheduler.step(model_pred, i, t, infer_latents, return_dict=False)\n    samples = vae.decode(denoised / 0.18215).sample\n    torch.cuda.empty_cache()\n    save_image(samples[0], f'output_cv/vis/{step}.jpg', nrow=1, normalize=True, value_range=(-1, 1))\n\n\ndef train():\n    if config.get('debug_nan', False):\n        DebugUnderflowOverflow(model)\n        logger.info('NaN debugger registered. 
Start to detect overflow during training.')\n    time_start, last_tic = time.time(), time.time()\n    log_buffer = LogBuffer()\n\n    start_step = start_epoch * len(train_dataloader)\n    global_step = 0\n    total_steps = len(train_dataloader) * config.num_epochs\n\n    load_vae_feat = getattr(train_dataloader.dataset, 'load_vae_feat', False)\n\n    # Create uncond embeds for classifier free guidance\n    uncond_prompt_embeds = model.module.y_embedder.y_embedding.repeat(config.train_batch_size, 1, 1, 1)\n\n    # Now you train the model\n    for epoch in range(start_epoch + 1, config.num_epochs + 1):\n        data_time_start= time.time()\n        data_time_all = 0\n        for step, batch in enumerate(train_dataloader):\n            data_time_all += time.time() - data_time_start\n            if load_vae_feat:\n                z = batch[0]\n            else:\n                with torch.no_grad():\n                    with torch.cuda.amp.autocast(enabled=config.mixed_precision == 'fp16'):\n                        posterior = vae.encode(batch[0]).latent_dist\n                        if config.sample_posterior:\n                            z = posterior.sample()\n                        else:\n                            z = posterior.mode()\n            latents = z * config.scale_factor\n            y = batch[1]\n            y_mask = batch[2]\n            data_info = batch[3]\n\n            # Sample a random timestep for each image\n            grad_norm = None\n            with accelerator.accumulate(model):\n                # Predict the noise residual\n                optimizer.zero_grad()\n                # Sample noise that we'll add to the latents\n                noise = torch.randn_like(latents)\n                bsz = latents.shape[0]\n\n                # Sample a random timestep for each image t_n ~ U[0, N - k - 1] without bias.\n                topk = config.train_sampling_steps // config.num_ddim_timesteps\n                index = torch.randint(0, 
config.num_ddim_timesteps, (bsz,), device=latents.device).long()\n                start_timesteps = solver.ddim_timesteps[index]\n                timesteps = start_timesteps - topk\n                timesteps = torch.where(timesteps < 0, torch.zeros_like(timesteps), timesteps)\n\n                # Get boundary scalings for start_timesteps and (end) timesteps.\n                c_skip_start, c_out_start = scalings_for_boundary_conditions(start_timesteps)\n                c_skip_start, c_out_start = [append_dims(x, latents.ndim) for x in [c_skip_start, c_out_start]]\n                c_skip, c_out = scalings_for_boundary_conditions(timesteps)\n                c_skip, c_out = [append_dims(x, latents.ndim) for x in [c_skip, c_out]]\n\n                # Sample a random guidance scale w from U[w_min, w_max] and embed it\n                # w = (config.w_max - config.w_min) * torch.rand((bsz,)) + config.w_min\n                w = config.cfg_scale * torch.ones((bsz,))\n                w = w.reshape(bsz, 1, 1, 1)\n                w = w.to(device=latents.device, dtype=latents.dtype)\n\n                # Get online LCM prediction on z_{t_{n + k}}, w, c, t_{n + k}\n                _, pred_x_0, noisy_model_input = train_diffusion.training_losses(model, latents, start_timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info), noise=noise)\n\n                model_pred = c_skip_start * noisy_model_input + c_out_start * pred_x_0\n\n                # Use the ODE solver to predict the kth step in the augmented PF-ODE trajectory after\n                # noisy_latents with both the conditioning embedding c and unconditional embedding 0\n                # Get teacher model prediction on noisy_latents and conditional embedding\n                with torch.no_grad():\n                    with torch.autocast(\"cuda\"):\n                        cond_teacher_output, cond_pred_x0, _ = train_diffusion.training_losses(model_teacher, latents, start_timesteps, model_kwargs=dict(y=y, 
mask=y_mask, data_info=data_info), noise=noise)\n\n                        # Get teacher model prediction on noisy_latents and unconditional embedding\n                        uncond_teacher_output, uncond_pred_x0, _ = train_diffusion.training_losses(model_teacher, latents, start_timesteps, model_kwargs=dict(y=uncond_prompt_embeds, mask=y_mask, data_info=data_info), noise=noise)\n\n                        # Perform \"CFG\" to get x_prev estimate (using the LCM paper's CFG formulation)\n                        pred_x0 = cond_pred_x0 + w * (cond_pred_x0 - uncond_pred_x0)\n                        pred_noise = cond_teacher_output + w * (cond_teacher_output - uncond_teacher_output)\n                        x_prev = solver.ddim_step(pred_x0, pred_noise, index)\n\n                # Get target LCM prediction on x_prev, w, c, t_n\n                with torch.no_grad():\n                    with torch.autocast(\"cuda\", enabled=True):\n                        _, pred_x_0, _ = train_diffusion.training_losses(model_ema, x_prev.float(), timesteps, model_kwargs=dict(y=y, mask=y_mask, data_info=data_info), skip_noise=True)\n\n                    target = c_skip * x_prev + c_out * pred_x_0\n\n                # Calculate loss\n                if config.loss_type == \"l2\":\n                    loss = F.mse_loss(model_pred.float(), target.float(), reduction=\"mean\")\n                elif config.loss_type == \"huber\":\n                    loss = torch.mean(torch.sqrt((model_pred.float() - target.float()) ** 2 + config.huber_c**2) - config.huber_c)\n\n                # Backpropagation on the online student model (`model`)\n                accelerator.backward(loss)\n                if accelerator.sync_gradients:\n                    grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)\n                optimizer.step()\n                lr_scheduler.step()\n                optimizer.zero_grad(set_to_none=True)\n\n                if 
accelerator.sync_gradients:\n                    ema_update(model_ema, model, config.ema_decay)\n\n            lr = lr_scheduler.get_last_lr()[0]\n            logs = {\"loss\": accelerator.gather(loss).mean().item()}\n            if grad_norm is not None:\n                logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())\n            log_buffer.update(logs)\n            if (step + 1) % config.log_interval == 0 or (step + 1) == 1:\n                t = (time.time() - last_tic) / config.log_interval\n                t_d = data_time_all / config.log_interval\n                avg_time = (time.time() - time_start) / (global_step + 1)\n                eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - start_step - global_step - 1))))\n                eta_epoch = str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))\n                # avg_loss = sum(loss_buffer) / len(loss_buffer)\n                log_buffer.average()\n                info = f\"Step/Epoch [{(epoch-1)*len(train_dataloader)+step+1}/{epoch}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, \" \\\n                       f\"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}, s:({data_info['resolution'][0][0].item()}, {data_info['resolution'][0][1].item()}), \"\n                info += ', '.join([f\"{k}:{v:.4f}\" for k, v in log_buffer.output.items()])\n                logger.info(info)\n                last_tic = time.time()\n                log_buffer.clear()\n                data_time_all = 0\n            logs.update(lr=lr)\n            accelerator.log(logs, step=global_step + start_step)\n\n            global_step += 1\n            data_time_start= time.time()\n\n            synchronize()\n            torch.cuda.empty_cache()\n            if accelerator.is_main_process:\n                # log_validation(model_ema, step, model.device)\n                if ((epoch - 1) * len(train_dataloader) + step + 1) % 
config.save_model_steps == 0:\n                    os.umask(0o000)\n                    save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),\n                                    epoch=epoch,\n                                    step=(epoch - 1) * len(train_dataloader) + step + 1,\n                                    model=accelerator.unwrap_model(model),\n                                    model_ema=accelerator.unwrap_model(model_ema),\n                                    optimizer=optimizer,\n                                    lr_scheduler=lr_scheduler\n                                    )\n            synchronize()\n\n        synchronize()\n        if accelerator.is_main_process:\n            if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:\n                os.umask(0o000)\n                save_checkpoint(os.path.join(config.work_dir, 'checkpoints'),\n                                epoch=epoch,\n                                step=(epoch - 1) * len(train_dataloader) + step + 1,\n                                model=accelerator.unwrap_model(model),\n                                model_ema=accelerator.unwrap_model(model_ema),\n                                optimizer=optimizer,\n                                lr_scheduler=lr_scheduler\n                                )\n        synchronize()\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Process some integers.\")\n    parser.add_argument(\"config\", type=str, help=\"config\")\n    parser.add_argument(\"--cloud\", action='store_true', default=False, help=\"cloud or local machine\")\n    parser.add_argument('--work-dir', help='the dir to save logs and models')\n    parser.add_argument('--resume-from', help='the dir to resume the training')\n    parser.add_argument('--load-from', default=None, help='the dir to load a ckpt for training')\n    parser.add_argument('--local-rank', type=int, default=-1)\n    parser.add_argument('--local_rank', type=int, 
default=-1)\n    parser.add_argument('--debug', action='store_true')\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == '__main__':\n    args = parse_args()\n    config = read_config(args.config)\n    if args.work_dir is not None:\n        # update configs according to CLI args if args.work_dir is not None\n        config.work_dir = args.work_dir\n    if args.cloud:\n        config.data_root = '/data/data'\n    if args.resume_from is not None:\n        config.load_from = None\n        config.resume_from = dict(\n            checkpoint=args.resume_from,\n            load_ema=False,\n            resume_optimizer=True,\n            resume_lr_scheduler=True)\n    if args.debug:\n        config.log_interval = 1\n        config.train_batch_size = 11\n        config.valid_num = 100\n        config.load_from = None\n\n    os.umask(0o000)\n    os.makedirs(config.work_dir, exist_ok=True)\n\n    init_handler = InitProcessGroupKwargs()\n    init_handler.timeout = datetime.timedelta(seconds=5400)  # change timeout to avoid a strange NCCL bug\n    # Initialize accelerator and tensorboard logging\n    if config.use_fsdp:\n        init_train = 'FSDP'\n        from accelerate import FullyShardedDataParallelPlugin\n        from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig\n        set_fsdp_env()\n        fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)\n    else:\n        init_train = 'DDP'\n        fsdp_plugin = None\n\n    even_batches = True\n    if config.multi_scale:\n        # Plain bool: a trailing comma here would bind the truthy tuple (False,)\n        even_batches = False\n\n    accelerator = Accelerator(\n        mixed_precision=config.mixed_precision,\n        gradient_accumulation_steps=config.gradient_accumulation_steps,\n        log_with=\"tensorboard\",\n        project_dir=os.path.join(config.work_dir, \"logs\"),\n        fsdp_plugin=fsdp_plugin,\n        even_batches=even_batches,\n        kwargs_handlers=[init_handler]\n 
   )\n\n    logger = get_root_logger(os.path.join(config.work_dir, 'train_log.log'))\n\n    config.seed = init_random_seed(config.get('seed', None))\n    set_random_seed(config.seed)\n\n    if accelerator.is_main_process:\n        config.dump(os.path.join(config.work_dir, 'config.py'))\n\n    logger.info(f\"Config: \\n{config.pretty_text}\")\n    logger.info(f\"World_size: {get_world_size()}, seed: {config.seed}\")\n    logger.info(f\"Initializing: {init_train} for training\")\n    image_size = config.image_size  # @param [256, 512]\n    latent_size = int(image_size) // 8\n    pred_sigma = getattr(config, 'pred_sigma', True)\n    learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma\n    model_kwargs={\"window_block_indexes\": config.window_block_indexes, \"window_size\": config.window_size,\n                  \"use_rel_pos\": config.use_rel_pos, \"lewei_scale\": config.lewei_scale, 'config':config,\n                  'model_max_length': config.model_max_length}\n\n    # build models\n    train_diffusion = IDDPM(str(config.train_sampling_steps), learn_sigma=learn_sigma, pred_sigma=pred_sigma,\n                            snr=config.snr_loss, return_startx=True)\n    model = build_model(config.model,\n                        config.grad_checkpointing,\n                        config.get('fp32_attention', False),\n                        input_size=latent_size,\n                        learn_sigma=learn_sigma,\n                        pred_sigma=pred_sigma,\n                        **model_kwargs).train()\n    logger.info(f\"{model.__class__.__name__} Model Parameters: {sum(p.numel() for p in model.parameters()):,}\")\n\n    if config.load_from is not None:\n        if args.load_from is not None:\n            config.load_from = args.load_from\n        missing, unexpected = load_checkpoint(config.load_from, model, load_ema=config.get('load_ema', False))\n        logger.warning(f'Missing keys: {missing}')\n        logger.warning(f'Unexpected keys: 
{unexpected}')\n\n    model_ema = deepcopy(model).eval()\n    model_teacher = deepcopy(model).eval()\n\n    if not config.data.load_vae_feat:\n        vae = AutoencoderKL.from_pretrained(config.vae_pretrained).cuda()\n\n    # prepare for FSDP clip grad norm calculation\n    if accelerator.distributed_type == DistributedType.FSDP:\n        for m in accelerator._models:\n            m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)\n\n    # build dataloader\n    set_data_root(config.data_root)\n    dataset = build_dataset(config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type)\n    if config.multi_scale:\n        batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n                                                batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,\n                                                ratio_nums=dataset.ratio_nums, config=config, valid_num=config.valid_num)\n        # used for balanced sampling\n        # batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n        #                                                 batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,\n        #                                                 ratio_nums=dataset.ratio_nums)\n        train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)\n    else:\n        train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)\n\n    # build optimizer and lr scheduler\n    lr_scale_ratio = 1\n    if config.get('auto_lr', None):\n        lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,\n                                       config.optimizer,\n                                       **config.auto_lr)\n    optimizer = 
build_optimizer(model, config.optimizer)\n    lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)\n\n    timestamp = time.strftime(\"%Y-%m-%d_%H:%M:%S\", time.localtime())\n\n    if accelerator.is_main_process:\n        accelerator.init_trackers(f\"tb_{timestamp}\")\n\n    start_epoch = 0\n    if config.resume_from is not None and config.resume_from['checkpoint'] is not None:\n        start_epoch, missing, unexpected = load_checkpoint(**config.resume_from,\n                                                           model=model,\n                                                           model_ema=model_ema,\n                                                           optimizer=optimizer,\n                                                           lr_scheduler=lr_scheduler,\n                                                           )\n\n        logger.warning(f'Missing keys: {missing}')\n        logger.warning(f'Unexpected keys: {unexpected}')\n\n    solver = DDIMSolver(train_diffusion.alphas_cumprod, timesteps=config.train_sampling_steps, ddim_timesteps=config.num_ddim_timesteps)\n    solver.to(accelerator.device)\n    # Prepare everything\n    # There is no specific order to remember, you just need to unpack the\n    # objects in the same order you gave them to the prepare method.\n    model, model_ema, model_teacher = accelerator.prepare(model, model_ema, model_teacher)\n    # model, model_ema = accelerator.prepare(model, model_ema)\n    optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)\n    train()\n"
  },
  {
    "path": "PixArt-alpha-ToCa/train_scripts/train_pixart_lcm_lora.py",
    "content": "import os\nimport sys\nimport types\nfrom pathlib import Path\ncurrent_file_path = Path(__file__).resolve()\nsys.path.insert(0, str(current_file_path.parent.parent))\nimport argparse\nimport datetime\nimport time\nimport warnings\nwarnings.filterwarnings(\"ignore\")  # ignore warning\nimport torch\nfrom accelerate import Accelerator, InitProcessGroupKwargs\nfrom accelerate.utils import DistributedType\nfrom torch.utils.data import RandomSampler\nfrom mmcv.runner import LogBuffer\nimport torch.nn.functional as F\nimport numpy as np\nimport re\nfrom packaging import version\nimport accelerate\n\nfrom diffusion import IDDPM\nfrom diffusion.utils.dist_utils import get_world_size, clip_grad_norm_\nfrom diffusion.data.builder import build_dataset, build_dataloader, set_data_root\nfrom diffusion.utils.logger import get_root_logger\nfrom diffusion.utils.misc import set_random_seed, read_config, init_random_seed, DebugUnderflowOverflow\nfrom diffusion.utils.optimizer import build_optimizer, auto_scale_lr\nfrom diffusion.utils.lr_scheduler import build_lr_scheduler\nfrom diffusion.utils.data_sampler import AspectRatioBatchSampler, BalancedAspectRatioBatchSampler\nfrom peft import LoraConfig, get_peft_model, get_peft_model_state_dict\nfrom diffusers import AutoencoderKL, Transformer2DModel, StableDiffusionPipeline, PixArtAlphaPipeline\n\n\ndef set_fsdp_env():\n    os.environ[\"ACCELERATE_USE_FSDP\"] = 'true'\n    os.environ[\"FSDP_AUTO_WRAP_POLICY\"] = 'TRANSFORMER_BASED_WRAP'\n    os.environ[\"FSDP_BACKWARD_PREFETCH\"] = 'BACKWARD_PRE'\n    os.environ[\"FSDP_TRANSFORMER_CLS_TO_WRAP\"] = 'PixArtBlock'\n\ndef filter_keys(key_set):\n    def _f(dictionary):\n        return {k: v for k, v in dictionary.items() if k in key_set}\n\n    return _f\n\n\ndef append_dims(x, target_dims):\n    \"\"\"Appends dimensions to the end of a tensor until it has target_dims dimensions.\"\"\"\n    dims_to_append = target_dims - x.ndim\n    if dims_to_append < 0:\n        raise 
ValueError(f\"input has {x.ndim} dims but target_dims is {target_dims}, which is less\")\n    return x[(...,) + (None,) * dims_to_append]\n\n\n# From LCMScheduler.get_scalings_for_boundary_condition_discrete\ndef scalings_for_boundary_conditions(timestep, sigma_data=0.5, timestep_scaling=10.0):\n    c_skip = sigma_data**2 / ((timestep / 0.1) ** 2 + sigma_data**2)\n    c_out = (timestep / 0.1) / ((timestep / 0.1) ** 2 + sigma_data**2) ** 0.5\n    return c_skip, c_out\n\n\n# Compare LCMScheduler.step, Step 4\ndef predicted_origin(model_output, timesteps, sample, prediction_type, alphas, sigmas):\n    if prediction_type == \"epsilon\":\n        sigmas = extract_into_tensor(sigmas, timesteps, sample.shape)\n        alphas = extract_into_tensor(alphas, timesteps, sample.shape)\n        pred_x_0 = (sample - sigmas * model_output) / alphas\n    elif prediction_type == \"v_prediction\":\n        sigmas = extract_into_tensor(sigmas, timesteps, sample.shape)\n        alphas = extract_into_tensor(alphas, timesteps, sample.shape)\n        pred_x_0 = alphas * sample - sigmas * model_output\n    else:\n        raise ValueError(f\"Prediction type {prediction_type} currently not supported.\")\n\n    return pred_x_0\n\n\ndef extract_into_tensor(a, t, x_shape):\n    b, *_ = t.shape\n    out = a.gather(-1, t)\n    return out.reshape(b, *((1,) * (len(x_shape) - 1)))\n\n\nclass DDIMSolver:\n    def __init__(self, alpha_cumprods, timesteps=1000, ddim_timesteps=50):\n        # DDIM sampling parameters\n        step_ratio = timesteps // ddim_timesteps\n\n        self.ddim_timesteps = (np.arange(1, ddim_timesteps + 1) * step_ratio).round().astype(np.int64) - 1\n        self.ddim_alpha_cumprods = alpha_cumprods[self.ddim_timesteps]\n        self.ddim_alpha_cumprods_prev = np.asarray(\n            [alpha_cumprods[0]] + alpha_cumprods[self.ddim_timesteps[:-1]].tolist()\n        )\n        # convert to torch tensors\n        self.ddim_timesteps = torch.from_numpy(self.ddim_timesteps).long()\n  
      self.ddim_alpha_cumprods = torch.from_numpy(self.ddim_alpha_cumprods)\n        self.ddim_alpha_cumprods_prev = torch.from_numpy(self.ddim_alpha_cumprods_prev)\n\n    def to(self, device):\n        self.ddim_timesteps = self.ddim_timesteps.to(device)\n        self.ddim_alpha_cumprods = self.ddim_alpha_cumprods.to(device)\n        self.ddim_alpha_cumprods_prev = self.ddim_alpha_cumprods_prev.to(device)\n        return self\n\n    def ddim_step(self, pred_x0, pred_noise, timestep_index):\n        alpha_cumprod_prev = extract_into_tensor(self.ddim_alpha_cumprods_prev, timestep_index, pred_x0.shape)\n        dir_xt = (1.0 - alpha_cumprod_prev).sqrt() * pred_noise\n        x_prev = alpha_cumprod_prev.sqrt() * pred_x0 + dir_xt\n        return x_prev\n\n\ndef train(model):\n    if config.get('debug_nan', False):\n        DebugUnderflowOverflow(model)\n        logger.info('NaN debugger registered. Start to detect overflow during training.')\n    time_start, last_tic = time.time(), time.time()\n    log_buffer = LogBuffer()\n\n    global_step = start_step\n\n    load_vae_feat = getattr(train_dataloader.dataset, 'load_vae_feat', False)\n\n    # Create uncond embeds for classifier free guidance\n    uncond_prompt_embeds = torch.load('output/pretrained_models/null_embed.pth', map_location='cpu').to(accelerator.device).repeat(config.train_batch_size, 1, 1, 1)\n\n    # Now you train the model\n    for epoch in range(start_epoch + 1, config.num_epochs + 1):\n        data_time_start= time.time()\n        data_time_all = 0\n        for step, batch in enumerate(train_dataloader):\n            data_time_all += time.time() - data_time_start\n            if load_vae_feat:\n                z = batch[0]\n            else:\n                with torch.no_grad():\n                    with torch.cuda.amp.autocast(enabled=config.mixed_precision == 'fp16'):\n                        posterior = vae.encode(batch[0]).latent_dist\n                        if config.sample_posterior:\n           
                 z = posterior.sample()\n                        else:\n                            z = posterior.mode()\n            latents = (z * config.scale_factor).to(weight_dtype)\n            y = batch[1].squeeze(1).to(weight_dtype)\n            y_mask = batch[2].squeeze(1).squeeze(1).to(weight_dtype)\n            data_info = {'resolution': batch[3]['img_hw'].to(weight_dtype), 'aspect_ratio': batch[3]['aspect_ratio'].to(weight_dtype),}\n\n            # Sample a random timestep for each image\n            grad_norm = None\n            with accelerator.accumulate(model):\n                # Predict the noise residual\n                optimizer.zero_grad()\n                # Sample noise that we'll add to the latents\n                noise = torch.randn_like(latents)\n                bsz = latents.shape[0]\n\n                # Sample a random timestep for each image t_n ~ U[0, N - k - 1] without bias.\n                topk = config.train_sampling_steps // config.num_ddim_timesteps\n                index = torch.randint(0, config.num_ddim_timesteps, (bsz,), device=latents.device).long()\n                start_timesteps = solver.ddim_timesteps[index]\n                timesteps = start_timesteps - topk\n                timesteps = torch.where(timesteps < 0, torch.zeros_like(timesteps), timesteps)\n\n                # Get boundary scalings for start_timesteps and (end) timesteps.\n                c_skip_start, c_out_start = scalings_for_boundary_conditions(start_timesteps)\n                c_skip_start, c_out_start = [append_dims(x, latents.ndim) for x in [c_skip_start, c_out_start]]\n                c_skip, c_out = scalings_for_boundary_conditions(timesteps)\n                c_skip, c_out = [append_dims(x, latents.ndim) for x in [c_skip, c_out]]\n\n                # Sample a random guidance scale w from U[w_min, w_max] and embed it\n                # w = (config.w_max - config.w_min) * torch.rand((bsz,)) + config.w_min\n                w = config.cfg_scale * 
torch.ones((bsz,))\n                w = w.reshape(bsz, 1, 1, 1)\n                w = w.to(device=latents.device, dtype=latents.dtype)\n\n                # Get online LCM prediction on z_{t_{n + k}}, w, c, t_{n + k}\n                _, pred_x_0, noisy_model_input  = train_diffusion.training_losses_diffusers(\n                    model, latents, start_timesteps,\n                    model_kwargs=dict(encoder_hidden_states=y, encoder_attention_mask=y_mask, added_cond_kwargs=data_info),\n                    noise=noise\n                )\n                model_pred = c_skip_start * noisy_model_input + c_out_start * pred_x_0\n\n                with torch.no_grad():\n                    with torch.autocast(\"cuda\"):\n                        cond_teacher_output, cond_pred_x0, _ = train_diffusion.training_losses_diffusers(\n                            model_teacher, latents, start_timesteps,\n                            model_kwargs=dict(encoder_hidden_states=y, encoder_attention_mask=y_mask, added_cond_kwargs=data_info),\n                            noise=noise\n                        )\n                        # Get teacher model prediction on noisy_latents and unconditional embedding\n                        uncond_teacher_output, uncond_pred_x0, _ = train_diffusion.training_losses_diffusers(\n                            model_teacher, latents, start_timesteps,\n                            model_kwargs=dict(encoder_hidden_states=uncond_prompt_embeds, encoder_attention_mask=y_mask, added_cond_kwargs=data_info),\n                            noise=noise\n                        )\n\n                        # Perform \"CFG\" to get x_prev estimate (using the LCM paper's CFG formulation)\n                        pred_x0 = cond_pred_x0 + w * (cond_pred_x0 - uncond_pred_x0)\n                        pred_noise = cond_teacher_output + w * (cond_teacher_output - uncond_teacher_output)\n                        x_prev = solver.ddim_step(pred_x0, pred_noise, index)\n\n             
   # Get target LCM prediction on x_prev, w, c, t_n\n                with torch.no_grad():\n                    with torch.autocast(\"cuda\", enabled=True):\n                        _, pred_x_0, _ = train_diffusion.training_losses_diffusers(\n                            model, x_prev.float(), timesteps,\n                            model_kwargs=dict(encoder_hidden_states=y, encoder_attention_mask=y_mask, added_cond_kwargs=data_info),\n                            skip_noise=True\n                        )\n\n                    target = c_skip * x_prev + c_out * pred_x_0\n\n                # Calculate loss\n                if config.loss_type == \"l2\":\n                    loss = F.mse_loss(model_pred.float(), target.float(), reduction=\"mean\")\n                elif config.loss_type == \"huber\":\n                    loss = torch.mean(torch.sqrt((model_pred.float() - target.float()) ** 2 + config.huber_c**2) - config.huber_c)\n\n                accelerator.backward(loss)\n                if accelerator.sync_gradients:\n                    grad_norm = accelerator.clip_grad_norm_(model.parameters(), config.gradient_clip)\n                optimizer.step()\n                lr_scheduler.step()\n                optimizer.zero_grad(set_to_none=True)\n\n            lr = lr_scheduler.get_last_lr()[0]\n            logs = {\"loss\": accelerator.gather(loss).mean().item()}\n            if grad_norm is not None:\n                logs.update(grad_norm=accelerator.gather(grad_norm).mean().item())\n            log_buffer.update(logs)\n            if (step + 1) % config.log_interval == 0 or (step + 1) == 1:\n                t = (time.time() - last_tic) / config.log_interval\n                t_d = data_time_all / config.log_interval\n                avg_time = (time.time() - time_start) / (global_step + 1)\n                eta = str(datetime.timedelta(seconds=int(avg_time * (total_steps - start_step - global_step - 1))))\n                eta_epoch = 
str(datetime.timedelta(seconds=int(avg_time * (len(train_dataloader) - step - 1))))\n                # avg_loss = sum(loss_buffer) / len(loss_buffer)\n                log_buffer.average()\n                info = f\"Step/Epoch [{(epoch-1)*len(train_dataloader)+step+1}/{epoch}][{step + 1}/{len(train_dataloader)}]:total_eta: {eta}, \" \\\n                       f\"epoch_eta:{eta_epoch}, time_all:{t:.3f}, time_data:{t_d:.3f}, lr:{lr:.3e}, s:({data_info['resolution'][0][0].item()}, {data_info['resolution'][0][1].item()}), \"\n                info += ', '.join([f\"{k}:{v:.4f}\" for k, v in log_buffer.output.items()])\n                logger.info(info)\n                last_tic = time.time()\n                log_buffer.clear()\n                data_time_all = 0\n            logs.update(lr=lr)\n            accelerator.log(logs, step=global_step + start_step)\n\n            global_step += 1\n            data_time_start= time.time()\n\n            accelerator.wait_for_everyone()\n            if accelerator.is_main_process:\n                if ((epoch - 1) * len(train_dataloader) + step + 1) % config.save_model_steps == 0:\n                    save_path = os.path.join(os.path.join(config.work_dir, 'checkpoints'), f\"checkpoint-{(epoch - 1) * len(train_dataloader) + step + 1}\")\n                    os.umask(0o000)\n                    logger.info(f\"Start to save state to {save_path}\")\n                    accelerator.save_state(save_path)\n                    logger.info(f\"Saved state to {save_path}\")\n\n\n        accelerator.wait_for_everyone()\n        if epoch % config.save_model_epochs == 0 or epoch == config.num_epochs:\n            os.umask(0o000)\n            save_path = os.path.join(os.path.join(config.work_dir, 'checkpoints'), f\"checkpoint-{(epoch - 1) * len(train_dataloader) + step + 1}\")\n            logger.info(f\"Start to save state to {save_path}\")\n            model = accelerator.unwrap_model(model)\n            model.save_pretrained(save_path)\n         
   lora_state_dict = get_peft_model_state_dict(model, adapter_name=\"default\")\n            StableDiffusionPipeline.save_lora_weights(os.path.join(save_path, \"transformer_lora\"), lora_state_dict)\n            logger.info(f\"Saved state to {save_path}\")\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Process some integers.\")\n    parser.add_argument(\"config\", type=str, help=\"config\")\n    parser.add_argument(\"--cloud\", action='store_true', default=False, help=\"cloud or local machine\")\n    parser.add_argument(\"--work-dir\", default='output', help='the dir to save logs and models')\n    parser.add_argument(\"--resume-from\", help='the dir to save logs and models')\n    parser.add_argument(\"--local-rank\", type=int, default=-1)\n    parser.add_argument(\"--local_rank\", type=int, default=-1)\n    parser.add_argument(\"--debug\", action='store_true')\n    parser.add_argument(\"--lora_rank\", type=int, default=64, help=\"The rank of the LoRA projection matrix.\", )\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == '__main__':\n    args = parse_args()\n    config = read_config(args.config)\n\n    config.resume_from = None\n    if args.work_dir is not None:\n        # update configs according to CLI args if args.work_dir is not None\n        config.work_dir = args.work_dir\n    if args.cloud:\n        config.data_root = '/data/data'\n    if args.resume_from is not None:\n        config.resume_from = args.resume_from\n    if args.debug:\n        config.log_interval = 1\n        config.train_batch_size = 4\n        config.valid_num = 10\n        config.save_model_steps = 10\n\n    os.umask(0o000)\n    os.makedirs(config.work_dir, exist_ok=True)\n\n    init_handler = InitProcessGroupKwargs()\n    init_handler.timeout = datetime.timedelta(seconds=5400)  # change timeout to avoid a strange NCCL bug\n    # Initialize accelerator and tensorboard logging\n    if config.use_fsdp:\n        init_train = 'FSDP'\n        
from accelerate import FullyShardedDataParallelPlugin\n        from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig\n        set_fsdp_env()\n        fsdp_plugin = FullyShardedDataParallelPlugin(state_dict_config=FullStateDictConfig(offload_to_cpu=False, rank0_only=False),)\n    else:\n        init_train = 'DDP'\n        fsdp_plugin = None\n\n    even_batches = True\n    if config.multi_scale:\n        # plain bool; a trailing comma here would create the truthy tuple (False,)\n        even_batches = False\n\n    accelerator = Accelerator(\n        mixed_precision=config.mixed_precision,\n        gradient_accumulation_steps=config.gradient_accumulation_steps,\n        log_with=\"tensorboard\",\n        project_dir=os.path.join(config.work_dir, \"logs\"),\n        fsdp_plugin=fsdp_plugin,\n        even_batches=even_batches,\n        kwargs_handlers=[init_handler]\n    )\n\n    logger = get_root_logger(os.path.join(config.work_dir, 'train_log.log'))\n\n    logger.info(accelerator.state)\n    config.seed = init_random_seed(config.get('seed', None))\n    set_random_seed(config.seed)\n\n    if accelerator.is_main_process:\n        config.dump(os.path.join(config.work_dir, 'config.py'))\n\n    logger.info(f\"Config: \\n{config.pretty_text}\")\n    logger.info(f\"World_size: {get_world_size()}, seed: {config.seed}\")\n    logger.info(f\"Initializing: {init_train} for training\")\n    image_size = config.image_size  # @param [256, 512]\n    latent_size = int(image_size) // 8\n    pred_sigma = getattr(config, 'pred_sigma', True)\n    learn_sigma = getattr(config, 'learn_sigma', True) and pred_sigma\n\n    # prepare null_embedding for training\n    if not os.path.exists('output/pretrained_models/null_embed.pth'):\n        logger.info(f\"Creating output/pretrained_models/null_embed.pth\")\n        os.makedirs('output/pretrained_models/', exist_ok=True)\n        pipe = PixArtAlphaPipeline.from_pretrained(\"PixArt-alpha/PixArt-XL-2-1024-MS\", torch_dtype=torch.float16, use_safetensors=True,).to(\"cuda\")\n        
torch.save(pipe.encode_prompt(\"\"), 'output/pretrained_models/null_embed.pth')\n        del pipe\n        torch.cuda.empty_cache()\n\n    # build models\n    train_diffusion = IDDPM(str(config.train_sampling_steps), learn_sigma=learn_sigma, pred_sigma=pred_sigma, return_startx=True)\n    model_teacher = Transformer2DModel.from_pretrained(config.load_from, subfolder=\"transformer\")\n    model_teacher.requires_grad_(False)\n    model = Transformer2DModel.from_pretrained(config.load_from, subfolder=\"transformer\").train()\n    logger.info(f\"{model.__class__.__name__} Model Parameters: {sum(p.numel() for p in model.parameters()):}\")\n\n    lora_config = LoraConfig(\n        r=config.lora_rank,\n        target_modules=[\n            \"to_q\",\n            \"to_k\",\n            \"to_v\",\n            \"to_out.0\",\n            \"proj_in\",\n            \"proj_out\",\n            \"ff.net.0.proj\",\n            \"ff.net.2\",\n            \"proj\",\n            \"linear\",\n            \"linear_1\",\n            \"linear_2\",\n            # \"scale_shift_table\",      # not available due to the implementation in huggingface/peft, working on it.\n        ],\n    )\n    print(lora_config)\n    model = get_peft_model(model, lora_config)\n    model.print_trainable_parameters()\n\n    # 9. Handle mixed precision and device placement\n    # For mixed precision training we cast all non-trainable weights to half-precision\n    # as these weights are only used for inference, keeping weights in full precision is not required.\n    weight_dtype = torch.float32\n    if accelerator.mixed_precision == \"fp16\":\n        weight_dtype = torch.float16\n    elif accelerator.mixed_precision == \"bf16\":\n        weight_dtype = torch.bfloat16\n\n    # 11. 
Enable optimizations\n    # model.enable_xformers_memory_efficient_attention()\n    # model_teacher.enable_xformers_memory_efficient_attention()\n\n    lora_layers = filter(lambda p: p.requires_grad, model.parameters())\n\n    # for name, params in model.named_parameters():\n    #     if params.requires_grad == False: logger.info(f\"freeze param: {name}\")\n    #\n    # for name, params in model.named_parameters():\n    #     if params.requires_grad == True: logger.info(f\"trainable param: {name}\")\n\n    # 10. Handle saving and loading of checkpoints\n    # `accelerate` 0.16.0 will have better support for customized saving\n    if version.parse(accelerate.__version__) >= version.parse(\"0.16.0\"):\n        # create custom saving & loading hooks so that `accelerator.save_state(...)` serializes in a nice format\n        def save_model_hook(models, weights, output_dir):\n            if accelerator.is_main_process:\n                transformer_ = accelerator.unwrap_model(models[0])\n                lora_state_dict = get_peft_model_state_dict(transformer_, adapter_name=\"default\")\n                StableDiffusionPipeline.save_lora_weights(os.path.join(output_dir, \"transformer_lora\"), lora_state_dict)\n                # save weights in peft format to be able to load them back\n                transformer_.save_pretrained(output_dir)\n\n                for _, model in enumerate(models):\n                    # make sure to pop weight so that corresponding model is not saved again\n                    weights.pop()\n\n        def load_model_hook(models, input_dir):\n            # load the LoRA into the model\n            transformer_ = accelerator.unwrap_model(models[0])\n            transformer_.load_adapter(input_dir, \"default\", is_trainable=True)\n\n            for _ in range(len(models)):\n                # pop models so that they are not loaded again\n                models.pop()\n\n        accelerator.register_save_state_pre_hook(save_model_hook)\n        
accelerator.register_load_state_pre_hook(load_model_hook)\n\n    if config.grad_checkpointing:\n        model.enable_gradient_checkpointing()\n\n    if not config.data.load_vae_feat:\n        vae = AutoencoderKL.from_pretrained(config.vae_pretrained).cuda()\n\n    # prepare for FSDP clip grad norm calculation\n    if accelerator.distributed_type == DistributedType.FSDP:\n        for m in accelerator._models:\n            m.clip_grad_norm_ = types.MethodType(clip_grad_norm_, m)\n\n    # build dataloader\n    set_data_root(config.data_root)\n    dataset = build_dataset(config.data, resolution=image_size, aspect_ratio_type=config.aspect_ratio_type)\n    if config.multi_scale:\n        batch_sampler = AspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n                                                batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio, drop_last=True,\n                                                ratio_nums=dataset.ratio_nums, config=config, valid_num=config.valid_num)\n        # used for balanced sampling\n        # batch_sampler = BalancedAspectRatioBatchSampler(sampler=RandomSampler(dataset), dataset=dataset,\n        #                                                 batch_size=config.train_batch_size, aspect_ratios=dataset.aspect_ratio,\n        #                                                 ratio_nums=dataset.ratio_nums)\n        train_dataloader = build_dataloader(dataset, batch_sampler=batch_sampler, num_workers=config.num_workers)\n    else:\n        train_dataloader = build_dataloader(dataset, num_workers=config.num_workers, batch_size=config.train_batch_size, shuffle=True)\n\n    # build optimizer and lr scheduler\n    lr_scale_ratio = 1\n    if config.get('auto_lr', None):\n        lr_scale_ratio = auto_scale_lr(config.train_batch_size * get_world_size() * config.gradient_accumulation_steps,\n                                       config.optimizer,\n                                       
**config.auto_lr)\n    optimizer = build_optimizer(model, config.optimizer)\n    lr_scheduler = build_lr_scheduler(config, optimizer, train_dataloader, lr_scale_ratio)\n\n    timestamp = time.strftime(\"%Y-%m-%d_%H:%M:%S\", time.localtime())\n\n    if accelerator.is_main_process:\n        accelerator.init_trackers(f\"tb_{timestamp}\")\n\n    start_epoch = 0\n    start_step = 0\n    total_steps = len(train_dataloader) * config.num_epochs\n\n    solver = DDIMSolver(train_diffusion.alphas_cumprod, timesteps=config.train_sampling_steps, ddim_timesteps=config.num_ddim_timesteps)\n    solver.to(accelerator.device)\n\n    # Prepare everything\n    # There is no specific order to remember, you just need to unpack the\n    # objects in the same order you gave them to the prepare method.\n    model, model_teacher = accelerator.prepare(model, model_teacher)\n    optimizer, train_dataloader, lr_scheduler = accelerator.prepare(optimizer, train_dataloader, lr_scheduler)\n\n    if config.resume_from is not None:\n        if config.resume_from != \"latest\":\n            path = os.path.basename(config.resume_from)\n        else:\n            # Get the most recent checkpoint\n            dirs = os.listdir(os.path.join(config.work_dir, 'checkpoints'))\n            dirs = [d for d in dirs if d.startswith(\"checkpoint\")]\n            dirs = sorted(dirs, key=lambda x: int(x.split(\"-\")[1]))\n            path = dirs[-1] if len(dirs) > 0 else None\n\n        if path is None:\n            accelerator.print(f\"Checkpoint '{config.resume_from}' does not exist. Starting a new training run.\")\n            config.resume_from = None\n        else:\n            accelerator.print(f\"Resuming from checkpoint {path}\")\n            accelerator.load_state(os.path.join(config.work_dir, 'checkpoints', path))\n            start_step = int(path.split(\"-\")[1])\n            start_epoch = start_step // len(train_dataloader)\n\n    train(model)"
  },
  {
    "path": "PixArt-alpha-ToCa/train_scripts/train_pixart_lora_hf.py",
    "content": "# coding=utf-8\n# Copyright 2023 The HuggingFace Inc. team. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"Fine-tuning script for Stable Diffusion for text2image with support for LoRA.\"\"\"\n\nimport argparse\nimport logging\nimport math\nimport os\nimport random\nimport shutil\nfrom pathlib import Path\nfrom typing import List, Union\n\nimport datasets\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nimport torch.utils.checkpoint\nimport transformers\nimport accelerate\nfrom accelerate import Accelerator\nfrom accelerate.logging import get_logger\nfrom accelerate.utils import ProjectConfiguration, set_seed\nfrom datasets import load_dataset\nfrom huggingface_hub import create_repo, upload_folder\nfrom packaging import version\nfrom peft import LoraConfig, get_peft_model_state_dict, get_peft_model, PeftModel\nfrom torchvision import transforms\nfrom tqdm.auto import tqdm\n\nimport diffusers\nfrom diffusers import AutoencoderKL, DDPMScheduler, DiffusionPipeline, StableDiffusionPipeline, PixArtAlphaPipeline, Transformer2DModel\nfrom transformers import T5EncoderModel, T5Tokenizer\nfrom diffusers.optimization import get_scheduler\nfrom diffusers.training_utils import compute_snr\nfrom diffusers.utils import check_min_version, is_wandb_available\nfrom diffusers.utils.import_utils import is_xformers_available\n\n\n# Will error if the minimal version of diffusers is not installed. 
Remove at your own risks.\ncheck_min_version(\"0.25.0.dev0\")\n\nlogger = get_logger(__name__, log_level=\"INFO\")\n\n\n# TODO: This function should be removed once training scripts are rewritten in PEFT\ndef text_encoder_lora_state_dict(text_encoder):\n    state_dict = {}\n\n    def text_encoder_attn_modules(text_encoder):\n        from transformers import CLIPTextModel, CLIPTextModelWithProjection\n\n        attn_modules = []\n\n        if isinstance(text_encoder, (CLIPTextModel, CLIPTextModelWithProjection)):\n            for i, layer in enumerate(text_encoder.text_model.encoder.layers):\n                name = f\"text_model.encoder.layers.{i}.self_attn\"\n                mod = layer.self_attn\n                attn_modules.append((name, mod))\n\n        return attn_modules\n\n    for name, module in text_encoder_attn_modules(text_encoder):\n        for k, v in module.q_proj.lora_linear_layer.state_dict().items():\n            state_dict[f\"{name}.q_proj.lora_linear_layer.{k}\"] = v\n\n        for k, v in module.k_proj.lora_linear_layer.state_dict().items():\n            state_dict[f\"{name}.k_proj.lora_linear_layer.{k}\"] = v\n\n        for k, v in module.v_proj.lora_linear_layer.state_dict().items():\n            state_dict[f\"{name}.v_proj.lora_linear_layer.{k}\"] = v\n\n        for k, v in module.out_proj.lora_linear_layer.state_dict().items():\n            state_dict[f\"{name}.out_proj.lora_linear_layer.{k}\"] = v\n\n    return state_dict\n\n\ndef save_model_card(repo_id: str, images=None, base_model=str, dataset_name=str, repo_folder=None):\n    img_str = \"\"\n    for i, image in enumerate(images):\n        image.save(os.path.join(repo_folder, f\"image_{i}.png\"))\n        img_str += f\"![img_{i}](./image_{i}.png)\\n\"\n\n    yaml = f\"\"\"\n---\nlicense: creativeml-openrail-m\nbase_model: {base_model}\ntags:\n- stable-diffusion\n- stable-diffusion-diffusers\n- text-to-image\n- diffusers\n- lora\ninference: true\n---\n    \"\"\"\n    model_card = 
f\"\"\"\n# LoRA text2image fine-tuning - {repo_id}\nThese are LoRA adaption weights for {base_model}. The weights were fine-tuned on the {dataset_name} dataset. You can find some example images in the following. \\n\n{img_str}\n\"\"\"\n    with open(os.path.join(repo_folder, \"README.md\"), \"w\") as f:\n        f.write(yaml + model_card)\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Simple example of a training script.\")\n    parser.add_argument(\n        \"--pretrained_model_name_or_path\",\n        type=str,\n        default=None,\n        required=True,\n        help=\"Path to pretrained model or model identifier from huggingface.co/models.\",\n    )\n    parser.add_argument(\n        \"--revision\",\n        type=str,\n        default=None,\n        required=False,\n        help=\"Revision of pretrained model identifier from huggingface.co/models.\",\n    )\n    parser.add_argument(\n        \"--variant\",\n        type=str,\n        default=None,\n        help=\"Variant of the model files of the pretrained model identifier from huggingface.co/models, 'e.g.' fp16\",\n    )\n    parser.add_argument(\n        \"--dataset_name\",\n        type=str,\n        default=None,\n        help=(\n            \"The name of the Dataset (from the HuggingFace hub) to train on (could be your own, possibly private,\"\n            \" dataset). It can also be a path pointing to a local copy of a dataset in your filesystem,\"\n            \" or to a folder containing files that 🤗 Datasets can understand.\"\n        ),\n    )\n    parser.add_argument(\n        \"--dataset_config_name\",\n        type=str,\n        default=None,\n        help=\"The config of the Dataset, leave as None if there's only one config.\",\n    )\n    parser.add_argument(\n        \"--train_data_dir\",\n        type=str,\n        default=None,\n        help=(\n            \"A folder containing the training data. 
Folder contents must follow the structure described in\"\n            \" https://huggingface.co/docs/datasets/image_dataset#imagefolder. In particular, a `metadata.jsonl` file\"\n            \" must exist to provide the captions for the images. Ignored if `dataset_name` is specified.\"\n        ),\n    )\n    parser.add_argument(\n        \"--image_column\", type=str, default=\"image\", help=\"The column of the dataset containing an image.\"\n    )\n    parser.add_argument(\n        \"--caption_column\",\n        type=str,\n        default=\"text\",\n        help=\"The column of the dataset containing a caption or a list of captions.\",\n    )\n    parser.add_argument(\n        \"--validation_prompt\", type=str, default=None, help=\"A prompt that is sampled during training for inference.\"\n    )\n    parser.add_argument(\n        \"--num_validation_images\",\n        type=int,\n        default=4,\n        help=\"Number of images that should be generated during validation with `validation_prompt`.\",\n    )\n    parser.add_argument(\n        \"--validation_epochs\",\n        type=int,\n        default=1,\n        help=(\n            \"Run fine-tuning validation every X epochs. 
The validation process consists of running the prompt\"\n            \" `args.validation_prompt` multiple times: `args.num_validation_images`.\"\n        ),\n    )\n    parser.add_argument(\n        \"--max_train_samples\",\n        type=int,\n        default=None,\n        help=(\n            \"For debugging purposes or quicker training, truncate the number of training examples to this \"\n            \"value if set.\"\n        ),\n    )\n    parser.add_argument(\n        \"--output_dir\",\n        type=str,\n        default=\"sd-model-finetuned-lora\",\n        help=\"The output directory where the model predictions and checkpoints will be written.\",\n    )\n    parser.add_argument(\n        \"--cache_dir\",\n        type=str,\n        default=None,\n        help=\"The directory where the downloaded models and datasets will be stored.\",\n    )\n    parser.add_argument(\"--seed\", type=int, default=None, help=\"A seed for reproducible training.\")\n    parser.add_argument(\n        \"--resolution\",\n        type=int,\n        default=512,\n        help=(\n            \"The resolution for input images, all the images in the train/validation dataset will be resized to this\"\n            \" resolution\"\n        ),\n    )\n    parser.add_argument(\n        \"--center_crop\",\n        default=False,\n        action=\"store_true\",\n        help=(\n            \"Whether to center crop the input images to the resolution. If not set, the images will be randomly\"\n            \" cropped. 
The images will be resized to the resolution first before cropping.\"\n        ),\n    )\n    parser.add_argument(\n        \"--random_flip\",\n        action=\"store_true\",\n        help=\"whether to randomly flip images horizontally\",\n    )\n    parser.add_argument(\n        \"--train_batch_size\", type=int, default=16, help=\"Batch size (per device) for the training dataloader.\"\n    )\n    parser.add_argument(\"--num_train_epochs\", type=int, default=100)\n    parser.add_argument(\n        \"--max_train_steps\",\n        type=int,\n        default=None,\n        help=\"Total number of training steps to perform.  If provided, overrides num_train_epochs.\",\n    )\n    parser.add_argument(\n        \"--gradient_accumulation_steps\",\n        type=int,\n        default=1,\n        help=\"Number of updates steps to accumulate before performing a backward/update pass.\",\n    )\n    parser.add_argument(\n        \"--gradient_checkpointing\",\n        action=\"store_true\",\n        help=\"Whether or not to use gradient checkpointing to save memory at the expense of slower backward pass.\",\n    )\n    parser.add_argument(\n        \"--learning_rate\",\n        type=float,\n        default=1e-6,\n        help=\"Initial learning rate (after the potential warmup period) to use.\",\n    )\n    parser.add_argument(\n        \"--scale_lr\",\n        action=\"store_true\",\n        default=False,\n        help=\"Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size.\",\n    )\n    parser.add_argument(\n        \"--lr_scheduler\",\n        type=str,\n        default=\"constant\",\n        help=(\n            'The scheduler type to use. 
Choose between [\"linear\", \"cosine\", \"cosine_with_restarts\", \"polynomial\",'\n            ' \"constant\", \"constant_with_warmup\"]'\n        ),\n    )\n    parser.add_argument(\n        \"--lr_warmup_steps\", type=int, default=500, help=\"Number of steps for the warmup in the lr scheduler.\"\n    )\n    parser.add_argument(\n        \"--snr_gamma\",\n        type=float,\n        default=None,\n        help=\"SNR weighting gamma to be used if rebalancing the loss. Recommended value is 5.0. \"\n        \"More details here: https://arxiv.org/abs/2303.09556.\",\n    )\n    parser.add_argument(\n        \"--use_8bit_adam\", action=\"store_true\", help=\"Whether or not to use 8-bit Adam from bitsandbytes.\"\n    )\n    parser.add_argument(\n        \"--use_dora\",\n        action=\"store_true\",\n        default=False,\n        help=\"Whether or not to use Dora. For more information, see\"\n        \" https://huggingface.co/docs/peft/package_reference/lora#peft.LoraConfig.use_dora\"\n    )\n    parser.add_argument(\n        \"--use_rslora\",\n        action=\"store_true\",\n        default=False,\n        help=\"Whether or not to use RS Lora. For more information, see\"\n        \" https://huggingface.co/docs/peft/package_reference/lora#peft.LoraConfig.use_rslora\"\n    )\n    parser.add_argument(\n        \"--allow_tf32\",\n        action=\"store_true\",\n        help=(\n            \"Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see\"\n            \" https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices\"\n        ),\n    )\n    parser.add_argument(\n        \"--dataloader_num_workers\",\n        type=int,\n        default=0,\n        help=(\n            \"Number of subprocesses to use for data loading. 
0 means that the data will be loaded in the main process.\"\n        ),\n    )\n    parser.add_argument(\"--adam_beta1\", type=float, default=0.9, help=\"The beta1 parameter for the Adam optimizer.\")\n    parser.add_argument(\"--adam_beta2\", type=float, default=0.999, help=\"The beta2 parameter for the Adam optimizer.\")\n    parser.add_argument(\"--adam_weight_decay\", type=float, default=1e-2, help=\"Weight decay to use.\")\n    parser.add_argument(\"--adam_epsilon\", type=float, default=1e-08, help=\"Epsilon value for the Adam optimizer\")\n    parser.add_argument(\"--max_grad_norm\", default=1.0, type=float, help=\"Max gradient norm.\")\n    parser.add_argument(\"--push_to_hub\", action=\"store_true\", help=\"Whether or not to push the model to the Hub.\")\n    parser.add_argument(\"--hub_token\", type=str, default=None, help=\"The token to use to push to the Model Hub.\")\n    # ----Diffusion Training Arguments----\n    parser.add_argument(\n        \"--proportion_empty_prompts\",\n        type=float,\n        default=0,\n        help=\"Proportion of image prompts to be replaced with empty strings. Defaults to 0 (no prompt replacement).\",\n    )\n    parser.add_argument(\n        \"--prediction_type\",\n        type=str,\n        default=None,\n        help=\"The prediction_type that shall be used for training. Choose between 'epsilon' or 'v_prediction' or leave `None`. If left to `None` the default prediction type of the scheduler: `noise_scheduler.config.prediction_type` is chosen.\",\n    )\n    parser.add_argument(\n        \"--hub_model_id\",\n        type=str,\n        default=None,\n        help=\"The name of the repository to keep in sync with the local `output_dir`.\",\n    )\n    parser.add_argument(\n        \"--logging_dir\",\n        type=str,\n        default=\"logs\",\n        help=(\n            \"[TensorBoard](https://www.tensorflow.org/tensorboard) log directory. 
Will default to\"\n            \" *output_dir/runs/**CURRENT_DATETIME_HOSTNAME***.\"\n        ),\n    )\n    parser.add_argument(\n        \"--mixed_precision\",\n        type=str,\n        default=None,\n        choices=[\"no\", \"fp16\", \"bf16\"],\n        help=(\n            \"Whether to use mixed precision. Choose between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >=\"\n            \" 1.10 and an Nvidia Ampere GPU.  Default to the value of accelerate config of the current system or the\"\n            \" flag passed with the `accelerate.launch` command. Use this argument to override the accelerate config.\"\n        ),\n    )\n    parser.add_argument(\n        \"--report_to\",\n        type=str,\n        default=\"tensorboard\",\n        help=(\n            'The integration to report the results and logs to. Supported platforms are `\"tensorboard\"`'\n            ' (default), `\"wandb\"` and `\"comet_ml\"`. Use `\"all\"` to report to all integrations.'\n        ),\n    )\n    parser.add_argument(\"--local_rank\", type=int, default=-1, help=\"For distributed training: local_rank\")\n    parser.add_argument(\n        \"--checkpointing_steps\",\n        type=int,\n        default=500,\n        help=(\n            \"Save a checkpoint of the training state every X updates. These checkpoints are only suitable for resuming\"\n            \" training using `--resume_from_checkpoint`.\"\n        ),\n    )\n    parser.add_argument(\n        \"--checkpoints_total_limit\",\n        type=int,\n        default=None,\n        help=(\"Max number of checkpoints to store.\"),\n    )\n    parser.add_argument(\n        \"--resume_from_checkpoint\",\n        type=str,\n        default=None,\n        help=(\n            \"Whether training should be resumed from a previous checkpoint. 
Use a path saved by\"\n            ' `--checkpointing_steps`, or `\"latest\"` to automatically select the last available checkpoint.'\n        ),\n    )\n    parser.add_argument(\n        \"--enable_xformers_memory_efficient_attention\", action=\"store_true\", help=\"Whether or not to use xformers.\"\n    )\n    parser.add_argument(\"--noise_offset\", type=float, default=0, help=\"The scale of noise offset.\")\n    parser.add_argument(\n        \"--rank\",\n        type=int,\n        default=4,\n        help=(\"The dimension of the LoRA update matrices.\"),\n    )\n\n    parser.add_argument(\"--local-rank\", type=int, default=-1)\n\n    args = parser.parse_args()\n    env_local_rank = int(os.environ.get(\"LOCAL_RANK\", -1))\n    if env_local_rank != -1 and env_local_rank != args.local_rank:\n        args.local_rank = env_local_rank\n\n    # Sanity checks\n    if args.dataset_name is None and args.train_data_dir is None:\n        raise ValueError(\"Need either a dataset name or a training folder.\")\n\n    if args.proportion_empty_prompts < 0 or args.proportion_empty_prompts > 1:\n        raise ValueError(\"`--proportion_empty_prompts` must be in the range [0, 1].\")\n\n    return args\n\n\nDATASET_NAME_MAPPING = {\"lambdalabs/pokemon-blip-captions\": (\"image\", \"text\"),}\n\n\ndef main():\n    args = parse_args()\n    logging_dir = Path(args.output_dir, args.logging_dir)\n\n    accelerator_project_config = ProjectConfiguration(project_dir=args.output_dir, logging_dir=logging_dir)\n\n    accelerator = Accelerator(\n        gradient_accumulation_steps=args.gradient_accumulation_steps,\n        mixed_precision=args.mixed_precision,\n        log_with=args.report_to,\n        project_config=accelerator_project_config,\n    )\n    if args.report_to == \"wandb\":\n        if not is_wandb_available():\n            raise ImportError(\"Make sure to install wandb if you want to use it for logging during training.\")\n        import wandb\n\n    # Make one log on every 
process with the configuration for debugging.\n    logging.basicConfig(\n        format=\"%(asctime)s - %(levelname)s - %(name)s - %(message)s\",\n        datefmt=\"%m/%d/%Y %H:%M:%S\",\n        level=logging.INFO,\n    )\n    logger.info(accelerator.state, main_process_only=False)\n    if accelerator.is_local_main_process:\n        datasets.utils.logging.set_verbosity_warning()\n        transformers.utils.logging.set_verbosity_warning()\n        diffusers.utils.logging.set_verbosity_info()\n    else:\n        datasets.utils.logging.set_verbosity_error()\n        transformers.utils.logging.set_verbosity_error()\n        diffusers.utils.logging.set_verbosity_error()\n\n    # If passed along, set the training seed now.\n    if args.seed is not None:\n        set_seed(args.seed)\n\n    # Handle the repository creation\n    if accelerator.is_main_process:\n        if args.output_dir is not None:\n            os.makedirs(args.output_dir, exist_ok=True)\n\n        if args.push_to_hub:\n            repo_id = create_repo(repo_id=args.hub_model_id or Path(args.output_dir).name, exist_ok=True, token=args.hub_token).repo_id\n\n    # See Section 3.1. 
of the paper.\n    max_length = 120\n\n    # For mixed precision training we cast all non-trainable weights (vae, non-lora text_encoder and non-lora transformer) to half-precision\n    # as these weights are only used for inference, keeping weights in full precision is not required.\n    weight_dtype = torch.float32\n    if accelerator.mixed_precision == \"fp16\":\n        weight_dtype = torch.float16\n    elif accelerator.mixed_precision == \"bf16\":\n        weight_dtype = torch.bfloat16\n\n    # Load scheduler, tokenizer and models.\n    noise_scheduler = DDPMScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder=\"scheduler\", torch_dtype=weight_dtype)\n    tokenizer = T5Tokenizer.from_pretrained(args.pretrained_model_name_or_path, subfolder=\"tokenizer\", revision=args.revision, torch_dtype=weight_dtype)\n\n    text_encoder = T5EncoderModel.from_pretrained(args.pretrained_model_name_or_path, subfolder=\"text_encoder\", revision=args.revision, torch_dtype=weight_dtype)\n    text_encoder.requires_grad_(False)\n    text_encoder.to(accelerator.device)\n\n    vae = AutoencoderKL.from_pretrained(args.pretrained_model_name_or_path, subfolder=\"vae\", revision=args.revision, variant=args.variant, torch_dtype=weight_dtype)\n    vae.requires_grad_(False)\n    vae.to(accelerator.device)\n\n    transformer = Transformer2DModel.from_pretrained(args.pretrained_model_name_or_path, subfolder=\"transformer\", torch_dtype=weight_dtype)\n\n    # freeze parameters of models to save more memory\n    transformer.requires_grad_(False)    \n    \n    # Freeze the transformer parameters before adding adapters\n    for param in transformer.parameters():\n        param.requires_grad_(False)\n\n    lora_config = LoraConfig(\n        r=args.rank,\n        init_lora_weights=\"gaussian\",\n        target_modules=[\n            \"to_k\",\n            \"to_q\",\n            \"to_v\",\n            \"to_out.0\",\n            \"proj_in\",\n            \"proj_out\",\n            
\"ff.net.0.proj\",\n            \"ff.net.2\",\n            \"proj\",\n            \"linear\",\n            \"linear_1\",\n            \"linear_2\",\n            # \"scale_shift_table\",      # not available due to the implementation in huggingface/peft, working on it.\n        ],\n        use_dora = args.use_dora,\n        use_rslora = args.use_rslora\n    )\n\n    # Move transformer, vae and text_encoder to device and cast to weight_dtype\n    transformer.to(accelerator.device)\n    \n    def cast_training_params(model: Union[torch.nn.Module, List[torch.nn.Module]], dtype=torch.float32):\n        if not isinstance(model, list):\n            model = [model]\n        for m in model:\n            for param in m.parameters():\n                # only upcast trainable parameters into fp32\n                if param.requires_grad:\n                    param.data = param.to(dtype)\n\n    transformer = get_peft_model(transformer, lora_config)\n    if args.mixed_precision == \"fp16\":\n        # only upcast trainable parameters (LoRA) into fp32\n        cast_training_params(transformer, dtype=torch.float32)\n\n    transformer.print_trainable_parameters()\n\n    # 10. 
Handle saving and loading of checkpoints\n    # `accelerate` 0.16.0 will have better support for customized saving\n    if version.parse(accelerate.__version__) >= version.parse(\"0.16.0\"):\n        # create custom saving & loading hooks so that `accelerator.save_state(...)` serializes in a nice format\n        def save_model_hook(models, weights, output_dir):\n            if accelerator.is_main_process:\n                transformer_ = accelerator.unwrap_model(transformer)\n                lora_state_dict = get_peft_model_state_dict(transformer_, adapter_name=\"default\")\n                StableDiffusionPipeline.save_lora_weights(os.path.join(output_dir, \"transformer_lora\"), lora_state_dict)\n                # save weights in peft format to be able to load them back\n                transformer_.save_pretrained(output_dir)\n\n                for _, model in enumerate(models):\n                    # make sure to pop weight so that corresponding model is not saved again\n                    weights.pop()\n\n        def load_model_hook(models, input_dir):\n            # load the LoRA into the model\n            transformer_ = accelerator.unwrap_model(transformer)\n            transformer_.load_adapter(input_dir, \"default\", is_trainable=True)\n\n            for _ in range(len(models)):\n                # pop models so that they are not loaded again\n                models.pop()\n\n        accelerator.register_save_state_pre_hook(save_model_hook)\n        accelerator.register_load_state_pre_hook(load_model_hook)\n\n    if args.enable_xformers_memory_efficient_attention:\n        if is_xformers_available():\n            import xformers\n\n            xformers_version = version.parse(xformers.__version__)\n            if xformers_version == version.parse(\"0.0.16\"):\n                logger.warning(\n                    \"xFormers 0.0.16 cannot be used for training in some GPUs. If you observe problems during training, please update xFormers to at least 0.0.17. 
See https://huggingface.co/docs/diffusers/main/en/optimization/xformers for more details.\"\n                )\n            transformer.enable_xformers_memory_efficient_attention()\n        else:\n            raise ValueError(\"xformers is not available. Make sure it is installed correctly\")\n\n    # Materialize the trainable parameters as a list: a bare filter() iterator would be\n    # exhausted by the optimizer constructor, leaving nothing for gradient clipping later.\n    lora_layers = list(filter(lambda p: p.requires_grad, transformer.parameters()))\n\n    # Enable TF32 for faster training on Ampere GPUs,\n    # cf https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices\n    if args.allow_tf32:\n        torch.backends.cuda.matmul.allow_tf32 = True\n\n    if args.gradient_checkpointing:\n        transformer.enable_gradient_checkpointing()\n\n    if args.scale_lr:\n        args.learning_rate = args.learning_rate * args.gradient_accumulation_steps * args.train_batch_size * accelerator.num_processes\n\n    # Initialize the optimizer\n    if args.use_8bit_adam:\n        try:\n            import bitsandbytes as bnb\n        except ImportError:\n            raise ImportError(\"Please install bitsandbytes to use 8-bit Adam. 
You can do so by running `pip install bitsandbytes`\")\n\n        optimizer_cls = bnb.optim.AdamW8bit\n    else:\n        optimizer_cls = torch.optim.AdamW\n\n    optimizer = optimizer_cls(\n        lora_layers,\n        lr=args.learning_rate,\n        betas=(args.adam_beta1, args.adam_beta2),\n        weight_decay=args.adam_weight_decay,\n        eps=args.adam_epsilon,\n    )\n\n    # Get the datasets: you can either provide your own training and evaluation files (see below)\n    # or specify a Dataset from the hub (the dataset will be downloaded automatically from the datasets Hub).\n\n    # In distributed training, the load_dataset function guarantees that only one local process can concurrently\n    # download the dataset.\n    if args.dataset_name is not None:\n        # Downloading and loading a dataset from the hub.\n        dataset = load_dataset(\n            args.dataset_name,\n            args.dataset_config_name,\n            cache_dir=args.cache_dir,\n            data_dir=args.train_data_dir,\n        )\n    else:\n        data_files = {}\n        if args.train_data_dir is not None:\n            data_files[\"train\"] = os.path.join(args.train_data_dir, \"**\")\n        dataset = load_dataset(\n            \"imagefolder\",\n            data_files=data_files,\n            cache_dir=args.cache_dir,\n        )\n        # See more about loading custom images at\n        # https://huggingface.co/docs/datasets/v2.4.0/en/image_load#imagefolder\n\n    # Preprocessing the datasets.\n    # We need to tokenize inputs and targets.\n    column_names = dataset[\"train\"].column_names\n\n    # 6. 
Get the column names for input/target.\n    dataset_columns = DATASET_NAME_MAPPING.get(args.dataset_name, None)\n    if args.image_column is None:\n        image_column = dataset_columns[0] if dataset_columns is not None else column_names[0]\n    else:\n        image_column = args.image_column\n        if image_column not in column_names:\n            raise ValueError(\n                f\"--image_column' value '{args.image_column}' needs to be one of: {', '.join(column_names)}\"\n            )\n    if args.caption_column is None:\n        caption_column = dataset_columns[1] if dataset_columns is not None else column_names[1]\n    else:\n        caption_column = args.caption_column\n        if caption_column not in column_names:\n            raise ValueError(\n                f\"--caption_column' value '{args.caption_column}' needs to be one of: {', '.join(column_names)}\"\n            )\n\n    # Preprocessing the datasets.\n    # We need to tokenize input captions and transform the images.\n    def tokenize_captions(examples, is_train=True, proportion_empty_prompts=0., max_length=120):\n        captions = []\n        for caption in examples[caption_column]:\n            if random.random() < proportion_empty_prompts:\n                captions.append(\"\")\n            elif isinstance(caption, str):\n                captions.append(caption)\n            elif isinstance(caption, (list, np.ndarray)):\n                # take a random caption if there are multiple\n                captions.append(random.choice(caption) if is_train else caption[0])\n            else:\n                raise ValueError(\n                    f\"Caption column `{caption_column}` should contain either strings or lists of strings.\"\n                )\n        inputs = tokenizer(captions, max_length=max_length, padding=\"max_length\", truncation=True, return_tensors=\"pt\")\n        return inputs.input_ids, inputs.attention_mask\n\n    # Preprocessing the datasets.\n    train_transforms = 
transforms.Compose(\n        [\n            transforms.Resize(args.resolution, interpolation=transforms.InterpolationMode.BILINEAR),\n            transforms.CenterCrop(args.resolution) if args.center_crop else transforms.RandomCrop(args.resolution),\n            transforms.RandomHorizontalFlip() if args.random_flip else transforms.Lambda(lambda x: x),\n            transforms.ToTensor(),\n            transforms.Normalize([0.5], [0.5]),\n        ]\n    )\n\n    def preprocess_train(examples):\n        images = [image.convert(\"RGB\") for image in examples[image_column]]\n        examples[\"pixel_values\"] = [train_transforms(image) for image in images]\n        examples[\"input_ids\"], examples['prompt_attention_mask'] = tokenize_captions(examples, proportion_empty_prompts=args.proportion_empty_prompts, max_length=max_length)\n        return examples\n\n    with accelerator.main_process_first():\n        if args.max_train_samples is not None:\n            dataset[\"train\"] = dataset[\"train\"].shuffle(seed=args.seed).select(range(args.max_train_samples))\n        # Set the training transforms\n        train_dataset = dataset[\"train\"].with_transform(preprocess_train)\n\n    def collate_fn(examples):\n        pixel_values = torch.stack([example[\"pixel_values\"] for example in examples])\n        pixel_values = pixel_values.to(memory_format=torch.contiguous_format).float()\n        input_ids = torch.stack([example[\"input_ids\"] for example in examples])\n        prompt_attention_mask = torch.stack([example[\"prompt_attention_mask\"] for example in examples])\n        return {\"pixel_values\": pixel_values, \"input_ids\": input_ids, 'prompt_attention_mask': prompt_attention_mask}\n\n    # DataLoaders creation:\n    train_dataloader = torch.utils.data.DataLoader(\n        train_dataset,\n        shuffle=True,\n        collate_fn=collate_fn,\n        batch_size=args.train_batch_size,\n        num_workers=args.dataloader_num_workers,\n    )\n\n    # Scheduler and math 
around the number of training steps.\n    overrode_max_train_steps = False\n    num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)\n    if args.max_train_steps is None:\n        args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch\n        overrode_max_train_steps = True\n\n    lr_scheduler = get_scheduler(\n        args.lr_scheduler,\n        optimizer=optimizer,\n        num_warmup_steps=args.lr_warmup_steps * accelerator.num_processes,\n        num_training_steps=args.max_train_steps * accelerator.num_processes,\n    )\n\n    # Prepare everything with our `accelerator`.\n    transformer, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(transformer, optimizer, train_dataloader, lr_scheduler)\n\n    # We need to recalculate our total training steps as the size of the training dataloader may have changed.\n    num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)\n    if overrode_max_train_steps:\n        args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch\n    # Afterwards we recalculate our number of training epochs\n    args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)\n\n    # We need to initialize the trackers we use, and also store our configuration.\n    # The trackers initializes automatically on the main process.\n    if accelerator.is_main_process:\n        accelerator.init_trackers(\"text2image-fine-tune\", config=vars(args))\n\n    # Train!\n    total_batch_size = args.train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps\n\n    logger.info(\"***** Running training *****\")\n    logger.info(f\"  Num examples = {len(train_dataset)}\")\n    logger.info(f\"  Num Epochs = {args.num_train_epochs}\")\n    logger.info(f\"  Instantaneous batch size per device = {args.train_batch_size}\")\n    logger.info(f\"  Total train batch size (w. 
parallel, distributed & accumulation) = {total_batch_size}\")\n    logger.info(f\"  Gradient Accumulation steps = {args.gradient_accumulation_steps}\")\n    logger.info(f\"  Total optimization steps = {args.max_train_steps}\")\n    global_step = 0\n    first_epoch = 0\n\n    # Potentially load in the weights and states from a previous save\n    if args.resume_from_checkpoint:\n        if args.resume_from_checkpoint != \"latest\":\n            path = os.path.basename(args.resume_from_checkpoint)\n        else:\n            # Get the most recent checkpoint\n            dirs = os.listdir(args.output_dir)\n            dirs = [d for d in dirs if d.startswith(\"checkpoint\")]\n            dirs = sorted(dirs, key=lambda x: int(x.split(\"-\")[1]))\n            path = dirs[-1] if len(dirs) > 0 else None\n\n        if path is None:\n            accelerator.print(\n                f\"Checkpoint '{args.resume_from_checkpoint}' does not exist. Starting a new training run.\"\n            )\n            args.resume_from_checkpoint = None\n            initial_global_step = 0\n        else:\n            accelerator.print(f\"Resuming from checkpoint {path}\")\n            accelerator.load_state(os.path.join(args.output_dir, path))\n            global_step = int(path.split(\"-\")[1])\n\n            initial_global_step = global_step\n            first_epoch = global_step // num_update_steps_per_epoch\n    else:\n        initial_global_step = 0\n\n    progress_bar = tqdm(\n        range(0, args.max_train_steps),\n        initial=initial_global_step,\n        desc=\"Steps\",\n        # Only show the progress bar once on each machine.\n        disable=not accelerator.is_local_main_process,\n    )\n\n    for epoch in range(first_epoch, args.num_train_epochs):\n        transformer.train()\n        train_loss = 0.0\n        for step, batch in enumerate(train_dataloader):\n            with accelerator.accumulate(transformer):\n                # Convert images to latent space\n                
latents = vae.encode(batch[\"pixel_values\"].to(dtype=weight_dtype)).latent_dist.sample()\n                latents = latents * vae.config.scaling_factor\n\n                # Sample noise that we'll add to the latents\n                noise = torch.randn_like(latents)\n                if args.noise_offset:\n                    # https://www.crosslabs.org//blog/diffusion-with-offset-noise\n                    noise += args.noise_offset * torch.randn((latents.shape[0], latents.shape[1], 1, 1), device=latents.device)\n\n                bsz = latents.shape[0]\n                # Sample a random timestep for each image\n                timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (bsz,), device=latents.device)\n                timesteps = timesteps.long()\n\n                # Add noise to the latents according to the noise magnitude at each timestep\n                # (this is the forward diffusion process)\n                noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)\n\n                # Get the text embedding for conditioning\n                prompt_embeds = text_encoder(batch[\"input_ids\"], attention_mask=batch['prompt_attention_mask'])[0]\n                prompt_attention_mask = batch['prompt_attention_mask']\n                # Get the target for loss depending on the prediction type\n                if args.prediction_type is not None:\n                    # set prediction_type of scheduler if defined\n                    noise_scheduler.register_to_config(prediction_type=args.prediction_type)\n\n                if noise_scheduler.config.prediction_type == \"epsilon\":\n                    target = noise\n                elif noise_scheduler.config.prediction_type == \"v_prediction\":\n                    target = noise_scheduler.get_velocity(latents, noise, timesteps)\n                else:\n                    raise ValueError(f\"Unknown prediction type {noise_scheduler.config.prediction_type}\")\n\n              
  # Prepare micro-conditions.\n                added_cond_kwargs = {\"resolution\": None, \"aspect_ratio\": None}\n                if getattr(transformer, 'module', transformer).config.sample_size == 128:\n                    resolution = torch.tensor([args.resolution, args.resolution]).repeat(bsz, 1)\n                    aspect_ratio = torch.tensor([float(args.resolution / args.resolution)]).repeat(bsz, 1)\n                    resolution = resolution.to(dtype=weight_dtype, device=latents.device)\n                    aspect_ratio = aspect_ratio.to(dtype=weight_dtype, device=latents.device)\n                    added_cond_kwargs = {\"resolution\": resolution, \"aspect_ratio\": aspect_ratio}\n\n                # Predict the noise residual and compute loss\n                model_pred = transformer(noisy_latents,\n                                         encoder_hidden_states=prompt_embeds,\n                                         encoder_attention_mask=prompt_attention_mask,\n                                         timestep=timesteps,\n                                         added_cond_kwargs=added_cond_kwargs).sample.chunk(2, 1)[0]\n\n                if args.snr_gamma is None:\n                    loss = F.mse_loss(model_pred.float(), target.float(), reduction=\"mean\")\n                else:\n                    # Compute loss-weights as per Section 3.4 of https://arxiv.org/abs/2303.09556.\n                    # Since we predict the noise instead of x_0, the original formulation is slightly changed.\n                    # This is discussed in Section 4.2 of the same paper.\n                    snr = compute_snr(noise_scheduler, timesteps)\n                    if noise_scheduler.config.prediction_type == \"v_prediction\":\n                        # Velocity objective requires that we add one to SNR values before we divide by them.\n                        snr = snr + 1\n                    mse_loss_weights = (torch.stack([snr, args.snr_gamma * 
torch.ones_like(timesteps)], dim=1).min(dim=1)[0] / snr)\n\n                    loss = F.mse_loss(model_pred.float(), target.float(), reduction=\"none\")\n                    loss = loss.mean(dim=list(range(1, len(loss.shape)))) * mse_loss_weights\n                    loss = loss.mean()\n\n                # Gather the losses across all processes for logging (if we use distributed training).\n                avg_loss = accelerator.gather(loss.repeat(args.train_batch_size)).mean()\n                train_loss += avg_loss.item() / args.gradient_accumulation_steps\n\n                # Backpropagate\n                accelerator.backward(loss)\n                if accelerator.sync_gradients:\n                    params_to_clip = lora_layers\n                    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)\n                optimizer.step()\n                lr_scheduler.step()\n                optimizer.zero_grad()\n\n            # Checks if the accelerator has performed an optimization step behind the scenes\n            if accelerator.sync_gradients:\n                progress_bar.update(1)\n                global_step += 1\n                accelerator.log({\"train_loss\": train_loss}, step=global_step)\n                train_loss = 0.0\n\n                if global_step % args.checkpointing_steps == 0:\n                    if accelerator.is_main_process:\n                        # _before_ saving state, check if this save would set us over the `checkpoints_total_limit`\n                        if args.checkpoints_total_limit is not None:\n                            checkpoints = os.listdir(args.output_dir)\n                            checkpoints = [d for d in checkpoints if d.startswith(\"checkpoint\")]\n                            checkpoints = sorted(checkpoints, key=lambda x: int(x.split(\"-\")[1]))\n\n                            # before we save the new checkpoint, we need to have at _most_ `checkpoints_total_limit - 1` checkpoints\n                 
           if len(checkpoints) >= args.checkpoints_total_limit:\n                                num_to_remove = len(checkpoints) - args.checkpoints_total_limit + 1\n                                removing_checkpoints = checkpoints[0:num_to_remove]\n\n                                logger.info(f\"{len(checkpoints)} checkpoints already exist, removing {len(removing_checkpoints)} checkpoints\")\n                                logger.info(f\"removing checkpoints: {', '.join(removing_checkpoints)}\")\n\n                                for removing_checkpoint in removing_checkpoints:\n                                    removing_checkpoint = os.path.join(args.output_dir, removing_checkpoint)\n                                    shutil.rmtree(removing_checkpoint)\n\n                        save_path = os.path.join(args.output_dir, f\"checkpoint-{global_step}\")\n                        accelerator.save_state(save_path)\n\n                        unwrapped_transformer = accelerator.unwrap_model(transformer, keep_fp32_wrapper=False)\n                        transformer_lora_state_dict = get_peft_model_state_dict(unwrapped_transformer)\n\n                        StableDiffusionPipeline.save_lora_weights(\n                            save_directory=save_path,\n                            unet_lora_layers=transformer_lora_state_dict,\n                            safe_serialization=True,\n                        )\n\n                        logger.info(f\"Saved state to {save_path}\")\n\n            logs = {\"step_loss\": loss.detach().item(), \"lr\": lr_scheduler.get_last_lr()[0]}\n            progress_bar.set_postfix(**logs)\n\n            if global_step >= args.max_train_steps:\n                break\n\n        if accelerator.is_main_process:\n            if args.validation_prompt is not None and epoch % args.validation_epochs == 0:\n                logger.info(\n                    f\"Running validation... 
\\n Generating {args.num_validation_images} images with prompt:\"\n                    f\" {args.validation_prompt}.\"\n                )\n                # create pipeline\n                pipeline = DiffusionPipeline.from_pretrained(\n                    args.pretrained_model_name_or_path,\n                    transformer=accelerator.unwrap_model(transformer, keep_fp32_wrapper=False),\n                    text_encoder=text_encoder, vae=vae,\n                    torch_dtype=weight_dtype,\n                )\n                pipeline = pipeline.to(accelerator.device)\n                pipeline.set_progress_bar_config(disable=True)\n\n                # run inference\n                generator = torch.Generator(device=accelerator.device)\n                if args.seed is not None:\n                    generator = generator.manual_seed(args.seed)\n                images = []\n                for _ in range(args.num_validation_images):\n                    images.append(pipeline(args.validation_prompt, num_inference_steps=20, generator=generator).images[0])\n\n                for tracker in accelerator.trackers:\n                    if tracker.name == \"tensorboard\":\n                        np_images = np.stack([np.asarray(img) for img in images])\n                        tracker.writer.add_images(\"validation\", np_images, epoch, dataformats=\"NHWC\")\n                    if tracker.name == \"wandb\":\n                        tracker.log(\n                            {\n                                \"validation\": [wandb.Image(image, caption=f\"{i}: {args.validation_prompt}\") for i, image in enumerate(images)]\n                            }\n                        )\n\n                del pipeline\n                torch.cuda.empty_cache()\n\n    # Save the lora layers\n    accelerator.wait_for_everyone()\n    if accelerator.is_main_process:\n        transformer = accelerator.unwrap_model(transformer, keep_fp32_wrapper=False)\n        
transformer.save_pretrained(args.output_dir)\n        lora_state_dict = get_peft_model_state_dict(transformer)\n        StableDiffusionPipeline.save_lora_weights(os.path.join(args.output_dir, \"transformer_lora\"), lora_state_dict)\n\n        if args.push_to_hub:\n            save_model_card(\n                repo_id,\n                images=images,\n                base_model=args.pretrained_model_name_or_path,\n                dataset_name=args.dataset_name,\n                repo_folder=args.output_dir,\n            )\n            upload_folder(\n                repo_id=repo_id,\n                folder_path=args.output_dir,\n                commit_message=\"End of training\",\n                ignore_patterns=[\"step_*\", \"epoch_*\"],\n            )\n\n    \n    # Final inference\n    # Load previous transformer\n    transformer = Transformer2DModel.from_pretrained(args.pretrained_model_name_or_path, subfolder='transformer', torch_dtype=weight_dtype)\n    # load lora weight\n    transformer = PeftModel.from_pretrained(transformer, args.output_dir)\n    # Load previous pipeline\n    pipeline = DiffusionPipeline.from_pretrained(args.pretrained_model_name_or_path, transformer=transformer, text_encoder=text_encoder, vae=vae, torch_dtype=weight_dtype,)\n    pipeline = pipeline.to(accelerator.device)\n\n    del transformer\n    torch.cuda.empty_cache()\n\n    # run inference\n    generator = torch.Generator(device=accelerator.device)\n    if args.seed is not None:\n        generator = generator.manual_seed(args.seed)\n    images = []\n    for _ in range(args.num_validation_images):\n        images.append(pipeline(args.validation_prompt, num_inference_steps=20, generator=generator).images[0])\n\n    if accelerator.is_main_process:\n        for tracker in accelerator.trackers:\n            if len(images) != 0:\n                if tracker.name == \"tensorboard\":\n                    np_images = np.stack([np.asarray(img) for img in images])\n                    
tracker.writer.add_images(\"test\", np_images, epoch, dataformats=\"NHWC\")\n                if tracker.name == \"wandb\":\n                    tracker.log(\n                        {\n                            \"test\": [\n                                wandb.Image(image, caption=f\"{i}: {args.validation_prompt}\")\n                                for i, image in enumerate(images)\n                            ]\n                        }\n                    )\n\n    accelerator.end_training()\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "PixArt-alpha-ToCa-tools/clip_score.py",
    "content": "import os\nimport torch\nfrom PIL import Image\nfrom torchvision.transforms import ToTensor\nfrom torchmetrics.multimodal.clip_score import CLIPScore\nfrom tqdm import tqdm\nimport torch.multiprocessing as mp\n\n# Load prompts file\ndef load_prompts(txt_file):\n    with open(txt_file, \"r\") as f:\n        prompts = f.read().splitlines()\n    return prompts\n\n# Find matching image file: first, directly use the prompt as the filename, \n# and if not found, match using a prefix\ndef find_image_file(image_folder, prompt):\n    img_filename = prompt + \".jpg\"  # Assume filename is {prompt}.jpg\n    img_path = os.path.join(image_folder, img_filename)\n    \n    if os.path.exists(img_path):\n        return img_path\n\n    # If direct match fails, use prefix matching\n    for file in os.listdir(image_folder):\n        if file.startswith(prompt[:20]):  # Use the first 20 characters as a prefix for matching\n            return os.path.join(image_folder, file)\n\n    return None\n\n# Load a batch of images and convert them to Tensors\ndef load_images(image_folder, prompts_batch):\n    images = []\n    valid_prompts = []\n    \n    for prompt in prompts_batch:\n        img_path = find_image_file(image_folder, prompt)\n        \n        if img_path:\n            try:\n                image = Image.open(img_path).convert(\"RGB\")\n                image_tensor = ToTensor()(image).unsqueeze(0)  # Shape (1, C, H, W)\n                images.append(image_tensor)\n                valid_prompts.append(prompt)\n            except Exception as e:\n                print(f\"Error loading image {img_path}: {e}\")\n        else:\n            print(f\"No image found for prompt: {prompt}\")\n    \n    if len(images) > 0:\n        images_tensor = torch.cat(images, dim=0)  # Combine into a single batch (N, C, H, W)\n        return images_tensor, valid_prompts\n    else:\n        return None, None\n\n# Single task: process a batch of prompts and corresponding images, and 
calculate CLIP Score\ndef process_batch(prompts_batch, image_folder, model_path, device):\n    clip_score_metric = CLIPScore(model_name_or_path=model_path).to(device)\n    \n    # Load image batch\n    images_tensor, valid_prompts = load_images(image_folder, prompts_batch)\n    if images_tensor is not None:\n        images_tensor = images_tensor.to(device)\n        \n        with torch.no_grad():  # Avoid building computation graph, reducing memory consumption\n            # Calculate CLIP Score for each image and prompt\n            for i, prompt in enumerate(valid_prompts):\n                clip_score_metric.update(images_tensor[i].unsqueeze(0).float(), prompt)\n        \n        # Release memory\n        del images_tensor\n        torch.cuda.empty_cache()\n\n        return clip_score_metric.compute().item()\n    else:\n        return None\n\n# Split data into batches\ndef chunked(iterable, batch_size):\n    \"\"\"Yield successive n-sized chunks from iterable.\"\"\"\n    for i in range(0, len(iterable), batch_size):\n        yield iterable[i:i + batch_size]\n\n# Main processing function\ndef main_worker(rank, prompts, image_folder, model_path, device, batch_size, queue):\n    # Split into batches\n    prompts_batches = list(chunked(prompts, batch_size))\n    \n    clip_scores = []\n    for batch in prompts_batches:\n        score = process_batch(batch, image_folder, model_path, device)\n        if score is not None:\n            clip_scores.append(score)\n        # After processing each batch, send information to the main process\n        queue.put(1)  # Send signal indicating one batch is processed\n    \n    queue.put(clip_scores)  # Put final result into the queue for the main process\n\ndef main(prompt_file=\"prompts.txt\", image_folder=\"images\", batch_size=64, num_workers=4):\n    # Load prompts\n    prompts = load_prompts(prompt_file)\n    model_path = \"/root/autodl-tmp/pretrained_models/clip-vit-large-patch14\"\n    device = torch.device(\"cuda\" if 
torch.cuda.is_available() else \"cpu\")\n\n    # Create multiprocessing queue\n    queue = mp.Queue()\n\n    # Start multiple processes\n    processes = []\n    # Ceil division so trailing prompts are not dropped when len(prompts) is not divisible by num_workers\n    chunk_size = (len(prompts) + num_workers - 1) // num_workers\n    total_batches = 0\n    for rank in range(num_workers):\n        worker_prompts = prompts[rank * chunk_size: (rank + 1) * chunk_size]\n        # Each worker batches its own slice, so count batches per worker\n        total_batches += (len(worker_prompts) + batch_size - 1) // batch_size\n        p = mp.Process(target=main_worker, args=(rank, worker_prompts, image_folder, model_path, device, batch_size, queue))\n        p.start()\n        processes.append(p)\n\n    # Use tqdm to create a progress bar\n    with tqdm(total=total_batches, desc=\"Processing batches\") as pbar:\n        all_scores = []\n        finished_workers = 0\n\n        # Each worker sends one signal per batch, then a final list of scores.\n        # Keep reading until every worker has delivered its final list.\n        while finished_workers < num_workers:\n            result = queue.get()\n            if isinstance(result, list):  # If it's a list, it means final scores\n                all_scores.extend(result)\n                finished_workers += 1\n            else:\n                pbar.update(1)  # Update progress bar\n\n    # Wait for subprocesses to end\n    for p in processes:\n        p.join()\n\n    # Calculate final result\n    if all_scores:\n        final_clip_score = sum(all_scores) / len(all_scores)\n        print(f\"Final averaged CLIP Score for folder '{image_folder}': {final_clip_score}\")\n    else:\n        print(f\"No valid images found in folder '{image_folder}'.\")\n\nif __name__ == \"__main__\":\n    import argparse\n    parser = argparse.ArgumentParser(description=\"Calculate CLIP Score for images and prompts with batch parallel processing.\")\n    parser.add_argument(\"--prompt_file\", type=str, default=\"/root/autodl-tmp/COCO/COCO_caption_prompts_30k.txt\", help=\"Path to the prompts text file.\")\n    parser.add_argument(\"--image_folder\", type=str, 
default=\"/root/autodl-tmp/vis/2024-09-04_custom_epochunknown_stepunknown_scale4.5_step20_size256_bs100_sampdpm-solver_seed0\", help=\"Path to the folder containing images.\")\n    parser.add_argument(\"--batch_size\", type=int, default=64, help=\"Number of images to process in each batch.\")\n    parser.add_argument(\"--num_workers\", type=int, default=4, help=\"Number of parallel workers.\")\n    args = parser.parse_args()\n    \n    # Set multiprocessing start method to 'spawn', suitable for CUDA\n    mp.set_start_method('spawn', force=True)\n\n    main(prompt_file=args.prompt_file, image_folder=args.image_folder, batch_size=args.batch_size, num_workers=args.num_workers)\n"
  },
  {
    "path": "README.md",
    "content": "<div align=center>\n  \n# **[ICLR 2025]** *ToCa*: Accelerating Diffusion Transformers with *To*ken-wise Feature *Ca*ching\n\n<p>\n<a href='https://arxiv.org/abs/2410.05317'><img src='https://img.shields.io/badge/Paper-arXiv-red'></a>\n<a href='https://toca2024.github.io/ToCa/'><img src='https://img.shields.io/badge/Project-Page-blue'></a>\n</p>\n\n</div>\n\n## 🔥 News\n\n* `2025/03/10` 🚀🚀 Our latest work \"From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers\" is released! Codes are available at [TaylorSeer](https://github.com/Shenyi-Z/TaylorSeer)! TaylorSeer supports lossless compression at a rate of 4.99x on FLUX.1-dev (with a latency speedup of 3.53x) and high-quality acceleration at a compression rate of 5.00x on HunyuanVideo (with a latency speedup of 4.65x)! We hope *TaylorSeer* can move the paradigm of feature caching methods from reusing to forecasting.For more details, please refer to our latest research [paper](https://arxiv.org/abs/2503.06923).\n* `2025/02/19` 🚀🚀 ToCa solution for **FLUX** has been officially released after adjustments, now achieving up to **3.14× lossless acceleration**!\n* `2025/01/22` 💥💥 ToCa is honored to be accepted by ICLR 2025!\n* `2024/12/29` 🚀🚀 We release our work [DuCa](https://arxiv.org/abs/2412.18911) about accelerating diffusion transformers for FREE, which achieves nearly lossless acceleration of **2.50×** on [OpenSora](https://github.com/hpcaitech/Open-Sora)! 🎉 **DuCa also overcomes the limitation of ToCa by fully supporting FlashAttention, enabling broader compatibility and efficiency improvements.**\n* `2024/12/24` 🤗🤗 We release an open-sourse repo \"[Awesome-Token-Reduction-for-Model-Compression](https://github.com/xuyang-liu16/Awesome-Token-Reduction-for-Model-Compression)\", which collects recent awesome token reduction papers! 
Feel free to contribute your suggestions!\n* `2024/12/20` 💥💥 Our ToCa has achieved nearly lossless acceleration of **1.51×** on [FLUX](https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell). Feel free to check the latest version of our [paper](https://arxiv.org/pdf/2410.05317#page=19)!\n* `2024/12/10` 💥💥 Our team's recent work, **SiTo** (https://github.com/EvelynZhang-epiclab/SiTo), has been accepted to **AAAI 2025**. It accelerates diffusion models through adaptive **Token Pruning**.\n* `2024/10/16` 🤗🤗 Users with autodl accounts can now quickly experience [OpenSora-ToCa](https://www.codewithgpu.com/i/Shenyi-Z/ToCa/OpenSora-ToCa) by directly using our publicly available image!\n* `2024/10/12` 🚀🚀 We release our work [ToCa](https://arxiv.org/abs/2410.05317) about accelerating diffusion transformers for FREE, which achieves nearly lossless acceleration of **2.36×** on [OpenSora](https://github.com/hpcaitech/Open-Sora)!\n* `2024/07/15` 🤗🤗 We release an open-source repo \"[Awesome-Generation-Acceleration](https://github.com/xuyang-liu16/Awesome-Generation-Acceleration)\", which collects recent awesome generation acceleration papers! 
Feel free to contribute your suggestions!\n\n## TODO:\n\n- [x] Support for FLOPs calculation\n- [x] Add the FLUX version of ToCa\n- [ ] Further optimize the code logic to reduce the time consumption of tensor operations\n\n\n## Dependencies\n``` cmd\nPython>=3.9\nCUDA>=11.8\n```\n\n## 🛠 Installation\n\n``` cmd\ngit clone https://github.com/Shenyi-Z/ToCa.git\n```\n\n### Environment Settings\n\n#### Original Models (recommended)\n\nWe evaluated our method in the same environments as the original models,\nso you can set up each environment by following the requirements of the corresponding original model.\n\nLinks:\n\n| Original Models  |                     URLs                     |\n| :--------------: | :------------------------------------------: |\n|       DiT        |   https://github.com/facebookresearch/DiT    |\n|     PixArt-α     | https://github.com/PixArt-alpha/PixArt-alpha |\n|     OpenSora     |    https://github.com/hpcaitech/Open-Sora    |\n|       FLUX       |  https://github.com/black-forest-labs/flux   |\n\nWe also provide a replica of our environment here:\n\n<details>\n<summary>From our environment.yaml</summary>\n\n##### DiT\n\n  ```bash\n  cd DiT-ToCa\n  conda env create -f environment-dit.yml\n  ```\n\n##### PixArt-α\n\n  ```bash\n  cd PixArt-alpha-ToCa\n  conda env create -f environment-pixart.yml\n  ```\n\n##### OpenSora\n\n  ```bash\n  cd Open-Sora\n  conda env create -f environment-opensora.yml\n  pip install -v . 
# for development mode, `pip install -v -e .`\n  ```\n\n</details>\n\n## 🚀 Run and evaluation\n\n### Run DiT-ToCa\n\n#### DDPM-250 Steps\n\nsample images for **visualization**\n\n```bash\ncd DiT-ToCa\npython sample.py --image-size 256 --num-sampling-steps 250 --cache-type attention --fresh-threshold 4 --fresh-ratio 0.07 --ratio-scheduler ToCa-ddpm250  --force-fresh global --soft-fresh-weight 0.25\n```\n\nsample images for **evaluation** (e.g., 50k)\n\n```bash\ncd DiT-ToCa\ntorchrun --nnodes=1 --nproc_per_node=6 sample_ddp.py --model DiT-XL/2 --per-proc-batch-size 150 --image-size 256 --cfg-scale 1.5 --num-sampling-steps 250 --cache-type attention --fresh-ratio 0.07 --ratio-scheduler ToCa-ddpm250 --force-fresh global --fresh-threshold 4 --soft-fresh-weight 0.25 --num-fid-samples 50000\n```\n\n#### DDIM-50 Steps\n\nsample images for **visualization**\n\n```bash\ncd DiT-ToCa\npython sample.py --image-size 256 --num-sampling-steps 50 --cache-type attention --fresh-threshold 3 --fresh-ratio 0.07 --ratio-scheduler ToCa-ddim50  --force-fresh global --soft-fresh-weight 0.25 --ddim-sample\n```\n\nsample images for **evaluation** (e.g., 50k)\n\n```bash\ncd DiT-ToCa\ntorchrun --nnodes=1 --nproc_per_node=6 sample_ddp.py --model DiT-XL/2 --per-proc-batch-size 150 --image-size 256 --cfg-scale 1.5 --num-sampling-steps 50 --cache-type attention --fresh-ratio 0.07 --ratio-scheduler ToCa-ddim50 --force-fresh global --fresh-threshold 3 --soft-fresh-weight 0.25 --num-fid-samples 50000 --ddim-sample\n```\n\n#### test FLOPs\n\nTo measure FLOPs, add `--test-FLOPs` to the command. Here is an example:\n\n```bash\ncd DiT-ToCa\npython sample.py --image-size 256 --num-sampling-steps 50 --cache-type attention --fresh-threshold 3 --fresh-ratio 0.07 --ratio-scheduler ToCa-ddim50  --force-fresh global --soft-fresh-weight 0.25 --ddim-sample --test-FLOPs\n```\n\n### Run PixArt-α-ToCa\n\nsample images for **visualization**\n\n```bash\ncd PixArt-alpha-ToCa\npython scripts/inference.py --model_path 
/root/autodl-tmp/pretrained_models/PixArt-XL-2-256x256.pth --image_size 256 --bs 100 --txt_file /root/autodl-tmp/test.txt --fresh_threshold 3 --fresh_ratio 0.30 --cache_type attention --force_fresh global --soft_fresh_weight 0.25 --ratio_scheduler ToCa\n```\n\nsample images for **evaluation** (e.g., 30k for COCO, 1.6k for PartiPrompts)\n\n```bash\ncd PixArt-alpha-ToCa\ntorchrun --nproc_per_node=6 scripts/inference_ddp.py --model_path /root/autodl-tmp/pretrained_models/PixArt-XL-2-256x256.pth --image_size 256 --bs 100 --txt_file /root/autodl-tmp/COCO/COCO_caption_prompts_30k.txt --fresh_threshold 3 --fresh_ratio 0.30 --cache_type attention --force_fresh global --soft_fresh_weight 0.25 --ratio_scheduler ToCa\n```\n\n(If you need our npz file, it is available here: https://drive.google.com/file/d/1vUdoSgdIvtXo1cAS_aOFCJ1-XC_i1KEQ/view?usp=sharing)\n\n### Run OpenSora-ToCa\n\nsample video for **visualization**\n\n```bash\ncd Open-Sora\npython scripts/inference.py configs/opensora-v1-2/inference/sample.py   --num-frames 2s --resolution 480p --aspect-ratio 9:16   --prompt \"a beautiful waterfall\"\n```\n\nsample video for **VBench evaluation**\n\n```bash\ncd Open-Sora\nbash eval/vbench/launch.sh /root/autodl-tmp/pretrained_models/hpcai-tech/OpenSora-STDiT-v3/model.safetensors 51 opensora-ToCa 480p 9:16\n```\n\n(Remember to replace \"/root/autodl-tmp/pretrained_models/hpcai-tech/OpenSora-STDiT-v3/model.safetensors\" with your own path!)\n\n### Run FLUX-ToCa\n\nFirst, you need to enter the environment adapted for FLUX. 
While the official documentation uses `venv` to build the environment, you can also set it up using `conda`, which you might be more familiar with.\n\n<details>\n<summary>How to build a conda environment for FLUX?</summary>\n\n```bash\ncd flux-ToCa\nconda create -n flux python=3.10\n# activate the new environment so that pip installs into it\nconda activate flux\npip install -e \".[all]\"\n```\n\n</details>\n\nFor interactive sampling, run\n\n```bash\npython -m flux --name <name> --loop\n```\n\nOr, to generate a single sample, run\n\n```bash\npython -m flux --name <name> \\\n  --height <height> --width <width> \\\n  --prompt \"<prompt>\"\n```\n\nTypically, `<name>` should be set to `flux-dev`.\n\nGenerate image samples from a txt file of prompts\n\n```bash\npython src/sample.py --prompt_file </path/to/your/prompt.txt> --width 1024 --height 1024 --model_name flux-dev --add_sampling_metadata --output_dir </path/to/your/generated/samples/folder> --num_steps 50\n```\n\nThe `--add_sampling_metadata` parameter controls whether the prompt is added to the image's EXIF metadata.\nWe also provide a function for FLOPs testing, but **in this mode, no generated samples are produced**.\n\n```bash\npython src/sample.py --prompt_file </path/to/your/test/prompt.txt> --width 1024 --height 1024 --model_name flux-dev --add_sampling_metadata --output_dir </path/to/your/generated/samples/folder> --num_steps 50 --test_FLOPs\n```\n\nUse the Geneval framework for evaluation\n\n\n```bash\npython src/geneval_flux.py /root/geneval/prompts/evaluation_metadata.jsonl --model_name flux-dev --n_samples 4 --steps 50 --width 1024 --height 1024 --seed 42 --output_dir /root/autodl-tmp/samples/flux-ToCa\n```\n\n<details>\n<summary>How to prepare environment for geneval?</summary>\n\nThe environment required for Geneval's metric computation is somewhat specific. As of February 2025, it is not yet possible to set up the environment directly using the default method provided in the project. 
However, we can follow the guidance in this Geneval issue [https://github.com/djghosh13/geneval/issues/12](https://github.com/djghosh13/geneval/issues/12) to set up the environment. The instructions are very detailed.\n\n</details>\n\n#### Awesome acceleration results for the Latest Version of ToCa on FLUX\n\n\n| Method       | Geneval $\\uparrow$<br />overall score | ImageReward $\\uparrow$<br />DrawBench200 | FLOPs $\\downarrow$ | Latency $\\downarrow$ | Compress Ratio $\\uparrow$ | Speed Up $\\uparrow$ |\n| ------------ | :-----------------------------------: | :-------------------------------------: | :----------------: | :------------------: | :-----------------------: | :-----------------: |\n| **original** |                0.6752                 |                 0.9898                  |      3719.50       |        33.87s        |           1.00            |        1.00         |\n| 60% steps    |                0.6700                 |                 0.9739                  |      2231.70       |        20.49s        |           1.67            |        1.65         |\n| 50% steps    |                0.6656                 |                 0.9429                  |      1859.75       |        17.12s        |           2.00            |        1.98         |\n| 40% steps    |                0.6606                 |                 0.9317                  |      1487.80       |        13.77s        |           2.62            |        2.45         |\n| **FORA3**    |                0.6594                 |                 0.9227                  |      1320.07       |        12.98s        |           2.82            |        2.61         |\n| **ToCa4-01** |                0.6748                 |               **0.9798**                |      1263.22       |        11.91s        |           2.94            |        2.84         |\n| **ToCa5-01** |              **0.6750**               |                 0.9731                  |      1126.76       |        
10.80s        |           3.30            |        3.14         |\n| **ToCa6-01** |                0.6653                 |                 0.9493                  |       990.30       |        9.48s         |           3.76            |        3.57         |\n\n\n<details>\n<summary>Explanation of the Improved ToCa</summary>\n\nThe **acceleration effect has significantly improved while maintaining generation quality** compared with the previous version. This is because, in the current version of the code, we have further optimized ToCa and adopted more reliable metrics (Image Reward on DrawBench200, Geneval).\n\n</details>\n\n## 👍 Acknowledgements\n\n- Thanks to [DiT](https://github.com/facebookresearch/DiT) for their great work and codebase upon which we build DiT-ToCa.\n- Thanks to [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) for their great work and codebase upon which we build PixArt-α-ToCa.\n- Thanks to [OpenSora](https://github.com/hpcaitech/Open-Sora) for their great work and codebase upon which we build OpenSora-ToCa.\n- Thanks to [FLUX](https://github.com/black-forest-labs/flux) for their great work and codebase upon which we build FLUX-ToCa.\n\n## 📌 Citation\n\n```bibtex\n@article{zou2024accelerating,\n  title={Accelerating Diffusion Transformers with Token-wise Feature Caching},\n  author={Zou, Chang and Liu, Xuyang and Liu, Ting and Huang, Siteng and Zhang, Linfeng},\n  journal={arXiv preprint arXiv:2410.05317},\n  year={2024}\n}\n```\n\n## :e-mail: Contact\n\nIf you have any questions, please email [`shenyizou@outlook.com`](mailto:shenyizou@outlook.com).\n"
  },
  {
    "path": "flux-ToCa/.gitignore",
    "content": "# Created by https://www.toptal.com/developers/gitignore/api/linux,windows,macos,visualstudiocode,python\n# Edit at https://www.toptal.com/developers/gitignore?templates=linux,windows,macos,visualstudiocode,python\n\n### Linux ###\n*~\n\n# temporary files which can be created if a process still has a handle open of a deleted file\n.fuse_hidden*\n\n# KDE directory preferences\n.directory\n\n# Linux trash folder which might appear on any partition or disk\n.Trash-*\n\n# .nfs files are created when an open file is removed but is still being accessed\n.nfs*\n\n### macOS ###\n# General\n.DS_Store\n.AppleDouble\n.LSOverride\n\n# Icon must end with two \\r\nIcon\n\n\n# Thumbnails\n._*\n\n# Files that might appear in the root of a volume\n.DocumentRevisions-V100\n.fseventsd\n.Spotlight-V100\n.TemporaryItems\n.Trashes\n.VolumeIcon.icns\n.com.apple.timemachine.donotpresent\n\n# Directories potentially created on remote AFP share\n.AppleDB\n.AppleDesktop\nNetwork Trash Folder\nTemporary Items\n.apdisk\n\n### Python ###\n# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py,cover\n.hypothesis/\n.pytest_cache/\ncover/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\ndb.sqlite3-journal\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx 
documentation\ndocs/_build/\n\n# PyBuilder\n.pybuilder/\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# IPython\nprofile_default/\nipython_config.py\n\n# pyenv\n#   For a library or package, you might want to ignore these files since the code is\n#   intended to run in multiple environments; otherwise, check them in:\n# .python-version\n\n# pipenv\n#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.\n#   However, in case of collaboration, if having platform-specific dependencies or dependencies\n#   having no cross-platform support, pipenv may install dependencies that don't work, or not\n#   install all needed dependencies.\n#Pipfile.lock\n\n# PEP 582; used by e.g. github.com/David-OConnor/pyflow\n__pypackages__/\n\n# Celery stuff\ncelerybeat-schedule\ncelerybeat.pid\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# Pyre type checker\n.pyre/\n\n# pytype static type analyzer\n.pytype/\n\n# Cython debug symbols\ncython_debug/\n\n### VisualStudioCode ###\n.vscode/*\n!.vscode/settings.json\n!.vscode/tasks.json\n!.vscode/launch.json\n!.vscode/extensions.json\n*.code-workspace\n\n# Local History for Visual Studio Code\n.history/\n\n### VisualStudioCode Patch ###\n# Ignore all local history of files\n.history\n.ionide\n\n### Windows ###\n# Windows thumbnail cache files\nThumbs.db\nThumbs.db:encryptable\nehthumbs.db\nehthumbs_vista.db\n\n# Dump file\n*.stackdump\n\n# Folder config file\n[Dd]esktop.ini\n\n# Recycle Bin used on file shares\n$RECYCLE.BIN/\n\n# Windows Installer files\n*.cab\n*.msi\n*.msix\n*.msm\n*.msp\n\n# Windows shortcuts\n*.lnk\n\n# End of https://www.toptal.com/developers/gitignore/api/linux,windows,macos,visualstudiocode,python\n"
  },
  {
    "path": "flux-ToCa/LICENSE",
    "content": "                                 Apache License\n                           Version 2.0, January 2004\n                        http://www.apache.org/licenses/\n\n   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n   1. Definitions.\n\n      \"License\" shall mean the terms and conditions for use, reproduction,\n      and distribution as defined by Sections 1 through 9 of this document.\n\n      \"Licensor\" shall mean the copyright owner or entity authorized by\n      the copyright owner that is granting the License.\n\n      \"Legal Entity\" shall mean the union of the acting entity and all\n      other entities that control, are controlled by, or are under common\n      control with that entity. For the purposes of this definition,\n      \"control\" means (i) the power, direct or indirect, to cause the\n      direction or management of such entity, whether by contract or\n      otherwise, or (ii) ownership of fifty percent (50%) or more of the\n      outstanding shares, or (iii) beneficial ownership of such entity.\n\n      \"You\" (or \"Your\") shall mean an individual or Legal Entity\n      exercising permissions granted by this License.\n\n      \"Source\" form shall mean the preferred form for making modifications,\n      including but not limited to software source code, documentation\n      source, and configuration files.\n\n      \"Object\" form shall mean any form resulting from mechanical\n      transformation or translation of a Source form, including but\n      not limited to compiled object code, generated documentation,\n      and conversions to other media types.\n\n      \"Work\" shall mean the work of authorship, whether in Source or\n      Object form, made available under the License, as indicated by a\n      copyright notice that is included in or attached to the work\n      (an example is provided in the Appendix below).\n\n      \"Derivative Works\" shall mean any work, whether in Source or Object\n      
form, that is based on (or derived from) the Work and for which the\n      editorial revisions, annotations, elaborations, or other modifications\n      represent, as a whole, an original work of authorship. For the purposes\n      of this License, Derivative Works shall not include works that remain\n      separable from, or merely link (or bind by name) to the interfaces of,\n      the Work and Derivative Works thereof.\n\n      \"Contribution\" shall mean any work of authorship, including\n      the original version of the Work and any modifications or additions\n      to that Work or Derivative Works thereof, that is intentionally\n      submitted to Licensor for inclusion in the Work by the copyright owner\n      or by an individual or Legal Entity authorized to submit on behalf of\n      the copyright owner. For the purposes of this definition, \"submitted\"\n      means any form of electronic, verbal, or written communication sent\n      to the Licensor or its representatives, including but not limited to\n      communication on electronic mailing lists, source code control systems,\n      and issue tracking systems that are managed by, or on behalf of, the\n      Licensor for the purpose of discussing and improving the Work, but\n      excluding communication that is conspicuously marked or otherwise\n      designated in writing by the copyright owner as \"Not a Contribution.\"\n\n      \"Contributor\" shall mean Licensor and any individual or Legal Entity\n      on behalf of whom a Contribution has been received by Licensor and\n      subsequently incorporated within the Work.\n\n   2. Grant of Copyright License. 
Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      copyright license to reproduce, prepare Derivative Works of,\n      publicly display, publicly perform, sublicense, and distribute the\n      Work and such Derivative Works in Source or Object form.\n\n   3. Grant of Patent License. Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      (except as stated in this section) patent license to make, have made,\n      use, offer to sell, sell, import, and otherwise transfer the Work,\n      where such license applies only to those patent claims licensable\n      by such Contributor that are necessarily infringed by their\n      Contribution(s) alone or by combination of their Contribution(s)\n      with the Work to which such Contribution(s) was submitted. If You\n      institute patent litigation against any entity (including a\n      cross-claim or counterclaim in a lawsuit) alleging that the Work\n      or a Contribution incorporated within the Work constitutes direct\n      or contributory patent infringement, then any patent licenses\n      granted to You under this License for that Work shall terminate\n      as of the date such litigation is filed.\n\n   4. Redistribution. 
You may reproduce and distribute copies of the\n      Work or Derivative Works thereof in any medium, with or without\n      modifications, and in Source or Object form, provided that You\n      meet the following conditions:\n\n      (a) You must give any other recipients of the Work or\n          Derivative Works a copy of this License; and\n\n      (b) You must cause any modified files to carry prominent notices\n          stating that You changed the files; and\n\n      (c) You must retain, in the Source form of any Derivative Works\n          that You distribute, all copyright, patent, trademark, and\n          attribution notices from the Source form of the Work,\n          excluding those notices that do not pertain to any part of\n          the Derivative Works; and\n\n      (d) If the Work includes a \"NOTICE\" text file as part of its\n          distribution, then any Derivative Works that You distribute must\n          include a readable copy of the attribution notices contained\n          within such NOTICE file, excluding those notices that do not\n          pertain to any part of the Derivative Works, in at least one\n          of the following places: within a NOTICE text file distributed\n          as part of the Derivative Works; within the Source form or\n          documentation, if provided along with the Derivative Works; or,\n          within a display generated by the Derivative Works, if and\n          wherever such third-party notices normally appear. The contents\n          of the NOTICE file are for informational purposes only and\n          do not modify the License. 
You may add Your own attribution\n          notices within Derivative Works that You distribute, alongside\n          or as an addendum to the NOTICE text from the Work, provided\n          that such additional attribution notices cannot be construed\n          as modifying the License.\n\n      You may add Your own copyright statement to Your modifications and\n      may provide additional or different license terms and conditions\n      for use, reproduction, or distribution of Your modifications, or\n      for any such Derivative Works as a whole, provided Your use,\n      reproduction, and distribution of the Work otherwise complies with\n      the conditions stated in this License.\n\n   5. Submission of Contributions. Unless You explicitly state otherwise,\n      any Contribution intentionally submitted for inclusion in the Work\n      by You to the Licensor shall be under the terms and conditions of\n      this License, without any additional terms or conditions.\n      Notwithstanding the above, nothing herein shall supersede or modify\n      the terms of any separate license agreement you may have executed\n      with Licensor regarding such Contributions.\n\n   6. Trademarks. This License does not grant permission to use the trade\n      names, trademarks, service marks, or product names of the Licensor,\n      except as required for reasonable and customary use in describing the\n      origin of the Work and reproducing the content of the NOTICE file.\n\n   7. Disclaimer of Warranty. Unless required by applicable law or\n      agreed to in writing, Licensor provides the Work (and each\n      Contributor provides its Contributions) on an \"AS IS\" BASIS,\n      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\n      implied, including, without limitation, any warranties or conditions\n      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\n      PARTICULAR PURPOSE. 
You are solely responsible for determining the\n      appropriateness of using or redistributing the Work and assume any\n      risks associated with Your exercise of permissions under this License.\n\n   8. Limitation of Liability. In no event and under no legal theory,\n      whether in tort (including negligence), contract, or otherwise,\n      unless required by applicable law (such as deliberate and grossly\n      negligent acts) or agreed to in writing, shall any Contributor be\n      liable to You for damages, including any direct, indirect, special,\n      incidental, or consequential damages of any character arising as a\n      result of this License or out of the use or inability to use the\n      Work (including but not limited to damages for loss of goodwill,\n      work stoppage, computer failure or malfunction, or any and all\n      other commercial damages or losses), even if such Contributor\n      has been advised of the possibility of such damages.\n\n   9. Accepting Warranty or Additional Liability. While redistributing\n      the Work or Derivative Works thereof, You may choose to offer,\n      and charge a fee for, acceptance of support, warranty, indemnity,\n      or other liability obligations and/or rights consistent with this\n      License. However, in accepting such obligations, You may act only\n      on Your own behalf and on Your sole responsibility, not on behalf\n      of any other Contributor, and only if You agree to indemnify,\n      defend, and hold each Contributor harmless for any liability\n      incurred by, or claims asserted against, such Contributor by reason\n      of your accepting any such warranty or additional liability.\n\n   END OF TERMS AND CONDITIONS\n\n   APPENDIX: How to apply the Apache License to your work.\n\n      To apply the Apache License to your work, attach the following\n      boilerplate notice, with the fields enclosed by brackets \"[]\"\n      replaced with your own identifying information. 
(Don't include\n      the brackets!)  The text should be enclosed in the appropriate\n      comment syntax for the file format. We also recommend that a\n      file or class name and description of purpose be included on the\n      same \"printed page\" as the copyright notice for easier\n      identification within third-party archives.\n\n   Copyright [yyyy] [name of copyright owner]\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License.\n"
  },
  {
    "path": "flux-ToCa/README.md",
    "content": "# FLUX\nby Black Forest Labs: https://blackforestlabs.ai. Documentation for our API can be found here: [docs.bfl.ml](https://docs.bfl.ml/).\n\n![grid](assets/grid.jpg)\n\nThis repo contains minimal inference code to run image generation & editing with our Flux models.\n\n## Local installation\n\n```bash\ncd $HOME && git clone https://github.com/black-forest-labs/flux\ncd $HOME/flux\n\n# Using pyvenv\npython3.10 -m venv .venv\nsource .venv/bin/activate\npip install -e \".[all]\"\n```\n\n### Models\n\nWe are offering an extensive suite of models. For more information about the individual models, please refer to the link under **Usage**.\n\n| Name                        | Usage                                                      | HuggingFace repo                                               | License                                                               |\n| --------------------------- | ---------------------------------------------------------- | -------------------------------------------------------------- | --------------------------------------------------------------------- |\n| `FLUX.1 [schnell]`          | [Text to Image](docs/text-to-image.md)                     | https://huggingface.co/black-forest-labs/FLUX.1-schnell        | [apache-2.0](model_licenses/LICENSE-FLUX1-schnell)                    |\n| `FLUX.1 [dev]`              | [Text to Image](docs/text-to-image.md)                     | https://huggingface.co/black-forest-labs/FLUX.1-dev            | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |\n| `FLUX.1 Fill [dev]`         | [In/Out-painting](docs/fill.md)                            | https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev       | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |\n| `FLUX.1 Canny [dev]`        | [Structural Conditioning](docs/structural-conditioning.md) | https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev      | [FLUX.1-dev 
Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |\n| `FLUX.1 Depth [dev]`        | [Structural Conditioning](docs/structural-conditioning.md) | https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev      | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |\n| `FLUX.1 Canny [dev] LoRA`   | [Structural Conditioning](docs/structural-conditioning.md) | https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev-lora | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |\n| `FLUX.1 Depth [dev] LoRA`   | [Structural Conditioning](docs/structural-conditioning.md) | https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev-lora | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |\n| `FLUX.1 Redux [dev]`        | [Image variation](docs/image-variation.md)                 | https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev      | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) |\n| `FLUX.1 [pro]`              | [Text to Image](docs/text-to-image.md)                     | [Available in our API.](https://docs.bfl.ml/)                  |                                                                       |\n| `FLUX1.1 [pro]`             | [Text to Image](docs/text-to-image.md)                     | [Available in our API.](https://docs.bfl.ml/)                  |                                                                       |\n| `FLUX1.1 [pro] Ultra/raw`   | [Text to Image](docs/text-to-image.md)                     | [Available in our API.](https://docs.bfl.ml/)                  |                                                                       |\n| `FLUX.1 Fill [pro]`         | [In/Out-painting](docs/fill.md)                            | [Available in our API.](https://docs.bfl.ml/)                  |                                                                       |\n| `FLUX.1 Canny [pro]`        | [Structural Conditioning](docs/structural-conditioning.md) 
| [Available in our API.](https://docs.bfl.ml/)                  |                                                                       |\n| `FLUX.1 Depth [pro]`        | [Structural Conditioning](docs/structural-conditioning.md) | [Available in our API.](https://docs.bfl.ml/)                  |                                                                       |\n| `FLUX1.1 Redux [pro]`       | [Image variation](docs/image-variation.md)                 | [Available in our API.](https://docs.bfl.ml/)                  |                                                                       |\n| `FLUX1.1 Redux [pro] Ultra` | [Image variation](docs/image-variation.md)                 | [Available in our API.](https://docs.bfl.ml/)                  |                                                                       |\n\nThe weights of the autoencoder are also released under [apache-2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) and can be found in the HuggingFace repos above.\n\n## API usage\n\nOur API offers access to our models. It is documented here:\n[docs.bfl.ml](https://docs.bfl.ml/).\n\nIn this repository we also offer an easy python interface. To use this, you\nfirst need to register with the API on [api.bfl.ml](https://api.bfl.ml/), and\ncreate a new API key.\n\nTo use the API key either run `export BFL_API_KEY=<your_key_here>` or provide\nit via the `api_key=<your_key_here>` parameter. 
It is also expected that you\nhave installed the package as above.\n\nUsage from python:\n\n```python\nfrom flux.api import ImageRequest\n\n# this will create an api request directly but not block until the generation is finished\nrequest = ImageRequest(\"A beautiful beach\", name=\"flux.1.1-pro\")\n# or: request = ImageRequest(\"A beautiful beach\", name=\"flux.1.1-pro\", api_key=\"your_key_here\")\n\n# any of the following will block until the generation is finished\nrequest.url\n# -> https:<...>/sample.jpg\nrequest.bytes\n# -> b\"...\" bytes for the generated image\nrequest.save(\"outputs/api.jpg\")\n# saves the sample to local storage\nrequest.image\n# -> a PIL image\n```\n\nUsage from the command line:\n\n```bash\n$ python -m flux.api --prompt=\"A beautiful beach\" url\nhttps:<...>/sample.jpg\n\n# generate and save the result\n$ python -m flux.api --prompt=\"A beautiful beach\" save outputs/api\n\n# open the image directly\n$ python -m flux.api --prompt=\"A beautiful beach\" image show\n```\n\n## Citation\n\nIf you find the provided code or models useful for your research, consider citing them as:\n\n```bib\n@misc{flux2023,\n    author={Black Forest Labs},\n    title={FLUX},\n    year={2023},\n    howpublished={\\url{https://github.com/black-forest-labs/flux}},\n}\n```\n"
  },
  {
    "path": "flux-ToCa/demo_gr.py",
    "content": "import os\nimport time\nimport uuid\n\nimport gradio as gr\nimport numpy as np\nimport torch\nfrom einops import rearrange\nfrom PIL import ExifTags, Image\nfrom transformers import pipeline\n\nfrom flux.cli import SamplingOptions\nfrom flux.sampling import denoise, get_noise, get_schedule, prepare, unpack\nfrom flux.ideas import denoise_cache\nfrom flux.util import configs, embed_watermark, load_ae, load_clip, load_flow_model, load_t5\n\nNSFW_THRESHOLD = 0.85\n\n\ndef get_models(name: str, device: torch.device, offload: bool, is_schnell: bool):\n    t5 = load_t5(device, max_length=256 if is_schnell else 512)\n    clip = load_clip(device)\n    model = load_flow_model(name, device=\"cpu\" if offload else device)\n    ae = load_ae(name, device=\"cpu\" if offload else device)\n    nsfw_classifier = pipeline(\"image-classification\", model=\"Falconsai/nsfw_image_detection\", device=device)\n    return model, ae, t5, clip, nsfw_classifier\n\n\nclass FluxGenerator:\n    def __init__(self, model_name: str, device: str, offload: bool):\n        self.device = torch.device(device)\n        self.offload = offload\n        self.model_name = model_name\n        self.is_schnell = model_name == \"flux-schnell\"\n        self.model, self.ae, self.t5, self.clip, self.nsfw_classifier = get_models(\n            model_name,\n            device=self.device,\n            offload=self.offload,\n            is_schnell=self.is_schnell,\n        )\n\n    @torch.inference_mode()\n    def generate_image(\n        self,\n        width,\n        height,\n        num_steps,\n        guidance,\n        seed,\n        prompt,\n        init_image=None,\n        image2image_strength=0.0,\n        add_sampling_metadata=True,\n    ):\n        seed = int(seed)\n        if seed == -1:\n            seed = None\n\n        opts = SamplingOptions(\n            prompt=prompt,\n            width=width,\n            height=height,\n            num_steps=num_steps,\n            
guidance=guidance,\n            seed=seed,\n        )\n\n        if opts.seed is None:\n            opts.seed = torch.Generator(device=\"cpu\").seed()\n        print(f\"Generating '{opts.prompt}' with seed {opts.seed}\")\n        t0 = time.perf_counter()\n\n        if init_image is not None:\n            if isinstance(init_image, np.ndarray):\n                init_image = torch.from_numpy(init_image).permute(2, 0, 1).float() / 255.0\n                init_image = init_image.unsqueeze(0)\n            init_image = init_image.to(self.device)\n            init_image = torch.nn.functional.interpolate(init_image, (opts.height, opts.width))\n            if self.offload:\n                self.ae.encoder.to(self.device)\n            init_image = self.ae.encode(init_image.to())\n            if self.offload:\n                self.ae = self.ae.cpu()\n                torch.cuda.empty_cache()\n\n        # prepare input\n        x = get_noise(\n            1,\n            opts.height,\n            opts.width,\n            device=self.device,\n            dtype=torch.bfloat16,\n            seed=opts.seed,\n        )\n        timesteps = get_schedule(\n            opts.num_steps,\n            x.shape[-1] * x.shape[-2] // 4,\n            shift=(not self.is_schnell),\n        )\n        if init_image is not None:\n            t_idx = int((1 - image2image_strength) * num_steps)\n            t = timesteps[t_idx]\n            timesteps = timesteps[t_idx:]\n            x = t * x + (1.0 - t) * init_image.to(x.dtype)\n\n        if self.offload:\n            self.t5, self.clip = self.t5.to(self.device), self.clip.to(self.device)\n        inp = prepare(t5=self.t5, clip=self.clip, img=x, prompt=opts.prompt)\n\n        # offload TEs to CPU, load model to gpu\n        if self.offload:\n            self.t5, self.clip = self.t5.cpu(), self.clip.cpu()\n            torch.cuda.empty_cache()\n            self.model = self.model.to(self.device)\n\n        # denoise initial noise\n        x = 
denoise_cache(self.model, **inp, timesteps=timesteps, guidance=opts.guidance)\n\n        # offload model, load autoencoder to gpu\n        if self.offload:\n            self.model.cpu()\n            torch.cuda.empty_cache()\n            self.ae.decoder.to(x.device)\n\n        # decode latents to pixel space\n        x = unpack(x.float(), opts.height, opts.width)\n        with torch.autocast(device_type=self.device.type, dtype=torch.bfloat16):\n            x = self.ae.decode(x)\n\n        if self.offload:\n            self.ae.decoder.cpu()\n            torch.cuda.empty_cache()\n\n        t1 = time.perf_counter()\n\n        print(f\"Done in {t1 - t0:.1f}s.\")\n        # bring into PIL format\n        x = x.clamp(-1, 1)\n        x = embed_watermark(x.float())\n        x = rearrange(x[0], \"c h w -> h w c\")\n\n        img = Image.fromarray((127.5 * (x + 1.0)).cpu().byte().numpy())\n        nsfw_score = [x[\"score\"] for x in self.nsfw_classifier(img) if x[\"label\"] == \"nsfw\"][0]\n\n        if nsfw_score < NSFW_THRESHOLD:\n            filename = f\"output/gradio/{uuid.uuid4()}.jpg\"\n            os.makedirs(os.path.dirname(filename), exist_ok=True)\n            exif_data = Image.Exif()\n            if init_image is None:\n                exif_data[ExifTags.Base.Software] = \"AI generated;txt2img;flux\"\n            else:\n                exif_data[ExifTags.Base.Software] = \"AI generated;img2img;flux\"\n            exif_data[ExifTags.Base.Make] = \"Black Forest Labs\"\n            exif_data[ExifTags.Base.Model] = self.model_name\n            if add_sampling_metadata:\n                exif_data[ExifTags.Base.ImageDescription] = prompt\n\n            img.save(filename, format=\"jpeg\", exif=exif_data, quality=95, subsampling=0)\n\n            return img, str(opts.seed), filename, None\n        else:\n            return None, str(opts.seed), None, \"Your generated image may contain NSFW content.\"\n\n\ndef create_demo(\n    model_name: str, device: str = \"cuda\" if 
torch.cuda.is_available() else \"cpu\", offload: bool = False\n):\n    generator = FluxGenerator(model_name, device, offload)\n    is_schnell = model_name == \"flux-schnell\"\n\n    with gr.Blocks() as demo:\n        gr.Markdown(f\"# Flux Image Generation Demo - Model: {model_name}\")\n\n        with gr.Row():\n            with gr.Column():\n                prompt = gr.Textbox(\n                    label=\"Prompt\",\n                    value='a photo of a forest with mist swirling around the tree trunks. The word \"FLUX\" is painted over it in big, red brush strokes with visible texture',\n                )\n                do_img2img = gr.Checkbox(label=\"Image to Image\", value=False, interactive=not is_schnell)\n                init_image = gr.Image(label=\"Input Image\", visible=False)\n                image2image_strength = gr.Slider(\n                    0.0, 1.0, 0.8, step=0.1, label=\"Noising strength\", visible=False\n                )\n\n                with gr.Accordion(\"Advanced Options\", open=False):\n                    width = gr.Slider(128, 8192, 1360, step=16, label=\"Width\")\n                    height = gr.Slider(128, 8192, 768, step=16, label=\"Height\")\n                    num_steps = gr.Slider(1, 50, 4 if is_schnell else 50, step=1, label=\"Number of steps\")\n                    guidance = gr.Slider(\n                        1.0, 10.0, 3.5, step=0.1, label=\"Guidance\", interactive=not is_schnell\n                    )\n                    seed = gr.Textbox(-1, label=\"Seed (-1 for random)\")\n                    add_sampling_metadata = gr.Checkbox(\n                        label=\"Add sampling parameters to metadata?\", value=True\n                    )\n\n                generate_btn = gr.Button(\"Generate\")\n\n            with gr.Column():\n                output_image = gr.Image(label=\"Generated Image\")\n                seed_output = gr.Number(label=\"Used Seed\")\n                warning_text = gr.Textbox(label=\"Warning\", 
visible=False)\n                download_btn = gr.File(label=\"Download full-resolution\")\n\n        def update_img2img(do_img2img):\n            return {\n                init_image: gr.update(visible=do_img2img),\n                image2image_strength: gr.update(visible=do_img2img),\n            }\n\n        do_img2img.change(update_img2img, do_img2img, [init_image, image2image_strength])\n\n        generate_btn.click(\n            fn=generator.generate_image,\n            inputs=[\n                width,\n                height,\n                num_steps,\n                guidance,\n                seed,\n                prompt,\n                init_image,\n                image2image_strength,\n                add_sampling_metadata,\n            ],\n            outputs=[output_image, seed_output, download_btn, warning_text],\n        )\n\n    return demo\n\n\nif __name__ == \"__main__\":\n    import argparse\n\n    parser = argparse.ArgumentParser(description=\"Flux\")\n    parser.add_argument(\n        \"--name\", type=str, default=\"flux-schnell\", choices=list(configs.keys()), help=\"Model name\"\n    )\n    parser.add_argument(\n        \"--device\", type=str, default=\"cuda\" if torch.cuda.is_available() else \"cpu\", help=\"Device to use\"\n    )\n    parser.add_argument(\"--offload\", action=\"store_true\", help=\"Offload model to CPU when not in use\")\n    parser.add_argument(\"--share\", action=\"store_true\", help=\"Create a public link to your demo\")\n    args = parser.parse_args()\n\n    demo = create_demo(args.name, args.device, args.offload)\n    demo.launch(share=args.share)\n"
  },
  {
    "path": "flux-ToCa/demo_st.py",
    "content": "import os\nimport re\nimport time\nfrom glob import iglob\nfrom io import BytesIO\n\nimport streamlit as st\nimport torch\nfrom einops import rearrange\nfrom fire import Fire\nfrom PIL import ExifTags, Image\nfrom st_keyup import st_keyup\nfrom torchvision import transforms\nfrom transformers import pipeline\n\nfrom flux.cli import SamplingOptions\nfrom flux.sampling import denoise, get_noise, get_schedule, prepare, unpack\nfrom flux.ideas import denoise_cache\nfrom flux.util import (\n    configs,\n    embed_watermark,\n    load_ae,\n    load_clip,\n    load_flow_model,\n    load_t5,\n)\n\nNSFW_THRESHOLD = 0.85\n\n\n@st.cache_resource()\ndef get_models(name: str, device: torch.device, offload: bool, is_schnell: bool):\n    t5 = load_t5(device, max_length=256 if is_schnell else 512)\n    clip = load_clip(device)\n    model = load_flow_model(name, device=\"cpu\" if offload else device)\n    ae = load_ae(name, device=\"cpu\" if offload else device)\n    nsfw_classifier = pipeline(\"image-classification\", model=\"Falconsai/nsfw_image_detection\", device=device)\n    return model, ae, t5, clip, nsfw_classifier\n\n\ndef get_image() -> torch.Tensor | None:\n    image = st.file_uploader(\"Input\", type=[\"jpg\", \"JPEG\", \"png\"])\n    if image is None:\n        return None\n    image = Image.open(image).convert(\"RGB\")\n\n    transform = transforms.Compose(\n        [\n            transforms.ToTensor(),\n            transforms.Lambda(lambda x: 2.0 * x - 1.0),\n        ]\n    )\n    img: torch.Tensor = transform(image)\n    return img[None, ...]\n\n\n@torch.inference_mode()\ndef main(\n    device: str = \"cuda\" if torch.cuda.is_available() else \"cpu\",\n    offload: bool = False,\n    output_dir: str = \"output\",\n):\n    torch_device = torch.device(device)\n    names = list(configs.keys())\n    name = st.selectbox(\"Which model to load?\", names)\n    if name is None or not st.checkbox(\"Load model\", False):\n        return\n\n    is_schnell = name 
== \"flux-schnell\"\n    model, ae, t5, clip, nsfw_classifier = get_models(\n        name,\n        device=torch_device,\n        offload=offload,\n        is_schnell=is_schnell,\n    )\n\n    do_img2img = (\n        st.checkbox(\n            \"Image to Image\",\n            False,\n            disabled=is_schnell,\n            help=\"Partially noise an image and denoise again to get variations.\\n\\nOnly works for flux-dev\",\n        )\n        and not is_schnell\n    )\n    if do_img2img:\n        init_image = get_image()\n        if init_image is None:\n            st.warning(\"Please add an image to do image to image\")\n        image2image_strength = st.number_input(\"Noising strength\", min_value=0.0, max_value=1.0, value=0.8)\n        if init_image is not None:\n            h, w = init_image.shape[-2:]\n            st.write(f\"Got image of size {w}x{h} ({h*w/1e6:.2f}MP)\")\n        resize_img = st.checkbox(\"Resize image\", False) or init_image is None\n    else:\n        init_image = None\n        resize_img = True\n        image2image_strength = 0.0\n\n    # allow for packing and conversion to latent space\n    width = int(\n        16 * (st.number_input(\"Width\", min_value=128, value=1360, step=16, disabled=not resize_img) // 16)\n    )\n    height = int(\n        16 * (st.number_input(\"Height\", min_value=128, value=768, step=16, disabled=not resize_img) // 16)\n    )\n    num_steps = int(st.number_input(\"Number of steps\", min_value=1, value=(4 if is_schnell else 50)))\n    guidance = float(st.number_input(\"Guidance\", min_value=1.0, value=3.5, disabled=is_schnell))\n    seed_str = st.text_input(\"Seed\", disabled=is_schnell)\n    if seed_str.isdecimal():\n        seed = int(seed_str)\n    else:\n        st.info(\"No seed set, set to positive integer to enable\")\n        seed = None\n    save_samples = st.checkbox(\"Save samples?\", not is_schnell)\n    add_sampling_metadata = st.checkbox(\"Add sampling parameters to metadata?\", True)\n\n    
default_prompt = (\n        \"a photo of a forest with mist swirling around the tree trunks. The word \"\n        '\"FLUX\" is painted over it in big, red brush strokes with visible texture'\n    )\n    prompt = st_keyup(\"Enter a prompt\", value=default_prompt, debounce=300, key=\"interactive_text\")\n\n    output_name = os.path.join(output_dir, \"img_{idx}.jpg\")\n    if not os.path.exists(output_dir):\n        os.makedirs(output_dir)\n        idx = 0\n    else:\n        fns = [fn for fn in iglob(output_name.format(idx=\"*\")) if re.search(r\"img_[0-9]+\\.jpg$\", fn)]\n        if len(fns) > 0:\n            idx = max(int(fn.split(\"_\")[-1].split(\".\")[0]) for fn in fns) + 1\n        else:\n            idx = 0\n\n    rng = torch.Generator(device=\"cpu\")\n\n    if \"seed\" not in st.session_state:\n        st.session_state.seed = rng.seed()\n\n    def increment_counter():\n        st.session_state.seed += 1\n\n    def decrement_counter():\n        if st.session_state.seed > 0:\n            st.session_state.seed -= 1\n\n    opts = SamplingOptions(\n        prompt=prompt,\n        width=width,\n        height=height,\n        num_steps=num_steps,\n        guidance=guidance,\n        seed=seed,\n    )\n\n    if name == \"flux-schnell\":\n        cols = st.columns([5, 1, 1, 5])\n        with cols[1]:\n            st.button(\"↩\", on_click=increment_counter)\n        with cols[2]:\n            st.button(\"↪\", on_click=decrement_counter)\n    if is_schnell or st.button(\"Sample\"):\n        if is_schnell:\n            opts.seed = st.session_state.seed\n        elif opts.seed is None:\n            opts.seed = rng.seed()\n        print(f\"Generating '{opts.prompt}' with seed {opts.seed}\")\n        t0 = time.perf_counter()\n\n        if init_image is not None:\n            if resize_img:\n                init_image = torch.nn.functional.interpolate(init_image, (opts.height, opts.width))\n            else:\n                h, w = init_image.shape[-2:]\n                
init_image = init_image[..., : 16 * (h // 16), : 16 * (w // 16)]\n                opts.height = init_image.shape[-2]\n                opts.width = init_image.shape[-1]\n            if offload:\n                ae.encoder.to(torch_device)\n            init_image = ae.encode(init_image.to(torch_device))\n            if offload:\n                ae = ae.cpu()\n                torch.cuda.empty_cache()\n\n        # prepare input\n        x = get_noise(\n            1,\n            opts.height,\n            opts.width,\n            device=torch_device,\n            dtype=torch.bfloat16,\n            seed=opts.seed,\n        )\n        # divide pixel space by 16**2 to account for latent space conversion\n        timesteps = get_schedule(\n            opts.num_steps,\n            (x.shape[-1] * x.shape[-2]) // 4,\n            shift=(not is_schnell),\n        )\n        if init_image is not None:\n            t_idx = int((1 - image2image_strength) * num_steps)\n            t = timesteps[t_idx]\n            timesteps = timesteps[t_idx:]\n            x = t * x + (1.0 - t) * init_image.to(x.dtype)\n\n        if offload:\n            t5, clip = t5.to(torch_device), clip.to(torch_device)\n        inp = prepare(t5=t5, clip=clip, img=x, prompt=opts.prompt)\n\n        # offload TEs to CPU, load model to gpu\n        if offload:\n            t5, clip = t5.cpu(), clip.cpu()\n            torch.cuda.empty_cache()\n            model = model.to(torch_device)\n\n        # denoise initial noise\n        x = denoise_cache(model, **inp, timesteps=timesteps, guidance=opts.guidance)\n\n        # offload model, load autoencoder to gpu\n        if offload:\n            model.cpu()\n            torch.cuda.empty_cache()\n            ae.decoder.to(x.device)\n\n        # decode latents to pixel space\n        x = unpack(x.float(), opts.height, opts.width)\n        with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):\n            x = ae.decode(x)\n\n        if offload:\n          
  ae.decoder.cpu()\n            torch.cuda.empty_cache()\n\n        t1 = time.perf_counter()\n\n        fn = output_name.format(idx=idx)\n        print(f\"Done in {t1 - t0:.1f}s.\")\n        # bring into PIL format and save\n        x = x.clamp(-1, 1)\n        x = embed_watermark(x.float())\n        x = rearrange(x[0], \"c h w -> h w c\")\n\n        img = Image.fromarray((127.5 * (x + 1.0)).cpu().byte().numpy())\n        nsfw_score = [x[\"score\"] for x in nsfw_classifier(img) if x[\"label\"] == \"nsfw\"][0]\n\n        if nsfw_score < NSFW_THRESHOLD:\n            buffer = BytesIO()\n            exif_data = Image.Exif()\n            if init_image is None:\n                exif_data[ExifTags.Base.Software] = \"AI generated;txt2img;flux\"\n            else:\n                exif_data[ExifTags.Base.Software] = \"AI generated;img2img;flux\"\n            exif_data[ExifTags.Base.Make] = \"Black Forest Labs\"\n            exif_data[ExifTags.Base.Model] = name\n            if add_sampling_metadata:\n                exif_data[ExifTags.Base.ImageDescription] = prompt\n            img.save(buffer, format=\"jpeg\", exif=exif_data, quality=95, subsampling=0)\n\n            img_bytes = buffer.getvalue()\n            if save_samples:\n                print(f\"Saving {fn}\")\n                with open(fn, \"wb\") as file:\n                    file.write(img_bytes)\n                idx += 1\n\n            st.session_state[\"samples\"] = {\n                \"prompt\": opts.prompt,\n                \"img\": img,\n                \"seed\": opts.seed,\n                \"bytes\": img_bytes,\n            }\n            opts.seed = None\n        else:\n            st.warning(\"Your generated image may contain NSFW content.\")\n            st.session_state[\"samples\"] = None\n\n    samples = st.session_state.get(\"samples\", None)\n    if samples is not None:\n        st.image(samples[\"img\"], caption=samples[\"prompt\"])\n        st.download_button(\n            \"Download 
full-resolution\",\n            samples[\"bytes\"],\n            file_name=\"generated.jpg\",\n            mime=\"image/jpeg\",\n        )\n        st.write(f\"Seed: {samples['seed']}\")\n\n\ndef app():\n    Fire(main)\n\n\nif __name__ == \"__main__\":\n    app()\n"
  },
  {
    "path": "flux-ToCa/demo_st_fill.py",
    "content": "import os\nimport re\nimport tempfile\nimport time\nfrom glob import iglob\nfrom io import BytesIO\n\nimport numpy as np\nimport streamlit as st\nimport torch\nfrom einops import rearrange\nfrom PIL import ExifTags, Image\nfrom st_keyup import st_keyup\nfrom streamlit_drawable_canvas import st_canvas\nfrom transformers import pipeline\n\nfrom flux.sampling import denoise, get_noise, get_schedule, prepare_fill, unpack\nfrom flux.ideas import denoise_cache\nfrom flux.util import embed_watermark, load_ae, load_clip, load_flow_model, load_t5\n\nNSFW_THRESHOLD = 0.85\n\n\ndef add_border_and_mask(image, zoom_all=1.0, zoom_left=0, zoom_right=0, zoom_up=0, zoom_down=0, overlap=0):\n    \"\"\"Adds a black border around the image with individual side control and mask overlap\"\"\"\n    orig_width, orig_height = image.size\n\n    # Calculate padding for each side (in pixels)\n    left_pad = int(orig_width * zoom_left)\n    right_pad = int(orig_width * zoom_right)\n    top_pad = int(orig_height * zoom_up)\n    bottom_pad = int(orig_height * zoom_down)\n\n    # Calculate overlap in pixels\n    overlap_left = int(orig_width * overlap)\n    overlap_right = int(orig_width * overlap)\n    overlap_top = int(orig_height * overlap)\n    overlap_bottom = int(orig_height * overlap)\n\n    # If using the all-sides zoom, add it to each side\n    if zoom_all > 1.0:\n        extra_each_side = (zoom_all - 1.0) / 2\n        left_pad += int(orig_width * extra_each_side)\n        right_pad += int(orig_width * extra_each_side)\n        top_pad += int(orig_height * extra_each_side)\n        bottom_pad += int(orig_height * extra_each_side)\n\n    # Calculate new dimensions (ensure they're multiples of 32)\n    new_width = 32 * round((orig_width + left_pad + right_pad) / 32)\n    new_height = 32 * round((orig_height + top_pad + bottom_pad) / 32)\n\n    # Create new image with black border\n    bordered_image = Image.new(\"RGB\", (new_width, new_height), (0, 0, 0))\n    # Paste 
original image in position\n    paste_x = left_pad\n    paste_y = top_pad\n    bordered_image.paste(image, (paste_x, paste_y))\n\n    # Create mask (white where the border is, black where the original image was)\n    mask = Image.new(\"L\", (new_width, new_height), 255)  # White background\n    # Paste black rectangle with overlap adjustment\n    mask.paste(\n        0,\n        (\n            paste_x + overlap_left,  # Left edge moves right\n            paste_y + overlap_top,  # Top edge moves down\n            paste_x + orig_width - overlap_right,  # Right edge moves left\n            paste_y + orig_height - overlap_bottom,  # Bottom edge moves up\n        ),\n    )\n\n    return bordered_image, mask\n\n\n@st.cache_resource()\ndef get_models(name: str, device: torch.device, offload: bool):\n    t5 = load_t5(device, max_length=128)\n    clip = load_clip(device)\n    model = load_flow_model(name, device=\"cpu\" if offload else device)\n    ae = load_ae(name, device=\"cpu\" if offload else device)\n    nsfw_classifier = pipeline(\"image-classification\", model=\"Falconsai/nsfw_image_detection\", device=device)\n    return model, ae, t5, clip, nsfw_classifier\n\n\ndef resize(img: Image.Image, min_mp: float = 0.5, max_mp: float = 2.0) -> Image.Image:\n    width, height = img.size\n    mp = (width * height) / 1_000_000  # Current megapixels\n\n    if min_mp <= mp <= max_mp:\n        # Even if MP is in range, ensure dimensions are multiples of 32\n        new_width = int(32 * round(width / 32))\n        new_height = int(32 * round(height / 32))\n        if new_width != width or new_height != height:\n            return img.resize((new_width, new_height), Image.Resampling.LANCZOS)\n        return img\n\n    # Calculate scaling factor\n    if mp < min_mp:\n        scale = (min_mp / mp) ** 0.5\n    else:  # mp > max_mp\n        scale = (max_mp / mp) ** 0.5\n\n    new_width = int(32 * round(width * scale / 32))\n    new_height = int(32 * round(height * scale / 32))\n\n    
return img.resize((new_width, new_height), Image.Resampling.LANCZOS)\n\n\ndef clear_canvas_state():\n    \"\"\"Clear all canvas-related state\"\"\"\n    keys_to_clear = [\"canvas\", \"last_image_dims\"]\n    for key in keys_to_clear:\n        if key in st.session_state:\n            del st.session_state[key]\n\n\ndef set_new_image(img: Image.Image):\n    \"\"\"Safely set a new image and clear relevant state\"\"\"\n    st.session_state[\"current_image\"] = img\n    clear_canvas_state()\n    st.rerun()\n\n\ndef downscale_image(img: Image.Image, scale_factor: float) -> Image.Image:\n    \"\"\"Downscale image by a given factor while maintaining 32-pixel multiple dimensions\"\"\"\n    if scale_factor >= 1.0:\n        return img\n\n    width, height = img.size\n    new_width = int(32 * round(width * scale_factor / 32))\n    new_height = int(32 * round(height * scale_factor / 32))\n\n    # Ensure minimum dimensions\n    new_width = max(64, new_width)  # minimum 64 pixels\n    new_height = max(64, new_height)  # minimum 64 pixels\n\n    return img.resize((new_width, new_height), Image.Resampling.LANCZOS)\n\n\n@torch.inference_mode()\ndef main(\n    device: str = \"cuda\" if torch.cuda.is_available() else \"cpu\",\n    offload: bool = False,\n    output_dir: str = \"output\",\n):\n    torch_device = torch.device(device)\n    st.title(\"Flux Fill: Inpainting & Outpainting\")\n\n    # Model selection and loading\n    name = \"flux-dev-fill\"\n    if not st.checkbox(\"Load model\", False):\n        return\n\n    try:\n        model, ae, t5, clip, nsfw_classifier = get_models(\n            name,\n            device=torch_device,\n            offload=offload,\n        )\n    except Exception as e:\n        st.error(f\"Error loading models: {e}\")\n        return\n\n    # Mode selection\n    mode = st.radio(\"Select Mode\", [\"Inpainting\", \"Outpainting\"])\n\n    # Image handling - either from previous generation or new upload\n    if \"input_image\" in st.session_state:\n      
  image = st.session_state[\"input_image\"]\n        del st.session_state[\"input_image\"]\n        set_new_image(image)\n        st.write(\"Continuing from previous result\")\n    else:\n        uploaded_image = st.file_uploader(\"Upload image\", type=[\"jpg\", \"jpeg\", \"png\"])\n        if uploaded_image is None:\n            st.warning(\"Please upload an image\")\n            return\n\n        if (\n            \"current_image_name\" not in st.session_state\n            or st.session_state[\"current_image_name\"] != uploaded_image.name\n        ):\n            try:\n                image = Image.open(uploaded_image).convert(\"RGB\")\n                st.session_state[\"current_image_name\"] = uploaded_image.name\n                set_new_image(image)\n            except Exception as e:\n                st.error(f\"Error loading image: {e}\")\n                return\n        else:\n            image = st.session_state.get(\"current_image\")\n            if image is None:\n                st.error(\"Error: Image state is invalid. 
Please reupload the image.\")\n                clear_canvas_state()\n                return\n\n    # Add downscale control\n    with st.expander(\"Image Size Control\"):\n        current_mp = (image.size[0] * image.size[1]) / 1_000_000\n        st.write(f\"Current image size: {image.size[0]}x{image.size[1]} ({current_mp:.1f}MP)\")\n\n        scale_factor = st.slider(\n            \"Downscale Factor\",\n            min_value=0.1,\n            max_value=1.0,\n            value=1.0,\n            step=0.1,\n            help=\"1.0 = original size, 0.5 = half size, etc.\",\n        )\n\n        if scale_factor < 1.0 and st.button(\"Apply Downscaling\"):\n            image = downscale_image(image, scale_factor)\n            set_new_image(image)\n            st.rerun()\n\n    # Resize image with validation\n    try:\n        original_mp = (image.size[0] * image.size[1]) / 1_000_000\n        image = resize(image)\n        width, height = image.size\n        current_mp = (width * height) / 1_000_000\n\n        if width % 32 != 0 or height % 32 != 0:\n            st.error(\"Error: Image dimensions must be multiples of 32\")\n            return\n\n        st.write(f\"Image dimensions: {width}x{height} pixels\")\n        if original_mp != current_mp:\n            st.write(\n                f\"Image has been resized from {original_mp:.1f}MP to {current_mp:.1f}MP to stay within bounds (0.5MP - 2MP)\"\n            )\n    except Exception as e:\n        st.error(f\"Error processing image: {e}\")\n        return\n\n    if mode == \"Outpainting\":\n        # Outpainting controls\n        zoom_all = st.slider(\"Zoom Out Amount (All Sides)\", min_value=1.0, max_value=3.0, value=1.0, step=0.1)\n\n        with st.expander(\"Advanced Zoom Controls\"):\n            st.info(\"These controls add additional zoom to specific sides\")\n            col1, col2 = st.columns(2)\n            with col1:\n                zoom_left = st.slider(\"Left\", min_value=0.0, max_value=1.0, value=0.0, 
step=0.1)\n                zoom_right = st.slider(\"Right\", min_value=0.0, max_value=1.0, value=0.0, step=0.1)\n            with col2:\n                zoom_up = st.slider(\"Up\", min_value=0.0, max_value=1.0, value=0.0, step=0.1)\n                zoom_down = st.slider(\"Down\", min_value=0.0, max_value=1.0, value=0.0, step=0.1)\n\n        overlap = st.slider(\"Overlap\", min_value=0.01, max_value=0.25, value=0.01, step=0.01)\n\n        # Generate bordered image and mask\n        image_for_generation, mask = add_border_and_mask(\n            image,\n            zoom_all=zoom_all,\n            zoom_left=zoom_left,\n            zoom_right=zoom_right,\n            zoom_up=zoom_up,\n            zoom_down=zoom_down,\n            overlap=overlap,\n        )\n        width, height = image_for_generation.size\n\n        # Show preview\n        col1, col2 = st.columns(2)\n        with col1:\n            st.image(image_for_generation, caption=\"Image with Border\")\n        with col2:\n            st.image(mask, caption=\"Mask (white areas will be generated)\")\n\n    else:  # Inpainting mode\n        # Canvas setup with dimension tracking\n        canvas_key = f\"canvas_{width}_{height}\"\n        if \"last_image_dims\" not in st.session_state:\n            st.session_state.last_image_dims = (width, height)\n        elif st.session_state.last_image_dims != (width, height):\n            clear_canvas_state()\n            st.session_state.last_image_dims = (width, height)\n            st.rerun()\n\n        try:\n            canvas_result = st_canvas(\n                fill_color=\"rgba(255, 255, 255, 0.0)\",\n                stroke_width=st.slider(\"Brush size\", 1, 500, 50),\n                stroke_color=\"#fff\",\n                background_image=image,\n                height=height,\n                width=width,\n                drawing_mode=\"freedraw\",\n                key=canvas_key,\n                display_toolbar=True,\n            )\n        except Exception as 
e:\n            st.error(f\"Error creating canvas: {e}\")\n            clear_canvas_state()\n            st.rerun()\n            return\n\n    # Sampling parameters\n    num_steps = int(st.number_input(\"Number of steps\", min_value=1, value=50))\n    guidance = float(st.number_input(\"Guidance\", min_value=1.0, value=30.0))\n    seed_str = st.text_input(\"Seed\")\n    if seed_str.isdecimal():\n        seed = int(seed_str)\n    else:\n        st.info(\"No seed set, using random seed\")\n        seed = None\n\n    save_samples = st.checkbox(\"Save samples?\", True)\n    add_sampling_metadata = st.checkbox(\"Add sampling parameters to metadata?\", True)\n\n    # Prompt input\n    prompt = st_keyup(\"Enter a prompt\", value=\"\", debounce=300, key=\"interactive_text\")\n\n    # Setup output path\n    output_name = os.path.join(output_dir, \"img_{idx}.jpg\")\n    if not os.path.exists(output_dir):\n        os.makedirs(output_dir)\n        idx = 0\n    else:\n        fns = [fn for fn in iglob(output_name.format(idx=\"*\")) if re.search(r\"img_[0-9]+\\.jpg$\", fn)]\n        idx = len(fns)\n\n    if st.button(\"Generate\"):\n        valid_input = False\n\n        if mode == \"Inpainting\" and canvas_result.image_data is not None:\n            valid_input = True\n            # Create mask from canvas\n            try:\n                mask = Image.fromarray(canvas_result.image_data)\n                mask = mask.getchannel(\"A\")  # Get alpha channel\n                mask_array = np.array(mask)\n                mask_array = (mask_array > 0).astype(np.uint8) * 255\n                mask = Image.fromarray(mask_array)\n                image_for_generation = image\n            except Exception as e:\n                st.error(f\"Error creating mask: {e}\")\n                return\n\n        elif mode == \"Outpainting\":\n            valid_input = True\n            # image_for_generation and mask are already set above\n\n        if not valid_input:\n            st.error(\"Please 
draw a mask or configure outpainting settings\")\n            return\n\n        # Create temporary files\n        with (\n            tempfile.NamedTemporaryFile(suffix=\".png\", delete=False) as tmp_img,\n            tempfile.NamedTemporaryFile(suffix=\".png\", delete=False) as tmp_mask,\n        ):\n            try:\n                image_for_generation.save(tmp_img.name)\n                mask.save(tmp_mask.name)\n            except Exception as e:\n                st.error(f\"Error saving temporary files: {e}\")\n                return\n\n            try:\n                # Generate inpainting/outpainting\n                rng = torch.Generator(device=\"cpu\")\n                if seed is None:\n                    seed = rng.seed()\n\n                print(f\"Generating with seed {seed}:\\n{prompt}\")\n                t0 = time.perf_counter()\n\n                x = get_noise(\n                    1,\n                    height,\n                    width,\n                    device=torch_device,\n                    dtype=torch.bfloat16,\n                    seed=seed,\n                )\n\n                if offload:\n                    t5, clip, ae = t5.to(torch_device), clip.to(torch_device), ae.to(torch_device)\n\n                inp = prepare_fill(\n                    t5,\n                    clip,\n                    x,\n                    prompt=prompt,\n                    ae=ae,\n                    img_cond_path=tmp_img.name,\n                    mask_path=tmp_mask.name,\n                )\n\n                timesteps = get_schedule(num_steps, inp[\"img\"].shape[1], shift=True)\n\n                if offload:\n                    t5, clip, ae = t5.cpu(), clip.cpu(), ae.cpu()\n                    torch.cuda.empty_cache()\n                    model = model.to(torch_device)\n\n                x = denoise_cache(model, **inp, timesteps=timesteps, guidance=guidance)\n\n                if offload:\n                    model.cpu()\n                    
torch.cuda.empty_cache()\n                    ae.decoder.to(x.device)\n\n                x = unpack(x.float(), height, width)\n                with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):\n                    x = ae.decode(x)\n\n                t1 = time.perf_counter()\n                print(f\"Done in {t1 - t0:.1f}s\")\n\n                # Process and display result\n                x = x.clamp(-1, 1)\n                x = embed_watermark(x.float())\n                x = rearrange(x[0], \"c h w -> h w c\")\n                img = Image.fromarray((127.5 * (x + 1.0)).cpu().byte().numpy())\n\n                nsfw_score = [x[\"score\"] for x in nsfw_classifier(img) if x[\"label\"] == \"nsfw\"][0]\n\n                if nsfw_score < NSFW_THRESHOLD:\n                    buffer = BytesIO()\n                    exif_data = Image.Exif()\n                    exif_data[ExifTags.Base.Software] = \"AI generated;inpainting;flux\"\n                    exif_data[ExifTags.Base.Make] = \"Black Forest Labs\"\n                    exif_data[ExifTags.Base.Model] = name\n                    if add_sampling_metadata:\n                        exif_data[ExifTags.Base.ImageDescription] = prompt\n                    img.save(buffer, format=\"jpeg\", exif=exif_data, quality=95, subsampling=0)\n\n                    img_bytes = buffer.getvalue()\n                    if save_samples:\n                        fn = output_name.format(idx=idx)\n                        print(f\"Saving {fn}\")\n                        with open(fn, \"wb\") as file:\n                            file.write(img_bytes)\n\n                    st.session_state[\"samples\"] = {\n                        \"prompt\": prompt,\n                        \"img\": img,\n                        \"seed\": seed,\n                        \"bytes\": img_bytes,\n                    }\n                else:\n                    st.warning(\"Your generated image may contain NSFW content.\")\n                    
st.session_state[\"samples\"] = None\n\n            except Exception as e:\n                st.error(f\"Error during generation: {e}\")\n                return\n            finally:\n                # Clean up temporary files\n                try:\n                    os.unlink(tmp_img.name)\n                    os.unlink(tmp_mask.name)\n                except Exception as e:\n                    print(f\"Error cleaning up temporary files: {e}\")\n\n    # Display results\n    samples = st.session_state.get(\"samples\", None)\n    if samples is not None:\n        st.image(samples[\"img\"], caption=samples[\"prompt\"])\n        col1, col2 = st.columns(2)\n        with col1:\n            st.download_button(\n                \"Download full-resolution\",\n                samples[\"bytes\"],\n                file_name=\"generated.jpg\",\n                mime=\"image/jpeg\",\n            )\n        with col2:\n            if st.button(\"Continue from this image\"):\n                # Store the generated image\n                new_image = samples[\"img\"]\n                # Clear ALL canvas state\n                clear_canvas_state()\n                if \"samples\" in st.session_state:\n                    del st.session_state[\"samples\"]\n                # Set as current image\n                st.session_state[\"current_image\"] = new_image\n                st.rerun()\n\n        st.write(f\"Seed: {samples['seed']}\")\n\n\nif __name__ == \"__main__\":\n    st.set_page_config(layout=\"wide\")\n    main()\n"
  },
  {
    "path": "flux-ToCa/docs/fill.md",
    "content": "## Models\n\nFLUX.1 Fill introduces advanced inpainting and outpainting capabilities. It allows for seamless edits that integrate naturally with existing images.\n\n| Name                | HuggingFace repo                                         | License                                                               | sha256sum                                                        |\n| ------------------- | -------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------- |\n| `FLUX.1 Fill [dev]` | https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | 03e289f530df51d014f48e675a9ffa2141bc003259bf5f25d75b957e920a41ca |\n| `FLUX.1 Fill [pro]` | Only available in our API.                               |\n\n## Examples\n\n![inpainting](../assets/docs/inpainting.png)\n![outpainting](../assets/docs/outpainting.png)\n\n## Open-weights usage\n\nThe weights will be downloaded automatically from HuggingFace once you start one of the demos. To download `FLUX.1 Fill [dev]`, you will need to be logged in, see [here](https://huggingface.co/docs/huggingface_hub/guides/cli#huggingface-cli-login). 
Alternatively, if you have downloaded the model weights manually from [here](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev), you can specify the downloaded paths via environment variables:\n\n```bash\nexport FLUX_DEV_FILL=<path_to_flux_dev_fill_sft_file>\nexport AE=<path_to_ae_sft_file>\n```\n\nFor interactive sampling run\n\n```bash\npython -m src.flux.cli_fill --loop\n```\n\nOr to generate a single sample run\n\n```bash\npython -m src.flux.cli_fill \\\n  --img_cond_path <path_to_input_image> \\\n  --img_mask_path <path_to_input_mask>\n```\n\nThe input_mask should be an image of the same size as the conditioning image that only contains black and white pixels; see [an example mask](../assets/cup_mask.png) for [this image](../assets/cup.png).\n\nWe also provide an interactive streamlit demo. The demo can be run via\n\n```bash\nstreamlit run demo_st_fill.py\n```\n"
  },
  {
    "path": "flux-ToCa/docs/image-variation.md",
    "content": "## Models\n\nFLUX.1 Redux is an adapter for the FLUX.1 text-to-image base models, FLUX.1 [dev] and FLUX.1 [schnell], which can be used to generate image variations. \nIn addition, FLUX.1 Redux [pro] is available in our API and, augmenting the [dev] adapter, the API endpoint allows users to modify an image given a textual description. The feature is supported in our latest model FLUX1.1 [pro] Ultra, allowing for combining input images and text prompts to create high-quality 4-megapixel outputs with flexible aspect ratios.\n\n| Name                        | HuggingFace repo                                                                                | License                                                               | sha256sum                                                        |\n| --------------------------- | ----------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------- |\n| `FLUX.1 Redux [dev]`        | https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev                                       | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | a1b3bdcb4bdc58ce04874b9ca776d61fc3e914bb6beab41efb63e4e2694dca45 |\n| `FLUX.1 Redux [pro]`        | [Available in our API.](https://docs.bfl.ml/) Supports image variations.                        |\n| `FLUX1.1 Redux [pro] Ultra` | [Available in our API.](https://docs.bfl.ml/) Supports image variations based on a text prompt. |\n\n## Examples\n\n![redux](../assets/docs/redux.png)\n\n## Open-weights usage\n\nThe text-to-image base model weights and the autoencoder weights will be downloaded automatically from HuggingFace once you start the demo. To download `FLUX.1 [dev]`, you will need to be logged in, see [here](https://huggingface.co/docs/huggingface_hub/guides/cli#huggingface-cli-login). 
You need to manually download the adapter weights from [here](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev) and specify them via an environment variable `export FLUX_REDUX=<path_to_flux_redux_sft_file>`. In general, you may specify any manually downloaded weights via environment variables:\n\n```bash\nexport FLUX_REDUX=<path_to_flux_redux_sft_file>\nexport FLUX_SCHNELL=<path_to_flux_schnell_sft_file>\nexport FLUX_DEV=<path_to_flux_dev_sft_file>\nexport AE=<path_to_ae_sft_file>\n```\n\nFor interactive sampling run\n\n```bash\npython -m src.flux.cli_redux --loop --name <name>\n```\n\nwhere `name` is one of `flux-dev` or `flux-schnell`.\n"
  },
  {
    "path": "flux-ToCa/docs/structural-conditioning.md",
    "content": "## Models\n\nStructural conditioning uses canny edge or depth detection to maintain precise control during image transformations. By preserving the original image's structure through edge or depth maps, users can make text-guided edits while keeping the core composition intact. This is particularly effective for retexturing images. We release four variations: two based on edge maps (full model and LoRA for FLUX.1 [dev]) and two based on depth maps (full model and LoRA for FLUX.1 [dev]).\n\n| Name                      | HuggingFace repo                                               | License                                                               | sha256sum                                                        |\n| ------------------------- | -------------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------- |\n| `FLUX.1 Canny [dev]`      | https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev      | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | 996876670169591cb412b937fbd46ea14cbed6933aef17c48a2dcd9685c98cdb |\n| `FLUX.1 Depth [dev]`      | https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev      | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | 41360d1662f44ca45bc1b665fe6387e91802f53911001630d970a4f8be8dac21 |\n| `FLUX.1 Canny [dev] LoRA` | https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev-lora | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | 8eaa21b9c43d5e7242844deb64b8cf22ae9010f813f955ca8c05f240b8a98f7e |\n| `FLUX.1 Depth [dev] LoRA` | https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev-lora | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | 1938b38ea0fdd98080fa3e48beb2bedfbc7ad102d8b65e6614de704a46d8b907 | \n| `FLUX.1 Canny [pro]`      | [Available in our API](https://docs.bfl.ml/).   
               |\n| `FLUX.1 Depth [pro]`      | [Available in our API](https://docs.bfl.ml/).                  |\n\n## Examples\n\n![canny](../assets/docs/canny.png)\n![depth](../assets/docs/depth.png)\n\n## Open-weights usage\n\nThe full model weights (`FLUX.1 Canny [dev]`, `FLUX.1 Depth [dev]`, `FLUX.1 [dev]`, and the autoencoder) will be downloaded automatically from HuggingFace once you start one of the demos. To download them, you will need to be logged in, see [here](https://huggingface.co/docs/huggingface_hub/guides/cli#huggingface-cli-login). The LoRA weights are not downloaded automatically, but can be downloaded manually [here (Canny)](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev-lora) and [here (Depth)](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev-lora). You may specify any manually downloaded weights via environment variables (**necessary for LoRAs**):\n\n```bash\nexport FLUX_DEV_DEPTH=<path_to_flux_dev_depth_sft_file>\nexport FLUX_DEV_CANNY=<path_to_flux_dev_canny_sft_file>\nexport FLUX_DEV_DEPTH_LORA=<path_to_flux_dev_depth_lora_sft_file>\nexport FLUX_DEV_CANNY_LORA=<path_to_flux_dev_canny_lora_sft_file>\nexport FLUX_REDUX=<path_to_flux_redux_sft_file>\nexport FLUX_SCHNELL=<path_to_flux_schnell_sft_file>\nexport FLUX_DEV=<path_to_flux_dev_sft_file>\nexport AE=<path_to_ae_sft_file>\n```\n\nFor interactive sampling run\n\n```bash\npython -m src.flux.cli_control --loop --name <name>\n```\n\nwhere `name` is one of `flux-dev-canny`, `flux-dev-depth`, `flux-dev-canny-lora`, or `flux-dev-depth-lora`.\n\n## Diffusers usage\n\nFlux Control (including the LoRAs) is also compatible with the `diffusers` Python library. Check out the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux) to learn more.\n"
  },
  {
    "path": "flux-ToCa/docs/text-to-image.md",
    "content": "## Models\n\nWe currently offer four text-to-image models. `FLUX1.1 [pro]` is our most capable model which can generate images at up to 4MP while maintaining an impressive generation time of only 10 seconds per sample.\n\n| Name                      | HuggingFace repo                                        | License                                                               | sha256sum                                                        |\n| ------------------------- | ------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------- |\n| `FLUX.1 [schnell]`        | https://huggingface.co/black-forest-labs/FLUX.1-schnell | [apache-2.0](model_licenses/LICENSE-FLUX1-schnell)                    | 9403429e0052277ac2a87ad800adece5481eecefd9ed334e1f348723621d2a0a |\n| `FLUX.1 [dev]`            | https://huggingface.co/black-forest-labs/FLUX.1-dev     | [FLUX.1-dev Non-Commercial License](model_licenses/LICENSE-FLUX1-dev) | 4610115bb0c89560703c892c59ac2742fa821e60ef5871b33493ba544683abd7 |\n| `FLUX.1 [pro]`            | [Available in our API](https://docs.bfl.ml/).           |\n| `FLUX1.1 [pro]`           | [Available in our API](https://docs.bfl.ml/).           |\n| `FLUX1.1 [pro] Ultra/raw` | [Available in our API](https://docs.bfl.ml/).           |\n\n## Open-weights usage\n\nThe weights will be downloaded automatically from HuggingFace once you start one of the demos. 
To download `FLUX.1 [dev]`, you will need to be logged in, see [here](https://huggingface.co/docs/huggingface_hub/guides/cli#huggingface-cli-login).\nIf you have downloaded the model weights manually, you can specify the downloaded paths via environment variables:\n\n```bash\nexport FLUX_SCHNELL=<path_to_flux_schnell_sft_file>\nexport FLUX_DEV=<path_to_flux_dev_sft_file>\nexport AE=<path_to_ae_sft_file>\n```\n\nFor interactive sampling run\n\n```bash\npython -m flux --name <name> --loop\n```\n\nOr to generate a single sample run\n\n```bash\npython -m flux --name <name> \\\n  --height <height> --width <width> \\\n  --prompt \"<prompt>\"\n```\n\nWe also provide a streamlit demo that does both text-to-image and image-to-image. The demo can be run via\n\n```bash\nstreamlit run demo_st.py\n```\n\nWe also offer a Gradio-based demo for an interactive experience. To run the Gradio demo:\n\n```bash\npython demo_gr.py --name flux-schnell --device cuda\n```\n\nOptions:\n\n- `--name`: Choose the model to use (options: \"flux-schnell\", \"flux-dev\")\n- `--device`: Specify the device to use (default: \"cuda\" if available, otherwise \"cpu\")\n- `--offload`: Offload model to CPU when not in use\n- `--share`: Create a public link to your demo\n\nTo run the demo with the dev model and create a public link:\n\n```bash\npython demo_gr.py --name flux-dev --share\n```\n\n## Diffusers integration\n\n`FLUX.1 [schnell]` and `FLUX.1 [dev]` are integrated with the [🧨 diffusers](https://github.com/huggingface/diffusers) library. 
To use it with diffusers, install it:\n\n```shell\npip install git+https://github.com/huggingface/diffusers.git\n```\n\nThen you can use `FluxPipeline` to run the model:\n\n```python\nimport torch\nfrom diffusers import FluxPipeline\n\nmodel_id = \"black-forest-labs/FLUX.1-schnell\"  # you can also use `black-forest-labs/FLUX.1-dev`\n\npipe = FluxPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)\npipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power\n\nprompt = \"A cat holding a sign that says hello world\"\nseed = 42\nimage = pipe(\n    prompt,\n    output_type=\"pil\",\n    num_inference_steps=4,  # use a larger number if you are using [dev]\n    generator=torch.Generator(\"cpu\").manual_seed(seed)\n).images[0]\nimage.save(\"flux-schnell.png\")\n```\n\nTo learn more, check out the [diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux) documentation.\n"
  },
  {
    "path": "flux-ToCa/model_cards/FLUX.1-dev.md",
    "content": "![FLUX.1 [dev] Grid](../assets/dev_grid.jpg)\n\n`FLUX.1 [dev]` is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.\nFor more information, please read our [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/).\n\n# Key Features\n1. Cutting-edge output quality, second only to our state-of-the-art model `FLUX.1 [pro]`.\n2. Competitive prompt following, matching the performance of closed source alternatives.\n3. Trained using guidance distillation, making `FLUX.1 [dev]` more efficient.\n4. Open weights to drive new scientific research, and empower artists to develop innovative workflows.\n5. Generated outputs can be used for personal, scientific, and commercial purposes, as described in the [flux-1-dev-non-commercial-license](./licence.md).\n\n# Usage\nWe provide a reference implementation of `FLUX.1 [dev]`, as well as sampling code, in a dedicated [github repository](https://github.com/black-forest-labs/flux).\nDevelopers and creatives looking to build on top of `FLUX.1 [dev]` are encouraged to use this as a starting point.\n\n## API Endpoints\nThe FLUX.1 models are also available via API from the following sources\n1. [bfl.ml](https://docs.bfl.ml/) (currently `FLUX.1 [pro]`)\n2. [replicate.com](https://replicate.com/collections/flux)\n3. 
[fal.ai](https://fal.ai/models/fal-ai/flux/dev)\n\n## ComfyUI\n`FLUX.1 [dev]` is also available in [Comfy UI](https://github.com/comfyanonymous/ComfyUI) for local inference with a node-based workflow.\n\n---\n# Limitations\n- This model is not intended or able to provide factual information.\n- As a statistical model this checkpoint might amplify existing societal biases.\n- The model may fail to generate output that matches the prompts.\n- Prompt following is heavily influenced by the prompting-style.\n\n# Out-of-Scope Use\nThe model and its derivatives may not be used\n\n- In any way that violates any applicable national, federal, state, local or international law or regulation.\n- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content.\n- To generate or disseminate verifiably false information and/or content with the purpose of harming others.\n- To generate or disseminate personal identifiable information that can be used to harm an individual.\n- To harass, abuse, threaten, stalk, or bully individuals or groups of individuals.\n- To create non-consensual nudity or illegal pornographic content.\n- For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation.\n- Generating or facilitating large-scale disinformation campaigns.\n\n# License\nThis model falls under the [`FLUX.1 [dev]` Non-Commercial License](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md).\n"
  },
  {
    "path": "flux-ToCa/model_cards/FLUX.1-schnell.md",
    "content": "![FLUX.1 [schnell] Grid](../assets/schnell_grid.jpg)\n\n`FLUX.1 [schnell]` is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.\nFor more information, please read our [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/).\n\n# Key Features\n1. Cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives.\n2. Trained using latent adversarial diffusion distillation, `FLUX.1 [schnell]` can generate high-quality images in only 1 to 4 steps.\n3. Released under the `apache-2.0` licence, the model can be used for personal, scientific, and commercial purposes.\n\n# Usage\nWe provide a reference implementation of `FLUX.1 [schnell]`, as well as sampling code, in a dedicated [github repository](https://github.com/black-forest-labs/flux).\nDevelopers and creatives looking to build on top of `FLUX.1 [schnell]` are encouraged to use this as a starting point.\n\n## API Endpoints\nThe FLUX.1 models are also available via API from the following sources\n1. [bfl.ml](https://docs.bfl.ml/) (currently `FLUX.1 [pro]`)\n2. [replicate.com](https://replicate.com/collections/flux)\n3. 
[fal.ai](https://fal.ai/models/fal-ai/flux/schnell)\n\n## ComfyUI\n`FLUX.1 [schnell]` is also available in [Comfy UI](https://github.com/comfyanonymous/ComfyUI) for local inference with a node-based workflow.\n\n---\n# Limitations\n- This model is not intended or able to provide factual information.\n- As a statistical model this checkpoint might amplify existing societal biases.\n- The model may fail to generate output that matches the prompts.\n- Prompt following is heavily influenced by the prompting-style.\n\n# Out-of-Scope Use\nThe model and its derivatives may not be used\n\n- In any way that violates any applicable national, federal, state, local or international law or regulation.\n- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content.\n- To generate or disseminate verifiably false information and/or content with the purpose of harming others.\n- To generate or disseminate personal identifiable information that can be used to harm an individual.\n- To harass, abuse, threaten, stalk, or bully individuals or groups of individuals.\n- To create non-consensual nudity or illegal pornographic content.\n- For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation.\n- Generating or facilitating large-scale disinformation campaigns.\n"
  },
  {
    "path": "flux-ToCa/model_licenses/LICENSE-FLUX1-dev",
    "content": "FLUX.1 [dev] Non-Commercial License \nBlack Forest Labs, Inc. (“we” or “our” or “Company”) is pleased to make available the weights, parameters and inference code for the FLUX.1 [dev] Model (as defined below) freely available for your non-commercial and non-production use as set forth in this FLUX.1 [dev] Non-Commercial License (“License”).  The “FLUX.1 [dev] Model” means the FLUX.1 [dev] AI models, including FLUX.1 [dev], FLUX.1 Fill [dev], FLUX.1 Depth [dev], FLUX.1 Canny [dev], FLUX.1 Redux [dev], FLUX.1 Canny [dev] LoRA and FLUX.1 Depth [dev] LoRA, and their elements which includes algorithms, software, checkpoints, parameters, source code (inference code, evaluation code, and if applicable, fine-tuning code) and any other materials associated with the FLUX.1 [dev] AI models made available by Company under this License, including if any, the technical documentation, manuals and instructions for the use and operation thereof (collectively, “FLUX.1 [dev] Model”).\nBy downloading, accessing, use, Distributing (as defined below), or creating a Derivative (as defined below) of the FLUX.1 [dev] Model, you agree to the terms of this License. If you do not agree to this License, then you do not have any rights to access, use, Distribute or create a Derivative of the FLUX.1 [dev] Model and you must immediately cease using the FLUX.1 [dev] Model. If you are agreeing to be bound by the terms of this License on behalf of your employer or other entity, you represent and warrant to us that you have full legal authority to bind your employer or such entity to this License. If you do not have the requisite authority, you may not accept the License or access the FLUX.1 [dev] Model on behalf of your employer or other entity.\n    1. Definitions. Capitalized terms used in this License but not defined herein have the following meanings:\n        a. 
“Derivative”  means any (i) modified version of the FLUX.1 [dev] Model (including but not limited to any customized or fine-tuned version thereof), (ii) work based on the FLUX.1 [dev] Model, or (iii) any other derivative work thereof. For the avoidance of doubt, Outputs are not considered Derivatives under this License. \n        b. “Distribution” or “Distribute” or “Distributing” means providing or making available, by any means, a copy of the FLUX.1 [dev] Models and/or the Derivatives as the case may be. \n        c. “Non-Commercial Purpose” means any of the following uses, but only so far as you do not receive any direct or indirect payment arising from the use of the model or its output: (i) personal use for research, experiment, and testing for the benefit of public knowledge, personal study, private entertainment, hobby projects, or otherwise not directly or indirectly connected to any commercial activities, business operations, or employment responsibilities; (ii) use by commercial or for-profit entities for testing, evaluation, or non-commercial research and development in a non-production environment, (iii) use by any charitable organization for charitable purposes, or for testing or evaluation. For clarity, use for revenue-generating activity or direct interactions with or impacts on end users, or use to train, fine tune or distill other models for commercial use is not a Non-Commercial purpose.\n        d. “Outputs” means any content generated by the operation of the FLUX.1 [dev] Models or the Derivatives from a prompt (i.e., text instructions) provided by users. For the avoidance of doubt, Outputs do not include any components of a FLUX.1 [dev] Models, such as any fine-tuned versions of the FLUX.1 [dev] Models, the weights, or parameters. \n        e.   “you” or “your” means the individual or entity entering into this License with Company.\n    2. License Grant.\n        a. License. 
Subject to your compliance with this License, Company grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license to access, use, create Derivatives of, and Distribute the FLUX.1 [dev] Models solely for your Non-Commercial Purposes. The foregoing license is personal to you, and you may not assign or sublicense this License or any other rights or obligations under this License without Company’s prior written consent; any such assignment or sublicense will be void and will automatically and immediately terminate this License.  Any restrictions set forth herein in regarding the FLUX.1 [dev] Model also applies to any Derivative you create or that are created on your behalf.\n        b. Non-Commercial Use Only.  You may only access, use, Distribute, or creative Derivatives of or the FLUX.1 [dev] Model or Derivatives for Non-Commercial Purposes.  If You want to use a FLUX.1 [dev] Model a Derivative for any purpose that is not expressly authorized under this License, such as for a commercial activity, you must request a license from Company, which Company may grant to you in Company’s sole discretion and which additional use may be subject to a fee, royalty or other revenue share. Please contact Company at the following e-mail address if you want to discuss such a license: info@blackforestlabs.ai. \n        c. Reserved Rights. The grant of rights expressly set forth in this License are the complete grant of rights to you in the FLUX.1 [dev] Model, and no other licenses are granted, whether by waiver, estoppel, implication, equity or otherwise. Company and its licensors reserve all rights not expressly granted by this License. \n        d. Outputs. We claim no ownership rights in and to the Outputs. You are solely responsible for the Outputs you generate and their subsequent uses in accordance with this License.  
You may use Output for any purpose (including for commercial purposes), except as expressly prohibited herein.  You may not use the Output to train, fine-tune or distill a model that is competitive with the FLUX.1 [dev] Model.\n    3. Distribution. Subject to this License, you may Distribute copies of the FLUX.1 [dev] Model and/or Derivatives made by you, under the following conditions: \n        a. you must make available a copy of this License to third-party recipients of the FLUX.1 [dev] Models and/or Derivatives you Distribute, and specify that any rights to use the FLUX.1 [dev] Models and/or Derivatives shall be directly granted by Company to said third-party recipients pursuant to this License; \n        b. you must make prominently display the following notice alongside the Distribution of the FLUX.1 [dev] Model or Derivative (such as via a “Notice” text file distributed as part of such FLUX.1 [dev] Model or Derivative) (the “Attribution Notice”): \n“The FLUX.1 [dev] Model is licensed by Black Forest Labs. Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs. Inc. \nIN NO EVENT SHALL BLACK FOREST LABS, INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.”\n        c. in the case of Distribution of Derivatives made by you, you must also include in the Attribution Notice a statement that you have modified the applicable FLUX.1 [dev] Model; and\n        d. in the case of Distribution of Derivatives made by you, any terms and conditions you impose on any third-party recipients relating to Derivatives made by or for you shall neither limit such third-party recipients’ use of the FLUX.1 [dev] Model or any Derivatives made by or for Company in accordance with this License nor conflict with any of its terms and conditions. \n        e. 
In the case of Distribution of Derivatives made by you, you must not misrepresent or imply, through any means, that the Derivatives made by or for you and/or any modified version of the FLUX.1 [dev] Model you Distribute under your name and responsibility is an official product of the Company or has been endorsed, approved or validated by the Company, unless you are authorized by Company to do so in writing.\n    4. Restrictions.  You will not, and will not permit, assist or cause any third party to \n        a. use, modify, copy, reproduce, create Derivatives of, or Distribute the FLUX.1 [dev] Model (or any Derivative thereof, or any data produced by the FLUX.1 [dev] Model), in whole or in part, for (i) any commercial or production purposes, (ii) military purposes, (iii) purposes of surveillance, including any research or development relating to surveillance, (iv) biometric processing, (v) in any manner that infringes, misappropriates, or otherwise violates any third-party rights, or (vi) in any manner that violates any applicable law and violating any privacy or security laws, rules, regulations, directives, or governmental requirements (including the General Data Privacy Regulation (Regulation (EU) 2016/679), the California Consumer Privacy Act, and any and all laws governing the processing of biometric information), as well as all amendments and successor laws to any of the foregoing;\n        b. alter or remove copyright and other proprietary notices which appear on or in any portion of the FLUX.1 [dev] Model;\n        c. utilize any equipment, device, software, or other means to circumvent or remove any security or protection used by Company in connection with the FLUX.1 [dev] Model, or to circumvent or remove any usage restrictions, or to enable functionality disabled by FLUX.1 [dev] Model; or\n        d. offer or impose any terms on the FLUX.1 [dev] Model that alter, restrict, or are inconsistent with the terms of this License.\n        e. 
violate any applicable U.S. and non-U.S. export control and trade sanctions laws (“Export Laws”) in connection with your use or Distribution of any FLUX.1 [dev] Model;\n        f. directly or indirectly Distribute, export, or otherwise transfer FLUX.1 [dev] Model  (a) to any individual, entity, or country prohibited by Export Laws; (b) to anyone on U.S. or non-U.S. government restricted parties lists; or (c) for any purpose prohibited by Export Laws, including nuclear, chemical or biological weapons, or missile technology applications; 3) use or download FLUX.1 [dev] Model if you or they are  (a) located in a comprehensively sanctioned jurisdiction, (b) currently listed on any U.S. or non-U.S. restricted parties list, or (c) for any purpose prohibited by Export Laws; and (4) will not disguise your location through IP proxying or other methods.\n    5. DISCLAIMERS.  THE FLUX.1 [dev] MODEL IS PROVIDED “AS IS” AND “WITH ALL FAULTS” WITH NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. COMPANY EXPRESSLY DISCLAIMS ALL REPRESENTATIONS AND WARRANTIES, EXPRESS OR IMPLIED, WHETHER BY STATUTE, CUSTOM, USAGE OR OTHERWISE AS TO ANY MATTERS RELATED TO THE FLUX.1 [dev] MODEL, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, SATISFACTORY QUALITY, OR NON-INFRINGEMENT. COMPANY MAKES NO WARRANTIES OR REPRESENTATIONS THAT THE FLUX.1 [dev] MODEL WILL BE ERROR FREE OR FREE OF VIRUSES OR OTHER HARMFUL COMPONENTS, OR PRODUCE ANY PARTICULAR RESULTS.\n    6. LIMITATION OF LIABILITY.  
TO THE FULLEST EXTENT PERMITTED BY LAW, IN NO EVENT WILL COMPANY BE LIABLE TO YOU OR YOUR EMPLOYEES, AFFILIATES, USERS, OFFICERS OR DIRECTORS (A) UNDER ANY THEORY OF LIABILITY, WHETHER BASED IN CONTRACT, TORT, NEGLIGENCE, STRICT LIABILITY, WARRANTY, OR OTHERWISE UNDER THIS LICENSE, OR (B) FOR ANY INDIRECT, CONSEQUENTIAL, EXEMPLARY, INCIDENTAL, PUNITIVE OR SPECIAL DAMAGES OR LOST PROFITS, EVEN IF COMPANY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE FLUX.1 [dev] MODEL, ITS CONSTITUENT COMPONENTS, AND ANY OUTPUT (COLLECTIVELY, “MODEL MATERIALS”) ARE NOT DESIGNED OR INTENDED FOR USE IN ANY APPLICATION OR SITUATION WHERE FAILURE OR FAULT OF THE MODEL MATERIALS COULD REASONABLY BE ANTICIPATED TO LEAD TO SERIOUS INJURY OF ANY PERSON, INCLUDING POTENTIAL DISCRIMINATION OR VIOLATION OF AN INDIVIDUAL’S PRIVACY RIGHTS, OR TO SEVERE PHYSICAL, PROPERTY, OR ENVIRONMENTAL DAMAGE (EACH, A “HIGH-RISK USE”). IF YOU ELECT TO USE ANY OF THE MODEL MATERIALS FOR A HIGH-RISK USE, YOU DO SO AT YOUR OWN RISK. YOU AGREE TO DESIGN AND IMPLEMENT APPROPRIATE DECISION-MAKING AND RISK-MITIGATION PROCEDURES AND POLICIES IN CONNECTION WITH A HIGH-RISK USE SUCH THAT EVEN IF THERE IS A FAILURE OR FAULT IN ANY OF THE MODEL MATERIALS, THE SAFETY OF PERSONS OR PROPERTY AFFECTED BY THE ACTIVITY STAYS AT A LEVEL THAT IS REASONABLE, APPROPRIATE, AND LAWFUL FOR THE FIELD OF THE HIGH-RISK USE.\n    7. 
INDEMNIFICATION\n\nYou will indemnify, defend and hold harmless Company and our subsidiaries and affiliates, and each of our respective shareholders, directors, officers, employees, agents, successors, and assigns (collectively, the “Company Parties”) from and against any losses, liabilities, damages, fines, penalties, and expenses (including reasonable attorneys’ fees) incurred by any Company Party in connection with any claim, demand, allegation, lawsuit, proceeding, or investigation (collectively, “Claims”) arising out of or related to  (a) your access to or use of the FLUX.1 [dev] Model (as well as any Output, results or data generated from such access or use), including any High-Risk Use (defined below); (b) your violation of this License; or (c) your violation, misappropriation or infringement of any rights of another (including intellectual property or other proprietary rights and privacy rights). You will promptly notify the Company Parties of any such Claims, and cooperate with Company Parties in defending such Claims. You will also grant the Company Parties sole control of the defense or settlement, at Company’s sole option, of any Claims. This indemnity is in addition to, and not in lieu of, any other indemnities or remedies set forth in a written agreement between you and Company or the other Company Parties.\n    8. Termination; Survival.\n        a. This License will automatically terminate upon any breach by you of the terms of this License.\n        b. We may terminate this License, in whole or in part, at any time upon notice (including electronic) to you.\n        c. 
If You initiate any legal action or proceedings against Company or any other entity (including a cross-claim or counterclaim in a lawsuit), alleging that the FLUX.1 [dev] Model or any Derivative, or any part thereof, infringe upon intellectual property or other rights owned or licensable by you, then any licenses granted to you under this License will immediately terminate as of the date such legal action or claim is filed or initiated.\n        d. Upon termination of this License, you must cease all use, access or Distribution of the FLUX.1 [dev] Model and any Derivatives.  The following sections survive termination of this License  2(c), 2(d), 4-11.  \n    9. Third Party Materials. The FLUX.1 [dev] Model may contain third-party software or other components (including free and open source software) (all of the foregoing, “Third Party Materials”), which are subject to the license terms of the respective third-party licensors. Your dealings or correspondence with third parties and your use of or interaction with any Third Party Materials are solely between you and the third party. Company does not control or endorse, and makes no representations or warranties regarding, any Third Party Materials, and your access to and use of such Third Party Materials are at your own risk.\n    10. Trademarks. You have not been granted any trademark license as part of this License and may not use any name or mark associated with Company without the prior written permission of Company, except to the extent necessary to make the reference required in the Attribution Notice as specified above or as is reasonably necessary in describing the FLUX.1 [dev] Model and its creators.  \n    11. General. This License will be governed and construed under the laws of the State of Delaware without regard to conflicts of law provisions. 
If any provision or part of a provision of this License is unlawful, void or unenforceable, that provision or part of the provision is deemed severed from this License, and will not affect the validity and enforceability of any remaining provisions. The failure of Company to exercise or enforce any right or provision of this License will not operate as a waiver of such right or provision. This License does not confer any third-party beneficiary rights upon any other person or entity. This License, together with the Documentation, contains the entire understanding between you and Company regarding the subject matter of this License, and supersedes all other written or oral agreements and understandings between you and Company regarding such subject matter. No change or addition to any provision of this License will be binding unless it is in writing and signed by an authorized representative of both you and Company."
  },
  {
    "path": "flux-ToCa/model_licenses/LICENSE-FLUX1-schnell",
    "content": "\n\nApache License\nVersion 2.0, January 2004\nhttp://www.apache.org/licenses/\n\nTERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n1. Definitions.\n\n\"License\" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.\n\n\"Licensor\" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.\n\n\"Legal Entity\" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, \"control\" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.\n\n\"You\" (or \"Your\") shall mean an individual or Legal Entity exercising permissions granted by this License.\n\n\"Source\" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.\n\n\"Object\" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.\n\n\"Work\" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).\n\n\"Derivative Works\" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. 
For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.\n\n\"Contribution\" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, \"submitted\" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as \"Not a Contribution.\"\n\n\"Contributor\" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.\n\n2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.\n\n3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.\n\n4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:\n\n    You must give any other recipients of the Work or Derivative Works a copy of this License; and\n    You must cause any modified files to carry prominent notices stating that You changed the files; and\n    You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and\n    If the Work includes a \"NOTICE\" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part 
of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.\n\nYou may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.\n\n5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.\n\n6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.\n\n7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.\n\n8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.\n\n9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.\n\nEND OF TERMS AND CONDITIONS\n"
  },
  {
    "path": "flux-ToCa/pyproject.toml",
    "content": "[project]\nname = \"flux\"\nauthors = [\n  { name = \"Black Forest Labs\", email = \"support@blackforestlabs.ai\" },\n]\ndescription = \"Inference codebase for FLUX\"\nreadme = \"README.md\"\nrequires-python = \">=3.10\"\nlicense = { file = \"LICENSE.md\" }\ndynamic = [\"version\"]\ndependencies = [\n  \"torch == 2.5.1\",\n  \"torchvision\",\n  \"einops\",\n  \"fire >= 0.6.0\",\n  \"huggingface-hub\",\n  \"safetensors\",\n  \"sentencepiece\",\n  \"transformers\",\n  \"tokenizers\",\n  \"protobuf\",\n  \"requests\",\n  \"invisible-watermark\",\n  \"ruff == 0.6.8\",\n]\n\n[project.optional-dependencies]\nstreamlit = [\n  \"streamlit\",\n  \"streamlit-drawable-canvas\",\n  \"streamlit-keyup\",\n]\ngradio = [\n  \"gradio\",\n]\nall = [\n  \"flux[streamlit]\",\n  \"flux[gradio]\",\n]\n\n[project.scripts]\nflux = \"flux.cli:app\"\n\n[build-system]\nbuild-backend = \"setuptools.build_meta\"\nrequires = [\"setuptools>=64\", \"wheel\", \"setuptools_scm>=8\"]\n\n[tool.ruff]\nline-length = 110\ntarget-version = \"py310\"\nextend-exclude = [\"/usr/lib/*\"]\n\n[tool.ruff.lint]\nignore = [\n  \"E501\", # line too long - will be fixed in format\n]\n\n[tool.ruff.format]\nquote-style = \"double\"\nindent-style = \"space\"\nline-ending = \"auto\"\nskip-magic-trailing-comma = false\ndocstring-code-format = true\nexclude = [\n  \"src/flux/_version.py\", # generated by setuptools_scm\n]\n\n[tool.ruff.lint.isort]\ncombine-as-imports = true\nforce-wrap-aliases = true\nknown-local-folder = [\"src\"]\nknown-first-party = [\"flux\"]\n\n[tool.pyright]\ninclude = [\"src\"]\nexclude = [\n  \"**/__pycache__\", # cache directories\n  \"./typings\",      # generated type stubs\n]\nstubPath = \"./typings\"\n\n[tool.tomlsort]\nin_place = true\nno_sort_tables = true\nspaces_before_inline_comment = 1\nspaces_indent_inline_array = 2\ntrailing_comma_inline_array = true\nsort_first = [\n  \"project\",\n  \"build-system\",\n  \"tool.setuptools\",\n]\n\n# needs to be last for CI 
reasons\n[tool.setuptools_scm]\nwrite_to = \"src/flux/_version.py\"\nparentdir_prefix_version = \"flux-\"\nfallback_version = \"0.0.0\"\nversion_scheme = \"post-release\"\n"
  },
  {
    "path": "flux-ToCa/setup.py",
    "content": "import setuptools\n\nsetuptools.setup()\n"
  },
  {
    "path": "flux-ToCa/src/flux/__init__.py",
    "content": "try:\n    from ._version import (\n        version as __version__,  # type: ignore\n        version_tuple,\n    )\nexcept ImportError:\n    __version__ = \"unknown (no version information available)\"\n    version_tuple = (0, 0, \"unknown\", \"noinfo\")\n\nfrom pathlib import Path\n\nPACKAGE = __package__.replace(\"_\", \"-\")\nPACKAGE_ROOT = Path(__file__).parent\n"
  },
  {
    "path": "flux-ToCa/src/flux/__main__.py",
    "content": "from .cli import app\n\nif __name__ == \"__main__\":\n    app()\n"
  },
  {
    "path": "flux-ToCa/src/flux/_version.py",
    "content": "# file generated by setuptools_scm\n# don't change, don't track in version control\nTYPE_CHECKING = False\nif TYPE_CHECKING:\n    from typing import Tuple, Union\n    VERSION_TUPLE = Tuple[Union[int, str], ...]\nelse:\n    VERSION_TUPLE = object\n\nversion: str\n__version__: str\n__version_tuple__: VERSION_TUPLE\nversion_tuple: VERSION_TUPLE\n\n__version__ = version = '0.0.post49+gd06f828.d20250206'\n__version_tuple__ = version_tuple = (0, 0, 'gd06f828.d20250206')\n"
  },
  {
    "path": "flux-ToCa/src/flux/api.py",
    "content": "import io\nimport os\nimport time\nfrom pathlib import Path\n\nimport requests\nfrom PIL import Image\n\nAPI_URL = \"https://api.bfl.ml\"\nAPI_ENDPOINTS = {\n    \"flux.1-pro\": \"flux-pro\",\n    \"flux.1-dev\": \"flux-dev\",\n    \"flux.1.1-pro\": \"flux-pro-1.1\",\n}\n\n\nclass ApiException(Exception):\n    def __init__(self, status_code: int, detail: str | list[dict] | None = None):\n        super().__init__()\n        self.detail = detail\n        self.status_code = status_code\n\n    def __str__(self) -> str:\n        return self.__repr__()\n\n    def __repr__(self) -> str:\n        if self.detail is None:\n            message = None\n        elif isinstance(self.detail, str):\n            message = self.detail\n        else:\n            message = \"[\" + \",\".join(d[\"msg\"] for d in self.detail) + \"]\"\n        return f\"ApiException({self.status_code=}, {message=}, detail={self.detail})\"\n\n\nclass ImageRequest:\n    def __init__(\n        self,\n        # api inputs\n        prompt: str,\n        name: str = \"flux.1.1-pro\",\n        width: int | None = None,\n        height: int | None = None,\n        num_steps: int | None = None,\n        prompt_upsampling: bool | None = None,\n        seed: int | None = None,\n        guidance: float | None = None,\n        interval: float | None = None,\n        safety_tolerance: int | None = None,\n        # behavior of this class\n        validate: bool = True,\n        launch: bool = True,\n        api_key: str | None = None,\n    ):\n        \"\"\"\n        Manages an image generation request to the API.\n\n        All parameters not specified will use the API defaults.\n\n        Args:\n            prompt: Text prompt for image generation.\n            width: Width of the generated image in pixels. Must be a multiple of 32.\n            height: Height of the generated image in pixels. 
Must be a multiple of 32.\n            name: Which model version to use\n            num_steps: Number of steps for the image generation process.\n            prompt_upsampling: Whether to perform upsampling on the prompt.\n            seed: Optional seed for reproducibility.\n            guidance: Guidance scale for image generation.\n            safety_tolerance: Tolerance level for input and output moderation.\n                 Between 0 and 6, 0 being most strict, 6 being least strict.\n            validate: Run input validation\n            launch: Directly launches request\n            api_key: Your API key if not provided by the environment\n\n        Raises:\n            ValueError: For invalid input, when `validate` is set\n            ApiException: For errors raised from the API\n        \"\"\"\n        if validate:\n            if name not in API_ENDPOINTS.keys():\n                raise ValueError(f\"Invalid model {name}\")\n            elif width is not None and width % 32 != 0:\n                raise ValueError(f\"width must be divisible by 32, got {width}\")\n            elif width is not None and not (256 <= width <= 1440):\n                raise ValueError(f\"width must be between 256 and 1440, got {width}\")\n            elif height is not None and height % 32 != 0:\n                raise ValueError(f\"height must be divisible by 32, got {height}\")\n            elif height is not None and not (256 <= height <= 1440):\n                raise ValueError(f\"height must be between 256 and 1440, got {height}\")\n            elif num_steps is not None and not (1 <= num_steps <= 50):\n                raise ValueError(f\"steps must be between 1 and 50, got {num_steps}\")\n            elif guidance is not None and not (1.5 <= guidance <= 5.0):\n                raise ValueError(f\"guidance must be between 1.5 and 5, got {guidance}\")\n            elif interval is not None and not (1.0 <= interval <= 4.0):\n                raise ValueError(f\"interval must be 
between 1 and 4, got {interval}\")\n            elif safety_tolerance is not None and not (0 <= safety_tolerance <= 6.0):\n                raise ValueError(f\"safety_tolerance must be between 0 and 6, got {safety_tolerance}\")\n\n            if name == \"flux.1-dev\":\n                if interval is not None:\n                    raise ValueError(\"Interval is not supported for flux.1-dev\")\n            if name == \"flux.1.1-pro\":\n                if interval is not None or num_steps is not None or guidance is not None:\n                    raise ValueError(\"Interval, num_steps and guidance are not supported for flux.1.1-pro\")\n\n        self.name = name\n        self.request_json = {\n            \"prompt\": prompt,\n            \"width\": width,\n            \"height\": height,\n            \"steps\": num_steps,\n            \"prompt_upsampling\": prompt_upsampling,\n            \"seed\": seed,\n            \"guidance\": guidance,\n            \"interval\": interval,\n            \"safety_tolerance\": safety_tolerance,\n        }\n        self.request_json = {key: value for key, value in self.request_json.items() if value is not None}\n\n        self.request_id: str | None = None\n        self.result: dict | None = None\n        self._image_bytes: bytes | None = None\n        self._url: str | None = None\n        if api_key is None:\n            self.api_key = os.environ.get(\"BFL_API_KEY\")\n        else:\n            self.api_key = api_key\n\n        if launch:\n            self.request()\n\n    def request(self):\n        \"\"\"\n        Request to generate the image.\n        \"\"\"\n        if self.request_id is not None:\n            return\n        response = requests.post(\n            f\"{API_URL}/v1/{API_ENDPOINTS[self.name]}\",\n            headers={\n                \"accept\": \"application/json\",\n                \"x-key\": self.api_key,\n                \"Content-Type\": \"application/json\",\n            },\n            json=self.request_json,\n 
       )\n        result = response.json()\n        if response.status_code != 200:\n            raise ApiException(status_code=response.status_code, detail=result.get(\"detail\"))\n        self.request_id = response.json()[\"id\"]\n\n    def retrieve(self) -> dict:\n        \"\"\"\n        Wait for the generation to finish and retrieve response.\n        \"\"\"\n        if self.request_id is None:\n            self.request()\n        while self.result is None:\n            response = requests.get(\n                f\"{API_URL}/v1/get_result\",\n                headers={\n                    \"accept\": \"application/json\",\n                    \"x-key\": self.api_key,\n                },\n                params={\n                    \"id\": self.request_id,\n                },\n            )\n            result = response.json()\n            if \"status\" not in result:\n                raise ApiException(status_code=response.status_code, detail=result.get(\"detail\"))\n            elif result[\"status\"] == \"Ready\":\n                self.result = result[\"result\"]\n            elif result[\"status\"] == \"Pending\":\n                time.sleep(0.5)\n            else:\n                raise ApiException(status_code=200, detail=f\"API returned status '{result['status']}'\")\n        return self.result\n\n    @property\n    def bytes(self) -> bytes:\n        \"\"\"\n        Generated image as bytes.\n        \"\"\"\n        if self._image_bytes is None:\n            response = requests.get(self.url)\n            if response.status_code == 200:\n                self._image_bytes = response.content\n            else:\n                raise ApiException(status_code=response.status_code)\n        return self._image_bytes\n\n    @property\n    def url(self) -> str:\n        \"\"\"\n        Public url to retrieve the image from\n        \"\"\"\n        if self._url is None:\n            result = self.retrieve()\n            self._url = result[\"sample\"]\n        
return self._url\n\n    @property\n    def image(self) -> Image.Image:\n        \"\"\"\n        Load the image as a PIL Image\n        \"\"\"\n        return Image.open(io.BytesIO(self.bytes))\n\n    def save(self, path: str):\n        \"\"\"\n        Save the generated image to a local path\n        \"\"\"\n        suffix = Path(self.url).suffix\n        if not path.endswith(suffix):\n            path = path + suffix\n        Path(path).resolve().parent.mkdir(parents=True, exist_ok=True)\n        with open(path, \"wb\") as file:\n            file.write(self.bytes)\n\n\nif __name__ == \"__main__\":\n    from fire import Fire\n\n    Fire(ImageRequest)\n"
  },
  {
    "path": "flux-ToCa/src/flux/cli.py",
    "content": "import os\nimport re\nimport time\nfrom dataclasses import dataclass\nfrom glob import iglob\n\nimport torch\nfrom fire import Fire\nfrom transformers import pipeline\n\nfrom flux.sampling import denoise, get_noise, get_schedule, prepare, unpack\nfrom flux.ideas import denoise_cache\nfrom flux.util import configs, load_ae, load_clip, load_flow_model, load_t5, save_image\n\nNSFW_THRESHOLD = 0.85\n\n\n@dataclass\nclass SamplingOptions:\n    prompt: str\n    width: int\n    height: int\n    num_steps: int\n    guidance: float\n    seed: int | None\n\n\ndef parse_prompt(options: SamplingOptions) -> SamplingOptions | None:\n    user_question = \"Next prompt (write /h for help, /q to quit and leave empty to repeat):\\n\"\n    usage = (\n        \"Usage: Either write your prompt directly, leave this field empty \"\n        \"to repeat the prompt or write a command starting with a slash:\\n\"\n        \"- '/w <width>' will set the width of the generated image\\n\"\n        \"- '/h <height>' will set the height of the generated image\\n\"\n        \"- '/s <seed>' sets the next seed\\n\"\n        \"- '/g <guidance>' sets the guidance (flux-dev only)\\n\"\n        \"- '/n <steps>' sets the number of steps\\n\"\n        \"- '/q' to quit\"\n    )\n\n    while (prompt := input(user_question)).startswith(\"/\"):\n        if prompt.startswith(\"/w\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, width = prompt.split()\n            options.width = 16 * (int(width) // 16)\n            print(\n                f\"Setting resolution to {options.width} x {options.height} \"\n                f\"({options.height *options.width/1e6:.2f}MP)\"\n            )\n        elif prompt.startswith(\"/h\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, height = 
prompt.split()\n            options.height = 16 * (int(height) // 16)\n            print(\n                f\"Setting resolution to {options.width} x {options.height} \"\n                f\"({options.height *options.width/1e6:.2f}MP)\"\n            )\n        elif prompt.startswith(\"/g\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, guidance = prompt.split()\n            options.guidance = float(guidance)\n            print(f\"Setting guidance to {options.guidance}\")\n        elif prompt.startswith(\"/s\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, seed = prompt.split()\n            options.seed = int(seed)\n            print(f\"Setting seed to {options.seed}\")\n        elif prompt.startswith(\"/n\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, steps = prompt.split()\n            options.num_steps = int(steps)\n            print(f\"Setting number of steps to {options.num_steps}\")\n        elif prompt.startswith(\"/q\"):\n            print(\"Quitting\")\n            return None\n        else:\n            if not prompt.startswith(\"/h\"):\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n            print(usage)\n    if prompt != \"\":\n        options.prompt = prompt\n    return options\n\n\n@torch.inference_mode()\ndef main(\n    name: str = \"flux-schnell\",\n    width: int = 1360,\n    height: int = 768,\n    seed: int | None = None,\n    prompt: str = (\n        \"a photo of a forest with mist swirling around the tree trunks. 
The word \"\n        '\"FLUX\" is painted over it in big, red brush strokes with visible texture'\n    ),\n    device: str = \"cuda\" if torch.cuda.is_available() else \"cpu\",\n    num_steps: int | None = None,\n    loop: bool = False,\n    guidance: float = 3.5,\n    offload: bool = False,\n    output_dir: str = \"output\",\n    add_sampling_metadata: bool = True,\n):\n    \"\"\"\n    Sample the flux model. Either interactively (set `--loop`) or run for a\n    single image.\n\n    Args:\n        name: Name of the model to load\n        height: height of the sample in pixels (should be a multiple of 16)\n        width: width of the sample in pixels (should be a multiple of 16)\n        seed: Set a seed for sampling\n        output_dir: directory where output images are saved as `img_{idx}.jpg`,\n            with `{idx}` replaced by the index of the sample\n        prompt: Prompt used for sampling\n        device: PyTorch device\n        num_steps: number of sampling steps (default 4 for schnell, 50 for guidance distilled)\n        loop: start an interactive session and sample multiple times\n        guidance: guidance value used for guidance distillation\n        offload: keep models on the CPU and move each to the device only while it is in use\n        add_sampling_metadata: Add the prompt to the image Exif metadata\n    \"\"\"\n    nsfw_classifier = pipeline(\"image-classification\", model=\"Falconsai/nsfw_image_detection\", device=device)\n\n    if name not in configs:\n        available = \", \".join(configs.keys())\n        raise ValueError(f\"Got unknown model name: {name}, choose from {available}\")\n\n    torch_device = torch.device(device)\n    if num_steps is None:\n        num_steps = 4 if name == \"flux-schnell\" else 50\n\n    # allow for packing and conversion to latent space\n    height = 16 * (height // 16)\n    width = 16 * (width // 16)\n\n    output_name = os.path.join(output_dir, \"img_{idx}.jpg\")\n    if not os.path.exists(output_dir):\n        os.makedirs(output_dir)\n        idx = 0\n    else:\n        fns = [fn for fn in 
iglob(output_name.format(idx=\"*\")) if re.search(r\"img_[0-9]+\\.jpg$\", fn)]\n        if len(fns) > 0:\n            idx = max(int(fn.split(\"_\")[-1].split(\".\")[0]) for fn in fns) + 1\n        else:\n            idx = 0\n\n    # init all components\n    t5 = load_t5(torch_device, max_length=256 if name == \"flux-schnell\" else 512)\n    clip = load_clip(torch_device)\n    model = load_flow_model(name, device=\"cpu\" if offload else torch_device)\n    ae = load_ae(name, device=\"cpu\" if offload else torch_device)\n\n    rng = torch.Generator(device=\"cpu\")\n    opts = SamplingOptions(\n        prompt=prompt,\n        width=width,\n        height=height,\n        num_steps=num_steps,\n        guidance=guidance,\n        seed=seed,\n    )\n\n    if loop:\n        opts = parse_prompt(opts)\n\n    while opts is not None:\n        if opts.seed is None:\n            opts.seed = rng.seed()\n        print(f\"Generating with seed {opts.seed}:\\n{opts.prompt}\")\n        t0 = time.perf_counter()\n\n        # prepare input\n        x = get_noise(\n            1,\n            opts.height,\n            opts.width,\n            device=torch_device,\n            dtype=torch.bfloat16,\n            seed=opts.seed,\n        )\n        opts.seed = None\n        if offload:\n            ae = ae.cpu()\n            torch.cuda.empty_cache()\n            t5, clip = t5.to(torch_device), clip.to(torch_device)\n        inp = prepare(t5, clip, x, prompt=opts.prompt)\n        timesteps = get_schedule(opts.num_steps, inp[\"img\"].shape[1], shift=(name != \"flux-schnell\"))\n\n        # offload TEs to CPU, load model to gpu\n        if offload:\n            t5, clip = t5.cpu(), clip.cpu()\n            torch.cuda.empty_cache()\n            model = model.to(torch_device)\n\n        # denoise initial noise\n        x = denoise_cache(model, **inp, timesteps=timesteps, guidance=opts.guidance)\n\n        # offload model, load autoencoder to gpu\n        if offload:\n            model.cpu()\n      
      torch.cuda.empty_cache()\n            ae.decoder.to(x.device)\n\n        # decode latents to pixel space\n        x = unpack(x.float(), opts.height, opts.width)\n        with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):\n            x = ae.decode(x)\n\n        if torch.cuda.is_available():\n            torch.cuda.synchronize()\n        t1 = time.perf_counter()\n\n        fn = output_name.format(idx=idx)\n        print(f\"Done in {t1 - t0:.1f}s. Saving {fn}\")\n\n        # pass the current prompt so the saved metadata stays correct in interactive mode\n        idx = save_image(nsfw_classifier, name, output_name, idx, x, add_sampling_metadata, opts.prompt)\n\n        if loop:\n            print(\"-\" * 80)\n            opts = parse_prompt(opts)\n        else:\n            opts = None\n\n\ndef app():\n    Fire(main)\n\n\nif __name__ == \"__main__\":\n    app()\n"
  },
  {
    "path": "flux-ToCa/src/flux/cli_control.py",
    "content": "import os\nimport re\nimport time\nfrom dataclasses import dataclass\nfrom glob import iglob\n\nimport torch\nfrom fire import Fire\nfrom transformers import pipeline\n\nfrom flux.modules.image_embedders import CannyImageEncoder, DepthImageEncoder\nfrom flux.sampling import denoise, get_noise, get_schedule, prepare_control, unpack\nfrom flux.ideas import denoise_cache\nfrom flux.util import configs, load_ae, load_clip, load_flow_model, load_t5, save_image\n\n\n@dataclass\nclass SamplingOptions:\n    prompt: str\n    width: int\n    height: int\n    num_steps: int\n    guidance: float\n    seed: int | None\n    img_cond_path: str\n    lora_scale: float | None\n\n\ndef parse_prompt(options: SamplingOptions) -> SamplingOptions | None:\n    user_question = \"Next prompt (write /h for help, /q to quit and leave empty to repeat):\\n\"\n    usage = (\n        \"Usage: Either write your prompt directly, leave this field empty \"\n        \"to repeat the prompt or write a command starting with a slash:\\n\"\n        \"- '/w <width>' will set the width of the generated image\\n\"\n        \"- '/h <height>' will set the height of the generated image\\n\"\n        \"- '/s <seed>' sets the next seed\\n\"\n        \"- '/g <guidance>' sets the guidance (flux-dev only)\\n\"\n        \"- '/n <steps>' sets the number of steps\\n\"\n        \"- '/q' to quit\"\n    )\n\n    while (prompt := input(user_question)).startswith(\"/\"):\n        if prompt.startswith(\"/w\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, width = prompt.split()\n            options.width = 16 * (int(width) // 16)\n            print(\n                f\"Setting resolution to {options.width} x {options.height} \"\n                f\"({options.height *options.width/1e6:.2f}MP)\"\n            )\n        elif prompt.startswith(\"/h\"):\n            if prompt.count(\" \") != 1:\n               
 print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, height = prompt.split()\n            options.height = 16 * (int(height) // 16)\n            print(\n                f\"Setting resolution to {options.width} x {options.height} \"\n                f\"({options.height *options.width/1e6:.2f}MP)\"\n            )\n        elif prompt.startswith(\"/g\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, guidance = prompt.split()\n            options.guidance = float(guidance)\n            print(f\"Setting guidance to {options.guidance}\")\n        elif prompt.startswith(\"/s\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, seed = prompt.split()\n            options.seed = int(seed)\n            print(f\"Setting seed to {options.seed}\")\n        elif prompt.startswith(\"/n\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, steps = prompt.split()\n            options.num_steps = int(steps)\n            print(f\"Setting number of steps to {options.num_steps}\")\n        elif prompt.startswith(\"/q\"):\n            print(\"Quitting\")\n            return None\n        else:\n            if not prompt.startswith(\"/h\"):\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n            print(usage)\n    if prompt != \"\":\n        options.prompt = prompt\n    return options\n\n\ndef parse_img_cond_path(options: SamplingOptions | None) -> SamplingOptions | None:\n    if options is None:\n        return None\n\n    user_question = \"Next conditioning image (write /h for help, /q to quit and leave empty to repeat):\\n\"\n    usage = (\n        \"Usage: Either write your prompt directly, leave this 
field empty \"\n        \"to repeat the conditioning image or write a command starting with a slash:\\n\"\n        \"- '/q' to quit\"\n    )\n\n    while True:\n        img_cond_path = input(user_question)\n\n        if img_cond_path.startswith(\"/\"):\n            if img_cond_path.startswith(\"/q\"):\n                print(\"Quitting\")\n                return None\n            else:\n                if not img_cond_path.startswith(\"/h\"):\n                    print(f\"Got invalid command '{img_cond_path}'\\n{usage}\")\n                print(usage)\n            continue\n\n        if img_cond_path == \"\":\n            break\n\n        if not os.path.isfile(img_cond_path) or not img_cond_path.lower().endswith(\n            (\".jpg\", \".jpeg\", \".png\", \".webp\")\n        ):\n            print(f\"File '{img_cond_path}' does not exist or is not a valid image file\")\n            continue\n\n        options.img_cond_path = img_cond_path\n        break\n\n    return options\n\n\ndef parse_lora_scale(options: SamplingOptions | None) -> tuple[SamplingOptions | None, bool]:\n    changed = False\n\n    if options is None:\n        return None, changed\n\n    user_question = \"Next lora scale (write /h for help, /q to quit and leave empty to repeat):\\n\"\n    usage = (\n        \"Usage: Either write your prompt directly, leave this field empty \"\n        \"to repeat the lora scale or write a command starting with a slash:\\n\"\n        \"- '/q' to quit\"\n    )\n\n    while (prompt := input(user_question)).startswith(\"/\"):\n        if prompt.startswith(\"/q\"):\n            print(\"Quitting\")\n            return None, changed\n        else:\n            if not prompt.startswith(\"/h\"):\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n            print(usage)\n    if prompt != \"\":\n        options.lora_scale = float(prompt)\n        changed = True\n    return options, changed\n\n\n@torch.inference_mode()\ndef main(\n    name: str,\n    
width: int = 1024,\n    height: int = 1024,\n    seed: int | None = None,\n    prompt: str = \"a robot made out of gold\",\n    device: str = \"cuda\" if torch.cuda.is_available() else \"cpu\",\n    num_steps: int = 50,\n    loop: bool = False,\n    guidance: float | None = None,\n    offload: bool = False,\n    output_dir: str = \"output\",\n    add_sampling_metadata: bool = True,\n    img_cond_path: str = \"assets/robot.webp\",\n    lora_scale: float | None = 0.85,\n):\n    \"\"\"\n    Sample the flux model. Either interactively (set `--loop`) or run for a\n    single image.\n\n    Args:\n        name: Name of the model to load (one of flux-dev-canny, flux-dev-depth,\n            flux-dev-canny-lora, flux-dev-depth-lora)\n        height: height of the sample in pixels (should be a multiple of 16)\n        width: width of the sample in pixels (should be a multiple of 16)\n        seed: Set a seed for sampling\n        output_dir: directory where output images are saved as `img_{idx}.jpg`,\n            with `{idx}` replaced by the index of the sample\n        prompt: Prompt used for sampling\n        device: PyTorch device\n        num_steps: number of sampling steps (default 50)\n        loop: start an interactive session and sample multiple times\n        guidance: guidance value used for guidance distillation (if unset, 30.0 for\n            the canny models and 10.0 for the depth models)\n        add_sampling_metadata: Add the prompt to the image Exif metadata\n        img_cond_path: path to conditioning image (jpeg/png/webp)\n        lora_scale: scale passed to `set_scale` on the model modules (lora models only)\n    \"\"\"\n    nsfw_classifier = pipeline(\"image-classification\", model=\"Falconsai/nsfw_image_detection\", device=device)\n\n    assert name in [\n        \"flux-dev-canny\",\n        \"flux-dev-depth\",\n        \"flux-dev-canny-lora\",\n        \"flux-dev-depth-lora\",\n    ], f\"Got unknown model name: {name}\"\n    if guidance is None:\n        if name in [\"flux-dev-canny\", \"flux-dev-canny-lora\"]:\n            guidance = 30.0\n        elif name in [\"flux-dev-depth\", \"flux-dev-depth-lora\"]:\n            guidance = 10.0\n        else:\n            raise NotImplementedError()\n\n    if name not in configs:\n        available 
= \", \".join(configs.keys())\n        raise ValueError(f\"Got unknown model name: {name}, choose from {available}\")\n\n    torch_device = torch.device(device)\n\n    output_name = os.path.join(output_dir, \"img_{idx}.jpg\")\n    if not os.path.exists(output_dir):\n        os.makedirs(output_dir)\n        idx = 0\n    else:\n        fns = [fn for fn in iglob(output_name.format(idx=\"*\")) if re.search(r\"img_[0-9]+\\.jpg$\", fn)]\n        if len(fns) > 0:\n            idx = max(int(fn.split(\"_\")[-1].split(\".\")[0]) for fn in fns) + 1\n        else:\n            idx = 0\n\n    # init all components\n    t5 = load_t5(torch_device, max_length=512)\n    clip = load_clip(torch_device)\n    model = load_flow_model(name, device=\"cpu\" if offload else torch_device)\n    ae = load_ae(name, device=\"cpu\" if offload else torch_device)\n\n    # set lora scale\n    if \"lora\" in name and lora_scale is not None:\n        for _, module in model.named_modules():\n            if hasattr(module, \"set_scale\"):\n                module.set_scale(lora_scale)\n\n    if name in [\"flux-dev-depth\", \"flux-dev-depth-lora\"]:\n        img_embedder = DepthImageEncoder(torch_device)\n    elif name in [\"flux-dev-canny\", \"flux-dev-canny-lora\"]:\n        img_embedder = CannyImageEncoder(torch_device)\n    else:\n        raise NotImplementedError()\n\n    rng = torch.Generator(device=\"cpu\")\n    opts = SamplingOptions(\n        prompt=prompt,\n        width=width,\n        height=height,\n        num_steps=num_steps,\n        guidance=guidance,\n        seed=seed,\n        img_cond_path=img_cond_path,\n        lora_scale=lora_scale,\n    )\n\n    if loop:\n        opts = parse_prompt(opts)\n        opts = parse_img_cond_path(opts)\n        if \"lora\" in name:\n            opts, changed = parse_lora_scale(opts)\n            if changed:\n                # update the lora scale:\n                for _, module in model.named_modules():\n                    if hasattr(module, 
\"set_scale\"):\n                        module.set_scale(opts.lora_scale)\n\n    while opts is not None:\n        if opts.seed is None:\n            opts.seed = rng.seed()\n        print(f\"Generating with seed {opts.seed}:\\n{opts.prompt}\")\n        t0 = time.perf_counter()\n\n        # prepare input\n        x = get_noise(\n            1,\n            opts.height,\n            opts.width,\n            device=torch_device,\n            dtype=torch.bfloat16,\n            seed=opts.seed,\n        )\n        opts.seed = None\n        if offload:\n            t5, clip, ae = t5.to(torch_device), clip.to(torch_device), ae.to(torch_device)\n        inp = prepare_control(\n            t5,\n            clip,\n            x,\n            prompt=opts.prompt,\n            ae=ae,\n            encoder=img_embedder,\n            img_cond_path=opts.img_cond_path,\n        )\n        timesteps = get_schedule(opts.num_steps, inp[\"img\"].shape[1], shift=(name != \"flux-schnell\"))\n\n        # offload TEs and AE to CPU, load model to gpu\n        if offload:\n            t5, clip, ae = t5.cpu(), clip.cpu(), ae.cpu()\n            torch.cuda.empty_cache()\n            model = model.to(torch_device)\n\n        # denoise initial noise\n        x = denoise_cache(model, **inp, timesteps=timesteps, guidance=opts.guidance)\n\n        # offload model, load autoencoder to gpu\n        if offload:\n            model.cpu()\n            torch.cuda.empty_cache()\n            ae.decoder.to(x.device)\n\n        # decode latents to pixel space\n        x = unpack(x.float(), opts.height, opts.width)\n        with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):\n            x = ae.decode(x)\n\n        if torch.cuda.is_available():\n            torch.cuda.synchronize()\n        t1 = time.perf_counter()\n        print(f\"Done in {t1 - t0:.1f}s\")\n\n        idx = save_image(nsfw_classifier, name, output_name, idx, x, add_sampling_metadata, prompt)\n\n        if loop:\n            
print(\"-\" * 80)\n            opts = parse_prompt(opts)\n            opts = parse_img_cond_path(opts)\n            if \"lora\" in name:\n                opts, changed = parse_lora_scale(opts)\n                if changed:\n                    # update the lora scale:\n                    for _, module in model.named_modules():\n                        if hasattr(module, \"set_scale\"):\n                            module.set_scale(opts.lora_scale)\n        else:\n            opts = None\n\n\ndef app():\n    Fire(main)\n\n\nif __name__ == \"__main__\":\n    app()\n"
  },
  {
    "path": "flux-ToCa/src/flux/cli_fill.py",
    "content": "import os\nimport re\nimport time\nfrom dataclasses import dataclass\nfrom glob import iglob\n\nimport torch\nfrom fire import Fire\nfrom PIL import Image\nfrom transformers import pipeline\n\nfrom flux.sampling import denoise, get_noise, get_schedule, prepare_fill, unpack\nfrom flux.ideas import denoise_cache\nfrom flux.util import configs, load_ae, load_clip, load_flow_model, load_t5, save_image\n\n\n@dataclass\nclass SamplingOptions:\n    prompt: str\n    width: int\n    height: int\n    num_steps: int\n    guidance: float\n    seed: int | None\n    img_cond_path: str\n    img_mask_path: str\n\n\ndef parse_prompt(options: SamplingOptions) -> SamplingOptions | None:\n    user_question = \"Next prompt (write /h for help, /q to quit and leave empty to repeat):\\n\"\n    usage = (\n        \"Usage: Either write your prompt directly, leave this field empty \"\n        \"to repeat the prompt or write a command starting with a slash:\\n\"\n        \"- '/s <seed>' sets the next seed\\n\"\n        \"- '/g <guidance>' sets the guidance (flux-dev only)\\n\"\n        \"- '/n <steps>' sets the number of steps\\n\"\n        \"- '/q' to quit\"\n    )\n\n    while (prompt := input(user_question)).startswith(\"/\"):\n        if prompt.startswith(\"/g\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, guidance = prompt.split()\n            options.guidance = float(guidance)\n            print(f\"Setting guidance to {options.guidance}\")\n        elif prompt.startswith(\"/s\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, seed = prompt.split()\n            options.seed = int(seed)\n            print(f\"Setting seed to {options.seed}\")\n        elif prompt.startswith(\"/n\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got 
invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, steps = prompt.split()\n            options.num_steps = int(steps)\n            print(f\"Setting number of steps to {options.num_steps}\")\n        elif prompt.startswith(\"/q\"):\n            print(\"Quitting\")\n            return None\n        else:\n            if not prompt.startswith(\"/h\"):\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n            print(usage)\n    if prompt != \"\":\n        options.prompt = prompt\n    return options\n\n\ndef parse_img_cond_path(options: SamplingOptions | None) -> SamplingOptions | None:\n    if options is None:\n        return None\n\n    user_question = \"Next conditioning image (write /h for help, /q to quit and leave empty to repeat):\\n\"\n    usage = (\n        \"Usage: Either write your prompt directly, leave this field empty \"\n        \"to repeat the conditioning image or write a command starting with a slash:\\n\"\n        \"- '/q' to quit\"\n    )\n\n    while True:\n        img_cond_path = input(user_question)\n\n        if img_cond_path.startswith(\"/\"):\n            if img_cond_path.startswith(\"/q\"):\n                print(\"Quitting\")\n                return None\n            else:\n                if not img_cond_path.startswith(\"/h\"):\n                    print(f\"Got invalid command '{img_cond_path}'\\n{usage}\")\n                print(usage)\n            continue\n\n        if img_cond_path == \"\":\n            break\n\n        if not os.path.isfile(img_cond_path) or not img_cond_path.lower().endswith(\n            (\".jpg\", \".jpeg\", \".png\", \".webp\")\n        ):\n            print(f\"File '{img_cond_path}' does not exist or is not a valid image file\")\n            continue\n        else:\n            with Image.open(img_cond_path) as img:\n                width, height = img.size\n\n            if width % 32 != 0 or height % 32 != 0:\n                print(f\"Image dimensions must 
be divisible by 32, got {width}x{height}\")\n                continue\n\n        options.img_cond_path = img_cond_path\n        break\n\n    return options\n\n\ndef parse_img_mask_path(options: SamplingOptions | None) -> SamplingOptions | None:\n    if options is None:\n        return None\n\n    user_question = \"Next conditioning mask (write /h for help, /q to quit and leave empty to repeat):\\n\"\n    usage = (\n        \"Usage: Either write your prompt directly, leave this field empty \"\n        \"to repeat the conditioning mask or write a command starting with a slash:\\n\"\n        \"- '/q' to quit\"\n    )\n\n    while True:\n        img_mask_path = input(user_question)\n\n        if img_mask_path.startswith(\"/\"):\n            if img_mask_path.startswith(\"/q\"):\n                print(\"Quitting\")\n                return None\n            else:\n                if not img_mask_path.startswith(\"/h\"):\n                    print(f\"Got invalid command '{img_mask_path}'\\n{usage}\")\n                print(usage)\n            continue\n\n        if img_mask_path == \"\":\n            break\n\n        if not os.path.isfile(img_mask_path) or not img_mask_path.lower().endswith(\n            (\".jpg\", \".jpeg\", \".png\", \".webp\")\n        ):\n            print(f\"File '{img_mask_path}' does not exist or is not a valid image file\")\n            continue\n        else:\n            with Image.open(img_mask_path) as img:\n                width, height = img.size\n\n            if width % 32 != 0 or height % 32 != 0:\n                print(f\"Image dimensions must be divisible by 32, got {width}x{height}\")\n                continue\n            else:\n                with Image.open(options.img_cond_path) as img_cond:\n                    img_cond_width, img_cond_height = img_cond.size\n\n                if width != img_cond_width or height != img_cond_height:\n                    print(\n                        f\"Mask dimensions must match conditioning 
image, got {width}x{height} and {img_cond_width}x{img_cond_height}\"\n                    )\n                    continue\n\n        options.img_mask_path = img_mask_path\n        break\n\n    return options\n\n\n@torch.inference_mode()\ndef main(\n    seed: int | None = None,\n    prompt: str = \"a white paper cup\",\n    device: str = \"cuda\" if torch.cuda.is_available() else \"cpu\",\n    num_steps: int = 50,\n    loop: bool = False,\n    guidance: float = 30.0,\n    offload: bool = False,\n    output_dir: str = \"output\",\n    add_sampling_metadata: bool = True,\n    img_cond_path: str = \"assets/cup.png\",\n    img_mask_path: str = \"assets/cup_mask.png\",\n):\n    \"\"\"\n    Sample the flux model. Either interactively (set `--loop`) or run for a\n    single image. This demo assumes that the conditioning image and mask have\n    the same shape and that height and width are divisible by 32.\n\n    Args:\n        seed: Set a seed for sampling\n        output_dir: directory where output images are saved as `img_{idx}.jpg`,\n            with `{idx}` replaced by the index of the sample\n        prompt: Prompt used for sampling\n        device: PyTorch device\n        num_steps: number of sampling steps (default 50)\n        loop: start an interactive session and sample multiple times\n        guidance: guidance value used for guidance distillation\n        add_sampling_metadata: Add the prompt to the image Exif metadata\n        img_cond_path: path to conditioning image (jpeg/png/webp)\n        img_mask_path: path to conditioning mask (jpeg/png/webp)\n    \"\"\"\n    nsfw_classifier = pipeline(\"image-classification\", model=\"Falconsai/nsfw_image_detection\", device=device)\n\n    name = \"flux-dev-fill\"\n    if name not in configs:\n        available = \", \".join(configs.keys())\n        raise ValueError(f\"Got unknown model name: {name}, choose from {available}\")\n\n    torch_device = torch.device(device)\n\n    output_name = 
os.path.join(output_dir, \"img_{idx}.jpg\")\n    if not os.path.exists(output_dir):\n        os.makedirs(output_dir)\n        idx = 0\n    else:\n        fns = [fn for fn in iglob(output_name.format(idx=\"*\")) if re.search(r\"img_[0-9]+\\.jpg$\", fn)]\n        if len(fns) > 0:\n            idx = max(int(fn.split(\"_\")[-1].split(\".\")[0]) for fn in fns) + 1\n        else:\n            idx = 0\n\n    # init all components\n    t5 = load_t5(torch_device, max_length=128)\n    clip = load_clip(torch_device)\n    model = load_flow_model(name, device=\"cpu\" if offload else torch_device)\n    ae = load_ae(name, device=\"cpu\" if offload else torch_device)\n\n    rng = torch.Generator(device=\"cpu\")\n    with Image.open(img_cond_path) as img:\n        width, height = img.size\n    opts = SamplingOptions(\n        prompt=prompt,\n        width=width,\n        height=height,\n        num_steps=num_steps,\n        guidance=guidance,\n        seed=seed,\n        img_cond_path=img_cond_path,\n        img_mask_path=img_mask_path,\n    )\n\n    if loop:\n        opts = parse_prompt(opts)\n        opts = parse_img_cond_path(opts)\n\n        # opts is None if the user quit at one of the prompts above\n        if opts is not None:\n            with Image.open(opts.img_cond_path) as img:\n                width, height = img.size\n            opts.height = height\n            opts.width = width\n\n        opts = parse_img_mask_path(opts)\n\n    while opts is not None:\n        if opts.seed is None:\n            opts.seed = rng.seed()\n        print(f\"Generating with seed {opts.seed}:\\n{opts.prompt}\")\n        t0 = time.perf_counter()\n\n        # prepare input\n        x = get_noise(\n            1,\n            opts.height,\n            opts.width,\n            device=torch_device,\n            dtype=torch.bfloat16,\n            seed=opts.seed,\n        )\n        opts.seed = None\n        if offload:\n            t5, clip, ae = t5.to(torch_device), clip.to(torch_device), ae.to(torch_device)\n        inp = prepare_fill(\n            t5,\n            clip,\n            x,\n            
prompt=opts.prompt,\n            ae=ae,\n            img_cond_path=opts.img_cond_path,\n            mask_path=opts.img_mask_path,\n        )\n\n        timesteps = get_schedule(opts.num_steps, inp[\"img\"].shape[1], shift=(name != \"flux-schnell\"))\n\n        # offload TEs and AE to CPU, load model to gpu\n        if offload:\n            t5, clip, ae = t5.cpu(), clip.cpu(), ae.cpu()\n            torch.cuda.empty_cache()\n            model = model.to(torch_device)\n\n        # denoise initial noise\n        x = denoise_cache(model, **inp, timesteps=timesteps, guidance=opts.guidance)\n\n        # offload model, load autoencoder to gpu\n        if offload:\n            model.cpu()\n            torch.cuda.empty_cache()\n            ae.decoder.to(x.device)\n\n        # decode latents to pixel space\n        x = unpack(x.float(), opts.height, opts.width)\n        with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):\n            x = ae.decode(x)\n\n        if torch.cuda.is_available():\n            torch.cuda.synchronize()\n        t1 = time.perf_counter()\n        print(f\"Done in {t1 - t0:.1f}s\")\n\n        # pass the current prompt so the saved metadata stays correct in interactive mode\n        idx = save_image(nsfw_classifier, name, output_name, idx, x, add_sampling_metadata, opts.prompt)\n\n        if loop:\n            print(\"-\" * 80)\n            opts = parse_prompt(opts)\n            opts = parse_img_cond_path(opts)\n\n            # opts is None if the user quit at one of the prompts above\n            if opts is not None:\n                with Image.open(opts.img_cond_path) as img:\n                    width, height = img.size\n                opts.height = height\n                opts.width = width\n\n            opts = parse_img_mask_path(opts)\n        else:\n            opts = None\n\n\ndef app():\n    Fire(main)\n\n\nif __name__ == \"__main__\":\n    app()\n"
  },
  {
    "path": "flux-ToCa/src/flux/cli_redux.py",
    "content": "import os\nimport re\nimport time\nfrom dataclasses import dataclass\nfrom glob import iglob\n\nimport torch\nfrom fire import Fire\nfrom transformers import pipeline\n\nfrom flux.modules.image_embedders import ReduxImageEncoder\nfrom flux.sampling import denoise, get_noise, get_schedule, prepare_redux, unpack\nfrom flux.ideas import denoise_cache\nfrom flux.util import configs, load_ae, load_clip, load_flow_model, load_t5, save_image\n\n\n@dataclass\nclass SamplingOptions:\n    prompt: str\n    width: int\n    height: int\n    num_steps: int\n    guidance: float\n    seed: int | None\n    img_cond_path: str\n\n\ndef parse_prompt(options: SamplingOptions) -> SamplingOptions | None:\n    user_question = \"Write /h for help, /q to quit and leave empty to repeat):\\n\"\n    usage = (\n        \"Usage: Leave this field empty to do nothing \"\n        \"or write a command starting with a slash:\\n\"\n        \"- '/w <width>' will set the width of the generated image\\n\"\n        \"- '/h <height>' will set the height of the generated image\\n\"\n        \"- '/s <seed>' sets the next seed\\n\"\n        \"- '/g <guidance>' sets the guidance (flux-dev only)\\n\"\n        \"- '/n <steps>' sets the number of steps\\n\"\n        \"- '/q' to quit\"\n    )\n\n    while (prompt := input(user_question)).startswith(\"/\"):\n        if prompt.startswith(\"/w\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, width = prompt.split()\n            options.width = 16 * (int(width) // 16)\n            print(\n                f\"Setting resolution to {options.width} x {options.height} \"\n                f\"({options.height *options.width/1e6:.2f}MP)\"\n            )\n        elif prompt.startswith(\"/h\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, height = 
prompt.split()\n            options.height = 16 * (int(height) // 16)\n            print(\n                f\"Setting resolution to {options.width} x {options.height} \"\n                f\"({options.height *options.width/1e6:.2f}MP)\"\n            )\n        elif prompt.startswith(\"/g\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, guidance = prompt.split()\n            options.guidance = float(guidance)\n            print(f\"Setting guidance to {options.guidance}\")\n        elif prompt.startswith(\"/s\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, seed = prompt.split()\n            options.seed = int(seed)\n            print(f\"Setting seed to {options.seed}\")\n        elif prompt.startswith(\"/n\"):\n            if prompt.count(\" \") != 1:\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n                continue\n            _, steps = prompt.split()\n            options.num_steps = int(steps)\n            print(f\"Setting number of steps to {options.num_steps}\")\n        elif prompt.startswith(\"/q\"):\n            print(\"Quitting\")\n            return None\n        else:\n            if not prompt.startswith(\"/h\"):\n                print(f\"Got invalid command '{prompt}'\\n{usage}\")\n            print(usage)\n    return options\n\n\ndef parse_img_cond_path(options: SamplingOptions | None) -> SamplingOptions | None:\n    if options is None:\n        return None\n\n    user_question = \"Next conditioning image (write /h for help, /q to quit and leave empty to repeat):\\n\"\n    usage = (\n        \"Usage: Either write your prompt directly, leave this field empty \"\n        \"to repeat the conditioning image or write a command starting with a slash:\\n\"\n        \"- '/q' to quit\"\n    )\n\n    while True:\n      
  img_cond_path = input(user_question)\n\n        if img_cond_path.startswith(\"/\"):\n            if img_cond_path.startswith(\"/q\"):\n                print(\"Quitting\")\n                return None\n            else:\n                if not img_cond_path.startswith(\"/h\"):\n                    print(f\"Got invalid command '{img_cond_path}'\\n{usage}\")\n                print(usage)\n            continue\n\n        if img_cond_path == \"\":\n            break\n\n        if not os.path.isfile(img_cond_path) or not img_cond_path.lower().endswith(\n            (\".jpg\", \".jpeg\", \".png\", \".webp\")\n        ):\n            print(f\"File '{img_cond_path}' does not exist or is not a valid image file\")\n            continue\n\n        options.img_cond_path = img_cond_path\n        break\n\n    return options\n\n\n@torch.inference_mode()\ndef main(\n    name: str = \"flux-dev\",\n    width: int = 1360,\n    height: int = 768,\n    seed: int | None = None,\n    device: str = \"cuda\" if torch.cuda.is_available() else \"cpu\",\n    num_steps: int | None = None,\n    loop: bool = False,\n    guidance: float = 2.5,\n    offload: bool = False,\n    output_dir: str = \"output\",\n    add_sampling_metadata: bool = True,\n    img_cond_path: str = \"assets/robot.webp\",\n):\n    \"\"\"\n    Sample the flux model. 
Either interactively (set `--loop`) or run for a\n    single image.\n\n    Args:\n        name: Name of the model to load\n        height: height of the sample in pixels (should be a multiple of 16)\n        width: width of the sample in pixels (should be a multiple of 16)\n        seed: Set a seed for sampling\n        output_name: where to save the output image, `{idx}` will be replaced\n            by the index of the sample\n        prompt: Prompt used for sampling\n        device: Pytorch device\n        num_steps: number of sampling steps (default 4 for schnell, 50 for guidance distilled)\n        loop: start an interactive session and sample multiple times\n        guidance: guidance value used for guidance distillation\n        add_sampling_metadata: Add the prompt to the image Exif metadata\n        img_cond_path: path to conditioning image (jpeg/png/webp)\n    \"\"\"\n    nsfw_classifier = pipeline(\"image-classification\", model=\"Falconsai/nsfw_image_detection\", device=device)\n\n    if name not in configs:\n        available = \", \".join(configs.keys())\n        raise ValueError(f\"Got unknown model name: {name}, chose from {available}\")\n\n    torch_device = torch.device(device)\n    if num_steps is None:\n        num_steps = 4 if name == \"flux-schnell\" else 50\n\n    output_name = os.path.join(output_dir, \"img_{idx}.jpg\")\n    if not os.path.exists(output_dir):\n        os.makedirs(output_dir)\n        idx = 0\n    else:\n        fns = [fn for fn in iglob(output_name.format(idx=\"*\")) if re.search(r\"img_[0-9]+\\.jpg$\", fn)]\n        if len(fns) > 0:\n            idx = max(int(fn.split(\"_\")[-1].split(\".\")[0]) for fn in fns) + 1\n        else:\n            idx = 0\n\n    # init all components\n    t5 = load_t5(torch_device, max_length=256 if name == \"flux-schnell\" else 512)\n    clip = load_clip(torch_device)\n    model = load_flow_model(name, device=\"cpu\" if offload else torch_device)\n    ae = load_ae(name, device=\"cpu\" if offload 
else torch_device)\n    img_embedder = ReduxImageEncoder(torch_device)\n\n    rng = torch.Generator(device=\"cpu\")\n    prompt = \"\"\n    opts = SamplingOptions(\n        prompt=prompt,\n        width=width,\n        height=height,\n        num_steps=num_steps,\n        guidance=guidance,\n        seed=seed,\n        img_cond_path=img_cond_path,\n    )\n\n    if loop:\n        opts = parse_prompt(opts)\n        opts = parse_img_cond_path(opts)\n\n    while opts is not None:\n        if opts.seed is None:\n            opts.seed = rng.seed()\n        print(f\"Generating with seed {opts.seed}:\\n{opts.prompt}\")\n        t0 = time.perf_counter()\n\n        # prepare input\n        x = get_noise(\n            1,\n            opts.height,\n            opts.width,\n            device=torch_device,\n            dtype=torch.bfloat16,\n            seed=opts.seed,\n        )\n        opts.seed = None\n        if offload:\n            ae = ae.cpu()\n            torch.cuda.empty_cache()\n            t5, clip = t5.to(torch_device), clip.to(torch_device)\n        inp = prepare_redux(\n            t5,\n            clip,\n            x,\n            prompt=opts.prompt,\n            encoder=img_embedder,\n            img_cond_path=opts.img_cond_path,\n        )\n        timesteps = get_schedule(opts.num_steps, inp[\"img\"].shape[1], shift=(name != \"flux-schnell\"))\n\n        # offload TEs to CPU, load model to gpu\n        if offload:\n            t5, clip = t5.cpu(), clip.cpu()\n            torch.cuda.empty_cache()\n            model = model.to(torch_device)\n\n        # denoise initial noise\n        x = denoise_cache(model, **inp, timesteps=timesteps, guidance=opts.guidance)\n\n        # offload model, load autoencoder to gpu\n        if offload:\n            model.cpu()\n            torch.cuda.empty_cache()\n            ae.decoder.to(x.device)\n\n        # decode latents to pixel space\n        x = unpack(x.float(), opts.height, opts.width)\n        with 
torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):\n            x = ae.decode(x)\n\n        if torch.cuda.is_available():\n            torch.cuda.synchronize()\n        t1 = time.perf_counter()\n        print(f\"Done in {t1 - t0:.1f}s\")\n\n        # pass the prompt actually used for this sample (it may have been changed in loop mode)\n        idx = save_image(nsfw_classifier, name, output_name, idx, x, add_sampling_metadata, opts.prompt)\n\n        if loop:\n            print(\"-\" * 80)\n            opts = parse_prompt(opts)\n            opts = parse_img_cond_path(opts)\n        else:\n            opts = None\n\n\ndef app():\n    Fire(main)\n\n\nif __name__ == \"__main__\":\n    app()\n"
  },
  {
    "path": "flux-ToCa/src/flux/ideas/__init__.py",
    "content": "from .cache_denoise import denoise_cache"
  },
  {
    "path": "flux-ToCa/src/flux/ideas/cache_denoise.py",
    "content": "import torch\nfrom ..model import Flux\nfrom torch import Tensor\nfrom ..modules.cache_functions import cache_init\n\ndef denoise_cache(\n    model: Flux,\n    # model input\n    img: Tensor,\n    img_ids: Tensor,\n    txt: Tensor,\n    txt_ids: Tensor,\n    vec: Tensor,\n    # sampling parameters\n    timesteps: list[float],\n    guidance: float = 4.0,\n):  \n    # init cache\n    cache_dic, current = cache_init(timesteps)\n    # this is ignored for schnell\n    guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)\n    current['step']=0\n    current['num_steps'] = len(timesteps)-1\n    for t_curr, t_prev in zip(timesteps[:-1], timesteps[1:]):\n        t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)\n        current['t'] = t_curr\n        #print(t_curr)\n        pred = model(\n            img=img,\n            img_ids=img_ids,\n            txt=txt,\n            txt_ids=txt_ids,\n            y=vec,\n            timesteps=t_vec,\n            cache_dic = cache_dic,\n            current = current,\n            guidance=guidance_vec,\n        )\n        #print(img.shape)\n        img = img + (t_prev - t_curr) * pred\n        current['step'] += 1\n\n    return img\n"
  },
  {
    "path": "flux-ToCa/src/flux/math.py",
    "content": "import torch\nfrom einops import rearrange\nfrom torch import Tensor\n\n\ndef attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor, **kwargs) -> Tensor:\n    \n    cache_dic = kwargs.get('cache_dic', None)\n    current = kwargs.get('current', None)     \n\n    q, k = apply_rope(q, k, pe)\n    \n    if cache_dic is None:\n        x, score = dot_product_attention(q, k, v)\n        #x = torch.nn.functional.scaled_dot_product_attention(q, k, v)\n    elif cache_dic['cache_type'] == 'attention':\n        x, score = dot_product_attention(q, k, v)\n        cache_dic['attn_map'][-1][current['stream']][current['layer']]['total'] = score\n    else:\n        #x = torch.nn.functional.scaled_dot_product_attention(q, k, v)\n        x, score = dot_product_attention(q, k, v) # if you are testing the FLOPs, should change to dot_product_attention\n    x = rearrange(x, \"B H L D -> B L (H D)\")\n\n    return x\n\ndef rope(pos: Tensor, dim: int, theta: int) -> Tensor:\n    assert dim % 2 == 0\n    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim\n    omega = 1.0 / (theta**scale)\n    out = torch.einsum(\"...n,d->...nd\", pos, omega)\n    out = torch.stack([torch.cos(out), -torch.sin(out), torch.sin(out), torch.cos(out)], dim=-1)\n    out = rearrange(out, \"b n d (i j) -> b n d i j\", i=2, j=2)\n    return out.float()\n\n\ndef apply_rope(xq: Tensor, xk: Tensor, freqs_cis: Tensor) -> tuple[Tensor, Tensor]:\n    xq_ = xq.float().reshape(*xq.shape[:-1], -1, 1, 2)\n    xk_ = xk.float().reshape(*xk.shape[:-1], -1, 1, 2)\n    xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]\n    xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]\n    return xq_out.reshape(*xq.shape).type_as(xq), xk_out.reshape(*xk.shape).type_as(xk)\n\n############################################################################################################\n\nimport math\n\ndef dot_product_attention(query, key, value, 
attn_mask=None, dropout_p=0.0,\n        is_causal=False, scale=None, enable_gqa=False) -> tuple[torch.Tensor, torch.Tensor]:\n    L, S = query.size(-2), key.size(-2)\n    scale_factor = 1 / math.sqrt(query.size(-1)) if scale is None else scale\n    attn_bias = torch.zeros(L, S, dtype=query.dtype, device=query.device)\n    if is_causal:\n        assert attn_mask is None\n        # build the mask on the same device as attn_bias to avoid a device mismatch\n        temp_mask = torch.ones(L, S, dtype=torch.bool, device=query.device).tril(diagonal=0)\n        attn_bias.masked_fill_(temp_mask.logical_not(), float(\"-inf\"))\n\n    if attn_mask is not None:\n        if attn_mask.dtype == torch.bool:\n            attn_bias.masked_fill_(attn_mask.logical_not(), float(\"-inf\"))\n        else:\n            attn_bias += attn_mask\n\n    if enable_gqa:\n        key = key.repeat_interleave(query.size(-3)//key.size(-3), -3)\n        value = value.repeat_interleave(query.size(-3)//value.size(-3), -3)\n\n    attn_weight = torch.matmul(query, key.transpose(-2, -1)) * scale_factor\n    attn_weight += attn_bias\n\n    attn_map = torch.softmax(attn_weight, dim=-1)\n    attn_weight = torch.dropout(attn_map, dropout_p, train=True)\n    # second output: attention map averaged over heads, then over query positions -> (B, S)\n    return torch.matmul(attn_weight, value), attn_map.mean(dim=1).mean(dim=1)"
  },
  {
    "path": "flux-ToCa/src/flux/model.py",
    "content": "from dataclasses import dataclass\n\nimport torch\nfrom torch import Tensor, nn\n\nfrom flux.modules.layers import (\n    DoubleStreamBlock,\n    EmbedND,\n    LastLayer,\n    MLPEmbedder,\n    SingleStreamBlock,\n    timestep_embedding,\n)\nfrom flux.modules.lora import LinearLora, replace_linear_with_lora\nfrom flux.modules.cache_functions import cal_type\n\n@dataclass\nclass FluxParams:\n    in_channels: int\n    out_channels: int\n    vec_in_dim: int\n    context_in_dim: int\n    hidden_size: int\n    mlp_ratio: float\n    num_heads: int\n    depth: int\n    depth_single_blocks: int\n    axes_dim: list[int]\n    theta: int\n    qkv_bias: bool\n    guidance_embed: bool\n\n\nclass Flux(nn.Module):\n    \"\"\"\n    Transformer model for flow matching on sequences.\n    \"\"\"\n\n    def __init__(self, params: FluxParams):\n        super().__init__()\n\n        self.params = params\n        self.in_channels = params.in_channels\n        self.out_channels = params.out_channels\n        if params.hidden_size % params.num_heads != 0:\n            raise ValueError(\n                f\"Hidden size {params.hidden_size} must be divisible by num_heads {params.num_heads}\"\n            )\n        pe_dim = params.hidden_size // params.num_heads\n        if sum(params.axes_dim) != pe_dim:\n            raise ValueError(f\"Got {params.axes_dim} but expected positional dim {pe_dim}\")\n        self.hidden_size = params.hidden_size\n        self.num_heads = params.num_heads\n        self.pe_embedder = EmbedND(dim=pe_dim, theta=params.theta, axes_dim=params.axes_dim)\n        self.img_in = nn.Linear(self.in_channels, self.hidden_size, bias=True)\n        self.time_in = MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size)\n        self.vector_in = MLPEmbedder(params.vec_in_dim, self.hidden_size)\n        self.guidance_in = (\n            MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size) if params.guidance_embed else nn.Identity()\n        )\n        self.txt_in = 
nn.Linear(params.context_in_dim, self.hidden_size)\n\n        self.double_blocks = nn.ModuleList(\n            [\n                DoubleStreamBlock(\n                    self.hidden_size,\n                    self.num_heads,\n                    mlp_ratio=params.mlp_ratio,\n                    qkv_bias=params.qkv_bias,\n                )\n                for _ in range(params.depth)\n            ]\n        )\n\n        self.single_blocks = nn.ModuleList(\n            [\n                SingleStreamBlock(self.hidden_size, self.num_heads, mlp_ratio=params.mlp_ratio)\n                for _ in range(params.depth_single_blocks)\n            ]\n        )\n\n        self.final_layer = LastLayer(self.hidden_size, 1, self.out_channels)\n\n    def forward(\n        self,\n        img: Tensor,\n        img_ids: Tensor,\n        txt: Tensor,\n        txt_ids: Tensor,\n        timesteps: Tensor,\n        y: Tensor,\n        guidance: Tensor | None = None,\n        *args,\n        **kwargs,\n    ) -> Tensor:\n        if img.ndim != 3 or txt.ndim != 3:\n            raise ValueError(\"Input img and txt tensors must have 3 dimensions.\")\n        \n        cache_dic = kwargs.get('cache_dic', None)\n        current = kwargs.get('current', None)\n        \n        # running on sequences img\n        img = self.img_in(img)\n        vec = self.time_in(timestep_embedding(timesteps, 256))\n        if self.params.guidance_embed:\n            if guidance is None:\n                raise ValueError(\"Didn't get guidance strength for guidance distilled model.\")\n            vec = vec + self.guidance_in(timestep_embedding(guidance, 256))\n        vec = vec + self.vector_in(y)\n        txt = self.txt_in(txt)\n\n        ids = torch.cat((txt_ids, img_ids), dim=1)\n        pe = self.pe_embedder(ids)\n\n        cal_type(cache_dic=cache_dic, current=current)\n\n        for i, block in enumerate(self.double_blocks):\n            current['layer'] = i\n            img, txt = block(img=img, txt=txt, 
vec=vec, pe=pe, cache_dic=cache_dic, current=current)\n\n        img = torch.cat((txt, img), 1)\n        for i, block in enumerate(self.single_blocks):\n            current['layer'] = i\n            img = block(img, vec=vec, pe=pe, cache_dic=cache_dic, current=current)\n        img = img[:, txt.shape[1] :, ...]\n\n        img = self.final_layer(img, vec)  # (N, T, patch_size ** 2 * out_channels)\n        return img\n\n\nclass FluxLoraWrapper(Flux):\n    def __init__(\n        self,\n        lora_rank: int = 128,\n        lora_scale: float = 1.0,\n        *args,\n        **kwargs,\n    ) -> None:\n        super().__init__(*args, **kwargs)\n\n        self.lora_rank = lora_rank\n\n        replace_linear_with_lora(\n            self,\n            max_rank=lora_rank,\n            scale=lora_scale,\n        )\n\n    def set_lora_scale(self, scale: float) -> None:\n        for module in self.modules():\n            if isinstance(module, LinearLora):\n                module.set_scale(scale=scale)\n"
  },
  {
    "path": "flux-ToCa/src/flux/modules/autoencoder.py",
    "content": "from dataclasses import dataclass\n\nimport torch\nfrom einops import rearrange\nfrom torch import Tensor, nn\n\n\n@dataclass\nclass AutoEncoderParams:\n    resolution: int\n    in_channels: int\n    ch: int\n    out_ch: int\n    ch_mult: list[int]\n    num_res_blocks: int\n    z_channels: int\n    scale_factor: float\n    shift_factor: float\n\n\ndef swish(x: Tensor) -> Tensor:\n    return x * torch.sigmoid(x)\n\n\nclass AttnBlock(nn.Module):\n    def __init__(self, in_channels: int):\n        super().__init__()\n        self.in_channels = in_channels\n\n        self.norm = nn.GroupNorm(num_groups=32, num_channels=in_channels, eps=1e-6, affine=True)\n\n        self.q = nn.Conv2d(in_channels, in_channels, kernel_size=1)\n        self.k = nn.Conv2d(in_channels, in_channels, kernel_size=1)\n        self.v = nn.Conv2d(in_channels, in_channels, kernel_size=1)\n        self.proj_out = nn.Conv2d(in_channels, in_channels, kernel_size=1)\n\n    def attention(self, h_: Tensor) -> Tensor:\n        h_ = self.norm(h_)\n        q = self.q(h_)\n        k = self.k(h_)\n        v = self.v(h_)\n\n        b, c, h, w = q.shape\n        q = rearrange(q, \"b c h w -> b 1 (h w) c\").contiguous()\n        k = rearrange(k, \"b c h w -> b 1 (h w) c\").contiguous()\n        v = rearrange(v, \"b c h w -> b 1 (h w) c\").contiguous()\n        h_ = nn.functional.scaled_dot_product_attention(q, k, v)\n\n        return rearrange(h_, \"b 1 (h w) c -> b c h w\", h=h, w=w, c=c, b=b)\n\n    def forward(self, x: Tensor) -> Tensor:\n        return x + self.proj_out(self.attention(x))\n\n\nclass ResnetBlock(nn.Module):\n    def __init__(self, in_channels: int, out_channels: int):\n        super().__init__()\n        self.in_channels = in_channels\n        out_channels = in_channels if out_channels is None else out_channels\n        self.out_channels = out_channels\n\n        self.norm1 = nn.GroupNorm(num_groups=32, num_channels=in_channels, eps=1e-6, affine=True)\n        self.conv1 = 
nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1)\n        self.norm2 = nn.GroupNorm(num_groups=32, num_channels=out_channels, eps=1e-6, affine=True)\n        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)\n        if self.in_channels != self.out_channels:\n            self.nin_shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0)\n\n    def forward(self, x):\n        h = x\n        h = self.norm1(h)\n        h = swish(h)\n        h = self.conv1(h)\n\n        h = self.norm2(h)\n        h = swish(h)\n        h = self.conv2(h)\n\n        if self.in_channels != self.out_channels:\n            x = self.nin_shortcut(x)\n\n        return x + h\n\n\nclass Downsample(nn.Module):\n    def __init__(self, in_channels: int):\n        super().__init__()\n        # no asymmetric padding in torch conv, must do it ourselves\n        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=2, padding=0)\n\n    def forward(self, x: Tensor):\n        pad = (0, 1, 0, 1)\n        x = nn.functional.pad(x, pad, mode=\"constant\", value=0)\n        x = self.conv(x)\n        return x\n\n\nclass Upsample(nn.Module):\n    def __init__(self, in_channels: int):\n        super().__init__()\n        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1)\n\n    def forward(self, x: Tensor):\n        x = nn.functional.interpolate(x, scale_factor=2.0, mode=\"nearest\")\n        x = self.conv(x)\n        return x\n\n\nclass Encoder(nn.Module):\n    def __init__(\n        self,\n        resolution: int,\n        in_channels: int,\n        ch: int,\n        ch_mult: list[int],\n        num_res_blocks: int,\n        z_channels: int,\n    ):\n        super().__init__()\n        self.ch = ch\n        self.num_resolutions = len(ch_mult)\n        self.num_res_blocks = num_res_blocks\n        self.resolution = resolution\n        self.in_channels = in_channels\n        # 
downsampling\n        self.conv_in = nn.Conv2d(in_channels, self.ch, kernel_size=3, stride=1, padding=1)\n\n        curr_res = resolution\n        in_ch_mult = (1,) + tuple(ch_mult)\n        self.in_ch_mult = in_ch_mult\n        self.down = nn.ModuleList()\n        block_in = self.ch\n        for i_level in range(self.num_resolutions):\n            block = nn.ModuleList()\n            attn = nn.ModuleList()\n            block_in = ch * in_ch_mult[i_level]\n            block_out = ch * ch_mult[i_level]\n            for _ in range(self.num_res_blocks):\n                block.append(ResnetBlock(in_channels=block_in, out_channels=block_out))\n                block_in = block_out\n            down = nn.Module()\n            down.block = block\n            down.attn = attn\n            if i_level != self.num_resolutions - 1:\n                down.downsample = Downsample(block_in)\n                curr_res = curr_res // 2\n            self.down.append(down)\n\n        # middle\n        self.mid = nn.Module()\n        self.mid.block_1 = ResnetBlock(in_channels=block_in, out_channels=block_in)\n        self.mid.attn_1 = AttnBlock(block_in)\n        self.mid.block_2 = ResnetBlock(in_channels=block_in, out_channels=block_in)\n\n        # end\n        self.norm_out = nn.GroupNorm(num_groups=32, num_channels=block_in, eps=1e-6, affine=True)\n        self.conv_out = nn.Conv2d(block_in, 2 * z_channels, kernel_size=3, stride=1, padding=1)\n\n    def forward(self, x: Tensor) -> Tensor:\n        # downsampling\n        hs = [self.conv_in(x)]\n        for i_level in range(self.num_resolutions):\n            for i_block in range(self.num_res_blocks):\n                h = self.down[i_level].block[i_block](hs[-1])\n                if len(self.down[i_level].attn) > 0:\n                    h = self.down[i_level].attn[i_block](h)\n                hs.append(h)\n            if i_level != self.num_resolutions - 1:\n                hs.append(self.down[i_level].downsample(hs[-1]))\n\n        # 
middle\n        h = hs[-1]\n        h = self.mid.block_1(h)\n        h = self.mid.attn_1(h)\n        h = self.mid.block_2(h)\n        # end\n        h = self.norm_out(h)\n        h = swish(h)\n        h = self.conv_out(h)\n        return h\n\n\nclass Decoder(nn.Module):\n    def __init__(\n        self,\n        ch: int,\n        out_ch: int,\n        ch_mult: list[int],\n        num_res_blocks: int,\n        in_channels: int,\n        resolution: int,\n        z_channels: int,\n    ):\n        super().__init__()\n        self.ch = ch\n        self.num_resolutions = len(ch_mult)\n        self.num_res_blocks = num_res_blocks\n        self.resolution = resolution\n        self.in_channels = in_channels\n        self.ffactor = 2 ** (self.num_resolutions - 1)\n\n        # compute in_ch_mult, block_in and curr_res at lowest res\n        block_in = ch * ch_mult[self.num_resolutions - 1]\n        curr_res = resolution // 2 ** (self.num_resolutions - 1)\n        self.z_shape = (1, z_channels, curr_res, curr_res)\n\n        # z to block_in\n        self.conv_in = nn.Conv2d(z_channels, block_in, kernel_size=3, stride=1, padding=1)\n\n        # middle\n        self.mid = nn.Module()\n        self.mid.block_1 = ResnetBlock(in_channels=block_in, out_channels=block_in)\n        self.mid.attn_1 = AttnBlock(block_in)\n        self.mid.block_2 = ResnetBlock(in_channels=block_in, out_channels=block_in)\n\n        # upsampling\n        self.up = nn.ModuleList()\n        for i_level in reversed(range(self.num_resolutions)):\n            block = nn.ModuleList()\n            attn = nn.ModuleList()\n            block_out = ch * ch_mult[i_level]\n            for _ in range(self.num_res_blocks + 1):\n                block.append(ResnetBlock(in_channels=block_in, out_channels=block_out))\n                block_in = block_out\n            up = nn.Module()\n            up.block = block\n            up.attn = attn\n            if i_level != 0:\n                up.upsample = 
Upsample(block_in)\n                curr_res = curr_res * 2\n            self.up.insert(0, up)  # prepend to get consistent order\n\n        # end\n        self.norm_out = nn.GroupNorm(num_groups=32, num_channels=block_in, eps=1e-6, affine=True)\n        self.conv_out = nn.Conv2d(block_in, out_ch, kernel_size=3, stride=1, padding=1)\n\n    def forward(self, z: Tensor) -> Tensor:\n        # z to block_in\n        h = self.conv_in(z)\n\n        # middle\n        h = self.mid.block_1(h)\n        h = self.mid.attn_1(h)\n        h = self.mid.block_2(h)\n\n        # upsampling\n        for i_level in reversed(range(self.num_resolutions)):\n            for i_block in range(self.num_res_blocks + 1):\n                h = self.up[i_level].block[i_block](h)\n                if len(self.up[i_level].attn) > 0:\n                    h = self.up[i_level].attn[i_block](h)\n            if i_level != 0:\n                h = self.up[i_level].upsample(h)\n\n        # end\n        h = self.norm_out(h)\n        h = swish(h)\n        h = self.conv_out(h)\n        return h\n\n\nclass DiagonalGaussian(nn.Module):\n    def __init__(self, sample: bool = True, chunk_dim: int = 1):\n        super().__init__()\n        self.sample = sample\n        self.chunk_dim = chunk_dim\n\n    def forward(self, z: Tensor) -> Tensor:\n        mean, logvar = torch.chunk(z, 2, dim=self.chunk_dim)\n        if self.sample:\n            std = torch.exp(0.5 * logvar)\n            return mean + std * torch.randn_like(mean)\n        else:\n            return mean\n\n\nclass AutoEncoder(nn.Module):\n    def __init__(self, params: AutoEncoderParams):\n        super().__init__()\n        self.encoder = Encoder(\n            resolution=params.resolution,\n            in_channels=params.in_channels,\n            ch=params.ch,\n            ch_mult=params.ch_mult,\n            num_res_blocks=params.num_res_blocks,\n            z_channels=params.z_channels,\n        )\n        self.decoder = Decoder(\n            
resolution=params.resolution,\n            in_channels=params.in_channels,\n            ch=params.ch,\n            out_ch=params.out_ch,\n            ch_mult=params.ch_mult,\n            num_res_blocks=params.num_res_blocks,\n            z_channels=params.z_channels,\n        )\n        self.reg = DiagonalGaussian()\n\n        self.scale_factor = params.scale_factor\n        self.shift_factor = params.shift_factor\n\n    def encode(self, x: Tensor) -> Tensor:\n        z = self.reg(self.encoder(x))\n        z = self.scale_factor * (z - self.shift_factor)\n        return z\n\n    def decode(self, z: Tensor) -> Tensor:\n        z = z / self.scale_factor + self.shift_factor\n        return self.decoder(z)\n\n    def forward(self, x: Tensor) -> Tensor:\n        return self.decode(self.encode(x))\n"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/__init__.py",
    "content": "from .cache_cutfresh import cache_cutfresh\nfrom .fresh_ratio_scheduler import fresh_ratio_scheduler\nfrom .score_evaluate import score_evaluate\nfrom .global_force_fresh import global_force_fresh\nfrom .cache_cutfresh import cache_cutfresh\nfrom .update_cache import update_cache\nfrom .force_init import force_init\nfrom .attention import cached_attention_forward\nfrom .cache_init import cache_init\nfrom .cal_type import cal_type\nfrom .force_scheduler import force_scheduler\nfrom .support_set_selection import support_set_selection"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/attention.py",
    "content": "# Besides, re-arrange the attention module\nfrom torch.jit import Final\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom typing import Optional, Union\n#from xformers.ops.fmha.attn_bias import BlockDiagonalMask\ndef cached_attention_forward(\n    query: torch.Tensor,\n    key: torch.Tensor,\n    value: torch.Tensor,\n    #attn_bias: Optional[Union[torch.Tensor, BlockDiagonalMask]] = None,\n    attn_bias,\n    p: float = 0.0,\n    scale: Optional[float] = None\n) -> torch.Tensor:\n    scale = 1.0 / query.shape[-1] ** 0.5\n    query = query * scale\n    query = query.transpose(1, 2)\n    key = key.transpose(1, 2)\n    value = value.transpose(1, 2)\n    attn = query @ key.transpose(-2, -1)\n    if attn_bias is not None:\n        attn_bias = attn_bias.materialize(shape= attn.shape, dtype= attn.dtype, device= attn.device)\n        attn = attn + attn_bias\n    #out_map = attn\n    attn_map = attn.softmax(-1)\n    attn = F.dropout(attn_map, p)\n    attn = attn @ value\n\n    return attn.transpose(1, 2).contiguous(), attn_map.mean(dim=1)"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/cache_cutfresh.py",
    "content": "from .fresh_ratio_scheduler import fresh_ratio_scheduler\nfrom .score_evaluate import score_evaluate\n#from .token_merge import token_merge\nfrom .support_set_selection import support_set_selection\nimport torch\ndef cache_cutfresh(cache_dic, tokens, current):\n    '''\n    Cut fresh tokens from the input tokens and update the cache counter.\n    \n    cache_dic: dict, the cache dictionary containing cache(main extra memory cost), indices and some other information.\n    tokens: torch.Tensor, the input tokens to be cut.\n    current: dict, the current step, layer, and module information. Particularly convenient for debugging.\n    '''\n    step = current['step']\n    layer = current['layer']\n    stream = current['stream']\n    module = current['module']\n    \n    fresh_ratio = fresh_ratio_scheduler(cache_dic, current)\n    fresh_ratio = torch.clamp(torch.tensor(fresh_ratio, device = tokens.device), min=0, max=1)\n    \n    # Generate the index tensor for fresh tokens\n    score = score_evaluate(cache_dic, tokens, current) # s1, s2, s3 mentioned in the paper\n    #score = local_selection_with_bonus(score, 0.4, 4) # Uniform Spatial Distribution s4 mentioned in the paper\n    indices = score.argsort(dim=-1, descending=True)\n    topk = int(fresh_ratio * score.shape[1])\n    fresh_indices = indices[:, :topk]\n    stale_indices = indices[:, topk:]\n\n    #fresh_indices = support_set_selection(tokens, fresh_ratio, 0.4, current, cache_dic) # (B, fresh_ratio * N) # 0.4\n\n    # (B, fresh_ratio *N)\n\n    # Updating the Cache Frequency Score s3 mentioned in the paper\n    # stale tokens index + 1 in each ***module***, fresh tokens index = 0\n    cache_dic['cache_index'][-1][layer][module] += 1\n    cache_dic['cache_index'][-1][layer][module].scatter_(dim=1, index=fresh_indices, \n                                                                    src = torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))\n    
#cache_dic['cache_index']['layer_index'][module] += 1\n    #cache_dic['cache_index']['layer_index'][module].scatter_(dim=1, index=fresh_indices, \n    #                                                                src = torch.zeros_like(fresh_indices, dtype=torch.int, device=fresh_indices.device))\n    \n    fresh_indices_expand = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])\n\n    fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices_expand)\n    return fresh_indices, fresh_tokens\n    \ndef local_selection_with_bonus(score, bonus_ratio, grid_size=2):\n    batch_size, num_tokens = score.shape\n    image_size = int(num_tokens ** 0.5)\n    block_size = grid_size * grid_size\n    \n    assert num_tokens % block_size == 0, \"The number of tokens must be divisible by the block size.\"\n    \n    # Step 1: Reshape score to group it by blocks\n    score_reshaped = score.view(batch_size, image_size // grid_size, grid_size, image_size // grid_size, grid_size)\n    score_reshaped = score_reshaped.permute(0, 1, 3, 2, 4).contiguous()\n    score_reshaped = score_reshaped.view(batch_size, -1, block_size)  # [batch_size, num_blocks, block_size]\n    \n    # Step 2: Find the max token in each block\n    max_scores, max_indices = score_reshaped.max(dim=-1, keepdim=True)  # [batch_size, num_blocks, 1]\n    \n    # Step 3: Create a mask to identify max score tokens\n    mask = torch.zeros_like(score_reshaped)\n    mask.scatter_(-1, max_indices, 1)  # Set mask to 1 at the max indices\n    \n    # Step 4: Apply the bonus only to the max score tokens\n    score_reshaped = score_reshaped + (mask * max_scores * bonus_ratio)  # Apply bonus only to max tokens\n    \n    # Step 5: Reshape the score back to its original shape\n    score_modified = score_reshaped.view(batch_size, image_size // grid_size, image_size // grid_size, grid_size, grid_size)\n    score_modified = score_modified.permute(0, 1, 3, 2, 4).contiguous()\n    score_modified = 
score_modified.view(batch_size, num_tokens)\n    \n    return score_modified"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/cache_init.py",
    "content": "def cache_init(timesteps, model_kwargs=None):   \n    '''\n    Initialization for cache.\n    '''\n    cache_dic = {}\n    cache = {}\n    cache_index = {}\n    cache[-1]={}\n    cache_index[-1]={}\n    cache_index['layer_index']={}\n    cache_dic['attn_map'] = {}\n    cache_dic['attn_map'][-1] = {}\n    cache_dic['attn_map'][-1]['double_stream'] = {}\n    cache_dic['attn_map'][-1]['single_stream'] = {}\n\n    cache_dic['k-norm'] = {}\n    cache_dic['k-norm'][-1] = {}\n    cache_dic['k-norm'][-1]['double_stream'] = {}\n    cache_dic['k-norm'][-1]['single_stream'] = {}\n\n    cache_dic['v-norm'] = {}\n    cache_dic['v-norm'][-1] = {}\n    cache_dic['v-norm'][-1]['double_stream'] = {}\n    cache_dic['v-norm'][-1]['single_stream'] = {}\n\n    cache_dic['cross_attn_map'] = {}\n    cache_dic['cross_attn_map'][-1] = {}\n    cache[-1]['double_stream']={}\n    cache[-1]['single_stream']={}\n    cache_dic['cache_counter'] = 0\n\n    for j in range(19):\n        cache[-1]['double_stream'][j] = {}\n        cache_index[-1][j] = {}\n        cache_dic['attn_map'][-1]['double_stream'][j] = {}\n        cache_dic['attn_map'][-1]['double_stream'][j]['total'] = {}\n        cache_dic['attn_map'][-1]['double_stream'][j]['txt_mlp'] = {}\n        cache_dic['attn_map'][-1]['double_stream'][j]['img_mlp'] = {}\n        \n        cache_dic['k-norm'][-1]['double_stream'][j] = {}\n        cache_dic['k-norm'][-1]['double_stream'][j]['txt_mlp'] = {}\n        cache_dic['k-norm'][-1]['double_stream'][j]['img_mlp'] = {}\n\n        cache_dic['v-norm'][-1]['double_stream'][j] = {}\n        cache_dic['v-norm'][-1]['double_stream'][j]['txt_mlp'] = {}\n        cache_dic['v-norm'][-1]['double_stream'][j]['img_mlp'] = {}\n\n    for j in range(38):\n        cache[-1]['single_stream'][j] = {}\n        cache_index[-1][j] = {}\n        cache_dic['attn_map'][-1]['single_stream'][j] = {}\n        cache_dic['attn_map'][-1]['single_stream'][j]['total'] = {}\n\n        
cache_dic['k-norm'][-1]['single_stream'][j] = {}\n        cache_dic['k-norm'][-1]['single_stream'][j]['total'] = {}\n\n        cache_dic['v-norm'][-1]['single_stream'][j] = {}\n        cache_dic['v-norm'][-1]['single_stream'][j]['total'] = {}\n\n    mode = 'ToCa'\n    if mode == 'original':\n        cache_dic['cache_type'] = 'random'              # model_kwargs['cache_type'] # no use\n        cache_dic['cache_index'] = cache_index\n        cache_dic['cache'] = cache\n        cache_dic['fresh_ratio_schedule'] = 'ToCa'      # model_kwargs['ratio_scheduler']\n        cache_dic['fresh_ratio'] = 0.0                  # model_kwargs['fresh_ratio']\n        cache_dic['fresh_threshold'] = 1                # model_kwargs['fresh_threshold']\n        cache_dic['force_fresh'] = 'global'             # model_kwargs['force_fresh']\n        cache_dic['soft_fresh_weight'] = 0.0            # model_kwargs['soft_fresh_weight']\n    \n    elif mode == 'ToCa':\n        cache_dic['cache_type'] = 'attention'           # Attention cache type for ToCa, use Self-Attention Weight to evaluate the importance of each token\n        cache_dic['cache_index'] = cache_index\n        cache_dic['cache'] = cache\n        cache_dic['fresh_ratio_schedule'] = 'ToCa' \n        cache_dic['fresh_ratio'] = 0.1\n        cache_dic['fresh_threshold'] = 4\n        cache_dic['force_fresh'] = 'global' \n        cache_dic['soft_fresh_weight'] = 0.25\n        \n    current = {}\n    current['final_time'] = timesteps[-2]\n    return cache_dic, current\n"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/cal_type.py",
    "content": "from .force_scheduler import force_scheduler\n\ndef cal_type(cache_dic, current):\n    '''\n    Determine calculation type for this step\n    '''\n    if cache_dic['fresh_ratio'] == 0.0:\n        # FORA: Uniform\n        first_step = (current['step'] == 0)\n    else:\n        # ToCa: First 3 steps enhanced\n        first_step = (current['step'] <= 2)\n    \n    force_fresh = cache_dic['force_fresh']\n    if not first_step:\n        fresh_interval = cache_dic['cal_threshold']\n    else:\n        fresh_interval = cache_dic['fresh_threshold']\n\n    if (first_step) or (cache_dic['cache_counter'] == fresh_interval - 1 ):\n        current['type'] = 'full'\n        cache_dic['cache_counter'] = 0\n        force_scheduler(cache_dic, current)\n    \n    # ToCa\n    else:\n        cache_dic['cache_counter'] += 1\n        current['type'] = 'ToCa'\n\n######################################################################\n    #if (current['step'] in [3,2,1,0]):\n    #    current['type'] = 'full'"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/force_init.py",
    "content": "import torch\n\ndef force_init(cache_dic, current, tokens):\n    '''\n    Initialization for Force Activation step.\n    '''\n    cache_dic['cache_index'][-1][current['layer']][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)\n\n    #if current['layer'] == 0:\n    #    cache_dic['cache_index']['layer_index'][current['module']] = torch.zeros(tokens.shape[0], tokens.shape[1], dtype=torch.int, device=tokens.device)"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/force_scheduler.py",
    "content": "import torch\ndef force_scheduler(cache_dic, current):\n    if cache_dic['fresh_ratio'] == 0:\n        # FORA\n        linear_step_weight = 0.0\n    else: \n        # TokenCache\n        linear_step_weight = 0.0\n    step_factor = torch.tensor(1 - linear_step_weight + 2 * linear_step_weight * current['step'] / current['num_steps'])\n    threshold = torch.round(cache_dic['fresh_threshold'] / step_factor)\n\n    # no force constrain for sensitive steps, cause the performance is good enough.\n    # you may have a try.\n    \n    cache_dic['cal_threshold'] = threshold\n    #return threshold"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/fresh_ratio_scheduler.py",
    "content": "import torch\ndef fresh_ratio_scheduler(cache_dic, current):\n    '''\n    Return the fresh ratio for the current step.\n    '''\n    fresh_ratio = cache_dic['fresh_ratio']\n    fresh_ratio_schedule = cache_dic['fresh_ratio_schedule']\n    step = current['step']\n    num_steps = current['num_steps']\n    threshold = cache_dic['fresh_threshold']\n    weight = 0.9\n    if fresh_ratio_schedule == 'constant':\n        return fresh_ratio\n    elif fresh_ratio_schedule == 'linear':\n        return fresh_ratio * (1 + weight - 2 * weight * step / num_steps)\n    elif fresh_ratio_schedule == 'exp':\n        #return 0.5 * (0.052 ** (step/num_steps))\n        return fresh_ratio * (weight ** (step / num_steps))\n    elif fresh_ratio_schedule == 'linear-mode':\n        mode = (step % threshold)/threshold - 0.5\n        mode_weight = 0.1\n        return fresh_ratio * (1 + weight - 2 * weight * step / num_steps + mode_weight * mode)\n    elif fresh_ratio_schedule == 'layerwise':\n        return fresh_ratio * (1 + weight - 2 * weight * current['layer'] / 27)\n    elif fresh_ratio_schedule == 'linear-layerwise':\n        step_weight = -0.9 #0.9\n        step_factor = 1 - step_weight + 2 * step_weight * step / num_steps\n        #if current['layer'] == 2:\n        #    return 1.0\n        #sigmoid\n        #sigmoid_weight = 0.13\n        #layer_factor = 2 * torch.sigmoid(torch.tensor([sigmoid_weight * (13.5 - current['layer'])]))\n        layer_weight = 0.6\n        layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27\n\n        module_weight = 1.0 #TokenCache N=8 2.5 N=6 2.5 #N=4 2.1\n        module_time_weight = 0.6\n        module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='cross-attn' else (1 + module_time_weight * module_weight)\n        \n        return fresh_ratio * layer_factor * step_factor * module_factor\n\n    elif fresh_ratio_schedule == 'ToCa':\n        step_weight = 0.0 #0.9\n        step_factor = 
1 - step_weight + 2 * step_weight * step / num_steps\n\n        layer_weight = 0.5\n        layer_factor = 1 + layer_weight - 2 * layer_weight * current['layer'] / 27\n\n        #module_weight = 1.0\n        #module_time_weight = 0.6\n        # This means 60*x% of the cross-attn computation and 160*x% of the mlp computation. The design reflects that cross-attn has the most temporal redundancy and mlp has less,\n        # so cross-attn computes less and mlp computes more.\n        #module_factor = (1 - (1-module_time_weight) * module_weight) if current['module']=='cross-attn' else (1 + module_time_weight * module_weight)\n        stream_weight = 0.6\n        stream_factor = (1 - stream_weight) if current['stream']=='double_stream' else (1 + stream_weight)\n        return fresh_ratio * layer_factor * step_factor * stream_factor #* module_factor\n\n    else:\n        raise ValueError(\"unrecognized fresh ratio schedule\", fresh_ratio_schedule)\n"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/global_force_fresh.py",
    "content": "from .force_scheduler import force_scheduler\ndef global_force_fresh(cache_dic, current):\n    '''\n    Return whether to force fresh tokens globally.\n    '''\n    first_step = (current['step'] == 0)\n    second_step = (current['step'] == 1)\n    force_fresh = cache_dic['force_fresh']\n    if not first_step:\n        fresh_threshold = cache_dic['cal_threshold']\n    else:\n        fresh_threshold = cache_dic['fresh_threshold']\n\n    if force_fresh == 'global':\n        return (first_step or (current['step']% fresh_threshold == 0))\n    elif force_fresh == 'local':\n        return first_step\n    elif force_fresh == 'none':\n        return first_step\n    else:\n        raise ValueError(\"unrecognized force fresh strategy\", force_fresh)"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/score_evaluate.py",
    "content": "import torch\nimport torch.nn as nn\nfrom .scores import attn_score, similarity_score, norm_score, k_norm_score, v_norm_score\ndef score_evaluate(cache_dic, tokens, current) -> torch.Tensor:\n    '''\n    Return the score tensor (B, N) for the given tokens.\n    '''\n\n    #if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')):\n    #    # abandoned branch, if you want to explore the local force fresh strategy, this may help.\n    #    force_fresh_mask = torch.as_tensor((cache_dic['cache_index'][-1][current['layer']][current['module']] >= 2 * cache_dic['fresh_threshold']), dtype = int) # 2 because the threshold is for step, not module\n    #    force_len = force_fresh_mask.sum(dim=1)\n    #    force_indices = force_fresh_mask.argsort(dim = -1, descending = True)[:, :force_len.min()]\n    #    force_indices = force_indices[:, torch.randperm(force_indices.shape[1])]\n\n    # Just see more explanation in the version of DiT-ToCa if needed.\n\n    if cache_dic['cache_type'] == 'random':\n        score = torch.rand(tokens.shape[0], tokens.shape[1], device=tokens.device)\n\n    elif cache_dic['cache_type'] == 'straight':\n        score = torch.ones(tokens.shape[0], tokens.shape[1]).to(tokens.device)\n    \n    elif cache_dic['cache_type'] == 'attention':\n        # cache_dic['attn_map'][step][layer] (B, N, N), the last dimention has get softmaxed\n        score = attn_score(cache_dic, current)\n        #score = score + 0.0 * torch.rand_like(score, device= score.device)\n    \n    elif cache_dic['cache_type'] == 'similarity':\n        score = similarity_score(cache_dic, current, tokens)\n\n    elif cache_dic['cache_type'] == 'norm':\n        score = norm_score(cache_dic, current, tokens)\n\n    elif cache_dic['cache_type'] == 'k-norm':\n        score = k_norm_score(cache_dic, current)\n\n    elif cache_dic['cache_type'] == 'v-norm':\n        score = v_norm_score(cache_dic, current)\n\n    elif cache_dic['cache_type'] == 
'compress':\n        score1 = torch.rand(int(tokens.shape[0]*0.5), tokens.shape[1])\n        score1 = torch.cat([score1, score1], dim=0).to(tokens.device)\n        score2 = cache_dic['attn_map'][-1][current['layer']].sum(dim=1)#.mean(dim=0) # (B, N)\n        # normalize\n        score2 = score2 / score2.max(dim=1, keepdim=True)[0]\n        score = 0.5 * score1 + 0.5 * score2\n    \n    # abandoned the branch, if you want to explore the local force fresh strategy, this may help.\n    #if ((not current['is_force_fresh']) and (cache_dic['force_fresh'] == 'local')): # current['is_force_fresh'] is False, cause when it is True, no cut and fresh are needed\n    #        #print(torch.ones_like(force_indices, dtype=float, device=force_indices.device).dtype)\n    #    score.scatter_(dim=1, index=force_indices, src=torch.ones_like(force_indices, dtype=torch.float32, \n    #                                                                       device=force_indices.device))\n    \n    if (True and (cache_dic['force_fresh'] == 'global')):\n        soft_step_score = cache_dic['cache_index'][-1][current['layer']][current['module']].float() / (cache_dic['fresh_threshold'])\n        #soft_layer_score = cache_dic['cache_index']['layer_index'][current['module']].float() / (27)\n        score = score + cache_dic['soft_fresh_weight'] * soft_step_score #+ 0.1 *soft_layer_score\n    \n    return score.to(tokens.device)"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/scores.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\ndef attn_score(cache_dic, current):\n    #self_attn_score = 1- cache_dic['attn_map'][-1][current['layer']].diagonal(dim1=1, dim2=2)\n    #self_attn_score = F.normalize(self_attn_score, dim=1, p=2)\n    #attention_score = F.normalize(cache_dic['attn_map'][-1][current['layer']].sum(dim=1), dim=1, p=2)\n    #cross_attn_map = F.threshold(cache_dic['cross_attn_map'][-1][current['layer']],threshold=0.0, value=0.0)\n    #cross_attention_score = F.normalize(cross_attn_map.sum(dim=-1), dim=-1, p=2)\n\n    # Note: It is important to give a same selection method for cfg and no cfg.\n    # Because the influence of **Cross-Attention** in text-contidional models makes cfg and no cfg a BIG difference.\n\n    # Same selection for cfg and no cfg\n    #cond_cmap, uncond_cmap = torch.split(cache_dic['attn_map'][-1][current['layer']], len(cache_dic['cross_attn_map'][-1][current['layer']]) // 2, dim=0)\n    #cond_weight = 0.5\n    #cmap = cond_weight * cond_cmap + (1 - cond_weight) * uncond_cmap\n\n    ## Entropy score\n    #cross_attention_entropy = -torch.sum(cmap * torch.log(cmap + 1e-7), dim=-1)\n    #cross_attention_score   = F.normalize(1 + cross_attention_entropy, dim=1, p=2) # Note here \"1\" does not influence the sorted sequence, but provie stability.\n    #score = cross_attention_score.repeat(2, 1)\n    if current['stream'] == 'double_stream':\n        score = F.normalize(cache_dic['attn_map'][-1][current['stream']][current['layer']][current['module']], dim=-1, p=2)\n    elif current['stream'] == 'single_stream':\n        score = F.normalize(cache_dic['attn_map'][-1][current['stream']][current['layer']]['total'], dim=-1, p=2)\n\n    # You can try conbining the self_attention_score (s1) and cross_attention_score (s2) as the final score, there exists a balance.\n    #cross_weight = 0.0\n    #score =  (1-cross_weight) * attention_score + cross_weight * cross_attention_score\n    return 
score\n\ndef similarity_score(cache_dic, current, tokens):\n    cosine_sim = F.cosine_similarity(tokens, cache_dic['cache'][-1][current['layer']][current['module']], dim=-1)\n\n    return F.normalize(1- cosine_sim, dim=-1, p=2)\n\ndef norm_score(cache_dic, current, tokens):\n    norm = tokens.norm(dim=-1, p=2)\n    return F.normalize(norm, dim=-1, p=2)\n\ndef kv_norm_score(cache_dic, current):\n    # (B, N, num_heads)\n    #cond_k_norm, uncond_k_norm = torch.split(cache_dic['cache'][-1][current['layer']]['k_norm'], len(cache_dic['cache'][-1][current['layer']]['k_norm']) // 2, dim=0)\n    cond_v_norm, uncond_v_norm = torch.split(cache_dic['cache'][-1][current['layer']]['v_norm'], len(cache_dic['cache'][-1][current['layer']]['v_norm']) // 2, dim=0)\n    cond_weight = 0.5\n    #k_norm = cond_weight * cond_k_norm + (1 - cond_weight) * uncond_k_norm\n    v_norm = cond_weight * cond_v_norm + (1 - cond_weight) * uncond_v_norm\n    kv_norm = 1 -v_norm\n\n    ## For the (B/2, N) tensor, compute the absolute difference between each element along the N dimension and the mean\n    #kv_norm_mean = kv_norm.mean(dim=-2, keepdim=True)\n    #kv_norm_diff = torch.abs(kv_norm - kv_norm_mean)\n    \n    return F.normalize(kv_norm.sum(dim=-1), p=2).repeat(2, 1)\n\ndef k_norm_score(cache_dic, current):\n    # (B, N)\n\n    if current['stream'] == 'double_stream':\n        score = F.normalize(cache_dic['k-norm'][-1][current['stream']][current['layer']][current['module']], dim=-1, p=2)\n    elif current['stream'] == 'single_stream':\n        score = F.normalize(cache_dic['k-norm'][-1][current['stream']][current['layer']]['total'], dim=-1, p=2)\n\n    return score\n\ndef v_norm_score(cache_dic, current):\n    # (B, N)\n\n    if current['stream'] == 'double_stream':\n        score = F.normalize(cache_dic['v-norm'][-1][current['stream']][current['layer']][current['module']], dim=-1, p=2)\n    elif current['stream'] == 'single_stream':\n        score = F.normalize(cache_dic['v-norm'][-1][current['stream']][current['layer']]['total'], dim=-1, p=2)\n\n    return 
score\n\n"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/support_set_selection.py",
    "content": "import torch\nfrom typing import Dict\n\ndef support_set_selection(x: torch.Tensor, fresh_ratio: float, base_ratio: float, current: Dict, cache_dic: Dict) -> torch.Tensor:\n    \n    #selection_start = 0\n    #\n    #if current['stream'] == 'single_stream':\n    #    # only select from the img tokens\n    #    x = x[:, cache_dic['txt_shape'] :]\n    #    selection_start = cache_dic['txt_shape']\n\n    B, N, H = x.shape\n    num_total = int(fresh_ratio * N)         # 最终每个 batch 选取的 token 数\n    base_count = int(base_ratio * num_total)  # 随机选取的 token 数\n    #base_count = 1\n    add_count = num_total - base_count  # 需要从候选集中选取的 token 数\n\n    # 1. 随机选取 (B, base_count) 个 token\n    random_indices = torch.randperm(N, device=x.device)\n    base_indices = random_indices[:base_count]\n    other_indices = random_indices[base_count:]\n\n    base_tokens = x.gather(dim=1, index=base_indices.unsqueeze(-1).expand(B, -1, H))\n    #other_tokens = x.gather(dim=1, index=other_indices.unsqueeze(-1).expand(-1, -1, H))\n\n    # 2. 计算余下 token 与已选 token 的相似度\n    \n    # normaize\n    base_tokens = base_tokens / base_tokens.norm(dim=-1, keepdim=True)\n    #other_tokens = other_tokens / other_tokens.norm(dim=-1, keepdim=True)\n    x_norm = x / x.norm(dim=-1, keepdim=True)\n\n    # 计算余下 token 与已选 token 的相似度\n    similarity = torch.einsum('bnd,bmd->bnm', base_tokens, x_norm)\n\n    # 计算每列最小值\n    min_similarity = similarity.min(dim=1).values\n    #min_similarity = similarity.max(dim=1).values\n\n    # 3. 选取相似度最小的 token\n    _, min_indices = min_similarity.topk(add_count, largest=False)\n    #_, min_indices = min_similarity.topk(add_count, largest=True)\n\n    # 4. 合并 base_indices 和 min_indices\n    #indices = torch.cat([base_indices, other_indices[min_indices]], dim=-1)\n    indices = torch.cat([base_indices.expand(B, -1), min_indices], dim=-1) #+ selection_start\n\n    return indices\n\n\n    "
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/token_merge.py",
    "content": "import torch\ndef token_merge(cache_dic, tokens, current, fresh_indices, stale_indices):\n    '''\n    An abandoned branch in exploring if token merge helps. The answer is no, at least no for training-free strategy.\n    '''\n    if (current['layer'] % 1 == 0):\n        fresh_tokens = torch.gather(input = tokens, dim = 1, index = fresh_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))\n        stale_tokens = torch.gather(input = tokens, dim = 1, index = stale_indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))\n        method = 'similarity'\n        if method == 'distance':\n            descending = False\n            distance = torch.cdist(stale_tokens, fresh_tokens, p=1)\n            stale_fresh_dist, stale_fresh_indices_allstale = torch.min(distance, dim=2)\n        elif method == 'similarity':\n            descending = True\n            fresh_tokens = torch.nn.functional.normalize(fresh_tokens, p=2, dim=-1)\n            stale_tokens = torch.nn.functional.normalize(stale_tokens, p=2, dim=-1)\n            similarity = stale_tokens @ fresh_tokens.transpose(1, 2)\n            stale_fresh_dist, stale_fresh_indices_allstale = torch.max(similarity, dim=2)\n        \n\n        saved_topk_stale = int((stale_fresh_dist > 0.995).sum(dim=1).min())\n        merged_stale_sequence = torch.sort(stale_fresh_dist, dim=1, descending=descending)[1][:,:saved_topk_stale]\n        stale_fresh_indices = stale_fresh_indices_allstale.gather(1, merged_stale_sequence)\n        merged_stale_sequence = stale_indices.gather(1, merged_stale_sequence)\n        merged_stale_fresh_indices = fresh_indices.gather(1, stale_fresh_indices)\n        cache_dic['merged_stale_fresh_indices'] = merged_stale_fresh_indices\n        cache_dic['merged_stale_sequence'] = merged_stale_sequence \n"
  },
  {
    "path": "flux-ToCa/src/flux/modules/cache_functions/update_cache.py",
    "content": "import torch\ndef update_cache(fresh_indices, fresh_tokens, cache_dic, current, fresh_attn_map=None):\n    '''\n    Update the cache with the fresh tokens.\n    '''\n    step = current['step']\n    layer = current['layer']\n    module = current['module']\n    # Update the cached tokens at the positions\n\n\n    indices = fresh_indices\n\n    cache_dic['cache'][-1][current['stream']][current['layer']][current['module']].scatter_(dim=1, index=indices.unsqueeze(-1).expand(-1, -1, fresh_tokens.shape[-1]), src=fresh_tokens)\n    \n    \n\n        \n        "
  },
  {
    "path": "flux-ToCa/src/flux/modules/conditioner.py",
    "content": "from torch import Tensor, nn\nfrom transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer\n\n\nclass HFEmbedder(nn.Module):\n    def __init__(self, version: str, max_length: int, **hf_kwargs):\n        super().__init__()\n        self.is_clip = \"openai\" in version\n        self.max_length = max_length\n        self.output_key = \"pooler_output\" if self.is_clip else \"last_hidden_state\"\n\n        if self.is_clip:\n            self.tokenizer: CLIPTokenizer = CLIPTokenizer.from_pretrained(version, max_length=max_length)\n            self.hf_module: CLIPTextModel = CLIPTextModel.from_pretrained(version, **hf_kwargs)\n        else:\n            self.tokenizer: T5Tokenizer = T5Tokenizer.from_pretrained(version, max_length=max_length)\n            self.hf_module: T5EncoderModel = T5EncoderModel.from_pretrained(version, **hf_kwargs)\n\n        self.hf_module = self.hf_module.eval().requires_grad_(False)\n\n    def forward(self, text: list[str]) -> Tensor:\n        batch_encoding = self.tokenizer(\n            text,\n            truncation=True,\n            max_length=self.max_length,\n            return_length=False,\n            return_overflowing_tokens=False,\n            padding=\"max_length\",\n            return_tensors=\"pt\",\n        )\n\n        outputs = self.hf_module(\n            input_ids=batch_encoding[\"input_ids\"].to(self.hf_module.device),\n            attention_mask=None,\n            output_hidden_states=False,\n        )\n        return outputs[self.output_key]\n"
  },
  {
    "path": "flux-ToCa/src/flux/modules/image_embedders.py",
    "content": "import os\n\nimport cv2\nimport numpy as np\nimport torch\nfrom einops import rearrange, repeat\nfrom PIL import Image\nfrom safetensors.torch import load_file as load_sft\nfrom torch import nn\nfrom transformers import AutoModelForDepthEstimation, AutoProcessor, SiglipImageProcessor, SiglipVisionModel\n\nfrom flux.util import print_load_warning\n\n\nclass DepthImageEncoder:\n    depth_model_name = \"LiheYoung/depth-anything-large-hf\"\n\n    def __init__(self, device):\n        self.device = device\n        self.depth_model = AutoModelForDepthEstimation.from_pretrained(self.depth_model_name).to(device)\n        self.processor = AutoProcessor.from_pretrained(self.depth_model_name)\n\n    def __call__(self, img: torch.Tensor) -> torch.Tensor:\n        hw = img.shape[-2:]\n\n        img = torch.clamp(img, -1.0, 1.0)\n        img_byte = ((img + 1.0) * 127.5).byte()\n\n        img = self.processor(img_byte, return_tensors=\"pt\")[\"pixel_values\"]\n        depth = self.depth_model(img.to(self.device)).predicted_depth\n        depth = repeat(depth, \"b h w -> b 3 h w\")\n        depth = torch.nn.functional.interpolate(depth, hw, mode=\"bicubic\", antialias=True)\n\n        depth = depth / 127.5 - 1.0\n        return depth\n\n\nclass CannyImageEncoder:\n    def __init__(\n        self,\n        device,\n        min_t: int = 50,\n        max_t: int = 200,\n    ):\n        self.device = device\n        self.min_t = min_t\n        self.max_t = max_t\n\n    def __call__(self, img: torch.Tensor) -> torch.Tensor:\n        assert img.shape[0] == 1, \"Only batch size 1 is supported\"\n\n        img = rearrange(img[0], \"c h w -> h w c\")\n        img = torch.clamp(img, -1.0, 1.0)\n        img_np = ((img + 1.0) * 127.5).numpy().astype(np.uint8)\n\n        # Apply Canny edge detection\n        canny = cv2.Canny(img_np, self.min_t, self.max_t)\n\n        # Convert back to torch tensor and reshape\n        canny = torch.from_numpy(canny).float() / 127.5 - 1.0\n       
 canny = rearrange(canny, \"h w -> 1 1 h w\")\n        canny = repeat(canny, \"b 1 ... -> b 3 ...\")\n        return canny.to(self.device)\n\n\nclass ReduxImageEncoder(nn.Module):\n    siglip_model_name = \"google/siglip-so400m-patch14-384\"\n\n    def __init__(\n        self,\n        device,\n        redux_dim: int = 1152,\n        txt_in_features: int = 4096,\n        redux_path: str | None = os.getenv(\"FLUX_REDUX\"),\n        dtype=torch.bfloat16,\n    ) -> None:\n        assert redux_path is not None, \"Redux path must be provided\"\n\n        super().__init__()\n\n        self.redux_dim = redux_dim\n        self.device = device if isinstance(device, torch.device) else torch.device(device)\n        self.dtype = dtype\n\n        with self.device:\n            self.redux_up = nn.Linear(redux_dim, txt_in_features * 3, dtype=dtype)\n            self.redux_down = nn.Linear(txt_in_features * 3, txt_in_features, dtype=dtype)\n\n            sd = load_sft(redux_path, device=str(device))\n            missing, unexpected = self.load_state_dict(sd, strict=False, assign=True)\n            print_load_warning(missing, unexpected)\n\n            self.siglip = SiglipVisionModel.from_pretrained(self.siglip_model_name).to(dtype=dtype)\n        self.normalize = SiglipImageProcessor.from_pretrained(self.siglip_model_name)\n\n    def __call__(self, x: Image.Image) -> torch.Tensor:\n        imgs = self.normalize.preprocess(images=[x], do_resize=True, return_tensors=\"pt\", do_convert_rgb=True)\n\n        _encoded_x = self.siglip(**imgs.to(device=self.device, dtype=self.dtype)).last_hidden_state\n\n        projected_x = self.redux_down(nn.functional.silu(self.redux_up(_encoded_x)))\n\n        return projected_x\n"
  },
  {
    "path": "flux-ToCa/src/flux/modules/layers.py",
    "content": "import math\nfrom dataclasses import dataclass\nfrom typing import Optional\nimport torch\nfrom einops import rearrange\nfrom torch import Tensor, nn\n\nfrom flux.math import attention, rope\n\nfrom flux.modules.cache_functions import force_init, cache_cutfresh, update_cache\n\nclass EmbedND(nn.Module):\n    def __init__(self, dim: int, theta: int, axes_dim: list[int]):\n        super().__init__()\n        self.dim = dim\n        self.theta = theta\n        self.axes_dim = axes_dim\n\n    def forward(self, ids: Tensor) -> Tensor:\n        n_axes = ids.shape[-1]\n        emb = torch.cat(\n            [rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],\n            dim=-3,\n        )\n\n        return emb.unsqueeze(1)\n\n\ndef timestep_embedding(t: Tensor, dim, max_period=10000, time_factor: float = 1000.0):\n    \"\"\"\n    Create sinusoidal timestep embeddings.\n    :param t: a 1-D Tensor of N indices, one per batch element.\n                      These may be fractional.\n    :param dim: the dimension of the output.\n    :param max_period: controls the minimum frequency of the embeddings.\n    :return: an (N, D) Tensor of positional embeddings.\n    \"\"\"\n    t = time_factor * t\n    half = dim // 2\n    freqs = torch.exp(-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half).to(\n        t.device\n    )\n\n    args = t[:, None].float() * freqs[None]\n    embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)\n    if dim % 2:\n        embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)\n    if torch.is_floating_point(t):\n        embedding = embedding.to(t)\n    return embedding\n\n\nclass MLPEmbedder(nn.Module):\n    def __init__(self, in_dim: int, hidden_dim: int):\n        super().__init__()\n        self.in_layer = nn.Linear(in_dim, hidden_dim, bias=True)\n        self.silu = nn.SiLU()\n        self.out_layer = nn.Linear(hidden_dim, hidden_dim, 
bias=True)\n\n    def forward(self, x: Tensor) -> Tensor:\n        return self.out_layer(self.silu(self.in_layer(x)))\n\n\nclass RMSNorm(torch.nn.Module):\n    def __init__(self, dim: int):\n        super().__init__()\n        self.scale = nn.Parameter(torch.ones(dim))\n\n    def forward(self, x: Tensor):\n        x_dtype = x.dtype\n        x = x.float()\n        rrms = torch.rsqrt(torch.mean(x**2, dim=-1, keepdim=True) + 1e-6)\n        return (x * rrms).to(dtype=x_dtype) * self.scale\n\n\nclass QKNorm(torch.nn.Module):\n    def __init__(self, dim: int):\n        super().__init__()\n        self.query_norm = RMSNorm(dim)\n        self.key_norm = RMSNorm(dim)\n\n    def forward(self, q: Tensor, k: Tensor, v: Tensor) -> tuple[Tensor, Tensor]:\n        q = self.query_norm(q)\n        k = self.key_norm(k)\n        return q.to(v), k.to(v)\n\n\nclass SelfAttention(nn.Module):\n    def __init__(self, dim: int, num_heads: int = 8, qkv_bias: bool = False):\n        super().__init__()\n        self.num_heads = num_heads\n        head_dim = dim // num_heads\n\n        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)\n        self.norm = QKNorm(head_dim)\n        self.proj = nn.Linear(dim, dim)\n\n    def forward(self, x: Tensor, pe: Tensor) -> Tensor:\n        qkv = self.qkv(x)\n        q, k, v = rearrange(qkv, \"B L (K H D) -> K B H L D\", K=3, H=self.num_heads)\n        q, k = self.norm(q, k, v)\n        x = attention(q, k, v, pe=pe)\n        x = self.proj(x)\n        return x\n\n\n@dataclass\nclass ModulationOut:\n    shift: Tensor\n    scale: Tensor\n    gate: Tensor\n\n\nclass Modulation(nn.Module):\n    def __init__(self, dim: int, double: bool):\n        super().__init__()\n        self.is_double = double\n        self.multiplier = 6 if double else 3\n        self.lin = nn.Linear(dim, self.multiplier * dim, bias=True)\n\n    def forward(self, vec: Tensor) -> tuple[ModulationOut, ModulationOut | None]:\n        out = self.lin(nn.functional.silu(vec))[:, None, 
:].chunk(self.multiplier, dim=-1)\n\n        return (\n            ModulationOut(*out[:3]),\n            ModulationOut(*out[3:]) if self.is_double else None,\n        )\n\n\nclass DoubleStreamBlock(nn.Module):\n    def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False):\n        super().__init__()\n\n        mlp_hidden_dim = int(hidden_size * mlp_ratio)\n        self.num_heads = num_heads\n        self.hidden_size = hidden_size\n        self.img_mod = Modulation(hidden_size, double=True)\n        self.img_norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.img_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias)\n\n        self.img_norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.img_mlp = nn.Sequential(\n            nn.Linear(hidden_size, mlp_hidden_dim, bias=True),\n            nn.GELU(approximate=\"tanh\"),\n            nn.Linear(mlp_hidden_dim, hidden_size, bias=True),\n        )\n\n        self.txt_mod = Modulation(hidden_size, double=True)\n        self.txt_norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.txt_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias)\n\n        self.txt_norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.txt_mlp = nn.Sequential(\n            nn.Linear(hidden_size, mlp_hidden_dim, bias=True),\n            nn.GELU(approximate=\"tanh\"),\n            nn.Linear(mlp_hidden_dim, hidden_size, bias=True),\n        )\n\n    def forward(self, img: Tensor, txt: Tensor, vec: Tensor, pe: Tensor, **kwargs) -> tuple[Tensor, Tensor]:\n        \n        cache_dic = kwargs.get('cache_dic', None)\n        current = kwargs.get('current', None)        \n        \n        if cache_dic is None:\n            img_mod1, img_mod2 = self.img_mod(vec)\n            txt_mod1, txt_mod2 = self.txt_mod(vec)\n\n            # prepare image for 
attention\n            img_modulated = self.img_norm1(img)\n            img_modulated = (1 + img_mod1.scale) * img_modulated + img_mod1.shift\n            img_qkv = self.img_attn.qkv(img_modulated)\n            img_q, img_k, img_v = rearrange(img_qkv, \"B L (K H D) -> K B H L D\", K=3, H=self.num_heads)\n            img_q, img_k = self.img_attn.norm(img_q, img_k, img_v)\n\n            # prepare txt for attention\n            txt_modulated = self.txt_norm1(txt)\n            txt_modulated = (1 + txt_mod1.scale) * txt_modulated + txt_mod1.shift\n            txt_qkv = self.txt_attn.qkv(txt_modulated)\n            txt_q, txt_k, txt_v = rearrange(txt_qkv, \"B L (K H D) -> K B H L D\", K=3, H=self.num_heads)\n            txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)\n\n            # run actual attention\n            q = torch.cat((txt_q, img_q), dim=2)\n            k = torch.cat((txt_k, img_k), dim=2)\n            v = torch.cat((txt_v, img_v), dim=2)\n\n            attn = attention(q, k, v, pe=pe)\n            txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]\n\n            # calculate the img bloks\n            img = img + img_mod1.gate * self.img_attn.proj(img_attn)\n            img = img + img_mod2.gate * self.img_mlp((1 + img_mod2.scale) * self.img_norm2(img) + img_mod2.shift)\n\n            # calculate the txt bloks\n            txt = txt + txt_mod1.gate * self.txt_attn.proj(txt_attn)\n            txt = txt + txt_mod2.gate * self.txt_mlp((1 + txt_mod2.scale) * self.txt_norm2(txt) + txt_mod2.shift)\n        \n        else:\n            current['stream'] = 'double_stream'\n\n            if current['type'] == 'full':    \n                img_mod1, img_mod2 = self.img_mod(vec)\n                txt_mod1, txt_mod2 = self.txt_mod(vec)\n\n                # prepare image for attention\n                img_modulated = self.img_norm1(img)\n                img_modulated = (1 + img_mod1.scale) * img_modulated + img_mod1.shift\n                img_qkv 
= self.img_attn.qkv(img_modulated)\n                img_q, img_k, img_v = rearrange(img_qkv, \"B L (K H D) -> K B H L D\", K=3, H=self.num_heads)\n                \n                if cache_dic['cache_type'] == 'k-norm':\n                    img_k_norm = img_k.norm(dim=-1, p=2).mean(dim=1)\n                    cache_dic['k-norm'][-1][current['stream']][current['layer']]['img_mlp'] = img_k_norm\n                elif cache_dic['cache_type'] == 'v-norm':\n                    img_v_norm = img_v.norm(dim=-1, p=2).mean(dim=1)\n                    cache_dic['v-norm'][-1][current['stream']][current['layer']]['img_mlp'] = img_v_norm\n                \n                img_q, img_k = self.img_attn.norm(img_q, img_k, img_v)\n\n                # prepare txt for attention\n                txt_modulated = self.txt_norm1(txt)\n                txt_modulated = (1 + txt_mod1.scale) * txt_modulated + txt_mod1.shift\n                txt_qkv = self.txt_attn.qkv(txt_modulated)\n                txt_q, txt_k, txt_v = rearrange(txt_qkv, \"B L (K H D) -> K B H L D\", K=3, H=self.num_heads)\n\n                if cache_dic['cache_type'] == 'k-norm':\n                    txt_k_norm = txt_k.norm(dim=-1, p=2).mean(dim=1)\n                    cache_dic['k-norm'][-1][current['stream']][current['layer']]['txt_mlp'] = txt_k_norm\n                elif cache_dic['cache_type'] == 'v-norm':\n                    txt_v_norm = txt_v.norm(dim=-1, p=2).mean(dim=1)\n                    cache_dic['v-norm'][-1][current['stream']][current['layer']]['txt_mlp'] = txt_v_norm\n                \n                txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)\n\n                # run actual attention\n                q = torch.cat((txt_q, img_q), dim=2)\n                k = torch.cat((txt_k, img_k), dim=2)\n                v = torch.cat((txt_v, img_v), dim=2)\n\n                attn = attention(q, k, v, pe=pe, cache_dic=cache_dic, current=current)\n                
cache_dic['cache'][-1]['double_stream'][current['layer']]['attn'] = attn\n\n                txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]\n                cache_dic['txt_shape'] = txt.shape[1]\n                \n                if cache_dic['cache_type'] == 'attention':\n                    cache_dic['attn_map'][-1][current['stream']][current['layer']]['txt_mlp'] = cache_dic['attn_map'][-1][current['stream']][current['layer']]['total'][:, : txt.shape[1]]\n                    cache_dic['attn_map'][-1][current['stream']][current['layer']]['img_mlp'] = cache_dic['attn_map'][-1][current['stream']][current['layer']]['total'][:, txt.shape[1] :]\n\n                current['module'] = 'img_mlp'\n                force_init(cache_dic=cache_dic, current=current, tokens=img)\n                # calculate the img bloks\n                img = img + img_mod1.gate * self.img_attn.proj(img_attn)\n                cache_dic['cache'][-1]['double_stream'][current['layer']]['img_mlp'] = self.img_mlp((1 + img_mod2.scale) * self.img_norm2(img) + img_mod2.shift)\n                img = img + img_mod2.gate * cache_dic['cache'][-1]['double_stream'][current['layer']]['img_mlp']\n\n                current['module'] = 'txt_mlp'\n                force_init(cache_dic=cache_dic, current=current, tokens=txt)\n                # calculate the txt bloks\n                txt = txt + txt_mod1.gate * self.txt_attn.proj(txt_attn)\n                cache_dic['cache'][-1]['double_stream'][current['layer']]['txt_mlp'] = self.txt_mlp((1 + txt_mod2.scale) * self.txt_norm2(txt) + txt_mod2.shift)\n                txt = txt + txt_mod2.gate * cache_dic['cache'][-1]['double_stream'][current['layer']]['txt_mlp']\n\n            elif current['type'] == 'ToCa':\n                img_mod1, img_mod2 = self.img_mod(vec)\n                txt_mod1, txt_mod2 = self.txt_mod(vec)\n\n                attn = cache_dic['cache'][-1]['double_stream'][current['layer']]['attn']\n                txt_attn, img_attn = 
attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]\n\n                current['module'] = 'img_mlp'\n                # calculate the img blocks\n                img = img + img_mod1.gate * self.img_attn.proj(img_attn)\n                # recompute the MLP only for the selected fresh tokens, then write them back into the cache\n                fresh_indices, fresh_tokens_img = cache_cutfresh(cache_dic=cache_dic, tokens=img, current=current)\n                fresh_tokens_img = self.img_mlp((1 + img_mod2.scale) * self.img_norm2(fresh_tokens_img) + img_mod2.shift)\n                update_cache(fresh_indices=fresh_indices, fresh_tokens=fresh_tokens_img, cache_dic=cache_dic, current=current)\n                img = img + img_mod2.gate * cache_dic['cache'][-1]['double_stream'][current['layer']]['img_mlp']\n\n                current['module'] = 'txt_mlp'\n                # calculate the txt blocks\n                txt = txt + txt_mod1.gate * self.txt_attn.proj(txt_attn)\n                fresh_indices, fresh_tokens_txt = cache_cutfresh(cache_dic=cache_dic, tokens=txt, current=current)\n                fresh_tokens_txt = self.txt_mlp((1 + txt_mod2.scale) * self.txt_norm2(fresh_tokens_txt) + txt_mod2.shift)\n                update_cache(fresh_indices=fresh_indices, fresh_tokens=fresh_tokens_txt, cache_dic=cache_dic, current=current)\n                txt = txt + txt_mod2.gate * cache_dic['cache'][-1]['double_stream'][current['layer']]['txt_mlp']\n            \n            elif current['type'] == 'FORA':\n                img_mod1, img_mod2 = self.img_mod(vec)\n                txt_mod1, txt_mod2 = self.txt_mod(vec)\n                img = img + img_mod2.gate * cache_dic['cache'][-1]['double_stream'][current['layer']]['img_mlp']\n                txt = txt + txt_mod2.gate * cache_dic['cache'][-1]['double_stream'][current['layer']]['txt_mlp']\n            elif current['type'] == 'aggressive':\n                current['module'] = 'skipped'\n            else:\n                raise ValueError(\"Unknown cache 
type.\")\n            \n        return img, txt\n\n\nclass SingleStreamBlock(nn.Module):\n    \"\"\"\n    A DiT block with parallel linear layers as described in\n    https://arxiv.org/abs/2302.05442 and adapted modulation interface.\n    \"\"\"\n\n    def __init__(\n        self,\n        hidden_size: int,\n        num_heads: int,\n        mlp_ratio: float = 4.0,\n        qk_scale: float | None = None,\n    ):\n        super().__init__()\n        self.hidden_dim = hidden_size\n        self.num_heads = num_heads\n        head_dim = hidden_size // num_heads\n        self.scale = qk_scale or head_dim**-0.5\n\n        self.mlp_hidden_dim = int(hidden_size * mlp_ratio)\n        # qkv and mlp_in\n        self.linear1 = nn.Linear(hidden_size, hidden_size * 3 + self.mlp_hidden_dim)\n        # proj and mlp_out\n        self.linear2 = nn.Linear(hidden_size + self.mlp_hidden_dim, hidden_size)\n\n        self.norm = QKNorm(head_dim)\n\n        self.hidden_size = hidden_size\n        self.pre_norm = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n\n        self.mlp_act = nn.GELU(approximate=\"tanh\")\n        self.modulation = Modulation(hidden_size, double=False)\n        # mlp_in\n        self.mlp_in = nn.Linear(hidden_size, self.mlp_hidden_dim)\n\n    def load_mlp_in_weights(self, linear1_weight: torch.Tensor, linear1_bias: Optional[torch.Tensor] = None):\n        \"\"\"\n        Split and load the weights of the original `linear1` layer, keeping only the MLP hidden layer part.\n\n        Parameters:\n          - linear1_weight: Tensor, with shape (hidden_size * 3 + mlp_hidden_dim, hidden_size)\n          - linear1_bias: Tensor, with shape (hidden_size * 3 + mlp_hidden_dim,) or None\n\n        \"\"\"\n        hidden_size = self.hidden_size\n        mlp_hidden_dim = self.mlp_hidden_dim\n        device = self.linear1.weight.device  # target device\n\n        self.mlp_in.weight = torch.nn.Parameter(linear1_weight[hidden_size * 3:, :].to(device))\n\n        if 
linear1_bias is not None:\n\n            self.mlp_in.bias = torch.nn.Parameter(linear1_bias[hidden_size * 3:].to(device))\n\n    def forward(self, x: Tensor, vec: Tensor, pe: Tensor, **kwargs) -> Tensor:\n\n        cache_dic = kwargs.get('cache_dic', None)\n        current = kwargs.get('current', None)\n\n        mod, _ = self.modulation(vec)\n        \n        if cache_dic is None:\n            x_mod = (1 + mod.scale) * self.pre_norm(x) + mod.shift\n            qkv, mlp = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)\n\n            q, k, v = rearrange(qkv, \"B L (K H D) -> K B H L D\", K=3, H=self.num_heads)\n            q, k = self.norm(q, k, v)\n\n            # compute attention\n            attn = attention(q, k, v, pe=pe, cache_dic=cache_dic, current=current)\n            # compute activation in mlp stream, cat again and run second linear layer\n            output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))\n        \n        else:\n            current['stream'] = 'single_stream'\n\n            if current['type'] == 'full':\n                #if (current['layer'] == 0):\n                #    print(current['step'])\n                x_mod = (1 + mod.scale) * self.pre_norm(x) + mod.shift\n                qkv, mlp = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)\n                cache_dic['cache'][-1]['single_stream'][current['layer']]['mlp'] = mlp\n                current['module'] = 'attn'\n                q, k, v = rearrange(qkv, \"B L (K H D) -> K B H L D\", K=3, H=self.num_heads)\n\n                if cache_dic['cache_type'] == 'k-norm':\n                    cache_dic['k-norm'][-1][current['stream']][current['layer']]['total'] = k.norm(dim=-1, p=2).mean(dim=1)\n                elif cache_dic['cache_type'] == 'v-norm':\n                    cache_dic['v-norm'][-1][current['stream']][current['layer']]['total'] = v.norm(dim=-1, p=2).mean(dim=1)\n                \n            
    q, k = self.norm(q, k, v)\n\n                # compute attention\n                attn = attention(q, k, v, pe=pe, cache_dic=cache_dic, current=current)\n                force_init(cache_dic=cache_dic, current=current, tokens=attn)\n                cache_dic['cache'][-1]['single_stream'][current['layer']]['attn'] = attn\n                # compute activation in mlp stream, cat again and run second linear layer\n                current['module'] = 'mlp'\n                output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))\n                force_init(cache_dic=cache_dic, current=current, tokens=output)\n                current['module'] = 'total'\n                cache_dic['cache'][-1]['single_stream'][current['layer']]['total'] = output\n\n            elif current['type'] == 'ToCa':\n                self.load_mlp_in_weights(self.linear1.weight, self.linear1.bias)\n                current['module'] = 'mlp'\n                fresh_indices, fresh_tokens_mlp = cache_cutfresh(cache_dic=cache_dic, tokens=x, current=current)\n                x_mod = (1 + mod.scale) * self.pre_norm(fresh_tokens_mlp) + mod.shift\n                #cache_dic['cache'][-1]['single_stream'][current['layer']]['mlp']\n                mlp_fresh = self.mlp_in(x_mod)\n                #_, mlp_fresh1 = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)\n                update_cache(fresh_indices=fresh_indices, fresh_tokens=mlp_fresh, cache_dic=cache_dic, current=current)\n                # compute attention\n                fake_fresh_attn = torch.gather(input = cache_dic['cache'][-1]['single_stream'][current['layer']]['attn'], dim = 1, \n                                               index = fresh_indices.unsqueeze(-1).expand(-1, -1, cache_dic['cache'][-1]['single_stream'][current['layer']]['attn'].shape[-1]))\n                \n                current['module'] = 'total'\n                fresh_tokens_output = self.linear2(torch.cat((fake_fresh_attn, 
self.mlp_act(mlp_fresh)), 2))\n                update_cache(fresh_indices=fresh_indices, fresh_tokens=fresh_tokens_output, cache_dic=cache_dic, current=current)\n                #attn = cache_dic['cache'][-1]['single_stream'][current['layer']]['attn']\n                #mlp  = cache_dic['cache'][-1]['single_stream'][current['layer']]['mlp']\n                # compute activation in mlp stream, cat again and run second linear layer\n                #output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))\n                output = cache_dic['cache'][-1]['single_stream'][current['layer']]['total']\n            \n            elif current['type'] == 'FORA':\n                output = cache_dic['cache'][-1]['single_stream'][current['layer']]['total']\n                \n            elif current['type'] == 'aggressive':\n                current['module'] = 'skipped'\n                if current['layer'] == 37:\n                    x = cache_dic['cache'][-1]['aggressive_output']\n                return x\n            else:\n                raise ValueError(\"Unknown cache type.\")\n            \n            if current['layer'] == 37:\n                cache_dic['cache'][-1]['aggressive_output'] = x\n            \n        return x + mod.gate * output\n\n\nclass LastLayer(nn.Module):\n    def __init__(self, hidden_size: int, patch_size: int, out_channels: int):\n        super().__init__()\n        self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)\n        self.linear = nn.Linear(hidden_size, patch_size * patch_size * out_channels, bias=True)\n        self.adaLN_modulation = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 2 * hidden_size, bias=True))\n\n    def forward(self, x: Tensor, vec: Tensor) -> Tensor:\n        shift, scale = self.adaLN_modulation(vec).chunk(2, dim=1)\n        x = (1 + scale[:, None, :]) * self.norm_final(x) + shift[:, None, :]\n        x = self.linear(x)\n        return x\n"
  },
  {
    "path": "flux-ToCa/src/flux/modules/lora.py",
    "content": "import torch\nfrom torch import nn\n\n\ndef replace_linear_with_lora(\n    module: nn.Module,\n    max_rank: int,\n    scale: float = 1.0,\n) -> None:\n    for name, child in module.named_children():\n        if isinstance(child, nn.Linear):\n            new_lora = LinearLora(\n                in_features=child.in_features,\n                out_features=child.out_features,\n                bias=child.bias,\n                rank=max_rank,\n                scale=scale,\n                dtype=child.weight.dtype,\n                device=child.weight.device,\n            )\n\n            new_lora.weight = child.weight\n            new_lora.bias = child.bias if child.bias is not None else None\n\n            setattr(module, name, new_lora)\n        else:\n            replace_linear_with_lora(\n                module=child,\n                max_rank=max_rank,\n                scale=scale,\n            )\n\n\nclass LinearLora(nn.Linear):\n    def __init__(\n        self,\n        in_features: int,\n        out_features: int,\n        bias: bool,\n        rank: int,\n        dtype: torch.dtype,\n        device: torch.device,\n        lora_bias: bool = True,\n        scale: float = 1.0,\n        *args,\n        **kwargs,\n    ) -> None:\n        super().__init__(\n            in_features=in_features,\n            out_features=out_features,\n            bias=bias is not None,\n            device=device,\n            dtype=dtype,\n            *args,\n            **kwargs,\n        )\n\n        assert isinstance(scale, float), \"scale must be a float\"\n\n        self.scale = scale\n        self.rank = rank\n        self.lora_bias = lora_bias\n        self.dtype = dtype\n        self.device = device\n\n        if rank > (new_rank := min(self.out_features, self.in_features)):\n            self.rank = new_rank\n\n        self.lora_A = nn.Linear(\n            in_features=in_features,\n            out_features=self.rank,\n            bias=False,\n            
dtype=dtype,\n            device=device,\n        )\n        self.lora_B = nn.Linear(\n            in_features=self.rank,\n            out_features=out_features,\n            bias=self.lora_bias,\n            dtype=dtype,\n            device=device,\n        )\n\n    def set_scale(self, scale: float) -> None:\n        assert isinstance(scale, float), \"scalar value must be a float\"\n        self.scale = scale\n\n    def forward(self, input: torch.Tensor) -> torch.Tensor:\n        base_out = super().forward(input)\n\n        _lora_out_B = self.lora_B(self.lora_A(input))\n        lora_update = _lora_out_B * self.scale\n\n        return base_out + lora_update\n"
  },
  {
    "path": "flux-ToCa/src/flux/sampling.py",
    "content": "import math\nfrom typing import Callable\n\nimport numpy as np\nimport torch\nfrom einops import rearrange, repeat\nfrom PIL import Image\nfrom torch import Tensor\n\nfrom .model import Flux\nfrom .modules.autoencoder import AutoEncoder\nfrom .modules.conditioner import HFEmbedder\nfrom .modules.image_embedders import CannyImageEncoder, DepthImageEncoder, ReduxImageEncoder\nfrom .modules.cache_functions import cache_init\n\ndef get_noise(\n    num_samples: int,\n    height: int,\n    width: int,\n    device: torch.device,\n    dtype: torch.dtype,\n    seed: int,\n):\n    return torch.randn(\n        num_samples,\n        16,\n        # allow for packing\n        2 * math.ceil(height / 16),\n        2 * math.ceil(width / 16),\n        device=device,\n        dtype=dtype,\n        generator=torch.Generator(device=device).manual_seed(seed),\n    )\n\n\ndef prepare(t5: HFEmbedder, clip: HFEmbedder, img: Tensor, prompt: str | list[str]) -> dict[str, Tensor]:\n    bs, c, h, w = img.shape\n    if bs == 1 and not isinstance(prompt, str):\n        bs = len(prompt)\n\n    img = rearrange(img, \"b c (h ph) (w pw) -> b (h w) (c ph pw)\", ph=2, pw=2)\n    if img.shape[0] == 1 and bs > 1:\n        img = repeat(img, \"1 ... -> bs ...\", bs=bs)\n\n    img_ids = torch.zeros(h // 2, w // 2, 3)\n    img_ids[..., 1] = img_ids[..., 1] + torch.arange(h // 2)[:, None]\n    img_ids[..., 2] = img_ids[..., 2] + torch.arange(w // 2)[None, :]\n    img_ids = repeat(img_ids, \"h w c -> b (h w) c\", b=bs)\n\n    #small_img_ids = torch.zeros((h // 2) // 2, (w // 2) // 2, 3)\n    #small_img_ids[..., 1] = small_img_ids[..., 1] + torch.arange((h // 2) // 2)[:, None]\n    #small_img_ids[..., 2] = small_img_ids[..., 2] + torch.arange((w // 2) // 2)[None, :]\n    #small_img_ids = repeat(small_img_ids, \"h w c -> b (h w) c\", b=bs)\n\n    if isinstance(prompt, str):\n        prompt = [prompt]\n    txt = t5(prompt)\n    if txt.shape[0] == 1 and bs > 1:\n        txt = repeat(txt, \"1 ... 
-> bs ...\", bs=bs)\n    txt_ids = torch.zeros(bs, txt.shape[1], 3)\n\n    vec = clip(prompt)\n    if vec.shape[0] == 1 and bs > 1:\n        vec = repeat(vec, \"1 ... -> bs ...\", bs=bs)\n\n    return {\n        \"img\": img,\n        #\"img_ids\": [img_ids.to(img.device), small_img_ids.to(img.device)],\n        \"img_ids\": img_ids.to(img.device),\n        \"txt\": txt.to(img.device),\n        \"txt_ids\": txt_ids.to(img.device),\n        \"vec\": vec.to(img.device),\n    }\n\n\ndef prepare_control(\n    t5: HFEmbedder,\n    clip: HFEmbedder,\n    img: Tensor,\n    prompt: str | list[str],\n    ae: AutoEncoder,\n    encoder: DepthImageEncoder | CannyImageEncoder,\n    img_cond_path: str,\n) -> dict[str, Tensor]:\n    # load and encode the conditioning image\n    bs, _, h, w = img.shape\n    if bs == 1 and not isinstance(prompt, str):\n        bs = len(prompt)\n\n    img_cond = Image.open(img_cond_path).convert(\"RGB\")\n\n    width = w * 8\n    height = h * 8\n    img_cond = img_cond.resize((width, height), Image.LANCZOS)\n    img_cond = np.array(img_cond)\n    img_cond = torch.from_numpy(img_cond).float() / 127.5 - 1.0\n    img_cond = rearrange(img_cond, \"h w c -> 1 c h w\")\n\n    with torch.no_grad():\n        img_cond = encoder(img_cond)\n        img_cond = ae.encode(img_cond)\n\n    img_cond = img_cond.to(torch.bfloat16)\n    img_cond = rearrange(img_cond, \"b c (h ph) (w pw) -> b (h w) (c ph pw)\", ph=2, pw=2)\n    if img_cond.shape[0] == 1 and bs > 1:\n        img_cond = repeat(img_cond, \"1 ... 
-> bs ...\", bs=bs)\n\n    return_dict = prepare(t5, clip, img, prompt)\n    return_dict[\"img_cond\"] = img_cond\n    return return_dict\n\n\ndef prepare_fill(\n    t5: HFEmbedder,\n    clip: HFEmbedder,\n    img: Tensor,\n    prompt: str | list[str],\n    ae: AutoEncoder,\n    img_cond_path: str,\n    mask_path: str,\n) -> dict[str, Tensor]:\n    # load and encode the conditioning image and the mask\n    bs, _, _, _ = img.shape\n    if bs == 1 and not isinstance(prompt, str):\n        bs = len(prompt)\n\n    img_cond = Image.open(img_cond_path).convert(\"RGB\")\n    img_cond = np.array(img_cond)\n    img_cond = torch.from_numpy(img_cond).float() / 127.5 - 1.0\n    img_cond = rearrange(img_cond, \"h w c -> 1 c h w\")\n\n    mask = Image.open(mask_path).convert(\"L\")\n    mask = np.array(mask)\n    mask = torch.from_numpy(mask).float() / 255.0\n    mask = rearrange(mask, \"h w -> 1 1 h w\")\n\n    with torch.no_grad():\n        img_cond = img_cond.to(img.device)\n        mask = mask.to(img.device)\n        img_cond = img_cond * (1 - mask)\n        img_cond = ae.encode(img_cond)\n        mask = mask[:, 0, :, :]\n        mask = mask.to(torch.bfloat16)\n        mask = rearrange(\n            mask,\n            \"b (h ph) (w pw) -> b (ph pw) h w\",\n            ph=8,\n            pw=8,\n        )\n        mask = rearrange(mask, \"b c (h ph) (w pw) -> b (h w) (c ph pw)\", ph=2, pw=2)\n        if mask.shape[0] == 1 and bs > 1:\n            mask = repeat(mask, \"1 ... -> bs ...\", bs=bs)\n\n    img_cond = img_cond.to(torch.bfloat16)\n    img_cond = rearrange(img_cond, \"b c (h ph) (w pw) -> b (h w) (c ph pw)\", ph=2, pw=2)\n    if img_cond.shape[0] == 1 and bs > 1:\n        img_cond = repeat(img_cond, \"1 ... 
-> bs ...\", bs=bs)\n\n    img_cond = torch.cat((img_cond, mask), dim=-1)\n\n    return_dict = prepare(t5, clip, img, prompt)\n    return_dict[\"img_cond\"] = img_cond.to(img.device)\n    return return_dict\n\n\ndef prepare_redux(\n    t5: HFEmbedder,\n    clip: HFEmbedder,\n    img: Tensor,\n    prompt: str | list[str],\n    encoder: ReduxImageEncoder,\n    img_cond_path: str,\n) -> dict[str, Tensor]:\n    bs, _, h, w = img.shape\n    if bs == 1 and not isinstance(prompt, str):\n        bs = len(prompt)\n\n    img_cond = Image.open(img_cond_path).convert(\"RGB\")\n    with torch.no_grad():\n        img_cond = encoder(img_cond)\n\n    img_cond = img_cond.to(torch.bfloat16)\n    if img_cond.shape[0] == 1 and bs > 1:\n        img_cond = repeat(img_cond, \"1 ... -> bs ...\", bs=bs)\n\n    img = rearrange(img, \"b c (h ph) (w pw) -> b (h w) (c ph pw)\", ph=2, pw=2)\n    if img.shape[0] == 1 and bs > 1:\n        img = repeat(img, \"1 ... -> bs ...\", bs=bs)\n\n    img_ids = torch.zeros(h // 2, w // 2, 3)\n    img_ids[..., 1] = img_ids[..., 1] + torch.arange(h // 2)[:, None]\n    img_ids[..., 2] = img_ids[..., 2] + torch.arange(w // 2)[None, :]\n    img_ids = repeat(img_ids, \"h w c -> b (h w) c\", b=bs)\n\n    if isinstance(prompt, str):\n        prompt = [prompt]\n    txt = t5(prompt)\n    txt = torch.cat((txt, img_cond.to(txt)), dim=-2)\n    if txt.shape[0] == 1 and bs > 1:\n        txt = repeat(txt, \"1 ... -> bs ...\", bs=bs)\n    txt_ids = torch.zeros(bs, txt.shape[1], 3)\n\n    vec = clip(prompt)\n    if vec.shape[0] == 1 and bs > 1:\n        vec = repeat(vec, \"1 ... 
-> bs ...\", bs=bs)\n\n    return {\n        \"img\": img,\n        \"img_ids\": img_ids.to(img.device),\n        \"txt\": txt.to(img.device),\n        \"txt_ids\": txt_ids.to(img.device),\n        \"vec\": vec.to(img.device),\n    }\n\n\ndef time_shift(mu: float, sigma: float, t: Tensor):\n    return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)\n\n\ndef get_lin_function(\n    x1: float = 256, y1: float = 0.5, x2: float = 4096, y2: float = 1.15\n) -> Callable[[float], float]:\n    m = (y2 - y1) / (x2 - x1)\n    b = y1 - m * x1\n    return lambda x: m * x + b\n\n\ndef get_schedule(\n    num_steps: int,\n    image_seq_len: int,\n    base_shift: float = 0.5,\n    max_shift: float = 1.15,\n    shift: bool = True,\n) -> list[float]:\n    # extra step for zero\n    timesteps = torch.linspace(1, 0, num_steps + 1)\n\n    # shifting the schedule to favor high timesteps for higher signal images\n    if shift:\n        # estimate mu based on linear estimation between two points\n        mu = get_lin_function(y1=base_shift, y2=max_shift)(image_seq_len)\n        timesteps = time_shift(mu, 1.0, timesteps)\n\n    return timesteps.tolist()\n\n\ndef denoise(\n    model: Flux,\n    # model input\n    img: Tensor,\n    img_ids: Tensor,\n    txt: Tensor,\n    txt_ids: Tensor,\n    vec: Tensor,\n    # sampling parameters\n    timesteps: list[float],\n    guidance: float = 4.0,\n    # extra img tokens\n    img_cond: Tensor | None = None,\n):\n    # this is ignored for schnell\n    guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)\n\n\n    for t_curr, t_prev in zip(timesteps[:-1], timesteps[1:]):\n\n\n        t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)\n        pred = model(\n            img=torch.cat((img, img_cond), dim=-1) if img_cond is not None else img,\n            # prepare() returns img_ids as a single (b, h*w, 3) tensor, so pass it whole\n            img_ids=img_ids,\n            txt=txt,\n            txt_ids=txt_ids,\n     
       y=vec,\n            timesteps=t_vec,\n            guidance=guidance_vec,\n        )\n\n        img = img + (t_prev - t_curr) * pred\n\n    return img\n\n\ndef unpack(x: Tensor, height: int, width: int) -> Tensor:\n    return rearrange(\n        x,\n        \"b (h w) (c ph pw) -> b c (h ph) (w pw)\",\n        h=math.ceil(height / 16),\n        w=math.ceil(width / 16),\n        ph=2,\n        pw=2,\n    )\n\n####################################################################################################\n\nfrom calflops import calculate_flops\n\ndef denoise_test_FLOPs(\n    model: Flux,\n    # model input\n    img: Tensor,\n    img_ids: Tensor,\n    txt: Tensor,\n    txt_ids: Tensor,\n    vec: Tensor,\n    # sampling parameters\n    timesteps: list[float],\n    guidance: float = 4.0,\n):  \n    # init cache\n    cache_dic, current = cache_init(timesteps)\n    # this is ignored for schnell\n    guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)\n    current['step']=0\n    current['num_steps'] = len(timesteps)-1\n    total_flops = 0\n    for t_curr, t_prev in zip(timesteps[:-1], timesteps[1:]):\n        t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)\n        inputs=dict(\n            img=img,\n            img_ids=img_ids,\n            txt=txt,\n            txt_ids=txt_ids,\n            y=vec,\n            timesteps=t_vec,\n            cache_dic = cache_dic,\n            current = current,\n            guidance=guidance_vec,\n        )\n        flops, macs, params = calculate_flops(model=model,\n                                      kwargs = inputs,\n                                      print_results=False)\n        total_flops += convert_flops(flops)\n        current['step'] += 1\n    \n    print(f\"Total {total_flops * 10 **(-12)} TFLOPs.\" )\n    return img\n\nimport re\n\ndef convert_flops(flops_str):\n    \"\"\"\n    Convert a FLOPS string (e.g. '12.34 GFLOPS', '1.2 TFLOPS') to its numeric value.\n    
\"\"\"\n    # Match the number and the unit with a regular expression\n    match = re.match(r\"([\\d.]+)\\s*([GT]?FLOPS)\", flops_str.strip(), re.IGNORECASE)\n    if not match:\n        raise ValueError(f\"Cannot parse FLOPS string: {flops_str}\")\n    \n    # Extract the number and the unit\n    value = float(match.group(1))\n    unit = match.group(2).upper()\n    \n    # Scale the value according to the unit\n    if unit == \"GFLOPS\":\n        return value * 10**9\n    elif unit == \"TFLOPS\":\n        return value * 10**12\n    else:\n        raise ValueError(f\"Unknown FLOPS unit: {unit}\")\n"
  },
  {
    "path": "flux-ToCa/src/flux/util.py",
    "content": "import os\nfrom dataclasses import dataclass\n\nimport torch\nfrom einops import rearrange\nfrom huggingface_hub import hf_hub_download\nfrom imwatermark import WatermarkEncoder\nfrom PIL import ExifTags, Image\nfrom safetensors.torch import load_file as load_sft\n\nfrom flux.model import Flux, FluxLoraWrapper, FluxParams\nfrom flux.modules.autoencoder import AutoEncoder, AutoEncoderParams\nfrom flux.modules.conditioner import HFEmbedder\n\n\ndef save_image(\n    nsfw_classifier,\n    name: str,\n    output_name: str,\n    idx: int,\n    x: torch.Tensor,\n    add_sampling_metadata: bool,\n    prompt: str,\n    nsfw_threshold: float = 0.85,\n) -> int:\n    fn = output_name.format(idx=idx)\n    print(f\"Saving {fn}\")\n    # bring into PIL format and save\n    x = x.clamp(-1, 1)\n    x = embed_watermark(x.float())\n    x = rearrange(x[0], \"c h w -> h w c\")\n\n    img = Image.fromarray((127.5 * (x + 1.0)).cpu().byte().numpy())\n    nsfw_score = [x[\"score\"] for x in nsfw_classifier(img) if x[\"label\"] == \"nsfw\"][0]\n\n    if nsfw_score < nsfw_threshold:\n        exif_data = Image.Exif()\n        exif_data[ExifTags.Base.Software] = \"AI generated;txt2img;flux\"\n        exif_data[ExifTags.Base.Make] = \"Black Forest Labs\"\n        exif_data[ExifTags.Base.Model] = name\n        if add_sampling_metadata:\n            exif_data[ExifTags.Base.ImageDescription] = prompt\n        img.save(fn, exif=exif_data, quality=95, subsampling=0)\n        idx += 1\n    else:\n        print(\"Your generated image may contain NSFW content.\")\n\n    return idx\n\n\n@dataclass\nclass ModelSpec:\n    params: FluxParams\n    ae_params: AutoEncoderParams\n    ckpt_path: str | None\n    lora_path: str | None\n    ae_path: str | None\n    repo_id: str | None\n    repo_flow: str | None\n    repo_ae: str | None\n\n\nconfigs = {\n    \"flux-dev\": ModelSpec(\n        repo_id=\"black-forest-labs/FLUX.1-dev\",\n        repo_flow=\"flux1-dev.safetensors\",\n        
repo_ae=\"ae.safetensors\",\n        ckpt_path=os.getenv(\"FLUX_DEV\"),\n        lora_path=None,\n        params=FluxParams(\n            in_channels=64,\n            out_channels=64,\n            vec_in_dim=768,\n            context_in_dim=4096,\n            hidden_size=3072,\n            mlp_ratio=4.0,\n            num_heads=24,\n            depth=19,\n            depth_single_blocks=38,\n            axes_dim=[16, 56, 56],\n            theta=10_000,\n            qkv_bias=True,\n            guidance_embed=True,\n        ),\n        ae_path=os.getenv(\"AE\"),\n        ae_params=AutoEncoderParams(\n            resolution=256,\n            in_channels=3,\n            ch=128,\n            out_ch=3,\n            ch_mult=[1, 2, 4, 4],\n            num_res_blocks=2,\n            z_channels=16,\n            scale_factor=0.3611,\n            shift_factor=0.1159,\n        ),\n    ),\n    \"flux-schnell\": ModelSpec(\n        repo_id=\"black-forest-labs/FLUX.1-schnell\",\n        repo_flow=\"flux1-schnell.safetensors\",\n        repo_ae=\"ae.safetensors\",\n        ckpt_path=os.getenv(\"FLUX_SCHNELL\"),\n        lora_path=None,\n        params=FluxParams(\n            in_channels=64,\n            out_channels=64,\n            vec_in_dim=768,\n            context_in_dim=4096,\n            hidden_size=3072,\n            mlp_ratio=4.0,\n            num_heads=24,\n            depth=19,\n            depth_single_blocks=38,\n            axes_dim=[16, 56, 56],\n            theta=10_000,\n            qkv_bias=True,\n            guidance_embed=False,\n        ),\n        ae_path=os.getenv(\"AE\"),\n        ae_params=AutoEncoderParams(\n            resolution=256,\n            in_channels=3,\n            ch=128,\n            out_ch=3,\n            ch_mult=[1, 2, 4, 4],\n            num_res_blocks=2,\n            z_channels=16,\n            scale_factor=0.3611,\n            shift_factor=0.1159,\n        ),\n    ),\n    \"flux-dev-canny\": ModelSpec(\n        
repo_id=\"black-forest-labs/FLUX.1-Canny-dev\",\n        repo_flow=\"flux1-canny-dev.safetensors\",\n        repo_ae=\"ae.safetensors\",\n        ckpt_path=os.getenv(\"FLUX_DEV_CANNY\"),\n        lora_path=None,\n        params=FluxParams(\n            in_channels=128,\n            out_channels=64,\n            vec_in_dim=768,\n            context_in_dim=4096,\n            hidden_size=3072,\n            mlp_ratio=4.0,\n            num_heads=24,\n            depth=19,\n            depth_single_blocks=38,\n            axes_dim=[16, 56, 56],\n            theta=10_000,\n            qkv_bias=True,\n            guidance_embed=True,\n        ),\n        ae_path=os.getenv(\"AE\"),\n        ae_params=AutoEncoderParams(\n            resolution=256,\n            in_channels=3,\n            ch=128,\n            out_ch=3,\n            ch_mult=[1, 2, 4, 4],\n            num_res_blocks=2,\n            z_channels=16,\n            scale_factor=0.3611,\n            shift_factor=0.1159,\n        ),\n    ),\n    \"flux-dev-canny-lora\": ModelSpec(\n        repo_id=\"black-forest-labs/FLUX.1-dev\",\n        repo_flow=\"flux1-dev.safetensors\",\n        repo_ae=\"ae.safetensors\",\n        ckpt_path=os.getenv(\"FLUX_DEV\"),\n        lora_path=os.getenv(\"FLUX_DEV_CANNY_LORA\"),\n        params=FluxParams(\n            in_channels=128,\n            out_channels=64,\n            vec_in_dim=768,\n            context_in_dim=4096,\n            hidden_size=3072,\n            mlp_ratio=4.0,\n            num_heads=24,\n            depth=19,\n            depth_single_blocks=38,\n            axes_dim=[16, 56, 56],\n            theta=10_000,\n            qkv_bias=True,\n            guidance_embed=True,\n        ),\n        ae_path=os.getenv(\"AE\"),\n        ae_params=AutoEncoderParams(\n            resolution=256,\n            in_channels=3,\n            ch=128,\n            out_ch=3,\n            ch_mult=[1, 2, 4, 4],\n            num_res_blocks=2,\n            z_channels=16,\n            
scale_factor=0.3611,\n            shift_factor=0.1159,\n        ),\n    ),\n    \"flux-dev-depth\": ModelSpec(\n        repo_id=\"black-forest-labs/FLUX.1-Depth-dev\",\n        repo_flow=\"flux1-depth-dev.safetensors\",\n        repo_ae=\"ae.safetensors\",\n        ckpt_path=os.getenv(\"FLUX_DEV_DEPTH\"),\n        lora_path=None,\n        params=FluxParams(\n            in_channels=128,\n            out_channels=64,\n            vec_in_dim=768,\n            context_in_dim=4096,\n            hidden_size=3072,\n            mlp_ratio=4.0,\n            num_heads=24,\n            depth=19,\n            depth_single_blocks=38,\n            axes_dim=[16, 56, 56],\n            theta=10_000,\n            qkv_bias=True,\n            guidance_embed=True,\n        ),\n        ae_path=os.getenv(\"AE\"),\n        ae_params=AutoEncoderParams(\n            resolution=256,\n            in_channels=3,\n            ch=128,\n            out_ch=3,\n            ch_mult=[1, 2, 4, 4],\n            num_res_blocks=2,\n            z_channels=16,\n            scale_factor=0.3611,\n            shift_factor=0.1159,\n        ),\n    ),\n    \"flux-dev-depth-lora\": ModelSpec(\n        repo_id=\"black-forest-labs/FLUX.1-dev\",\n        repo_flow=\"flux1-dev.safetensors\",\n        repo_ae=\"ae.safetensors\",\n        ckpt_path=os.getenv(\"FLUX_DEV\"),\n        lora_path=os.getenv(\"FLUX_DEV_DEPTH_LORA\"),\n        params=FluxParams(\n            in_channels=128,\n            out_channels=64,\n            vec_in_dim=768,\n            context_in_dim=4096,\n            hidden_size=3072,\n            mlp_ratio=4.0,\n            num_heads=24,\n            depth=19,\n            depth_single_blocks=38,\n            axes_dim=[16, 56, 56],\n            theta=10_000,\n            qkv_bias=True,\n            guidance_embed=True,\n        ),\n        ae_path=os.getenv(\"AE\"),\n        ae_params=AutoEncoderParams(\n            resolution=256,\n            in_channels=3,\n            ch=128,\n            
out_ch=3,\n            ch_mult=[1, 2, 4, 4],\n            num_res_blocks=2,\n            z_channels=16,\n            scale_factor=0.3611,\n            shift_factor=0.1159,\n        ),\n    ),\n    \"flux-dev-fill\": ModelSpec(\n        repo_id=\"black-forest-labs/FLUX.1-Fill-dev\",\n        repo_flow=\"flux1-fill-dev.safetensors\",\n        repo_ae=\"ae.safetensors\",\n        ckpt_path=os.getenv(\"FLUX_DEV_FILL\"),\n        lora_path=None,\n        params=FluxParams(\n            in_channels=384,\n            out_channels=64,\n            vec_in_dim=768,\n            context_in_dim=4096,\n            hidden_size=3072,\n            mlp_ratio=4.0,\n            num_heads=24,\n            depth=19,\n            depth_single_blocks=38,\n            axes_dim=[16, 56, 56],\n            theta=10_000,\n            qkv_bias=True,\n            guidance_embed=True,\n        ),\n        ae_path=os.getenv(\"AE\"),\n        ae_params=AutoEncoderParams(\n            resolution=256,\n            in_channels=3,\n            ch=128,\n            out_ch=3,\n            ch_mult=[1, 2, 4, 4],\n            num_res_blocks=2,\n            z_channels=16,\n            scale_factor=0.3611,\n            shift_factor=0.1159,\n        ),\n    ),\n}\n\n\ndef print_load_warning(missing: list[str], unexpected: list[str]) -> None:\n    if len(missing) > 0 and len(unexpected) > 0:\n        print(f\"Got {len(missing)} missing keys:\\n\\t\" + \"\\n\\t\".join(missing))\n        print(\"\\n\" + \"-\" * 79 + \"\\n\")\n        print(f\"Got {len(unexpected)} unexpected keys:\\n\\t\" + \"\\n\\t\".join(unexpected))\n    elif len(missing) > 0:\n        print(f\"Got {len(missing)} missing keys:\\n\\t\" + \"\\n\\t\".join(missing))\n    elif len(unexpected) > 0:\n        print(f\"Got {len(unexpected)} unexpected keys:\\n\\t\" + \"\\n\\t\".join(unexpected))\n\n\ndef load_flow_model(\n    name: str, device: str | torch.device = \"cuda\", hf_download: bool = True, verbose: bool = False\n) -> Flux:\n    # Loading 
Flux\n    print(\"Init model\")\n    ckpt_path = configs[name].ckpt_path\n    lora_path = configs[name].lora_path\n    if (\n        ckpt_path is None\n        and configs[name].repo_id is not None\n        and configs[name].repo_flow is not None\n        and hf_download\n    ):\n        ckpt_path = hf_hub_download(configs[name].repo_id, configs[name].repo_flow)\n\n    with torch.device(\"meta\" if ckpt_path is not None else device):\n        if lora_path is not None:\n            model = FluxLoraWrapper(params=configs[name].params).to(torch.bfloat16)\n        else:\n            model = Flux(configs[name].params).to(torch.bfloat16)\n\n    if ckpt_path is not None:\n        print(\"Loading checkpoint\")\n        # load_sft doesn't support torch.device\n        sd = load_sft(ckpt_path, device=str(device))\n        sd = optionally_expand_state_dict(model, sd)\n        missing, unexpected = model.load_state_dict(sd, strict=False, assign=True)\n        if verbose:\n            print_load_warning(missing, unexpected)\n\n    if configs[name].lora_path is not None:\n        print(\"Loading LoRA\")\n        lora_sd = load_sft(configs[name].lora_path, device=str(device))\n        # loading the lora params + overwriting scale values in the norms\n        missing, unexpected = model.load_state_dict(lora_sd, strict=False, assign=True)\n        if verbose:\n            print_load_warning(missing, unexpected)\n    return model\n\n\ndef load_t5(device: str | torch.device = \"cuda\", max_length: int = 512) -> HFEmbedder:\n    # max length 64, 128, 256 and 512 should work (if your sequence is short enough)\n    return HFEmbedder(\"/root/autodl-tmp/pretrained_models/google/t5-v1_1-xxl\", max_length=max_length, torch_dtype=torch.bfloat16).to(device)\n\n\ndef load_clip(device: str | torch.device = \"cuda\") -> HFEmbedder:\n    return HFEmbedder(\"/root/autodl-tmp/pretrained_models/openai/clip-vit-large-patch14\", max_length=77, torch_dtype=torch.bfloat16).to(device)\n\n\ndef 
load_ae(name: str, device: str | torch.device = \"cuda\", hf_download: bool = True) -> AutoEncoder:\n    ckpt_path = configs[name].ae_path\n    if (\n        ckpt_path is None\n        and configs[name].repo_id is not None\n        and configs[name].repo_ae is not None\n        and hf_download\n    ):\n        ckpt_path = hf_hub_download(configs[name].repo_id, configs[name].repo_ae)\n\n    # Loading the autoencoder\n    print(\"Init AE\")\n    with torch.device(\"meta\" if ckpt_path is not None else device):\n        ae = AutoEncoder(configs[name].ae_params)\n\n    if ckpt_path is not None:\n        sd = load_sft(ckpt_path, device=str(device))\n        missing, unexpected = ae.load_state_dict(sd, strict=False, assign=True)\n        print_load_warning(missing, unexpected)\n    return ae\n\n\ndef optionally_expand_state_dict(model: torch.nn.Module, state_dict: dict) -> dict:\n    \"\"\"\n    Optionally expand the state dict to match the model's parameters shapes.\n    \"\"\"\n    for name, param in model.named_parameters():\n        if name in state_dict:\n            if state_dict[name].shape != param.shape:\n                print(\n                    f\"Expanding '{name}' with shape {state_dict[name].shape} to model parameter with shape {param.shape}.\"\n                )\n                # expand with zeros:\n                expanded_state_dict_weight = torch.zeros_like(param, device=state_dict[name].device)\n                slices = tuple(slice(0, dim) for dim in state_dict[name].shape)\n                expanded_state_dict_weight[slices] = state_dict[name]\n                state_dict[name] = expanded_state_dict_weight\n\n    return state_dict\n\n\nclass WatermarkEmbedder:\n    def __init__(self, watermark):\n        self.watermark = watermark\n        self.num_bits = len(WATERMARK_BITS)\n        self.encoder = WatermarkEncoder()\n        self.encoder.set_watermark(\"bits\", self.watermark)\n\n    def __call__(self, image: torch.Tensor) -> torch.Tensor:\n        
\"\"\"\n        Adds a predefined watermark to the input image\n\n        Args:\n            image: ([N,] B, RGB, H, W) in range [-1, 1]\n\n        Returns:\n            same as input but watermarked\n        \"\"\"\n        image = 0.5 * image + 0.5\n        squeeze = len(image.shape) == 4\n        if squeeze:\n            image = image[None, ...]\n        n = image.shape[0]\n        image_np = rearrange((255 * image).detach().cpu(), \"n b c h w -> (n b) h w c\").numpy()[:, :, :, ::-1]\n        # torch (b, c, h, w) in [0, 1] -> numpy (b, h, w, c) [0, 255]\n        # watermarking library expects input as cv2 BGR format\n        for k in range(image_np.shape[0]):\n            image_np[k] = self.encoder.encode(image_np[k], \"dwtDct\")\n        image = torch.from_numpy(rearrange(image_np[:, :, :, ::-1], \"(n b) h w c -> n b c h w\", n=n)).to(\n            image.device\n        )\n        image = torch.clamp(image / 255, min=0.0, max=1.0)\n        if squeeze:\n            image = image[0]\n        image = 2 * image - 1\n        return image\n\n\n# A fixed 48-bit message that was chosen at random\nWATERMARK_MESSAGE = 0b001010101111111010000111100111001111010100101110\n# bin(x)[2:] gives bits of x as str, use int to convert them to 0/1\nWATERMARK_BITS = [int(bit) for bit in bin(WATERMARK_MESSAGE)[2:]]\nembed_watermark = WatermarkEmbedder(WATERMARK_BITS)\n"
  },
  {
    "path": "flux-ToCa/src/geneval_flux.py",
    "content": "import argparse\nimport json\nimport os\n\nimport torch\nimport numpy as np\nfrom PIL import Image, ExifTags\nfrom tqdm import tqdm, trange\nfrom einops import rearrange\nfrom torchvision.utils import make_grid\nfrom torchvision.transforms import ToTensor\n\n# --- Imports related to FLUX module ---\nfrom flux.sampling import (\n    denoise_test_FLOPs,\n    get_noise,\n    get_schedule,\n    prepare,\n    unpack,\n)\nfrom flux.ideas import denoise_cache\nfrom flux.util import (\n    embed_watermark,\n    load_ae,\n    load_clip,\n    load_flow_model,\n    load_t5,\n)\nfrom transformers import pipeline\n\n# NSFW threshold (adjustable as needed)\nNSFW_THRESHOLD = 0.85\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Generate images using the FLUX model within the Geneval framework\")\n    # Required: input JSONL metadata file, each line must contain at least the \"prompt\" key\n    parser.add_argument(\n        \"metadata_file\",\n        type=str,\n        help=\"JSONL file containing metadata for each prompt, each line is a JSON object\"\n    )\n    # FLUX model related parameters\n    parser.add_argument(\n        \"--model_name\",\n        type=str,\n        default=\"flux-schnell\",\n        choices=[\"flux-dev\", \"flux-schnell\"],\n        help=\"FLUX model name\"\n    )\n    parser.add_argument(\n        \"--n_samples\",\n        type=int,\n        default=1,\n        help=\"Number of images to generate per prompt\"\n    )\n    parser.add_argument(\n        \"--steps\",\n        type=int,\n        default=None,\n        help=\"Number of sampling steps (if not specified: 4 for flux-schnell, 50 for flux-dev)\"\n    )\n    parser.add_argument(\n        \"--width\",\n        type=int,\n        default=1360,\n        help=\"Width of the generated image (pixels)\"\n    )\n    parser.add_argument(\n        \"--height\",\n        type=int,\n        default=768,\n        help=\"Height of the generated image (pixels)\"\n    )\n  
  parser.add_argument(\n        \"--guidance\",\n        type=float,\n        default=3.5,\n        help=\"Conditional guidance scale\"\n    )\n    parser.add_argument(\n        \"--seed\",\n        type=int,\n        default=42,\n        help=\"Random seed\"\n    )\n    parser.add_argument(\n        \"--batch_size\",\n        type=int,\n        default=1,\n        help=\"Number of samples per batch during image generation\"\n    )\n    # Output related parameters\n    parser.add_argument(\n        \"--output_dir\",\n        type=str,\n        default=\"outputs\",\n        help=\"Output directory to save the generated results\"\n    )\n    parser.add_argument(\n        \"--skip_grid\",\n        action=\"store_true\",\n        help=\"Skip saving the overall grid image\"\n    )\n    # Other options\n    parser.add_argument(\n        \"--add_sampling_metadata\",\n        action=\"store_true\",\n        help=\"Add the prompt text to the metadata of the generated images\"\n    )\n    parser.add_argument(\n        \"--use_nsfw_filter\",\n        action=\"store_true\",\n        help=\"Enable NSFW content filtering (requires downloading the relevant model)\"\n    )\n    parser.add_argument(\n        \"--test_FLOPs\",\n        action=\"store_true\",\n        help=\"Test inference FLOPs only (no images will be generated)\"\n    )\n    return parser.parse_args()\n\n\ndef main(args):\n    # Read the metadata file, each line is a JSON object (must contain at least the \"prompt\" field)\n    with open(args.metadata_file, \"r\", encoding=\"utf-8\") as fp:\n        metadatas = [json.loads(line) for line in fp if line.strip()]\n\n    # Set device\n    device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n\n    # If NSFW filtering is enabled, load the corresponding classifier (please modify the model path or name accordingly)\n    if args.use_nsfw_filter:\n        nsfw_classifier = pipeline(\n            \"image-classification\",\n            
model=\"/path/to/your/nsfw_model\",  # Please replace with the actual NSFW model path\n            device=0 if torch.cuda.is_available() else -1\n        )\n    else:\n        nsfw_classifier = None\n\n    # If sampling steps are not specified, set default steps based on the model name\n    if args.steps is None:\n        args.steps = 4 if args.model_name == \"flux-schnell\" else 50\n\n    # Ensure the image width and height are multiples of 16 (required by FLUX)\n    args.width = 16 * (args.width // 16)\n    args.height = 16 * (args.height // 16)\n\n    # Load FLUX model components onto the device (T5, CLIP, Flow model, autoencoder)\n    t5 = load_t5(device, max_length=256 if args.model_name == \"flux-schnell\" else 512)\n    clip = load_clip(device)\n    model = load_flow_model(args.model_name, device=device)\n    ae = load_ae(args.model_name, device=device)\n\n    # Generate results for each prompt:\n    # Each prompt corresponds to a subfolder (e.g., outputs/00000/), inside which samples and (optionally) a grid image grid.png are saved,\n    # along with the prompt's metadata saved in a metadata.jsonl file.\n    for idx, metadata in enumerate(metadatas):\n        prompt = metadata.get(\"prompt\", \"\")\n        print(f\"Processing prompt {idx + 1}/{len(metadatas)}: '{prompt}'\")\n\n        # Define output directory and samples directory\n        outpath = os.path.join(args.output_dir, f\"{idx:05d}\")\n        sample_path = os.path.join(outpath, \"samples\")\n\n        # If the output directory already exists, check the number of PNG files already in the samples folder\n        existing_samples = []\n        sample_count = 0\n        if os.path.exists(sample_path):\n            files = sorted(\n                fname for fname in os.listdir(sample_path)\n                if fname.endswith(\".png\") and fname != \"grid.png\"\n            )\n            sample_count = len(files)\n            # Load existing images (to be used later for generating the grid image)\n   
         for fname in files:\n                full_path = os.path.join(sample_path, fname)\n                try:\n                    img = Image.open(full_path).convert(\"RGB\")\n                    existing_samples.append(ToTensor()(img))\n                except Exception as e:\n                    print(f\"Failed to read existing image {full_path}: {e}\")\n\n        # If the number of generated images is sufficient, skip generation\n        if sample_count >= args.n_samples:\n            print(f\"Samples for prompt {idx + 1} already exist ({sample_count} images), skipping generation.\")\n            continue\n\n        # Create output directory and samples subdirectory\n        os.makedirs(outpath, exist_ok=True)\n        os.makedirs(sample_path, exist_ok=True)\n        # Save the current prompt's metadata to metadata.jsonl\n        with open(os.path.join(outpath, \"metadata.jsonl\"), \"w\", encoding=\"utf-8\") as fp:\n            json.dump(metadata, fp)\n\n        # Initialize: use the number of existing images as the starting count, and copy existing samples for later grid generation\n        local_index = sample_count\n        all_samples = existing_samples.copy()\n        # The initial value of the progress bar is the number of existing samples\n        pbar = tqdm(total=args.n_samples, initial=sample_count, desc=\"Sampling\")\n\n        # For the current prompt, only generate the missing images\n        while local_index < args.n_samples:\n            current_bs = min(args.batch_size, args.n_samples - local_index)\n            # Set seed for the current batch (using the number of images already present in the prompt as offset)\n            seed = args.seed + local_index\n            # Generate random noise\n            x = get_noise(current_bs, args.height, args.width, device=device, dtype=torch.bfloat16, seed=seed)\n            prompt_list = [prompt] * current_bs\n            # Prepare input (prompt encoding, initial image noise, etc.)\n            inp = 
prepare(t5, clip, x, prompt=prompt_list)\n            # Compute denoising schedule based on the input shape (note: the second argument is the packed image sequence length, not a channel count)\n            timesteps = get_schedule(args.steps, inp[\"img\"].shape[1], shift=(args.model_name != \"flux-schnell\"))\n\n            with torch.no_grad():\n                if args.test_FLOPs:\n                    latent = denoise_test_FLOPs(model, **inp, timesteps=timesteps, guidance=args.guidance)\n                else:\n                    latent = denoise_cache(model, **inp, timesteps=timesteps, guidance=args.guidance)\n                # Unpack latent to a shape suitable for the decoder input\n                latent = unpack(latent.float(), args.height, args.width)\n                # Decode to image with automatic mixed precision\n                with torch.autocast(device_type=device.type, dtype=torch.bfloat16):\n                    decoded = ae.decode(latent)\n\n            # Post-processing: clamp, embed watermark, and rearrange to [B, H, W, C] format\n            decoded = decoded.clamp(-1, 1)\n            decoded = embed_watermark(decoded.float())\n            images_tensor = rearrange(decoded, \"b c h w -> b h w c\")\n\n            # Iterate over each generated image in the current batch\n            for i in range(current_bs):\n                img_array = (127.5 * (images_tensor[i] + 1.0)).cpu().numpy().astype(np.uint8)\n                img = Image.fromarray(img_array)\n                # NSFW filtering (if enabled)\n                if nsfw_classifier is not None:\n                    nsfw_result = nsfw_classifier(img)\n                    nsfw_score = next((res[\"score\"] for res in nsfw_result if res[\"label\"] == \"nsfw\"), 0.0)\n                else:\n                    nsfw_score = 0.0\n\n                if nsfw_score < NSFW_THRESHOLD:\n                    # Add sampling metadata (EXIF info); note: PNG format may not fully support EXIF\n                    if 
args.add_sampling_metadata:\n                        exif_data = Image.Exif()\n                        exif_data[ExifTags.Base.Software] = \"AI generated;txt2img;flux\"\n                        exif_data[ExifTags.Base.Make] = \"Black Forest Labs\"\n                        exif_data[ExifTags.Base.Model] = args.model_name\n                        exif_data[ExifTags.Base.ImageDescription] = prompt\n                    else:\n                        exif_data = None\n\n                    sample_fname = os.path.join(sample_path, f\"{local_index:05d}.png\")\n                    if exif_data is not None:\n                        img.save(sample_fname, exif=exif_data)\n                    else:\n                        img.save(sample_fname)\n                    all_samples.append(ToTensor()(img))\n                else:\n                    print(\"The generated image may contain inappropriate content and has been skipped.\")\n                local_index += 1\n                pbar.update(1)\n            # end for current batch\n        pbar.close()\n\n        # If grid generation is not skipped and there is at least one sample, create and save a grid image (consistent with Geneval format)\n        if not args.skip_grid and len(all_samples) > 0:\n            grid_tensor = torch.stack(all_samples, 0)\n            grid = make_grid(grid_tensor, nrow=args.batch_size)\n            grid = 255.0 * rearrange(grid, \"c h w -> h w c\").cpu().numpy()\n            grid_img = Image.fromarray(grid.astype(np.uint8))\n            grid_img.save(os.path.join(outpath, \"grid.png\"))\n    # end for each prompt\n\n    print(\"Generation completed.\")\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n    main(args)\n\n'''\npython src/geneval_flux.py /root/geneval/prompts/evaluation_metadata.jsonl --model_name flux-dev --n_samples 4 --steps 50 --width 1024 --height 1024 --seed 42 --output_dir /root/autodl-tmp/samples/geneval_original --batch_size 1\n'''\n"
  },
  {
    "path": "flux-ToCa/src/sample.py",
    "content": "import os\nimport re\nimport time\nfrom dataclasses import dataclass\nfrom glob import iglob\n\nimport torch\nfrom einops import rearrange\nfrom PIL import ExifTags, Image\nfrom transformers import pipeline\nfrom tqdm import tqdm\n\nfrom flux.sampling import denoise, get_noise, get_schedule, prepare, unpack, denoise_test_FLOPs\nfrom flux.ideas import denoise_cache\nfrom flux.util import configs, embed_watermark, load_ae, load_clip, load_flow_model, load_t5\n\nNSFW_THRESHOLD = 0.85  # NSFW score threshold\n\n\n@dataclass\nclass SamplingOptions:\n    prompts: list[str]          # List of prompts\n    width: int                  # Image width\n    height: int                 # Image height\n    num_steps: int              # Number of sampling steps\n    guidance: float             # Guidance value\n    seed: int | None            # Random seed\n    num_images_per_prompt: int  # Number of images generated per prompt\n    batch_size: int             # Batch size (number of prompts per batch)\n    model_name: str             # Model name\n    output_dir: str             # Output directory\n    add_sampling_metadata: bool # Whether to add metadata\n    use_nsfw_filter: bool       # Whether to enable NSFW filter\n    test_FLOPs: bool            # Whether in FLOPs testing mode (in which case no images are generated)\n\n\ndef main(opts: SamplingOptions):\n    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n\n    # Optional NSFW classifier\n    if opts.use_nsfw_filter:\n        nsfw_classifier = pipeline(\n            \"image-classification\",\n            model=\"/root/autodl-tmp/pretrained_models/Falconsai/nsfw_image_detection\",\n            device=device\n        )\n    else:\n        nsfw_classifier = None\n\n    # Load model\n    model_name = opts.model_name\n    if model_name not in configs:\n        available = \", \".join(configs.keys())\n        raise ValueError(f\"Unknown model name: {model_name}, available: {available}\")\n\n 
   if opts.num_steps is None:\n        opts.num_steps = 4 if model_name == \"flux-schnell\" else 50\n\n    # Ensure width and height are multiples of 16\n    opts.width = 16 * (opts.width // 16)\n    opts.height = 16 * (opts.height // 16)\n\n    # Set output directory and index\n    output_name = os.path.join(opts.output_dir, f\"img_{{idx}}.jpg\")\n    if not os.path.exists(opts.output_dir):\n        os.makedirs(opts.output_dir)\n    idx = 0  # Image index\n\n    # Initialize model components\n    torch_device = device\n\n    # Load T5 and CLIP models onto GPU\n    t5 = load_t5(torch_device, max_length=256 if model_name == \"flux-schnell\" else 512)\n    clip = load_clip(torch_device)\n\n    # Load model onto GPU\n    model = load_flow_model(model_name, device=torch_device)\n    ae = load_ae(model_name, device=torch_device)\n\n    # Set random seed\n    if opts.seed is not None:\n        base_seed = opts.seed\n    else:\n        base_seed = torch.randint(0, 2**32, (1,)).item()\n\n    prompts = opts.prompts\n\n    total_images = len(prompts) * opts.num_images_per_prompt\n    progress_bar = tqdm(total=total_images, desc=\"Generating images\")\n\n    # Calculate number of prompt batches\n    num_prompt_batches = (len(prompts) + opts.batch_size - 1) // opts.batch_size\n\n    for batch_idx in range(num_prompt_batches):\n        prompt_start = batch_idx * opts.batch_size\n        prompt_end = min(prompt_start + opts.batch_size, len(prompts))\n        batch_prompts = prompts[prompt_start:prompt_end]\n        num_prompts_in_batch = len(batch_prompts)\n\n        # For each prompt, generate the corresponding number of images\n        for image_idx in range(opts.num_images_per_prompt):\n            # Prepare random seed\n            seed = base_seed + idx  # Set a different seed for each image\n            idx += num_prompts_in_batch  # Update image index\n\n            # Prepare input\n            batch_size = num_prompts_in_batch\n            x = get_noise(\n                
batch_size,\n                opts.height,\n                opts.width,\n                device=torch_device,\n                dtype=torch.bfloat16,\n                seed=seed,\n            )\n\n            # Prepare prompts\n            # batch_prompts is a list containing the prompts for the current batch\n            inp = prepare(t5, clip, x, prompt=batch_prompts)\n            timesteps = get_schedule(opts.num_steps, inp[\"img\"].shape[1], shift=(model_name != \"flux-schnell\"))\n            \n            # Denoise\n            with torch.no_grad():\n                if opts.test_FLOPs:\n                    x = denoise_test_FLOPs(model, **inp, timesteps=timesteps, guidance=opts.guidance)\n                else:\n                    x = denoise_cache(model, **inp, timesteps=timesteps, guidance=opts.guidance)\n\n                # Decode latent variables\n                x = unpack(x.float(), opts.height, opts.width)\n                with torch.autocast(device_type=torch_device.type, dtype=torch.bfloat16):\n                    x = ae.decode(x)\n\n            # Convert to PIL format and save\n            x = x.clamp(-1, 1)\n            x = embed_watermark(x.float())\n            x = rearrange(x, \"b c h w -> b h w c\")\n\n            for i in range(batch_size):\n                img_array = x[i]\n                img = Image.fromarray((127.5 * (img_array + 1.0)).cpu().byte().numpy())\n\n                # Optional NSFW filtering\n                if opts.use_nsfw_filter:\n                    nsfw_result = nsfw_classifier(img)\n                    nsfw_score = next((res[\"score\"] for res in nsfw_result if res[\"label\"] == \"nsfw\"), 0.0)\n                else:\n                    nsfw_score = 0.0  # If filter is not enabled, consider safe\n\n                if nsfw_score < NSFW_THRESHOLD:\n                    exif_data = Image.Exif()\n                    exif_data[ExifTags.Base.Software] = \"AI generated;txt2img;flux\"\n                    exif_data[ExifTags.Base.Make] 
= \"Black Forest Labs\"\n                    exif_data[ExifTags.Base.Model] = model_name\n                    if opts.add_sampling_metadata:\n                        exif_data[ExifTags.Base.ImageDescription] = batch_prompts[i]\n                    # Save image\n                    fn = output_name.format(idx=idx - num_prompts_in_batch + i)\n                    img.save(fn, exif=exif_data, quality=95, subsampling=0)\n                else:\n                    print(f\"The generated image may contain inappropriate content and has been skipped.\")\n\n                progress_bar.update(1)\n\n    progress_bar.close()\n\n\ndef read_prompts(prompt_file: str):\n    with open(prompt_file, 'r', encoding='utf-8') as f:\n        prompts = [line.strip() for line in f if line.strip()]\n    return prompts\n\n\ndef app():\n    import argparse\n\n    parser = argparse.ArgumentParser(description=\"Generate images using the flux model.\")\n    parser.add_argument('--prompt_file', type=str, required=True, help='Path to the prompt text file.')\n    parser.add_argument('--width', type=int, default=1360, help='Width of the generated image.')\n    parser.add_argument('--height', type=int, default=768, help='Height of the generated image.')\n    parser.add_argument('--num_steps', type=int, default=None, help='Number of sampling steps.')\n    parser.add_argument('--guidance', type=float, default=3.5, help='Guidance value.')\n    parser.add_argument('--seed', type=int, default=0, help='Random seed.')\n    parser.add_argument('--num_images_per_prompt', type=int, default=1, help='Number of images generated per prompt.')\n    parser.add_argument('--batch_size', type=int, default=1, help='Batch size (number of prompts per batch).')\n    parser.add_argument('--model_name', type=str, default='flux-schnell', choices=['flux-dev', 'flux-schnell'], help='Model name.')\n    parser.add_argument('--output_dir', type=str, default='/root/autodl-tmp/samples', help='Directory to save images.')\n    
parser.add_argument('--add_sampling_metadata', action='store_true', help='Whether to add prompts to image metadata.')\n    parser.add_argument('--use_nsfw_filter', action='store_true', help='Enable NSFW filter.')\n    parser.add_argument('--test_FLOPs', action='store_true', help='Test inference FLOPs.')\n\n    args = parser.parse_args()\n\n    prompts = read_prompts(args.prompt_file)\n\n    opts = SamplingOptions(\n        prompts=prompts,\n        width=args.width,\n        height=args.height,\n        num_steps=args.num_steps,\n        guidance=args.guidance,\n        seed=args.seed,\n        num_images_per_prompt=args.num_images_per_prompt,\n        batch_size=args.batch_size,\n        model_name=args.model_name,\n        output_dir=args.output_dir,\n        add_sampling_metadata=args.add_sampling_metadata,\n        use_nsfw_filter=args.use_nsfw_filter,\n        test_FLOPs=args.test_FLOPs,\n    )\n\n    main(opts)\n\n\nif __name__ == '__main__':\n    app()\n"
  }
]